February 1995 - Volume 11:4
By Tony Catone

The Internet can be a confusing place if you can't tell an .arc from a .zoo; don't recognize .zip, .hqx, and .gz; or don't know how to deal with combined extensions such as .tar.z or .sit.hqx. Information available from Internet archive sites typically comes packaged in a variety of special file formats. Although some mail and FTP programs (e.g., Elm, Eudora, NUpop, Mosaic, and Fetch) are intelligent enough to convert these formats for you automatically, most are not. Fortunately, it is usually possible to identify a file format by its filename extension. That, combined with knowing which program can decipher the format, allows you to convert the file manually into something usable.

Those key pieces of information - filename extensions; associated file formats; and conversion programs for PC, Mac, and UNIX platforms - are presented in the tables at the end of this article. That information may be all you want, or need, to know about file formats. For example, if you're a Mac user and you see a file with a .sit extension, you know that you can use stuffitlite to unpack it.

The remainder of this article is a primer on file formats, for those who either want, or at least are not averse to, a little technical detail.

Transport formats

Transport formats allow files that cannot be transmitted without damage over a particular type of communication channel to be translated into a form that can. For example, they allow binary files (such as spreadsheets or word-processing documents) to be converted into a form that can be e-mailed to colleagues. The price paid for this conversion is an increase in file size, typically on the order of 25 to 100 percent, depending on the particular transport format used.
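The size penalty of a transport format can be illustrated with a minimal sketch. It assumes Python and its standard binascii module, neither of which appears in the article; the helper names uuencode_body and uudecode_body and the sample payload are made up for illustration. It encodes some binary data as uuencode body lines (without the "begin" and "end" lines a real .uue file carries), decodes it again, and reports the growth.

```python
import binascii

def uuencode_body(data: bytes) -> bytes:
    """Return the uuencoded body lines for data (no begin/end lines)."""
    # Classic uuencode carries at most 45 binary bytes per text line.
    chunks = (data[i:i + 45] for i in range(0, len(data), 45))
    return b"".join(binascii.b2a_uu(chunk) for chunk in chunks)

def uudecode_body(text: bytes) -> bytes:
    """Invert uuencode_body, one text line at a time."""
    lines = text.splitlines(keepends=True)
    return b"".join(binascii.a2b_uu(line) for line in lines)

# A made-up 4500-byte "binary file" that contains every byte value.
payload = bytes(i % 256 for i in range(4500))
encoded = uuencode_body(payload)

# The encoded form is plain printable text, but noticeably larger.
growth = 100 * (len(encoded) - len(payload)) / len(payload)
print(f"{len(payload)} bytes -> {len(encoded)} bytes (+{growth:.0f}%)")
```

For this payload the encoded text is a bit more than a third larger than the original, which sits inside the 25 to 100 percent range quoted above. Other transport formats, and other line lengths, would give different figures.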
Common transport formats include uuencode (.uu or .uue), originally part of the UNIX-to-UNIX Copy (uucp) suite of utilities; xxencode (.xx or .xxe), a newer variation of uuencode that avoids character set translation problems between the ASCII and IBM mainframe EBCDIC character sets; and BinHex (.hqx), which was originally written for the Macintosh and understands the multiple-fork structure of Macintosh files. A new Internet standard, the Multipurpose Internet Mail Extensions (.mime), is slowly gaining acceptance as a replacement for all these disparate transport formats.

Compression formats

Compression programs work on the contents of one file at a time, and attempt to shrink that file by encoding more compactly any redundancies that exist within it. A brief introduction to various compression algorithms can be found in the monthly Frequently Asked Questions (FAQ) post of the Usenet newsgroup comp.compression. Since most users find it easier to have a program compress many files at once rather than one at a time, file compressors have fallen out of favor, giving way to file archivers that support compression. The major exceptions to this rule are the various flavors of the UNIX operating system, where file compressors such as compress (.Z), pack (.z), and gzip (.gz) are still commonly used, most often in conjunction with a file archive format such as the Tape ARchive format (.tar).

Archive formats

The most popular file formats are archive formats, which are designed to maintain some external features of a collection of files, most often their placement in a directory structure. The most common UNIX archive format is the Tape ARchive format, or tar. Tar files typically end in a .tar extension, are binary files, and usually include nested directory and subdirectory information. Since they are binary files, tar files cannot be mailed directly over the Internet without first being processed by a transport-format utility such as uuencode.
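The pairing of an archive format with a separate file compressor can be sketched as follows. This is an illustration only, assuming Python's standard tarfile and gzip modules (not tools named in the article); the build_tar helper, the file paths, and the file contents are invented for the example. It packs three small files from nested directories into an in-memory tar archive, then compresses that archive as a single file, which is the step that turns a .tar into a .tar.gz.

```python
import gzip
import io
import tarfile

def build_tar(files):
    """Pack a {path: bytes} mapping into an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)  # the path records the directory nesting
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

# Hypothetical files; two of them share the same highly redundant content.
files = {
    "docs/readme.txt": b"file formats primer\n" * 200,
    "docs/tables/unix.txt": b"file formats primer\n" * 200,
    "src/notes.txt": b"transport, compression, archive\n" * 100,
}

tar_bytes = build_tar(files)          # archive step: what a .tar file holds
tar_gz = gzip.compress(tar_bytes)     # compression step: what a .tar.gz file holds
raw_size = sum(len(data) for data in files.values())

# Read the archive back to show that the directory structure survived.
with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
    names = tar.getnames()

print(names)
print("contents:", raw_size, "tar:", len(tar_bytes), "tar.gz:", len(tar_gz))
```

In this sketch the tar archive is larger than the combined file contents, since it adds a header for each file and pads to fixed-size blocks; the reduction in size comes entirely from the separate compression step.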
Tar files are not compressed, though they may achieve some minor space savings by eliminating the slack space that an operating system's minimum cluster size typically induces, especially on large numbers of smaller files. Because of this, it is common for tar files to be processed subsequently by one of the UNIX compression programs, resulting in gzipped .tar.gz files or compressed .tar.Z files.

Some file archivers also support compression of files: GNU tar for UNIX (in conjunction with the gzip compression program); PKZIP (.zip) for DOS; StuffIt (.sit) for the Macintosh; and Info-Zip for UNIX, DOS, and Macintosh are all file archivers that compress the files as they are added to the archive. Compressing files while archiving them can yield better compression ratios when the archiver compresses across file boundaries, as the compression algorithm can then take advantage of redundancies shared by multiple files in the archive. PC and Macintosh archives are typically generated by an archive program that supports compression, so although in the general case they need not be compressed, they almost always are.

Common on both PCs and Macs are archive formats that prepend an executable header to the archive, so that the archive can be executed directly to unpack itself; PC .exe file archives and Macintosh .sea archives are both of this type. The disadvantage of these self-extracting archives is that, since they are binary executables, they cannot be unarchived on other operating system platforms without software emulation of the operating system under which they were created.

Although the range of file formats is vast, and their abilities to compress files vary substantially, in reality only a handful of file formats predominate at any given time. The .zip format has achieved great acceptance over the last few years, in part because Phil Katz, the author of PKZIP, placed the .zip file format, compression format, and .zip filename extension in the public domain.
As a result, a freely available implementation of zip and unzip by the Info-Zip Internet group is available for a wide array of operating systems, including UNIX, MSDOS, Windows NT, MacOS, OS/2, VMS, the Atari, and the Amiga. The appeal of having a format actively maintained on so many platforms has made the PKZIP (.zip) format the format of choice at many Internet archive sites. Some predominantly UNIX sites still use compressed or gzipped tar files almost exclusively, and Macintosh archives favor file formats that understand the multi-fork Macintosh file structure, such as .sea, .sit (StuffIt), or .hqx (BinHex).

Older files may still be archived in a format that is obscure today but was all the rage when the file archive was constructed, so it is not only for historical interest that other formats are listed in the following tables.

TONY CATONE (catone@stat.wharton.upenn.edu) is the Wharton Computing and Information Technology distributed representative to the Wharton School's Department of Statistics.

Sidebar: Bibliography

Gailly, Jean-loup. (9 Nov 94 09:58:56 GMT) NetNews comp.compression's "Frequently Asked Questions."
Glossbrenner, Alfred and Emily Glossbrenner. (1994) Internet Slick Tricks: Smart Secrets Revealed. Random House, New York, NY.
Heslop, Brent and David Angell. (1994) The Instant Internet Guide: Hands-On Global Networking. Addison-Wesley, Reading, MA.
LaQuey, Tracy L. (1994) The Internet Companion Plus: A Beginner's Start-Up Kit for Global Networking. Addison-Wesley, Reading, MA.
Lemson, David. (2 Dec 94) ftp.cso.uiuc.edu:/doc/pcnet/compression.
Manger, Jason J. (1995) The Essential Internet Information Guide. McGraw-Hill, Berkshire, England.

Tables: Conversion programs by platform
Table of file formats
Conversion programs for MSDOS
Extension  File Format      MSDOS program
===================================================
ap         Whap             -
arc        LH(arc)          arc602.exe
arj        ARJ              arj230/arj.exe
bndl       Bundle           -
boo        BOO              msbpct/msbmbk.exe
c          Compact          -
com        COMT             comt010d.zip
cpt        Compactor        -
dd         Disk Doubler     -
dwc        DWC              dwc-a501.exe
exe        Self extracting  (automatic)
f          Freeze           -
gz         GNU zip          gzip-1.2.4.msdos.exe
hex        Intel HEX        hc.zip
hpk        HPACK            hpack78.zip
hqx        BinHex           xbin23.zip
hyp        HYPER            hyper25.zip
ish        Ish              ish200.lzh
lbr        LU/LAR           lue220.arc
lzh        LHA              lha213.exe
lzh        LHarc            lh113c.exe
lzss       LZSS             -
md         MDCD             mdcd10.arc
mime       MIME             mpack-1.4-pc.zip
pak        PAK              pak251.exe
pit        PackIt           unpackit.exe
sea        Self extracting  -
shar       Shell archive    toadshr1.arc
shk        ShrinkIt         -
sit        StuffIt          unsit30.zip
stf        ToFit            -
tar        Tape Archive     tar
uue        Uuencoded        toaduu20.zip
wrp        WARP             -
xqx        Squeeze          sqpc131.arc
xxe        Xxencoded        ncdc150.zip
y          Yabba            -
Z          UNIX compress    u16.zip
z          UNIX pack        -
zip        PKZIP            pkz204g.exe
zoo        ZOO              zoo210.exe
Notes: 1. A comprehensive guide to some of the
more obscure file compression formats is available
via anonymous ftp at ftp.cso.uiuc.edu:/doc/pcnet/compression.
Table of file formats
Conversion programs for Macintosh
Extension  File Format      Macintosh program
===================================================
ap         Whap             -
arc        LH(arc)          arcmac1.3c
arj        ARJ              -
bndl       Bundle           bundle
boo        BOO              -
c          Compact          -
com        COMT             -
cpt        Compactor        compactor1.21
dd         Disk Doubler     diskdoubler3.7
dwc        DWC              -
exe        Self extracting  -
f          Freeze           -
gz         GNU zip          MacGzip0.2
hex        Intel HEX        -
hpk        HPACK            -
hqx        BinHex           binhex4.0
hyp        HYPER            -
ish        Ish              ishmac-06
lbr        LU/LAR           -
lzh        LHA              -
lzh        LHarc            maclharc0.41
lzss       LZSS             lzss2.0b5
md         MDCD             -
mime       MIME             mpack-1.4-mac.hqx
pak        PAK              -
pit        PackIt           packit3.1.3
sea        Self extracting  (automatic)
shar       Shell archive    -
shk        ShrinkIt         -
sit        StuffIt          stuffitlite
stf        ToFit            stf1.2
tar        Tape Archive     untar2.0
uue        Uuencoded        uutool2.0.3
wrp        WARP             -
xqx        Squeeze          -
xxe        Xxencoded        -
y          Yabba            -
Z          UNIX compress    maccompress3.2a
z          UNIX pack        -
zip        PKZIP            unzip1.1
zoo        ZOO              macbooz2.1
Table of file formats
Conversion programs for UNIX
Extension  File Format      UNIX program
===================================================
ap         Whap             yabbawhap
arc        LH(arc)          arc521
arj        ARJ              unarj230
bndl       Bundle           unbundle
boo        BOO              -
c          Compact          uncompact
com        COMT             -
cpt        Compactor        -
dd         Disk Doubler     -
dwc        DWC              -
exe        Self extracting  -
f          Freeze           freeze-2.3.4.tar.Z
gz         GNU zip          gzip-1.2.4.tar
hex        Intel HEX        -
hpk        HPACK            -
hqx        BinHex           mcvert
hyp        HYPER            -
ish        Ish              -
lbr        LU/LAR           lar
lzh        LHA              lha1.00
lzh        LHarc            lharc102
lzss       LZSS             -
md         MDCD             -
mime       MIME             mpack-1.4-src.tar.Z
pak        PAK              arc521
pit        PackIt           unpit
sea        Self extracting  -
shar       Shell archive    unhshar
shk        ShrinkIt         -
sit        StuffIt          unsit
stf        ToFit            -
tar        Tape Archive     tar
uue        Uuencoded        uudecode
wrp        WARP             -
xqx        Squeeze          -
xxe        Xxencoded        xxdecode
y          Yabba            yabbawhap
Z          UNIX compress    compress
z          UNIX pack        unpack
zip        PKZIP            unzip50
zoo        ZOO              zoo210
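The lookup the tables are meant for, reading a filename's extensions and deciding which programs to run, can be sketched in a few lines. This is a hypothetical illustration in Python: the UNIX_TOOLS dictionary copies only a handful of rows from the UNIX table, and the unpack_plan function and sample filenames are invented for the example. Stacked extensions such as .tar.Z or .sit.hqx are peeled off from right to left, since the outermost format has to be undone first.

```python
# A few rows copied from the UNIX table: extension -> (file format, program).
# Keys are case-sensitive because .Z (compress) and .z (pack) are different formats.
UNIX_TOOLS = {
    "gz": ("GNU zip", "gzip-1.2.4.tar"),
    "hqx": ("BinHex", "mcvert"),
    "sit": ("StuffIt", "unsit"),
    "tar": ("Tape Archive", "tar"),
    "uue": ("Uuencoded", "uudecode"),
    "Z": ("UNIX compress", "compress"),
    "z": ("UNIX pack", "unpack"),
    "zip": ("PKZIP", "unzip50"),
}

def unpack_plan(filename):
    """Return (extension, format, program) steps, outermost layer first."""
    parts = filename.split(".")
    steps = []
    # Walk the extensions right to left; stop at the first one not in the table.
    while len(parts) > 1 and parts[-1] in UNIX_TOOLS:
        ext = parts.pop()
        file_format, program = UNIX_TOOLS[ext]
        steps.append((ext, file_format, program))
    return steps

for name in ("emacs.tar.Z", "game.sit.hqx", "notes.txt"):
    print(name, unpack_plan(name))
```

A name with no recognized extension yields an empty plan, which is a reasonable cue to leave the file alone or to consult the fuller guide mentioned in the table notes.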