February 1995 - Volume 11:4
By Tony Catone
The Internet can be a confusing place if you can't tell an .arc from a .zoo; don't recognize .zip, .hqx, and .gz; or don't know how to deal with .tar.z or .sit.hqx file extension combinations. Information available from Internet archive sites typically comes packaged in a variety of special file formats. Although some mail and FTP programs (e.g., Elm, Eudora, NUpop, Mosaic, and Fetch) are intelligent enough to convert these formats for you automatically, most are not.
Fortunately, it is usually possible to identify a file format by the filename extension, and that, combined with a knowledge of which program can decipher the file format, will allow you to manually convert the file into something usable. Those key pieces of information - filename extensions; associated file formats; and conversion programs for PC, Mac, and UNIX platforms - are presented in the tables at the end of this article. That information may be all you want, or need to know, about file formats. For example, if you're a Mac user and you see a file with a .sit extension, you know that you can use stuffitlite to unpack it. The remainder of this article is a primer on file formats - for those who either want, or at least are not averse to, a little technical detail.
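As a minimal sketch of the lookup described above, the following Python fragment peels extensions off a filename from right to left and names each layer. The dictionary holds only a small subset of the tables at the end of this article, and the function name identify_layers is a hypothetical choice made for illustration. Note that the lookup is case-sensitive, since .Z (compress) and .z (pack) are different formats.

```python
# Small, illustrative subset of the extension tables; not a complete list.
KNOWN_EXTENSIONS = {
    "arc": "LH(arc)",
    "gz": "GNU zip",
    "hqx": "BinHex",
    "sit": "StuffIt",
    "tar": "Tape Archive",
    "uue": "Uuencoded",
    "Z": "UNIX compress",
    "z": "UNIX pack",
    "zip": "PKZIP",
    "zoo": "ZOO",
}


def identify_layers(filename):
    """Return the recognized format names, outermost layer first.

    Hypothetical helper: stops at the first extension it does not know,
    so version numbers such as 'gzip-1.2.4.tar' do not confuse it.
    """
    extensions = filename.split(".")[1:]  # drop the base name
    layers = []
    for ext in reversed(extensions):
        fmt = KNOWN_EXTENSIONS.get(ext)  # case-sensitive on purpose
        if fmt is None:
            break
        layers.append(fmt)
    return layers


print(identify_layers("report.tar.Z"))  # ['UNIX compress', 'Tape Archive']
print(identify_layers("game.sit.hqx"))  # ['BinHex', 'StuffIt']
```

The order of the result is the order in which the layers would have to be undone: the outermost format first, then whatever it contains.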
Transport formats allow files that cannot be transmitted without damage over a particular type of communication channel to be translated into a form that can. For example, they allow binary files (such as spreadsheets or word-processing documents) to be converted into a form that can be e-mailed to colleagues. The price paid for this conversion is an increase in file size, typically on the order of 25 to 100 percent, depending on the particular transport format used.
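The size penalty can be measured directly. The sketch below uses Python's standard binascii and base64 modules to encode the same binary payload as uuencode body lines and as base64 (the encoding MIME commonly uses for binary attachments), then reports the growth. The helper names and the sample payload are assumptions made for illustration, and the 'begin'/'end' framing lines of a real uuencoded file are left out.

```python
import base64
import binascii


def uuencode_body(data):
    """Encode data as uuencode body lines (no 'begin'/'end' framing)."""
    # binascii.b2a_uu accepts at most 45 input bytes per output line.
    lines = [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]
    return b"".join(lines)


def overhead_percent(original, encoded):
    """Percentage by which the encoded form is larger than the original."""
    return 100.0 * (len(encoded) - len(original)) / len(original)


# Stand-in for a binary file such as a spreadsheet: every byte value, repeated.
payload = bytes(range(256)) * 18

uu_text = uuencode_body(payload)
b64_text = base64.encodebytes(payload)

print(f"uuencode overhead: {overhead_percent(payload, uu_text):.1f}%")
print(f"base64 overhead:   {overhead_percent(payload, b64_text):.1f}%")

# The conversion is lossless: decoding restores the original bytes.
assert base64.decodebytes(b64_text) == payload
```

Both encodings map every three bytes onto four printable characters and add line overhead, which places them toward the low end of the range quoted above.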
Common transport formats include uuencode (.uu or .uue), originally part of the UNIX to UNIX Copy (uucp) suite of utilities; xxencode (.xx), a newer variation of uuencode that avoids character set translation problems between the ASCII and IBM mainframe EBCDIC character sets; and BinHex (.hqx), which was originally written for the Macintosh and understands the multiple-fork structure of Macintosh files. A new Internet standard, the Multipurpose Internet Mail Extensions (.mime), is slowly gaining acceptance as a replacement for all these disparate transport formats.
Compression programs work on the contents of one file at a time and attempt to shrink that file by encoding more compactly any redundancies within it. A brief introduction to various compression algorithms can be found in the Usenet newsgroup comp.compression's monthly Frequently Asked Questions (FAQ) post. Since most users find it easier to have a program compress many files at once rather than one at a time, stand-alone file compressors have fallen out of favor, displaced by file archivers that support compression. The major exceptions to this rule are the various flavors of the UNIX operating system, where file compressors such as compress (.Z), pack (.z) and gzip (.gz) are still commonly used, most often in conjunction with a file archive format such as the Tape ARchive format (.tar).
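How much a compressor can shrink a file depends on how much redundancy the file contains. The sketch below illustrates this with Python's zlib module, which implements the deflate method that gzip uses; it is a stand-in only and does not reproduce compress or pack, which use different algorithms. The sample data and the helper name compression_ratio are assumptions made for illustration.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant input: the same short phrase repeated many times.
redundant = b"the quick brown fox " * 500

# Input with essentially no redundancy: random bytes of the same length.
random_like = os.urandom(len(redundant))

print(f"redundant text: {compression_ratio(redundant):.3f}")
print(f"random bytes:   {compression_ratio(random_like):.3f}")
```

The repeated text shrinks to a small fraction of its size, while the random bytes do not shrink at all and may even grow slightly, because there is nothing redundant for the encoder to exploit.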
The most popular file formats are archive formats, which are designed to maintain some external features of a collection of files, most often their placement in a directory structure. The most common UNIX archive format is the Tape ARchive format, or tar. Tar files typically end in a .tar extension, are binary files, and usually include nested directory and subdirectory information. Since they are binary files, tar files cannot be mailed directly over the Internet without preprocessing by a file format transport utility such as uuencode.
Tar files are not compressed, though they may achieve some minor space savings by eliminating the slack space that an operating system's minimum cluster size typically induces, especially across large numbers of small files. Because of this, tar files are commonly processed afterward by one of the UNIX compression programs, resulting in gzipped .tar.gz files or compressed .tar.Z files.
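The two-step process of archiving and then compressing can be sketched with Python's standard tarfile and gzip modules. The file names and contents below are invented for illustration, and everything is built in memory rather than on disk. In this sketch the plain tar archive comes out larger than the data it holds, since the format stores headers and padding alongside each file, and the gzip pass is what actually shrinks it.

```python
import gzip
import io
import tarfile


def build_tar(files):
    """Pack a {name: bytes} mapping into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical sample files placed in a subdirectory.
files = {f"docs/note{i}.txt": b"All work and no play. " * 40 for i in range(5)}

raw_total = sum(len(data) for data in files.values())
tar_bytes = build_tar(files)            # step 1: archive (.tar)
tar_gz_bytes = gzip.compress(tar_bytes)  # step 2: compress (.tar.gz)

print(f"raw data: {raw_total} bytes")
print(f".tar:     {len(tar_bytes)} bytes")
print(f".tar.gz:  {len(tar_gz_bytes)} bytes")

# Reading the result back: undo gzip and tar in one pass, keeping the paths.
with tarfile.open(fileobj=io.BytesIO(tar_gz_bytes), mode="r:gz") as tar:
    print(tar.getnames())
```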
Some file archivers also support compression of files: GNU tar for UNIX (in conjunction with the gzip compression program); PKzip (.zip) for DOS; StuffIt (.sit) for the Macintosh; and Info-Zip for UNIX, DOS, and Macintosh are all file archivers that compress the files as they are added to the archive. Compressing files while archiving them yields better compression ratios, as the compression algorithm can take advantage of redundancies across multiple files in the archive. PC and Macintosh archives are typically generated by an archive program that supports compression, so although in the general case they need not be compressed, they almost always are.
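The effect of redundancy shared between files can be illustrated with a rough sketch. It uses Python's zlib module as a stand-in compressor and compares compressing each of several similar files on its own against compressing them as one combined stream. The sample files are invented, and whether a particular archiver compresses across files or member by member depends on the format, so this shows the principle rather than the behavior of any one program.

```python
import zlib

# Hypothetical collection of small, similar files (form letters that
# differ only in a reference number).
files = [
    b"Dear colleague, please find the quarterly figures attached. " * 3
    + f"Ref {i}".encode()
    for i in range(20)
]

# Each file compressed independently: shared text is encoded 20 times over.
separately = sum(len(zlib.compress(data)) for data in files)

# All files compressed as a single stream: shared text is encoded once
# and later copies become short back-references.
together = len(zlib.compress(b"".join(files)))

print(f"compressed one by one:  {separately} bytes")
print(f"compressed as a stream: {together} bytes")
```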
Common on both PCs and Macs are archive formats that prepend an executable header to the archive, so that the archive itself can be run as a program that unpacks its own contents; PC .exe file archives and Macintosh .sea archives are both of this type. The disadvantage of these self-extracting archives is that, since they are binary executables, they cannot be unarchived on other operating system platforms without software emulation of the operating system under which they were created.
Although the range of file formats is vast, and their abilities to compress files vary substantially, only a handful of formats predominate at any given time. The .zip format has achieved great acceptance over the last few years, in part because Phil Katz, the author of PKzip, placed the .zip file format, compression format, and .zip filename extension in the public domain. As a result, the Info-Zip Internet group maintains a freely available implementation of zip and unzip for a wide array of operating systems, including UNIX, MSDOS, Windows NT, MacOS, OS/2, VMS, the Atari, and the Amiga. The appeal of a format actively maintained on so many platforms has made .zip the format of choice at many Internet archive sites. Some predominantly UNIX sites still use compressed or gzipped tar files almost exclusively, and Macintosh archives favor file formats that understand the multi-fork Macintosh file structure, such as .sea, .sit (StuffIt), or .hqx (BinHex).
Older files may still be archived in formats that are obscure today but were all the rage when the archive was built, so the other formats listed in the following tables are of more than historical interest.
TONY CATONE (firstname.lastname@example.org) is the Wharton Computing and Information Technology distributed representative to the Wharton School's Department of Statistics.
Gailly, Jean-loup. (9 Nov 94 09:58:56 GMT) NetNews comp.compression's "Frequently Asked Questions."
Glossbrenner, Alfred and Emily Glossbrenner. (1994) Internet Slick Tricks: Smart Secrets Revealed, Random House, New York, NY.
Heslop, Brent and David Angell. (1994) The Instant Internet Guide: Hands-On Global Networking. Addison-Wesley, Reading, MA.
LaQuey, Tracy L. (1994) The Internet Companion Plus: A beginner's Start-Up Kit for Global Networking. Addison-Wesley, Reading, MA.
Lemson, David. (2 Dec 94) ftp.cso.uiuc.edu:/doc/pcnet/compression.
Manger, Jason J. (1995) The Essential Internet Information Guide. McGraw-Hill, Berkshire, England.
Tables: Conversion programs by platform
Table of file formats
Conversion programs for MSDOS

Extension  File Format      MSDOS
===================================================
ap         Whap             -
arc        LH(arc)          arc602.exe
arj        ARJ              arj230/arj.exe
bndl       Bundle           -
boo        BOO              msbpct/msbmbk.exe
c          Compact          -
com        COMT             comt010d.zip
cpt        Compactor        -
dd         Disk Doubler     -
dwc        DWC              dwc-a501.exe
exe        Self extracting  (automatic)
f          Freeze           -
gz         GNU zip          gzip-1.2.4.msdos.exe
hex        Intel HEX        hc.zip
hpk        HPACK            hpack78.zip
hqx        BinHex           xbin23.zip
hyp        HYPER            hyper25.zip
ish        Ish              ish200.lzh
lbr        LU/LAR           lue220.arc
lzh        LHA              lha213.exe
lzh        LHarc            lh113c.exe
lzss       LZSS             -
md         MDCD             mdcd10.arc
mime       MIME             mpack-1.4-pc.zip
pak        PAK              pak251.exe
pit        PackIt           unpackit.exe
sea        Self extracting  -
shar       Shell archive    toadshr1.arc
shk        ShrinkIt         -
sit        StuffIt          unsit30.zip
stf        ToFit            -
tar        Tape Archive     tar
uue        Uuencoded        toaduu20.zip
wrp        WARP             -
xqx        Squeeze          sqpc131.arc
xxe        Xxencoded        ncdc150.zip
y          Yabba            -
Z          UNIX compress    u16.zip
z          UNIX pack        -
zip        PKZIP            pkz204g.exe
zoo        ZOO              zoo210.exe

Notes:
1. A comprehensive guide to some of the more obscure file compression formats is available via anonymous ftp at ftp.cso.uiuc.edu:/doc/pcnet/compression.

Table of file formats
Conversion programs for Macintosh

Extension  File Format      Macintosh
=================================================
ap         Whap             -
arc        LH(arc)          arcmac1.3c
arj        ARJ              -
bndl       Bundle           bundle
boo        BOO              -
c          Compact          -
com        COMT             -
cpt        Compactor        compactor1.21
dd         Disk Doubler     diskdoubler3.7
dwc        DWC              -
exe        Self extracting  -
f          Freeze           -
gz         GNU zip          MacGzip0.2
hex        Intel HEX        -
hpk        HPACK            -
hqx        BinHex           binhex4.0
hyp        HYPER            -
ish        Ish              ishmac-06
lbr        LU/LAR           -
lzh        LHA              -
lzh        LHarc            maclharc0.41
lzss       LZSS             lzss2.0b5
md         MDCD             -
mime       MIME             mpack-1.4-mac.hqx
pak        PAK              -
pit        PackIt           packit3.1.3
sea        Self extracting  (automatic)
shar       Shell archive    -
shk        ShrinkIt         -
sit        StuffIt          stuffitlite
stf        ToFit            stf1.2
tar        Tape Archive     untar2.0
uue        Uuencoded        uutool2.0.3
wrp        WARP             -
xqx        Squeeze          -
xxe        Xxencoded        -
y          Yabba            -
Z          UNIX compress    maccompress3.2a
z          UNIX pack        -
zip        PKZIP            unzip1.1
zoo        ZOO              macbooz2.1

Table of file formats
Conversion programs for UNIX

Extension  File Format      UNIX
===================================================
ap         Whap             yabbawhap
arc        LH(arc)          arc521
arj        ARJ              unarj230
bndl       Bundle           unbundle
boo        BOO              -
c          Compact          uncompact
com        COMT             -
cpt        Compactor        -
dd         Disk Doubler     -
dwc        DWC              -
exe        Self extracting  -
f          Freeze           freeze-2.3.4.tar.Z
gz         GNU zip          gzip-1.2.4.tar
hex        Intel HEX        -
hpk        HPACK            -
hqx        BinHex           mcvert
hyp        HYPER            -
ish        Ish              -
lbr        LU/LAR           lar
lzh        LHA              lha1.00
lzh        LHarc            lharc102
lzss       LZSS             -
md         MDCD             -
mime       MIME             mpack-1.4-src.tar.Z
pak        PAK              arc521
pit        PackIt           unpit
sea        Self extracting  -
shar       Shell archive    unhshar
shk        ShrinkIt         -
sit        StuffIt          unsit
stf        ToFit            -
tar        Tape Archive     tar
uue        Uuencoded        uudecode
wrp        WARP             -
xqx        Squeeze          -
xxe        Xxencoded        xxdecode
y          Yabba            yabbawhap
Z          UNIX compress    compress
z          UNIX pack        unpack
zip        PKZIP            unzip50
zoo        ZOO              zoo210