PENN PRINTOUT
The University of Pennsylvania's Online Computing Magazine

February 1995 - Volume 11:4



Keys to the kingdom: Unlocking Internet file formats

By Tony Catone

The Internet can be a confusing place if you can't tell an .arc from a .zoo; don't recognize .zip, .hqx, and .gz; or don't know how to deal with .tar.z or .sit.hqx file extension combinations. Information available from Internet archive sites typically comes packaged in a variety of special file formats. Although some mail and FTP programs (e.g., Elm, Eudora, NUpop, Mosaic, and Fetch) are intelligent enough to convert these formats for you automatically, most are not.

Fortunately, it is usually possible to identify a file format by the filename extension, and that, combined with a knowledge of which program can decipher the file format, will allow you to manually convert the file into something usable. Those key pieces of information - filename extensions; associated file formats; and conversion programs for PC, Mac, and UNIX platforms - are presented in the tables at the end of this article. That information may be all you want, or need to know, about file formats. For example, if you're a Mac user and you see a file with a .sit extension, you know that you can use stuffitlite to unpack it. The remainder of this article is a primer on file formats - for those who either want, or at least are not averse to, a little technical detail.
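
As a rough illustration of reading a filename from its extensions, here is a minimal Python sketch. The FORMATS dictionary copies only a few rows from the tables at the end of this article, and the function name unpack_order and the sample filenames are made up for the example. It assumes that stacked extensions are undone from right to left, outermost layer first.

```python
# Hypothetical lookup table: a handful of rows taken from the tables
# at the end of the article (extension -> file format).
FORMATS = {
    "hqx": "BinHex",
    "sit": "StuffIt",
    "tar": "Tape Archive",
    "uue": "Uuencoded",
    "gz": "GNU zip",
    "Z": "UNIX compress",
    "z": "UNIX pack",
    "zip": "PKZIP",
    "zoo": "ZOO",
}


def unpack_order(filename: str) -> list:
    """Return the formats to undo, outermost (rightmost) extension first."""
    steps = []
    parts = filename.split(".")
    # Walk the extensions from the right; stop at the first unknown one,
    # which is taken to be part of the base name.
    for ext in reversed(parts[1:]):
        # Case matters: .Z (compress) and .z (pack) are different formats.
        if ext not in FORMATS:
            break
        steps.append(FORMATS[ext])
    return steps


print(unpack_order("paper.tar.Z"))    # ['UNIX compress', 'Tape Archive']
print(unpack_order("game.sit.hqx"))   # ['BinHex', 'StuffIt']
```

The list tells you which conversion program to reach for first: the one for the last extension in the name.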


Transport formats

Transport formats allow files that cannot be transmitted without damage over a particular type of communication channel to be translated into a form that can. For example, they allow binary files (such as spreadsheets or word-processing documents) to be converted into a form that can be e-mailed to colleagues. The price paid for this conversion is an increase in file size, typically on the order of 25 to 100 percent, depending on the particular transport format used.
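
To make the size penalty concrete, the following Python sketch encodes a block of binary data in the uuencode line format and measures how much it grows. It relies on the standard binascii module, which converts one line (at most 45 bytes) at a time. The helper names uuencode_body and uudecode_body are invented for this example, and the "begin" and "end" wrapper lines of a real uuencoded file are left out.

```python
import binascii


def uuencode_body(data: bytes) -> bytes:
    """Encode data as uuencode body lines (no begin/end wrapper)."""
    lines = []
    # binascii.b2a_uu accepts at most 45 bytes per call and returns one line.
    for start in range(0, len(data), 45):
        lines.append(binascii.b2a_uu(data[start:start + 45]))
    return b"".join(lines)


def uudecode_body(text: bytes) -> bytes:
    """Reverse uuencode_body, one line at a time."""
    return b"".join(binascii.a2b_uu(line) for line in text.splitlines())


original = bytes(range(256)) * 4          # 1024 bytes of binary data
encoded = uuencode_body(original)
growth = 100 * (len(encoded) - len(original)) / len(original)

# The round trip must give back exactly what went in.
assert uudecode_body(encoded) == original
print(f"{len(original)} bytes -> {len(encoded)} bytes (+{growth:.0f}%)")
```

Every three input bytes become four printable characters, and each line also carries a length character and a newline, so the encoded form comes out a little over a third larger than the original.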

Common transport formats include uuencode (.uu or .uue), originally part of the UNIX-to-UNIX Copy (uucp) suite of utilities; xxencode (.xx), a newer variation of uuencode that avoids character set translation problems between the ASCII and IBM mainframe EBCDIC character sets; and BinHex (.hqx), which was originally written for the Macintosh and understands the multiple-fork structure of Macintosh files. A new Internet standard, the Multipurpose Internet Mail Extensions (.mime), is slowly gaining acceptance as a replacement for all these disparate transport formats.
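
The character set issue can be seen by listing the characters each encoding is allowed to emit. The Python sketch below collects the uuencode alphabet by encoding every possible 6-bit value with the standard binascii module. The XX_ALPHABET string is the alphabet commonly documented for xxencode and is an assumption here, since no xxencode routine ships with Python. The function names are invented for the example.

```python
import binascii

# Alphabet commonly documented for xxencode (assumed, not taken from a library).
XX_ALPHABET = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def uu_alphabet() -> str:
    """Collect every character uuencode emits across all 64 six-bit values."""
    used = set()
    for value in range(64):
        # Three bytes whose four 6-bit groups all equal `value`.
        word = (value << 18) | (value << 12) | (value << 6) | value
        line = binascii.b2a_uu(word.to_bytes(3, "big"))
        used.update(line.decode("ascii").rstrip("\n"))
    return "".join(sorted(used))


def punctuation(chars: str) -> str:
    """Return the characters that are neither letters nor digits."""
    return "".join(sorted(c for c in chars if not c.isalnum()))


print("uuencode punctuation:", repr(punctuation(uu_alphabet())))
print("xxencode punctuation:", repr(punctuation(XX_ALPHABET)))
```

The uuencode list includes the space and characters such as brackets and the caret, which are the kind most likely to be altered when text passes between ASCII and EBCDIC systems. The xxencode alphabet keeps to letters, digits, plus, and minus.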


Compression formats

Compression programs work on the contents of one file at a time, and attempt to shrink that file by encoding more compactly any redundancies that exist within it. A brief introduction to various compression algorithms can be found in the Usenet newsgroup comp.compression's monthly Frequently Asked Questions (FAQ) post. Since most users find it easier to have a program compress many files at once rather than one at a time, file compressors have fallen out of favor, replaced by file archivers that support compression. The major exceptions to this rule are the various flavors of the UNIX operating system, where file compressors such as compress (.Z), pack (.z) and gzip (.gz) are still commonly used, most often in conjunction with a file archive format such as the Tape ARchive format (.tar).
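
The dependence on redundancy is easy to demonstrate. The Python sketch below uses the standard zlib module as a stand-in for the compressors named above (it implements the deflate method that gzip also uses) and compares a highly repetitive input with pseudo-random bytes of the same length. The sample phrase, the seed, and the helper name ratio are arbitrary choices for the example.

```python
import random
import zlib


def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data)) / len(data)


# Highly redundant input: one short phrase repeated many times.
redundant = b"the quick brown fox " * 500

# Input with no exploitable redundancy: pseudo-random bytes
# (fixed seed so that the run is repeatable).
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

print(f"redundant: {ratio(redundant):.2%} of original size")
print(f"noise:     {ratio(noise):.2%} of original size")
```

The repetitive input shrinks to a small fraction of its size, while the random input does not shrink at all, because there is nothing in it to encode more compactly.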


Archive formats

The most popular file formats are archive formats, which are designed to maintain some external features of a collection of files, most often their placement in a directory structure. The most common UNIX archive format is the Tape ARchive format, or tar. Tar files typically end in a .tar extension, are binary files, and usually include nested directory and subdirectory information. Since they are binary files, tar files cannot be mailed directly over the Internet without preprocessing by a transport format utility such as uuencode.

Tar files are not compressed, though they may achieve some minor space savings by eliminating the slack space operating systems' minimum cluster sizes typically induce, especially on large numbers of smaller files. Because tar files are not compressed, it is common for them to subsequently be processed by one of the UNIX compression programs, resulting in gzipped .tar.gz files or .tar.Z files.
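
The two-step nature of a .tar.gz file can be sketched in Python with the standard tarfile and gzip modules. The file names and contents below are hypothetical, and everything is kept in memory for brevity. The archive step records the directory paths, and the compression step is applied afterward to the tar file as a whole.

```python
import gzip
import io
import tarfile

# Hypothetical file names and contents, for illustration only.
FILES = {
    "project/README": b"Sample archive.\n",
    "project/src/main.c": b"int main(void) { return 0; }\n" * 40,
    "project/src/util.c": b"int helper(void) { return 1; }\n" * 40,
}


def make_tar(files: dict) -> bytes:
    """Pack files into an uncompressed tar archive held in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)   # the name carries the directory path
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


tar_bytes = make_tar(FILES)                  # the .tar step
tar_gz_bytes = gzip.compress(tar_bytes)      # the .tar.gz step

# Undo the two layers in reverse order: gunzip first, then read the tar.
with tarfile.open(fileobj=io.BytesIO(gzip.decompress(tar_gz_bytes))) as archive:
    names = archive.getnames()

print(len(tar_bytes), "bytes as .tar;", len(tar_gz_bytes), "bytes as .tar.gz")
print(names)
```

Unpacking reverses the order of packing, which is why a .tar.gz file is first uncompressed and only then handed to tar.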

Some file archivers also compress files as they are added to the archive: GNU tar for UNIX (in conjunction with the gzip compression program); PKZIP (.zip) for DOS; StuffIt (.sit) for the Macintosh; and Info-Zip for UNIX, DOS, and Macintosh. Compressing files while archiving them can yield better compression ratios when the compression algorithm is able to take advantage of redundancies across multiple files in the archive. PC and Macintosh archives are typically generated by an archive program that supports compression, so although in the general case they need not be compressed, they almost always are.
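
How much redundancy across files helps depends on whether the archiver compresses each member separately or the archive as a single stream. The Python sketch below, using the standard zipfile, tarfile, and gzip modules, packs the same set of small, nearly identical files both ways. In a .zip archive each member is compressed on its own, while gzip applied to a tar file sees all the members as one stream. The file names and contents are hypothetical, and the outcome shown applies to this kind of input, not to archives in general.

```python
import gzip
import io
import tarfile
import zipfile

# Hypothetical input: 50 small files with nearly identical contents.
FILES = {
    f"notes/memo{i:02d}.txt": (
        f"Memo {i}\n".encode()
        + b"Please remember to back up your files before the weekend.\n" * 4
    )
    for i in range(50)
}


def zip_size(files: dict) -> int:
    """Archive size when each member is deflated on its own (.zip)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())


def tar_gz_size(files: dict) -> int:
    """Archive size when the whole tar stream is compressed at once (.tar.gz)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return len(gzip.compress(buffer.getvalue()))


print("zip:   ", zip_size(FILES), "bytes")
print("tar.gz:", tar_gz_size(FILES), "bytes")
```

For many small files that resemble one another, the single-stream approach comes out smaller, because text repeated from one file to the next is encoded only once. The per-member approach has its own advantage: any one file can be extracted without reading the rest of the archive.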

Common on both PCs and Macs are archive formats that prepend an executable header to the archive, so that the archive itself can be run as a program; PC .exe file archives and Macintosh .sea archives are both of this type. The disadvantage of these self-extracting archives is that, since they are binary executables, they cannot be unarchived on other operating system platforms without software emulation of the operating system under which they were created.

Although the range of file formats is vast, and their abilities to compress files vary substantially, only a handful of file formats predominate at any given time. The .zip format has achieved great acceptance over the last few years, in part because Phil Katz, the author of PKZIP, placed the .zip file format, compression format, and .zip filename extension in the public domain. As a result, a free implementation of zip and unzip by the Info-Zip Internet group is available for a wide array of operating systems including UNIX, MSDOS, Windows NT, MacOS, OS/2, VMS, the Atari, and the Amiga. The appeal of a format actively maintained on so many platforms has made .zip the format of choice at many Internet archive sites. Some predominantly UNIX sites still use compressed or gzipped tar files almost exclusively, and Macintosh archives favor file formats that understand the multi-fork Macintosh file structure, such as .sea, .sit (StuffIt), or .hqx (BinHex).

Older files may still be archived in a format that is obscure today but was all the rage when the file archive was constructed, so it is not only for historical interest that other formats are listed in the following tables.


TONY CATONE (catone@stat.wharton.upenn.edu) is the Wharton Computing and Information Technology distributed representative to the Wharton School's Department of Statistics.

Sidebar: Bibliography

Gailly, Jean-loup. (9 Nov 94 09:58:56 GMT) NetNews comp.compression's "Frequently Asked Questions."

Glossbrenner, Alfred and Emily Glossbrenner. (1994) Internet Slick Tricks: Smart Secrets Revealed. Random House, New York, NY.

Heslop, Brent and David Angell. (1994) The Instant Internet Guide: Hands-On Global Networking. Addison-Wesley, Reading, MA.

LaQuey, Tracy L. (1994) The Internet Companion Plus: A Beginner's Start-Up Kit for Global Networking. Addison-Wesley, Reading, MA.

Lemson, David. (2 Dec 94) ftp.cso.uiuc.edu:/doc/pcnet/compression.

Manger, Jason J. (1995) The Essential Internet Information Guide. McGraw-Hill, Berkshire, England.


Tables: Conversion programs by platform


                    Table of file formats
                 Conversion programs for MSDOS 


     Extension   File Format      MSDOS                 
     ===================================================
     ap          Whap              -                    
     arc         LH(arc)          arc602.exe            
     arj         ARJ              arj230/arj.exe        
     bndl        Bundle            -                    
     boo         BOO              msbpct/msbmbk.exe     
     c           Compact           -                    
     com         COMT             comt010d.zip          
     cpt         Compactor         -                    
     dd          Disk Doubler      -                    
     dwc         DWC              dwc-a501.exe          
     exe         Self extracting  (automatic)           
     f           Freeze            -                    
     gz          GNU zip          gzip-1.2.4.msdos.exe  
     hex         Intel HEX        hc.zip                
     hpk         HPACK            hpack78.zip           
     hqx         BinHex           xbin23.zip            
     hyp         HYPER            hyper25.zip           
     ish         Ish              ish200.lzh            
     lbr         LU/LAR           lue220.arc            
     lzh         LHA              lha213.exe            
     lzh         LHarc            lh113c.exe            
     lzss        LZSS              -                    
     md          MDCD             mdcd10.arc            
     mime        MIME             mpack-1.4-pc.zip      
     pak         PAK              pak251.exe            
     pit         PackIt           unpackit.exe          
     sea         Self extracting   -                    
     shar        Shell archive    toadshr1.arc          
     shk         ShrinkIt          -                    
     sit         StuffIt          unsit30.zip           
     stf         ToFit             -                    
     tar         Tape Archive     tar                   
     uue         Uuencoded        toaduu20.zip          
     wrp         WARP              -                    
     xqx         Squeeze          sqpc131.arc           
     xxe         Xxencoded        ncdc150.zip           
     y           Yabba             -                    
     Z           UNIX compress    u16.zip               
     z           UNIX pack         -                    
     zip         PKZIP            pkz204g.exe           
     zoo         ZOO              zoo210.exe            

     Notes:  1.  A comprehensive guide to some of the 
     more obscure file compression formats is available 
     via anonymous ftp at ftp.cso.uiuc.edu:/doc/pcnet/compression.  



                 Table of file formats
            Conversion programs for Macintosh 

     Extension   File Format      Macintosh           
     =================================================
     ap          Whap              -                  
     arc         LH(arc)          arcmac1.3c          
     arj         ARJ               -                  
     bndl        Bundle           bundle              
     boo         BOO               -                  
     c           Compact           -                  
     com         COMT              -                  
     cpt         Compactor        compactor1.21       
     dd          Disk Doubler     diskdoubler3.7      
     dwc         DWC               -                  
     exe         Self extracting   -                  
     f           Freeze            -                  
     gz          GNU zip          MacGzip0.2          
     hex         Intel HEX         -                  
     hpk         HPACK             -                  
     hqx         BinHex           binhex4.0           
     hyp         HYPER             -                  
     ish         Ish              ishmac-06           
     lbr         LU/LAR            -                  
     lzh         LHA               -                  
     lzh         LHarc            maclharc0.41        
     lzss        LZSS             lzss2.0b5           
     md          MDCD              -                  
     mime        MIME             mpack-1.4-mac.hqx   
     pak         PAK               -                  
     pit         PackIt           packit3.1.3         
     sea         Self extracting  (automatic)         
     shar        Shell archive     -                  
     shk         ShrinkIt          -                  
     sit         StuffIt          stuffitlite         
     stf         ToFit            stf1.2              
     tar         Tape Archive     untar2.0            
     uue         Uuencoded        uutool2.0.3         
     wrp         WARP              -                  
     xqx         Squeeze           -                  
     xxe         Xxencoded         -                  
     y           Yabba             -                  
     Z           UNIX compress    maccompress3.2a     
     z           UNIX pack         -                  
     zip         PKZIP            unzip1.1            
     zoo         ZOO              macbooz2.1          



                Table of file formats
          Conversion programs for UNIX 

     Extension   File Format      UNIX
     ===================================================
     ap          Whap             yabbawhap
     arc         LH(arc)          arc521
     arj         ARJ              unarj230
     bndl        Bundle           unbundle
     boo         BOO               - 
     c           Compact          uncompact
     com         COMT              - 
     cpt         Compactor         - 
     dd          Disk Doubler      - 
     dwc         DWC               - 
     exe         Self extracting   - 
     f           Freeze           freeze-2.3.4.tar.Z
     gz          GNU zip          gzip-1.2.4.tar
     hex         Intel HEX         - 
     hpk         HPACK             -                    
     hqx         BinHex           mcvert
     hyp         HYPER             - 
     ish         Ish               - 
     lbr         LU/LAR           lar
     lzh         LHA              lha1.00
     lzh         LHarc            lharc102
     lzss        LZSS              - 
     md          MDCD              - 
     mime        MIME             mpack-1.4-src.tar.Z
     pak         PAK              arc521
     pit         PackIt           unpit
     sea         Self extracting   - 
     shar        Shell archive    unshar
     shk         ShrinkIt          - 
     sit         StuffIt          unsit
     stf         ToFit             - 
     tar         Tape Archive     tar
     uue         Uuencoded        uudecode
     wrp         WARP              - 
     xqx         Squeeze           - 
     xxe         Xxencoded        xxdecode
     y           Yabba            yabbawhap
     Z           UNIX compress    compress
     z           UNIX pack        unpack
     zip         PKZIP            unzip50
     zoo         ZOO              zoo210
