Common Compression File Formats

There are many compression formats, and a good number of them are obscure. Let's review the ones you are most likely to run into, so that you can make an informed decision when you're creating your archives.

001 is an extension that indicates that the archive is using the ARJ format for compression. You may also see such files with the extension .arj. ARJ was used mainly on MS-DOS, although other platforms have tools that will uncompress 001 and ARJ files.

7Z is a new format created for use with 7-Zip, an open source Windows-based archiver.

ARJ files are discussed above, under 001.

BIN is Mac OS-only and stands for MacBinary. It does very little compression, and it creates binary files instead of text files. It leaves Mac-specific data intact, keeping the "resource fork" together with the "data fork". Because both forks are kept together, a decoded file will, for example, still display its actual icon instead of a generic file icon. Since it's a binary format, you must set your FTP program to "binary" before transferring .bin files.

BZIP and BZIP2 use the "Burrows-Wheeler block sorting text compression algorithm" (no, I don't know what that means either). They are used on Linux and other Unix-like systems. Files compressed with BZIP2 end in ".bz2".
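You don't need to understand the algorithm to see it at work. Here is a minimal Python sketch using the standard library's bz2 module; the sample text and variable names are invented for the example.

```python
import bz2

# Hypothetical sample data: highly repetitive text compresses well.
original = b"the quick brown fox jumps over the lazy dog\n" * 200

compressed = bz2.compress(original)    # the bytes a .bz2 file would contain
restored = bz2.decompress(compressed)  # round-trip back to the original bytes

print(len(original), "bytes before,", len(compressed), "bytes after")
```

The compressed bytes begin with the letters "BZh", which is how tools recognize a bzip2 stream.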

CAB is a Microsoft cabinet file, used to distribute software programs.

CPIO is a Unix command used for copying files into, and out of, archives. It's not seen very much any more, since it's been pretty much supplanted by TAR and GZIP.

DEB is used by the Debian distribution of Linux to package software installation files. RPM is a similar tool for different distributions of Linux.

EAR, for Enterprise ARchive, is used with Java 2 Enterprise Edition (J2EE) applications that require multiple JAR and WAR files, both discussed below. EAR, like JAR and WAR, uses the same compression method as ZIP.

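GZ is the extension used by gzip, the GNU counterpart to ZIP. It relies on the same DEFLATE compression method as ZIP, but it compresses a single file rather than bundling many files into one archive. It is commonly used on Linux systems.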
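For a sense of what a .gz file holds, here is a minimal Python sketch using the standard library's gzip module. The sample data and names are invented for illustration.

```python
import gzip

# Hypothetical sample data standing in for the contents of one file.
data = b"compress me, please\n" * 100

packed = gzip.compress(data)        # the bytes a .gz file would contain
unpacked = gzip.decompress(packed)  # recover the original bytes

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
has_gzip_magic = packed[:2] == b"\x1f\x8b"
print(len(data), "->", len(packed), "bytes; gzip magic present:", has_gzip_magic)
```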

HQX is a BinHex file. BinHex converts text and binary files into ASCII text; specifically, the 7-bit ASCII that most Unix systems use. It results in larger files than .bin; however, it's safer for traveling around the Internet via email, because plain ASCII text allows the transfer of binary programs over non-binary transports like UUCP and sendmail. When using FTP, it doesn't matter if you set your transfer to "binary" or "ASCII"; either way, if you're using .hqx, things will be fine.
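The size penalty of a 7-bit text encoding is easy to see in a small sketch. Recent versions of Python no longer ship a BinHex encoder, so this example swaps in Base64, a different binary-to-ASCII encoding, purely to illustrate the idea; it does not produce .hqx files, and the payload is made up.

```python
import base64

# Hypothetical binary payload: every possible byte value, repeated.
binary = bytes(range(256)) * 4

# Base64 (a stand-in for BinHex here) maps every 3 bytes to 4 ASCII characters.
text = base64.b64encode(binary)

is_seven_bit = all(b < 128 for b in text)  # safe for 7-bit channels
growth = len(text) / len(binary)           # roughly 4/3

print("7-bit clean:", is_seven_bit, "| size grew by a factor of", round(growth, 2))
```

The encoded text decodes back to the exact original bytes, which is the whole point: the data gets bigger, but it survives a trip through a text-only channel.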

JAR stands for Java ARchive, and is used with archives containing software written in and for the Java programming language. JAR, like EAR and WAR, uses the same compression method as ZIP.
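Because these archives share ZIP's format, ordinary ZIP tools can usually open them. The following Python sketch builds a tiny JAR-like archive in memory and reads it back with the standard zipfile module; the entry names and contents are hypothetical placeholders, not a working Java program.

```python
import io
import zipfile

# Build a tiny JAR-like archive in memory. The entry names and contents
# are hypothetical; a real JAR would hold compiled .class files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as jar:
    jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    jar.writestr("com/example/Hello.class", b"\xca\xfe\xba\xbe placeholder bytes")

# Reopen it with the ordinary ZIP reader and list what is inside.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    manifest = archive.read("META-INF/MANIFEST.MF").decode("utf-8")

print(names)
```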

LHA is a Japanese compression format dating from the 1980s. It proved to be influential, since the source code was made available by Dr. Haruyasu Yoshizaki, its creator. It is also one of the few archivers used on computers running the Amiga operating system.

RAR is a proprietary format developed by Eugene Roshal. His licensing allows for the free decoding of RAR archives, but encoding is only allowed by his company.

RPM stands for "Red Hat Package Manager." Invented by Red Hat, it is used to build and install individual software packages. Since it is almost entirely used as a tool for Linux software installation, it is extremely rare to find it used to compress normal data files, or to find it on Windows or Mac OS X machines.

SEA stands for Self-Expanding Archive, and it goes with SIT, discussed next.

SIT is used with the Mac program StuffIt. Like .bin, it leaves Mac-specific data intact. This form of compression is proprietary to Aladdin Systems, but their "Expander" program is a free download for both Mac and Windows. It does a pretty good job of compressing files.

TAR files aren't really compressed; instead, the files are joined together to form one large file. In other words, if you have 100 files, each 3 KB, and you tar them together, you end up with one file of roughly 300 KB. At this point, most tar files are compressed using another program, often gzip, resulting in a file with the extension ".tar.gz" or ".tgz". TAR files are almost never seen on Windows or Mac OS X, but they are extremely common on Linux computers.
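The two-step process can be sketched in Python with the standard tarfile and gzip modules. The file names and contents below are hypothetical, and everything is built in memory so nothing is written to disk.

```python
import gzip
import io
import tarfile

# 100 hypothetical files of 3 KB each, built in memory.
payload = b"x" * 3 * 1024
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w") as tar:
    for i in range(100):
        info = tarfile.TarInfo(name=f"file{i:03d}.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

tar_bytes = raw.getvalue()            # plain .tar: files joined, not compressed
tgz_bytes = gzip.compress(tar_bytes)  # the separate compression step (.tar.gz)

total_input = 100 * len(payload)
print(total_input, "bytes of files ->", len(tar_bytes), "bytes as .tar ->",
      len(tgz_bytes), "bytes as .tar.gz")
```

The plain tar comes out slightly larger than the sum of its members, because tar adds a header for each file plus some padding. The saving only shows up after the gzip step.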

WAR files are related to JAR archives. WAR, which stands for Web ARchive, brings together all the files that a Java-based Web application needs (Java archives, HTML pages, XML files, and so on) so the application can be run easily on a Web server. Like JAR and EAR, WAR uses the same compression method as ZIP.

ZIP works across a wide variety of computing platforms, including Unix and Linux, VMS, OS/2, MS-DOS, Windows, and Macintosh. The reason for the format's near universality can be attributed to Phil Katz, the developer of the original ZIP compression algorithm, who placed in the public domain the ZIP file format, its compression format, and the ".zip" filename extension.
