Welcome to the Linux Foundation Forum!
GZ or BZ2?
Hi. I'd like to adopt a default compression algorithm for my files. It has to be the most Linux-friendly and have the highest compression rate and security. I instantly thought of tar.gz and tar.bz2, but I don't know their main differences. Which one should I choose?
Comments
Here is what I got when I tested various compression algorithms.
File: linux-2.6.36-rc4
Uncompressed Size: 454M
tar.gz
Time: 25.546s
New Size: 88M
Compression Percent: 80.6%
7zip
Time: 4m 0.327s
New Size: 61M
Compression Percent: 86.6%
xz
Time: 6m 6.009s
New Size: 59M
Compression Percent: 87.0%
tar.bz2
Time: 1m 41.552s
New Size: 52M
Compression Percent: 88.5%
From what I have heard, xz is supposed to compress much better than bz2 or gz, but the contents of my archive may have limited its compression here. In the end, you will have to create compressed archives with the various algorithms on a realistic sample and decide which best fits your speed, compression, and system load needs.
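If you want to run that kind of comparison on your own data, something along these lines might do as a starting point. It is only a sketch: SAMPLE_DIR is a made-up variable name that you would point at a realistic sample (if it is unset, the script generates a small throwaway text file so it still runs), tools that are not installed are skipped, and the timing uses `date +%s`, so it only has whole-second resolution.

```shell
#!/bin/sh
# Sketch: compare gzip, bzip2 and xz on one sample directory.
# SAMPLE_DIR is a hypothetical variable; set it to a directory of realistic data.
set -eu

WORK=$(mktemp -d)
SAMPLE_DIR=${SAMPLE_DIR:-}
if [ -z "$SAMPLE_DIR" ]; then
    # No sample given: generate repetitive text so the script is runnable as-is.
    # Real data will compress very differently from this.
    SAMPLE_DIR="$WORK/sample"
    mkdir -p "$SAMPLE_DIR"
    i=0
    while [ "$i" -lt 2000 ]; do
        echo "line $i: the quick brown fox jumps over the lazy dog"
        i=$((i + 1))
    done > "$SAMPLE_DIR/sample.txt"
fi

# Build one uncompressed tar so every tool gets identical input.
tar -cf "$WORK/sample.tar" -C "$SAMPLE_DIR" .
orig=$(wc -c < "$WORK/sample.tar")

RESULTS="$WORK/results.txt"
: > "$RESULTS"
for tool in gzip bzip2 xz; do
    # Skip tools that are not installed on this machine.
    command -v "$tool" >/dev/null 2>&1 || continue
    start=$(date +%s)
    "$tool" -c "$WORK/sample.tar" > "$WORK/sample.tar.$tool"
    end=$(date +%s)
    size=$(wc -c < "$WORK/sample.tar.$tool")
    echo "$tool $((end - start)) $size $orig" >> "$RESULTS"
done

# Report: tool, elapsed seconds, compressed bytes, percent saved.
awk '{ printf "%-6s %4ds %10d bytes  %.1f%% saved\n", $1, $2, $3, 100 * (1 - $3 / $4) }' "$RESULTS"
```

Run it as, for example, `SAMPLE_DIR=/path/to/your/files sh compare.sh` (the script name is arbitrary). All tools run at their default compression levels here; adding a level such as `-9` changes both time and size.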
I know that the rar format can add a defined amount of redundancy to an archive, making the archive more robust against corruption; the catch is that the rar format isn't open. I don't think GNU tar has the same feature built in, but third-party utilities might provide something similar.
When it comes to efficiency, all the tests I have seen so far indicate that bz2 achieves a better compression ratio than its older counterpart gz, but that boost comes at the price of more CPU time. I don't have anything to contribute regarding other formats (xz, 7z, etc.); see Matthew's excellent answer for some raw numbers on this issue.
Let's compare the two formats:
file.tar.gz : generally larger, but it takes less time to decompress
extraction command: tar -zxvf file.tar.gz
file.tar.bz2: generally smaller, but takes more time to decompress
extraction command: tar -jxvf file.tar.bz2
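For completeness, here is a rough sketch of the full round trip (create, list, extract) for both formats, run in a scratch directory so nothing on your system is touched. The directory and file names are made up for the demo, and the bz2 part only runs if the bzip2 program is installed.

```shell
#!/bin/sh
# Sketch: create, list and extract a .tar.gz and a .tar.bz2 in a scratch directory.
# All paths below are demo names, not anything from the original post.
set -eu

DEMO=$(mktemp -d)
mkdir -p "$DEMO/src" "$DEMO/out_gz" "$DEMO/out_bz2"
echo "hello from the archive" > "$DEMO/src/file.txt"

# -c create, -z filter through gzip, -f archive name; -C changes directory first.
tar -czf "$DEMO/file.tar.gz" -C "$DEMO/src" .
# -t lists the contents without extracting anything.
tar -tzf "$DEMO/file.tar.gz"
# Same extraction flags as above, minus -v, plus -C to pick the target directory.
tar -zxf "$DEMO/file.tar.gz" -C "$DEMO/out_gz"

# -j filters through bzip2 instead; this needs the bzip2 program to be present.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf "$DEMO/file.tar.bz2" -C "$DEMO/src" .
    tar -jxf "$DEMO/file.tar.bz2" -C "$DEMO/out_bz2"
fi
```

The only difference between the two formats on the command line is the filter letter: `z` for gzip, `j` for bzip2.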
Now let's assess your needs: if you want to distribute your file online, I recommend using bz2, since the smaller file allows for faster downloads. If you are running on a lower-end machine with a fast internet connection, then by all means use gz. This will allow for faster decompression, but download time will suffer.
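To put rough numbers on that trade-off, you can add the download time to the decompression time for each format. The helper below is just a sketch: `total_seconds` is a made-up function name, the 88 MB and 52 MB sizes are borrowed from the kernel test earlier in the thread, and the link speeds and decompression times (5 s for gz, 20 s for bz2) are placeholders I invented for illustration, not measurements.

```shell
#!/bin/sh
# Sketch: total time to get usable files = download time + decompression time.
# total_seconds is a hypothetical helper, not a standard tool.
# Arguments: SIZE_MB LINK_MBIT_PER_S DECOMPRESS_SECONDS
total_seconds() {
    # size in megabytes * 8 = megabits; divide by link speed to get seconds.
    LC_ALL=C awk -v size="$1" -v link="$2" -v dec="$3" \
        'BEGIN { printf "%.1f\n", size * 8 / link + dec }'
}

# Placeholder scenario 1: slow 2 Mbit/s link.
echo "slow link, gz : $(total_seconds 88 2 5) s"
echo "slow link, bz2: $(total_seconds 52 2 20) s"

# Placeholder scenario 2: fast 100 Mbit/s link, same (invented) CPU times.
echo "fast link, gz : $(total_seconds 88 100 5) s"
echo "fast link, bz2: $(total_seconds 52 100 20) s"
```

With these made-up inputs, bz2 comes out ahead on the slow link because the download dominates, while gz comes out ahead on the fast link because decompression dominates. Plug in your own sizes and timings to see where the crossover is for you.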
To conclude: both formats are (basically) usable across Linux systems; you just have to decide which format best meets your needs.