This is a quick comparison of some of the data compression and decompression formats on Linux. The aim is to compare compression time, decompression time and compressed file size for seven compression formats across five different file types.
Five different data files were tested: a fastq text file, an mp3 tar archive, an mp4 movie file, a randomly generated text file and a tiff image stack. Some properties of the files: fastq file (403 MB, 1.56 million reads), mp3 tar archive (390 MB, a tar archive composed of four tar archives, each with 6 mp3 tracks of 10 MB to 32 MB), mp4 file (340 MB), text file (400 MB, created using base64 /dev/urandom | head -c 419430400 > text.txt) and tiff stack (404 MB, 1380 frames, 640 x 480 px, a sequence of zebrafish larvae swimming in a microtitre plate). For clarity, fastq files are text files containing next-generation sequencing data, and tiff stacks are commonly used for image analysis, for example in ImageJ.
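The random text file can be reproduced at a smaller size to check the approach before committing 400 MB of disk. The following is a minimal sketch, assuming bash and GNU coreutils; the helper name make_random_text and the 1 MiB sample size are illustrative and not part of the original test.

```shell
#!/usr/bin/env bash

# Hypothetical helper: make_random_text SIZE_BYTES OUTFILE
# Same pipeline as the command used for text.txt, with the size as a parameter.
make_random_text() {
    local size_bytes="$1" outfile="$2"
    # base64 turns random bytes into printable text;
    # head stops the endless stream at the requested number of bytes.
    base64 /dev/urandom | head -c "$size_bytes" > "$outfile"
}

# 400 * 1024 * 1024 bytes is the byte count passed to head for text.txt.
full_size=$((400 * 1024 * 1024))
echo "full-size file: $full_size bytes"

# Assumption: a 1 MiB sample is enough for a quick sanity check.
sample="$(mktemp)"
make_random_text $((1024 * 1024)) "$sample"
sample_size=$(wc -c < "$sample")
echo "sample file: $sample_size bytes"
```

Calling make_random_text "$full_size" text.txt would recreate a file of the same size as the one tested here, although the contents differ on every run because the source is random.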
Seven different compression formats were tested (7z, bzip2, gzip, lrzip, lz4, xz and zip) using ten different compression commands: 7za, bzip2, lbzip2, pbzip2, gzip, pigz, lrzip, lz4, xz and zip. For decompression, the same commands were used, except for zip, where unzip was used. The 7za command compresses to the 7z format by default but can also write bzip2, gzip and zip archives. lbzip2 and pbzip2 are multi-threaded implementations of bzip2. Similarly, pigz is a multi-threaded implementation of gzip.
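To illustrate the kind of measurement involved, here is a rough sketch of timing one compress/decompress round trip with gzip and comparing file sizes. It is not the script used for the results here: the helper now_ms, the 1 MiB sample input and the millisecond resolution are assumptions, and date +%s%N requires GNU date.

```shell
#!/usr/bin/env bash

# Assumption: a small random text sample stands in for the real test files.
input="$(mktemp)"
base64 /dev/urandom | head -c 1048576 > "$input"

# Hypothetical helper: current wall-clock time in milliseconds (GNU date only).
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# Time compression, writing to a separate file so the input is kept.
start=$(now_ms)
gzip -c "$input" > "$input.gz"
compress_ms=$(( $(now_ms) - start ))

# Time decompression into a third file.
start=$(now_ms)
gzip -dc "$input.gz" > "$input.out"
decompress_ms=$(( $(now_ms) - start ))

# Compressed size as a percentage of the original size (integer division).
orig_bytes=$(wc -c < "$input")
comp_bytes=$(wc -c < "$input.gz")
ratio_pct=$(( comp_bytes * 100 / orig_bytes ))

# Confirm that decompression reproduces the original byte for byte.
if cmp -s "$input" "$input.out"; then roundtrip=ok; else roundtrip=FAILED; fi

echo "gzip: compress ${compress_ms} ms, decompress ${decompress_ms} ms"
echo "gzip: compressed size is ${ratio_pct}% of original, round trip ${roundtrip}"
```

The same pattern should carry over to the other commands that can write to standard output, for example by swapping gzip -c and gzip -dc for pigz -c and pigz -dc. A single run on a small file is noisy, so timings from a sketch like this are only indicative.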