I use ImageJ for many of my image analysis needs. My desktop computer runs Windows 7 and it has pretty solid specs with Core i7 processor and 16GB RAM. I recently had to handle some large tiff stacks (4-5gb) and it simply wouldn’t work on my desktop as I constantly ran into ‘out of memory’ errors. So I decided to run them on a computing cluster instead since I have access to one. Running on a cluster might be useful when handling data with large memory requirements or to perform computations on numerous files in parallel by distributing load to multiple cores. It took me a while to figure out how to get things to work, so I thought I would make a record of it. And this might hopefully be useful to others.
This is a quick comparison of some of the data compression and decompression formats on Linux. The idea is to compare compression/decompression time and compression size difference using seven compression formats on five different file types.
Five different data files were tested: a fastq text file, mp3 tar archive, an mp4 movie file, a randomly generated text file and a tiff image stack. Some properties of the files: fastq file (403 MB, 1.56 million reads), mp3 tar archive (390 MB, a tar archive composed of four tar archives each with 6 mp3 tracks of size 10MB to 32MB), mp4 file (340 MB), text file (400MB, created using (
base64 /dev/urandom | head -c 419430400 > text.txt) and tiff stack (404MB, 1380 frames, 640 x 480 px, sequence of zebrafish larvae swimming in a microtitre plate). For clarity, fastq files are text files containing next generation sequencing data and tiff stacks are used for image analysis using ImageJ, for example.
Seven different compression formats were tested: 7z, bzip2, gzip, lrzip, lz4, xz and zip using ten different compression commands: 7za, bzip2, lbzip2, pbzip2, gzip, pigz, lrzip, xz and zip. For decompression, the same commands were used except for zip where unzip was used. The 7za command by default compresses to the 7z format but also allows exporting to bzip2, gzip and zip. lbzip2 and pbzip2 are multi-threaded versions of bzip2. Similarly, pigz is the multi-threaded version of gzip.