Yes, another post about compression programs. No, data compression is not an area of particular research interest for me, but I’ve been dealing with so much data recently that I’m really looking for better and quicker ways to compress, decompress, and transfer data.
The zlib website hosts the home page for pigz, a parallel implementation of the UNIX program gzip. It compiled very quickly and cleanly out-of-the-box on several platforms (Fedora, Red Hat, OS X) and works just like gzip, bzip2, or any other compression program would on the command line.
# Here is how I would compress a tarball...
tar cf - $DATA/*.fastq | pigz -p 16 > $WD/RawReads.tar.gz

# ...and here is how you would decompress.
pigz -p 16 -d -c $WD/RawReads.tar.gz | tar x
The performance improvement is significant, so initially I was very excited about this finding. However, after a few uses I ran into a problem decompressing a particularly large tarball that had been created with pigz. The tarball appears to have been corrupted at some point during compression.
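One way to catch this kind of problem early would be to test an archive right after writing it, before deleting the original data. Here is a minimal sketch, assuming a POSIX shell. The function name verify_targz is a hypothetical helper of my own, not something that ships with pigz. It relies on the -t (test) flag that both gzip and pigz provide, and it falls back to gzip when pigz isn't installed.

```shell
# verify_targz: hypothetical helper (not part of pigz) that checks a .tar.gz.
# Returns 0 if the archive looks intact, 1 if the gzip stream fails its
# integrity test, 2 if the decompressed stream cannot be listed by tar.
verify_targz() {
    archive="$1"

    # Prefer pigz when available; plain gzip reads the same format.
    if command -v pigz >/dev/null 2>&1; then
        z=pigz
    else
        z=gzip
    fi

    # Step 1: test the compressed stream (CRC and length check).
    "$z" -t "$archive" 2>/dev/null || return 1

    # Step 2: decompress to stdout and have tar list the contents,
    # which walks every header in the tarball without extracting.
    "$z" -dc "$archive" 2>/dev/null | tar tf - >/dev/null 2>&1 || return 2

    return 0
}
```

With the paths from my example above, I would call it as verify_targz "$WD/RawReads.tar.gz" and check the exit status. A check like this won't explain why an archive went bad, but it should at least flag a broken one while the raw files are still around.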
Definitely a program worth checking out. I’m cautiously optimistic that my troubles have just been a fluke or the result of some mistake on my part, but I’m not betting the farm on it yet.