In my tutorial on designing command line interfaces for scientific software, I propose that most software could benefit from the ability to read from standard input and write to standard output, following the paradigm of the standard UNIX shell tools, which allows programs to be stitched together with pipes and redirects. Not too long ago, however, I came across some syntax that will forever change the way I think about this topic.
Imagine you have a program called myprog that reads from standard input. You could stitch it together with other programs like this:
someCommand | cut -f 2,4,6-10 | sort | myprog
Using the nifty process substitution syntax I recently learned, you could do the same thing like this:
myprog <(someCommand | cut -f 2,4,6-10 | sort)
Surely you must be underwhelmed…but bear with me. This may not seem like much, but the reason it is really cool is that you can feed the output of multiple commands into your program, treating each one as a positional argument. As far as I know, the standard piping and redirection syntax limits you to a single input stream (standard input), but process substitution allows you to feed multiple data streams into your program without the hassle of managing intermediate files. Behind the scenes, the shell connects each command to a temporary file-like handle (a named pipe, or a /dev/fd file descriptor path on Linux), so the data never hits disk.
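To see what the program actually receives, you can ask a command to print its arguments rather than read them. This is a minimal sketch using only standard tools (echo, paste, seq), not the bioinformatics programs from the examples:

echo <(true) <(true)
# echo just prints the two paths it was handed; on Linux they
# typically look like /dev/fd/63 /dev/fd/62

paste <(seq 1 3) <(seq 4 6)
# paste opens both "files" and joins them line by line with tabs,
# merging two independent streams without any temporary files

Any program that accepts filenames as arguments and opens them normally will work the same way, which is what makes the trick so general.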
So if you have another program called otherprog that takes three arguments, you can use the following syntax:
otherprog <(grep -v '^@' reads.sam) <(grep '^chr1' genes.gff3 | cut -f 4-6) <(blastdbcmd -db prot.fa -entry_batch seqids.txt)
In this example, the otherprog program is able to read input from three separate data streams without the need to store the data in temporary files. This is extremely convenient, and I have already begun to make extensive use of this syntax in my daily work.