Process substitutions: commands in place of program arguments

In my tutorial on designing command line interfaces for scientific software, I propose that most software would benefit from the ability to read from standard input and write to standard output. This follows the paradigm of the standard UNIX shell tools, and it is what lets them be stitched together with pipes and redirects. Not too long ago, however, I came across some syntax that will forever change the way I think about this topic.

Imagine you have a program called myprog that reads from standard input. You could stitch it together with other programs like this.

someCommand | cut -f 2,4,6-10 | sort | myprog

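Using the nifty process substitution syntax I recently learned, you could do the same thing like this, provided myprog also accepts an input file name as an argument. The shell replaces the <(...) expression with a path, and the program opens that path like any other file.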

myprog <(someCommand | cut -f 2,4,6-10 | sort)
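The difference between the two forms is easy to miss, so here is a minimal Bash sketch of it. The myprog function below is a hypothetical stand-in (not a real tool) that reports whether its data arrived through a file name argument or through standard input, and printf plays the role of someCommand.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for myprog: counts lines from its first argument
# if one is given, otherwise from standard input.
myprog() {
    if [ "$#" -gt 0 ]; then
        awk 'END { print "argument:", NR, "lines" }' "$1"
    else
        awk 'END { print "stdin:", NR, "lines" }'
    fi
}

# 1. Classic pipeline: the data arrives on standard input.
from_pipe=$(printf 'b\na\nc\n' | sort | myprog)

# 2. Process substitution: the data arrives via a file name argument.
from_arg=$(myprog <(printf 'b\na\nc\n' | sort))

# 3. Redirecting the substitution puts the data back on standard input.
from_stdin=$(myprog < <(printf 'b\na\nc\n' | sort))

printf '%s\n' "$from_pipe" "$from_arg" "$from_stdin"
```

The third form, with a space between the two < characters, is the one to reach for when a program only reads standard input.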

Surely you must be underwhelmed, but bear with me. This may not seem like much, but the real payoff is that you can hand the output of several commands to your program, each as a separate positional argument. As far as I know, the standard piping syntax limits you to a single data stream on standard input. Process substitution lets you feed multiple data streams into your program without having to manage intermediate files. It is a feature of shells such as Bash, Zsh, and ksh rather than of plain POSIX sh. Behind the scenes the shell connects each command to your program through a pipe, exposed either as a /dev/fd/N path or as a temporary named pipe (FIFO), so the data never hit disk.
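You can watch this happen from an interactive Bash session. The following is a small sketch, not anything specific to my own tools: it prints the path that the shell substitutes, then uses diff to compare two sorted streams built from made-up toy data.

```bash
#!/usr/bin/env bash
# The shell replaces <(...) with a path before the command runs.
# The exact path is system dependent (commonly /dev/fd/N).
path=$(echo <(true))
echo "substituted path: $path"

# Two independent streams compared without any temporary files.
# diff exits with status 1 when its inputs differ, hence "|| true".
differences=$(diff <(printf '3\n1\n2\n' | sort) \
                   <(printf '1\n2\n4\n' | sort) || true)
echo "$differences"
```

Each <(...) runs as its own process, so both sort commands execute concurrently while diff reads from them.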

So if you have another program called otherprog that takes three arguments, you can use the following syntax.

otherprog <(grep -v '^@' reads.sam) <(grep '^chr1' genes.gff3 | cut -f 4-6) <(blastdbcmd -db prot.fa -entry_batch seqids.txt)

In this example, otherprog reads input from three separate data streams without the data ever being stored in temporary files. This is extremely convenient, and I have already begun to make extensive use of this syntax in my daily work.
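If you want to try the three-argument pattern without SAM files or a BLAST database on hand, here is a self-contained sketch. Everything in it is an assumption made for illustration: otherprog is a hypothetical Bash function that just counts the lines of each argument, and the printf commands stand in for reads.sam, genes.gff3, and the blastdbcmd output.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for otherprog: prints the line count of each of
# its three positional arguments, one count per line.
otherprog() {
    local input
    for input in "$1" "$2" "$3"; do
        awk 'END { print NR }' "$input"
    done
}

# Toy data in place of reads.sam, genes.gff3, and the BLAST database.
counts=$(otherprog \
    <(printf '@header\nread1\nread2\n' | grep -v '^@') \
    <(printf 'chr1\t10\t20\nchr2\t30\t40\n' | grep '^chr1' | cut -f 2-3) \
    <(printf 'seq1\nseq2\nseq3\n'))
echo "$counts"
```

One caveat worth keeping in mind: the arguments are pipes, not regular files, so a program that needs to seek within its input or read it twice will not work with this approach.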

PS: Props to @climagic for sharing this in the first place, and to @vsbuffalo, whose blog post (which I saw after writing this post; see comments) provided some better terminology!
