RTFM: comm

Comparing lists of IDs, filenames, or other strings is something I do on a regular basis. When I was an undergrad, I remember using a Perl script someone in our lab had written to look at two files and perform simple set operations (pull out the intersection of two lists, or the union, or unique values from one list or the other). Over the years, as the need to perform such tasks has frequently recurred, I’ve repeatedly had to dig through my old files looking for the script.

Recently, the need to do some set operations came up again, but rather than scraping around for this script I figured I should learn how to Do It the Right Way, i.e., perform the task using standard UNIX commands. Enter the comm command.

comm

I’m guessing “comm” is short for common. It is designed precisely for the use case I described above. It takes two files (both assumed to be sorted lexicographically) and produces 3 tab-separated columns of output. The first column holds lines found only in the first file, the second column holds lines found only in the second file, and the third column holds lines found in both files. The flags -1, -2, and -3 each suppress the corresponding column, and they can be combined. Some versions of comm also have a flag for case-insensitive comparisons. For example, if you want to pull out just the values found in both file1 and file2 (the intersection), you would suppress columns 1 and 2 with the following command.

comm -12 file1 file2
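Since comm assumes sorted input, in practice I would sort both lists first. Here is a minimal sketch of the whole thing; the file names and gene IDs are made up for illustration, and I set LC_ALL=C on the assumption that you want sort and comm to agree on plain byte-wise ordering.

```shell
# Hypothetical sample data: two unsorted lists of IDs in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' gene3 gene1 gene2 > "$tmpdir/list1.txt"
printf '%s\n' gene2 gene4 gene3 > "$tmpdir/list2.txt"

# comm expects sorted input, so sort each list first.
# LC_ALL=C keeps sort and comm on the same ordering.
LC_ALL=C sort "$tmpdir/list1.txt" > "$tmpdir/file1"
LC_ALL=C sort "$tmpdir/list2.txt" > "$tmpdir/file2"

# Suppress columns 1 and 2, leaving only the lines common to both files.
both=$(LC_ALL=C comm -12 "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$both"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

With this sample data the intersection should be gene2 and gene3, one per line.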

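If you wanted to pull out the values unique to file1 using case-insensitive comparison, you would use the following command. Note that the -i flag is available in the BSD and macOS versions of comm, but not in the GNU coreutils version.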

comm -23i file1 file2
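On a system whose comm lacks a case-insensitive flag (GNU coreutils, for example), one workaround is to fold everything to lowercase before sorting. The sketch below takes that approach and also covers the other set operations from my old Perl script: values unique to each list, and the union. The file names and IDs are hypothetical, and folding case this way means the output is all lowercase rather than preserving the original capitalization.

```shell
# Hypothetical sample data with inconsistent capitalization.
tmpdir=$(mktemp -d)
printf '%s\n' GeneA geneB genec > "$tmpdir/raw1"
printf '%s\n' genea GENEB gened > "$tmpdir/raw2"

# Fold to lowercase, then sort and de-duplicate, so that the comparison
# is effectively case-insensitive without relying on a comm flag.
tr '[:upper:]' '[:lower:]' < "$tmpdir/raw1" | LC_ALL=C sort -u > "$tmpdir/file1"
tr '[:upper:]' '[:lower:]' < "$tmpdir/raw2" | LC_ALL=C sort -u > "$tmpdir/file2"

# Values only in file1: suppress columns 2 and 3.
only1=$(LC_ALL=C comm -23 "$tmpdir/file1" "$tmpdir/file2")

# Values only in file2: suppress columns 1 and 3.
only2=$(LC_ALL=C comm -13 "$tmpdir/file1" "$tmpdir/file2")

# Union: comm is not needed here; sort -u merges and de-duplicates.
union=$(LC_ALL=C sort -u "$tmpdir/file1" "$tmpdir/file2")

printf 'only in file1: %s\n' "$only1"
printf 'only in file2: %s\n' "$only2"
printf 'union:\n%s\n' "$union"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

With this sample data, genec is unique to the first list, gened is unique to the second, and the union holds four values.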

Today’s lesson is brought to you by this thread on ServerFault@StackExchange.
