Bash tricks: getopts and die signal handling

For better and for worse, Perl has been my go-to scripting language for years. I’ve since learned Python, and can appreciate why it has won so many enthusiasts in the programming community (in general) and the scientific computing community (in particular). However, I’m all about using the most convenient tool for the job, and sometimes the best glue for Your Little Bioinformatics Tool is a makefile, or even just a simple little shell script.

Recently I was writing a bash script to implement a very simple procedure, stringing together the results of several calls to small scripts and programs I had written. As is typical for bash scripts I have written in the past, I used positional command-line arguments for any values I needed to adjust on a run-by-run basis, and then accessed these in the script using the variables $1, $2, and so on.

As I started running the script to do my analyses, I began thinking, “I wish there was a better way to do this, to make some arguments optional but some required, something like getopts.” Well, a simple Google search solved that one for me. A few minutes later, I had put a nice command-line interface on my bash script. The syntax is really pretty simple.

# Usage statement
print_usage()
{
  cat <<EOF
Usage: $0 [options] genomeseq.fasta annotation.gff3
  Options:
    -c    some important cutoff value; default is 0.2
    -d    debug mode
    -h    print this help message and exit
    -o    file to which output will be written; default is 'ylt.txt'
    -t    home directory for YourLittleTool; default is '/usr/local/src/YourLittleTool'
EOF
}

# Command-line option parsing
CUTOFF=0.2
DEBUG=0
YLTHOME="/usr/local/src/YourLittleTool"
OUTFILE="ylt.txt"
while getopts "c:dho:t:" OPTION
do
  case $OPTION in
    c)
      CUTOFF=$OPTARG
      ;;
    d)
      DEBUG=1
      ;;
    h)
      print_usage
      exit 0
      ;;
    o)
      OUTFILE=$OPTARG
      ;;
    t)
      YLTHOME=$OPTARG
      ;;
    *)
      # getopts has already reported the invalid option
      print_usage
      exit 1
      ;;
  esac
done

# Remove arguments associated with options
shift $((OPTIND-1))

# Verify the two required positional arguments are there
if [[ $# -ne 2 ]]; then
  echo -e "error: please provide 2 input files (genome sequence file (Fasta format) and annotation file (GFF3 format))\n" >&2
  print_usage
  exit 1
fi
FASTA=$1
GFF3=$2

# Now implement the procedure of your little tool

So on one hand, this does add quite a bit to a bash script that originally had only 4-8 lines of logic. But on the other hand, with not too much work on my part, it now has a convenient and self-documented interface, which will make life much easier if someone else in my lab (or if I’m so lucky, someone “out there”) wants to use it in the future.
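For anyone who hasn't met the option string before, here is a minimal sketch of how it behaves. The parse_demo function and its cut-down option set are made up for illustration; they are not part of the script above.

```shell
# Sketch only: parse_demo is a hypothetical function used to poke at getopts.
parse_demo()
{
  local OPTION OPTIND=1      # local OPTIND so each call starts parsing afresh
  local cutoff=0.2 debug=0
  # "c:" means -c takes a value (delivered in OPTARG); "d" is a bare flag.
  # The leading ":" silences getopts' own error messages.
  while getopts ":c:d" OPTION
  do
    case $OPTION in
      c) cutoff=$OPTARG ;;
      d) debug=1 ;;
      *) echo "bad option"; return 1 ;;
    esac
  done
  shift $((OPTIND-1))        # drop the options that getopts consumed
  echo "cutoff=$cutoff debug=$debug remaining=$#"
}

parse_demo -d -c 0.5 genome.fasta annot.gff3
parse_demo genome.fasta annot.gff3
```

The first call overrides both defaults and leaves the two file names as positional arguments; the second call falls back to the defaults.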

As I was sprucing up the bash script, I also decided to investigate another feature I was interested in. This particular procedure creates a new directory, into which several data files, graphics, and HTML reports are written. If the procedure failed and prematurely terminated, I wanted the default behavior to be that the output directory gets deleted so as not to interfere with subsequent run attempts (and of course I provided an option not to delete the output on failure, which is essential for troubleshooting bugs in the pipeline). I had already added set -e to the script, which kills execution of the script if any command returns an unsuccessful status. While this is very convenient, it could potentially have made it pretty complicated to delete incomplete output at different stages of the pipeline.
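To make the set -e behavior concrete, here is a tiny sketch. The commands are stand-ins I made up, with false playing the role of a failing pipeline step.

```shell
# Sketch only: run a three-command script in a child bash with set -e enabled.
status=0
out=$(bash -c 'set -e; echo "before"; false; echo "after"') || status=$?
echo "output: $out"
echo "exit status: $status"
```

The child shell stops at the failing command, so the final echo inside it never runs and the failure status is passed back to the caller.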

Enter trap. This builtin associates a handler with various signals and shell events, one of which is the ERR pseudo-signal. Bash fires ERR whenever a command returns a non-zero status, under the same conditions that would make set -e abort the script.

die_handler()
{
  # Delete incomplete output unless debug mode is on
  if [[ $DEBUG -eq 0 ]]; then
    rm -r "$OUTDIR"
  fi
}
trap die_handler ERR

The trap statement above essentially says: if a command fails, run the die_handler function. Because set -e is also in effect, the script then terminates.
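Putting the pieces together, here is a self-contained sketch you can run to watch the cleanup happen. The file name pipeline.sh, the run1 and run2 directories, and the failing step (a bare false) are all made up for the demonstration; my real pipeline calls actual programs at that point.

```shell
# Sketch only: write a toy pipeline script into a scratch directory and run it twice.
demo=$(mktemp -d)

cat > "$demo/pipeline.sh" <<'EOF'
#!/usr/bin/env bash
set -e
OUTDIR=$1
DEBUG=$2

die_handler()
{
  # Delete incomplete output unless debug mode is on
  if [[ $DEBUG -eq 0 ]]; then
    rm -r "$OUTDIR"
  fi
}
trap die_handler ERR

mkdir "$OUTDIR"
echo "step 1 done" > "$OUTDIR/step1.txt"
false                                      # stand-in for a step that fails
echo "step 2 done" > "$OUTDIR/step2.txt"   # never reached
EOF

# Default mode: the failed run removes its output directory.
bash "$demo/pipeline.sh" "$demo/run1" 0 || echo "run1 failed with status $?"
# Debug mode: the partial output is kept for inspection.
bash "$demo/pipeline.sh" "$demo/run2" 1 || echo "run2 failed with status $?"
ls "$demo"
```

After both runs, only the debug-mode directory should be left behind, holding the output of the first step.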

I’ve always considered bash scripts to be pretty hackish, and I’m not sure this experience has completely changed that opinion (lipstick on a pig?). However, for this particular case I was very happy I was able to combine the convenience of a bash script with the flexibility and power provided by getopts and event-based error handling.

One comment

  1. Gaston Bengolea Monzon

    Amazing, I’ve just discovered your blog and already love it!
    This feature reminds me of the .DELETE_ON_ERROR of the Makefiles. I was amazed to discover that the makefiles don’t delete their incomplete targets..
