Retrieving split subsequences with BioPerl

Whenever I need to store the positions of genes and other features of a genomic sequence, I typically keep them in a separate GFF3 file. Then, when I need to retrieve the genomic sub-sequence corresponding to a given feature, I simply grab that feature’s start and end positions from the GFF3 file, grab the genomic sequence from a Fasta file, and then pull out the sub-sequence of the feature using Perl’s substr function or BioPerl’s subseq function (from the Bio::PrimarySeq module). Pretty simple.

However, it requires a bit more work to retrieve split features (such as CDS fragments separated by introns). My typical approach here has been to grab each fragment separately, concatenate them, and then look at the feature’s strand to determine whether I needed to compute the reverse complement of the sequence I had just put together. Still not too difficult, but a lot of places where you can make simple silly mistakes.

A few days ago, I discovered BioPerl’s Bio::Location::Split module which really simplifies this process. The module allows you to create an object to store the start and end positions of every fragment belonging to a feature that has been split up across a genomic sequence. Then, instead of grabbing each individual fragment and concatenating them together, you can simply pass the Bio::Location::Split object as the argument in the subseq method to retrieve the sequence.

my $loc = Bio::Location::Split->new();
foreach my $cds_fragment(@cds_features)
  $loc->add_sub_Location( $cds_fragment->location );
my $cds_sequence = $genomic_sequence->subseq($loc);

This nice discovery has made my code much more simple, readable, and maintainable.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s