Scripting a search and replace with Perl

I use Perl one-liners and the sed command quite frequently for search-and-replace tasks on data and code files. For tasks that can't easily be reduced to a plain s/search/replace/g command, Perl offers the e modifier on the substitution operator, which treats the replacement as Perl code and evaluates it for each match.

I recently took advantage of this capability for a data processing task. I had a GFF3 file in which I needed to reformat all of the sequence IDs. The old IDs followed the pattern scaffold_0, scaffold_1, etc., while the new IDs needed to follow the pattern Scaf0001, Scaf0002, etc. I couldn't simply replace each instance of scaffold_ with Scaf, since that would neither shift the numbering up by one nor pad it with leading zeros as required. The following Perl one-liner did the trick.

perl -ne 's[scaffold_(\d+)]{$num = $1; $num++; $num = "0". $num while(length($num) < 4); qq[Scaf$num]}eg; print' < old.gff3 > new.gff3

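Here's the Perl code broken up. The replacement block runs once per match, and the value of its last expression (the qq[...] string) is what gets substituted in.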

s[scaffold_(\d+)]
{
  $num = $1;
  $num++;
  $num = "0". $num while(length($num) < 4);
  qq[Scaf$num]
}eg;
print

This is a very useful feature, kludgy syntax notwithstanding.
