Story of my life right now

Yesterday’s XKCD comic could not have been more timely. This week I am trying to gather whole-genome annotations for a variety of model organisms. Well, in fact I am gathering two sets of annotations for each organism (for comparison). My real trouble hasn’t been downloading the data to my local machine (although navigating a smorgasbord of genome browsers and FTP sites has been “fun”). My real trouble begins once I have the data in hand.

BED, GTF, GFF2, GFF3, XML: all are too loosely defined (or too loosely adhered to) to allow any kind of reliable conversion utility. So I’m stuck searching Google for conversion scripts, hoping I find one that works for my particular data set…until I throw my hands up in the air and resign myself to writing yet another Perl script that will take 5 minutes to code and 2 hours to debug.
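To give a flavor of what these throwaway scripts look like, here is a minimal sketch (in Python rather than Perl) of a BED-to-GFF3 line converter. It is not a general-purpose tool. The function name `bed_to_gff3` and the default `source` and `feature_type` values are placeholders I made up for illustration. BED has no column for feature type at all, so the converter has to invent one, and that is exactly the kind of gap that makes reliable conversion so hard. The one thing it does get right is the coordinate shift: BED starts are 0-based with an exclusive end, while GFF3 is 1-based with an inclusive end.

```python
def bed_to_gff3(bed_line, source="bed2gff3", feature_type="region"):
    """Convert one tab-delimited BED line to one GFF3 line.

    Assumptions (hypothetical defaults, not part of either format):
    `source` and `feature_type` are made-up placeholders, since BED
    does not record where a feature came from or what kind it is.
    """
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("BED requires at least chrom, start and end")

    chrom = fields[0]
    start = int(fields[1])  # BED start: 0-based
    end = int(fields[2])    # BED end: exclusive

    # Optional BED columns; fall back to the GFF3 "no value" marker.
    name = fields[3] if len(fields) > 3 else None
    score = fields[4] if len(fields) > 4 else "."
    strand = fields[5] if len(fields) > 5 else "."

    # Note: attribute values are NOT escaped here; names containing
    # ';', '=', ',' or tabs would need percent-encoding in real GFF3.
    attributes = "Name=" + name if name else "."

    return "\t".join([
        chrom,
        source,
        feature_type,
        str(start + 1),  # GFF3 start: 1-based
        str(end),        # GFF3 end: inclusive, so the number is unchanged
        score,
        strand,
        ".",             # phase: only meaningful for CDS features
        attributes,
    ])
```

Even this toy version silently drops the BED block columns (the exon structure in BED12), so anything beyond simple intervals needs more work, and more debugging.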

I’m glad I’m not the only one who feels this way. Take a look at the top answer to this thread in a bioinformatics Q&A forum.

At times I’ve felt that, given enough time, I could come up with a solution that suits everybody’s needs. But then that just puts us right back where we started (refer again to the XKCD comic). The problem isn’t formats; the problem is people.
