Research philosophy

A few months ago, I sat down with a post-doc and we made a list of TE prediction software. We came up with more than 20 programs, scripts, and so on, and set to work trying to download, install, and use them. This was perhaps the most frustrating experience I’ve had in grad school to date.

A select few programs were well documented and “just worked” exactly as advertised. More often, though, the documentation was unclear, redundant, contradictory, or simply insufficient. A few programs were missing documentation altogether! Although our list had more than 20 programs, after several weeks of trying we were able to get results from only 6 of them.

At one point during these few horrid weeks, I stormed into the office of one of my professors and just vented about how frustrating it had been. He was very patient with me and helped me talk it out, and I was soon able to get back to work. However, I made a promise to myself that day that I would never cause anyone that amount of grief by writing crappy software or incomplete documentation, or by producing research that is not completely and easily reproducible.

Recently, I had the first meeting with my PhD committee, and as part of my Description of Proposed Research I decided to state this goal explicitly in a section called Research Philosophy.

Research philosophy

Much of my dissertation work will involve developing new tools and methodologies for genomics research. My goal is to make all of this work accessible, usable, and reproducible by the scientific community. Of course, this philosophy is not unique to me, as it is implicit in the scientific method. My reason for making this goal explicit during the initial stages of my research is to commit myself to a higher standard than the minimum expected for graduation.

The following are my specific plans for achieving accessibility, usability, and reproducibility in my research.

Accessibility

  • Use permissive open-source licensing
  • Host source code, data, and other supplements externally
  • Maximize software portability: compatibility with all POSIX-like systems is preferred, with compatibility with all Linux systems as a minimum

Usability

  • Provide clear, accurate documentation
  • Eliminate complicated installation procedures
  • Reduce external dependencies

Reproducibility

  • Provide simple examples
  • List all parameter values used for more complicated examples or use cases
  • Provide accurate accession numbers for all data used