A few months ago, I sat down with a post-doc and we made a list of TE prediction software. We came up with more than 20 programs, scripts, etc., and got to work trying to download, install, and use each of them. This was perhaps the most frustrating experience I’ve had in grad school to date.
A select few programs were well documented and “just worked” exactly as advertised. More often, though, the program documentation was unclear, redundant, contradictory, or simply insufficient. A few programs were even missing documentation altogether! Although we had a list of more than 20 programs, we were only able to get results from 6 of them after several weeks of trying.
At one point during these few horrid weeks, I stormed into the office of one of my professors and just vented about how frustrating it had been. He was very patient with me and helped me talk it out, and I was soon able to get back to work. However, I made a promise to myself that day that I would never cause anyone that amount of grief by producing crappy software, incomplete documentation, or research that is not completely and easily reproducible.
Recently, I had the first meeting with my PhD committee, and as part of my Description of Proposed Research I decided to state this goal explicitly in a section called Research Philosophy.
Much of my dissertation work will involve developing new tools and methodologies for genomics research. My goal is to make all of this work accessible, usable, and reproducible by the scientific community. Of course this philosophy is not unique to me, as it is implicit in the scientific method. My reason for making this goal explicit during the initial stages of my research is to commit myself to a higher standard than what may minimally be expected for graduation.
The following provides my specific plans for achieving the goal of accessibility, usability, and reproducibility with my research.
- Use permissive open-source licensing
- Host source code, data, and other supplements externally
- Maximize software portability; compatibility with all POSIX-like systems is preferred, with compatibility with all Linux systems as a minimum
- Provide clear, accurate documentation
- Eliminate complicated installation procedures
- Reduce external dependencies
- Provide simple examples
- List all parameter values used for more complicated examples or use cases
- Provide accurate accession numbers for all data used
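
As a rough illustration of the last two items, here is a minimal sketch of how a tool could record the parameter values and accession numbers behind a result. This is only one possible approach, not something from my proposal: the function names, the manifest layout, and the example tool name, parameters, and accession strings are all hypothetical placeholders.

```python
import json
import platform
from datetime import datetime, timezone


def build_run_manifest(tool, version, parameters, accessions):
    """Collect what a reader would need to repeat one analysis run.

    All field names here are illustrative assumptions, not a standard.
    """
    return {
        "tool": tool,
        "version": version,
        # Sort so that two runs with the same settings produce the same record.
        "parameters": dict(sorted(parameters.items())),
        "accessions": sorted(accessions),
        "python": platform.python_version(),
        "recorded": datetime.now(timezone.utc).isoformat(),
    }


def write_run_manifest(manifest, path):
    """Write the manifest as human-readable JSON next to the results."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
        handle.write("\n")


if __name__ == "__main__":
    # Hypothetical tool name, parameters, and accession placeholders.
    manifest = build_run_manifest(
        tool="example-tool",
        version="0.1.0",
        parameters={"min_length": 50, "max_mismatches": 2},
        accessions=["ACCESSION_PLACEHOLDER_2", "ACCESSION_PLACEHOLDER_1"],
    )
    write_run_manifest(manifest, "run_manifest.json")
```

Shipping a file like this alongside each example or published result means nobody has to guess which settings or data sets were used.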