A recent post by Stephen Turner about the woes of posting code on your lab website really resonated with me. As a scientist I have occasionally clicked on a link or copied and pasted a URL from a paper, only to find that the web address I’m looking for no longer exists. Sure, it’s frustrating in the short term, but in the long term it’s troubling to think that so much of our collective scientific output has such a short digital shelf life.
This happened to me again just yesterday. I was looking over this paper on algorithms for composition-based segmentation of DNA sequences, and I was interested in running one of the algorithms. The code, implemented in Matlab (good grief), is available (you guessed it!) from their lab website: http://nsm.uh.edu/~dgraur/eran/simulation/main.htm. Following that link takes you to a page with a warning that the lab website has moved, and if you follow the link on that page you end up on the splash page for some department or institute that has no idea how to handle the redirect request. This paper is from 2010, and yet I can’t access supplements from their lab website! I did a bit of Google searching and found the first author’s current website, which even included links to the software mentioned in the paper, but unfortunately those links point to the same (now defunct) server published in the paper. I finally found the code buried in a Google Code project, and now I’m sitting here wondering whether it was really worth all the hassle in the first place, and whether I even want to check if our institution has a site license for Matlab…
With regard to my own research, I’ve been using hosting services like SourceForge, GitHub, and Bitbucket to host my source code for years. However, I’ve continued using our lab server to host this blog, along with all the supplementary graphics and data that go along with it. I guess I initially enjoyed the amount of control I had. But after reading Stephen’s post, realizing how big a problem this is in general, and of course thinking of all of the fricking SELinux crap I’ve had to put up with (our lab servers run Fedora and Red Hat), the idea of using a blog hosting service all of a sudden seemed much more reasonable.
So as of this post, the BioWi[sz]e blog is officially migrated to WordPress.com. Unfortunately, someone got the http://biowise.wordpress.com subdomain less than a year ago. They even spent the $25 to reserve a .me domain, and yet they’re doing nothing with it. Grrr… So anyway, the BioWise you know and love is now BioWize, for better and for worse.
As for the supplementary graphics and data files, I have followed Stephen Turner’s example and posted everything on FigShare. While uploading data files and providing relevant metadata was very straightforward, there is a bit of a learning curve when it comes to organizing and grouping related files. And once data is published on FigShare, deleting it is not an option, even if you’re just trying to clean things up and fix mistakes. So if I could have done one thing differently, I would have been more careful about how I uploaded and grouped the files. Otherwise, I have no complaints. I love the idea that the content of my blog will be accessible long after I’ve moved on from my current institution (without any additional work on my part), and that all of the supporting data sets are permanently accessible, each with its own DOI.