This week I needed the latest maize gene annotations (from MaizeGDB) to compare against a different annotation set, and the experience was very similar to the one I had a couple of weeks ago with human genome annotations. Again, I am documenting it for future reference.
MaizeGDB vs MaizeSequence.org
The first thing to consider is that MaizeGDB gets all of its gene annotations from MaizeSequence.org. Once I learned this, I decided to go directly to the source of the data rather than download it through an intermediary.
MaizeGDB has downloadable annotations for two assemblies: B73 RefGen v1 and B73 RefGen v2. You need to make sure you get annotations for the appropriate assembly. This is complicated by the fact that MaizeSequence.org uses different identifiers for the assemblies: “release 4” refers to version 1, and “release 5” refers to version 2.
For both assembly versions, there are two sets of gene structure annotations: a “working” gene set (annotations with relaxed quality control) and a “filtered” gene set (annotations with stricter quality control). You need to decide which trade-off suits your analysis: the working set includes more genes but risks false positives, while the filtered set reduces false positives but may miss some real genes.
MaizeGDB uses the abbreviations “WGS” and “FGS” to refer to the working and filtered gene sets (respectively). MaizeSequence.org also uses these abbreviations for release 4, but in release 5 uses separate identifiers for the working and filtered gene sets: release “5a” is the working set, and release “5b” is the filtered set.
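Since these naming schemes are easy to mix up, here is a minimal Python sketch of the mapping described above. The function and dictionary names are my own (hypothetical), and the returned strings are just labels for the conventions in this post, not actual file names or URLs on either site.

```python
# Hypothetical lookup: MaizeGDB assembly name -> MaizeSequence.org release number.
# Built only from the naming conventions described in this post.
ASSEMBLY_TO_RELEASE = {
    "B73 RefGen v1": "4",
    "B73 RefGen v2": "5",
}


def maizesequence_label(assembly, gene_set):
    """Return the MaizeSequence.org label for a MaizeGDB assembly and gene set.

    gene_set is "WGS" (working gene set) or "FGS" (filtered gene set).
    """
    if assembly not in ASSEMBLY_TO_RELEASE:
        raise ValueError(f"unknown assembly: {assembly!r}")
    gene_set = gene_set.upper()
    if gene_set not in ("WGS", "FGS"):
        raise ValueError(f"unknown gene set: {gene_set!r}")

    release = ASSEMBLY_TO_RELEASE[assembly]
    if release == "4":
        # Release 4 keeps the WGS/FGS abbreviations.
        return f"release 4 {gene_set}"
    # Release 5 uses a letter suffix instead: 5a = working, 5b = filtered.
    return "release 5a" if gene_set == "WGS" else "release 5b"


print(maizesequence_label("B73 RefGen v2", "FGS"))
```

So if you want the filtered gene set for B73 RefGen v2, the thing to look for on MaizeSequence.org is release 5b.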