F1 score and gene annotation comparisons

Burset and Guigó published a foundational paper on evaluating gene annotations in 1996. Much of my work as a graduate student has involved writing software for comparing multiple sets of gene structure annotations against each other, and I’ve used the statistics described in this paper (matching coefficient, correlation coefficient, sensitivity, specificity) as the basis for my comparisons. However, another statistic, the F1 score, is apparently in common use for analyzing gene annotations, and someone recently recommended that I include it in my comparisons. Never having heard of it, I decided to investigate.

I found several papers that referenced the F1 score (none of them related to gene structure annotation, by the way) and eventually tracked down the origin of the statistic. It was introduced by van Rijsbergen in 1979 in a text on information retrieval and has since found application in a variety of fields. The F1 score combines two other commonly used statistics: the precision $P$ (defined as the ratio of true positives to all predicted positives) and the recall $R$ (defined as the ratio of true positives to all actual positives).

With precision defined as
$P = \frac{TP}{TP + FP}$

and recall defined as
$R = \frac{TP}{TP + FN}$

we now define the F1 score as the harmonic mean of the two.
$F_1 = \frac{2PR}{P + R}$
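
The three formulas translate almost directly into code. What follows is a minimal Python sketch of the calculation, not the implementation from the software mentioned in this post. The function names are hypothetical, and returning 0 when a denominator is zero is an assumed convention, since the formulas themselves leave that case undefined.

```python
def precision(tp, fp):
    """Ratio of true positives to all predicted positives."""
    predicted = tp + fp
    # Assumed convention: report 0.0 when nothing was predicted.
    return tp / predicted if predicted else 0.0


def recall(tp, fn):
    """Ratio of true positives to all actual positives."""
    actual = tp + fn
    # Assumed convention: report 0.0 when there are no actual positives.
    return tp / actual if actual else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    # Assumed convention: report 0.0 when both P and R are zero.
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration: 80 true positives,
    # 20 false positives, 40 false negatives.
    print(round(f1_score(80, 20, 40), 4))
```

With these made-up counts, $P = 0.8$ and $R = 80/120$, so the F1 score works out to $160/220$, roughly 0.727. Note that true negatives never enter the calculation.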

After 15 minutes of searching, 10 minutes of reading, and 5 minutes of coding, my software now has a new feature!