Gene model vectors

Genome sequences in FASTA format are strings of As, Cs, Gs, and Ts representing the sequence of nitrogenous bases along a chromosome. I’ve been working recently on a similar format for encoding gene structure along a chromosome. A ‘G’ represents a nucleotide in an intergenic region, a ‘C’ a nucleotide in a coding region within a gene, an ‘I’ a nucleotide in an intron, and ‘F’ and ‘T’ nucleotides in the 5′ and 3′ UTRs of a gene, respectively. Encoding a genomic sequence this way tells you nothing about the nitrogenous base at each position, but if you’re only interested in investigating gene structure, the format can be quite handy. I’m calling these strings “model vectors” (vectors representing gene models) to differentiate them from standard FASTA sequences, but standard tools from any bioinformatics library (BioPerl, Biopython, etc.) shouldn’t have any problem processing data in this format.
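To make the alphabet concrete, here is a minimal Python sketch of the five symbols described above. The names `MODEL_VECTOR_SYMBOLS` and `summarize_model_vector` are made up for illustration, as is the short input string; this is not part of any existing library.

```python
from collections import Counter

# Symbol meanings as described in the text; the dictionary name is illustrative.
MODEL_VECTOR_SYMBOLS = {
    "G": "intergenic",
    "C": "coding",
    "I": "intron",
    "F": "five_prime_UTR",
    "T": "three_prime_UTR",
}


def summarize_model_vector(vector):
    """Count nucleotides per structural class, rejecting unknown symbols."""
    unknown = set(vector) - set(MODEL_VECTOR_SYMBOLS)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    counts = Counter(vector)
    return {MODEL_VECTOR_SYMBOLS[symbol]: n for symbol, n in counts.items()}


# A made-up vector: intergenic flank, 5' UTR, two exons split by an intron,
# 3' UTR, intergenic flank.
print(summarize_model_vector("GGGFFCCCIICCTTTGG"))
```

Rejecting unknown symbols up front is a design choice in this sketch: an ordinary nucleotide sequence contains ‘A’s, so it would be caught rather than silently miscounted.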

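For the sake of simplicity, I will offer a small, and consequently unrealistic, example. The model vector TTTTTCCIICCCCIIICFFFFFFFF has its 3′ UTR at the start and its 5′ UTR at the end, so the gene lies on the reverse strand. It would be annotated something like this in GFF3 format: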

##gff-version 3
chr	vim	gene	1	25	.	-	.	ID=g1
chr	vim	mRNA	1	25	.	-	.	ID=g1.t1;Parent=g1
chr	vim	three_prime_UTR	1	5	.	-	.	ID=g1.t1.utr1;Parent=g1.t1
chr	vim	CDS	6	7	.	-	.	ID=g1.t1.cds1;Parent=g1.t1
chr	vim	CDS	10	13	.	-	.	ID=g1.t1.cds2;Parent=g1.t1
chr	vim	CDS	17	17	.	-	.	ID=g1.t1.cds3;Parent=g1.t1
chr	vim	five_prime_UTR	18	25	.	-	.	ID=g1.t1.utr2;Parent=g1.t1
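The correspondence between the vector and the listing above can be sketched in a few lines of Python: collapse each run of identical symbols into one interval with 1-based, inclusive coordinates. The function name `vector_to_features` and the `FEATURE_TYPES` mapping are hypothetical. The sketch covers only the UTR and CDS rows, and makes no attempt to infer the strand or to emit the gene and mRNA rows.

```python
from itertools import groupby

# Hypothetical mapping from model vector symbols to GFF3 feature types.
# Introns ('I') and intergenic regions ('G') get no row of their own,
# matching the listing above.
FEATURE_TYPES = {"C": "CDS", "F": "five_prime_UTR", "T": "three_prime_UTR"}


def vector_to_features(vector):
    """Return (type, start, end) tuples with 1-based inclusive coordinates."""
    features = []
    position = 1
    for symbol, run in groupby(vector):
        length = len(list(run))
        if symbol in FEATURE_TYPES:
            end = position + length - 1
            features.append((FEATURE_TYPES[symbol], position, end))
        position += length
    return features


for feature in vector_to_features("TTTTTCCIICCCCIIICFFFFFFFF"):
    print(*feature, sep="\t")
```

Run on the example vector, this gives the same five intervals as the UTR and CDS rows of the GFF3 listing: 1 to 5, 6 to 7, 10 to 13, 17 to 17, and 18 to 25.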

