Plotting log-scale axes in R

Wow, it feels like a long time since I have blogged, but it’s only been a few weeks. I’m teaching a class on computational genome science this semester, and taking another one on the evolution of genes and genomes, so yeah, coursework has been kicking me in the butt the last couple of months. But anyway, enough meta-blogging…

This week I was reading about codon usage bias in Mike Lynch’s excellent book The Origins of Genome Architecture. In the last section of Chapter 6, he highlights the perils of automatically attributing elevated usage of “non-optimal” codons to selection and ignoring the stochastic effects of population genetic processes. Figure 6.9 plots the equilibrium frequency of “optimal” codons given different mutation rates and different values of 2N_gs (the power of selection on a gene).

I wanted to use this figure for an assignment. Since it is not readily available in electronic form, and since scanning it in would be lame, and since the equations used to generate the plots are clearly laid out in the book, and since I’m a curious and headstrong scientist…yeah, I had no reason not to generate the plot myself using R. I quickly threw together an R script to plot the graphic.

# The n variable represents different values of 2N_{g}s
n <- 1

# The x variable is the parameter of the equation and represents u/v, the 
# mutation rate from "optimal" to "non-optimal" codon
x <- seq(0, 100, by=.0001)

y <- exp(n)/(exp(n)+x)
plot(x, y, type="l",
     xlab="Mutation rate toward non-optimal codon (u/v)",
     ylab="Equilibrium frequency of optimal codon (p̃)")

n <- 3
y <- exp(n)/(exp(n)+x)
lines(x, y=y, col="blue")

n <- 0
y <- exp(n)/(exp(n)+x)
lines(x, y=y, col="red")

This script worked, although it was immediately clear why the graphic in the book had used a log scale for the x-axis: my plot didn’t look that great. So I started trying to work out in my head whether I should transform the x values or the y values, and how I was going to properly label the x-axis, when I thought “there has got to be a better way to do this; I bet this is built in to R.” The help page for the plot function didn’t give any clues, but a bit more searching helped me find what I was looking for. The log option of the plot function lets you specify whether to plot the x-axis or the y-axis or both on a log scale (using “x”, “y”, and “xy” as arguments, respectively). I then found a cool trick to label the axis using exponents rather than decimals or the lame default scientific notation. Anyway, here is the updated code…

# The n variable represents different values of 2N_{g}s
n <- 1

# The x variable is the parameter of the equation and represents u/v, the 
# mutation rate from "optimal" to "non-optimal" codon
x <- seq(0, 100, by=.0001)

y <- exp(n)/(exp(n)+x)
plot(x, y, log="x", type="l", xaxt="n", xlim=c(0.01, 100),
     xlab="Mutation rate toward non-optimal codon (u/v)",
     ylab="Equilibrium frequency of optimal codon (p̃)")
ticks <- seq(-2, 2, by=1)
labels <- sapply(ticks, function(i) as.expression(bquote(10^ .(i))))
axis(1, at=c(0.01, 0.1, 1, 10, 100), labels=labels)

n <- 3
y <- exp(n)/(exp(n)+x)
lines(x, y=y, col="blue")

n <- 0
y <- exp(n)/(exp(n)+x)
lines(x, y=y, col="red")

…and the final product.

Codon frequency distribution

Many thanks to the following StackOverflow threads for pointing me in the right direction.

  • On this one, I saw the log parameter for the plot function.
  • On this one, I learned the cool trick for the axis labels.
Advertisements

2 comments

  1. Fernando

    Thank you for sharing
    You may also appreciate this solution for the curve

    # The n variable represents different values of 2N_{g}s
     
    # The x variable is the parameter of the equation and represents u/v, the
    # mutation rate from "optimal" to "non-optimal" codon
    n=c(1,3,0) 
    color=c("black","blue","red")
    curve(exp(n[1])/(exp(n[1])+x), log="x", type="l", xaxt="n", xlim=c(0.01, 100),col= color[1]
         xlab="Mutation rate toward non-optimal codon (u/v)",
         ylab="Equilibrium frequency of optimal codon (p̃)")
    ticks &lt;- seq(-2, 2, by=1)
    labels &lt;- sapply(ticks, function(i) as.expression(bquote(10^ .(i))))
    axis(1, at=c(0.01, 0.1, 1, 10, 100), labels=labels)
    curve(exp(n[2])/(exp(n[2])+x),col= color[2],add=TRUE)
    curve(exp(n[3])/(exp(n[3])+x) ,col= color[3],add=TRUE)
    

    When you have a formula, the output of curve looks nicer than that of lines

  2. Fernando

    There was a comma missing. Sorry for the confusion

    n=c(1,3,0) 
    color=c("black","blue","red")
    curve(exp(n[1])/(exp(n[1])+x), log="x", type="l", xaxt="n", xlim=c(0.01, 100),col= color[1],
         xlab="Mutation rate toward non-optimal codon (u/v)",
         ylab="Equilibrium frequency of optimal codon (p̃)")
    ticks &lt;- seq(-2, 2, by=1)
    labels &lt;- sapply(ticks, function(i) as.expression(bquote(10^ .(i))))
    axis(1, at=c(0.01, 0.1, 1, 10, 100), labels=labels)
    curve(exp(n[2])/(exp(n[2])+x),col= color[2],add=TRUE)
    curve(exp(n[3])/(exp(n[3])+x) ,col= color[3],add=TRUE)
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s