Re-organizing factors in ggplot2 plots

When you have a factor with many levels and want to see a bar chart (kind of like a histogram for discrete data, as Hadley pointed out in the comments), it’s usually more useful to see the bars sorted in a meaningful order, such as by count. By default they appear in the order of the factor’s levels, which is alphabetical. An example of this is shown below:

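library(ggplot2)

# Four levels with random counts; rep() truncates the fractional count to an integer.
labels <- c(
  rep("a", 100*rexp(1)),
  rep("b", 100*rexp(1)),
  rep("c", 100*rexp(1)),
  rep("d", 100*rexp(1)))
x <- data.frame(labels = factor(labels), some.value = runif(length(labels)))
# Bars are drawn in the factor's default (alphabetical) level order.
# Note: qplot() is deprecated as of ggplot2 3.4.0 but still works.
qplot(labels, data=x)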

This produces the following chart:

Not bad if you only have 4 levels in your factor, but what if you have 400? I’d like the bars sorted by count, so that the sequence goes, say, a, d, b, c instead of a, b, c, d. Here’s the trick to re-order your factor:

qplot(reorder(x$labels, as.numeric(x$labels), length))

The reorder function takes three parameters:

  • The factor that you want to reorder
  • A numeric vector of the same length as the factor (the values don’t matter for this specific task, because we only count how many of them fall in each level)
  • A function to apply to the numeric vector
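
To see what reorder actually does to the levels, here is a minimal sketch on a tiny hand-made factor. The toy data and the variable names (labels, by_count) are made up for illustration, and seq_along is just one convenient way to get a numeric vector of the right length:

```r
# Hypothetical toy data: "b" appears 3 times, "a" twice, "c" once.
labels <- factor(c("a", "b", "b", "c", "b", "a"))

# The numeric vector only needs to match the factor's length;
# length() ignores the values and counts the observations per level.
by_count <- reorder(labels, seq_along(labels), FUN = length)

levels(labels)    # default order: "a" "b" "c"
levels(by_count)  # ascending by count: "c" "a" "b"
```

Only the level order changes; the data values themselves stay the same, which is why the plot picks up the new ordering without any other changes.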

In this case, we want to re-order by the number of observations in each level, so we use the length function. The plot now looks like this:

To get them in descending order, I figured it was easiest to just multiply the length by -1.

qplot(reorder(x$labels, as.numeric(x$labels), function(y){-1*length(y)}))

This puts the most frequent level first, so the bars run from tallest to shortest.
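
The descending order can be checked without plotting. This is a small sketch on hypothetical toy data (the names labels, by_count_desc and via_table are invented here); the second approach, building the levels from a sorted table, is an alternative I'm adding for comparison rather than part of the trick above:

```r
# Hypothetical toy data: "b" appears 3 times, "a" twice, "c" once.
labels <- factor(c("a", "b", "b", "c", "b", "a"))

# Negating the count makes the most frequent level sort first.
by_count_desc <- reorder(labels, seq_along(labels), FUN = function(y) -length(y))
levels(by_count_desc)  # "b" "a" "c"

# Alternative sketch: set the levels directly from a sorted frequency table.
via_table <- factor(labels, levels = names(sort(table(labels), decreasing = TRUE)))
levels(via_table)      # "b" "a" "c"
```

Both give the same level order here; levels with tied counts may be ordered differently by the two approaches.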
