Alternative to Grouped Bar Charts in R

The #spiffy @dseverski gave me this posit the other day:

and I obliged shortly thereafter, but figured I’d toss a post up on the blog before heading to Strata.

To rephrase the tweet a bit, Mr. Severski asked me what alternate encoding I’d use for this grouped bar chart (larger version at the link in David’s tweet):

[Image: the original grouped bar chart from the 451 Group]

I have almost as much disdain for grouped bar charts as I do for pie or donut charts, so I appreciated the opportunity to try a makeover. However, I ran into an immediate problem: the usually #spiffy 451 Group folks did not include the raw data. So, I reverse engineered the graph with WebPlotDigitizer, cleaned up the result and made a CSV from it. Then, I headed to RStudio with a plan in mind.
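WebPlotDigitizer does the heavy lifting interactively, but the core idea behind reading values off a bar chart is linear axis calibration: you mark two points on the value axis whose values you know, and every other pixel position maps onto that scale by linear interpolation. Here is a minimal base-R sketch, assuming a linear (not log) axis. The pixel coordinates are made up for illustration, and `pixel_to_value()` is a hypothetical helper, not part of WebPlotDigitizer or any R package:

```r
# Hypothetical helper: map pixel positions on a linear axis to data values,
# given two calibration points (pixel position, known axis value).
pixel_to_value <- function(px, px_ref, val_ref) {
  stopifnot(length(px_ref) == 2, length(val_ref) == 2, px_ref[1] != px_ref[2])
  # value change per pixel; the sign handles image y growing downward
  slope <- (val_ref[2] - val_ref[1]) / (px_ref[2] - px_ref[1])
  val_ref[1] + (px - px_ref[1]) * slope
}

# Made-up calibration: the axis baseline (value 0) sits at pixel row 400,
# the 1.0 gridline sits at pixel row 100.
bar_tops <- c(250, 340, 385)   # made-up pixel rows of three bar tops
vals <- pixel_to_value(bar_tops, px_ref = c(400, 100), val_ref = c(0, 1))
round(vals, 2)                 # → 0.50 0.20 0.05
```

Rounding digitized values to the chart's visible precision is a reasonable part of the clean-up, since each click carries a pixel or two of error.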

The old chart and data screamed faceted dot plot. The only trick necessary was to manually order the factor levels.

library(ggplot2)
 
# read in the CSV file
nosql.df <- read.csv("nosql.csv", header=TRUE)
# manually order facets
nosql.df$Database <- factor(nosql.df$Database,
                            levels=c("MongoDB","Cassandra","Redis","HBase","CouchDB",
                                     "Neo4j","Riak","MarkLogic","Couchbase","DynamoDB"))
 
# start the plot
gg <- ggplot(data=nosql.df, aes(x=Quarter, y=Index))
# use points, colored by Quarter
gg <- gg + geom_point(aes(color=Quarter), size=3)
# make strips by nosql db factor
gg <- gg + facet_grid(Database~.)
# rotate the plot
gg <- gg + coord_flip()
# get rid of most of the junk
gg <- gg + theme_bw()
# add a title
gg <- gg + labs(x="", title="NoSQL LinkedIn Skills Index\nSeptember 2013")
# get rid of the legend
gg <- gg + theme(legend.position = "none")
# blank any top (x) strips; the Database labels live in the right-hand (y) strips and stay visible
gg <- gg + theme(strip.text.x = element_blank())
gg
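The hand-typed `levels=` vector works, but it has to be edited whenever the data change. One alternative, assuming you want the databases ordered by their largest `Index` value (biggest first, so it lands in the top facet row), is `reorder()` from base R's stats package. The small data frame below is made-up placeholder data in the same long format, not the digitized 451 numbers:

```r
# Made-up placeholder data in the same long format as nosql.csv
toy.df <- data.frame(
  Database = rep(c("Redis", "MongoDB", "Riak"), each = 2),
  Quarter  = rep(c("Q2", "Q3"), times = 3),
  Index    = c(0.20, 0.25, 0.90, 1.00, 0.05, 0.06),
  stringsAsFactors = FALSE
)

# reorder() sorts levels ascending by FUN applied to each group's Index;
# negating the max puts the largest database first.
toy.df$Database <- reorder(toy.df$Database, toy.df$Index,
                           FUN = function(x) -max(x))
levels(toy.df$Database)   # → "MongoDB" "Redis" "Riak"
```

Applied to the real data this would be `nosql.df$Database <- reorder(nosql.df$Database, nosql.df$Index, FUN = function(x) -max(x))`. Swapping `max` for `mean` or for the latest quarter's value changes the ranking rule, so pick whichever matches the story the chart should tell.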

The result is below in SVG form (install a proper browser if you can’t see it, or run the R code :-) I think it conveys the data in a much more informative way. How would you encode the data to make it more informative and accessible?

Full source & data over at GitHub.





7 Comments on “Alternative to Grouped Bar Charts in R”

  1. tcamm

    Why do you have disdain for grouped bar charts? The new chart takes up a lot of vertical space…

  2. tylerrinker

    Funny thing is that as I read the post and you state, “what alternate encoding I’d use for this grouped bar chart” my mind immediately went to a sorted, faceted dotplot using ggplot2. I laughed when what I envisioned was precisely what you used at the end of the blog. Thanks for sharing.

  3. pattern

    Hi,

    Wonderful Post.

    Can you please help as to how we can make the following graph in R – http://ge.tt/24iG4Sw/v/0?c

    Secondly, a bit more detail on the reverse engineering with WebPlotDigitizer would help; I tried but could not.

    Cheers / P

  4. Owe Jessen (@ojessen)

    Honestly, I don’t think this works either: the plot is too tall to view on one page, and I can’t immediately compare the different factors. Additionally, the labels for the factors are the wrong side up: your head can read them without problems when tilted 90 degrees counterclockwise, but it has trouble with the way they are printed now. The color is meaningless, as it repeats the y-label. I would try to print lines for the factors on a single timeline. You’ll get overprints for the lesser factors, but I’m not sure that would be much of a loss.

  5. duyn

    I’m not sure either graph really gets the message through. It seems to me the interesting bits of the data are:

    1. Growth over time for each DB.
    2. Rankings of the DBs over time relative to each other.

    The common time dimension suggests you could save space by showing all the DBs on a common time axis.

    If you wanted to show that the stellar growth of MongoDB simply eclipses the growth in everything else, a simple line graph like this would suffice:

    < http://imgur.com/MBYVTr8 >

    If you wanted to show something useful about the less-used DBs, you could pick a log scale for the y axis:

    < http://imgur.com/P09ycEo >

    You no longer get a visual sense of the massive gulf between MongoDB and everything else, but the log scale shows you something useful about all the smaller DBs. The log scale also lets you compare growth rates, since the line slopes are approximately proportional to percentage growth (though the growth rates we see here probably strain that approximation).
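The slope point can be checked with a little arithmetic: on a log axis the vertical rise between two readings a and b is log(b) - log(a) = log(b/a), which is close to the percentage change (b - a)/a only when the change is small. A base-R sketch with made-up index values (not the digitized data, and not the R code omitted from the comment):

```r
# Made-up pairs of index readings: before and after one period
before <- c(100, 100, 100)
after  <- c(105, 150, 300)

pct_growth <- (after - before) / before   # simple percentage growth
log_slope  <- log(after) - log(before)    # rise on a natural-log axis

round(data.frame(pct_growth, log_slope), 3)
```

At 5% growth the two agree closely (0.05 vs. about 0.049); at 200% growth the log rise is only about 1.1, roughly half the percentage change, which is the strain on the approximation. A base-10 axis such as ggplot2's `scale_y_log10()` multiplies every rise by the same constant (1/ln 10), so comparisons of slopes between lines still hold.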

    Looking at the last chart, a few things become visible which weren’t clear from either the grouped bar or the faceted dot plots:

    1. The top three (MongoDB, Cassandra and Redis) have basically grown at the same rate over the past year.

    2. Mentions for Cassandra and Redis are not just very close, they’re practically the same. It looks like almost everyone who mentions one in their profile also mentions the other.

    3. Couchbase probably has the strongest sustained growth of all the DBs (in percentage terms). It has passed MarkLogic and is on par with Riak.
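“Strongest sustained growth in percentage terms” can be made concrete as a compound growth rate per quarter, (last / first)^(1 / (n - 1)) - 1 for n quarterly readings. A sketch with hypothetical series (made-up numbers, not the digitized values); `compound_growth()` is an illustrative helper, not a package function:

```r
# Hypothetical quarterly index readings for two databases
series <- list(
  steady = c(1.0, 1.2, 1.44, 1.728),   # +20% every quarter
  flat   = c(2.0, 2.1, 2.0, 2.2)
)

# Compound growth rate per period between the first and last reading
compound_growth <- function(x) (x[length(x)] / x[1])^(1 / (length(x) - 1)) - 1

round(sapply(series, compound_growth), 3)   # steady ≈ 0.200, flat ≈ 0.032
```

Because it uses only the endpoints, this rate hides wobbles in between; fitting a straight line to log(Index) over time is a sturdier alternative when there are many quarters.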

    The weird line for DynamoDB likely reflects the fact that the original plot didn’t show it very well, so the reverse-engineered values are probably not very precise.

    [R code omitted since this comment is already long enough]

  6. Pingback: You Don’t Need Complex Charts to Tell Powerful Stories | Visual.ly Blog | Tony Gurney

  7. Pingback: Types Of Infographics: What Is A Chart? | Stephen Darori: "What is?"
