The #spiffy @dseverski posed this question to me the other day:
Hey, @hrbrmstr, doughnut chart aside, how would you approach the first graph at http://t.co/zjHoHRVOeo? Bump chart? Trend line? Leave as is?
— David F. Severski (@dseverski) October 25, 2013
I obliged shortly thereafter, but figured I’d also toss a post up on the blog before heading to Strata.
To rephrase the tweet a bit, Mr. Severski asked me what alternate encoding I’d use for this grouped bar chart (larger version at the link in David’s tweet):
I have almost as much disdain for grouped bar charts as I do for pie or donut charts, so I appreciated the opportunity to try a makeover. However, I ran into an immediate problem: the usually #spiffy 451 Group folks did not include the raw data. So, I reverse engineered the graph with WebPlotDigitizer, cleaned up the result and made a CSV from it. Then I headed to RStudio with a plan in mind.
The old chart and data screamed faceted dot plot. The only trick necessary was to manually order the factor levels.
```r
library(ggplot2)

# read in the CSV file (reverse-engineered from the original chart)
nosql.df <- read.csv("nosql.csv", header=TRUE)

# manually order facets
nosql.df$Database <- factor(nosql.df$Database,
                            levels=c("MongoDB","Cassandra","Redis","HBase","CouchDB",
                                     "Neo4j","Riak","MarkLogic","Couchbase","DynamoDB"))

# start the plot
gg <- ggplot(data=nosql.df, aes(x=Quarter, y=Index))
# use points, colored by Quarter
gg <- gg + geom_point(aes(color=Quarter), size=3)
# make strips by nosql db factor
gg <- gg + facet_grid(Database~.)
# rotate the plot
gg <- gg + coord_flip()
# get rid of most of the junk
gg <- gg + theme_bw()
# add a title and blank the Quarter axis title
gg <- gg + labs(x="", title="NoSQL LinkedIn Skills Index\nSeptember 2013")
# get rid of the legend
gg <- gg + theme(legend.position = "none")
# blank any horizontal (x) strip text; the Database labels stay in the right-hand (y) strips
gg <- gg + theme(strip.text.x = element_blank())
gg
```
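The hard-coded `levels` vector does the job, but it has to be edited by hand whenever the data change. A minimal base-R sketch of deriving an ordering from the data instead (here, by each database's largest `Index` value) might look like this. It assumes the CSV has the `Database`, `Quarter` and `Index` columns used above; the toy values are made up for illustration and are not the digitized 451 Group numbers.

```r
# Hypothetical toy data, NOT the digitized 451 Group values; same columns as nosql.csv
toy <- data.frame(
  Database = c("Redis", "Redis", "MongoDB", "MongoDB", "Cassandra", "Cassandra"),
  Quarter  = c("Q1", "Q2", "Q1", "Q2", "Q1", "Q2"),
  Index    = c(3, 4, 10, 12, 5, 5)
)

# factor() on its own sorts levels alphabetically
levels(factor(toy$Database))
# -> Cassandra, MongoDB, Redis

# largest Index per database (tapply returns a named 1-d array)
max_by_db <- tapply(toy$Index, toy$Database, max)

# database names ordered from largest to smallest peak Index
db_order <- names(max_by_db)[order(max_by_db, decreasing = TRUE)]

# apply that order; facet_grid() draws the first level at the top
toy$Database <- factor(toy$Database, levels = db_order)
levels(toy$Database)
# -> MongoDB, Cassandra, Redis
```

The same two lines (`tapply()` then `factor(..., levels = ...)`) could replace the hand-written vector for `nosql.df`, though a hand-picked order is still the simpler choice when you want to match the original chart exactly.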
The result is below in SVG form (install a proper browser if you can’t see it, or run the R code :-) I think it conveys the data in a much more informative way. How would you encode the data to make it more informative and accessible?
Full source & data are over at GitHub.
5 Comments
Why do you have disdain for grouped bar charts? The new chart takes up a lot of vertical space…
Funny thing: as I read the post and saw you ask “what alternate encoding I’d use for this grouped bar chart,” my mind immediately went to a sorted, faceted dot plot using ggplot2. I laughed when what I envisioned was precisely what you used at the end of the post. Thanks for sharing.
Hi,
Wonderful post.
Can you please help with how to make the following graph in R: http://ge.tt/24iG4Sw/v/0?c
Secondly, could you give a bit more detail on the reverse engineering with WebPlotDigitizer? I tried but could not manage it.
Cheers / P
Honestly, I don’t think this works either: the plot is too tall to view on one page, and I can’t immediately compare the different factors. Additionally, the factor labels are the wrong way up. Tilting your head 90 degrees counter-clockwise to read rotated text is easy, but these labels are rotated the other way, which is much harder to read. The color is meaningless, as it just repeats the y-axis labels. I would try plotting lines for the factors on a single timeline. You’ll get overprints for the lesser factors, but I’m not sure that would be much of a loss.
I’m not sure either graph really gets the message across. It seems to me the interesting bits of the data are:
The common time dimension suggests you could save space by showing all the DBs on a common time axis.
If you wanted to show that the stellar growth of MongoDB simply eclipses the growth in everything else, a simple line graph like this would suffice:
< http://imgur.com/MBYVTr8 >
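A minimal ggplot2 sketch of this kind of chart (not the commenter’s own code) puts every database on one shared `Quarter` axis, with an optional log10 y axis for the log-scale variant suggested next. `plot_nosql_lines` is a hypothetical helper; it assumes a data frame shaped like `nosql.csv` (`Database`, `Quarter`, `Index`) and that the `Quarter` values sort chronologically. If they don’t, set their factor levels explicitly, as the post does for `Database`.

```r
library(ggplot2)

# Hypothetical helper: one line per database on a shared Quarter axis
plot_nosql_lines <- function(df, log_y = FALSE) {
  # group = Database is needed so lines connect across the discrete Quarter axis
  gg <- ggplot(data = df, aes(x = Quarter, y = Index, color = Database, group = Database))
  # lines show each database's trend; points mark the individual values
  gg <- gg + geom_line() + geom_point(size = 2)
  gg <- gg + theme_bw()
  gg <- gg + labs(x = "", title = "NoSQL LinkedIn Skills Index\nSeptember 2013")
  # optional log10 y axis to spread out the smaller databases
  if (log_y) gg <- gg + scale_y_log10()
  gg
}
```

With the post’s data loaded as `nosql.df`, `plot_nosql_lines(nosql.df)` gives the linear version and `plot_nosql_lines(nosql.df, log_y = TRUE)` the log-scale one; the results won’t necessarily match the linked images exactly.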
If you wanted to show something useful about the less-used DBs, you could pick a log scale for the y axis:
< http://imgur.com/P09ycEo >
You no longer get a visual sense of the massive gulf between MongoDB and everything else, but the log scale shows you something useful about all the smaller DBs. The log scale also lets you compare growth rates, since the line slopes are approximately proportional to percentage growth (though the growth rates we see here probably strain that approximation).
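The slope claim follows from log(a(1 + g)) - log(a) = log(1 + g) and log(1 + g) ≈ g for small growth g. A quick base-R check with a few growth rates chosen purely for illustration (not read from the chart) shows where the approximation holds and where it strains:

```r
# Illustrative per-period growth rates (not taken from the chart data)
growth <- c(0.05, 0.10, 0.50, 1.00)   # 5%, 10%, 50%, 100%

# rise between two points on a natural-log axis: log(a * (1 + g)) - log(a) = log(1 + g)
log_rise <- log(1 + growth)
# rounded: 0.049, 0.095, 0.405, 0.693
# close to g when g is small, well below g when g is large

# a log10 axis (e.g. ggplot2's scale_y_log10()) only rescales the rise by 1 / log(10)
log10_rise <- log10(1 + growth)
all.equal(log10_rise, log_rise / log(10))   # TRUE
```

Because the base of the logarithm only changes the constant, the comparison of slopes between databases works the same on a log10 axis.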
Looking at the last chart, a few things become visible which weren’t clear from either the grouped bar or the faceted dot plots:
The top three (MongoDB, Cassandra and Redis) have basically grown at the same rate over the past year.
Mentions for Cassandra and Redis are not just very close, they’re practically the same. It looks like almost everyone who mentions one in their profile also mentions the other.
Couchbase probably has the strongest sustained growth of all the DBs (in percentage terms). It has passed MarkLogic and is on par with Riak.
The weird line for DynamoDB probably reflects the fact that the original plot didn’t show it very well, so the reverse-engineered values are unlikely to be very precise.
[R code omitted since this comment is already long enough]
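The Cassandra/Redis and Couchbase observations can be checked numerically. A minimal base-R sketch, using two hypothetical helpers (`qoq_growth` and `index_ratio`, not from the post or the comment), assuming a data frame shaped like `nosql.csv` whose `Quarter` labels sort chronologically:

```r
# Hypothetical helpers; assume columns Database, Quarter, Index as in nosql.csv,
# with Quarter labels that sort chronologically.

# quarter-over-quarter fractional growth within each database (NA for the first quarter)
qoq_growth <- function(df) {
  df <- df[order(df$Database, df$Quarter), ]
  df$growth <- ave(df$Index, df$Database,
                   FUN = function(v) c(NA, diff(v) / head(v, -1)))
  df
}

# per-quarter ratio of two databases' Index; values near 1 mean near-identical mentions
index_ratio <- function(df, a, b) {
  x <- df[df$Database == a, c("Quarter", "Index")]
  y <- df[df$Database == b, c("Quarter", "Index")]
  # merge() matches the two series by Quarter and renames Index to Index.a / Index.b
  m <- merge(x, y, by = "Quarter", suffixes = c(".a", ".b"))
  setNames(m$Index.a / m$Index.b, as.character(m$Quarter))
}
```

For example, `index_ratio(nosql.df, "Cassandra", "Redis")` shows how close those two series are, and the `Couchbase` rows of `qoq_growth(nosql.df)` can be compared with the others. Given the digitizing caveat above, small differences shouldn’t be over-read.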
2 Trackbacks/Pingbacks
[…] Alternative to Grouped Bar Charts in R […]
[…] Alternative to Grouped Bar Charts in R (rud.is) […]