We were looking for a different type of visualization for a project at work this past week and my thoughts immediately gravitated towards [streamgraphs](http://www.leebyron.com/else/streamgraph/). The TLDR on streamgraphs is that they are generalized versions of stacked area graphs with free baselines across the x axis. They are somewhat [controversial](http://www.visualisingdata.com/index.php/2010/08/making-sense-of-streamgraphs/) but have a “draw you in” aesthetic appeal (which is what we needed for our visualization).
You can make streamgraphs/stacked area charts pretty easily in D3, and since we needed to try many different sets of data in the streamgraph style, it made sense to make this an R [htmlwidget](http://www.htmlwidgets.org/). Thus, the [streamgraph package](https://github.com/hrbrmstr/streamgraph) was born.
### Making a streamgraph
The package isn’t in CRAN yet, so you have to do the `devtools` dance:
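A sketch of that step, assuming the GitHub repository linked above (`hrbrmstr/streamgraph`) is the install source:

```r
# Install devtools first if it is not already available
# install.packages("devtools")

# Assumption: the package is installed straight from the GitHub repo linked above
devtools::install_github("hrbrmstr/streamgraph")
```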
Streamgraphs require a continuous variable for the x axis, and the `streamgraph` widget/package works with years or dates (support for `xts` objects and `POSIXct` types coming soon). Since they display categorical values in the area regions, the data in R needs to be in [long format](http://blog.rstudio.org/2014/07/22/introducing-tidyr/), which is easy to produce with `dplyr` & `tidyr`.
The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to `expand.grid` to ensure all categories are represented at every observation (not doing so makes `d3.stack` unhappy).
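To make the `expand.grid` idea concrete, here is a base R sketch of the general technique. It is not the package's actual internals, and the `obs` data frame and its column names are invented for illustration:

```r
# Hypothetical toy data: genre "b" has no row for 2001
obs <- data.frame(
  year  = c(2000, 2000, 2001),
  genre = c("a", "b", "a"),
  n     = c(3, 5, 4),
  stringsAsFactors = FALSE
)

# Build every combination of year and genre
grid <- expand.grid(
  year  = unique(obs$year),
  genre = unique(obs$genre),
  stringsAsFactors = FALSE
)

# Left-join the observations onto the full grid, then zero-fill the gaps
filled <- merge(grid, obs, by = c("year", "genre"), all.x = TRUE)
filled$n[is.na(filled$n)] <- 0

filled
```

After this, `filled` has one row per year/genre pair, with `n` set to 0 for the 2001/"b" combination that was missing from `obs`, so every category has a value at every x position.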
Let’s start by making a `streamgraph` of the number of movies made per year by genre using the `ggplot2` `movies` dataset:
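```r
library(streamgraph)
library(dplyr)

# `movies` shipped with ggplot2 when this was written; newer ggplot2
# releases moved it to the ggplot2movies package
ggplot2::movies %>%
  select(year, Action, Animation, Comedy, Drama, Documentary, Romance, Short) %>%
  tidyr::gather(genre, value, -year) %>%   # wide genre flags -> long format
  group_by(year, genre) %>%
  tally(wt=value) %>%                      # count of movies per year & genre, in column `n`
  streamgraph("genre", "n", "year") %>%
  sg_axis_x(20) %>%
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Genres: ")
```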
We can also mimic an example from the [Name Voyager](http://www.bewitched.com/namevoyager.html) project (using the `babynames` R package) but change some of the aesthetics, just to give an idea of how some of the options work:
```r
library(dplyr)
library(babynames)
library(streamgraph)

babynames %>%
  filter(grepl("^(Alex|Bob|Jay|David|Mike|Jason|Stephen|Kymberlee|Lane|Sophie|John|Andrew|Thibault|Russell)$", name)) %>%
  group_by(year, name) %>%
  tally(wt=n) %>%
  streamgraph("name", "n", "year", offset="zero", interpolate="linear") %>%
  sg_legend(show=TRUE, label="DDSec names: ")
```
There are more examples over at [RPubs](http://rpubs.com/hrbrmstr/streamgraph04) and [github](http://hrbrmstr.github.io/streamgraph/), but I’ll close with a streamgraph of housing data originally made by [Alex Bresler](http://asbcllc.com/blog/2015/february/cre_stream_graph_test/):
```r
dat <- read.csv("http://asbcllc.com/blog/2015/february/cre_stream_graph_test/data/cre_transaction-data.csv")

dat %>%
  streamgraph("asset_class", "volume_billions", "year") %>%
  sg_axis_x(1, "year", "%Y") %>%
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Assets: ")
```
While the radical volume change would have been noticeable in almost any graph style, it’s especially noticeable with the streamgraph version as your eyes tend to naturally follow the curves of the flow.
While I wouldn’t have these replace my trusty ggplot2 faceted bar charts for regular EDA and reporting, streamgraphs can add a bit of color and flair, and may be an especially good choice when you need to view many categorical variables over time.
As usual, issues/feature requests on [github](http://github.com/hrbrmstr/streamgraph) and showcase/general feedback in the comments.