Introducing the streamgraph htmlwidget R Package

We were looking for a different type of visualization for a project at work this past week, and my thoughts immediately gravitated toward streamgraphs. The TL;DR on streamgraphs is that they are generalized versions of stacked area graphs with free baselines across the x axis. They are somewhat controversial but have a “draw you in” aesthetic appeal (which is what we needed for our visualization).
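To make “free baseline” a bit more concrete, here's a minimal base-R sketch of how stacked layer boundaries can be computed under two different baselines. The `stack_layers()` helper and its `offset` values are hypothetical illustrations, not part of the streamgraph package (which hands the actual layout work to D3). “zero” is the ordinary stacked area chart; “silhouette” here follows one common definition (the one used by D3's newer `d3.stackOffsetSilhouette`), shifting each x position down by half its total so the stream is centered on y = 0.

```r
# Hypothetical helper (NOT part of the streamgraph package): compute the
# lower/upper edge of each stacked layer for a given baseline offset.
# `m` is a numeric matrix: one row per x value, one column per category.
stack_layers <- function(m, offset = c("zero", "silhouette")) {
  offset <- match.arg(offset)
  totals <- rowSums(m)
  # "zero": every x position starts at 0, like a plain stacked area chart.
  # "silhouette": shift each x position down by half its total, so the
  # stream is centered on the horizontal line y = 0.
  baseline <- if (offset == "zero") rep(0, nrow(m)) else -totals / 2
  # Running sum across categories within each row gives the top edges;
  # the baseline vector is added row-wise via column-major recycling.
  upper <- baseline + t(apply(m, 1, cumsum))
  lower <- upper - m
  list(lower = lower, upper = upper)
}

# Two made-up categories observed at three x positions
m <- matrix(c(1, 2, 3,   # category A
              4, 2, 2),  # category B
            ncol = 2, dimnames = list(NULL, c("A", "B")))

s <- stack_layers(m, "silhouette")
s$upper[, "B"]  # 2.5 2.0 2.5 (the top edge is always +total/2)
```

The thickness of each layer is identical under both offsets; only the vertical position of the whole stack changes, which is exactly what the “free baseline” buys you.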

You can make streamgraphs/stacked area charts pretty easily in D3, and since we needed to try many different sets of data in the streamgraph style, it made sense to make this an R htmlwidget. Thus, the streamgraph package was born.

Making a streamgraph

The package isn’t on CRAN yet, so you have to do the devtools dance:

devtools::install_github("hrbrmstr/streamgraph")

Streamgraphs require a continuous variable for the x axis, and the streamgraph widget/package works with years or dates (support for xts objects and POSIXct types is coming soon). Since the area regions display categorical values, the data in R needs to be in long format, which is easy to produce with dplyr and tidyr.
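If your data starts out “wide” (one column per category, like the genre flag columns in the movies data used below), it has to be reshaped first. The examples below use tidyr::gather() for that; as a dependency-free illustration, here's a base-R sketch of the same reshaping on a tiny made-up data frame (the column names `year`, `genre` and `value` are just illustrative).

```r
# A tiny wide data frame: one row per year, one column per category
# (made-up numbers, purely for illustration).
wide <- data.frame(year   = 2001:2003,
                   Action = c(5, 7, 6),
                   Comedy = c(10, 9, 12))

# Base-R equivalent of tidyr::gather(genre, value, -year):
# stack() turns the category columns into (values, ind) pairs, one column
# after another, so the years repeat once per category.
long <- data.frame(year = rep(wide$year, times = ncol(wide) - 1),
                   stack(wide[, -1]))
names(long) <- c("year", "value", "genre")
```

Following the key/value/date argument order used in the examples below, something like `streamgraph(long, "genre", "value", "year")` should then work with this shape.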

The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to expand.grid to ensure every category has a value at every observation (not doing so makes d3.stack unhappy, since it expects every layer to cover the same set of x values).
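The package handles this for you, but if you're curious what “every category at every observation” means in practice, here's a minimal base-R sketch of the expand.grid idea on made-up counts: build the full year-by-genre grid, left-join the observed values onto it, and fill the gaps with zero. It illustrates the technique and is not the package's internal code; tidyr::complete(df, year, genre, fill = list(n = 0)) does the same thing in one call.

```r
# Long-format data where "Drama" has no row for 2002 (made-up numbers).
long <- data.frame(year  = c(2001, 2001, 2002, 2003, 2003),
                   genre = c("Action", "Drama", "Action", "Action", "Drama"),
                   n     = c(5, 3, 7, 6, 4),
                   stringsAsFactors = FALSE)

# Every year x genre combination, whether or not it was observed.
grid <- expand.grid(year  = unique(long$year),
                    genre = unique(long$genre),
                    stringsAsFactors = FALSE)

# Left-join the observed counts onto the grid; missing combinations get NA...
full <- merge(grid, long, by = c("year", "genre"), all.x = TRUE)
# ...which we treat as zero so every layer has a value at every x.
full$n[is.na(full$n)] <- 0
full <- full[order(full$genre, full$year), ]
```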

Let’s start by making a streamgraph of the number of movies made per year by genre using the ggplot2 movies dataset:

library(streamgraph)
library(dplyr)
 
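# NOTE: ggplot2 2.0.0+ moved the movies data into the ggplot2movies package;
# on older ggplot2 versions use ggplot2::movies instead
ggplot2movies::movies %>%
  select(year, Action, Animation, Comedy, Drama, Documentary, Romance, Short) %>%
  tidyr::gather(genre, value, -year) %>%   # wide 0/1 genre flags -> long format
  group_by(year, genre) %>%
  tally(wt=value) %>%                      # movies per year/genre, in column "n"
  streamgraph("genre", "n", "year") %>%    # key, value and date columns
  sg_axis_x(20) %>%                        # x-axis tick interval
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Genres: ")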

Movie count by genre by year

We can also mimic an example from the Name Voyager project (using the babynames R package) but change some of the aesthetics, just to give an idea of how some of the options work:

library(dplyr)
library(babynames)
library(streamgraph)
 
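babynames %>%
  # keep only the podcast guest/host first names (exact matches)
  filter(grepl("^(Alex|Bob|Jay|David|Mike|Jason|Stephen|Kymberlee|Lane|Sophie|John|Andrew|Thibault|Russell)$", name)) %>%
  group_by(year, name) %>%
  tally(wt=n) %>%   # sum births across both sexes per year/name
  # offset="zero" keeps a flat baseline; interpolate="linear" draws straight segments
  streamgraph("name", "n", "year", offset="zero", interpolate="linear") %>%
  sg_legend(show=TRUE, label="DDSec names: ")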

Data-Driven Security Podcast guest+host names by year

There are more examples over at RPubs and GitHub, but I’ll close with a streamgraph of commercial real estate transaction data originally made by Alex Bresler:

library(dplyr)
library(streamgraph)
 
dat <- read.csv("http://asbcllc.com/blog/2015/february/cre_stream_graph_test/data/cre_transaction-data.csv")
 
dat %>%
  streamgraph("asset_class", "volume_billions", "year") %>%
  sg_axis_x(1, "year", "%Y") %>%   # one tick per year, labeled as a four-digit year
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Assets: ")

Commercial Real Estate Transaction Volume by Asset Class Since 2006

While the radical change in volume would be noticeable in almost any chart type, it’s especially striking in the streamgraph version, because your eyes naturally follow the curves of the flow.

Fin

While I wouldn’t have these replace my trusty ggplot2 faceted bar charts for regular EDA and reporting, streamgraphs can add a bit of color and flair, and may be an especially good choice when you need to view many categorical variables over time.

As usual, please file issues/feature requests on GitHub and leave showcase/general feedback in the comments.

SweetheaRstats

I felt compelled to dust off my 2013 Valentine’s Day #rstats post and make it all Shiny and new again. I used the same math from that post, but made the polygon a bit sharper and used ggplot2 for the plotting.

To kick it up a bit, I decided to pay homage to a local company, Necco, which makes the venerable Sweethearts candies so popular this time of year (at least in the U.S.). The heart color is randomly picked from the candy’s signature pastels, and the saying is randomly drawn from the ones that have appeared on the hearts over the years (except 2006, which was a strange year). The plot also displays the year the saying was used.

I wrapped it all up in a Shiny bow, so all you have to do to get a new heart/saying is refresh the page!

To play with it, you can:

  • refresh this post (since the iframe below is pointing to the shiny app)
  • refresh the target shinyapps.io app
  • run it locally via shiny::runGist("cf34f7230c88bd99b153")

NOTE: You’ll probably want/need to run it locally. I only have a free ShinyApps account and it’ll probably run out of free CPU time depending on when you’re reading this post.

The surprisingly small bit of code is in this gist.

Now there’s even one more way to have a data-driven romance!

Oh, and Happy Valentine’s Day @mrshrbrmstr!!