In preparation for using some of our streamgraphs for production (PDF/print) graphics, I ended up having to hand-edit labels onto one of the graphics in an Adobe product. This bumped up the priority on adding annotation functions to the streamgraph package (you really don’t want to have to hand-edit graphics if at all possible, trust me). To illustrate them, I’ll use unemployment data that I started gathering for a course I’m teaching in the Fall.
We’ll start with the setup and initial data gathering:
    library(dplyr)
    library(streamgraph)
    library(pbapply)

    url <- "http://www.bls.gov/lau/ststdsadata.txt"
    dat <- readLines(url)
This data is not exactly in a happy format (hit the URL in your browser and you’ll see what I mean). It was definitely made for line printers/human consumption and I feel bad for any human that has to stare at it. The function I’m using to extract the data is not necessarily how I’d read in the whole file; it’s geared more toward teaching something else than toward optimization. It’ll do for our purposes here:
    get_state_data <- function(state) {

      # regex matching either this state's rows or the "Month YYYY" header lines
      section <- paste("^%s| (", paste0(month.name, collapse="|"),
                       ") +[[:digit:]]{4}", sep="", collapse="")
      section <- sprintf(section, state)

      # keep the matching lines and trim leading/trailing spaces
      vals <- gsub("^ +| +$", "", grep(section, dat, value=TRUE))

      # even entries are the state's rows; drop the name and the dot leaders
      state_vals <- gsub("^.* \\.+", "", vals[seq(from=2, to=length(vals), by=2)])

      cols <- read.table(text=state_vals)

      # odd entries are the "Month YYYY" headers
      cols$month <- as.Date(sprintf("01 %s", vals[seq(from=1, to=length(vals), by=2)]),
                            format="%d %B %Y")
      cols$state <- state

      # month and state first, then the seven numeric columns
      cols %>%
        select(8:9, 1:7) %>%
        mutate(V1=as.numeric(gsub(",", "", V1)),
               V2=as.numeric(gsub(",", "", V2)),
               V4=as.numeric(gsub(",", "", V4)),
               V6=as.numeric(gsub(",", "", V6)),
               V3=V3/100,
               V5=V5/100,
               V7=V7/100) %>%
        rename(civ_pop=V1,
               labor_force=V2, labor_force_pct=V3,
               employed=V4, employed_pct=V5,
               unemployed=V6, unemployed_pct=V7)

    }

    state_unemployment <- bind_rows(pblapply(state.name, get_state_data))
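To make the parsing steps in the function above easier to follow, here is a minimal base R sketch that runs the same idea on a handful of lines. The `sample_lines` vector is invented to mimic the layout the regular expressions expect (a "Month YYYY" header followed by one row per state); the numbers are placeholders, not real BLS figures, and the date parsing assumes an English locale for `%B`.

```r
# Invented lines that mimic the assumed file layout; values are placeholders.
sample_lines <- c(
  "                         January 1976",
  "Maine ..............    1,000     600   60.0     540   54.0      60   10.0",
  "Ohio ...............    8,000   5,000   62.5   4,600   57.5     400    8.0",
  "                         February 1976",
  "Maine ..............    1,000     610   61.0     555   55.5      55    9.0",
  "Ohio ...............    8,000   5,040   63.0   4,640   58.0     400    7.9"
)

state <- "Maine"

# Match either rows starting with the state name or "Month YYYY" headers
month_alt <- paste(month.name, collapse = "|")
pattern <- sprintf("^%s| (%s) +[[:digit:]]{4}", state, month_alt)

# The hits alternate: header, state row, header, state row, ...
hits <- trimws(grep(pattern, sample_lines, value = TRUE))
headers <- hits[seq(1, length(hits), by = 2)]
rows <- hits[seq(2, length(hits), by = 2)]

# Headers become first-of-month dates (assumes English month names)
months <- as.Date(sprintf("01 %s", headers), format = "%d %B %Y")

# Strip the state name and dot leaders, then read the seven value columns
cols <- read.table(text = gsub("^.* \\.+", "", rows))

# Columns with thousands separators arrive as text and need cleaning
civ_pop <- as.numeric(gsub(",", "", cols$V1))
```

Because the hits are split by odd and even position, the approach relies on every month header being followed by exactly one row for the requested state.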
This will give us a data frame of employment (and unemployment) rates for all the (US) states. I only wanted to focus on New England and a few others for the course example, so this bit keeps just those states:
    state_unemployment %>%
      filter(state %in% c("California", "Ohio", "Rhode Island", "Maine",
                          "Massachusetts", "Connecticut", "Vermont",
                          "New Hampshire", "Nebraska")) -> some
With that setup out of the way, let me introduce the two new functions: `sg_add_marker` and `sg_annotate`. `sg_add_marker` adds a vertical, dotted line that spans the height of the graph and is placed at the designated spot on the x axis. You can add an optional label for the marker by specifying the y position, label text, color, size, space away from the line and how it’s anchored: start (left), middle (center) or end (right). The anchor is primarily useful for placing the label on either side of the line.
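As a rough sketch of the label options just described, the call below spells each one out by name on a small made-up data set. The argument names (`label`, `y`, `color`, `size`, `space`, `anchor`) are my reading of the package and should be checked against `?sg_add_marker` for your installed version; the `toy` data and the chosen values are purely illustrative. The streamgraph part is guarded so the snippet still runs when the package is not installed.

```r
# Toy data (invented for illustration): three series over six years.
dates <- seq(as.Date("2000-01-01"), by = "year", length.out = 6)
toy <- data.frame(
  date  = rep(dates, times = 3),
  key   = rep(c("a", "b", "c"), each = 6),
  value = rep(c(1, 2, 3), each = 6),
  stringsAsFactors = FALSE
)

marked <- NULL
if (requireNamespace("streamgraph", quietly = TRUE)) {
  sg <- streamgraph::streamgraph(toy, "key", "value", "date")
  # Argument names below are assumptions; verify with ?sg_add_marker
  marked <- streamgraph::sg_add_marker(
    sg,
    x      = as.Date("2003-01-01"),  # where the vertical line goes
    label  = "Event (2003)",         # optional label text
    y      = 0.5,                    # label position on the y axis
    color  = "#444444",              # label color
    size   = 11,                     # label font size
    space  = 8,                      # gap between line and label
    anchor = "end"                   # label sits to the left of the line
  )
}
# In an interactive session, print `marked` to view the widget.
```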
`sg_annotate` is for adding text anywhere on the streamgraph. The original use for it was to label streams, but you can use it any way you think would add meaning to your streamgraph. You can see them both in action below, where I plot the streamgraph for unemployment (%) for the selected states, then label the start of each recession since 1980 (with the peak national unemployment rate) with a marker and also label each stream:
    streamgraph(some, "state", "unemployed_pct", "month") %>%
      sg_axis_x(tick_interval=10, tick_units = "year", tick_format="%Y") %>%
      sg_axis_y(0) %>%
      sg_add_marker(x=as.Date("1981-07-01"), "1981 (10.8%)", anchor="end") %>%
      sg_add_marker(x=as.Date("1990-07-01"), "1990 (7.8%)", anchor="start") %>%
      sg_add_marker(x=as.Date("2001-03-01"), "2001 (6.3%)", anchor="end") %>%
      sg_add_marker(x=as.Date("2007-12-01"), "2007 (10.1%)", anchor="end") %>%
      sg_annotate(label="Vermont", x=as.Date("1978-04-01"), y=0.6, color="#ffffff") %>%
      sg_annotate(label="Maine", x=as.Date("1978-03-01"), y=0.30, color="#ffffff") %>%
      sg_annotate(label="Nebraska", x=as.Date("1977-06-01"), y=0.41, color="#ffffff") %>%
      sg_annotate(label="Massachusetts", x=as.Date("1977-06-01"), y=0.36, color="#ffffff") %>%
      sg_annotate(label="New Hampshire", x=as.Date("1978-03-01"), y=0.435, color="#ffffff") %>%
      sg_annotate(label="California", x=as.Date("1978-02-01"), y=0.175, color="#ffffff") %>%
      sg_annotate(label="Rhode Island", x=as.Date("1977-11-01"), y=0.55, color="#ffffff") %>%
      sg_annotate(label="Ohio", x=as.Date("1978-06-01"), y=0.485, color="#ffffff") %>%
      sg_annotate(label="Connecticut", x=as.Date("1978-01-01"), y=0.235, color="#ffffff") %>%
      sg_fill_tableau() %>%
      sg_legend(show=TRUE)
I probably could have positioned the annotations a bit better, but this should be a good enough example to get the general idea. I may add an option to place the marker vertical lines behind the streamgraph and will be adding some toggle options to the interactive version (to hide/show markers and/or annotations).
As usual, the package is up [on GitHub](https://github.com/hrbrmstr/streamgraph) and a contiguous copy of the above snippets is in [this gist](https://gist.github.com/hrbrmstr/4e181ae045807ca3a858).
Three final notes. First, I suggest enabling the y axis when you’re trying to figure out where the y position for a label should be (since the y axis range is calculated by the summed span of the data). Second, the x axis works with both dates and continuous values, but you need to match what you set up the streamgraph with. Finally, just a tip: I’ve found [SVG Crowbar 2](http://nytimes.github.io/svg-crowbar/) to be super-helpful when I need to extract these streamgraphs out for non-interactive reproduction. Just yank the SVG out with it and hand it (or a converted form of it) to whoever is handling final production and they should be able to work with it.
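For the first note, one way to get a starting point before turning on the y axis is to sum the series at each x position yourself. The sketch below does that in base R on made-up data and then, if the package is installed, places a label at the resulting guess. Treating the largest total as the top of the y range is an assumption on my part based on the "summed span" description; expect to nudge the value after looking at the rendered axis.

```r
# Toy data (invented for illustration): two series over four years.
dates <- seq(as.Date("2000-01-01"), by = "year", length.out = 4)
toy <- data.frame(
  date  = rep(dates, times = 2),
  key   = rep(c("a", "b"), each = 4),
  value = c(0.10, 0.20, 0.30, 0.20,   # series a
            0.05, 0.10, 0.10, 0.05),  # series b
  stringsAsFactors = FALSE
)

# Sum all series at each date; the largest total approximates the span
totals <- aggregate(value ~ date, data = toy, FUN = sum)
y_span <- max(totals$value)

# Assumption: the middle of the span is a reasonable first guess for a label
y_guess <- y_span / 2

labelled <- NULL
if (requireNamespace("streamgraph", quietly = TRUE)) {
  # No sg_axis_y(0) here, so the y axis stays visible while positioning
  sg <- streamgraph::streamgraph(toy, "key", "value", "date")
  labelled <- streamgraph::sg_annotate(
    sg, label = "a", x = as.Date("2001-01-01"), y = y_guess, color = "#ffffff"
  )
}
```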
3 Comments
Enjoying this widget immensely even though this post was 3 years ago.
I keep getting the following error when trying to use sg_add_marker
Error in function_list[i] : could not find function “sg_add_marker”
I’ve been thinking about upgrading this to use the new D3 5.0, so stay tuned.
I also get the same error. Have you upgraded sg_add_marker already?