DataViz Archives

Category Archives: DataViz

Foliage 2023

2023-09-09 – 06:45
Posted in data wrangling, DataVis, DataViz, ggplot, Javascript, Observable
Comments (3)

2023-09-10 UPDATE: Art Steinmetz took me up on the Shiny challenge (at the end of the post) and did a fantastic job!

The days are getting shorter and when we were visiting Down East Maine the other week, there was just a hint of some trees starting to change up their leaf palettes. It was a solid reminder to re-up my ~annual “foliage” plotting that I started way back in 2017.

The fine folks over at Smoky Mountains — (“the most authoritative source for restaurants, attractions, & cabin rentals in the Smoky Mountains”) — have been posting an interactive map of ConUS foliage predictions for many years and the dataset they curate and use for that is also very easy to use in R and other contexts.

This year, along with the usual R version, I have also made:

an Observable Notebook version, using data wrangled from the R version
a Quarto project version (see it rendered, here) that was made almost solely using my Reveal QMD Chrome Extension, which turns Observable notebooks into QMD projects with one click
and a LitElement + Observable Plot + Tachyons CSS + Vanilla JS version (see it live, here)

The only real changes to the R version were to add some code to make a more usable JSON for the JavaScript versions of the project, and to take advantage of the .progress parameter to {purrr}’s walk function.

The Observable notebook version (one frame of that is above) makes use of Observable Plot’s super handy geo mark, and also shows how to do some shapefile surgery to avoid plotting Alaska & Hawaii (the Smoky Mountains folks only provide predictions for ConUS).
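The Alaska & Hawaii removal can be illustrated without the notebook. What follows is a minimal sketch, not the notebook's actual code: it assumes the map data is a GeoJSON FeatureCollection whose features carry a five-digit county FIPS code in `id` (the convention used by us-atlas), and `conusOnly` is a hypothetical helper name.

```javascript
// Hypothetical helper: keep only features in the contiguous U.S.
// Assumption: each feature's `id` is a 5-digit county FIPS code whose
// first two digits identify the state.
const NON_CONUS = new Set(["02", "15"]); // Alaska, Hawaii

function conusOnly(featureCollection) {
  return {
    ...featureCollection,
    features: featureCollection.features.filter(
      // padStart() restores a leading zero if the id was stored as a number
      (f) => !NON_CONUS.has(String(f.id).padStart(5, "0").slice(0, 2))
    )
  };
}

// Tiny stand-in collection (geometry omitted for brevity)
const counties = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", id: "23009", properties: { name: "Hancock" }, geometry: null },
    { type: "Feature", id: "02020", properties: { name: "Anchorage" }, geometry: null },
    { type: "Feature", id: "15003", properties: { name: "Honolulu" }, geometry: null }
  ]
};

const conus = conusOnly(counties);
console.log(conus.features.map((f) => f.properties.name));
```

With real geometry in place, the filtered collection is the sort of thing that could be handed to Plot's geo mark, e.g. `Plot.geo(conus)`.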

After using the Reveal QMD extension to make the Quarto project, the qmd document rendered fine, but I tweaked the YAML to send the output to the GH Pages-renderable docs/ directory, and combined some of the OJS blocks to tighten up the document. You’ll see some Quarto “error” blocks, briefly, since the QMD fetches imports from Observable. You can get around that by moving all the imported resources to the Observable notebook before generating the QMD, but that’s an exercise left to the reader.

And, since I’m a fan of both Lit WebComponents and Tachyons CSS, I threw together a version using them (+ Observable Plot) to further encourage folks to get increasingly familiar with core web tech. Tachyons + Plot make it pretty straightforward to create responsive pages, too (resize the browser and toggle system dark/light mode to prove that). The Lit element’s CSS section also shows how to style Plot’s legend a bit.

Hit up the GH page to see the animated gif (I’ve stared at it a bit too much to include it in the post).

Drop any q’s here or in the GH issues, and — if anyone makes a Shiny version — please let me know, and I’ll add all links to any of those here and on the GH page.

FIN

While it is all well and good to plot foliage prediction maps, please also remember to take some time away from your glowing rectangles to go and actually observe the fall palette changes IRL.

Using the new Plot Javascript Exploratory Visualization Library Sans-Observable

2021-05-09 – 08:25
Posted in d3, data wrangling, DataVis, DataViz, Javascript
Comments (1)

The fine folks over at @ObservableHQ released a new javascript exploratory visualization library called Plot last week with great fanfare. It was primarily designed to be used in Observable notebooks and I quickly tested it out there (you can find them at my Observable landing page: https://observablehq.com/@hrbrmstr).

{Plot} doesn’t require Observable, however, and I threw together a small example that dynamically tracks U.S. airline passenger counts by the TSA to demonstrate how to use it in a plain web page.

It’s small enough that I can re-create it here:

TSA Total Traveler Throughput 2021 vs 2020 vs 2019 (same weekday)

and include the (lightly annotated) source:

fetch(
"https://observable-cors.glitch.me/https://www.tsa.gov/coronavirus/passenger-throughput",
{
  cache: "no-store",
  mode: "cors",
  redirect: "follow"
}
)
.then((response) => response.text()) // we get the text here
.then((html) => {

   var parser = new DOMParser();
   var doc = parser.parseFromString(html, "text/html"); // we turn it into DOM elements here

   // some helpers to make the code less crufty
   // first a function to make proper dates

   var as_date = d3.timeParse("%m/%d/%Y");

   // and, now, a little function to pull a specific <table> column and
   // convert it to a proper numeric array. I would have put this inline
   // if we were only converting one column but there are three of them,
   // so it makes sense to functionize it.

   var col_double = (col_num) => {
     return Array.from(
     doc.querySelectorAll(`table tbody tr td:nth-child(${col_num})`)
     ).map((d) => parseFloat(d.innerText.trim().replace(/,/g, "")));
   };

   // build an arquero table from the scraped columns

   var flights = aq
         .table({
            day: Array.from(
                   doc.querySelectorAll("table tbody tr td:nth-child(1)")
                 ).map((d) => as_date(d.innerText.trim().replace(/.{4}$/g, "2021"))),
            y2021: col_double(2),
            y2020: col_double(3),
            y2019: col_double(4)
        })
        .orderby("day")
        .objects()
        .filter((d) => !isNaN(d.y2021))

   document.getElementById('vis').appendChild(
     Plot.plot({
       marginLeft: 80,
       x: {
         grid: true
       },
       y: {
         grid: true,
         label: "# Passengers"
       },
       marks: [
         Plot.line(flights, { x: "day", y: "y2019", stroke: "#4E79A7" }),
         Plot.line(flights, { x: "day", y: "y2020", stroke: "#F28E2B" }),
         Plot.line(flights, { x: "day", y: "y2021", stroke: "#E15759" })
       ]
    })
  );

})
.catch((err) => console.error(err)) // log failures rather than silently discarding them
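The cell-cleaning steps in the source above can be tried without a browser or a live TSA page. This is a small sketch that works on plain strings: `toNumber`, `alignYear` and `asDate` are hypothetical names, and `asDate` is a plain-`Date` stand-in for `d3.timeParse("%m/%d/%Y")` rather than the d3 function itself.

```javascript
// Strip thousands separators, then parse; mirrors the cleaning done in col_double().
const toNumber = (text) => parseFloat(text.trim().replace(/,/g, ""));

// Replace the trailing 4-character year so every series shares one x axis.
const alignYear = (text, year = "2021") => text.trim().replace(/.{4}$/, year);

// Stand-in for d3.timeParse("%m/%d/%Y"): "m/d/YYYY" -> local-time Date.
const asDate = (text) => {
  const [month, day, year] = text.split("/").map(Number);
  return new Date(year, month - 1, day);
};

// Sample cell text in the general shape of an HTML table column
console.log(toNumber(" 1,707,805 "));
console.log(alignYear("5/8/2020"));
console.log(asDate(alignYear("5/8/2020")).getFullYear());
console.log(Number.isNaN(toNumber(""))); // empty cells become NaN and can be filtered out
```

Keeping these as tiny pure functions makes it easy to spot when the upstream table changes format, since the failure shows up as `NaN` or an invalid date rather than a blank chart.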

FIN

I’ll likely do a more in-depth piece on Plot in the coming weeks (today is Mother’s Day in the U.S. and that’s going to consume most of my attention today), but I highly encourage y’all to play with this new, fun tool.

COVID-19 U.S. County Vaccination Tracker With An Observable Notebook Using Datasettes and {Plot}

Rather than continue to generate daily images with R, I threw together an Observable notebook that takes advantage of the CDC COVID-19 county data datasette (provided by Simon Willison) and the new {Plot} library (by the @ObservableHQ team) that enables users to interactively see the daily county resident vaccination “series complete” percentage distribution.

The full notebook is here — https://observablehq.com/@hrbrmstr/us-county-covid-vaccination-progress — and the interactive visualization is embedded below (though it doesn’t support “dark mode” well):

‘data:’ Scraping & Chart Reproduction : Arrows of Environmental Destruction

2019-01-03 – 10:26
Posted in data wrangling, DataVis, DataViz, ggplot, R, web scraping
Comments (3)

Today’s RSS feeds picked up this article by Marianne Sullivan, Chris Sellers, Leif Fredrickson, and Sarah Lamdan on the woeful state of enforcement actions by the U.S. Environmental Protection Agency (EPA). While there has definitely been overreach by the EPA in the past, the vast majority of its regulatory corpus is quite sane and has made Americans safer and healthier as a result. What’s happened to an EPA left in the hands of evil (yep, “evil”) in the past two years is beyond lamentable and we likely have two more years of lamenting ahead of us (unless you actually like your water with a coal ash chaser).

The authors of the article made this chart to show the stark contrast between 2017 and 2018 when it comes to regulatory actions for eight acts:

Clean Air Act (CAA)
Clean Water Act (CWA)
Emergency Planning and Community Right to Know Act (EPCRA)
Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA)
Resource Conservation and Recovery Act (RCRA)
Safe Drinking Water Act (SDWA)
Toxic Substances Control Act (TSCA)
Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA)

They made this arrow chart (via Datawrapper):

For some reason, that chart sparked an “I really need to make that in R” moment, and thus begat this post.

I’ve got a geom for dumbbell charts but that’s not going to work for this arrow chart since I really wanted to (mostly) reproduce it the way it was. Here’s my go at it.

Data First

Datawrapper embeds have a handy “Get the data” link in them but it’s not a link to a file. It’s a javascript-generated data: href so you either need to click on the link and download it or be hard-headed like I am and go the way of pain and scrape it (reproducibility FTW). Let’s get packages and data gathering code out of the way. I’ll exposit a bit more about said data gathering after the code block:

library(stringi)
library(rvest)
library(hrbrthemes) # git[la|hu]b / hrbrmstr / hrbrthemes
library(tidyverse)

article <- read_html("https://theconversation.com/the-epa-has-backed-off-enforcement-under-trump-here-are-the-numbers-108640")

html_node(article, "iframe#psm7n") %>% # find the iframe
  html_attr("src") %>% # get iframe URL
  read_html() %>%  # read it in
  html_node(xpath=".//script[contains(., 'data: ')]") %>% # find the javascript section with the data
  html_text() %>% # get that section
  stri_split_lines() %>% # split into lines so we can target the actual data element
  unlist() %>% 
  keep(stri_detect_fixed, 'data: "Fiscal') %>% # just get the data line
  stri_trim_both() %>% # prep it for extraction
  stri_replace_first_fixed('data: "', "") %>% 
  stri_replace_last_fixed('"', "") %>% 
  stri_replace_all_fixed("\\n", "\n") %>% # make lines lines
  stri_split_lines() %>% 
  unlist() %>%
  stri_split_fixed("\\t") %>% # we now have a list of vectors
  map_dfc(~set_names(list(.x[2:length(.x)]), .x[1])) %>%  # first element of each vector is colname
  type_convert(col_types = "cddn") %>% # get real types
  set_names(c("act", "y2018", "y2017", "pct")) -> psm

psm
## # A tibble: 8 x 4
##   act    y2018 y2017   pct
##   <chr>  <dbl> <dbl> <dbl>
## 1 CAA      199   405   -51
## 2 CERCLA   147   194   -24
## 3 CWA      320   565   -43
## 4 EPCRA     56   107   -48
## 5 FIFRA    363   910   -60
## 6 RCRA     149   275   -46
## 7 SDWA     121   178   -32
## 8 TSCA      80   152   -47

Inside the main article URL content there’s an iframe load:

<p><iframe id="psm7n" class="tc-infographic-datawrapper" src="https://datawrapper.dwcdn.net/psm7n/2/" height="400px" width="100%" style="border: none" frameborder="0"></iframe></p>

We grab the contents of that iframe link (https://datawrapper.dwcdn.net/psm7n/2/) which has a data: line way down towards the bottom of one of the last javascript blocks:

That ugly line gets transformed into a link that will download as a normal CSV file, but we have to do the above wrangling on it before we can get it into a format we can work with.
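The wrangling boils down to un-escaping a one-line string and pivoting it. A minimal sketch of that idea follows, assuming (as the R pipeline implies) that the embedded value uses literal `\n` between lines and literal `\t` between cells, with each line holding one variable and its name first. The sample string is made up for illustration from a few of the values printed above; its column names and `parseEscapedColumns` are hypothetical.

```javascript
// Assumed shape: escaped "\n" between lines, escaped "\t" between cells,
// one variable per line with its name in the first cell.
// The string is illustrative, not the real embed contents.
const raw = "act\\tCAA\\tCWA\\ny2018\\t199\\t320\\ny2017\\t405\\t565";

function parseEscapedColumns(text) {
  const columns = text
    .split("\\n")                      // escaped newlines -> lines
    .map((line) => line.split("\\t")); // escaped tabs -> cells

  const nRows = columns[0].length - 1; // first cell of each line is the name

  // Pivot the per-variable lines into one object per row
  return Array.from({ length: nRows }, (_, i) =>
    Object.fromEntries(
      columns.map(([name, ...values]) => {
        const value = values[i];
        // keep text as text, convert numeric-looking cells to numbers
        return [name, Number.isNaN(Number(value)) ? value : Number(value)];
      })
    )
  );
}

console.log(parseEscapedColumns(raw));
```

The double backslashes in the JavaScript source are what make the separators literal two-character sequences, which is the same reason the R code splits on `"\\t"`.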

Now, we can make the chart.

Chart Time!

Let’s get the Y axis in the right order:

psm %>%
  arrange(desc(y2017)) %>%
  mutate(act = factor(act, levels = rev(act))) -> psm

Next, we set up X axis breaks and also get the max value for some positioning calculations (so we don’t hardcode values):

# setup x axis breaks and max value for label position computation
x_breaks <- pretty(c(psm$y2018, psm$y2017))
max_val <- max(x_breaks)

I have two minor nitpicks about the original chart (and changes to them as a result). First, I really don’t like the Y axis gridlines but I do believe we need something to help the eye move horizontally and associate each label to its respective geom. Instead of gridlines I opt for a diminutive dotted line from 0 to the first (min) value.

The second nitpick is that — while the chart has the act information in the caption area — the caption is in alpha order vs the order the act acronyms appear in the data. If it was an alpha bullet list I might not complain, but I chose to modify the order to fit the chart, which we build dynamically with the help of this vector:

# act info for caption
c(
  "CAA" = "Clean Air Act (CAA)",
  "CWA" = "Clean Water Act (CWA)",
  "EPCRA" = "Emergency Planning and Community Right to Know Act (EPCRA)",
  "FIFRA" = "Federal Insecticide, Fungicide, and Rodenticide Act (FIFRA)",
  "RCRA" = "Resource Conservation and Recovery Act (RCRA)",
  "SDWA" = "Safe Drinking Water Act (SDWA)",
  "TSCA" = "Toxic Substances Control Act (TSCA)",
  "CERCLA" = "Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA)"
) -> acts

w125 <- scales::wrap_format(125) # help us word wrap at ~125 chars

# order the vector and turn it into wrapped lines
act_info <- w125(paste0(unname(acts[as.character(psm$act)]), collapse = "; "))

Now, we can generate the geoms. It looks like a lot of code, but I like to use newlines to help structure ggplot2 calls. I still miss my old gg <- gg + idiom but RStudio makes it way too easy to execute the whole expression with just the use of + so I’ve succumbed to their behaviour modification. To break it down w/o code, we essentially need:

the arrows for each act
the 2017 and 2018 direct label values for each act
the 2017 and 2018 top “titles”
segments for ^^
title, subtitle and caption(s)

We use percent-maths to position labels and other objects so the code can be re-used for other arrow plots (hardcoding to the data values is likely fine, but you’ll end up tweaking the numbers more and wasting ~2-5m per new chart).

ggplot(psm) +

  # dots from 0 to minval
  geom_segment(
    aes(0, act, xend = y2018, yend = act),
    linetype = "dotted", color = "#b2b2b2", size = 0.33
  ) +

  # minval label
  geom_label(
    aes(y2018, act, label = y2018),
    label.size = 0, hjust = 1, size = 3.5, family = font_rc
  ) +

  # maxval label
  geom_label(
    aes(y2017 + (0.0015 * y2017), act, label = y2017),
    label.size = 0, hjust = 0, size = 3.5, family = font_rc
  ) +

  # the measure line+arrow
  geom_segment(
    aes(y2018, act, xend = y2017, yend = act),
    color = "#4a90e2", size = 0.75, # I pulled the color value from the original chart
    arrow = arrow(ends = "first", length = unit(5, "pt"))
  ) +

  # top of chart year (min)
  geom_label(
    data = head(psm, 1),
    aes(y2018, 9, label = "2018"),
    hjust = 0, vjust = 1, label.size = 0, size = 3.75, family = font_rc, color = ft_cols$slate
  ) +

  # top of chart year (max)
  geom_label(
    data = head(psm, 1),
    aes(y2017, 9, label = "2017"),
    hjust = 1, vjust = 1, label.size = 0, size = 3.75, family = font_rc, color = ft_cols$slate
  ) +

  # bar from top of chart year label to first minval measure
  geom_segment(
    data = head(psm, 1),
    aes(
      y2018 + (0.005 * max_val), 8.5, 
      xend = y2018 + (0.005 * max_val), yend = 8.25
    ), 
    size = 0.25
  ) +

  # bar from top of chart year label to first maxval measure
  geom_segment(
    data = head(psm, 1),
    aes(
      y2017 - (0.005 * max_val), 8.5, 
      xend = y2017 - (0.005 * max_val), yend = 8.25
    ), 
    size = 0.25
  ) +

  # fix x axis scale and place breaks
  scale_x_comma(limits = c(0, max_val), breaks = seq(0, max_val, 200)) +

  # make room for top "titles"
  scale_y_discrete(expand = c(0, 1)) +

  labs(
    y = NULL,
    title = "Decline by statute",
    subtitle = "The number of civil cases the EPA brought to conclusion has dropped across a number of federal statutes,\nincluding the Clean Air Act (CAA) and others.",
    x = act_info,
    caption = "Original Chart/Data: The Conversation, CC-BY-ND;<https://bit.ly/2VuJrOT>; Source: Environmental Data & Government Initiative <https://bit.ly/2VpcFyl>"
  ) +
  theme_ipsum_rc(grid = "X") +
  theme(axis.text.x = element_text(color = ft_cols$slate)) +
  theme(axis.title.x = element_text(
    hjust = 0, size = 10, face = "italic", color = ft_cols$gray, margin = margin(t = 10)
  )) +
  theme(plot.caption = element_text(hjust = 0))
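The percent-based positioning used for the small tick bars in the plot code can be checked with plain arithmetic. This is a sketch only: `tickPositions` is a hypothetical helper, and the axis maximum of 1000 is an assumption about what the largest break works out to for these data.

```javascript
// Offsets are a fraction of the axis maximum, so they scale with the data
// instead of being hardcoded per chart.
function tickPositions(maxVal, fraction = 0.005) {
  const nudge = fraction * maxVal;
  return (row) => ({
    tick2018: row.y2018 + nudge, // bar just to the right of the 2018 value
    tick2017: row.y2017 - nudge  // bar just to the left of the 2017 value
  });
}

// Assumption: the largest x axis break is 1000 for these data.
const positionFor = tickPositions(1000);

// FIFRA is the top row once the data is sorted by descending 2017 value
const fifra = { act: "FIFRA", y2018: 363, y2017: 910 };
console.log(positionFor(fifra));
```

Reusing the code for a chart whose values run into the millions only requires a different `maxVal`; the 0.5% nudge stays visually similar.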

Here’s the result:

(it even looks ok in “batman” mode):

FIN

With Microsoft owning GitHub I’m not using gists anymore and the GitLab “snippets” equivalent is just too dog-slow to use, so starting in 2019 I’m self-hosting contiguous R example code used in the blog posts. For the moment, that means links to plain R files but I may just set up gitea for them sometime before the end of Q1. You can find a contiguous, commented version of the above code here.

If you do your own makeover don’t forget to drop a link to your creation(s) in the comments!

R⁶ — General (Attys) Distributions

2017-07-25 – 11:15
Posted in Data Visualization, data wrangling, DataVis, DataViz, Javascript, R
Tagged post, r6
Comments (2)

Matt @stiles is a spiffy data journalist at the @latimes and he posted an interesting chart on U.S. Attorneys General longevity (given that the current US AG is on thin ice):

Only Watergate and the Civil War have prompted shorter tenures as AG (if Sessions were to leave now). A daily viz: https://t.co/aJ4KDsC5kC pic.twitter.com/ZoiEV3MhGp

— Matt Stiles (@stiles) July 25, 2017

I thought it would be neat (since Matt did the data scraping part already) to look at AG tenure distribution by party, while also pointing out where Sessions falls.

Now, while Matt did scrape the data, it’s tucked away into a javascript variable in an iframe on the page that contains his vis.

It’s still easier to get it from there vs re-scrape Wikipedia (like Matt did) thanks to the V8 package by @opencpu.

The following code:

grabs the vis iframe
extracts and evaluates the target javascript to get a nice data frame
performs some factor re-coding (for better grouping and to make it easier to identify Sessions)
plots the distributions using the beeswarm quasirandom algorithm

library(V8)
library(rvest)
library(ggbeeswarm)
library(hrbrthemes)
library(tidyverse)

pg <- read_html("http://mattstiles.org/dailygraphics/graphics/attorney-general-tenure-20172517/child.html?initialWidth=840&childId=pym_0&parentTitle=Chart%3A%20If%20Ousted%2C%20Jeff%20Sessions%20Would%20Have%20a%20Historically%20Short%20Tenure%20%7C%20The%20Daily%20Viz&parentUrl=http%3A%2F%2Fthedailyviz.com%2F2017%2F07%2F25%2Fchart-if-ousted-jeff-sessions-would-have-a-historically-short-tenure%2F")

ctx <- v8()
ctx$eval(html_nodes(pg, xpath=".//script[contains(., 'DATA')]") %>% html_text())

ctx$get("DATA") %>% 
  as_tibble() %>% 
  readr::type_convert() %>% 
  mutate(party = ifelse(is.na(party), "Other", party)) %>% 
  mutate(party = fct_lump(party)) %>% 
  mutate(color1 = case_when(
    party == "Democratic" ~ "#313695",
    party == "Republican" ~ "#a50026",
    party == "Other" ~ "#4d4d4d")
  ) %>% 
  mutate(color2 = ifelse(grepl("Sessions", label), "#2b2b2b", "#00000000")) -> ags

ggplot() + 
  geom_quasirandom(data = ags, aes(party, amt, color = color1)) +
  geom_quasirandom(data = ags, aes(party, amt, color = color2), 
                   fill = "#ffffff00", size = 4, stroke = 0.25, shape = 21) +
  geom_text(data = data_frame(), aes(x = "Republican", y = 100, label = "Jeff Sessions"), 
            nudge_x = -0.15, family = font_rc, size = 3, hjust = 1) +
  scale_color_identity() +
  scale_y_comma(limits = c(0, 4200)) +
  labs(x = "Party", y = "Tenure (days)", 
       title = "U.S. Attorneys General",
       subtitle = "Distribution of tenure in office, by days & party: 1789-2017",
       caption = "Source data/idea: Matt Stiles <bit.ly/2vXAHTM>") +
  theme_ipsum_rc(grid = "XY")
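The evaluate-then-read step above works because the page keeps its data in a plain JavaScript variable. Here is a minimal sketch of the same idea in JavaScript itself. The script text is a made-up stand-in (the real page's contents are not reproduced here); only the field names `label`, `amt` and `party` come from the R code above, and the values are invented.

```javascript
// Made-up stand-in for the text of the <script> element that defines DATA.
const scriptText = `
  var DATA = [
    { label: "Example A", amt: "100", party: "Republican" },
    { label: "Example B", amt: "2500", party: null }
  ];
`;

// Evaluate the script inside its own function scope and hand back the variable.
// Only do this with script text you trust, since it runs arbitrary code.
const extracted = new Function(`${scriptText}; return DATA;`)();

// Light cleanup in the spirit of the R pipeline: real numbers, no missing party.
const rows = extracted.map((d) => ({
  ...d,
  amt: Number(d.amt),
  party: d.party ?? "Other"
}));

console.log(rows);
```

Evaluating the script avoids writing a fragile regular expression for the object literal, which is the same trade-off the V8 package makes on the R side.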

I turned the data into a CSV and stuck it in this gist if folks want to play w/o doing the js scraping.

Shuttering Pies With Retiring Stores

2017-04-21 – 22:32
Posted in Charts & Graphs, Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (7)

I caught this “gem” in the Wall Street Journal tonight:

It’s pretty hard to compare store-to-store, even though it is fairly clear which ones are going-going-gone. If we want to see the relative percentage of each store closing and also want to see how they stack up against each other, then let’s make a column of 100% bars and label total stores in each:

library(hrbrthemes)
library(tidyverse)

read.table(text='store,closing,total
"Radio Shack",550,1500
"Payless",400,2600
"Rue21",400,1100
"The Limited",250,250
"bebe",180,180
"Wet Seal",170,170
"Crocs",160,560
"JCPenny",138,1000
"American Apparel",110,110
"Kmart",109,735
"hhgregg",88,220
"Sears",41,695', sep=",", header=TRUE, stringsAsFactors=FALSE) %>% 
  as_tibble() %>% 
  mutate(remaining = total - closing,
         gone = round((closing/total) * 100)/100,
         stay = 1-gone,
         rem_lab = ifelse(remaining == 0, "", scales::comma(remaining))) %>% 
  arrange(desc(stay)) %>% 
  mutate(store=factor(store, levels=store)) -> closing_df

update_geom_font_defaults(font_rc)

ggplot(closing_df) +
  geom_segment(aes(0, store, xend=gone, yend=store, color="Closing"), size=8) +
  geom_segment(aes(gone, store, xend=gone+stay, yend=store, color="Remaining"), size=8) +
  geom_text(aes(x=0, y=store, label=closing), color="white", hjust=0, nudge_x=0.01) +
  geom_text(aes(x=1, y=store, label=rem_lab), color="white", hjust=1, nudge_x=-0.01) +
  scale_x_percent() +
  scale_color_ipsum(name=NULL) +
  labs(x=NULL, y=NULL, 
       title="Selected 2017 Store closings (estimated)",
       subtitle="Smaller specialty chains such as Bebe and American Apparel are closing their stores,\nwhile larger chains such as J.C. Penney and Sears are scaling back their footprint.") +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x=element_text(hjust=c(0, 0.5, 0.5, 0.5, 1))) +
  theme(legend.position=c(0.875, 1.025)) +
  theme(legend.direction="horizontal")
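The share arithmetic behind the 100% bars is simple enough to check by hand. This is a small sketch using three rows copied from the data above; `withShares` is a hypothetical helper name.

```javascript
// Three rows taken from the table above
const stores = [
  { store: "Radio Shack", closing: 550, total: 1500 },
  { store: "The Limited", closing: 250, total: 250 },
  { store: "Sears", closing: 41, total: 695 }
];

// Add the closing/remaining shares, rounded to whole percents like the R code
function withShares(rows) {
  return rows
    .map((d) => {
      const gone = Math.round((d.closing / d.total) * 100) / 100;
      return { ...d, remaining: d.total - d.closing, gone, stay: 1 - gone };
    })
    .sort((a, b) => b.stay - a.stay); // largest remaining share first
}

const closingShares = withShares(stores);
console.log(closingShares.map((d) => `${d.store}: ${Math.round(d.gone * 100)}% closing`));
```

Sorting by the remaining share is what lets the bars read as a ranking rather than a lookup table.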

One might try circle packing or a treemap to show both relative store count and percentage, but I think the bigger story is the percent reduction for each retail chain. It’d be cool to see what others come up with.

Bridging The Political [Polygons] Gap with ggplot2

2016-07-07 – 16:27
Posted in Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (8)

The @pewresearch folks have been collecting political survey data for quite a while, and I noticed the [visualization below](http://www.people-press.org/2014/06/12/section-1-growing-ideological-consistency/#interactive) referenced in a [Tableau vis contest entry](https://www.interworks.com/blog/rrouse/2016/06/24/politics-viz-contest-plotting-political-polarization):

Those are filled [frequency polygons](http://onlinestatbook.com/2/graphing_distributions/freq_poly.html), which are super-easy to replicate in ggplot2, especially since Pew even _kind of_ made the data available via their interactive visualization (it’s available in other Pew resources, just not as compact). So, we can look at all 5 study years for both the general population and politically active respondents with `ggplot2` facets, incorporating the use of `V8`, `dplyr`, `tidyr`, `purrr` and some R spatial functions along the way.

The first code block has the “data”, data transformations and initial plot code. The “data” is really javascript blocks picked up from the `view-source:` of the interactive visualization. We use the `V8` package to get this data then bend it to our will for visuals.

library(V8)
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)  # devtools::install_github("hadley/ggplot2")
library(hrbrmisc) # devtools::install_github("hrbrmstr/hrbrmisc")
library(rgeos)
library(sp)

ctx <- v8()
ctx$eval("
	var party_data = [
		[{
			name: 'Dem',
			data: [0.57,1.60,1.89,3.49,3.96,6.56,7.23,8.54,9.10,9.45,9.30,9.15,7.74,6.80,4.66,4.32,2.14,1.95,0.87,0.57,0.12]
		},{
			name: 'REP',
			data: [0.03,0.22,0.28,1.49,1.66,2.77,3.26,4.98,5.36,7.28,7.72,8.16,8.86,8.88,8.64,8.00,6.20,5.80,4.87,4.20,1.34]
		}],
		[{
			name: 'Dem',
			data: [1.22,2.78,3.28,5.12,6.15,7.77,8.24,9.35,9.73,9.19,8.83,8.47,5.98,5.17,3.62,2.87,1.06,0.75,0.20,0.15,0.04]
		}, {
			name: 'REP',
			data: [0.23,0.49,0.65,2.23,2.62,4.06,5.02,7.53,7.70,7.28,7.72,8.15,8.87,8.47,7.08,6.27,4.29,3.99,3.54,2.79,1.03]
		}],
		[{
			name: 'Dem',
			data: [2.07,3.57,4.21,6.74,7.95,8.41,8.58,9.07,8.98,8.46,8.47,8.49,5.39,3.62,2.11,1.98,1.00,0.55,0.17,0.17,0.00]
		}, {
			name: 'REP',
			data: [0.19,0.71,1.04,2.17,2.07,3.65,4.92,7.28,8.26,9.64,9.59,9.55,7.91,7.74,6.84,6.01,4.37,3.46,2.09,1.65,0.86]
		}],
		[{
			name: 'Dem',
			data: [2.97,4.09,4.28,6.65,7.90,8.37,8.16,8.74,8.61,8.15,7.74,7.32,4.88,4.82,2.79,2.07,0.96,0.78,0.41,0.29,0.02]
		}, {
			name: 'REP',
			data: [0.04,0.21,0.28,0.88,1.29,2.64,3.08,4.92,5.84,6.65,6.79,6.92,8.50,8.61,8.05,8.00,7.52,7.51,5.61,4.17,2.50]
		}],
		[{
			name: 'Dem',
			data: [4.81,6.04,6.57,7.67,7.84,8.09,8.24,8.91,8.60,6.92,6.69,6.47,4.22,3.85,1.97,1.69,0.66,0.49,0.14,0.10,0.03]
		}, {
			name: 'REP',
			data: [0.11,0.36,0.49,1.23,1.35,2.35,2.83,4.63,5.09,6.12,6.27,6.41,7.88,8.03,7.58,8.26,8.12,7.29,6.38,5.89,3.34]
		}],
	];

	var party_engaged_data = [
		[{
			name: 'Dem',
			data: [0.88,2.19,2.61,4.00,4.76,6.72,7.71,8.45,8.03,8.79,8.79,8.80,7.23,6.13,4.53,4.31,2.22,2.01,1.05,0.66,0.13]
		}, {
			name: 'REP',
			data: [0.00,0.09,0.09,0.95,1.21,1.67,2.24,3.22,3.70,6.24,6.43,6.62,8.01,8.42,8.97,8.48,7.45,7.68,8.64,7.37,2.53]
		}],
		[{
			name: 'Dem',
			data: [1.61,3.35,4.25,6.75,8.01,8.20,8.23,9.14,8.94,8.68,8.46,8.25,4.62,3.51,2.91,2.63,1.19,0.74,0.24,0.17,0.12]
		},{
			name: 'REP',
			data: [0.21,0.38,0.68,1.62,1.55,2.55,3.99,4.65,4.31,5.78,6.28,6.79,8.47,9.01,8.61,8.34,7.16,6.50,6.10,4.78,2.25]
		}],
		[{
			name: 'Dem',
			data: [3.09,4.89,6.22,9.40,9.65,9.20,8.99,6.48,7.36,7.67,6.95,6.22,4.53,3.79,2.19,2.02,0.74,0.07,0.27,0.27,0.00]
		}, {
			name: 'REP',
			data: [0.29,0.59,0.67,2.11,2.03,2.67,4.12,6.55,6.93,8.42,8.79,9.17,7.33,6.84,7.42,7.25,6.36,5.32,3.35,2.57,1.24]
		}],
		[{
			name: 'Dem',
			data: [6.00,5.24,5.11,7.66,9.25,8.25,8.00,8.09,8.12,7.05,6.59,6.12,4.25,4.07,2.30,1.49,0.98,0.80,0.42,0.16,0.06]
		}, {
			name: 'REP',
			data: [0.00,0.13,0.13,0.48,0.97,2.10,2.73,3.14,3.64,5.04,5.30,5.56,6.87,6.75,8.03,9.33,11.01,10.49,7.61,6.02,4.68]
		}],
		[{
			name: 'Dem',
			data: [9.53,9.68,10.35,9.33,9.34,7.59,6.67,6.41,6.60,5.21,4.84,4.47,2.90,2.61,1.37,1.14,0.73,0.59,0.30,0.28,0.06]
		}, {
			name: 'REP',
			data: [0.15,0.11,0.13,0.46,0.52,1.18,1.45,2.46,2.84,4.15,4.37,4.60,6.36,6.66,7.34,9.09,11.40,10.53,10.58,9.85,5.76]
		}],
	];
")

years <- c(1994, 1999, 2004, 2011, 2014)

# Transform the javascript data -------------------------------------------

party_data <- ctx$get("party_data")
map_df(1:length(party_data), function(i) {
  x <- party_data[[i]]
  names(x$data) <- x$name
  dat <- as.data.frame(x$data)
  bind_cols(dat, data_frame(x=-10:10, year=rep(years[i], nrow(dat))))
}) -> party_data

party_engaged_data <- ctx$get("party_engaged_data")
map_df(1:length(party_engaged_data), function(i) {
  x <- party_engaged_data[[i]]
  names(x$data) <- x$name
  dat <- as.data.frame(x$data)
  bind_cols(dat, data_frame(x=-10:10, year=rep(years[i], nrow(dat))))
}) -> party_engaged_data

# We need it in long form -------------------------------------------------

gather(party_data, party, pct, -x, -year) %>%
  mutate(party=factor(party, levels=c("REP", "Dem"))) -> party_data_long

gather(party_engaged_data, party, pct, -x, -year) %>%
  mutate(party=factor(party, levels=c("REP", "Dem"))) -> party_engaged_data_long

# Traditional frequency polygon plots -------------------------------------

gg <- ggplot()
gg <- gg + geom_ribbon(data=party_data_long,
                       aes(x=x, ymin=0, ymax=pct, fill=party, color=party), alpha=0.5)
gg <- gg + scale_x_continuous(expand=c(0,0), breaks=c(-8, 0, 8),
                              labels=c("Consistently\nliberal", "Mixed", "Consistently\nconservative"))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 12))
gg <- gg + scale_color_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                              labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + guides(color="none", fill=guide_legend(override.aes=list(alpha=1)))
gg <- gg + scale_fill_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                             labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + facet_wrap(~year, ncol=2, scales="free_x")
gg <- gg + labs(x=NULL, y=NULL,
                title="Political Polarization, 1994-2014 (General Population)",
                caption="Source: http://www.people-press.org/2014/06/12/section-1-growing-ideological-consistency/iframe/")
gg <- gg + theme_hrbrmstr_an(grid="")
gg <- gg + theme(panel.margin=margin(t=30, b=30, l=30, r=30))
gg <- gg + theme(legend.position=c(0.75, 0.1))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(axis.text.y=element_blank())
gg

gg <- ggplot()
gg <- gg + geom_ribbon(data=party_engaged_data_long,
                       aes(x=x, ymin=0, ymax=pct, fill=party, color=party), alpha=0.5)
gg <- gg + scale_x_continuous(expand=c(0,0), breaks=c(-8, 0, 8),
                              labels=c("Consistently\nliberal", "Mixed", "Consistently\nconservative"))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 12))
gg <- gg + scale_color_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                              labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + guides(color="none", fill=guide_legend(override.aes=list(alpha=1)))
gg <- gg + scale_fill_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                             labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + facet_wrap(~year, ncol=2, scales="free_x")
gg <- gg + labs(x=NULL, y=NULL,
                title="Political Polarization, 1994-2014 (Politically Active)",
                caption="Source: http://www.people-press.org/2014/06/12/section-1-growing-ideological-consistency/iframe/")
gg <- gg + theme_hrbrmstr_an(grid="")
gg <- gg + theme(panel.margin=margin(t=30, b=30, l=30, r=30))
gg <- gg + theme(legend.position=c(0.75, 0.1))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(axis.text.y=element_blank())
gg

It provides a similar effect to the Pew & Interworks visuals using alpha transparency to blend the point of polygon intersections. But I _really_ kinda like the way both Pew & Interworks did their visualizations without alpha blending yet still highlighting the intersected areas. We can do that in R as well with a bit more work by:

– grouping each data frame by year
– turning each set of points (Dem & Rep) to R polygons
– computing the intersection of those polygons
– turning that intersection back into a data frame
– adding this new polygon to the plots while also removing the alpha blend
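As an aside, the overlap can be reasoned about without spatial libraries, because both series share the same x positions and a zero baseline. The sketch below is a swapped-in shortcut and not the sp/rgeos approach this post uses: the top edge of the overlap is the lower of the two curves at every x, plus the point where the curves cross between two x positions. `overlapOutline` is a hypothetical name and the input numbers are toy values.

```javascript
// Upper boundary of the overlap between two piecewise-linear curves that
// share x positions and sit on a zero baseline.
function overlapOutline(xs, a, b) {
  const points = [];
  for (let i = 0; i < xs.length; i++) {
    points.push([xs[i], Math.min(a[i], b[i])]);

    if (i < xs.length - 1) {
      const d0 = a[i] - b[i];
      const d1 = a[i + 1] - b[i + 1];
      if (d0 * d1 < 0) {
        // The curves swap order strictly between xs[i] and xs[i + 1]:
        // linearly interpolate to the crossing point.
        const t = d0 / (d0 - d1);
        points.push([
          xs[i] + t * (xs[i + 1] - xs[i]),
          a[i] + t * (a[i + 1] - a[i])
        ]);
      }
    }
  }
  return points;
}

// Toy example: two lines crossing halfway between x = 0 and x = 1
console.log(overlapOutline([0, 1], [3, 1], [1, 3]));
```

The R version in this post gets the same region from a general polygon intersection, which also copes with shapes that do not share x positions.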

Here’s what that looks like in code:

# Setup a function to do the polygon intersection -------------------------

polysect <- function(df) {

  bind_rows(data_frame(x=-10, pct=0),
            select(filter(df, party=="Dem"), x, pct),
            data_frame(x=10, pct=0)) %>%
    as.matrix() %>%
    Polygon() %>%
    list() %>%
    Polygons(1) %>%
    list() %>%
    SpatialPolygons() -> dem

  bind_rows(data_frame(x=-10, pct=0),
            select(filter(df, party=="REP"), x, pct),
            data_frame(x=10, pct=0)) %>%
    as.matrix() %>%
    Polygon() %>%
    list() %>%
    Polygons(1) %>%
    list() %>%
    SpatialPolygons() -> rep

  inter <- gIntersection(dem, rep)
  inter <- as.data.frame(inter@polygons[[1]]@Polygons[[1]]@coords)[c(-1, -25),]
  inter <- mutate(inter, year=df$year[1])
  inter

}

# Get the intersected area ------------------------------------------------

group_by(party_data_long, year) %>%
  do(polysect(.)) -> general_sect

group_by(party_engaged_data_long, year) %>%
  do(polysect(.)) -> engaged_sect


# Try the plots again -----------------------------------------------------

gg <- ggplot()
gg <- gg + geom_ribbon(data=party_data_long,
                       aes(x=x, ymin=0, ymax=pct, fill=party, color=party))
gg <- gg + geom_ribbon(data=general_sect, aes(x=x, ymin=0, ymax=y), color="#666979", fill="#666979")
gg <- gg + scale_x_continuous(expand=c(0,0), breaks=c(-8, 0, 8),
                              labels=c("Consistently\nliberal", "Mixed", "Consistently\nconservative"))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 12))
gg <- gg + scale_color_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                              labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + guides(color="none", fill=guide_legend(override.aes=list(alpha=1)))
gg <- gg + scale_fill_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                             labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + facet_wrap(~year, ncol=2, scales="free_x")
gg <- gg + labs(x=NULL, y=NULL,
                title="Political Polarization, 1994-2014 (General Population)",
                caption="Source: http://www.people-press.org/2014/06/12/section-1-growing-ideological-consistency/iframe/")
gg <- gg + theme_hrbrmstr_an(grid="")
gg <- gg + theme(panel.margin=margin(t=30, b=30, l=30, r=30))
gg <- gg + theme(legend.position=c(0.75, 0.1))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(axis.text.y=element_blank())
gg

gg <- ggplot()
gg <- gg + geom_ribbon(data=party_engaged_data_long,
                       aes(x=x, ymin=0, ymax=pct, fill=party, color=party))
gg <- gg + geom_ribbon(data=engaged_sect, aes(x=x, ymin=0, ymax=y), color="#666979", fill="#666979")
gg <- gg + scale_x_continuous(expand=c(0,0), breaks=c(-8, 0, 8),
                              labels=c("Consistently\nliberal", "Mixed", "Consistently\nconservative"))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 12))
gg <- gg + scale_color_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                              labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + guides(color="none", fill=guide_legend(override.aes=list(alpha=1)))
gg <- gg + scale_fill_manual(name=NULL, values=c(Dem="#728ea2", REP="#cf6a5d"),
                             labels=c(Dem="Democrats", REP="Republicans"))
gg <- gg + facet_wrap(~year, ncol=2, scales="free_x")
gg <- gg + labs(x=NULL, y=NULL,
                title="Political Polarization, 1994-2014 (Politically Active)",
                caption="Source: http://www.people-press.org/2014/06/12/section-1-growing-ideological-consistency/iframe/")
gg <- gg + theme_hrbrmstr_an(grid="")
gg <- gg + theme(panel.margin=margin(t=30, b=30, l=30, r=30))
gg <- gg + theme(legend.position=c(0.75, 0.1))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(axis.text.y=element_blank())
gg

Without much extra effort/work we now have what I believe to be a more striking set of visuals. (And, I should probably make a `points_to_spatial_polys()` convenience function.)
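The overlap can also be approximated without any spatial classes. The sketch below is not from the original analysis: it assumes both parties are measured on the same `x` grid and that both ribbons start at zero, in which case the shared area is roughly the pointwise minimum of the two `pct` series. `overlap_ribbon()` and the `toy` data frame are hypothetical names and values made up for illustration.

```r
# Hypothetical helper (not part of the original post): approximate the overlap
# of two zero-based ribbons that share the same x grid.
overlap_ribbon <- function(df, a = "Dem", b = "REP") {
  da <- df[df$party == a, c("x", "pct")]
  db <- df[df$party == b, c("x", "pct")]
  # sort both series by x so the rows line up
  da <- da[order(da$x), ]
  db <- db[order(db$x), ]
  # the shortcut is only valid when both parties use identical x values
  stopifnot(nrow(da) == nrow(db), all(da$x == db$x))
  # the lower of the two curves at each x bounds the shared area
  data.frame(x = da$x, y = pmin(da$pct, db$pct))
}

# Toy data, invented for illustration only
toy <- data.frame(
  party = rep(c("Dem", "REP"), each = 5),
  x     = rep(c(-10, -5, 0, 5, 10), times = 2),
  pct   = c(0, 8, 4, 1, 0,
            0, 1, 4, 8, 0)
)

sect <- overlap_ribbon(toy)
sect
```

Because it only looks at the grid points, this misses the exact vertex where the two curves cross between neighbouring `x` values, so treat it as an approximation of what `gIntersection()` returns rather than a replacement. The result has `x` and `y` columns, matching what the `geom_ribbon()` calls above expect from the intersected data.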

You’ll find the “overall” group data as well as the party median values in [the Pew HTML source code](view-source:http://www.people-press.org/2014/06/12/section-1-growing-ideological-consistency/iframe/) if you want to try to fully replicate their visualizations.

Making “Time Rivers” in R

2016-06-28 – 17:18
Posted in Charts & Graphs, Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (5)

Once again, @albertocairo notices an interesting chart and spurs pondering in the visualization community with [his post](http://www.thefunctionalart.com/2016/06/defying-conventions-in-visualization.html) covering an unusual “vertical time series” chart produced for the print version of the NYTimes:

I’m actually less concerned about the vertical time series chart component here, since I agree with TAVE^* Cairo that folks are smart enough to grok it and that it will be a standard convention soon enough, given the prevalence of our collective tiny, glowing rectangles. The Times folks plotted Martin-Quinn (M-Q) scores for the U.S. Supreme Court justices, which are estimates of how liberal or conservative a justice was in a particular term. Since they are estimates, they aren’t exact, and while it’s fine to plot the mean value (as suggested by the M-Q folks), if we’re going to accept the intelligence of the reader to figure out the nouveau time series layout, perhaps we can also show them some of the uncertainty behind these estimates.

What I’ve done below is take the data provided by the M-Q folks and make what I’ll call a vertical time series river plot using the mean, median and one standard deviation. This shows the possible range of real values the estimates can take and provides a less-precise but more forthright view of the values (in my opinion). You can see right away that the estimates are not so precise, but there is still an overall trend for the justices to become more liberal in modern times.

The ggplot2 code is a bit intricate, which is one reason I’m posting it. You need to reorient your labeling mind due to the need to use `coord_flip()`. I also added an arrow on the Y-axis to show how time flows. I think the vis community will need to help standardize on some good practices for how to deal with these vertical time series charts to help orient readers more quickly. In a more dynamic visualization, either using something like D3 or even just stop-motion animation, the flow could actually draw in the direction time flows, which would definitely make it easier to immediately orient the reader.

However, the main point here is to not be afraid to show uncertainty. In fact, the more we all work at it, the better we’ll all be able to come up with effective ways to show it.

^* == “The Awesome Visualization Expert” since he winced at my use of “Dr. Cairo” :-)

library(dplyr)
library(readr)
library(ggplot2)  # devtools::install_github("hadley/ggplot2")
library(hrbrmisc) # devtools::install_github("hrbrmstr/hrbrmisc")
library(grid)
library(scales)

URL <- "http://mqscores.berkeley.edu/media/2014/justices.csv"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)

justices <- read_csv(fil)

justices %>%
  filter(term>=1980,
         justiceName %in% c("Thomas", "Scalia", "Alito", "Roberts", "Kennedy",
                            "Breyer", "Kagan", "Ginsburg", "Sotomayor")) %>%
  mutate(col=ifelse(justiceName %in% c("Breyer", "Kagan", "Ginsburg", "Sotomayor"),
                    "Democrat", "Republican")) -> recent

just_labs <- data_frame(
  label=c("Thomas", "Scalia", "Alito", "Roberts", "Kennedy", "Breyer", "Kagan", "Ginsburg", "Sotomayor"),
      x=c(  1990.5,   1985.5,  2004.5,    2004.5,    1986.5,      1994,   2010,     1992.5,      2008.5),
      y=c(     2.9,      1.4,    1.35,       1.7,       1.0,      -0.1,   -0.9,       -0.1,          -2)
)

gg <- ggplot(recent)
gg <- gg + geom_hline(yintercept=0, alpha=0.5)
gg <- gg + geom_label(data=data.frame(x=c(0.1, -0.1),
                                      label=c("More →\nconservative", "← More\nliberal"),
                                      hjust=c(0, 1)), aes(y=x, x=1982, hjust=hjust, label=label),
                      family="Arial Narrow", fontface="bold", size=4, label.size=0, vjust=1)
gg <- gg + geom_ribbon(aes(ymin=post_mn-post_sd, ymax=post_mn+post_sd, x=term,
                             group=justice, fill=col, color=col), size=0.1, alpha=0.3)
gg <- gg + geom_line(aes(x=term, y=post_med, color=col, group=justice), size=0.1)
gg <- gg + geom_text(data=just_labs, aes(x=x, y=y, label=label),
                     family="Arial Narrow", size=2.5)
gg <- gg + scale_x_reverse(expand=c(0,0), limits=c(2014, 1982),
                           breaks=c(2014, seq(2010, 1990, -10), 1985, 1982),
                           labels=c(2014, seq(2010, 1990, -10), "1985\nTERM\n↓", ""))
gg <- gg + scale_y_continuous(expand=c(0,0), labels=c(-2, "0\nM-Q Score", 2, 4))
gg <- gg + scale_color_manual(name=NULL, values=c(Democrat="#2166ac", Republican="#b2182b"), guide="none")
gg <- gg + scale_fill_manual(name="Nominated by a", values=c(Democrat="#2166ac", Republican="#b2182b"))
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL,
                title="Martin-Quinn scores for selected justices, 1985-2014",
                subtitle="Ribbon band derived from mean plus or minus one standard deviation. Inner line is the M-Q median.",
                caption="Data source: http://mqscores.berkeley.edu/measures.php")
gg <- gg + theme_hrbrmstr_an(grid="XY")
gg <- gg + theme(plot.subtitle=element_text(margin=margin(b=15)))
gg <- gg + theme(legend.title=element_text(face="bold"))
gg <- gg + theme(legend.position=c(0.05, 0.6))
gg <- gg + theme(plot.margin=margin(20,20,20,20))
gg

Yes, I manually positioned the names of the justices, hence the weird spacing for those lines. Also, after publishing this post, I tweaked the line-height of the “More Liberal”/“More Conservative” top labels a bit and would definitely suggest doing that to anyone attempting to reproduce this code (the setting I used was `0.9`).
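A minimal sketch of that tweak follows. It assumes the adjustment refers to the `lineheight` setting that `geom_label()` accepts. The `labs_df` data frame is a made-up stand-in for the inline label data in the full plot, and the arrows and font family are left out to keep the example portable.

```r
library(ggplot2)

# Stand-in for the two direction labels used in the full plot
# (hypothetical data frame name; arrows omitted for portability)
labs_df <- data.frame(
  y     = c(0.1, -0.1),
  label = c("More\nconservative", "More\nliberal"),
  hjust = c(0, 1)
)

gg <- ggplot(labs_df)
gg <- gg + geom_hline(yintercept=0, alpha=0.5)
gg <- gg + geom_label(aes(x=1982, y=y, label=label, hjust=hjust),
                      lineheight=0.9,  # tighten the gap between the two lines of each label
                      fontface="bold", size=4, vjust=1)
gg <- gg + coord_flip()
gg
```

Passing `lineheight` outside `aes()` applies the same value to every label in the layer, so it only needs to be set once.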