Skip navigation

Category Archives: ggplot

The days are getting shorter and when we were visiting Down East Maine the other week, there was just a hint of some trees starting to change up their leaf palettes. It was a solid reminder to re-up my ~annual “foliage” plotting that I started way back in 2017.

The fine folks over at Smoky Mountains — (“the most authoritative source for restaurants, attractions, & cabin rentals in the Smoky Mountains”) — have been posting an interactive map of ConUS foliage predictions for many years and the dataset they curate and use for that is also very easy to use in R and other contexts.

This year, along with the usual R version, I have also made:

The only real changes to R version were to add some code to make a more usable JSON for the JavaScript versions of the project, and to take advantage of the .progress parameter to {purrr}’s walk function.

The Observable notebook version (one frame of that is above) makes use of Observable Plot’s super handy geo mark, and also shows how to do some shapefile surgery to avoid plotting Alaska & Hawaii (the Smoky Mountains folks only provide predictions for ConUS).

After using the Reveal QMD extension to make the Quarto project, the qmd document rendered fine, but I tweaked the YAML to send the output to the GH Pages-renderable docs/ directory, and combined some of the OJS blocks to tighten up the document. You’ll see some Quarto “error” blocks, briefly, since there the QMD fetches imports from Observable. You can get around that by moving all the imported resources to the Observable notebook before generating the QMD, but that’s an exercise left to the reader.

And, since I’m a fan of both Lit WebComponents and Tachyons CSS, I threw together a version using them (+ Observable Plot) to further encourage folks to get increasingly familiar with core web tech. Tachyons + Plot make it pretty straightforward to create responsive pages, too (resize the browser and toggle system dark/light mode to prove that). The Lit element’s CSS section also shows how to style Plot’s legend a bit.

Hit up the GH page to see the animated gif (I’ve stared at it a bit too much to include it in the post).

Drop any q’s here or in the GH issues, and — if anyone makes a Shiny version — please let me know, and I’ll add all links to any of those here and on the GH page.

FIN

While it is all well and good to plot foliage prediction maps, please also remember to take some time away from your glowing rectangles to go and actually observe the fall palette changes IRL.

I was cranking out a blog post for work earlier this week that shows off just how many integrations our platform has. I won’t blather about that content here, but as I was working on it, I really wanted to show off all the integrations.

A table seemed far too boring.

Several categorized unordered lists seemed too unwieldy.

Then, it dawned on me that I could make a visual representation of all the integration partners we have by thinking of the entire integrations’ ecosystem as a “universe” with each category being a “solar system” of that universe.

I’ve been leaning more heavily on javsascript for datavis these days, but I will always be more comfortable in {ggplot2}, so I headed to R to design a way to:

  • generate concentric orbits for “n” solar systems
  • randomize the placement of the planets in each ring
  • make a decent plot!

I worked with one of the most amazing designers on the planet (heh) to come up with some stellar (heh) styling for it, and this was the result:

5 solar system panels

I took the styling guidance and wrapped the messy, individual functions I had into a new {ggsolar} package, you can find at https://github.com/hrbrmstr/ggsolar.

It’s pretty raw, and I need to “geomify” it at some point, but it has

  • a function to generate the concentric circle polygons
  • another one to identify a random point on each ring
  • a naive plotting function, and
  • a theme cleanup function for decent output.

The default is to generate uniformly distributed concentric circles, but you have the option of supplying a custom radii vector to make it more “real”/“solar-sysetm-y”.

Here’s the general flow:

# sol_planets is a built in vector of our system's planet names
sol_orbits <- generate_orbits(sol_planets)

set.seed(1323) # this produced a decent placements

# naive but it works! You can specify your own point picker, too.
placed_planets <- randomize_planet_positions(sol_orbits)

# do the thing!
plot_orbits(
  orbits = sol_orbits, 
  planet_positions = placed_planets,
  label_planets = TRUE,
  label_family = hrbrthemes::font_es_bold
) +
  hrbrthemes::theme_ipsum_es(grid="") +
  coord_equal() +
  labs(
    title = "Sol",
    caption = "Pluto is 100% a planet"
  ) +
  theme_enhance_solar()

Random Systems

I included a generate_random_planets() function that uses a hidden Markov model to create believable planetary names, so you can now make your own universe with {ggplot2}!

set.seed(42)
(rando_planets <- generate_random_planets(12))

rando_orbits <- generate_orbits(rando_planets)

set.seed(123) # this produced decent placements

placed_planets <- randomize_planet_positions(rando_orbits)

plot_orbits(
  orbits = rando_orbits, 
  planet_positions = placed_planets,
  label_planets = TRUE,
  label_family = hrbrthemes::font_es_bold
) +
  hrbrthemes::theme_ipsum_es(grid="") +
  coord_equal() +
  labs(
    title = "Rando System"
  ) +
  theme_enhance_solar()

random system

FIN

Kick the tyres, use {gganimate} to make some animations, and be the ruler of your own universe! (We’re going to try to generate team “org charts” with these later in the week, so be creative, too!).

I have graphics working in Vanilla JS WebR, now, and I’ll cover the path to that in two parts.

The intent was to jump straight into ggplot2-land, but, as you saw in my previous post, WASM’d ggplot2 is a bear. And, I really didn’t grok what the WebR site docs were saying about how to deal with the special WebR canvas() device until I actually tried to work with it and failed miserably.

You will need to have gotten caught up on the previous WebR blog posts and experiments as I’m just covering some of the gnarly bits.

Not Your Parents’ evalR…()

If you’ve either been playing a bit with WebR or peeked under the covers of what others are doing, you’ve seen the evalR…() family of functions which evaluate supplied R code and optionally return a result. Despite reading the WebR docs on “canvas”, daft me tried to simply use one of those “eval” functions, to no avail.

The solution involves:

I’m going to block quote a key captureR since it explains “why” pretty well. Hit me up anywhere you like if you desire more info.

Unlike evalR() which only returns one R object, captureR() returns a variable number of objects when R conditions are captured. Since this makes memory management of individual objects unwieldy, captureR() requires the shelter approach to memory management, where all the sheltered objects are destroyed at once.

Let’s work through the “plottR” function I made to avoid repeating code to just get images out of R. It takes, as input:

  • an initialized WebR context
  • code that will produce something on a graphics device
  • dimensions of the image
  • the HTML <canvas> id to shove the image data to (we’ll explain this after the code block)
async function plottR(webR, plot_code = "plot(mtcars, col='blue')",
                        width = 400, height = 400, 
                            id = "base-canvas") {

  const webRCodeShelter = await new webR.Shelter();

  await webR.evalRVoid(`canvas(width=${width}, height=${height})`);

  const result = await webRCodeShelter.captureR(`${plot_code}`, {
    withAutoprint: true,
    captureStreams: true,
    captureConditions: false,
    env: webR.objs.emptyEnv,
  });

  await webR.evalRVoid("dev.off()");

  const msgs = await webR.flush();

  const canvas = document.getElementById(id)
  canvas.setAttribute("width", 2 * width);
  canvas.setAttribute("height", 2 * height);

  msgs.forEach(msg => {
    if (msg.type === "canvasExec") Function(`this.getContext("2d").${msg.data}`).bind(canvas)()
  });

}

You 100% need to read up on the HTML canvas element if you’re going to wield WebR yourself vs use Quarto, Shiny, Jupyter-lite, or anything else clever folks come up with. The output of your plots is going to be a series of HTML canvas instructions to do things like “move here”, “switch to this color”, “draw an ellipse”, etc. I will be linking to a full example of the canvas instructions output towards the end.

Now, let’s work through the function’s innards.

const webRCodeShelter = await new webR.Shelter();

gives us a temporary place to execute R code, knowing all the memory consumed will go away after we’re immediately done with it. Unlike the baked-in “global” shelter, this one is super ephemeral.

await webR.evalRVoid(`canvas(width=${width}, height=${height})`);

This is just like a call to png(…), svglite(…), pdf(…), etc. Check out coolbutuseless’ repo for tons of great examples of alternate graphics devices. I have a half-finished one for omnigraffle. They aren’t “hard” to write, but I think they are very tedious to crank through.

const result = await webRCodeShelter.captureR(`${plot_code}`, {
  withAutoprint: true,
  captureStreams: true,
  captureConditions: false,
  env: webR.objs.emptyEnv,
});

is different from what you’re used to. The captureR function will evaluate the given code, and takes some more options, described in the docs. TL;DR: we’re asking the evaluator to give us back pretty much what’d we see in the R console: all console messages and output streams, plus it does the familiar “R object autoprint” that you get for free when you fire up an R console.

So, we’ve sent our plot code into the abyss, and — since this is 100% like “normal” graphics devices — we also need to do the dev.off dance:

await webR.evalRVoid("dev.off()");

This will cause the rendering to happen.

Right now, where you can’t see it, is the digital manifestation of your wonderful plot. That’s great, but we’d like to see it!

const msgs = await webR.flush();

will tell it to get on with it and make sure everything that needs to be done is done. If you’re not familiar with async/await yet, you really need to dig into that to survive in DIY WebR land.

const canvas = document.getElementById(id)
canvas.setAttribute("width", 2 * width);   // i still need to read "why 2x"
canvas.setAttribute("height", 2 * height);

msgs.forEach(msg => {
  if (msg.type === "canvasExec") Function(`this.getContext("2d").${msg.data}`).bind(canvas)()
});

finds our HTML canvas element and then throws messages at it; alot of messages. To see the generated code for the 3D perspective plot example, head to this gist where I’ve pasted all ~10K instructions.

To make said persp plot, it’s just a simple call, now:

await plottR(webR, `basetheme("dark"); persp(x, y, z, theta=-45)`)

I used the default id for the canvas in the online example.

“How Did You Use The basetheme Package? It’s Not In The WASM R Repo?”

I yanked the four R source code files from the package and just source‘d them into the WebR environment:

const baseThemePackage = [ "basetheme.R", "coltools.R", "themes.R", "utils.R" ];

// load up the source from the basetheme pkg
for (const rSource of baseThemePackage) {
  console.log(`Sourcing: ${rSource}…`)
  await globalThis.webR.evalRVoid(`source("https://rud.is/w/ggwebr/r/${rSource}")`)
}

10K+ Lines Is Alot Canvas Code…

Yep! But, that’s how the HTML canvas element works and it’s shockingly fast, as you’ve seen via the demo.

FIN

We’ll cover a bit more in part 2 when we see how to get ggplot2 working, which will include a WebR version of {hrbrthemes}! I also want to thank James Balamuta for the Quarto WebR project which helped me out quite a bit in figuring this new tech out.

Before I let you go, I wanted to note that in those “messages” (the ones we pulled canvasExec call out of), there are message types that are not canvasExec (hence our need to filter them).

I thought you might want to know what they are, so I extracted the JSON, and ran it through some {dplyr}:

msgs |> 
  filter(
    type != "canvasExec"
  ) |> 
  pull(data) |> 
  writeLines()
R version 4.1.3 (2022-03-10) -- "One Push-Up"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: wasm32-unknown-emscripten (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 

The New York Times had a [tragic] story on Covid deaths today and one of their plots really stuck with me for how well it told that part of the story.

NYT Chart on Covid Deaths

NOTE: The red panel highlights are off a bit as I manually typed the data in (I only did the recreation to keep {ggplot2} muscle memory as I hadn’t doe a major customization like this in quite some time).

Only one {grid} hack (for the faceted X axis labels) too!

Hopefully, I’ll have more real-world opportunity to build some detailed, properly-annotated {ggplot2} plots this year.

Shout out to @ClausWilke for {ggtext} and all the folks who’ve made {ggplot2} such a powerful data visualization tool.

library(grid)
library(gtable)
library(hrbrthemes)
library(tidyverse)

gtable_filter_remove <- function (x, name, trim = FALSE) {
  # https://stackoverflow.com/a/36780639
  matches <- !(x$layout$name %in% name)
  x$layout <- x$layout[matches, , drop = FALSE]
  x$grobs <- x$grobs[matches]
  if (trim) 
    x <- gtable_trim(x)
  x
}

read.csv(text="race,age_group,before,after,cause
White,Under 25,1,3,Covid-19 deaths increased as a share of deaths from all cause
White,25-44,3,10,Covid-19 deaths increased as a share of deaths from all cause
White,45-64,8,15,Covid-19 deaths increased as a share of deaths from all cause
White,65-84,13,11,NA
White,85+,14,6,NA
Hispanic,Under 25,3,4,Covid-19 deaths increased as a share of deaths from all cause
Hispanic,25-44,17,21,Covid-19 deaths increased as a share of deaths from all cause
Hispanic,45-64,33,26,NA
Hispanic,65-84,33,17,NA
Hispanic,85+,21,9,NA
Black,Under 25,1,3,Covid-19 deaths increased as a share of deaths from all cause
Black,25-44,7,13,Covid-19 deaths increased as a share of deaths from all cause
Black,45-64,15,17,Covid-19 deaths increased as a share of deaths from all cause
Black,65-84,20,12,Covid-19 deaths increased as a share of deaths from all cause
Black,85+,17,8,NA
Asian,Under 25,2,4,Covid-19 deaths increased as a share of deaths from all cause
Asian,25-44,12,14,Covid-19 deaths increased as a share of deaths from all cause
Asian,45-64,21,13,NA
Asian,65-84,23,8,NA
Asian,85+,17,4,NA") -> xdf

xdf %>% 
  mutate(
    before = before/100,
    after = after/100,
    age_group = fct_inorder(age_group),
    race = factor(race, levels = rev(c("Asian", "Black", "Hispanic", "White")))
  ) -> xdf

{

  ggplot( data = xdf) +
    geom_rect(
      data = xdf,
      aes(
        xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf,
        fill = cause
      ),
      alpha = 1/6, color = NA
    ) +
    geom_rect(
      data = xdf %>% 
        filter(
          (race == "White" & age_group %in% c("65-84", "85+")) |
            (race == "Hispanic" & age_group %in% c("45-64", "65-84", "85+")) |
            (race == "Black" & age_group %in% c("85+")) |
            (race == "Asian" & age_group %in% c("45-64", "65-84", "85+"))
        ),
      aes(
        xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf
      ),
      fill = "#999999", alpha = 1/6, color = NA
    ) +
    geom_segment(
      aes(-Inf, xend = Inf, -Inf, yend= -Inf),
      size = 0.25, color = "black"
    ) +
    geom_segment(
      data = xdf, aes("1", before, xend="2", yend=after),
      size = 0.25
    ) +
    geom_point(
      data = xdf, aes("1", before), 
      fill = "#999999", color = "white", size = 2, stroke = 0.5, shape = 21
    ) +
    geom_point(
      data = xdf, aes("2", after),
      fill = "#bb271a", color = "white", size = 2, stroke = 0.5, shape = 21
    ) +
    geom_text(
      data = xdf,
      aes("1", before+0.05, label = scales::percent(before, 1)),
      color = "#999999", family = font_es_bold, fontface = "bold", size = 3
    ) +
    geom_text(
      data = xdf,
      aes("2", after+0.05, label = scales::percent(after, 1)),
      color = "#bb271a", family = font_es_bold, fontface = "bold", size = 3
    ) +
    scale_x_discrete(
      expand = c(0, 0),
      labels = c("<span style='color:#999999'>BEFORE</span>", "<span style='color:#bb271a'>AFTER</a>")
    ) +
    scale_y_percent(
      limits = c(-0.005, 0.405),
      breaks = c(-0.005, 0.1, 0.2, 0.3, 0.405),
      labels = c("", "", "", "", "40%\nof deaths from\nall causes for\nthis group")
    ) +
    scale_fill_manual(
      name = NULL,
      values = c("#bb271a"),
      na.translate = FALSE
    ) +
    coord_cartesian(clip = "off") +
    facet_wrap(
      facets = race~age_group, 
      scales = "free_x",
      labeller = \(labels, multi_line = TRUE){
        labels <- lapply(labels, as.character)
        labels[["race"]][c(1,2,4,5,6,7,9,10,11,12,14,15,16,17,19,20)] <- ""
        labels[["age_group"]] <- sprintf("<span style='font-style:normal;font-weight:normal;'>%s</span>", labels[["age_group"]])
        labels[["race"]][c(3,8,13,18)] <- sprintf("<span style='font-size:12pt;'>**%s**</span>", labels[["race"]][c(3,8,13,18)])
        labels
      }
    ) +
    labs(
      x = NULL, y = NULL,
      title = "Covid-19 deaths <span style='color:#999999'>before</span> and <span style='color:#bb271a'>after</span> universal adult vaccine eligibility",
      caption = "Source: Provisional weekly death data from the C.D.C. through Nov. 27. Note: Only the four largest racial and ethnic groups are included. Universal vaccine eligibility was April 19, the date when all adults in the United States were eligible for vaccination."
    ) +
    theme_ipsum_es(grid="Y", plot_title_size = 16) +
    theme(
      plot.title.position = "plot",
      plot.title = ggtext::element_markdown(hjust = 0.5),
      plot.caption = ggtext::element_textbox_simple(
        hjust = 0, size = 8.5, family = font_es, color = "#999999",
        margin = margin(t = 14)
      ),
      axis.ticks.x.bottom = ell(size = 0.25) ,
      axis.line.x.bottom = ell(lineend = "square", size = 0.25),
      axis.text.x.bottom = ggtext::element_markdown(size = 8, margin = margin(t = 6)),
      axis.text.y.left = elt(size = 8, vjust = 1, lineheight = 0.875,  color = "#999999"),
      strip.text.x = ggtext::element_markdown(hjust = 0.5, size = 10, family = font_es),
      strip.text = ggtext::element_markdown(hjust = 0.5, size = 10, family = font_es),
      panel.spacing.x = unit("40", "pt"),
      panel.spacing.y = unit(6, "pt"),
      panel.border = elb(),
      legend.position = "top"
    ) -> gg

  grid.newpage()
  grid.draw(
    gtable_filter_remove(
      x = ggplotGrob(gg),
      name = c(sprintf("axis-b-%d-1", 2:5), sprintf("axis-b-%d-2", 2:5), sprintf("axis-b-%d-3", 2:5), sprintf("axis-b-%d-4", 2:5))
    )
  )

}

{ggplot2} recreation of NYT plot

I woke up to Axios’ “1 Big Thing” ridgeline chart showing the crazy that was the 2019 news cycle:

and, I decided to reproduce it in {ggplot2}.

Getting The Data

First, I had to find the data. The Axios chart is interactive, so I assumed the visualization was built on-load. It was, but the data was embedded in a javascript file vs loaded as JSON via an XHR request:

which was easy enough to turn into JSON anyone can use.

NOTE: The # hrbrmstr/hrbrthemes is an indication you may need to use the version of {hrbrthemes} from my gitea/sourcehut/gitlab/bitbucket/github. That package has instructions for installing fonts needed. Sub out theme_ipsum_es() with theme_ipsum(), theme_ipsum_rc() or just use theme_bw() and tweak aesthetics manually.

library(ggalt)
library(hrbrthemes) # hrbrmstr/hrbrthemes
library(tidyverse)

jsonlite::fromJSON("https://rud.is/dl/2019-axios-news.json") %>% 
  as_tibble() -> xdf

xdf
## # A tibble: 31 x 3
##    name                    avg data      
##    <chr>                 <dbl> <list>    
##  1 Gov't shutdown        20.5  <int [51]>
##  2 Mexico-U.S. border    22.8  <int [51]>
##  3 Green New Deal        11.3  <int [51]>
##  4 Blackface              9.61 <int [51]>
##  5 N. Korea-Hanoi Summit 11.2  <int [51]>
##  6 Boeing 737 Max         4.79 <int [51]>
##  7 Brexit                28.5  <int [51]>
##  8 Israel                42.1  <int [51]>
##  9 SpaceX                24.1  <int [51]>
## 10 Game of Thrones       16.8  <int [51]>
## # … with 21 more rows

This is pretty tidy already, but we’ll need to expand the data column and give each week an index:

unnest(xdf, data) %>% 
  group_by(name) %>% 
  mutate(idx = 1:n()) %>% 
  ungroup() %>% 
  mutate(name = fct_inorder(name)) -> xdf # making a factor foe strip/panel ordering 

xdf
## # A tibble: 1,581 x 4
##    name             avg  data   idx
##    <fct>          <dbl> <int> <int>
##  1 Gov't shutdown  20.5    69     1
##  2 Gov't shutdown  20.5   100     2
##  3 Gov't shutdown  20.5    96     3
##  4 Gov't shutdown  20.5   100     4
##  5 Gov't shutdown  20.5    19     5
##  6 Gov't shutdown  20.5     9     6
##  7 Gov't shutdown  20.5    17     7
##  8 Gov't shutdown  20.5     3     8
##  9 Gov't shutdown  20.5     2     9
## 10 Gov't shutdown  20.5     1    10
## # … with 1,571 more rows

We’ll take this opportunity to find the first week of each month (via rle()) so we can have decent axis labels:

# get index placement for each month axis label
sprintf("2019-%02s-1", 1:51) %>% 
  as.Date(format = "%Y-%W-%w") %>% 
  format("%b") %>% 
  rle() -> mons

mons
## Run Length Encoding
##   lengths: int [1:12] 4 4 4 5 4 4 5 4 5 4 ...
##   values : chr [1:12] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" ...

month_idx <- cumsum(mons$lengths)-3

month_idx
##  [1]  1  5  9 14 18 22 27 31 36 40 44 48

We’ve got all we need to make a {ggplot2} version of the chart. Here’s the plan:

  • use geom_area() and map colour and fill to avg (like Axios did), using an medium alpha value so we can still see below the overlapped areas
  • also use an xspline() stat with geom_area() so we get smooth lines vs pointy ones
  • use geom_hline() vs an axis line so we can map a colour aesthetic to avg as well
  • make a custom x-axis scale so we can place the labels we just made
  • expand the y-axis upper limit to avoid cutting off any part of the geoms
  • use the inferno viridis palette, but not the extremes of it
  • make facets/panels on the name, positioning the labels on the left
  • finally, tweak strip positioning so we get overlapped charts
ggplot(xdf, aes(idx, data)) +
  geom_area(alpha = 1/2, stat = "xspline", aes(fill = avg, colour = avg)) +
  geom_hline(
    data = distinct(xdf, name, avg),
    aes(yintercept = 0, colour = avg), size = 0.5
  ) +
  scale_x_continuous(
    expand = c(0,0.125), limits = c(1, 51),
    breaks = month_idx, labels = month.abb
  ) +
  scale_y_continuous(expand = c(0,0), limits = c(0, 105)) +
  scale_colour_viridis_c(option = "inferno", direction = -1, begin = 0.1, end = 0.9) +
  scale_fill_viridis_c(option = "inferno", direction = -1, begin = 0.1, end = 0.9) +
  facet_wrap(~name, ncol = 1, strip.position = "left", dir = "h") +
  labs(
    x = NULL, y = NULL, fill = NULL, colour = NULL,
    title = "1 big thing: The insane news cycles of 2019",
    subtitle = "Height is search interest in a given topic, indexed to 100.\nColor is average search interest between Dec. 30, 2018–Dec. 20, 2019",
    caption = "Source: Axios <https://www.axios.com/newsletters/axios-am-1d9cd913-6142-43b8-9186-4197e6da7669.html?chunk=0#story0>\nData: Google News Lab. Orig. Chart: Danielle Alberti/Axios"
  ) +
  theme_ipsum_es(grid="X", axis = "") +
  theme(strip.text.y = element_text(angle = 180, hjust = 1, vjust = 0)) +
  theme(panel.spacing.y = unit(-0.5, "lines")) +
  theme(axis.text.y = element_blank()) +
  theme(legend.position = "none")

To produce this finished product:

FIN

The chart could be tweaked a bit more to get even closer to the Axios finished product.

Intrepid readers can also try to use {plotly} to make an interactive version.

Somehow, I get the feeling 2020 will have an even more frenetic news cycle.

The latest round of the 2020 Democratic debates is over and the data from all the 2019 editions of the debates have been added to {ggchicklet}. The structure of the debates2019 built-in dataset has changed a bit:

library(ggchicklet)
library(hrbrthemes)
library(tidyverse)

debates2019
## # A tibble: 641 x 7
##    elapsed timestamp speaker   topic   debate_date debate_group night
##      <dbl> <time>    <chr>     <chr>   <date>             <dbl> <dbl>
##  1   1.04  21:03:05  Warren    Economy 2019-09-13             1     1
##  2   1.13  21:04:29  Klobuchar Economy 2019-09-13             1     1
##  3   1.13  21:06:02  O'Rourke  Economy 2019-09-13             1     1
##  4   0.226 21:07:20  O'Rourke  Economy 2019-09-13             1     1
##  5   1.06  21:07:54  Booker    Economy 2019-09-13             1     1
##  6   0.600 21:09:08  Booker    Economy 2019-09-13             1     1
##  7   0.99  21:09:50  Warren    Economy 2019-09-13             1     1
##  8   0.872 21:11:03  Castro    Economy 2019-09-13             1     1
##  9   1.07  21:12:00  Gabbard   Economy 2019-09-13             1     1
## 10   1.11  21:13:20  de Blasio Economy 2019-09-13             1     1
## # … with 631 more rows

There are now debate_date, debate_group and night columns to make it easier to segment out or group together the debate nights.

The topic names across the online JavaScript data for the June, July and September debates weren’t uniform so they’ve been cleaned up as well:

distinct(debates2019, topic) %>% 
  arrange(topic) %>% 
  print(n=nrow(.))
## # A tibble: 26 x 1
##    topic                  
##    <chr>                  
##  1 Abortion               
##  2 Age                    
##  3 Campaign Finance Reform
##  4 Civil Rights           
##  5 Climate                
##  6 Closing                
##  7 Economy                
##  8 Education              
##  9 Elections Reform       
## 10 Foreign Policy         
## 11 Gun Control            
## 12 Healthcare             
## 13 Immigration            
## 14 Lead                   
## 15 Opening                
## 16 Other                  
## 17 Party Strategy         
## 18 Politics               
## 19 Race                   
## 20 Resilience             
## 21 Socialism              
## 22 Statement              
## 23 Trade                  
## 24 Trump                  
## 25 Veterans               
## 26 Women's Rights 

This should make it easier to compare speaker times per-topic across the debates.

Here’ how to generate the chart in the featured image slot for the September debate:

debates2019 %>%
  filter(debate_group == 3) %>% 
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  mutate(topic = fct_inorder(topic)) %>% 
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 18, 2),
    labels = c(0, sprintf("%d min.", seq(2, 18, 2))),
    limits = c(0, 18)
  ) +
  ggthemes::scale_fill_tableau("Tableau 20") +
  guides(
    fill = guide_legend(nrow = 2)
  ) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "September 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\nOriginal <https://www.nytimes.com/interactive/2019/09/12/us/elections/debate-speaking-time.html>\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "top")

Now that the field has been thinned a bit (yes, others are still running, but really?) we can see who has blathered the most on stage so far:

debates2019 %>%
  filter(debate_group == 3) %>% 
  distinct(speaker) %>% 
  left_join(debates2019) %>% 
  count(speaker, wt=elapsed, sort=TRUE) %>% 
  mutate(speaker = fct_inorder(speaker) %>% fct_rev()) %>% 
  ggplot(aes(speaker, n)) +
  geom_col(fill = ft_cols$slate, width=0.55) +
  coord_flip() +
  scale_y_continuous(expand = c(0, 0.55), position = "right") +
  labs(
    x = NULL, y = "Speaking time (minutes)",
    title = "Total Speaking Time Across All 2019 Debates\nfor Those Left Standing in September"
  ) +
  theme_ipsum_es(grid="X")


And, here’s what they’ve all blathered about:

debates2019 %>%
  filter(debate_group == 3) %>% 
  distinct(speaker) %>% 
  left_join(debates2019) %>% 
  count(topic, wt=elapsed, sort=TRUE) %>% 
  mutate(topic = fct_inorder(topic) %>% fct_rev()) %>% 
  ggplot(aes(topic, n)) +
  geom_col(fill = ft_cols$slate, width=0.55) +
  coord_flip() +
  scale_y_continuous(expand = c(0, 0.25), position = "right") +
  labs(
    x = NULL, y = "Topic time (minutes)",
    title = "Total Topic Time Across All 2019 Debates\nfor Those Left Standing in September"
  ) +
  theme_ipsum_es(grid="X")

The first U.S. Democratic debates of the 2020 election season were held over two nights this past week due to the daft number of candidates running for POTUS. The spiffy @NYTgraphics folks took the tallies of time spent blathering by each speaker/topic and made rounded rectangle segmented bar charts ordered by the time the blathering was performed (these aren’t really debates, they’re loosely-prepared for performances) which I have dubbed “chicklet” charts due to a vague resemblance to the semi-popular gum/candy.

You can see each day’s live, javascript-created NYTimes charts here:

and this is a PNG snapshot of one of them:

nytimes chicklet chart for debate day 1

I liked the chicklet aesthetic enough to make a new {ggplot2} geom_chicklet() to help folks make them. To save some blog bytes, you can read how to install the package over at https://cinc.rud.is/web/packages/ggchicklet/.

Making Chicklet Charts

Since the @NYTimes chose to use javascript to make their chart they also kinda made the data available (view the source of both of the aforelinked URLs) which I’ve wrangled a bit and put into the {ggchicklet} package. We’ll use it to progress from producing basic bars to crunching out chicklets and compare all the candidates across both days.

While the @Nytimes chart(s) provide a great deal of information, most media outlets focused on how much blather time each candidate managed to get. We do not need anything fancier than a bar chart or table to do that:

library(hrbrthemes)
library(ggchicklet)
library(tidyverse)

data("debates2019")

count(debates2019, speaker, wt=elapsed, sort=TRUE) %>%
  mutate(speaker = fct_reorder(speaker, n, sum, .desc=FALSE)) %>%
  mutate(speaker = fct_inorder(speaker) %>% fct_rev()) %>%
  ggplot(aes(speaker,n)) +
  geom_col() +
  scale_y_comma(position = "right") +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X")

ordered blather time chart for each spaker

If we want to see the same basic view but include how much time each speaker spent on each topic, we can also do that without much effort:

count(debates2019, speaker, topic, wt=elapsed, sort=TRUE) %>%
  mutate(speaker = fct_reorder(speaker, n, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, n , fill = topic)) +
  geom_col() +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

time spent per topic per speaker

By default geom_col() is going to use the fill aesthetic to group the bars and use the default sort order to stack them together.

We can also get a broken out view by not doing the count() and just letting the segments group together and use a white bar outline to keep them distinct:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, fill = topic)) +
  geom_col(color = "white") +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

grouped distinct topics

While I liked the rounded rectangle aesthetic, I also really liked how the @nytimes ordered the segments by when the topics occurred during the debate. For other types of chicklet charts you don’t need the grouping variable to be a time-y whime-y column, just try to use something that has a sane ordering characteristic to it:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_col(color = "white", position = position_stack(reverse = TRUE)) +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

distinct topics ordered by when blathered abt during each debate

That last chart is about as far as you could go to reproduce the @nytimes look-and-feel without jumping through some serious gg-hoops.

I had made a rounded rectangle hidden geom to make rounded-corner tiles for the {statebins} package so making a version of ggplot2::geom_col() (which I also added to {ggplot2}) was pretty straightforward. There are some key differences in the defaults of geom_chicklet():

  • a “white” stroke for the chicklet/segment (geom_col() has NA for the stroke)
  • automatic reversing of the group order (geom_col() uses the standard sort order)
  • radius setting of unit(3, "px") (change this as you need)
  • chicklet legend geom (b/c they aren’t bars or points)

You likely just want to see it in action, so here it is without further adieu:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 14, 2),
    labels = c(0, sprintf("%d min.", seq(2, 14, 2)))
  ) +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "Nights 1 & 2 of the June 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\n\nOriginals <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html?>\n<https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html?>\nby @nytimes Weiyi Cai, Jason Kao, Jasmine C. Lee, Alicia Parlapiano and Jugal K. Patel\n\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "bottom")

chiclet chart with all the topics

Yes, I upped the ggplot2 tweaking a bit to get closer to the @nytimes (FWIW I like the Y gridlines, YMMV) but didn’t have to do much else to replace geom_col() with geom_chicket(). You’ll need to play with the segment width value depending on the size of your own, different plots to get the best look (just like you do with any other geom).

Astute, intrepid readers will note that the above chart has all the topics whereas the @nytimes just has a few. We can do the grouping of non-salient topics into an “Other” category with forcats::fct_other() and make a manual fill scale from the values stolen fromused in homage from the @nytimes:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  mutate(topic = fct_other(
    topic,
    c("Immigration", "Economy", "Climate Change", "Gun Control", "Healthcare", "Foreign Policy"))
  ) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 14, 2),
    labels = c(0, sprintf("%d min.", seq(2, 14, 2)))
  ) +
  scale_fill_manual(
    name = NULL,
    values = c(
      "Immigration" = "#ae4544",
      "Economy" = "#d8cb98",
      "Climate Change" = "#a4ad6f",
      "Gun Control" = "#cc7c3a",
      "Healthcare" = "#436f82",
      "Foreign Policy" = "#7c5981",
      "Other" = "#cccccc"
    ),
    breaks = setdiff(unique(debates2019$topic), "Other")
  ) +
  guides(
    fill = guide_legend(nrow = 1)
  ) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "Nights 1 & 2 of the June 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\n\nOriginals <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html?>\n<https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html?>\nby @nytimes Weiyi Cai, Jason Kao, Jasmine C. Lee, Alicia Parlapiano and Jugal K. Patel\n\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "top")

final chicklet chart

FIN

Remember, you can find out how to install {ggchicklet} and also where you can file issues or PRs over at https://cinc.rud.is/web/packages/ggchicklet/. The package has full documentation, including a vignette, but if any usage help is lacking, definitely file an issue.

If you use the package, don’t hesitate to share your creations in a comment or on Twitter so other folks can see how to use the package in different contexts.

I’ve become a big fan of the {logspline} package over the past ~6 months and decided to wrap up a manual ggplot2 plotting process (well, it was at least in an RStudio snippet) into a small {ggplot2} Stat to make it easier to visualize various components of the fitted model.

If you’re new to logspline density estimation this paper by Stone, Hansen, Kooperberg, and Truong is a pretty accessible introduction to the methodology and the (primary) algorithm used in the {logspline} package.

If you’re already a user of the {logspline} package and have been just using the package’s base S3 plot function but wanted the functionality in {ggplot2} this was pretty much designed for you.

If you hit up https://git.rud.is/hrbrmstr/gglogspline there are links to all your fav social code collaboration services plus installation instructions. There’s a single Statstat_logspline() — which does all the hard work. By default it will make y the stat(density) computed field but there are other computed fields you can use as well:

  • count: computed count estimates (similar to ggplot2::stat_density())
  • probs: distribution function
  • survival: survival function
  • hazard : hazard function

You can take a look at the (light) comparison between geom_histogram(), geom_density(), ggalt::geom_bkde(), and gglogspline::stat_logspline() below (or use this link to bust the iframe):

FIN

For “general purpose” density estimation you can likely still rely on stat_density() but after you dig into the background of logspline and try it out in some appropriate use-cases you may grow to like it as much as I have.

The package itself is super-small and focused. One reason for that is it may be helpful for those who want to create or customize their own ggplot2 Stats. (I think I forgot to change the license to MIT before this post so I’ll rectify that ASAP; I default to AGPL in a fairly pathetic attempt to stop skeezy cybersecurity startups — pretty much 99% of them — from stealing code without giving back to the community).

As usual, kick the tyres and file issues or PRs as you feel moved (and wherever you feel comfortable social coding).