Skip navigation

This week’s edition of Data is Plural had two really fun data sets. One is serious fun (the first comprehensive data set on U.S. evictions, and the other I knew about but had forgotten: The Federal Register Executive Order (EO) data set(s).

The EO data is also comprehensive as the summary JSON (or CSV) files have links to more metadata and even more links to the full-text in various formats.

What follows is a quick post to help bootstrap folks who may want to do some tidy text mining on this data. We’ll look at EOs-per-year (per-POTUS) and also take a look at the “top 5 ‘first words'” in the titles of the EOS (also by POTUS).

Ingesting the Data

The EO main page has a list of EO JSON files by POTUS. We’re going to scrape this so we can classify the EOs by POTUS (we could also just use the Federal Register API since @thosjleeper wrote a spiffy package to access it):

library(rvest)
library(stringi)
library(pluralize) # devtools::install_github("hrbrmstr/pluralize")
library(hrbrthemes)
library(tidyverse)

#' Retrieve the Federal Register main EO page so we can get the links for each POTUS
pg <- read_html("https://www.federalregister.gov/executive-orders") 

#' Find the POTUS EO data nodes, excluding the one for "All"
html_nodes(pg, "ul.bulk-files") %>% 
  html_nodes(xpath = ".//li[span[a[contains(@href, 'json')]] and 
                            not(span[contains(., 'All')])]") -> potus_nodes

#' Turn the POTUS info into a data frame with the POTUS name and EO JSON link,
#' then retrieve the JSON file and make a data frame of individual data elements
data_frame(
  potus = html_nodes(potus_nodes, "span:nth-of-type(1)") %>% html_text(),
  eo_link = html_nodes(potus_nodes, "a[href *= 'json']") %>% 
    html_attr("href") %>% 
    sprintf("https://www.federalregister.gov%s", .)
) %>% 
  mutate(eo = map(eo_link, jsonlite::fromJSON)) %>% 
  mutate(eo = map(eo, "results")) %>% 
  unnest() -> eo_df

glimpse(eo_df)
## Observations: 887
## Variables: 16
## $ potus                  <chr> "Donald Trump", "Donald Trump", "Donald Trump", "Donald Trump", "Donald Trump", "D...
## $ eo_link                <chr> "https://www.federalregister.gov/documents/search.json?conditions%5Bcorrection%5D=...
## $ citation               <chr> "82 FR 8351", "82 FR 8657", "82 FR 8793", "82 FR 8799", "82 FR 8977", "82 FR 9333"...
## $ document_number        <chr> "2017-01799", "2017-02029", "2017-02095", "2017-02102", "2017-02281", "2017-02450"...
## $ end_page               <int> 8352, 8658, 8797, 8803, 8982, 9338, 9341, 9966, 10693, 10696, 10698, 10700, 12287,...
## $ executive_order_notes  <chr> NA, "See: EO 13807, August 15, 2017", NA, NA, "See: EO 13780, March 6, 2017", "Sup...
## $ executive_order_number <int> 13765, 13766, 13767, 13768, 13769, 13770, 13771, 13772, 13773, 13774, 13775, 13776...
## $ html_url               <chr> "https://www.federalregister.gov/documents/2017/01/24/2017-01799/minimizing-the-ec...
## $ pdf_url                <chr> "https://www.gpo.gov/fdsys/pkg/FR-2017-01-24/pdf/2017-01799.pdf", "https://www.gpo...
## $ publication_date       <chr> "2017-01-24", "2017-01-30", "2017-01-30", "2017-01-30", "2017-02-01", "2017-02-03"...
## $ signing_date           <chr> "2017-01-20", "2017-01-24", "2017-01-25", "2017-01-25", "2017-01-27", "2017-01-28"...
## $ start_page             <int> 8351, 8657, 8793, 8799, 8977, 9333, 9339, 9965, 10691, 10695, 10697, 10699, 12285,...
## $ title                  <chr> "Minimizing the Economic Burden of the Patient Protection and Affordable Care Act ...
## $ full_text_xml_url      <chr> "https://www.federalregister.gov/documents/full_text/xml/2017/01/24/2017-01799.xml...
## $ body_html_url          <chr> "https://www.federalregister.gov/documents/full_text/html/2017/01/24/2017-01799.ht...
## $ json_url               <chr> "https://www.federalregister.gov/api/v1/documents/2017-01799.json", "https://www.f...

EOs By Year

To see how many EOs were signed per-year, per-POTUS, we’ll convert the signing_date into a year (and return it back to a Date object so we get spiffier plot labels), factor order the POTUS names and mark the start of each POTUS term. I’m not usually a fan of stacked bar charts, but since there will only be — at most — two segments, I think they work well and it also shows just how many EOs are established in year one of a POTUS term:

mutate(eo_df, year = lubridate::year(signing_date)) %>% 
  mutate(year = as.Date(sprintf("%s-01-01", year))) %>% 
  count(year, potus) %>%
  mutate(
    potus = factor(
      potus, 
      levels = c("Donald Trump", "Barack Obama", "George W. Bush", "William J. Clinton")
    )
  ) %>%
  ggplot(aes(year, n, group=potus)) +
  geom_col(position = "stack", aes(fill = potus)) +
  scale_x_date(
    name = NULL,
    expand = c(0,0),
    breaks = as.Date(c("1993-01-01", "2001-01-01", "2009-01-01", "2017-01-01")),
    date_labels = "%Y",
    limits = as.Date(c("1992-01-01", "2020-12-31"))
  ) +
  scale_y_comma(name = "# EOs") +
  scale_fill_ipsum(name = NULL) +
  guides(fill = guide_legend(reverse=TRUE)) +
  labs(
    title = "Number of Executive Orders Signed Per-Year, Per-POTUS",
    subtitle = "1993-Present",
    caption = "Source: Federal Register <https://www.federalregister.gov/executive-orders>"
  ) +
  theme_ipsum_rc(grid = "Y") +
  theme(legend.position = "bottom")

Favourite First (Title) Words

I’ll let some eager tidy text miners go-to-town on the full text links and just focus on one aspect of the EO titles: the “first” words. These are generally words like “Amending”, “Establishing”, “Promoting”, etc. to give citizens a quick idea of what’s the order is supposed to be doing. We’ll remove common words, turn plurals into singulars and also get rid of years/dates to make the data a bit more useful and focus on the “top 5” first words used by each POTUS (and show all the first words across each POTUS). I’m using raw counts here (since this is a quick post) but another view normalized by percent of all POTUS EOs might prove more interesting/valuable:

mutate(titles_df, first_word = singularize(first_word)) %>% 
  count(potus, first_word, sort=TRUE) %>% 
  filter(!stri_detect_regex(first_word, "President|Federal|National")) %>%
  mutate(first_word = stri_replace_all_fixed(first_word, "Establishment", "Establishing")) %>% 
  mutate(first_word = stri_replace_all_fixed(first_word, "Amendment", "Amending")) -> first_words

group_by(first_words, potus) %>% 
    top_n(5) %>%  
    ungroup() %>% 
    distinct(first_word) %>% 
    pull(first_word) -> all_first_words

filter(first_words, first_word %in% all_first_words) %>% 
  mutate(
    potus = factor(
      potus, 
      levels = c("Donald Trump", "Barack Obama", "George W. Bush", "William J. Clinton")
    )
  ) %>% 
  mutate(
    first_word = factor(
      first_word, 
      levels = rev(sort(unique(first_word)))
    )
  ) -> first_df

ggplot(first_df, aes(n, first_word)) +
  geom_segment(aes(xend=0, yend=first_word, color=potus), size=4) +
  scale_x_comma(limits=c(0,40)) +
  scale_y_discrete(limits = sort(unique(first_df$first_word))) +
  facet_wrap(~potus, scales = "free", ncol = 2) +
  labs(
    x = "# EOs",
    y = NULL,
    title = "Top 5 Executive Order 'First Words' by POTUS",
    subtitle = "1993-Present",
    caption = "Source: Federal Register <https://www.federalregister.gov/executive-orders>"
  ) +
  theme_ipsum_rc(grid="X", strip_text_face = "bold") +
  theme(panel.spacing.x = unit(5, "lines")) +
  theme(legend.position="none")

FWIW I expected more “Revocation”/”Removing” from the current tangerine-in-chief, but there’s plenty “Enforcing” and “Blocking” to make up for it (being the “tough guy” that he likes to pretend he is).

FIN

There’s way more that can be done with this data set and hopefully folks will take it for a spin and come up with their own interesting views. If you do, drop a note in the comments with a link to your creation(s)!

The code blocks are all combined into this gist.

One Trackback/Pingback

  1. […] article was first published on R – rud.is, and kindly contributed to […]

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.