Tragic Documentation

NOTE: If the usual aggregators are picking this up and there are humans curating said aggregators, this post is/was not intended as something to go into the “data science” aggregation sites. Just personal commentary with code in the event someone stumbles across it and wanted to double check me. These “data-dives” help me cope with these type of horrible events.

The “data science” feed URL is

I saw the story about body camera footage from a officers involved police stop & fatal shooting in Salt Lake City.
The indiviual killed was a felon — convicted of aggravated assuault — with an outstanding warrant.

He tried to run. At some point in the brief chase he pivoted and appeared to be reaching for a weapon — likely a knife, which was confirmed after the fact.

One officer pulled a tazer. Another pulled a gun. Officer Fox — the one who fired the gun — said he was terrified by how close Mr. Harmon was to the officers when Mr. Harmon stopped and turned toward them.

I wasn’t there. I don’t risk getting injured or killed in the line of duty every day. I don’t face down armed suspects in fast-moving, tense situations.

But, I’m weary of this being a cut+paste story that is a nigh weekly event in America.

Officers are killed by suspects as well. It’s equally tragic.

Below is just “data”. Just a visual documentary of where we are 17-ish years into the 21st Century in America.

And, most of America seems to be OK with this. Then again, most of America is OK with the “price of freedom” being one mass shooting a day.

I’m not.

I scaled the Y axis the same in both faceted charts to make it easier to glance across both sets of tragedies.

This was generated on Sunday, October 8, 2017. If you run the code after that date, remove the saved data files and tweak the Y-scale limits since the death toll will rise.


read.table(sep=":", stringsAsFactors=FALSE, header=TRUE, 
W:White, non-Hispanic
B:Black, non-Hispanic
N:Native American
O:Other") -> rdf

wapo_data_url <- ""
shootings_file <- basename(wapo_data_url)
if (!file.exists(shootings_file)) download.file(wapo_data_url, shootings_file)

  id = col_integer(),
  name = col_character(),
  date = col_date(format = ""),
  manner_of_death = col_character(),
  armed = col_character(),
  age = col_integer(),
  gender = col_character(),
  race = col_character(),
  city = col_character(),
  state = col_character(),
  signs_of_mental_illness = col_character(),
  threat_level = col_character(),
  flee = col_character(),
  body_camera = col_character()
) -> shootings_cols

read_csv(shootings_file, col_types = shootings_cols) %>% 
  mutate(yr = lubridate::year(date), wk = lubridate::week(date)) %>% 
  filter(yr >= 2017) %>% 
  mutate(race = ifelse(, "None", race)) %>% 
  mutate(race = ifelse(race=="O", "None", race)) %>% 
  count(race, wk) %>% 
  left_join(rdf, by="race") %>% 
  mutate(description = factor(description, levels=rdf$description)) -> xdf

lod_url <- ""
lod_rds <- "officer_lod.rds"
if (!file.exists(lod_rds)) {
  res <- httr::GET(lod_url)
  write_rds(res, lod_rds)
} else {
  res <- read_rds(lod_rds)
pg <- httr::content(res, as="parsed", encoding = "UTF-8")

html_nodes(pg, xpath=".//table[contains(., 'Detective Chad William Parque')]") %>% 
  html_nodes(xpath=".//td[contains(., 'EOW')]") %>% 
  html_text() %>% 
  stri_extract_all_regex("(EOW:[[:space:]]+(.*)\n|Cause of Death:[[:space:]]+(.*)\n)", simplify = TRUE) %>% 
  as_data_frame() %>% 
    stri_replace_first_regex(.x, "^[[:alpha:][:space:]]+: ", "") %>% 
  ) %>% 
  as_data_frame() %>%  
  set_names(c("day", "cause")) %>% 
  mutate(day = as.Date(day, "%A, %B %e, %Y"), wk = lubridate::week(day))%>% 
  count(wk, cause) -> odf 

ggplot(xdf, aes(wk, n)) +
  geom_segment(aes(xend=wk, yend=0)) +
  scale_y_comma(limits=c(0,15)) +
  facet_wrap(~description, scales="free_x") +
  labs(x="2017 Week #", y="# Deaths",
       title="Weekly Fatal Police Shootings in 2017",
       subtitle=sprintf("2017 total: %s", scales::comma(sum(xdf$n))),
       caption="Source:") +

count(odf, cause, wt=n, sort=TRUE) -> ordr

mutate(odf, cause = factor(cause, levels=ordr$cause)) %>% 
  ggplot(aes(wk, n)) +
  geom_segment(aes(xend=wk, yend=0)) +
  scale_x_continuous(limits=c(0, 40)) +
  scale_y_comma(limits=c(0,15)) +
  facet_wrap(~cause, ncol=3, scales="free_x") +
  labs(x="2017 Week #", y="# Deaths",
       title="Weekly Officer Line of Duty Deaths in 2017",
       subtitle=sprintf("2017 total: %s", scales::comma(sum(odf$n))),
       caption="Source:") +
Cover image from Data-Driven Security
Amazon Author Page

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.