Mapping Tornado Alley with R

I caught a re-tweet of this tweet by @harry_stevens:

THREAD: I wrote a post on @observablehq about a map I made today. It shows a typical day in the life of a graphics journalist: You never know what problems you'll have to solve on deadline! https://t.co/yRhW1wbLxN #d3js #dataviz 1/7 pic.twitter.com/7N6mmK0nz3

— Harry Stevens (@Harry_Stevens) May 16, 2019

Harry’s thread and Observable post are great on their own and both show the power and utility of Observable javascript notebooks.

However, the re-tweet (which I’m not posting because it’s daft) took a swipe at both Python & R. Now, I’m all for a good swipe at Python (mostly to ensure we never forget all those broken spacebars and tab keys that language has caused) but I’ll gladly defend it and R together when it comes to Getting Things Done, even on deadline.

Let’s walk through what one of us might have done had we been in the same scenario as Harry.

Mapping On A Deadline

So, we have to create a map of historical tornado frequency trends on deadline.

We emailed researchers and received three txt files. One is a set of latitudes, another longitudes, and the final one is the trend value. It’s gridded data.

Download that ZIP and pretend you got three files in email vs a nice ZIP and make a new RStudio project called “tornado” and put those three files in a local-to-the-project-root data/ directory. Let’s read them in and look at them:

library(hrbrthemes) # not 100% necessary but i like my ggplot2 theme(s) :-)
library(tidyverse)  # data wrangling & ggplot2

tibble(
  lat = scan(here::here("data/lats.txt")),
  lon = scan(here::here("data/lons.txt")),
  trend = scan(here::here("data/trends.txt"))
) -> tornado

You very likely never directly use the base::scan() function, but it’s handy here since we just have files of doubles with each value separated by whitespace. Now, let’s see what we have:

tornado
## # A tibble: 30,000 x 3
##      lat   lon trend
##    <dbl> <dbl> <dbl>
##  1 0.897 -180.     0
##  2 0.897 -179.     0
##  3 0.897 -178.     0
##  4 0.897 -176.     0
##  5 0.897 -175.     0
##  6 0.897 -174.     0
##  7 0.897 -173.     0
##  8 0.897 -172.     0
##  9 0.897 -170.     0
## 10 0.897 -169.     0
## # … with 29,990 more rows

summary(tornado)
##      lat               lon                 trend           
## Min.   : 0.8973   Min.   :-179.99808   Min.   :-0.4733610  
## 1st Qu.:22.0063   1st Qu.: -90.00066   1st Qu.: 0.0000000  
## Median :43.1154   Median :  -0.00323   Median : 0.0000000  
## Mean   :43.1154   Mean   :  -0.00323   Mean   : 0.0002756  
## 3rd Qu.:64.2245   3rd Qu.:  89.99419   3rd Qu.: 0.0000000  
## Max.   :85.3335   Max.   : 179.99161   Max.   : 0.6314569  

#+ grid-overview
ggplot(tornado, aes(lon, lat)) +
  geom_point(aes(color = trend))

#+ trend-overview
ggplot(tornado, aes(trend)) +
  geom_histogram() +
  scale_x_continuous(breaks = seq(-0.5, 0.5, 0.05))

Since we’re looking for trends (either direction) in just the United States the latitude and longitude ranges will need to be shrunk down a bit (it does indeed look like globally gridded data) and we’ll be able to shrink the data set a bit more since we only want to look at large or small tends.

We don’t really need modern R/ggplot2 mapping idioms for this project (i.e. the new {sf} ecosystem), so we’ll keep it “simple” (scare quotes since that’s a loaded term) and just use the built in maps and geom_map(). First, let’s get the U.S. states and extract their bounding boxes/limits:

maps::map("state", ".", exact = FALSE, plot = FALSE, fill = TRUE) %>% 
  fortify(map_obj) %>% 
  as_tibble() -> state_map

xlim <- range(state_map$long)
ylim <- range(state_map$lat)

NOTE: I tend not to use the handy ggplot::map_data() function since it ends up clobbering purrr::map() which I use heavily (though not in this post). I also try to use {sf} these days so this tends to not be an issue anymore anyway.

Now, let’s focus in on the target area in the original paper and the Axios article:

filter(
  tornado,
  between(lon, -107, xlim[2]), between(lat, ylim[1], ylim[2]), # -107 gets us ~left-edge of TX
  ((trend < -0.07) | (trend > 0.07)) # approximates notebook selection range
) -> tornado

#+ grid-overview-2
ggplot(tornado, aes(lon, lat)) +
  geom_point(aes(color = trend))

Now we’re getting close to our final solution.

As stated in the Observable notebook and implied by the word “grid” these dots are centroids of grid rectangles. This means we really want boxes, not points. The article got all fancy but it’s not really necessary since we can use ggplot2::geom_tile() to get us said boxes:

#+ grid-overview-3
ggplot(tornado, aes(lon, lat)) +
  geom_tile(aes(fill = trend, color = trend))

Now, we just need to add in map layers, and tweak some aesthetics to make it look like a map. We’ll start naively:

#+ map-1
ggplot() +
  geom_tile(
    data = tornado,
    aes(lon, lat, fill = trend, color = trend)
  ) +
  geom_map(
    data = state_map, map = state_map,
    aes(long, lat, map_id = region),
    color = "black", size = 0.125, fill = NA
  )

Our gridded data is definitely covering the right/same areas so we just need to make this more suitable for an article. We’ll use Harry’s palette and layer in U.S. state borders, an overall country border, and approximate the title and legend aesthetics:

#+ map-final
c(
  "#023858", "#045a8d", "#0570b0", "#3690c0", "#74a9cf",
  "#a6bddb", "#d0d1e6", "#ece7f2", "#fff7fb", "#ffffff",
  "#ffffcc", "#ffeda0", "#fed976", "#feb24c", "#fd8d3c",
  "#fc4e2a", "#e31a1c", "#bd0026", "#800026"
) -> grad_cols # colors from article

ggplot() +

  # tile layer

  geom_tile(
    data = tornado,
    aes(lon, lat, fill = trend, color = trend)
  ) +

  # state borders

  geom_map(
    data = state_map, map = state_map,
    aes(long, lat, map_id = region),
    color = ft_cols$slate, size = 0.125, fill = NA
  ) +

  # usa border

  borders("usa", colour = "black", size = 0.5) +

  # color scales

  scale_colour_gradientn(
    colours = grad_cols,
    labels = c("Fewer", rep("", 4), "More"),
    name = "Change in tornado frequency, 1979-2017"
  ) +
  scale_fill_gradientn(
    colours = grad_cols,
    labels = c("Fewer", rep("", 4), "More"),
    name = "Change in tornado frequency, 1979-2017"
  ) +

  # make it Albers-ish and ensure we can fit the borders in 

  coord_map(
    projection = "polyconic",
    xlim = scales::expand_range(range(tornado$lon), add = 2),
    ylim = scales::expand_range(range(tornado$lat), add = 2)
  ) +

  # tweak legend aesthetics

  guides(
    colour = guide_colourbar(
      title.position = "top", title.hjust = 0.5
    ),
    fill = guide_colourbar(
      title.position = "top", title.hjust = 0.5
    )
  ) +
  labs(
    x = NULL, y = NULL
  ) +
  theme_ipsum_rc(grid="") +
  theme(axis.text = element_blank()) +
  theme(legend.position = "top") +
  theme(legend.title = element_text(size = 16, hjust = 0.5)) +
  theme(legend.key.width = unit(4, "lines")) +
  theme(legend.key.height = unit(0.5, "lines"))

FIN

I went through some extra steps for folks new to R but the overall approach was at the very least equally as expedient as the Observable one and — despite the claims by the quite daft retweet — this is no less “shareable” or “reusable” than the Observable notebook. You can clone the repo (https://git.sr.ht/~hrbrmstr/tornado) and reuse this work immediately.

If you take a stab at an alternate approach — especially if you do use {sf} — definitely blog about it and drop a link here or on Twitter.

3 Comments

- Michael Davis
- Posted 2019-05-19 at 08:37
- Permalink
- Reply
I’ve enjoyed reading this. I live in Tornado Alley and spent about 3 hours yesterday waiting to see if any tornadoes would materialize from the high winds, hail, and thunderstorms near Dallas. So, I thought it was fortuitous that your article was posted on R-blogger. As a relative beginner in R, I was anxious to try out your code. Alas, when trying to run:

tibble(
lat = scan(here::here(“data/lats.txt”)),
lon = scan(here::here(“data/lons.txt”)),
trend = scan(here::here(“data/trends.txt”))
) -> tornado

I got an error:

Error in loadNamespace(name) : there is no package called ‘here’

I don’t know why it’s throwing that error, in spite of looking up the use of base packages, ‘scan’, and ‘here’ online.

As I say, I’m a beginner in R. Any suggestions as to an alternate means to load in these files or perhaps suggestions on what I may be doing wrong with the Scan or Here library?
- - hrbrmstr
  - Posted 2019-05-19 at 10:03
  - Permalink
  - Reply
  That’s my bad. I neglected to put that in a library(here) call. You’ll need to install.packages("here") to be able to run those lines.
  - - Michael Davis
    - Posted 2019-05-20 at 07:17
    - Permalink
    - Reply
    Oh, awesome. Thank you. This has personal meaning AND learning value. All the best! Mike Davis

One Trackback/Pingback

By Mapping Tornado Alley with R – Data Science Austria on 31 Aug 2019 at 2:32 am

[…] leave a comment for the author, please follow the link and comment on their blog: R – rud.is. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data […]

rud.is