Skip navigation

Luke Whyte posted an article (apologies for a Medium link) over on Towards Data Science showing how to use a command line workflow involving curl, node and various D3 libraries and javascript source files to build a series of SVG static maps. It’s well written and you should give it a read especially since he provides the code and data.

We can do all of that in R with the help of a couple packages and by using a free geocoding service which will also allow us to put more data on the map, albeit with some extra work due to it returning weird values for Hawaii locations.

library(albersusa) # git.sr.ht/~hrbrmstr/albersusa | git[la|hu]b/hrbrmstr/albersusa
library(rgeocodio) # git.sr.ht/~hrbrmstr/rgeocodio | git[la|hu]b/hrbrmstr/rgeocodio
library(tidyverse)

# the data url from the original blog
fil <- "https://query.data.world/s/awyahzfiikyoqi5ygpvauqslwmqltr"

read_csv(fil, col_types = "cd") %>% 
  select(area=1, pct=2) %>% 
  mutate(pct = pct/100) -> xdf # make percents proper percents

gc <- gio_batch_geocode(xdf$area)

The result of the geocoding is a data frame that has various confidences associated with the result. We’ll pick the top one for each and then correct for the errant Hawaii longitude it gives back:

map2_df(gc$query, gc$response_results, ~{
  out <- .y[1,,]
  out$area <- .x
  out
}) %>% 
  filter(!is.na(location.lat)) %>% 
  select(area, state = address_components.state, lat=location.lat, lon=location.lng) %>% 
  mutate(
    lat = ifelse(grepl("Honolu", area), 21.3069, lat),
    lon = ifelse(grepl("Honolu", area), -157.8583, lon)
  ) %>% 
  left_join(xdf) %>% 
  as_tibble() -> area_pct

area_pct
## # A tibble: 47 x 5
##    area                                        state   lat    lon   pct
##    <chr>                                       <chr> <dbl>  <dbl> <dbl>
##  1 McAllen-Edinburg-Mission, TX                TX     26.2  -98.1 0.102
##  2 Houston-The Woodlands-Sugar Land, TX        TX     29.6  -95.8 0.087
##  3 Santa Maria-Santa Barbara, CA               CA     34.4 -120.  0.081
##  4 Las Vegas-Henderson-Paradise, NV            NV     36.1 -115.  0.08 
##  5 Los Angeles-Long Beach-Anaheim, CA          CA     33.9 -118.  0.075
##  6 Miami-Fort Lauderdale-West Palm Beach, FL   FL     26.6  -80.1 0.073
##  7 Dallas-Fort Worth-Arlington, TX             TX     33.3  -98.4 0.069
##  8 Washington-Arlington-Alexandria, DC-VA-MD-… WV     39.2  -81.7 0.068
##  9 Bridgeport-Stamford-Norwalk, CT             CT     41.3  -73.1 0.067
## 10 San Jose-Sunnyvale-Santa Clara, CA          CA     37.4 -122.  0.065
## # … with 37 more rows

The albersusa package provides base maps with Alaska & Hawaii elided into a composite U.S. map. As such, we need to elide any points that are in Alaska & Hawaii:

us <- usa_composite()
us_map <- fortify(us, region="name")

hi <- select(filter(area_pct, state == "HI"), lon, lat)
(hi <- points_elided(hi))
area_pct[area_pct$state == "HI", c("lon", "lat")] <- hi

Then, it’s just a matter of using ggplot2:

ggplot() +
  geom_map(
    data = us@data, map=us_map,
    aes(map_id=name), 
    fill = "white", color = "#2b2b2b", size = 0.1
  ) +
  geom_point(
    data = area_pct, aes(lon, lat, size = pct), 
    fill = alpha("#b30000", 1/2), color = "#b30000", shape=21
  ) +
  ggalt::coord_proj(us_laea_proj) +
  scale_y_continuous(expand=c(0, 3)) +
  scale_radius(
    name = NULL, label = scales::percent_format(1)
  ) +
  labs(x = "Estimated percent of undocumented residents in U.S. metro areas. Source: Pew Research Center") +
  theme_minimal() +
  theme(axis.text = element_blank()) +
  theme(axis.title.x = element_text(hjust=0.5, size = 8)) +
  theme(axis.title.y = element_blank()) +
  theme(panel.grid = element_blank()) +
  theme(legend.position = c(0.9, 0.3)) -> gg

ggsave(filename = "map.svg", device = "svg", plot = gg, height = 5, width = 7)

Unlike the post’s featured image (which has to be a bitmap…grrr) the resultant SVG is below:

FIN

There is absolutely nothing wrong with working where you’re most comfortable and capable and Luke definitely wields the command line and javascript incredibly well. This alternate way of doing things in R may help other data journalists who are more comfortable in R or want to increase their R knowledge replicate and expand upon Luke’s process.

If you’ve got alternate ways of doing this in R or even (gasp) Python, drop a note in the comments with a link to your blog so folks who are comfortable neither in R nor the command line can see even more ways of producing this type of content.

5 Comments

  1. Nice post,

    thank you. I am getting an error, which is ultimately placing the HI datapoint off the map itself.

    hi <- points_elided(hi)
    Error in (function (classes, fdef, mtable) :
    unable to find an inherited method for function ‘proj4string’ for signature ‘”tbl_df”’

    I simply cut-n-pasted your code, so not sure why the error.

    Works great on the lower 48 ;-).

    • Have you installed the latest albersusa? I made a change that should enable this snippet to work. If so, can you post this as an issue on the albersusa repo (whichever social coding service you prefer). #ty

  2. Thanks for this article ! Very useful :)
    As far as I am concerned, I have an issue when doing
    gc <- gio_batch_geocode(xdf$area)

    It says :
    Couldn’t find env var GEOCODIO_API_KEY See ?gio_auth for more details.
    Please enter your API key and press enter:

    Do you know what I should enter in there ?

    Thank you !

    • As the README and reference link in ?rgeocodio::gio_batch_geocode states you need to go their site and register for a free API key.

      • Indeed ! Thank you :)


One Trackback/Pingback

  1. […] leave a comment for the author, please follow the link and comment on their blog: R – rud.is. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data […]

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.