14 Day 12: Movement

14.1 Technologies/Techniques

  • Working with R Simple Features {sf}
  • Incorporating {albersusa} composite projections
  • Working with external data
  • Using {ggplot2} geom_sf() and geom_curve() to draw flowing lines of commute data
  • Making faceted maps

14.2 Data Source: U.S. Census Commuting Flows

The U.S. Census Bureau’s American Community Survey (ACS) asks respondents about their primary workplace location. When information about workers’ residence location is coupled with their workplace location, a commuting flow is generated. The origin-destination flow format informs our understanding of interconnectedness between communities, including the interchange of people, goods, and services. The Census Bureau makes this data available online⁴⁹, and I decided to take a look at all the places Mainers go (outside of Maine) to work.

library(sf)
library(readxl)
library(hrbrthemes)
library(albersusa)
library(tidyverse)

The ACS Commuting Flows data is logged by county, so let’s bring in the composite county map from {albersusa}:

counties_sf() %>% 
  st_transform(us_laea_proj) -> cmap

Now we’ll retrieve the ACS data and clean it up a bit (it’s not the worst Excel file, but it needs work):

if (!file.exists(here::here("data/table1.xlsx"))){
  download.file(
    url = "https://www2.census.gov/programs-surveys/demo/tables/metro-micro/2015/commuting-flows-2015/table1.xlsx",
    destfile = here::here("data/table1.xlsx")
  )
}
read_excel(here::here("data/table1.xlsx"), skip=6) %>%
  janitor::clean_names() %>%
  select(
    start_state_fips = state_fips_code_1,
    start_county_fips = county_fips_code_2,
    start_state = state_name_3,
    start_county = county_name_4,
    end_state_fips = state_fips_code_5,
    end_county_fips = county_fips_code_6,
    end_state = state_name_7,
    end_county = county_name_8,
    workers = workers_in_commuting_flow,
    moe = margin_of_error
  ) %>%
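  # The workplace state codes appear to carry an extra leading zero in this file;
  # drop it so they match the two-digit residence codes
  mutate(end_state_fips = gsub("^0", "", end_state_fips)) -> xdf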

glimpse(xdf)
## Observations: 139,435
## Variables: 10
## $ start_state_fips  <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01…
## $ start_county_fips <chr> "001", "001", "001", "001", "001", "001", "001", "0…
## $ start_state       <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabam…
## $ start_county      <chr> "Autauga County", "Autauga County", "Autauga County…
## $ end_state_fips    <chr> "01", "01", "01", "01", "01", "01", "01", "01", "01…
## $ end_county_fips   <chr> "001", "013", "021", "043", "047", "051", "053", "0…
## $ end_state         <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabam…
## $ end_county        <chr> "Autauga County", "Butler County", "Chilton County"…
## $ workers           <dbl> 8828, 6, 504, 27, 296, 2186, 14, 271, 8, 79, 97, 31…
## $ moe               <dbl> 752, 10, 228, 44, 130, 486, 23, 142, 16, 108, 68, 5…
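
The `gsub("^0", "", ...)` cleanup is easy to misread, so here is a minimal sketch of what it does. The three-character input codes are an assumption made for illustration (the output above only shows the two-digit result), and the county codes are made up:

```r
# Hypothetical raw workplace state codes, assumed to be three characters wide
raw_state <- c("001", "009", "023", "033")

# "^0" is anchored to the start of the string, so only the first zero is removed
clean_state <- gsub("^0", "", raw_state)
clean_state

# Pasting state and county codes together gives the five-character county FIPS
county <- c("001", "009", "031", "015")
full_fips <- paste0(clean_state, county)
full_fips
```

Because the pattern is anchored, a code such as "009" becomes "09" rather than "9", which keeps the zero padding that county FIPS codes rely on.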

14.3 Making Commuting Flows

Now we’ll build from/to pairs of places Mainers go outside of Maine to work. First we’ll build the five-digit start and end county FIPS codes, stored as factors that share their levels with the map’s fips column:

filter(xdf, start_state == "Maine", end_state != "Maine") %>%
  filter(start_county_fips != end_county_fips) %>%
  mutate(
    start_fips = glue::glue("{start_state_fips}{start_county_fips}") %>%
      as.character() %>%
      factor(levels = levels(cmap$fips)),
    end_fips = glue::glue("{end_state_fips}{end_county_fips}") %>%
      as.character() %>%
      factor(levels = levels(cmap$fips))
  ) -> me_start

glimpse(me_start)
## Observations: 501
## Variables: 12
## $ start_state_fips  <chr> "23", "23", "23", "23", "23", "23", "23", "23", "23…
## $ start_county_fips <chr> "001", "001", "001", "001", "001", "001", "001", "0…
## $ start_state       <chr> "Maine", "Maine", "Maine", "Maine", "Maine", "Maine…
## $ start_county      <chr> "Androscoggin County", "Androscoggin County", "Andr…
## $ end_state_fips    <chr> "09", "12", "17", "17", "22", "22", "24", "25", "25…
## $ end_county_fips   <chr> "009", "086", "031", "097", "057", "109", "037", "0…
## $ end_state         <chr> "Connecticut", "Florida", "Illinois", "Illinois", "…
## $ end_county        <chr> "New Haven County", "Miami-Dade County", "Cook Coun…
## $ workers           <dbl> 4, 15, 37, 9, 22, 16, 10, 10, 53, 12, 2, 7, 8, 6, 9…
## $ moe               <dbl> 6, 19, 42, 14, 32, 26, 14, 13, 49, 13, 6, 11, 14, 9…
## $ start_fips        <fct> 23001, 23001, 23001, 23001, 23001, 23001, 23001, 23…
## $ end_fips          <fct> 09009, 12086, 17031, 17097, 22057, 22109, 24037, 25…
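
Building the codes as factors with `levels = levels(cmap$fips)` has a side effect worth knowing about: any code that is not in the map turns into `NA`. A small base R sketch with made-up level and code values (`map_levels` is a stand-in for `levels(cmap$fips)`, not the real thing):

```r
# Stand-in for levels(cmap$fips); these four codes are illustrative only
map_levels <- c("09009", "23001", "23031", "33015")

# Hypothetical flow endpoints, including one code the map does not contain
end_fips <- c("33015", "09009", "99999")

end_factor <- factor(end_fips, levels = map_levels)
end_factor                 # the unknown code becomes NA
levels(end_factor)         # same levels, in the same order, as the map
which(is.na(end_factor))   # positions of endpoints that cannot be drawn
```

Checking for `NA` values at this stage is a cheap way to spot flows whose endpoints have no county polygon, and therefore no centroid, before any joins happen.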

Then compute the county centroids so we have from/to points to use with geom_curve():

select(cmap, fips, geometry) %>%
  mutate(geometry = st_centroid(geometry)) %>%
  st_coordinates() %>%
  as_tibble() %>%
  bind_cols(
    select(cmap, fips) %>%
      as_tibble() %>%
      select(-geometry)
  ) %>%
  select(fips, lng = X, lat = Y) -> centers

glimpse(centers)
## Observations: 3,143
## Variables: 3
## $ fips <fct> 01001, 01009, 01017, 01021, 01033, 01045, 01051, 01065, 01079, 0…
## $ lng  <dbl> 1253479.8, 1237658.1, 1363279.6, 1241407.0, 1114517.1, 1367007.1…
## $ lat  <dbl> -1285059.8, -1124835.1, -1224657.2, -1251718.1, -1063167.5, -139…

A quick plot of the centroids confirms that they trace out the composite map:

ggplot() +
  geom_point(data = centers, aes(lng, lat), size = 0.125) +
  coord_sf(crs = us_laea_proj, datum = NA) +
  labs(x = NULL, y = NULL) +
  theme_ipsum_es(grid="")
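
To see the centroid step in isolation, here is a minimal sketch on a single made-up polygon. The square and its `id` are purely illustrative; only the {sf} calls mirror the ones used above:

```r
library(sf)

# A 2 x 2 square standing in for a county polygon (made-up coordinates)
square <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
shapes <- st_sf(id = "square", geometry = st_sfc(square))

# st_centroid() returns POINT geometries;
# st_coordinates() turns those points into a matrix with X and Y columns
xy <- st_coordinates(st_centroid(st_geometry(shapes)))

data.frame(id = shapes$id, lng = xy[, "X"], lat = xy[, "Y"])
```

Since the county map was transformed before the centroids were computed, the `lng` and `lat` values in `centers` are projected coordinates rather than degrees, which is what lets them be drawn directly on top of the projected map.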

We’re going to make a faceted map with one panel per starting Maine county. We can make the facet names more useful if we include both the county name and the total number of workers who leave that county to work outside of Maine:

count(me_start, start_county, wt=workers, sort=TRUE) %>%
  mutate(lab = glue::glue("{gsub(' County', '', start_county)} → {scales::comma(n, accuracy=1)}")) -> labs

glimpse(labs)
## Observations: 16
## Variables: 3
## $ start_county <chr> "York County", "Cumberland County", "Oxford County", "An…
## $ n            <dbl> 17739, 3350, 1758, 525, 516, 446, 342, 247, 188, 168, 15…
## $ lab          <glue> "York → 17,739", "Cumberland → 3,350", "Oxford → 1,758"…
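
The weighted count and the label ordering can be tried on a toy table. This is a sketch with made-up flows; base `paste0()` and `format()` stand in for {glue} and `scales::comma()` so that only {dplyr} is needed:

```r
library(dplyr)

# Made-up flows: one row per origin/destination pair
flows <- tibble(
  start_county = c("York County", "York County", "Oxford County", "Cumberland County"),
  workers      = c(1200, 300, 40, 500)
)

# wt = workers sums that column per county instead of counting rows;
# sort = TRUE puts the largest total first
totals <- count(flows, start_county, wt = workers, sort = TRUE) %>%
  mutate(
    # "\u2192" is the right arrow used in the facet labels
    lab = paste0(
      sub(" County", "", start_county), " \u2192 ",
      format(n, big.mark = ",", trim = TRUE)
    )
  )

# Fixing the factor levels in this order is what makes facets follow outflow
# volume rather than alphabetical order
lab_factor <- factor(totals$lab, levels = totals$lab)
levels(lab_factor)
```

Without the explicit `levels`, the labels would be sorted alphabetically and the panels would no longer run from most to least outflow.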

Finally, we’ll take those county centroids and make the final from/to pairs:

left_join(
  me_start, centers,
  by = c("start_fips"="fips")
) %>%
  rename(start_lng = lng, start_lat = lat) %>%
  left_join(centers, by = c("end_fips"="fips")) %>%
  rename(end_lng = lng, end_lat = lat) %>%
  left_join(labs, "start_county") %>%
  mutate(lab = factor(lab, levels = labs$lab)) -> start

glimpse(start)
## Observations: 501
## Variables: 18
## $ start_state_fips  <chr> "23", "23", "23", "23", "23", "23", "23", "23", "23…
## $ start_county_fips <chr> "001", "001", "001", "001", "001", "001", "001", "0…
## $ start_state       <chr> "Maine", "Maine", "Maine", "Maine", "Maine", "Maine…
## $ start_county      <chr> "Androscoggin County", "Androscoggin County", "Andr…
## $ end_state_fips    <chr> "09", "12", "17", "17", "22", "22", "24", "25", "25…
## $ end_county_fips   <chr> "009", "086", "031", "097", "057", "109", "037", "0…
## $ end_state         <chr> "Connecticut", "Florida", "Illinois", "Illinois", "…
## $ end_county        <chr> "New Haven County", "Miami-Dade County", "Cook Coun…
## $ workers           <dbl> 4, 15, 37, 9, 22, 16, 10, 10, 53, 12, 2, 7, 8, 6, 9…
## $ moe               <dbl> 6, 19, 42, 14, 32, 26, 14, 13, 49, 13, 6, 11, 14, 9…
## $ start_fips        <fct> 23001, 23001, 23001, 23001, 23001, 23001, 23001, 23…
## $ end_fips          <fct> 09009, 12086, 17031, 17097, 22057, 22109, 24037, 25…
## $ start_lng         <dbl> 2309793, 2309793, 2309793, 2309793, 2309793, 230979…
## $ start_lat         <dbl> 341072.6, 341072.6, 341072.6, 341072.6, 341072.6, 3…
## $ end_lng           <dbl> 2207736.9, 1957628.3, 1005063.9, 982123.2, 932468.6…
## $ end_lat           <dbl> -28338.31, -1928437.75, -276629.17, -225515.15, -16…
## $ n                 <dbl> 525, 525, 525, 525, 525, 525, 525, 525, 525, 525, 5…
## $ lab               <fct> Androscoggin → 525, Androscoggin → 525, Androscoggi…
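
The join, rename, join again pattern is worth a closer look, because the order matters. A minimal sketch with a made-up two-row centroid table and a single flow (all values here are illustrative):

```r
library(dplyr)

# Made-up centroid lookup and one flow between the two places
centers <- tibble(fips = c("23031", "33015"), lng = c(10, 8), lat = c(5, 3))
flows   <- tibble(start_fips = "23031", end_fips = "33015", workers = 1200)

pairs <- flows %>%
  left_join(centers, by = c("start_fips" = "fips")) %>%
  rename(start_lng = lng, start_lat = lat) %>%   # rename before the second join
  left_join(centers, by = c("end_fips" = "fips")) %>%
  rename(end_lng = lng, end_lat = lat)

pairs
```

Renaming between the two joins keeps the column names unambiguous; joining twice without it would leave suffixed columns such as `lng.x` and `lng.y` to untangle afterwards.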

14.4 Drawing the Map

For the final map product we’ll use color to help signify the volume of workers, since using size might overwhelm the map on some facets given how small they are. We’ll also order the facets from most worker outflow to least.

ggplot() +
  geom_sf(data = cmap, color = "#b2b2b277", size = 0.05, fill = "#3B454A") +
  geom_curve(
    data = start,
    aes(
      x = start_lng, y = start_lat, xend = end_lng, yend = end_lat,
      color = workers
    ),
    size = 0.15, arrow = arrow(type = "open", length = unit(5, "pt"))
  ) +
  scale_color_distiller(
    limits = range(start$workers), labels = scales::comma,
    trans = "log10", palette = "Reds", direction = 1, name = "Worker\nOutflow"
  ) +
  coord_sf(datum = NA, ylim = c(-2500000.0, 1500000)) +
  facet_wrap(~lab) +
  labs(
    x = NULL, y = NULL,
    title = "Oh The Places [Mainers] Will Go [For Work]!",
    subtitle = "2011-2015 5-Year ACS commuting outflows from Maine counties to out-of-state counties; sorted from most worker outflow to least.",
    caption = "Data source: <www.census.gov/data/tables/2015/demo/metro-micro/commuting-flows-2015.html> • #30DayMapChallenge"
  ) +
  theme_ft_rc(grid="", strip_text_family = font_es_bold, strip_text_size = 13) +
  theme(strip.text = element_text(color = "white"))
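
The `trans = "log10"` argument matters because flow sizes are heavily skewed. The following base R sketch uses made-up worker counts to approximate where each value would land on a 0 to 1 palette position with and without the log transform; it is an illustration of the idea, not a reproduction of what {ggplot2} does internally:

```r
# Made-up worker counts spanning several orders of magnitude
workers <- c(2, 10, 50, 500, 5000)

# Position along the palette when values are rescaled linearly
linear <- (workers - min(workers)) / diff(range(workers))

# Position along the palette when values are log10-transformed first
lw     <- log10(workers)
logged <- (lw - min(lw)) / diff(range(lw))

round(data.frame(workers, linear, logged), 2)
```

On the linear scale all but the largest flow sit near the pale end of the palette, while the log scale spreads the small and medium flows across it, which is what makes the lighter curves distinguishable in the small facets.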

14.5 In Review

We turned a fairly bland data source into spatial data and looked at worker outflows from Maine to other states. We did this by computing county centers and associating them with the original source data. Maine has only sixteen counties, so this was an easier task than it might be for other states.

14.6 Try This At Home

Try mapping worker volume to line size instead of (or along with) color to see whether it really does crowd out some map facets.

Generate large, individual maps vs facets to make it easier to see the outflows.

Use techniques from previous days to make a {mapdeck} version of the outflows.

Pick another state and focus on the top counties and compare the outflows.

Focus just on Maine county to Maine county flows and see what patterns may show up.