
Category Archives: maps

I caught a re-tweet of this tweet by @harry_stevens:

Harry’s thread and Observable post are great on their own and both show the power and utility of Observable JavaScript notebooks.

However, the re-tweet (which I’m not posting because it’s daft) took a swipe at both Python & R. Now, I’m all for a good swipe at Python (mostly to ensure we never forget all those broken spacebars and tab keys that language has caused) but I’ll gladly defend it and R together when it comes to Getting Things Done, even on deadline.

Let’s walk through what one of us might have done had we been in the same scenario as Harry.

Mapping On A Deadline

So, we have to create a map of historical tornado frequency trends on deadline.

We emailed researchers and received three txt files: one with latitudes, another with longitudes, and the last with the trend values. It’s gridded data.

Download that ZIP (and pretend you got three bare files in email vs a nice ZIP), make a new RStudio project called “tornado”, and put those three files in a local-to-the-project-root data/ directory. Let’s read them in and look at them:

library(hrbrthemes) # not 100% necessary but i like my ggplot2 theme(s) :-)
library(tidyverse)  # data wrangling & ggplot2

tibble(
  lat = scan(here::here("data/lats.txt")),
  lon = scan(here::here("data/lons.txt")),
  trend = scan(here::here("data/trends.txt"))
) -> tornado

You likely never call the base::scan() function directly, but it’s handy here since we just have files of doubles with each value separated by whitespace.
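If scan() is new to you, here’s a tiny, self-contained illustration (made-up values):

tf <- tempfile()
writeLines("0.25 -0.5 1.75", tf)
scan(tf) # reads whitespace-separated doubles into a numeric vector
## Read 3 items
## [1]  0.25 -0.50  1.75

Now, let’s see what we have: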

tornado
## # A tibble: 30,000 x 3
##      lat   lon trend
##    <dbl> <dbl> <dbl>
##  1 0.897 -180.     0
##  2 0.897 -179.     0
##  3 0.897 -178.     0
##  4 0.897 -176.     0
##  5 0.897 -175.     0
##  6 0.897 -174.     0
##  7 0.897 -173.     0
##  8 0.897 -172.     0
##  9 0.897 -170.     0
## 10 0.897 -169.     0
## # … with 29,990 more rows

summary(tornado)
##      lat               lon                 trend           
## Min.   : 0.8973   Min.   :-179.99808   Min.   :-0.4733610  
## 1st Qu.:22.0063   1st Qu.: -90.00066   1st Qu.: 0.0000000  
## Median :43.1154   Median :  -0.00323   Median : 0.0000000  
## Mean   :43.1154   Mean   :  -0.00323   Mean   : 0.0002756  
## 3rd Qu.:64.2245   3rd Qu.:  89.99419   3rd Qu.: 0.0000000  
## Max.   :85.3335   Max.   : 179.99161   Max.   : 0.6314569  

#+ grid-overview
ggplot(tornado, aes(lon, lat)) +
  geom_point(aes(color = trend))

#+ trend-overview
ggplot(tornado, aes(trend)) +
  geom_histogram() +
  scale_x_continuous(breaks = seq(-0.5, 0.5, 0.05))

Since we’re looking for trends (in either direction) in just the United States, the latitude and longitude ranges will need to be shrunk down a bit (it does indeed look like globally gridded data), and we can shrink the data set further since we only want to look at the larger positive or negative trends.

We don’t really need modern R/ggplot2 mapping idioms for this project (i.e. the new {sf} ecosystem), so we’ll keep it “simple” (scare quotes since that’s a loaded term) and just use the built-in maps and geom_map(). First, let’s get the U.S. states and extract their bounding boxes/limits:

maps::map("state", ".", exact = FALSE, plot = FALSE, fill = TRUE) %>% 
  fortify(map_obj) %>% 
  as_tibble() -> state_map

xlim <- range(state_map$long)
ylim <- range(state_map$lat)

NOTE: I tend not to use the handy ggplot2::map_data() function since it ends up clobbering purrr::map() which I use heavily (though not in this post). I also try to use {sf} these days, so this tends not to be an issue anymore anyway.
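For the curious, the namespaced equivalent (which avoids masking purrr::map()) looks something like this:

state_map <- as_tibble(ggplot2::map_data("state")) # same long/lat/group/order/region/subregion columns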

Now, let’s focus in on the target area in the original paper and the Axios article:

filter(
  tornado,
  between(lon, -107, xlim[2]), between(lat, ylim[1], ylim[2]), # -107 gets us ~left-edge of TX
  ((trend < -0.07) | (trend > 0.07)) # approximates notebook selection range
) -> tornado

#+ grid-overview-2
ggplot(tornado, aes(lon, lat)) +
  geom_point(aes(color = trend))

Now we’re getting close to our final solution.

As stated in the Observable notebook and implied by the word “grid”, these dots are centroids of grid rectangles. This means we really want boxes, not points. The article got all fancy, but that’s not really necessary since we can use ggplot2::geom_tile() to get us said boxes:

#+ grid-overview-3
ggplot(tornado, aes(lon, lat)) +
  geom_tile(aes(fill = trend, color = trend))

Now, we just need to add in map layers, and tweak some aesthetics to make it look like a map. We’ll start naively:

#+ map-1
ggplot() +
  geom_tile(
    data = tornado,
    aes(lon, lat, fill = trend, color = trend)
  ) +
  geom_map(
    data = state_map, map = state_map,
    aes(long, lat, map_id = region),
    color = "black", size = 0.125, fill = NA
  )

Our gridded data is definitely covering the right/same areas so we just need to make this more suitable for an article. We’ll use Harry’s palette and layer in U.S. state borders, an overall country border, and approximate the title and legend aesthetics:

#+ map-final
c(
  "#023858", "#045a8d", "#0570b0", "#3690c0", "#74a9cf",
  "#a6bddb", "#d0d1e6", "#ece7f2", "#fff7fb", "#ffffff",
  "#ffffcc", "#ffeda0", "#fed976", "#feb24c", "#fd8d3c",
  "#fc4e2a", "#e31a1c", "#bd0026", "#800026"
) -> grad_cols # colors from article

ggplot() +

  # tile layer

  geom_tile(
    data = tornado,
    aes(lon, lat, fill = trend, color = trend)
  ) +

  # state borders

  geom_map(
    data = state_map, map = state_map,
    aes(long, lat, map_id = region),
    color = ft_cols$slate, size = 0.125, fill = NA
  ) +

  # usa border

  borders("usa", colour = "black", size = 0.5) +

  # color scales

  scale_colour_gradientn(
    colours = grad_cols,
    labels = c("Fewer", rep("", 4), "More"),
    name = "Change in tornado frequency, 1979-2017"
  ) +
  scale_fill_gradientn(
    colours = grad_cols,
    labels = c("Fewer", rep("", 4), "More"),
    name = "Change in tornado frequency, 1979-2017"
  ) +

  # make it Albers-ish and ensure we can fit the borders in 

  coord_map(
    projection = "polyconic",
    xlim = scales::expand_range(range(tornado$lon), add = 2),
    ylim = scales::expand_range(range(tornado$lat), add = 2)
  ) +

  # tweak legend aesthetics

  guides(
    colour = guide_colourbar(
      title.position = "top", title.hjust = 0.5
    ),
    fill = guide_colourbar(
      title.position = "top", title.hjust = 0.5
    )
  ) +
  labs(
    x = NULL, y = NULL
  ) +
  theme_ipsum_rc(grid="") +
  theme(axis.text = element_blank()) +
  theme(legend.position = "top") +
  theme(legend.title = element_text(size = 16, hjust = 0.5)) +
  theme(legend.key.width = unit(4, "lines")) +
  theme(legend.key.height = unit(0.5, "lines"))

FIN

I went through some extra steps for folks new to R, but the overall approach was at least as expedient as the Observable one and — despite the claims by the quite daft retweet — this is no less “shareable” or “reusable” than the Observable notebook. You can clone the repo (https://git.sr.ht/~hrbrmstr/tornado) and reuse this work immediately.

If you take a stab at an alternate approach — especially if you do use {sf} — definitely blog about it and drop a link here or on Twitter.

Political machinations are a tad insane in the U.S. these days & I regularly hit up @ProPublica & @GovTrack sites (& sub to the GovTrack e-mail updates) as I try to be an informed citizen, especially since I’ve got a Senator and Representative who seem to be in the sway of ?.

I’ve always appreciated the ProPublica and GovTrack cartograms as they present a great deal of information in a compact space (especially the House versions). Something nudged me into starting an R package to let folks create them in R (mainly with ggplot2 but an htmlwidget version is planned), which I’ve dubbed voteogram.

With the voteogram package, you can:

  • pull ProPublica roll call vote data for the 101st Congress up through today (via roll_call())
  • plot ProPublica-esque Senate roll call vote cartograms
  • plot ProPublica-esque House roll call vote cartograms
  • plot GovTrack-esque House roll call vote cartograms

GovTrack uses — what I’ve seen @thosjleeper refer to as — a “parliamentary plot” for their version of the Senate roll call cartogram and sir Leeper already has that type of plot covered in ggparliament, so I’ve just focused on the other ones here.

Roll Call

You need data for these cartogram generation functions and you can specify your own populated data frame (the needed columns are in the manual pages for the cartogram plotters). However, you’ll likely want to plot existing data that others have tallied and ProPublica makes that super simple since each vote is in a standalone JSON file. All you have to do is specify whether you want the roll call vote for the house or senate, the Congress number (current one is 115), the session number (current one is 1) and the roll call vote number.

For example, we can see all the ~~idiots~~ Representatives who voted, recently, to ~~kill people~~ repeal the ACA with the following function call:

(h256 <- roll_call("house", 115, 1, 256))
## 115th Congress / Session: 1 / House Roll Call: 256 / May  4, 2017
## 
## American Health Care Act
## 
## Result: Passed

str(h256, max.level = 1)
## List of 29
##  $ vote_id              : chr "H_115_1_256"
##  $ chamber              : chr "House"
##  $ year                 : int 2017
##  $ congress             : chr "115"
##  $ session              : chr "1"
##  $ roll_call            : int 256
##  $ needed_to_pass       : int 216
##  $ date_of_vote         : chr "May  4, 2017"
##  $ time_of_vote         : chr "02:18 PM"
##  $ result               : chr "Passed"
##  $ vote_type            : chr "RECORDED VOTE"
##  $ question             : chr "On Passage"
##  $ description          : chr "American Health Care Act"
##  $ nyt_title            : chr "On Passage"
##  $ total_yes            : int 217
##  $ total_no             : int 213
##  $ total_not_voting     : int 1
##  $ gop_yes              : int 217
##  $ gop_no               : int 20
##  $ gop_not_voting       : int 1
##  $ dem_yes              : int 0
##  $ dem_no               : int 193
##  $ dem_not_voting       : int 0
##  $ ind_yes              : int 0
##  $ ind_no               : int 0
##  $ ind_not_voting       : int 0
##  $ dem_majority_position: chr "No"
##  $ gop_majority_position: chr "Yes"
##  $ votes                :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  435 obs. of  11 variables:
##  - attr(*, "class")= chr [1:2] "pprc" "list"

As you can see, it has a custom print function and the usable data (for cartographic needs) is in $votes. You can go to town with just that information, making bar charts or tracking individual Congress-critter votes.
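For instance, a quick tally by party and vote position (a minimal sketch; party and position are among the columns in $votes, per the package docs):

dplyr::count(h256$votes, party, position)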

Do your best to cache this data as you retrieve it. ProPublica is a non-profit and the JSON files are on AWS. While there’s a certain amount of free bandwidth-per-month allotted by Amazon’s S3 service, best to make sure you’re not tipping them over in any given month. Plus, the vote data doesn’t change once it’s recorded. Consider donating to them if you decide to always grab fresh copies.
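One low-tech way to handle the caching (a hypothetical helper; not part of the package):

cached_roll_call <- function(critter, congress, session, number, dir="rc-cache") {
  dir.create(dir, showWarnings=FALSE)
  fil <- file.path(dir, sprintf("%s_%s_%s_%s.rds", critter, congress, session, number))
  if (file.exists(fil)) return(readRDS(fil)) # votes never change once recorded
  res <- roll_call(critter, congress, session, number)
  saveRDS(res, fil)
  res
}

h256 <- cached_roll_call("house", 115, 1, 256) # hits ProPublica at most once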

There’s a fortify function for this object (it’s classed pprc) so you can pass it right into ggplot() for use or pipe it into a dplyr chain for aggregation & filtering.

House Rules

With the data in hand, we can make some cartograms (the real purpose of the package). I riffed off the ProPublica colors (and haven’t fully finished copying them yet as I need to search for 2 more categories of Independent voting colors) but you can replace them with anything you want. Just reset the scale and use the names in the exposed color value vectors.

There’s also a theme_voteogram() which is designed to augment any base theme (like hrbrthemes::theme_ipsum_rc()) (it’s much like ggthemes::theme_map()).

Here’s the ProPublica view for that particular vote:

house_carto(h256) +
  labs(x=NULL, y=NULL, 
       title="House Vote 256 - Passes American Health Care Act,\nRepealing Obamacare") +
  theme_ipsum_rc(plot_title_size = 24) +
  theme_voteogram()

The house_carto() function defaults to the ProPublica cartogram, but you can easily change that:

house_carto(rep, "gt") +
  labs(x=NULL, y=NULL, 
       title="House Vote 256 - Passes American Health Care Act,\nRepealing Obamacare") +
  theme_ipsum_rc(plot_title_size = 24) +
  theme_voteogram()

Senate Drools

The senate_carto() function likewise only has the ProPublica-esque cartogram available and works pretty much the same way after getting the Senate vote data:

sen <- roll_call("senate", 115, 1, 110)

senate_carto(sen) +
  labs(title="Senate Vote 110 - Invokes Cloture on Neil Gorsuch Nomination") +
  theme_ipsum_rc(plot_title_size = 24) +
  theme_voteogram()

FIN

There’s a bit of work left to do in the package (including an htmlwidget version). You’re invited to file PRs or Issues as you are so moved.

This is another purrr-focused post but it’s also an homage to the nascent magick package (R interface to ImageMagick) by @opencpu.

We’re starting to see/feel the impact of the increasing drought up here in southern Maine. I’ve used the data from the U.S. Drought Monitor before on the blog, but they also provide shapefiles and this seemed like a good opportunity to further demonstrate the utility of purrr and make animations directly using magick. Plus, I wanted to see the progression of the drought. Putting library() statements for purrr, magick and broom together was completely random, but I now feel compelled to find a set of functions to put into a cauldron package. But, I digress.

What does this demonstrate?

Apart from giving you an idea of the extent of the drought, working through this will help you:

  • use the quietly() function (which captures a function’s output, messages & warnings instead of printing them; see the short sketch after this list)
  • see another example of a formula function
  • illustrate the utility of map_df(), and
  • see how to create an animation pipeline for magick
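Here’s quietly() in isolation before we use it in anger; a minimal sketch of what the wrapped function returns:

quiet_log <- purrr::quietly(log)
res <- quiet_log(-1) # log(-1) would normally emit a "NaNs produced" warning
res$result
## [1] NaN
res$warnings
## [1] "NaNs produced"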

Comments are in the code and the drought gif is at the end. I deliberately only had it loop once, so refresh the image if you want to see the progression again. Also, drop a note in the comments if anything needs more exposition. (NOTE: I was fairly bad and did virtually no file cleanup in the function, so you’ll have half a year’s shapefiles in your getwd(). Consider the cleanup an exercise for the reader :-)
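(If you want a head start on that cleanup exercise, something along these lines, run after the gif is written, should do it; an untested sketch:)

file.remove(list.files(pattern="^USDM_")) # zaps the downloaded ZIPs & unzipped shapefile bits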

library(rgdal)
library(sp)
library(albersusa) # devtools::install_github("hrbrmstr/albersusa")
library(ggplot2) # devtools::install_github("hadley/ggplot2")
library(ggthemes)
library(rgeos)

# the witch's brew
library(purrr)
library(broom)
library(magick)

#' Get a drought map shapefile and turn it into a PNG
drought_map <- function(wk) {

  # need to hush some chatty functions
  hush_tidy <- quietly(tidy)

  # some are more stubborn than others
  old_warn <- getOption("warn")
  options(warn=-1)

  week <- format(wk, "%Y%m%d")

  # get the drought shapefile only if we don't have it already
  URL <- sprintf("http://droughtmonitor.unl.edu/data/shapefiles_m/USDM_%s_M.zip", week)
  (fil <- basename(URL))
  if (!file.exists(fil)) download.file(URL, fil)
  unzip(fil)

  # read in the shapefile and reduce the polygon complexity
  dr <- readOGR(sprintf("USDM_%s.shp", week),
                sprintf("USDM_%s", week),
                verbose=FALSE,
                stringsAsFactors=FALSE)

  dr <- SpatialPolygonsDataFrame(gSimplify(dr, 0.01, TRUE), dr@data)

  # separate out each drought level into its own fortified data.frame
  map(dr$DM, ~subset(dr, DM==.)) %>%
    map(hush_tidy) %>%
    map_df("result", .id="DM") -> m

  # get a conus base map (probably could have done map_data("usa"), too)
  usa_composite() %>%
    subset(!(iso_3166_2 %in% c("AK", "HI"))) %>%
    hush_tidy() -> usa

  usa <- usa$result # an artifact of using quietly()

  # this is all Ushey's fault. the utility of cmd-enter to run
  # the entire ggplot2 chain (in RStudio) turns out to have a
  # greater productivity boost (i measured it) than my shortcuts for
  # gg <- gg + snippets and hand-editing the "+" bits out when
  # editing old plot constructs. I'm not giving up on gg <- gg + tho

  # Note putting the "base" layer on top since we don't really
  # want to deal with alpha levels of the drought polygons and
  # we're only plotting the outline of the us/states, not filling
  # the interior(s).

  ggplot() +
    geom_map(data=m, map=m,
             aes(long, lat, fill=DM, map_id=id),
             color="#2b2b2b", size=0.05) +
    geom_map(data=usa, map=usa, aes(long, lat, map_id=id),
             color="#2b2b2b88", fill=NA, size=0.1) +
    scale_fill_brewer("Drought Level", palette="YlOrBr") +
    coord_map("polyconic", xlim=c(-130, -65), ylim=c(25, 50)) +
    labs(x=sprintf("Week: %s", wk)) +
    theme_map() +
    theme(axis.title=element_text()) +
    theme(axis.title.x=element_text()) +
    theme(axis.title.y=element_blank()) +
    theme(legend.position="bottom") +
    theme(legend.direction="horizontal") -> gg

  options(warn=old_warn) # put things back the way they were

  outfil <- sprintf("gg-dm-%s.png", wk)
  ggsave(outfil, gg, width=8, height=5)

  outfil

}

# - create a vector of weeks (minus the current one)
# - create the individual map PNGs
# - read the individual map PNGs into a list
# - join the images together
# - create the animated gif structure
# - write the gif to a file

seq(as.Date("2016-01-05"), Sys.Date(), by="1 week") %>%
  head(-1) %>%
  map(drought_map) %>%
  map(image_read) %>%
  image_join() %>%
  image_animate(fps=2, loop=1) %>%
  image_write("drought.gif")

NOTE: an updated, comment-free version of the above code block is in this gist and uses spdplyr::filter() vs subset(), keeps downloaded files tidy in a temporary directory and includes a progress bar vs raw, ugly download.file() messages.

The Wall Street Journal did a project piece [a while back](http://projects.wsj.com/waste-lands/) titled _"Waste Lands: America’s Forgotten Nuclear Legacy"_. They dug through [Department of Energy](http://www.lm.doe.gov/default.aspx?id=2602) and [CDC](http://www.cdc.gov/niosh/ocas/ocasawe.html) data to provide an overview of the lingering residue of this toxic time in America’s past (somehow, I have to believe the fracking phenomenon of our modern era will end up doing far more damage in the long run).

Being a somewhat interactive piece, I was able to tease out the data source behind it for this week’s challenge. I’m, once again, removing the obvious vis and re-creating a non-interactive version of the WSJ’s main map view (with some additional details).

There’s definitely a story or two here, but I felt that the overall message fell a bit flat the way the WSJ folks told it. Can you find an angle or question that tells a tale in a more compelling fashion? I added some hints in the code snippet below (and in the repo) as to how you might find additional details for each toxic site (and said details are super-scrape-able with `rvest`). I also noticed some additional external data sets that could be brought in (but I’ll leave that detective work to our contestants).

If you’re up to the task, fork [this week’s repo](https://github.com/52vis/2016-15), create a subdirectory for your submission and shoot a PR my way (notifying folks here in the comments about your submission is also encouraged).

Entries accepted up until 2016-04-20 23:59:59 EDT.

Hadley has volunteered a signed book and I think I’ll take him up on the offer for this week’s prize (unless you really want a copy of Data-Driven Security :-).

One last note: I’ve secured `52vis.com` and will be getting that configured for next week’s contest. It’ll be a nice showcase site for all the submissions.

library(albersusa) # devtools::install_github("hrbrmstr/hrbrmisc")
library(rgeos)
library(maptools)
library(ggplot2)
library(ggalt)
library(ggthemes)
library(jsonlite)
library(tidyr)
library(dplyr)
library(purrr)
 
#' WSJ Waste Lands: http://projects.wsj.com/waste-lands/
#' Data from: http://www.lm.doe.gov/default.aspx?id=2602 &
#'            http://www.cdc.gov/niosh/ocas/ocasawe.html
 
sites <- fromJSON("sites.json", flatten=TRUE)
 
#' need to replace the 0-length data.frames with at least one row of `NA`s
#' so we can safely `unnest()` them later
 
sites$locations <- map(sites$locations, function(x) {
  if (nrow(x) == 0) {
    data_frame(latitude=NA, longitude=NA, postal_code=NA, name=NA, street_address=NA)
  } else {
    x
  }
})
 
#' we'll need this later
 
sites$site.rating <- factor(sites$site.rating,
                           levels=c(3:0),
                           labels=c("Remote or no potential for radioactive contamination, based on criteria at the time of FUSRAP review.",
                                    "Referred to another agency or program, no authority to clean up under FUSRAP, or status unclear.",
                                    "Cleanup declared complete under FUSRAP.",
                                    "Cleanup in progress under the Formerly Utilized Sites Remedial Action Program (FUSRAP)."))
 
#' One teensy discrepancy:
 
nrow(sites)
 
#' ## [1] 517
#'
#' The stacked bars total on the WSJ site is 515.
#' Further complication is that sites$locations is a list column with nested
#' data.frames:
 
sites <- unnest(sites)
 
nrow(sites)
 
#' ## [1] 549
 
sum(complete.cases(sites[,c("longitude", "latitude")]))
 
#' ## [1] 352
#'
#' So, just mapping long/lat is going to miss part of the story. But, I'm just
#' providing a kick-start for folks, so I'll just map long/lat :-)
 
glimpse(sites)
 
#' ## Observations: 549
#' ## Variables: 11
#' ## $ site.city      (chr) "Flint", "Albuquerque", "Buffalo", "Los...
#' ## $ site.name      (chr) "AC Spark Plug, Dort Highway Plant", "A...
#' ## $ site.rating    (fctr) Remote or no potential for radioactive...
#' ## $ site.state     (chr) "MI", "NM", "NY", "NM", "PA", "NY", "OH...
#' ## $ site.state_ap  (chr) "Mich.", "N.M.", "N.Y.", "N.M.", "Pa.",...
#' ## $ site.slug      (chr) "1-ac-spark-plug-dort-highway-plant", "...
#' ## $ latitude       (dbl) 43.02938, NA, NA, 35.88883, 39.95295, 4...
#' ## $ longitude      (dbl) -83.65525, NA, NA, -106.30502, -75.5927...
#' ## $ postal_code    (chr) "48506", NA, NA, "87544", "19382", "100...
#' ## $ name           (chr) "", NA, NA, "", "", "", "Former Buildin...
#' ## $ street_address (chr) "1300 North Dort Highway", NA, NA, "Pue...
 
#' Note that `site.slug` can be used with this URL:
#' `http://projects.wsj.com/waste-lands/site/SITE.SLUG.HERE/` to get to
#' detail pages on the WSJ site.
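#'
#' A quick sketch of grabbing one of those detail pages with `rvest` (the
#' node selector below is a guess; inspect the page to find the real ones):
#'
# library(rvest)
# pg <- read_html(sprintf("http://projects.wsj.com/waste-lands/site/%s/", sites$site.slug[1]))
# html_text(html_nodes(pg, "p"))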
 
#' I want to use my `albersusa` mutated U.S. shapefile for this (NOTE: I'm moving
#' `albersusa` into one of the rOpenSci packages soon vs publishing it standalone to CRAN)
#' so I need to mutate the Alaska points (there are no Hawaii points).
#' This step is *not necessary* unless you plan on displaying points on this
#' mutated map. I also realized I need to provide a mutated projection translation
#' function for AK & HI for the mutated Albers maps.
 
tmp  <- data.frame(dplyr::select(filter(sites, site.state=="AK"), longitude, latitude))
coordinates(tmp) <- ~longitude+latitude
proj4string(tmp) <- CRS(us_longlat_proj)
tmp <- spTransform(tmp, CRS(us_laea_proj))
tmp <- elide(tmp, rotate=-50)
tmp <- elide(tmp, scale=max(apply(bbox(tmp), 1, diff)) / 2.3)
tmp <- elide(tmp, shift=c(-2100000, -2500000))
proj4string(tmp) <- CRS(us_laea_proj)
tmp <- spTransform(tmp, us_longlat_proj)
tmp <- as.data.frame(tmp)
 
sites[sites$site.state=="AK",]$longitude <- tmp$x
sites[sites$site.state=="AK",]$latitude <- tmp$y
 
#' and now we plot the sites
 
us_map <- fortify(usa_composite(), region="name")
 
gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="#2b2b2b", size=0.15, fill="#e5e3df")
gg <- gg + geom_point(data=sites, aes(x=longitude, y=latitude, fill=site.rating),
                      shape=21, color="white", stroke=1, alpha=1, size=3)
gg <- gg + scale_fill_manual(name="", values=c("#00a0b0", "#edc951", "#6a4a3c", "#eb6841"))
gg <- gg + coord_proj(us_laea_proj)
gg <- gg + guides(fill=guide_legend(override.aes=list(alpha=1, stroke=0.2, color="#2b2b2b", size=4)))
gg <- gg + labs(title="Waste Lands: America's Forgotten Nuclear Legacy",
                 caption="Data from the WSJ")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(legend.direction="vertical")
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(plot.title=element_text(size=18, face="bold", hjust=0.5))
gg


Folks who’ve been tracking this blog on R-bloggers probably remember [this post](https://rud.is/b/2014/11/16/moving-the-earth-well-alaska-hawaii-with-r/) where I showed how to create a composite U.S. map with an Albers projection (which is commonly referred to as AlbersUSA these days thanks to D3).

I’m not sure why I didn’t think of this earlier, but you don’t _need_ to do those geographical machinations every time you want a prettier & more inclusive map (Alaska & Hawaii have been states for a while, so perhaps we should make more of an effort to include them in both data sets and maps). After doing the map transformations, the composite shape can be saved out to a shapefile, preferably GeoJSON since (a) you can use `geojsonio::geojson_write()` to save it and (b) it’s a single file vs a ZIP/directory.
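In code, that one-time save looks something like this (a sketch using the composite object from the package described below):

library(geojsonio)
library(albersusa) # devtools::install_github("hrbrmstr/albersusa")

geojson_write(usa_composite(), file="composite_us_states.geojson")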

I did just that and saved both state and country maps out with FIPS codes and other useful data slot bits and created a small data package : [`albersusa`](https://github.com/hrbrmstr/albersusa) : with some helper functions. It’s not on CRAN yet so you need to `devtools::install_github("hrbrmstr/albersusa")` to use it. The github repo has some basic examples; here’s a slightly more complex one.

### Mapping Obesity

I grabbed an [obesity data set](http://www.cdc.gov/diabetes/data/county.html) from the CDC and put together a compact example for how to make a composite U.S. county choropleth to show obesity rates per county (for 2012, which is the most recent data). I read in the Excel file, pull out the county FIPS code and 2012 obesity rate, then build the choropleth. It’s not a whole lot of code, but that’s one main reason for the package!

library(readxl)
library(rgeos)
library(maptools)
library(ggplot2)   # devtools::install_github("hadley/ggplot2") only if you want subtitles/captions
library(ggalt)
library(ggthemes)
library(albersusa) # devtools::install_github("hrbrmstr/albersusa")
library(viridis)
library(scales)
 
# get the data and be nice to the server and keep a copy of the data for offline use
 
URL <- "http://www.cdc.gov/diabetes/atlas/countydata/OBPREV/OB_PREV_ALL_STATES.xlsx"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
 
# it's not a horrible Excel file, but we do need to hunt for the data
# and clean it up a bit. we just need FIPS & 2012 percent info
 
wrkbk <- read_excel(fil)
obesity_2012 <- setNames(wrkbk[-1, c(2, 61)], c("fips", "pct"))
obesity_2012$pct <- as.numeric(obesity_2012$pct) / 100
 
# I may make a version of this that returns a fortified data.frame but
# for now, we just need to read the built-in saved shapefile and turn it
# into something ggplot2 can handle
 
cmap <- fortify(counties_composite(), region="fips")
 
# and this is all it takes to make the map below
 
gg <- ggplot()
gg <- gg + geom_map(data=cmap, map=cmap,
                    aes(x=long, y=lat, map_id=id),
                    color="#2b2b2b", size=0.05, fill=NA)
gg <- gg + geom_map(data=obesity_2012, map=cmap,
                    aes(fill=pct, map_id=fips),
                    color="#2b2b2b", size=0.05)
gg <- gg + scale_fill_viridis(name="Obesity", labels=percent)
gg <- gg + coord_proj(us_laea_proj)
gg <- gg + labs(title="U.S. Obesity Rate by County (2012)",
                subtitle="Content source: Centers for Disease Control and Prevention",
           caption="Data from http://www.cdc.gov/diabetes/atlas/countydata/County_ListofIndicators.html")
gg <- gg + theme_map(base_family="Arial Narrow")
gg <- gg + theme(legend.position=c(0.8, 0.25))
gg <- gg + theme(plot.title=element_text(face="bold", size=14, margin=margin(b=6)))
gg <- gg + theme(plot.subtitle=element_text(size=10, margin=margin(b=-14)))
gg


### Fin

Note that some cartographers think of this particular map view the way I look at a pie chart, but it’s a compact & convenient way to keep the states/counties together and will make it easier to include Alaska & Hawaii in your cartographic visualizations.

The composite GeoJSON files are in:

– `system.file("extdata/composite_us_states.geojson.gz", package="albersusa")`
– `system.file("extdata/composite_us_counties.geojson.gz", package="albersusa")`

if you want to use them in another program/context.
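For example, reading the counties file back in is straightforward (a sketch; decompress first since `readOGR()` wants a plain file):

fil <- system.file("extdata/composite_us_counties.geojson.gz", package="albersusa")
tmp <- tempfile(fileext=".geojson")
writeLines(readLines(gzfile(fil)), tmp) # gzfile() decompresses on read
counties <- rgdal::readOGR(tmp, "OGRGeoJSON", stringsAsFactors=FALSE)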

Drop an issue [on github](https://github.com/hrbrmstr/albersusa) if you want any more default fields in the data slot and if you “need” territories (I’d rather have a PR for the latter tho :-).

My wife tricked me into a partial-weekend project to try to get all the primary/caucus results to-date on a map (the whole U.S.). This is challenging since not all states use counties as boundaries for aggregate results. I’m still piecing together some shapefiles for the primary/caucus summation boundaries for a couple remaining states, but I didn’t want to let the data source for the election results go without a mention.

The bestest part of the `iframe` below (which can be busted with [this link](/projects/primaryplotting.html)) is the CNN JSON link. You can discover those with Developer Tools on any modern browser. Here’s [the rest](https://gist.github.com/hrbrmstr/25a53e2fcaee2aafa908) of those links (using a gist to add enough layers of redirection to hopefully keep this data free/available).

It’s really well-formatted JSON. As of this post, not all those links completely work (the Maine & PR results weren’t certified yet). Please credit the hard-working folks at CNN whenever/wherever you use this data (if you use it at all). Making a resource like this available is a great service (even if it wasn’t 100% intentional).

The rest of the post shows how to display the voting % per top-candidate in each Texas county. Because Texas uses counties for roll-up aggregation, we can also use `tigris` to get great maps.
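The `tigris` bit really is a one-liner (a sketch; `cb=TRUE` pulls the smaller cartographic-boundary shapes):

library(tigris)

tx_counties <- counties(state="TX", cb=TRUE)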



I saw this post over at NatGeo over the weekend and felt compelled to replicate it with ggplot2.

Three shapefiles later and we have it close enough to toss into a post (and I really don’t believe the continent names are necessary).

library(rgdal)
library(ggplot2)
library(ggthemes)
library(ggalt) # devtools::install_github("hrbrmstr/ggalt")

# grab these from http://rud.is/dl/quakefiles.tgz

world <- readOGR("countries.geo.json", "OGRGeoJSON", stringsAsFactors=FALSE)
plates <- readOGR("plates.json", "OGRGeoJSON", stringsAsFactors=FALSE)
quakes <- readOGR("quakes.json", "OGRGeoJSON", stringsAsFactors=FALSE)

world_map <- fortify(world)
plates_map <- fortify(plates)
quakes_dat <- data.frame(quakes)
quakes_dat$trans <- quakes_dat$mag %% 5

gg <- ggplot()
gg <- gg + geom_cartogram(data=world_map, map=world_map,
                          aes(x=long, y=lat, map_id=id),
                          color="white", size=0.15, fill="#d8d8d6")
gg <- gg + geom_cartogram(data=plates_map, map=plates_map,
                          aes(x=long, y=lat, map_id=id),
                          color="black", size=0.1, fill="#00000000", alpha=0)
gg <- gg + geom_point(data=quakes_dat,
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1/3, color="#d47e5d", fill="#00000000")
gg <- gg + geom_point(data=subset(quakes_dat, mag>7.5),
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1, color="black", fill="#00000000")
gg <- gg + geom_text(data=subset(quakes_dat, mag>7.5),
                     aes(x=coords.x1, y=coords.x2, label=sprintf("Mag %2.1f", mag)),
                     color="black", size=3, vjust=c(3.9, 3.9, 5), fontface="bold")
gg <- gg + scale_size(name="Magnitude", trans="exp", labels=c(5:8), range=c(1, 20))
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.05, 0.99))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(legend.key=element_rect(color="#00000000"))
gg


I can only imagine how many mouse clicks that would be in a GIS program.


Avast, me hearties! It’s time fer th’ annual International Talk Like a Pirate Day #rstats post!

(OK, I won’t make you suffer continuous pirate-speak for the entire post)

I tried to be a bit more practical this year and have two treasuRe chests for you to (hopefully) enjoy.

A Package Full o’ Pirates

I’ve covered the Anti-shipping Activity Messages (ASAM) Database for TLAPD before, but getting, updating and working with the data has more pain points than it should, so I wrapped a small package around it.

Here’s how to get all pirate attacks this year (2015) so far:

# devtools::install_github("hrbrmstr/asam")
library(asam)
 
data(asam_shp)
pirates <- subset(asam_shp,
                  grepl("pirate", Aggressor, ignore.case=TRUE) &
                  format(DateOfOcc, "%Y") == "2015")
 
nrow(pirates)
## [1] 78

It looks like there have been 78 registered pirate attacks this year. The National Geospatial-Intelligence Agency (NGA) marks the attacks by lat/lon and also by region/subregion, and I managed to obtain the official polygons for these regions, so we can plot these attacks on a world map and also show the subregions:

library(ggplot2)
 
# get the ASAM subregion polygons
subregions <- asam_subregions()
subregions_map <- fortify(subregions)
 
# get the world map
world <- map_data("world")
 
# get the points for the pirate attack occurrences
pirate_pts <- data.frame(pirates)
 
gg <- ggplot()
 
# world map layer
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=region),
                    color="black", fill="#e7e7e7", size=0.15)
# ASAM regions layer
gg <- gg + geom_map(data=subregions_map, map=subregions_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", fill="white", size=0.15, alpha=0)
 
# attacks
gg <- gg + geom_point(data=pirate_pts, color="black", fill="yellow", 
                      aes(x=coords.x1, y=coords.x2), shape=21)
 
gg <- gg + xlim(-170, 170)
gg <- gg + ylim(-58, 75)
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(panel.background=element_rect(fill="steelblue"))
gg


There is quite a bit more data than just location, though, and we can work with it much better in an interactive map.

Makin’ Interactive Pirate Maps

Now, what makes the following an interactive pirate map is not so much that we’ll be plotting points of pirate attacks on a Leaflet map, but that we’ll also be using a pirate treasure map theme for the Leaflet map tiles.

Let’s start with showing how to use a general pirate map theme before adding in the ASAM data.

You’ll have to pause here and head on over to MapBox to register for a (free) account. You’ll need to go through the gyrations to eventually get a public token and mapbox id to use the pirate map tiles they have. I store those in my .Renviron so I don’t have to cut/paste inane strings when I need to use this, or other, APIs or need to keep them from prying eyes. Since MapBox exposes these strings in GET call URLs, the use of environment variables is strictly for convenience in this case.
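The two `.Renviron` entries look like this (placeholder values, obviously):

MAPBOX_PUBLIC_TOKEN=pk.your_public_token_here
PIRATE_MAP_ID=your_mapbox_map_id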

library(leaflet)
 
mapbox_public_token <- Sys.getenv("MAPBOX_PUBLIC_TOKEN")
mapbox_map_id <- Sys.getenv("PIRATE_MAP_ID")
mapbox_url <- "https://a.tiles.mapbox.com/v4/%s/{z}/{x}/{y}.png?access_token=%s"
mapbox_tiles_template <- sprintf(mapbox_url, mapbox_map_id, mapbox_public_token)

Now, what good is a pirate map without an ‘X’ marking the spot for some treasure? For that we’ll need an ‘X’ in a format we can use with Leaflet:

x_marker <- icons("http://rud.is/dl/x.png",
                  iconHeight=64, iconWidth=64,
                  iconAnchorX=32, iconAnchorY=32)

Now, we can display a pirate map for all scurvy dogs to see:

leaflet() %>%
  addTiles(mapbox_tiles_template) %>%
  setView(lng=-50.9249, lat=45.68929, zoom=3) %>%
  addMarkers(-70.2667, 43.6667, icon=x_marker)

NOTE: I have not buried treasure in Portland, Maine, but go nuts digging at that location if you still want to.

Pirates on Pirate Maps

We can make a [crude] interactive ASAM browser by combining our data from above with our new, pirate-y mapping capabilities:

library(asam)
library(sp)
library(dplyr)
library(leaflet)
 
data(asam_shp)
dat <- subset(asam_shp,
              DateOfOcc > as.Date("2015-01-01") &
                grepl("pirate", Aggressor, ignore.case=TRUE))
# could also do data.frame(dat)
dat <- bind_cols(dat@data, data.frame(coordinates(dat), stringsAsFactors=FALSE))

We’ll build a popup with the ASAM incident description fields and add it and the pirate incident points to a pirate-themed Leaflet map:

popup_template <- '<div style="background:#f3e0b5; padding:10px"><b>Date:</b> %s
<span style="float:right"><a target="_blank" href="https://msi.nga.mil/NGAPortal/msi/query_results.jsp?MSI_queryType=ASAM&amp;MSI_generalFilterType=SpecificNumber&amp;MSI_generalFilterValue=%s&amp;MSI_additionalFilterType1=None&amp;MSI_additionalFilterType2=-999&amp;MSI_additionalFilterValue1=-999&amp;MSI_additionalFilterValue2=-999&amp;MSI_outputOptionType1=SortBy&amp;MSI_outputOptionType2=-999&amp;MSI_outputOptionValue1=Date_DESC&amp;MSI_outputOptionValue2=-999&amp;MSI_MAP=-999">ASAM Record</a>
</span><br/>
<b>Victim:</b> %s<br/>
<b>Description:</b> %s</div>'
 
nona <- function(x) ifelse(is.na(x), " ", x)
 
pirate_pops <- sprintf(popup_template,
                       dat$DateOfOcc, # the date field name in the ASAM data
                       gsub("-", "_", dat$Reference),
                       dat$Victim,
                       paste(nona(dat$Descript), # paste(), not paste0(), so sep=" " works
                             nona(dat$Desc1), nona(dat$Desc2), nona(dat$Desc3),
                             nona(dat$Desc4), nona(dat$Desc5), nona(dat$Desc6),
                             sep=" "))
 
mapbox_public_token <- Sys.getenv("MAPBOX_PUBLIC_TOKEN")
mapbox_map_id <- Sys.getenv("PIRATE_MAP_ID")
mapbox_url <- "https://a.tiles.mapbox.com/v4/%s/{z}/{x}/{y}.png?access_token=%s"
mapbox_tiles_template <- sprintf(mapbox_url, mapbox_map_id, mapbox_public_token)
 
leaflet() %>%
  addTiles(mapbox_tiles_template) %>%
  setView(lng=-50.9249, lat=45.68929, zoom=3) %>%
  addCircles(dat$coords.x1, dat$coords.x2, radius=300,
             color="#664c1f", popup=pirate_pops)

Select any of the circle marks and you’ll get a popup with a description and a link to the official ASAM record.


Fin

I’m not sure when I’ll get back to the asam package, but it could use some attention. The Aggressor field could be auto-cleaned to make it more usable and a dplyr-esque interface could be developed to select incidents. Also, since it includes a shapefile of subregions, that could also be used to do more spatial-oriented analyses of the incidents. It’s all there for any pirate lackey to pilfer.

Drop a note in the comments if you have any pirate-y creations of your own, or file an issue on github for feature requests & bug reports.