Category Archives: gis

The 30-Day Map Challenge is on again, and I’m hoping to scrounge some time to get an entry in for each day. Day 2 is lines (Day 1 was posted on Twitter only). While I’m hoping to focus on saving U.S. democracy for the majority of the entries, today’s is a short one that shows all the walks/hikes/boats we took during our Iceland vacation.

I use HealthFit to export data (automagically) from all my Apple Watch, Garmin, and Peloton activities, and have it auto-sync to iCloud, which means I have instant access to all the Garmin FIT files it creates in ~/Library/Mobile Documents/iCloud~com~altifondo~HealthFit/.

We can use {FITfileR} to read in those files, turn the points into lines, and plot them all on a {leaflet} map, so I can get an interactive view of the paths.

If I had more time, I’d add activity names and other clickable statistics in popups, which is pretty straightforward in {leaflet} (a quick sketch of the idea follows the widget code below).

Comments in code hopefully explain the workflow:

library(sf)
library(FITfileR) # remotes::install_github("grimbough/FITfileR")
library(leaflet)
library(tidyverse)

# get a listing of all the files for when we were in Iceland
list.files(
  path = "~/Library/Mobile Documents/iCloud~com~altifondo~HealthFit/Documents", 
  pattern = "(2021-07-3[01]|2021-08-0[1-9])",
  full.names = TRUE
) %>% 
  map(
    ~readFitFile(.x) %>%          # read the file in
      records() %>%               # turn into something we can then bind into a data frame
      bind_rows() %>%             # bind ^^ into a data frame!
      mutate(file = basename(.x)) # add the file info in case we want to eventually make a popup of information
  ) -> iceland_fit_files


# look at the data

iceland_fit_files[[1]]
## # A tibble: 2,210 × 12
##    timestamp           position_lat position_long gps_accuracy altitude distance speed temperature heart_rate cadence
##    <dttm>                     <dbl>         <dbl>        <dbl>    <dbl>    <dbl> <dbl>       <dbl>      <dbl>   <dbl>
##  1 2021-07-30 06:28:26         64.2         -21.9            2     21       318. 0.221          13        116      NA
##  2 2021-07-30 06:28:27         64.2         -21.9            2     21       318. 0.521          13        116      NA
##  3 2021-07-30 06:28:28         64.2         -21.9            2     21       319. 0.792          13        116      NA
##  4 2021-07-30 06:28:47         64.2         -21.9            2     19.4     338. 0.217          13        118      NA
##  5 2021-07-30 06:28:48         64.2         -21.9            2     19.2     338. 0.147          13        118      NA
##  6 2021-07-30 06:28:49         64.2         -21.9            2     19       338. 0.109          13        117      NA
##  7 2021-07-30 06:28:50         64.2         -21.9            2     19       338. 0.076          13        117      NA
##  8 2021-07-30 06:28:51         64.2         -21.9            2     18.8     338. 0.036          13        116      NA
##  9 2021-07-30 06:28:52         64.2         -21.9            2     18.6     338. 0.153          13        115      NA
## 10 2021-07-30 06:28:53         64.2         -21.9            2     18.4     339. 0.393          13        115      NA
## # … with 2,200 more rows, and 2 more variables: fractional_cadence <dbl>, file <chr>

iceland_fit_files[map_lgl(iceland_fit_files, has_name, "position_long")] %>% # only want activities with geo data
  map(
    ~.x %>% 
      filter(!is.na(position_long), !is.na(position_lat)) %>% # {sf} hates NAs
      st_as_sf(
        coords = c("position_long", "position_lat"), # turn the data frame into an {sf} object
        crs = 4326
      )
  ) %>% 
  bind_rows() %>% # this makes one big data frame
  group_by(file) %>%  # which we can turn into individual geometries
  summarise(          # with the epic summarise() function
    m = max(distance)
  ) %>% 
  st_cast("LINESTRING") -> paths # and then turn the points into linestrings

# let's take a look

paths
## Simple feature collection with 33 features and 2 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -23.92788 ymin: 63.40121 xmax: -16.18086 ymax: 65.08101
## Geodetic CRS:  WGS 84
## # A tibble: 33 × 3
##    file                                        m                                                                    geometry
##    <chr>                                   <dbl>                                                            <LINESTRING [°]>
##  1 2021-07-30-062401-Walking-Chetzmoka.fit 2807. (-21.94386 64.15495, -21.94382 64.15497, -21.94384 64.15496, -21.94388 64.…
##  2 2021-07-30-101146-Walking-Chetzmoka.fit 1191. (-21.93504 64.14761, -21.93502 64.1476, -21.93503 64.14761, -21.93501 64.1…
##  3 2021-07-30-105706-Walking-Chetzmoka.fit 1554. (-21.91706 64.12906, -21.91704 64.12905, -21.91702 64.12905, -21.917 64.12…
##  4 2021-07-30-143247-Walking-Chetzmoka.fit 1620. (-21.94977 64.15683, -21.94978 64.15685, -21.94979 64.15686, -21.94995 64.…
##  5 2021-07-31-122122-Walking-Chetzmoka.fit  702. (-22.26431 64.7628, -22.26463 64.76297, -22.26527 64.76332, -22.26496 64.7…
##  6 2021-07-31-131832-Walking-Chetzmoka.fit 2084. (-22.84353 64.90553, -22.84351 64.90553, -22.84349 64.90553, -22.84347 64.…
##  7 2021-07-31-182725-Walking-Chetzmoka.fit  578. (-22.72256 65.0809, -22.72255 65.0809, -22.72254 65.08089, -22.72252 65.08…
##  8 2021-08-01-105420-Walking-Chetzmoka.fit 1210. (-23.63732 64.79827, -23.6373 64.79829, -23.6373 64.79829, -23.6373 64.798…
##  9 2021-08-01-142847-Walking-Chetzmoka.fit 2385. (-23.80382 64.73048, -23.8038 64.73047, -23.8038 64.73047, -23.8038 64.730…
## 10 2021-08-01-165642-Walking-Chetzmoka.fit  423. (-23.92745 64.85198, -23.92749 64.85197, -23.92747 64.85197, -23.92746 64.…
## # … with 23 more rows

# make room locally

dir.create("~/projects/2021-iceland", showWarnings = FALSE)

# save the widget out

paths %>% 
  filter(!grepl("2021-08-10", file)) %>% 
  leaflet() %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolylines(color = "green") %>% 
  htmlwidgets::saveWidget(
    file = "~/projects/2021-iceland/index.html",
    selfcontained = FALSE
  )

# rsync it out to my website
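
As promised above, a popup version is only a few more lines. Here’s a minimal sketch of the idea, reusing the `file` and `m` (max distance) columns we already carry in `paths`:

# same map as above, but with a per-activity popup showing the
# source FIT filename and the max recorded distance
paths %>% 
  filter(!grepl("2021-08-10", file)) %>% 
  leaflet() %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolylines(
    color = "green",
    popup = ~sprintf("%s<br/>Max distance: %s m", htmltools::htmlEscape(file), round(m))
  )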

This is the (static) overview:

And, this is a zoom into our boating tour of a glacier lagoon:

Hit up the widget to see where we did our Iceland activities this summer!

My wife tricked me into a partial-weekend project to try to get all the primary/caucus results to-date on a map (the whole U.S.). This is challenging since not all states use counties as boundaries for aggregate results. I’m still piecing together some shapefiles for the primary/caucus summation boundaries for a couple remaining states, but I didn’t want to let the data source for the election results go without a mention.

The bestest part of the `iframe` below (which can be busted with [this link](/projects/primaryplotting.html)) is the CNN JSON link. You can discover those with Developer Tools on any modern browser. Here’s [the rest](https://gist.github.com/hrbrmstr/25a53e2fcaee2aafa908) of those links (using a gist to add enough layers of redirection to hopefully keep this data free/available).

It’s really well-formatted JSON. As of this post, not all those links completely work (the Maine & PR results weren’t certified yet). Please credit the hard-working folks at CNN whenever/wherever you use this data (if you use it at all). Making a resource like this available is a great service (even if it wasn’t 100% intentional).

The rest of the post shows how to display the voting % per top-candidate in each Texas county. Because Texas uses counties for roll-up aggregation, we can also use `tigris` to get great maps.
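
A minimal sketch of joining county-level results to a {tigris} county map (the vote shares below are random stand-ins purely for illustration; the real values would come from parsing one of the CNN JSON links above):

library(sf)
library(tigris)
library(tidyverse)

options(tigris_class = "sf") # have {tigris} hand back {sf} objects

tx <- counties(state = "TX", cb = TRUE) # generalized TX county boundaries

# stand-in for the per-county top-candidate share parsed from the CNN JSON
set.seed(42)
fake_results <- tibble(
  GEOID = tx$GEOID,
  top_pct = runif(nrow(tx), 20, 80)
)

left_join(tx, fake_results, by = "GEOID") %>% 
  ggplot() +
  geom_sf(aes(fill = top_pct), color = "white", size = 0.1) +
  scale_fill_viridis_c(name = "Top candidate %") +
  theme_void()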



I saw this post over at NatGeo over the weekend and felt compelled to replicate their global earthquake map with ggplot2.

Three shapefiles later and we have it close enough to toss into a post (and I really don’t believe the continent names are necessary).

library(rgdal)
library(ggplot2)
library(ggthemes)
library(ggalt) # devtools::install_github("hrbrmstr/ggalt")

# grab these from http://rud.is/dl/quakefiles.tgz

world <- readOGR("countries.geo.json", "OGRGeoJSON", stringsAsFactors=FALSE)
plates <- readOGR("plates.json", "OGRGeoJSON", stringsAsFactors=FALSE)
quakes <- readOGR("quakes.json", "OGRGeoJSON", stringsAsFactors=FALSE)

world_map <- fortify(world)
plates_map <- fortify(plates)
quakes_dat <- data.frame(quakes)
quakes_dat$trans <- quakes_dat$mag %% 5 # map magnitudes (~5-8) onto 0-3 for the exp-transformed size scale below

gg <- ggplot()
gg <- gg + geom_cartogram(data=world_map, map=world_map,
                          aes(x=long, y=lat, map_id=id),
                          color="white", size=0.15, fill="#d8d8d6")
gg <- gg + geom_cartogram(data=plates_map, map=plates_map,
                          aes(x=long, y=lat, map_id=id),
                          color="black", size=0.1, fill="#00000000", alpha=0)
gg <- gg + geom_point(data=quakes_dat,
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1/3, color="#d47e5d", fill="#00000000")
gg <- gg + geom_point(data=subset(quakes_dat, mag>7.5),
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1, color="black", fill="#00000000")
gg <- gg + geom_text(data=subset(quakes_dat, mag>7.5),
                     aes(x=coords.x1, y=coords.x2, label=sprintf("Mag %2.1f", mag)),
                     color="black", size=3, vjust=c(3.9, 3.9, 5), fontface="bold")
gg <- gg + scale_size(name="Magnitude", trans="exp", labels=c(5:8), range=c(1, 20))
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.05, 0.99))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(legend.key=element_rect(color="#00000000"))
gg

I can only imagine how many mouse clicks that would be in a GIS program.

I engage with the Stack[Overflow|Exchange] community quite a bit and was super-happy @treycausey made the [Stack Overflow bot](https://twitter.com/StackOverflowR) (@StackOverflowR), since I’m also on Twitter a lot (mostly hanging out in #rstats these days).

However, questions exist in other Stack watering holes, like the [Geographic Information Systems Stack Exchange](http://gis.stackexchange.com/questions/tagged/r), [Cross Validated](http://stats.stackexchange.com/questions/tagged/r) and the fledgling [Data Science Stack Exchange](http://datascience.stackexchange.com/questions/tagged/r). They are all easy enough to follow in your favorite RSS reader (which _is_ @feedly, _right_?), but it’s also equally easy to turn them into Twitter bots (here they are):

– @DataSciSERBot (Data Science Stack Exchange posts)
– @GISStackExchR (Geographic Information Systems posts)
– @RStatsStExBot (Cross Validated Stack Exchange posts)

They use [this IFTTT recipe](https://ifttt.com/recipes/322136-twitter-bot-for-data-science-stack-exchange-rstats-questions) to take the RSS feeds from each Stack community and turn them into tweets. Each forum (and, hence, Twitter bot) has _way_ less volume than @StackOverflowR and also tends to be of less general interest than @StackOverflowR. However, these specialized question forums usually have _really_ interesting problems & answers that I’ve learned a great deal from. You may get inspiration for a solution to something, see a really neat way to accomplish a task, get an idea for a new R package or just gain new and useful knowledge in areas previously unfamiliar to you.

They won’t be spamming the Twitter channel (the errant tag on one of the tweets was fixed in the recipe), so you’ll have to follow them or deliberately check in on them to see the updates.

I’ll try to keep an eye out for other Stack communities that feature R content and add them to the bot family. If you see any I’ve missed or am missing in the future, drop a comment here or a note to @hrbrmstr on Twitter.

Poynter did a nice interactive piece on world population by income (i.e. “How Many Live on How Much, and Where”). I’m always on the lookout for optimized shapefiles and clean data (I’m teaching a data science certificate program starting this Fall) and the speed of the site load and the easy availability of the data set made this one a “must acquire”. Rather than just repeat Poynter’s D3-goodness, here’s a way to look at the income data in series of small multiple choropleths—using R & ggplot2—that involves:

  • downloading data & shapefiles from a web site
  • using dplyr & tidyr for data munging
  • applying custom fill color scale mapping in ggplot
  • ordering plots with a custom facet order (using factors)
  • tweaking the theme and aesthetics for a nicely finished result

By using D3, Poynter inherently made the data available. Pop open the “Developer Tools” in any browser, reload the page and look at the “Network” tab and you’ll see a list of files (you can sometimes see things in the source code, but this technique is often faster). The income data is a well-formed CSV file (http://www.pewglobal.org/wp-content/themes/pew-global/interactive-global-class.csv) and their highly optimized world map (http://www.pewglobal.org/wp-content/lib/js/world-geo.json) was also easy to discern. We’ll start by grabbing the map and using the same map projection that Poynter did (Robinson). Don’t be put off by all the library calls since one of the best parts of R is the ever-increasing repository of great packages to help you get things done.

library(httr)     # getting data
library(rgdal)    # working with shapefile
library(dplyr)    # awesome data manipulation
library(readr)    # faster reading of CSV data
library(stringi)  # string manipulation
library(stringr)  # string manipulation
library(tidyr)    # reshaping data
library(grid)     # for 'unit'
library(scales)   # for 'percent'
library(ggplot2)  # plotting
library(ggthemes) # theme_map
 
# this ensures you only download the shapefile once and hides
# errors and warnings. remove `try` and `invisible` to see messages
try(invisible(GET("http://www.pewglobal.org/wp-content/lib/js/world-geo.json",
                  write_disk("world-geo.json"))), silent=TRUE)
 
# use ogrListLayers("world-geo.json") to see file type & 
# layer info to use in the call to readOGR
 
world <- readOGR("world-geo.json", "OGRGeoJSON")
world_wt <- spTransform(world, CRS("+proj=robin"))
world_map <- fortify(world_wt)

I would have liked to do fortify(world_wt, region="name") (since that makes working with filling in countries by name much easier in the choropleth part of the code) but that generated TopologyException errors (I’ve seen this happen quite a bit with simplified/optimized shapefiles and some non-D3 geo-packages). One can sometimes fix those with a strategic rgeos::gBuffer call, but that didn’t work well in this case. We can still use country names with a slight rejiggering of the fortified data frame using dplyr:
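
For reference, that strategic {rgeos} call is usually the zero-width buffer idiom; a quick sketch (it clears up many TopologyException problems, just not this one):

library(rgeos)

# a zero-width buffer rebuilds each polygon in place; gBuffer() returns
# bare SpatialPolygons, so the @data slot has to be reattached
world_fixed <- SpatialPolygonsDataFrame(
  gBuffer(world, byid=TRUE, width=0),
  world@data
)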

world_map %>%
  left_join(data_frame(id=rownames(world@data), name=world@data$name)) %>%
  select(-id) %>%
  rename(id=name) -> world_map

Now it’s time to get the data. The CSV file has annoying spaces in it that cause R to interpret all the columns as strings, so we can use dplyr again to get them into the format we want. Note that I’m also making the percentages decimals so we can use percent later on to easily format them.

# a good exercise would be to repeat the download code above
# rather than make repeated calls to an external resource
read_csv("http://www.pewglobal.org/wp-content/themes/pew-global/interactive-global-class.csv") %>%
  mutate_each(funs(str_trim)) %>%
  filter(id != "None") %>%
  mutate_each(funs(as.numeric(.)/100), -name, -id) -> dat

For this post, we’ll only be working with the actual share percentages, so let’s:

  • ignore the “change” columns
  • convert the data frame from wide to long
  • extract out the income levels (e.g. “Poor”, “Low Income”…)
  • set a factor order for them so our plots will be in the correct sequence

dat %>%
  gather(share, value, starts_with("Share"), -name, -id) %>%
  select(-starts_with("Change")) %>%
  mutate(label=factor(stri_trans_totitle(str_match(share, "Share ([[:alpha:]- ]+),")[,2]),
                      c("Poor", "Low Income", "Middle Income", "Upper-Middle Income", "High Income"),
                      ordered=TRUE)) -> share_dat

The stringi package is really handy (stringr is built on it, too). The stri_trans_totitle function alleviates some mundane string operations and the stri_replace_all_regex (below) also allows us to do vectorized regular expression replacements without a ton of code.

To keep the charts aligned, we’ll use Poynter’s color scale (which was easy to extract from the site’s code) and use the same legend breaks via `cut`. We’ll also format the labels for these breaks to make our legend nicer to view.

# use same "cuts" as poynter
poynter_scale_breaks <- c(0, 2.5, 5, 10, 25, 50, 75, 80, 100)
 
sprintf("%2.1f-%s", poynter_scale_breaks, percent(lead(poynter_scale_breaks/100))) %>%
  stri_replace_all_regex(c("^0.0", "-NA%"), c("0", "%"), vectorize_all=FALSE) %>%
  head(-1) -> breaks_labels
 
share_dat %>%
  mutate(`Share %`=cut(value,
                       c(0, 2.5, 5, 10, 25, 50, 75, 80, 100)/100,
                       breaks_labels)) -> share_dat
 
share_pal <- c("#eaecd8", "#d6dab3", "#c2c98b", "#949D48", "#6e7537", "#494E24", "#BB792A", "#7C441C", "#ffffff")

Finally, we get to the good part and start plotting the visualization. There are only two layers: the base map and the choropleth filled map. We then:

  • apply our manual color palette to them
  • remove the line color slashes in the legend boxes
  • setup the overall plot label
  • tell ggplot which coordinate system to use (in this case coord_equal is fine since we already projected the points)
  • apply a base theme that’s good for mapping
  • tweak the text and ensure our legend is in the position we want it to be in

gg <- ggplot()
 
gg <- gg + geom_map(data=world_map, map=world_map,
                    aes(x=long, y=lat, group=group, map_id=id),
                    color="#7f7f7f", fill="white", size=0.15)
 
gg <- gg + geom_map(data=share_dat, map=world_map,
                    aes(map_id=name, fill=`Share %`),
                    color="#7f7f7f", size=0.15)
 
gg <- gg + scale_fill_manual(values=share_pal)
 
gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))
gg <- gg + labs(title="World Population by Income\n")
gg <- gg + facet_wrap(~label, ncol=2)
 
gg <- gg + coord_equal()
 
gg <- gg + theme_map()
 
gg <- gg + theme(panel.margin=unit(1, "lines"))
gg <- gg + theme(plot.title=element_text(face="bold", hjust=0, size=24))
gg <- gg + theme(legend.title=element_text(face="bold", hjust=0, size=12))
gg <- gg + theme(legend.text=element_text(size=10))
gg <- gg + theme(strip.text=element_text(face="bold", size=10))
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(legend.position="bottom")
 
gg

And, here’s the result (click for larger version):

The optimized shapefile makes for a very fast plot, and you can plot individual choropleths by filtering the data and not using facets.
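
For example, a single-panel version is just a filter() on one income level with the facet_wrap() call dropped (drop=FALSE keeps the full legend); a quick sketch:

gg <- ggplot()
gg <- gg + geom_map(data=world_map, map=world_map,
                    aes(x=long, y=lat, group=group, map_id=id),
                    color="#7f7f7f", fill="white", size=0.15)
gg <- gg + geom_map(data=filter(share_dat, label=="Poor"), map=world_map,
                    aes(map_id=name, fill=`Share %`),
                    color="#7f7f7f", size=0.15)
gg <- gg + scale_fill_manual(values=share_pal, drop=FALSE)
gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))
gg <- gg + labs(title="World Population by Income (Poor)\n")
gg <- gg + coord_equal()
gg <- gg + theme_map()
gg <- gg + theme(legend.position="bottom")
gg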

While there are a number of choropleth packages out there for R, learning how to do the core components on your own can (a) make you appreciate those packages a bit more and (b) give you the skills to build your own when you need a more customized version. Many of the theme tweaks will also apply to the ggplot-based choropleth packages.

With this base, it should be a fun exercise to the reader to do something similar with Poynter’s “percentage point change” choropleth. You’ll need to change the color palette and manipulate different data columns to get the same scales and visual representation they do. Drop a note in the comments if you give this a go!

You can find all the code in this post in one convenient gist.

There’s been lots of buzz about “statebin” maps of late. A recent tweet by @andrewxhill referencing work by @dannydb pointed to a nice shapefile (alternate link) that ends up being a really great way to handle statebin maps (and I feel like a fool for not considering it for a more generic solution earlier).

Here’s a way to use the GeoJSON version in R. I like GeoJSON since it’s a single file vs a directory of files and is readable vs binary. If you’re in a TL;DR hurry, you can just review the code in this gist. Read on for expository.

Taking a look around

When you download the GeoJSON, it should be in a file called us_states_hexgrid.geojson. We can see what’s in there with R pretty easily:

library(rgdal)

ogrInfo("us_states_hexgrid.geojson", "OGRGeoJSON")

## Source: "us_states_hexgrid.geojson", layer: "OGRGeoJSON"
## Driver: GeoJSON number of rows 51 
## Feature type: wkbPolygon with 2 dimensions
## Extent: (-137.9747 26.39343) - (-69.90286 55.3132)
## CRS: +proj=longlat +datum=WGS84 +no_defs  
## Number of fields: 6 
##         name type length typeName
## 1 cartodb_id    0      0  Integer
## 2 created_at    4      0   String
## 3 updated_at    4      0   String
## 4      label    4      0   String
## 5       bees    2      0     Real
## 6  iso3166_2    4      0   String

Along with the basic shapefile goodness, we have some data, too! We’ll use all this to make a choropleth hexbin of “bees” (I have no idea what this is but assume it has something to do with bee population, which is a serious problem on the planet right now). Let’s dig in.

Plotting the bins

First we need to read in the data, which is pretty simple:


us <- readOGR("us_states_hexgrid.geojson", "OGRGeoJSON")

That ends up being a fairly complex object with polygons and data. However, we can take a quick look at it with base R graphics:


plot(us)

Yay! While we could do most (if not all) of the remaining graphics in base R, I personally believe ggplot is more intuitive and expressive, so let's do the same thing with ggplot. First, we'll have to get the data structure into something ggplot can handle:

library(ggplot2)

us_map <- fortify(us, region="iso3166_2")

That gives us a data frame with the 2-letter state abbreviations as the "region" keys. Now we can do a basic ggplot:

ggplot(data=us_map, aes(map_id=id, x=long, y=lat)) + 
  geom_map(map=us_map, color="black", fill="white")

Ugh. Talk about ugly. But, at least it works! Now all we need to do is turn it into a choropleth, remove some map chart junk and make it look prettier!

Upping the aesthetics

There's a pretty good idiom for making maps in R. There's a handy layer/geom called geom_map which takes care of a ton of details under the hood. We can use it for making outlines and fills and add as many layers of them as we want/need. For our needs, we'll:

  • put down a base layer of polygons
  • add a fill layer for our data
  • get rid of map chart junk

This is all pretty straightforward once you get the hang of it:

gg <- ggplot()

# Plot base map -----------------------------------------------------------

gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)

# Plot filled polygons ----------------------------------------------------

gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2))

# Remove chart junk for the "map" -----------------------------------------

gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg

Definitely better, but it still needs work. Outlines would be good and it definitely needs a better color palette. It would also be nice if the polygons weren't "warped". We can fix these issues by adding in a few other elements:

gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2))

# Overlay borders without ugly line on legend -----------------------------

gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(map_id=iso3166_2),
                    fill="#ffffff", alpha=0, color="white",
                    show_guide=FALSE)

# ColorBrewer scale; using distiller for discrete vs continuous -----------

gg <- gg + scale_fill_distiller(palette="RdPu", na.value="#7f7f7f")

# coord_map mercator works best for the display ---------------------------

gg <- gg + coord_map()

gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg

Much better. We use a "hack" to keep the legend free of white slash marks for the polygon outlines (see the comments for a less-hackish way) and coord_map to let the projection handle the "unwarping". By using the distiller fill, we get discrete color bins vs continuous shades (use what you feel is most appropriate, though, for your own work).
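
One less-hackish route (the same override.aes trick from the Poynter walkthrough earlier on this page) is to put the white outline on the fill layer itself and blank it in the legend keys; a sketch, assuming a discrete-style legend for the fill:

# outline the fill layer directly...
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2),
                    color="white", size=0.5)
# ...and strip the outline color from the legend keys
gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))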

Where are we?

Most statebin/hexbin maps still (probably) need state labels since (a) Americans are notoriously bad at geography and (b) even if they were good at geography, we've removed much of the base references for folks to work from accurately.

The data exists in the shapefile, but to get the labels put in the centers of each polygon we have to do a bit of work:

library(rgeos)

centers <- cbind.data.frame(data.frame(gCentroid(us, byid=TRUE), id=us@data$iso3166_2))

That gets us a data frame of the x & y centers of each polygon along with the (abbreviated) state name. We can now add a layer with geom_text to place the label. The following is the complete solution:

library(rgdal)
library(rgeos)
library(ggplot2)

us <- readOGR("us_states_hexgrid.geojson", "OGRGeoJSON")

centers <- cbind.data.frame(data.frame(gCentroid(us, byid=TRUE), id=us@data$iso3166_2))

us_map <- fortify(us, region="iso3166_2")

gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2))
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(map_id=iso3166_2),
                    fill="#ffffff", alpha=0, color="white",
                    show_guide=FALSE)
gg <- gg + geom_text(data=centers, aes(label=id, x=x, y=y), color="white", size=4)
gg <- gg + scale_fill_distiller(palette="RdPu", na.value="#7f7f7f")
gg <- gg + coord_map()
gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg

Wrapping up

This is a pretty neat way to work with "statebins" and I'll probably take some time over the summer to update my statebins package to use shapefiles and allow for more generic shapes. Ramnath Vaidyanathan has also done some work with statebins and javascript, so I'll see what I can do to merge all the functionality into one package.

If you've got an alternate way to work with these or have some interesting "bins" to show, drop a note in the comments.

@awpiii and I were trading news about the power outages in Maine & New Hampshire last night, and he tweeted the link to the @PSNH [Outage Map](http://www.psnh.com/outage/). As if the Bing Maps tiles weren’t bad enough, the use of a categorical color scale instead of a sequential one[[1](http://earthobservatory.nasa.gov/blogs/elegantfigures/2011/05/20/qualitative-vs-sequential-color-scales/)] caused sufficient angst that I whipped up an alternate version in R between making pies and bread for Thanksgiving (even with power being out for us).

PSNH provides a text version of outages (by town) that ends up being a pretty clean HTML table, and a quick Google search led me to a fairly efficient town-level [shapefile](http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/adjacent-states-town-boundaries.html) for New Hampshire. With these data files at the ready, it was time to make a better map.

**Step 0 – Environment Setup**

So, I lied. There are six steps. “5” just works way better in attention-grabbing list headlines. The first one is setting up the project and loading all the libraries we’ll need. I use RStudio for most of my R coding and their IDE has the concept of a “project”, which has its own working directory, workspace, history, and source documents separate from any other RStudio windows you have open. They are a great way to organize your analyses and experiments. I have my own “new project” [script](http://rud.is/dl/newprj.sh) that sets up additional directory structures, configures the `Rproj` file with my preferences and initializes a git repository for the project.

I also use the setup step to load up a ggplot2 map theme I keep in a gist.

library(sp)
library(rgdal)
library(dplyr)
library(rvest)
library(stringi)
library(scales)
library(RColorBrewer)
library(ggplot2)
 
# for theme_map
devtools::source_gist("https://gist.github.com/hrbrmstr/33baa3a79c5cfef0f6df")

**Step 1 – Read in the map**

This is literally a one-liner:

nh <- readOGR("data/nhtowns/NHTOWNS_POLY.shp", "NHTOWNS_POLY")

My projects all have a `data` directory and that’s where I normally store shapefiles. I used `ogrinfo -al NHTOWNS_POLY.shp` at the command line to determine the layer name.

**Step 2 – Read in the outage data**

The `rvest` package is nothing short of amazing. It makes very quick work of web scraping and—despite some newlines in the mix—this qualifies as a one-liner in my book:

outage <- html("http://www.psnh.com/outagelist/") %>%
  html_nodes("table") %>%
  html_table() %>%
  .[[1]]

That bit of code grabs the whole page, extracts all the HTML tables (there is just one in this example), turns it into a list of data frames and then returns the first one.

**Step 3 – Data wrangling**

While this step is definitely not as succinct as the two previous ones, it’s pretty straightforward:

outage <- outage[complete.cases(outage),]
colnames(outage) <- c("id", "total_customers", "without_power", "percentage_out")
outage$id <- stri_trans_totitle(outage$id)
outage$out <- cut(outage$without_power,
    breaks=c(0, 25, 100, 500, 1000, 5000, 10000, 20000, 40000),
    labels=c("1 - 25", "26 - 100", "101 - 500", "501 - 1,000",
             "1,001 - 5,000", "5,001- 10,000", "10,001 - 20,000",
             "20,001 - 40,000"))

We filter out the `NA`’s (this expunges the “total” row), rename the columns, convert the town name to the same case used in the shapefile (NOTE: I could have just `toupper`ed all the town names, but I really like this function from the `stringi` package) and then use `cut` to make an 8-level factor out of the customer outage count (to match the PSNH map legend).

**Step 4 – Preparing the map for plotting with `ggplot`**

This is another one-liner:

nh_map <- fortify(nh, region="NAME")

and makes it possible to use the town names when specifying the polygon regions we want to fill with our spiffy color scheme.

**Step 5 – Plotting the map**

It is totally possible to do this in one line, but many kittens will lose their lives if you do. I like this way of structuring the creation of a `ggplot` graphic since it makes it very easy to comment out or add various layers or customizations without worrying about stray `+` signs.

gg <- ggplot(data=nh_map, aes(map_id=id))
gg <- gg + geom_map(map=nh_map, aes(x=long, y=lat),
                    color="#0e0e0e", fill="white", size=0.2)
gg <- gg + geom_map(data=outage, map=nh_map, aes(fill=out),
                    color="#0e0e0e", size=0.2)
gg <- gg + scale_fill_brewer(type="seq", palette="RdPu",
                             name="Number of\ncustomer outages\nin each town")
gg <- gg + coord_equal()
gg <- gg + labs(title=sprintf("%s Total PSNH Customers Without Power",
                              comma(sum(outage$without_power))))
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

That sequence starts the base `ggplot` object creation, sets up the base map colors and then overlays the town outage colors. We use the `RdPu` [Color Brewer](http://colorbrewer2.org/) sequential palette and give the legend the same title as the PSNH counterpart.

The shapefile is already projected (Lambert Conformal Conic—take a look at it with `ogrinfo -al`), so we can get away with using `coord_equal` vs re-projecting it, and we do a tally of outages to stick in the title. My base `theme_map` is designed for Maine, hence the extra `theme` call to move the legend.
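
Had the shapefile not come pre-projected, a `spTransform` call before the `fortify` step (the same move as the Robinson projection step in the Poynter post above) would handle it; a quick sketch re-projecting to plain long/lat:

# only needed when the source isn't already projected: transform first,
# then fortify, and use coord_map() instead of coord_equal()
nh_ll  <- spTransform(nh, CRS("+proj=longlat +datum=WGS84"))
nh_map <- fortify(nh_ll, region="NAME")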

**The Finished Product**

Crisp SVG polygons, no cluttered Bing Maps tiles and a proper color palette.

![img](http://rud.is/dl/psnh.svg)

All the code is [up on github](https://github.com/hrbrmstr/psnh).