I and @awpiii were trading news about the power outages in Maine & New Hampshire last night and he tweeted the link to the @PSNH [Outage Map](http://www.psnh.com/outage/). As if the Bing Maps tiles weren’t bad enough, the use of a categorical color scale instead of a sequential one[[1](http://earthobservatory.nasa.gov/blogs/elegantfigures/2011/05/20/qualitative-vs-sequential-color-scales/)] caused sufficient angst that I whipped up an alternate version in R between making pies and bread for Thanksgiving (even with power being out for us).
PSNH provides a text version of outages (by town) that ends up being a pretty clean HTML table, and a quick Google search led me to a fairly efficient town-level [shapefile](http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/adjacent-states-town-boundaries.html) for New Hampshire. With these data files at the ready, it was time to make a better map.
**Step 0 – Environment Setup**
So, I lied. There are six steps. “5” just works way better in attention-grabbing list headlines. The first one is setting up the project and loading all the libraries we’ll need. I use RStudio for most of my R coding and their IDE has the concept of a “project” which has it’s own working directory, workspace, history, and source documents separate from any other RStudio windows you have open. They are a great way to organize your analyses and experiments. I have my own “new project” [script](http://rud.is/dl/newprj.sh) that sets up additional directory structures, configures the `Rproj` file with my preferences and initializes a git repository for the project.
I also use the setup step to load up a ggplot2 map theme I keep in a gist.
library(sp) library(rgdal) library(dplyr) library(rvest) library(stringi) library(scales) library(RColorBrewer) library(ggplot2) # for theme_map devtools::source_gist("https://gist.github.com/hrbrmstr/33baa3a79c5cfef0f6df")
**Step 1 – Read in the map**
This is literally a one-liner:
nh <- readOGR("data/nhtowns/NHTOWNS_POLY.shp", "NHTOWNS_POLY")
My projects all have a `data` directory and thats where I normally store shapefiles. I used `ogrinfo -al NHTOWNS_POLY.shp` at the command line to determine the layer name.
**Step 2 – Read in the outage data**
The `rvest` package is nothing short of amazing. It makes very quick work of web scraping and—despite some newlines in the mix—this qualifies as a one-liner in my book:
outage <- html("http://www.psnh.com/outagelist/") %>% html_nodes("table") %>% html_table() %>% .[[1]]
That bit of code grabs the whole page, extracts all the HTML tables (there is just one in this example), turns it into a list of data frames and then returns the first one.
**Step 3 – Data wrangling**
While this step is definitely not as succinct as the two previous ones, it’s pretty straightforward:
outage <- outage[complete.cases(outage),] colnames(outage) <- c("id", "total_customers", "without_power", "percentage_out") outage$id <- stri_trans_totitle(outage$id) outage$out <- cut(outage$without_power, breaks=c(0, 25, 100, 500, 1000, 5000, 10000, 20000, 40000), labels=c("1 - 25", "26 - 100", "101 - 500", "501 - 1,000", "1,001 - 5,000", "5,001- 10,000", "10,001 - 20,000", "20,001 - 40,000"))
We filter out the `NA`’s (this expunges the “total” row), rename the columns, convert the town name to the same case used in the shapefile (NOTE: I could have just `toupper`ed all the town names, but I really like this function from the `stringi` package) and then use `cut` to make an 8-level factor out of the customer outage count (to match the PSNH map legend).
**Step 4 – Preparing the map for plotting with `ggplot`**
This is another one-liner:
nh_map <- fortify(nh, region="NAME")
and makes it possible to use the town names when specifying the polygon regions we want to fill with our spiffy color scheme.
**Step 5 – Plotting the map**
It is totally possible to do this in one line, but many kittens will lose their lives if you do. I like this way of structuring the creation of a `ggplot` graphic since it makes it very easy to comment out or add various layers or customizations without worrying about stray `+` signs.
gg <- ggplot(data=nh_map, aes(map_id=id)) gg <- gg + geom_map(map=nh_map, aes(x=long, y=lat), color="#0e0e0e", fill="white", size=0.2) gg <- gg + geom_map(data=outage, map=nh_map, aes(fill=out), color="#0e0e0e", size=0.2) gg <- gg + scale_fill_brewer(type="seq", palette="RdPu", name="Number of\ncustomer outages\nin each town") gg <- gg + coord_equal() gg <- gg + labs(title=sprintf("%s Total PSNH Customers Without Power", comma(sum(outage$without_power)))) gg <- gg + theme_map() gg <- gg + theme(legend.position="right") gg
That sequence starts the base `ggplot` object creation, sets up the base map colors and then overlays the town outage colors. We use the `RdPu` [Color Brewer](http://colorbrewer2.org/) sequential palette and give the legend the same title as the PSNH counterpart.
The shapefile is already projected (Lambert Conformal Conic—take a look at it with `ogrinfo -al`), so we can get away with using `coord_equal` vs re-projecting it, and we do a tally of outages to stick in the title. My base `theme_map` is designed for Maine, hence the extra `theme` call to move the legend.
**The Finished Product**
Crisp SVG polygons, no cluttered Bing Maps tiles and a proper color palette.
All the code is [up on github](https://github.com/hrbrmstr/psnh).
One Comment
Your map theme gist uses the %+replace% operator, which is new to me. Is that part of dplyr? I can’t find the relevant documentation but it seems to work for me with that library loaded.