Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”)

I and @awpiii were trading news about the power outages in Maine & New Hampshire last night and he tweeted the link to the @PSNH [Outage Map](http://www.psnh.com/outage/). As if the Bing Maps tiles weren’t bad enough, the use of a categorical color scale instead of a sequential one[[1](http://earthobservatory.nasa.gov/blogs/elegantfigures/2011/05/20/qualitative-vs-sequential-color-scales/)] caused sufficient angst that I whipped up an alternate version in R between making pies and bread for Thanksgiving (even with power being out for us).

PSNH provides a text version of outages (by town) that ends up being a pretty clean HTML table, and a quick Google search led me to a fairly efficient town-level [shapefile](http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/adjacent-states-town-boundaries.html) for New Hampshire. With these data files at the ready, it was time to make a better map.

**Step 0 – Environment Setup**

So, I lied. There are six steps. “5” just works way better in attention-grabbing list headlines. The first one is setting up the project and loading all the libraries we’ll need. I use RStudio for most of my R coding and their IDE has the concept of a “project” which has it’s own working directory, workspace, history, and source documents separate from any other RStudio windows you have open. They are a great way to organize your analyses and experiments. I have my own “new project” [script](http://rud.is/dl/newprj.sh) that sets up additional directory structures, configures the `Rproj` file with my preferences and initializes a git repository for the project.

I also use the setup step to load up a ggplot2 map theme I keep in a gist.

# for theme_map

**Step 1 – Read in the map**

This is literally a one-liner:

nh <- readOGR("data/nhtowns/NHTOWNS_POLY.shp", "NHTOWNS_POLY")

My projects all have a `data` directory and thats where I normally store shapefiles. I used `ogrinfo -al NHTOWNS_POLY.shp` at the command line to determine the layer name.

**Step 2 – Read in the outage data**

The `rvest` package is nothing short of amazing. It makes very quick work of web scraping and—despite some newlines in the mix—this qualifies as a one-liner in my book:

outage <- html("http://www.psnh.com/outagelist/") %>%
  html_nodes("table") %>%
  html_table() %>%

That bit of code grabs the whole page, extracts all the HTML tables (there is just one in this example), turns it into a list of data frames and then returns the first one.

**Step 3 – Data wrangling**

While this step is definitely not as succinct as the two previous ones, it’s pretty straightforward:

outage <- outage[complete.cases(outage),]
colnames(outage) <- c("id", "total_customers", "without_power", "percentage_out")
outage$id <- stri_trans_totitle(outage$id)
outage$out <- cut(outage$without_power,
    breaks=c(0, 25, 100, 500, 1000, 5000, 10000, 20000, 40000),
    labels=c("1 - 25", "26 - 100", "101 - 500", "501 - 1,000",
             "1,001 - 5,000", "5,001- 10,000", "10,001 - 20,000",
             "20,001 - 40,000"))

We filter out the `NA`’s (this expunges the “total” row), rename the columns, convert the town name to the same case used in the shapefile (NOTE: I could have just `toupper`ed all the town names, but I really like this function from the `stringi` package) and then use `cut` to make an 8-level factor out of the customer outage count (to match the PSNH map legend).

**Step 4 – Preparing the map for plotting with `ggplot`**

This is another one-liner:

nh_map <- fortify(nh, region="NAME")

and makes it possible to use the town names when specifying the polygon regions we want to fill with our spiffy color scheme.

**Step 5 – Plotting the map**

It is totally possible to do this in one line, but many kittens will lose their lives if you do. I like this way of structuring the creation of a `ggplot` graphic since it makes it very easy to comment out or add various layers or customizations without worrying about stray `+` signs.

gg <- ggplot(data=nh_map, aes(map_id=id))
gg <- gg + geom_map(map=nh_map, aes(x=long, y=lat),
                    color="#0e0e0e", fill="white", size=0.2)
gg <- gg + geom_map(data=outage, map=nh_map, aes(fill=out),
                    color="#0e0e0e", size=0.2)
gg <- gg + scale_fill_brewer(type="seq", palette="RdPu",
                             name="Number of\ncustomer outages\nin each town")
gg <- gg + coord_equal()
gg <- gg + labs(title=sprintf("%s Total PSNH Customers Without Power",
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")

That sequence starts the base `ggplot` object creation, sets up the base map colors and then overlays the town outage colors. We use the `RdPu` [Color Brewer](http://colorbrewer2.org/) sequential palette and give the legend the same title as the PSNH counterpart.

The shapefile is already projected (Lambert Conformal Conic—take a look at it with `ogrinfo -al`), so we can get away with using `coord_equal` vs re-projecting it, and we do a tally of outages to stick in the title. My base `theme_map` is designed for Maine, hence the extra `theme` call to move the legend.

**The Finished Product**

Crisp SVG polygons, no cluttered Bing Maps tiles and a proper color palette.


All the code is [up on github](https://github.com/hrbrmstr/psnh).

Cover image from Data-Driven Security
Amazon Author Page

1 Comment Power Outage Impact Choropleths In 5 Steps in R (featuring rvest & RStudio “Projects”)

  1. David

    Your map theme gist uses the %+replace% operator, which is new to me. Is that part of dplyr? I can’t find the relevant documentation but it seems to work for me with that library loaded.


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.