Overcoming D3 Cartographic Envy With R + ggplot

When I used one of the Scotland TopoJSON files for a [recent post](https://rud.is/b/2014/09/20/chartingmapping-the-scottish-vote-with-r-rvestdplyrtidyrtopojsonggplot/), it really hit me just how much D3 cartography envy I had/have as an R user. Don’t get me wrong, I can conjure up D3 maps pretty well [[1](http://vcdb.org/explore.html)] [[2](http://datadrivensecurity.info/blog/posts/2014/Jan/malicious-cartography-part-1/)] and the utility of an interactive map visualization goes without saying, but we can make great static maps in R without a great deal of effort, so I decided to replicate a few core examples from the [D3 topojson gallery](https://github.com/mbostock/topojson/wiki/Gallery) in R.

I chose five somewhat different examples, each focusing on various aspects of creating map layers and trying to not be too U.S. focused. Here they are (hit the main link to go to the gist for the example and the `bl.ocks` URL to see it’s D3 counterpart):

– [Swiss Cantons](https://gist.github.com/hrbrmstr/e3d0dc87eaacf7bbece7) – R version of [http://bl.ocks.org/mbostock/4207744](http://bl.ocks.org/mbostock/4207744)
– [Costa Rica](https://gist.github.com/hrbrmstr/08eab5535619daaa281e) – R version of [http://bl.ocks.org/arce/9357998](http://bl.ocks.org/arce/9357998)
– [Area Choropleth](https://gist.github.com/hrbrmstr/9bda23f79ce2a5776733) – R version of [http://bl.ocks.org/mbostock/4206573](http://bl.ocks.org/mbostock/4206573)
– [Block Counties](https://gist.github.com/hrbrmstr/b7ffe6137c4ff6acd009) – R version of [http://bl.ocks.org/mbostock/4206857](http://bl.ocks.org/mbostock/4206857)
– [County Circles](https://gist.github.com/hrbrmstr/6c6ffda42a93e2f8b380) (OK, Ovals) – R version of [http://bl.ocks.org/mbostock/4206975](http://bl.ocks.org/mbostock/4206975)

I used the TopoJSON/GeoJSON files provided with each example, so you’ll need a recent [gdal](http://www.gdal.org/) (>= 1.11), and—consequently—a suitable build of `rgdal`) to work through the examples.

### The Core Mapping Idiom

While the details may vary with each project you work on, the basic flow to present a map in R with `ggplot` are:

– read in a map features (I use `readOGR` in these examples)
– convert that into something `ggplot` can handle
– identify values you wish to pair with those features (optional if we’re just plotting a plain map)
– determine which portion of the map is to be displayed
– plot the map features

Words & abbreviations mean things, just like map symbols mean things, and if you’re wondering what this “OGR” is, here’s the answer from the [official FAQ](https://trac.osgeo.org/gdal/wiki/FAQGeneral#WhatdoesOGRstandfor):

>_OGR used to stand for OpenGIS Simple Features Reference Implementation. However, since OGR is not fully compliant with the OpenGIS Simple Feature specification and is not approved as a reference implementation of the spec the name was changed to OGR Simple Features Library. The only meaning of OGR in this name is historical. OGR is also the prefix used everywhere in the source of the library for class names, filenames, etc._

The `readOGR` function can work with a wide variety of file formats and OGR files can hold a wide variety of data. The most basic use for our mapping is to read in these TopoJSON/GeoJSON files and use the right features from them to make our maps. Features/layers can be almost anything (counties, states, countries, rivers, lakes, etc) and we can see what features we want to work with by using the `ogrListLayers` function (you can do this from an operating system command line as well, but we’ll stay in R for now). Let’s take a look at the layers available in the map from the Costa Rica example:

## [1] "limites"    "provincias" "cantones"   "distritos" 
## [1] "GeoJSON"
## [1] 4

Those translate to “country”, “provinces”, “cantons”, & “districts”. Each layer has polygons and associated data for the polygons (and overall layer), including information about the type of projection. If you’re sensing a “math trigger warning”, fear not; I won’t be delving into to much more cartographic detail as you probably just want to see the maps & code.

### Swiss Cantons

If you’re from the U.S. you (most likely) have no idea what a canton is. The quickest explanation is that it is an administrative division within a country and, in this specific example, the 26 cantons of Switzerland are the member states of the federal state of Switzerland.

The [D3 Swiss Cantons](http://bl.ocks.org/mbostock/4207744) uses a TopoJSON/GeoJSON file that has only one layer (i.e. the cantons) along with metadata about the canton `id` and `name`:

ogrInfo("readme-swiss.json", "cantons")
## Source: "readme-swiss.json", layer: "cantons"
## Driver: GeoJSON number of rows 26 
## Feature type: wkbPolygon with 2 dimensions
## Extent: (5.956 45.818) - (10.492 47.808)
## Number of fields: 2 
##   name type length typeName
## 1   id    4      0   String
## 2 name    4      0   String

>NOTE: you should learn to get pretty adept with the OGR functions or command-line tools as you can do some really amazing things with them, including extracting only certain features, simplifying the polygons or fixing issues. Some of the TopoJSON/GeoJSON files you’ll find with D3 examples may have missing or invalid components and you can fix some of them with these tools. We’ll be working around errors and missing values in these examples.

The D3 example displays the canton `name` at the [centroid](http://en.wikipedia.org/wiki/Centroid) of the polygon, so that’s what we’ll do in R:

library(rgdal) # needs gdal > 1.11.0
# ggplot map theme
map = readOGR("readme-swiss.json", "cantons")
map_df <- fortify(map)

The `map` object is a `SpatialPolygonsDataFrame` and has a fairly complex structure:

## [1] "data"        "polygons"    "plotOrder"   "bbox"       
## [5] "proj4string"
## [1] "id"   "name"
# execute these on your own and poke around the data structures after determining the class

The `fortify` function turns all that into something we can use with `ggplot`. Normally, we’d be able to get `fortify` to associate the canton `name` to the polygon points it encodes via the `region` parameter. That did not work with these TopoJSON/GeoJSON files and I didn’t really poke around much to determine why since it’s easy enough to work around. In this case, I manually merged the names with the fortified map data frame.

#  create mapping for id # to name since "region=" won't work
dat <- data.frame(id=0:(length(map@data$name)-1), canton=map@data$name)
map_df <- merge(map_df, dat, by="id")

We can get the centroid via the `gCentroid` function, and we’ll make a data frame of those center points and the name of the canton for use with a `geom_text` layer after plotting the base outlines of the cantons (with a rather bland fill, but I didn’t pick the color):

# find canton centers
centers <- data.frame(gCentroid(map, byid=TRUE))
centers$canton <- dat$canton
# make a map!
gg <- ggplot()
gg <- gg + geom_map(data=map_df, map=map_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="#ffffff", fill="#bbbbbb", size=0.25)
# gg <- gg + geom_point(data=centers, aes(x=x, y=y))
gg <- gg + geom_text(data=centers, aes(label=canton, x=x, y=y), size=3)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Swiss Cantons")
gg <- gg + theme_map()

The `coord_map()` works with the `mapproj` package to help us display maps in reasonable projections (or really dumb ones). The default is `”mercator”` and we’ll stick with that since the D3 examples use it (but, [winkel-tripel](http://www.winkel.org/other/Winkel%20Tripel%20Projections.htm) FTW!).

Here’s the result of our hard work (select map for larger version):

If you ignore the exposition above and just take into account non-blank source code lines, we did all that in ~16LOC and have a scaleable SVG file as a result. You can have some fun with the above code and remove the static `fill=”#bbbbbb”` and move it to the `mapping` aesthetic parameter and tie it’s value to the canton name.

### Costa Rica

The TopoJSON/GeoJSON file provided with the [D3 example](http://bl.ocks.org/arce/9357998) is a good example of encoding multiple layers into a single file (see the first `ogrListLayers` above). We’ll create a fortified version of each layer and then plot each with a `geom_map` layer using different line colors, sizes and fills:

limites = readOGR("division.json", "limites")
provincias = readOGR("division.json", "provincias")
cantones = readOGR("division.json", "cantones")
distritos = readOGR("division.json", "distritos")
limites_df <- fortify(limites)
cantones_df <- fortify(cantones)
distritos_df <- fortify(distritos)
provincias_df <- fortify(provincias)
gg <- ggplot()
gg <- gg + geom_map(data=limites_df, map=limites_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="white", fill="#dddddd", size=0.25)
gg <- gg + geom_map(data=cantones_df, map=cantones_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="red", fill="#ffffff00", size=0.2)
gg <- gg + geom_map(data=distritos_df, map=distritos_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="#999999", fill="#ffffff00", size=0.1)
gg <- gg + geom_map(data=provincias_df, map=provincias_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="black", fill="#ffffff00", size=0.33)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Costa Rica TopoJSON")
gg <- gg + theme_map()

The result is pretty neat and virtually identical to the D3 version:

Try playing around with the order of the `geom_map` layers (or remove some) and also the line color/size/fill & alpha values to see how it changes the map.

### Area Choropleth

I’m not a huge fan of the colors used in the [D3 version](http://bl.ocks.org/mbostock/4206573) and I’m not going to spend any time moving Hawaii & Alaska around (that’s a whole different post). **But**, I _will_ show how to make a similar area choropleth:

# read in the county borders
map = readOGR("us.json", "counties")
# calculate (well retrieve) the area since it's part of the polygon structure
# and associate it with the polygon id so we can use it later. We need to do
# the merge manually again since the "us.json" file threw errors again when
# trying to use the fortify "region" parameter.
map_area <- data.frame(id=0:(length(map@data$id)-1),
                       area=sapply(slot(map, "polygons"), slot, "area") )
# read in the state borders
states = readOGR("us.json", "states")
states_df <- fortify(states)
# create map data frame and merge area info
map_df <- fortify(map)
map_df <- merge(map_df, map_area, by="id")
gg <- ggplot()
# thin white border around counties and shades of yellow-green for area (log scale)
gg <- gg + geom_map(data=map_df, map=map_df,
                    aes(map_id=id, x=long, y=lat, group=group, fill=log1p(area)),
                    color="white", size=0.05)
# thick white border for states
gg <- gg + geom_map(data=states_df, map=states_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="white", size=0.5, alpha=0)
gg <- gg + scale_fill_continuous(low="#ccebc5", high="#084081")
# US continental extents - not showing alaska & hawaii
gg <- gg + xlim(-124.848974, -66.885444)
gg <- gg + ylim(24.396308, 49.384358)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Area Choropleth")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")

Play with the colors and use different values instead of the polygon area (perhaps use `sample` or `runif` to generate some data) to see how it changes the choropleth outcome.

### Blocky Counties

The example from the [D3 wiki](http://bl.ocks.org/mbostock/4206857) is more “how to work with shapefiles and map coordinates” than it is useful, but we have the same flexibility in R, so we’ll make the same plot by using the `bbox` function to make a data frame of bounding boxes we can use with `geom_rect` (there’s _no_ `geom_map` in this example, just using the coordinate system to plot boxes):

# use the topojson from the bl.ocks example
map = readOGR("us.json", "counties")
# build our map data frame of rects
map_df <- do.call("rbind", lapply(map@polygons, function(p) {
  b <- bbox(p) # get bounding box of polygon and put it into a form we can use later
  data.frame(xmin=b["x", "min"],
             xmax=b["x", "max"],
             ymin=b["y", "min"],
             ymax=b["y", "max"])
map_df$id <- map$id # add the id even though we aren't using it now
gg <- ggplot(data=map_df)
gg <- gg + geom_rect(aes(xmin=xmin, xmax=xmax,
                         ymin=ymin, ymax=ymax),
                     color="steelblue", alpha=0, size=0.25)
# continental us only
gg <- gg + xlim(-124.848974, -66.885444)
gg <- gg + ylim(24.396308, 49.384358)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Blocky Counties")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")

To re-emphasize we’re just working with `ggplot` layers, so play around and, perhaps color in only the odd numbered counties.

### County Circles (OK, Ovals)

The last [D3 example](http://bl.ocks.org/mbostock/4206975) I’m copying swaps squares for circles, which makes this more of a challenge to do in R+ggplot since `ggplot` has no “circle” geom (and holey `geom_point`s _do not count_). So, we’ll borrow and slightly adapt a [function from StackOverflow](http://stackoverflow.com/a/6863490/1457051) by [joran](http://stackoverflow.com/users/324364/joran) that builds a data frame of polygon points derived by a center & diameter. We’ll add an `id` value (for each of the counties) and make one _really_ big data frame (well, big for use in `ggplot`) that we can then plot as grouped `geom_path`s. Unlike our cantons example, the `gCentroid` function coughed up errors on this TopoJSON/GeoJSON file, so I resorted to approximating the center from the rectangular bounding box. Also, I don’t project the circle coordinates before plotting, so they’re ovals. While it doesn’t mirror the D3 example perfectly, it should help reinforce how to work with the map’s metadata and draw arbitrary components on a map:

# adapted from http://stackoverflow.com/questions/6862742/draw-a-circle-with-ggplot2
# computes a circle from a given diameter. we add "id" so we can have one big
# data frame and group them for plotting
circleFun <- function(id, center = c(0,0),diameter = 1, npoints = 100){
    r = diameter / 2
    tt <- seq(0,2*pi,length.out = npoints)
    xx <- center[1] + r * cos(tt)
    yy <- center[2] + r * sin(tt)
    return(data.frame(id=id, x = xx, y = yy))
# us topojson from the bl.ocks example
map = readOGR("us.json", "counties")
# this topojson file gives rgeos_getcentroid errors here
# so we approximate the centroid
map_df <- do.call("rbind", lapply(map@polygons, function(p) {
  b <- bbox(p)
  data.frame(x=b["x", "min"] + ((b["x", "max"] - b["x", "min"]) / 2),
             y=b["y", "min"] + ((b["y", "max"] - b["y", "min"]) / 2))
# get area & diameter
map_df$area <- sapply(slot(map, "polygons"), slot, "area")
map_df$diameter <- sqrt(map_df$area / pi) * 2
# make our big data frame of circles
circles <- do.call("rbind", lapply(1:nrow(map_df), function(i) {
  circleFun(i, c(map_df[i,]$x, map_df[i,]$y), map_df[i,]$diameter)
gg <- ggplot(data=circles, aes(x=x, y=y, group=id))
gg <- gg + geom_path(color="steelblue", size=0.25)
# continental us
gg <- gg + xlim(-124.848974, -66.885444)
gg <- gg + ylim(24.396308, 49.384358)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="County Circles (OK, Ovals)")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")

If you poke around a bit at the various map libraries in R, you should be able to figure out how to get those plotted as circles (and learn alot in the process).

### Wrapping Up

R `ggplot` maps won’t and shouldn’t replace D3 maps for many reasons, paramount of which is interactivity. The generated SVG files are also fairly large and the non-SVG versions don’t look nearly as crisp (and aren’t as flexible). However, this should be a decent introductory primer on mapping and shapefiles and might come in handy when you want to use R to enhance maps with other data and write out (yep, R can read _and_ write OGR) your own shapefiles for use in D3 (or other tools/languages).

Don’t forget that all source code (including TopoJSON/GeoJSON files and sample SVGs) are in their own gists:

– [Swiss Cantons](https://gist.github.com/hrbrmstr/e3d0dc87eaacf7bbece7)
– [Costa Rica](https://gist.github.com/hrbrmstr/08eab5535619daaa281e)
– [Area Choropleth](https://gist.github.com/hrbrmstr/9bda23f79ce2a5776733)
– [Block Counties](https://gist.github.com/hrbrmstr/b7ffe6137c4ff6acd009)
– [County Circles](https://gist.github.com/hrbrmstr/6c6ffda42a93e2f8b380) (OK, Ovals)

If you figure out what is causing some of the errors I mentioned or make some epic maps of your own, don’t hesitate to drop a note in the comments.

Cover image from Data-Driven Security
Amazon Author Page

8 Comments Overcoming D3 Cartographic Envy With R + ggplot

  1. Pingback: Somewhere else, part 169 | Freakonometrics

  2. Pingback: Moving The Earth (well, Alaska & Hawaii) With R | rud.is

  3. Pingback: Making Static/Interactive Voronoi Map Layers In ggplot/leaflet | rud.is

  4. Michael Herradora

    Hi, I was trying to run the examples but I always have the same error:

    Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, useiconv = useiconv, :
    Cannot open file

    What can I do?


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.