Category Archives: maps

In the never-ending battle for truth, justice and publishing more R
packages than [Oliver](http://twitter.com/quominus), I whipped out an R
package for the [OpenStreetMap Nominatim
API](http://wiki.openstreetmap.org/wiki/Nominatim). It actually hits the
[MapQuest Nominatim Servers](http://open.mapquestapi.com/nominatim/) for
most of the calls, but the functionality is the same.

The R package lets you:

– `address_lookup`: Look up the address of one or more OSM objects (node, way or relation)
– `osm_geocode`: Search for places by address
– `osm_search`: Search for places
– `osm_search_spatial`: Search for places, returning a list of
`SpatialPointsDataFrame`, `SpatialLinesDataFrame` or a
`SpatialPolygonsDataFrame`
– `reverse_geocode_coords`: Reverse geocode based on lat/lon
– `reverse_geocode_osm`: Reverse geocode based on OSM Type & Id

Just like Google Maps, these services are not meant to be your
freebie-access to mega-bulk-geocoding. You can and should pay for that.
But, when you need a few items geocoded (or want to lookup some
interesting things on OSM since it provides [special
phrases](http://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases)
to work with), Nominatim lookups can be just what’s needed.

Let’s say we wanted to see where pubs are in the Seattle metro area.
That’s a simple task for nominatim:

# devtools::install_github("hrbrmstr/nominatim")
library(nominatim)
library(dplyr)
 
sea_pubs <- osm_search("pubs near seattle, wa", limit=20)
 
glimpse(sea_pubs)
 
## Observations: 20
## Variables:
## $ place_id     (chr) "70336054", "82743439", "11272568", "21478701", "...
## $ licence      (chr) "Data © OpenStreetMap contributors, ODbL 1.0. htt...
## $ osm_type     (chr) "way", "way", "node", "node", "node", "node", "no...
## $ osm_id       (chr) "51460516", "96677583", "1077652159", "2123245933...
## $ lat          (dbl) 47.64664, 47.63983, 47.60210, 47.62438, 47.59203,...
## $ lon          (dbl) -122.3503, -122.3023, -122.3321, -122.3559, -122....
## $ display_name (chr) "Nickerson Street Saloon, 318, Nickerson Street, ...
## $ class        (chr) "amenity", "amenity", "amenity", "amenity", "amen...
## $ type         (chr) "pub", "pub", "pub", "pub", "pub", "pub", "pub", ...
## $ importance   (dbl) 0.201, 0.201, 0.201, 0.201, 0.201, 0.201, 0.201, ...
## $ icon         (chr) "http://mq-open-search-int-ls03.ihost.aol.com:800...
## $ bbox_left    (dbl) 47.64650, 47.63976, 47.60210, 47.62438, 47.59203,...
## $ bbox_top     (dbl) 47.64671, 47.63990, 47.60210, 47.62438, 47.59203,...
## $ bbox_right   (dbl) -122.3504, -122.3025, -122.3321, -122.3559, -122....
## $ bbox_bottom  (dbl) -122.3502, -122.3022, -122.3321, -122.3559, -122....

We can even plot those locations:

library(rgdal)
library(ggplot2)
library(ggthemes)
library(sp)
library(DT)
 
# Grab a neighborhood map of Seattle
url <- "https://data.seattle.gov/api/file_data/VkU4Er5ow6mlI0loFhjIw6eL6eKEYMefYMm4MGcUakU?filename=Neighborhoods.zip"
fil <- "seattle.zip"
if (!file.exists(fil)) download.file(url, fil)
if (!dir.exists("seattle")) unzip(fil, exdir="seattle")
 
# make it usable
sea <- readOGR("seattle/Neighborhoods/WGS84/Neighborhoods.shp", "Neighborhoods")
 
## OGR data source with driver: ESRI Shapefile 
## Source: "seattle/Neighborhoods/WGS84/Neighborhoods.shp", layer: "Neighborhoods"
## with 119 features
## It has 12 fields
 
sea_map <- fortify(sea)
 
# Get the extents of where the pubs are so we can "zoom in"
bnd_box <- bbox(SpatialPoints(as.matrix(sea_pubs[, c("lon", "lat")])))
 
# plot them
gg <- ggplot()
gg <- gg + geom_map(data=sea_map, map=sea_map,
                    aes(x=long, y=lat, map_id=id),
                    color="black", fill="#c0c0c0", size=0.25)
gg <- gg + geom_point(data=sea_pubs, aes(x=lon, y=lat),
                      color="#ffff33", fill="#ff7f00",
                      shape=21, size=4, alpha=1/2)
# decent projection for Seattle-y things and expand the zoom/clip a bit
gg <- gg + coord_map("gilbert",
                     xlim=extendrange(bnd_box["lon",], f=0.5),
                     ylim=extendrange(bnd_box["lat",], f=0.5))
gg <- gg + labs(title="Seattle Pubs")
gg <- gg + theme_map()
gg <- gg + theme(title=element_text(size=16))
gg

[Figure seattle_map-1: Seattle pubs plotted on the neighborhood map]

Of course you can geocode:

addrs <- osm_geocode(c("1600 Pennsylvania Ave, Washington, DC.",
                     "1600 Amphitheatre Parkway, Mountain View, CA",
                     "Seattle, Washington"))
addrs %>% select(display_name)
 
## Source: local data frame [3 x 1]
## 
##                                                                  display_name
## 1                  Washington, District of Columbia, United States of America
## 2 Mountainview Lane, Huntington Beach, Orange County, California, 92648, Unit
## 3                  Seattle, King County, Washington, United States of America
 
addrs %>% select(lat, lon)
 
## Source: local data frame [3 x 2]
## 
##        lat        lon
## 1 38.89495  -77.03665
## 2 33.67915 -118.02588
## 3 47.60383 -122.33006

Or, reverse geocode:

# Reverse geocode Canadian embassies
# complete list of Canadian embassies here:
# http://open.canada.ca/data/en/dataset/6661f0f8-2fb2-46fa-9394-c033d581d531
 
embassies <- data.frame(lat=c("34.53311", "41.327546", "41.91534", "36.76148", "-13.83282",
                             "40.479094", "-17.820705", "13.09511", "13.09511"),
                       lon=c("69.1835", "19.818698", "12.50891", "3.0166", "-171.76462",
                             "-3.686115", "31.043559", "-59.59998", "-59.59998"), stringsAsFactors=FALSE)
 
emb_coded_coords <- reverse_geocode_coords(embassies$lat, embassies$lon)
 
emb_coded_coords %>% select(display_name)
 
## Source: local data frame [9 x 1]
## 
##                                                                  display_name
## 1                Embassy of Canada, Ch.R.Wazir Akbar Khan, Kabul, Afghanistan
## 2 Monumenti i Skënderbeut, Skanderbeg Square, Lulishtja Këshilli i Europëes, 
## 3 Nomentana/Trieste, Via Nomentana, San Lorenzo, Salario, Municipio Roma II, 
## 4 18, Avenue Khalef Mustapha, Ben Aknoun, Daïra Bouzareah, Algiers, Ben aknou
## 5                               The Hole in the Wall, Beach Road, Āpia, Samoa
## 6 Torre Espacio, 259 D, Paseo de la Castellana, Fuencarral, Fuencarral-El Par
## 7 Leopold Takawira Street, Avondale West, Harare, Harare Province, 00263, Zim
## 8                    Bishop's Court Hill, Bridgetown, Saint Michael, Barbados
## 9                    Bishop's Court Hill, Bridgetown, Saint Michael, Barbados

It can even return `Spatial` objects (somewhat experimental):

# stock example search from OSM
osm_search_spatial("[bakery]+berlin+wedding", limit=5)[[1]]
 
##            coordinates   place_id
## 1 (13.34931, 52.54165)    9039748
## 2 (13.34838, 52.54125) 2659941153
## 3 (13.35678, 52.55138)   23586341
## 4 (13.34985, 52.54158)    7161987
## 5  (13.35348, 52.5499)   29179742
##                                                                               licence
## 1 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 2 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 3 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 4 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 5 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
##   osm_type     osm_id      lat      lon
## 1     node  939667448 52.54165 13.34931
## 2     node 3655549445 52.54125 13.34838
## 3     node 2299953786 52.55138 13.35678
## 4     node  762607353 52.54158 13.34985
## 5     node 2661679367 52.54990 13.35348
##                                                                                    display_name
## 1            Baguetterie, Föhrer Straße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
## 2 Föhrer Cafe & Backshop, Föhrer Straße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
## 3               Körfez, Amsterdamer Straße, Leopoldkiez, Wedding, Mitte, Berlin, 13347, Germany
## 4             Knusperbäcker, Torfstraße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
## 5             Hofbäckerei, Müllerstraße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
##   class   type importance
## 1  shop bakery      0.201
## 2  shop bakery      0.201
## 3  shop bakery      0.201
## 4  shop bakery      0.201
## 5  shop bakery      0.201
##                                                                                                      icon
## 1 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 2 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 3 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 4 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 5 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
##    bbox_left   bbox_top bbox_right bbox_bottom
## 1 52.5416504 52.5416504  13.349306   13.349306
## 2 52.5412496 52.5412496 13.3483832  13.3483832
## 3 52.5513806 52.5513806 13.3567785  13.3567785
## 4   52.54158   52.54158 13.3498507  13.3498507
## 5 52.5499029 52.5499029 13.3534756  13.3534756
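
Since that result is already a `SpatialPointsDataFrame`, it can go straight into `sp`'s base `plot()` method. A minimal sketch (the `bakeries` name is just illustrative):

# quick base-graphics check of the returned Spatial object
bakeries <- osm_search_spatial("[bakery]+berlin+wedding", limit=5)[[1]]
plot(bakeries, pch=19, col="#ff7f00")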

The lookup functions are vectorized but there’s a delay built in to
avoid slamming the free servers.

Some things on the TODO list are:

– enabling configuration of timeouts
– enabling switching Nominatim API server providers (you can host your
own!)
– better `Spatial` support

So, give the [code a spin](https://github.com/hrbrmstr/nominatim) and
submit feature requests/issues to github!

Despite having shown various ways to overcome D3 cartographic envy, there are always more examples that can cause the green monster to rear its ugly head.

Take the Voronoi Arc Map example.


[Figure Voronoi_Arc_Map: the D3 Voronoi Arc Map example]

For those in need of a primer, a Voronoi tessellation/diagram is:

a partitioning of a plane into regions based on distance to points in a specific subset of the plane. That set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other. (Wikipedia)

We can overlay a Voronoi tessellation on top of a map in R as well thanks to the deldir package (which has been around since the “S” days!). Let's get (most of) the package requirements cruft out of the way, first:

library(sp)
library(rgdal)
library(deldir)
library(dplyr)
library(ggplot2)
library(ggthemes)

Now we’ll [ab]use the data from the Arc Map example:

flights <- read.csv("http://bl.ocks.org/mbostock/raw/7608400/flights.csv", stringsAsFactors=FALSE)
airports <- read.csv("http://bl.ocks.org/mbostock/raw/7608400/airports.csv", stringsAsFactors=FALSE)

Since the D3 example cheats and only uses the continental US (CONUS), we'll do the same. We'll also keep only those airports mentioned in the flights data and compute the total # of incoming/outgoing flights for each airport:

conus <- state.abb[!(state.abb %in% c("AK", "HI"))]
airports <- filter(airports,
                   state %in% conus,
                   iata %in% union(flights$origin, flights$destination))
orig <- select(count(flights, origin), iata=origin, n1=n)
dest <- select(count(flights, destination), iata=destination, n2=n)
airports <- left_join(airports,
                      select(mutate(left_join(orig, dest),
                                    tot=n1+n2),
                             iata, tot)) %>% 
            filter(!is.na(tot))

Since we’re going to initially plot polygons in ggplot (and, eventually, in leaflet), we’ll need to work with Spatial objects, so let’s make those airport lat/lon pairs into a SpatialPointsDataFrame:

vor_pts <- SpatialPointsDataFrame(cbind(airports$longitude,
                                        airports$latitude),
                                  airports, match.ID=TRUE)

The deldir function returns a pretty complex object. Thankfully, the authors of the package realized that one might just want the polygons from the computation and pre-made a function: tile.list for computing/extracting them. Those polygons aren't, however, closed and we really want to keep the airport data associated with them, so we need to close the polygons and associate the data. Since we're likely going to repeat this task, let's make it a (very badly named) function:

SPointsDF_to_voronoi_SPolysDF <- function(sp) {
 
  # tile.list extracts the polygon data from the deldir computation
  vor_desc <- tile.list(deldir(sp@coords[,1], sp@coords[,2]))
 
  lapply(1:(length(vor_desc)), function(i) {
 
    # tile.list gets us the points for the polygons but we
    # still have to close them, hence the need for the rbind
    tmp <- cbind(vor_desc[[i]]$x, vor_desc[[i]]$y)
    tmp <- rbind(tmp, tmp[1,])
 
    # now we can make the Polygon(s)
    Polygons(list(Polygon(tmp)), ID=i)
 
  }) -> vor_polygons
 
  # hopefully the caller passed in good metadata!
  sp_dat <- sp@data
 
  # this way the IDs _should_ match up w/the data & voronoi polys
  rownames(sp_dat) <- sapply(slot(SpatialPolygons(vor_polygons),
                                  'polygons'),
                             slot, 'ID')
 
  SpatialPolygonsDataFrame(SpatialPolygons(vor_polygons),
                           data=sp_dat)
 
}

Before we can make the plots, we need to put the Spatial objects into the proper form for ggplot2 (and get the U.S. state map):

vor <- SPointsDF_to_voronoi_SPolysDF(vor_pts)
 
vor_df <- fortify(vor)
 
states <- map_data("state")

Now we can have some fun. Let's try to mimic the D3 example map as closely as possible. We'll lay down the CONUS map, add a points layer for the airports, sizing & styling them just like the D3 example. Note that we order the points so that the smallest ones appear on top (so we can still see them).

We’ll then lay down our newly created Voronoi layer. We’ll also use the same projection (Albers) that the D3 examples uses:

gg <- ggplot()
# base map
gg <- gg + geom_map(data=states, map=states,
                    aes(x=long, y=lat, map_id=region),
                    color="white", fill="#cccccc", size=0.5)
# airports layer
gg <- gg + geom_point(data=arrange(airports, desc(tot)),
                      aes(x=longitude, y=latitude, size=sqrt(tot)),
                      shape=21, color="white", fill="steelblue")
# voronoi layer
gg <- gg + geom_map(data=vor_df, map=vor_df,
                    aes(x=long, y=lat, map_id=id),
                    color="#a5a5a5", fill="#FFFFFF00", size=0.25)
gg <- gg + scale_size(range=c(2, 9))
gg <- gg + coord_map("albers", lat0=30, lat1=40)
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")
gg

[Figure ggplot-1: ggplot2 version of the Voronoi/airports map]

While that’s pretty, it’s not exactly useful. I’m sure there are times when it’s important to show the Voronoi polygons, but they are especially useful when they are used to help with user interface interactions.

In the case of this map, some airport “bubbles” are very small and many overlap, making a “click” (or even “hover”) a potentially painstaking task for someone looking to get more data out of the visualization. The D3 example uses Voronoi polygons to make it super-easy for the user to hover over a map area and get more info about the flights for the closest airport to the mouse pointer.

We’ll use the leaflet htmlwidget to do something similar. Until I can figure out “hover” events for R+leaflet, you’ll have to live with “click”.

First we’ll need some additional packages:

library(leaflet)
library(rgeos)
library(htmltools)

And, we’ll also need a U.S. shapefile (which we simplify since the polygons are pretty detailed and that’s not necessary for this vis):

url <- "http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_500k.json"
fil <- "gz_2010_us_040_00_500k.json"
 
if (!file.exists(fil)) download.file(url, fil, cacheOK=TRUE)
 
states_m <- readOGR("gz_2010_us_040_00_500k.json", 
                    "OGRGeoJSON", verbose=FALSE)
states_m <- subset(states_m, 
                   !NAME %in% c("Alaska", "Hawaii", "Puerto Rico"))
dat <- states_m@data # gSimplify whacks the data bits
states_m <- SpatialPolygonsDataFrame(gSimplify(states_m, 0.05,
                                               topologyPreserve=TRUE),
                                     dat, FALSE)

The leaflet vis idiom is similar to the ggplot idiom. I'm using a base tile layer since I was too lazy to figure out how to change the leaflet default gray background map color. The map polygons are added, then the circles/bubbles (note that you work in meters with addCircles which lets leaflet scale the bubbles as you zoom in/out). Finally, the Voronoi layer is added. I kept the stroke visible purely for demonstration purposes. You need to keep fill=TRUE, otherwise the Voronoi layer won't get click/hover events. Once I figure out how to trigger popups on hover and use a static popup layer, this will let users hover around the map to get the underlying airport flight information.

leaflet(width=900, height=650) %>%
  # base map
  addProviderTiles("Hydda.Base") %>%
  addPolygons(data=states_m,
              stroke=TRUE, color="white", weight=1, opacity=1,
              fill=TRUE, fillColor="#cccccc", smoothFactor=0.5) %>%
  # airports layer
  addCircles(data=arrange(airports, desc(tot)),
             lng=~longitude, lat=~latitude,
             radius=~sqrt(tot)*5000, # size is in m for addCircles O_o
             color="white", weight=1, opacity=1,
             fillColor="steelblue", fillOpacity=1) %>%
  # voronoi (click) layer
  addPolygons(data=vor,
              stroke=TRUE, color="#a5a5a5", weight=0.25,
              fill=TRUE, fillOpacity = 0.0,
              smoothFactor=0.5, 
              popup=sprintf("Total In/Out: %s",
                            as.character(vor@data$tot)))

I made the Voronoi layer very light, so you may want to keep it there as a cue for the user. How you work with it is completely up to you.

Now you have one less reason to be envious of the D3 cartographers!

The $DAYJOB doesn’t afford much opportunity to work with cartographic datasets, but I really like maps and tinker with shapefiles and geo-data when I can, plus answer a ton of geo-questions on StackOverflow. R makes it easy—one might even say too easy—to work with maps. All it takes to make a map of the continental United States (CONUS) is:

library(maps)
map("state")

[Figure base_1-1: default maps::map CONUS plot]

It’s a little more involved with ggplot2:

library(ggplot2)
library(ggthemes)

states <- map_data("state")

gg <- ggplot()
gg <- gg + geom_map(data=states, map=states,
                    aes(x=long, y=lat, map_id=region),
                    color="black", fill="white", size=0.25)
gg <- gg + theme_map()
gg

[Figure ggplot_1-1: default ggplot2 CONUS plot]

Both of those maps are horrible. The maps::map function defaults to using a rectangular projection with the aspect ratio chosen so that longitude and latitude scales are equivalent at the center of the picture. ggplot will size the graphic to the device window. Sticking with these defaults is a really bad idea. Why? I’ll let Mark Monmonier (author of How to Lie with Maps) explain:

Maps have three basic attributes: scale, projection, and symbolization. Each element is a source of distortion. As a group, they describe the essence of the map’s possibilities and limitations. No one can use maps or make maps safely and effectively without understanding map scales, map projections, and map symbols.

Map projections distort five geographic relationships: areas, angles, gross shapes, distances, and directions. Although some projections preserve local angles but not areas, others preserve areas but not local angles. All distort large shapes noticeably (but some distort continental shapes more than others), and all distort at least some distances and some directions.

There are great examples in his book on how map projections can inflate or diminish the area and relative importance of countries and regions, and how a map projection can itself become a rallying point for “cartographically oppressed” regions.

If map projections are important (and they are) what should you do? Both map and ggplot give you options to use projections that are found in the mapproj library (specifically using the mapproject function). The help for mapproject even gives an example of using the Albers equal-area conic projection when plotting the CONUS:

library(mapproj)
map("state", projection="albers", par=c(lat0=30, lat1=40))

[Figure base_2-1: CONUS in the Albers projection via mapproject]

As its full name suggests, Albers is an equal-area projection which is recommended for U.S. choropleths as it preserves the relative areas of geographic features. It's also better than the ggplot default (“Mercator“) in its coord_map:

gg + coord_map()

[Figure ggplot_2-1: coord_map default (Mercator)]

But, we can pass in parameters to use a saner projection:

gg + coord_map("albers", lat0=30, lat1=40)

[Figure ggplot_3-1: coord_map with the Albers projection]

The mapproject function exposes 41 projections, which may seem generous; however, not all of them are practical and even the ones with parameters are not very customizable. Before we dig into that a bit more, you’re probably wondering (if you don’t work with cartography for a living or hobby) how one goes about choosing a map projection…

Choosing A Map Projection

Thankfully, many smart folks have thought quite a bit about choosing map projections and there are a number of resources you can draw upon when making your own choices.

The first one is Map Projections – A Working Manual. This is a free resource from the U.S. Geological Survey and was published back in 1987. It has a “companion” resource – An Album of Map Projections. Both are outstanding resources to have on hand as they provide a great deal of information on map projections. If you’re in a hurry, the “Album” makes for a good quick reference. Here’s the entry for our buddy, Albers:


[Figure album-albers: the Albers entry from “An Album of Map Projections”]

(Go to page 10 of the “Album” for the larger version)

The image in the upper-right is a “Tissot indicatrix” (named for its creator Nicolas Auguste Tissot), which “puts Tissot indicatrices at every 30° of latitude and longitude, except at the poles. This shows the shape of infinitesimally small circles on the Earth as they appear when they are plotted by using a fixed finite scale at the same locations on a map. Every circle is plotted as a circle or an ellipse or, in extreme cases, as a straight line. On an equal-area projection, all these ellipses and circles are shown as having the same area. The flattening of the ellipse shows the extent of local shape distortion and how much the scale is changed and in what direction. On conformal map projections, all indicatrices remain circles, but the areas change. On other projections, both the areas and the shapes of the indicatrices change”. This makes for a great way to understand just how your creations are being distorted (though that may be obvious when you actually plot your maps).

The “Working Manual” also includes the equations necessary to compute the projections. Both provide projection usage guidance as well as the technical bits describing them.
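
For a flavor of what's in there, the spherical form of the Albers equal-area conic (standard parallels $\varphi_1$, $\varphi_2$, origin $\varphi_0$/$\lambda_0$, sphere radius $R$) works out to roughly the following (per Snyder; the projection libraries discussed below handle all of this for you):

$$n = \tfrac{1}{2}\left(\sin\varphi_1 + \sin\varphi_2\right), \qquad C = \cos^2\varphi_1 + 2n\sin\varphi_1$$

$$\rho = \frac{R\sqrt{C - 2n\sin\varphi}}{n}, \qquad \rho_0 = \frac{R\sqrt{C - 2n\sin\varphi_0}}{n}, \qquad \theta = n(\lambda - \lambda_0)$$

$$x = \rho\sin\theta, \qquad y = \rho_0 - \rho\cos\theta$$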

The Institute for Environment and Sustainability has a similar guide for Map Projections for Europe.

Many countries and municipalities also have their own guides for using projections (e.g. California).

If you can handle what feels like a TARDIS trip back in time to the days of GeoCities, MapRef also provides good information on projections. You can also review Carlos A. Furuti’s compilation of projections for more information.

Using More Complex/Nuanced Projections

Despite being able to use Albers and 40 other projections via mapproject, having more flexibility would allow us to use grid systems (see the refs in the previous section for what those are) and also hyper-customize many projections without the need to write our own equations (be thankful of that as you skim the math in the “Working Manual”!).

Gerald Evenden developed a library and utility called proj for the USGS back in 1995 and said utility, library & projection specification idiom has been maintained and expanded ever since in the PROJ.4 project. This library/tool is straightforward to install on most (if not all) operating systems. PROJ.4 lets you specify projections in a (fairly complex) string format (often referred to as a proj4string, especially in R). For example, you can specify Albers for the U.S. with:

"+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96"

Cartographic Projection Procedures for the UNIX Environment—A User's Manual (considered “the” manual for PROJ.4) explains how to build the specification. At the time of this blog post there are 134 named (i.e. +proj=NAME) projections available for use & customization (run proj -l at a command-line to see the full list).
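
In R, a proj4string like the Albers one above plugs straight into sp::CRS() for use with spTransform(). A minimal sketch (some_spatial_obj is a placeholder for any Spatial* object):

library(sp)

aea <- CRS("+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96")
# some_spatial_obj_aea <- spTransform(some_spatial_obj, aea)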

A simple example of why this is useful is when you need to plot a world map. Said activity should always be prefaced with a review of this seminal work to ensure you will choose a good projection—say, Winkel tripel. A quick glance at what mapproject supports will show you that you're pretty much out of luck using the standard R tools we've seen so far for that, but there is a handy +proj=wintri in PROJ.4 we can use.

Here’s how you’d plot that projection in the base plotting system:

library(sp)
library(rworldmap) # this pkg has waaaay better world shapefiles

plot(spTransform(getMap(), CRS("+proj=wintri")))

[Figure base_3-1: Winkel tripel world map (base graphics)]

However, we can’t pass in PROJ.4 strings to ggplot’s coord_map, so we have to project the Earth first before plotting and use coord_equal:

world <- fortify(spTransform(getMap(), CRS("+proj=wintri")))

gg <- ggplot()
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=id),
                    color="black", fill="white", size=0.25)
gg <- gg + coord_equal()
gg <- gg + theme_map()
gg

[Figure ggplot_4-1: Winkel tripel world map (ggplot2, pre-projected)]

While that works, you have to pre-project any lines, points, etc. as well before plotting them with the map. For example, here are the locations of all of the non-military rocket launch sites (yeah, I’ve done that before, but it’s a fun data set! – and I had it handy):

library(rgdal) # for readOGR
library(httr)  # for downloading

launch_sites <- paste0("https://collaborate.org/sites/collaborate.org/",
                       "files/spacecentersnon-militaryspacelaunchsites.kml")
invisible(try(GET(launch_sites, write_disk("/tmp/launch-sites.kml")), silent=TRUE))

sites <- readOGR("/tmp/launch-sites.kml", 
                 "SpaceCenters: Non-Military Space Launch Sites", 
                 verbose=FALSE)
sites_wt <- spTransform(sites, CRS("+proj=wintri"))
sites_wt <- coordinates(sites_wt)

gg <- gg + geom_point(data=as.data.frame(sites_wt), 
                      aes(x=coords.x1, y=coords.x2), 
                      color="#b2182b")
gg

[Figure ggplot_5-1: Winkel tripel world map with launch sites]

If you have more layers, that's doable, but painful. If we could make it so ggplot allowed for PROJ.4 projections in some coord_ this would help keep our plot idioms (and code) cleaner. Thus, coord_proj was born.

coord_proj mimics the functionality of coord_map but uses proj4::project instead of mapproject (and takes in any of those parameters). The world map with launch sites now looks like this (complete ggplot code):

world <- fortify(getMap())
sites <- as.data.frame(coordinates(readOGR("/tmp/launch-sites.kml", 
                                           "SpaceCenters: Non-Military Space Launch Sites", 
                                           verbose=FALSE)))

gg <- ggplot()
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=id),
                    color="black", fill="white", size=0.25)
gg <- gg + geom_point(data=(sites), 
                      aes(x=coords.x1, y=coords.x2), 
                      color="#b2182b")
gg <- gg + coord_proj("+proj=wintri")
gg <- gg + theme_map()
gg

[Figure ggplot_6-1: coord_proj Winkel tripel map with launch sites]

Now, if you want to go all Hobo-Dyer instead of Winkel tripel, it's one small change:

gg + coord_proj("+proj=cea +lat_ts=37.5")

[Figure ggplot_7-1: Hobo-Dyer projection]

Spatial Reference is a website by Howard Butler, Christopher Schmidt, Dane Springmeyer, and Josh Livni that helps assist others in their understanding, recording, and usage of spatial reference systems. Its ultimate goal is to let you use URIs to specify spatial reference systems, but you can use it to look up Proj4 strings to meet virtually any need you have.

The Bad News

Presently, coord_proj is not part of the ggplot2 code base. I chose a really bad time to muck with this as Hadley is doing all sorts of spiffy stuff to ggplot2 and it’s not a good time to shove this in there.

You can fork your own copy of ggplot2 from Hadley’s GitHub repo and add this file to it, re-document/collate/build the package and then use it locally. It still needs some work (there’s a horrible hack in it that anyone familiar with geo-stuff will cringe at) but I welcome feedback/contributions. Hopefully this will get into ggplot2 (or out of it and into a separate package since Hadley is making ggplot2 very extensible in his update) at some point in the not-too-distant future.
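
A rough sketch of that local-build workflow, assuming devtools and a local clone of the ggplot2 repo with the coord_proj.R file copied into its R/ directory:

# git clone https://github.com/hadley/ggplot2.git
# cp coord_proj.R ggplot2/R/
devtools::document("ggplot2")  # re-document/collate
devtools::install("ggplot2")   # build & install the patched copy locally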

Nathaniel Smith and Stéfan van der Walt presented a new colormap (for Python) at SciPy 2015 called viridis.

From the authors:

The default colourmap in Matplotlib is the colourful rainbow-map called Jet, which is deficient in many ways: small changes in the data sometimes produce large perceptual differences and vice-versa; its lightness gradient is non-monotonic; and, it is not particularly robust against color-blind viewing. Thus, a new default colormap is needed — but no obvious candidate has been found. Here, we present our proposed new default colormap for Matplotlib, and expose the theory, tools, data exploration and motivations behind its design.

You can also find out a tad more about their other colormap designs (a.k.a. the runners-up), along with Parula, which is a proprietary MATLAB color map.

Simon Garnier (@sjmgarnier) took Nathaniel & Stéfan’s work and turned it into an R package.

Noam Ross (@noamross) & I piled on shortly thereafter to add some ggplot color scale_ functions which are (for now) only available in Simon’s github repo.

Rather than duplicate the examples already provided in the documentation of those functions, I thought there might be more efficacy in creating a post that helped showcase why you should switch from rainbow (et al) to viridis.

Since folks seem to like maps, we’ll work with one for the example, but let’s get some package machinations out of the way first:

library(viridis)
library(raster)
library(scales)
library(dichromat)
library(rasterVis)
library(httr)
library(colorspace)

Now, we’ll need a map to work with so let’s grab a U.S. max temperature GeoTIFF raster from NOAA (from the bitter cold month of February 2015) and project it to something more reasonable:

temp_raster <- "http://ftp.cpc.ncep.noaa.gov/GIS/GRADS_GIS/GeoTIFF/TEMP/us_tmax/us.tmax_nohads_ll_20150219_float.tif"
 
try(GET(temp_raster,
        write_disk("us.tmax_nohads_ll_20150219_float.tif")), silent=TRUE)
us <- raster("us.tmax_nohads_ll_20150219_float.tif")
 
# albers FTW
us <- projectRaster(us, crs="+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")

We'll also make a helper function to save us some typing and set up the base # of colors in the colormap:

n_col <- 64
 
img <- function(obj, col) {
  image(obj, col=col, asp=1, axes=FALSE, xaxs="i", xaxt='n', yaxt='n', ann=FALSE)
}

Let’s take a look at various color palettes with different types of vision. We’ll use a 3×2 grid and:

  • use 4 color palettes from grDevices,
  • make a gradient palette from one of the ColorBrewer sequential palettes, and
  • then (finally) use a viridis color palette.

We’ll take this grid of 6 maps and view it through the eyes of three different types of color vision as well as a fully desaturated version. Note that I’m not adding much cruft to the map display (including legends) since this isn’t about the values so much as it is about the visual perception of the colormaps.

Remember you can select/click/tap the map grids for (slightly) larger versions.

“Normal” Vision

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, rev(heat.colors(n_col)))
img(us, rev(rainbow(n_col)))
img(us, rev(topo.colors(n_col)))
img(us, rev(cm.colors(n_col)))
img(us, gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)))
img(us, rev(viridis(n_col)))

[Figure 01_normal-1: palette grid, “normal” color vision]

All of the maps convey the differences in max temperature. If you happen to have “normal” color vision you should be drawn to the bottom two (ColorBrewer on the left and Viridis on the right). They are both sequential and convey the temperature changes more precisely (and they aren't as gosh-awful ugly as the other four).

While the ColorBrewer gradient may be “good”, Viridis is designed to be:

  • colorful & “pretty”
  • sequential (to not impose other structure on exploratory data analysis visualizations)
  • perceptually uniform (i.e. changes in the data should be accurately decoded by our brains) even when desaturated
  • accessible to colorblind viewers

and seems to meet those goals quite well.

Take a look at each of the vision-adjusted examples:

Green-Blind (Deuteranopia)

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, dichromat(rev(heat.colors(n_col)), "deutan"))
img(us, dichromat(rev(rainbow(n_col)), "deutan"))
img(us, dichromat(rev(topo.colors(n_col)), "deutan"))
img(us, dichromat(rev(cm.colors(n_col)), "deutan"))
img(us, dichromat(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)), "deutan"))
img(us, dichromat(rev(viridis(n_col)), "deutan"))

[Figure 02_deutan-1: palette grid as seen with deuteranopia]

Red-Blind (Protanopia)

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, dichromat(rev(heat.colors(n_col)), "protan"))
img(us, dichromat(rev(rainbow(n_col)), "protan"))
img(us, dichromat(rev(topo.colors(n_col)), "protan"))
img(us, dichromat(rev(cm.colors(n_col)), "protan"))
img(us, dichromat(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)), "protan"))
img(us, dichromat(rev(viridis(n_col)), "protan"))

[Figure 03_protan-1: palette grid as seen with protanopia]

Blue-Blind (Tritanopia)

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, dichromat(rev(heat.colors(n_col)), "tritan"))
img(us, dichromat(rev(rainbow(n_col)), "tritan"))
img(us, dichromat(rev(topo.colors(n_col)), "tritan"))
img(us, dichromat(rev(cm.colors(n_col)), "tritan"))
img(us, dichromat(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)), "tritan"))
img(us, dichromat(rev(viridis(n_col)), "tritan"))

[Figure 04_tritan-1: palette grid as seen with tritanopia]

Desaturated

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, desaturate(rev(heat.colors(n_col))))
img(us, desaturate(rev(rainbow(n_col))))
img(us, desaturate(rev(topo.colors(n_col))))
img(us, desaturate(rev(cm.colors(n_col))))
img(us, desaturate(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col))))
img(us, desaturate(rev(viridis(n_col))))

[Figure 05_desatureated-1: fully desaturated palette grid]

Hopefully both the ColorBrewer gradient and Viridis palettes stood out as conveying the temperature data with more precision and more consistently across all non-standard vision types as you progressed through each one.

To see this for yourself in your own work, grab Simon's package and start substituting viridis for some of your usual defaults to see if it makes a difference in helping you convey the story your data is trying to tell, both more accurately and for a more diverse audience. Remember, the github version (which will be on CRAN soon) has handy ggplot scale_ functions to make using viridis as painless as possible.
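
As a quick illustration, here's a minimal sketch of those scale_ helpers in action (scale_fill_viridis is assumed from the github version of the package; volcano is a base R dataset):

library(ggplot2)
library(viridis)

# melt the built-in volcano matrix into x/y/z for a quick heatmap
vdf <- data.frame(expand.grid(x=1:nrow(volcano), y=1:ncol(volcano)),
                  z=as.vector(volcano))

gg <- ggplot(vdf, aes(x=x, y=y, fill=z))
gg <- gg + geom_raster()
gg <- gg + scale_fill_viridis(name="Elevation")
gg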

I also updated my Melbourne Walking EDA project to use the viridis palette instead of parula (which I was only really using in defiance of MATLAB’s inane restrictions).

Poynter did a nice interactive piece on world population by income (i.e. “How Many Live on How Much, and Where”). I’m always on the lookout for optimized shapefiles and clean data (I’m teaching a data science certificate program starting this Fall) and the speed of the site load and the easy availability of the data set made this one a “must acquire”. Rather than just repeat Poynter’s D3-goodness, here’s a way to look at the income data in series of small multiple choropleths—using R & ggplot2—that involves:

  • downloading data & shapefiles from a web site
  • using dplyr & tidyr for data munging
  • applying custom fill color scale mapping in ggplot
  • ordering plots with a custom facet order (using factors)
  • tweaking the theme and aesthetics for a nicely finished result

By using D3, Poynter inherently made the data available. Pop open the “Developer Tools” in any browser, reload the page and look at the “Network” tab and you’ll see a list of files (you can sometimes see things in the source code, but this technique is often faster). The income data is a well-formed CSV file http://www.pewglobal.org/wp-content/themes/pew-global/interactive-global-class.csv and their highly optimized world map was also easy to discern http://www.pewglobal.org/wp-content/lib/js/world-geo.json. We’ll start by grabbing the map and using the same map projection that Poynter did (Robinson). Don’t be put off by all the library calls since one of the best parts of R is the ever-increasing repository of great packages to help you get things done.

library(httr)     # getting data
library(rgdal)    # working with shapefile
library(dplyr)    # awesome data manipulation
library(readr)    # faster reading of CSV data
library(stringi)  # string manipulation
library(stringr)  # string manipulation
library(tidyr)    # reshaping data
library(grid)     # for 'unit'
library(scales)   # for 'percent'
library(ggplot2)  # plotting
library(ggthemes) # theme_map
 
# this ensures you only download the shapefile once and hides
# errors and warnings. remove `try` and `invisible` to see messages
try(invisible(GET("http://www.pewglobal.org/wp-content/lib/js/world-geo.json",
                  write_disk("world-geo.json"))), silent=TRUE)
 
# use ogrListLayers("world-geo.json") to see file type & 
# layer info to use in the call to readOGR
 
world <- readOGR("world-geo.json", "OGRGeoJSON")
world_wt <- spTransform(world, CRS("+proj=robin"))
world_map <- fortify(world_wt)

I would have liked to do fortify(world_wt, region="name") (since that makes working with filling in countries by name much easier in the choropleth part of the code) but that generated TopologyException errors (I’ve seen this happen quite a bit with simplified/optimized shapefiles and some non-D3 geo-packages). One can sometimes fix those with a strategic rgeos::gBuffer call, but that didn’t work well in this case. We can still use country names with a slight rejiggering of the fortified data frame using dplyr:

world_map %>%
  left_join(data_frame(id=rownames(world@data), name=world@data$name)) %>%
  select(-id) %>%
  rename(id=name) -> world_map
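
For reference, the “strategic rgeos::gBuffer call” mentioned above usually looks like the sketch below (width=0 rebuilds the geometries and often clears TopologyException errors, though it didn't help with this particular file):

library(rgeos)

# width=0 is the usual self-intersection repair trick
world_fixed <- gBuffer(world_wt, byid=TRUE, width=0)
# world_map <- fortify(world_fixed, region="name")  # what we'd have liked to do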

Now it's time to get the data. The CSV file has annoying spaces in it that cause R to interpret all the columns as strings, so we can use dplyr again to get them into the format we want them in. Note that I'm also making the percentages decimals so we can use percent later on to easily format them.

# a good exercise would be to repeat the download code above
# rather than make repeated calls to an external resource
read_csv("http://www.pewglobal.org/wp-content/themes/pew-global/interactive-global-class.csv") %>%
  mutate_each(funs(str_trim)) %>%
  filter(id != "None") %>%
  mutate_each(funs(as.numeric(.)/100), -name, -id) -> dat

For this post, we’ll only be working with the actual share percentages, so let’s:

  • ignore the “change” columns
  • convert the data frame from wide to long
  • extract out the income levels (e.g. “Poor”, “Low Income”…)
  • set a factor order for them so our plots will be in the correct sequence

dat %>%
  gather(share, value, starts_with("Share"), -name, -id) %>%
  select(-starts_with("Change")) %>%
  mutate(label=factor(stri_trans_totitle(str_match(share, "Share ([[:alpha:]- ]+),")[,2]),
                      c("Poor", "Low Income", "Middle Income", "Upper-Middle Income", "High Income"),
                      ordered=TRUE)) -> share_dat

The stringi package is really handy (stringr is built on it, too). The stri_trans_totitle function alleviates some mundane string operations and the stri_replace_all_regex (below) also allows us to do vectorized regular expression replacements without a ton of code.

To keep the charts aligned, we'll use Poynter's color scale (which was easy to extract from the site's code) and use the same legend breaks via `cut`. We'll also format the labels for these breaks to make our legend nicer to view.

# use same "cuts" as poynter
poynter_scale_breaks <- c(0, 2.5, 5, 10, 25, 50, 75, 80, 100)
 
sprintf("%2.1f-%s", poynter_scale_breaks, percent(lead(poynter_scale_breaks/100))) %>%
  stri_replace_all_regex(c("^0.0", "-NA%"), c("0", "%"), vectorize_all=FALSE) %>%
  head(-1) -> breaks_labels
 
share_dat %>%
  mutate(`Share %`=cut(value,
                       c(0, 2.5, 5, 10, 25, 50, 75, 80, 100)/100,
                       breaks_labels))-> share_dat
 
share_pal <- c("#eaecd8", "#d6dab3", "#c2c98b", "#949D48", "#6e7537", "#494E24", "#BB792A", "#7C441C", "#ffffff")

Finally, we get to the good part and start plotting the visualization. There are only two layers: the base map and the choropleth filled map. We then:

  • apply our manual color palette to them
  • remove the line color slashes in the legend boxes
  • setup the overall plot label
  • tell ggplot which coordinate system to use (in this case coord_equal is fine since we already projected the points)
  • apply a base theme that’s good for mapping
  • tweak the text and ensure our legend is in the position we want it to be in

gg <- ggplot()
 
gg <- gg + geom_map(data=world_map, map=world_map,
                    aes(x=long, y=lat, group=group, map_id=id),
                    color="#7f7f7f", fill="white", size=0.15)
 
gg <- gg + geom_map(data=share_dat, map=world_map,
                    aes(map_id=name, fill=`Share %`),
                    color="#7f7f7f", size=0.15)
 
gg <- gg + scale_fill_manual(values=share_pal)
 
gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))
gg <- gg + labs(title="World Population by Income\n")
gg <- gg + facet_wrap(~label, ncol=2)
 
gg <- gg + coord_equal()
 
gg <- gg + theme_map()
 
gg <- gg + theme(panel.margin=unit(1, "lines"))
gg <- gg + theme(plot.title=element_text(face="bold", hjust=0, size=24))
gg <- gg + theme(legend.title=element_text(face="bold", hjust=0, size=12))
gg <- gg + theme(legend.text=element_text(size=10))
gg <- gg + theme(strip.text=element_text(face="bold", size=10))
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(legend.position="bottom")
 
gg

And, here’s the result (click for larger version):

[Figure forblog: small-multiple world population by income choropleths]

The optimized shapefile makes for a very fast plot and you can plot individual choropleths by filtering the data and not using facets.
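
For instance, a single (non-faceted) choropleth for one income level is just a filter() plus the same two map layers, reusing world_map, share_dat and share_pal from above (a sketch; pick whichever level you like):

high <- filter(share_dat, label=="High Income")

gg <- ggplot()
gg <- gg + geom_map(data=world_map, map=world_map,
                    aes(x=long, y=lat, group=group, map_id=id),
                    color="#7f7f7f", fill="white", size=0.15)
gg <- gg + geom_map(data=high, map=world_map,
                    aes(map_id=name, fill=`Share %`),
                    color="#7f7f7f", size=0.15)
gg <- gg + scale_fill_manual(values=share_pal, drop=FALSE)
gg <- gg + coord_equal()
gg <- gg + theme_map()
gg <- gg + theme(legend.position="bottom")
gg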

While there are a number of choropleth packages out there for R, learning how to do the core components on your own can (a) make you appreciate those packages a bit more and (b) give you the skills to do them on your own when you need a more customized version. Many of the theme tweaks will also apply to the ggplot-based choropleth packages.

With this base, it should be a fun exercise to the reader to do something similar with Poynter’s “percentage point change” choropleth. You’ll need to change the color palette and manipulate different data columns to get the same scales and visual representation they do. Drop a note in the comments if you give this a go!

You can find all the code in this post in one convenient gist.

The recent announcement of the [start of egg rationing in the U.S.](http://www.washingtonpost.com/blogs/wonkblog/wp/2015/06/05/the-largest-grocer-in-the-texas-is-now-rationing-eggs/) made me curious enough about the avian flu outbreak to try to dig into the numbers a bit. I finally stumbled upon a [USDA site](http://www.aphis.usda.gov/wps/portal/aphis/ourfocus/animalhealth/sa_animal_disease_information/sa_avian_health/sa_detections_by_states/ct_ai_pacific_flyway/!ut/p/a1/lVPBkqIwEP2WOexpCxMBAY-oo6Kybo01q3JJNSGB1GCgSNTi7zcwe3CnZnQ3h1Sl-3X369cdlKADSiRcRA5aVBLK7p14ZLVd2sMJtqPFbvyMox-_5nGw8Z3t0jWAowHgL06I_47friOvi3_Bk-VsiHcO2qMEJVTqWhfoCHUhFKGV1ExqUoq0gab9hhWQ6twQXtGz6l8gxQlKUjAodXFryYRioBgRklfNqW_i3X0RIG_xGdOMdm5F0pYoDZqZ1FQTEKQGKrighJftFdqOX01Fho7c9iiAzS3HG6WWm2HbSnmAzYXxyA3AG1L-R487Df-TntNFuHT9jVHQDWwczUywP44xjrxH8b2eDzL0gHsj-1Bk8TwxReabn_56ZeP1CB0NSf9LFmMX7f5TtdXDtmYoib9z5YadQnYTT-PcVABdWN2s0eHuDry7b3agN3y2A-jw6Q4YfnlZpeZD7Ke3REKZOoEh0jDOGtYMikppdLher4OzymCQVxdUn15PgdMK6-0lwM7obR9ayTx_evoNPxBrVg!!/?1dmy&urile=wcm:path:/APHIS_Content_Library/SA_Our_Focus/SA_Animal_Health/SA_Animal_Disease_Information/SA_Avian_Health/SA_Detections_by_States/) that had an embedded HTML table of flock outbreak statistics by state, county and date (also flock type and whether it was a commercial enterprise or “backyard” farm). Just looking at the sum of flock sizes on that page shows that nearly 50 million birds have been impacted since December, 2014.

We can scrape the data with R & `rvest` and then use the shapefile hexbins from [previous posts](http://rud.is/b/2015/05/15/u-s-drought-monitoring-with-hexbin-state-maps-in-r/) to watch the spread week-over-week.

The number of packages I ended up relying on was a bit surprising. Let’s get them out of the way before focusing on the scraping and hexbin-making:

library(rvest)     # scraping
library(stringr)   # string manipulation
library(lubridate) # date conversion
library(dplyr)     # data munging
library(zoo)       # for locf
library(ggplot2)   # plotting
library(rgdal)     # map stuff
library(rgeos)     # map stuff

We also end up using `magrittr` and `tidyr` but only for one function, so you’ll see those with `::` in the code.

Grabbing the USDA page is pretty straightforward:

url <- "http://www.aphis.usda.gov/wps/portal/aphis/ourfocus/animalhealth/sa_animal_disease_information/sa_avian_health/ct_avian_influenza_disease/!ut/p/a1/lVJbb4IwFP41e1qwFZDLI-oUnGgyswm8kAMUaAaFQNG4X7-ibnEPYtakDz3nO_kupyhAHgoYHGgGnFYMiv4daOFqa8vjKZad5c58wc7mY-Eaa13Z2qoA-AKA7xwL_53fvjpaP_-Gp_Z8jHcK2qMABTHjNc-RD3VO2zCuGCeMhwWNGmhOT7iFsOqaMK3irj2_gNESijAnUPD8tpLQlkBLQsrSqinPJi7tAwX2i4_5tSBgRUfYF_wM9mLqmCbIj2QzxZpMJMUYg6TGkSLBBCaSPEnSJIljXVH0q_kBdw_CO5sXkNnSslV9LQJTDRk7czGumy7GjnYFDOTrCw36XRJTRbt_mlo9VD1FgfuctqrVByA37szNBAPwXOpzR97gPi7tm30gb2AfQkxWVJH4ifvZLavFIsUQrA1JSUOaUV61HHnH43HUtQmMsuqA6vK9NJST9JluNlKwyPr7DT6YvRs!/?1dmy&urile=wcm%3apath%3a%2Faphis_content_library%2Fsa_our_focus%2Fsa_animal_health%2Fsa_animal_disease_information%2Fsa_avian_health%2Fsa_detections_by_states%2Fct_ai_pacific_flyway"
 
#' read in the data, extract the table and clean up the fields
#' also clean up the column names since they are fairly nasty
 
pg <- html(url)

If you poke at the source for the page you’ll see there are two tables in the code and we only need the first one. Also, if you scan the rendered table on the USDA page by eye you’ll see that the column names are horrible for data analysis work and they are also inconsistent in the values used for various columns. Furthermore, there are commas in the flock counts and it would be handy to have the date as an actual date type. We can extract the table we need and clean all that up in a reasonably-sized `dplyr` pipe:

pg %>%
  html_nodes("table") %>%
  magrittr::extract2(1) %>%
  html_table(header=TRUE) %>%
  filter(`Flock size`!="pending") %>%
  mutate(Species=str_replace(tolower(Species), "s$", ""),
         `Avian influenza subtype*`=str_replace_all(`Avian influenza subtype*`, " ", ""),
         `Flock size`=as.numeric(str_replace_all(`Flock size`, ",", "")),
         `Confirmation date`=as.Date(mdy(`Confirmation date`))) %>%
  rename(state=State, county=County, flyway=Flyway, flock_type=`Flock type`,
         species=Species, subtype=`Avian influenza subtype*`, date=`Confirmation date`,
         flock_size=`Flock size`) -> birds

Let’s take a look at what we have:

glimpse(birds)
 
## Observations: 202
## Variables:
## $ state      (chr) "Iowa", "Minnesota", "Minnesota", "Iowa", "Minnesota", "Iowa",...
## $ county     (chr) "Sac", "Renville", "Renville", "Hamilton", "Kandiyohi", "Hamil...
## $ flyway     (chr) "Mississippi", "Mississippi", "Mississippi", "Mississippi", "M...
## $ flock_type (chr) "Commercial", "Commercial", "Commercial", "Commercial", "Comme...
## $ species    (chr) "turkey", "chicken", "turkey", "turkey", "turkey", "turkey", "...
## $ subtype    (chr) "EA/AM-H5N2", "EA/AM-H5N2", "EA/AM-H5N2", "EA/AM-H5N2", "EA/AM...
## $ date       (date) 2015-06-04, 2015-06-04, 2015-06-04, 2015-06-04, 2015-06-03, 2...
## $ flock_size (dbl) 42200, 415000, 24800, 19600, 37000, 26200, 17200, 1115700, 159...

To make an animated map of cumulative flock totals by week, we’ll need to

– group the `birds` data frame by week and state
– calculate the cumulative sums
– fill in the gaps where there are missing state/week combinations
– carry the last observations by state/week forward in this expanded data frame
– make breaks for data ranges so we can more intelligently map them to colors

This ends up being a longer `dplyr` pipe than I usually like to code (I think very long ones are hard to follow) but it gets the job done and is still pretty readable:

birds %>%
  mutate(week=as.numeric(format(birds$date, "%Y%U"))) %>%
  arrange(week) %>%
  group_by(week, state) %>%
  tally(flock_size) %>%
  group_by(state) %>%
  mutate(cum=cumsum(n)) %>%
  ungroup %>%
  select(week, state, cum) %>%
  mutate(week=as.Date(paste(week, 1), "%Y%U %u")) %>%
  left_join(tidyr::expand(., week, state), .) %>%
  group_by(state) %>%
  do(na.locf(.)) %>%
  mutate(state_abb=state.abb[match(state, state.name)],
         cum=as.numeric(ifelse(is.na(cum), 0, cum)),
         brks=cut(cum,
                  breaks=c(0, 200, 50000, 1000000, 10000000, 50000000),
                  labels=c("1-200", "201-50K", "50k-1m",
                           "1m-10m", "10m-50m"))) -> by_state_and_week

Now, we perform the standard animation steps:

– determine where we’re going to break the data up
– feed that into a loop
– partition the data in the loop
– render the plot to a file
– combine all the individual images into an animation

For this graphic, I'm doing something a bit extra. The color ranges for the hexbin choropleth go from very light to very dark, so it would be helpful if the labels for the states went from very dark to very light, matching the state colors. The lines that do this check for state breaks that fall in the last two values and appropriately assign `"black"` or `"white"` as the color.
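
One more note before the loop: the `us_map` and `centers` objects it uses come from the hexbin state map GeoJSON used in the previous posts referenced above. A minimal sketch (mirroring the drought-monitoring post) of how they get built:

# us_states_hexgrid.geojson is from
# https://gist.github.com/hrbrmstr/51f961198f65509ad863
us <- readOGR("us_states_hexgrid.geojson", "OGRGeoJSON")
centers <- cbind.data.frame(data.frame(gCentroid(us, byid=TRUE), id=us@data$iso3166_2))
us_map <- fortify(us, region="iso3166_2")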

i <- 0
 
for (wk in unique(by_state_and_week$week)) {
 
  # filter by week
 
  by_state_and_week %>% filter(week==wk) -> this_wk
 
  # hack to let us color the state labels in white or black depending on
  # the value of the fill
 
  this_wk %>%
    filter(brks %in% c("1m-10m", "10m-50m")) %>%
    .$state_abb %>%
    unique -> white_states
 
  centers %>%
    mutate(txt_col="black") %>%
    mutate(txt_col=ifelse(id %in% white_states, "white", "black")) -> centers
 
  # setup the plot
 
  gg <- ggplot()
  gg <- gg + geom_map(data=us_map, map=us_map,
                      aes(x=long, y=lat, map_id=id),
                      color="white", fill="#dddddd", size=2)
  gg <- gg + geom_map(data=this_wk, map=us_map,
                      aes(fill=brks, map_id=state_abb),
                      color="white", size=2)
  gg <- gg + geom_text(data=centers,
                       aes(label=id, x=x, y=y, color=txt_col), size=4)
  gg <- gg + scale_color_identity()
  gg <- gg + scale_fill_brewer(name="Combined flock size\n(all types)",
                               palette="RdPu", na.value="#dddddd", drop=FALSE)
  gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))
  gg <- gg + coord_map()
  gg <- gg + labs(x=NULL, y=NULL,
                  title=sprintf("U.S. Avian Flu Total Impact as of %s\n", wk))
  gg <- gg + theme_bw()
  gg <- gg + theme(plot.title=element_text(face="bold", hjust=0, size=24))
  gg <- gg + theme(panel.border=element_blank())
  gg <- gg + theme(panel.grid=element_blank())
  gg <- gg + theme(axis.ticks=element_blank())
  gg <- gg + theme(axis.text=element_blank())
  gg <- gg + theme(legend.position="bottom")
  gg <- gg + theme(legend.direction="horizontal")
  gg <- gg + theme(legend.title.align=1)
 
  # save the image
 
  # i'm using "quartz" here since I'm on a Mac. Use what works for your system to ensure you
  # get the best looking output png
 
  png(sprintf("output/%03d.png", i), width=800, height=500, type="quartz")
  print(gg)
  dev.off()
 
  i <- i + 1
 
}

We could use one of the R animation packages to actually make the animation, but I know ImageMagick pretty well so I just call it as a `system` command:

system("convert -delay 60 -loop 1 output/*png output/avian.gif")
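
If you'd rather stay in R, a rough equivalent via the animation package (its im.convert also shells out to ImageMagick) would be something like:

library(animation)

# basic conversion; delay/loop tweaks can stay on the ImageMagick side
im.convert("output/*.png", output="output/avian.gif")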

All that results in:

[Animation avian: weekly cumulative avian flu impact, hexbin state map]

If that's a static image, open it in a new tab/window (or just click on it). I really didn't want to do a looping gif, but if you want one just change the `-loop 1` to `-loop 0`.

Now, we can just re-run the code when the USDA refreshes the data.

The code, data and sample bitmaps are on [github](https://github.com/hrbrmstr/avianflu).

On the news today of the early stages of drought hitting the U.S. northeast states, I decided to springboard off of yesterday's post and show a more practical use of hexbin state maps than the built-in (and still purpose unknown to me) “bees” data.

The U.S. Drought Monitor site supplies more than just a pretty county-level map. There’s plenty of data and you can dynamically retrieve just data tables for the whole U.S., U.S. states and U.S. counties. Since we’re working with state hexbins, we just need the state-level data. Drought levels for all five stages are reported per-state, so we can take all this data and created a faceted/small-multiples map based on it.

This builds quite a bit on the previous work, so you’ll see some familiar code. Most of the new code is actually making the map look nice (the great part about this is that once you have the idiom down, it’s just a matter of running the script each day vs a billion mouse clicks). The other bit of new code is the data-retrieval component:

library(readr)
library(tidyr)
library(dplyr)

intensity <- c(D0="Abnormally Dry", D1="Moderate Drought", D2="Severe Drought", 
               D3="Extreme Drought", D4="Exceptional Drought")

today <- format(Sys.Date(), "%Y%m%d")

read_csv(sprintf("http://droughtmonitor.unl.edu/USDMStatistics.ashx/?mode=table&aoi=state&date=%s", today)) %>% 
  gather(drought_level, value, D0, D1, D2, D3, D4) %>% 
  mutate(intensity=factor(intensity[drought_level], 
                          levels=as.character(intensity), ordered=TRUE)) -> drought

This:

  • sets up a fast way to add the prettier description of the drought levels (besides D0, D1, etc)
  • dynamically uses today’s date as the parameter for the URL we read with read_csv (from the readr package)
  • converts the data from wide to long
  • adds the intensity description

The ggplot code will facet on the intensity level to make the overall map:

library(rgdal)
library(rgeos)
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(grid)

# get map from https://gist.github.com/hrbrmstr/51f961198f65509ad863#file-us_states_hexgrid-geojson

us <- readOGR("us_states_hexgrid.geojson", "OGRGeoJSON")

centers <- cbind.data.frame(data.frame(gCentroid(us, byid=TRUE), id=us@data$iso3166_2))

us_map <- fortify(us, region="iso3166_2")

intensity <- c(D0="Abnormally Dry", D1="Moderate Drought", D2="Severe Drought",
               D3="Extreme Drought", D4="Exceptional Drought")

today <- format(Sys.Date(), "%Y%m%d")

read_csv(sprintf("http://droughtmonitor.unl.edu/USDMStatistics.ashx/?mode=table&aoi=state&date=%s", today)) %>%
  gather(drought_level, value, D0, D1, D2, D3, D4) %>%
  mutate(intensity=factor(intensity[drought_level],
                          levels=as.character(intensity), ordered=TRUE)) -> drought

# base hexbin layer for all states
gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)
# fill each state hex by its drought coverage value
gg <- gg + geom_map(data=drought, map=us_map,
                    aes(fill=value, map_id=State))
# redraw the white hex borders on top (transparent fill, no legend entry)
gg <- gg + geom_map(data=drought, map=us_map,
                    aes(map_id=State),
                    fill="#ffffff", alpha=0, color="white",
                    show_guide=FALSE)
# state abbreviation labels at the hex centroids
gg <- gg + geom_text(data=centers, aes(label=id, x=x, y=y), color="white", size=4)
# sequential fill scale with percentage labels
gg <- gg + scale_fill_distiller(name="State\nDrought\nCoverage", palette="RdPu", na.value="#7f7f7f",
                                labels=sprintf("%d%%", c(0, 25, 50, 75, 100)))
gg <- gg + coord_map()
# one panel per drought intensity level
gg <- gg + facet_wrap(~intensity, ncol=2)
gg <- gg + labs(x=NULL, y=NULL, title=sprintf("U.S. Drought Conditions as of %s\n", Sys.Date()))
gg <- gg + theme_bw()
gg <- gg + theme(plot.title=element_text(face="bold", hjust=0, size=24))
# strip away chart junk and tuck the legend inside the plot area
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.margin=unit(3, "lines"))
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(strip.text=element_text(face="bold", hjust=0, size=14))
gg <- gg + theme(legend.position=c(0.75, 0.15))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(legend.title.align=1)

png(sprintf("%s.png", today), width=800, height=800)
print(gg)
dev.off()

U.S. Drought Conditions, faceted by intensity, as of 2015-05-15

Now you can easily animate these over time to show the progression/regression of the drought conditions. If you’re sure your audience can work with SVG files, you can use those for very crisp/sharp maps (and even feed them to D3 or path-editing tools). If you have an example of how you’re using hexbin choropleths, drop a note in the comments. The code from above is also on github.
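
Here’s a minimal sketch of that animation workflow, assuming the USDM endpoint accepts the same YYYYMMDD date parameter for earlier weeks and that the retrieval + plotting code above has been wrapped in a hypothetical make_drought_map(date) helper (the date range below is purely illustrative):

# sketch only: render one frame per report date, then stitch the PNGs
# together with your tool of choice (ImageMagick, gifski, etc.)
dates <- format(seq(as.Date("2015-03-03"), as.Date("2015-05-12"), by="7 days"), "%Y%m%d")
for (d in dates) {
  gg <- make_drought_map(d)   # hypothetical wrapper around the code above
  png(sprintf("%s.png", d), width=800, height=800)
  print(gg)
  dev.off()
}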

I’ve been seeing an uptick in static US “lower 48” maps with “meh” projections this year, possibly caused by a flood of new folks resolving to learn R but using pretty old documentation or tutorials. I’ve also been seeing an uptick in folks needing to geocode US city/state to lat/lon. I thought I’d tackle both in a quick post to show how to (simply) use a decent projection for lower 48 US maps and then how to use a _very_ basic package I wrote – [localgeo](http://github.com/hrbrmstr/localgeo) to avoid having to use an external API/service for basic city/state geocoding.

### Albers All The Way

I could just plot an Albers projected map, but it’s more fun to add some data. We’ll start with some setup libraries and then read in some recent earthquake data, then filter it for our map display:

library(ggplot2)
library(dplyr)
library(readr) # devtools::install_github("hadley/readr")
 
# Earthquakes -------------------------------------------------------------
 
# get quake data ----------------------------------------------------------
quakes <- read_csv("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.csv")
 
# filter all but lower 48 US ----------------------------------------------
quakes %>%
  filter(latitude>=24.396308, latitude<=49.384358,
         longitude>=-124.848974, longitude<=-66.885444) -> quakes
 
# bin by .5 ---------------------------------------------------------------
quakes$Magnitude <- as.numeric(as.character(cut(quakes$mag, breaks=c(2.5, 3, 3.5, 4, 4.5, 5),
    labels=c(2.5, 3, 3.5, 4, 4.5), include.lowest=TRUE)))

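A quick sanity check on the binning never hurts (the counts will vary with whatever the feed returns on a given day):

# how many quakes landed in each 0.5-magnitude bin (NA = outside the breaks)
table(quakes$Magnitude, useNA="ifany")
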
Many of my mapping posts use quite a few R geo libraries, but this one just needs `ggplot2`. We extract the US map data, turn it into something `ggplot` can work with, then plot our quakes on the map:

us <- map_data("state")
us <- fortify(us, region="region")
 
# theme_map ---------------------------------------------------------------
devtools::source_gist("33baa3a79c5cfef0f6df")
 
# plot --------------------------------------------------------------------
gg <- ggplot()
gg <- gg + geom_map(data=us, map=us,
                    aes(x=long, y=lat, map_id=region, group=group),
                    fill="#ffffff", color="#7f7f7f", size=0.25)
gg <- gg + geom_point(data=quakes,
                      aes(x=longitude, y=latitude, size=Magnitude),
                      color="#cb181d", alpha=1/3)
gg <- gg + coord_map("albers", lat0=39, lat1=45)
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

2.5+ mag quakes in Lower US 48 in past 30 days

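One small aside on the projection itself: `lat0`/`lat1` are just the two standard parallels for the Albers conic. If you’d rather use the parallels commonly quoted for a contiguous-US Albers equal-area map (29.5° and 45.5°), it’s a one-line tweak (a variation on the above, not what the plot shown used):

# alternative standard parallels often used for a contiguous-US Albers map
gg + coord_map("albers", lat0=29.5, lat1=45.5)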

### Local Geocoding

There are many APIs with corresponding R packages/functions to perform geocoding (one really spiffy recent one is [geocodeHERE](http://cran.r-project.org/web/packages/geocodeHERE/)). While Nokia’s service is less restrictive than Google’s, most of these sites are going to have some kind of restriction on the number of calls per second/minute/day. You could always install the [Data Science Toolkit](http://www.datasciencetoolkit.org/) locally (note: it was down as of the original posting of this blog) and do the geocoding there, but it does take some effort (and space/memory) to set up and get going.

If you have relatively clean data and only need city/state resolution, you can use a package I made – [localgeo](http://github.com/hrbrmstr/localgeo) as an alternative. I took a US Gov census shapefile and extracted city, state, lat, lon into a data frame and put a lightweight function shim over it (it’s doing nothing more than `dplyr::left_join`). It won’t handle nuances like “St. Paul, MN” == “Saint Paul, MN” and, for now, it requires you to do the city/state splitting, but I’ll be tweaking it over the year to be a bit more forgiving.
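
If you’re curious what a lookup-table geocoder like that boils down to, here’s a minimal sketch of the general shape; the `geo_lookup` table, its coordinates and the `local_geocode()` name are made up for illustration and are not the actual `localgeo` internals:

library(dplyr)

# hypothetical lookup table in the spirit of localgeo's bundled data:
# one row per city/state pair with approximate lat/lon
geo_lookup <- data.frame(city=c("Seattle", "Portland"),
                         state=c("WA", "OR"),
                         lon=c(-122.3321, -122.6765),
                         lat=c(47.6062, 45.5231),
                         stringsAsFactors=FALSE)

# the thin shim: join the caller's city/state pairs against the table
local_geocode <- function(city, state) {
  left_join(data.frame(city=city, state=state, stringsAsFactors=FALSE),
            geo_lookup, by=c("city", "state"))
}

local_geocode(c("Seattle", "Nowhere"), c("WA", "ZZ"))  # unmatched rows get NA lat/lon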

We can give this a go and map the [greenest cities in the US in 2014](http://www.nerdwallet.com/blog/cities/greenest-cities-america/) as crowned by, er, Nerd Wallet. I went for “small data file with city/state in it”, so if you know of a better source I’ll gladly use it instead. Nerd Wallet used DataWrapper, so getting the actual data was easy. Here’s a small example of how to get the file, perform the local geocoding and use an Albers projection for plotting the points. The code below assumes you’re still in the R session that used some of the `library` calls earlier in the post.

library(httr)
library(localgeo) # devtools::install_github("hrbrmstr/localgeo")
library(tidyr)
 
# greenest cities ---------------------------------------------------------
# via: http://www.nerdwallet.com/blog/cities/greenest-cities-america/
 
url <- "https://gist.githubusercontent.com/hrbrmstr/1078fb798e3ab17556d2/raw/53a9af8c4e0e3137a0a8d4d6332f7a6073d93fb5/greenest.csv"
greenest <- read.table(text=content(GET(url), as="text"), sep=",", header=TRUE, stringsAsFactors=FALSE)
 
greenest %>%
  separate(City, c("city", "state"), sep=", ") %>%
  filter(!state %in% c("AK", "HI")) -> greenest
 
greenest_geo <- geocode(greenest$city, greenest$state)
 
gg <- ggplot()
gg <- gg + geom_map(data=us, map=us,
                    aes(x=long, y=lat, map_id=region, group=group),
                    fill="#ffffff", color="#7f7f7f", size=0.25)
gg <- gg + geom_point(data=greenest_geo,
                      aes(x=lon, y=lat),
                      shape=21, color="#006d2c", fill="#a1d99b", size=4)
gg <- gg + coord_map("albers", lat0=39, lat1=45)
gg <- gg + labs(title="Greenest Cities")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

Nerd Wallet’s Greenest US (Lower 48) Cities 2014


Let me reinforce that the `localgeo` package will most assuredly fail to geocode some city/state combinations. I’m looking for a more comprehensive shapefile to ensure I have the complete list of cities and I’ll be adding some code to help make the lookups more forgiving. It may at least help when you bump into an API limit and need to crank out something in a hurry.
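
Until then, if a few cities fail to match, one stopgap (purely a sketch, not part of `localgeo`) is to normalize the city names on your side before handing them to `geocode()`:

# hypothetical pre-cleaning; the substitution rules are examples, not localgeo behavior
clean_city <- function(x) {
  x <- gsub("^St\\.? ", "Saint ", x)   # "St. Paul" / "St Paul" -> "Saint Paul"
  x <- gsub("^Ft\\.? ", "Fort ", x)    # "Ft. Worth" -> "Fort Worth"
  trimws(x)
}

greenest$city <- clean_city(greenest$city)
greenest_geo  <- geocode(greenest$city, greenest$state)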