Category Archives: cartography

Moritz Stefaner started off 2016 with a [very spiffy post](http://truth-and-beauty.net/experiments/ach-ingen-zell/) on _”a visual exploration of the spatial patterns in the endings of German town and village names”_. Moritz was [exploring some new data processing & visualization tools](https://github.com/moritzstefaner/ach-ingen-zell) for the post, but when I saw what he was doing I wondered how hard it would be to do something similar in R and also used it as an opportunity to start practicing a new habit in 2016: packages vs projects.

To state the goals for this homage more precisely, the plan was to:

– use data sets as close as possible to the same ones Moritz has in his github repo, _including_ the ones in pure javascript
– generate an HTML page as output that hews as closely as possible to the style of Moritz’s visualization
– use R for _everything_ (i.e. no “cheating” by sneaking in some javascript via `htmlwidgets`)
– bundle everything into a package to take advantage of all the good stuff that comes with R package validation

You may want to [take a look at the result](http://rud.is/zellingenach.html) to see if you want to continue reading (I hope you will!).

### The Setup
By using an R package as the framework for the visualization, it’s possible to keep the data with the code and also organize and document the code in a way that makes it easy for folks to use and explore without cutting and pasting (or `source`ing) code. It also makes it possible to list all the dependencies for the project and help ensure they’ll be installed when someone tries to work with it.

While I _could_ have converted Moritz’s processed data into R data files, I left the CSV intact and the javascript file of suffix groupings also intact to show that R is extremely flexible when it comes to data processing (which is a “duh” for most folks by this point, but the use of javascript data structures might give some folks ideas as to how to reduce data duplication between projects). Both these files get stored in the `inst/alt` folder of the source package. I also end up using some CSS for the final visualization and placed that into a file in the same directory, which makes the code that generates the HTML a bit cleaner.

Because R processes some things automatically (like `.onAttach`) when it interacts with a package, one can have it provide helpful instructions (in this case, how to generate the visualization) in similar fashion to the `ggplot2` loading messages.
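
For instance, a minimal sketch of that hook (the message text here is illustrative; `display_maps()` is the package’s actual entry point):

.onAttach <- function(libname, pkgname) {
  # shown once when the package is attached, a la ggplot2's startup notes
  packageStartupMessage("Run display_maps() to generate the visualization.")
}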

Similarly, both the package itself and the package functions have documentation to help folks understand what the package and each component are doing.
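
For the function-level docs that means the usual roxygen2 style; a sketch (the `output_file` parameter comes from the notes further down, the body is a stub):

#' Build and display the final HTML visualization
#'
#' @param output_file optional path to also write the HTML to
#' @export
display_maps <- function(output_file=NULL) {
  # ... assemble the maps and render the page ...
}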

### The Fun Stuff
The CSV file of places looks something like this:

name,latitude,longitude
Nierskanal,49.01,13.23
Zwiefelhof,49.22,11.18
Zwiefaltendorf,48.21,9.51
Zwiefalten,48.23,9.46
Zwiedorf,53.69,13.05
Zwickgabel,48.58,8.31
Zwickau,50.72,12.48
Zwethau,51.58,13.04
Zwesten,51.05,9.17

and, the suffix groupings list looks like this:

const suffixList = [
  ["ach", "a", "aa", "ah"],
  ["ar", "ahr"],
  ["ate", "te", "nit", "net"],
  ["au", "aue", "oog", "ooge", "ohe", "oie"],
  ["bach", "bach", "bek", "beken", "beck", "bke"],
  ["berg", "bergen", "barg", "bargen"],
  ["born", "bronn"],
  ["bruch", "broich", "brook", "brock", "brauk"],
  ["bruck", "brück", "brügge"],
  ...
];

While `read.csv` (no need for `readr` as the file is small) can handle the CSV file, we use the `V8` package to source the javascript and convert it to an R object:

library(V8)

ct <- v8()
ct$source(system.file("alt/suffixlist.js", package="zellingenach"))
ct$get("suffixList")

We actually turn that into a vector of regular expressions (for checking town-name endings) and a list of vectors (for the HTML visualization creation). Check out `suffix_regex()` and `suffix_names()` in the source code.
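
The gist of the regex-building step looks something like this (a sketch, not the package’s exact code): each suffix group is collapsed into a single anchored “ends with any of these” pattern.

# e.g. c("ach", "a", "aa", "ah") becomes "(ach|a|aa|ah)$"
suffix_regex <- function(suffix_list) {
  vapply(suffix_list,
         function(group) sprintf("(%s)$", paste0(group, collapse="|")),
         character(1))
}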

The `read_places()` function builds a `data.frame` of the places combined with the suffix grouping(s) they belong to:

library(stringi) # stri_detect_regex()
library(dplyr)   # filter(), mutate()
 
# read in the file (`suf` is the vector of suffix regexes from suffix_regex())
plc <- read.csv(system.file("alt/placenames_de.tsv", package="zellingenach"),
                stringsAsFactors=FALSE)
 
# iterate over each suffix and identify which place names match the grouping
lapply(suf, function(regex) {
  which(stri_detect_regex(plc$name, regex))
}) -> matched_endings
 
plc$found <- ""
 
# add which grouping(s) the place was found to a new column
for(i in 1:length(matched_endings)) {
  where_found <- matched_endings[[i]]
  plc$found[where_found] <-
    paste0(plc$found[where_found], sprintf("%d|", i))
}
 
# some don't match so get rid of them
mutate(filter(plc, found != ""), found=sub("\\|$", "", found))

I do something a bit different than Moritz in that I allow towns to be part of multiple suffix groups, since:

– I’m neither a historian nor expert in German town naming conventions, and
– the javascript version and this R version both take a naive approach to suffix mapping.

This means my numbers (for the _”#### places”_ label) will be different for some of my maps.

R has similar shortcut functions (Moritz uses D3) to make hexgrids out of shapefiles. Here’s the entirety of `create_hexgrid()`:

library(raster) # getData()
library(sp)     # spsample(), HexPoints2SpatialPolygons()
 
de_shp <- getData("GADM", country="DEU", level=0, path=tempdir())
 
de_hex_pts <- spsample(de_shp, type="hexagonal", n=10000, cellsize=0.19,
                       offset=c(0.5, 0.5), pretty=TRUE)
 
HexPoints2SpatialPolygons(de_hex_pts)

You can play with `cellsize` to change the number of hexes. I tried to find a value that got close to the number of hexes in Moritz’s maps.

This all gets put together in `make_maps()` where we use `ggplot2` to build 52 gridded heatmaps (one for each suffix grouping). I used a log of the counts to map to a binned viridis color scale, so my colors come out a bit different than Moritz’s but the overall patterns are on par with his.
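
The core of each panel boils down to something like this (a sketch under assumptions: `hex_counts` holds one row per hex `id` with the number of matched places `n`, and `hex_map` is the fortified hex polygons):

library(ggplot2)
library(ggthemes)
library(viridis)
 
gg <- ggplot()
# the real code bins the log counts; a continuous scale keeps this sketch short
gg <- gg + geom_map(data=hex_counts, map=hex_map,
                    aes(map_id=id, fill=log1p(n)))
gg <- gg + expand_limits(x=hex_map$long, y=hex_map$lat)
gg <- gg + scale_fill_viridis()
gg <- gg + coord_map()
gg <- gg + theme_map()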

Finally, `display_maps()` takes the list created by `make_maps()` and builds out an HTML page, using the `htmltools` package for the page framework and `svglite::htmlSVG` to make SVGs of the ggplot objects. NOTE that you can use the `output_file` option of `display_maps()` to send the HTML to a file as well as display it in the viewer/browser.
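
The htmltools/svglite combination is roughly this (a sketch; `map_list` stands in for the list returned by `make_maps()`):

library(htmltools)
library(svglite)
 
# render each ggplot object to an inline SVG string
svgs <- lapply(map_list, function(gg) HTML(stringSVG(print(gg))))
 
# wrap them in a minimal page and show it in the viewer/browser
html_print(tags$html(tags$body(svgs)))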

### Fin
Because the project is in a package, we can run package checks to see if we’re missing anything, including other package dependencies, function documentation and other details that the package tools are gleeful to point out. We can also include code to test out our various components to ensure they are behaving as expected (i.e. generating the right data/output).
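
For example (assuming the working directory is the package root):

devtools::check() # R CMD check: dependencies, docs, NAMESPACE, etc.
devtools::test()  # run any bundled tests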

One nice thing about the output is that it’s “responsive”, which means it handles multiple screen sizes quite well. So, if your screen is huge, you’ll have many map boxes on one line and if it’s small (like the `iframe` below) it will have fewer.
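
One common way to get that behavior (a sketch; the class name is made up and may not match the package’s actual CSS) is to make each map box an `inline-block` element so the browser fits as many per row as the width allows:

library(htmltools)
 
# hypothetical rule: inline-block boxes reflow to the available width
tags$style(HTML(".map-box { display: inline-block; width: 300px; }"))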

You’ll see that my maps are a bit bigger than Moritz’s. This is due to both the hex grid size and the fact that the SVG output is just slightly larger overall than the ones made by D3. Of note: I noticed some suffix subtitle components wrapped at the “-” so I converted the plain dashes to non-breaking ones (`&#8209;`).

The one downside to using a package for this is that it’s harder to post complete code into a blog post, but you can [clone the repo](https://github.com/hrbrmstr/zellingenach) to look at the code, skip the dissection, and just generate the visualization locally via:

install.packages("ggalt")
# OR: devtools::install_github("hrbrmstr/ggalt") 
devtools::install_github("hrbrmstr/zellingenach")
display_maps()

By targeting SVG & HTML, we can make a cross-platform, crisp and responsive visualization all without leaving RStudio.

If you caught any errors or made something cool with any of the code, please drop an issue on github and a note in the comments (respectively)!

If you prefer a single- `source`-able version, please see [this gist](https://gist.github.com/hrbrmstr/f3d2568ad0f27b2384d3).

Happy New YeaR!

James Austin (@awhstin) made some #spiffy 4-panel maps with base R graphics but also posited he didn’t use ggplot2 because:

> ggplot2 and maps currently do not support world maps at this point, which does not give us a great overall view.

That is certainly a box I would not put ggplot2 into, especially with the newly updated R maps (et al) packages, ggplot2 2.0 and my (still in development) ggalt package (though this was all possible before ggplot2 2.0 and ggalt). NOTE: I have no idea why I get so defensive about ggplot2 besides the fact that it’s one of the best visualization tools ever created.

Here’s all you need to use the built-in facet options of ggplot2 to make the 4-panel plot (as James points out, you can get the data file from here: CLIWOC15.csv):

library(ggplot2)  # FYI you need v2.0
library(dplyr)    # yes, i could have not done this and just used 'subset' instead of 'filter'
library(ggalt)    # devtools::install_github("hrbrmstr/ggalt")
library(ggthemes) # theme_map and tableau colors
 
world <- map_data("world")
world <- world[world$region != "Antarctica",] # intercourse antarctica
 
dat <- read.csv("CLIWOC15.csv")        # having factors here by default isn't a bad thing
dat <- filter(dat, Nation != "Sweden") # I kinda feel bad for Sweden but 4 panels look better than 5 and it doesn't have much data
 
gg <- ggplot()
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=region),
                    color="white", fill="#7f7f7f", size=0.05, alpha=1/4)
gg <- gg + geom_point(data=dat, 
                      aes(x=Lon3, y=Lat3, color=Nation), 
                      size=0.15, alpha=1/100)
gg <- gg + scale_color_tableau()
gg <- gg + coord_proj("+proj=wintri")
gg <- gg + facet_wrap(~Nation)
gg <- gg + theme_map()
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(legend.position="none")
gg

[Figure: the four-panel map, faceted by Nation]

You can use a separate shapefile if you want, but this is quite minimalist (a feature James suggests is desirable) and emphasizes the routes quite nicely IMO.

I saw this post over at NatGeo over the weekend and felt compelled to replicate their earthquake map with ggplot2.

Three shapefiles later and we have it close enough to toss into a post (and I really don’t believe the continent names are necessary).

library(rgdal)
library(ggplot2)
library(ggthemes)
library(ggalt) # devtools::install_github("hrbrmstr/ggalt")

# grab these from http://rud.is/dl/quakefiles.tgz

world <- readOGR("countries.geo.json", "OGRGeoJSON", stringsAsFactors=FALSE)
plates <- readOGR("plates.json", "OGRGeoJSON", stringsAsFactors=FALSE)
quakes <- readOGR("quakes.json", "OGRGeoJSON", stringsAsFactors=FALSE)

world_map <- fortify(world)
plates_map <- fortify(plates)
quakes_dat <- data.frame(quakes)
quakes_dat$trans <- quakes_dat$mag %% 5

gg <- ggplot()
gg <- gg + geom_cartogram(data=world_map, map=world_map,
                          aes(x=long, y=lat, map_id=id),
                          color="white", size=0.15, fill="#d8d8d6")
gg <- gg + geom_cartogram(data=plates_map, map=plates_map,
                          aes(x=long, y=lat, map_id=id),
                          color="black", size=0.1, fill="#00000000", alpha=0)
gg <- gg + geom_point(data=quakes_dat,
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1/3, color="#d47e5d", fill="#00000000")
gg <- gg + geom_point(data=subset(quakes_dat, mag>7.5),
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1, color="black", fill="#00000000")
gg <- gg + geom_text(data=subset(quakes_dat, mag>7.5),
                     aes(x=coords.x1, y=coords.x2, label=sprintf("Mag %2.1f", mag)),
                     color="black", size=3, vjust=c(3.9, 3.9, 5), fontface="bold")
gg <- gg + scale_size(name="Magnitude", trans="exp", labels=c(5:8), range=c(1, 20))
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.05, 0.99))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(legend.key=element_rect(color="#00000000"))
gg


I can only imagine how many mouse clicks that would be in a GIS program.

Avast, me hearties! It’s time four t’ annual International Talk Like a Pirate Day #rstats post!

(OK, I won’t make you suffer continuous pirate-speak for the entire post)

I tried to be a bit more practical this year and have two treasuRe chests for you to (hopefully) enjoy.

### A Package Full o’ Pirates

I’ve covered the Anti-shipping Activity Messages (ASAM) Database for TLAPD before, but getting, updating and working with the data has more pain points than it should, so I wrapped a small package around it.

Here’s how to get all pirate attacks this year (2015) so far:

# devtools::install_github("hrbrmstr/asam")
library(asam)
 
data(asam_shp)
pirates <- subset(asam_shp,
                  grepl("pirate", Aggressor, ignore.case=TRUE) &
                  format(DateOfOcc, "%Y") == "2015")
 
nrow(pirates)
## [1] 78

It looks like there have been 78 registered pirate attacks this year. The National Geospatial-Intelligence Agency (NGA) marks the attacks by lat/lon and also by region/subregion, and I managed to obtain the official polygons for these regions, so we can plot these attacks on a world map and also show the subregions:

library(ggplot2)
 
# get the ASAM subregion polygons
subregions <- asam_subregions()
subregions_map <- fortify(subregions)
 
# get the world map
world <- map_data("world")
 
# get the points for the pirate attack occurrences
pirate_pts <- data.frame(pirates)
 
gg <- ggplot()
 
# world map layer
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=region),
                    color="black", fill="#e7e7e7", size=0.15)
# ASAM regions layer
gg <- gg + geom_map(data=subregions_map, map=subregions_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", fill="white", size=0.15, alpha=0)
 
# attacks
gg <- gg + geom_point(data=pirate_pts, color="black", fill="yellow", 
                      aes(x=coords.x1, y=coords.x2), shape=21)
 
gg <- gg + xlim(-170, 170)
gg <- gg + ylim(-58, 75)
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(panel.background=element_rect(fill="steelblue"))
gg


There is quite a bit more data than just location, though, and we can work with it much better in an interactive map.

### Makin’ Interactive Pirate Maps

Now, what makes the following an interactive pirate map is not so much the fact that we’ll be plotting points of pirate attacks on a Leaflet map, but we’ll also be using a pirate treasure map theme on the Leaflet map.

Let’s start with showing how to use a general pirate map theme before adding in the ASAM data.

You’ll have to pause here and head on over to MapBox to register for a (free) account. You’ll need to go through the gyrations to eventually get a public token and mapbox id to use the pirate map tiles they have. I store those in my .Renviron so I don’t have to cut/paste inane strings when I need to use this, or other, APIs or need to keep them from prying eyes. Since MapBox exposes these strings in GET call URLs, the use of environment variables is strictly for convenience in this case.
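
For reference, the `~/.Renviron` entries the code below expects look something like this (the values are placeholders; restart R after editing so `Sys.getenv()` sees them):

MAPBOX_PUBLIC_TOKEN=pk.your-token-here
PIRATE_MAP_ID=your-pirate-map-id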

library(leaflet)
 
mapbox_public_token <- Sys.getenv("MAPBOX_PUBLIC_TOKEN")
mapbox_map_id <- Sys.getenv("PIRATE_MAP_ID")
mapbox_url <- "https://a.tiles.mapbox.com/v4/%s/{z}/{x}/{y}.png?access_token=%s"
mapbox_tiles_template <- sprintf(mapbox_url, mapbox_map_id, mapbox_public_token)

Now, what good is a pirate map without an ‘X’ marking the spot for some treasure? For that we’ll need an ‘X’ in a format we can use with Leaflet:

x_marker <- icons("http://rud.is/dl/x.png",
                  iconHeight=64, iconWidth=64,
                  iconAnchorX=32, iconAnchorY=32)

Now, we can display a pirate map for all scurvy dogs to see:

leaflet() %>%
  addTiles(mapbox_tiles_template) %>%
  setView(lng=-50.9249, lat=45.68929, zoom=3) %>%
  addMarkers(-70.2667, 43.6667, icon=x_marker)

NOTE: I have not buried treasure in Portland, Maine, but go nuts digging at that location if you still want to.

### Pirates on Pirate Maps

We can make a [crude] interactive ASAM browser by combining our data from above with our new, pirate-y mapping capabilities:

library(asam)
library(sp)
library(dplyr)
library(leaflet)
 
data(asam_shp)
dat <- subset(asam_shp,
              DateOfOcc > as.Date("2015-01-01") &
                grepl("pirate", Aggressor, ignore.case=TRUE))
# could also do data.frame(dat)
dat <- bind_cols(dat@data, data.frame(coordinates(dat), stringsAsFactors=FALSE))

We’ll build a popup with the ASAM incident description fields and add it and the pirate incident points to a pirate-themed Leaflet map:

popup_template <- '<div style="background:#f3e0b5; padding:10px"><b>Date:</b> %s
<span style="float:right"><a target="_blank" href="https://msi.nga.mil/NGAPortal/msi/query_results.jsp?MSI_queryType=ASAM&amp;MSI_generalFilterType=SpecificNumber&amp;MSI_generalFilterValue=%s&amp;MSI_additionalFilterType1=None&amp;MSI_additionalFilterType2=-999&amp;MSI_additionalFilterValue1=-999&amp;MSI_additionalFilterValue2=-999&amp;MSI_outputOptionType1=SortBy&amp;MSI_outputOptionType2=-999&amp;MSI_outputOptionValue1=Date_DESC&amp;MSI_outputOptionValue2=-999&amp;MSI_MAP=-999">ASAM Record</a>
</span><br/>
<b>Victim:</b> %s<br/>
<b>Description:</b> %s</div>'
 
nona <- function(x) ifelse(is.na(x), " ", x)
 
pirate_pops <- sprintf(popup_template,
                       dat$DateOfOcc, # the date column from the subset above
                       gsub("-", "_", dat$Reference),
                       dat$Victim,
                       paste(nona(dat$Descript),
                             nona(dat$Desc1), nona(dat$Desc2), nona(dat$Desc3),
                             nona(dat$Desc4), nona(dat$Desc5), nona(dat$Desc6),
                             sep=" "))
 
mapbox_public_token <- Sys.getenv("MAPBOX_PUBLIC_TOKEN")
mapbox_map_id <- Sys.getenv("PIRATE_MAP_ID")
mapbox_url <- "https://a.tiles.mapbox.com/v4/%s/{z}/{x}/{y}.png?access_token=%s"
mapbox_tiles_template <- sprintf(mapbox_url, mapbox_map_id, mapbox_public_token)
 
leaflet() %>%
  addTiles(mapbox_tiles_template) %>%
  setView(lng=-50.9249, lat=45.68929, zoom=3) %>%
  addCircles(dat$coords.x1, dat$coords.x2, radius=300,
             color="#664c1f", popup=pirate_pops)

Select any of the circle marks and you’ll get a popup with a description and a link to the official ASAM record.

### Fin

I’m not sure when I’ll get back to the asam package, but it could use some attention. The Aggressor field could be auto-cleaned to make it more usable and a dplyr-esque interface could be developed to select incidents. Also, since it includes a shapefile of subregions, those could be used for more spatially-oriented analyses of the incidents. It’s all there for any pirate lackey to pilfer.

Drop a note in the comments if you have any of your own pirate-y creations or an issue on github for feature requests & bug reports.

I engage with the Stack[Overflow|Exchange] community quite a bit and was super-happy @treycausey made the [Stack Overflow bot](https://twitter.com/StackOverflowR) (@StackOverflowR), since I’m also on Twitter a lot (mostly hanging out there these days).

However, questions exist in other Stack watering holes, like the [Geographic Information Systems Stack Exchange](http://gis.stackexchange.com/questions/tagged/r), [Cross Validated](http://stats.stackexchange.com/questions/tagged/r) and the fledgling [Data Science Stack Exchange](http://datascience.stackexchange.com/questions/tagged/r). They are all easy enough to follow in your favorite RSS reader (which _is_ @feedly, _right_?), but it’s equally easy to turn them into Twitter bots (here they are):

– @DataSciSERBot (Data Science Stack Exchange posts)
– @GISStackExchR (Geographic Information Systems posts)
– @RStatsStExBot (Cross Validated Stack Exchange posts)

They use [this IFTTT recipe](https://ifttt.com/recipes/322136-twitter-bot-for-data-science-stack-exchange-rstats-questions) to take the RSS feeds from each Stack community and turn them into tweets. Each forum (and, hence, Twitter bot) has _way_ less volume than @StackOverflowR and also tends to be of less general interest. However, these specialized question forums [usually] have _really_ interesting problems & answers that I’ve learned a great deal from. You may get inspiration for a solution to something, see a really neat way to accomplish a task, get an idea for a new R package or just gain new and useful knowledge in areas previously unfamiliar to you.

They won’t be spamming the Twitter channel (the errant tag on one of the tweets was fixed in the recipe), so you’ll have to follow them or deliberately check in on them to see the updates.

I’ll try to keep an eye out for other Stack communities that feature R questions and add them to the bot family. If you see any I’ve missed or am missing in the future, drop a comment here or a note to @hrbrmstr on Twitter.

In the never-ending battle for truth, justice and publishing more R packages than [Oliver](http://twitter.com/quominus), I whipped out an R package for the [OpenStreetMap Nominatim API](http://wiki.openstreetmap.org/wiki/Nominatim). It actually hits the [MapQuest Nominatim Servers](http://open.mapquestapi.com/nominatim/) for most of the calls, but the functionality is the same.

The R package lets you:

– `address_lookup`: Lookup the address of one or multiple OSM objects like node, way or relation.
– `osm_geocode`: Search for places by address
– `osm_search`: Search for places
– `osm_search_spatial`: Search for places, returning a list of `SpatialPointsDataFrame`, `SpatialLinesDataFrame` or a `SpatialPolygonsDataFrame`
– `reverse_geocode_coords`: Reverse geocode based on lat/lon
– `reverse_geocode_osm`: Reverse geocode based on OSM Type & Id

Just like Google Maps, these services are not meant to be your freebie-access to mega-bulk-geocoding. You can and should pay for that. But, when you need a few items geocoded (or want to look up some interesting things on OSM since it provides [special phrases](http://wiki.openstreetmap.org/wiki/Nominatim/Special_Phrases) to work with), Nominatim lookups can be just what’s needed.

Let’s say we wanted to see where pubs are in the Seattle metro area. That’s a simple task for nominatim:

# devtools::install_github("hrbrmstr/nominatim")
library(nominatim)
library(dplyr)
 
sea_pubs <- osm_search("pubs near seattle, wa", limit=20)
 
glimpse(sea_pubs)
 
## Observations: 20
## Variables:
## $ place_id     (chr) "70336054", "82743439", "11272568", "21478701", "...
## $ licence      (chr) "Data © OpenStreetMap contributors, ODbL 1.0. htt...
## $ osm_type     (chr) "way", "way", "node", "node", "node", "node", "no...
## $ osm_id       (chr) "51460516", "96677583", "1077652159", "2123245933...
## $ lat          (dbl) 47.64664, 47.63983, 47.60210, 47.62438, 47.59203,...
## $ lon          (dbl) -122.3503, -122.3023, -122.3321, -122.3559, -122....
## $ display_name (chr) "Nickerson Street Saloon, 318, Nickerson Street, ...
## $ class        (chr) "amenity", "amenity", "amenity", "amenity", "amen...
## $ type         (chr) "pub", "pub", "pub", "pub", "pub", "pub", "pub", ...
## $ importance   (dbl) 0.201, 0.201, 0.201, 0.201, 0.201, 0.201, 0.201, ...
## $ icon         (chr) "http://mq-open-search-int-ls03.ihost.aol.com:800...
## $ bbox_left    (dbl) 47.64650, 47.63976, 47.60210, 47.62438, 47.59203,...
## $ bbox_top     (dbl) 47.64671, 47.63990, 47.60210, 47.62438, 47.59203,...
## $ bbox_right   (dbl) -122.3504, -122.3025, -122.3321, -122.3559, -122....
## $ bbox_bottom  (dbl) -122.3502, -122.3022, -122.3321, -122.3559, -122....

We can even plot those locations:

library(rgdal)
library(ggplot2)
library(ggthemes)
library(sp)
library(DT)
 
# Grab a neighborhood map of Seattle
url <- "https://data.seattle.gov/api/file_data/VkU4Er5ow6mlI0loFhjIw6eL6eKEYMefYMm4MGcUakU?filename=Neighborhoods.zip"
fil <- "seattle.zip"
if (!file.exists(fil)) download.file(url, fil)
if (!dir.exists("seattle")) unzip(fil, exdir="seattle")
 
# make it usable
sea <- readOGR("seattle/Neighborhoods/WGS84/Neighborhoods.shp", "Neighborhoods")
 
## OGR data source with driver: ESRI Shapefile 
## Source: "seattle/Neighborhoods/WGS84/Neighborhoods.shp", layer: "Neighborhoods"
## with 119 features
## It has 12 fields
 
sea_map <- fortify(sea)
 
# Get the extents of where the pubs are so we can "zoom in"
bnd_box <- bbox(SpatialPoints(as.matrix(sea_pubs[, c("lon", "lat")])))
 
# plot them
gg <- ggplot()
gg <- gg + geom_map(data=sea_map, map=sea_map,
                    aes(x=long, y=lat, map_id=id),
                    color="black", fill="#c0c0c0", size=0.25)
gg <- gg + geom_point(data=sea_pubs, aes(x=lon, y=lat),
                      color="#ffff33", fill="#ff7f00",
                      shape=21, size=4, alpha=1/2)
# decent projection for Seattle-y things and expand the zoom/clip a bit
gg <- gg + coord_map("gilbert",
                     xlim=extendrange(bnd_box["lon",], f=0.5),
                     ylim=extendrange(bnd_box["lat",], f=0.5))
gg <- gg + labs(title="Seattle Pubs")
gg <- gg + theme_map()
gg <- gg + theme(title=element_text(size=16))
gg


Of course you can geocode:

addrs <- osm_geocode(c("1600 Pennsylvania Ave, Washington, DC.",
                       "1600 Amphitheatre Parkway, Mountain View, CA",
                       "Seattle, Washington"))
addrs %>% select(display_name)
 
## Source: local data frame [3 x 1]
## 
##                                                                  display_name
## 1                  Washington, District of Columbia, United States of America
## 2 Mountainview Lane, Huntington Beach, Orange County, California, 92648, Unit
## 3                  Seattle, King County, Washington, United States of America
 
addrs %>% select(lat, lon)
 
## Source: local data frame [3 x 2]
## 
##        lat        lon
## 1 38.89495  -77.03665
## 2 33.67915 -118.02588
## 3 47.60383 -122.33006

Or, reverse geocode:

# Reverse geocode Canadian embassies
# complete list of Canadian embassies here:
# http://open.canada.ca/data/en/dataset/6661f0f8-2fb2-46fa-9394-c033d581d531
 
embassies <- data.frame(
  lat=c("34.53311", "41.327546", "41.91534", "36.76148", "-13.83282",
        "40.479094", "-17.820705", "13.09511", "13.09511"),
  lon=c("69.1835", "19.818698", "12.50891", "3.0166", "-171.76462",
        "-3.686115", "31.043559", "-59.59998", "-59.59998"),
  stringsAsFactors=FALSE)
 
emb_coded_coords <- reverse_geocode_coords(embassies$lat, embassies$lon)
 
emb_coded_coords %>% select(display_name)
 
## Source: local data frame [9 x 1]
## 
##                                                                  display_name
## 1                Embassy of Canada, Ch.R.Wazir Akbar Khan, Kabul, Afghanistan
## 2 Monumenti i Skënderbeut, Skanderbeg Square, Lulishtja Këshilli i Europëes, 
## 3 Nomentana/Trieste, Via Nomentana, San Lorenzo, Salario, Municipio Roma II, 
## 4 18, Avenue Khalef Mustapha, Ben Aknoun, Daïra Bouzareah, Algiers, Ben aknou
## 5                               The Hole in the Wall, Beach Road, Āpia, Samoa
## 6 Torre Espacio, 259 D, Paseo de la Castellana, Fuencarral, Fuencarral-El Par
## 7 Leopold Takawira Street, Avondale West, Harare, Harare Province, 00263, Zim
## 8                    Bishop's Court Hill, Bridgetown, Saint Michael, Barbados
## 9                    Bishop's Court Hill, Bridgetown, Saint Michael, Barbados

It can even return `Spatial` objects (somewhat experimental):

# stock example search from OSM
osm_search_spatial("[bakery]+berlin+wedding", limit=5)[[1]]
 
##            coordinates   place_id
## 1 (13.34931, 52.54165)    9039748
## 2 (13.34838, 52.54125) 2659941153
## 3 (13.35678, 52.55138)   23586341
## 4 (13.34985, 52.54158)    7161987
## 5  (13.35348, 52.5499)   29179742
##                                                                               licence
## 1 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 2 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 3 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 4 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
## 5 Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
##   osm_type     osm_id      lat      lon
## 1     node  939667448 52.54165 13.34931
## 2     node 3655549445 52.54125 13.34838
## 3     node 2299953786 52.55138 13.35678
## 4     node  762607353 52.54158 13.34985
## 5     node 2661679367 52.54990 13.35348
##                                                                                    display_name
## 1            Baguetterie, Föhrer Straße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
## 2 Föhrer Cafe & Backshop, Föhrer Straße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
## 3               Körfez, Amsterdamer Straße, Leopoldkiez, Wedding, Mitte, Berlin, 13347, Germany
## 4             Knusperbäcker, Torfstraße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
## 5             Hofbäckerei, Müllerstraße, Brüsseler Kiez, Wedding, Mitte, Berlin, 13353, Germany
##   class   type importance
## 1  shop bakery      0.201
## 2  shop bakery      0.201
## 3  shop bakery      0.201
## 4  shop bakery      0.201
## 5  shop bakery      0.201
##                                                                                                      icon
## 1 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 2 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 3 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 4 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
## 5 http://mq-open-search-int-ls04.ihost.aol.com:8000/nominatim/v1/images/mapicons/shopping_bakery.p.20.png
##    bbox_left   bbox_top bbox_right bbox_bottom
## 1 52.5416504 52.5416504  13.349306   13.349306
## 2 52.5412496 52.5412496 13.3483832  13.3483832
## 3 52.5513806 52.5513806 13.3567785  13.3567785
## 4   52.54158   52.54158 13.3498507  13.3498507
## 5 52.5499029 52.5499029 13.3534756  13.3534756

The lookup functions are vectorized but there’s a delay built in to avoid slamming the free servers.
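
If you’re scripting your own loops on top of these functions anyway, the same courtesy applies; a minimal sketch (the one-second pause is an arbitrary choice, not the package’s internal value):

library(nominatim)
 
queries <- c("pubs near seattle, wa", "pubs near portland, me")
results <- lapply(queries, function(q) {
  res <- osm_search(q, limit=5)
  Sys.sleep(1) # be kind to the free servers
  res
})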

Some things on the TODO list are:

– enabling configuration of timeouts
– enabling switching Nominatim API server providers (you can host your own!)
– better `Spatial` support

So, give the [code a spin](https://github.com/hrbrmstr/nominatim) and submit feature requests/issues to github!

Despite having shown various ways to overcome D3 cartographic envy, there are always more examples that can cause the green monster to rear its ugly head.

Take the Voronoi Arc Map example.



For those in need of a primer, a Voronoi tessellation/diagram is:

> a partitioning of a plane into regions based on distance to points in a specific subset of the plane. That set of points (called seeds, sites, or generators) is specified beforehand, and for each seed there is a corresponding region consisting of all points closer to that seed than to any other. (Wikipedia)

We can overlay a Voronoi tessellation on top of a map in R as well, thanks to the deldir package (which has been around since the “S” days!). Let’s get (most of) the package requirements cruft out of the way, first:

library(sp)
library(rgdal)
library(deldir)
library(dplyr)
library(ggplot2)
library(ggthemes)

Now we’ll [ab]use the data from the Arc Map example:

flights <- read.csv("http://bl.ocks.org/mbostock/raw/7608400/flights.csv", stringsAsFactors=FALSE)
airports <- read.csv("http://bl.ocks.org/mbostock/raw/7608400/airports.csv", stringsAsFactors=FALSE)

Since the D3 example cheats and only uses the continental US (CONUS) we’ll do the same, and we’ll also keep only those airports mentioned in the flights data and get the total # of incoming/outgoing flights for each airport:

conus <- state.abb[!(state.abb %in% c("AK", "HI"))]
airports <- filter(airports,
                   state %in% conus,
                   iata %in% union(flights$origin, flights$destination))
orig <- select(count(flights, origin), iata=origin, n1=n)
dest <- select(count(flights, destination), iata=destination, n2=n)
airports <- left_join(airports,
                      select(mutate(left_join(orig, dest),
                                    tot=n1+n2),
                             iata, tot)) %>% 
            filter(!is.na(tot))

Since we’re going to initially plot polygons in ggplot (and, eventually, in leaflet), we’ll need to work with Spatial objects, so let’s make those airport lat/lon pairs into a SpatialPointsDataFrame:

vor_pts <- SpatialPointsDataFrame(cbind(airports$longitude,
                                        airports$latitude),
                                  airports, match.ID=TRUE)

The deldir function returns a pretty complex object. Thankfully, the authors of the package realized that one might just want the polygons from the computation and pre-made a function, tile.list, for computing/extracting them. Those polygons aren’t, however, closed and we really want to keep the airport data associated with them, so we need to close the polygons and associate the data. Since we’re likely going to repeat this task, let’s make it a (very badly named) function:

SPointsDF_to_voronoi_SPolysDF <- function(sp) {
 
  # tile.list extracts the polygon data from the deldir computation
  vor_desc <- tile.list(deldir(sp@coords[,1], sp@coords[,2]))
 
  lapply(1:(length(vor_desc)), function(i) {
 
    # tile.list gets us the points for the polygons but we
    # still have to close them, hence the need for the rbind
    tmp <- cbind(vor_desc[[i]]$x, vor_desc[[i]]$y)
    tmp <- rbind(tmp, tmp[1,])
 
    # now we can make the Polygon(s)
    Polygons(list(Polygon(tmp)), ID=i)
 
  }) -> vor_polygons
 
  # hopefully the caller passed in good metadata!
  sp_dat <- sp@data
 
  # this way the IDs _should_ match up w/the data & voronoi polys
  rownames(sp_dat) <- sapply(slot(SpatialPolygons(vor_polygons),
                                  'polygons'),
                             slot, 'ID')
 
  SpatialPolygonsDataFrame(SpatialPolygons(vor_polygons),
                           data=sp_dat)
 
}

Before we can make the plots, we need to put the Spatial objects into the proper form for ggplot2 (and get the U.S. state map):

vor <- SPointsDF_to_voronoi_SPolysDF(vor_pts)
 
vor_df <- fortify(vor)
 
states <- map_data("state")

Now we can have some fun. Let’s try to mimic the D3 example map as closely as possible. We’ll lay down the CONUS map, add a points layer for the airports, sizing & styling them just like the D3 example. Note that we order the points so that the smallest ones appear on top (so we can still see them).

We’ll then lay down our newly created Voronoi layer. We’ll also use the same projection (Albers) that the D3 example uses:

gg <- ggplot()
# base map
gg <- gg + geom_map(data=states, map=states,
                    aes(x=long, y=lat, map_id=region),
                    color="white", fill="#cccccc", size=0.5)
# airports layer
gg <- gg + geom_point(data=arrange(airports, desc(tot)),
                      aes(x=longitude, y=latitude, size=sqrt(tot)),
                      shape=21, color="white", fill="steelblue")
# voronoi layer
gg <- gg + geom_map(data=vor_df, map=vor_df,
                    aes(x=long, y=lat, map_id=id),
                    color="#a5a5a5", fill="#FFFFFF00", size=0.25)
gg <- gg + scale_size(range=c(2, 9))
gg <- gg + coord_map("albers", lat0=30, lat1=40)
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")
gg


While that’s pretty, it’s not exactly useful. I’m sure there are times when it’s important to show the Voronoi polygons, but they are especially useful when they are used to help with user interface interactions.

In the case of this map, some airport “bubbles” are very small and many overlap, making a “click” (or even “hover”) a potentially painstaking task for someone looking to get more data out of the visualization. The D3 example uses Voronoi polygons to make it super-easy for the user to hover over a map area and get more info about the flights for the closest airport to the mouse pointer.

We’ll use the leaflet htmlwidget to do something similar. Until I can figure out “hover” events for R+leaflet, you’ll have to live with “click”.

First we’ll need some additional packages:

library(leaflet)
library(rgeos)
library(htmltools)

And, we’ll also need a U.S. shapefile (which we simplify since the polygons are pretty detailed and that’s not necessary for this vis):

url <- "http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_500k.json"
fil <- "gz_2010_us_040_00_500k.json"
 
if (!file.exists(fil)) download.file(url, fil, cacheOK=TRUE)
 
states_m <- readOGR("gz_2010_us_040_00_500k.json", 
                    "OGRGeoJSON", verbose=FALSE)
states_m <- subset(states_m, 
                   !NAME %in% c("Alaska", "Hawaii", "Puerto Rico"))
dat <- states_m@data # gSimplify whacks the data bits
states_m <- SpatialPolygonsDataFrame(gSimplify(states_m, 0.05,
                                               topologyPreserve=TRUE),
                                     dat, FALSE)

The leaflet vis idiom is similar to the ggplot idiom. I’m using a base tile layer since I was too lazy to figure out how to change the leaflet default gray background map color. The map polygons are added, then the circles/bubbles (note that you work in meters with addCircles, which lets leaflet scale the bubbles as you zoom in/out).

Finally, the Voronoi layer is added. I kept the stroke visible purely for demonstration purposes. You need to keep fill=TRUE, otherwise the Voronoi layer won’t get click/hover events, and once I figure out how to trigger popups on hover and use a static popup layer, this will let users hover around the map to get the underlying airport flight information.

leaflet(width=900, height=650) %>%
  # base map
  addProviderTiles("Hydda.Base") %>%
  addPolygons(data=states_m,
              stroke=TRUE, color="white", weight=1, opacity=1,
              fill=TRUE, fillColor="#cccccc", smoothFactor=0.5) %>%
  # airports layer
  addCircles(data=arrange(airports, desc(tot)),
             lng=~longitude, lat=~latitude,
             radius=~sqrt(tot)*5000, # size is in m for addCircles O_o
             color="white", weight=1, opacity=1,
             fillColor="steelblue", fillOpacity=1) %>%
  # voronoi (click) layer
  addPolygons(data=vor,
              stroke=TRUE, color="#a5a5a5", weight=0.25,
              fill=TRUE, fillOpacity = 0.0,
              smoothFactor=0.5, 
              popup=sprintf("Total In/Out: %s",
                            as.character(vor@data$tot)))

I made the Voronoi layer very light, so you may want to keep it there as a cue for the user. How you work with it is completely up to you.

Now you have one less reason to be envious of the D3 cartographers!

The $DAYJOB doesn’t afford much opportunity to work with cartographic datasets, but I really like maps and tinker with shapefiles and geo-data when I can, plus answer a ton of geo-questions on StackOverflow. R makes it easy—one might even say too easy—to work with maps. All it takes to make a map of the continental United States (CONUS) is:

library(maps)
map("state")


It’s a little more involved with ggplot2:

library(ggplot2)
library(ggthemes)

states <- map_data("state")

gg <- ggplot()
gg <- gg + geom_map(data=states, map=states,
                    aes(x=long, y=lat, map_id=region),
                    color="black", fill="white", size=0.25)
gg <- gg + theme_map()
gg


Both of those maps are horrible. The maps::map function defaults to using a rectangular projection with the aspect ratio chosen so that longitude and latitude scales are equivalent at the center of the picture. ggplot will size the graphic to the device window. Sticking with these defaults is a really bad idea. Why? I’ll let Mark Monmonier (author of How to Lie with Maps) explain:

> Maps have three basic attributes: scale, projection, and symbolization. Each element is a source of distortion. As a group, they describe the essence of the map’s possibilities and limitations. No one can use maps or make maps safely and effectively without understanding map scales, map projections, and map symbols.
>
> Map projections distort five geographic relationships: areas, angles, gross shapes, distances, and directions. Although some projections preserve local angles but not areas, others preserve areas but not local angles. All distort large shapes noticeably (but some distort continental shapes more than others), and all distort at least some distances and some directions.

There are great examples in his book on how map projections can inflate or diminish the area and relative importance of countries and regions, and how a map projection can itself become a rallying point for “cartographically oppressed” regions.

If map projections are important (and they are), what should you do? Both map and ggplot give you options to use projections that are found in the mapproj library (specifically using the mapproject function). The help for mapproject even gives an example of using the Albers equal-area conic projection when plotting the CONUS:

library(mapproj)
map("state", projection="albers", par=c(lat0=30, lat1=40))


As its full name suggests, Albers is an equal-area projection, which is recommended for U.S. choropleths as it preserves the relative areas of geographic features. It’s also better than the ggplot default (“Mercator“) in its coord_map:

gg + coord_map()


But, we can pass in parameters to use a saner projection:

gg + coord_map("albers", lat0=30, lat1=40)


The mapproject function exposes 41 projections, which may seem generous; however, not all of them are practical and even the ones with parameters are not very customizable. Before we dig into that a bit more, you’re probably wondering (if you don’t work with cartography for a living or hobby) how one goes about choosing a map projection…

### Choosing A Map Projection

Thankfully, many smart folks have thought quite a bit about choosing map projections and there are a number of resources you can draw upon when making your own choices.

The first one is Map Projections – A Working Manual. This is a free resource from the U.S. Geological Survey and was published back in 1987. It has a “companion” resource – An Album of Map Projections. Both are outstanding resources to have on hand as they provide a great deal of information on map projections. If you’re in a hurry, the “Album” makes for a good quick reference. Here’s the entry for our buddy, Albers:


[Figure: the Albers entry from _An Album of Map Projections_]

(Go to page 10 of the “Album” for the larger version)

The image in the upper-right is a “Tissot indicatrix” (named for its creator Nicolas Auguste Tissot), which “puts Tissot indicatrices at every 30° of latitude and longitude, except at the poles. This shows the shape of infinitesimally small circles on the Earth as they appear when they are plotted by using a fixed finite scale at the same locations on a map. Every circle is plotted as a circle or an ellipse or, in extreme cases, as a straight line. On an equal-area projection, all these ellipses and circles are shown as having the same area. The flattening of the ellipse shows the extent of local shape distortion and how much the scale is changed and in what direction. On conformal map projections, all indicatrices remain circles, but the areas change. On other projections, both the areas and the shapes of the indicatrices change”. This makes for a great way to understand just how your creations are being distorted (though that may be obvious when you actually plot your maps).

The “Working Manual” also includes the equations necessary to compute the projections. Both provide projection usage guidance as well as the technical bits describing them.

The Institute for Environment and Sustainability has a similar guide for Map Projections for Europe.

Many countries and municipalities also have their own guides for using projections (e.g. California).

If you can handle what feels like a TARDIS trip back in time to the days of GeoCities, MapRef also provides good information on projections. You can also review Carlos A. Furuti’s compilation of projections for more information.

### Using More Complex/Nuanced Projections

Despite being able to use Albers and 40 other projections via mapproject, having more flexibility would allow us to use grid systems (see the refs in the previous section for what those are) and also hyper-customize many projections without the need to write our own equations (be thankful of that as you skim the math in the “Working Manual”!).

Gerald Evenden developed a library and utility called proj for the USGS back in 1995 and said utility, library & projection specification idiom has been maintained and expanded ever since in the PROJ.4 project. This library/tool is straightforward to install on most (if not all) operating systems. PROJ.4 lets you specify projections in a (fairly complex) string format (often referred to as a proj4string, especially in R). For example, you can specify Albers for the U.S. with:

"+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96"

Cartographic Projection Procedures for the UNIX Environment—A User’s Manual (considered “the” manual for PROJ.4) explains how to build the specification. At the time of this blog post there are 134 named (i.e. +proj=NAME) projections available for use & customization (run proj -l at a command-line to see the full list).

A simple example of why this is useful is when you need to plot a world map. Said activity should always be prefaced with a review of this seminal work to ensure you will choose a good projection—say, Winkel tripel. A quick glance of what mapproject supports will show you that you’re pretty much out of luck using the standard R tools we’ve seen so far for that but there is a handy +proj=wintri in PROJ.4 we can use.

Here’s how you’d plot that projection in the base plotting system:

library(sp)
library(rworldmap) # this pkg has waaaay better world shapefiles

plot(spTransform(getMap(), CRS("+proj=wintri")))


However, we can’t pass in PROJ.4 strings to ggplot’s coord_map, so we have to project the Earth first before plotting and use coord_equal:

world <- fortify(spTransform(getMap(), CRS("+proj=wintri")))

gg <- ggplot()
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=id),
                    color="black", fill="white", size=0.25)
gg <- gg + coord_equal()
gg <- gg + theme_map()
gg


While that works, you have to pre-project any lines, points, etc. as well before plotting them with the map. For example, here are the locations of all of the non-military rocket launch sites (yeah, I’ve done that before, but it’s a fun data set! – and I had it handy):

library(rgdal) # for readOGR
library(httr)  # for downloading

launch_sites <- paste0("https://collaborate.org/sites/collaborate.org/",
                       "files/spacecentersnon-militaryspacelaunchsites.kml")
invisible(try(GET(launch_sites, write_disk("/tmp/launch-sites.kml")), silent=TRUE))

sites <- readOGR("/tmp/launch-sites.kml", 
                 "SpaceCenters: Non-Military Space Launch Sites", 
                 verbose=FALSE)
sites_wt <- spTransform(sites, CRS("+proj=wintri"))
sites_wt <- coordinates(sites_wt)

gg <- gg + geom_point(data=as.data.frame(sites_wt), 
                      aes(x=coords.x1, y=coords.x2), 
                      color="#b2182b")
gg


If you have more layers, that’s doable, but painful. If we could make it so ggplot allowed for PROJ.4 projections in some coord_, this would help keep our plot idioms (and code) cleaner. Thus, coord_proj was born.

coord_proj mimics the functionality of coord_map but uses proj4::project instead of mapproject (and takes in any of those parameters). The world map with launch sites now looks like this (complete ggplot code):

world <- fortify(getMap())
sites <- as.data.frame(coordinates(readOGR("/tmp/launch-sites.kml", 
                                           "SpaceCenters: Non-Military Space Launch Sites", 
                                           verbose=FALSE)))

gg <- ggplot()
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=id),
                    color="black", fill="white", size=0.25)
gg <- gg + geom_point(data=sites,
                      aes(x=coords.x1, y=coords.x2), 
                      color="#b2182b")
gg <- gg + coord_proj("+proj=wintri")
gg <- gg + theme_map()
gg


Now, if you want to go all Hobo-Dyer instead of Winkel tripel, it’s one, small change:

gg + coord_proj("+proj=cea +lat_ts=37.5")


Spatial Reference is a website by Howard Butler, Christopher Schmidt, Dane Springmeyer, and Josh Livni that helps others in their understanding, recording, and usage of spatial reference systems. Its ultimate goal is to let you use URIs to specify spatial reference systems, but you can use it to look up Proj4 strings to meet virtually any need you have.

### The Bad News

Presently, coord_proj is not part of the ggplot2 code base. I chose a really bad time to muck with this as Hadley is doing all sorts of spiffy stuff to ggplot2 and it’s not a good time to shove this in there.

You can fork your own copy of ggplot2 from Hadley’s GitHub repo and add this file to it, re-document/collate/build the package and then use it locally. It still needs some work (there’s a horrible hack in it that anyone familiar with geo-stuff will cringe at) but I welcome feedback/contributions. Hopefully this will get into ggplot2 (or out of it and into a separate package since Hadley is making ggplot2 very extensible in his update) at some point in the not-too-distant future.