

Nathaniel Smith and Stéfan van der Walt presented a new colormap (for Python) at SciPy 2015 called viridis.

From the authors:

The default colourmap in Matplotlib is the colourful rainbow-map called Jet, which is deficient in many ways: small changes in the data sometimes produce large perceptual differences and vice-versa; its lightness gradient is non-monotonic; and, it is not particularly robust against color-blind viewing. Thus, a new default colormap is needed — but no obvious candidate has been found. Here, we present our proposed new default colormap for Matplotlib, and expose the theory, tools, data exploration and motivations behind its design.

You can also find out a tad more about their other colormap designs (a.k.a. the runners-up), along with Parula, which is a proprietary MATLAB colormap.

Simon Garnier (@sjmgarnier) took Nathaniel & Stéfan’s work and turned it into an R package.

Noam Ross (@noamross) & I piled on shortly thereafter to add some ggplot color scale_ functions which are (for now) only available in Simon’s github repo.

Rather than duplicate the examples already provided in the documentation of those functions, I thought it would be more useful to create a post showcasing why you should switch from rainbow (et al) to viridis.

Since folks seem to like maps, we’ll work with one for the example, but let’s get some package machinations out of the way first:

library(viridis)
library(raster)
library(scales)
library(dichromat)
library(rasterVis)
library(httr)
library(colorspace)

Now, we’ll need a map to work with so let’s grab a U.S. max temperature GeoTIFF raster from NOAA (from the bitter cold month of February 2015) and project it to something more reasonable:

temp_raster <- "http://ftp.cpc.ncep.noaa.gov/GIS/GRADS_GIS/GeoTIFF/TEMP/us_tmax/us.tmax_nohads_ll_20150219_float.tif"
 
try(GET(temp_raster,
        write_disk("us.tmax_nohads_ll_20150219_float.tif")), silent=TRUE)
us <- raster("us.tmax_nohads_ll_20150219_float.tif")
 
# albers FTW
us <- projectRaster(us, crs="+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")

We’ll also make a helper function to save us some typing and set up the base number of colors in the colormap:

n_col <- 64
 
img <- function(obj, col) {
  image(obj, col=col, asp=1, axes=FALSE, xaxs="i", xaxt='n', yaxt='n', ann=FALSE)
}

Let’s take a look at various color palettes with different types of vision. We’ll use a 3×2 grid and:

  • use 4 color palettes from grDevices,
  • make a gradient palette from one of the ColorBrewer sequential palettes, and
  • then (finally) use a viridis color palette.

We’ll take this grid of 6 maps and view it through the eyes of three different types of color vision as well as a fully desaturated version. Note that I’m not adding much cruft to the map display (including legends) since this isn’t about the values so much as it is about the visual perception of the colormaps.

Remember you can select/click/tap the map grids for (slightly) larger versions.

“Normal” Vision

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, rev(heat.colors(n_col)))
img(us, rev(rainbow(n_col)))
img(us, rev(topo.colors(n_col)))
img(us, rev(cm.colors(n_col)))
img(us, gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)))
img(us, rev(viridis(n_col)))

01_normal-1

All of the maps convey the differences in max temperature. If you happen to have “normal” color vision you should be drawn to the bottom two (ColorBrewer on the left and Viridis on the right). They are both sequential and convey the temperature changes more precisely (and they aren’t as gosh-awful ugly as the other four).

While the ColorBrewer gradient may be “good”, Viridis is designed to be:

  • colorful & “pretty”
  • sequential (to not impose other structure on exploratory data analysis visualizations)
  • perceptually uniform (i.e. changes in the data should be accurately decoded by our brains) even when desaturated
  • accessible to colorblind viewers

and seems to meet those goals quite well.
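
If you want to sanity-check the lightness claim yourself, here’s a quick sketch using the colorspace package we loaded above (the helper function is just for this post):

# compare the lightness (L*) progression of viridis vs rainbow; viridis should
# rise/fall almost monotonically while rainbow wobbles up and down
pal_lightness <- function(pal) {
  pal <- substr(pal, 1, 7)  # drop the alpha channel viridis() appends
  coords(as(hex2RGB(pal), "LAB"))[, "L"]
}
 
plot(pal_lightness(rev(viridis(n_col))), type="l", ylim=c(0, 100),
     xlab="palette index", ylab="L*")
lines(pal_lightness(rev(rainbow(n_col))), col="red")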

Take a look at each of the vision-adjusted examples:

Green-Blind (Deuteranopia)

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, dichromat(rev(heat.colors(n_col)), "deutan"))
img(us, dichromat(rev(rainbow(n_col)), "deutan"))
img(us, dichromat(rev(topo.colors(n_col)), "deutan"))
img(us, dichromat(rev(cm.colors(n_col)), "deutan"))
img(us, dichromat(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)), "deutan"))
img(us, dichromat(rev(viridis(n_col)), "deutan"))

02_deutan-1

Red-Blind (Protanopia)

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, dichromat(rev(heat.colors(n_col)), "protan"))
img(us, dichromat(rev(rainbow(n_col)), "protan"))
img(us, dichromat(rev(topo.colors(n_col)), "protan"))
img(us, dichromat(rev(cm.colors(n_col)), "protan"))
img(us, dichromat(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)), "protan"))
img(us, dichromat(rev(viridis(n_col)), "protan"))

03_protan-1

Blue-Blind (Tritanopia)

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, dichromat(rev(heat.colors(n_col)), "tritan"))
img(us, dichromat(rev(rainbow(n_col)), "tritan"))
img(us, dichromat(rev(topo.colors(n_col)), "tritan"))
img(us, dichromat(rev(cm.colors(n_col)), "tritan"))
img(us, dichromat(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col)), "tritan"))
img(us, dichromat(rev(viridis(n_col)), "tritan"))

04_tritan-1

Desaturated

par(mfrow=c(3, 2), mar=rep(0, 4))
img(us, desaturate(rev(heat.colors(n_col))))
img(us, desaturate(rev(rainbow(n_col))))
img(us, desaturate(rev(topo.colors(n_col))))
img(us, desaturate(rev(cm.colors(n_col))))
img(us, desaturate(gradient_n_pal(brewer_pal("seq")(9))(seq(0, 1, length=n_col))))
img(us, desaturate(rev(viridis(n_col))))

05_desatureated-1

Hopefully both the ColorBrewer gradient and Viridis palettes stood out as conveying the temperature data with more precision and more consistently across all non-standard vision types as you progressed through each one.

To see this for yourself in your own work, grab Simon’s package and start substituting viridis for some of your usual defaults to see if it makes a difference in helping you convey the story your data is trying to tell, both more accurately and for a more diverse audience. Remember, the github version (which will be on CRAN soon) has handy ggplot scale_ functions to make using viridis as painless as possible.
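
As a quick taste, here’s a sketch of what that looks like in ggplot (assuming the functions are exported as scale_fill_viridis()/scale_color_viridis() in the github version):

library(ggplot2)
library(viridis)
 
# a made-up 2D density just to exercise a continuous fill scale
df <- data.frame(x=rnorm(10000), y=rnorm(10000))
 
ggplot(df, aes(x, y)) +
  stat_bin2d(bins=30) +
  scale_fill_viridis()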

I also updated my Melbourne Walking EDA project to use the viridis palette instead of parula (which I was only really using in defiance of MATLAB’s inane restrictions).

Poynter did a nice interactive piece on world population by income (i.e. “How Many Live on How Much, and Where”). I’m always on the lookout for optimized shapefiles and clean data (I’m teaching a data science certificate program starting this Fall) and the speed of the site load and the easy availability of the data set made this one a “must acquire”. Rather than just repeat Poynter’s D3-goodness, here’s a way to look at the income data in a series of small-multiple choropleths—using R & ggplot2—that involves:

  • downloading data & shapefiles from a web site
  • using dplyr & tidyr for data munging
  • applying custom fill color scale mapping in ggplot
  • ordering plots with a custom facet order (using factors)
  • tweaking the theme and aesthetics for a nicely finished result

By using D3, Poynter inherently made the data available. Pop open the “Developer Tools” in any browser, reload the page and look at the “Network” tab and you’ll see a list of files (you can sometimes see things in the source code, but this technique is often faster). The income data is a well-formed CSV file http://www.pewglobal.org/wp-content/themes/pew-global/interactive-global-class.csv and their highly optimized world map was also easy to discern http://www.pewglobal.org/wp-content/lib/js/world-geo.json. We’ll start by grabbing the map and using the same map projection that Poynter did (Robinson). Don’t be put off by all the library calls since one of the best parts of R is the ever-increasing repository of great packages to help you get things done.

library(httr)     # getting data
library(rgdal)    # working with shapefile
library(dplyr)    # awesome data manipulation
library(readr)    # faster reading of CSV data
library(stringi)  # string manipulation
library(stringr)  # string manipulation
library(tidyr)    # reshaping data
library(grid)     # for 'unit'
library(scales)   # for 'percent'
library(ggplot2)  # plotting
library(ggthemes) # theme_map
 
# this ensures you only download the shapefile once and hides
# errors and warnings. remove `try` and `invisible` to see messages
try(invisible(GET("http://www.pewglobal.org/wp-content/lib/js/world-geo.json",
                  write_disk("world-geo.json"))), silent=TRUE)
 
# use ogrListLayers("world-geo.json") to see file type & 
# layer info to use in the call to readOGR
 
world <- readOGR("world-geo.json", "OGRGeoJSON")
world_wt <- spTransform(world, CRS("+proj=robin"))
world_map <- fortify(world_wt)

I would have liked to do fortify(world_wt, region="name") (since that makes working with filling in countries by name much easier in the choropleth part of the code) but that generated TopologyException errors (I’ve seen this happen quite a bit with simplified/optimized shapefiles and some non-D3 geo-packages). One can sometimes fix those with a strategic rgeos::gBuffer call, but that didn’t work well in this case. We can still use country names with a slight rejiggering of the fortified data frame using dplyr:

world_map %>%
  left_join(data_frame(id=rownames(world@data), name=world@data$name)) %>%
  select(-id) %>%
  rename(id=name) -> world_map
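
For reference, the gBuffer repair mentioned above usually looks something like the following sketch (a zero-width buffer that often fixes self-intersecting rings); it just didn’t help with this particular file:

library(rgeos)
 
# a zero-width buffer frequently repairs invalid/self-intersecting polygons in
# simplified shapefiles; re-attach the data slot so fortify(region=...) works
world_buf <- gBuffer(world_wt, byid=TRUE, width=0)
world_fix <- SpatialPolygonsDataFrame(world_buf, world_wt@data, match.ID=FALSE)
# world_map <- fortify(world_fix, region="name")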

Now it’s time to get the data. The CSV file has annoying spaces in it that cause R to interpret all the columns as strings, so we can use dplyr again to get them into the format we want them in. Note that I’m also making the percentages decimals so we can use percent later on to easily format them.

# a good exercise would be to repeat the download code above
# rather than make repeated calls to an external resource
read_csv("http://www.pewglobal.org/wp-content/themes/pew-global/interactive-global-class.csv") %>%
  mutate_each(funs(str_trim)) %>%
  filter(id != "None") %>%
  mutate_each(funs(as.numeric(.)/100), -name, -id) -> dat

For this post, we’ll only be working with the actual share percentages, so let’s:

  • ignore the “change” columns
  • convert the data frame from wide to long
  • extract out the income levels (e.g. “Poor”, “Low Income”…)
  • set a factor order for them so our plots will be in the correct sequence

dat %>%
  gather(share, value, starts_with("Share"), -name, -id) %>%
  select(-starts_with("Change")) %>%
  mutate(label=factor(stri_trans_totitle(str_match(share, "Share ([[:alpha:]- ]+),")[,2]),
                      c("Poor", "Low Income", "Middle Income", "Upper-Middle Income", "High Income"),
                      ordered=TRUE)) -> share_dat

The stringi package is really handy (stringr is built on it, too). The stri_trans_totitle function alleviates some mundane string operations and the stri_replace_all_regex (below) also allows us to do vectorized regular expression replacements without a ton of code.

To keep the charts aligned, we’ll use Poynter’s color scale (which was easy to extract from the site’s code) and use the same legend breaks via `cut`. We’ll also format the labels for these breaks to make our legend nicer to view.

# use same "cuts" as poynter
poynter_scale_breaks <- c(0, 2.5, 5, 10, 25, 50, 75, 80, 100)
 
sprintf("%2.1f-%s", poynter_scale_breaks, percent(lead(poynter_scale_breaks/100))) %>%
  stri_replace_all_regex(c("^0.0", "-NA%"), c("0", "%"), vectorize_all=FALSE) %>%
  head(-1) -> breaks_labels
 
share_dat %>%
  mutate(`Share %`=cut(value,
                       c(0, 2.5, 5, 10, 25, 50, 75, 80, 100)/100,
                       breaks_labels))-> share_dat
 
share_pal <- c("#eaecd8", "#d6dab3", "#c2c98b", "#949D48", "#6e7537", "#494E24", "#BB792A", "#7C441C", "#ffffff")

Finally, we get to the good part and start plotting the visualization. There are only two layers: the base map and the choropleth filled map. We then:

  • apply our manual color palette to them
  • remove the line color slashes in the legend boxes
  • setup the overall plot label
  • tell ggplot which coordinate system to use (in this case coord_equal is fine since we already projected the points)
  • apply a base theme that’s good for mapping
  • tweak the text and ensure our legend is in the position we want it to be in

gg <- ggplot()
 
gg <- gg + geom_map(data=world_map, map=world_map,
                    aes(x=long, y=lat, group=group, map_id=id),
                    color="#7f7f7f", fill="white", size=0.15)
 
gg <- gg + geom_map(data=share_dat, map=world_map,
                    aes(map_id=name, fill=`Share %`),
                    color="#7f7f7f", size=0.15)
 
gg <- gg + scale_fill_manual(values=share_pal)
 
gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))
gg <- gg + labs(title="World Population by Income\n")
gg <- gg + facet_wrap(~label, ncol=2)
 
gg <- gg + coord_equal()
 
gg <- gg + theme_map()
 
gg <- gg + theme(panel.margin=unit(1, "lines"))
gg <- gg + theme(plot.title=element_text(face="bold", hjust=0, size=24))
gg <- gg + theme(legend.title=element_text(face="bold", hjust=0, size=12))
gg <- gg + theme(legend.text=element_text(size=10))
gg <- gg + theme(strip.text=element_text(face="bold", size=10))
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(legend.position="bottom")
 
gg

And, here’s the result (click for larger version):

forblog

The optimized shapefile makes for a very fast plot and you can plot individual choropleths by filtering the data and not using facets.

While there are a number of choropleth packages out there for R, learning how to do the core components on your own can (a) make you appreciate those packages a bit more and (b) give you the skills to do them on your own when you need a more customized version. Many of the theme tweaks will also apply to the ggplot-based choropleth packages.

With this base, it should be a fun exercise to the reader to do something similar with Poynter’s “percentage point change” choropleth. You’ll need to change the color palette and manipulate different data columns to get the same scales and visual representation they do. Drop a note in the comments if you give this a go!
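
If you want a head start on that exercise, a rough (untested) sketch of the reshaping might look like this—the exact “Change” column names and a sensible diverging palette are left to you:

# gather the "Change" columns this time instead of dropping them; a diverging
# ColorBrewer palette (e.g. "RdBu") is a better fit for +/- changes than the
# sequential palette used above
dat %>%
  gather(change, value, starts_with("Change"), -name, -id) -> change_dat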

You can find all the code in this post in one convenient gist.

The recent announcement of the [start of egg rationing in the U.S.](http://www.washingtonpost.com/blogs/wonkblog/wp/2015/06/05/the-largest-grocer-in-the-texas-is-now-rationing-eggs/) made me curious enough about the avian flu outbreak to try to dig into the numbers a bit. I finally stumbled upon a [USDA site](http://www.aphis.usda.gov/wps/portal/aphis/ourfocus/animalhealth/sa_animal_disease_information/sa_avian_health/sa_detections_by_states/ct_ai_pacific_flyway/!ut/p/a1/lVPBkqIwEP2WOexpCxMBAY-oo6Kybo01q3JJNSGB1GCgSNTi7zcwe3CnZnQ3h1Sl-3X369cdlKADSiRcRA5aVBLK7p14ZLVd2sMJtqPFbvyMox-_5nGw8Z3t0jWAowHgL06I_47friOvi3_Bk-VsiHcO2qMEJVTqWhfoCHUhFKGV1ExqUoq0gab9hhWQ6twQXtGz6l8gxQlKUjAodXFryYRioBgRklfNqW_i3X0RIG_xGdOMdm5F0pYoDZqZ1FQTEKQGKrighJftFdqOX01Fho7c9iiAzS3HG6WWm2HbSnmAzYXxyA3AG1L-R487Df-TntNFuHT9jVHQDWwczUywP44xjrxH8b2eDzL0gHsj-1Bk8TwxReabn_56ZeP1CB0NSf9LFmMX7f5TtdXDtmYoib9z5YadQnYTT-PcVABdWN2s0eHuDry7b3agN3y2A-jw6Q4YfnlZpeZD7Ke3REKZOoEh0jDOGtYMikppdLher4OzymCQVxdUn15PgdMK6-0lwM7obR9ayTx_evoNPxBrVg!!/?1dmy&urile=wcm:path:/APHIS_Content_Library/SA_Our_Focus/SA_Animal_Health/SA_Animal_Disease_Information/SA_Avian_Health/SA_Detections_by_States/) that had an embedded HTML table of flock outbreak statistics by state, county and date (also flock type and whether it was a commercial enterprise or “backyard” farm). Just looking at the sum of flock sizes on that page shows that nearly 50 million birds have been impacted since December, 2014.

We can scrape the data with R & `rvest` and then use the shapefile hexbins from [previous posts](http://rud.is/b/2015/05/15/u-s-drought-monitoring-with-hexbin-state-maps-in-r/) to watch the spread week-over-week.

The number of packages I ended up relying on was a bit surprising. Let’s get them out of the way before focusing on the scraping and hexbin-making:

library(rvest)     # scraping
library(stringr)   # string manipulation
library(lubridate) # date conversion
library(dplyr)     # data munging
library(zoo)       # for locf
library(ggplot2)   # plotting
library(rgdal)     # map stuff
library(rgeos)     # map stuff

We also end up using `magrittr` and `tidyr` but only for one function, so you’ll see those with `::` in the code.

Grabbing the USDA page is pretty straightforward:

url <- "http://www.aphis.usda.gov/wps/portal/aphis/ourfocus/animalhealth/sa_animal_disease_information/sa_avian_health/ct_avian_influenza_disease/!ut/p/a1/lVJbb4IwFP41e1qwFZDLI-oUnGgyswm8kAMUaAaFQNG4X7-ibnEPYtakDz3nO_kupyhAHgoYHGgGnFYMiv4daOFqa8vjKZad5c58wc7mY-Eaa13Z2qoA-AKA7xwL_53fvjpaP_-Gp_Z8jHcK2qMABTHjNc-RD3VO2zCuGCeMhwWNGmhOT7iFsOqaMK3irj2_gNESijAnUPD8tpLQlkBLQsrSqinPJi7tAwX2i4_5tSBgRUfYF_wM9mLqmCbIj2QzxZpMJMUYg6TGkSLBBCaSPEnSJIljXVH0q_kBdw_CO5sXkNnSslV9LQJTDRk7czGumy7GjnYFDOTrCw36XRJTRbt_mlo9VD1FgfuctqrVByA37szNBAPwXOpzR97gPi7tm30gb2AfQkxWVJH4ifvZLavFIsUQrA1JSUOaUV61HHnH43HUtQmMsuqA6vK9NJST9JluNlKwyPr7DT6YvRs!/?1dmy&urile=wcm%3apath%3a%2Faphis_content_library%2Fsa_our_focus%2Fsa_animal_health%2Fsa_animal_disease_information%2Fsa_avian_health%2Fsa_detections_by_states%2Fct_ai_pacific_flyway"
 
#' read in the data, extract the table and clean up the fields
#' also clean up the column names since they are fairly nasty
 
pg <- html(url)

If you poke at the source for the page you’ll see there are two tables in the code and we only need the first one. Also, if you scan the rendered table on the USDA page by eye you’ll see that the column names are horrible for data analysis work and they are also inconsistent in the values used for various columns. Furthermore, there are commas in the flock counts and it would be handy to have the date as an actual date type. We can extract the table we need and clean all that up in a reasonably-sized `dplyr` pipe:

pg %>%
  html_nodes("table") %>%
  magrittr::extract2(1) %>%
  html_table(header=TRUE) %>%
  filter(`Flock size`!="pending") %>%
  mutate(Species=str_replace(tolower(Species), "s$", ""),
         `Avian influenza subtype*`=str_replace_all(`Avian influenza subtype*`, " ", ""),
         `Flock size`=as.numeric(str_replace_all(`Flock size`, ",", "")),
         `Confirmation date`=as.Date(mdy(`Confirmation date`))) %>%
  rename(state=State, county=County, flyway=Flyway, flock_type=`Flock type`,
         species=Species, subtype=`Avian influenza subtype*`, date=`Confirmation date`,
         flock_size=`Flock size`) -> birds

Let’s take a look at what we have:

glimpse(birds)
 
## Observations: 202
## Variables:
## $ state      (chr) "Iowa", "Minnesota", "Minnesota", "Iowa", "Minnesota", "Iowa",...
## $ county     (chr) "Sac", "Renville", "Renville", "Hamilton", "Kandiyohi", "Hamil...
## $ flyway     (chr) "Mississippi", "Mississippi", "Mississippi", "Mississippi", "M...
## $ flock_type (chr) "Commercial", "Commercial", "Commercial", "Commercial", "Comme...
## $ species    (chr) "turkey", "chicken", "turkey", "turkey", "turkey", "turkey", "...
## $ subtype    (chr) "EA/AM-H5N2", "EA/AM-H5N2", "EA/AM-H5N2", "EA/AM-H5N2", "EA/AM...
## $ date       (date) 2015-06-04, 2015-06-04, 2015-06-04, 2015-06-04, 2015-06-03, 2...
## $ flock_size (dbl) 42200, 415000, 24800, 19600, 37000, 26200, 17200, 1115700, 159...

To make an animated map of cumulative flock totals by week, we’ll need to

– group the `birds` data frame by week and state
– calculate the cumulative sums
– fill in the gaps where there are missing state/week combinations
– carry the last observations by state/week forward in this expanded data frame
– make breaks for data ranges so we can more intelligently map them to colors

This ends up being a longer `dplyr` pipe than I usually like to code (I think very long ones are hard to follow) but it gets the job done and is still pretty readable:

birds %>%
  mutate(week=as.numeric(format(birds$date, "%Y%U"))) %>%
  arrange(week) %>%
  group_by(week, state) %>%
  tally(flock_size) %>%
  group_by(state) %>%
  mutate(cum=cumsum(n)) %>%
  ungroup %>%
  select(week, state, cum) %>%
  mutate(week=as.Date(paste(week, 1), "%Y%U %u")) %>%
  left_join(tidyr::expand(., week, state), .) %>%
  group_by(state) %>%
  do(na.locf(.)) %>%
  mutate(state_abb=state.abb[match(state, state.name)],
         cum=as.numeric(ifelse(is.na(cum), 0, cum)),
         brks=cut(cum,
                  breaks=c(0, 200, 50000, 1000000, 10000000, 50000000),
                  labels=c("1-200", "201-50K", "50k-1m",
                           "1m-10m", "10m-50m"))) -> by_state_and_week

Now, we perform the standard animation steps:

– determine where we’re going to break the data up
– feed that into a loop
– partition the data in the loop
– render the plot to a file
– combine all the individual images into an animation

For this graphic, I’m doing something a bit extra. The color ranges for the hexbin choropleth go from very light to very dark, so it would be helpful if the titles for the states went from very dark to very light, matching the state colors. The lines that do this check for state breaks that fall in the last two values and appropriately assign `"black"` or `"white"` as the color.

i <- 0
 
for (wk in unique(by_state_and_week$week)) {
 
  # filter by week
 
  by_state_and_week %>% filter(week==wk) -> this_wk
 
  # hack to let us color the state labels in white or black depending on
  # the value of the fill
 
  this_wk %>%
    filter(brks %in% c("1m-10m", "10m-50m")) %>%
    .$state_abb %>%
    unique -> white_states
 
  centers %>%
    mutate(txt_col="black") %>%
    mutate(txt_col=ifelse(id %in% white_states, "white", "black")) -> centers
 
  # setup the plot
 
  gg <- ggplot()
  gg <- gg + geom_map(data=us_map, map=us_map,
                      aes(x=long, y=lat, map_id=id),
                      color="white", fill="#dddddd", size=2)
  gg <- gg + geom_map(data=this_wk, map=us_map,
                      aes(fill=brks, map_id=state_abb),
                      color="white", size=2)
  gg <- gg + geom_text(data=centers,
                       aes(label=id, x=x, y=y, color=txt_col), size=4)
  gg <- gg + scale_color_identity()
  gg <- gg + scale_fill_brewer(name="Combined flock size\n(all types)",
                               palette="RdPu", na.value="#dddddd", drop=FALSE)
  gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))
  gg <- gg + coord_map()
  gg <- gg + labs(x=NULL, y=NULL,
                  title=sprintf("U.S. Avian Flu Total Impact as of %s\n", wk))
  gg <- gg + theme_bw()
  gg <- gg + theme(plot.title=element_text(face="bold", hjust=0, size=24))
  gg <- gg + theme(panel.border=element_blank())
  gg <- gg + theme(panel.grid=element_blank())
  gg <- gg + theme(axis.ticks=element_blank())
  gg <- gg + theme(axis.text=element_blank())
  gg <- gg + theme(legend.position="bottom")
  gg <- gg + theme(legend.direction="horizontal")
  gg <- gg + theme(legend.title.align=1)
 
  # save the image
 
  # i'm using "quartz" here since I'm on a Mac. Use what works for your system to ensure you
  # get the best looking output png
 
  png(sprintf("output/%03d.png", i), width=800, height=500, type="quartz")
  print(gg)
  dev.off()
 
  i <- i + 1
 
}

We could use one of the R animation packages to actually make the animation, but I know ImageMagick pretty well so I just call it as a `system` command:

system("convert -delay 60 -loop 1 output/*png output/avian.gif")

All that results in:

avian

If that’s a static image, open it in a new tab/window (or just click on it). I really didn’t want to do a looping gif but if you do just make the `-loop 1` into `-loop 0`.

Now, we can just re-run the code when the USDA refreshes the data.

The code, data and sample bitmaps are on [github](https://github.com/hrbrmstr/avianflu).

On the news today of the early stages of drought hitting the U.S. northeast states, I decided to springboard off of yesterday’s post and show a more practical use of hexbin state maps than the built-in (and still purpose unknown to me) “bees” data.

The U.S. Drought Monitor site supplies more than just a pretty county-level map. There’s plenty of data and you can dynamically retrieve just the data tables for the whole U.S., U.S. states and U.S. counties. Since we’re working with state hexbins, we just need the state-level data. Drought levels for all five stages are reported per-state, so we can take all this data and create a faceted/small-multiples map based on it.

This builds quite a bit on the previous work, so you’ll see some familiar code. Most of the new code is actually making the map look nice (the great part about this is that once you have the idiom down, it’s just a matter of running the script each day vs a billion mouse clicks). The other bit of new code is the data-retrieval component:

library(readr)
library(tidyr)
library(dplyr)

intensity <- c(D0="Abnormally Dry", D1="Moderate Drought", D2="Severe Drought", 
               D3="Extreme Drought", D4="Exceptional Drought")

today <- format(Sys.Date(), "%Y%m%d")

read_csv(sprintf("http://droughtmonitor.unl.edu/USDMStatistics.ashx/?mode=table&aoi=state&date=%s", today)) %>% 
  gather(drought_level, value, D0, D1, D2, D3, D4) %>% 
  mutate(intensity=factor(intensity[drought_level], 
                          levels=as.character(intensity), ordered=TRUE)) -> drought

This:

  • sets up a fast way to add the prettier description of the drought levels (besides D0, D1, etc)
  • dynamically uses today’s date as the parameter for the URL we read with read_csv (from the readr package)
  • converts the data from wide to long
  • adds the intensity description

The ggplot code will facet on the intensity level to make the overall map:

library(rgdal)
library(rgeos)
library(ggplot2)
library(readr)
library(tidyr)
library(dplyr)
library(grid)

# get map from https://gist.github.com/hrbrmstr/51f961198f65509ad863#file-us_states_hexgrid-geojson

us <- readOGR("us_states_hexgrid.geojson", "OGRGeoJSON")

centers <- cbind.data.frame(data.frame(gCentroid(us, byid=TRUE), id=us@data$iso3166_2))

us_map <- fortify(us, region="iso3166_2")

intensity <- c(D0="Abnormally Dry", D1="Moderate Drought", D2="Severe Drought",
               D3="Extreme Drought", D4="Exceptional Drought")

today <- format(Sys.Date(), "%Y%m%d")

read_csv(sprintf("http://droughtmonitor.unl.edu/USDMStatistics.ashx/?mode=table&aoi=state&date=%s", today)) %>%
  gather(drought_level, value, D0, D1, D2, D3, D4) %>%
  mutate(intensity=factor(intensity[drought_level],
                          levels=as.character(intensity), ordered=TRUE)) -> drought

gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)
gg <- gg + geom_map(data=drought, map=us_map,
                    aes(fill=value, map_id=State))
gg <- gg + geom_map(data=drought, map=us_map,
                    aes(map_id=State),
                    fill="#ffffff", alpha=0, color="white",
                    show_guide=FALSE)
gg <- gg + geom_text(data=centers, aes(label=id, x=x, y=y), color="white", size=4)
gg <- gg + scale_fill_distiller(name="State\nDrought\nCoverage", palette="RdPu", na.value="#7f7f7f",
                                labels=sprintf("%d%%", c(0, 25, 50, 75, 100)))
gg <- gg + coord_map()
gg <- gg + facet_wrap(~intensity, ncol=2)
gg <- gg + labs(x=NULL, y=NULL, title=sprintf("U.S. Drought Conditions as of %s\n", Sys.Date()))
gg <- gg + theme_bw()
gg <- gg + theme(plot.title=element_text(face="bold", hjust=0, size=24))
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.margin=unit(3, "lines"))
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(strip.text=element_text(face="bold", hjust=0, size=14))
gg <- gg + theme(legend.position=c(0.75, 0.15))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(legend.title.align=1)

png(sprintf("%s.png", today), width=800, height=800)
print(gg)
dev.off()

20150515

Now, you can easily animate these over time to show the progression/regression of the drought conditions. If you're sure your audience can work with SVG files, you can use those for very crisp/sharp maps (and even feed it to D3 or path editing tools). If you have an example of how you're using hexbin choropleths, drop a note in the comments. The code from above is also on github.
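
The SVG route mentioned above is just a different graphics device—a sketch:

# same plot, but as SVG (crisp at any size and editable in D3/Inkscape et al)
svg(sprintf("%s.svg", today), width=8, height=8)
print(gg)
dev.off()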

There’s been lots of buzz about “statebin” maps of late. A recent tweet by @andrewxhill referencing work by @dannydb pointed to a nice shapefile (alternate link) that ends up being a really great way to handle statebin maps (and I feel like a fool for not considering it for a more generic solution earlier).

Here’s a way to use the GeoJSON version in R. I like GeoJSON since it’s a single file vs a directory of files and is readable vs binary. If you’re in a TL;DR hurry, you can just review the code in this gist. Read on for expository.

Taking a look around

When you download the GeoJSON, it should be in a file called us_states_hexgrid.geojson. We can see what’s in there with R pretty easily:

library(rgdal)

ogrInfo("us_states_hexgrid.geojson", "OGRGeoJSON")

## Source: "us_states_hexgrid.geojson", layer: "OGRGeoJSON"
## Driver: GeoJSON number of rows 51 
## Feature type: wkbPolygon with 2 dimensions
## Extent: (-137.9747 26.39343) - (-69.90286 55.3132)
## CRS: +proj=longlat +datum=WGS84 +no_defs  
## Number of fields: 6 
##         name type length typeName
## 1 cartodb_id    0      0  Integer
## 2 created_at    4      0   String
## 3 updated_at    4      0   String
## 4      label    4      0   String
## 5       bees    2      0     Real
## 6  iso3166_2    4      0   String

Along with the basic shapefile goodness, we have some data, too! We’ll use all this to make a choropleth hexbin of “bees” (I have no idea what this is but assume it has something to do with bee population, which is a serious problem on the planet right now). Let’s dig in.

Plotting the bins

First we need to read in the data, which is pretty simple:


us <- readOGR("us_states_hexgrid.geojson", "OGRGeoJSON")

That ends up being a fairly complex object with polygons and data. However, we can take a quick look at it with base R graphics:


plot(us)

base1

Yay! While we could do most (if not all) the remainder of the graphics in base R, I personally believe ggplot is more intuitive and expressive, so let's do the same thing with ggplot. First, we'll have to get the data structure into something ggplot can handle:

library(ggplot2)

us_map <- fortify(us, region="iso3166_2")

That gives us a data frame with the 2-letter state abbreviations as the "region" keys. Now we can do a basic ggplot:

ggplot(data=us_map, aes(map_id=id, x=long, y=lat)) + 
  geom_map(map=us_map, color="black", fill="white")

Rplot

Ugh. Talk about ugly. But, at least it works! Now all we need to do is turn it into a choropleth, remove some map chart junk and make it look prettier!

Upping the aesthetics

There's a pretty good idiom for making maps in R. There's a handy layer/geom called geom_map which takes care of a ton of details under the hood. We can use it for making outlines and fills and add as many layers of them as we want/need. For our needs, we'll:

  • put down a base layer of polygons
  • add a fill layer for our data
  • get rid of map chart junk

This is all pretty straightforward once you get the hang of it:

gg <- ggplot()

# Plot base map -----------------------------------------------------------

gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)

# Plot filled polygons ----------------------------------------------------

gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2))

# Remove chart junk for the "map" -----------------------------------------

gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg

Rplot01

Definitely better, but it still needs work. Outlines would be good and it definitely needs a better color palette. It would also be nice if the polygons weren't "warped". We can fix these issues by adding in a few other elements:

gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2))

# Overlay borders without ugly line on legend -----------------------------

gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(map_id=iso3166_2),
                    fill="#ffffff", alpha=0, color="white",
                    show_guide=FALSE)

# ColorBrewer scale; using distiller for discrete vs continuous -----------

gg <- gg + scale_fill_distiller(palette="RdPu", na.value="#7f7f7f")

# coord_map mercator works best for the display ---------------------------

gg <- gg + coord_map()

gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg

Rplot02

Much better. We use a "hack" to keep the legend free of white slash marks for the polygon outlines (see the comments for a less-hackish way) and coord_map to let the projection handle the "unwarping". By using the distiller fill, we get discrete color bins vs continuous shades (use what you feel is most appropriate, though, for your own work).
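
For completeness, the less-hackish route hinted at above is to put the outline on the fill layer itself and blank it out of the legend keys—a sketch (note this works when the fill guide is a discrete legend; with the continuous distiller colourbar used here, the extra-layer hack is the path of least resistance):

# outline on the fill layer, then drop the outline colour from the legend keys
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2),
                    color="white", size=0.5)
gg <- gg + guides(fill=guide_legend(override.aes=list(colour=NA)))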

Where are we?

Most statebin/hexbin maps still (probably) need state labels since (a) Americans are notoriously bad at geography and (b) even if they were good at geography, we've removed much of the base references for folks to work from accurately.

The data exists in the shapefile, but to get the labels put in the centers of each polygon we have to do a bit of work:

library(rgeos)

centers <- cbind.data.frame(data.frame(gCentroid(us, byid=TRUE), id=us@data$iso3166_2))

That gets us a data frame of the x & y centers of each polygon along with the (abbreviated) state name. We can now add a layer with geom_text to place the label. The following is the complete solution:

library(rgdal)
library(rgeos)
library(ggplot2)

us <- readOGR("us_states_hexgrid.geojson", "OGRGeoJSON")

centers <- cbind.data.frame(data.frame(gCentroid(us, byid=TRUE), id=us@data$iso3166_2))

us_map <- fortify(us, region="iso3166_2")

ggplot(data=us_map, aes(map_id=id, x=long, y=lat)) + geom_map(map=us_map, color="black", fill="white")

gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", size=0.5)
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(fill=bees, map_id=iso3166_2))
gg <- gg + geom_map(data=us@data, map=us_map,
                    aes(map_id=iso3166_2),
                    fill="#ffffff", alpha=0, color="white",
                    show_guide=FALSE)
gg <- gg + geom_text(data=centers, aes(label=id, x=x, y=y), color="white", size=4)
gg <- gg + scale_fill_distiller(palette="RdPu", na.value="#7f7f7f")
gg <- gg + coord_map()
gg <- gg + labs(x=NULL, y=NULL)
gg <- gg + theme_bw()
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg

Rplot03

Wrapping up

This is a pretty neat way to work with "statebins" and I'll probably take some time over the summer to update my statebins package to use shapefiles and allow for more generic shapes. Ramnath Vaidyanathan has also done some work with statebins and javascript, so I'll see what I can do to merge all the functionality into one package.

If you've got an alternate way to work with these or have some interesting "bins" to show, drop a note in the comments.

I’ve been seeing an uptick in static US “lower 48” maps with “meh” projections this year, possibly caused by a flood of new folks resolving to learn R but using pretty old documentation or tutorials. I’ve also been seeing an uptick in folks needing to geocode US city/state to lat/lon. I thought I’d tackle both in a quick post to show how to (simply) use a decent projection for lower 48 US maps and then how to use a _very_ basic package I wrote – [localgeo](http://github.com/hrbrmstr/localgeo) to avoid having to use an external API/service for basic city/state geocoding.

### Albers All The Way

I could just plot an Albers projected map, but it’s more fun to add some data. We’ll start with some setup libraries and then read in some recent earthquake data, then filter it for our map display:

library(ggplot2)
library(dplyr)
library(readr) # devtools::install_github("hadley/readr")
 
# Earthquakes -------------------------------------------------------------
 
# get quake data ----------------------------------------------------------
quakes <- read_csv("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.csv")
 
# filter all but lower 48 US ----------------------------------------------
quakes %>%
  filter(latitude>=24.396308, latitude<=49.384358,
         longitude>=-124.848974, longitude<=-66.885444) -> quakes
 
# bin by .5 ---------------------------------------------------------------
quakes$Magnitude <- as.numeric(as.character(cut(quakes$mag, breaks=c(2.5, 3, 3.5, 4, 4.5, 5),
    labels=c(2.5, 3, 3.5, 4, 4.5), include.lowest=TRUE)))

Many of my mapping posts use quite a few R geo libraries, but this one just needs `ggplot2`. We extract the US map data, turn it into something `ggplot` can work with, then plot our quakes on the map:

us <- map_data("state")
us <- fortify(us, region="region")
 
# theme_map ---------------------------------------------------------------
devtools::source_gist("33baa3a79c5cfef0f6df")
 
# plot --------------------------------------------------------------------
gg <- ggplot()
gg <- gg + geom_map(data=us, map=us,
                    aes(x=long, y=lat, map_id=region, group=group),
                    fill="#ffffff", color="#7f7f7f", size=0.25)
gg <- gg + geom_point(data=quakes,
                      aes(x=longitude, y=latitude, size=Magnitude),
                      color="#cb181d", alpha=1/3)
gg <- gg + coord_map("albers", lat0=39, lat1=45)
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

2.5+ mag quakes in Lower US 48 in past 30 days

Plot_Zoom

### Local Geocoding

There are many APIs with corresponding R packages/functions to perform geocoding (one really spiffy recent one is [geocodeHERE](http://cran.r-project.org/web/packages/geocodeHERE/)). While Nokia’s service is less restrictive than Google’s, most of these sites are going to have some kind of restriction on the number of calls per second/minute/day. You could always install the [Data Science Toolkit](http://www.datasciencetoolkit.org/) locally (note: it was down as of the original posting of this blog) and perform the geocoding locally, but it does take some effort (and space/memory) to setup and get going.

If you have relatively clean data and only need city/state resolution, you can use a package I made – [localgeo](http://github.com/hrbrmstr/localgeo) as an alternative. I took a US Gov census shapefile and extracted city, state, lat, lon into a data frame and put a lightweight function shim over it (it’s doing nothing more than `dplyr::left_join`). It won’t handle nuances like “St. Paul, MN” == “Saint Paul, MN” and, for now, it requires you to do the city/state splitting, but I’ll be tweaking it over the year to be a bit more forgiving.
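
Conceptually, the shim is nothing fancier than something like this sketch (not the actual package source; `geo_lookup` is a stand-in for the bundled census-derived data frame):

# geo_lookup: a hypothetical data frame with city, state, lon, lat columns
geocode_sketch <- function(city, state, geo_lookup) {
  dplyr::left_join(
    data.frame(city=city, state=state, stringsAsFactors=FALSE),
    geo_lookup,
    by=c("city", "state")
  )
}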

We can give this a go and map the [greenest cities in the US in 2014](http://www.nerdwallet.com/blog/cities/greenest-cities-america/) as crowned by, er, Nerd Wallet. I went for “small data file with city/state in it”, so if you know of a better source I’ll gladly use it instead. Nerd Wallet used DataWrapper, so getting the actual data was easy and here’s a small example of how to get the file, perform the local geocoding and use an Albers projection for plotting the points. The code below assumes you’re still in the R session that used some of the `library` calls earlier in the post.

library(httr)
library(localgeo) # devtools::install_github("hrbrmstr/localgeo")
library(tidyr)
 
# greenest cities ---------------------------------------------------------
# via: http://www.nerdwallet.com/blog/cities/greenest-cities-america/
 
url <- "https://gist.githubusercontent.com/hrbrmstr/1078fb798e3ab17556d2/raw/53a9af8c4e0e3137a0a8d4d6332f7a6073d93fb5/greenest.csv"
greenest <- read.table(text=content(GET(url), as="text"), sep=",", header=TRUE, stringsAsFactors=FALSE)
 
greenest %>%
  separate(City, c("city", "state"), sep=", ") %>%
  filter(!state %in% c("AK", "HI")) -> greenest
 
greenest_geo <- geocode(greenest$city, greenest$state)
 
gg <- ggplot()
gg <- gg + geom_map(data=us, map=us,
                    aes(x=long, y=lat, map_id=region, group=group),
                    fill="#ffffff", color="#7f7f7f", size=0.25)
gg <- gg + geom_point(data=greenest_geo,
                      aes(x=lon, y=lat),
                      shape=21, color="#006d2c", fill="#a1d99b", size=4)
gg <- gg + coord_map("albers", lat0=39, lat1=45)
gg <- gg + labs(title="Greenest Cities")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

Nerd Wallet’s Greenest US (Lower 48) Cities 2014

Plot_Zoom 2

Let me reinforce that the `localgeo` package will most assuredly fail to geocode some city/state combinations. I’m looking for a more comprehensive shapefile to ensure I have the complete list of cities and I’ll be adding some code to help make the lookups more forgiving. It may at least help when you bump into an API limit and need to crank out something in a hurry.
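
The “more forgiving” lookups will likely be little more than some up-front normalization; e.g. a sketch of handling the “St.”/“Saint” case mentioned earlier (purely illustrative, not in the package yet):

# normalize common abbreviations before the join, e.g. "St. Paul" -> "Saint Paul"
normalize_city <- function(x) {
  x <- stringr::str_trim(x)
  gsub("^St\\.?\\s+", "Saint ", x)
}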

I noticed that the @rOpenSci folks had an interface to [ip-api.com](http://ip-api.com/) on their [ToDo](https://github.com/ropensci/webservices/wiki/ToDo) list so I whipped up a small R package to fill said gap.

Their IP Geolocation API will take an IPv4, IPv6 or FQDN and kick back an ASN, lat/lon, address and more. The [ipapi package](https://github.com/hrbrmstr/ipapi) exposes one function – `geolocate` which takes in a character vector of any mixture of IPv4/6 and domains and returns a `data.table` of results. Since `ip-api.com` has a restriction of 250 requests-per-minute, the package also tries to help ensure you don’t get your own IP address banned (there’s a form on their site you can fill in to get it unbanned if you do happen to hit the limit). Overall, there’s nothing fancy in the package, but it gets the job done.

I notified the rOpenSci folks about it, so hopefully it’ll be one less thing on that particular to-do list.

You can see it in action in combination with the super-spiffy [leaflet](http://www.htmlwidgets.org/showcase_leaflet.html) htmlwidget:

library(leaflet)
library(ipapi)
library(maps)
 
# get top 500 domains
sites <- read.csv("http://moz.com/top500/domains/csv", stringsAsFactors=FALSE)
 
# make reproducible
set.seed(1492)
 
# pick out a random 50
sites <- sample(sites$URL, 50) 
sites <- gsub("/", "", sites)
locations <- geolocate(sites)
 
# take a quick look
dplyr::glimpse(locations)
 
## Observations: 50
## Variables:
## $ as          (fctr) AS2635 Automattic, Inc, AS15169 Google Inc., AS3561...
## $ city        (fctr) San Francisco, Mountain View, Chesterfield, Mountai...
## $ country     (fctr) United States, United States, United States, United...
## $ countryCode (fctr) US, US, US, US, US, US, JP, US, US, IT, US, US, US,...
## $ isp         (fctr) Automattic, Google, Savvis, Google, Level 3 Communi...
## $ lat         (dbl) 37.7484, 37.4192, 38.6631, 37.4192, 38.0000, 33.7516...
## $ lon         (dbl) -122.4156, -122.0574, -90.5771, -122.0574, -97.0000,...
## $ org         (fctr) Automattic, Google, Savvis, Google, AddThis, Peer 1...
## $ query       (fctr) 192.0.80.242, 74.125.227.239, 206.132.6.134, 74.125...
## $ region      (fctr) CA, CA, MO, CA, , GA, 13, MA, TX, , MA, TX, CA, , ,...
## $ regionName  (fctr) California, California, Missouri, California, , Geo...
## $ status      (fctr) success, success, success, success, success, succes...
## $ timezone    (fctr) America/Los_Angeles, America/Los_Angeles, America/C...
## $ zip         (fctr) 94110, 94043, 63017, 94043, , 30303, , 02142, 78218...
 
# all i want is the world!
world <- map("world", fill = TRUE, plot = FALSE) 
 
# kick out a widget
leaflet(data=world) %>% 
  addTiles() %>% 
  addCircleMarkers(locations$lon, locations$lat, 
                   color = '#ff0000', popup=sites)

50 Random Top Sites

Even though it’s still at version `0.4`, the `ggvis` package has quite a bit of functionality and is highly useful for exploratory data analysis (EDA). I wanted to see how geographical visualizations would work under it, so I put together six examples that show how to use various features of `ggvis` for presenting static & interactive cartographic creations. Specifically, the combined exercises demonstrate:

– basic map creation
– basic maps with points/labels
– dynamic choropleths (with various scales & tooltips)
– applying projections and custom color fills (w/tooltips)
– applying projections and projecting coordinates for plotting (w/tooltips that handle missing data well)

If you want to skip the post and head straight to the code you can [head on over to github](https://github.com/hrbrmstr/ggvis-maps), [peruse the R markdown file on RPubs](http://rpubs.com/hrbrmstr/ggvis-maps) or play with the [shiny version](https://hrbrmstr.shinyapps.io/ggvis-maps/). You’ll need that code to actually run any of the snippets below since I’m leaving out some code-cruft for brevity. Also, all the map graphics below were generated by saving the `ggvis` output as PNG files (for best browser compatibility), right from the `ggvis` renderer popup. Click/tap each for a larger version.

### Basic Polygons

1

Even though we still need the help of `ggplot2`’s `fortify`, it’s pretty straightforward to crank out a basic map in `ggvis`:

maine <- readOGR("data/maine.geojson", "OGRGeoJSON")
 
map <- ggplot2::fortify(maine, region="name")
 
map %>%
  ggvis(~long, ~lat) %>%
  group_by(group, id) %>%
  layer_paths(strokeOpacity:=0.5, stroke:="#7f7f7f") %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=400, height=600, keep_aspect=TRUE)

The code is very similar to one of the ways we render the same image in `ggplot`. We first read in the shapefile, convert it into a data frame we can use for plotting, group the polygons properly, render them with `layer_paths` and get rid of chart junk. Now, `ggvis` (to my knowledge as of this post) has no equivalent of `coord_map`, so we have to rely on the positioning in the projection and work out the proper `height` and `width` parameters to use with a uniform aspect ratio (`keep_aspect=TRUE`).

>For those not familiar with `ggvis` the `~` operator lets us tell `ggvis` which columns (or expressions using columns) to map to function parameters and the `:=` operator just tells it to use a raw, un-scaled value. You can find out more about [why the tilde was chosen](https://github.com/rstudio/ggvis/issues/173) or about the [various other special operators](http://ggvis.rstudio.com/ggvis-basics.html).
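
A tiny illustration of the difference (a toy snippet, not part of the map examples):

mtcars %>%
  ggvis(~wt, ~mpg, fill = ~cyl) %>%  # ~cyl: map & scale the cyl column to fill
  layer_points(stroke := "black")    # :=   : use the literal colour "black"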

### Basic Annotations

2

You can annotate maps in an equally straightforward way.

county_centers <- maine %>%
  gCentroid(byid=TRUE) %>%
  data.frame %>%
  cbind(name=maine$name %>% gsub(" County, ME", "", .) )
 
map %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(strokeWidth:=0.25, stroke:="#7f7f7f") %>%
  layer_points(data=county_centers, x=~x, y=~y, size:=8) %>%
  layer_text(data=county_centers,
             x=~x+0.05, y=~y, text:=~name,
             baseline:="middle", fontSize:=8) %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=400, height=600, keep_aspect=TRUE)

>Note that the `group_by` works either before or after the `ggvis` call. Consistent pipe idioms FTW!

Here, we’re making a data frame out of the county centroids and names then using that in a call to `layer_points` and `layer_text`. Note how you can change the data source for each layer (just like in `ggplot`) and use expressions just like in `ggplot` (we moved the text just slightly to the right of the dot).

>Since `ggvis` outputs [vega](http://trifacta.github.io/vega/) and uses [D3](http://d3js.org/) for rendering, you should probably take a peek at those frameworks as it will help you understand the parameter name differences between `ggvis` and `ggplot`.

### Basic Choropleths

3

There are actually two examples of this basic state choropleth in the code, but one just uses a different color scale, so I’ll just post the code for one here. This is also designed for interactivity (it has tooltips and lets you change the fill variable) so you should run it locally or look at the [shiny version](https://hrbrmstr.shinyapps.io/ggvis-maps/).

# read in some crime & population data for maine counties
me_pop <- read.csv("data/me_pop.csv", stringsAsFactors=FALSE)
me_crime <- read.csv("data/me_crime.csv", stringsAsFactors=FALSE)
 
# get it into a form we can use (and only use 2013 data)
 
crime_1k <- me_crime %>%
  filter(year==2013) %>%
  select(1,5:12) %>%
  left_join(me_pop) %>%
  mutate(murder_1k=1000*(murder/population_2010),
         rape_1k=1000*(rape/population_2010),
         robbery_1k=1000*(robbery/population_2010),
         aggravated_assault_1k=1000*(aggravated_assault/population_2010),
         burglary_1k=1000*(burglary/population_2010),
         larceny_1k=1000*(larceny/population_2010),
         motor_vehicle_theft_1k=1000*(motor_vehicle_theft/population_2010),
         arson_1k=1000*(arson/population_2010))
 
# normalize the county names
 
map %<>% mutate(id=gsub(" County, ME", "", id)) %>%
  left_join(crime_1k, by=c("id"="county"))
 
# this is for the tooltip. it does a lookup into the crime data frame and
# then uses those values for the popup
 
crime_values <- function(x) {
  if(is.null(x)) return(NULL)
  y <- me_crime %>% filter(year==2013, county==x$id) %>% select(1,5:12)
  sprintf("<table width='100%%'>%s</table>",
          paste0("<tr><td style='text-align:left'>", names(y),
         ":</td><td style='text-align:right'>", format(y), collapse="</td></tr>"))
}
 
map %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill=input_select(label="Crime:",
                                choices=crime_1k %>%
                                  select(ends_with("1k")) %>%
                                  colnames %>% sort,
                                id="Crime",
                                map=as.name),
              strokeWidth:=0.5, stroke:="white") %>%
  scale_numeric("fill", range=c("#bfd3e6", "#8c6bb1" ,"#4d004b")) %>%
  add_tooltip(crime_values, "hover") %>%
  add_legend("fill", title="Crime Rate/1K Pop") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=400, height=600, keep_aspect=TRUE)

You can omit the `input_select` bit if you just want to do a single choropleth (just map `fill` to a single variable). The `input_select` tells `ggvis` to make a minimal bootstrap sidebar-layout scaffold around the actual graphic to enable variable interaction. In this case we let the user explore different types of crimes (by 1K population) and we also have a tooltip that shows the #’s of each crime in each county as we hover.
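
For the record, the single-variable (non-interactive) version of the same choropleth is just a matter of mapping `fill` directly—a sketch using one of the columns created above:

map %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill=~burglary_1k, strokeWidth:=0.5, stroke:="white") %>%
  scale_numeric("fill", range=c("#bfd3e6", "#8c6bb1", "#4d004b")) %>%
  add_legend("fill", title="Burglaries/1K Pop") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=400, height=600, keep_aspect=TRUE)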

### Projections and Custom Colors

4

We’re pretty much (mostly) re-creating a [previous post](http://rud.is/b/2014/11/16/moving-the-earth-well-alaska-hawaii-with-r/) in this example and making a projected U.S. map with drought data (as of 2014-12-23).

us <- readOGR("data/us.geojson", "OGRGeoJSON")
us <- us[!us$STATEFP %in% c("02", "15", "72"),]
 
# same method to change the projection
 
us_aea <- spTransform(us, CRS("+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"))
 
map <- ggplot2::fortify(us_aea, region="GEOID")
 
droughts <- read.csv("data/dm_export_county_20141223.csv")
droughts$id <- sprintf("%05d", as.numeric(as.character(droughts$FIPS)))
droughts$total <- with(droughts, (D0+D1+D2+D3+D4)/5)
 
map_d <- merge(map, droughts, all.x=TRUE)
 
# pre-make custom colors per county
 
ramp <- colorRampPalette(c("white", brewer.pal(n=9, name="YlOrRd")), space="Lab")
 
map_d$fill_col <- as.character(cut(map_d$total, seq(0,100,10), include.lowest=TRUE, labels=ramp(10)))
map_d$fill_col <- ifelse(is.na(map_d$fill_col), "#FFFFFF", map_d$fill_col)
 
drought_values <- function(x) {
  if (is.null(x) || !(x$id %in% droughts$id)) return(NULL)
  y <- droughts %>% filter(id==x$id) %>% select(1,3,4,6:10)
  sprintf("<table width='100%%'>%s</table>",
          paste0("<tr><td style='text-align:left'>", names(y),
         ":</td><td style='text-align:right'>", format(y), collapse="</td></tr>"))
}
 
map_d %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill:=~fill_col, strokeOpacity := 0.5, strokeWidth := 0.25) %>%
  add_tooltip(drought_values, "hover") %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=900, height=600, keep_aspect=TRUE)

It’s really similar to the previous code (and you may/should be familiar with the Albers transform from the previous post).

### World Domination

5

world <- readOGR("data/ne_50m_admin_0_countries.geojson", layer="OGRGeoJSON")
world <- world[!world$iso_a3 %in% c("ATA"),]
world <- spTransform(world, CRS("+proj=wintri"))
 
map_w <- ggplot2::fortify(world, region="iso_a3")
 
# really quick way to get coords from a KML file
 
launch_sites <- rbindlist(lapply(ogrListLayers("data/launch-sites.kml")[c(1:2, 4:9)], function(layer) {
  tmp <- readOGR("data/launch-sites.kml", layer)
  places <- data.table(coordinates(tmp)[,1:2], as.character(tmp$Name))
}))
setnames(launch_sites, colnames(launch_sites), c("lon", "lat", "name"))
 
# now, project the coordinates we extracted
 
coordinates(launch_sites) <-  ~lon+lat
launch_sites <- as.data.frame(SpatialPointsDataFrame(spTransform(
  SpatialPoints(launch_sites, CRS("+proj=longlat")), CRS("+proj=wintri")),
  launch_sites@data))
 
map_w %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill:="#252525", stroke:="white", strokeOpacity:=0.5, strokeWidth:=0.25) %>%
  layer_points(data=launch_sites, 
               x=~lon, y=~lat, 
               fill:="#cb181d", stroke:="white", 
               size:=25, fillOpacity:=0.5, strokeWidth:=0.25) %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=900, height=500, keep_aspect=TRUE)

The main difference in this example is the re-projection of the data we’re using. I grabbed a KML file of rocket launch sites from Wikipedia, made it into a data frame and then [re]projected those points into Winkel-Tripel for use with the Winkel-Tripel world map made at the beginning of the example. The `ggplot` `coord_map` handles these transforms for you, so until there’s a `ggvis` equivalent, you’ll need to do it this way (though there’s no Winkel-Tripel projection in the `mapproject` package, so you kinda need to do it this way for `ggplot` as well for this projection).

### Wrapping Up

There’s code up on github for the “normal”, `Rmd` and Shiny versions of these examples. Give each a go and try tweaking various parameters, changing up the tooltips or using your own data. Don’t forget to drop a note in the comments with any of your creations and use github for any code issues.