https://github.com/hrbrmstr/spdorling" />
There are two reasons for this post, one is to introduce a new R Markdown template and the other is to introduce a new R package for creating Dorling cartograms from R Spatial objects.
The new template is Prism Skeleton which you can find as a template type in the markdowntemplates
package. It uses the Skeleton CSS framework as a basis for the HTML template, Prism for syntax highlighting and uses both Fira Sans & Fira Code fonts.
The spdorling
package is based on work by datagistips
(Mathieu Rajerison). You can read all about Dorling’s catogram machinations in this document [PDF] and flip to page 60 to see the core algorithm for creating circular cartograms.
The key to this particular cartogram is keeping in mind who the neighbours of a given named region are and building such a list form scratch is not exactly “fun” so Mathieu cleverly used spdep::poly2nb()
to build the neighbours list automagically.
The basic operation of spdorling
is in the package README so we’ll work with another example based on Mike Bostock’s Protovis Experiments.
As most of you are likely at the stage where you’re riffing off of Minsc and shouting “Code! Not, Words!”, let’s dig in.
We’ll need some helpers, especially since we’re going to work with Bostock’s original data (which is JavaScript code O_o):
library(sp)
library(httr)
library(V8)
library(albersusa) # devtools::install_github("hrbrmstr/albersusa")
library(spdorling) # devtools::install_github("hrbrmstr/spdorling")
library(firasans) # devtools::install_github("hrbrmstr/firasans")
library(hrbrthemes)
library(tidyverse)
We’ll grab the code and run it through V8, then clean it up a bit
ctx <- v8()
res <- httr::GET("https://mbostock.github.io/protovis/ex/us_stats.js")
invisible(ctx$eval(content(res, as="text")))
us_stats <- ctx$get("us_stats")
states <- us_stats[setdiff(names(us_stats), c("maxYear", "minYear"))]
map_df(states, ~{ .x[c("obese", "pop")]}, .id="iso_3166_2") %>%
mutate(obese = obese / 100, count = obese * pop) %>%
group_by(iso_3166_2) %>%
mutate(year = us_stats$minYear:us_stats$maxYear) %>%
ungroup() -> ob_df
ob_df
## # A tibble: 714 x 5
## iso_3166_2 obese pop count year
## <chr> <dbl> <int> <dbl> <int>
## 1 AL 0.322 4627851 1490168. 1995
## 2 AL 0.309 4661900 1440527. 1996
## 3 AL 0.305 4599000 1402695. 1997
## 4 AL 0.289 4564591 1319167. 1998
## 5 AL 0.288 4530182 1304692. 1999
## 6 AL 0.284 4509412 1280673. 2000
## 7 AL 0.257 4488641 1153581. 2001
## 8 AL 0.245 4467871 1094628. 2002
## 9 AL 0.239 4447100 1062857. 2003
## 10 AL 0.224 4406449 987045. 2004
## # ... with 704 more rows
Now, we’ll need a basemap and there’s no way I’m going to wrangle JavaScript for this when it’s handy in a package I have installed:
usa <- albersusa::usa_composite(proj = "laea")
usa <- subset(usa, !(name %in% c("Alaska", "Hawaii", "District of Columbia")))
The data has obesity measurements for a number of years and you may want to plot more than a couple (we’ll be plotting two of them) of them, so let’s make a function to handle the details so you just have to call one function later on and make modifications to only one mosty self-contained piece of code to do so:
# @title Plot one Dorling obesity cartogram
# @param selected_year year in the available range of `us_stats$minYear:us_stats$maxYear`
# @param quiet show Dorling iterations/
# @param .pb if creating a bunch of these and using a `dplyr::progress_estimated()` prog
# progress bar, increment the progress
one_year <- function(selected_year, quiet=FALSE, .pb=NULL) {
stopifnot(selected_year %in% us_stats$minYear:us_stats$maxYear)
if (!is.null(.pb)) .pb$tick()$print()
# grab the year and merge the data
usa@data <- left_join(usa@data, filter(ob_df, year == selected_year), "iso_3166_2")
# make the cartogram!
dor <- dorling_from_sp(usa, value=usa$count, quiet=quiet)
# I'm only binding data to it since I want to color it better. I should likely
# make this a feature of the package, so feel free to file a PR or issue request
# if this would be handy.
dor$discs <- sp::SpatialPolygonsDataFrame(dor$discs, data=usa@data, match.ID = FALSE)
fortify(dor$discs, region="iso_3166_2") %>%
left_join(usa@data, c("id"="iso_3166_2")) %>%
tbl_df() -> discs_df
# I'm making the label data frame separate for convenice
as_data_frame(dor$xy) %>%
set_names(c("lng", "lat")) %>%
mutate(iso_3166_2 = usa@data$iso_3166_2) %>%
left_join(usa@data, "iso_3166_2") -> usa_labs
ggplot() +
geom_polygon(
data = discs_df, aes(long, lat, group=group, fill=obese), # plot the polys
size=0.125, color="white"
) +
geom_text(
data = usa_labs, aes(lng, lat, label=iso_3166_2, size=count), # plot the labels
family = font_fsm, color="white", show.legend = FALSE
) +
viridis::scale_fill_viridis(
name="Obesity\nRate",
direction=-1, label=scales::percent,
limits=range(filter(ob_df, obese>0) %>% pull(obese)) # use the whole obesity % range for a consistent legend
) +
scale_size_area(name="Population", labels=scales::comma) +
coord_fixed() +
labs(x=selected_year, y=NULL) +
ggthemes::theme_map(base_family = font_fsc) +
theme(legend.direction="horizontal")
}
Let’s see what 2008 looked like:
one_year(2008)
## Scale factor of 1 => number of overlaps : 5
## Scale factor of 0.9795918 => number of overlaps : 8
## Scale factor of 0.9591837 => number of overlaps : 11
## Scale factor of 0.9387755 => number of overlaps : 16
## Scale factor of 0.9183673 => number of overlaps : 20
## Scale factor of 0.8979592 => number of overlaps : 24
## Scale factor of 0.877551 => number of overlaps : 16
## Scale factor of 0.8571429 => number of overlaps : 15
## Scale factor of 0.8367347 => number of overlaps : 22
## Scale factor of 0.8163265 => number of overlaps : 19
## Scale factor of 0.7959184 => number of overlaps : 8
## Scale factor of 0.7755102 => number of overlaps : 0