Moritz Stefaner started off 2016 with a [very spiffy post](http://truth-and-beauty.net/experiments/ach-ingen-zell/) on _“a visual exploration of the spatial patterns in the endings of German town and village names”_. Moritz was [exploring some new data processing & visualization tools](https://github.com/moritzstefaner/ach-ingen-zell) for the post. When I saw what he was doing, I wondered how hard it would be to do something similar in R, and I also used it as an opportunity to start practicing a new habit in 2016: packages vs projects.
To state more precisely the goals for this homage, the plan was to:
– use data sets as close as possible to the ones Moritz has in his GitHub repo, _including_ the ones in pure javascript
– generate an HTML page as output that is as close as possible to the style of Moritz’s visualization
– use R for _everything_ (i.e. no “cheating” by sneaking in some javascript via `htmlwidgets`)
– bundle everything into a package to take advantage of all the good stuff that comes with R package validation
You may want to [take a look at the result](http://rud.is/zellingenach.html) to see if you want to continue reading (I hope you will!).
### The Setup
By using an R package as the framework for the visualization, it’s possible to keep the data with the code and also organize and document the code in a way that makes it easy for folks to use and explore without cutting and pasting (or `source`ing) code. It also makes it possible to list all the dependencies for the project and help ensure they’ll be installed when someone tries to work with it.
While I _could_ have converted Moritz’s processed data into R data files, I left both the CSV and the javascript file of suffix groupings intact to show that R is extremely flexible when it comes to data processing. That’s a “duh” for most folks by this point, but the use of javascript data structures might give some folks ideas on how to reduce data duplication between projects. Both these files get stored in the `inst/alt` folder of the source package. I also end up using some CSS for the final visualization and placed that into a file in the same directory, which makes the code that generates the HTML a bit cleaner.
Because R runs some things automatically (like `.onAttach`) when it interacts with a package, one can have the package provide helpful instructions (in this case, how to generate the visualization) in similar fashion to the `ggplot2` loading messages.
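As a rough illustration of the startup-message idea, a package can define `.onAttach()` in one of its R files. This is only a minimal sketch: the message text below is invented for illustration and is not the package's actual wording.

```r
# Sketch of a startup hook: R calls .onAttach() when library(<pkg>) attaches
# the package. The message text is an assumption, not the real package text.
.onAttach <- function(libname, pkgname) {
  # packageStartupMessage() (rather than message()) lets users silence the
  # text with suppressPackageStartupMessages()
  packageStartupMessage("Run display_maps() to generate the visualization.")
}
```

Using `packageStartupMessage()` is the polite choice here, since it leaves control over the noise with whoever loads the package.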
Similarly, both the package itself and the package functions have documentation to help folks understand what the package and each component is doing.
### The Fun Stuff
The CSV file of places looks something like this:

    name,latitude,longitude
    Nierskanal,49.01,13.23
    Zwiefelhof,49.22,11.18
    Zwiefaltendorf,48.21,9.51
    Zwiefalten,48.23,9.46
    Zwiedorf,53.69,13.05
    Zwickgabel,48.58,8.31
    Zwickau,50.72,12.48
    Zwethau,51.58,13.04
    Zwesten,51.05,9.17

and, the suffix groupings list looks like this:

    const suffixList = [
      ["ach", "a", "aa", "ah"],
      ["ar", "ahr"],
      ["ate", "te", "nit", "net"],
      ["au", "aue", "oog", "ooge", "ohe", "oie"],
      ["bach", "bach", "bek", "beken", "beck", "bke"],
      ["berg", "bergen", "barg", "bargen"],
      ["born", "bronn"],
      ["bruch", "broich", "brook", "brock", "brauk"],
      ["bruck", "brück", "brügge"],
      ...
    ];

While `read.csv` (no need for `readr` as the file is small) can handle the CSV file, we use the `V8` package to source the javascript and convert it to an R object:
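
    library(V8)

    # create a javascript context, load the suffix list into it,
    # then pull the javascript array back into R
    ct <- v8()
    ct$source(system.file("alt/suffixlist.js", package="zellingenach"))
    ct$get("suffixList")
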
We actually turn that into a vector of regular expressions (for town name ending checking) and a list of vectors (for the HTML visualization creation). Check out `suffix_regex()` and `suffix_names()` in the source code.
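To make the regex step concrete, here is a minimal base R sketch of how a list of suffix vectors could become end-anchored regular expressions. It is not necessarily how `suffix_regex()` is written in the package: `suffix_groups` is a hand-typed stand-in for the first three groups above, and `group_to_regex()` is a hypothetical helper name.

```r
# Toy stand-in for the list of suffix vectors pulled from the javascript file
# (first three groups only, typed by hand for illustration).
suffix_groups <- list(
  c("ach", "a", "aa", "ah"),
  c("ar", "ahr"),
  c("ate", "te", "nit", "net")
)

# Hypothetical helper: one end-anchored alternation per group,
# so a name matches when it ends in any suffix from that group.
group_to_regex <- function(suffixes) {
  sprintf("(%s)$", paste(suffixes, collapse = "|"))
}

suffix_rx <- vapply(suffix_groups, group_to_regex, character(1))

# Which group(s) does a given name fall into?
matches <- vapply(suffix_rx, grepl, logical(1), x = "Neckar")
which(unname(matches))
```

Anchoring with `$` is what restricts the match to name endings; without it, a short suffix like "a" would match almost every place name.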
The `read_places()` function builds a `data.frame` of the places combined with the suffix grouping(s) they belong to:

    library(stringi)
    library(dplyr)

    # `suf` holds the suffix regular expressions described above

    # read in the file
    plc <- read.csv(system.file("alt/placenames_de.tsv", package="zellingenach"),
                    stringsAsFactors=FALSE)

    # iterate over each suffix and identify which place names match the grouping
    lapply(suf, function(regex) {
      which(stri_detect_regex(plc$name, regex))
    }) -> matched_endings

    plc$found <- ""

    # add which grouping(s) the place was found in to a new column
    for (i in seq_along(matched_endings)) {
      where_found <- matched_endings[[i]]
      plc$found[where_found] <- paste0(plc$found[where_found], sprintf("%d|", i))
    }

    # some don't match so get rid of them (and trim the trailing "|")
    mutate(filter(plc, found != ""), found=sub("\\|$", "", found))

I do something a bit different than Moritz in that I allow towns to be part of multiple suffix groups, since:
– I’m neither a historian nor expert in German town naming conventions, and
– the javascript version and this R version both take a naive approach to suffix mapping.
This means my numbers (for the _“#### places”_ label) will be different for some of my maps.
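As a small illustration of why the counts can differ, the sketch below tallies places per group from a pipe-delimited `found` column like the one built above. The place names and group indices are made up; only the column layout follows the code in this post.

```r
# Toy data shaped like the read_places() result described above;
# names and group indices are invented for illustration.
plc <- data.frame(
  name  = c("Place A", "Place B", "Place C", "Place D"),
  found = c("4", "17", "17|23", "4|23"),
  stringsAsFactors = FALSE
)

# Split "17|23" into c("17", "23"); fixed = TRUE treats "|" literally
groups_per_place <- strsplit(plc$found, "|", fixed = TRUE)

# Tally places per group index; a place in two groups is counted in both
places_per_group <- table(as.integer(unlist(groups_per_place)))
places_per_group
```

With four places but six group memberships, the per-group totals add up to more than the number of rows, which is exactly the effect of allowing multiple groups per town.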
Moritz uses D3 to make hexgrids out of shapefiles, and R has similar shortcut functions. Here’s the entirety of `create_hexgrid()`:
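
    library(raster) # getData()
    library(sp)     # spsample(), HexPoints2SpatialPolygons()

    # country outline for Germany from GADM
    de_shp <- getData("GADM", country="DEU", level=0, path=tempdir())

    # sample points on a hexagonal lattice inside the outline
    de_hex_pts <- spsample(de_shp, type="hexagonal", n=10000, cellsize=0.19,
                           offset=c(0.5, 0.5), pretty=TRUE)

    # turn the lattice points into hexagon polygons
    HexPoints2SpatialPolygons(de_hex_pts)
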
You can play with `cellsize` to change the number of hexes. I tried to find a value that gets close to the number of hexes in Moritz’s maps.
This all gets put together in `make_maps()` where we use `ggplot2` to build 52 gridded heatmaps (one for each suffix grouping). I used a log of the counts to map to a binned viridis color scale, so my colors come out a bit different than Moritz’s but the overall patterns are on par with his.
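The log-then-bin idea can be sketched in base R as follows. This is an illustration rather than the `make_maps()` code: the counts are made up, and both the log base (10) and the number of bins (5) are assumptions.

```r
# Made-up hex-cell counts, purely for illustration
counts <- c(1, 3, 10, 42, 150, 900)

# Log-transform so a few dense cells do not wash out the rest
# (base 10 is an assumption; the post only says "a log of the counts")
log_counts <- log10(counts)

# Cut the log range into equal-width bins (5 is an arbitrary choice)
n_bins <- 5
bins <- cut(log_counts, breaks = n_bins, include.lowest = TRUE)

# One viridis colour per bin; hcl.colors() ships with base R (>= 3.6)
pal <- hcl.colors(n_bins, "viridis")
fill <- pal[as.integer(bins)]

data.frame(count = counts, bin = bins, fill = fill)
```

In a `ggplot2` version the same effect would come from mapping the binned factor to the fill aesthetic and supplying a discrete viridis palette.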
Finally, `display_maps()` takes the list created by `make_maps()` and builds out an HTML page using the `htmltools` package for the page framework and `svglite::htmlSVG` to make SVGs of the ggplot objects. NOTE that you can use the `output_file` option of `display_maps()` to send the HTML to a file as well as display it in the viewer/browser.
### Fin
Because the project is in a package, we can run package checks to see if we’re missing anything, including other package dependencies, function documentation and other details that the package tools are gleeful to point out. We can also include code to test out our various components to ensure they are behaving as expected (i.e. generating the right data/output).
One nice thing about the output is that it’s “responsive”, which means it handles multiple screen sizes quite well. So, if your screen is huge, you’ll have many map boxes on one line and if it’s small (like the `iframe` below) it will have fewer.
You’ll see that my maps are a bit bigger than Moritz’s. This is due to both the hex grid size and the fact that the SVG output is just slightly larger overall than the ones made by D3. Of note: I noticed some suffix subtitle components wrapped at the “-” so I converted the plain dashes to non-breaking ones (`&#8209;`/“‑”).
The one downside to using a package for this is that it’s harder to post complete code into a blog post, but you can [clone the repo](https://github.com/hrbrmstr/zellingenach) to look at the code, or skip the dissection and just generate the visualization locally via:

    install.packages("ggalt")
    # OR: devtools::install_github("hrbrmstr/ggalt")
    devtools::install_github("hrbrmstr/zellingenach")

    # load the package so display_maps() is available
    library(zellingenach)
    display_maps()

By targeting SVG & HTML, we can make a cross-platform, crisp and responsive visualization all without leaving RStudio.
If you caught any errors or made something cool with any of the code, please drop an issue on github and a note in the comments (respectively)!
If you prefer a single `source`-able version, please see [this gist](https://gist.github.com/hrbrmstr/f3d2568ad0f27b2384d3).
Happy New YeaR!