Category Archives: Javascript

[Vega-Lite](http://vega.github.io/vega-lite/) 1.0 was [released this past week](https://medium.com/@uwdata/introducing-vega-lite-438f9215f09e#.yfkl0tp1c). I had been meaning to play with it for a while but I’ve been burned before by working with unstable APIs and was waiting for this to bake to a stable release. Thankfully, there were no new shows in the Fire TV, Apple TV or Netflix queues, enabling some fast-paced nocturnal coding to make an [R `htmlwidget`s interface](https://github.com/hrbrmstr/vegalite) to the Vega-Lite code before the week was out.

What is “Vega” and why “-Lite”? [Vega](http://vega.github.io/) is _“a full declarative visualization grammar, suitable for expressive custom interactive visualization design and programmatic generation.”_ Vega-Lite _“provides a higher-level grammar for visual analysis, comparable to ggplot or Tableau, that generates complete Vega specifications.”_ Vega-Lite compiles to Vega and is more compact and accessible than Vega (IMO). Both are just JSON data files with a particular schema that let you specify the data, encodings and aesthetics for statistical charts.

Even I don’t like to write JSON by hand and I can’t imagine anyone really wanting to do that. I see Vega and Vega-Lite as amazing ways to serialize statistical charts from ggplot2 or even Tableau (or any Grammar of Graphics-friendly creation tool) and to pass around for use in other programs—like [Voyager](http://vega.github.io/voyager/) or [Pole★](http://vega.github.io/polestar/)—or directly on the web. It is “glued” to D3 (given the way data transformations are encoded and colors are specified) but it’s a pretty weak glue and one could make a Vega or Vega-Lite spec render to anything given some elbow grease.

But, enough words! Here’s how to make a simple Vega-Lite bar chart using `vegalite`:

# devtools::install_github("hrbrmstr/vegalite")
library(vegalite)
 
dat <- jsonlite::fromJSON('[
    {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
    {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
    {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
  ]')
 
vegalite() %>% 
  add_data(dat) %>%
  encode_x("a", "ordinal") %>%
  encode_y("b", "quantitative") %>%
  mark_bar()

Note that the bar graph you see above is _not_ a PNG file or `iframe`d widget. If you `view-source:` you’ll see that I was able to take the Vega-Lite generated spec for that widget code (done by piping the widget to `to_spec()`) and just insert it into this post via:

<style media="screen">.wpvegadiv { display:inline-block; margin:auto }</style>
 
<center><div id="vlvis1" class="wpvegadiv"></div></center>
 
<script>
var spec1 = JSON.parse('{"description":"","data":{"values":[{"a":"A","b":28},{"a":"B","b":55},{"a":"C","b":43},{"a":"D","b":91},{"a":"E","b":81},{"a":"F","b":53},{"a":"G","b":19},{"a":"H","b":87},{"a":"I","b":52}]},"mark":"bar","encoding":{"x":{"field":"a","type":"ordinal"},"y":{"field":"b","type":"quantitative"}},"config":[],"embed":{"renderer":"svg","actions":{"export":false,"source":false,"editor":false}}} ');
 
var embedSpec = { "mode": "vega-lite", "spec": spec1, "renderer": spec1.embed.renderer, "actions": spec1.embed.actions };
 
vg.embed("#vlvis1", embedSpec, function(error, result) {});
</script>

I did have all the necessary js libs pre-loaded, like you see [in this example](http://vega.github.io/vega-lite/tutorials/getting_started.html#embed). You can use the `embed_spec()` function to generate most of that for you, too.

This means you can use R to gather, clean, tidy and analyze data. Then, generate a visualization based on that data with `vegalite`. _Then_ generate a lightweight JSON spec from it and easily embed it anywhere without having to rig up a way to get a widget working or ship giant R markdown created files (like [this one](http://rud.is/projects/vegalite01.html) which has many full `vegalite` widgets on it).
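To make the "it's just JSON" point concrete, here is a minimal sketch that builds the same bar-chart spec by hand as a plain R list and serializes it with `jsonlite`. It does not use `vegalite` at all, and the field names simply mirror the spec shown above, so treat it as an illustration rather than what the package does internally.

```r
library(jsonlite)

# same nine rows as the `dat` data frame used for the bar chart above
dat <- data.frame(
  a = LETTERS[1:9],
  b = c(28, 55, 43, 91, 81, 53, 19, 87, 52),
  stringsAsFactors = FALSE
)

# once you are in R, a Vega-Lite spec is just a nested list
spec <- list(
  data = list(values = dat),
  mark = "bar",
  encoding = list(
    x = list(field = "a", type = "ordinal"),
    y = list(field = "b", type = "quantitative")
  )
)

# auto_unbox = TRUE writes length-one vectors as scalars instead of arrays
spec_json <- toJSON(spec, auto_unbox = TRUE)
cat(spec_json, "\n")
```

The resulting string can be dropped into the `JSON.parse()` call in the embed snippet in place of the generated spec (minus the `embed` options, which this sketch leaves out).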

One powerful feature of Vega/Vega-Lite is that the data does not have to be embedded in the spec.

Take this streamgraph visualization about unemployment levels across various industries over time:

vegalite() %>%
  cell_size(500, 300) %>%
  add_data("https://vega.github.io/vega-editor/app/data/unemployment-across-industries.json") %>%
  encode_x("date", "temporal") %>%
  encode_y("count", "quantitative", aggregate="sum") %>%
  encode_color("series", "nominal") %>%
  scale_color_nominal(range="category20b") %>%
  timeunit_x("yearmonth") %>%
  scale_x_time(nice="month") %>%
  axis_x(axisWidth=0, format="%Y", labelAngle=0) %>%
  mark_area(interpolate="basis", stack="center")

The URL you see in the R code is placed into the JSON spec. That means whenever that data changes and the visualization is refreshed, you see updated content without going back to R (or js code).
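As a rough illustration of what "placed into the JSON spec" means, the sketch below hand-builds a simplified spec whose `data` member holds a `url` rather than inline `values`. This is an assumption-laden reduction of the streamgraph above (no time unit, scales or axis settings), not the exact output of `vegalite`.

```r
library(jsonlite)

# hypothetical hand-built spec: `data` carries a URL instead of inline values
url_spec <- list(
  data = list(
    url = "https://vega.github.io/vega-editor/app/data/unemployment-across-industries.json"
  ),
  mark = "area",
  encoding = list(
    x     = list(field = "date",   type = "temporal"),
    y     = list(field = "count",  type = "quantitative", aggregate = "sum"),
    color = list(field = "series", type = "nominal")
  )
)

url_json <- toJSON(url_spec, auto_unbox = TRUE, pretty = TRUE)
cat(url_json, "\n")
```

Because only the URL is serialized, the spec stays tiny no matter how large the remote data set grows.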

Now, dynamically-created visualizations are great, but what if you want to actually let your viewers have a copy of it? With Vega/Vega-Lite, you don’t need to resort to hackish bookmarklets; just change a configuration option to enable an export link:

vegalite(export=TRUE) %>%
  add_data("https://vega.github.io/vega-editor/app/data/seattle-weather.csv") %>%
  encode_x("date", "temporal") %>%
  encode_y("*", "quantitative", aggregate="count") %>%
  encode_color("weather", "nominal") %>%
  scale_color_nominal(domain=c("sun","fog","drizzle","rain","snow"),
                      range=c("#e7ba52","#c7c7c7","#aec7e8","#1f77b4","#9467bd")) %>%
  timeunit_x("month") %>%
  axis_x(title="Month") %>% 
  mark_bar()

(You can style/place that link however/wherever you want; it’s a simple classed element.)

If you choose a `canvas` renderer, the “export” option will be PNG vs SVG.

The package is nearly (~98%) feature complete to the 1.0 Vega-Lite standard. There are some tedious bits from the Vega-Lite spec remaining to be encoded. I’ve transcribed much of the Vega-Lite documentation to R function & package documentation with links back to the Vega-Lite sources if you need more detail.

I’m hoping to be able to code up an “`as_spec()`” function to enable quick conversion of ggplot2-created graphics to Vega-Lite (and support converting a ggplot2 object to a Vega-Lite spec in `to_spec()`) but that won’t be for a while unless someone wants to jump on board and implement a Vega expression creator/parser in R for me :-)

You can work with the current code [on github](https://github.com/hrbrmstr/vegalite) and/or jump on board to help with package development or file an issue with an idea or a bug. Please note that this package is under _heavy development_ and the function interface is very likely to change as I and others work with it and develop more streamlined ways to handle the encodings. Check back to the github repo often to find out what’s different (there will be a `NEWS` file posted soon and maintained as well).

An R user recently had the need to split a “full, human name” into component parts to retrieve first & last names. The full names could be anything from something simple like _“David Regan”_ to more complex & diverse such as _“John Smith Jr.”_, _“Izaque Iuzuru Nagata”_ or _“Christian Schmit de la Breli”_. Despite the fact that I’m _pretty good_ at searching GitHub & CRAN for R stuff, my quest came up empty (though a teensy part of me swears I saw this type of thing in a package somewhere). I _did_ manage to find Python & node.js modules that carved up human names but really didn’t have the time to re-implement their functionality from scratch in R (or, preferably, Rcpp).

Rather than rely on the Python bridge to R (yuck) I decided to use @opencpu’s [V8 package](https://cran.rstudio.com/web/packages/V8/index.html) to wrap a part of the node.js [humanparser](https://github.com/chovy/humanparser) module. If you’re not familiar with V8, it provides the ability to run JavaScript code within R and makes it possible to pass variables into JavaScript functions and get data back in return. All the magic happens via JSON data passing & Rcpp wrappers (and, of course, the super-awesome code Jeroen writes).

Working with JavaScript in R is as simple as creating an instance of the JavaScript V8 interpreter, loading up the JavaScript code that provides the functions you need, and then calling those functions:

library(V8)
 
ct <- new_context()
ct$source(system.file("js/underscore.js", package="V8"))
ct$call("_.filter", mtcars, JS("function(x){return x.mpg < 15}"))
 
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

There are many more examples in the [V8 vignette](https://cran.rstudio.com/web/packages/V8/vignettes/v8_intro.html).
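You are not limited to sourcing whole libraries. As a minimal sketch (my own toy function, far cruder than `humanparser`), you can define a JavaScript function inline with `eval` and call it from R. I use `v8()` here, which is the constructor name in current V8 releases; the `new_context()` call above is the older spelling.

```r
library(V8)

ct <- v8()  # current constructor name; the example above uses new_context()

# a deliberately naive, hypothetical splitter: first token and last token only
ct$eval("
function naiveSplit(name) {
  var parts = name.trim().split(/\\s+/);
  return { firstName: parts[0], lastName: parts[parts.length - 1] };
}
")

# the argument is passed in as JSON and the returned object comes back as an R list
res <- ct$call("naiveSplit", "David Regan")
str(res)
```

A splitter like this falls over on suffixes and multi-word surnames, which is exactly why wrapping a real module is worth the effort.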

For `humanparser` I needed to use Underscore.js (it comes with V8) and a [function](https://github.com/chovy/humanparser/blob/master/index.js#L5-L74) from `humanparser` that I carved out to work the way I wanted it to. You can look at the innards of the package [on github](https://github.com/hrbrmstr/humanparser), specifically [this file](https://github.com/hrbrmstr/humanparser/blob/master/R/humanparser.r) (it’s _really_ small). Using the two new functions the package exposes is as simple as:

devtools::install_github("hrbrmstr/humanparser")
 
library(humanparser)
 
parse_name("John Smith Jr.")
 
#> $firstName
#> [1] "John"
#> 
#> $suffix
#> [1] "Jr."
#> 
#> $lastName
#> [1] "Smith"
#> 
#> $fullName
#> [1] "John Smith Jr."

or the following to convert a bunch of ’em:

full_names <- c("David Regan", "Izaque Iuzuru Nagata", 
                "Christian Schmit de la Breli", "Peter Doyle", "Hans R.Bruetsch", 
                "Marcus Reichel", "Per-Axel Koch", "Louis Van der Walt", 
                "Mario Adamek", "Ugur Tozsekerli", "Judit Ludvai" )
 
parse_names(full_names)
 
#> Source: local data frame [11 x 4]
#> 
#>    firstName     lastName                     fullName middleName
#> 1      David        Regan                  David Regan         NA
#> 2     Izaque       Nagata         Izaque Iuzuru Nagata     Iuzuru
#> 3  Christian  de la Breli Christian Schmit de la Breli     Schmit
#> 4      Peter        Doyle                  Peter Doyle         NA
#> 5       Hans   R.Bruetsch              Hans R.Bruetsch         NA
#> 6     Marcus      Reichel               Marcus Reichel         NA
#> 7   Per-Axel         Koch                Per-Axel Koch         NA
#> 8      Louis Van der Walt           Louis Van der Walt         NA
#> 9      Mario       Adamek                 Mario Adamek         NA
#> 10      Ugur   Tozsekerli              Ugur Tozsekerli         NA
#> 11     Judit       Ludvai                 Judit Ludvai         NA

Now, the functions in this package won’t win any land-speed records since we’re going from R to C[++] to JavaScript and back, passing JSON-converted data back & forth, so I pwnd @quominus into making a full Rcpp-based human, full-name parser. And, he’s nearly done! So, keep an eye on [humaniformat](https://github.com/Ironholds/humaniformat) since it will no doubt be in CRAN soon.

The real point of this post is that there are _tons_ of JavaScript modules that will work well with the V8 package and let you get immediate functionality for something that might not be in R yet. You can prototype quickly (it took almost no time to make that package and you don’t even need to go that far), then optimize later. So, next time—if you can’t find some functionality directly in R—see if you can get by with a JavaScript shim, then convert to full R/Rcpp when/if you need to go into production.

If you’ve done any creative V8 hacks, drop a note in the comments!

There was some chatter on the twitters this week about a relatively new D3-based charting library called [TauCharts](http://taucharts.com/) (also @taucharts). The API looked pretty clean and robust, so I started working on an htmlwidget for it and was quickly joined by the Widget Master himself, @timelyportfolio.

TauCharts definitely has a “grammar of graphics” feel about it and the default aesthetics are super-nifty. While the developers are actively adding new features and “geoms”, the core points (think scatterplot), lines and bars (including horizontal bars!) geoms are quite robust and definitely ready for your dashboards.

Between the two of us, we have a _substantial_ part of the [charting library API](http://api.taucharts.com/) covered. I think the only major thing left unimplemented is composite charts (i.e. lines + bars + points on the same chart) and some minor tweaks around the edges.

While you can find it on [github](http://github.com/hrbrmstr/taucharts) and do the normal:

devtools::install_github("hrbrmstr/taucharts")

or, even use the official initial release version:

devtools::install_github("hrbrmstr/taucharts@v0.1.0")

I’ll use the `dev` version:

devtools::install_github("hrbrmstr/taucharts@dev")

for the example below, mostly since it includes the data set I want to use to mimic the current, featured example on the [TauCharts homepage](http://taucharts.com/) and also has full documentation with examples.

Here’s all it takes to make a faceted scatterplot with:

– interactive tooltips
– interactive legend
– custom-selectable trendline annotation:

devtools::install_github("hrbrmstr/taucharts@dev")
 
library(taucharts)
 
data(cars_data)
 
tauchart(cars_data) %>% 
  tau_point("milespergallon", c("class", "price"), color="class") %>% 
  tau_guide_padding(bottom=300) %>% 
  tau_legend() %>% 
  tau_trendline() %>% 
  tau_tooltip(c("vehicle", "year", "class", "price", "milespergallon"))


[Chart: “Hybrid cars fuel economy by price and class”, subtitled “It seems expensive cars are less efficient.”]

There are _tons_ more examples in the [TauCharts RPub](http://rpubs.com/hrbrmstr/taucharts) (and soon-to-be vignette) and @timelyportfolio will be featuring it in his weekly [widget update](http://www.buildingwidgets.com/).

I’m super-pleased to announce that the Benevolent CRAN Overlords [accepted the metricsgraphics package](http://cran.r-project.org/web/packages/metricsgraphics/index.html) into CRAN over the weekend. Now, you no longer need to rely on github/devtools to use [MetricsGraphics.js](http://metricsgraphicsjs.org/) charts from your R scripts. If you’re not familiar with `htmlwidgets`, take a look at [the official site for them](http://www.htmlwidgets.org/).

To make it easier to grok the package, I replicated many of the core [MetricsGraphics examples](http://metricsgraphicsjs.org/examples.htm) in the package [vignette](http://cran.r-project.org/web/packages/metricsgraphics/vignettes/introductiontometricsgraphics.html) (which is also below).

I’ll be finishing up support for all of the features of the MetricsGraphics library, most importantly `POSIX[cl]t` support for time ranges, in the not-too-distant future. You can drop feature requests, questions or problems [over at github](https://github.com/hrbrmstr/metricsgraphics/issues).

I set aside a small bit of time to give [rbokeh](https://github.com/bokeh/rbokeh) a try and figured I’d share a small bit of code that shows how to make the “same” chart in both ggplot2 and rbokeh.

#### What is Bokeh/rbokeh?

rbokeh is an [htmlwidget](http://htmlwidgets.org) wrapper for the [Bokeh](http://bokeh.pydata.org/en/latest/) visualization library that has become quite popular in Python circles. Bokeh makes creating interactive charts pretty simple and rbokeh lets you do it all with R syntax.

#### Comparing ggplot & rbokeh

This is not a comprehensive introduction to rbokeh. You can get that [here (officially)](http://hafen.github.io/rbokeh/). I merely wanted to show how a ggplot idiom would map to an rbokeh one for those that may be looking to try out the rbokeh library and are familiar with ggplot. They share a common “grammar of graphics” base where you have a plot structure and add layers and aesthetics. They each do this a tad bit differently, though, as you’ll see.

First, let’s plot a line graph with some markers in ggplot. The data I’m using is a small time series whose cumulative sum we’ll plot as a line graph. It’s small enough to fit inline:

library(ggplot2)
library(rbokeh)
library(htmlwidgets)
 
structure(list(wk = structure(c(16069, 16237, 16244, 16251, 16279,
16286, 16300, 16307, 16314, 16321, 16328, 16335, 16342, 16349,
16356, 16363, 16377, 16384, 16391, 16398, 16412, 16419, 16426,
16440, 16447, 16454, 16468, 16475, 16496, 16503, 16510, 16517,
16524, 16538, 16552, 16559, 16566, 16573), class = "Date"), n = c(1L,
1L, 1L, 1L, 3L, 1L, 3L, 2L, 4L, 2L, 3L, 2L, 5L, 5L, 1L, 1L, 3L,
3L, 3L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 7L, 1L, 2L, 6L, 7L, 1L, 1L,
1L, 2L, 2L, 7L, 1L)), .Names = c("wk", "n"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -38L)) -> by_week
 
events <- data.frame(when=as.Date(c("2014-10-09", "2015-03-20", "2015-05-15")),
                     what=c("Thing1", "Thing2", "Thing2"))

The ggplot version is pretty straightforward:

gg <- ggplot()
gg <- gg + geom_vline(data=events,
                      aes(xintercept=as.numeric(when), color=what),
                      linetype="dashed", alpha=1/2)
gg <- gg + geom_text(data=events,
                     aes(x=when, y=1, label=what, color=what),
                     hjust=1.1, size=3)
gg <- gg + geom_line(data=by_week, aes(x=wk, y=cumsum(n)))
gg <- gg + scale_x_date(expand=c(0, 0))
gg <- gg + scale_y_continuous(limits=c(0, 100))
gg <- gg + labs(x=NULL, y="Cumulative Stuff")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(legend.position="none")
gg

We:

– set up a base ggplot object
– add a layer of marker lines (which are the 3 `events` dates)
– add a layer of text for the marker lines
– add a layer of the actual line – note that we can use `cumsum(n)` vs pre-compute it
– set up scale and other aesthetic properties

That gives us this:


Here’s a similar structure in rbokeh:

figure(width=550, height=375,
       logo="grey", outline_line_alpha=0) %>%
  ly_abline(v=events$when, color=c("red", "blue", "blue"), type=2, alpha=1/4) %>%
  ly_text(x=events$when, y=5, color=c("red", "blue", "blue"),
          text=events$what, align="right", font_size="7pt") %>%
  ly_lines(x=wk, y=cumsum(n), data=by_week) %>%
  y_range(c(0, 100)) %>%
  x_axis(grid=FALSE, label=NULL,
         major_label_text_font_size="8pt",
         axis_line_alpha=0) %>%
  y_axis(grid=FALSE,
         label="Cumulative Stuff",
         minor_tick_line_alpha=0,
         axis_label_text_font_size="10pt",
         axis_line_alpha=0) -> rb
rb

Here, we set the `width` and `height` and configure some of the initial aesthetic options. Note that `outline_line_alpha=0` is the equivalent of `theme(panel.border=element_blank())`.

The markers and text do not work exactly as one might expect since there’s no way to specify a `data` parameter, so we have to set the colors manually. Also, since the target is a browser, points are specified in the same way you would with CSS. However, it’s a pretty easy translation from `geom_[hv]line` to `ly_abline` and `geom_text` to `ly_text`.

The `ly_lines` works pretty much like `geom_line`.

Notice that both ggplot and rbokeh can grok dates for plotting (though we do not need the `as.numeric` hack for rbokeh).

rbokeh will auto-compute bounds like ggplot would but I wanted the scale to go from 0 to 100 in each plot. You can think of `y_range` as `ylim` in ggplot.
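Both versions compute `cumsum(n)` inline. If you would rather precompute the running total, for instance to confirm it fits inside the fixed 0 to 100 range before plotting, a small sketch using just the first eight weekly counts from `by_week` looks like this:

```r
# first eight weekly counts copied from by_week$n above
n <- c(1L, 1L, 1L, 1L, 3L, 1L, 3L, 2L)

# running total, one value per week
running <- cumsum(n)
running

# smallest and largest value the y axis has to accommodate
range(running)
```

With the full series you would store this as a new column (say `by_week$total`, a name I made up here) and map `y` to it in either library.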

To configure the axes, you work directly with `x_axis` and `y_axis` parameters vs `theme` elements in ggplot. To turn off only lines, I set the alpha to 0 in each and did the same with the y axis minor tick marks.

Here’s the rbokeh result:

NOTE: you can save out the widget with:

saveWidget(rb, file="rbokeh001.html")

and I like to use the following `iframe` settings to include the widgets:

<iframe style="max-width:100%" 
        src="rbokeh001.html" 
        sandbox="allow-same-origin allow-scripts" 
        width="100%" 
        height="400" 
        scrolling="no" 
        seamless="seamless" 
        frameBorder="0"></iframe>

#### Wrapping up

Hopefully this helped a bit with translating some ggplot idioms over to rbokeh and developing a working mental model of rbokeh plots. As I play with it a bit more I’ll add some more examples here in the event there are “tricks” that need to be exposed. You can find the code [up on github](https://gist.github.com/hrbrmstr/a3a1be8132530b355bf9) and please feel free to drop a note in the comments if there are better ways of doing what I did or if you have other hints for folks.

When I used one of the Scotland TopoJSON files for a recent post, it really hit me just how much D3 cartography envy I had/have as an R user. Don’t get me wrong, I can conjure up D3 maps pretty well [1] [2] and the utility of an interactive map visualization goes without saying, but we can make great static maps in R without a great deal of effort, so I decided to replicate a few core examples from the D3 topojson gallery in R.

I chose five somewhat different examples, each focusing on various aspects of creating map layers and trying to not be too U.S. focused. Here they are (hit the main link to go to the gist for the example and the bl.ocks URL to see its D3 counterpart):

I used the TopoJSON/GeoJSON files provided with each example, so you’ll need a recent gdal (>= 1.11) and, consequently, a suitable build of rgdal to work through the examples.

The Core Mapping Idiom

While the details may vary with each project you work on, the basic steps to present a map in R with ggplot are:

  • read in the map features (I use readOGR in these examples)
  • convert that into something ggplot can handle
  • identify values you wish to pair with those features (optional if we’re just plotting a plain map)
  • determine which portion of the map is to be displayed
  • plot the map features

Words & abbreviations mean things, just like map symbols mean things, and if you’re wondering what this “OGR” is, here’s the answer from the official FAQ:

OGR used to stand for OpenGIS Simple Features Reference Implementation. However, since OGR is not fully compliant with the OpenGIS Simple Feature specification and is not approved as a reference implementation of the spec the name was changed to OGR Simple Features Library. The only meaning of OGR in this name is historical. OGR is also the prefix used everywhere in the source of the library for class names, filenames, etc.

The readOGR function can work with a wide variety of file formats and OGR files can hold a wide variety of data. The most basic use for our mapping is to read in these TopoJSON/GeoJSON files and use the right features from them to make our maps. Features/layers can be almost anything (counties, states, countries, rivers, lakes, etc) and we can see what features we want to work with by using the ogrListLayers function (you can do this from an operating system command line as well, but we’ll stay in R for now). Let’s take a look at the layers available in the map from the Costa Rica example:

ogrListLayers("division.json")
 
## [1] "limites"    "provincias" "cantones"   "distritos"
## attr(,"driver")
## [1] "GeoJSON"
## attr(,"nlayers")
## [1] 4

Those translate to “country”, “provinces”, “cantons”, & “districts”. Each layer has polygons and associated data for the polygons (and overall layer), including information about the type of projection. If you’re sensing a “math trigger warning”, fear not; I won’t be delving into too much more cartographic detail as you probably just want to see the maps & code.

Swiss Cantons

If you’re from the U.S. you (most likely) have no idea what a canton is. The quickest explanation is that it is an administrative division within a country and, in this specific example, the 26 cantons of Switzerland are the member states of the federal state of Switzerland.

The D3 Swiss Cantons example uses a TopoJSON/GeoJSON file that has only one layer (i.e. the cantons) along with metadata about the canton id and name:

ogrInfo("readme-swiss.json", "cantons")
 
## Source: "readme-swiss.json", layer: "cantons"
## Driver: GeoJSON number of rows 26 
## Feature type: wkbPolygon with 2 dimensions
## Extent: (5.956 45.818) - (10.492 47.808)
## Number of fields: 2 
##   name type length typeName
## 1   id    4      0   String
## 2 name    4      0   String

NOTE: you should learn to get pretty adept with the OGR functions or command-line tools as you can do some really amazing things with them, including extracting only certain features, simplifying the polygons or fixing issues. Some of the TopoJSON/GeoJSON files you’ll find with D3 examples may have missing or invalid components and you can fix some of them with these tools. We’ll be working around errors and missing values in these examples.

The D3 example displays the canton name at the centroid of the polygon, so that’s what we’ll do in R:

library(rgeos)
library(rgdal) # needs gdal > 1.11.0
library(ggplot2)
 
# ggplot map theme
devtools::source_gist("https://gist.github.com/hrbrmstr/33baa3a79c5cfef0f6df")
 
map = readOGR("readme-swiss.json", "cantons")
 
map_df <- fortify(map)

The map object is a SpatialPolygonsDataFrame and has a fairly complex structure:

slotNames(map)
## [1] "data"        "polygons"    "plotOrder"   "bbox"       
## [5] "proj4string"
 
names(map)
## [1] "id"   "name"
 
# execute these on your own and poke around the data structures after determining the class
class(map@data)
class(map@polygons)
class(map@plotOrder)
class(map@bbox)
class(map@proj4string)

The fortify function turns all that into something we can use with ggplot. Normally, we’d be able to get fortify to associate the canton name to the polygon points it encodes via the region parameter. That did not work with these TopoJSON/GeoJSON files and I didn’t really poke around much to determine why since it’s easy enough to work around. In this case, I manually merged the names with the fortified map data frame.

#  create mapping for id # to name since "region=" won't work
dat <- data.frame(id=0:(length(map@data$name)-1), canton=map@data$name)
map_df <- merge(map_df, dat, by="id")
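To see what that manual merge does without needing the Swiss file, here is a toy sketch in base R. The coordinates and the names "Alpha" and "Beta" are made up; the only thing it shares with the real code is the shape of the data: a long data frame of polygon points with a character `id`, plus a zero-based lookup table.

```r
# toy stand-in for a fortified map: two triangles with ids "0" and "1"
toy_map_df <- data.frame(
  long = c(0, 1, 1, 2, 3, 3),
  lat  = c(0, 0, 1, 0, 0, 1),
  id   = c("0", "0", "0", "1", "1", "1"),
  stringsAsFactors = FALSE
)

# zero-based lookup table, built the same way as `dat` above
toy_names <- c("Alpha", "Beta")
lookup <- data.frame(
  id     = 0:(length(toy_names) - 1),
  canton = toy_names,
  stringsAsFactors = FALSE
)

# every polygon point picks up the name that belongs to its id
merged <- merge(toy_map_df, lookup, by = "id")
merged
```

Note that `merge()` may reorder rows, so if you draw outlines with a path-style geom it is worth re-sorting by the `order` column that `fortify` produces.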

We can get the centroid via the gCentroid function, and we’ll make a data frame of those center points and the name of the canton for use with a geom_text layer after plotting the base outlines of the cantons (with a rather bland fill, but I didn’t pick the color):

# find canton centers
centers <- data.frame(gCentroid(map, byid=TRUE))
centers$canton <- dat$canton
 
# make a map!
gg <- ggplot()
gg <- gg + geom_map(data=map_df, map=map_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="#ffffff", fill="#bbbbbb", size=0.25)
# gg <- gg + geom_point(data=centers, aes(x=x, y=y))
gg <- gg + geom_text(data=centers, aes(label=canton, x=x, y=y), size=3)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Swiss Cantons")
gg <- gg + theme_map()

The coord_map() works with the mapproj package to help us display maps in reasonable projections (or really dumb ones). The default is "mercator" and we’ll stick with that since the D3 examples use it (but, winkel-tripel FTW!).

Here’s the result of our hard work (select map for larger version):

If you ignore the exposition above and just take into account non-blank source code lines, we did all that in ~16LOC and have a scalable SVG file as a result. You can have some fun with the above code: remove the static fill="#bbbbbb", move it into the mapping aesthetic and tie its value to the canton name.

Costa Rica

The TopoJSON/GeoJSON file provided with the D3 example is a good example of encoding multiple layers into a single file (see the first ogrListLayers above). We’ll create a fortified version of each layer and then plot each with a geom_map layer using different line colors, sizes and fills:

limites = readOGR("division.json", "limites")
provincias = readOGR("division.json", "provincias")
cantones = readOGR("division.json", "cantones")
distritos = readOGR("division.json", "distritos")
 
limites_df <- fortify(limites)
cantones_df <- fortify(cantones)
distritos_df <- fortify(distritos)
provincias_df <- fortify(provincias)
 
gg <- ggplot()
gg <- gg + geom_map(data=limites_df, map=limites_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="white", fill="#dddddd", size=0.25)
gg <- gg + geom_map(data=cantones_df, map=cantones_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="red", fill="#ffffff00", size=0.2)
gg <- gg + geom_map(data=distritos_df, map=distritos_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="#999999", fill="#ffffff00", size=0.1)
gg <- gg + geom_map(data=provincias_df, map=provincias_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="black", fill="#ffffff00", size=0.33)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Costa Rica TopoJSON")
gg <- gg + theme_map()

The result is pretty neat and virtually identical to the D3 version:

Try playing around with the order of the geom_map layers (or remove some) and also the line color/size/fill & alpha values to see how it changes the map.

Area Choropleth

I’m not a huge fan of the colors used in the D3 version and I’m not going to spend any time moving Hawaii & Alaska around (that’s a whole different post). But, I will show how to make a similar area choropleth:

# read in the county borders
map = readOGR("us.json", "counties")
 
# calculate (well retrieve) the area since it's part of the polygon structure
# and associate it with the polygon id so we can use it later. We need to do
# the merge manually again since the "us.json" file threw errors again when
# trying to use the fortify "region" parameter.
 
map_area <- data.frame(id=0:(length(map@data$id)-1),
                       area=sapply(slot(map, "polygons"), slot, "area") )
 
# read in the state borders
states = readOGR("us.json", "states")
states_df <- fortify(states)
 
# create map data frame and merge area info
map_df <- fortify(map)
map_df <- merge(map_df, map_area, by="id")
 
gg <- ggplot()
 
# thin white border around counties and shades of yellow-green for area (log scale)
gg <- gg + geom_map(data=map_df, map=map_df,
                    aes(map_id=id, x=long, y=lat, group=group, fill=log1p(area)),
                    color="white", size=0.05)
 
# thick white border for states
gg <- gg + geom_map(data=states_df, map=states_df,
                    aes(map_id=id, x=long, y=lat, group=group),
                    color="white", size=0.5, alpha=0)
gg <- gg + scale_fill_continuous(low="#ccebc5", high="#084081")
 
# US continental extents - not showing alaska & hawaii
gg <- gg + xlim(-124.848974, -66.885444)
gg <- gg + ylim(24.396308, 49.384358)
 
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Area Choropleth")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")

Play with the colors and use different values instead of the polygon area (perhaps use sample or runif to generate some data) to see how it changes the choropleth outcome.
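A quick sketch of that suggestion, with a made-up `value` column and only five ids so it runs without the map file: generate random values keyed by the same zero-based `id`, then merge them in exactly as `map_area` is merged above and map `fill` to `value` instead of `log1p(area)`.

```r
set.seed(42)  # only so the made-up values are reproducible

n_polys <- 5  # stand-in for length(map@data$id)

# one random value per polygon id, uniformly drawn between 0 and 100
fake_values <- data.frame(
  id    = 0:(n_polys - 1),
  value = runif(n_polys, min = 0, max = 100)
)
fake_values
```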

Blocky Counties

The example from the D3 wiki is more “how to work with shapefiles and map coordinates” than it is useful, but we have the same flexibility in R, so we’ll make the same plot by using the bbox function to make a data frame of bounding boxes we can use with geom_rect (there’s no geom_map in this example, just using the coordinate system to plot boxes):

# use the topojson from the bl.ocks example
map = readOGR("us.json", "counties")
 
# build our map data frame of rects
 
map_df <- do.call("rbind", lapply(map@polygons, function(p) {
 
  b <- bbox(p) # get bounding box of polygon and put it into a form we can use later
 
  data.frame(xmin=b["x", "min"],
             xmax=b["x", "max"],
             ymin=b["y", "min"],
             ymax=b["y", "max"])
 
}))
map_df$id <- map$id # add the id even though we aren't using it now
 
gg <- ggplot(data=map_df)
gg <- gg + geom_rect(aes(xmin=xmin, xmax=xmax,
                         ymin=ymin, ymax=ymax),
                     color="steelblue", alpha=0, size=0.25)
 
# continental us only
gg <- gg + xlim(-124.848974, -66.885444)
gg <- gg + ylim(24.396308, 49.384358)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="Blocky Counties")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")

To re-emphasize: we’re just working with ggplot layers, so play around and perhaps color in only the odd-numbered counties.
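A quick sketch of one way to flag the odd ones. The toy_rects data frame is a made-up stand-in for the bounding-box data frame, and I'm assuming row order is a good enough proxy for "county number" (the real id column may not be numeric):

```r
# toy stand-in for the bounding-box data frame built above (values are made up)
toy_rects <- data.frame(xmin = c(-100, -98, -96, -94),
                        xmax = c(-99, -97, -95, -93),
                        ymin = c(30, 31, 32, 33),
                        ymax = c(31, 32, 33, 34))

# flag odd-numbered rows; row order stands in for "county number" here
toy_rects$odd <- seq_len(nrow(toy_rects)) %% 2 == 1
```

Mapping fill=odd in the geom_rect aesthetics, and dropping alpha=0 (which hides the fill), should then shade the two groups differently.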

County Circles (OK, Ovals)

The last D3 example I’m copying swaps squares for circles, which makes this more of a challenge to do in R+ggplot since ggplot has no “circle” geom (and holey geom_points do not count). So, we’ll borrow and slightly adapt a function from StackOverflow by joran that builds a data frame of polygon points derived from a center & diameter. We’ll add an id value (for each of the counties) and make one really big data frame (well, big for use in ggplot) that we can then plot as grouped geom_paths. Unlike our cantons example, the gCentroid function coughed up errors on this TopoJSON/GeoJSON file, so I resorted to approximating the center from the rectangular bounding box. Also, I don’t project the circle coordinates before plotting, so they’re ovals. While it doesn’t mirror the D3 example perfectly, it should help reinforce how to work with the map’s metadata and draw arbitrary components on a map:

# adapted from http://stackoverflow.com/questions/6862742/draw-a-circle-with-ggplot2
# computes a circle from a given diameter. we add "id" so we can have one big
# data frame and group them for plotting
 
circleFun <- function(id, center = c(0, 0), diameter = 1, npoints = 100) {
    r <- diameter / 2
    tt <- seq(0, 2 * pi, length.out = npoints)
    xx <- center[1] + r * cos(tt)
    yy <- center[2] + r * sin(tt)
    return(data.frame(id = id, x = xx, y = yy))
}
 
# us topojson from the bl.ocks example
map = readOGR("us.json", "counties")
 
# this topojson file gives rgeos_getcentroid errors here
# so we approximate the centroid
 
map_df <- do.call("rbind", lapply(map@polygons, function(p) {
 
  b <- bbox(p)
 
  data.frame(x=b["x", "min"] + ((b["x", "max"] - b["x", "min"]) / 2),
             y=b["y", "min"] + ((b["y", "max"] - b["y", "min"]) / 2))
 
}))
 
# get area & diameter
map_df$area <- sapply(slot(map, "polygons"), slot, "area")
map_df$diameter <- sqrt(map_df$area / pi) * 2
 
# make our big data frame of circles
circles <- do.call("rbind", lapply(1:nrow(map_df), function(i) {
  circleFun(i, c(map_df[i,]$x, map_df[i,]$y), map_df[i,]$diameter)
}))
 
gg <- ggplot(data=circles, aes(x=x, y=y, group=id))
gg <- gg + geom_path(color="steelblue", size=0.25)
 
# continental us
gg <- gg + xlim(-124.848974, -66.885444)
gg <- gg + ylim(24.396308, 49.384358)
gg <- gg + coord_map()
gg <- gg + labs(x="", y="", title="County Circles (OK, Ovals)")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="none")

If you poke around a bit at the various map libraries in R, you should be able to figure out how to get those plotted as circles (and learn a lot in the process).
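If you want a head start, here is one rough approach, sketched in base R rather than with a map library: stretch the east-west radius by 1/cos(latitude) so the shape ends up closer to round once a projection is applied. circleFunGeo is a hypothetical name, and I'm assuming the center is c(longitude, latitude) in degrees and the diameter is in degrees of latitude. It's an approximation that only holds up for small circles away from the poles:

```r
# hypothetical variant of the circle function: widens the circle in longitude
# by 1/cos(latitude) to compensate for meridians converging toward the poles.
# assumes center = c(longitude, latitude) in degrees, diameter in degrees of latitude
circleFunGeo <- function(id, center = c(0, 0), diameter = 1, npoints = 100) {
  r <- diameter / 2
  tt <- seq(0, 2 * pi, length.out = npoints)
  lat_rad <- center[2] * pi / 180 # latitude of the center, in radians
  xx <- center[1] + (r / cos(lat_rad)) * cos(tt)
  yy <- center[2] + r * sin(tt)
  data.frame(id = id, x = xx, y = yy)
}

# at 60 degrees north the longitude span is about twice the latitude span
circ <- circleFunGeo(1, center = c(-70, 60), diameter = 2)
```

It returns the same id/x/y columns as the original function, so it should drop into the same do.call/rbind and geom_path steps.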

Wrapping Up

R ggplot maps won’t and shouldn’t replace D3 maps for many reasons, paramount of which is interactivity. The generated SVG files are also fairly large and the non-SVG versions don’t look nearly as crisp (and aren’t as flexible). However, this should be a decent introductory primer on mapping and shapefiles and might come in handy when you want to use R to enhance maps with other data and write out (yep, R can read and write OGR) your own shapefiles for use in D3 (or other tools/languages).

Don’t forget that all source code (including TopoJSON/GeoJSON files and sample SVGs) are in their own gists:

If you figure out what is causing some of the errors I mentioned or make some epic maps of your own, don’t hesitate to drop a note in the comments.

I’ve been getting a huge uptick in views of my Slopegraphs in Python post and I think it’s due to @edwardtufte’s recent slopegraph contest announcement.

The original Python code is crufty and a mess, mostly due to the intermittent attention I gave it, wanting to reduce dependencies, and hacking vs programming. I’ve been wanting to do a D3 version for a while, so I went a bit overboard once I learned of Mr Tufte’s challenge and made more of a “workbench” for making slopegraphs:

D3_Slopegraph_Workshop

It’s all in D3/HTML5/JavaScript/CSS and requires no server-side components at all.

You can play with a live, alpha-quality version and check out the rest of the components on GitHub.

It needs work, but it should be a good starting point for folks.

As my track record for “winning” things is scant, if you do end up using the code, passing on word of my upcoming book with @jayjacobs would be #spiffy :-)

It started with a local R version and migrated to a Shiny version and is now in full D3 glory.

Some down time gave me the opportunity to start a basic D3 version of the outage map, but it needs a bit of work as it relies on a page meta refresh to update (every 5 minutes) vs an inline element dynamic refresh. The fam was getting a bit irked at coding time on Thanksgiving, so keep watching the following gists for updates after the holiday: