
Author Archives: hrbrmstr

Don't look at me…I do what he does, just slower. #rstats avuncular • Resistance Fighter • Cook • Christian • [Master] Chef des Données de Sécurité @ @rapid7

I’ve been seeing an uptick in static US “lower 48” maps with “meh” projections this year, possibly caused by a flood of new folks resolving to learn R but using pretty old documentation or tutorials. I’ve also been seeing an uptick in folks needing to geocode US city/state to lat/lon. I thought I’d tackle both in a quick post to show how to (simply) use a decent projection for lower 48 US maps and then how to use a _very_ basic package I wrote – [localgeo](http://github.com/hrbrmstr/localgeo) to avoid having to use an external API/service for basic city/state geocoding.

### Albers All The Way

I could just plot an Albers-projected map, but it’s more fun to add some data. We’ll start by loading some libraries, then read in some recent earthquake data and filter it for our map display:

library(ggplot2)
library(dplyr)
library(readr) # devtools::install_github("hadley/readr")
 
# Earthquakes -------------------------------------------------------------
 
# get quake data ----------------------------------------------------------
quakes <- read_csv("http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.csv")
 
# filter all but lower 48 US ----------------------------------------------
quakes %>%
  filter(latitude>=24.396308, latitude<=49.384358,
         longitude>=-124.848974, longitude<=-66.885444) -> quakes
 
# bin by .5 ---------------------------------------------------------------
# (left-closed bins so a 3.0 lands in the "3" bin; Inf keeps 5+ quakes from becoming NA)
quakes$Magnitude <- as.numeric(as.character(cut(quakes$mag, breaks=c(2.5, 3, 3.5, 4, 4.5, Inf),
    labels=c(2.5, 3, 3.5, 4, 4.5), include.lowest=TRUE, right=FALSE)))
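A note on the binning: `cut()` closes intervals on the right by default (`right=TRUE`), so a quake of exactly magnitude 3.0 falls in the `[2.5,3]` bin and gets the 2.5 label, and anything above a last break of 5 becomes `NA`. Here's a small base R sketch, on made-up magnitudes, that uses left-closed bins with an `Inf` upper break and checks the result against the arithmetic shortcut of rounding down to the nearest half unit:

```r
# Made-up magnitudes, for illustration only
mag <- c(2.5, 2.9, 3.0, 3.7, 4.5, 5.6)

# Left-closed half-unit bins [2.5, 3), [3, 3.5), ..., with Inf catching 4.5 and up
binned_cut <- as.numeric(as.character(
  cut(mag, breaks = c(2.5, 3, 3.5, 4, 4.5, Inf),
      labels = c(2.5, 3, 3.5, 4, 4.5),
      include.lowest = TRUE, right = FALSE)))

# Arithmetic equivalent: round down to the nearest 0.5, capped at the top bin
binned_floor <- pmin(floor(mag * 2) / 2, 4.5)

print(data.frame(mag, binned_cut, binned_floor))
```

Either form works; the `cut()` version keeps the bin edges explicit, which is handy if you later want uneven bins.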

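Many of my mapping posts use quite a few R geo libraries, but this one just needs `ggplot2` (plus the `maps` package, which `map_data` pulls the state outlines from, and `mapproj`, which `coord_map` uses for the projection). We extract the US map data, turn it into something `ggplot` can work with, then plot our quakes on the map: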

us <- map_data("state")
us <- fortify(us, region="region")
 
# theme_map ---------------------------------------------------------------
devtools::source_gist("33baa3a79c5cfef0f6df")
 
# plot --------------------------------------------------------------------
gg <- ggplot()
gg <- gg + geom_map(data=us, map=us,
                    aes(x=long, y=lat, map_id=region, group=group),
                    fill="#ffffff", color="#7f7f7f", size=0.25)
gg <- gg + geom_point(data=quakes,
                      aes(x=longitude, y=latitude, size=Magnitude),
                      color="#cb181d", alpha=1/3)
gg <- gg + coord_map("albers", lat0=39, lat1=45)
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

2.5+ mag quakes in Lower US 48 in past 30 days
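Why Albers? It's an equal-area conic projection, so states keep their relative sizes, and scale is true along its two standard parallels, which suits the wide, mid-latitude lower 48. In `mapproj` (which `coord_map` uses), `albers` takes those two standard parallels, passed above as `lat0=39, lat1=45`. To make the math a bit less magical, here's a minimal sketch of the standard forward formulas on a unit sphere. It's for intuition only and is not how `mapproj` implements the projection; the 39°N/96°W origin is an arbitrary choice for this sketch.

```r
# Forward Albers equal-area conic on a unit sphere (degrees in, unitless x/y out).
# Standard parallels default to the 39 and 45 passed to coord_map() above;
# the origin (lat0, lon0) is an arbitrary assumption for this sketch.
albers <- function(lon, lat, par1 = 39, par2 = 45, lat0 = 39, lon0 = -96) {
  rad  <- pi / 180
  phi1 <- par1 * rad
  phi2 <- par2 * rad
  n    <- (sin(phi1) + sin(phi2)) / 2            # cone constant
  C    <- cos(phi1)^2 + 2 * n * sin(phi1)
  rho0 <- sqrt(C - 2 * n * sin(lat0 * rad)) / n   # radius of the origin's parallel
  rho  <- sqrt(C - 2 * n * sin(lat * rad)) / n    # radius of each point's parallel
  theta <- n * (lon - lon0) * rad                 # angle around the cone's apex
  data.frame(x = rho * sin(theta), y = rho0 - rho * cos(theta))
}

# Scale along a parallel is k = n * rho / cos(phi): exactly 1 on the two
# standard parallels, slightly below 1 between them, above 1 outside them
parallel_scale <- function(lat, par1 = 39, par2 = 45) {
  rad  <- pi / 180
  phi1 <- par1 * rad
  n    <- (sin(phi1) + sin(par2 * rad)) / 2
  C    <- cos(phi1)^2 + 2 * n * sin(phi1)
  rho  <- sqrt(C - 2 * n * sin(lat * rad)) / n
  n * rho / cos(lat * rad)
}

# Origin, then approximate coordinates for Boston and Portland, OR
albers(c(-96, -71.06, -122.68), c(39, 42.36, 45.52))
parallel_scale(c(30, 39, 42, 45, 49))
```

The `parallel_scale()` output is the quickest way to see why picking standard parallels that bracket your area of interest keeps distortion low.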


### Local Geocoding

There are many APIs with corresponding R packages/functions to perform geocoding (one really spiffy recent one is [geocodeHERE](http://cran.r-project.org/web/packages/geocodeHERE/)). While Nokia’s HERE service is less restrictive than Google’s, most of these services are going to impose some kind of limit on the number of calls per second/minute/day. You could always install the [Data Science Toolkit](http://www.datasciencetoolkit.org/) locally (note: its site was down when this post was originally published) and perform the geocoding locally, but it does take some effort (and space/memory) to set up and get going.

If you have relatively clean data and only need city/state resolution, you can use a package I made – [localgeo](http://github.com/hrbrmstr/localgeo) as an alternative. I took a US Gov census shapefile and extracted city, state, lat, lon into a data frame and put a lightweight function shim over it (it’s doing nothing more than `dplyr::left_join`). It won’t handle nuances like “St. Paul, MN” == “Saint Paul, MN” and, for now, it requires you to do the city/state splitting, but I’ll be tweaking it over the year to be a bit more forgiving.
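The core idea is simple enough to sketch in a few lines of base R. Below, `local_geocode()` and `normalize_city()` are hypothetical helpers (they are not localgeo's API), the four-row lookup table stands in for the census-derived data the package bundles, and the coordinates are approximate. The normalization step shows one way to handle the “St. Paul” vs “Saint Paul” problem; unmatched cities come back as `NA`, just as they would from a left join.

```r
# Stand-in lookup table (approximate coordinates, illustration only); the
# real package bundles a much larger table built from census data
lookup <- data.frame(
  city  = c("Boston", "Providence", "Portland", "Saint Paul"),
  state = c("MA", "RI", "OR", "MN"),
  lat   = c(42.36, 41.82, 45.52, 44.95),
  lon   = c(-71.06, -71.41, -122.68, -93.09),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case, trim, and expand a leading "St"/"St."
# to "saint" so "St. Paul" and "Saint Paul" produce the same join key
normalize_city <- function(x) {
  x <- tolower(trimws(x))
  sub("^st\\.?[[:space:]]+", "saint ", x)
}

# Hypothetical sketch (not localgeo's geocode()): left join the query
# city/state pairs onto the lookup table, preserving the input order
local_geocode <- function(city, state, ref = lookup) {
  query <- data.frame(row   = seq_along(city),
                      key   = normalize_city(city),
                      state = toupper(state),
                      city  = city,
                      stringsAsFactors = FALSE)
  ref$key <- normalize_city(ref$city)
  out <- merge(query, ref[, c("key", "state", "lat", "lon")],
               by = c("key", "state"), all.x = TRUE)
  out <- out[order(out$row), c("city", "state", "lat", "lon")]
  rownames(out) <- NULL
  out
}

res <- local_geocode(c("Boston", "St. Paul", "Springfield"), c("MA", "mn", "IL"))
res
```

Keeping the original city string in the output (rather than the normalized key) makes it easy to see which of your own rows failed to match.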

We can give this a go and map the [greenest cities in the US in 2014](http://www.nerdwallet.com/blog/cities/greenest-cities-america/) as crowned by, er, Nerd Wallet. I went for “small data file with city/state in it”, so if you know of a better source I’ll gladly use it instead. Nerd Wallet used Datawrapper, so getting the actual data was easy. Here’s a small example of how to get the file, perform the local geocoding and use an Albers projection for plotting the points. The code below assumes you’re still in the R session that ran the earlier `library` calls (and defined `us` and `theme_map`).

library(httr)
library(localgeo) # devtools::install_github("hrbrmstr/localgeo")
library(tidyr)
 
# greenest cities ---------------------------------------------------------
# via: http://www.nerdwallet.com/blog/cities/greenest-cities-america/
 
url <- "https://gist.githubusercontent.com/hrbrmstr/1078fb798e3ab17556d2/raw/53a9af8c4e0e3137a0a8d4d6332f7a6073d93fb5/greenest.csv"
greenest <- read.table(text=content(GET(url), as="text"), sep=",", header=TRUE, stringsAsFactors=FALSE)
 
greenest %>%
  separate(City, c("city", "state"), sep=", ") %>%
  filter(!state %in% c("AK", "HI")) -> greenest
 
greenest_geo <- geocode(greenest$city, greenest$state)
 
gg <- ggplot()
gg <- gg + geom_map(data=us, map=us,
                    aes(x=long, y=lat, map_id=region, group=group),
                    fill="#ffffff", color="#7f7f7f", size=0.25)
gg <- gg + geom_point(data=greenest_geo,
                      aes(x=lon, y=lat),
                      shape=21, color="#006d2c", fill="#a1d99b", size=4)
gg <- gg + coord_map("albers", lat0=39, lat1=45)
gg <- gg + labs(title="Greenest Cities")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

Nerd Wallet’s Greenest US (Lower 48) Cities 2014
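If you’d rather not pull in `tidyr` just for the city/state split, base R’s `sub()` can do it too. A small sketch on a few illustrative “City, ST” strings (not the Nerd Wallet data), which also drops Alaska and Hawaii like the `filter` call above:

```r
# Illustrative "City, ST" strings (not the Nerd Wallet list)
places <- c("Portland, OR", "San Francisco, CA", "Honolulu, HI", "Anchorage, AK")

# City: everything before the final ", ST"; state: whatever follows the last comma
city  <- sub(",[[:space:]]*[A-Z]{2}$", "", places)
state <- sub("^.*,[[:space:]]*", "", places)

lower48 <- data.frame(city, state, stringsAsFactors = FALSE)
lower48 <- lower48[!lower48$state %in% c("AK", "HI"), ]
lower48
```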


Let me reinforce that the `localgeo` package will most assuredly fail to geocode some city/state combinations. I’m looking for a more comprehensive shapefile to ensure I have the complete list of cities and I’ll be adding some code to help make the lookups more forgiving. It may at least help when you bump into an API limit and need to crank out something in a hurry.

In preparation for using some of our streamgraphs for production (PDF/print) graphics, I ended up having to hand-edit labels into one of the graphics in an Adobe product. This bumped up the priority of adding annotation functions to the streamgraph package (trust me, you really don’t want to hand-edit graphics if you can avoid it). To illustrate them, I’ll use unemployment data that I started gathering for a course I’m teaching in the Fall.

We’ll start with the setup and initial data gathering:

library(dplyr)
library(streamgraph)
library(pbapply)
 
url <- "http://www.bls.gov/lau/ststdsadata.txt"
dat <- readLines(url)

This data is not exactly in a happy format (hit the URL in your browser and you’ll see what I mean). It was definitely made for line printers/human consumption, and I feel bad for any human who has to stare at it. The function I’m using to extract the data isn’t how I’d read in the whole file if that were the only goal; it’s written to teach something else rather than for speed. It’ll do for our purposes here:

get_state_data <- function(state) {
 
  # month alternation "January|February|...|December" (paste0 has no sep argument)
  section <- paste("^%s|    (", paste(month.name, collapse="|"), ")\ +[[:digit:]]{4}", sep="", collapse="")
  section <- sprintf(section, state)
  vals <- gsub("^\ +|\ +$", "", grep(section, dat, value=TRUE))
 
  state_vals <- gsub("^.* \\.+", "", vals[seq(from=2, to=length(vals), by=2)])
 
  cols <- read.table(text=state_vals)
  cols$month <- as.Date(sprintf("01 %s", vals[seq(from=1, to=length(vals), by=2)]),
                        format="%d %B %Y")
  cols$state <- state
 
  cols %>%
    select(8:9, 1:7) %>%   # month, state, then V1:V7
    mutate(V1=as.numeric(gsub(",", "", V1)),
           V2=as.numeric(gsub(",", "", V2)),
           V4=as.numeric(gsub(",", "", V4)),
           V6=as.numeric(gsub(",", "", V6)),
           V3=V3/100,
           V5=V5/100,
           V7=V7/100) %>%
    rename(civ_pop=V1,
           labor_force=V2, labor_force_pct=V3,
           employed=V4, employed_pct=V5,
           unemployed=V6, unemployed_pct=V7)
 
}
 
state_unemployment <- bind_rows(pblapply(state.name, get_state_data))
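To see what the regular expressions in `get_state_data` are doing without downloading the file, here’s a stripped-down base R version run on a few mock lines. The mock lines only loosely imitate a month header followed by dotted state rows; they are made up and are not the exact layout of the BLS file. One deliberate change from the function above: it builds dates by matching against R’s built-in `month.name` instead of parsing with `%B`, since `%B` depends on the locale.

```r
# Mock report lines (made up): a month header, then a dotted row per state
raw <- c("    January 1976",
         "Ohio ........ 7,839,000 4,823,000 61.5",
         "Texas ....... 8,000,000 5,000,000 62.5",
         "    February 1976",
         "Ohio ........ 7,850,000 4,830,000 61.5")

state <- "Ohio"

# "^Ohio|    (January|February|...|December) +[[:digit:]]{4}"
section <- sprintf("^%s|    (%s) +[[:digit:]]{4}", state,
                   paste(month.name, collapse = "|"))

# Keep only month headers and this state's rows, trimming outer spaces
vals    <- trimws(grep(section, raw, value = TRUE))
headers <- vals[seq(from = 1, to = length(vals), by = 2)]
rows    <- vals[seq(from = 2, to = length(vals), by = 2)]

# Drop the "Ohio ........" label, then let read.table split the numbers
cols <- read.table(text = sub("^.* \\.+", "", rows), stringsAsFactors = FALSE)
cols$V1 <- as.numeric(gsub(",", "", cols$V1))  # "7,839,000" -> 7839000
cols$V2 <- as.numeric(gsub(",", "", cols$V2))
cols$V3 <- cols$V3 / 100                       # percent -> proportion

# Locale-independent month parsing via the built-in English month.name
month_num  <- match(sub(" .*$", "", headers), month.name)
year       <- sub("^.* ", "", headers)
cols$month <- as.Date(sprintf("%s-%02d-01", year, month_num))
cols$state <- state
cols
```

Because the regex keeps only the month headers and one state’s rows, the surviving lines alternate header/row, which is what makes the odd/even `seq()` split work.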

This will give us a data frame of employment/unemployment figures for all the US states. I only wanted to focus on New England and a few others for the course example, so this bit pulls just those out:

state_unemployment %>%
  filter(state %in% c("California", "Ohio", "Rhode Island", "Maine",
                      "Massachusetts", "Connecticut", "Vermont",
                      "New Hampshire", "Nebraska")) -> some

With that setup out of the way, let me introduce the two new functions: `sg_add_marker` and `sg_annotate`. `sg_add_marker` adds a vertical, dotted line that spans the height of the graph and is placed at the designated spot on the x axis. You can add an optional label for the marker by specifying the y position, label text, color, size, space away from the line and its alignment via `anchor`: `start` (left-aligned), `middle` (centered) or `end` (right-aligned). This is primarily useful for placing the label on either side of the line.

`sg_annotate` is for adding text anywhere on the streamgraph. The original use for it was to label streams, but you can use it any way you think would add meaning to your streamgraph. You can see them both in action below, where I plot the streamgraph for unemployment (%) for the selected states, then label the start of each recession since 1980 (with the peak national unemployment rate) with a marker and also label each stream:

streamgraph(some, "state", "unemployed_pct", "month") %>%
  sg_axis_x(tick_interval=10, tick_units = "year", tick_format="%Y") %>%
  sg_axis_y(0) %>%
  sg_add_marker(x=as.Date("1981-07-01"), "1981 (10.8%)", anchor="end") %>%
  sg_add_marker(x=as.Date("1990-07-01"), "1990 (7.8%)", anchor="start") %>%
  sg_add_marker(x=as.Date("2001-03-01"), "2001 (6.3%)", anchor="end") %>%
  sg_add_marker(x=as.Date("2007-12-01"), "2007 (10.1%)", anchor="end") %>%
  sg_annotate(label="Vermont", x=as.Date("1978-04-01"), y=0.6, color="#ffffff") %>%
  sg_annotate(label="Maine", x=as.Date("1978-03-01"), y=0.30, color="#ffffff") %>%
  sg_annotate(label="Nebraska", x=as.Date("1977-06-01"), y=0.41, color="#ffffff") %>%
  sg_annotate(label="Massachusetts", x=as.Date("1977-06-01"), y=0.36, color="#ffffff") %>%
  sg_annotate(label="New Hampshire", x=as.Date("1978-03-01"), y=0.435, color="#ffffff") %>%
  sg_annotate(label="California", x=as.Date("1978-02-01"), y=0.175, color="#ffffff") %>%
  sg_annotate(label="Rhode Island", x=as.Date("1977-11-01"), y=0.55, color="#ffffff") %>%
  sg_annotate(label="Ohio", x=as.Date("1978-06-01"), y=0.485, color="#ffffff") %>%
  sg_annotate(label="Connecticut", x=as.Date("1978-01-01"), y=0.235, color="#ffffff") %>%
  sg_fill_tableau() %>%
  sg_legend(show=TRUE)

Selected State Unemployment Figures Since 1976

I probably could have positioned the annotations a bit better, but this should be good enough to get the general idea across. I may add an option to place the vertical marker lines behind the streams, and I’ll be adding some toggle options to the interactive version (to hide/show markers and/or annotations).

As usual, the package is up [on github](https://github.com/hrbrmstr/streamgraph) and a contiguous copy of the above snippets is in [this gist](https://gist.github.com/hrbrmstr/4e181ae045807ca3a858).

Three final notes. First, I suggest enabling the y axis when you’re trying to figure out the y position for a label (the y axis range is calculated from the summed span of the data). Second, the x axis works with both dates and continuous values, but you need to match what you set up the streamgraph with. Finally, just a tip: I’ve found [SVG Crowbar 2](http://nytimes.github.io/svg-crowbar/) to be super-helpful when I need to extract these streamgraphs for non-interactive reproduction. Just yank the SVG out with it and hand it (or a converted form of it) to whoever is handling final production, and they should be able to work with it.
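Since the stream values at each x position stack on top of each other, you can get a rough sense of the y span before positioning labels by totalling the values at each x. A base R sketch with made-up numbers (the y axis itself is still the better guide, since offsets other than `zero` shift the baseline):

```r
# Made-up long-format data: three streams observed on three dates
dat <- data.frame(
  month = rep(as.Date(c("2000-01-01", "2000-02-01", "2000-03-01")), each = 3),
  state = rep(c("A", "B", "C"), times = 3),
  value = c(0.05, 0.04, 0.06,
            0.07, 0.05, 0.06,
            0.04, 0.03, 0.05)
)

# Total height of the stack at each x value
stack_height <- aggregate(value ~ month, data = dat, FUN = sum)
stack_height

# The widest point of the stack: a rough upper bound for the y extent
max(stack_height$value)
```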

I noticed that the @rOpenSci folks had an interface to [ip-api.com](http://ip-api.com/) on their [ToDo](https://github.com/ropensci/webservices/wiki/ToDo) list so I whipped up a small R package to fill said gap.

Their IP Geolocation API will take an IPv4, IPv6 or FQDN and kick back an ASN, lat/lon, address and more. The [ipapi package](https://github.com/hrbrmstr/ipapi) exposes one function, `geolocate`, which takes in a character vector of any mixture of IPv4/6 addresses and domains and returns a `data.table` of results. Since `ip-api.com` has a restriction of 250 requests per minute, the package also tries to help ensure you don’t get your own IP address banned (there’s a form on their site you can fill in to get it unbanned if you do happen to hit the limit). Overall, there’s nothing fancy in the package, but it gets the job done.
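If you ever need to pace requests to a rate-limited service yourself, the general approach is to space out the calls. Here’s a minimal base R sketch; `throttled_lapply()` is hypothetical and not part of ipapi, and the 250-per-minute default just mirrors ip-api.com’s limit.

```r
# throttled_lapply() is a hypothetical helper, not part of ipapi. It waits
# 60 / per_minute seconds between calls, which keeps the call rate at or
# below roughly per_minute requests per minute (slower if f() itself is slow).
throttled_lapply <- function(x, f, per_minute = 250) {
  delay <- 60 / per_minute          # 0.24 seconds at the default 250/min
  out <- vector("list", length(x))
  for (i in seq_along(x)) {
    if (i > 1) Sys.sleep(delay)     # no wait before the first call
    out[[i]] <- f(x[[i]])
  }
  out
}

# Usage with a stand-in function instead of a real network lookup;
# per_minute is set high here only so the example finishes quickly
res <- throttled_lapply(c("a.example", "b.example"), toupper, per_minute = 6000)
res
```

A fixed delay is the simplest scheme; it wastes a little time compared to tracking a sliding window of timestamps, but it can’t burst past the limit.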

I notified the rOpenSci folks about it, so hopefully it’ll be one less thing on that particular to-do list.

You can see it in action in combination with the super-spiffy [leaflet](http://www.htmlwidgets.org/showcase_leaflet.html) htmlwidget:

library(leaflet)
library(ipapi)
library(maps)
 
# get top 500 domains
sites <- read.csv("http://moz.com/top500/domains/csv", stringsAsFactors=FALSE)
 
# make reproducible
set.seed(1492)
 
# pick out a random 50
sites <- sample(sites$URL, 50) 
sites <- gsub("/", "", sites)
locations <- geolocate(sites)
 
# take a quick look
dplyr::glimpse(locations)
 
## Observations: 50
## Variables:
## $ as          (fctr) AS2635 Automattic, Inc, AS15169 Google Inc., AS3561...
## $ city        (fctr) San Francisco, Mountain View, Chesterfield, Mountai...
## $ country     (fctr) United States, United States, United States, United...
## $ countryCode (fctr) US, US, US, US, US, US, JP, US, US, IT, US, US, US,...
## $ isp         (fctr) Automattic, Google, Savvis, Google, Level 3 Communi...
## $ lat         (dbl) 37.7484, 37.4192, 38.6631, 37.4192, 38.0000, 33.7516...
## $ lon         (dbl) -122.4156, -122.0574, -90.5771, -122.0574, -97.0000,...
## $ org         (fctr) Automattic, Google, Savvis, Google, AddThis, Peer 1...
## $ query       (fctr) 192.0.80.242, 74.125.227.239, 206.132.6.134, 74.125...
## $ region      (fctr) CA, CA, MO, CA, , GA, 13, MA, TX, , MA, TX, CA, , ,...
## $ regionName  (fctr) California, California, Missouri, California, , Geo...
## $ status      (fctr) success, success, success, success, success, succes...
## $ timezone    (fctr) America/Los_Angeles, America/Los_Angeles, America/C...
## $ zip         (fctr) 94110, 94043, 63017, 94043, , 30303, , 02142, 78218...
 
# all i want is the world!
world <- map("world", fill = TRUE, plot = FALSE) 
 
# kick out a widget
leaflet(data=world) %>% 
  addTiles() %>% 
  addCircleMarkers(locations$lon, locations$lat, 
                   color = '#ff0000', popup=sites)

50 Random Top Sites

A post on [StackOverflow](http://stackoverflow.com/questions/28725604/streamgraphs-dataviz-in-r-wont-plot) asked about using a continuous variable for the x-axis (vs dates) in my [streamgraph package](http://github.com/hrbrmstr/streamgraph). While I provided a workaround for the question, it helped me bump up the priority for adding support for continuous x axis scales. With the [DBIR](http://www.verizonenterprise.com/DBIR/) halfway behind me now, I kicked out a new rev of the package/widget that has support for continuous scales.

Using the data from the SO post, you can see there’s not much difference in how you use continuous vs date scales:

library(streamgraph)
 
dat <- read.table(text="week variable value
40     rev1  372.096
40     rev2  506.880
40     rev3 1411.200
40     rev4  198.528
40     rev5   60.800
43     rev1  342.912
43     rev2  501.120
43     rev3  132.352
43     rev4  267.712
43     rev5   82.368
44     rev1  357.504
44     rev2  466.560", header=TRUE)
 
dat %>% 
  streamgraph("variable","value","week", scale="continuous") %>% 
  sg_axis_x(tick_format="d")

Product Revenue

I’ll be adding support for using a categorical variable on the x axis soon. Once that’s done, it’ll be time to do the CRAN dance.

We were looking for a different type of visualization for a project at work this past week and my thoughts immediately gravitated towards [streamgraphs](http://www.leebyron.com/else/streamgraph/). The TLDR on streamgraphs is they are generalized versions of stacked area graphs with free baselines across the x axis. They are somewhat [controversial](http://www.visualisingdata.com/index.php/2010/08/making-sense-of-streamgraphs/) but have a “draw you in” aesthetic appeal (which is what we needed for our visualization).

You can make streamgraphs/stacked area charts pretty easily in D3, and since we needed to try many different sets of data in the streamgraph style, it made sense to make this an R [htmlwidget](http://www.htmlwidgets.org/). Thus, the [streamgraph package](https://github.com/hrbrmstr/streamgraph) was born.

### Making a streamgraph

The package isn’t in CRAN yet, so you have to do the `devtools` dance:

devtools::install_github("hrbrmstr/streamgraph")

Streamgraphs require a continuous variable for the x axis, and the `streamgraph` widget/package works with years or dates (support for `xts` objects and `POSIXct` types coming soon). Since they display categorical values in the area regions, the data in R needs to be in [long format](http://blog.rstudio.org/2014/07/22/introducing-tidyr/) which is easy to do with `dplyr` & `tidyr`.

The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to `expand.grid` to ensure all categories are represented at every observation (not doing so makes `d3.stack` unhappy).
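Here’s a minimal base R sketch of that completion step, assuming a long data frame with one row per year/category. It illustrates the idea only and is not the package’s actual code: build every year/category combination, left-join the data back onto it, and fill the gaps with zero.

```r
# Made-up counts: "Comedy" has no row for 1999
long <- data.frame(year  = c(1998, 1998, 1999),
                   genre = c("Drama", "Comedy", "Drama"),
                   n     = c(5, 3, 4),
                   stringsAsFactors = FALSE)

# Every year x genre combination...
all_combos <- expand.grid(year  = sort(unique(long$year)),
                          genre = sort(unique(long$genre)),
                          stringsAsFactors = FALSE)

# ...left-joined back to the data, with the missing combinations set to zero
complete <- merge(all_combos, long, by = c("year", "genre"), all.x = TRUE)
complete$n[is.na(complete$n)] <- 0
complete <- complete[order(complete$year, complete$genre), ]
complete
```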

Let’s start by making a `streamgraph` of the number of movies made per year by genre using the `ggplot2` `movies` dataset:

library(streamgraph)
library(dplyr)
 
ggplot2movies::movies %>%   # movies no longer ships with ggplot2; it lives in ggplot2movies
  select(year, Action, Animation, Comedy, Drama, Documentary, Romance, Short) %>%
  tidyr::gather(genre, value, -year) %>%
  group_by(year, genre) %>%
  tally(wt=value) %>%
  streamgraph("genre", "n", "year") %>%
  sg_axis_x(20) %>%
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Genres: ")

Movie count by genre by year

We can also mimic an example from the [Name Voyager](http://www.bewitched.com/namevoyager.html) project (using the `babynames` R package) but change some of the aesthetics, just to give an idea of how some of the options work:

library(dplyr)
library(babynames)
library(streamgraph)
 
babynames %>%
  filter(grepl("^(Alex|Bob|Jay|David|Mike|Jason|Stephen|Kymberlee|Lane|Sophie|John|Andrew|Thibault|Russell)$", name)) %>%
  group_by(year, name) %>%
  tally(wt=n) %>%
  streamgraph("name", "n", "year", offset="zero", interpolate="linear") %>%
  sg_legend(show=TRUE, label="DDSec names: ")

Data-Driven Security Podcast guest+host names by year

There are more examples over at [RPubs](http://rpubs.com/hrbrmstr/streamgraph04) and [github](http://hrbrmstr.github.io/streamgraph/), but I’ll close with a streamgraph of housing data originally made by [Alex Bresler](http://asbcllc.com/blog/2015/february/cre_stream_graph_test/):

dat <- read.csv("http://asbcllc.com/blog/2015/february/cre_stream_graph_test/data/cre_transaction-data.csv")
 
dat %>%
  streamgraph("asset_class", "volume_billions", "year") %>%
  sg_axis_x(1, "year", "%Y") %>%
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Assets: ")

Commercial Real Estate Transaction Volume by Asset Class Since 2006

While the radical volume change would have been noticeable in almost any graph style, it’s especially noticeable with the streamgraph version as your eyes tend to naturally follow the curves of the flow.

### Fin

While I wouldn’t have these replace my trusty ggplot2 faceted bar charts for regular EDA and reporting, streamgraphs can add a bit of color and flair, and may be an especially good choice when you need to view many categorical variables over time.

As usual, issues/feature requests on [github](http://github.com/hrbrmstr/streamgraph) and showcase/general feedback in the comments.

I felt compelled to dust off my 2013 [Valentine’s Day](http://rud.is/b/2013/02/14/happy-valentines-day-mrshrbrmstr/) `#rstats` post and make it all Shiny and new again. I used the same math from that post, but made the polygon a bit sharper and used ggplot2 for the plotting.

To kick it up a bit, I decided to pay homage to a local company (Necco) who makes the venerable [Sweethearts candies](http://www.necco.com/candy/sweethearts.aspx) that are popular this time of year (at least in the U.S.). The heart color is randomized to take on one of the signature pastels of the candy and I took the various sayings that have been on their hearts over the years (except 2006, which was a strange year) and also randomized them. The plot also displays the year of the saying.

I wrapped it all up in a Shiny bow, so all you have to do to get a new heart/saying is refresh the page!

To play with it, you can:

– refresh this post (since the `iframe` below is pointing to the shiny app)
– refresh the target [shinyapps.io app](https://hrbrmstr.shinyapps.io/SweetheaRstats/)
– run it locally via `shiny::runGist("cf34f7230c88bd99b153")`

NOTE: You’ll probably want/need to run it locally. I only have a free ShinyApps account and it’ll probably run out of free CPU time depending on when you’re reading this post.

The surprisingly small bit of code is in [this gist](https://gist.github.com/hrbrmstr/cf34f7230c88bd99b153).

Now there’s even one more way to have a data-driven romance!

Oh, and Happy Valentine’s Day @mrshrbrmstr!!

I received an out-of-band question on the use of `%<>%` in my [CDC FluView](http://rud.is/b/2015/01/10/new-r-package-cdcfluview-retrieve-flu-data-from-cdcs-fluview-portal/) post, and took the opportunity to address it in a broader, public fashion.

Anyone using R knows that the two most common methods of assignment are the venerable (and sensible) left arrow `<-` and its lesser cousin `=`. `<-` has an evil sibling, `<<-`, which is used when you want/need R to search through parent environments for an existing definition of the variable being assigned (up to the global environment). Ever since the introduction of the "piping idiom" (`%>%`) made popular by `magrittr`, `dplyr`, `ggvis` and other packages, I have struggled with the use of `<-` in pipes. Since pipes flow data forward, that LHS (left-hand side) assignment has an awkward feel to it. Furthermore, many times you are piping from an object with the intent to replace the contents of said object. For example:

iris$Sepal.Length <- 
  iris$Sepal.Length %>%
  sqrt

(which is from the `magrittr` documentation).

To avoid the repetition of the left-hand side immediately after the assignment operator, Bache & Wickham came up with the `%<>%` operator, which shortens the above to:

iris$Sepal.Length %<>% sqrt

Try as I might (including in the CDC FluView blog post), that way of assigning variables still _feels_ awkward, and it is definitely confusing to new R users. But what’s the alternative? I believe it’s R’s infrequently used `->` RHS (right-hand side) assignment operator.
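For anyone who hasn’t seen them side by side, here’s a base R sketch (no `magrittr` needed) of right-hand assignment and of the parent-environment search that `<<-` performs; `make_counter()` is just an illustrative helper.

```r
# Left and right assignment bind the same value; only the direction differs
roots_lhs <- sqrt(c(4, 9, 16))
sqrt(c(4, 9, 16)) -> roots_rhs

# `<<-` searches enclosing environments for an existing `count` and
# updates it there instead of creating a new local variable
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

tick <- make_counter()
tick()
tick() -> n_ticks   # second call, so n_ticks is 2
```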

Let’s look at that in the context of the somewhat-long pipe in the CDC FluView example:

dat %>%
  mutate(REGION=factor(REGION,
                       levels=unique(REGION),
                       labels=c("Boston", "New York",
                                "Philadelphia", "Atlanta",
                                "Chicago", "Dallas",
                                "Kansas City", "Denver",
                                "San Francisco", "Seattle"),
                       ordered=TRUE)) %>%
  mutate(season_week=ifelse(WEEK>=40, WEEK-40, WEEK),
         season=ifelse(WEEK<40,
                       sprintf("%d-%d", YEAR-1, YEAR),
                       sprintf("%d-%d", YEAR, YEAR+1))) -> dat

That pipe flow says _“take `dat`, change up some columns, make some new columns and reassign into `dat`”_. It’s a very natural flow and reads well, too, since you’re following a process up to its final destination. It’s even more natural in pipes that actually transform the data into something else. For example, to get a vector of the number of US male births since 1880, we’d do:

library(magrittr)
library(rvest)
 
births <- read_html("http://www.ssa.gov/oact/babynames/numberUSbirths.html")  # html() in older rvest versions
 
births %>%
  html_nodes("table") %>%
  extract2(2) %>%
  html_table %>%
  use_series(Male) %>%
  gsub(",", "", .) %>%
  as.numeric -> males

That’s very readable (one of the benefits of pipes) and the flow, again, makes sense. Compare that to its base R counterpart:

males <- as.numeric(gsub(",", "", html_table(html_nodes(births, "table")[[2]])$Male))

The base R version is short, and the LHS assignment fits well as the values “pop out” of the function calls. But it’s also only quickly readable at first glance to veteran R folks. Since code needs to be readable, maintainable and (often) shared with folks on a team, I believe pipes help increase overall productivity and aid in documenting what a given portion of an analysis is trying to achieve (especially when combined with `dplyr` idioms).

Pipes are here to stay and they are definitely a part of my data analysis workflows. Moving forward, so will RHS (`->`) assignments from pipes.

I’ve updated my [metricsgraphics](https://github.com/hrbrmstr/metricsgraphics) package to version [0.7](https://github.com/hrbrmstr/metricsgraphics/releases/tag/v0.7). The core [MetricsGraphics](http://metricsgraphicsjs.org) JavaScript library has been updated to version 2.1.0 (from 1.1.0). Two blog-worthy features added since version 0.5 are `mjs_grid` (a `grid.arrange`-like equivalent for `metricsgraphics` plots) and `mjs_add_mouseover`, which lets you add your own custom rollover text to the plots.

### The Grid

The `grid.arrange` (and `arrangeGrob`) functions from the `gridExtra` package come in handy when combining `ggplot2` charts. I wanted a similar way to arrange independent or linked `metricsgraphics` charts, hence `mjs_grid` was born.

`mjs_grid` uses the tag functions in `htmltools` to arrange `metricsgraphics` plot objects into an HTML `<table>` structure. At present, only uniform tables are supported, but I’m working on making the grid feature more flexible (just like `grid.arrange`). The current functionality is pretty straightforward:

– You build individual `metricsgraphics` plots;
– Optionally combine them in a `list`;
– Pass the plots/lists into `mjs_grid`;
– Tell `mjs_grid` how many rows & columns are in the grid; and
– Specify the column widths

But, code > words, so here are some examples. To avoid code repetition, note that you’ll need the following packages available to run most of the snippets below:

library(metricsgraphics)
library(htmlwidgets)
library(htmltools)
library(dplyr)

First, we’ll combine a few example plots:

tmp <- data.frame(year=seq(1790, 1970, 10), uspop=as.numeric(uspop))
tmp %>%
  mjs_plot(x=year, y=uspop, width=300, height=300) %>%
  mjs_line() %>%
  mjs_add_marker(1850, "Something Wonderful") %>%
  mjs_add_baseline(150, "Something Awful") -> mjs1
 
mjs_plot(rnorm(10000), width=300, height=300) %>%
  mjs_histogram(bins=30, bar_margin=1) -> mjs2
 
movies <- ggplot2movies::movies[sample(nrow(ggplot2movies::movies), 1000), ]  # formerly ggplot2::movies
mjs_plot(movies$rating, width=300, height=300) %>% mjs_histogram() -> mjs3
 
tmp %>%
  mjs_plot(x=year, y=uspop, width=300, height=300) %>%
  mjs_line(area=TRUE) -> mjs4
 
mjs_grid(mjs1, mjs2, mjs3, mjs4, ncol=2, nrow=2)

Since you can pass a `list` as a parameter, you can generate many (similar) plots and then grid-display them without too much code. This one generates 7 random histograms with linked rollovers and displays them in a grid. Note that this example has `mjs_grid` using the same algorithm `grid.arrange` does to auto-compute an “optimal” grid size.

lapply(1:7, function(x) {
  mjs_plot(rnorm(10000, mean=x/2, sd=x), width=250, height=250, linked=TRUE) %>%
    mjs_histogram(bar_margin=2) %>%
    mjs_labs(x_label=sprintf("Plot %d", x))
}) -> plots
 
mjs_grid(plots)
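For the curious: to my understanding, `gridExtra` picks that default shape with `grDevices::n2mfrow()` when neither `nrow` nor `ncol` is given. Here’s a minimal sketch of computing grid dimensions that way and laying plot indices out row by row; `grid_layout()` is a hypothetical helper, not part of `metricsgraphics`.

```r
# grid_layout() is hypothetical: choose rows/cols with grDevices::n2mfrow()
# and fill the cells with plot indices row by row; leftover cells are NA
grid_layout <- function(n) {
  dims <- grDevices::n2mfrow(n)    # c(rows, cols)
  nr <- dims[1]
  nc <- dims[2]
  matrix(c(seq_len(n), rep(NA, nr * nc - n)),
         nrow = nr, ncol = nc, byrow = TRUE)
}

grid_layout(7)   # 7 plots -> a 3 x 3 grid with two empty cells
grid_layout(4)   # 4 plots -> 2 x 2
```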

And, you can use `do` from `dplyr` to get `ggplot2` `facet_`-like behavior (though, one could argue that interactive graphics should use controls/selectors vs facets). This example uses the `tips` dataset from `reshape2` and creates a list of plots that are then passed to `mjs_grid`:

tips <- reshape2::tips
a <- tips %>%
  mutate(percent=tip/total_bill,
         day=factor(day, levels=c("Thur", "Fri", "Sat", "Sun"), ordered=TRUE)) %>%
  group_by(day) %>%
  do( plot={ day_label <- unique(.$day)
             mjs_plot(., x=total_bill, y=percent, width=275, height=275, left=100) %>%
               mjs_point(color_accessor=sex, color_type="category") %>%
               mjs_labs(x_label=sprintf("Total Bill (%s)", day_label), y_label="Tip %") })
 
mjs_grid(a$plot, ncol=2, nrow=2, widths=c(0.5, 0.5))

### Rollovers

I’ve had a few requests to support custom rollovers, and this is a first stab at exposing MetricsGraphics’ native functionality to users of the `metricsgraphics` package. The API changed from MG 1.1.0 to 2.1.0, so I’m _kinda_ glad I waited for this. It requires knowledge of JavaScript and D3, and the use of `{{ID}}` as part of the CSS node selector when targeting the MetricsGraphics SVG element that displays the rollover text. Here is a crude but illustrative example of how to take advantage of this feature (mouse over the graphic to see the altered text):

set.seed(1492)
dat <- data.frame(date=seq(as.Date("2014-01-01"),
                           as.Date("2014-01-31"),
                           by="1 day"),
                  value=rnorm(n=31, mean=0, sd=2))
 
dat %>%
  mjs_plot(x=date, y=value, width=500, height=300) %>%
  mjs_line() %>%
  mjs_axis_x(xax_format = "date") %>%
  mjs_add_mouseover("function(d, i) {
                $('{{ID}} svg .mg-active-datapoint')
                    .text('custom text : ' + d.date + ' ' + i);
                 }")
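The point of `{{ID}}` is scoping: with several charts on one page (as in the grids above), a bare `svg .mg-active-datapoint` selector would hit every chart at once. The sketch below only illustrates the general idea and assumes that the placeholder gets swapped for the chart’s own element id before the JavaScript runs; the id is made up, and this is not the package’s actual code.

```r
# Hypothetical element id for one chart on the page
widget_id <- "mjs-1a2b3c"

js <- "function(d, i) {
  $('{{ID}} svg .mg-active-datapoint')
      .text('custom text : ' + d.date + ' ' + i);
}"

# Literal (fixed = TRUE) replacement, so the braces aren't treated as regex
scoped_js <- gsub("{{ID}}", paste0("#", widget_id), js, fixed = TRUE)
cat(scoped_js)
```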

### Postremo

If you are using `metricsgraphics`, drop a link in the comments here to show others how you’re using it! If you need/want some functionality (I’m hoping to get `xts` support into the 0.8 release) that isn’t already in existing feature requests or something’s broken for you, post a new [issue on github](https://github.com/hrbrmstr/metricsgraphics/issues).