Skip navigation

Category Archives: DataViz

I noticed that the @rOpenSci folks had an interface to [ip-api.com](http://ip-api.com/) on their [ToDo](https://github.com/ropensci/webservices/wiki/ToDo) list so I whipped up a small R package to fill said gap.

Their IP Geolocation API will take an IPv4, IPv6 or FQDN and kick back a ASN, lat/lon, address and more. The [ipapi package](https://github.com/hrbrmstr/ipapi) exposes one function – `geolocate` which takes in a character vector of any mixture of IPv4/6 and domains and returns a `data.table` of results. Since `ip-api.com` has a restriction of 250 requests-per-minute, the package also tries to help ensure you don’t get your own IP address banned (there’s a form on their site you can fill in to get it unbanned if you do happen to hit the limit). Overall, there’s nothing fancy in the package, but it gets the job done.

I notified the rOpenSci folks about it, so hopefully it’ll be one less thing on that particular to-do list.

You can see it in action in combination with the super-spiffy [leaflet](http://www.htmlwidgets.org/showcase_leaflet.html) htmlwidget:

library(leaflet)
library(ipapi)
library(maps)
 
# get top 500 domains
sites <- read.csv("http://moz.com/top500/domains/csv", stringsAsFactors=FALSE)
 
# make reproducible
set.seed(1492)
 
# pick out a random 50
sites <- sample(sites$URL, 50) 
sites <- gsub("/", "", sites)
locations <- geolocate(sites)
 
# take a quick look
dplyr::glimpse(locations)
 
## Observations: 50
## Variables:
## $ as          (fctr) AS2635 Automattic, Inc, AS15169 Google Inc., AS3561...
## $ city        (fctr) San Francisco, Mountain View, Chesterfield, Mountai...
## $ country     (fctr) United States, United States, United States, United...
## $ countryCode (fctr) US, US, US, US, US, US, JP, US, US, IT, US, US, US,...
## $ isp         (fctr) Automattic, Google, Savvis, Google, Level 3 Communi...
## $ lat         (dbl) 37.7484, 37.4192, 38.6631, 37.4192, 38.0000, 33.7516...
## $ lon         (dbl) -122.4156, -122.0574, -90.5771, -122.0574, -97.0000,...
## $ org         (fctr) Automattic, Google, Savvis, Google, AddThis, Peer 1...
## $ query       (fctr) 192.0.80.242, 74.125.227.239, 206.132.6.134, 74.125...
## $ region      (fctr) CA, CA, MO, CA, , GA, 13, MA, TX, , MA, TX, CA, , ,...
## $ regionName  (fctr) California, California, Missouri, California, , Geo...
## $ status      (fctr) success, success, success, success, success, succes...
## $ timezone    (fctr) America/Los_Angeles, America/Los_Angeles, America/C...
## $ zip         (fctr) 94110, 94043, 63017, 94043, , 30303, , 02142, 78218...
 
# all i want is the world!
world <- map("world", fill = TRUE, plot = FALSE) 
 
# kick out a a widget
leaflet(data=world) %>% 
  addTiles() %>% 
  addCircleMarkers(locations$lon, locations$lat, 
                   color = '#ff0000', popup=sites)

50 Random Top Sites

A post on [StackOverflow](http://stackoverflow.com/questions/28725604/streamgraphs-dataviz-in-r-wont-plot) asked about using a continuous variable for the x-axis (vs dates) in my [streamgraph package](http://github.com/hrbrmstr/streamgraph). While I provided a workaround for the question, it helped me bump up the priority for adding support for continuous x axis scales. With the [DBIR](http://www.verizonenterprise.com/DBIR/) halfway behind me now, I kicked out a new rev of the package/widget that has support for continuous scales.

Using the data from the SO post, you can see there’s not much difference in how you use continuous vs date scales:

library(streamgraph)
 
dat <- read.table(text="week variable value
40     rev1  372.096
40     rev2  506.880
40     rev3 1411.200
40     rev4  198.528
40     rev5   60.800
43     rev1  342.912
43     rev2  501.120
43     rev3  132.352
43     rev4  267.712
43     rev5   82.368
44     rev1  357.504
44     rev2  466.560", header=TRUE)
 
dat %>% 
  streamgraph("variable","value","week", scale="continuous") %>% 
  sg_axis_x(tick_format="d")

Product Revenue

I’ll be adding support for using a categorical variable on the x axis soon. Once that’s done, it’ll be time to do the CRAN dance.

We were looking for a different type of visualization for a project at work this past week and my thoughts immediately gravitated towards [streamgraphs](http://www.leebyron.com/else/streamgraph/). The TLDR on streamgraphs is they they are generalized versions of stacked area graphs with free baselines across the x axis. They are somewhat [controversial](http://www.visualisingdata.com/index.php/2010/08/making-sense-of-streamgraphs/) but have a “draw you in” aesthetic appeal (which is what we needed for our visualization).

You can make streamgraphs/stacked area charts pretty easily in D3, and since we needed to try many different sets of data in the streamgraph style, it made sense to make this an R [htmlwidget](http://www.htmlwidgets.org/). Thus, the [streamgraph package](https://github.com/hrbrmstr/streamgraph) was born.

### Making a streamgraph

The package isn’t in CRAN yet, so you have to do the `devtools` dance:

devtools::install_github("hrbrmstr/streamgraph")

Streamgraphs require a continuous variable for the x axis, and the `streamgraph` widget/package works with years or dates (support for `xts` objects and `POSIXct` types coming soon). Since they display categorical values in the area regions, the data in R needs to be in [long format](http://blog.rstudio.org/2014/07/22/introducing-tidyr/) which is easy to do with `dplyr` & `tidyr`.

The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to `expand.grid` to ensure all categories are represented at every observation (not doing so makes `d3.stack` unhappy).

Let’s start by making a `streamgraph` of the number of movies made per year by genre using the `ggplot2` `movies` dataset:

library(streamgraph)
library(dplyr)
 
ggplot2::movies %>%
  select(year, Action, Animation, Comedy, Drama, Documentary, Romance, Short) %>%
  tidyr::gather(genre, value, -year) %>%
  group_by(year, genre) %>%
  tally(wt=value) %>%
  streamgraph("genre", "n", "year") %>%
  sg_axis_x(20) %>%
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Genres: ")

Movie count by genre by year

We can also mimic an example from the [Name Voyager](http://www.bewitched.com/namevoyager.html) project (using the `babynames` R package) but change some of the aesthetics, just to give an idea of how some of the options work:

library(dplyr)
library(babynames)
library(streamgraph)
 
babynames %>%
 filter(grepl("^(Alex|Bob|Jay|David|Mike|Jason|Stephen|Kymberlee|Lane|Sophie|John|Andrew|Thibault|Russell)$", name)) %>%
  group_by(year, name) %>%
  tally(wt=n) %>%
  streamgraph("name", "n", "year", offset="zero", interpolate="linear") %>%
  sg_legend(show=TRUE, label="DDSec names: ")

Data-Driven Security Podcast guest+host names by year

There are more examples over at [RPubs](http://rpubs.com/hrbrmstr/streamgraph04) and [github](http://hrbrmstr.github.io/streamgraph/), but I’ll close with a streamgraph of housing data originally made by [Alex Bresler](http://asbcllc.com/blog/2015/february/cre_stream_graph_test/):

dat <- read.csv("http://asbcllc.com/blog/2015/february/cre_stream_graph_test/data/cre_transaction-data.csv")
 
dat %>%
  streamgraph("asset_class", "volume_billions", "year") %>%
  sg_axis_x(1, "year", "%Y") %>%
  sg_fill_brewer("PuOr") %>%
  sg_legend(show=TRUE, label="Assets: ")

Commercial Real Estate Transaction Volume by Asset Class Since 2006

While the radical volume change would have been noticeable in almost any graph style, it’s especially noticeable with the streamgraph version as your eyes tend to naturally follow the curves of the flow.

### Fin

While I wouldn’t have these replace my trusty ggplot2 faceted bar charts for regular EDA and reporting, streamgraphs can add a bit of color and flair, and may be an especially good choice when you need to view many categorical variables over time.

As usual, issues/feature requests on [github](http://github.com/hrbrmstr/streamgraph) and showcase/general feedback in the comments.

I’ve updated my [metricsgraphics](https://github.com/hrbrmstr/metricsgraphics) package to version [0.7](https://github.com/hrbrmstr/metricsgraphics/releases/tag/v0.7). The core [MetricsGraphics](http://metricsgraphicsjs.org) JavaScript library has been updated to version 2.1.0 (from 1.1.0). Two blog-worthy features since releasing version 0.5 are `mjs_grid` (which is a `grid.arrange`-like equivalent for `metricsgraphics` plots and `mjs_add_rollover` which lets you add your own custom rollover text to the plots.

### The Grid

The `grid.arrange` (and `arrangeGrob`) functions from the `gridExtra` package come in handy when combining `ggplot2` charts. I wanted a similar way to arrange independent or linked `metricsgraphics` charts, hence `mjs_grid` was born.

`mjs_grid` uses the tag functions in `htmltools` to arrange `metricsgraphics` plot objects into an HTML `

` structure. At present, only uniform tables are supported, but I’m working on making the grid feature more flexible (just like `grid.arrange`). The current functionality is pretty straightforward:

– You build individual `metricsgraphics` plots;
– Optionally combine them in a `list`;
– Pass in the plots/lists into `mjs_grid`;
– Tell `mjs_grid` how many rows & columns are in the grid; and
– Specify the column widths

But, code > words, so here are some examples. To avoid code repetition, note that you’ll need the following packages available to run most of the snippets below:

library(metricsgraphics)
library(htmlwidgets)
library(htmltools)
library(dplyr)

First, we’ll combine a few example plots:

tmp <- data.frame(year=seq(1790, 1970, 10), uspop=as.numeric(uspop))
tmp %>%
  mjs_plot(x=year, y=uspop, width=300, height=300) %>%
  mjs_line() %>%
  mjs_add_marker(1850, "Something Wonderful") %>%
  mjs_add_baseline(150, "Something Awful") -> mjs1
 
mjs_plot(rnorm(10000), width=300, height=300) %>%
  mjs_histogram(bins=30, bar_margin=1) -> mjs2
 
movies <- ggplot2::movies[sample(nrow(ggplot2::movies), 1000), ]
mjs_plot(movies$rating, width=300, height=300) %>% mjs_histogram() -> mjs3
 
tmp %>%
  mjs_plot(x=year, y=uspop, width=300, height=300) %>%
  mjs_line(area=TRUE) -> mjs4
 
mjs_grid(mjs1, mjs2, mjs3, mjs4, ncol=2, nrow=2)

Since your can pass a `list` as a parameter, you can generate many (similar) plots and then grid-display them without too much code. This one generates 7 random histograms with linked rollovers and displays them in grid. Note that this example has `mjs_grid` using the same algorithm `grid.arrange` does for auto-computing “optimal” grid size.

lapply(1:7, function(x) {
  mjs_plot(rnorm(10000, mean=x/2, sd=x), width=250, height=250, linked=TRUE) %>%
    mjs_histogram(bar_margin=2) %>%
    mjs_labs(x_label=sprintf("Plot %d", x))
}) -> plots
 
mjs_grid(plots)

And, you can use `do` from `dplyr` to get `ggplot2` `facet_`-like behavior (though, one could argue that interactive graphics should use controls/selectors vs facets). This example uses the `tips` dataset from `reshape2` and creates a list of plots that are then passed to `mjs_grid`:

tips <- reshape2::tips
a <- tips %>%
  mutate(percent=tip/total_bill,
         day=factor(day, levels=c("Thur", "Fri", "Sat", "Sun"), ordered=TRUE)) %>%
  group_by(day) %>%
  do( plot={ day_label <- unique(.$day)
             mjs_plot(., x=total_bill, y=percent, width=275, height=275, left=100) %>%
               mjs_point(color_accessor=sex, color_type="category") %>%
               mjs_labs(x_label=sprintf("Total Bill (%s)", day_label), y_label="Tip %") })
 
mjs_grid(a$plot, ncol=2, nrow=2, widths=c(0.5, 0.5))

### Rollovers

I’ve had a few requests to support the use of different rollovers and this is a first stab at exposing MetricsGraphics’ native functionality to users of the `metricsgraphics` package. The API changed from MG 1.1.0 to 2.2.0, so I’m _kinda_ glad I waited for this. It requires knowledge of javascript, D3 and the use of `{{ID}}` as part of the CSS node selector when targeting the MetricsGraphics SVG element that displays the rollover text. Here is a crude, but illustrative example of how to take advantage of this feature (mouseover the graphics to see the altered text):

set.seed(1492)
dat <- data.frame(date=seq(as.Date("2014-01-01"),
                           as.Date("2014-01-31"),
                           by="1 day"),
                  value=rnorm(n=31, mean=0, sd=2))
 
dat %>%
  mjs_plot(x=date, y=value, width=500, height=300) %>%
  mjs_line() %>%
  mjs_axis_x(xax_format = "date") %>%
  mjs_add_mouseover("function(d, i) {
                $('{{ID}} svg .mg-active-datapoint')
                    .text('custom text : ' + d.date + ' ' + i);
                 }")

### Postremo

If you are using `metricsgraphics`, drop a link in the comments here to show others how you’re using it! If you need/want some functionality (I’m hoping to get `xts` support into the 0.8 release) that isn’t already in existing feature requests or something’s broken for you, post a new [issue on github](https://github.com/hrbrmstr/metricsgraphics/issues).

D Kelly O’Day did a [great post](https://chartsgraphs.wordpress.com/2015/01/16/nasa-gisss-annual-global-temperature-anomaly-trends/) on charting NASA’s Goddard Institute for Space Studies (GISS) temperature anomaly data, but it sticks with base R for data munging & plotting. While there’s absolutely nothing wrong with base R operations, I thought a modern take on the chart using `dplyr`, `magrittr` & `tidyr` for data manipulation and `ggplot2` for formatting would be helpful for the scores of new folk learning R this year (our little language is becoming [all the rage](http://redmonk.com/sogrady/2015/01/14/language-rankings-1-15/), it seems). I also really enjoy working with weather data.

Before further exposition, here’s the result:

forwp

I made liberal use of the “piping” idiom encouraged `magrittr`, `dplyr` and other new R packages, including the forward assignment operator `->` (which may put some folks off a bit). That also meant using `magrittr`’s aliases for `[` and `[[`, which are more readable in pipes.

I don’t use `library(tidyr)` since `tidyr`’s `extract` conflicts with `magrittr`’s, but you’ll see a `tidyr::gather` in the code for wide-to-long data shaping.

I chose to use the monthly temperature anomaly data as a base layer in the chart as a contrast to the monthly- and annual-anomaly means. I also marked the hottest annual- and annual-mean anomalies and framed the decades with vertical markers.

There are no hardcoded years or decades anywhere in the `ggplot2` code, so this should be quite reusable as the data source gets updated.

As I come back to the chart, I think there may be a bit too much “chart junk” on it, but you can tweak it to your own aesthetic preferences (if you do, drop a note in the comments with a link to your creation).

The code is below and in [this gist](https://gist.github.com/hrbrmstr/07ba10fb4c3fe9c9f3a0).

library(httr)
library(magrittr)
library(dplyr)
library(ggplot2)
 
# data retrieval ----------------------------------------------------------
 
# the user agent string was necessary for me; YMMV
 
pg <- GET("http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt",
          user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A"))
 
# extract monthly data ----------------------------------------------------
 
content(pg, as="text") %>%
  strsplit("\n") %>%
  extract2(1) %>%
  grep("^[[:digit:]]", ., value=TRUE) -> lines
 
# extract column names ----------------------------------------------------
 
content(pg, as="text") %>%
  strsplit("\n") %>%
  extract2(1) %>%
  extract(8) %>%
  strsplit("\ +") %>%
  extract2(1) -> lines_colnames
 
# make data frame ---------------------------------------------------------
 
data <- read.table(text=lines, stringsAsFactors=FALSE)
colnames(data) <- lines_colnames
 
# transform data frame ----------------------------------------------------
 
data %>%
  tidyr::gather(month, value, Jan, Feb, Mar, Apr, May, Jun,
                       Jul, Aug, Sep, Oct, Nov, Dec) %>%     # wide to long
  mutate(value=value/100) %>%                                # convert to degree Celcius change
  select(year=Year, month, value) %>%                        # only need these fields
  mutate(date=as.Date(sprintf("%d-%d-%d", year, month, 1)),  # make proper dates
         decade=year %/% 10,                                 # calc decade
         start=decade*10, end=decade*10+9) %>%               # calc decade start/end
  group_by(decade) %>%
    mutate(decade_mean=mean(value)) %>%                      # calc decade mean
  group_by(year) %>%
    mutate(annum_mean=mean(value)) %>%                       # calc annual mean
  ungroup -> data
 
# start plot --------------------------------------------------------------
 
gg <- ggplot()
 
# decade vertical markers -------------------------------------------------
 
gg <- gg + geom_vline(data=data %>% select(end),
                      aes(xintercept=as.numeric(as.Date(sprintf("%d-12-31", end)))),
                          size=0.5, color="#4575b4", linetype="dotted", alpha=0.5)
 
# monthly data ------------------------------------------------------------
 
gg <- gg + geom_line(data=data, aes(x=date, y=value, color="monthly anomaly"),
                     size=0.35, alpha=0.25)
gg <- gg + geom_point(data=data, aes(x=date, y=value, color"monthly anomaly"),
                      size=0.75, alpha=0.5)
 
# decade mean -------------------------------------------------------------
 
gg <- gg + geom_segment(data=data %>% distinct(decade, decade_mean, start, end),
                        aes(x=as.Date(sprintf("%d-01-01", start)),
                            xend=as.Date(sprintf("%d-12-31", end)),
                            y=decade_mean, yend=decade_mean,
                            color="decade mean anomaly"),
                        linetype="dashed")
 
# annual data -------------------------------------------------------------
 
gg <- gg + geom_line(data=data %>% distinct(year, annum_mean),
                      aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean,
                          color="annual mean anomaly"),
                      size=0.5)
gg <- gg + geom_point(data=data %>% distinct(year, annum_mean),
                      aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean,
                          color="annual mean anomaly"),
                      size=2)
 
# additional annotations --------------------------------------------------
 
# max annual mean anomaly horizontal marker/text
 
gg <- gg + geom_hline(yintercept=max(data$annum_mean),  alpha=0.9,
                      color="#d73027", linetype="dashed", size=0.25)
 
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", mean(range(data$year)))),
                    y=max(data$annum_mean),
                    color="#d73027", alpha=0.9,
                    hjust=0.25, vjust=-1, size=3,
                    label=sprintf("Max annual mean anomaly %2.1fºC", max(data$annum_mean)))
 
gg <- gg + geom_hline(yintercept=max(data$value),  alpha=0.9,
                      color="#7f7f7f", linetype="dashed", size=0.25)
 
# max annual anomaly horizontal marker/text
 
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", mean(range(data$year)))),
                    y=max(data$value),
                    color="#7f7f7f",  alpha=0.9,
                    hjust=0.25, vjust=-1, size=3,
                    label=sprintf("Max annual anomaly %2.1fºC", max(data$value)))
 
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", range(data$year)[2])),
                    y=min(data$value), size=3, hjust=1,
                    label="Data: http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt")
 
# set colors --------------------------------------------------------------
 
gg <- gg + scale_color_manual(name="", values=c("#d73027", "#4575b4", "#7f7f7f"))
 
# set x axis limits -------------------------------------------------------
 
gg <- gg + scale_x_date(expand=c(0, 1),
                        limits=c(as.Date(sprintf("%d-01-01", range(data$year)[1])),
                                 as.Date(sprintf("%d-12-31", range(data$year)[2]))))
 
# add labels --------------------------------------------------------------
 
gg <- gg + labs(x=NULL, y="GLOBAL Temp Anomalies in 1.0ºC",
                title=sprintf("GISS Land and Sea Temperature Annual Anomaly Trend (%d to %d)\n",
                              range(data$year)[1], range(data$year)[2]))
 
# theme/legend tweaks -----------------------------------------------------
 
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(legend.position=c(0.9, 0.2))
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(legend.background=element_blank())
gg

**NOTE** If there’s a particular data set from http://www.cdc.gov/flu/weekly/fluviewinteractive.htm that you want and that isn’t in the pacakge, please file it as an issue and be as specific as you can (screen shot if possible).

—–

Towards the end of 2014 I had been tinkering with flu data from the [CDC’s FluView portal](http://gis.cdc.gov/grasp/fluview/fluportaldashboard.html) since flu reports began to look like this season was going to go the way of 2009.

While you can track the flu over at [The Washington Post](http://www.washingtonpost.com/graphics/health/flu-tracker/), I like to work with data on my own. However the CDC’s portal is Flash-driven and there was no obvious way to get the data files programmatically. This is unfortunate, since there are weekly updates to the data set.

As an information security professional, one of the tools in my arsenal is [Burp Proxy](http://portswigger.net/burp/proxy.html), which is an application that—amongst other things—lets you configure a local proxy server for your browser and inspect all web requests. By using this tool, I was able to discern that the Flash portal calls out to `http://gis.cdc.gov/grasp/fluview/FluViewPhase2CustomDownload.ashx` with custom `POST` form parameters (that I also mapped out) to make the data sets it delivers back to the user.

With that information in hand, I whipped together a small R package: [cdcfluview](https://github.com/hrbrmstr/cdcfluview) to interface with the same server the FluView Portal does. It has a singular function – `get_flu_data` that lets you choose between different region/sub-region breakdowns and also whether you want data from WHO, ILINet (or both). It also lets you pick which years you want data for.

One reason I wanted to work with the data was to see just how this season differs from previous ones. The view I’ll leave on the blog this time—mostly as an example of how to use the package—is a faceted chart, by CDC region and CDC week showing this season (in red) as it relates to previous ones.

# devtools::install_github("hrbrmstr/cdcfluview") # if necessary
library(cdcfluview)
library(magrittr)
library(dplyr)
library(ggplot2)
 
dat <- get_flu_data(region="hhs", 
                    sub_region=1:10, 
                    data_source="ilinet", 
                    years=2000:2014)
 
dat %<>%
  mutate(REGION=factor(REGION,
                       levels=unique(REGION),
                       labels=c("Boston", "New York",
                                "Philadelphia", "Atlanta",
                                "Chicago", "Dallas",
                                "Kansas City", "Denver",
                                "San Francisco", "Seattle"),
                       ordered=TRUE)) %>%
  mutate(season_week=ifelse(WEEK>=40, WEEK-40, WEEK),
         season=ifelse(WEEK<40,
                       sprintf("%d-%d", YEAR-1, YEAR),
                       sprintf("%d-%d", YEAR, YEAR+1)))
 
prev_years <- dat %>% filter(season != "2014-2015")
curr_year <- dat %>% filter(season == "2014-2015")
 
curr_week <- tail(dat, 1)$season_week
 
gg <- ggplot()
gg <- gg + geom_point(data=prev_years,
                      aes(x=season_week, y=X..WEIGHTED.ILI, group=season),
                      color="#969696", size=1, alpa=0.25)
gg <- gg + geom_point(data=curr_year,
                      aes(x=season_week, y=X..WEIGHTED.ILI, group=season),
                      color="red", size=1.25, alpha=1)
gg <- gg + geom_line(data=curr_year, 
                     aes(x=season_week, y=X..WEIGHTED.ILI, group=season),
                     size=1.25, color="#d7301f")
gg <- gg + geom_vline(xintercept=curr_week, color="#d7301f", size=0.5, linetype="dashed", alpha=0.5)
gg <- gg + facet_wrap(~REGION, ncol=3)
gg <- gg + labs(x=NULL, y="Weighted ILI Index", 
                title="ILINet - 1999-2015 year weighted flu index history by CDC region\nWeek Ending Jan 3, 2015 (Red == current season)\n")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg

flureport

(You can see an SVG version of that plot [here](http://rud.is/dl/flureport.svg))

Even without looking at the statistics, it’s pretty easy to tell that this is fixing to be a pretty bad season in many regions.

### State-level data

Soon after this post I found the state-level API for the CDC FluView interface and added a `get_state_data` function for it:

library(statebins)
 
get_state_data() %>%
  filter(WEEKEND=="Jan-03-2015") %>%
  select(state=STATENAME, value=ACTIVITY.LEVEL) %>%
  filter(!(state %in% c("Puerto Rico", "New York City"))) %>% # need to add NYC & PR to statebins
  mutate(value=as.numeric(gsub("Level ", "", value))) %>%
  statebins(brewer_pal="RdPu", breaks=4,
            labels=c("Minimal", "Low", "Moderate", "High"),
            legend_position="bottom", legend_title="ILI Activity Level") +
  ggtitle("CDC State FluView (2014-01-03)")

state

As always, post bugs or feature requests on the [github repo](https://github.com/hrbrmstr/cdcfluview) and drop a note here if you’ve found the package useful or have some other interesting views or analyses to share.

Mozilla released the [MetricsGraphics.js library](http://metricsgraphicsjs.org/) back in November of 2014 ([gh repo](https://github.com/mozilla/metrics-graphics)) and was greeted with great fanfare. It’s primary focus is on crisp, clean layouts for interactive time-series data, but they have support for other chart types as well (though said support is far from comprehensive).

I had been pondering building an R package to help generate these charts when Ramnath Vaidyanathan, Kenton Russell & JJ Allaire came up with the insanely awesome [htmlwidgets](http://www.htmlwidgets.org/) R package, which is the best javascript<->R bridge to-date. Here’s a quick take on how to make a basic line chart before going into some package (and MetricsGraphics) details:

library(metricsgraphics)
 
tmp <- data.frame(year=seq(1790, 1970, 10), uspop=as.numeric(uspop))
 
tmp %>%
  mjs_plot(x=year, y=uspop) %>%
  mjs_line() %>%
  mjs_add_marker(1850, "Something Wonderful") %>%
  mjs_add_baseline(150, "Something Awful")

Example of Basic MetricsGrahpics Chart

One of the package goals (which should be evident from the example) is that it had to conform to the new “piping” idiom, made popular through the [magrittr](https://github.com/smbache/magrittr), [ggvis](http://ggvis.rstudio.com/) and [dplyr](http://github.com/dplyr) packages. This made it possible to avoid one function with a ton of parameters and help break out the chart building into logical steps. While it may not have the flexibility of `ggplot2`, you can do some neat things with MetricsGraphics charts, like use multiple lines:

set.seed(1492)
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4))
 
stocks %>%
  mjs_plot(x=time, y=X, width=500, height=350) %>%
  mjs_line() %>%
  mjs_add_line(Y) %>%
  mjs_add_line(Z) %>%
  mjs_axis_x(xax_format="date") %>%
  mjs_add_legend(c("X", "Y", "Z"))

Stocks X, Y & Z over time

and, pretty configurable scatterplots:

library(RColorBrewer)
 
mtcars %>%
  mjs_plot(x=wt, y=mpg, width=500, height=350) %>%
  mjs_point(color_accessor=cyl,
            x_rug=TRUE, y_rug=TRUE,
            size_accessor=carb,
            size_range=c(5, 10),
            color_type="category",
            color_range=brewer.pal(n=11, name="RdBu")[c(1, 5, 11)]) %>%
  mjs_labs(x="Weight of Car", y="Miles per Gallon")

Motor Trend Cars – mpg~wt

The `htmlwidgets` developers go into [great detail](http://www.htmlwidgets.org/develop_intro.html) on how to create a widget, but there are some central points I’ll cover and potentially reiterate.

First, use the `htmlwidgets::scaffoldWidget` that `htmlwidgets` provides to kickstart your project. It’ll setup the essentials and free your time up to work on the interface components. You will need to edit the generated `yaml` file to use the minified javascript files for things like jquery or d3 since Chrome will be unhappy if you don’t.

Next, remember that all you’re doing is building an R object with data to be passed into a javascript function/environment. MetricsGraphics made this a bit easier for me since the main graphic configuration is one, giant parameter list (take a look at the `metricsgraphics.js` source in github).

Third, if you need to customize the html generation function in the main `packagename_html` file, ensure you pass in `class` to the main `div` element. I was very pleased to discover that you can return a list of HTML elements vs a single one:

metricsgraphics_html <- function(id, style, class, ...) {
  list(tags$div(id = id, class = class, style=style),
       tags$div(id = sprintf("%s-legend", id), class = sprintf("%s-legend", class)))
}

and that may eventually enable support for facet-like functionality without manually creating multiple plots.

Fourth, try to build around the piping idiom. It makes it so much easier to add parameters and manage the data environment.

Fifth, use `iframe`s for embedding your visualizations in other documents (like this blog post). It avoids potential namespace collisions and frees you from having to cut/paste HTML from one doc to another.

And, lastly, remember that you can generate your own `elementId` in the event you need to use it with your javascript visualization library (like I had to).

Currently, `metricsgraphics` is at 0.4.1 and has support for most of the basic chart types along with linking charts (in `Rmd` files). You can install it from the [github repo](https://github.com/hrbrmstr/metricsgraphics) and make sure to file all issues or feature requests there. If you make something with it like @abresler [did](http://asbcllc.com/blog/2015/January/ww2_tanks/), drop a note in the comments!

Now, go forth and wrap some libraries!

Even though it’s still at version `0.4`, the `ggvis` package has quite a bit of functionality and is highly useful for exploratory data analysis (EDA). I wanted to see how geographical visualizations would work under it, so I put together six examples that show how to use various features of `ggvis` for presenting static & interactive cartographic creations. Specifically, the combined exercises demonstrate:

– basic map creation
– basic maps with points/labels
– dynamic choropleths (with various scales & tooltips)
– applying projections and custom color fills (w/tooltips)
– apply projections and projecting coordinates for plotting (w/tooltips that handle missing data well)

If you want to skip the post and head straight to the code you can [head on over to github](https://github.com/hrbrmstr/ggvis-maps), [peruse the R markdown file on RPubs](http://rpubs.com/hrbrmstr/ggvis-maps) or play with the [shiny version](https://hrbrmstr.shinyapps.io/ggvis-maps/). You’ll need that code to actually run any of the snippets below since I’m leaving out some code-cruft for brevity. Also, all the map graphics below were generated by saving the `ggvis` output as PNG files (for best browser compatibility), right from the `ggvis` renderer popup. Click/tap each for a larger version.

### Basic Polygons

1

Even though we still need the help of `ggplot2`’s `fortify`, it’s pretty straightforward to crank out a basic map in `ggvis`:

maine <- readOGR("data/maine.geojson", "OGRGeoJSON")
 
map <- ggplot2::fortify(maine, region="name")
 
map %>%
  ggvis(~long, ~lat) %>%
  group_by(group, id) %>%
  layer_paths(strokeOpacity:=0.5, stroke:="#7f7f7f") %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=400, height=600, keep_aspect=TRUE)

The code is very similar to one of the ways we render the same image in `ggplot`. We first read in the shapefile, convert it into a data frame we can use for plotting, group the polygons properly, render them with `layer_paths` and get rid of chart junk. Now, `ggvis` (to my knowledge as of this post) has no equivalent of `coord_map`, so we have to rely on the positioning in the projection and work out the proper `height` and `width` parameters to use with a uniform aspect ratio (`keep_aspect=TRUE`).

>For those not familiar with `ggvis` the `~` operator lets us tell `ggivs` which columns (or expressions using columns) to map to function parameters and `:=` operator just tells it to use a raw, un-scaled value. You can find out more about [why the tilde was chosen](https://github.com/rstudio/ggvis/issues/173) or about the [various other special operators](http://ggvis.rstudio.com/ggvis-basics.html).

### Basic Annotations

2

You can annotate maps in an equally straightforward way.

county_centers <- maine %>%
  gCentroid(byid=TRUE) %>%
  data.frame %>%
  cbind(name=maine$name %>% gsub(" County, ME", "", .) )
 
map %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(strokeWidth:=0.25, stroke:="#7f7f7f") %>%
  layer_points(data=county_centers, x=~x, y=~y, size:=8) %>%
  layer_text(data=county_centers,
             x=~x+0.05, y=~y, text:=~name,
             baseline:="middle", fontSize:=8) %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=400, height=600, keep_aspect=TRUE)

>Note that the `group_by` works both before or after the `ggvis` call. Consistent pipe idioms FTW!

Here, we’re making a data frame out of the county centroids and names then using that in a call to `layer_points` and `layer_text`. Note how you can change the data source for each layer (just like in `ggplot)` and use expressions just like in `ggplot` (we moved the text just slightly to the right of the dot).

>Since `ggvis` outputs [vega](http://trifacta.github.io/vega/) and uses [D3](http://d3js.org/) for rendering, you should probably take a peek at those frameworks as it will help you understand the parameter name differences between `ggvis` and `ggplot`.

### Basic Choropleths

3

There are actually two examples of this basic state choropleth in the code, but one just uses a different color scale, so I’ll just post the code for one here. This is also designed for interactivity (it has tooltips and lets you change the fill variable) so you should run it locally or look at the [shiny version](https://hrbrmstr.shinyapps.io/ggvis-maps/).

# read in some crime & population data for maine counties
me_pop <- read.csv("data/me_pop.csv", stringsAsFactors=FALSE)
me_crime <- read.csv("data/me_crime.csv", stringsAsFactors=FALSE)
 
# get it into a form we can use (and only use 2013 data)
 
crime_1k <- me_crime %>%
  filter(year==2013) %>%
  select(1,5:12) %>%
  left_join(me_pop) %>%
  mutate(murder_1k=1000*(murder/population_2010),
         rape_1k=1000*(rape/population_2010),
         robbery_1k=1000*(robbery/population_2010),
         aggravated_assault_1k=1000*(aggravated_assault/population_2010),
         burglary_1k=1000*(burglary/population_2010),
         larceny_1k=1000*(larceny/population_2010),
         motor_vehicle_theft_1k=1000*(motor_vehicle_theft/population_2010),
         arson_1k=1000*(arson/population_2010))
 
# normalize the county names
 
map %<>% mutate(id=gsub(" County, ME", "", id)) %>%
  left_join(crime_1k, by=c("id"="county"))
 
# this is for the tooltip. it does a lookup into the crime data frame and
# then uses those values for the popup
 
crime_values <- function(x) {
  if(is.null(x)) return(NULL)
  y <- me_crime %>% filter(year==2013, county==x$id) %>% select(1,5:12)
  sprintf("<table width='100%%'>%s</table>",
          paste0("<tr><td style='text-align:left'>", names(y),
         ":</td><td style='text-align:right'>", format(y), collapse="</td></tr>"))
}
 
map %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill=input_select(label="Crime:",
                                choices=crime_1k %>%
                                  select(ends_with("1k")) %>%
                                  colnames %>% sort,
                                id="Crime",
                                map=as.name),
              strokeWidth:=0.5, stroke:="white") %>%
  scale_numeric("fill", range=c("#bfd3e6", "#8c6bb1" ,"#4d004b")) %>%
  add_tooltip(crime_values, "hover") %>%
  add_legend("fill", title="Crime Rate/1K Pop") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=400, height=600, keep_aspect=TRUE)

You can omit the `input_select` bit if you just want to do a single choropleth (just map `fill` to a single variable). The `input_select` tells `ggvis` to make a minimal bootstrap sidebar-layout scaffold around the actual graphic to enable variable interaction. In this case we let the user explore different types of crimes (by 1K population) and we also have a tooltip that shows the #’s of each crime in each county as we hover.

### Projections and Custom Colors

4

We’re pretty much (mostly) re-creating a [previous post](http://rud.is/b/2014/11/16/moving-the-earth-well-alaska-hawaii-with-r/) in this example and making a projected U.S. map with drought data (as of 2014-12-23).

us <- readOGR("data/us.geojson", "OGRGeoJSON")
us <- us[!us$STATEFP %in% c("02", "15", "72"),]
 
# same method to change the projection
 
us_aea <- spTransform(us, CRS("+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"))
 
map <- ggplot2::fortify(us_aea, region="GEOID")
 
droughts <- read.csv("data/dm_export_county_20141223.csv")
droughts$id <- sprintf("%05d", as.numeric(as.character(droughts$FIPS)))
droughts$total <- with(droughts, (D0+D1+D2+D3+D4)/5)
 
map_d <- merge(map, droughts, all.x=TRUE)
 
# pre-make custom colors per county
 
ramp <- colorRampPalette(c("white", brewer.pal(n=9, name="YlOrRd")), space="Lab")
 
map_d$fill_col <- as.character(cut(map_d$total, seq(0,100,10), include.lowest=TRUE, labels=ramp(10)))
map_d$fill_col <- ifelse(is.na(map_d$fill_col), "#FFFFFF", map_d$fill_col)
 
drought_values <- function(x) {
  if(is.null(x) | !(x$id %in% droughts$id)) return(NULL)
  y <- droughts %>% filter(id==x$id) %>% select(1,3,4,6:10)
  sprintf("<table width='100%%'>%s</table>",
          paste0("<tr><td style='text-align:left'>", names(y),
         ":</td><td style='text-align:right'>", format(y), collapse="</td></tr>"))
}
 
map_d %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill:=~fill_col, strokeOpacity := 0.5, strokeWidth := 0.25) %>%
  add_tooltip(drought_values, "hover") %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=900, height=600, keep_aspect=TRUE)

It’s really similar to the previous code (and you may/should be familiar with the Albers transform from the previous post).

### World Domination

5

world <- readOGR("data/ne_50m_admin_0_countries.geojson", layer="OGRGeoJSON")
world <- world[!world$iso_a3 %in% c("ATA"),]
world <- spTransform(world, CRS("+proj=wintri"))
 
map_w <- ggplot2::fortify(world, region="iso_a3")
 
# really quick way to get coords from a KML file
 
launch_sites <- rbindlist(lapply(ogrListLayers("data/launch-sites.kml")[c language="(1:2,4:9)"][/c], function(layer) {
  tmp <- readOGR("data/launch-sites.kml", layer)
  places <- data.table(coordinates(tmp)[,1:2], as.character(tmp$Name))
}))
setnames(launch_sites, colnames(launch_sites), c("lon", "lat", "name"))
 
# now, project the coordinates we extracted
 
coordinates(launch_sites) <-  ~lon+lat
launch_sites <- as.data.frame(SpatialPointsDataFrame(spTransform(
  SpatialPoints(launch_sites, CRS("+proj=longlat")), CRS("+proj=wintri")),
  launch_sites@data))
 
map_w %>%
  group_by(group, id) %>%
  ggvis(~long, ~lat) %>%
  layer_paths(fill:="#252525", stroke:="white", strokeOpacity:=0.5, strokeWidth:=0.25) %>%
  layer_points(data=launch_sites, 
               x=~lon, y=~lat, 
               fill:="#cb181d", stroke:="white", 
               size:=25, fillOpacity:=0.5, strokeWidth:=0.25) %>%
  hide_legend("fill") %>%
  hide_axis("x") %>% hide_axis("y") %>%
  set_options(width=900, height=500, keep_aspect=TRUE)

The main differences in this example are the re-projection of the data we’re using. I grabbed a KML file of rocket launch sites from Wikipedia and made it into a data frame then [re]project those points into Winkel-Tripel for use with Winkel-Tripel world map made at the beginning of the example. The `ggplot` `coord_map` handles these transforms for you, so until there’s a `ggvis` equivalent, you’ll need to do it this way (though, there’s not Winkel-Tripel projection in the `mapproject` package so you kinda need to do it this way for `ggplot` as well for this projection).

### Wrapping Up

There’s code up on github for the “normal”, `Rmd` and Shiny versions of these examples. Give each a go and try tweaking various parameters, changing up the tooltips or using your own data. Don’t forget to drop a note in the comments with any of your creations and use github for any code issues.