Data Visualization Archives - Page 5 of 13

Category Archives: Data Visualization

An Ephemeral Update to daylight()

2015-11-02 – 08:45
Posted in Charts & Graphs, Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (1)

This occurrence of the bi-annual corruption of the space-time continuum (i.e. changing to/from standard/daylight time) in the U.S. caused me to make a slight change to the code [from an older post](https://rud.is/b/2014/09/23/seeing-the-daylight-with-r/). The `daylight()` function now auto-discovers the date and location information (via [telize](http://www.telize.com/)) from the caller, which means all you have to do to get a plot like this:

is to source the [new gist](https://gist.github.com/hrbrmstr/e435d4fa0c31b8e1a9d0) like this:

devtools::source_gist(“e435d4fa0c31b8e1a9d0″, sha1=”64e859227266dc5f9008b3b3959a19fea373fee6”)

Remember that you should verify any code before blindly `source`ing it (in R or anywhere else) and make sure to use the SHA1 hash so you know you’re sourcing the proper code (and not potentially being pwnd).

Note that the granularity/accuracy of the geolocation is only as good as the Telize service (which uses MaxMind). The fact that this shows Vermont instead of Maine should make you all think thrice about trusting IP geolocation in general, especially you world-mapping cybersecurity folks.

Sadly, the darkest of days is still yet to come.

Less Drama, More Encoding

2015-10-24 – 10:43
Posted in Charts & Graphs, Data Analysis, Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (1)

Junk Charts [adeptly noted and fixed](http://junkcharts.typepad.com/junk_charts/2015/10/is-it-worth-the-drama.html) this excessively stylized chart from the WSJ this week:

Their take on it does reduce the ZOMGOSH WE ARE DOOMED! look and feel of the WSJ chart:

But, we can further reduce the drama by using a more neutral color encoding _and_ encode both the # of outbreaks and total size of the impacted flock populations _per week_ with a lollipop chart (and, thankfully the USDA makes this data readily available):

library(xml2)
library(rvest)
library(dplyr)
library(stringr)
library(ggplot2)
library(scales)
library(viridis)
library(ggthemes)
 
pg <- read_html("https://www.aphis.usda.gov/wps/portal/aphis/ourfocus/animalhealth/sa_animal_disease_information/sa_avian_health/sa_detections_by_states/ct_ai_pacific_flyway/!ut/p/a1/lVNNb-IwEP0tPewx2OSbI_QDwkdBRbuQXKyJ4yTWJnYUG1D-fZ10D7RqadcHS555M_PmPRkl6IgSAWdegOZSQNW_E58stwt7PMN2NN9PHnH0_OdpE64DZ7twDSA2APzFmeL39dtV5Pf1L3i2eBjjvYMOKEEJFbrRJYqhKbkiVArNhCYVT1tou19YAZGnluSSntTwAsFrqEjJoNLldSTjioFihItctvWwxFv6zEFc4zOmGe3TiqQdURo0M62pJsBJA5TnnJK86i7Q9fwayjMU57ZPAezccnwvtdwM21aah9hcGHtuCP6Y5v_0uLHwj_S8n08XbrA2CrqhjaMHUxxMNhhH_nf1g57fdBgAtyz7MGT-ODNDnta7YLW08cpDsSEZfMli4qL9f6q2_IEsdru53xSmLejS6g1Gx5vGv6WvjB8CnxmPjp8af5ihxJNBpIqeX1HJdPgQ8VSkTmiItCxnLWtHpVQaHS-Xy-ikMhgV8oya-ncdOh23_r6E2PGqYrerD9O7u1eBlNG5/?1dmy&urile=wcm%3apath%3a%2Faphis_content_library%2Fsa_our_focus%2Fsa_animal_health%2Fsa_animal_disease_information%2Fsa_avian_health%2Fsa_detections_by_states%2Fct_ai_full_list")
 
dat <- html_table(html_nodes(pg, "table"))[[1]]
 
dat %>% 
  mutate(`Confirmation date` = as.Date(`Confirmation date`, "%b %d, %Y"),
         week = format(`Confirmation date`, "%Y-%U"),
         week_start = as.Date(sprintf("%s-1", week), "%Y-%U-%u") ,
         `Flock size` = as.numeric(str_replace_all(`Flock size`, ",", ""))) %>% 
  select(week, week_start, `Flock size`) %>% 
  filter(!is.na(`Flock size`)) %>% 
  group_by(week_start) %>% 
  summarize(outbreaks=n(), 
            flock_total=sum(`Flock size`)) -> dat
 
first <- dat[2,]
last <- tail(dat, 1)
 
gg <- ggplot(dat, aes(x=week_start, y=outbreaks))
gg <- gg + geom_vline(xintercept=as.numeric(first$week_start), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=first, aes(x=week_start, y=25), label=" First detection in 2015", hjust=0, size=3, color="#7f7f7f")
gg <- gg + geom_vline(xintercept=as.numeric(last$week_start), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=last, aes(x=week_start, y=25), label="Last detection ", hjust=1, size=3, color="#7f7f7f")
gg <- gg + geom_segment(aes(x=week_start, xend=week_start, y=0, yend=outbreaks, color=flock_total), size=0.5)
gg <- gg + geom_point(aes(size=flock_total, fill=flock_total), shape=21)
gg <- gg + scale_size_continuous(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_color_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_fill_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_x_date(label=date_format("%b"))
gg <- gg + guides(color=guide_legend(), fill=guide_legend(), size=guide_legend())
gg <- gg + labs(x=NULL, y="# Outbreaks", title="Avian Flu Impact by Week (2015)")
gg <- gg + theme_tufte(base_family="Helvetica")
gg <- gg + theme(legend.key=element_rect(color=rgb(0,0,0,0)))
gg

If we really want to see the discrete events, we can do that with our less-ZOMGOSH color scheme, too:

dat <- html_table(html_nodes(pg, "table"))[[1]]
dat %>% 
  mutate(`Confirmation date` = as.Date(`Confirmation date`, "%b %d, %Y"),
         `Flock size` = as.numeric(str_replace_all(`Flock size`, ",", ""))) %>% 
  filter(!is.na(`Flock size`)) %>% 
  rename(date=`Confirmation date`) %>% 
  arrange(date) -> dat
 
first <- dat[2,]
last <- tail(dat, 1)
 
gg <- ggplot(dat, aes(x=date, y=`Flock size`))
gg <- gg + geom_vline(xintercept=as.numeric(first$date), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=first, aes(x=date, y=3000000), label=" First detection in 2015", hjust=0, size=3, color="#7f7f7f")
gg <- gg + geom_vline(xintercept=as.numeric(last$date), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=last, aes(x=date, y=3000000), label="Last detection ", hjust=1, size=3, color="#7f7f7f")
gg <- gg + geom_segment(aes(x=date, xend=date, y=0, yend=`Flock size`, color=`Flock size`), size=0.5, alpha=1)
gg <- gg + scale_size_continuous(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_color_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_fill_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_x_date(label=date_format("%b"))
gg <- gg + scale_y_continuous(label=comma)
gg <- gg + guides(color=guide_legend(), fill=guide_legend(), size=guide_legend())
gg <- gg + labs(x=NULL, y="Flock size", title="Avian Flu Impact (2015)")
gg <- gg + theme_tufte(base_family="Helvetica")
gg <- gg + theme(legend.key=element_rect(color=rgb(0,0,0,0)))
gg

Neither of those is ever going to sell any ads, tho.

Replicating NatGeo’s “Proper” Earthquake Map in R

2015-10-04 – 20:01
Posted in cartography, Data Visualization, DataVis, DataViz, ggplot, gis, maps, R
Tagged post
Comments (22)

I saw this post over at NatGeo over the weekend and felt compelled to replicate this:

with ggplot2.

Three shapefiles later and we have it close enough to toss into a post (and I really don’t believe the continent names are necessary).

library(rgdal)
library(ggplot2)
library(ggthemes)
library(ggalt) # devtools::install_github("hrbrmstr/ggalt")

# grab these from http://rud.is/dl/quakefiles.tgz

world <- readOGR("countries.geo.json", "OGRGeoJSON", stringsAsFactors=FALSE)
plates <- readOGR("plates.json", "OGRGeoJSON", stringsAsFactors=FALSE)
quakes <- readOGR("quakes.json", "OGRGeoJSON", stringsAsFactors=FALSE)

world_map <- fortify(world)
plates_map <- fortify(plates)
quakes_dat <- data.frame(quakes)
quakes_dat$trans <- quakes_dat$mag %% 5

gg <- ggplot()
gg <- gg + geom_cartogram(data=world_map, map=world_map,
                          aes(x=long, y=lat, map_id=id),
                          color="white", size=0.15, fill="#d8d8d6")
gg <- gg + geom_cartogram(data=plates_map, map=plates_map,
                          aes(x=long, y=lat, map_id=id),
                          color="black", size=0.1, fill="#00000000", alpha=0)
gg <- gg + geom_point(data=quakes_dat,
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1/3, color="#d47e5d", fill="#00000000")
gg <- gg + geom_point(data=subset(quakes_dat, mag>7.5),
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1, color="black", fill="#00000000")
gg <- gg + geom_text(data=subset(quakes_dat, mag>7.5),
                     aes(x=coords.x1, y=coords.x2, label=sprintf("Mag %2.1f", mag)),
                     color="black", size=3, vjust=c(3.9, 3.9, 5), fontface="bold")
gg <- gg + scale_size(name="Magnitude", trans="exp", labels=c(5:8), range=c(1, 20))
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.05, 0.99))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(legend.key=element_rect(color="#00000000"))
gg

I can only imagine how many mouse clicks that would be in a GIS program.

Addendum

This comment has links to where I acquired the shapefiles (good q by Michael)
Thanks to this comment there’s now code in a gist that makes a map re-cenetered on the Pacific

For Some Definition of “Directly” and/or “Contort”

2015-09-30 – 09:21
Posted in Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (5)

Junk Charts did a post on [Don’t pick your tool before having your design](http://junkcharts.typepad.com/junk_charts/2015/09/dont-pick-your-tool-before-having-your-design.html) and made a claim that this:

_”cannot be produced directly from a tool (without contorting your body in various painful locations)”_.

I beg to differ.

With R & ggplot2, I get to both pick my tool and design at the same time since I have a very flexible and multi-purpose tool. I also don’t believe that the code below qualifies as “contortions”, though I am a ggplot2 fanboi. It’s no different than Excel folks clicking on radio buttons and color pickers, except my process is easily repeatable & scalable once finalized (this is not finalized as it’s not 100% parameterized but it’s not difficult to do that last part).

library(ggplot2)
 
dat <- data.frame(year=2010:2015,
                  penalties=c(627, 625, 653, 617, 661, 730))
 
avg <- data.frame(val=mean(head(dat$penalties, -1)),
                  last=dat$penalties[6],
                  lab="5-Yr\nAvg")
 
gg <- ggplot(dat, aes(x=year, y=penalties))
gg <- gg + geom_point()
gg <- gg + scale_x_continuous(breaks=c(2010, 2014, 2015), limits=c(NA, 2015.1))
gg <- gg + scale_y_continuous(breaks=c(600, 650, 700, 750), 
                              limits=c(599, 751), expand=c(0,0))
gg <- gg + geom_segment(data=avg, aes(x=2010, xend=2015, y=val, yend=val), linetype="dashed")
gg <- gg + geom_segment(data=avg, aes(x=2015, xend=2015, y=val, yend=last), color="steelblue")
gg <- gg + geom_point(data=avg, aes(x=2015, y=val), shape=4, size=3)
gg <- gg + geom_text(data=avg, aes(x=2015, y=val), label="5-Yr\nAvg", size=2.5, hjust=-0.3)
gg <- gg + geom_point(data=avg, aes(x=2015, y=700), shape=17, col="steelblue")
gg <- gg + geom_point(data=avg, aes(x=2015, y=730), shape=4, size=3)
gg <- gg + labs(x=NULL, y="Number of Penalties", 
                title="NFL Penalties Jumped 15% in the\nFirst 3 Weeks of the 2015 Season\n")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.grid.major.x=element_blank())
gg <- gg + theme(panel.grid.major.y=element_line(color="white"))
gg <- gg + theme(panel.background=element_rect(fill="#f3f2f7"))
gg <- gg + theme(axis.ticks=element_blank())
gg

Roll Your Own Stats and Geoms in ggplot2 (Part 1: Splines!)

2015-09-08 – 19:06
Posted in Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (4)

A huge change is coming to ggplot2 and you can get a preview of it over at Hadley’s github repo. I’ve been keenly interested in this as I will be fixing, finishing & porting coord_proj to it once it’s done.

Hadley & Winston have re-built ggplot2 with an entirely new object-oriented system called ggproto. With ggproto it’s now possible to easily extend ggplot2 from within your own packages (since source() is so last century), often times with very little effort.

Before attempting to port coord_proj I wanted to work through adding a Geom and Stat since thought it would be cool to be able to have interpolated line charts (and it helps answer some recurring StackOverflow “spline”/ggplot2 questions) and also prefer KernSmooth::bkde over the built-in density function (which geom_density and stat_density both use).

To that end, I’ve made a new github-installable package called ggalt (h/t to @jayjacobs for the better package name than I originally came up with) where I’ll be adding new Geoms, Stats, Coords (et al) as I craft them. For now, let me introduce both geom_xspline() and geom_bkde() to show how easy it is to incorporate new functionality into ggplot2.

While not a requirement, I think it’s a going to be a good idea to make both a paired Geom and Stat when adding those types of functionality to ggplot2. I found it easier to work with custom parameters this way and it also makes it feel a bit more like the way ggplot2 itself works. For the interpolated line geom/stat I used R’s graphics::xpsline function. Here’s all it took to give ggplot2 lines some curves (you can find the commented version on github):

geom_xspline <- function(mapping = NULL, data = NULL, stat = "xspline",
                      position = "identity", show.legend = NA,
                      inherit.aes = TRUE, na.rm = TRUE,
                      spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    geom = GeomXspline,
    mapping = mapping,
    data = data,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  ...)
  )
}

GeomXspline <- ggproto("GeomXspline", GeomLine,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA)
)

stat_xspline <- function(mapping = NULL, data = NULL, geom = "line",
                     position = "identity", show.legend = NA, inherit.aes = TRUE,
                     spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    stat = StatXspline,
    data = data,
    mapping = mapping,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  ...
    )
  )
}

StatXspline <- ggproto("StatXspline", Stat,

  required_aes = c("x", "y"),

  compute_group = function(self, data, scales, params,
                           spline_shape=-0.25, open=TRUE, rep_ends=TRUE) {
    tf <- tempfile(fileext=".png")
    png(tf)
    plot.new()
    tmp <- xspline(data$x, data$y, spline_shape, open, rep_ends, draw=FALSE, NA, NA)
    invisible(dev.off())
    unlink(tf)

    data.frame(x=tmp$x, y=tmp$y)
  }
)

If that seems like alot of code, it really isn't. What we have there are:

two functions that handle the Geom aspects &
two functions that handle the Stat aspects.

Let's look at the Stat functions first, though you can also just read the handy vignette, too.

Adding `Stat`s

In this particular case, we have it easy. We get to use geom_line/GeomLine as the base geom_ for the layer since all we're doing is generating more points for it to draw line segments between. We create the creative interface to our new Stat with stat_xspline add three new parameters with default values:

spline_shape
open
rep_ends

"Added three new parameters to what?" you ask? GeomLine/geom_line default to StatIdentity/stat_identity and if you look at the source code, that Stat just returns the data back in the form it came in. We're going to take these three new parameters and pass them to xspline and then return entirely new values back for ggplot2/grid to draw for us, so we tell it to call our new computation engine by giving it the StatXspline value to the layer. By using GeomLine/geom_line as the geom parameter, all we have to do is ensure we pass back the proper values. We do that in compute_group since ggplot2 will segment the incoming data into groups (via the group aesthetic) for us. We take each group and run them through the xspline with the parameters the user specified. If I didn't have to use the hack to work around what seems to be errant plot device issues in xspline, the call would be one line.

Adding `Geom`s

We pair up the Stat with a very basic Geom "shim" so we can use them interchangeably. It's the same idiom, an "object" function and the user-callable function. In this case, it's super-lightweight since we're really having geom_line do all the work for us. In a [very] future post, I'll cover more complex Geoms that require use of the underlying grid graphics system, but I suspect most of your own additions may be able to use the lightweight idiom here (and that's covered in the vignette).

Putting Our New Functions To Work

With our new additions to ggplot2, we can compare the output of geom_smooth to geom_xspline with some test data:

set.seed(1492)
dat <- data.frame(x=c(1:10, 1:10, 1:10),
                  y=c(sample(15:30, 10), 2*sample(15:30, 10), 3*sample(15:30, 10)),
                  group=factor(c(rep(1, 10), rep(2, 10), rep(3, 10)))
)

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(size=0.5)

README-unnamed-chunk-4-3

The github page has more examples for the function, but you don't have to be envious of the smooth D3 curves any more.

I realize this particular addition is not extremely helpful/beneficial, but the next one is. We'll look at adding a new/more accurate density Stat/Geom in the next installment and then discuss the "on-steroids" roxygen2 comments you'll end up using for your creations in part 3.

Coloring (and Drawing) Outside the Lines in ggplot

2015-08-27 – 09:36
Posted in Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (8)

Time for another Twitter-inspired blog post this week, this time from a tweet by @JonKalodimos:

Is there a way to do this in #rstats #ggplot2 https://t.co/kxWQFlYpbB

— Jonathan Kalodimos (@JonKalodimos) August 27, 2015

I had seen and appreciated Ann’s post on her makeover of the main graphic in [NPR’s story](http://www.npr.org/sections/money/2014/10/21/357629765/when-women-stopped-coding) and did a quick mental check of how I’d do the same in ggplot2 as I was reading it. Jon’s question was a good prompt to dump physical memory to internet memory.

Here’s the NPR graphic:

When_Women_Stopped_Coding___Planet_Money___NPR

It is actually pretty darn good on it’s own, but I also agree with Ann that direct labeling could have made it better. Here’s her makeover:

Let’s see how to do this in ggplot2. We’ll use the actual data from NPR’s story since the graphic was built with D3 and, hence, the data is part of the graphic. Let’s get the `library` stuff out of the way:

library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)
library(scales)
library(gridExtra)
library(grid)

Now, we’ll grab the CSV that the NPR folks used for the graphic and take a look at it. I found it via Developer Tools in Chrome:

# use the NPR story data file ---------------------------------------------
# and be kind to NPR's bandwidth budget
url <- "http://apps.npr.org/dailygraphics/graphics/women-cs/data.csv"
fil <- "gender.csv"
if (!file.exists(fil)) download.file(url, fil)
 
gender <- read.csv(fil, stringsAsFactors=FALSE)
 
# take a look at the CSV structure ----------------------------------------
 
glimpse(gender)
 
## Observations: 48
## Variables:
## $ date              (int) 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, ...
## $ Medical.School    (dbl) 0.09, 0.10, 0.10, 0.09, 0.09, 0.11, 0.14, 0.17, 0.20, 0.22, 0.24, 0.25, 0.25, 0.25, 0.28, 0.29, 0.31, ...
## $ Law.School        (chr) "0.04", "0.04", "0.05", "0.07", "0.07", "0.1", "0.12", "0.16", "0.2", "0.24", "0.27", "0.28", "0.3", "...
## $ Physical.Sciences (chr) "0.14", "0.14", "0.14", "0.14", "0.14", "0.15", "0.16", "0.16", "0.17", "0.19", "0.2", "0.2", "0.22", ...
## $ Computer.science  (dbl) 0.146, 0.108, 0.120, 0.130, 0.129, 0.136, 0.136, 0.149, 0.164, 0.190, 0.198, 0.239, 0.258, 0.281, 0.30...
 
tail(gender)
 
##    date Medical School Law School Physical Sciences Computer science
## 43 2008           0.48       0.47              0.41            0.177
## 44 2009           0.48       0.47              0.42            0.179
## 45 2010           0.48       0.47              0.41            0.182
## 46 2011           0.47         tk                tk            0.177
## 47 2012           0.47         tk                tk            0.182
## 48 2013           0.46         tk                              0.179

Those `tk` values are referred to in the [code that makes the NPR graphic](http://apps.npr.org/dailygraphics/graphics/women-cs/js/graphic.js) so we’ll replace them with `NA`s and make all the columns numeric:

gender <- mutate_each(gender, funs(as.numeric))

We should also clean up the column names since we’ll be using them for the legend and the direct labels:

colnames(gender) <- str_replace(colnames(gender), "\\.", " ")
 
gender_long <- mutate(gather(gender, area, value, -date),
                      area=factor(area, levels=colnames(gender)[2:5],
                                  ordered=TRUE))

That that code link also has the colors NPR used for the graphic, so let’s define those now since we bothered to look at it:

gender_colors <- c('#11605E', '#17807E', '#8BC0BF','#D8472B')
names(gender_colors) <- colnames(gender)[2:5]

We’ll be needing those names later on, hence why I named the values in the vector.

With the data, labels and colors defined, we can make a “standard” ggplot:

chart_title <- expression(atop("What Happened To Women In Computer Science?",
                               atop(italic("% Of Women Majors, By Field"))))
 
gg <- ggplot(gender_long)
gg <- gg + geom_line(aes(x=date, y=value, group=area, color=area))
gg <- gg + scale_color_manual(name="", values=gender_colors)
gg <- gg + scale_y_continuous(label=percent)
gg <- gg + labs(x=NULL, y=NULL, title=chart_title)
gg <- gg + theme_bw(base_family="Helvetica")
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(legend.key=element_blank())
gg

Rplot01

That’s also pretty good on it’s own. It’s possible to make it look a bit more like the NPR chart, but it’s hard to format a title & subtitle in a ggplot title _and_ have it left-justified, so I opted for font style. It’s also possible to make the legend look like NPR’s but that’s not the point of the post.

So, how do we make this look more like Ann’s makeover?

First we need to get the last values for each of the variables so we know what point on the `y` axis we need to place the labels. That’s made a bit trickier with the `NA`s:

last_vals <- sapply(colnames(gender)[2:5], function(x) last(na.exclude(gender[,x])))
last_date <- tail(gender$date)+1 # doing this ^ wld have made it a double

Next, we need to turn off the legend and increase the plot margin on the right-hand side:

gg <- gg + theme(legend.position="none")
gg <- gg + theme(plot.margin = unit(c(1, 7, 2, 1), "lines"))

I figured out those #’s by interactive trial-and-error, though I initially guessed `6` for the right-hand margin increase. Also, this should demonstrate one reason for the `gg <- gg +` madness you see in my code/posts since, when you start doing more in ggplot, you end up with that idiom more oft than not. Now, we add the labels. We do it with with custom annotations that are placed "one year" after the latest `x` value and at the same `y` value as the last reading of each area. We also color the label the same as the line, which is why we needed a named vector.

for (i in 1:length(last_vals)) {
  gg <- gg + annotation_custom(grob=textGrob(names(last_vals)[i], hjust=0,
                                             gp=gpar(fontsize=8, 
                                                     col=gender_colors[names(last_vals)[i]])),
                               xmin=2014, xmax=2014,
                               ymin=last_vals[i], ymax=last_vals[i])
}

Finally, we have to do some of the remaining work by hand since we have to turn off panel clipping and the only way I know how to do that is at the grob/gtable level, but it’s not that scary or complex of a task. Also, since we are manipulating the built ggplot object, we have to use `grid.draw` to present our chart:

gb <- ggplot_build(gg)
gt <- ggplot_gtable(gb)
 
gt$layout$clip[gt$layout$name=="panel"] <- "off"
 
grid.draw(gt)

Here’s the result:

Rplot02

I’ve deliberately left the fonts a bit small and not-changed their positions on the `y`-axis to give readers a bit of homework. They both _should_ be changed and the plot margins could also be tweaked a tad. You can find the complete code [on github](https://gist.github.com/hrbrmstr/83deb0baeabae0824389) so tweak away!

If you have another way to accomplish the same task or want to show off your tweaked version, drop a note in the comments or at that gist link.

Track Hurricane Danny (Interactively) with R + leaflet

2015-08-20 – 11:41
Posted in Data Visualization, DataVis, DataViz, R, Weather
Tagged post
Comments (5)

poster image

Danny became the [first hurricane of the 2015 Season](http://www.accuweather.com/en/weather-news/atlantic-gives-birth-to-tropical-depression-four-danny/51857239), so it’s a good time to revisit how one might be able to track them with R.

We’ll pull track data from [Unisys](http://weather.unisys.com/hurricane/atlantic/2015/index.php) and just look at Danny, but it should be easy to extrapolate from the code.

For this visualization, we’ll use [leaflet](http://rstudio.github.io/leaflet/) since it’s all the rage and makes the plots interactive without any real work (thanks to the very real work by the HTML Widgets folks and the Leaflet.JS folks).

Let’s get the library calls out of the way:

library(leaflet)
library(stringi)
library(htmltools)
library(RColorBrewer)

Now, we’ll get the tracks:

danny <- readLines("http://weather.unisys.com/hurricane/atlantic/2015/DANNY/track.dat")

Why aren’t we using `read.csv` or `read.table` directly, you ask? Well, the data is in a _really_ ugly format thanks to the spaces in the `STATUS` column and two prefix lines:

Date: 18-20 AUG 2015
Hurricane-1 DANNY
ADV  LAT    LON      TIME     WIND  PR  STAT
  1  10.60  -36.50 08/18/15Z   30  1009 TROPICAL DEPRESSION
  2  10.90  -37.50 08/18/21Z    -     - TROPICAL DEPRESSION
  3  11.20  -38.80 08/19/03Z    -     - TROPICAL DEPRESSION
  4  11.30  -40.20 08/19/09Z    -     - TROPICAL DEPRESSION
  5  11.20  -41.10 08/19/15Z    -     - TROPICAL DEPRESSION
  6  11.50  -42.00 08/19/21Z    -     - TROPICAL DEPRESSION
  7  12.10  -42.70 08/20/03Z    -     - TROPICAL DEPRESSION
  8  12.20  -43.70 08/20/09Z    -     - TROPICAL DEPRESSION
  9  12.50  -44.80 08/20/15Z    -     - TROPICAL DEPRESSION
+12  13.10  -46.00 08/21/00Z   70     - HURRICANE-1
+24  14.00  -47.60 08/21/12Z   75     - HURRICANE-1
+36  14.70  -49.40 08/22/00Z   75     - HURRICANE-1
+48  15.20  -51.50 08/22/12Z   70     - HURRICANE-1
+72  16.00  -56.40 08/23/12Z   65     - HURRICANE-1
+96  16.90  -61.70 08/24/12Z   65     - HURRICANE-1
+120  18.00  -66.60 08/25/12Z   55     - TROPICAL STORM

But, we can put that into shape pretty easily, using `gsub` to make it easier to read everything with `read.table` and we just skip over the first two lines (we’d use them if we were doing other things with more of the data).

danny_dat <- read.table(textConnection(gsub("TROPICAL ", "TROPICAL_", danny[3:length(danny)])), 
           header=TRUE, stringsAsFactors=FALSE)

Now, let’s make the data a bit prettier to work with:

# make storm type names prettier
danny_dat$STAT <- stri_trans_totitle(gsub("_", " ", danny_dat$STAT))
 
# make column names prettier
colnames(danny_dat) <- c("advisory", "lat", "lon", "time", "wind_speed", "pressure", "status")

Those steps weren’t absolutely necessary, but why do something half-baked (unless it’s chocolate chip cookies)?

Let’s pick better colors than Unisys did. We’ll use a color-blind safe palette from Color Brewer:

danny_dat$color <- as.character(factor(danny_dat$status, 
                          levels=c("Tropical Depression", "Tropical Storm",
                                   "Hurricane-1", "Hurricane-2", "Hurricane-3",
                                   "Hurricane-4", "Hurricane-5"),
                          labels=rev(brewer.pal(7, "YlOrBr"))))

And, now for the map! We’ll make lines for the path that was already traced by Danny, then make interactive points for the forecast locations from the advisory data:

last_advisory <- tail(which(grepl("^[[:digit:]]+$", danny_dat$advisory)), 1)
 
# draw the map
leaflet() %>% 
  addTiles() %>% 
  addPolylines(data=danny_dat[1:last_advisory,], ~lon, ~lat, color=~color) -> tmp_map
 
if (last_advisory < nrow(danny_dat)) {
 
   tmp_map <- tmp_map %>% 
     addCircles(data=danny_dat[last_advisory:nrow(danny_dat),], ~lon, ~lat, color=~color, fill=~color, radius=25000,
             popup=~sprintf("<b>Advisory forecast for +%sh (%s)</b><hr noshade size='1'/>
                           Position: %3.2f, %3.2f<br/>
                           Expected strength: <span style='color:%s'><strong>%s</strong></span><br/>
                           Forecast wind: %s (knots)<br/>Forecast pressure: %s",
                           htmlEscape(advisory), htmlEscape(time), htmlEscape(lon),
                           htmlEscape(lat), htmlEscape(color), htmlEscape(status), 
                           htmlEscape(wind_speed), htmlEscape(pressure)))
}
 
html_print(tmp_map)

Click on one of the circles to see the popup.

The entire source code is in [this gist](https://gist.github.com/hrbrmstr/e3253ddd353f1a489bb4) and, provided you have the proper packages installed, you can run this at any time with:

devtools::source_gist("e3253ddd353f1a489bb4", sha1="00074e03e92c48c470dc182f67c91ccac612107e")

The use of the `sha1` hash parameter will help ensure you aren’t being asked to run a potentially modified & harmful gist, but you should visit the gist first to make sure I’m not messing with you (which, I’m not).

If you riff off of this or have suggestions for improvement, drop a note here or in the gist comments.

Achieve Charting Zen With TauCharts

2015-08-04 – 17:28
Posted in Charts & Graphs, d3, Data Visualization, DataVis, DataViz, Javascript, R
Tagged post
Comments (3)

There was some chatter on the twitters this week about a relatively new D3-based charting library called [TauCharts](http://taucharts.com/) (also @taucharts). The API looked pretty clean and robust, so I started working on an htmlwidget for it and was quickly joined by the Widget Master himself, @timelyportfolio.

TauCharts definitely has a “grammar of graphics” feel about it and the default aesthetics are super-nifty While the developers are actively adding new features and “geoms”, the core points (think scatterplot), lines and bars (including horizontal bars!) geoms are quite robust and definitely ready for your dashboards.

Between the two of us, we have a _substantial_ part of the [charting library API](http://api.taucharts.com/) covered. I think the only major thing left unimplemented is composite charts (i.e. lines + bars + points on the same chart) and some minor tweaks around the edges.

While you can find it on [github](http://github.com/hrbrmstr/taucharts) and do the normal:

devtools::install_github("hrbrmstr/taucharts")

or, even use the official initial release version:

devtools::install_github("hrbrmstr/taucharts@v0.1.0")

I’ll use the `dev` version:

devtools::install_github("hrbrmstr/taucharts@dev"

for the example below, mostly since it includes the data set I want to use to mimic the current, featured example on the [TauCharts homepage](http://taucharts.com/) and also has full documentation with examples.

Here’s all it takes to make a faceted scatterplot with:

– interactive tooltips
– interactive legend
– custom-selectable trendline annotation:

devtools::install_github("hrbrmstr/taucharts@dev")
 
library(taucharts)
 
data(cars_data)
 
tauchart(cars_data) %>% 
  tau_point("milespergallon", c("class", "price"), color="class") %>% 
  tau_guide_padding(bottom=300) %>% 
  tau_legend() %>% 
  tau_trendline() %>% 
  tau_tooltip(c("vehicle", "year", "class", "price", "milespergallon"))

Hybrid cars fuel economy by price and class
It seems expensive cars are less efficient.

There are _tons_ more examples in the [TauCharts RPub](http://rpubs.com/hrbrmstr/taucharts) (and soon-to-be vignette) and @timelyportfolio will be featuring it in his weekly [widget update](http://www.buildingwidgets.com/).

Category Archives: Data Visualization

Adding Stats

Adding Geoms

Putting Our New Functions To Work

Adding `Stat`s

Adding `Geom`s