hrbrmstr, Author at rud.is

Author Archives: hrbrmstr

Don't look at me…I do what he does — just slower. #rstats avuncular • ?Resistance Fighter • Cook • Christian • [Master] Chef des Données de Sécurité @ @rapid7

Less Drama, More Encoding

2015-10-24 – 10:43
Posted in Charts & Graphs, Data Analysis, Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (1)

Junk Charts [adeptly noted and fixed](http://junkcharts.typepad.com/junk_charts/2015/10/is-it-worth-the-drama.html) this excessively stylized chart from the WSJ this week:

Their take on it does reduce the ZOMGOSH WE ARE DOOMED! look and feel of the WSJ chart:

But, we can further reduce the drama by using a more neutral color encoding _and_ encode both the # of outbreaks and total size of the impacted flock populations _per week_ with a lollipop chart (and, thankfully the USDA makes this data readily available):

library(xml2)
library(rvest)
library(dplyr)
library(stringr)
library(ggplot2)
library(scales)
library(viridis)
library(ggthemes)
 
pg <- read_html("https://www.aphis.usda.gov/wps/portal/aphis/ourfocus/animalhealth/sa_animal_disease_information/sa_avian_health/sa_detections_by_states/ct_ai_pacific_flyway/!ut/p/a1/lVNNb-IwEP0tPewx2OSbI_QDwkdBRbuQXKyJ4yTWJnYUG1D-fZ10D7RqadcHS555M_PmPRkl6IgSAWdegOZSQNW_E58stwt7PMN2NN9PHnH0_OdpE64DZ7twDSA2APzFmeL39dtV5Pf1L3i2eBjjvYMOKEEJFbrRJYqhKbkiVArNhCYVT1tou19YAZGnluSSntTwAsFrqEjJoNLldSTjioFihItctvWwxFv6zEFc4zOmGe3TiqQdURo0M62pJsBJA5TnnJK86i7Q9fwayjMU57ZPAezccnwvtdwM21aah9hcGHtuCP6Y5v_0uLHwj_S8n08XbrA2CrqhjaMHUxxMNhhH_nf1g57fdBgAtyz7MGT-ODNDnta7YLW08cpDsSEZfMli4qL9f6q2_IEsdru53xSmLejS6g1Gx5vGv6WvjB8CnxmPjp8af5ihxJNBpIqeX1HJdPgQ8VSkTmiItCxnLWtHpVQaHS-Xy-ikMhgV8oya-ncdOh23_r6E2PGqYrerD9O7u1eBlNG5/?1dmy&urile=wcm%3apath%3a%2Faphis_content_library%2Fsa_our_focus%2Fsa_animal_health%2Fsa_animal_disease_information%2Fsa_avian_health%2Fsa_detections_by_states%2Fct_ai_full_list")
 
dat <- html_table(html_nodes(pg, "table"))[[1]]
 
dat %>% 
  mutate(`Confirmation date` = as.Date(`Confirmation date`, "%b %d, %Y"),
         week = format(`Confirmation date`, "%Y-%U"),
         week_start = as.Date(sprintf("%s-1", week), "%Y-%U-%u") ,
         `Flock size` = as.numeric(str_replace_all(`Flock size`, ",", ""))) %>% 
  select(week, week_start, `Flock size`) %>% 
  filter(!is.na(`Flock size`)) %>% 
  group_by(week_start) %>% 
  summarize(outbreaks=n(), 
            flock_total=sum(`Flock size`)) -> dat
 
first <- dat[2,]
last <- tail(dat, 1)
 
gg <- ggplot(dat, aes(x=week_start, y=outbreaks))
gg <- gg + geom_vline(xintercept=as.numeric(first$week_start), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=first, aes(x=week_start, y=25), label=" First detection in 2015", hjust=0, size=3, color="#7f7f7f")
gg <- gg + geom_vline(xintercept=as.numeric(last$week_start), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=last, aes(x=week_start, y=25), label="Last detection ", hjust=1, size=3, color="#7f7f7f")
gg <- gg + geom_segment(aes(x=week_start, xend=week_start, y=0, yend=outbreaks, color=flock_total), size=0.5)
gg <- gg + geom_point(aes(size=flock_total, fill=flock_total), shape=21)
gg <- gg + scale_size_continuous(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_color_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_fill_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_x_date(label=date_format("%b"))
gg <- gg + guides(color=guide_legend(), fill=guide_legend(), size=guide_legend())
gg <- gg + labs(x=NULL, y="# Outbreaks", title="Avian Flu Impact by Week (2015)")
gg <- gg + theme_tufte(base_family="Helvetica")
gg <- gg + theme(legend.key=element_rect(color=rgb(0,0,0,0)))
gg

If we really want to see the discrete events, we can do that with our less-ZOMGOSH color scheme, too:

dat <- html_table(html_nodes(pg, "table"))[[1]]
dat %>% 
  mutate(`Confirmation date` = as.Date(`Confirmation date`, "%b %d, %Y"),
         `Flock size` = as.numeric(str_replace_all(`Flock size`, ",", ""))) %>% 
  filter(!is.na(`Flock size`)) %>% 
  rename(date=`Confirmation date`) %>% 
  arrange(date) -> dat
 
first <- dat[2,]
last <- tail(dat, 1)
 
gg <- ggplot(dat, aes(x=date, y=`Flock size`))
gg <- gg + geom_vline(xintercept=as.numeric(first$date), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=first, aes(x=date, y=3000000), label=" First detection in 2015", hjust=0, size=3, color="#7f7f7f")
gg <- gg + geom_vline(xintercept=as.numeric(last$date), linetype="dashed", size=0.2, color="#7f7f7f")
gg <- gg + geom_text(data=last, aes(x=date, y=3000000), label="Last detection ", hjust=1, size=3, color="#7f7f7f")
gg <- gg + geom_segment(aes(x=date, xend=date, y=0, yend=`Flock size`, color=`Flock size`), size=0.5, alpha=1)
gg <- gg + scale_size_continuous(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_color_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_fill_viridis(name="Flock Impact", label=comma, guide="legend")
gg <- gg + scale_x_date(label=date_format("%b"))
gg <- gg + scale_y_continuous(label=comma)
gg <- gg + guides(color=guide_legend(), fill=guide_legend(), size=guide_legend())
gg <- gg + labs(x=NULL, y="Flock size", title="Avian Flu Impact (2015)")
gg <- gg + theme_tufte(base_family="Helvetica")
gg <- gg + theme(legend.key=element_rect(color=rgb(0,0,0,0)))
gg

Neither of those is ever going to sell any ads, tho.

Installing R on OS X – “100% Homebrew Edition”

In a previous post I provided “mouse-heavy” instructions for getting R running on your Mac. A few of the comments suggested that an “all Homebrew” solution may be preferable for some folks. Now, there are issues with this since getting “support” for what may be R issues will be very difficult on the official mailing lists as you’ll immediately be told to “use the official distribution” by some stalwart R folks (this happens on StackOverflow and other forums as well). However, if you have a thick skin and can be somewhat self-sustaining, Homebrew is a superb alternative to setting up your R environment (and other things) on your OS X system.

What is ‘Homebrew’?

Homebrew is the “missing package manager for OS X”. It’s similar to apt, yum and other package managers on linux/BSD that enable you to install open source (and other types of) packages without having to do the download→unarchive→compile→curse→google→compile→curse→google some more→compile→smile→test→install dance manually. MacPorts is another third-party package manager for OS X, but I use Homebrew, so you get Homebrew examples here.

Homebrew’s inventory of packages comes from github repositories that contain “Formulas” for where to get package components and how to (literally) make them work on OS X. Here’s a small-ish example (some Formula are fairly long/involved/complex) of a Homebrew Formula for the cowsay utility (what, you don’t like to have ASCII animals give you handy messages?):

class Cowsay < Formula
  desc "Configurable talking characters in ASCII art"
  homepage "https://web.archive.org/web/20120225123719/http://www.nog.net/~tony/warez/cowsay.shtml"
  url "http://ftp.acc.umu.se/mirror/cdimage/snapshot/Debian/pool/main/c/cowsay/cowsay_3.03.orig.tar.gz"
  sha256 "0b8672a7ac2b51183780db72618b42af8ec1ce02f6c05fe612510b650540b2af"

  bottle do
    cellar :any_skip_relocation
    revision 1
    sha256 "c041ce7fbf41fd89bf620ae848e3b36fe1e69ab3e2dfca18bc2f2e79cfe8063a" => :el_capitan
    sha256 "ffacfb987481394174267fd987dea52607825e3542d1ea3d0b7aa4ccf7ea5cc5" => :yosemite
    sha256 "12c41b969af30817a4dc7ec25572fe1b707b9d4dcb46d8cc06d22264594219c1" => :mavericks
  end

  # Official download is 404:
  # url "http://www.nog.net/~tony/warez/cowsay-3.03.tar.gz"

  def install
    system "/bin/sh", "install.sh", prefix
    mv prefix/"man", share
  end

  test do
    output = shell_output("#{bin}/cowsay moo")
    assert output.include?("moo")  # bubble
    assert output.include?("^__^") # cow
  end
end

It has:

a description of what the package is
the official location of the program/libraries “home”
where the main URL of the contents of the program/library is
weird hex strings to help the Homebrew ecosystem now pwn you
instructions for how to install (with optional patching of problematic code on particular setups)
test/validation instructions

(You can/should overlook the fact they use icky Ruby for this whole thing.)

There are thousands of Formula in the main Homebrew repository and you can “tap” other (properly organized) GitHub repositories for other (usually task-specific) formula. We’ll need to do this for R.

Finally, the Homebrew community has also come up with the notion of Casks where actual binary OS X programs (and other things) can be installed. We’ll use this as well for some ancillary components.

Apart from the ease of initial setup, the truly compelling part of using Homebrew is that all it takes to update components/libraries is to do a:

brew update && brew upgrade

from a Terminal prompt. You should get into the habit of issuing those commands daily-ish.

Yes, you will need to become comfortable in the Terminal (or, preferably, iTerm 2) to use the Homebrew ecosystem (though there are some efforts to make this more GUI-user friendly).

Using Homebrew to Create & Maintain Your R Installation

I won’t provide much (if any) color commentary to the commands below. Suffice it to say that in a few short lines of a script, you’ll end up having:

R (with gfortran and the vast majority of required support libraries for many packages)
Oracle Java (a later step in the sequence ensures R knows about your Java install)
X11 (XQuartz)
MacTex
RStudio
extra SVG, XML, curl, geo-munging and C++ support libraries
A cool font for RStudio (FiraCode, though that’s not necessary)
iTerm 2 (optional)
GitUp git gui (optional)

If it’s your first time using Homebrew you’ll need to do this:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

NOTE that I’m generally against piping output of curl to run raw on your system without validation, so you can download https://raw.githubusercontent.com/Homebrew/install/master/install and verify it (or ask a security friend to verify it), but you’ll be trusting the Homebrew ecosystem to not pwn your system for the rest of your time using it, so as long as you trust that I gave you the proper URL to get to the installer, cut/paste away.

Once Homebrew is setup I’d recommend copying and pasting each line (below) one-by-one to get familiar with Homebrew operations and messaging.

This can be a pretty scary experience if you’re not used to the command-line, so drop a note in the comments or on Twitter (target @hrbrmstr and use the #rstats #homebrew tags) if some things go awry or get confusing and I’ll try to help as much as I can.

brew tap caskroom/cask
brew install brew-cask

brew install Caskroom/cask/xquartz
brew cask install java

brew tap homebrew/science
brew install R --with-openblas # added --with-openblas based on a spiffy comment by Lincoln Mullen

brew install Caskroom/cask/rstudio


# For latex:
brew cask install mactex

# OR YOU CAN DO
brew install gnupg
brew cask install basictex # suggested by @noamross
# plus the following, provided by Kras.Pten
# read why here: http://rud.is/b/2015/10/22/installing-r-on-os-x-100-homebrew-edition/#comment-10603
sudo tlmgr update –-self 
sudo tlmgr update –-all 
sudo tlmgr install titling framed inconsolata

# DO NOT DO BOTH!

brew install libsvg curl libxml2 gdal geos boost

R CMD javareconf JAVA_CPPFLAGS=-I/System/Library/Frameworks/JavaVM.framework/Headers

brew tap caskroom/fonts
brew cask install font-fira-code
brew cask install iterm2
brew cask install gitup           # if you want a GUI for git stuff (h/t @jennybryan)

That’s quite a bit less clicking/typing from what was required in the previous post.

Fin

I validated that entire configuration on a completely fresh installation of El Capitan (OS X 10.11) in a VM. At the end, I had a fully-functioning data-science setup. Hopefully, you will as well.

If you have suggestions for other Homebrew things to add to make a good “data science OS X setup”, drop a note in the comments!

P.S.

Once you have a full Homebrew & “Cask” setup, the way to keep up-to-date with everything is more like:

brew update && brew upgrade brew-cask && brew cleanup && brew cask cleanup

but an complete “how to use Homebrew” guide is beyond the scope of this post.

Installing R on OS X

NOTE: The comments are a must read for this. Some excellent additional advice and “gotchas” by some super-helpful readers.

I was in a conversation with an academic colleague (wicked smart dude) and the subject of installing R came up (NOTE: this will happen to you, too, if you ever have the misfortune to have a face-to-face convo with me ;-). They noted that getting up and running with R was not as seamless as one would like it to be and, to be honest, I have to agree, especially after typing the rest of this post out.

I recently had a similar experience helping folks who use Windows get R & RStudio up and running and that’s even more of a nightmare, especially if you do not have Administrator privileges (or, perhaps I just scare easily).

Prior to these experiences, I never really stopped to consider just how less friendly the installation process of R is when compared to Excel, Tableau or other apps one might use for data analysis and visualization. Hopefully this will becomre a top priority for the R Consortium.

Since this colleague uses OS X, I offered to put together instructions for how to get R & RStudio installed and finally had 5 minutes to crank out a blog post to help the broader community with the information.

Get R

Go to https://cran.rstudio.com/bin/macosx/
If you’re on a recent version of OS X, download the first .pkg link. As of this post, it’s R-3.2.2.pkg and the direct URL is https://cran.rstudio.com/bin/macosx/R-3.2.2.pkg
Install the package by double-clicking on it and working through the prompts.

Verify R itself is working

Look in the Applications folder for the R application.
Double-click it and you should see an R console window.
If that did not work, try installing R again
Once you’ve verified R is working, quit the app

Download RStudio

RStudio is an integrated development environment for R that will make your life and coding easier.

Go to https://www.rstudio.com/products/rstudio/download/ and download the RStudio version for Mac OS X. It’ll say something like RStudio 0.99.486 - Mac OS X 10.6+ (64-bit). The direct URL for that verison (which was current at the time of this post) is https://download1.rstudio.org/RStudio-0.99.486.dmg
Double-click that DMG file
Drag the RStudio icon to the Applications folder.

Verify RStudio & R are working together

Look in the Applications folder for the RStudio application.
Double-click it and you should see an RStudio window with four panes.

From now on, just start RStudio when you want to work in R.

[Optional] Install XQuartz

Some functions in R require an “X11 Server” and/or libraries associated with an X11 server. Apple does not provide this software with OS X anymore so you have to do it on your own via a third-party application called XQuartz.

Go to http://xquartz.macosforge.org/landing/ and download the top-most “quick download” disk image file. As of this post, that’s version 2.7.8 and this is the direct link http://xquartz.macosforge.org/downloads/SL/XQuartz-2.7.8.dmg
Double click the downloaded disk image then double click on the XQuartz.pkg and follow the installation steps.

[Optional] Install Xcode Tools

Some R packages require compilation. That requires utilities not installed on OS X by default. You can wait to do the following until it’s needed, but since you’re already installing things…

Get Xcode https://itunes.apple.com/us/app/xcode/id497799835?mt=12 and install it like any “normal” Mac application
When the intallation is done, open Xcode then close it just to verify it installed correctly
Find and open the Terminal program in the Utilities folder under the Applications folder
Paste the following into the Termainal window and hit enter/return (accept any dialog/prompt):

xcode-select --install`

Close the Terminal application

[Optional] Set yourself up for easier future compiled package installation

Some R packages need additional libraries to work and most aren’t on your system by default. There are a myriad of ways to get these libraries, and the way I obtain them is via the homebrew utility. You can save yourself the trouble of installing homebrew later by doing the following now:

Find and open the Terminal program in the Utilities folder under the Applications folder
Paste the following into the Terminal window and hit enter/return:

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Read and accept the various prompts until it’s installed
Close the Terminal application

You can now do brew install xyz in the future when a library is needed to support a package. Drop a note in the comments if you’d like this discussed more in a future blog post.

[Optional] [If you have an hour+ to kill] Install MacTeX

R has an academic history and there are many semi-advanced functions that are tied to something called latex. Installing latex for OS X is not hard, just time (and bandwidth) consuming (it’s about the same size as a new OS X installer). If you delve into package creation or do more detailed output work in R, you’ll want to install MacTex sooner than later.

Go to https://tug.org/mactex/mactex-download.html and download the latest MacTeX.pkg file. This direct link should always work (until they change up the way their site works) http://tug.org/cgi-bin/mactex-download/MacTeX.pkg
Double-click the pkg installer and follow the prompts. The defaults should all be good enough for now.

Fin

If you have any changes/additions/etc drop a note in the comments. I may even stick this on github to make it easier to contribute in the future.

Replicating NatGeo’s “Proper” Earthquake Map in R

2015-10-04 – 20:01
Posted in cartography, Data Visualization, DataVis, DataViz, ggplot, gis, maps, R
Tagged post
Comments (22)

I saw this post over at NatGeo over the weekend and felt compelled to replicate this:

with ggplot2.

Three shapefiles later and we have it close enough to toss into a post (and I really don’t believe the continent names are necessary).

library(rgdal)
library(ggplot2)
library(ggthemes)
library(ggalt) # devtools::install_github("hrbrmstr/ggalt")

# grab these from http://rud.is/dl/quakefiles.tgz

world <- readOGR("countries.geo.json", "OGRGeoJSON", stringsAsFactors=FALSE)
plates <- readOGR("plates.json", "OGRGeoJSON", stringsAsFactors=FALSE)
quakes <- readOGR("quakes.json", "OGRGeoJSON", stringsAsFactors=FALSE)

world_map <- fortify(world)
plates_map <- fortify(plates)
quakes_dat <- data.frame(quakes)
quakes_dat$trans <- quakes_dat$mag %% 5

gg <- ggplot()
gg <- gg + geom_cartogram(data=world_map, map=world_map,
                          aes(x=long, y=lat, map_id=id),
                          color="white", size=0.15, fill="#d8d8d6")
gg <- gg + geom_cartogram(data=plates_map, map=plates_map,
                          aes(x=long, y=lat, map_id=id),
                          color="black", size=0.1, fill="#00000000", alpha=0)
gg <- gg + geom_point(data=quakes_dat,
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1/3, color="#d47e5d", fill="#00000000")
gg <- gg + geom_point(data=subset(quakes_dat, mag>7.5),
                      aes(x=coords.x1, y=coords.x2, size=trans),
                      shape=1, alpha=1, color="black", fill="#00000000")
gg <- gg + geom_text(data=subset(quakes_dat, mag>7.5),
                     aes(x=coords.x1, y=coords.x2, label=sprintf("Mag %2.1f", mag)),
                     color="black", size=3, vjust=c(3.9, 3.9, 5), fontface="bold")
gg <- gg + scale_size(name="Magnitude", trans="exp", labels=c(5:8), range=c(1, 20))
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(legend.position=c(0.05, 0.99))
gg <- gg + theme(legend.direction="horizontal")
gg <- gg + theme(legend.key=element_rect(color="#00000000"))
gg

I can only imagine how many mouse clicks that would be in a GIS program.

Addendum

This comment has links to where I acquired the shapefiles (good q by Michael)
Thanks to this comment there’s now code in a gist that makes a map re-cenetered on the Pacific

For Some Definition of “Directly” and/or “Contort”

2015-09-30 – 09:21
Posted in Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (5)

Junk Charts did a post on [Don’t pick your tool before having your design](http://junkcharts.typepad.com/junk_charts/2015/09/dont-pick-your-tool-before-having-your-design.html) and made a claim that this:

_”cannot be produced directly from a tool (without contorting your body in various painful locations)”_.

I beg to differ.

With R & ggplot2, I get to both pick my tool and design at the same time since I have a very flexible and multi-purpose tool. I also don’t believe that the code below qualifies as “contortions”, though I am a ggplot2 fanboi. It’s no different than Excel folks clicking on radio buttons and color pickers, except my process is easily repeatable & scalable once finalized (this is not finalized as it’s not 100% parameterized but it’s not difficult to do that last part).

library(ggplot2)
 
dat <- data.frame(year=2010:2015,
                  penalties=c(627, 625, 653, 617, 661, 730))
 
avg <- data.frame(val=mean(head(dat$penalties, -1)),
                  last=dat$penalties[6],
                  lab="5-Yr\nAvg")
 
gg <- ggplot(dat, aes(x=year, y=penalties))
gg <- gg + geom_point()
gg <- gg + scale_x_continuous(breaks=c(2010, 2014, 2015), limits=c(NA, 2015.1))
gg <- gg + scale_y_continuous(breaks=c(600, 650, 700, 750), 
                              limits=c(599, 751), expand=c(0,0))
gg <- gg + geom_segment(data=avg, aes(x=2010, xend=2015, y=val, yend=val), linetype="dashed")
gg <- gg + geom_segment(data=avg, aes(x=2015, xend=2015, y=val, yend=last), color="steelblue")
gg <- gg + geom_point(data=avg, aes(x=2015, y=val), shape=4, size=3)
gg <- gg + geom_text(data=avg, aes(x=2015, y=val), label="5-Yr\nAvg", size=2.5, hjust=-0.3)
gg <- gg + geom_point(data=avg, aes(x=2015, y=700), shape=17, col="steelblue")
gg <- gg + geom_point(data=avg, aes(x=2015, y=730), shape=4, size=3)
gg <- gg + labs(x=NULL, y="Number of Penalties", 
                title="NFL Penalties Jumped 15% in the\nFirst 3 Weeks of the 2015 Season\n")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.grid.major.x=element_blank())
gg <- gg + theme(panel.grid.major.y=element_line(color="white"))
gg <- gg + theme(panel.background=element_rect(fill="#f3f2f7"))
gg <- gg + theme(axis.ticks=element_blank())
gg

A Package Full o’ Pirates & Makin’ Interactive Pirate Maps in arrrrrRstats

2015-09-19 – 14:34
Posted in cartography, DataVis, DataViz, Humor, maps, R, TLAPD
Tagged post
Comments (4)

Avast, me hearties! It’s time four t’ annual International Talk Like a Pirate Day #rstats post!

(OK, I won’t make you suffer continuous pirate-speak for the entire post)

I tried to be a bit more practical this year and have two treasuRe chests for you to (hopefully) enjoy.

A Package Full o’ Pirates

I’ve covered the Anti-shipping Activity Messages (ASAM) Database before for TLAPD before but getting, updating and working with the data has more pain points than it should, so I wrapped a small package around it.

Here’s how to get all pirate attacks this year (2015) so far:

# devtools::install_github("hrbrmstr/asam")
library(asam)
 
data(asam_shp)
pirates <- subset(asam_shp,
                  grepl("pirate", Aggressor, ignore.case=TRUE) &
                  format(DateOfOcc, "%Y") == "2015")
 
nrow(pirates)
## [1] 78

It looks like there have been 78 registered pirate attacks this year. The National Geospatial Intelligence Agency (NGIA) marks the attacks by lat/lon and also by region/subregion, and I managed to obtain the official polygons for these regions, so we can plot these attacks on a world map and also show the subregions:

library(ggplot2)
 
# get the ASAM subregion polygons
subregions <- asam_subregions()
subregions_map <- fortify(subregions)
 
# get the world map
world <- map_data("world")
 
# get the points for the pirate attack occurrences
pirate_pts <- data.frame(pirates)
 
gg <- ggplot()
 
# world map layer
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=region),
                    color="black", fill="#e7e7e7", size=0.15)
# ASAM regions layer
gg <- gg + geom_map(data=subregions_map, map=subregions_map,
                    aes(x=long, y=lat, map_id=id),
                    color="white", fill="white", size=0.15, alpha=0)
 
# attacks
gg <- gg + geom_point(data=pirate_pts, color="black", fill="yellow", 
                      aes(x=coords.x1, y=coords.x2), shape=21)
 
gg <- gg + xlim(-170, 170)
gg <- gg + ylim(-58, 75)
gg <- gg + coord_map("mollweide")
gg <- gg + theme_map()
gg <- gg + theme(panel.background=element_rect(fill="steelblue"))
gg

README-map-1

There is quite a bit more data than just location, though and we can work with it much better in an interactive map.

Makin’ Interactive Pirate Maps

Now, what makes the following an interactive pirate map is not so much the fact that we’ll be plotting points of pirate attacks on a Leaflet map, but we’ll also be using a pirate treasure map theme on the Leaflet map.

Let’s start with showing how to use a general pirate map theme before adding in the ASAM data.

You’ll have to pause here and head on over to MapBox to register for a (free) account. You’ll need to go through the gyrations to eventually get a public token and mapbox id to use the pirate map tiles they have. I store those in my .Renviron so I don’t have to cut/paste inane strings when I need to use this, or other, APIs or need to keep them from prying eyes. Since MapBox exposes these strings in GET call URLs, the use of environment variables is strictly for convenience in this case.

library(leaflet)
 
mapbox_public_token <- Sys.getenv("MAPBOX_PUBLIC_TOKEN")
mapbox_map_id <- Sys.getenv("PIRATE_MAP_ID")
mapbox_url <- "https://a.tiles.mapbox.com/v4/%s/{z}/{x}/{y}.png?access_token=%s"
mapbox_tiles_template <- sprintf(mapbox_url, mapbox_map_id, mapbox_public_token)

Now, what good is a pirate map without an ‘X’ marking the spot for some treasure. For that we’ll need an ‘X’:

in a format we can use with Leaflet:

x_marker <- icons("http://rud.is/dl/x.png",
                  iconHeight=64, iconWidth=64,
                  iconAnchorX=32, iconAnchorY=32)

Now, we can display a pirate map for all scurvy dogs to see:

leaflet() %>%
  addTiles(mapbox_tiles_template) %>%
  setView(lng=-50.9249, lat=45.68929, zoom=3) %>%
  addMarkers(-70.2667, 43.6667, icon=x_marker)

NOTE: I have not buried treasure in Portland, Maine, but go nuts digging at that location if you still want to.

Pirates on Pirate Maps

We can make a [crude] interactive ASAM browser by combining our data from above with our new, pirate-y mapping capabilities:

library(asam)
library(sp)
library(dplyr)
library(leaflet)
 
data(asam_shp)
dat <- subset(asam_shp,
              DateOfOcc > as.Date("2015-01-01") &
                grepl("pirate", Aggressor, ignore.case=TRUE))
# could also do data.frame(dat)
dat <- bind_cols(dat@data, data.frame(coordinates(dat), stringsAsFactors=FALSE))

We’ll build a popup with the ASAM incident description fields and add it and the pirate incident points to a pirate-themed Leaflet map:

popup_template <- '<div style="background:#f3e0b5; padding:10px"><b>Date:</b> %s
<span style="float:right"><a target="_blank" href="https://msi.nga.mil/NGAPortal/msi/query_results.jsp?MSI_queryType=ASAM&amp;MSI_generalFilterType=SpecificNumber&amp;MSI_generalFilterValue=%s&amp;MSI_additionalFilterType1=None&amp;MSI_additionalFilterType2=-999&amp;MSI_additionalFilterValue1=-999&amp;MSI_additionalFilterValue2=-999&amp;MSI_outputOptionType1=SortBy&amp;MSI_outputOptionType2=-999&amp;MSI_outputOptionValue1=Date_DESC&amp;MSI_outputOptionValue2=-999&amp;MSI_MAP=-999">ASAM Record</a>
</span><br/>
<b>Victim:</b> %s<br/>
<b>Description:</b> %s</div>'
 
nona <- function(x) ifelse(is.na(x), " ", x)
 
pirate_pops <- sprintf(popup_template,
                       dat$date,
                       gsub("-", "_", dat$Reference),
                       dat$Victim,
                       paste0(nona(dat$Descript),
                              nona(dat$Desc1), nona(dat$Desc2), nona(dat$Desc3),
                              nona(dat$Desc4), nona(dat$Desc5), nona(dat$Desc6),
                              sep=" "))
 
mapbox_public_token <- Sys.getenv("MAPBOX_PUBLIC_TOKEN")
mapbox_map_id <- Sys.getenv("PIRATE_MAP_ID")
mapbox_url <- "https://a.tiles.mapbox.com/v4/%s/{z}/{x}/{y}.png?access_token=%s"
mapbox_tiles_template <- sprintf(mapbox_url, mapbox_map_id, mapbox_public_token)
 
leaflet() %>%
  addTiles(mapbox_tiles_template) %>%
  setView(lng=-50.9249, lat=45.68929, zoom=3) %>%
  addCircles(dat$coords.x1, dat$coords.x2, radius=300,
             color="#664c1f", popup=pirate_pops)

Select any of the circle marks and you’ll get a popup with a description and link to the official ASAM record (like this):

tlapd2015012_html

Fin

I’m not sure when I’ll get back to the asam package, but it could use some attention. The Aggressor field could be auto-cleaned to make it more usable and a dplyr-esque interface could be developed to select incidents. Also, since it includes a shapefile of subregions, that could also be used to do more spatial-oriented analyses of the incidents. It’s all there for any pirate lackey to pilfer.

Drop a note in the comments if you have any of your own pirate-y creations or an issue on github for feature requests & bug reports.

A Better Way To Read Nest Data Into R

2015-09-10 – 09:05
Posted in APIs, Authentication, OAuth, R, REST
Tagged API, post
Comments (1)

I glimpsed a [post](http://blagrants.blogspot.com/2015/09/nest-thermostat-and-r-creating-shiny.html) in the RSS feeds today on how to connect Nest data with a Shiny dashboard and was compelled to post a less brute-force way to get data from the Nest API. The authors of the Shiny+Nest post used `system` calls to `curl` and regular expression character vector operations to slice, dice & work with the Nest API/data. However, we have packages like `RCurl`, `curl` and `httr` that all make `system` calls unnecessary. Here’s how to use the Nest API in a more R-like (or `httr`-like) fashion.

### Nest & OAuth 2.0

Nest uses OAuth 2.0 for authentication. As the authors of the other post pointed out, you’ll need to go to `https://developer.nest.com/clients` (create an account if you don’t already have one) to setup a new client, making sure to leave the `OAuth Redirect URL` field _blank_. Copy the _Client ID_ for use later and store the _Client secret_ in your `.Renviron` file as `NEST_CONSUMER_SECRET=…` (restart any open R sessions so R will re-read the `.Renviron` file).

Once your API client information is setup, you’ll need a way of working with it. We first create a Nest OAuth endpoint that has the core URLs that `httr` will use to help authorize the client with. Unfortunately, not all OAuth 2.0 endpoints work the same way. For the Nest API an initial `state` parameter is required even if using PIN-based authentication (which we are in this example since Nest doesn’t honor a dynamic callback URL as far as I can tell). This is how top setup the `httr` endpoint.

library(httr)
library(jsonlite)
 
nest <- oauth_endpoint(
  request=NULL,
  authorize="https://home.nest.com/login/oauth2?state=login",
  access="https://api.home.nest.com/oauth2/access_token"
)

Now, we need to setup the “app”. This is more of an “R” need than an “OAuth” need. Use the _Client ID_ you copied earlier. `httr` will pull the secret from the environment variable you created.

nest_app <- oauth_app("nest", key="a8bf6e0c-89a0-40ae-869a-943e928316f5")

With that out of the way, now we authorize the client. Because we’re using PIN-based authentication, the user will have to cut/paste the URL displayed in the R Console into a browser then cut/paste the PIN displayed in the browser back into the R Console. With `cache=TRUE` this will be a one-time event.

nest_token <- oauth2.0_token(nest, nest_app, use_oob=TRUE, cache=TRUE)

We can now use our shiny new token to make Nest API calls.

### Using the Nest API

Since this is not a detailed introduction to the Nest API in general, you may want to take the time to read [their documentation](https://developer.nest.com/documentation/topics/api-guides-and-reference). Here’s how to get a list of all the devices for the account and then read the data from the first thermostat. I’m gaming this a bit since I only have one Nest device and it is a thermostat, but you can use their simulator to play with more data.

To get all the devices in use, it’s just a call do the `devices` path. Again, not all OAuth 2.0 APIs work the same way, so instead of embedding the access token into the http request headers, you need to specify it in the query parameters:

req <- GET("https://developer-api.nest.com",
           path="devices",
           query=list(auth=nest_token$credentials$access_token))
stop_for_status(req)
devices <- fromJSON(content(req, as=text))

Now, `devices` contains a very ugly list (most JSON APIs return really ugly responses) but we can easily get the ID of the first thermometer (which we’ll need to use in the next API call) by doing:

first_thermostat <- names(devices$thermostats)[1]

Use `str(devices)` to see what else is there for use.

Now, to get the data from thermostat, all you have to do is:

req <- GET("https://developer-api.nest.com/",
           path=sprintf("devices/thermostats/%s", first_thermostat),
           query=list(auth=nest_token$credentials$access_token))
stop_for_status(req)
thermo <- data.frame(fromJSON(content(req, as="text")),
                     stringsAsFactors=FALSE)

And, you can display the current temperature/humidity via:

cat(thermo$ambient_temperature_f, "F / ", thermo$humidity, "%", sep="")

### Fin

Ideally, one would wrap this into a package (which I _may_ do but feel free to take this code and make one before I get the cycles to get to it) and add more error checking and convenience functions for working with the data. But, you can now adapt the [Nest+Shiny dashboard](http://blagrants.blogspot.com/2015/09/nest-thermostat-and-r-creating-shiny.html) post code to use proper API calls and data structures vs `system(“curl…”)` and `gsub()`.

A contiguous version of the above code is in [this gist](https://gist.github.com/hrbrmstr/2da2312170a84340c14f).

Roll Your Own Stats and Geoms in ggplot2 (Part 1: Splines!)

2015-09-08 – 19:06
Posted in Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (4)

A huge change is coming to ggplot2 and you can get a preview of it over at Hadley’s github repo. I’ve been keenly interested in this as I will be fixing, finishing & porting coord_proj to it once it’s done.

Hadley & Winston have re-built ggplot2 with an entirely new object-oriented system called ggproto. With ggproto it’s now possible to easily extend ggplot2 from within your own packages (since source() is so last century), often times with very little effort.

Before attempting to port coord_proj I wanted to work through adding a Geom and Stat since thought it would be cool to be able to have interpolated line charts (and it helps answer some recurring StackOverflow “spline”/ggplot2 questions) and also prefer KernSmooth::bkde over the built-in density function (which geom_density and stat_density both use).

To that end, I’ve made a new github-installable package called ggalt (h/t to @jayjacobs for the better package name than I originally came up with) where I’ll be adding new Geoms, Stats, Coords (et al) as I craft them. For now, let me introduce both geom_xspline() and geom_bkde() to show how easy it is to incorporate new functionality into ggplot2.

While not a requirement, I think it’s a going to be a good idea to make both a paired Geom and Stat when adding those types of functionality to ggplot2. I found it easier to work with custom parameters this way and it also makes it feel a bit more like the way ggplot2 itself works. For the interpolated line geom/stat I used R’s graphics::xpsline function. Here’s all it took to give ggplot2 lines some curves (you can find the commented version on github):

geom_xspline <- function(mapping = NULL, data = NULL, stat = "xspline",
                      position = "identity", show.legend = NA,
                      inherit.aes = TRUE, na.rm = TRUE,
                      spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    geom = GeomXspline,
    mapping = mapping,
    data = data,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  ...)
  )
}

GeomXspline <- ggproto("GeomXspline", GeomLine,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA)
)

stat_xspline <- function(mapping = NULL, data = NULL, geom = "line",
                     position = "identity", show.legend = NA, inherit.aes = TRUE,
                     spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    stat = StatXspline,
    data = data,
    mapping = mapping,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  ...
    )
  )
}

StatXspline <- ggproto("StatXspline", Stat,

  required_aes = c("x", "y"),

  compute_group = function(self, data, scales, params,
                           spline_shape=-0.25, open=TRUE, rep_ends=TRUE) {
    tf <- tempfile(fileext=".png")
    png(tf)
    plot.new()
    tmp <- xspline(data$x, data$y, spline_shape, open, rep_ends, draw=FALSE, NA, NA)
    invisible(dev.off())
    unlink(tf)

    data.frame(x=tmp$x, y=tmp$y)
  }
)

If that seems like alot of code, it really isn't. What we have there are:

two functions that handle the Geom aspects &
two functions that handle the Stat aspects.

Let's look at the Stat functions first, though you can also just read the handy vignette, too.

Adding `Stat`s

In this particular case, we have it easy. We get to use geom_line/GeomLine as the base geom_ for the layer since all we're doing is generating more points for it to draw line segments between. We create the creative interface to our new Stat with stat_xspline add three new parameters with default values:

spline_shape
open
rep_ends

"Added three new parameters to what?" you ask? GeomLine/geom_line default to StatIdentity/stat_identity and if you look at the source code, that Stat just returns the data back in the form it came in. We're going to take these three new parameters and pass them to xspline and then return entirely new values back for ggplot2/grid to draw for us, so we tell it to call our new computation engine by giving it the StatXspline value to the layer. By using GeomLine/geom_line as the geom parameter, all we have to do is ensure we pass back the proper values. We do that in compute_group since ggplot2 will segment the incoming data into groups (via the group aesthetic) for us. We take each group and run them through the xspline with the parameters the user specified. If I didn't have to use the hack to work around what seems to be errant plot device issues in xspline, the call would be one line.

Adding `Geom`s

We pair up the Stat with a very basic Geom "shim" so we can use them interchangeably. It's the same idiom, an "object" function and the user-callable function. In this case, it's super-lightweight since we're really having geom_line do all the work for us. In a [very] future post, I'll cover more complex Geoms that require use of the underlying grid graphics system, but I suspect most of your own additions may be able to use the lightweight idiom here (and that's covered in the vignette).

Putting Our New Functions To Work

With our new additions to ggplot2, we can compare the output of geom_smooth to geom_xspline with some test data:

set.seed(1492)
dat <- data.frame(x=c(1:10, 1:10, 1:10),
                  y=c(sample(15:30, 10), 2*sample(15:30, 10), 3*sample(15:30, 10)),
                  group=factor(c(rep(1, 10), rep(2, 10), rep(3, 10)))
)

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(size=0.5)

README-unnamed-chunk-4-3

The github page has more examples for the function, but you don't have to be envious of the smooth D3 curves any more.

I realize this particular addition is not extremely helpful/beneficial, but the next one is. We'll look at adding a new/more accurate density Stat/Geom in the next installment and then discuss the "on-steroids" roxygen2 comments you'll end up using for your creations in part 3.