New Gig!

Well, the proverbial cat is definitely out of the bag now. I’m moving on from the current gig to take a security data scientist position at Verizon Enterprise. The esteemed Wade Baker will be my new benevolent overlord and it probably isn’t a shocker that I went to the place my co-author works.

Wade’s got an awesome team and I’m excited to start contributing. I’ll definitely miss my evil (and, not-so-evil) minions from the current-but-soon-to-be-former gig, but they’ll continue doing EPIC risk work and security analytics in my absence.

Also, I’m staying put in Maine (apart from what I suspect will be a boatload of travel), so fret not Seacoasters, many a night at 7th Settlement will continue to be had!

Rforecastio 1.2.0 Bug-fix Update

I’m not even going to put an R category on this, since I don’t want to pollute R-bloggers with such a tiny post. I had to give folks the option to specify ssl.verifypeer=FALSE (so I made it a generic option for passing in any CURL parameters), and there were a couple of gaping bugs that I missed because I didn’t clear out my environment before building & testing.

Rforecastio Package Update (1.1.0)

I’ve bumped up the version number of Rforecastio (github) to 1.1.0. The new features are:

  • removing the SSL certificate bypass check (it doesn’t need it anymore)
  • using plyr for easier conversion of JSON->data frame
  • adding in a new daily forecast data frame
  • roxygen2 inline documentation

library(Rforecastio)
library(ggplot2)

# NEVER put API keys in revision control systems or source code!
fio.api.key <- readLines("~/")
my.latitude <- "43.2673"
my.longitude <- "-70.8618"
fio.list <- fio.forecast(fio.api.key, my.latitude, my.longitude)

# hourly readings: humidity (green), temperature (red), dew point (blue)
# ("hourly" is an assumed name, chosen to match "daily" below)
hourly <- ggplot(data=fio.list$hourly.df, aes(x=time, y=temperature)) +
  labs(y="Readings", x="Time", title="Hourly Readings") +
  geom_line(aes(y=humidity*100), color="green") +
  geom_line(aes(y=temperature), color="red") +
  geom_line(aes(y=dewPoint), color="blue") +
  theme_bw()

# daily readings: max temperature is solid red, min temperature is dashed red
daily <- ggplot(data=fio.list$daily.df, aes(x=time, y=temperature)) +
  labs(y="Readings", x="Time", title="Daily Readings") +
  geom_line(aes(y=humidity*100), color="green") +
  geom_line(aes(y=temperatureMax), color="red") +
  geom_line(aes(y=temperatureMin), color="red", linetype=2) +
  geom_line(aes(y=dewPoint), color="blue") +
  theme_bw()


Moving From system() calls to Rcpp Interfaces

Over on the Data Driven Security Blog there’s a post on how to use Rcpp to interface with an external library (in this case ldns for DNS lookups). It builds on another post which uses system() to make a call to dig to lookup DNS TXT records.

The core code is below and at both the aforementioned blog post and this gist. The post walks you through creating a simple interface, and a future post will cover how to build a full package interface to an external library.
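For readers who haven't seen the system() approach that the earlier post relies on, here is a minimal sketch of shelling out to dig for TXT records. The helper names (dig_txt_cmd, clean_txt, txt_lookup) are hypothetical and not taken from either post, and the sketch assumes the dig binary is on your PATH.

```r
# Build the dig command line for a TXT lookup.
# shQuote() guards the domain against shell interpretation.
dig_txt_cmd <- function(domain) {
  sprintf("dig +short TXT %s", shQuote(domain, type="sh"))
}

# dig wraps TXT strings in double quotes; strip the outer pair on each line
clean_txt <- function(lines) {
  gsub('^"|"$', "", lines)
}

# Run dig and return the cleaned TXT records as a character vector
# (needs the dig binary and network access)
txt_lookup <- function(domain) {
  clean_txt(system(dig_txt_cmd(domain), intern=TRUE))
}

# Example call (not run here):
# txt_lookup("example.com")
```

Each call spawns a separate process and parses its text output, which is one reason a direct library interface via Rcpp can be attractive.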

Getting Fit2Tcx Working on Mac OS X (10.9.x)

Andreas Diesner’s #spiffy Fit2Tcx command-line utility is a lightweight way to convert Garmin/ANT FIT files to TCX for further processing.

On a Linux system, installing it is as simple as:

sudo add-apt-repository ppa:andreas-diesner/garminplugin
sudo apt-get update
sudo apt-get install fit2tcx

On a Mac OS X system, you’ll first need to grab the tinyxml support library from Homebrew:

brew install tinyxml

After a git clone of the Fit2Tcx repository, change the


line in to


and then do the typical ./configure && make (there is no test target).

You’ll now have a relatively small fit2tcx binary that you can move to /usr/local/bin or wherever you like command-line utilities to be put.

You can also grab the pre-compiled binary (built on OS X 10.9.2 with “latest” Xcode).

Mapping the March 2014 California Earthquake with ggmap

I had no intention to blog this, but @jayjacobs convinced me otherwise. I was curious about the recent (end of March, 2014) California earthquake “storm” and did a quick plot for “fun” and personal use using ggmap/ggplot.

I used data from the Southern California Earthquake Center (that I cleaned up a bit and that you can find here) but would have used the USGS quake data if the site hadn’t been down when I tried to get it from there.

The code/process isn’t exactly rocket-science, but if you’re looking for a simple way to layer some data on a “real” map (vs handling shapefiles on your own) then this is a really compact/self-contained tutorial/example.

You can find the code & data over at github as well.

There’s lots of ‘splainin in the comments (which are prbly easier to read on the github site) but drop a note in the comments or on Twitter if it needs any further explanation. The graphic is SVG, so use a proper browser :-) or run the code in R if you can’t see it here.


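library(ggplot2)   # plotting
library(ggmap)     # get_map() / ggmap()
library(plyr)      # count() and .()
library(gridExtra) # grid.arrange()

# read in cleaned up data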
dat <- read.table("quakes.dat", header=TRUE, stringsAsFactors=FALSE)
# map decimal magnitudes into an integer range
dat$m <- cut(dat$MAG, c(0:10))
# convert to dates
dat$DATE <- as.Date(dat$DATE)
# so we can re-order the data frame
dat <- dat[order(dat$DATE),]
# not 100% necessary, but get just the numeric portion of the cut factor
dat$Magnitude <- factor(as.numeric(dat$m))
# sum up by date for the barplot
dat.sum <- count(dat, .(DATE, Magnitude))
# start the ggmap bit
# It's super-handy that it understands things like "Los Angeles" #spiffy
# I like the 'toner' version. Would also use a stamen map but I can't get 
# to it consistently from behind a proxy server
la <- get_map(location="Los Angeles", zoom=10, color="bw", maptype="toner")
# get base map layer
gg <- ggmap(la) 
# add points. Note that the plot will produce warnings for all points not in the
# lat/lon range of the base map layer. Also note that i'm encoding magnitude by
# size and color and using alpha for depth. because of the way the data is sorted
# the most recent quakes in the set should be on top
gg <- gg + geom_point(data=dat,
                      mapping=aes(x=LON, y=LAT, 
                                  size=MAG, fill=m, alpha=DEPTH), shape=21, color="black")
# this takes the magnitude domain and maps it to a better range of values (IMO)
gg <- gg + scale_size_continuous(range=c(1,15))
# this bit makes the right size color ramp. i like the reversed view better for this map
gg <- gg + scale_fill_manual(values=rev(terrain.colors(length(levels(dat$Magnitude)))))
gg <- gg + ggtitle("Recent Earthquakes in CA & NV")
# no need for a legend as the bars are pretty much the legend
gg <- gg + theme(legend.position="none")
# now for the bars. we work with the summarized data frame
gg.1 <- ggplot(dat.sum, aes(x=DATE, y=freq, group=Magnitude))
# normally, i dislike stacked bar charts, but this is one time i think they work well
gg.1 <- gg.1 + geom_bar(aes(fill=Magnitude), position="stack", stat="identity")
# fancy, schmanzy color mapping again
gg.1 <- gg.1 + scale_fill_manual(values=rev(terrain.colors(length(levels(dat$Magnitude)))))
# show the data source!
gg.1 <- gg.1 + labs(x="Data from:", y="Quake Count")
gg.1 <- gg.1 + theme_bw() #stopthegray
# use grid.arrange to make the sizes work well
grid.arrange(gg, gg.1, nrow=2, ncol=1, heights=c(3,1))
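
To make the binning step in the code above concrete, here is a small sketch using made-up magnitudes (not values from the SCEC data). It shows how cut() and as.numeric() turn decimal magnitudes into integer bins.

```r
# Hypothetical magnitudes, for illustration only
mag <- c(0.5, 1.0, 2.3, 4.4)

# Breaks at 0, 1, ..., 10 give right-closed intervals: (0,1], (1,2], ...
m <- cut(mag, 0:10)

# The integer code of each factor level matches the bin's upper bound
bins <- as.numeric(m)

print(data.frame(mag, interval=as.character(m), bin=bins))
```

Because the intervals are right-closed, a magnitude of exactly 1.0 lands in bin 1, and a magnitude of exactly 0 would become NA unless include.lowest=TRUE is passed to cut().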

Guardian Words: Visualized

Andy Kirk (@visualisingdata) & Lynn Cherny (@arnicas) tweeted about the Guardian Word Count service/archive site, lamenting the lack of visualizations:

This gave me a chance to bust out another Shiny app over on our Data Driven Security shiny server:

I used my trusty “Google-Drive-spreadsheet-IMPORTHTML-to-CSV” workflow (you can access the automagically updated data here) to make the CSV that updates daily on the site and is referenced by the Shiny/R code.

The code has been gist-ified, and I’ll be re-visiting it to refactor the data.frame creation bits and add some more charts as the data set gets larger.

(Don’t forget to take a peek at our new book, Data-Driven Security!)

Using Twitter as a Data Source For Monitoring Password Dumps

I shot a quick post over at the Data Driven Security blog explaining how to separate Twitter data gathering from R code via the Ruby t (github repo) command. Using t frees R code from having to be a Twitter processor and lets the analyst focus on analysis and visualization, plus you can use t as a substitute for Twitter GUIs if you’d rather play at the command-line:

$ t timeline ddsecblog
   Monitoring Credential Dumps Plus Using Twitter As a Data Source
   Nice intro to R + stats // Data Analysis and Statistical Inference free @datacamp_com course
   Very accessible paper & cool approach to detection // Nazca: Detecting Malware Distribution in
   Large-Scale Networks
   Start of a new series by new contributing blogger @spttnnh! // @AlienVault rep db Longitudinal
   Study Part 1 :

The DDSec post shows how to mine the well-formatted output from the @dumpmon Twitter bot to visualize dump trends over time:

and has the code in-line and over at the DDSec github repo [R].
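
As a rough sketch of the R side of that split, the snippet below counts tweets per day from CSV text of the kind t can emit with its --csv option. The inline sample rows are invented, and the column names ("ID", "Posted at", "Screen name", "Text") and timestamp format are my assumption about t's CSV layout, so check them against your own output. This is not the code from the DDSec post.

```r
# Stand-in for a file produced by something like: t timeline --csv dumpmon
# (rows are made up for illustration)
csv_text <- 'ID,Posted at,Screen name,Text
1,2014-04-01 10:00:00 +0000,dumpmon,first
2,2014-04-01 12:30:00 +0000,dumpmon,second
3,2014-04-02 09:15:00 +0000,dumpmon,third'

# check.names=FALSE keeps the header names with spaces intact
tweets <- read.csv(text=csv_text, stringsAsFactors=FALSE, check.names=FALSE)

# Keep only the date part of the timestamp
tweets$day <- as.Date(substr(tweets[["Posted at"]], 1, 10))

# Tally tweets per day; Freq holds the counts
per_day <- as.data.frame(table(day=tweets$day), stringsAsFactors=FALSE)
print(per_day)
```

With a real export, replace text=csv_text with the path to the CSV file; per_day can then go straight into a plot of dump activity over time.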
