
I’ve been getting a huge uptick in views of my Slopegraphs in Python post and I think it’s due to @edwardtufte’s recent slopegraph contest announcement.

The original Python code is crufty and a mess, mostly due to intermittent attention, a desire to reduce dependencies, and hacking vs. programming. I’ve been wanting to do a D3 version for a while, so I went a bit overboard once I learned of Mr. Tufte’s challenge and made more of a “workbench” for making slopegraphs:

[Screenshot: the D3 slopegraph workbench]

It’s all in D3/HTML5/JavaScript/CSS and requires no server-side components at all.

You can play with a live, alpha-quality version and check out the rest of the components on GitHub.

It needs work, but it should be a good starting point for folks.

As my track record for “winning” things is scant, if you do end up using the code, passing on word of my upcoming book with @jayjacobs would be #spiffy :-)

It started with a local R version and migrated to a Shiny version and is now in full D3 glory.

Some downtime gave me the opportunity to start a basic D3 version of the outage map, but it needs a bit of work since it relies on a page meta refresh (every 5 minutes) to update vs. a dynamic inline-element refresh. The fam was getting a bit irked at coding time on Thanksgiving, so keep watching my gists for updates after the holiday.

Even though I really liked Origin, the performance issues associated with it were just too much to debug and I have a ton of other work to do. Back to Frank it is. Page loads are much faster and there are far fewer warnings in Google’s PageSpeed diagnostics (and the remaining ones I can live with).

I decided to forego the D3 map mentioned in the previous post in favor of a Shiny one since I had 90% of the mapping code written.

I binned the ranges into three groups, changed the color over to something more pleasant (with RColorBrewer), added an interactive table for the counties with outage and have the elements updating every minute.

You can see the Live Outage Map over at its live Shiny server. Source is below or over at GitHub if you’ve got blockers enabled.

UPDATE: A Shiny (dynamic) version of this is now available.

We had yet another power outage this morning due to the weird weather patterns of the week, and it was the final catalyst I needed to crank out some R code to map the affected counties.

Central Maine Power provides an outage portal where folks can both report outages and see areas impacted by them. They use an SAP web service that generates the outage table, and the aforelinked page just embeds that URL (http://www3.cmpco.com/OutageReports/CMP.html) in an iframe. We can use the XML package in R to grab that HTML file, parse it, extract the table and then send the data to ggplot.

It should be a good starting point for anyone wishing to do something similar. The next itch to scratch for me on this is a live D3 map that uses the outage table with drill-down capabilities to the linked data.

library(maps)
library(maptools)
library(ggplot2)
library(plyr)
library(XML)
 
cmp.url <- "http://www3.cmpco.com/OutageReports/CMP.html"
# get outage table (first one on the cmp.url page)
cmp.node <- getNodeSet(htmlParse(cmp.url),"//table")[[1]]
cmp.tab <- readHTMLTable(cmp.node,
                         header=c("subregion","total.customers","without.power"),
                         skip.rows=c(1,2,3),
                         trim=TRUE, stringsAsFactors=FALSE)
 
# clean up the table so it's easier to work with
cmp.tab <- cmp.tab[-nrow(cmp.tab),] # get rid of last row
cmp.tab$subregion <- tolower(cmp.tab$subregion)
cmp.tab$total.customers <- as.numeric(gsub(",","",cmp.tab$total.customers))
cmp.tab$without.power <- as.numeric(gsub(",","",cmp.tab$without.power))
 
# get maine map with counties
county.df <- map_data('county')
me <- subset(county.df, region=="maine")
 
# get a copy with just the affected counties
out <- subset(me, subregion %in% cmp.tab$subregion)
 
# add outage info to it
out <- join(out, cmp.tab)
 
# plot the map
gg <- ggplot(me, aes(long, lat, group=group))
gg <- gg + geom_polygon(fill=NA, colour='gray50', size=0.25)
gg <- gg + geom_polygon(data=out, aes(long, lat, group=group, fill=without.power), 
                        colour='gray50', size=0.25)
gg <- gg + scale_fill_gradient2(low="#FFFFCC", mid="#FD8D3C", high="#800026")
gg <- gg + coord_map()
gg <- gg + theme_bw()
gg <- gg + labs(x="", y="", title="CMP (Maine) Customers Without Power by County")
gg <- gg + theme(panel.border = element_blank(),
                 panel.background = element_blank(),
                 panel.grid = element_blank(),
                 axis.text = element_blank(),
                 axis.ticks = element_blank(),
                 legend.position="left",
                 legend.title=element_blank())
gg


Data Driven Security launches in February 2014. @jayjacobs & I have seen half of the book in PDF form so far, and it’s almost unbelievable that this journey is nearly over.

[Screenshot: the Data Driven Security Amazon sales rank tracker]

We set up a live Amazon “sales rank” tracker over at the book’s web site and provided some Python and JavaScript code to show folks how to use the AWS API in conjunction with the dygraphs charting library to do the same for any ISBN. In the coming weeks, we’ll have a Google App Engine component you can clone to set up something similar without the need for your own server(s).

Since @jayjacobs & I are down to the home stretch on Data Driven Security, I thought it would be interesting to do some post-writing pseudo-analyses of the book itself. I won’t have exact page or word counts for a bit, but I wanted to see how many R packages we ended up relying on for the examples in the chapters. It was fairly straightforward to run a grep for calls to library() or require() across all the source files, and I grouped the results into four categories: “analysis”, “core”, “munging” and “visualization”.

Since I <3 D3 circular dendrograms, I figured that would be a fun way to show the groupings. For those who dislike spinning your noggin around, a more traditional one is also presented. You'll need an SVG-capable browser to see the visualizations (below). Stay on the lookout for more "behind the scenes" posts.

The dendrogram labels boil down to these package groupings:

visualization: aplpack, colorspace, ggdendro, ggplot2, ggthemes, gridExtra, igraph, maps, maptools, RColorBrewer, vcd
analysis: binom, car, effects, portfolio, splines, scales, verisr, zoo, rgdal
core: devtools, stats
munging: bitops, gdata, reshape, plyr, rjson, RJSONIO

The #spiffy @dseverski gave me a posit the other day via Twitter, and I obliged shortly thereafter, but figured I’d toss a post up on the blog before heading to Strata.

To rephrase the tweet a bit, Mr. Severski asked me what alternate encoding I’d use for this grouped bar chart (larger version at the link in David’s tweet):

[Image: the 451 Group’s NoSQL LinkedIn Skills Index grouped bar chart]

I have almost as much disdain for grouped bar charts as I do for pie or donut charts, so I appreciated the opportunity to try a makeover. However, I ran into an immediate problem: the usually #spiffy 451 Group folks did not include the raw data. So, I reverse engineered the graph with WebPlotDigitizer, cleaned up the result and made a CSV from it. Then, I headed to RStudio with a plan in mind.

The old chart and data screamed faceted dot plot. The only trick necessary was to manually order the factor levels.

library(ggplot2)
 
# read in the CSV file
nosql.df <- read.csv("nosql.csv", header=TRUE)
# manually order facets
nosql.df$Database <- factor(nosql.df$Database,
                            levels=c("MongoDB","Cassandra","Redis","HBase","CouchDB",
                                     "Neo4j","Riak","MarkLogic","Couchbase","DynamoDB"))
 
# start the plot
gg <- ggplot(data=nosql.df, aes(x=Quarter, y=Index))
# use points, colored by Quarter
gg <- gg + geom_point(aes(color=Quarter), size=3)
# make strips by nosql db factor
gg <- gg + facet_grid(Database~.)
# rotate the plot
gg <- gg + coord_flip()
# get rid of most of the junk
gg <- gg + theme_bw()
# add a title
gg <- gg + labs(x="", title="NoSQL LinkedIn Skills Index\nSeptember 2013")
# get rid of the legend
gg <- gg + theme(legend.position = "none")
# hide any x strip text (the facets are rows, so the db names stay on the right)
gg <- gg + theme(strip.text.x = element_blank())
gg

The result is below in SVG form (install a proper browser if you can’t see it, or run the R code :-)). I think it conveys the data in a much more informative way. How would you encode the data to make it more informative and accessible?

Full source & data over at GitHub.