rud.is

New #rstats Podcast – R World News

Keeping up with R-related news on Twitter, GitHub, CRAN & even R-Bloggers (et al) can be an all-encompassing task that may be fun, but doesn’t always make it easy to get work done. There is so much going on in the R community that we (myself and @jayjacobs) felt there was room for another podcast focused on the (highly subjective) “best of the best of the week”. We’ve dubbed this effort “R World News” and will be publishing it weekly starting this week.

Each episode will highlight new CRAN packages/developments, cutting edge releases from rOpenSci & the “GitHub”-ecosystem, book reviews, interviews and featurettes on topics such as the new feather file format. Show notes will have links to everything and we even have a small newsletter setup to let you know when new episodes are up and deliver the notes right to your inbox.

You can also follow us on Twitter (@r_world_news) to be informed there when new episodes are posted.

Episode 1 is up and you can use the following URL to subscribe to the podcast feed:

http://www.rworld.news/feed/r-world-news.xml

We’ve been approved for the iTunes Store (that link should work starting Wed/Thu) and will be getting the feed on the major podcast services as quickly as they can process requests. We’re also working on getting the companion blog (really, just a show-notes feed) up on R-bloggers, so stay tuned!

Make sure to give a shout-out to info@rworld.news with topic suggestions and drop a note in the comments for each episode over on rworld.news.

Pining for the fjoRds & monitoring SSL/TLS certificate expiration in R with flexdashboard

Rumors of my demise have been (almost) greatly exaggerated.

Folks have probably noticed that #52Vis has stalled, as has most blogging, package & Twitter activity. I came down with a nasty bout of bronchitis after attending rOpenSci Unconf 16 (there were _so_ many people hacking [the sick kind] up a storm in SFO!) and then managed to get pneumonia (which I’m still working through) so any and all awake time has gone to work, class and fam. However, #52Vis winds back up this week, a new R endeavor will be revealed and hopefully I’ll be done with getting ill until Fall.

Getting ill does have some advantages. I completely forgot about renewing SSL/TLS certificates on some (official – yikes!) sites I help manage and decided to have that not be “a thing” moving forward with some help from R. Specifically, I decided to use the `openssl` and `flexdashboard` packages to accomplish my monitoring goals. I’m probably not the only one who needs to care about SSL/TLS certificate renewals so my illness-born-invention is presented below for anyone else to use or mod.

### Flexing flexdashboard muscles

If you haven’t heard about [`flexdashboard`](http://rstudio.github.io/flexdashboard/) then you should visit that link before continuing. It’s an emerging package from the fine folks over at RStudio that makes it super-easy to create quick and pretty dashboards. You can look [at the examples](http://rstudio.github.io/flexdashboard/examples.html) if you want proof. Here’s how `flexdashboard` fit into my goals. I wanted a way to:

– provide a character vector of hosts and ports (you can run SSL/TLS on any port and for many types of services)
– retrieve the certificates at those endpoints
– compare the expiration date to the current date
– provided a dashboard-like view of the state of those certificates, ordered from soonest-expiring to longest-expiring and color-coded (to make it easier to see the certs of impending DOOM)

I immediately thought of `flexdashboard` but my hopes were quickly dashed when all attempts to provide a list of `valueBox()` elements (as I could with `htmlwidgets` in R markdown documents) failed to deliver the desired result of a scrolling, responsive set of boxes.

My workaround was to have an R script create a `flexdashboard` R markdown document on the fly then call `rmarkdown::render()` to generate the final HTML page. Rather than bore you with a tiny view of the sites I work with, I decided to scrape the list of R CRAN mirrors that are SSL/TLS-enabled and present them via this rube goldberg contraption as the show-and-tell example.

The annotated code is below and in [this gist](https://gist.github.com/hrbrmstr/910af8ddc6371572aa4414b77ae86c6a).

library(rvest)
library(urltools)
library(rmarkdown)

# Some Rmd template setup -----------------------------------------------------------

preamble <- '---
title: "CRAN Mirrors Certificate Expiration Dashboard (Days left from %s)"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: scroll
---
```{r setup, include=FALSE}
library(flexdashboard)
library(openssl)
library(purrr)
library(dplyr)
library(scales)
'

after_data <- '

dsc <- safely(download_ssl_cert);

expires_delta <- function(site) {

  site_info <- strsplit(site, ":")[[1]]
  port <- as.numeric(site_info[2])

  chain_res <- dsc(site_info[1], port)
  if (!is.null(chain_res$result)) {

    chain <- chain_res$result

    valid_from <- as.Date(as.POSIXct(as.list(chain[[1]])$validity[1],
                                     "%b %d %H:%M:%S %Y", tz="GMT"))
    expires_on <- as.Date(as.POSIXct(as.list(chain[[1]])$validity[2],
                                     "%b %d %H:%M:%S %Y", tz="GMT"))

    data_frame(site=site_info[1],
               valid_from=valid_from,
               expires_on=expires_on,
               cert_valid_length=expires_on-valid_from,
               days_left_from_valid=expires_on - valid_from,
               days_left_from_today=expires_on - Sys.Date(),
               days_left_from_today_lab=comma(expires_on - Sys.Date()),
               color="primary",
               color=ifelse(days_left_from_today<=15, "danger", color),
               color=ifelse(days_left_from_today>15 & days_left_from_today<60, "warning", color))

  } else {

    data_frame(site=site_info[1],
               valid_from=NA,
               expires_on=NA,
               cert_valid_length=NA,
               days_left_from_valid=NA,
               days_left_from_today=NA,
               days_left_from_today_lab="Host unreachable",
               color="info")

  }

}

ssl_df <- arrange(map_df(sites, expires_delta), days_left_from_today)
```

'

# Get a list of all https-enabled CRAN mirrors --------------------------------------

pg <- read_html("https://cran.r-project.org/mirrors.html")
sites <- sprintf("%s:443", domain(html_attr(html_nodes(pg, "td > a[href^='https:']"), "href")))

# Capture this vector for use in the R markdown template ----------------------------

setup_data <- capture.output(dput(sites))

# Create a temporary Rmd file -------------------------------------------------------

dashfile <- tempfile(fileext=".Rmd")

# Write out the initial template bits we've been making -----------------------------

cat(sprintf(preamble, Sys.Date()), "sites <- ", setup_data, after_data, file=dashfile)

# 5 valueBoxes per row seems like a good # ----------------------------------------

max_vbox_per_row <- 5

n_dashrows <- ceiling(length(sites)/max_vbox_per_row)

# Generate a valueBox() per site, making rows every max_vbox_per_row ----------------

for (i in 1:length(sites)) {

  if (((i-1) %% max_vbox_per_row) == 0) {
    cat('
Row
-------------------------------------

', file=dashfile, append=TRUE)
  }

  cat(sprintf("\n### %s\n```{r}\n", gsub(":.*$", "", sites[i])), file=dashfile, append=TRUE)
  cat(sprintf('valueBox(ssl_df[%d, "days_left_from_today_lab"], icon="fa-lock", color=ssl_df[%d, "color"])\n```\n', i, i),
      file=dashfile, append=TRUE)
}

# Temporary html file (you prbly want this more readily available -------------------

dir <- tempfile()
dir.create(dir)
dash_html <- file.path(dir, "sslexpires.html")

# Render the dashboard --------------------------------------------------------------

rmarkdown::render(dashfile, output_file=dash_html)

# View in RStudio -------------------------------------------------------------------

rstudioapi::viewer(dash_html)

# Clean up --------------------------------------------------------------------------

unlink(dashfile)

You can see the output below and can use [this link](/projects/sslexpires.html) to bust the iframe.

You can use different values for the color thresholds or use a different visual display altogether. The `flexdashboard` package works with virtually any widget or static R visualization. You should also look at the frame-busted version and shrink the browser window (or view it on a mobile phone) to see the responsive nature of the framework.

I’m pretty sure the CRAN R mirror that is displaying an error is due to my accessing it via the resolved IPv6 address (I run IPV6 at home and have an IPv6 internet connection) vs the IPv4 address it’s probably actually listening on.

Keep an eye out for #52vis news and the new R project I hinted at in the intro.

(ggplot2) Exercising with (ggalt) dumbbells

2016-04-17 – 10:43
Posted in Charts & Graphs, Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (24)

I follow the most excellent Pew Research folks on Twitter to stay in tune with what’s happening (statistically speaking) with the world. Today, they tweeted this excerpt from their 2015 Global Attitudes survey:

The age gap in social media use around the world https://t.co/0Dq1PcbExG pic.twitter.com/9HBM7gLxwR

— PewResearch Internet (@pewinternet) April 17, 2016

I thought it might be helpful to folks if I made a highly aesthetically tuned version of Pew’s chart (though I chose to go a bit more minimal in terms of styling than they did) with the new geom_dumbbell() in the development version of ggalt. The source (below) is annotated, but please drop a note in the comments if any of the code would benefit from more exposition.

I’ve also switched to using the Prism javascript library starting with this post after seeing how well it works in RStudio’s flexdashboard package. If the “light on black” is hard to read or distracting, drop a note here and I’ll switch the theme if enough folks are having issues.

library(ggplot2) # devtools::install_github("hadley/ggplot2")
library(ggalt)   # devtools::install_github("hrbrmstr/ggalt")
library(dplyr)   # for data_frame() & arrange()

# I'm not crazy enough to input all the data; this will have to do for the example
df <- data_frame(country=c("Germany", "France", "Vietnam", "Japan", "Poland", "Lebanon",
                           "Australia", "South\nKorea", "Canada", "Spain", "Italy", "Peru",
                           "U.S.", "UK", "Mexico", "Chile", "China", "India"),
                 ages_35=c(0.39, 0.42, 0.49, 0.43, 0.51, 0.57,
                           0.60, 0.45, 0.65, 0.57, 0.57, 0.65,
                           0.63, 0.59, 0.67, 0.75, 0.52, 0.48),
                 ages_18_to_34=c(0.81, 0.83, 0.86, 0.78, 0.86, 0.90,
                                 0.91, 0.75, 0.93, 0.85, 0.83, 0.91,
                                 0.89, 0.84, 0.90, 0.96, 0.73, 0.69),
                 diff=sprintf("+%d", as.integer((ages_18_to_34-ages_35)*100)))

# we want to keep the order in the plot, so we use a factor for country
df <- arrange(df, desc(diff))
df$country <- factor(df$country, levels=rev(df$country))

# we only want the first line values with "%" symbols (to avoid chart junk)
# quick hack; there is a more efficient way to do this
percent_first <- function(x) {
  x <- sprintf("%d%%", round(x*100))
  x[2:length(x)] <- sub("%$", "", x[2:length(x)])
  x
}

gg <- ggplot()
# doing this vs y axis major grid line
gg <- gg + geom_segment(data=df, aes(y=country, yend=country, x=0, xend=1), color="#b2b2b2", size=0.15)
# dum…dum…dum!bell
gg <- gg + geom_dumbbell(data=df, aes(y=country, x=ages_35, xend=ages_18_to_34),
                         size=1.5, color="#b2b2b2", point.size.l=3, point.size.r=3,
                         point.colour.l="#9fb059", point.colour.r="#edae52")
# text below points
gg <- gg + geom_text(data=filter(df, country=="Germany"),
                     aes(x=ages_35, y=country, label="Ages 35+"),
                     color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg <- gg + geom_text(data=filter(df, country=="Germany"),
                     aes(x=ages_18_to_34, y=country, label="Ages 18-34"),
                     color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
# text above points
gg <- gg + geom_text(data=df, aes(x=ages_35, y=country, label=percent_first(ages_35)),
                     color="#9fb059", size=2.75, vjust=2.5, family="Calibri")
gg <- gg + geom_text(data=df, color="#edae52", size=2.75, vjust=2.5, family="Calibri",
                     aes(x=ages_18_to_34, y=country, label=percent_first(ages_18_to_34)))
# difference column
gg <- gg + geom_rect(data=df, aes(xmin=1.05, xmax=1.175, ymin=-Inf, ymax=Inf), fill="#efefe3")
gg <- gg + geom_text(data=df, aes(label=diff, y=country, x=1.1125), fontface="bold", size=3, family="Calibri")
gg <- gg + geom_text(data=filter(df, country=="Germany"), aes(x=1.1125, y=country, label="DIFF"),
                     color="#7a7d7e", size=3.1, vjust=-2, fontface="bold", family="Calibri")
gg <- gg + scale_x_continuous(expand=c(0,0), limits=c(0, 1.175))
gg <- gg + scale_y_discrete(expand=c(0.075,0))
gg <- gg + labs(x=NULL, y=NULL, title="The social media age gap",
                subtitle="Adult internet users or reported smartphone owners who\nuse social networking sites",
                caption="Source: Pew Research Center, Spring 2015 Global Attitudes Survey. Q74")
gg <- gg + theme_bw(base_family="Calibri")
gg <- gg + theme(panel.grid.major=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(plot.title=element_text(face="bold"))
gg <- gg + theme(plot.subtitle=element_text(face="italic", size=9, margin=margin(b=12)))
gg <- gg + theme(plot.caption=element_text(size=7, margin=margin(t=12), color="#7a7d7e"))
gg

52Vis Week #3 – Waste Not, Want Not

2016-04-13 – 23:35
Posted in 52vis, Charts & Graphs, DataVis, DataViz, ggplot, maps, R
Tagged post
Comments (9)

The Wall Street Journal did a project piece [a while back](http://projects.wsj.com/waste-lands/) in the _”Waste Lands: America’s Forgotten Nuclear Legacy”_. They dug through [Department of Energy](http://www.lm.doe.gov/default.aspx?id=2602) and [CDC](http://www.cdc.gov/niosh/ocas/ocasawe.html) data to provide an overview of the lingering residue of this toxic time in America’s past (somehow, I have to believe the fracking phenomena of our modern era will end up doing far more damage in the long run).

Being a somewhat interactive piece, I was able to tease out the data source behind it for this week’s challenge. I’m, once again, removing the obvious vis and re-creating a non-interactive version of the WSJ’s main map view (with some additional details).

There’s definitely a story or two here, but I felt that the overall message fell a bit flat the way the WSJ folks told it. Can you find an angle or question that tells a tale in a more compelling fashion? I added some hints in the code snippet below (and in the repo) as to how you might find additional details for each toxic site (and said details are super-scrape-able with `rvest`). I also noticed some additional external data sets that could be brought in (but I’ll leave that detective work to our contestants).

If you’re up to the task, fork [this week’s repo](https://github.com/52vis/2016-15), create a subdirectory for your submission and shoot a PR my way (notifying folks here in the comments about your submission is also encouraged).

Entries accepted up until 2016-04-20 23:59:59 EDT.

Hadley has volunteered a signed book and I think I’ll take him up on the offer for this week’s prize (unless you really want a copy of Data-Driven Security :-).

One last note: I’ve secured `52vis.com` and will be getting that configured for next week’s contest. It’ll be a nice showcase site for all the submissions.

library(albersusa) # devtools::install_github("hrbrmstr/hrbrmisc")
library(rgeos)
library(maptools)
library(ggplot2)
library(ggalt)
library(ggthemes)
library(jsonlite)
library(tidyr)
library(dplyr)
library(purrr)
 
#' WSJ Waste Lands: http://projects.wsj.com/waste-lands/
#' Data from: http://www.lm.doe.gov/default.aspx?id=2602 &
#'            http://www.cdc.gov/niosh/ocas/ocasawe.html
 
sites <- fromJSON("sites.json", flatten=TRUE)
 
#' need to replace the 0-length data.frames with at least one row of `NA`s
#' so we can safely `unnest()` them later
 
sites$locations <- map(sites$locations, function(x) {
  if (nrow(x) == 0) {
    data_frame(latitude=NA, longitude=NA, postal_code=NA, name=NA, street_address=NA)
  } else {
    x
  }
})
 
#' we'll need this later
 
sites$site.rating <- factor(sites$site.rating,
                           levels=c(3:0),
                           labels=c("Remote or no potential for radioactive contamination, based on criteria at the time of FUSRAP review.",
                                    "Referred to another agency or program, no authority to clean up under FUSRAP, or status unclear.",
                                    "Cleanup declared complete under FUSRAP.",
                                    "Cleanup in progress under the Formerly Utilized Sites Remedial Action Program (FUSRAP)."))
 
#' One teensy discrepancy:
 
nrow(sites)
 
#' ## [1] 517
#'
#' The stacked bars total on the WSJ site is 515.
#' Further complication is that sites$locations is a list column with nested
#' data.frames:
 
sites <- unnest(sites)
 
nrow(sites)
 
#' ## [1] 549
 
sum(complete.cases(sites[,c("longitude", "latitude")]))
 
#' ## [1] 352
#'
#' So, just mapping long/lat is going to miss part of the story. But, I'm just
#' providing a kick-start for folks, so I'll just map long/lat :-)
 
glimpse(sites)
 
#' ## Observations: 549
#' ## Variables: 11
#' ## $ site.city      (chr) "Flint", "Albuquerque", "Buffalo", "Los...
#' ## $ site.name      (chr) "AC Spark Plug, Dort Highway Plant", "A...
#' ## $ site.rating    (fctr) Remote or no potential for radioactive...
#' ## $ site.state     (chr) "MI", "NM", "NY", "NM", "PA", "NY", "OH...
#' ## $ site.state_ap  (chr) "Mich.", "N.M.", "N.Y.", "N.M.", "Pa.",...
#' ## $ site.slug      (chr) "1-ac-spark-plug-dort-highway-plant", "...
#' ## $ latitude       (dbl) 43.02938, NA, NA, 35.88883, 39.95295, 4...
#' ## $ longitude      (dbl) -83.65525, NA, NA, -106.30502, -75.5927...
#' ## $ postal_code    (chr) "48506", NA, NA, "87544", "19382", "100...
#' ## $ name           (chr) "", NA, NA, "", "", "", "Former Buildin...
#' ## $ street_address (chr) "1300 North Dort Highway", NA, NA, "Pue...
 
#' Note that `site.slug` can be used with this URL:
#' `http://projects.wsj.com/waste-lands/site/SITE.SLUG.HERE/` to get to
#' detail pages on the WSJ site.
 
#' I want to use my `albersusa` mutated U.S. shapefile for this (NOTE: I'm moving
#' `albersus` into one of the rOpenSci pacakges soon vs publishing it standalone to CRAN)
#' so I need to mutate the Alaska points (there are no Hawaii points).
#' This step is *not necessary* unless you plan on displaying points on this
#' mutated map. I also realized I need to provide a mutated projection translation
#' function for AK & HI for the mutated Albers mapss.
 
tmp  <- data.frame(dplyr::select(filter(sites, site.state=="AK"), longitude, latitude))
coordinates(tmp) <- ~longitude+latitude
proj4string(tmp) <- CRS(us_longlat_proj)
tmp <- spTransform(tmp, CRS(us_laea_proj))
tmp <- elide(tmp, rotate=-50)
tmp <- elide(tmp, scale=max(apply(bbox(tmp), 1, diff)) / 2.3)
tmp <- elide(tmp, shift=c(-2100000, -2500000))
proj4string(tmp) <- CRS(us_laea_proj)
tmp <- spTransform(tmp, us_longlat_proj)
tmp <- as.data.frame(tmp)
 
sites[sites$site.state=="AK",]$longitude <- tmp$x
sites[sites$site.state=="AK",]$latitude <- tmp$y
 
#' and now we plot the sites
 
us_map <- fortify(usa_composite(), region="name")
 
gg <- ggplot()
gg <- gg + geom_map(data=us_map, map=us_map,
                    aes(x=long, y=lat, map_id=id),
                    color="#2b2b2b", size=0.15, fill="#e5e3df")
gg <- gg + geom_point(dat=sites, aes(x=longitude, y=latitude, fill=site.rating),
                      shape=21, color="white", stroke=1, alpha=1, size=3)
gg <- gg + scale_fill_manual(name="", values=c("#00a0b0", "#edc951", "#6a4a3c", "#eb6841"))
gg <- gg + coord_proj(us_laea_proj)
gg <- gg + guides(fill=guide_legend(override.aes=list(alpha=1, stroke=0.2, color="#2b2b2b", size=4)))
gg <- gg + labs(title="Waste Lands: America's Forgotten Nuclear Legacy",
                 caption="Data from the WSJ")
gg <- gg + theme_map()
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(legend.direction="vertical")
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(plot.title=element_text(size=18, face="bold", hjust=0.5))
gg

52 Vis Week #2 Wrap Up

2016-04-13 – 23:07
Posted in 52vis, Data Visualization, DataVis, DataViz, R
Tagged post
Comments (3)

I’ve been staring at this homeless data set for a few weeks now since I’m using it both here and in the data science class I’m teaching. It’s been one of the most mindful data sets I’ve worked with in a while. Even when reduced to pure numbers in named columns, the names really stick with you…_”Unsheltered Homeless People in Families”_…_”Unsheltered Chronically Homeless”_…_”Homeless Veterans”_…_”Unsheltered Homeless Unaccompanied Youth”_. These are real people, really hurting.

That’s one of the _superpowers_ “Data Science” gives you: the ability to shed light on the things that matter and to tell meaningful stories that people _need_ to hear. From my interactions with some of the folks who submitted entries, I know they, too, were impacted by the stories contained in this data set. Let’s see what they uncovered.

(All the code & un-shrunk visualizations are in the [52vis github repo](https://github.com/52vis/2016-14))

Camille compared a point-in-time view of one of the most vulnerable parts of the population—youth (under 25)—with the overall population:

The youth homelessness situation in Nevada seems especially disconcerting and I wonder how much better/worse it might be if we factored in the 25 & under U.S. census information (I’m really a bit reticent to run those numbers for fear it’ll be even worse).

Craine Munton submitted our first D3 entry! I ended up tweaking some of the JS & CSS `href`s (to fix non-sync’d files) and you can see the full version [here](https://rud.is/52vis/2016-14/cmrunton/). I’m going to try to embed it below as well (I’ll leave it up even if it’s not fully sized well. Just hit the aforementioned URL for the full-browser version).

Craine focused on another vulnerable and sometimes forgotten segment: those that put their lives on the line for our freedom and the safety and security of threatened people groups around the globe.

Joshua Kunst is an incredibly talented individual who has made a number of stunning visualizations in R. He used htmlwidgets to tell [a captivating story](https://rud.is/52vis/2016-14/jbkunst/) that ends in (statistically inferred) _hope_.

Hit the [full page](https://rud.is/52vis/2016-14/jbkunst/) for the frame-busted visualization.

Jake Kaupp took inspiration from Alberto Cairo and created some truly novel visualizations. I’m putting the easiest to embed first:

Jake has [written a superb piece](https://jkaupp.github.io/) on his creation, included an [interactive Shiny app](https://jkaupp.shinyapps.io/52vis_Homeless/) and brought in extra data to try to come to grips with this data. Definitely take time to read his post (even if it means you never get back to this post).

His small-multiples view is below but you should click on it to see it in full-browser view.

Jonathan Carroll (another fellow rOpenSci’er) created a companion [blog post](http://jcarroll.com.au/2016/04/10/52vis-week-2-challenge/) for his animated choropleth entry:

I really like how it highlights the differences per year and a number of statistical/computational choices he made.

Julia Silge focused on youth as well asking two compelling questions (you can read her [exposition](http://rpubs.com/juliasilge/170499) as well):

(In seeing this second youth-focused vis and also having a clearer picture of the areas of greatest concern, I’m wondering if there’s a climate/weather-oriented reason for certain areas standing out when it comes to homeless youth issues.)

Philipp Ottolinger took a statistical look at youth and veterans:

Make sure to dedicate some cycles to [check out his approach](https://github.com/ottlngr/2016-14/blob/ottlngr/ottlngr/homeless_ottlngr.Rmd).

@patternproject did not succumb to the temptation to draw a map just because “it’s U.S. State data” and chose, instead, to look across time and geography to tease out patterns using a heatmap.

I _really_ like this novel approach and am now rethinking my approach geo-temporal visualizations.

Xan Gregg looked at sheltered vs unsheltered homeless populations from a few different viewpoints (including animation).

(Again, beautiful work in JMP).

### We have a winner

The diversity and craftsmanship of these entries was amazing to see, as was the care and attention each submitter took when dealing with this truly tough subject. I was personally impacted by each one and I hope it raised awareness in the broader data science community.

I couldn’t choose between Joshua Kunst’s & Jake Kaupp’s entries so they’re tied for first place and they’ll both be getting a copy of [Data-Driven Security](http://dds.ec/amzn).

Joshua & Jake: hit up bob at rudis dot net to claim your prize!

A $50.00 donation has also been made to the [National Coalition for the Homeless](http://nationalhomeless.org/) dedicating it by name to each of the contest participants.

52 Vis Week 1 Winners!

The response to 52Vis has exceeded expectations and there have been great entries for both weeks. It’s time to award some prizes!

### Week 1 – Send in the Drones

I’ll take [this week](https://github.com/52vis/2016-13) in comment submission order (remember, the rules changed to submission via PR in Week 2).

NOTE: WordPress seems to have “eaten” the animations on upload, so _please_ check out the direct links to them (they are worth the clicks).

First up is a straightforward but really colorful take take on the data by J. Alexander Branham’s (in R):

You can see his code and commentary (he went into detail on both the code and thought process) on [his blog](http://jabranham.com/blog/2016/03/ggplot-maps.html). Extra points for the Albers projection!

Next up is Jérôme Laurent who did an equally great job explaining his thought process behind his approach and some great code exposition that produced this really neat time-lapse animation:

You can see Jérôme’s [blog](https://jerome-laurent-pro.github.io/2016-04-01-dataviz-week13/) for all the code and ‘splainin.

Timothy Kiely took a super neat approach and mapped out the drone geographic density, but then expanded on his vision to look at the density over time (with a paired bar chart!).

Balázs Dukai also took the density approach in his [RPubs submission](http://rpubs.com/BalazsDukai/vis_2016-13) (with code) but attacked it from a small multiples perspective. It was interesting to see California move in and out of prominence and I think this will be an interesting project to replicate as the FAA collects more data.

Fellow rOpenSci 2016 participant Julia Silge wanted to see the distribution of sightings throughout the week and took a straightforward faceted bar chart approach:

BUT she also doubled-down on stats to make her case. Read her [superb exposition](http://rpubs.com/juliasilge/168308) to see the conclusion!

I was _extremely_ excited to see a [non-R submission](https://github.com/xangregg/dronetimes) from Xan Gregg. Using JMP & JSL (SAS components), Xan walks us through both the dreaded data-munging component (that R folks got for free from me) and found some really interesting oddities in the data that suggest there are issues with data quality. I really loved the in-depth explanation and the scatterplot that was produced to help diagnose data issues is really well-crafted.

Philipp Ottolinger was super-kind enough to go back and PR his submission so all his code is in the 52vis repo, but you should also [check it out in his repo](https://github.com/ottlngr/52Vis/blob/master/01/01_52Vis.md) since he has alot of other really cool things to persuse. He tried to see if drones were a weekend or daily occurrence in his approach:

Jacob Barnett was the first Leaflet submission (with [full code](https://gist.github.com/barnettjacob/58601c78f22616a02c3d3e1fa1aea724#file-flight_data-png)) and explored the geographic prevalence of the drone sightings:

@patternproject [explored](https://github.com/patternproject/r.rudis.challenge1) the density by week of year:

and this is going to be a really neat graph to watch over time as more of these flying annoyances take to the skies. I’m re-running that code later this year to see if the weeks with greater density stay the same.

Mukul Chaware did some [text analysis](https://github.com/mukul13/2016-13/tree/master/mukul) as well as temporal queries on the data one of the more interesting charts is the day vs night one:

but he also used the text analysis to see where there were LEO notifications or not and did multiple views using that as a pivot (really neat idea). Check out his [full submission](https://github.com/mukul13/2016-13/tree/master/mukul) for some great exposition and analysis.

Lastly is Andrew Heiss‘ [inquiry to discover](https://www.andrewheiss.com/blog/2016/04/03/drone-sightings-in-the-us-visualized/) whether we have a hobbyist problem or an official problem with these flying digital buzzards:

The pairing of external data, combined with the gorgeous map truly ruled the day, but he went further to answer another question: which state had the most (per capita) drone sightings:

### The Results Are In

It was really excellent work by everyone and the different approaches by the the submitters shows exactly what I was hoping this project would show: that there are many ways to approach the data that you have and finding the question that really takes hold of you can ultimately deliver amazing results.

For this week:

– Andrew Heiss takes the #1 spot and gets not only a digital copy of [Data-Driven Security](http://dds.ec/amzn) but also a $25.00 Amazon gift cart (I actually had a sponsor who wanted to remain anonymous for this one)
– Xan Gregg takes a #2 spot and gets a copy of Data-Driven Security for an excellent analysis and being willing to be the first non-R submitter!

Andrew & Xan: please e-mail me at bob at rudis dot net so I can get your prizes to you!

Next post will be the Week 2 winners! Then, Week 3’s challenge!

Beating lollipops into dumbbells

2016-04-12 – 19:56
Posted in Charts & Graphs, Data Visualization, DataVis, DataViz, ggplot, R
Tagged post
Comments (10)

Shortly after I added lollipop charts to ggalt I had a few requests for a dumbbell geom. It wasn’t difficult to do modify the underlying lollipop Geoms to make a geom_dumbbell(). Here it is in action:

library(ggplot2)
library(ggalt) # devtools::install_github("hrbrmstr/ggalt")
library(dplyr)

# from: https://plot.ly/r/dumbbell-plots/
URL <- "https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)

df <- read.csv(fil, stringsAsFactors=FALSE)
df <- arrange(df, desc(Men))
df <- mutate(df, School=factor(School, levels=rev(School)))

gg <- ggplot(df, aes(x=Women, xend=Men, y=School))
gg <- gg + geom_dumbbell(colour="#686868",
                         point.colour.l="#ffc0cb",
                         point.colour.r="#0000ff",
                         point.size.l=2.5,
                         point.size.r=2.5)
gg <- gg + scale_x_continuous(breaks=seq(60, 160, by=20),
                              labels=sprintf("$%sK", comma(seq(60, 160, by=20))))
gg <- gg + labs(x="Annual Salary", y=NULL,
                title="Gender Earnings Disparity",
                caption="Data from plotly")
gg <- gg + theme_bw()
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.title.x=element_text(hjust=1, face="italic", margin=margin(t=-24)))
gg <- gg + theme(plot.caption=element_text(size=8, margin=margin(t=24)))
gg

The API isn't locked in, so definitely file an issue if you want different or additional functionality. One issue I personally still have is how to identify the left/right points (blue is male and pink is female in this one).

Working Out With Dumbbells

I thought folks might like to see behind the ggcurtain. It really only took the addition of two functions to ggalt: geom_dumbbell() (which you call directly) and GeomDumbbell() which acts behind the scenes.

There are a few additional, custom parameters to geom_dumbbell() and the mapped stat and position are hardcoded in the layer call. We also pass in these new parameters into the params list.

geom_dumbbell <- function(mapping = NULL, data = NULL, ...,
                          point.colour.l = NULL, point.size.l = NULL,
                          point.colour.r = NULL, point.size.r = NULL,
                          na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {

  layer(
    data = data,
    mapping = mapping,
    stat = "identity",
    geom = GeomDumbbell,
    position = "identity",
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      point.colour.l = point.colour.l,
      point.size.l = point.size.l,
      point.colour.r = point.colour.r,
      point.size.r = point.size.r,
      ...
    )
  )
}

The exposed function eventually calls it's paired Geom. There we get to tell it what are required aes parameters and which ones aren't required, plus set some defaults.

We automagically add yend to the data in setup_data() (which gets called by the ggplot2 API).

Then, in draw_group() we create additional data.frames and return a list of three Geom layers (two points and one segment). Finally, we provide a default legend symbol.

GeomDumbbell <- ggproto("GeomDumbbell", Geom,
  required_aes = c("x", "xend", "y"),
  non_missing_aes = c("size", "shape",
                      "point.colour.l", "point.size.l",
                      "point.colour.r", "point.size.r"),
  default_aes = aes(
    shape = 19, colour = "black", size = 0.5, fill = NA,
    alpha = NA, stroke = 0.5
  ),

  setup_data = function(data, params) {
    transform(data, yend = y)
  },

  draw_group = function(data, panel_scales, coord,
                        point.colour.l = NULL, point.size.l = NULL,
                        point.colour.r = NULL, point.size.r = NULL) {

    points.l <- data
    points.l$colour <- point.colour.l %||% data$colour
    points.l$size <- point.size.l %||% (data$size * 2.5)

    points.r <- data
    points.r$x <- points.r$xend
    points.r$colour <- point.colour.r %||% data$colour
    points.r$size <- point.size.r %||% (data$size * 2.5)

    gList(
      ggplot2::GeomSegment$draw_panel(data, panel_scales, coord),
      ggplot2::GeomPoint$draw_panel(points.l, panel_scales, coord),
      ggplot2::GeomPoint$draw_panel(points.r, panel_scales, coord)
    )

  },

  draw_key = draw_key_point
)

In essence, this new geom saves calls to three additional geom_s, but does add more parameters, so it's not really clear if it saves much typing.

If you end up making anything interesting with geom_dumbbell() I encourage you to drop a note in the comments with a link.

Clandestine DNS lookups with gdns

2016-04-11 – 00:25
Posted in APIs, Cybersecurity, DNS, Information Security, R, Security Awareness
Tagged post
Comments (1)

Google recently [announced](https://developers.google.com/speed/public-dns/docs/dns-over-https) their DNS-over-HTTPS API, which _”enhances privacy and security between a client and a recursive resolver, and complements DNSSEC to provide end-to-end authenticated DNS lookups”_. The REST API they provided was pretty simple to [wrap into a package](https://github.com/hrbrmstr/gdns) and I tossed in some [SPF](http://www.openspf.org/SPF_Record_Syntax) functions that I had lying around to bulk it up a bit.

### Why DNS-over-HTTPS?

DNS machinations usually happen over UDP (and sometimes TCP). Unless you’re using some fairly modern DNS augmentations, these exchanges happen in cleartext, meaning your query and the response are exposed during transport (and they are already exposed to the server you’re querying for a response).

DNS queries over HTTPS will be harder to [spoof](http://www.veracode.com/security/spoofing-attack) and the query + response will be encrypted, so you gain transport privacy when, say, you’re at Starbucks or from your DSL, FiOS, Gogo Inflight, or cable internet provider (yes, they all snoop on your DNS queries).

You end up trusting Google quite a bit with this API, but if you were currently using `8.8.8.8` or `8.8.4.4` (or their IPv6 equivalents) you were already trusting Google (and it’s likely Google knows what you’re doing on the internet anyway given all the trackers and especially if you’re using Chrome).

One additional item you gain using this API is more control over [`EDNS0`](https://tools.ietf.org/html/draft-vandergaast-edns-client-ip-00) settings. `EDNS0` is a DNS protocol extension that, for example, enables the content delivery networks to pick the “closest” server farm to ensure speedy delivery of your streaming Game of Thrones binge watch. They get to know a piece of your IP address so they can make this decision, but you end up giving away a bit of privacy (though you lose the privacy in the end since the target CDN servers know precisely where you are).

Right now, there’s no way for most clients to use DNS-over-HTTPS directly, but the API can be used in a programmatic fashion, which may be helpful in situations where you need to do some DNS spelunking but UDP is blocked or you’re on a platform that can’t build the [`resolv`](https://github.com/hrbrmstr/resolv) package.

You can learn a bit more about DNS and privacy in this [IETF paper](https://www.ietf.org/mail-archive/web/dns-privacy/current/pdfWqAIUmEl47.pdf) [PDF].

### Mining DNS with `gdns`

The `gdns` package is pretty straightforward. Use the `query()` function to get DNS info for a single entity:

library(gdns)
 
query("apple.com")
## $Status  
## [1] 0          # NOERROR - Standard DNS response code (32 bit integer)
## 
## $TC
## [1] FALSE      # Whether the response is truncated
## 
## $RD
## [1] TRUE       # Should always be true for Google Public DNS
## 
## $RA
## [1] TRUE       # Should always be true for Google Public DNS
## 
## $AD
## [1] FALSE      # Whether all response data was validated with DNSSEC
## 
## $CD
## [1] FALSE      # Whether the client asked to disable DNSSEC
## 
## $Question
##         name type
## 1 apple.com.    1
## 
## $Answer
##         name type  TTL          data
## 1 apple.com.    1 1547 17.172.224.47
## 2 apple.com.    1 1547  17.178.96.59
## 3 apple.com.    1 1547 17.142.160.59
## 
## $Additional
## list()
## 
## $edns_client_subnet
## [1] "0.0.0.0/0"

The `gdns` lookup functions are set to use an `edns_client_subnet` of `0.0.0.0/0`, meaning your local IP address or subnet is not leaked outside of your connection to Google (you can override this behavior).

You can do reverse lookups as well (i.e. query IP addresses):

query("17.172.224.47", "PTR")
## $Status
## [1] 0
## 
## $TC
## [1] FALSE
## 
## $RD
## [1] TRUE
## 
## $RA
## [1] TRUE
## 
## $AD
## [1] FALSE
## 
## $CD
## [1] FALSE
## 
## $Question
##                          name type
## 1 47.224.172.17.in-addr.arpa.   12
## 
## $Answer
##                            name type  TTL                           data
## 1   47.224.172.17.in-addr.arpa.   12 1073               webobjects.info.
## 2   47.224.172.17.in-addr.arpa.   12 1073                   yessql.info.
## 3   47.224.172.17.in-addr.arpa.   12 1073                 apples-msk.ru.
## 4   47.224.172.17.in-addr.arpa.   12 1073                     icloud.se.
## 5   47.224.172.17.in-addr.arpa.   12 1073                     icloud.es.
## 6   47.224.172.17.in-addr.arpa.   12 1073                     icloud.om.
## 7   47.224.172.17.in-addr.arpa.   12 1073                   icloudo.com.
## 8   47.224.172.17.in-addr.arpa.   12 1073                     icloud.ch.
## 9   47.224.172.17.in-addr.arpa.   12 1073                     icloud.fr.
## 10  47.224.172.17.in-addr.arpa.   12 1073                   icloude.com.
## 11  47.224.172.17.in-addr.arpa.   12 1073          camelspaceeffect.com.
## 12  47.224.172.17.in-addr.arpa.   12 1073                 camelphat.com.
## 13  47.224.172.17.in-addr.arpa.   12 1073              alchemysynth.com.
## 14  47.224.172.17.in-addr.arpa.   12 1073                    openni.org.
## 15  47.224.172.17.in-addr.arpa.   12 1073                      swell.am.
## 16  47.224.172.17.in-addr.arpa.   12 1073                  appleweb.net.
## 17  47.224.172.17.in-addr.arpa.   12 1073       appleipodsettlement.com.
## 18  47.224.172.17.in-addr.arpa.   12 1073                    earpod.net.
## 19  47.224.172.17.in-addr.arpa.   12 1073                 yourapple.com.
## 20  47.224.172.17.in-addr.arpa.   12 1073                    xserve.net.
## 21  47.224.172.17.in-addr.arpa.   12 1073                    xserve.com.
## 22  47.224.172.17.in-addr.arpa.   12 1073            velocityengine.com.
## 23  47.224.172.17.in-addr.arpa.   12 1073           velocity-engine.com.
## 24  47.224.172.17.in-addr.arpa.   12 1073            universityarts.com.
## 25  47.224.172.17.in-addr.arpa.   12 1073            thinkdifferent.com.
## 26  47.224.172.17.in-addr.arpa.   12 1073               theatremode.com.
## 27  47.224.172.17.in-addr.arpa.   12 1073               theatermode.com.
## 28  47.224.172.17.in-addr.arpa.   12 1073           streamquicktime.net.
## 29  47.224.172.17.in-addr.arpa.   12 1073           streamquicktime.com.
## 30  47.224.172.17.in-addr.arpa.   12 1073                ripmixburn.com.
## 31  47.224.172.17.in-addr.arpa.   12 1073              rip-mix-burn.com.
## 32  47.224.172.17.in-addr.arpa.   12 1073        quicktimestreaming.net.
## 33  47.224.172.17.in-addr.arpa.   12 1073        quicktimestreaming.com.
## 34  47.224.172.17.in-addr.arpa.   12 1073                  quicktime.cc.
## 35  47.224.172.17.in-addr.arpa.   12 1073                      qttv.net.
## 36  47.224.172.17.in-addr.arpa.   12 1073                      qtml.com.
## 37  47.224.172.17.in-addr.arpa.   12 1073                     qt-tv.net.
## 38  47.224.172.17.in-addr.arpa.   12 1073          publishingsurvey.org.
## 39  47.224.172.17.in-addr.arpa.   12 1073          publishingsurvey.com.
## 40  47.224.172.17.in-addr.arpa.   12 1073        publishingresearch.org.
## 41  47.224.172.17.in-addr.arpa.   12 1073        publishingresearch.com.
## 42  47.224.172.17.in-addr.arpa.   12 1073         publishing-survey.org.
## 43  47.224.172.17.in-addr.arpa.   12 1073         publishing-survey.com.
## 44  47.224.172.17.in-addr.arpa.   12 1073       publishing-research.org.
## 45  47.224.172.17.in-addr.arpa.   12 1073       publishing-research.com.
## 46  47.224.172.17.in-addr.arpa.   12 1073                  powerbook.cc.
## 47  47.224.172.17.in-addr.arpa.   12 1073             playquicktime.net.
## 48  47.224.172.17.in-addr.arpa.   12 1073             playquicktime.com.
## 49  47.224.172.17.in-addr.arpa.   12 1073           nwk-apple.apple.com.
## 50  47.224.172.17.in-addr.arpa.   12 1073                   myapple.net.
## 51  47.224.172.17.in-addr.arpa.   12 1073                  macreach.net.
## 52  47.224.172.17.in-addr.arpa.   12 1073                  macreach.com.
## 53  47.224.172.17.in-addr.arpa.   12 1073                   macmate.com.
## 54  47.224.172.17.in-addr.arpa.   12 1073         macintoshsoftware.com.
## 55  47.224.172.17.in-addr.arpa.   12 1073                    machos.net.
## 56  47.224.172.17.in-addr.arpa.   12 1073                   mach-os.net.
## 57  47.224.172.17.in-addr.arpa.   12 1073                   mach-os.com.
## 58  47.224.172.17.in-addr.arpa.   12 1073                   ischool.com.
## 59  47.224.172.17.in-addr.arpa.   12 1073           insidemacintosh.com.
## 60  47.224.172.17.in-addr.arpa.   12 1073             imovietheater.com.
## 61  47.224.172.17.in-addr.arpa.   12 1073               imoviestage.com.
## 62  47.224.172.17.in-addr.arpa.   12 1073             imoviegallery.com.
## 63  47.224.172.17.in-addr.arpa.   12 1073               imacsources.com.
## 64  47.224.172.17.in-addr.arpa.   12 1073        imac-applecomputer.com.
## 65  47.224.172.17.in-addr.arpa.   12 1073                imac-apple.com.
## 66  47.224.172.17.in-addr.arpa.   12 1073                     ikids.com.
## 67  47.224.172.17.in-addr.arpa.   12 1073              ibookpartner.com.
## 68  47.224.172.17.in-addr.arpa.   12 1073                   geoport.com.
## 69  47.224.172.17.in-addr.arpa.   12 1073                   firewire.cl.
## 70  47.224.172.17.in-addr.arpa.   12 1073               expertapple.com.
## 71  47.224.172.17.in-addr.arpa.   12 1073              edu-research.org.
## 72  47.224.172.17.in-addr.arpa.   12 1073               dvdstudiopro.us.
## 73  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.org.
## 74  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.net.
## 75  47.224.172.17.in-addr.arpa.   12 1073             dvdstudiopro.info.
## 76  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.com.
## 77  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.biz.
## 78  47.224.172.17.in-addr.arpa.   12 1073          developercentral.com.
## 79  47.224.172.17.in-addr.arpa.   12 1073             desktopmovies.org.
## 80  47.224.172.17.in-addr.arpa.   12 1073             desktopmovies.net.
## 81  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.org.
## 82  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.net.
## 83  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.com.
## 84  47.224.172.17.in-addr.arpa.   12 1073          darwinsourcecode.com.
## 85  47.224.172.17.in-addr.arpa.   12 1073              darwinsource.org.
## 86  47.224.172.17.in-addr.arpa.   12 1073              darwinsource.com.
## 87  47.224.172.17.in-addr.arpa.   12 1073                darwincode.com.
## 88  47.224.172.17.in-addr.arpa.   12 1073                carbontest.com.
## 89  47.224.172.17.in-addr.arpa.   12 1073              carbondating.com.
## 90  47.224.172.17.in-addr.arpa.   12 1073                 carbonapi.com.
## 91  47.224.172.17.in-addr.arpa.   12 1073           braeburncapital.com.
## 92  47.224.172.17.in-addr.arpa.   12 1073                  applexpo.net.
## 93  47.224.172.17.in-addr.arpa.   12 1073                  applexpo.com.
## 94  47.224.172.17.in-addr.arpa.   12 1073                applereach.net.
## 95  47.224.172.17.in-addr.arpa.   12 1073                applereach.com.
## 96  47.224.172.17.in-addr.arpa.   12 1073            appleiservices.com.
## 97  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.org.
## 98  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.net.
## 99  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.com.
## 100 47.224.172.17.in-addr.arpa.   12 1073            applefilmmaker.com.
## 101 47.224.172.17.in-addr.arpa.   12 1073             applefilmaker.com.
## 102 47.224.172.17.in-addr.arpa.   12 1073                appleenews.com.
## 103 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.org.
## 104 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.net.
## 105 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.com.
## 106 47.224.172.17.in-addr.arpa.   12 1073         applecomputerimac.com.
## 107 47.224.172.17.in-addr.arpa.   12 1073        applecomputer-imac.com.
## 108 47.224.172.17.in-addr.arpa.   12 1073                  applecare.cc.
## 109 47.224.172.17.in-addr.arpa.   12 1073               applecarbon.com.
## 110 47.224.172.17.in-addr.arpa.   12 1073                 apple-inc.net.
## 111 47.224.172.17.in-addr.arpa.   12 1073               apple-enews.com.
## 112 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.org.
## 113 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.net.
## 114 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.com.
## 115 47.224.172.17.in-addr.arpa.   12 1073                  mobileme.com.
## 116 47.224.172.17.in-addr.arpa.   12 1073                ipa-iphone.net.
## 117 47.224.172.17.in-addr.arpa.   12 1073               jetfuelapps.com.
## 118 47.224.172.17.in-addr.arpa.   12 1073                jetfuelapp.com.
## 119 47.224.172.17.in-addr.arpa.   12 1073                   burstly.net.
## 120 47.224.172.17.in-addr.arpa.   12 1073             appmediagroup.com.
## 121 47.224.172.17.in-addr.arpa.   12 1073             airsupportapp.com.
## 122 47.224.172.17.in-addr.arpa.   12 1073            burstlyrewards.com.
## 123 47.224.172.17.in-addr.arpa.   12 1073        surveys-temp.apple.com.
## 124 47.224.172.17.in-addr.arpa.   12 1073               appleiphone.com.
## 125 47.224.172.17.in-addr.arpa.   12 1073                       asto.re.
## 126 47.224.172.17.in-addr.arpa.   12 1073                 itunesops.com.
## 127 47.224.172.17.in-addr.arpa.   12 1073                     apple.com.
## 128 47.224.172.17.in-addr.arpa.   12 1073     st11p01ww-apple.apple.com.
## 129 47.224.172.17.in-addr.arpa.   12 1073                      apple.by.
## 130 47.224.172.17.in-addr.arpa.   12 1073                 airtunes.info.
## 131 47.224.172.17.in-addr.arpa.   12 1073              applecentre.info.
## 132 47.224.172.17.in-addr.arpa.   12 1073         applecomputerinc.info.
## 133 47.224.172.17.in-addr.arpa.   12 1073                appleexpo.info.
## 134 47.224.172.17.in-addr.arpa.   12 1073             applemasters.info.
## 135 47.224.172.17.in-addr.arpa.   12 1073                 applepay.info.
## 136 47.224.172.17.in-addr.arpa.   12 1073 applepaymerchantsupplies.info.
## 137 47.224.172.17.in-addr.arpa.   12 1073         applepaysupplies.info.
## 138 47.224.172.17.in-addr.arpa.   12 1073              applescript.info.
## 139 47.224.172.17.in-addr.arpa.   12 1073               appleshare.info.
## 140 47.224.172.17.in-addr.arpa.   12 1073                   macosx.info.
## 141 47.224.172.17.in-addr.arpa.   12 1073                powerbook.info.
## 142 47.224.172.17.in-addr.arpa.   12 1073                 powermac.info.
## 143 47.224.172.17.in-addr.arpa.   12 1073            quicktimelive.info.
## 144 47.224.172.17.in-addr.arpa.   12 1073              quicktimetv.info.
## 145 47.224.172.17.in-addr.arpa.   12 1073                 sherlock.info.
## 146 47.224.172.17.in-addr.arpa.   12 1073            shopdifferent.info.
## 147 47.224.172.17.in-addr.arpa.   12 1073                 skyvines.info.
## 148 47.224.172.17.in-addr.arpa.   12 1073                     ubnw.info.
## 
## $Additional
## list()
## 
## $edns_client_subnet
## [1] "0.0.0.0/0"

And, you can go “easter egg” hunting:

cat(query("google-public-dns-a.google.com", "TXT")$Answer$data)
## "http://xkcd.com/1361/"

Note that Google DNS-over-HTTPS supports [all the RR types](http://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-4).

If you have more than a few domains to lookup and are querying for the same RR record, you can use the `bulk_query()` function:

hosts <- c("rud.is", "dds.ec", "r-project.org", "rstudio.com", "apple.com")
bulk_query(hosts)
## Source: local data frame [7 x 4]
## 
##             name  type   TTL            data
##            (chr) (int) (int)           (chr)
## 1        rud.is.     1  3599 104.236.112.222
## 2        dds.ec.     1   299   162.243.111.4
## 3 r-project.org.     1  3601   137.208.57.37
## 4   rstudio.com.     1  3599    45.79.156.36
## 5     apple.com.     1  1088   17.172.224.47
## 6     apple.com.     1  1088    17.178.96.59
## 7     apple.com.     1  1088   17.142.160.59

Note that this function only returns a `data_frame` (none of the status fields).

### More DNSpelunking with `gdns`

DNS records contain a treasure trove of data (at least for cybersecurity researchers). Say you have a list of base, primary domains for the Fortune 1000:

library(readr)
library(urltools)
 
URL <- "https://gist.githubusercontent.com/hrbrmstr/ae574201af3de035c684/raw/2d21bb4132b77b38f2992dfaab99649397f238e9/f1000.csv"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
 
f1k <- read_csv(fil)
 
doms1k <- suffix_extract(domain(f1k$website))
doms1k <- paste(doms1k$domain, doms1k$suffix, sep=".")
 
head(doms1k)
## [1] "walmart.com"           "exxonmobil.com"       
## [3] "chevron.com"           "berkshirehathaway.com"
## [5] "apple.com"             "gm.com"

We can get all the `TXT` records for them:

library(parallel)
library(doParallel) # parallel ops will make this go faster
library(foreach)
library(dplyr)
library(ggplot2)
library(grid)
library(hrbrmrkdn)
 
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
 
f1k_l <- foreach(dom=doms1k) %dopar% gdns::bulk_query(dom, "TXT")
f1k <- bind_rows(f1k_l)
 
length(unique(f1k$name))
## [1] 858
 
df <- count(count(f1k, name), `Number of TXT records`=n)
df <- bind_rows(df, data_frame(`Number of TXT records`=0, n=142))
 
gg <- ggplot(df, aes(`Number of TXT records`, n))
gg <- gg + geom_bar(stat="identity", width=0.75)
gg <- gg + scale_x_continuous(expand=c(0,0), breaks=0:13)
gg <- gg + scale_y_continuous(expand=c(0,0))
gg <- gg + labs(y="# Orgs", 
                title="TXT record count per Fortune 1000 Org")
gg <- gg + theme_hrbrmstr(grid="Y", axis="xy")
gg <- gg + theme(axis.title.x=element_text(margin=margin(t=-22)))
gg <- gg + theme(axis.title.y=element_text(angle=0, vjust=1, 
                                           margin=margin(r=-49)))
gg <- gg + theme(plot.margin=margin(t=10, l=30, b=30, r=10))
gg <- gg + theme(plot.title=element_text(margin=margin(b=20)))
gg

We can see that 858 of the Fortune 1000 have `TXT` records and more than a few have between 2 and 5 of them. Why look at `TXT` records? Well, they can tell us things like who uses cloud e-mail services, such as Outlook365:

sort(f1k$name[which(grepl("(MS=|outlook)", spf_includes(f1k$data), ignore.case=TRUE))])
##   [1] "21cf.com."                  "77nrg.com."                 "abbott.com."                "acuitybrands.com."         
##   [5] "adm.com."                   "adobe.com."                 "alaskaair.com."             "aleris.com."               
##   [9] "allergan.com."              "altria.com."                "amark.com."                 "ameren.com."               
##  [13] "americantower.com."         "ametek.com."                "amkor.com."                 "amphenol.com."             
##  [17] "amwater.com."               "analog.com."                "anixter.com."               "apachecorp.com."           
##  [21] "archrock.com."              "archrock.com."              "armstrong.com."             "aschulman.com."            
##  [25] "assurant.com."              "autonation.com."            "autozone.com."              "axiall.com."               
##  [29] "bd.com."                    "belk.com."                  "biglots.com."               "bio-rad.com."              
##  [33] "biomet.com."                "bloominbrands.com."         "bms.com."                   "borgwarner.com."           
##  [37] "boydgaming.com."            "brinks.com."                "brocade.com."               "brunswick.com."            
##  [41] "cabotog.com."               "caleres.com."               "campbellsoupcompany.com."   "carefusion.com."           
##  [45] "carlyle.com."               "cartech.com."               "cbrands.com."               "cbre.com."                 
##  [49] "chemtura.com."              "chipotle.com."              "chiquita.com."              "churchdwight.com."         
##  [53] "cinemark.com."              "cintas.com."                "cmc.com."                   "cmsenergy.com."            
##  [57] "cognizant.com."             "colfaxcorp.com."            "columbia.com."              "commscope.com."            
##  [61] "con-way.com."               "convergys.com."             "couche-tard.com."           "crestwoodlp.com."          
##  [65] "crowncastle.com."           "crowncork.com."             "csx.com."                   "cummins.com."              
##  [69] "cunamutual.com."            "dana.com."                  "darlingii.com."             "deanfoods.com."            
##  [73] "dentsplysirona.com."        "discoverfinancial.com."     "disney.com."                "donaldson.com."            
##  [77] "drhorton.com."              "dupont.com."                "dyn-intl.com."              "dynegy.com."               
##  [81] "ea.com."                    "eastman.com."               "ecolab.com."                "edgewell.com."             
##  [85] "edwards.com."               "emc.com."                   "enablemidstream.com."       "energyfutureholdings.com." 
##  [89] "energytransfer.com."        "eogresources.com."          "equinix.com."               "expeditors.com."           
##  [93] "express.com."               "fastenal.com."              "ferrellgas.com."            "fisglobal.com."            
##  [97] "flowserve.com."             "fmglobal.com."              "fnf.com."                   "g-iii.com."                
## [101] "genpt.com."                 "ggp.com."                   "gilead.com."                "goodyear.com."             
## [105] "grainger.com."              "graphicpkg.com."            "hanes.com."                 "hanover.com."              
## [109] "harley-davidson.com."       "harsco.com."                "hasbro.com."                "hbfuller.com."             
## [113] "hei.com."                   "hhgregg.com."               "hnicorp.com."               "homedepot.com."            
## [117] "hpinc.com."                 "hubgroup.com."              "iac.com."                   "igt.com."                  
## [121] "iheartmedia.com."           "insperity.com."             "itt.com."                   "itw.com."                  
## [125] "jarden.com."                "jcpenney.com."              "jll.com."                   "joyglobal.com."            
## [129] "juniper.net."               "kellyservices.com."         "kennametal.com."            "kiewit.com."               
## [133] "kindermorgan.com."          "kindredhealthcare.com."     "kodak.com."                 "lamresearch.com."          
## [137] "lansingtradegroup.com."     "lennar.com."                "levistrauss.com."           "lithia.com."               
## [141] "manitowoc.com."             "manpowergroup.com."         "marathonoil.com."           "marathonpetroleum.com."    
## [145] "mastec.com."                "mastercard.com."            "mattel.com."                "maximintegrated.com."      
## [149] "mednax.com."                "mercuryinsurance.com."      "mgmresorts.com."            "micron.com."               
## [153] "mohawkind.com."             "molsoncoors.com."           "mosaicco.com."              "motorolasolutions.com."    
## [157] "mpgdriven.com."             "mscdirect.com."             "mtb.com."                   "murphyoilcorp.com."        
## [161] "mutualofomaha.com."         "mwv.com."                   "navistar.com."              "nbty.com."                 
## [165] "newellrubbermaid.com."      "nexeosolutions.com."        "nike.com."                  "nobleenergyinc.com."       
## [169] "o-i.com."                   "oge.com."                   "olin.com."                  "omnicomgroup.com."         
## [173] "onsemi.com."                "owens-minor.com."           "paychex.com."               "peabodyenergy.com."        
## [177] "pepboys.com."               "pmi.com."                   "pnkinc.com."                "polaris.com."              
## [181] "polyone.com."               "postholdings.com."          "ppg.com."                   "prudential.com."           
## [185] "qg.com."                    "quantaservices.com."        "quintiles.com."             "rcscapital.com."           
## [189] "rexnord.com."               "roberthalf.com."            "rushenterprises.com."       "ryland.com."               
## [193] "sandisk.com."               "sands.com."                 "scansource.com."            "sempra.com."               
## [197] "sonoco.com."                "spiritaero.com."            "sprouts.com."               "stanleyblackanddecker.com."
## [201] "starwoodhotels.com."        "steelcase.com."             "stryker.com."               "sunedison.com."            
## [205] "sunpower.com."              "supervalu.com."             "swifttrans.com."            "synnex.com."               
## [209] "taylormorrison.com."        "techdata.com."              "tegna.com."                 "tempursealy.com."          
## [213] "tetratech.com."             "theice.com."                "thermofisher.com."          "tjx.com."                  
## [217] "trueblue.com."              "ufpi.com."                  "ulta.com."                  "unfi.com."                 
## [221] "unifiedgrocers.com."        "universalcorp.com."         "vishay.com."                "visteon.com."              
## [225] "vwr.com."                   "westarenergy.com."          "westernunion.com."          "westrock.com."             
## [229] "wfscorp.com."               "whitewave.com."             "wpxenergy.com."             "wyndhamworldwide.com."     
## [233] "xilinx.com."                "xpo.com."                   "yum.com."                   "zimmerbiomet.com."

That’s 236 of them outsourcing some part of e-mail services to Microsoft.

We can also see which ones have terrible mail configs (`+all` or `all` passing):

f1k[which(passes_all(f1k$data)),]$name
## [1] "wfscorp.com."      "dupont.com."       "group1auto.com."   "uhsinc.com."      
## [5] "bigheartpet.com."  "pcconnection.com."

or are configured for Exchange federation services:

sort(f1k$name[which(grepl("==", f1k$data))])
## sort(f1k$name[which(grepl("==", f1k$data))])
##   [1] "21cf.com."                 "aarons.com."               "abbott.com."               "abbvie.com."              
##   [5] "actavis.com."              "activisionblizzard.com."   "acuitybrands.com."         "adm.com."                 
##   [9] "adobe.com."                "adt.com."                  "advanceautoparts.com."     "aecom.com."               
##  [13] "aetna.com."                "agilent.com."              "airproducts.com."          "alcoa.com."               
##  [17] "aleris.com."               "allergan.com."             "alliancedata.com."         "amcnetworks.com."         
##  [21] "amd.com."                  "americantower.com."        "amfam.com."                "amgen.com."               
##  [25] "amtrustgroup.com."         "amtrustgroup.com."         "amtrustgroup.com."         "amtrustgroup.com."        
##  [29] "anadarko.com."             "analog.com."               "apachecorp.com."           "applied.com."             
##  [33] "aptar.com."                "aramark.com."              "aramark.com."              "arcb.com."                
##  [37] "archcoal.com."             "armstrong.com."            "armstrong.com."            "arrow.com."               
##  [41] "asburyauto.com."           "autonation.com."           "avnet.com."                "ball.com."                
##  [45] "bankofamerica.com."        "baxter.com."               "bc.com."                   "bd.com."                  
##  [49] "bd.com."                   "bd.com."                   "belden.com."               "bemis.com."               
##  [53] "bestbuy.com."              "biogen.com."               "biomet.com."               "bloominbrands.com."       
##  [57] "bms.com."                  "boeing.com."               "bonton.com."               "borgwarner.com."          
##  [61] "brinks.com."               "brocade.com."              "brunswick.com."            "c-a-m.com."               
##  [65] "ca.com."                   "cabelas.com."              "cabotog.com."              "caleres.com."             
##  [69] "caleres.com."              "caleres.com."              "calpine.com."              "capitalone.com."          
##  [73] "cardinal.com."             "carlyle.com."              "carlyle.com."              "cartech.com."             
##  [77] "cbre.com."                 "celgene.com."              "centene.com."              "centurylink.com."         
##  [81] "cerner.com."               "cerner.com."               "cfindustries.com."         "ch2m.com."                
##  [85] "chevron.com."              "chipotle.com."             "chiquita.com."             "chk.com."                 
##  [89] "chrobinson.com."           "chs.net."                  "chsinc.com."               "chubb.com."               
##  [93] "ciena.com."                "cigna.com."                "cinemark.com."             "cit.com."                 
##  [97] "cmc.com."                  "cmegroup.com."             "coach.com."                "cognizant.com."           
## [101] "cokecce.com."              "colfaxcorp.com."           "columbia.com."             "commscope.com."           
## [105] "con-way.com."              "conagrafoods.com."         "conocophillips.com."       "coopertire.com."          
## [109] "core-mark.com."            "crbard.com."               "crestwoodlp.com."          "crowncastle.com."         
## [113] "crowncork.com."            "csx.com."                  "danaher.com."              "darden.com."              
## [117] "darlingii.com."            "davita.com."               "davita.com."               "davita.com."              
## [121] "dentsplysirona.com."       "diebold.com."              "diplomat.is."              "dish.com."                
## [125] "disney.com."               "donaldson.com."            "dresser-rand.com."         "dstsystems.com."          
## [129] "dupont.com."               "dupont.com."               "dyn-intl.com."             "dyn-intl.com."            
## [133] "dynegy.com."               "ea.com."                   "ea.com."                   "eastman.com."             
## [137] "ebay.com."                 "echostar.com."             "ecolab.com."               "edmc.edu."                
## [141] "edwards.com."              "elcompanies.com."          "emc.com."                  "emerson.com."             
## [145] "energyfutureholdings.com." "energytransfer.com."       "eogresources.com."         "equinix.com."             
## [149] "essendant.com."            "esterline.com."            "evhc.net."                 "exelisinc.com."           
## [153] "exeloncorp.com."           "express-scripts.com."      "express.com."              "express.com."             
## [157] "exxonmobil.com."           "familydollar.com."         "fanniemae.com."            "fastenal.com."            
## [161] "fbhs.com."                 "ferrellgas.com."           "firstenergycorp.com."      "firstsolar.com."          
## [165] "fiserv.com."               "flowserve.com."            "fmc.com."                  "fmglobal.com."            
## [169] "fnf.com."                  "freddiemac.com."           "ge.com."                   "genpt.com."               
## [173] "genworth.com."             "ggp.com."                  "grace.com."                "grainger.com."            
## [177] "graphicpkg.com."           "graybar.com."              "guess.com."                "hain.com."                
## [181] "halliburton.com."          "hanes.com."                "hanes.com."                "harley-davidson.com."     
## [185] "harman.com."               "harris.com."               "harsco.com."               "hasbro.com."              
## [189] "hcahealthcare.com."        "hcc.com."                  "hei.com."                  "henryschein.com."         
## [193] "hess.com."                 "hhgregg.com."              "hnicorp.com."              "hollyfrontier.com."       
## [197] "hologic.com."              "honeywell.com."            "hospira.com."              "hp.com."                  
## [201] "hpinc.com."                "hrblock.com."              "iac.com."                  "igt.com."                 
## [205] "iheartmedia.com."          "imshealth.com."            "ingrammicro.com."          "intel.com."               
## [209] "interpublic.com."          "intuit.com."               "ironmountain.com."         "jacobs.com."              
## [213] "jarden.com."               "jcpenney.com."             "jll.com."                  "johndeere.com."           
## [217] "johndeere.com."            "joyglobal.com."            "juniper.net."              "karauctionservices.com."  
## [221] "kbhome.com."               "kemper.com."               "keurig.com."               "khov.com."                
## [225] "kindredhealthcare.com."    "kkr.com."                  "kla-tencor.com."           "labcorp.com."             
## [229] "labcorp.com."              "lamresearch.com."          "lamresearch.com."          "landolakesinc.com."       
## [233] "lansingtradegroup.com."    "lear.com."                 "leggmason.com."            "leidos.com."              
## [237] "level3.com."               "libertymutual.com."        "lilly.com."                "lithia.com."              
## [241] "livenation.com."           "lkqcorp.com."              "loews.com."                "magellanhealth.com."      
## [245] "manitowoc.com."            "marathonoil.com."          "marathonpetroleum.com."    "markelcorp.com."          
## [249] "markwest.com."             "marriott.com."             "martinmarietta.com."       "masco.com."               
## [253] "massmutual.com."           "mastec.com."               "mastercard.com."           "mattel.com."              
## [257] "maximintegrated.com."      "mckesson.com."             "mercuryinsurance.com."     "meritor.com."             
## [261] "metlife.com."              "mgmresorts.com."           "micron.com."               "microsoft.com."           
## [265] "mohawkind.com."            "molsoncoors.com."          "monsanto.com."             "mosaicco.com."            
## [269] "motorolasolutions.com."    "mscdirect.com."            "murphyoilcorp.com."        "nasdaqomx.com."           
## [273] "navistar.com."             "nbty.com."                 "ncr.com."                  "netapp.com."              
## [277] "newfield.com."             "newscorp.com."             "nike.com."                 "nov.com."                 
## [281] "nrgenergy.com."            "ntenergy.com."             "nucor.com."                "nustarenergy.com."        
## [285] "o-i.com."                  "oaktreecapital.com."       "ocwen.com."                "omnicare.com."            
## [289] "oneok.com."                "oneok.com."                "onsemi.com."               "outerwall.com."           
## [293] "owens-minor.com."          "owens-minor.com."          "oxy.com."                  "packagingcorp.com."       
## [297] "pall.com."                 "parexel.com."              "paychex.com."              "pcconnection.com."        
## [301] "penskeautomotive.com."     "pepsico.com."              "pfizer.com."               "pg.com."                  
## [305] "polaris.com."              "polyone.com."              "pplweb.com."               "principal.com."           
## [309] "protective.com."           "publix.com."               "qg.com."                   "questdiagnostics.com."    
## [313] "quintiles.com."            "rcscapital.com."           "realogy.com."              "regmovies.com."           
## [317] "rentacenter.com."          "republicservices.com."     "rexnord.com."              "reynoldsamerican.com."    
## [321] "reynoldsamerican.com."     "rgare.com."                "roberthalf.com."           "rpc.net."                 
## [325] "rushenterprises.com."      "safeway.com."              "saic.com."                 "sandisk.com."             
## [329] "scana.com."                "scansource.com."           "seaboardcorp.com."         "selective.com."           
## [333] "selective.com."            "sempra.com."               "servicemaster.com."        "servicemaster.com."       
## [337] "servicemaster.com."        "sjm.com."                  "sm-energy.com."            "spectraenergy.com."       
## [341] "spiritaero.com."           "sprouts.com."              "spx.com."                  "staples.com."             
## [345] "starbucks.com."            "starwoodhotels.com."       "statestreet.com."          "steelcase.com."           
## [349] "steeldynamics.com."        "stericycle.com."           "stifel.com."               "stryker.com."             
## [353] "sunedison.com."            "sungard.com."              "supervalu.com."            "symantec.com."            
## [357] "symantec.com."             "synnex.com."               "synopsys.com."             "taylormorrison.com."      
## [361] "tdsinc.com."               "teamhealth.com."           "techdata.com."             "teledyne.com."            
## [365] "tempursealy.com."          "tenethealth.com."          "teradata.com."             "tetratech.com."           
## [369] "textron.com."              "thermofisher.com."         "tiaa-cref.org."            "tiffany.com."             
## [373] "timewarner.com."           "towerswatson.com."         "treehousefoods.com."       "tribunemedia.com."        
## [377] "trimble.com."              "trinet.com."               "trueblue.com."             "ugicorp.com."             
## [381] "uhsinc.com."               "ulta.com."                 "unifiedgrocers.com."       "unisys.com."              
## [385] "unum.com."                 "usfoods.com."              "varian.com."               "verizon.com."             
## [389] "vfc.com."                  "viacom.com."               "visa.com."                 "vishay.com."              
## [393] "visteon.com."              "wabtec.com."               "walmart.com."              "wecenergygroup.com."      
## [397] "wecenergygroup.com."       "west.com."                 "westarenergy.com."         "westlake.com."            
## [401] "westrock.com."             "weyerhaeuser.com."         "wholefoodsmarket.com."     "williams.com."            
## [405] "wm.com."                   "wnr.com."                  "wpxenergy.com."            "wyndhamworldwide.com."    
## [409] "xerox.com."                "xpo.com."                  "yrcw.com."                 "yum.com."                 
## [413] "zoetis.com."

And, even go so far as to see what are the most popular third-party mail services:

incl <- suffix_extract(sort(unlist(spf_includes(f1k$data))))
incl <- data.frame(table(paste(incl$domain, incl$suffix, sep=".")), stringsAsFactors=FALSE)
incl <- head(arrange(incl, desc(Freq)), 20)
incl <- mutate(incl, Var1=factor(Var1, Var1))
incl <- rename(incl, Service=Var1, Count=Freq)
 
gg <- ggplot(incl, aes(Service, Count))
gg <- gg + geom_bar(stat="identity", width=0.75)
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 250))
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL, 
                title="Most popular services used by the F1000",
                subtitle="As determined by SPF record configuration")
gg <- gg + theme_hrbrmstr(grid="X", axis="y")
gg <- gg + theme(plot.margin=margin(t=10, l=10, b=20, r=10))
gg

### Fin

There are more `TXT` records to play with than just SPF ones and many other hidden easter eggs. I need to add a few more functions into `gdns` before shipping it off to CRAN, so if you have any feature requests, now’s the time to file a [github issue](https://github.com/hrbrmstr/gdns/issues).