
>_Stop making people suicidal. Stop telling people they’re going to be killed. Stop terrifying children. Stop giving racism free advertising. Stop trying to convince Americans that all the other Americans hate them. Stop. Stop. Stop._

Just. [Stop](http://slatestarcodex.com/2016/11/16/you-are-still-crying-wolf/).

2016-08-13 UPDATE: Fortune has a story on this and the driver does seem to be tax-related rather than ideological. @thosjleeper suggested something similar about a week ago.

If you’re even remotely following the super-insane circus that is the 2016 U.S. POTUS election, you’ve no doubt seen a resurgence of _”if X gets elected, I’m moving to Y”_ claims from folks who are “anti” one candidate or another. The [Washington Examiner](http://www.washingtonexaminer.com/americans-renouncing-citizenship-near-record-highs/article/2598074) did a story on last quarter’s U.S. expatriation numbers. I didn’t realize we had a department in charge of tracking and posting that data, but we do, thanks to inane bureaucratic compliance laws.

I should have put _”posting that data”_ in quotes as it’s collected quarterly and posted ~2 months later in non-uniform HTML and PDF form across individual posts in a unique/custom Federal Register publishing system. How’s that hope and change in “open government data” working out for y’all?

The data is organized enough that we can take a look at the history of expatriation with some help from R. Along the way we’ll:

– see how to make parameterized web requests a bit cleaner with `httr`
– get even _more_ practice using the `purrr` package
– perhaps learn a new “trick” when using the `stringi` package
– show how we can “make do” living in a non-XPath 2 world (it’s actually pretty much at XPath 3 now, too #sigh)

A manual hunt on that system will eventually reveal a search URL that you can feed to `read.csv()` (to grab a list of URLs that point to the data, not the data itself #ugh). Those URLs are _gnarly_ (you’ll see what I mean if you do the hunt), but the standardized query parameters buried in those egregiously long URLs can be expressed far more readably with `httr::GET()` directly, especially since `httr::content()` will auto-convert the resultant CSV to a `tibble` for us (the site sets the response MIME type appropriately).

Unfortunately, when using the `6039G` search parameter (the expatriate tracking form ID) we do need to filter out non-quarterly report documents since the bureaucrats must have their ancillary TPS reports.

library(dplyr)
library(httr)
library(rvest)
library(purrr)
library(lubridate)
library(ggplot2) # devtools::install_github("hadley/ggplot2")
library(hrbrmisc) # devtools::install_github("hrbrmstr/hrbrmisc")
library(ggalt)
library(grid)
library(scales)
library(magrittr)
library(stringi)

GET("https://www.federalregister.gov/articles/search.csv",
    query=list(`conditions[agency_ids][]`=254,
               `conditions[publication_date][gte]`="01/01/2006",
               `conditions[publication_date][lte]`="7/29/2016",
               `conditions[term]`="6039G",
               `conditions[type][]`="NOTICE")) %>%
  content("parsed") %>%
  filter(grepl("^Quarterly", title)) -> register

glimpse(register)
## Observations: 44
## Variables: 9
## $ citation         <chr> "81 FR 50058", "81 FR 27198", "81 FR 65...
## $ document_number  <chr> "2016-18029", "2016-10578", "2016-02312...
## $ title            <chr> "Quarterly Publication of Individuals, ...
## $ publication_date <chr> "07/29/2016", "05/05/2016", "02/08/2016...
## $ type             <chr> "Notice", "Notice", "Notice", "Notice",...
## $ agency_names     <chr> "Treasury Department; Internal Revenue ...
## $ html_url         <chr> "https://www.federalregister.gov/articl...
## $ page_length      <int> 9, 17, 16, 20, 8, 20, 16, 12, 9, 15, 8,...
## $ qtr              <date> 2016-06-30, 2016-03-31, 2015-12-31, 20...

Now, we grab the content at each of the `html_url`s and save them off to be kind to bandwidth and/or folks with slow connections (so you don’t have to re-grab the HTML):

docs <- map(register$html_url, read_html)
saveRDS(docs, file="deserters.rds")

That generates a list of parsed HTML documents.
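
On subsequent runs you can skip the scraping entirely with a quick cache check (a convenience sketch reusing the deserters.rds file name from above; xml2 documents are backed by external pointers, so if a saved copy doesn’t survive a new R session, just re-run the `map()` above):

# use the cached parsed docs if we already have them, otherwise scrape & save
if (file.exists("deserters.rds")) {
  docs <- readRDS("deserters.rds")
} else {
  docs <- map(register$html_url, read_html)
  saveRDS(docs, file="deserters.rds")
}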

The reporting dates aren’t 100% consistent (i.e. not always “n” weeks after the collection date), but the data collection dates _embedded textually in each report_ are (mostly…some vary in the use of upper/lower case). So, we exploit the fact that these are boring legal documents that use the same boilerplate language for various phrases and extract the “quarter ending” dates so we know which year/quarter each report covers:

register %<>%
  mutate(qtr=map_chr(docs, ~stri_match_all_regex(html_text(.), "quarter ending ([[:alnum:], ]+)\\.",
                                                     opts_regex=stri_opts_regex(case_insensitive=TRUE))[[1]][,2]),
         qtr=mdy(qtr))

I don’t often use that particular `magrittr` pipe, but it “feels right” in this case and is handy in a pinch.
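
For anyone who hasn’t adopted `%<>%`, the chunk above is equivalent to the more familiar (if slightly more repetitive) assign-and-pipe form:

register <- register %>%
  mutate(qtr=map_chr(docs, ~stri_match_all_regex(html_text(.), "quarter ending ([[:alnum:], ]+)\\.",
                                                 opts_regex=stri_opts_regex(case_insensitive=TRUE))[[1]][,2]),
         qtr=mdy(qtr))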

If you visit some of the URLs directly, you’ll see that there are tables and/or lists of the expats’ names. However, the naming & formatting conventions for those lists are woefully inconsistent *and* (as I noted earlier) there’s no XPath 2 support in R, so we have to write a slightly more verbose XPath query to target the table we need, accounting for the vastly different column-name structures across the reports.

NOTE: Older HTML pages may not have HTML tables at all and some only reference PDFs, so don’t rely on this code working beyond these particular dates (at least consistently).

We’ll also tidy up the data into a neat `tibble` for plotting.

map(docs, ~html_nodes(., xpath=".//table[contains(., 'First name') or
                                         contains(., 'FIRST NAME') or
                                         contains(., 'FNAME')]")) %>%
  map(~html_table(.)[[1]]) -> tabs

data_frame(date=register$qtr, count=map_int(tabs, nrow)) %>%
  filter(format(as.Date(date), "%Y") >= 2006) -> left
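
If you need to run this against older notices that may lack an HTML table entirely, a slightly more defensive variant (a sketch; it assumes an empty node set means the data only exists as a PDF) drops those documents first. Just remember you’d then have to keep track of which rows of `register` were dropped before lining the counts back up with the quarters:

# same XPath, but discard documents with no matching table before html_table()
map(docs, ~html_nodes(., xpath=".//table[contains(., 'First name') or
                                         contains(., 'FIRST NAME') or
                                         contains(., 'FNAME')]")) %>%
  keep(~length(.) > 0) %>%
  map(~html_table(.)[[1]]) -> tabs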

With the data wrangling work out of the way, we can tally up the throngs of folks desperate for greener pastures. First, by quarter:

gg <- ggplot(left, aes(date, count))
gg <- gg + geom_lollipop()
gg <- gg + geom_label(data=data.frame(),
                      aes(x=min(left$date), y=1500, label="# individuals"),
                      family="Arial Narrow", fontface="italic", size=3, label.size=0, hjust=0)
gg <- gg + scale_x_date(expand=c(0,14), limits=range(left$date))
gg <- gg + scale_y_continuous(expand=c(0,0), label=comma, limits=c(0,1520))
gg <- gg + labs(x=NULL, y=NULL,
                title="A Decade of Desertion",
                subtitle="Quarterly counts of U.S. individuals who have chosen to expatriate (2006-2016)",
                caption="Source: https://www.federalregister.gov/")
gg <- gg + theme_hrbrmstr_an(grid="Y")
gg

(plot: quarterly expatriation counts, 2006-2016)

and, then annually:

left %>%
  mutate(year=format(date, "%Y")) %>%
  count(year, wt=count) %>%
  ggplot(aes(year, n)) -> gg

gg <- gg + geom_bar(stat="identity", width=0.6)
gg <- gg + geom_label(data=data.frame(), aes(x=0, y=5000, label="# individuals"),
                      family="Arial Narrow", fontface="italic", size=3, label.size=0, hjust=0)
gg <- gg + scale_y_continuous(expand=c(0,0), label=comma, limits=c(0,5100))
gg <- gg + labs(x=NULL, y=NULL,
                title="A Decade of Desertion",
                subtitle="Annual counts of U.S. individuals who have chosen to expatriate (2006-2016)",
                caption="Source: https://www.federalregister.gov/")
gg <- gg + theme_hrbrmstr_an(grid="Y")
gg

(plot: annual expatriation counts, 2006-2016)

The exodus isn’t _massive_ but it’s more than I expected. It’d be interesting to line up U.S. tax code changes, the enactment of other compliance regulations and general news events against this series to see if there are underlying reasons for the overall annual increases as well as the dips in some quarters (which could just be data-collection hiccups by the Feds…after all, this is government work). If you want to do all the math for correcting survey errors, it’d also be interesting to normalize the counts by population, track the data back to 1996 (when HIPAA mandated the creation & publication of this quarterly list) and then see if you can predict where we’ll be at the end of this year (though I suspect political events are a motivator for at least a decent fraction of some quarters).
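
As a starting point for that normalization, something like the following would do. (Purely illustrative: `us_pop` is a hypothetical tibble of annual U.S. population estimates with `year` and `population` columns that you’d have to source yourself, e.g. from Census Bureau data.)

# per-million-residents expatriation rate by year
# (`us_pop` is a made-up tibble: year <chr>, population <dbl>)
left %>%
  mutate(year=format(date, "%Y")) %>%
  count(year, wt=count) %>%
  left_join(us_pop, by="year") %>%
  mutate(per_million=(n/population)*1e6)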

I had tried to convert my data-saving workflows to [`feather`](https://github.com/wesm/feather/tree/master/R) but there have been [issues](https://github.com/wesm/feather/issues/155) with it supporting large files (that seem to be near resolution), so I’ve been continuing to use R Data files for local saving of processed/cleaned data.

I make _many_ of these files and sometimes I do it as a one-off effort, thinking that I’ll come back to it quickly. Inevitably, I don’t, and I also end up naming those one-offs badly. I made a small [R helper package](https://github.com/hrbrmstr/rdatainfo) to make it easier to check out these files at the command-line (via a `bash` function), but it hit me that it’d be even easier if there was a way to use the macOS Quick Look feature (hitting the space bar on a file icon) to see the previews.

Thus, [`QuickLookR`](https://github.com/hrbrmstr/QuickLookR) was born.

You need to [download the ZIP file](https://github.com/hrbrmstr/QuickLookR/releases/tag/v0.1.0), unzip it and save the `QuickLookR.qlgenerator` component into `~/Library/QuickLook`. Then `devtools::install_github("hrbrmstr/rdatainfo")` in an R session. If you’ve got R/Rscript in the standard `/usr/local/bin` location, then you should be able to hit the space bar on any `.rdata`, `.rda` or `.rds` file and see a `str()` preview like this:

(screenshot: Quick Look `str()` preview of an R data file)
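
Conceptually, the preview is just surfacing what you’d see at the console anyway; roughly (a sketch of the idea, not the plugin internals):

# what the Quick Look preview boils down to, conceptually
str(readRDS("deserters.rds"))       # .rds files
str(mget(load("some_objects.rda"))) # .rda/.rdata files ("some_objects.rda" is a placeholder name)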

I haven’t cracked open Xcode in a while and my Objective-C is super-rusty, but this works on my El Capitan MacBook Pro (though I’m trying to see why some `.rds` files embedded in packages on my system have no previews).

If you have suggestions or issues, please use [github](https://github.com/hrbrmstr/QuickLookR/issues) to file them. For issues, it’d be really helpful if you included a copy of or link to files that don’t work well.

For the next revision, I plan on generating prettier HTML-based previews and linking against `R.framework` to avoid a call out to the system.

If Wes/Hadley really have fixed `feather`, I’ll be making a QuickLook plugin for that file format as well in the very near future.

The NIH is [moving forward](http://www.npr.org/sections/health-shots/2016/08/04/488387729/nih-plans-to-lift-ban-on-research-funds-for-part-human-part-animal-embryos) with plans to financially support & encourage human-animal chimera research.

You can find more info over at the [NIH blog](http://osp.od.nih.gov/under-the-poliscope/2016/08/next-steps-research-using-animal-embryos-containing-human-cells).

Chimeras have been a longstanding subject of science fiction/fantasy and many authors have used the theme to help inform the ethics debate. A fairly recent exploration of this has been through the [Fullmetal Alchemist](http://www.fullmetalalchemist.com/) anime/manga series. TL;DR: it doesn’t go so well, even with animal-animal chimera hybrids.

Yes, that’s fiction, and the current NIH proposals are nowhere near as audacious as what’s described in the FMA series. But, surprisingly, you won’t find a large number of vocal critics of human-animal chimera research, since the modern “open scientific community” is actually pretty harshly judgmental of anyone who tries to limit or challenge “science” in any way (since it’s their religion, as everyone believes in something whether they claim to or not). Open, logical and (more importantly) effective criticisms of purported “progress” are often career-limiting moves. All this at a time in history when the current generation of scientists seems to excel at ignoring the potential for unintended consequences of their work.

Folks can (and should) [add their comments](http://grants.nih.gov/grants/rfi/rfi.cfm?ID=57) either for or against this proposal. Not commenting means you agree with the NIH plans and are fine with your U.S. tax dollars & government resources going toward this research; it also means you are on the hook when this eventually goes horribly, horribly wrong.

For those commenting to show their _lack_ of support, I augmented a statement from an [interview with Dr. Stuart Newman](http://www.beliefnet.com/news/science-religion/2005/05/the-peril-and-promise-of-mix-and-match-biotech.aspx), who is a vocal detractor of human-animal chimeras (so much so that he tried to prevent such research through the USPTO process, which ultimately failed), for my submission:

I agree with Dr. Stuart A. Newman that, like every human activity, biotechnology is open to wise and foolish uses. The profit motive, coupled with an uncritical acceptance of the notion that new technology is the main way to human advancement, often leads to hype and incautious applications. In fact, existing technologies – sanitation, keeping water and air unpolluted, enabling poor people to eat enough and well-off people not too much, providing birth control and maternal and infant health services – would save more lives over the coming century than all foreseeable biotechnological applications.

I see no viable way for the NIH to prevent wanton abuse once they open the doors to this type of research. As a taxpayer, a well-read individual and someone who does have a sense of morality, I would rather precious, scant financial and bureaucratic resources go into known, proven endeavors that can have substantial, real, immediate impact.

As the translation from the FMA series states: _”Humans have a limitless desire to use their knowledge in real life…the desire to see what you can do with the power that is given to them…the desire to understand all the secrets in this world and experiment with them.”_ My reasoning and my faith suggest that there are definitely doors that should remain closed.

From: [Why Corporate America Is Leaving the Suburbs for the City](http://nyti.ms/2adP5Rn):

>_We wanted energy, vibrancy and diversity, and to accelerate a change in our culture by moving downtown._

translation:

>_We want to begin a process of strategically removing more highly paid, legacy employees who can’t commit 12 hours a day to our company and replacing them with younger folks we can take advantage of._

The move to forgo the addition of parking spaces in these new city HQs and encourage the use of mass transit is also interesting, given the [current state](http://www.salon.com/2015/03/01/american_mass_transit_is_dying/) [of mass transit systems](http://www.sfchronicle.com/bayarea/article/BART-shutdown-underscores-aging-system-s-6916061.php) [in America](http://thehill.com/policy/transportation/289278-dc-metro-proposes-permanent-earlier-closing-times-on-subway). Will these corporations be kicking in greenbacks for infrastructure/capacity improvement? Methinks not.

Remember, kids, these are soulless, giant, multinational corporations that place “shareholder value” over **everything else**. Also, remember that you’ll be a “legacy” worker someday, too.

Hopefully some startup will jump in to buy up all the forthcoming empty suburban campus spaces and turn them into indoor farms.

This is another purrr-focused post but it’s also an homage to the nascent magick package (R interface to ImageMagick) by @opencpu.

We’re starting to see/feel the impact of the increasing drought up here in southern Maine. I’ve used the data from the U.S. Drought Monitor before on the blog, but they also provide shapefiles and this seemed like a good opportunity to further demonstrate the utility of purrr and make animations directly using magick. Plus, I wanted to see the progression of the drought. Putting library() statements for purrr, magick and broom together was completely random, but I now feel compelled to find a set of functions to put into a cauldron package. But, I digress.

What does this demonstrate?

Apart from giving you an idea of the extent of the drought, working through this will help you:

  • use the quietly() function (which automagically captures a function’s warnings instead of printing them; see the tiny standalone example right after this list)
  • see another example of a formula function
  • illustrate the utility of map_df(), and
  • see how to create an animation pipeline for magick
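
As a quick aside (not part of the walkthrough proper), here’s `quietly()` on its own so you can see what it hands back:

# quietly() wraps a function so warnings/messages/printed output are
# captured in the result instead of being displayed
hushed_log <- quietly(log)
res <- hushed_log(-1)
res$result    # NaN
res$warnings  # "NaNs produced"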

Comments are in the code and the drought gif is at the end. I deliberately only had it loop once, so refresh the image if you want to see the progression again. Also, drop a note in the comments if anything needs more exposition. (NOTE: I was fairly bad and did virtually no file cleanup in the function, so you’ll have half a year’s shapefiles in your getwd(). Consider the cleanup an exercise for the reader :-)

library(rgdal)
library(sp)
library(albersusa) # devtools::install_github("hrbrmstr/albersusa")
library(ggplot2) # devtools::install_github("hadley/ggplot2")
library(ggthemes)
library(rgeos)

# the witch's brew
library(purrr)
library(broom)
library(magick)

#' Get a drought map shapefile and turn it into a PNG
drought_map <- function(wk) {

  # need to hush some chatty functions
  hush_tidy <- quietly(tidy)

  # some are more stubborn than others
  old_warn <- getOption("warn")
  options(warn=-1)

  week <- format(wk, "%Y%m%d")

  # get the drought shapefile only if we don't have it already
  URL <- sprintf("http://droughtmonitor.unl.edu/data/shapefiles_m/USDM_%s_M.zip", week)
  (fil <- basename(URL))
  if (!file.exists(fil)) download.file(URL, fil)
  unzip(fil)

  # read in the shapefile and reduce the polygon complexity
  dr <- readOGR(sprintf("USDM_%s.shp", week),
                sprintf("USDM_%s", week),
                verbose=FALSE,
                stringsAsFactors=FALSE)

  dr <- SpatialPolygonsDataFrame(gSimplify(dr, 0.01, TRUE), dr@data)

  # separate out each drought level into its own fortified data.frame
  map(dr$DM, ~subset(dr, DM==.)) %>%
    map(hush_tidy) %>%
    map_df("result", .id="DM") -> m

  # get a conus base map (prbly cld have done map_data("usa"), too)
  usa_composite() %>%
    subset(!(iso_3166_2 %in% c("AK", "HI"))) %>%
    hush_tidy() -> usa

  usa <- usa$result # an artifact of using quietly()

  # this is all Ushey's fault. the utility of cmd-enter to run
  # the entire ggplot2 chain (in RStudio) turns out to have a
  # greater productivity boost (I measured it) than my shortcuts for
  # gg <- gg + snippets and hand-editing the "+" bits out when
  # editing old plot constructs. I'm not giving up on gg <- gg + tho

  # Note putting the "base" layer on top since we don't really
  # want to deal with alpha levels of the drought polygons and
  # we're only plotting the outline of the us/states, not filling
  # the interior(s).

  ggplot() +
    geom_map(data=m, map=m,
             aes(long, lat, fill=DM, map_id=id),
             color="#2b2b2b", size=0.05) +
    geom_map(data=usa, map=usa, aes(long, lat, map_id=id),
             color="#2b2b2b88", fill=NA, size=0.1) +
    scale_fill_brewer("Drought Level", palette="YlOrBr") +
    coord_map("polyconic", xlim=c(-130, -65), ylim=c(25, 50)) +
    labs(x=sprintf("Week: %s", wk)) +
    theme_map() +
    theme(axis.title=element_text()) +
    theme(axis.title.x=element_text()) +
    theme(axis.title.y=element_blank()) +
    theme(legend.position="bottom") +
    theme(legend.direction="horizontal") -> gg

  options(warn=old_warn) # put things back the way they were

  outfil <- sprintf("gg-dm-%s.png", wk)
  ggsave(outfil, gg, width=8, height=5)

  outfil

}

# - create a vector of weeks (minus the current one)
# - create the individual map PNGs
# - read the individual map PNGs into a list
# - join the images together
# - create the animated gif structure
# - write the gif to a file

seq(as.Date("2016-01-05"), Sys.Date(), by="1 week") %>%
  head(-1) %>%
  map(drought_map) %>%
  map(image_read) %>%
  image_join() %>%
  image_animate(fps=2, loop=1) %>%
  image_write("drought.gif")

NOTE: an updated, comment-free version of the above code block is in this gist and uses spdplyr::filter() vs subset(), keeps downloaded files tidy in a temporary directory and includes a progress bar vs raw, ugly download.file() messages.

I’ve converted the vast majority of my *apply usage over to purrr functions. In an attempt to make this a quick post, I’ll refrain from going into all the benefits of the purrr package. Instead, I’ll show just one thing that’s super helpful: formula functions.

After seeing this Quartz article using a visualization to compare the frequency and volume of mass shootings, I wanted to grab the data to look at it with a stats-eye (humans are ++gd at visually identifying patterns, but we’re also ++gd at misinterpreting them, plus stats validates visual assumptions). I’m not going into that here, but I will use the data-grabbing step to illustrate the formula functions. Note that there’s quite a bit of “setup” here for just one example, so I guess I kinda am attempting to shill the purrr package and the “piping tidyverse” just a tad.

If you head on over to the site with the data you’ll see you can download files for all four years. In theory, these are all individual years, but the names of the files gave me pause:

  • MST Data 2013 - 2015.csv
  • MST Data 2014 - 2015.csv
  • MST Data 2015 - 2015.csv
  • Mass Shooting Data 2016 - 2016.csv

So, they may all be individual years, but the naming consistency isn’t there and it’s better to double check than to assume.

First, we can check to see if the column names are the same (we can eyeball this since there are only four files and a small # of columns):

library(purrr)
library(readr)

dir() %>% 
  map(read_csv) %>% 
  map(colnames)

## [[1]]
## [1] "date"                        "name_semicolon_delimited"   
## [3] "killed"                      "wounded"                    
## [5] "city"                        "state"                      
## [7] "sources_semicolon_delimited"
## 
## [[2]]
## [1] "date"                        "name_semicolon_delimited"   
## [3] "killed"                      "wounded"                    
## [5] "city"                        "state"                      
## [7] "sources_semicolon_delimited"
## 
## [[3]]
## [1] "date"                        "name_semicolon_delimited"   
## [3] "killed"                      "wounded"                    
## [5] "city"                        "state"                      
## [7] "sources_semicolon_delimited"
## 
## [[4]]
## [1] "date"                        "name_semicolon_delimited"   
## [3] "killed"                      "wounded"                    
## [5] "city"                        "state"                      
## [7] "sources_semicolon_delimited"

A quick inspection of the date column shows it’s in month/day/year format and we want to know if each file only spans one year. This is where the elegance of the formula function comes in:

library(lubridate)

dir() %>% 
  map(read_csv) %>% 
  map(~range(mdy(.$date))) # <--- the *entire* post was to show this one line ;-)

## [[1]]
## [1] "2016-01-06" "2016-07-25"
## 
## [[2]]
## [1] "2013-01-01" "2013-12-31"
## 
## [[3]]
## [1] "2014-01-01" "2014-12-29"
## 
## [[4]]
## [1] "2015-01-01" "2015-12-31"

To break that down a bit:

  • dir() returns a vector of filenames in the current directory
  • the first map() reads each of those files in and creates a list with four elements, each being a tibble (data_frame / data.frame)
  • the second map() iterates over those data frames and calls a newly created anonymous function which converts the date column to a proper Date data type then gets the range of those dates, ultimately resulting in a four element list, with each element being a two element vector of Dates

For you “basers” out there, this is what that looks like old school style:

fils <- dir()
dfs <- lapply(fils, read.csv, stringsAsFactors=FALSE)
lapply(dfs, function(x) range(as.Date(x$date, format="%m/%e/%Y")))

or

lapply(dir(), function(x) {
  df <- read.csv(x, stringsAsFactors=FALSE)
  range(as.Date(df$date, format="%m/%e/%Y"))
})

You eliminate the function(x) { } and get pre-defined vars (either .x or . and, if needed, .y) to compose your maps and pipes very cleanly and succinctly, while still staying super-readable.
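
The same shorthand carries over to two-argument mapping, where .x and .y stand in for the first and second inputs (a quick illustration, not part of the shooting-data example):

map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x + .y)
## [1] 11 22 33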

After performing this inspection (i.e. confirming that each file contains incidents for only a single year), we can now automate the data ingestion:

library(rvest)
library(purrr)
library(readr)
library(dplyr)
library(lubridate)

read_html("https://www.massshootingtracker.org/data") %>% 
  html_nodes("a[href^='https://docs.goo']") %>% 
  html_attr("href") %>% 
  map_df(read_csv) %>% 
  mutate(date=mdy(date)) -> shootings

Here’s what that looks like w/o the tidyverse/piping:

library(XML)

doc <- htmlParse("http://www.massshootingtracker.org/data") # note the necessary downgrade to "http"

dfs <- xpathApply(doc, "//a[contains(@href, 'https://docs.goo')]", function(x) {
  csv <- xmlGetAttr(x, "href")
  df <- read.csv(csv, stringsAsFactors=FALSE)
  df$date <- as.Date(df$date, format="%m/%e/%Y")
  df
})

shootings <- do.call(rbind, dfs)

Even hardcore “basers” may have to admit that the piping/tidyverse version is ultimately better.

Give the purrr package a first (or second) look if you haven’t switched over to it. Type safety, readable anonymous functions and C-backed fast functional idioms will mean that your code may ultimately be purrrfect.
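
The “type safety” bit deserves a concrete (if contrived) example: the typed map_*() variants refuse to silently hand you something other than what you asked for.

map(1:3, ~ . * 1.5)      # always a list
map_dbl(1:3, ~ . * 1.5)  # a double vector: 1.5 3.0 4.5
map_int(1:3, ~ . * 1.5)  # errors, since the results can't be losslessly coerced to integer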

UPDATE #1

I received a question in the comments regarding how I came to that CSS selector for the gdocs CSV URLs, so I made a quick video of the exact steps I took. Exposition below the film.

Right-click “Inspect” in Chrome is my go-to method for finding what I’m after. This isn’t the place to dive deep into the dark art of web page spelunking, but in this case, when I saw there were four similar anchor (<a>) tags that pointed to the CSV “files”, I took the easy way out and just built a selector based on the href attribute value (or, more specifically, the characters at the start of the href attribute). However, all four ways below end up targeting the same four elements:

pg <- read_html("https://www.massshootingtracker.org/data")

html_nodes(pg, "a.btn.btn-default")
html_nodes(pg, "a[href^='https://docs.goo']")
html_nodes(pg, xpath=".//a[@class='btn btn-default']")
html_nodes(pg, xpath=".//a[contains(@href, 'https://docs.goo')]")

UPDATE #2

Due to a suggestion on Twitter, I swapped out list.files() in favour of dir() (though, as someone who detests DOS/Windows, typing that function name is super-painful).
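
For the record, the two are interchangeable; dir() is just base R’s shorter alias for list.files():

identical(dir("."), list.files("."))
## [1] TRUE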

I’ve been updating some existing packages and GitHub-releasing new ones (before a CRAN push). Most are “cyber”-related, but there are some general purpose ones. Here’s a quick overview:

  • docxtractr (CRAN, now v0.2.0) was initially designed to make it easy to get data tables out of MS Word (docx) documents. The update removes use of a deprecated xml2 package function and adds the ability to extract comments from Word docs.
  • slackr (CRAN, now v1.4.2) lets you pass R code, plots, data objects and arbitrary text to Slack from R. A Slack API change broke the package; this release compensates for it.
  • iptools (GitHub, v0.5.0) makes it super easy and quick to work with IPv4 (and some IPv6) addresses in R. The dev updates add NA support to checkers/validators (by @quonimous), Hilbert-space coordinate generators and faster ways to test IPv4 address membership in CIDR blocks (one involves using the triebeard pkg by Oliver and works well with BGP dumps read in via mrt [below]; the other uses optimized integer searches).
  • hubway (GitHub, v0.1.0) provides programmatic access to the JSON data from the bike-sharing service Hubway. I’m using it to build a predictive model of bike availability in Boston.
  • myip (GitHub, v0.1.0) provides unified access to numerous “what’s my IP address?” services on the internet. May merge this with other work @wrathematics has been doing.
  • mrt (GitHub, v0.1.0) small, C-backed package that wraps libBGPdump and reads Route Views MRT BGP dumps.
  • algorithmia (GitHub, v0.1.0) provides an R wrapper to the Algorithmia web service, enabling use of a wide range of hosted algorithms using local or cloud data.
  • ssllabs (GitHub, v0.1.0) provides an R wrapper to the SSL Labs (SSL/TLS cert checker) API
  • accidents (GitHub, v0.1.0) an R data package containing historical U.S. NHTSA accident data.
  • htmltidy (GitHub, v0.1.0) an R wrapper to the HTML Tidy library that cleans up gnarly HTML/XHTML, making it easier to parse with rvest.
  • gdns (GitHub, v0.1.0) an R wrapper to the Google “DNS over HTTPS” API.
  • ohby (GitHub, v0.1.0) an R wrapper to the nascent ohby URL & content shortener
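
If you want to kick the tires on a few of the GitHub-only ones, a quick batch install is easy enough (a sketch; the repo paths are assumed from the package names above, so double-check them against the links first):

# assumes these repos live under the "hrbrmstr" GitHub account -- verify first
purrr::walk(c("hrbrmstr/iptools", "hrbrmstr/htmltidy", "hrbrmstr/gdns"),
            devtools::install_github)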

If any of the GitHub-only pkgs are of [major] use to folks, let me know and I’ll prioritize getting CI tests wired up and CRAN submissions started.

In other news, ggalt is gearing up for an August CRAN release, so if you have any ggplot2 extensions that need a home, fork that repo and PR them my way before the middle of the month.

Finally, I’ve had many folks contribute code and bug reports to packages and wanted to close with a YUGE “thank you!” to all of you who did so.