The rest of the month is going to be super-hectic and it’s unlikely I’ll be able to do any more to help the push to CRAN 10K, so here’s a breakdown of new CRAN and GitHub packages & package updates that I felt were worth raising awareness of:
epidata
I mentioned this one last week but it wasn’t really a package announcement post. epidata is now on CRAN and is a package to pull data from the Economic Policy Institute (U.S. gov economic data, mostly). Their “hidden” API is well thought out and the data has been nicely curated (and seems to update monthly). It makes it super easy to do things like the following:
library(epidata)
library(tidyverse)
library(stringi)
library(hrbrmisc) # devtools::install_github("hrbrmstr/hrbrmisc")
us_unemp <- get_unemployment("e")
glimpse(us_unemp)
## Observations: 456
## Variables: 7
## $ date <date> 1978-12-01, 1979-01-01, 1979-02-01, 1979-03-0...
## $ all <dbl> 0.061, 0.061, 0.060, 0.060, 0.059, 0.059, 0.05...
## $ less_than_hs <dbl> 0.100, 0.100, 0.099, 0.099, 0.099, 0.099, 0.09...
## $ high_school <dbl> 0.055, 0.055, 0.054, 0.054, 0.054, 0.053, 0.05...
## $ some_college <dbl> 0.050, 0.050, 0.050, 0.049, 0.049, 0.049, 0.04...
## $ college <dbl> 0.032, 0.031, 0.031, 0.030, 0.030, 0.029, 0.03...
## $ advanced_degree <dbl> 0.021, 0.020, 0.020, 0.020, 0.020, 0.020, 0.02...
us_unemp %>%
  gather(level, rate, -date) %>%
  mutate(level=stri_replace_all_fixed(level, "_", " ") %>%
           stri_trans_totitle() %>%
           stri_replace_all_regex(c("Hs$"), c("High School")),
         level=factor(level, levels=unique(level))) -> unemp_by_edu
col <- ggthemes::tableau_color_pal()(10)
ggplot(unemp_by_edu, aes(date, rate, group=level)) +
  geom_line(color=col[1]) +
  scale_y_continuous(labels=scales::percent, limits=c(0, 0.2)) +
  facet_wrap(~level, scales="free") +
  labs(x=NULL, y="Unemployment rate",
       title=sprintf("U.S. Monthly Unemployment Rate by Education Level (%s)", paste0(range(format(us_unemp$date, "%Y")), collapse=":")),
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_hrbrmstr(grid="XY")
us_unemp %>%
  select(date, high_school, college) %>%
  mutate(date_num=as.numeric(date)) %>%
  ggplot(aes(x=high_school, xend=college, y=date_num, yend=date_num)) +
  geom_segment(size=0.125, color=col[1]) +
  scale_x_continuous(expand=c(0,0), labels=scales::percent, breaks=seq(0, 0.12, 0.02), limits=c(0, 0.125)) +
  # as_date() comes from lubridate, which older tidyverse versions don't attach, so call it by namespace
  scale_y_reverse(expand=c(0,100), labels=function(x) format(lubridate::as_date(x), "%Y")) +
  labs(x="Unemployment rate", y="Year ↓",
       title=sprintf("U.S. monthly unemployment rate gap (%s)", paste0(range(format(us_unemp$date, "%Y")), collapse=":")),
       subtitle="Segment width shows the gap between those with a high school\ndegree and those with a college degree",
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_hrbrmstr(grid="X") +
  theme(panel.ontop=FALSE) +
  theme(panel.grid.major.x=element_line(size=0.2, color="#2b2b2b25")) +
  theme(axis.title.x=element_text(family="Arial", face="bold")) +
  theme(axis.title.y=element_text(family="Arial", face="bold", angle=0, hjust=1, margin=margin(r=-14)))
(right edge is high school, left edge is college…I’ll annotate it better next time)
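As a quick numeric check on which edge is which, the first three months visible in the glimpse() printout can be compared directly. This is only a minimal base R sketch: the gap_check name is made up for illustration, and the values are copied from the truncated printout rather than pulled fresh from the API.

```r
# First three monthly values copied from the glimpse() printout above
gap_check <- data.frame(
  date        = as.Date(c("1978-12-01", "1979-01-01", "1979-02-01")),
  high_school = c(0.055, 0.055, 0.054),
  college     = c(0.032, 0.031, 0.031)
)

# Segment width in the chart = high school rate minus college rate
gap_check$gap <- gap_check$high_school - gap_check$college

# A positive gap means the high school end of the segment sits further right
gap_check$hs_on_right <- gap_check$gap > 0

print(gap_check)
```

In these rows the high school rate is always the larger of the two, which is why it ends up on the right edge of each segment.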
censys
Censys is a search engine by one of the cybersecurity research partners we publish data to at work (free for use by all). The API is moderately decent (it’s mostly a thin shim authentication layer to pass on Google BigQuery query strings to the back-end) and censys, the R package that interfaces with it, is now on CRAN.
waffle
The seminal square pie chart package waffle has been updated on CRAN to work better with recent ggplot2 2.x changes and has some additional parameters you may want to check out.
cdcfluview
The GitHub version of the viral package cdcfluview has had some updates: it now behaves more sanely when you specify dates, and it had to change because the CDC hidden API switched to all https URLs (there’s a major push in .gov-land to do that to get better scores on their cyber report cards). I’ll be adding some features before the next CRAN push to enable retrieval of additional mortality data.
sergeant
If you work with Apache Drill (if you don’t, you should), the sergeant package (GitHub) will help you whip it into shape. I’ve mentioned it before on the blog but it has a nigh-complete dplyr interface now that works pretty well. It also has a direct REST API interface and RJDBC interface plus many helper utilities that help you avoid typing SQL strings to get cluster status info. Once I add the ability to create parquet files with it I’ll push it up to CRAN.
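For a feel of the REST and dplyr interfaces, here’s a rough sketch. It assumes a Drill instance is listening on localhost with the default REST port and that the sample employee.json dataset on Drill’s classpath (cp) storage is available. The drill_demo() wrapper is a name made up for this example (it is not part of sergeant) and it bails out quietly if the package or the server isn’t there.

```r
# Hypothetical wrapper (not part of sergeant): returns NULL unless the needed
# packages are installed and a Drill server answers on the assumed host
drill_demo <- function(host = "localhost") {
  if (!requireNamespace("sergeant", quietly = TRUE) ||
      !requireNamespace("dplyr", quietly = TRUE)) {
    return(NULL)
  }

  # REST interface: build a connection object and check that the server is up
  dc <- sergeant::drill_connection(host)
  active <- tryCatch(isTRUE(sergeant::drill_active(dc)), error = function(e) FALSE)
  if (!active) return(NULL)

  # Plain SQL over the REST API
  rest_res <- sergeant::drill_query(
    dc, "SELECT full_name, gender FROM cp.`employee.json` LIMIT 5"
  )

  # dplyr interface: the same table without hand-written SQL
  db <- dplyr::tbl(sergeant::src_drill(host), "cp.`employee.json`")
  dplyr_res <- dplyr::collect(dplyr::count(db, gender, marital_status))

  list(rest = rest_res, dplyr = dplyr_res)
}

res <- drill_demo()
if (is.null(res)) message("No Drill server (or sergeant) found; nothing to show.")
```

Wrapping everything in one guarded function keeps the example from erroring on machines without Drill; in an interactive session you’d normally just call the sergeant functions directly.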
The one thing I’d like to do with this package is support any user-defined functions (UDFs in Drill-speak) folks have written. So, if you have a UDF you’ve written or use and you want it wrapped in the package, just drop an issue and I’ll layer it in. I’ll be releasing some open source cybersecurity-related UDFs via the work github in a few weeks.
zkcmd
Drill (in non-standalone mode) relies on Apache ZooKeeper to keep everything in sync and it’s sometimes necessary to peek at what’s happening inside the ZooKeeper cluster, so sergeant has a sister package zkcmd that provides an R interface to ZooKeeper instances.
ggalt
Some helpful folks tweaked ggalt for better ggplot2 2.x compatibility (#ty!) and I added a new geom_cartogram() (before you ask if it makes warped shapefiles: it doesn’t) that restores the old (and what I believe to be the correct/sane/proper) behaviour of geom_map(). I need to get this on CRAN soon as it has both fixes and many new geoms folks will want to play with in a non-GitHub context.
FIN
There have been some awesome packages released by others in the past month+ and you should add R Weekly to your RSS feeds if you aren’t following it already (there are other things you should have there for R updates as well, but that’s for another blog). I’m definitely looking forward to new packages, visualizations, services and utilities that will be coming this year to the R community.