
Category Archives: R

Prying “.R” Script Files Away from Xcode (et al) on macOS

As the maintainer of RSwitch (and developer of my own, for-personal-use macOS, iOS, watchOS, iPadOS and tvOS apps) I need the full Apple Xcode install around (more R-focused macOS folk can get away with just the command-line tools being installed). As an Apple Developer who insanely runs the macOS & Xcode betas as they are released, I also have the misery of dealing with Xcode usurping authority over .R files every time it receives an update. Sure, I can right-click on an R script, choose “Open With => Other…”, pick RStudio and make it the new default, but clicks interrupt my train of thought and take more time than executing a quick shell command at a terminal prompt (which I always have up).

Enter: duti (https://github.com/moretension/duti), a small command-line tool that lets you change the default application just by knowing the id of the application you want to make the default. For instance, RStudio’s id is org.rstudio.RStudio, which can be obtained via:

$ osascript -e 'id of app "RStudio"'
org.rstudio.RStudio

and, we can use that value in a quick call to duti:

$ duti -s org.rstudio.RStudio .R all

If you’d rather have Visual Studio Code or Sublime Text be the default for .R files, their bundle ids are com.microsoft.VSCode and com.sublimetext.3, respectively. If you’d rather use Atom, well, you really need to think about your life choices.

We can see what the current default for R scripts is via:

$ duti -x R
RStudio.app
/Applications/RStudio.app
org.rstudio.RStudio

You can turn the “setter” into a shell alias (preferably zsh or sh alias since bash is going away soon) or shell script for quick use.
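
If you'd rather stay in R than write a shell alias, the same call can be wrapped in a small function. What follows is only a minimal sketch: the names duti_set_args() and set_default_app() are made up for illustration, and it assumes duti is installed and on your PATH.

```r
# Build the argument vector for `duti -s <bundle id> <extension> <role>`,
# mirroring the shell call shown above. (Hypothetical helper name.)
duti_set_args <- function(bundle_id, ext = ".R", role = "all") {
  stopifnot(is.character(bundle_id), length(bundle_id) == 1L, nzchar(bundle_id))
  c("-s", bundle_id, ext, role)
}

# Run duti if it can be found on the PATH; otherwise change nothing and say so.
# (Hypothetical helper name.)
set_default_app <- function(bundle_id, ext = ".R") {
  duti <- Sys.which("duti")
  if (!nzchar(duti)) {
    message("duti not found on PATH; nothing changed.")
    return(invisible(FALSE))
  }
  status <- system2(duti, duti_set_args(bundle_id, ext))
  invisible(identical(status, 0L))
}

# set_default_app("org.rstudio.RStudio")  # uncomment to actually change the default
```

Keeping the argument construction separate from the system call makes it easy to check what would be run before anything on the system is changed.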

Installing `duti`

Homebrew users can just brew install duti and get on with their day. Folks can also grab the latest release and get on with their day with just a little more effort.

The duti utility can also be compiled on your own (which is preferred, since you can look at the source to make sure you’re not being compromised by a random developer on the internet); but, if you have macOS 10.15 (Catalina), you’ll need to jump through a few hoops since it doesn’t compile out-of-the-box on that platform yet. Thankfully those hoops aren’t too bad thanks to a helpful pull request that adds support for the current version of macOS. (You’ll need at least the command-line developer tools installed for this to work and will likely need to brew install autoconf automake libtool to ensure all the needed toolchain bits are in place.)

At a terminal prompt, go to where you normally go to clone git repositories and grab the source:

$ git clone git@github.com:moretension/duti.git
$ cd duti
$ git fetch origin pull/39/head:pull_39 # add and fetch the origin for the PR
$ git checkout pull_39                  # switch to the branch
$                                       # review the source code
$                                       # no, really, review the source code!
$ autoconf                              # run autoconf to generate the configure script
$ ./configure                           # generate the Makefile (there will be "checking" and "creating" messages)
$ make                                  # build it! (there will be macOS API deprecation warnings but no errors)
$ make install                          # install it! (you may need to prefix with "sudo -H"; this will put the binary in `/usr/local/bin/` and man page in `/usr/local/share/man/man`)

NOTE: If you only have the macOS Xcode command line tools (vs the entirety of Xcode) you’ll need to edit aclocal.m4 before you run autoconf and change line 9 to be:

sdk_path="/Library/Developer/CommandLineTools/SDKs"

since the existing setting assumes you have the full Xcode installation available.

FIN

I’ll be adding this functionality to the next version of RSwitch, letting you specify the application(s) you want to own various R-ish files. It will check for the proper values being in place on a regular basis and set them to your defined preferences (I also need to see if there’s an event I can have RSwitch watch for to trigger the procedure).

If you have another, preferred way to keep ownership of R files, drop a blog post link in the comments (or just drop a note in the comments with said procedure).

Monitoring Website SSL/TLS Certificate Expiration Times with R, {openssl}, {pushoverr}, and {DT}

macOS R users who tend to work on the bleeding edge likely noticed some downtime at <mac.r-project.org> this past weekend. Part of the issue was an SSL/TLS certificate expiration situation. Moving forward, we can monitor this with R using the super spiffy {openssl} and {pushoverr} packages whilst also generating a daily report with {rmarkdown} and {DT}.

The Basic Process

The {openssl} package has a handy function — download_ssl_cert() — which will, by default, hit a given host on the standard HTTPS port (443/TCP) and grab the site certificate and issuer. We’ll grab the “validity end” field and convert that to a date to use for comparison.

To get the target list of sites to check I used Rapid7’s FDNS data set and a glance at a few certificate transparency logs to put together a current list of “r-project” domains that have been known to have SSL certs. This process could be made more dynamic, but things don’t change that quickly in r-project domain land.

Finally, we use the {DT} package to build a pretty HTML table and the {pushoverr} package to send notifications at normal priority for certs expiring within a week and critical priority for certs that have expired (the package has excellent documentation which will guide you through setting up a Pushover account).
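
The date handling and the two notification thresholds can be tried out without touching the network. The sketch below uses made-up validity strings in the format OpenSSL reports and a fixed "today" so the result is repeatable; parse_validity_end() and cert_status() are hypothetical helper names, not part of {openssl}.

```r
# Convert an OpenSSL-style "validity end" string to a Date.
# Assumes an English locale so that "%b" matches abbreviations like "Mar".
parse_validity_end <- function(x) {
  as.Date(x, format = "%b %d %H:%M:%S %Y")
}

# Classify by days remaining, using the thresholds described above:
# expired (0 days or fewer), expiring (1 to 7 days), otherwise ok.
cert_status <- function(expires, today = Sys.Date()) {
  delta <- as.numeric(expires - today, units = "days")
  ifelse(delta <= 0, "expired", ifelse(delta <= 7, "expiring", "ok"))
}

# Made-up validity strings, checked against a fixed date
ends <- parse_validity_end(c(
  "Mar 12 12:00:00 2020 GMT",
  "Mar 20 12:00:00 2020 GMT",
  "Jun 01 12:00:00 2020 GMT"
))

cert_status(ends, today = as.Date("2020-03-15"))
```

Passing `today` explicitly is just a convenience for testing; in a scheduled job the default of Sys.Date() is what you'd want.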

I put this all in a plain R script named r-project-ssl-notify.R that’s then called from a Linux CRON job which runs:

/usr/bin/Rscript -e 'rmarkdown::render(input="PATH_TO/r-project-ssl-notify.R", output_file="PATH_TO/r-project-cert-status/index.html", quiet=TRUE)'

once a day at 0930 ET to make this status page and also fire off any notifications which I have going to my watch and phone (I did a test send by expanding the delta to 14 days):

[Screenshots: the test notification as it appeared on the watch and on the phone]

Here’s the contents of r-project-ssl-notify.R:

#' ---
#' title: "r-project SSL/TLS Certificate Status"
#' date: "`r format(Sys.time(), '%Y-%m-%d')`"
#' output:
#'   html_document:
#'     keep_md: false
#'     theme: simplex
#'     highlight: monochrome
#' ---
#+ init, include=FALSE
knitr::opts_chunk$set(
  message = FALSE, 
  warning = FALSE, 
  echo = FALSE, 
  collapse=TRUE
)

#+ libs
library(DT)
library(openssl)
library(pushoverr)
library(tidyverse)

# Setup -----------------------------------------------------------------------------------------------------------

# This env config file contains two lines:
#
# PUSHOVER_USER=YOUR_PUSHOVER_USER_STRING
# PUSHOVER_APP=YOUR_PUSHOVER_APP_KEY
#
# See the {pushoverr} package for how to setup your Pushover account
readRenviron("~/jobs/conf/r-project-ssl-notify.env")


# Check certs -----------------------------------------------------------------------------------------------------

# r-project.org domains retrieved from Rapid7's FDNS data set
# (https://opendata.rapid7.com/sonar.fdns_v2/) and cert transparency logs

#+ work
c(
  "beta.r-project.org", "bugs.r-project.org", "cloud.r-project.org", 
  "cran-archive.r-project.org", "cran.at.r-project.org", "cran.ch.r-project.org", 
  "cran.es.r-project.org", "cran.r-project.org", "cran.uk.r-project.org", 
  "cran.us.r-project.org", "developer.r-project.org", "ess.r-project.org", 
  "ftp.cran.r-project.org", "journal.r-project.org", "lists.r-forge.r-project.org", 
  "mac.r-project.org", "r-project.org", "svn.r-project.org", "translation.r-project.org", 
  "user2011.r-project.org", "user2014.r-project.org", "user2016.r-project.org", 
  "user2018.r-project.org", "user2019.r-project.org", "user2020.r-project.org", 
  "user2020muc.r-project.org", "win-builder.r-project.org", "www.cran.r-project.org", 
  "www.r-project.org", "www.user2019.fr"
) -> r_doms

# grab each cert

r_certs <- map(r_doms, openssl::download_ssl_cert)

# make a nice table
tibble(
  dom = r_doms,
  expires = map_chr(r_certs, ~.x[[1]][["validity"]][[2]]) %>% # this gets us the "validity end"
    as.Date(format = "%b %d %H:%M:%S %Y", tz = "GMT"),        # and converts it to a date object
  delta = as.numeric(expires - Sys.Date(), "days")            # this computes the delta from the day this script was called
) %>% 
  arrange(expires) -> r_certs_expir

# Status page generation ------------------------------------------------------------------------------------------

# output nice table  
DT::datatable(r_certs_expir, list(pageLength = nrow(r_certs_expir))) # if the # of r-proj doms gets too large we'll cap this for pagination

# Notifications ---------------------------------------------------------------------------------------------------

# See if we need to notify abt things expiring within 1 week
# REMOVE THIS or edit the delta max if you want less noise
one_week <- filter(r_certs_expir, between(delta, 1, 7))
if (nrow(one_week) > 0) {
  pushover_normal(
    title = "There are r-project SSL Certs Expiring Within 1 Week", 
    message = "Check which ones: https://rud.is/r-project-cert-status"
  )
}

# See if we have expired certs
expired <- filter(r_certs_expir, delta <= 0)
if (nrow(expired) > 0) {
  pushover_critical(
    title = "There are expired r-project SSL Certs!", 
    message = "Check which ones: https://rud.is/r-project-cert-status"
  )
}
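
One thing the script above does not guard against is a host that cannot be reached at all, since an error from a single download would stop the whole run. A possible way to handle that, sketched here with base R's tryCatch() and a stand-in fetch function so it runs without network access (safe_fetch() and fake_fetch() are hypothetical names, and the hosts are made up):

```r
# Wrap a fetch function so that a failure for one host yields NULL instead of
# stopping the whole run. `fetch` defaults to openssl::download_ssl_cert but can
# be swapped out (as done below) to try the idea offline.
safe_fetch <- function(host, fetch = openssl::download_ssl_cert) {
  tryCatch(fetch(host), error = function(e) NULL)
}

# A stand-in fetcher for illustration only: fails for one made-up host
fake_fetch <- function(host) {
  if (host == "down.example.org") stop("connection refused")
  paste("cert for", host)
}

hosts <- c("up.example.org", "down.example.org")
results <- lapply(hosts, safe_fetch, fetch = fake_fetch)

# Hosts whose certificate could not be retrieved
failed <- hosts[vapply(results, is.null, logical(1))]
failed
```

Unreachable hosts could then be dropped before building the table, or reported in their own notification.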

FIN

With just a tiny bit of R code we have the ability to monitor expiring SSL certs via a diminutive status page and alerts to any/all devices at our disposal.

Convert Apple Card PDF Statements to Tidy Data (i.e. for CSV/Excel/database export)

UPDATE 2020-02-11 Apple now supports downloading transactions as CSV or OFX! (via MacObserver).

I saw this CNBC article on an in-theory browser client-side-only conversion utility for taking Apple Card PDF statements and turning them into CSV files.

Since (a) I never trust any browser or site and (b) the article indicated that there is a $5 fee to avoid the “single random transaction removal”, I felt compelled to throw together an R script to do this, at least for folks who are capable of setting up R, so that all processing is guaranteed to be local.

FWIW the site does appear to do what it says on the tin (all processing is, indeed, local). That doesn’t mean one of your extensions isn’t spying on you, nor does it mean that the site could not turn evil someday (on its own or via an attacker compromise).

read_apple_card_statement <- function(path) {

  require(stringi)
  require(pdftools)
  require(tidyverse)

  # make sure the file exists
  path <- path.expand(path[1])
  if (!file.exists(path)) stop("File '", path, "' not found.", call.=FALSE)

  pdf_text(path) %>% # read it in
    stri_split_lines() %>% # turn \n to a separate character vector element
    unlist() %>% # flatten it
    stri_trim_both() %>% # get rid of leading/trailing spaces
    keep(stri_detect_regex, "^([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4})") %>% # find lines that start with a date
    map_df(~{
      rec <- as.list(unlist(stri_split_regex(.x, "[[:space:]]{3,}"))) # find the columns
      if (stri_detect_fixed(.x, "%")) { # lines with a `%` in them distinguish charges from payments
        rec <- set_names(rec, c("date", "description", "daily_cash_pct", "daily_cash_amt", "amt")) # ones with charges have cash back columns
      } else {
        rec <- set_names(rec, c("date", "description", "amt")) # ones w/o % do not
      }
    }) %>%
    mutate(
      date = lubridate::mdy(date), # make dates dates
      amt = stri_replace_first_fixed(amt, "$", "") %>% parse_number(), # dollars to numbers
      daily_cash_pct = parse_number(daily_cash_pct)/100, # % to numbers
      daily_cash_amt = parse_number(daily_cash_amt) # dollars to numbers
    )

}

list.files("~/Downloads", pattern = "Apple Card Statement", full.names = TRUE) %>% 
  map_df(read_apple_card_statement)

You can send the PDF statements from the Apple Card app to your Mac via AirDrop and it will put them into ~/Downloads. I recommend putting them somewhere else since you’ve likely given all sorts of applications access to ~/Downloads when prompted to on Catalina (yay security theatre). Wherever you put them, you can read them individually with read_apple_card_statement() or you can just list.files() and bind all the individual statements together:

list.files("~/WhereYouPutAppleCardStatements", pattern = "Apple Card Statement", full.names = TRUE) %>% 
  map_df(read_apple_card_statement)
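
To see what the function does to a single line without needing a real statement, here is a small sketch using base R equivalents of the {stringi} calls. The transaction line is entirely made up, and real statements may be laid out differently.

```r
# A made-up line in the general shape the function expects:
# date, description, Daily Cash %, Daily Cash amount, amount
line <- "01/15/2020   COFFEE SHOP 123 MAIN ST   2%   $0.10   $5.00"

# Same check the function uses to keep transaction lines (starts with a date)
is_txn <- grepl("^[[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4}", line)

# Split on runs of three or more whitespace characters to recover the columns
fields <- strsplit(line, "[[:space:]]{3,}")[[1]]

# Lines containing "%" are charges and carry the two Daily Cash columns
if (grepl("%", line, fixed = TRUE)) {
  names(fields) <- c("date", "description", "daily_cash_pct", "daily_cash_amt", "amt")
} else {
  names(fields) <- c("date", "description", "amt")
}

# Strip "$" and "," before converting the amount to a number
amt <- as.numeric(gsub("[$,]", "", fields[["amt"]]))

fields
amt
```

Splitting on three or more spaces (rather than any whitespace) is what keeps a multi-word description together as a single column.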

FIN

Be very wary of what you put your trust into online. Just because a site is benign one day does not mean it won’t be malicious (deliberately or otherwise) the next. Also, lobby Apple to provide data in more useful formats, especially since it provides applications like Numbers for free with their operating system.

Writing Frictionless R Package Wrappers — Building A Basic R Package

Before we start wrapping foreign language code we need to make sure that basic R packages can be created. If you’ve followed along from the previous post you have everything you need to get started here. Just to make sure, you should be able to fire up a new RStudio session and execute the following R code and see similar output. If not, you’ll need to go through the steps and resources outlined there before continuing.

pkgbuild::check_build_tools()
### Your system is ready to build packages!

Also: the {bookdown} version should now always match the blog post (apart from some verbiage changes to denote it’s in a book vs a series of blog posts). You can refer to it at any time via — https://rud.is/books/writing-frictionless-r-package-wrappers/.

Configuring {devtools}

We’re going to rely on the {devtools} package for many operations and the first thing you should do now is execute help("create", "devtools") in an RStudio R console to see the package documentation page, where you’ll see guidance pointing you to usethis::use_description(), which lists some R session options() that you can set to make your package development life much easier and quicker. Specifically, it lets you know that you can set up your ~/.Rprofile to include certain options() settings which will automatically fill in fields each time you create a new package, vs. you either specifying these fields manually in the package creation GUI or as parameters to devtools::create().

A good, minimal setup would be something like:

options(
  usethis.description = list(
    `Authors@R` = 'person("Some", "One", email = "someone@example.com", role = c("aut", "cre"),
                          comment = c(ORCID = "YOUR-ORCID-ID"))',
    License = "MIT + file LICENSE"
  )
)

NOTE: If you do not have an “ORCID” you really should get one (they’re free!) by heading over to https://orcid.org/ and filling in some basic information.

Take a moment to edit your ~/.Rprofile. If you’re not sure about how to do that there is an excellent chapter in Efficient R Programming¹ which walks you through the process.

Once you’ve added or verified these new options() settings, restart your R session.
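
A quick way to confirm the restart picked up your settings is to read the option back. This is only a sketch; description_defaults() is a hypothetical helper name, and it reads the option without changing anything.

```r
# Report which DESCRIPTION defaults are currently set via options().
# (Hypothetical helper name; it only reads the option.)
description_defaults <- function() {
  defaults <- getOption("usethis.description")
  if (is.null(defaults)) {
    message("usethis.description is not set; check your ~/.Rprofile")
    return(character(0))
  }
  names(defaults)
}

description_defaults()
```

With the minimal setup shown earlier you should see the two field names, Authors@R and License, come back.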

Creating A Package

We’re almost ready to create and build a basic R package. All R packages live in a package directory and I highly suggest creating a packages directory right off your home directory (e.g. “~/packages”) or someplace where you’ll be able to keep them all organized and accessible. The rest of these posts will assume you’re using “~/packages”.

With {devtools} now pre-configured, use the RStudio R Console pane to execute the following code which should produce similar output and open up a new RStudio session with the new package directory:

devtools::create("~/packages/myfirstpackage") 
## ✔ Creating '/Users/someuser/packages/myfirstpackage/'
## ✔ Setting active project to '/Users/someuser/packages/myfirstpackage'
## ✔ Creating 'R/'
## ✔ Writing 'DESCRIPTION'
## Package: myfirstpackage
## Title: What the Package Does (One Line, Title Case)
## Version: 0.0.0.9000
## Authors@R (parsed):
##     * Bob Rudis <bob@rud.is> [aut, cre] (<https://orcid.org/0000-0001-5670-2640>)
## Description: What the package does (one paragraph).
## License: MIT + file LICENSE
## Encoding: UTF-8
## LazyData: true
## ✔ Writing 'NAMESPACE'
## ✔ Writing 'myfirstpackage.Rproj'
## ✔ Adding '.Rproj.user' to '.gitignore'
## ✔ Adding '^myfirstpackage\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
## ✔ Opening '/Users/someuser/packages/myfirstpackage/' in new RStudio session
## ✔ Setting active project to '<no active project>'

The directory structure will look like this:

.
├── DESCRIPTION
├── NAMESPACE
├── R/
└── myfirstpackage.Rproj

At this point we still do not have a “perfect” R package. To prove this, use the R console to run devtools::check() and — after some rather verbose output — you’ll see the following lines at the end:

> checking DESCRIPTION meta-information ... WARNING
  Invalid license file pointers: LICENSE

0 errors ✓ | 1 warning x | 0 notes ✓

Since we’re saying that our package will be using the MIT license, we need to ensure there’s an associated LICENSE file which we can do by executing usethis::use_mit_license() which will create the necessary files and ensure the License field in the DESCRIPTION file is formatted properly.

If you run devtools::check() again, now, your final line should report:

## 0 errors ✓ | 0 warnings ✓ | 0 notes ✓

and the package directory tree should look like this:

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R/
└── myfirstpackage.Rproj

Rounding Out The Corners

While we have a minimum viable package there are a few other steps we should take during this setup phase. First we’ll setup our package to use {roxygen2}² for documenting functions, declaring NAMESPACE imports, and other helper-features that will be introduced in later posts. We can do this via usethis::use_roxygen_md():

usethis::use_roxygen_md()
## ✔ Setting Roxygen field in DESCRIPTION to 'list(markdown = TRUE)'
## ✔ Setting RoxygenNote field in DESCRIPTION to '7.0.2'
## ● Run `devtools::document()`

We won’t run devtools::document() just yet, though. Before we do that we’ll also create an R file where we can store top-level package introduction/meta-information:

usethis::use_package_doc()
## ✔ Writing 'R/myfirstpackage-package.R'

Now, our directory tree should look like:

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   └── myfirstpackage-package.R
└── myfirstpackage.Rproj

Now, run devtools::document() which will translate the {roxygen2} comments into a properly-formatted R documentation file and regenerate the NAMESPACE file (as we’ll be managing package imports and exports via {roxygen2} comments). The directory tree will now look like:

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   └── myfirstpackage-package.R
├── man
│   └── myfirstpackage-package.Rd
└── myfirstpackage.Rproj

and, we can now re-run devtools::check() to make sure we have the three “0’s” we’re aiming for each time we check our package for errors.

Passing The Test

We’re going to want to write and use tests to ensure our package works properly. There are many R package testing frameworks available. To ease the introduction into this process, we’ll use one of the frameworks that came along for the ride when you installed the various packages outlined in the previous post: {testthat}³. Setting up {testthat} is also pretty painless thanks to the {usethis} package we’ve been taking advantage of quite a bit so far. We’ll create the {testthat} overall infrastructure then add a placeholder test script since devtools::check() will complain about no tests being available if we do not have at least a single script it can execute during the test phase of the package checking process.

usethis::use_testthat()
## ✔ Adding 'testthat' to Suggests field in DESCRIPTION
## ✔ Creating 'tests/testthat/'
## ✔ Writing 'tests/testthat.R'
## ● Call `use_test()` to initialize a basic test file and open it for editing.

usethis::use_test("placeholder")
## ✔ Increasing 'testthat' version to '>= 2.1.0' in DESCRIPTION
## ✔ Writing 'tests/testthat/test-placeholder.R'
## ● Modify 'tests/testthat/test-placeholder.R'

The directory tree will now look like this:

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   └── myfirstpackage-package.R
├── man
│   └── myfirstpackage-package.Rd
├── myfirstpackage.Rproj
└── tests
    ├── testthat
    │   └── test-placeholder.R
    └── testthat.R

Run devtools::check() one more time to make sure we’ve got those precious 3 “0’s” one last time.

Getting Things Under Control

We’re almost done! One final step is to turn this directory into a git-managed directory so we can work a bit more safely and eventually share our development work with a broader audience. Provided you followed the outline in the previous post, setting up git is as straightforward as one {usethis} function call:

usethis::use_git()
## ✔ Setting active project to '/Users/someuser/packages/myfirstpackage'
## ✔ Initialising Git repo
## ✔ Adding '.Rhistory', '.RData' to '.gitignore'
## There are 10 uncommitted files:
## * '.gitignore'
## * '.Rbuildignore'
## * 'DESCRIPTION'
## * 'LICENSE'
## * 'LICENSE.md'
## * 'man/'
## * 'myfirstpackage.Rproj'
## * 'NAMESPACE'
## * 'R/'
## * 'tests/'
## Is it ok to commit them?
## 
## 1: For sure
## 2: Negative
## 3: Not now
## 
## Selection: 1
## ✔ Adding files
## ✔ Commit with message 'Initial commit'
## ● A restart of RStudio is required to activate the Git pane
## Restart now?
## 
## 1: Negative
## 2: Not now
## 3: Yup
## 
## Selection: 3

RStudio should have been restarted (so it can add a “Git” pane in case you want to use the GUI to manage git) and the directory tree will now have a .git/ subdirectory that you should (almost) never touch by hand.

The last thing to do is to “vaccinate” your git setup so you don’t leak sensitive or unnecessary files when you (eventually) share your creation with the world:

usethis::git_vaccinate()
## ✔ Adding '.Rproj.user', '.Rhistory', '.Rdata' to '/Users/someuser/.gitignore'

We now have a basic, working R package that is devoid of any real functionality other than that of getting us familiar with the package setup and validation processes. We’ll be building upon this experience in most of the coming posts.

Quick Reference

After ensuring you’ve got the recommended options() in place, here are the steps to set up a new package:

# in any RStudio R Console session
devtools::create("~/packages/THE-PACKAGE-NAME")

# in the newly created package RStudio R Console session:
usethis::use_mit_license()       # need a LICENSE file
usethis::use_roxygen_md()        # use {roxygen2} for documentation and configuration
usethis::use_package_doc()       # setup a package-level manual page
usethis::use_testthat()          # setup testing infrastructure
usethis::use_test("placeholder") # setup a placeholder test file
devtools::document()             # Let {roxygen2} create NAMESPACE entries, build manual pages (and, more later on)
devtools::check()                # looking for the three "0's" that tell us we're ready to roll!
usethis::use_git()               # put the directory under git version control
usethis::git_vaccinate()         # Prevent leaking credentials and other unnecessary filesystem cruft

Rather than re-type devtools::document() (et al) whenever you need to run {roxygen2} or build/check a package you can use RStudio keyboard shortcuts that are designed to seamlessly integrate with the {devtools} ecosystem:

Operation	Windows & Linux	Mac	{devtools} equivalent
Install and Restart	Ctrl+Shift+B	Cmd+Shift+B	devtools::install()
Load All (devtools)	Ctrl+Shift+L	Cmd+Shift+L	devtools::load_all()
Test Package (Desktop)	Ctrl+Shift+T	Cmd+Shift+T	devtools::test()
Test Package (Web)	Ctrl+Alt+F7	Cmd+Alt+F7	devtools::test()
Check Package	Ctrl+Shift+E	Cmd+Shift+E	devtools::check()
Document Package	Ctrl+Shift+D	Cmd+Shift+D	devtools::document()

We’ll refer to these operations as “install” (or “build”), “load all”, “test”, “check”, and “document” from now on so you can choose to use the console or the shortcuts as you prefer.

Exercises

Our package may be kinda, well, useless for the moment but that doesn’t mean you can’t show it some love and get some practice in at the same time while things are still relatively straightforward.

Modify the Title, Version, and Description fields of the DESCRIPTION file and refine them as needed until package checks pass.
Deliberately mangle parts of the DESCRIPTION file to see what errors or warnings you receive during the package check process.
Read up on {roxygen2} and add some Sections to it formatted with markdown and/or LaTeX. Re-“document” the package and see how your changes look.
Edit the test-placeholder.R file and change the placeholder test it created so it fails and then re-check the package to see what warnings or errors show up.
After you’ve made (valid, working) modifications to any/all of the above and package checks pass, use either the git command line tools or the RStudio Git pane to add your updates to the git tree. Use the resources linked to in the previous post if you need a refresher on how to do that.
Re-run through all the steps with a brand new package name just to make sure you’re comfortable with the package creation process.

Up Next

In the next installment in the series we will start wrapping by creating a basic wrapper that just calls out to the operating system shell to run commands.

¹ Efficient R Programming, “3.3 R Startup”; (https://csgillespie.github.io/efficientR/3-3-r-startup.html#r-startup)
² {roxygen2} Home; (https://roxygen2.r-lib.org/)
³ {testthat} Home; (https://testthat.r-lib.org/)

Writing Frictionless R Package Wrappers — Introduction

The R language and RStudio IDE are a powerful combination for “getting stuff done”, and one aspect of R itself that makes it especially useful is the ability to use it with other programming languages via a robust foreign language interface capability¹. The term “foreign language” refers to another programming language such as C, C++, Fortran, Java, Python, Rust, etc. A common way of referring to this idiom of using functionality written in another programming language from directly within R is “wrapping”, since we’re putting an R “shell” around the code from the other language. Another term you may see used is “extending” (hence the title of the “Writing R Extensions” R manual).

While R supports using this extension mechanism from any R script, leaving tiny trails of R and other-language source and binary files all across your filesystem is not exactly the best way to keep these components organized, and it creates other challenges when you need to use them in other projects or share them with others. Thankfully, the R Core team, along with many individual contributors over the years, has made it pretty straightforward to incorporate this extension capability into R packages, which are much easier (honest!) to organize and share.

The goal of this blog series — which will have a {bookdown} book companion along with some screencasts — is to help you get up to speed using R and RStudio to write R packages that wrap code from many different languages to help you “get stuff done” with as little friction as possible.

Base Requirements

It is assumed that readers are familiar with the R programming language, RStudio IDE, and are comfortable installing and using packages. Since this work is about extending R with other programming languages, you should also have some knowledge of one or more of the target languages being covered.

To follow along with the series you’ll need to ensure you have the necessary components installed along the way. Rather than overwhelm you with all of them up front, each new section will introduce requirements specific to the language or situation being covered. However, there are some fundamentals you’ll need to ensure are available.

An R² environment, preferably R 3.6.x which is what was used for this series.
RStudio³, as we’ll be using many of the features provided in it to help reduce development friction
The {pkgbuild}⁴ package installed

Once you’ve gotten through those steps, you should fire up RStudio and run:

pkgbuild::check_build_tools(debug = TRUE)

which will help you make sure your particular system is ready to build packages.

After performing the build tool check and/or installation of the necessary core tools, you will then need to install the {devtools}⁵ package, which will help ensure that the remaining core packages required are installed.
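
If you're not sure which of the core packages are already present, a small check like the one below can tell you before you install anything. It is a sketch only; missing_pkgs() is a hypothetical helper name.

```r
# Return the subset of `pkgs` that is not installed in the current library paths.
# system.file() returns "" for a package that cannot be found, and unlike
# requireNamespace() it does not load anything. (Hypothetical helper name.)
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

missing_pkgs(c("pkgbuild", "devtools"))

# install.packages(missing_pkgs(c("pkgbuild", "devtools")))  # uncomment to install
```

The install line is left commented out so that running the snippet never changes your library by accident.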

We’re also going to use the git⁶ source code version control system. The git ecosystem is not “GitHub”, which is just a public (or, potentially somewhat private) place to house source code repositories, just like other hosted services such as Bitbucket, GitLab, or SourceHut. You can use the excellent “Happy Git with R”⁷ resource to help ensure your source code control environment is also ready to use.

Supplemental References

It may be helpful to create a browser bookmark folder for supplemental reference material that will be referred to from time-to-time across the sections (we’ll be adding to this list in each chapter, too):

Writing R Extensions (https://cran.r-project.org/doc/manuals/r-release/R-exts.html)
Advanced R (http://adv-r.had.co.nz/)
R Packages (http://r-pkgs.had.co.nz/)

Up Next

If you’ve been a user of “development versions” of R packages or have authored R packages you likely made quick work of this first installment. Those new to creating packages with R, those who tend to only use fully-baked CRAN versions of R packages, and/or those who have not worked with git before likely had to do quite a bit of work to get down to this point (if this describes you, you definitely deserve both a break and kudos for getting this far!).

In the next installment we’ll make sure the package building infrastructure is ready to roll by creating a basic R package that we’ll use as a building block for future work.

¹ “Writing R Extensions”; Chapter 5, “System and foreign language interfaces”; (https://cran.r-project.org/doc/manuals/r-release/R-exts.html#System-and-foreign-language-interfaces)
² R Project Home (https://www.r-project.org/)
³ RStudio Home (http://rstudio.com/)
⁴ {pkgbuild} CRAN page (https://cran.rstudio.com/web/packages/pkgbuild/)
⁵ {devtools} Home (https://devtools.r-lib.org/)
⁶ git Home (https://git-scm.com/)
⁷ Happy Git with R (https://happygitwithr.com/)

Short Attention Span Theatre: Reproducing Axios’ “1 Big Thing” Google Trends 2019 News In Review with {ggplot2}

I woke up to Axios’ “1 Big Thing” ridgeline chart showing the crazy that was the 2019 news cycle:

and, I decided to reproduce it in {ggplot2}.

Getting The Data

First, I had to find the data. The Axios chart is interactive, so I assumed the visualization was built on-load. It was, but the data was embedded in a javascript file vs loaded as JSON via an XHR request:

which was easy enough to turn into JSON anyone can use.

NOTE: The # hrbrmstr/hrbrthemes is an indication you may need to use the version of {hrbrthemes} from my gitea/sourcehut/gitlab/bitbucket/github. That package has instructions for installing fonts needed. Sub out theme_ipsum_es() with theme_ipsum(), theme_ipsum_rc() or just use theme_bw() and tweak aesthetics manually.

library(ggalt)
library(hrbrthemes) # hrbrmstr/hrbrthemes
library(tidyverse)

jsonlite::fromJSON("https://rud.is/dl/2019-axios-news.json") %>% 
  as_tibble() -> xdf

xdf
## # A tibble: 31 x 3
##    name                    avg data      
##    <chr>                 <dbl> <list>    
##  1 Gov't shutdown        20.5  <int [51]>
##  2 Mexico-U.S. border    22.8  <int [51]>
##  3 Green New Deal        11.3  <int [51]>
##  4 Blackface              9.61 <int [51]>
##  5 N. Korea-Hanoi Summit 11.2  <int [51]>
##  6 Boeing 737 Max         4.79 <int [51]>
##  7 Brexit                28.5  <int [51]>
##  8 Israel                42.1  <int [51]>
##  9 SpaceX                24.1  <int [51]>
## 10 Game of Thrones       16.8  <int [51]>
## # … with 21 more rows

This is pretty tidy already, but we’ll need to expand the data column and give each week an index:

unnest(xdf, data) %>% 
  group_by(name) %>% 
  mutate(idx = 1:n()) %>% 
  ungroup() %>% 
  mutate(name = fct_inorder(name)) -> xdf # making a factor for strip/panel ordering

xdf
## # A tibble: 1,581 x 4
##    name             avg  data   idx
##    <fct>          <dbl> <int> <int>
##  1 Gov't shutdown  20.5    69     1
##  2 Gov't shutdown  20.5   100     2
##  3 Gov't shutdown  20.5    96     3
##  4 Gov't shutdown  20.5   100     4
##  5 Gov't shutdown  20.5    19     5
##  6 Gov't shutdown  20.5     9     6
##  7 Gov't shutdown  20.5    17     7
##  8 Gov't shutdown  20.5     3     8
##  9 Gov't shutdown  20.5     2     9
## 10 Gov't shutdown  20.5     1    10
## # … with 1,571 more rows
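To see what that reshaping does, here is a minimal sketch on a hypothetical two-topic tibble (the `toy` object and its values are made up for illustration). It uses `row_number()` in place of `1:n()`; both produce a per-group running index.

```r
library(dplyr)
library(tidyr)

# hypothetical stand-in for the Axios data: one row per topic,
# with the weekly values stored in a list column
toy <- tibble(
  name = c("topic a", "topic b"),
  avg  = c(2, 5),
  data = list(c(1L, 2L, 3L), c(4L, 5L, 6L))
)

toy_long <- toy %>%
  unnest(data) %>%                 # one row per topic-week
  group_by(name) %>%
  mutate(idx = row_number()) %>%   # week index restarts at 1 for each topic
  ungroup()

toy_long
```

Each topic ends up with its own 1..n week index, which is what lets every facet share the same x-axis later on.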

We’ll take this opportunity to find the first week of each month (via rle()) so we can have decent axis labels:

# get index placement for each month axis label
sprintf("2019-%02d-1", 1:51) %>% # zero-padded week number + Monday (weekday 1)
  as.Date(format = "%Y-%W-%w") %>% 
  format("%b") %>% 
  rle() -> mons

mons
## Run Length Encoding
##   lengths: int [1:12] 4 4 4 5 4 4 5 4 5 4 ...
##   values : chr [1:12] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" ...

month_idx <- cumsum(mons$lengths)-3

month_idx
##  [1]  1  5  9 14 18 22 27 31 36 40 44 48
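A note on the `- 3` offset: it lands on the first week of a month only when that month's run is four weeks long; for a five-week run it lands on the second week. As a minimal sketch (using a hand-built run-length encoding as a hypothetical stand-in for `mons`, so it does not depend on locale-specific date parsing), the exact start of each run can be derived from the cumulative lengths:

```r
# hypothetical stand-in for `mons`, using the first four run lengths printed above
toy_mons <- rle(rep(c("Jan", "Feb", "Mar", "Apr"), times = c(4, 4, 4, 5)))

run_ends   <- cumsum(toy_mons$lengths)          # last week index of each month
run_starts <- run_ends - toy_mons$lengths + 1   # first week index of each month

run_starts     # exact first-week positions
run_ends - 3   # the offset used in the post, for comparison
```

Either set of breaks works for axis labels; the two differ by at most one week.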

We’ve got all we need to make a {ggplot2} version of the chart. Here’s the plan:

use geom_area() and map colour and fill to avg (like Axios did), using a medium alpha value so we can still see below the overlapped areas
also use an xspline() stat with geom_area() so we get smooth lines vs pointy ones
use geom_hline() vs an axis line so we can map a colour aesthetic to avg as well
make a custom x-axis scale so we can place the labels we just made
expand the y-axis upper limit to avoid cutting off any part of the geoms
use the inferno viridis palette, but not the extremes of it
make facets/panels on the name, positioning the labels on the left
finally, tweak strip positioning so we get overlapped charts

ggplot(xdf, aes(idx, data)) +
  geom_area(alpha = 1/2, stat = "xspline", aes(fill = avg, colour = avg)) +
  geom_hline(
    data = distinct(xdf, name, avg),
    aes(yintercept = 0, colour = avg), size = 0.5
  ) +
  scale_x_continuous(
    expand = c(0,0.125), limits = c(1, 51),
    breaks = month_idx, labels = month.abb
  ) +
  scale_y_continuous(expand = c(0,0), limits = c(0, 105)) +
  scale_colour_viridis_c(option = "inferno", direction = -1, begin = 0.1, end = 0.9) +
  scale_fill_viridis_c(option = "inferno", direction = -1, begin = 0.1, end = 0.9) +
  facet_wrap(~name, ncol = 1, strip.position = "left", dir = "h") +
  labs(
    x = NULL, y = NULL, fill = NULL, colour = NULL,
    title = "1 big thing: The insane news cycles of 2019",
    subtitle = "Height is search interest in a given topic, indexed to 100.\nColor is average search interest between Dec. 30, 2018–Dec. 20, 2019",
    caption = "Source: Axios <https://www.axios.com/newsletters/axios-am-1d9cd913-6142-43b8-9186-4197e6da7669.html?chunk=0#story0>\nData: Google News Lab. Orig. Chart: Danielle Alberti/Axios"
  ) +
  theme_ipsum_es(grid="X", axis = "") +
  theme(strip.text.y = element_text(angle = 180, hjust = 1, vjust = 0)) +
  theme(panel.spacing.y = unit(-0.5, "lines")) +
  theme(axis.text.y = element_blank()) +
  theme(legend.position = "none")

To produce this finished product:

FIN

The chart could be tweaked a bit more to get even closer to the Axios finished product.

Intrepid readers can also try to use {plotly} to make an interactive version.
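As a starting point for that, here is a minimal sketch that passes a simplified, faceted area chart to `plotly::ggplotly()`. The toy data frame is hypothetical, and the sketch deliberately leaves out the xspline stat, the custom theme and the negative panel spacing, since I have not checked how well those translate to {plotly}.

```r
library(ggplot2)
library(plotly)

# hypothetical toy data: two topics, five weeks each
toy <- data.frame(
  name = rep(c("topic a", "topic b"), each = 5),
  idx  = rep(1:5, times = 2),
  data = c(0, 40, 100, 30, 5, 10, 20, 15, 60, 100)
)

gg <- ggplot(toy, aes(idx, data)) +
  geom_area(alpha = 1/2) +
  facet_wrap(~name, ncol = 1)

# convert the static ggplot object into an interactive htmlwidget
p <- ggplotly(gg)
p
```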

Somehow, I get the feeling 2020 will have an even more frenetic news cycle.

Using #rstats to Help Santa Deliver Presents This Christmas!

The right jolly old elves over at Alteryx created a “Santalytics” challenge back in 2016 to see if their community members could help Santa deliver presents to kids all across the globe.

They posted data for four challenges along with solutions and I’ve made a git repo & RStudio project with the challenges and solves for two of the four (I was going to try to have all four done but December has been a cruel master when it comes to allowing for free time).

Most of the tasks are pretty straightforward and range from basic joining and grouping to some spatial optimizations (but all very do-able with a little elbow grease). The featured image at the top of the blog is one solution to finding “distribution hubs” for all the presents.

You can find the starter Rmd and data files over at your favorite social coding site:

FIN

Give Santa a hand and blog your approach to solving each challenge!

All four of our offspring are home for Christmas this year (w00t!!!) so this is likely the last blog post of 2019. Many blessings to all as you celebrate this time of year and catch y’all in 2020!

Quickly Create (Mostly) Responsive HTML Columns With {htmltools}

I needed to present a wall-of-text to show off a giant list of SSL certificate alternate names and needed the entire list to fit on one slide (not really for reading in full, but to show just how many there were in a way that a simple count would not really convey).

Keynote, PowerPoint, and gslides all let you make tables or draw boxes but I really didn’t want to waste time fiddling as much as I’d need to with those tools just for this one slide.

Thankfully, I remembered that HTML5 <div> elements can be styled with the CSS column-count property and we can use {htmltools} to make quick work of this task.

To show it off, first we’ll need some words, so let’s make some using stringi::stri_rand_lipsum():

library(stringi)
library(htmltools)
library(tidyverse)

set.seed(201912)

stri_rand_lipsum(5) %>%
  stri_paste(collapse = " ") %>%
  stri_split_boundaries() %>%
  flatten_chr() %>%
  stri_trim_both() -> words

head(words)
## [1] "Lorem"  "ipsum"  "dolor"  "sit"    "amet,"  "sapien"

length(words)
## [1] 514

Now, we’ll make a function — columnize() — that we can reuse in the future and have it take in a character vector, the column count we want and some CSS styling, then use some {htmltools} tag functions to make quick work of this task:

columnize <- function(words, ncol = 5,
                      style = "p { font-family:'Roboto Condensed';font-size:12pt;line-height:12.5pt;padding:0;margin:0}") {

  tagList(
    tags$style(style[1]),
    tags$div(
      words %>%
        map(tags$p) %>%
        tagList(),
      style = sprintf("column-count:%d", as.integer(ncol[1]))
    )
  )

}

In this function we turn the style param into a <style> section in the generated HTML, then turn words into <p> tags wrapped in <div>.
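To make the generated markup concrete, here is a minimal sketch that builds the same structure by hand for a tiny, hypothetical word list and prints the resulting HTML with `as.character()`. The `demo_words` vector and the shortened CSS are made up for illustration.

```r
library(htmltools)

# a tiny, hypothetical word list so the markup stays readable
demo_words <- c("alpha", "beta", "gamma")

demo <- tagList(
  tags$style("p { margin:0 }"),           # becomes the <style> section
  tags$div(
    tagList(lapply(demo_words, tags$p)),  # one <p> per word
    style = "column-count:3"              # the column layout lives on the <div>
  )
)

html <- as.character(demo)
cat(html)
```

Inspecting the string this way is a quick check that the `<style>` block and the `column-count` declaration ended up where you expect before dropping the result into a document.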

This function can be used in an R Markdown code block (set block parameters to results='markup') to have the columns appear automagically in the resultant HTML document output. You can also use it in standalone fashion by using html_print() on the results:

html_print(
  columnize(words, 10)
)

The above is an image just for easier blog display purposes. You can test out a working example from a spun R script over at https://rud.is/rpubs/columnize.html that has some different column count examples. Grow and shrink the browser width to see how the columns shrink and grow with it.

FIN

Hopefully this helps others save time and effort like it did for me today. You can experiment with making the columnize() function more robust by having it work with all the other column-formatting properties:

column-count: Specifies the number of columns an element should be divided into
column-fill: Specifies how to fill columns
column-gap: Specifies the gap between the columns
column-rule: A shorthand property for setting all the column-rule-* properties
column-rule-color: Specifies the color of the rule between columns
column-rule-style: Specifies the style of the rule between columns
column-rule-width: Specifies the width of the rule between columns
column-span: Specifies how many columns an element should span across
column-width: Specifies a suggested, optimal width for the columns
columns: A shorthand property for setting column-width and column-count

You can find out more about these properties (and play with some examples) over at https://www.w3schools.com/css/css3_multiple_columns.asp.
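One possible starting point for that experiment, as a sketch rather than a finished design: `htmltools::css()` builds a declaration string from named arguments, converting dots or underscores in the names to hyphens and dropping `NULL` values, so optional properties can simply be left unset. The property values below are arbitrary examples.

```r
library(htmltools)

# css() turns named arguments into a CSS declaration string;
# dots/underscores in names become hyphens and NULL values are dropped
col_style <- css(
  column.count = 3,
  column.gap = "10px",
  column.rule.style = NULL   # omitted from the result because it is NULL
)

col_style

# the string can be used directly as a style attribute
tags$div(style = col_style, tags$p("one"), tags$p("two"), tags$p("three"))
```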

POST-FIN

I robustified the function a bit:

#' Make a responsive columnar text div
#'
#' @param words character vector of text to present in a columnar div
#' @param div_id tag `id` attribute to assign to the `<div>` (which can help you style it with the `style` param).
#' @param div_class tag `class` attribute to assign to the `<div>` (which can help you style it with the `style` param)
#' @param ncol number of columns
#' @param width  specifies the column width; one of "`auto`" (the default) which leaves it up to the
#'        browser implementation, a _length_ CSS size value that specifies the width of the columns
#'        (the number of columns will be the minimum number of columns needed to show all the content
#'        across the element), "`initial`" or "`inherit`" (see `fill` for descriptions of those).
#' @param fill how to fill columns, balanced or not. One of "`balance`", "`auto`", "`initial`", "`inherit`".
#'        Defaults to "`balance`" which fills each column with about the same amount of content, but will not
#'        allow the columns to be taller than the height (so, columns might be shorter than the height as the
#'        browser distributes the content evenly horizontally). "`auto`" fills each column until it reaches
#'        the height, and does this until it runs out of content (so, this value will not necessarily fill all
#'        the columns nor fill them evenly). "`initial`" sets this property to its default value; and
#'        "`inherit`" inherits this property from its parent element.
#' @param gap either a textual value (e.g. "`10px`") for the spacing gap between columns, or "`normal`"
#'        (the default) which uses a `1em` gap on most browsers, "`initial`" or "`inherit`" (see `fill` for descriptions
#'        of those).
#' @param rule_color specifies the CSS color value of the rule between columns; also can be "`initial`" or "`inherit`"
#'        (see `fill` for descriptions of those).
#' @param rule_style specifies the style of the rule between columns; valid values are "`none`" (the default) for no
#'        rule, "`hidden`", "`dotted`", "`dashed`", "`solid`", "`double`", "`groove`" for 3D grooved rule, "`ridge`"
#'        for a 3D ridged rule, "`inset`" for a 3D inset rule, "`outset`" for a 3D outset rule, "`initial`" or
#'        "`inherit`" (see `fill` for descriptions of those).
#' @param rule_width specifies the width of the rule between columns; one of "`medium`" (the default), "`thin`", "`thick`",
#'        a _length_ CSS size value, "`initial`" or "`inherit`" (see `fill` for descriptions of those).
#' @param span specifies how many columns an element should span across; one of "`none`" (the default) so the element spans
#'        across one column, "`all`" (spans across all columns), "`initial`" or "`inherit`"
#'        (see `fill` for descriptions of those).
#' @param style CSS style properties (complete text spec) that will be put into an `{htmltools}` `tags$style()` call that
#'        will come along for the ride with the `<div>`; useful for specifying `<p>` properties for each item of the
#'        `words` vector
#' @note No validation is done on inputs
#' @export
#' @examples
#' columnize(state.name, ncol = 3, rule_color = "black", rule_width = "0.5px")
columnize <- function(words,
                      div_id = NULL,
                      div_class = NULL,
                      ncol = 5,
                      width = "auto",
                      fill = "balance",
                      gap = "normal",
                      rule_color = "initial",
                      rule_style = "none",
                      rule_width = "medium",
                      span = "none",
                      style = "p {font-family:'Roboto Condensed';font-size:12pt;line-height:12.5pt;padding:0;margin:0}") {

  tagList(
    tagList(
      do.call(tags$style, as.list(style)) # no trailing comma: an empty argument can error in tagList()
    ),
    tags$div(
      id = div_id,
      class = div_class,
      words %>%
        map(tags$p) %>%
        tagList(),
      style = sprintf(
        paste0(c(
          "column-count:%s",
          "column-fill: %s",
          "column-gap: %s",
          "column-rule-color: %s",
          "column-rule-style: %s",
          "column-rule-width: %s",
          "column-span: %s",
          "column-width: %s"
        ), collapse = ";"),
        ncol, fill, gap, rule_color, rule_style, rule_width, span, width
      )
    )
  )

}

So now you can do something like:

columnize(
  div_id = "states",
  words = state.name, 
  ncol = 3, 
  rule_color = "black", 
  rule_style = "solid", 
  rule_width = "2px",
  style = c(
    "#states { width: 50%; text-align: center }\np {font-family:'Roboto Condensed'}",
    "p { font-family: 'sans-serif'}"
  )
) %>% 
  htmltools::html_print()

and get:

Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming
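If you want to keep the rendered columns around (say, to screenshot for a slide) rather than only previewing them, a minimal sketch using `htmltools::save_html()` follows. It builds a plain three-column `<div>` inline instead of calling `columnize()` so that it runs on its own, and the temporary output path is just a placeholder.

```r
library(htmltools)

# placeholder output location; swap in a real path as needed
out_file <- tempfile(fileext = ".html")

# a plain three-column div of state names (no rules or custom fonts)
page <- tags$div(
  style = "column-count:3",
  lapply(state.name, tags$p)
)

save_html(page, file = out_file)

file.exists(out_file)
```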
