
It seems that MX, DKIM, SPF, and DMARC records for modern email setups were just not enough acronyms (and setup tasks) for some folks, resulting in the creation of yet-another-acronym — BIMI, or Brand Indicators for Message Identification. The goal of BIMI is to “provide a mechanism for mail senders to publish a validated logotype that mail receivers can display with the senders’ messages.” You can read about the rationale for BIMI and the preliminary RFC for crafting BIMI DNS TXT records over a few caffeinated beverages. I’ll try to TL;DR the high points below.

The idea behind BIMI is to provide a visual indicator of the brand associated with a mail message; i.e. you’ll have an image to look at somewhere in the mail list display and/or mail message display of your mail client if it supports BIMI. This visual indicator is merely an image URL associated with a brand’s mail domain through the use of a new special-prefix DNS TXT record. Mail intermediaries and mail clients are only supposed to allow presentation of BIMI-record-provided images after verifying that the email domain itself conforms to the DMARC standard (which you should be using if you’re an organization/brand, and shame on you if you’re not by now). In fact, the goal of BIMI is to help ensure that:

  • the organization is legitimate
  • the domain names are controlled by the organization
  • the organization has current rights to display the indicator

and that, when BIMI validation is performed, the party requesting validation is currently authorized to do so by the organization and is who they say they are.

If you’re having flashbacks to the lost era of when SSL certificates were supposed to have similar integrity assertions, you’re not alone (thanks, LE).

What’s Really Going On?

I’m not part of any working group associated with BIMI, I just measure and study the internet for a living. As someone who is as likely to use alpine to peruse mail as I am a thick email client or (heaven forbid) web client, BIMI will be of little value to me since I’m not really going to see said images anyway.

After reading through all the BIMI (and associated) RFCs, email security & email marketing vendor blogs/papers, and general RFC commentary, it’s clear BIMI isn’t solving any problem that well-armored DMARC configurations aren’t already solving. It appears to be driven mainly by brand marketing wonks who just want to shove brand logos in front of you and have one more way to track you.

Yep, tracking email perusals (even if it’s just a list view) will be one of the benefits (to brands and marketing firms) and is most assuredly an unstated primary goal of this standard. To help illustrate this, let’s look at the BIMI record for one of the most notorious tracking brands on the planet, Verizon (in this case, Verizon Wireless). When you receive a BIMI-“enhanced” email from verizonwireless.com, the infrastructure handling the email receipt will look for and process the BIMI header that was sent along for the ride and eventually query a TXT record for default._bimi.verizonwireless.com (or whatever the sender has specified instead of default — more on that in a bit). In this case the response will be:

v=BIMI1; l=https://ecrm.e.verizonwireless.com/AC/Global/Bling/Images/checkmark/verizon.svg;

which means the image they want displayed is at that URL. Your client will have to fetch that during an interactive session, so your IP address — at a minimum — will be leaked when that fetch happens.
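If you want to poke at a BIMI record yourself from R, here’s a minimal sketch that uses Google’s public DNS-over-HTTPS endpoint (just one convenient resolver choice; the URL and JSON shape are that service’s, not anything BIMI-specific):

library(jsonlite)

# query the TXT record over DNS-over-HTTPS and pull out the answer string(s)
res <- fromJSON("https://dns.google/resolve?name=default._bimi.verizonwireless.com&type=TXT")
res$Answer$data # should contain the v=BIMI1 record shown above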

Brands can specify something other than the default selector with the email, so they could easily customize that to be a unique identifier which will “be you” and know when you’ve at least looked at said email in a list view (provided that’s how your email client will show it) if not in the email proper. Since this is a “high integrity” visual component of the message, it’s likely not going to be subject to the “do not load external images/content” rules you have set up (you do view emails with images turned off initially, right?).
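To make that concrete, here’s a hypothetical (the header name comes from the BIMI draft; the selector value and domain are made up): a sender could attach a per-recipient header such as

BIMI-Selector: v=BIMI1; s=u1f0a9c;

which would steer the receiving infrastructure to query u1f0a9c._bimi.example.com instead of default._bimi.example.com, a DNS lookup (and subsequent image fetch) that uniquely identifies you.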

So, this is likely just one more way the IETF RFC system is being abused by large corporations to continue to erode our privacy (and get their horribly designed logos in our faces).

Let’s see who are the early adopters of BIMI.

BIMI Through the Alexa Looking Glass

Amazon had stopped updating the Alexa Top 1m sites list for a while, but it’s been back for quite some time, so we can use it to see how many sites in the top 1m have BIMI records.

We’ll use the {zdnsr} package (also on GitLab, SourceHut, BitBucket, and GitUgh) to perform a million default._bimi prefix queries and see how many valid BIMI TXT record responses we get.

library(zdnsr) # hrbrmstr/zdnsr on social coding sites
library(stringi)
library(urltools)
library(tidyverse)

refresh_publc_nameservers_list() # get a current list of active nameservers we can use

# read in the top1m
top1m <- read_csv("~/data/top-1m.csv", col_names = c("rank", "domain")) # http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

# fire off a million queries, storing good results where we can pick them up later
zdns_query(
  entities = sprintf("%s.%s", "default._bimi", top1m$domain),
  query_type = "TXT",
  num_nameservers = 500,
  output_file = "~/data/top1m-bimi.json"
)

# ~10-30m later depending on your system/network/randomly chosen resolvers

bmi <- jsonlite::stream_in(file("~/data/top1m-bimi.json")) # using jsonlite vs ndjson since i don't want a "flat" structure

idx <- which(lengths(bmi$data$answers) > 0) # find all the ones with non-0 results

# start making a tidy data structure
tibble(
  answer = bmi$data$answers[idx]
) %>%
  unnest(answer) %>%
  filter(grepl("^v=BIM", answer)) %>% # only want BIMI records, more on this in a bit
  mutate(
    l = stri_match_first_regex(answer, "l=([^;]+)")[,2], # get the image link
    l_dom = domain(l) # get the image domain
  ) %>% 
  bind_cols(
    suffix_extract(.$name) # so we can get the apex domain below
  ) %>% 
  mutate(
    name_apex = glue::glue("{domain}.{suffix}"),
    name_stripped = stri_replace_first_regex(
      name, "^default\\._bimi\\.", ""
    )
  ) %>% 
  select(name, name_stripped, name_apex, l, l_dom, answer) -> bimi_df

Here’s what we get:

bimi_df
## # A tibble: 321 x 6
##    name       name_stripped  name_apex  l                            l_dom               answer                       
##    <chr>      <chr>          <glue>     <chr>                        <chr>               <chr>                        
##  1 default._… ebay.com       ebay.com   https://ir.ebaystatic.com/p… ir.ebaystatic.com   v=BIMI1; l=https://ir.ebayst…
##  2 default._… linkedin.com   linkedin.… https://media.licdn.com/med… media.licdn.com     v=BIMI1; l=https://media.lic…
##  3 default._… wish.com       wish.com   https://wish.com/static/img… wish.com            v=BIMI1; l=https://wish.com/…
##  4 default._… dropbox.com    dropbox.c… https://cfl.dropboxstatic.c… cfl.dropboxstatic.… v=BIMI1; l=https://cfl.dropb…
##  5 default._… spotify.com    spotify.c… https://message-editor.scdn… message-editor.scd… v=BIMI1; l=https://message-e…
##  6 default._… ebay.co.uk     ebay.co.uk https://ir.ebaystatic.com/p… ir.ebaystatic.com   v=BIMI1; l=https://ir.ebayst…
##  7 default._… asos.com       asos.com   https://content.asos-media.… content.asos-media… v=BIMI1; l=https://content.a…
##  8 default._… wix.com        wix.com    https://valimail-app-prod-u… valimail-app-prod-… v=BIMI1; l=https://valimail-…
##  9 default._… cnn.com        cnn.com    https://amplify.valimail.co… amplify.valimail.c… v=BIMI1; l=https://amplify.v…
## 10 default._… salesforce.com salesforc… https://c1.sfdcstatic.com/c… c1.sfdcstatic.com   v=BIMI1; l=https://c1.sfdcst…
## # … with 311 more rows

I should re-run this mass query since it usually takes 3-4 runs to get a fully comprehensive set of results (I should also really use work’s infrastructure to do the lookups against the authoritative nameservers for each organization like we do for our FDNS studies, but this was a spur-of-the-moment project idea to see if we should add BIMI to our studies and my servers are “free” whereas AWS nodes most certainly are not).

To account for the aforementioned “comprehensiveness” issues, we’ll round up the total from 321 to 400 (the average difference between 1 and 4 bulk queries is more like 5% than 20% but I’m in a generous mood), so 0.04% of the domains in the Alexa Top 1m have BIMI records. Not all of those domains are going to have MX records, but it’s safe to say less than 1% of the brands on the Alexa Top 1m have been early BIMI adopters. This is not surprising since it’s not really a fully baked standard and no real clients support it yet (AOL doesn’t count, apologies to the Oathers). Google claims to be “on board” with BIMI, so once they adopt it, we should see that percentage go up.
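For the record, that percentage is just:

(400 / 1e6) * 100 # generous estimate of BIMI adopters over the Top 1m list size
## [1] 0.04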

Tracking isn’t limited to a tricked out dynamic DNS configuration that customizes selectors for each recipient. Since many brands use third party services for all things email, those clearinghouses are set to get some great data on you if these preliminary results are any indicator:

count(bimi_df, l_dom, sort=TRUE)
## # A tibble: 255 x 2
##    l_dom                                                                          n
##    <chr>                                                                      <int>
##  1 irepo.primecp.com                                                             13
##  2 www.letakomat.sk                                                               9
##  3 valimail-app-prod-us-west-2-auth-manager-assets.s3.us-west-2.amazonaws.com     8
##  4 static.mailkit.eu                                                              7
##  5 astatic.ccmbg.com                                                              5
##  6 def0a2r1nm3zw.cloudfront.net                                                   4
##  7 static.be2.com                                                                 4
##  8 www.christin-medium.com                                                        4
##  9 amplify.valimail.com                                                           3
## 10 bimi-host.250ok.com                                                            3
## # … with 245 more rows

The above code counted how many BIMI URLs are hosted at a particular domain and the top 5 are all involved in turning you into the product for other brands.

Speaking of brands, these are the logos of the early adopters, which I made by generating some HTML from an R script and screen capturing the browser result.

FIN

The data from the successful BIMI results of the mass DNS query is at https://rud.is/dl/2020-02-21-bimi-responses.json.gz. Knowing there are results to be had, I’ll be setting up a regular (proper) mass-query of the Top 1m and see how things evolve over time and possibly get it on the work docket. We may just do a mass BIMI prefix query against all FDNS apex domains just to see a broader scale result, so stay tuned.

Drop a note if you discover any more insights from the data (there are a few in there I’m saving for a future post) or from your own BIMI inquiries; also drop a note if you have a good defense of BIMI other than marketing and tracking.

As the maintainer of RSwitch — and developer of my own (for personal use) macOS, iOS, watchOS, iPadOS and tvOS apps — I need the full Apple Xcode install around (more R-focused macOS folk can get away with just the command-line tools being installed). As an Apple Developer who insanely runs the macOS & Xcode betas as they are released, I also have the misery of dealing with Xcode usurping authority over .R files every time it receives an update. Sure, I can right-click on an R script, choose “Open With => Other…”, pick RStudio and make it the new default, but clicks interrupt train of thought and take more time than executing a quick shell command at a terminal prompt (which I always have up).

Enter duti (https://github.com/moretension/duti) — a small command-line tool that lets you change the default application just by knowing the id of the application you want to make the default. For instance, RStudio’s id is org.rstudio.RStudio, which can be obtained via:

$ osascript -e 'id of app "RStudio"'
org.rstudio.RStudio

and, we can use that value in a quick call to duti:

$ duti -s org.rstudio.RStudio .R all

If you’d rather have Visual Studio Code or Sublime Text be the default for .R files, their bundle ids are com.microsoft.VSCode and com.sublimetext.3, respectively. If you’d rather use Atom, well you really need to think about your life choices.

We can see the current default for R scripts via:

$ duti -x R
RStudio.app
/Applications/RStudio.app
org.rstudio.RStudio

You can turn the “setter” into a shell alias (preferably a zsh or sh alias since bash is going away soon) or shell script for quick use.
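If you spend more time in an R console than a shell, the same one-liner can also be fired off from R. Here’s a minimal sketch, assuming duti is already installed and on your PATH:

system2("duti", c("-s", "org.rstudio.RStudio", ".R", "all")) # same as the shell invocation above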

Installing duti

Homebrew users can just brew install duti and get on with their day. Folks can also grab the latest release and get on with their day with just a little more effort.

The duti utility can also be compiled on your own (which is preferred so you can review the source to make sure you aren’t being compromised by a random developer on the internet); but, if you have macOS 10.15 (Catalina), you’ll need to jump through a few hoops since it doesn’t compile out-of-the-box on that platform yet. Thankfully those hoops aren’t too bad thanks to a helpful pull request that adds support for the current version of macOS. (You’ll need at least the command-line developer tools installed for this to work and likely need to brew install autoconf automake libtool to ensure all the toolchain bits that are needed are in place.)

At a terminal prompt, go to where you normally go to clone git repositories and grab the source:

$ git clone git@github.com:moretension/duti.git
$ cd duti
$ git fetch origin pull/39/head:pull_39 # add and fetch the origin for the PR
$ git checkout pull_39                  # switch to the branch
$                                       # review the source code
$                                       # no, really, review the source code!
$ autoconf                              # run autoconf to generate the configure script
$ ./configure                           # generate the Makefile (there will be "checking" and "creating" messages)
$ make                                  # build it! (there will be macOS API deprecation warnings but no errors)
$ make install                          # install it! (you may need to prefix with "sudo -H"; this will put the binary in `/usr/local/bin/` and the man page in `/usr/local/share/man/man1`)

NOTE: If you only have the macOS Xcode command line tools (vs the entirety of Xcode) you’ll need to edit aclocal.m4 before you run autoconf and change line 9 to be:

sdk_path="/Library/Developer/CommandLineTools/SDKs"

since the existing setting assumes you have the full Xcode installation available.

FIN

I’ll be adding this functionality to the next version of RSwitch, letting you specify the application(s) you want to own various R-ish files. It will check for the proper values being in place on a regular basis and set them to your defined preferences (I also need to see if there’s an event I can have RSwitch watch for to trigger the procedure).

If you have another, preferred way to keep ownership of R files, drop a blog post link in the comments (or just drop a note in the comments with said procedure).

macOS R users who tend to work on the bleeding edge likely noticed some downtime at <mac.r-project.org> this past weekend. Part of the issue was an SSL/TLS certificate expiration situation. Moving forward, we can monitor this with R using the super spiffy {openssl} and {pushoverr} packages whilst also generating a daily report with {rmarkdown} and {DT}.

The Basic Process

The {openssl} package has a handy function — download_ssl_cert() — which will, by default, hit a given host on the standard HTTPS port (443/TCP) and grab the site certificate and issuer. We’ll grab the “validity end” field and convert that to a date to use for comparison.
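Here’s that building block in isolation, as a minimal sketch (the date in the comment is just an example of the format the full script parses below):

library(openssl)

cert <- download_ssl_cert("mac.r-project.org") # the site certificate is the first element
cert[[1]][["validity"]][[2]] # the "validity end" string, e.g. "Mar  1 23:59:59 2021 GMT"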

To get the target list of sites to check I used Rapid7’s FDNS data set and a glance at a few certificate transparency logs to put together a current list of “r-project” domains that have been known to have SSL certs. This process could be made more dynamic, but things don’t change that quickly in r-project domain land.

Finally, we use the {DT} package to build a pretty HTML table and the {pushoverr} package to send notifications at normal priority for certs expiring within a week and critical priority for certs that have expired (the package has excellent documentation which will guide you through setting up a Pushover account).

I put this all in a plain R script named r-project-ssl-notify.R that’s then called from a Linux CRON job which runs:

/usr/bin/Rscript -e 'rmarkdown::render(input="PATH_TO/r-project-ssl-notify.R", output_file="PATH_TO/r-project-cert-status/index.html", quiet=TRUE)'

once a day at 0930 ET to make this status page and also fire off any notifications which I have going to my watch and phone (I did a test send by expanding the delta to 14 days):

(watch and phone notification screenshots)

Here’s the contents of r-project-ssl-notify.R:

#' ---
#' title: "r-project SSL/TLS Certificate Status"
#' date: "`r format(Sys.time(), '%Y-%m-%d')`"
#' output:
#'   html_document:
#'     keep_md: false
#'     theme: simplex
#'     highlight: monochrome
#' ---
#+ init, include=FALSE
knitr::opts_chunk$set(
  message = FALSE, 
  warning = FALSE, 
  echo = FALSE, 
  collapse=TRUE
)

#+ libs
library(DT)
library(openssl)
library(pushoverr)
library(tidyverse)

# Setup -----------------------------------------------------------------------------------------------------------

# This env config file contains two lines:
#
# PUSHOVER_USER=YOUR_PUSHOVER_USER_STRING
# PUSHOVER_APP=YOUR_PUSHOVER_APP_KEY
#
# See the {pushoverr} package for how to setup your Pushover account
readRenviron("~/jobs/conf/r-project-ssl-notify.env")


# Check certs -----------------------------------------------------------------------------------------------------

# r-project.org domains retrieved from Rapid7's FDNS data set
# (https://opendata.rapid7.com/sonar.fdns_v2/) and cert transparency logs

#+ work
c(
  "beta.r-project.org", "bugs.r-project.org", "cloud.r-project.org", 
  "cran-archive.r-project.org", "cran.at.r-project.org", "cran.ch.r-project.org", 
  "cran.es.r-project.org", "cran.r-project.org", "cran.uk.r-project.org", 
  "cran.us.r-project.org", "developer.r-project.org", "ess.r-project.org", 
  "ftp.cran.r-project.org", "journal.r-project.org", "lists.r-forge.r-project.org", 
  "mac.r-project.org", "r-project.org", "svn.r-project.org", "translation.r-project.org", 
  "user2011.r-project.org", "user2014.r-project.org", "user2016.r-project.org", 
  "user2018.r-project.org", "user2019.r-project.org", "user2020.r-project.org", 
  "user2020muc.r-project.org", "win-builder.r-project.org", "www.cran.r-project.org", 
  "www.r-project.org", "www.user2019.fr"
) -> r_doms

# grab each cert

r_certs <- map(r_doms, openssl::download_ssl_cert)

# make a nice table
tibble(
  dom = r_doms,
  expires = map_chr(r_certs, ~.x[[1]][["validity"]][[2]]) %>% # this gets us the "validity end"
    as.Date(format = "%b %d %H:%M:%S %Y", tz = "GMT"),        # and converts it to a date object
  delta = as.numeric(expires - Sys.Date(), "days")            # this computes the delta from the day this script was called
) %>% 
  arrange(expires) -> r_certs_expir

# Status page generation ------------------------------------------------------------------------------------------

# output nice table  
DT::datatable(r_certs_expir, list(pageLength = nrow(r_certs_expir))) # if the # of r-proj doms gets too large we'll cap this for pagination

# Notifications ---------------------------------------------------------------------------------------------------

# See if we need to notify abt things expiring within 1 week
# REMOVE THIS or edit the delta max if you want less noise
one_week <- filter(r_certs_expir, between(delta, 1, 7))
if (nrow(one_week) > 0) {
  pushover_normal(
    title = "There are r-project SSL Certs Expiring Within 1 Week", 
    message = "Check which ones: https://rud.is/r-project-cert-status"
  )
}

# See if we have expired certs
expired <- filter(r_certs_expir, delta <= 0)
if (nrow(expired) > 0) {
  pushover_critical(
    title = "There are expired r-project SSL Certs!", 
    message = "Check which ones: https://rud.is/r-project-cert-status"
  )
}

FIN

With just a tiny bit of R code we have the ability to monitor expiring SSL certs via a diminutive status page and alerts to any/all devices at our disposal.

Each year the World Economic Forum releases their Global Risk Report around the time of the annual Davos conference. This year’s report is out and below are notes on the “cyber” content to help others speed-read through those sections (in the event you don’t read the whole thing). Their expert panel is far from infallible, but IMO it’s worth taking the time to read through their summarized viewpoints. Some of your senior leadership are represented at Davos and either contributed to the report or will be briefed on the report, so it’s also a good idea just to keep an eye on what they’ll be told.

Direct link to report PDF: http://www3.weforum.org/docs/WEF_Global_Risk_Report_2020.pdf.

“Cyber” Cliffs Notes

  • Cyberattacks moved out of the Top 5 Global Risks in terms of Likelihood (page 2)

  • Cyberattacks remain in the upper-right risk quadrant (page 3)

  • Cyberattacks likelihood estimation reduced slightly but impact moved up a full half point to ~4.0 (out of 5.0) (page 4)

  • Cyberattacks are placed as directly related to named risks of: (page 5)

    • information infrastructure breakdown, (76.2% of the 200+ member expert panel on short-term outlook)
    • data fraud/theft, (75.0% of the 200+ member expert panel on short-term outlook) and
    • adverse tech advances (<70% of the 200+ member expert panel on short-term outlook)

    All three of which have their own relationships (it’s worth tracing them out as an exercise in downstream impact potential if one hasn’t worked through a risk relationship exercise before)

  • Cyberattacks remain on the long-term outlook (next 10 years) for both likelihood and impact by all panel sectors

  • Pages 61-71 cover the “Fourth Industrial Revolution” (4IR) and cyberattacks are mentioned on every page.

    • There are 2025 market projections that might be useful as deck fodder.
    • Interesting statistic that 50% of the world’s population is online and that one million additional people are joining the internet daily.
    • The notion of nation-state mandated “parallel cyberspaces” is posited (we’re seeing that develop in Russia and some other countries right now).
    • They also mention the proliferation of patents to create and enforce a first-mover advantage
    • Last few pages of the section have a wealth of external resources that are worth perusing
  • In the health section on page 78 they mention the susceptibility of health data to cyberattacks

  • They list out specific scenarios in the back; many have a cyber component

    • Page 92: “Geopolitical risk”: Interstate conflict with regional consequences — A bilateral or multilateral dispute between states that escalates into economic (e.g. trade/currency wars, resource nationalization), military, cyber, societal or other conflict.

    • Page 92: “Technological risk”: Breakdown of critical information infrastructure and networks — Cyber dependency that increases vulnerability to outage of critical information infrastructure (e.g. internet, satellites) and networks, causing widespread disruption.

    • Page 92: “Technological risk”: Large-scale cyberattacks — Large-scale cyberattacks or malware causing large economic damage, geopolitical tensions or widespread loss of trust in the internet.

    • Page 92: “Technological risk”: Massive incident of data fraud or theft — Wrongful exploitation of private or official data that takes place on an unprecedented scale.

FIN

Hopefully this saved folks some time, and I’m curious as to how others view the Ouija board scrawls of this expert panel when it comes to cybersecurity predictions, scenarios, and ratings.

UPDATE 2020-02-11 Apple now supports downloading transactions as CSV or OFX! (via MacObserver).


I saw this CNBC article on an (in theory) client-side-only, in-browser conversion utility for taking Apple Card PDF statements and turning them into CSV files.

Since I (a) never trust any browser or site and (b) the article indicated that there is a $5 fee to avoid the “single random transaction removal”, I felt compelled to throw together an R script to do this for at least folks who are capable of setting up R so that all processing is guaranteed to be local.

FWIW the site does appear to do what it says on the tin (all processing is, indeed, local). That doesn’t mean one of your extensions isn’t spying on you, nor does it mean that the site could not turn evil someday (on its own or via an attacker compromise).

read_apple_card_statement <- function(path) {

  require(stringi)
  require(pdftools)
  require(tidyverse)

  # make sure the file exists
  path <- path.expand(path[1])
  if (!file.exists(path)) stop("File '", path, "' not found.", call.=FALSE)

  pdf_text(path) %>% # read it in
    stri_split_lines() %>% # turn \n to a separate character vector element
    unlist() %>% # flatten it
    stri_trim_both() %>% # get rid of leading/trailing spaces
    keep(stri_detect_regex, "^([[:digit:]]{2}/[[:digit:]]{2}/[[:digit:]]{4})") %>% # find lines that start with a date
    map_df(~{
      rec <- as.list(unlist(stri_split_regex(.x, "[[:space:]]{3,}"))) # find the columns
      if (stri_detect_fixed(.x, "%")) { # lines with a `%` in them distinguish charges from payments
        rec <- set_names(rec, c("date", "description", "daily_cash_pct", "daily_cash_amt", "amt")) # ones with charges have cash back columns
      } else {
        rec <- set_names(rec, c("date", "description", "amt")) # ones w/o % do not
      }
    }) %>%
    mutate(
      date = lubridate::mdy(date), # make dates dates
      amt = stri_replace_first_fixed(amt, "$", "") %>% parse_number(), # dollars to numbers
      daily_cash_pct = parse_number(daily_cash_pct)/100, # % to numbers
      daily_cash_amt = parse_number(daily_cash_amt) # dollars to numbers
    )

}

list.files("~/Downloads", pattern = "Apple Card Statement", full.names = TRUE) %>% 
  map_df(read_apple_card_statement)

You can send the PDF statements from the Apple Card app to your Mac via AirDrop and it will put them into ~/Downloads. I recommend putting them somewhere else since you’ve likely given all sorts of applications access to ~/Downloads when prompted to on Catalina (yay security theatre). Wherever you put them, you can read them individually with read_apple_card_statement() or you can just list.files() and bind all the individual statements together:

list.files("~/WhereYouPutAppleCardStatements", pattern = "Apple Card Statement", full.names = TRUE) %>% 
  map_df(read_apple_card_statement)

FIN

Be very wary of what you put your trust into online. Just because a site is benign one day does not mean it won’t be malicious (deliberately or otherwise) the next. Also, lobby Apple to provide data in more useful formats, especially since it provides applications like Numbers for free with their operating system.

Before we start wrapping foreign language code we need to make sure that basic R packages can be created. If you’ve followed along from the previous post you have everything you need to get started here. Just to make sure, you should be able to fire up a new RStudio session and execute the following R code and see similar output. If not, you’ll need to go through the steps and resources outlined there before continuing.

pkgbuild::check_build_tools()
### Your system is ready to build packages!

Also: the {bookdown} version should now always match the blog post (apart from some verbiage changes to denote it’s in a book vs a series of blog posts). You can refer to it at any time via — https://rud.is/books/writing-frictionless-r-package-wrappers/.

Configuring {devtools}

We’re going to rely on the {devtools} package for many operations, so the first thing you should do now is execute help("create", "devtools") in an RStudio R console to see the package documentation page. There you’ll find guidance pointing you to devtools::use_description(), which lists some R session options() you can set to make your package development life much easier and quicker. Specifically, it lets you know that you can set up your ~/.Rprofile to include certain options() settings that will automatically fill in fields each time you create a new package, versus specifying those fields manually in the package creation GUI or as parameters to devtools::create().

A good, minimal setup would be something like:

options(
  usethis.description = list(
    `Authors@R` = 'person("Some", "One", email = "someone@example.com", role = c("aut", "cre"),
                          comment = c(ORCID = "YOUR-ORCID-ID"))',
    License = "MIT + file LICENSE"
  )
)

NOTE: If you do not have an “ORCID” you really should get one (they’re free!) by heading over to https://orcid.org/ and filling in some basic information.

Take a moment to edit your ~/.Rprofile. If you’re not sure how to do that, there is an excellent chapter in Efficient R Programming1 which walks you through the process.
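If you’d rather not hunt for the file, {usethis} (which comes along for the ride with {devtools}) can open it for you:

usethis::edit_r_profile() # opens ~/.Rprofile in the RStudio editor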

Once you’ve added or verified these new options() settings, restart your R session.

Creating A Package

We’re almost ready to create and build a basic R package. Every R package lives in its own directory, and I highly suggest creating a packages directory right off your home directory (e.g. “~/packages“) or someplace where you’ll be able to keep them all organized and accessible. The rest of these posts will assume you’re using “~/packages” as the package development directory.

With {devtools} now pre-configured, use the RStudio R Console pane to execute the following code which should produce similar output and open up a new RStudio session with the new package directory:

devtools::create("~/packages/myfirstpackage") 
## ✔ Creating '/Users/someuser/packages/myfirstpackage/'
## ✔ Setting active project to '/Users/someuser/packages/myfirstpackage'
## ✔ Creating 'R/'
## ✔ Writing 'DESCRIPTION'
## Package: myfirstpackage
## Title: What the Package Does (One Line, Title Case)
## Version: 0.0.0.9000
## Authors@R (parsed):
##     * Bob Rudis <bob@rud.is> [aut, cre] (<https://orcid.org/0000-0001-5670-2640>)
## Description: What the package does (one paragraph).
## License: MIT + file LICENSE
## Encoding: UTF-8
## LazyData: true
## ✔ Writing 'NAMESPACE'
## ✔ Writing 'myfirstpackage.Rproj'
## ✔ Adding '.Rproj.user' to '.gitignore'
## ✔ Adding '^myfirstpackage\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
## ✔ Opening '/Users/someuser/packages/myfirstpackage/' in new RStudio session
## ✔ Setting active project to '<no active project>'

The directory structure will look like this:

.
├── DESCRIPTION
├── NAMESPACE
├── R/
└── myfirstpackage.Rproj

At this point we still do not have a “perfect” R package. To prove this, use the R console to run devtools::check() and — after some rather verbose output — you’ll see the following lines at the end:

> checking DESCRIPTION meta-information ... WARNING
  Invalid license file pointers: LICENSE

0 errors ✓ | 1 warning x | 0 notes ✓

Since we’re saying that our package will be using the MIT license, we need to ensure there’s an associated LICENSE file which we can do by executing usethis::use_mit_license() which will create the necessary files and ensure the License field in the DESCRIPTION file is formatted properly.
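The call and its output will look something like this (the exact messages vary a bit across {usethis} versions):

usethis::use_mit_license()
## ✔ Setting License field in DESCRIPTION to 'MIT + file LICENSE'
## ✔ Writing 'LICENSE.md'
## ✔ Writing 'LICENSE'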

If you run devtools::check() again, now, your final line should report:

## 0 errors ✓ | 0 warnings ✓ | 0 notes ✓

and the package directory tree should look like this:

├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R/
└── myfirstpackage.Rproj

Rounding Out The Corners

While we have a minimum viable package, there are a few other steps we should take during this setup phase. First we’ll set up our package to use {roxygen2}2 for documenting functions, declaring NAMESPACE imports, and other helper features that will be introduced in later posts. We can do this via usethis::use_roxygen_md():

usethis::use_roxygen_md()
## ✔ Setting Roxygen field in DESCRIPTION to 'list(markdown = TRUE)'
## ✔ Setting RoxygenNote field in DESCRIPTION to '7.0.2'
## ● Run `devtools::document()`

We won’t run devtools::document() just yet, though. Before we do that we’ll also create an R file where we can store top-level package introduction/meta-information:

usethis::use_package_doc()
## ✔ Writing 'R/myfirstpackage-package.R'

Now, our directory tree should look like:

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   └── myfirstpackage-package.R
└── myfirstpackage.Rproj

Now, run devtools::document() which will translate the {roxygen2} comments into a properly-formatted R documentation file and regenerate the NAMESPACE file (as we’ll be managing package imports and exports via {roxygen2} comments). The directory tree will now look like:

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   └── myfirstpackage-package.R
├── man
│   └── myfirstpackage-package.Rd
└── myfirstpackage.Rproj

and, we can now re-run devtools::check() to make sure we have the three “0’s” we’re aiming for each time we check our package for errors.

Passing The Test

We’re going to want to write and use tests to ensure our package works properly. There are many R package testing frameworks available. To ease the introduction into this process, we’ll use one of the frameworks that came along for the ride when you installed the various packages outlined in the previous post: {testthat}3. Setting up {testthat} is also pretty painless thanks to the {usethis} package we’ve been taking advantage of quite a bit so far. We’ll create the {testthat} overall infrastructure then add a placeholder test script since devtools::check() will complain about no tests being available if we do not have at least a single script it can execute during the test phase of the package checking process.

usethis::use_testthat()
## ✔ Adding 'testthat' to Suggests field in DESCRIPTION
## ✔ Creating 'tests/testthat/'
## ✔ Writing 'tests/testthat.R'
## ● Call `use_test()` to initialize a basic test file and open it for editing.

usethis::use_test("placeholder")
## ✔ Increasing 'testthat' version to '>= 2.1.0' in DESCRIPTION
## ✔ Writing 'tests/testthat/test-placeholder.R'
## ● Modify 'tests/testthat/test-placeholder.R'
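The generated test-placeholder.R starts out with a trivial passing test (the stock {usethis} template at the time of writing; yours may differ slightly):

test_that("multiplication works", {
  expect_equal(2 * 2, 4)
})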

The directory tree will now look like this:

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   └── myfirstpackage-package.R
├── man
│   └── myfirstpackage-package.Rd
├── myfirstpackage.Rproj
└── tests
    ├── testthat
    │   └── test-placeholder.R
    └── testthat.R

Run devtools::check() one more time to make sure we’ve got those precious 3 “0’s” one last time.

Getting Things Under Control

We’re almost done! One final step is to turn this directory into a git-managed directory so we can work a bit more safely and eventually share our development work with a broader audience. Provided you followed the outline in the previous post, setting up git is as straightforward as one {usethis} function call:

usethis::use_git()
## ✔ Setting active project to '/Users/someuser/packages/myfirstpackage'
## ✔ Initialising Git repo
## ✔ Adding '.Rhistory', '.RData' to '.gitignore'
## There are 10 uncommitted files:
## * '.gitignore'
## * '.Rbuildignore'
## * 'DESCRIPTION'
## * 'LICENSE'
## * 'LICENSE.md'
## * 'man/'
## * 'myfirstpackage.Rproj'
## * 'NAMESPACE'
## * 'R/'
## * 'tests/'
## Is it ok to commit them?
## 
## 1: For sure
## 2: Negative
## 3: Not now
## 
## Selection: 1
## ✔ Adding files
## ✔ Commit with message 'Initial commit'
## ● A restart of RStudio is required to activate the Git pane
## Restart now?
## 
## 1: Negative
## 2: Not now
## 3: Yup
## 
## Selection: 3

RStudio should have been restarted (so it can add a “Git” pane in case you want to use the GUI to manage git) and the directory tree will now have a .git/ subdirectory that you should (almost) never touch by hand.

The last thing to do is to “vaccinate” your git setup so you don’t leak sensitive or unnecessary files when you (eventually) share your creation with the world:

usethis::git_vaccinate()
## ✔ Adding '.Rproj.user', '.Rhistory', '.Rdata' to '/Users/someuser/.gitignore'

We now have a basic, working R package that is devoid of any real functionality other than that of getting us familiar with the package setup and validation processes. We’ll be building upon this experience in most of the coming posts.

Quick Reference

After ensuring you’ve got the recommended options() in place, here are the steps to setup a new package:

# in any RStudio R Console session
devtools::create("~/packages/THE-PACKAGE-NAME")

# in the newly created package RStudio R Console session:
usethis::use_mit_license()       # need a LICENSE file
usethis::use_roxygen_md()        # use {roxygen2} for documentation and configuration
usethis::use_package_doc()       # setup a package-level manual page
usethis::use_testthat()          # setup testing infrastructure
usethis::use_test("placeholder") # setup a placeholder test file
devtools::document()             # Let {roxygen2} create NAMESPACE entries, build manual pages (and, more later on)
devtools::check()                # looking for the three "0's" that tell us we're ready to roll!
usethis::use_git()               # put the directory under git version control
usethis::git_vaccinate()         # Prevent leaking credentials and other unnecessary filesystem cruft

Rather than re-type devtools::document() (et al) whenever you need to run {roxygen2} or build/check a package you can use RStudio keyboard shortcuts that are designed to seamlessly integrate with the {devtools} ecosystem:

Operation                Windows & Linux   Mac             {devtools} equivalent
Install and Restart      Ctrl+Shift+B      Cmd+Shift+B     devtools::install()
Load All (devtools)      Ctrl+Shift+L      Cmd+Shift+L     devtools::load_all()
Test Package (Desktop)   Ctrl+Shift+T      Cmd+Shift+T     devtools::test()
Test Package (Web)       Ctrl+Alt+F7       Cmd+Alt+F7      devtools::test()
Check Package            Ctrl+Shift+E      Cmd+Shift+E     devtools::check()
Document Package         Ctrl+Shift+D      Cmd+Shift+D     devtools::document()

We’ll refer to these operations as “install” (or “build”), “load all”, “test”, “check”, and “document” from now on so you can choose to use the console or the shortcuts as you prefer.

Exercises

Our package may be kinda, well, useless for the moment but that doesn’t mean you can’t show it some love and get some practice in at the same time while things are still relatively straightforward.

  • Modify the Title, Version, and Description fields of the DESCRIPTION file and refine them as needed until package checks pass.
  • Deliberately mangle parts of the DESCRIPTION file to see what errors or warnings you receive during the package check process.
  • Read up on {roxygen2} and add some sections to the package-level documentation formatted with markdown and/or LaTeX. Re-“document” the package and see how your changes look.
  • Edit the test-placeholder.R file and change the placeholder test it created so it fails and then re-check the package to see what warnings or errors show up.
  • After you’ve made (valid, working) modifications to any/all of the above and package checks pass, use either the git command line tools or the RStudio Git pane to add your updates to the git tree. Use the resources linked to in the previous post if you need a refresher on how to do that.
  • Re-run through all the steps with a brand new package name just to make sure you’re comfortable with the package creation process.

Up Next

In the next installment in the series we will start wrapping by creating a basic wrapper that just calls out to the operating system shell to run commands.


  1. Efficient R Programming, “3.3 R Startup” (https://csgillespie.github.io/efficientR/3-3-r-startup.html#r-startup)
  2. {roxygen2} Home (https://roxygen2.r-lib.org/)
  3. {testthat} Home (https://testthat.r-lib.org/)

Offspring #4 eats pretty healthy, normally, but likes to indulge in sugary confections on occasion. He was a big help clearing away the massive amount of snow that accumulated in the last winter storm of 2019 so I decided to start the new year off by making him (and us) a batch of homemade doughnuts which are one of his favourite treats. They came out pretty well and I haven’t posted a recipe in a long while so here it is (which may hopefully start a trend of more recipe posts this year).

Yeast-raised Apple Cider Doughnuts

This makes ~500g of dough which equates to ~10-12 doughnuts depending on the cutter you have. Desired dough temperature is ~27°C.

Set aside 15-20m for prep; ~90-120m for proofing; 20-30m for cooking them all.

  • 150g bread flour (yep, grams; treat yourself to a kitchen/baking scale this year if you do not have one)
  • 100g pastry flour
  • 8g yeast
  • 130g apple cider (warm)
  • 30g egg (it really doesn’t hurt to put in one whole small or medium egg)
  • 16g sugar
  • 16g nonfat dry milk (or, reduce apple cider by 10g and add in 15g of oat/almond/coconut milk which I have to do b/c of our combined dairy allergies; use the dry nonfat milk if you can though as the end product is definitely better)
  • 5g baking powder
  • 5g salt
  • 0.5g fresh ground nutmeg
  • 46g emulsified shortening (a.k.a. cake/icing shortening; you can get away with Crisco™ but the donuts will have a slightly more cake-ish texture)
  • oil (for frying)
  • cinnamon sugar (optional as it’s for tossing the cooked doughnuts in, if desired)

Combine flours and yeast well with a whisk in a bowl (I’m assuming y’all are using a stand mixer).

Add the warm apple cider, egg, dry milk, sugar, salt, baking powder, and nutmeg.

Mix with dough hook attachment on low speed for 2-3 minutes (all ingredients should be incorporated).

Add shortening and put mixer on medium for 8-9 minutes. You’re looking for decent gluten development.

Bulk ferment the dough (it should just about double). ~30m

Fold the dough and ferment for ~30m more.

Roll out dough on floured board to ½ inch / 1 cm. Cover loosely with plastic wrap or bread towel and let rest for 10-15m.

Cut out doughnuts with cutter, placing cut doughnuts and holes onto a sheet pan with lightly oiled parchment paper.

You can also make other shapes (I made a few twisty sticks).

You can recombine the dough after you get through each cutting pass, but don’t overwork it.

Proof for ~15m (dough needs to spring back slowly after a light finger press).

While waiting, prep a pot with the oil. It is important to maintain a 177°C temperature. If the oil is too hot the dough will burn. If too cold they will be oily.

Put doughnuts in 1-3 at a time depending on size of pot and how well you think you can manage turning and removing them. Having a frying spider/basket helps to turn and remove the cooked doughnuts. Cooking time will be 1-2 minutes per side. Test one or two doughnuts first before continuing with the entire batch to get the feel for it. After both sides of each doughnut are golden brown, remove from oil and let drain over the pot or on a rack over a sheet pan.

NOTE: doughnut holes will take 30s-1m per “side”.

If you’re using the cinnamon sugar topping, it helps to have a sidekick to take the cooked doughnuts off the rack right after they stop dripping and coat them in the mixture, but you can also do this on your own.

Let cool sufficiently to eat.

The R language and RStudio IDE are a powerful combination for “getting stuff done”, and one aspect of R itself that makes it especially useful is the ability to use it with other programming languages via a robust foreign language interface capability1. The term “foreign language” refers to another programming language such as C, C++, Fortran, Java, Python, Rust, etc. A common way of referring to this idiom of using functionality written in another programming language directly from within R is “wrapping”, since we’re putting an R “shell” around the code from the other language. Another term you may see used is “extending” (hence the title of the “Writing R Extensions” R manual).

While R supports using this extension mechanism from any R script, leaving tiny trails of R and other-language source and binary files all across your filesystem is not exactly the best way to keep these components organized, and it creates other challenges when you need to use them in other projects or share them with others. Thankfully, the R Core team, along with many individual contributors over the years, has made it pretty straightforward to incorporate this extension capability into R packages, which are much easier (honest!) to organize and share.

The goal of this blog series — which will have a {bookdown} book companion along with some screencasts — is to help you get up to speed using R and RStudio to write R packages that wrap code from many different languages to help you “get stuff done” with as little friction as possible.

Base Requirements

It is assumed that readers are familiar with the R programming language, RStudio IDE, and are comfortable installing and using packages. Since this work is about extending R with other programming languages, you should also have some knowledge of one or more of the target languages being covered.

To follow along with the series you’ll need to ensure you have the necessary components installed along the way. Rather than overwhelm you with all of them up front, each new section will introduce requirements specific to the language or situation being covered. However, there are some fundamentals you’ll need to ensure are available.

  • An R2 environment, preferably R 3.6.x which is what was used for this series.
  • RStudio3, as we’ll be using many of the features provided in it to help reduce development friction
  • The {pkgbuild}4 package installed

Once you’ve gotten through those steps, you should fire up RStudio and run:

pkgbuild::check_build_tools(debug = TRUE)

which will help you make sure your particular system is ready to build packages.

After performing the build tool check and/or installation of the necessary core tools, you will then need to install the {devtools}5 package, which will help ensure that the remaining core packages required are installed.
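That step is a one-liner in the R console:

install.packages("devtools") # pulls in {usethis}, {pkgbuild}, and friends as dependencies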

We’re also going to use the git6 source code version control system. The git ecosystem is not “GitHub”, which is just a public (or, potentially somewhat private) place to house source code repositories, just like other hosted services such as Bitbucket, GitLab, or SourceHut. You can use the excellent “Happy Git with R”7 resource to help ensure your source code control environment is also ready to use.

Supplemental References

It may be helpful to create a browser bookmark folder for supplemental reference material that will be referred to from time-to-time across the sections (we’ll be adding to this list in each chapter, too).

Up Next

If you’ve been a user of “development versions” of R packages or have authored R packages you likely made quick work of this first installment. Those new to creating packages with R, those who tend to only use fully-baked CRAN versions of R packages, and/or those who have not worked with git before likely had to do quite a bit of work to get down to this point (if this describes you, you definitely deserve both a break and kudos for getting this far!).

In the next installment we’ll make sure the package building infrastructure is ready to roll by creating a basic R package that we’ll use as a building block for future work.


  1. “Writing R Extensions”; Chapter 5, “System and foreign language interfaces” (https://cran.r-project.org/doc/manuals/r-release/R-exts.html#System-and-foreign-language-interfaces)
  2. R Project Home (https://www.r-project.org/)
  3. RStudio Home (http://rstudio.com/)
  4. {pkgbuild} CRAN page (https://cran.rstudio.com/web/packages/pkgbuild/)
  5. {devtools} Home (https://devtools.r-lib.org/)
  6. git Home (https://git-scm.com/)
  7. Happy Git with R (https://happygitwithr.com/)