
Author Archives: hrbrmstr

Don't look at me…I do what he does — just slower. #rstats avuncular • ?Resistance Fighter • Cook • Christian • [Master] Chef des Données de Sécurité @ @rapid7

The previous post introduced the topic of how to compile Swift code for use in R using a useless, toy example. This one goes a bit further and makes a case for why one might want to do this by showing how to use one of Apple’s machine learning libraries, specifically the Natural Language one, focusing on extracting parts of speech from text.

I made a parts-of-speech directory to keep the code self-contained. In it are two files. The first is partsofspeech.swift (swiftc seems to dislike dashes in names of library code and I dislike underscores):

import NaturalLanguage
import CoreML

extension Array where Element == String {
  var SEXP: SEXP? {
    let charVec = Rf_protect(Rf_allocVector(SEXPTYPE(STRSXP), count))
    defer { Rf_unprotect(1) }
    for (idx, elem) in enumerated() { SET_STRING_ELT(charVec, idx, Rf_mkChar(elem)) }
    return(charVec)
  }
}

@_cdecl ("part_of_speech")
public func part_of_speech(_ x: SEXP) -> SEXP {

  let text = String(cString: R_CHAR(STRING_ELT(x, 0)))
  let tagger = NLTagger(tagSchemes: [.lexicalClass])

  tagger.string = text

  let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]

  var txts = [String]()
  var tags = [String]()

  tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    if let tag = tag {
      txts.append("\(text[tokenRange])")
      tags.append("\(tag.rawValue)")
    }
    return true
  }

  let out = Rf_protect(Rf_allocVector(SEXPTYPE(VECSXP), 2))
  SET_VECTOR_ELT(out, 0, txts.SEXP)
  SET_VECTOR_ELT(out, 1, tags.SEXP)
  Rf_unprotect(1)

  return(out!)
}

The other is bridge code that seems to be the same for every one of these (or could be) so I’ve just named it swift-r-glue.h (it’s the same as the bridge code in the previous post):

#define USE_RINTERNALS

#include <R.h>
#include <Rinternals.h>

const char* R_CHAR(SEXP x);

Let’s walk through the Swift code.

We need two imports:

import NaturalLanguage
import CoreML

to make use of the NLP functionality provided by Apple.

The following extension to arrays of Strings:

extension Array where Element == String {
  var SEXP: SEXP? {
    let charVec = Rf_protect(Rf_allocVector(SEXPTYPE(STRSXP), count))
    defer { Rf_unprotect(1) }
    for (idx, elem) in enumerated() { SET_STRING_ELT(charVec, idx, Rf_mkChar(elem)) }
    return(charVec)
  }
}

will reduce the amount of code we need to type later on to turn Swift String Arrays to R character vectors.

The start of the function:

@_cdecl ("part_of_speech")
public func part_of_speech(_ x: SEXP) -> SEXP {

tells swiftc to make this a C-compatible call and notes that the function takes one parameter (in this case, it’s expecting a length 1 character vector) and returns an R-compatible value (which will be a list that we’ll turn into a data.frame in R just for brevity).

The following sets up our inputs and outputs:

  let text = String(cString: R_CHAR(STRING_ELT(x, 0)))
  let tagger = NLTagger(tagSchemes: [.lexicalClass])

  tagger.string = text

  let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]

  var txts = [String]()
  var tags = [String]()

We convert the passed-in parameter to a Swift String, initialize the NLP tagger, and set up two arrays to hold the results (each sentence component goes in txts and its part of speech goes in tags).

The following code is mostly straight from Apple and (inefficiently) populates the previous two arrays:


  tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    if let tag = tag {
      txts.append("\(text[tokenRange])")
      tags.append("\(tag.rawValue)")
    }
    return true
  }

Finally, we use the Swift-R bridge to make a list much like one would in C:


  let out = Rf_protect(Rf_allocVector(SEXPTYPE(VECSXP), 2))
  SET_VECTOR_ELT(out, 0, txts.SEXP)
  SET_VECTOR_ELT(out, 1, tags.SEXP)
  Rf_unprotect(1)

  return(out!)

To get a shared library we can use from R, we just need to compile this like last time:

swiftc \
  -I /Library/Frameworks/R.framework/Headers \
  -F/Library/Frameworks \
  -framework R \
  -import-objc-header swift-r-glue.h \
  -emit-library \
  partsofspeech.swift

Let’s run that on some text! First, we’ll load the new shared library into R:

dyn.load("libpartsofspeech.dylib")

Next, we’ll make a wrapper function to avoid messy .Call(…)s and to make a data.frame:

parts_of_speech <- function(x) {
  res <- .Call("part_of_speech", x)  
  as.data.frame(stats::setNames(res, c("name", "tag")))
}
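The wrapper leans on the Swift function handing back an unnamed, two-element list of equal-length character vectors. Here is a minimal sketch of just that R-side reshaping, using a hand-built list as a stand-in for the .Call() result (fake_res and as_pos_frame() are made-up names for illustration, not part of the library):

```r
# Hand-built stand-in for what .Call("part_of_speech", x) returns:
# an unnamed list holding the tokens and their tags.
fake_res <- list(
  c("The", "comm", "was"),
  c("Determiner", "Noun", "Verb")
)

# Hypothetical helper: sanity-check the shape, name the two elements,
# then bind them as columns of a data.frame.
as_pos_frame <- function(res) {
  stopifnot(
    is.list(res),
    length(res) == 2,
    length(res[[1]]) == length(res[[2]])
  )
  as.data.frame(stats::setNames(res, c("name", "tag")), stringsAsFactors = FALSE)
}

pos <- as_pos_frame(fake_res)
str(pos)
```

The shape check is optional, but it turns a surprise from the compiled side into an R error rather than a mangled data.frame.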

Finally, let’s try this on some text!

tibble::as_tibble(
  parts_of_speech(paste0(c(
"The comm wasn't working. Feeling increasingly ridiculous, he pushed",
"the button for the 1MC channel several more times. Nothing. He opened",
"his eyes and saw that all the lights on the panel were out. Then he",
"turned around and saw that the lights on the refrigerator and the",
"ovens were out. It wasn’t just the coffeemaker; the entire galley was",
"in open revolt. Holden looked at the ship name, Rocinante, newly",
"stenciled onto the galley wall, and said, Baby, why do you hurt me",
"when I love you so much?"
  ), collapse = " "))
)
## # A tibble: 92 x 2
##    name         tag
##    <chr>        <chr>
##  1 The          Determiner
##  2 comm         Noun
##  3 was          Verb
##  4 n't          Adverb
##  5 working      Verb
##  6 Feeling      Verb
##  7 increasingly Adverb
##  8 ridiculous   Adjective
##  9 he           Pronoun
## 10 pushed       Verb
## # … with 82 more rows
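Once the tags are in a data.frame, the usual R tooling applies. A small sketch that tallies parts of speech, using only the ten rows printed above typed in by hand (so it runs without the shared library):

```r
# The ten rows shown in the output above, entered manually.
pos <- data.frame(
  name = c("The", "comm", "was", "n't", "working",
           "Feeling", "increasingly", "ridiculous", "he", "pushed"),
  tag = c("Determiner", "Noun", "Verb", "Adverb", "Verb",
          "Verb", "Adverb", "Adjective", "Pronoun", "Verb"),
  stringsAsFactors = FALSE
)

# Count how often each part of speech shows up, most frequent first.
tag_counts <- sort(table(pos$tag), decreasing = TRUE)
tag_counts

# Pull out just the tokens tagged as verbs.
verbs <- pos$name[pos$tag == "Verb"]
verbs
```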

FIN

If you’re playing along at home, try adding a function to this Swift file that uses Apple’s entity tagger.

The next installment of this topic will be how to wrap all this into a package (then all these examples get tweaked and go into the tome).

I’ve been on a Swift + R bender for a while now, but have been envious of the pure macOS/iOS (et al) folks who get to use Apple’s seriously ++good machine learning libraries, which are even more robust on the new M1 hardware (it’s cool having hardware components dedicated to improving the performance of built models).

Sure, it’s pretty straightforward to make a command-line utility that can take data input, run them through models, then haul the data back into R, but I figured it was about time that Swift got the “Rust” and “Go” treatment in terms of letting R call compiled Swift code directly. Thankfully, none of this involves using Xcode since it’s one of the world’s worst IDEs.

To play along at home you’ll need macOS and at least the command line tools installed (I don’t think this requires a full Xcode install, but y’all can let me know if it does in the comments). If you can enter swiftc at a terminal prompt and get back <unknown>:0: error: no input files then you’re good-to-go.

Hello, Swift!

To keep this post short (since I’ll be adding this entire concept to the SwiftR tome), we’ll be super-focused and just build a shared library we can dynamically load into R. That library will have one function which will be to let us say hello to the planet with a customized greeting.

Make a new directory for this effort (I called mine greetings) and create a greetings.swift file with the following contents:

All this code is also in this gist.

@_cdecl ("greetings_from")
public func greetings_from(_ who: SEXP) -> SEXP {
  print("Greetings, 🌎, it's \(String(cString: R_CHAR(STRING_ELT(who, 0))))!")
  return(R_NilValue)
}

Before I explain what’s going on there, also create a greetings.h file with the following contents:

#define USE_RINTERNALS

#include <R.h>
#include <Rinternals.h>

const char* R_CHAR(SEXP x);

In the Swift file, there’s a single function that takes an R SEXP and converts it into a Swift String which is then routed to stdout (not a “great” R idiom, but benign enough for an intro example). Swift functions aren’t C functions and on their own do not adhere to C calling conventions. Unfortunately, R’s ability to work with dynamic library code requires such a contract to be in place. Thankfully, the Swift Language Overlords provided us with the ability to instruct the compiler to create library code that will force the calling conventions to be C-like (that’s what the @_cdecl is for).

We’re using SEXP, some R C-interface functions, and even the C version of NULL in the Swift code, but we haven’t done anything in the Swift file to tell Swift about the existence of these elements. That’s what the C header file is for (I added the R_CHAR declaration since complex C macros don’t work in Swift).

Now, all we need to do is make sure the compiler knows about the header file (which is a “bridge” between C and Swift), where the R framework is, and that we want to generate a library vs a binary executable file as we compile the code. Make sure you’re in the same directory as both the .swift and .h file and execute the following at a terminal prompt:

# -I                  : where the R headers are
# -F                  : where the R.framework lives
# -framework R        : we want to link against the R framework
# -import-objc-header : our bridging header, which makes R C things available to Swift
# -emit-library       : we want a library, not an exe
swiftc \
  -I /Library/Frameworks/R.framework/Headers \
  -F/Library/Frameworks \
  -framework R \
  -import-objc-header greetings.h \
  -emit-library \
  greetings.swift

If all goes well, you should have a libgreetings.dylib shared library in that directory.

Now, fire up an R console session in that directory and do:

greetings_lib <- dyn.load("libgreetings.dylib")

If there are no errors, the shared library has been loaded into your R session and we can use the function we just made! Let’s wrap it in an R function so we’re not constantly typing .Call(…):

greetings_from <- function(who = "me") {
  invisible(.Call("greetings_from", as.character(who[1])))
}

I also took the opportunity to make sure we are sending a length-1 character vector to the C/Swift function.
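To see what as.character(who[1]) does with awkward inputs, here is a small sketch that runs without the shared library. The check_who() helper is hypothetical, a stricter alternative one might prefer, and not something from the original code:

```r
# What the wrapper's coercion does on its own:
as.character(c("hrbrmstr", "someone else")[1])  # extra elements are dropped
as.character(42[1])                             # numbers are coerced to text
as.character(character(0)[1])                   # empty input becomes NA

# Hypothetical stricter check: refuse empty or missing input instead of
# quietly passing NA along to the compiled function.
check_who <- function(who) {
  who <- as.character(who)
  if (length(who) < 1 || is.na(who[1])) {
    stop("`who` must contain at least one non-missing value", call. = FALSE)
  }
  who[1]
}

check_who(c("hrbrmstr", "someone else"))
```

Whether the extra strictness is worth it depends on how the Swift side should treat a missing value, which this toy example does not address.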

Now, say hello!

greetings_from("hrbrmstr")

And you should see:

Greetings, 🌎, it's hrbrmstr!

FIN

We’ll stop there for now, but hopefully this small introduction has shown how straightforward it can be to bridge Swift & R in the other direction.

I’ll have another post that shows how to extend this toy example to use one of Apple’s natural language processing libraries, and may even do one more on how to put all this into a package before I shunt all the individual posts into a few book chapters.

There was an org that didn’t see
The data exfil hacking spree.
A patch went up, our guard was down,
Oh blow, SolarWinds, blow.

Soon may the Vendorman come,
And bring us Yara rules to run.
One day when their huntin’ is done,
They’ll take their scripts and go.

There was no implant here before,
But after was a sly backdoor.
The C2 signals they did soar
And labored low and slow.

As the attack did laterally move
Our foe did get into their groove;
Stealing certs; installing ‘sploits
Wher’er they did go.

Then one day in the by and by,
Some clever folk with a firey-eye,
Spotted something amiss in Orion’s belt
And a delvin’ they did go.

The sunburst cleared and they did see
The origin of their recent misery.
A blog went up; they did proclaim
Many other orgs were laid low.

As far as I’ve heard, the fight’s still on;
The C2s blocked but the attackers aren’t done.
The Vendorman makes his regular call
To help CISO, crew and all.

Last week I introduced a new bookdown series on how to embed R into a macOS Swift application.

The initial chapters focused on core concepts and showed how to build a macOS compiled, binary command line application that uses embedded R for some functionality.

This week, a new chapter is up that walks you through how to build a basic SwiftUI application that takes input from the user, performs a computation in R (via embedded R) and displays the result of the computation back to the user.

The app looks like this:

and (apart from some of the boilerplate interface code from previous chapters) is around 60 lines of Swift code that ends up consuming ~65 MB of active RAM when run, with almost no energy impact (an equivalent Electron-packaged Shiny app would be 130-200 MB of initial RAM and have a significant, constant energy impact).

There’s enough boilerplate in this project that you can extend it into a basic GUI wrapper for various R operations you have hanging around.

Forthcoming chapters will show how to get graphics out of R and into a SwiftUI window as well as how to make a more diminutive Shiny app wrapper that we’ll eventually be able to ship with an embedded copy of the R framework.

(Leading this with the periodic warning/reminder that this blog occasionally breaks from technical content and has category-based RSS feeds which can be used to ensure one never sees non-technical content.)

Every decent human (which excludes 74,222,958 🇺🇸 who voted for this, now 100% undeniable, traitor) with knowledge of this past week’s tragic events is likely still processing (and will be for a while) what happened; I am no exception. The Feedly board I set up to save content I’ve been poring over has 113 articles in it, so far.

Different aspects of the costume-clad, treasonous chaos have hit me daily, if not hourly.

Two newspaper paragraphs, each one about a different victim, have bubbled up to surface thoughts more often than much of the other stories of the week.

One is about Erin Schaff, a brave, talented journalist from the New York Times:

Grabbing my press pass, they saw that my ID said The New York Times and became really angry. They threw me to the floor, trying to take my cameras. I started screaming for help as loudly as I could. No one came. People just watched. At this point, I thought I could be killed and no one would stop them. They ripped one of my cameras away from me, broke a lens on the other and ran away. [NYTimes]

No one came.

People just watched.

While I am deeply shocked, outraged, and saddened by what happened to Erin, I am not surprised, given that the President of the United States wants journalists to be executed for regularly giving his 2017-2020 reality show bad reviews by stating undeniable facts. Furthermore, he has continually cultivated disdain and hatred for the media in his regiment of cult followers.

Erin is lucky to be alive, even if that means living in the ruins of a failed, so-called democracy.

President Trump is responsible for Erin’s assault, and he is going to get away with it.

The other is about Ashli Babbitt, the troubled insurrectionist who died assaulting the Capitol:

With help from someone who hoisted her up, Babbitt began to step through a portion of the door where the glass had been broken out. An officer on the other side, who was wearing a suit and a surgical mask, immediately shot Babbitt in the neck. She fell to the floor. [WaPo]

After the February 2020 impeachment proceedings failed to do anything substantive, the President boasted of feeling “untouchable”; and, at a campaign rally in 2016, then Republican presidential candidate Trump boasted that he could “shoot somebody and not lose any voters.”

Trump has taken the lives of hundreds of thousands of Americans. And, while the gun wasn’t in his stubby hand, he is fully responsible for this woman’s shooting and death.

So, Trump was right: he is going to get away with it.

Mike Pence, Ted Cruz, Josh Hawley, Lindsey Graham, Susan Collins, Mitch McConnell, and a few hundred other evil, self-serving, elected cowards are all unindicted co-conspirators to Erin’s assault and Ashli’s death, as are countless “news” and talk show hosts.

Beyond what happened to these two women, this traitorous cabal also helped orchestrate this week’s current crescendo to Trump’s term in office.

I say “current” because there’s a non-zero chance of increased violence and bloodshed before January 20th despite Trump being deplatformed.

I have lost all hope that Trump will face any tangible consequences for his actions, which will only serve to embolden other wanna-be dictators like Cruz and Hawley.

What’s worse is that even after Biden’s victory was finally 100% sealed and one of America’s most cherished institutions was ransacked, the Trump supporters near me (rural-ish Maine) still, proudly, have their 2020 Trump campaign signs up and were very likely laughing and cheering the insurrection while Erin was being assaulted and Ashli’s life was ebbing away.

Even the Court Evangelicals have doubled-down in their support of Trump.

I (literally) pray I’m wrong, but it seems inevitable that the violence and bloodshed will continue through and after the 20th. As Biden tries to (also, literally) heal America by bringing science-fueled, centralized, enforced standards to quell the carnage of Covid, we will very likely and regularly see regional repeats of this week’s contemptible acts. As he and his administration attempt to right the many, many wrongs of the past four years (and more), these necessary actions will further push the ilk of this week to regularly manifest their entitlement-fueled rage.

Nos autem non in antebellum; bella iam inceperat.

I went completely daft this week and broke my months-long Twitter break due to the domestic terror event in my nation’s capitol. I’ll likely be resuming the break starting today.

Whilst keeping up with the final descent of the U.S. into a fully failed state, I also noticed that a debate from months ago on CRAN URL checks was still going strong.

I briefly chimed in, both those months ago and this week, on the dangers of short URLs. That was not exactly the core topic of the debate, which centered on HTTP URL redirects (a feature of the protocol that URL shorteners happen to take advantage of).

Short URLs make it easier to type a URL out or remember a URL (if you can still get a decent, short keyword to use after the /) but they’re dangerous. In case you’re one of the R folks who challenge my security chops, perhaps you’ll believe Bruce.

NOTE: Regular ol’ URLs can be, and are, dangerous too, especially if they’re used in an http:// context vs an https:// context or run by daft folks who think they’re capable of making a system fully impervious to attackers.

The pandemic has made “cyber” fairly hectic, so my plan to wrap up a safety checker and local package URLs re-writer into a small, usable tool/package has no ETA on completion. However, that doesn’t mean you can’t gain visibility into the number, types, and safety of URLs in your locally installed packages.

The code below has exposition in the comments – and you can find it here as well — so I’ll close with it vs my usual “FIN”.

Stay safe out there, folks; and — to my not-so-‘United’-after-all States readers — stay strong! The nightmare of the last four years is almost over (though the cleanup — now both physical and metaphorical — is going to take a long time).

library(urltools)
library(stringi)
library(tidyverse)
# we're also using {clipr} and {tools} but via ::: and ::

# fairly comprehensive list of URL shorteners
shorteners <- read_lines("https://github.com/sambokai/ShortURL-Services-List/raw/master/shorturl-services-list.txt")

# opaque function baked into {tools}
# NOTE: this can take a while
db <- tools:::url_db_from_installed_packages(rownames(installed.packages()), verbose = TRUE)

as_tibble(db) %>% 
  distinct() %>%  # yep, even w/in a pkg there may be dups from ^^
  mutate(
    scheme = scheme(URL), # https or not
    dom = domain(URL)     # need this later to be able to compute apex domain
  )  %>% 
  filter(
    dom != "..", # prbly legit since it will be a relative "go up one directory" 
    !is.na(dom)  # the {tools} url_db_from_installed_packages() is not perfect
  ) %>% 
  bind_cols(
    suffix_extract(.$dom) # break them all down into component atoms
  ) %>% 
  select(-dom) %>% # this is now 'host' from ^^
  mutate(
    apex = sprintf("%s.%s", domain, suffix) # apex domain
  ) %>% 
  mutate(
    is_short = (host %in% shorteners) | (apex %in% shorteners) # does it use a shortener?
  ) -> db

db
## # A tibble: 12,623 x 9
##    URL        Parent    scheme host  subdomain domain suffix apex  is_short
##    <chr>      <chr>     <chr>  <chr> <chr>     <chr>  <chr>  <chr> <lgl>   
##  1 https://g… albersus… https  gith… NA        github com    gith… FALSE   
##  2 https://g… albersus… https  gith… NA        github com    gith… FALSE   
##  3 https://w… AnomalyD… https  www.… www       usenix org    usen… FALSE   
##  4 https://w… AnomalyD… https  www.… www       jstor  org    jsto… FALSE   
##  5 https://w… AnomalyD… https  www.… www       usenix org    usen… FALSE   
##  6 https://w… AnomalyD… https  www.… www       jstor  org    jsto… FALSE   
##  7 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE   
##  8 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE   
##  9 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE   
## 10 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE   
## # … with 12,613 more rows

# what packages do i have installed that use short URLS?
# a nice thing to do would be to file a PR to these authors

filter(db, is_short) %>% 
  select(
    URL,
    Parent,
    scheme
  )
## # A tibble: 5 x 3
##   URL                         Parent                   scheme
##   <chr>                       <chr>                    <chr> 
## 1 https://goo.gl/5KBjL5       fpp2/man/goog.Rd         https 
## 2 http://bit.ly/2016votecount geofacet/man/election.Rd http  
## 3 http://bit.ly/SnLi6h        knitr/man/knit.Rd        http  
## 4 https://bit.ly/magickintro  magick/man/magick.Rd     https 
## 5 http://bit.ly/2UaiYbo       ssh/doc/intro.html       http  

# what protocols are in use? (you'll note that some are borked and
# others got mangled by the {tools} function)

count(db, scheme, sort=TRUE)
## # A tibble: 5 x 2
##   scheme     n
##   <chr>  <int>
## 1 https  10007
## 2 http    2498
## 3 NA       113
## 4 ftp        4
## 5 `https     1

# what are the most used top-level sites?

count(db, host, sort=TRUE) %>% 
  mutate(pct = n/sum(n))
## # A tibble: 1,108 x 3
##    host                      n     pct
##    <chr>                 <int>   <dbl>
##  1 docs.aws.amazon.com    3859 0.306  
##  2 github.com             2954 0.234  
##  3 cran.r-project.org      450 0.0356 
##  4 en.wikipedia.org        220 0.0174 
##  5 aws.amazon.com          204 0.0162 
##  6 doi.org                 181 0.0143 
##  7 wikipedia.org           132 0.0105 
##  8 developers.google.com   114 0.00903
##  9 stackoverflow.com       101 0.00800
## 10 gitlab.com               86 0.00681
## # … with 1,098 more rows

# same as ^^ but apex

count(db, apex, sort=TRUE) %>% 
  mutate(pct = n/sum(n)) 
## # A tibble: 743 x 3
##    apex                  n     pct
##    <chr>             <int>   <dbl>
##  1 amazon.com         4180 0.331  
##  2 github.com         2997 0.237  
##  3 r-project.org       563 0.0446 
##  4 wikipedia.org       352 0.0279 
##  5 doi.org             221 0.0175 
##  6 google.com          179 0.0142 
##  7 tidyverse.org       151 0.0120 
##  8 r-lib.org           137 0.0109 
##  9 rstudio.com         117 0.00927
## 10 stackoverflow.com   102 0.00808
## # … with 733 more rows

# See all the eavesdroppable, interceptable, 
# content-mutable-by-evil-MITM-network-operator URLs
# A nice thing to do would be to fix these and issue PRs

filter(db, scheme == "http") %>% 
  select(URL, Parent)
## # A tibble: 2,498 x 2
##    URL                                                 Parent              
##    <chr>                                               <chr>               
##  1 http://www.winfield.demon.nl                        antiword/DESCRIPTION
##  2 http://github.com/ropensci/antiword/issues          antiword/DESCRIPTION
##  3 http://dirk.eddelbuettel.com/code/anytime.html      anytime/DESCRIPTION 
##  4 http://arrayhelpers.r-forge.r-project.org/          arrayhelpers/DESCRI…
##  5 http://arrow.apache.org/blog/2019/01/25/r-spark-im… arrow/doc/arrow.html
##  6 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/accelera…
##  7 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/accelera…
##  8 http://docs.aws.amazon.com/AmazonS3/latest/dev/acl… aws.s3/man/acl.Rd   
##  9 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/bucket_e…
## 10 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/bucketli…
## # … with 2,488 more rows

# find the abusers of "http" URLs

filter(db, scheme == "http") %>% 
  select(URL, Parent) %>% 
  mutate(
    pkg = stri_match_first_regex(Parent, "(^[^/]+)")[,2]
  ) %>% 
  count(pkg, sort=TRUE)
## # A tibble: 265 x 2
##    pkg                        n
##    <chr>                  <int>
##  1 paws.security.identity   258
##  2 paws.management          152
##  3 XML                      129
##  4 paws.analytics            78
##  5 stringi                   70
##  6 paws                      57
##  7 RCurl                     51
##  8 igraph                    49
##  9 base                      47
## 10 aws.s3                    44
## # … with 255 more rows

# send all the apex domains to the clipboard

clipr::write_clip(unique(db$apex))

# go here to paste them into the domain search box
# most domain/URL checker APIs aren't free for more 
# than a cpl dozen URLs/domains

browseURL("https://www.bulkblacklist.com")

# paste what you clipped into the box and wait a while
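If you don’t want to install {urltools} just to get a feel for the checks, the core idea can be sketched in base R on a handful of URLs. This is a rough approximation, not a replacement for the pipeline above: the regular expressions ignore edge cases such as user:password@ prefixes, and the two-entry shorteners vector is a tiny hypothetical stand-in for the downloaded list.

```r
# Toy inputs taken from the output above; `shorteners` is a hypothetical,
# deliberately tiny stand-in for the full shortener list.
urls <- c(
  "https://github.com/ropensci/antiword/issues",
  "http://bit.ly/SnLi6h",
  "https://goo.gl/5KBjL5",
  "http://dirk.eddelbuettel.com/code/anytime.html"
)
shorteners <- c("bit.ly", "goo.gl")

# Scheme: the text before "://".
# Host: the text between "://" and the next "/", ":", "?" or "#".
scheme <- sub("^([A-Za-z][A-Za-z0-9+.-]*)://.*$", "\\1", urls)
host <- sub("^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+).*$", "\\1", urls)

flagged <- data.frame(
  url = urls,
  scheme = tolower(scheme),
  host = tolower(host),
  stringsAsFactors = FALSE
)

# Flag shortened links and links that travel over plain http.
flagged$is_short <- flagged$host %in% shorteners
flagged$is_plain_http <- flagged$scheme == "http"

flagged
```

Unlike the {urltools} version, this only compares the full host against the list and makes no attempt to compute an apex domain.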

Over Christmas break I teased some screencaps:

of some almost-natural “R” looking code (this is a snippet):

Button("Run") {
  do { // calls to R can fail so there are lots of "try"s; poking at less ugly alternatives

    // handling dots in named calls is a WIP
    _  = try R.evalParse("options(tidyverse.quiet = TRUE )")

    // in practice this wld be called once in a model
    try R.library("ggplot2")
    try R.library("hrbrthemes")
    try R.library("magick")

    // can mix initialization of an R list with Swift and R objects
    let mvals: RObject = [
      "month": [ "Jan", "Feb", "Mar", "Apr", "May", "Jun" ],
      "value": try R.sample(100, 6)
    ]

    // ggplot2! `mvals` is above, `col.hexValue` comes from the color picker
    // can't do R.as.data.frame b/c "dots" so this is a deliberately exposed alternate call
    let gg = try R.ggplot(R.as_data_frame(mvals)) +
      R.geom_col(R.aes_string("month", "value"), fill: col.hexValue) + // supports both [un]named
      R.scale_y_comma() +
      R.labs(
        x: rNULL, y: "# things",
        title: "Monthly Bars"
      ) +
      R.theme_ipsum_gs(grid: "Y")

    // an alternative to {magick} could be getting raw SVG from {svglite} device
    // we get Image view width/height and pass that to {magick}
    // either beats disk/ssd round-trip
    let fig = try R.image_graph(
      width: Double(imageRect.width), 
      height: Double(imageRect.height), 
      res: 144
    )

    try R.print(gg)
    _ = R.dev_off() // can't do R.dev.off b/c "dots" so this is a deliberately exposed alternate call

    let res = try R.image_write(fig, path: rNULL, format: "png")

    imgData = Data(res) // "imgData" is a reactive SwiftUI bound object; when it changes Image does too

  } catch {
  }

}

that works in Swift as part of a SwiftUI app that displays a ggplot2 plot inside of a macOS application.

It doesn’t shell out to R, but uses Swift 5’s native abilities to interface with R’s C interface.

I’m not ready to reveal that SwiftR code/library just yet (break’s over and the core bits still need some tweaking) but I can provide some interim resources with an online book about working with R’s C interface from Swift on macOS. It is uninspiringly called SwiftR — Using R from Swift.

There are, at present, six chapters that introduce the Swift+R concepts via command line apps. These aren’t terribly useful (shebanged R scripts work just fine, #tyvm) in and of themselves, but command line machinations are a much lower barrier to entry than starting right in with SwiftUI (that starts in chapter seven).

FIN

If you’ve wanted a reason to burn ~20GB of drive space with an Xcode installation and start to learn Swift (or learn more about Swift) then this is a resource for you.

The topics in the chapters are also a fairly decent (albeit incomplete) overview of R’s C interface and also how to work with C code from Swift in general.

So, take advantage of the remaining pandemic time and give it a 👀.

Feedback is welcome in the comments or the book code repo (book source repo is in progress).

Hope everyone has a safe and strong new year!

While the future of the Apache Drill ecosystem is somewhat in-play (MapR, a major sponsoring org for the project, is kinda dead), I still use it almost daily (on my local home office cluster) to avoid handing over any more money to Amazon than I/we already do. The latest (yet-to-be-released) v1.18.0 has some great improvements, including JSON resultset streaming for the REST API. Alas, tweaking {sergeant} (my REST API R package) to handle that is not on the TODO for the foreseeable future, so I’ve been using {sergeant.caffeinated} (https://github.com/hrbrmstr/sergeant-caffeinated), an RJDBC wrapper for the Drill JDBC interface, for quite a while since it handles large resultsets quite nicely.

I broke out the RJDBC functionality from {sergeant} into this separate package since, despite the fact that it’s 2019/2020, many folks still have/had problems getting {rJava} to work (FWIW it’s a seamless install for me on Windows, Ubuntu, or macOS, even Apple Silicon macOS). The surgery to separate it was fairly hack-ish (one reason it’s not on CRAN) and it finally broke with the recent {dbplyr} 2.x release. I assumed fixing the caffeinated version was easier/quicker than the REST API version, so I dug in and am cautiously tossing it out for wider poking.

An All New Way To Use 💂☕️

Gone are the days of src_drill_jdbc(); in their place comes more standardized {DBI} and {d[b]plyr} access to Apache Drill. To install this version you can do:

remotes::install_github("hrbrmstr/sergeant-caffeinated")

(more install options using safer and saner social coding sites coming soon).

Let’s load up the package(s) and perform some operations.

library(sergeant.caffeinated)

test_host <- Sys.getenv("DRILL_TEST_HOST", "localhost")

be_quiet()

(con <- dbConnect(drv = DrillJDBC(), sprintf("jdbc:drill:zk=%s", test_host)))
## <DrillJDBCConnection>

The DRILL_TEST_HOST environment variable contains the hostname or IP address of my/your Drill server, defaulting to localhost if none is found.
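A quick sketch of how that fallback behaves, runnable without a Drill server. The drill_jdbc_url() helper and the drill.example.internal hostname are hypothetical and exist only for this illustration:

```r
# Hypothetical helper that builds the same connection string used above.
# The default argument is evaluated on every call, so it always reflects
# the current value of the environment variable.
drill_jdbc_url <- function(host = Sys.getenv("DRILL_TEST_HOST", "localhost")) {
  sprintf("jdbc:drill:zk=%s", host)
}

# With the variable unset, the "localhost" fallback is used.
Sys.unsetenv("DRILL_TEST_HOST")
url_default <- drill_jdbc_url()

# With the variable set, its value wins.
Sys.setenv(DRILL_TEST_HOST = "drill.example.internal")
url_custom <- drill_jdbc_url()

# Clean up so the session is left as it was found.
Sys.unsetenv("DRILL_TEST_HOST")

url_default
url_custom
```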

The be_quiet() function stops the Java engine from yelling at you with “illegal reflective access” warnings. If you see this in other rJava-powered packages it means code in some classes in some Java archive files is doing some sketchy old-school things that newer JVMs aren’t happy about. At some point, these warnings become full-on errors which will break many things. Unfortunately, Drill is still fairly tied to Java 8.x and has tons of introspecting code. The warnings are ugly, so if you want to get rid of them, just call this function before doing anything with Drill. (You’ll also notice log4j errors are finally gone!)

Now that we have a Drill JDBC connection, we can do something with it. All the DBI-ish operations work, but it’s 2020 and {d[b]plyr} is the bee’s knees, so we’ll just dive right in with that:

(db <- tbl(con, "cp.`employee.json`"))

## # Source:   table<cp.`employee.json`> [?? x 16]
## # Database: DrillJDBCConnection
##    employee_id full_name first_name last_name position_id position_title store_id
##          <dbl> <chr>     <chr>      <chr>           <dbl> <chr>             <dbl>
##  1           1 Sheri No… Sheri      Nowmer              1 President             0
##  2           2 Derrick … Derrick    Whelply             2 VP Country Ma…        0
##  3           4 Michael … Michael    Spence              2 VP Country Ma…        0
##  4           5 Maya Gut… Maya       Gutierrez           2 VP Country Ma…        0
##  5           6 Roberta … Roberta    Damstra             3 VP Informatio…        0
##  6           7 Rebecca … Rebecca    Kanagaki            4 VP Human Reso…        0
##  7           8 Kim Brun… Kim        Brunner            11 Store Manager         9
##  8           9 Brenda B… Brenda     Blumberg           11 Store Manager        21
##  9          10 Darren S… Darren     Stanz               5 VP Finance            0
## 10          11 Jonathan… Jonathan   Murraiin           11 Store Manager         1
## # … with more rows, and 9 more variables: department_id <dbl>, birth_date <chr>,
## #   hire_date <chr>, salary <dbl>, supervisor_id <dbl>, education_level <chr>,
## #   marital_status <chr>, gender <chr>, management_role <chr>

Basically, that’s it: it “just works”.

FIN

If you’ve been a user of {sergeant.caffeinated} and really need src_drill_jdbc() back, drop an issue on GH or a note in the comments, and be sure to file issues if I’ve missed anything as you kick the tyres.