Much of what I need to do for work-work involves using tools that are (for the moment) not in R. Today, I needed to test the validity of (and do other processing on) DMARC records, and I’m loath to either reinvent the wheel or reticulate bits from a fragmented programming language ecosystem unless absolutely necessary. Thankfully, there’s libopendmarc, which works well on sane operating systems, but it is a C library that needs an interface before it can be used from R.
However, I also really didn’t want to start a new package for this just yet (there will eventually be one, though, and I prefer working in a package context for Rcpp work). I just needed to run opendmarc_policy_store_dmarc() against a decent-sized chunk of domain names and already-retrieved DMARC TXT records. So, I decided to write a small “inline” cppFunction() to get’er done.
Why am I blogging about this?
Despite growing popularity and a nice examples site, many newcomers to Rcpp (literally the way you want to go when it comes to bridging C[++] and R) still voice discontent about there not being enough “easy” examples. Granted, they are quite likely looking for full-bore tutorials covering different, explicit use cases. The aforementioned Gallery has some of those, and there are codified examples in (literally) rcppexamples. But there definitely need to be more blog posts, books and such linking to them and expanding upon them.
Since I mentioned that I’m using cppFunction(), one could further ask: “cppFunction() has a help page with an example, so why blather about using it?” Fair point! There is a reason, and it was hinted at in the opening paragraph.
I need to use libopendmarc, and that requires making a “plugin” if I’m going to do this “inline”. For some other DMARC processing I also need libresolv, since libopendmarc needs to make DNS requests and uses resolv for them. You don’t need a plugin for a package version: there, you just boilerplate some “find these libraries and get their paths right for Makevars.in” logic and add the linking flags there as well. Here, we need to register two plugins that provide metadata for the magic that happens under the covers when Rcpp takes your inline code, compiles it and makes the resulting shared object’s function available in R.
Plugins can be complex and perform transformations, but the two I needed to write just ensure that the right #include lines are present along with the right linker libraries. Here they are:
library(Rcpp)

# libresolv: no extra header needed, just link against -lresolv
registerPlugin(
  name = "libresolv",
  plugin = function(x) {
    list(
      includes = "",
      env = list(PKG_LIBS = "-lresolv")
    )
  }
)

# libopendmarc: add the library header and link against -lopendmarc
registerPlugin(
  name = "libopendmarc",
  plugin = function(x) {
    list(
      includes = "#include <opendmarc/dmarc.h>",
      env = list(PKG_LIBS = "-lopendmarc")
    )
  }
)
All they do is make data structures available in the environment. We can use inline::getPlugin() to see them:
inline::getPlugin("libresolv")
## $includes
## [1] ""
##
## $env
## $env$PKG_LIBS
## [1] "-lresolv"
inline::getPlugin("libopendmarc")
## $includes
## [1] "#include <opendmarc/dmarc.h>"
##
## $env
## $env$PKG_LIBS
## [1] "-lopendmarc"
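If you want to check the plugin plumbing on a machine that doesn’t have libopendmarc yet, the same pattern should work with a library every R build already links against. This is an illustrative sketch only: the plugin name libm_demo and the function hyp_demo() are made up for this example, and since R already links the C math library, the -lm flag is redundant and is there purely to exercise the PKG_LIBS path.

```r
library(Rcpp)

# Hypothetical plugin, registered the same way as the two above:
# one extra #include plus one extra linker flag
registerPlugin(
  name = "libm_demo",
  plugin = function(x) {
    list(
      includes = "#include <cmath>",
      env = list(PKG_LIBS = "-lm")
    )
  }
)

# cppFunction() pastes the plugin includes above this code before compiling
cppFunction('
double hyp_demo(double a, double b) {
  return std::sqrt(a * a + b * b);
}
', plugins = "libm_demo")

hyp_demo(3, 4)
## [1] 5
```

If that compiles and returns 5, the registration and build-environment side of things is working, and any remaining trouble with the real plugins is down to header or library paths.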
Finally, here is the tiny bit of C/C++ code that takes in the necessary parameters and returns the result. In this case, we’re passing in two character vectors (domain names and their DMARC records) and getting back a logical vector with the test results. Apart from the initialization and cleanup code libopendmarc requires, this is an idiom you’ll recognize if you look over packages that use Rcpp.
cppFunction('
std::vector< bool > is_dmarc_valid(std::vector< std::string > domains,
                                   std::vector< std::string > dmarc_records) {

  // indexing domains[i] would run off the end if the lengths differ
  if (domains.size() != dmarc_records.size()) {
    Rcpp::stop("domains and dmarc_records must be the same length");
  }

  std::vector< bool > out(dmarc_records.size());
  OPENDMARC_STATUS_T status;

  // one policy context, reused (and reset) for every record;
  // dummy connecting IP since only record parsing is exercised here
  DMARC_POLICY_T *pctx = opendmarc_policy_connect_init((u_char *)"1.2.3.4", 0);
  if (pctx == NULL) Rcpp::stop("could not initialize the libopendmarc policy context");

  for (size_t i = 0; i < dmarc_records.size(); i++) {
    // the library takes non-const u_char pointers, hence the casts
    status = opendmarc_policy_store_dmarc(
      pctx,
      (u_char *)dmarc_records[i].c_str(),
      (u_char *)domains[i].c_str(),
      NULL  // no organizational domain
    );
    out[i] = (status == DMARC_PARSE_OKAY);
    pctx = opendmarc_policy_connect_rset(pctx);  // clear state before the next record
  }

  pctx = opendmarc_policy_connect_shutdown(pctx);  // free the context
  return out;
}
', plugins = c("libresolv", "libopendmarc"))
Right at the end, the final parameter tells cppFunction() which plugins to use.
Executing that line shunts a modified version of the function to disk, compiles it and lets us use the function in R (use the cacheDir parameter to control where the build artifacts go, and the showOutput and verbose parameters to control how many of the gory details underneath this pristine shell you get to see).
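A small sketch of those knobs on a throwaway function follows; add_one() and the cache path are hypothetical, chosen only for illustration. Per the Rcpp documentation, cacheDir is where the generated source and shared library are cached, verbose = TRUE prints the generated code and build details, showOutput = TRUE echoes the R CMD SHLIB output, and rebuild = TRUE forces a fresh compile instead of reusing a cached build.

```r
library(Rcpp)

# Hypothetical cache location (by default Rcpp uses a temporary directory)
cache_dir <- file.path(tempdir(), "rcpp-cache")
dir.create(cache_dir, showWarnings = FALSE)

cppFunction('
int add_one(int x) {
  return x + 1;
}
',
  cacheDir   = cache_dir,  # where generated source and shared objects are cached
  rebuild    = TRUE,       # ignore any cached build and recompile
  showOutput = TRUE,       # echo the R CMD SHLIB compiler output
  verbose    = TRUE        # print the generated code and build details
)

add_one(41L)
## [1] 42
```

Pointing cacheDir at a stable directory is mainly useful when you want to inspect the generated wrapper code or keep builds around between sessions; verbose and showOutput are the first things to flip on when a plugin’s flags don’t seem to be taking effect.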
Once the cppFunction() call that defines it completes, is_dmarc_valid() is available in the environment and ready to use.
domains <- c("bit.ly", "bizible.com", "blackmountainsystems.com", "blackspoke.com")
dmarc <- c("v=DMARC1; p=none; pct=100; rua=mailto:dmarc@bit.ly; ruf=mailto:ruf@dmarc.bitly.net; fo=1;",
"v=DMARC1; p=reject; fo=1; rua=mailto:postmaster@bizible.com; ruf=mailto:forensics@bizible.com;",
"v=DMARC1; p=quarantine; pct=100; rua=mailto:demarcrecords@blkmtn.com, mailto:ttran@blkmtn.com",
"user.cechire.com.")
is_dmarc_valid(domains, dmarc)
## [1] TRUE TRUE TRUE FALSE
Processing those four took just about 10 microseconds (roughly 2.5 µs per record), which meant I could process the ~1,000,000 domains+DMARCs in no time at all (at that rate, a million records works out to only a few seconds). And I now have something I can use in a DMARC utility package (coming “soon”).
Hopefully this is a useful reference both for hooking up external libraries to “inline” Rcpp functions and for how to go about this type of thing in general.