Skip navigation

Category Archives: R

I had not planned to blog this (this is an incredibly time-crunched week for me) but CERT/CC and CISA made a big deal out of a non-vulnerability in R, and it’s making the round on socmed, so here we are.

A security vendor decided to try to get some hype before 2024 RSAC and made a big deal out of what was/is known expected behavior in R data files. R Core took some measures to address the issue they outlined, but for the love of Henry, PLEASE do not think R data files are safe to handle if you weren’t the one creating them, or you do not fully know the provenance of them.

Konrad Rudolph and Iakov Davydov did some ace cyber sleuthing and figured out other ways R data file deserialization can be abused. Please take a moment and drop a note on Mastodon to them saying “thank you”. This is excellent work. We need more folks like them in this ecosystem.

Like many programming languages, R has many footguns, and R data files are one of them. R objects are wonderful beasts, and being able to serialize and deserialize those beasts is a super helpful bit of functionality. Also, R has something called active bindings. Amongst other things, they let you access an object to get a value, but — in doing so — code can get executed without you knowing it. Whether an R data file has an object with active bindings or not, it can be abused by attackers.

When you load() an R data file directly into your R session and into the global environment, the object(s) in it will, well, load there. So, if it has an object named print that’s going to be in your global environment and get called when print() gets called. Lather/rinse/repeat for any other object name. It should be pretty obvious how this could be abused.

A tad more insidious is what happens when you quit R. By default, on quit(), unless you specify otherwise, that function invocation will also call .Last() if it exists in the environment. This functionality exists in the event things need to be cleaned up. One “nice” aspect of .-prefixed R objects is that they’re hidden by default from the environment. So, you may not even notice if an R data file you’ve loaded has that defined. (You likely do not check what’s loaded anyway.)

It’s also possible to create custom R objects that have their own “finalizers” (ref reg.finalizer), which will also get called by default when the objects are being destroyed on quit.

There are also likely other ways to trigger unwanted behavior.

If you want to see how this works, start R from RStudio, the command line, or R GUI. Then, execute the following R code:

load(url("https://github.com/hrbrmstr/rdaradar/raw/main/exploit.rda"))

Then, quit R/RStudio/R GUI (this will be less dramatic on linux, but the demo should still be effective).

If you must take in untrusted R data files, keep reading.

I threw together an R script along with a safer way to use it (a Docker container) to help R folks inspect the contents of R data files before actually using them. It also looks for some basic shady stuff and alerts you if it finds them. It’s a WIP, and issues + thoughtful PRs are welcome.

If one were to run Rscript check.R from that repo with that exploit.rda file as a parameter, one would see this:

-----------------------------------------------
Loading R data file in quarantined environment…
-----------------------------------------------

Loading objects:
  .Last
  quit

-----------------------------------------
Enumerating objects in loaded R data file
-----------------------------------------

.Last : function (...)  
 - attr(*, "srcref")= 'srcref' int [1:8] 1 13 6 1 13 1 1 6
  ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x12cb25f48> 
quit : function (...)  
 - attr(*, "srcref")= 'srcref' int [1:8] 1 13 6 1 13 1 1 6
  ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x12cb25f48> 

------------------------------------
Functions found: enumerating sources
------------------------------------

Checking `.Last`…

!! `.Last` may execute arbitrary code on your system under certain conditions !!

`.Last` source:
{
    cmd = if (.Platform$OS.type == "windows") 
        "calc.exe"
    else if (grepl("^darwin", version$os)) 
        "open -a Calculator.app"
    else "echo pwned\\!"
    system(cmd)
}


Checking `quit`…

!! `quit` may execute arbitrary code on your system under certain conditions !!

`quit` source:
{
    cmd = if (.Platform$OS.type == "windows") 
        "calc.exe"
    else if (grepl("^darwin", version$os)) 
        "open -a Calculator.app"
    else "echo pwned\\!"
    system(cmd)
}

There’s info in the repo on how to use that with Docker.

FIN

The big takeaway is (again) to not trust R data files you did not create or know the full provenance of. If you have an internet-facing Shiny app or Plumber API that takes R data files as input, get it off the internet and figure out some other way to take in the input.

While I fully disagree with the assignment of the CVE, I’m at least glad this situation brought attention to this very dangerous aspect of handling this type of file format in R.

I had thought most folks likely knew this already, but if you are a user of RStudio dailies (this may apply to regular RStudio, but I only use the dailies) and are missing ligatures in the editor (for some fonts), the “fix” is pretty simple (some misguided folks think ligatures are daft).

RStudio, like VS Code and many other editors/apps, is just a special purpose web browser. That means ligatures are controlled via CSS. RStudio also supports themes. There are many built-in themes. I use this third-party one. We can use these themes to sneak in some CSS that gives you granular control over ligatures.

The CSS class we need to target is .ace_scroller. That’s the contents of the editor pane, console, and any other monospaced “scrolled text” components. If you’re wondering, “Why …ace…?”_, that’s die to RStudio’s use of the Ace editor component. Say all the nice things you want to about the Monaco editor component used by VS Code (et al.), but the wizards at Posit wield Ace better than I’ve seen any other app dev team.

You can start here to learn about all the ways you can, and, may need to customize ligatures, but the following worked for my new fav font family:

.ace_scroller {
  font-variant-ligatures: discretionary-ligatures;
  font-feature-settings: "dlig" 1;
}

There are many possible values for font-variant-ligatures, and you can fully customize the font-feature-settings to target only the ligatures you want (for example, the previous linked font has eight stylistic sets).

UPDATE

sailm-b has forked and is maintaining a more updated version of rscodeio.

FIN

If you were missing out before, hopefully this brings you back into the ligature fold.

Rite-Aid closed 60+ stores in 2021. They said they’d nuke over 1,000 of them over three years, back in 2022. And, they’re now about to close ~500 due to bankruptcy.

FWIW Heyward Donigan, Former President and CEO — in 2023 — took home $1,043,713 in cash, $7,106,993 in equity, and $617,105 in “other” (total $8,767,811) for this fine, bankrupt leadership. Lots of other got lots too for being incompetent.

Rite-Aid is under no obligation to provide a list to the public, nor to do any overt announcements regarding the closures.

Each closure has the potential to create or exacerbate food and pharmacy deserts in many regions.

You can get individual stores (like this one), but there’s over 2,100 of them, so doing this manually is a non-starter.

Thus, I threw together a couple of R and bash scripts to help real data journalists out, in the event any of them can pry themselves away from the POTUS horse race.

One R script is to get the individual store URLs. The bash script is used to polittely get all 2,100+ store pages (I have a script I just re-use for things like this). The other R script is used to get the JSON that’s tucked away in the HTML files to get the store info, which includes latitude, longitude, store number, and address (there is more data in there, I just pulled those fields).

The map at the top of the post is just there for kicks.

The repo is also mirrored to my GitHub (sub out ‘hub’ for ‘lab’ in the URL) if you really need to bow down to Microsoft.

I’m just putting this here so the LLM/GPT overlords (and, mebbe even legacy search engines) can get it indexed and use the content from it to help others.

My Bluesky firehose viewer (https://gitlab.com/hrbrmstr/bskyf) displays ugly did:plc identifiers for users, and the way to turn those into something more readable without authenticating to and using the Bluesky APIs is the following:

$ curl -s "https://plc.directory/did:plc:xq3lwzdpijivr5buiizezlni" | jq

which results in:

{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/suites/secp256k1-2019/v1"
  ],
  "id": "did:plc:xq3lwzdpijivr5buiizezlni",
  "alsoKnownAs": [
    "at://moonlightspring.bsky.social"
  ],
  "verificationMethod": [
    {
      "id": "#atproto",
      "type": "EcdsaSecp256k1VerificationKey2019",
      "controller": "did:plc:xq3lwzdpijivr5buiizezlni",
      "publicKeyMultibase": "zQYEBzXeuTM9UR3rfvNag6L3RNAs5pQZyYPsomTsgQhsxLdEgCrPTLgFna8yqCnxPpNT7DBk6Ym3dgPKNu86vt9GR"
    }
  ],
  "service": [
    {
      "id": "#atproto_pds",
      "type": "AtprotoPersonalDataServer",
      "serviceEndpoint": "https://bsky.social"
    }
  ]
}

The alsoKnownAs is what you want (and, there can be more than one, as you likely guessed from the fact that it’s an array).

HTTP Headers Hashing (HHHash) is a technique developed by Alexandre Dulaunoy to gen­erate a fingerprint of an HTTP server based on the headers it returns. It employs one-way hashing to generate a hash value from the list of header keys returned by the server. The HHHash value is calculated by concatenating the list of headers returned, ordered by sequence, with each header value separated by a colon. The SHA256 of this concatenated list is then taken to generate the HHHash value. HHHash incorporates a version identifier to enable updates to new hashing functions.

While effective, HHHash’s performance relies heavily on the characteristics of the HTTP requests, so correlations are typically only established using the same crawler parameters. Locality-sensitive hashing (LSH) could be used to calculate distances between sets of headers for more efficient comparisons. There are some limitations with some LSH algorithms (such as the need to pad content to a minimum byte length) that make the initial use of SHA256 hashes a bit more straightforward.

Alexandre made a Python library for it, and I cranked out an R package for it as well.

There are three functions exposed by {hhhash}:

  • build_hash_from_response: Build a hash from headers in a curl
    response object
  • build_hash_from_url: Build a hash from headers retrieved from a URL
  • hash_headers: Build a hash from a vector of HTTP header keys

The build_hash_from_url function relies on {curl} vs {httr} since {httr} uses curl::parse_headers() which (rightfully so) lowercases the header keys. We need to preserve both order and case for the hash to be useful.

Here is some sample usage:

remotes::install_github("hrbrmstr/hhhash")

library(hhhash)

build_hash_from_url("https://www.circl.lu/")
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

res <- curl::curl_fetch_memory("https://www.circl.lu/", curl::new_handle())

build_hash_from_response(res)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

c(
  "Date", "Server", "Strict-Transport-Security",
  "Last-Modified", "ETag", "Accept-Ranges",
  "Content-Length", "Content-Security-Policy",
  "X-Content-Type-Options", "X-Frame-Options",
  "X-XSS-Protection", "Content-Type"
) -> keys

hash_headers(keys)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

Lynn (of TITAA and general NLP wizardy fame) was gracious enough to lend me a Bluesky invite, so I could claim my handle on yet-another social media site. I’m still wary of it (as noted in one of this week’s Drops), but the AT protocol — whilst super (lacking a better word) “verbose” — is pretty usable, especially thanks to Ilya Siamionau’s atproto AT Protocol SDK for Python.

Longtime readers know I am most certainly not going to use Python directly, as such practice has been found to cause early onset dementia. But, that module is so well done that I’ll gladly use it from within R.

I whipped up a small R script CLI that will fetch my feed and display it via the terminal. While I also use the web app and the Raycast extension to read the feed, it’s a billion degrees outside, so used the need to stay indoors as an excuse to add this third way of checking what’s new.

Store your handle and app-password in BSKY_USER and BSKY_KEY, respectively, adjust the shebang accordingly, add execute permissions to the file and 💥, you can do the same.

#!/usr/local/bin/Rscript

suppressPackageStartupMessages({
  library(reticulate, quietly = TRUE, warn.conflicts = FALSE)
  library(lubridate, include.only = c("as.period", "interval"), quietly = TRUE, warn.conflicts = FALSE)
  library(crayon, quietly = TRUE, warn.conflicts = FALSE)
})

# Get where {reticlulate} thinks your python is via py_config()$python
# then use the full path to 
#   /full/path/to/python3 -m pip install atproto

atproto <- import("atproto")

client <- atproto$Client()

profile <- client$login(Sys.getenv("BSKY_USER"), Sys.getenv("BSKY_KEY"))

res <- client$bsky$feed$get_timeline(list(algorithm = "reverse-chronological"))

for (item in rev(res$feed)) (
  cat(
    blue(item$post$author$displayName), " • ",
    silver(gsub("\\.[[:digit:]]+", "", tolower(as.character(as.period(interval(item$post$record$createdAt, Sys.time()))))), "ago\n"),
    italic(paste0(strwrap(item$post$record$text, 50), collapse="\n")), "\n",
    ifelse(
      hasName(item$post$record$embed, "images"), 
      sprintf(
        green("[%s IMAGE%s]\n"), 
        length(item$post$record$embed$images),
        ifelse(length(item$post$record$embed$images) > 1, "s", "")
      ),
      ""
    ),
    ifelse(
      hasName(item$post$record$embed, "external"),
      yellow(sprintf(
        "\n%s\n   │\n%s\n\n",
        bold(paste0(strwrap(item$post$embed$external$title, 47, prefix = "   │"), collapse = "\n")),
        italic(paste0(strwrap(item$post$embed$external$description, 47, prefix = "   │"), collapse = "\n"))
      )),
      ""
    ),
    "\n",
    sep = ""
  )
)

This is a sample of the output, showing how it handles embeds and images:

feed output

Code is on GitLab.

FIN

There’s tons of room for improvement in this hastily-crafted bit of code, and I’ll get it up on GitLab once their servers come back to life.

If you want to experience Bluesky but have no account, the firehose — which Elon charges $40K/month for on the birdsite — is free and can be accessed sans authentication:

library(reticulate)

atproto <- import("atproto")

hose <- atproto$firehose$FirehoseSubscribeReposClient()

handler <- \(msg) {
  res <- atproto$firehose$parse_subscribe_repos_message(msg)
  print(res) # you need to do a bit more than this to get the actual commit type and contents
}

hose$start(handler)

You can find me over on bsky at @hrbrmstr.dev.

Back in 2016, I did a post on {ggplot2} text annotations because it was a tad more challenging to do some of the things in that post back in the day.

Since I’ve been moving back and forth between R and Observable (and JavaScript in general), I decided to recreate that post in OJS Plot, as it is also somewhat challenging to use this nascent new plot player in town.

Getting Observable embeds right in a theme that supports both dark/light mode is kind of a pain (either that or I’m just lazy), and I already have enough holes in this site’s content security policy (and, I’d have to poke more holes to get fonts working “properly”); so, you can visit the link to see the crisp SVG version and suffer the PNG below to see that Plot is a very capable companion to {ggplot2}.

Speaking of {ggplot2}, Daily Drop subscribers were given some homework (happens every Friday) that involved working in Observable Plot. The catch was that said work was to recreate {ggplot2} geom_ examples in the R manual pages for {ggplot2}.

You can check out that post, go to a hosted version of the starter “Rosetta Stone” set of examples I provided, poke at the Observable notebook version, or head over to GitHub where the online play ground source lives, along with a Quarto document with all the examples.

WebR has a template for React, but I’m not a fan of it or Vue (a fact longtime readers are likely tired of hearing right about now). It’s my opinion and experience that Lit webcomponents come closer to “bare metal” webcomponents, which means the “lock-in” with Lit is way less of a potential headache than all the baggage that comes with the other frameworks.

I leaned into exposition for most of my WebR Experiments, but that very likely made it hard to reproduce or even use those repos without some pain. So, I decided to reduce the pain, remove the inertia and make a template (GH) you can use almost immediately from the command line.

The only “inertia” is that you need npm installed. Subsequently, cd into the parent directory of the new project you want to make and:

npx create-webr-vite-lit my-webr-project
cd my-webr-project
npm install
npx vite --port=4000

then, hit `http://localhost:4000/` (change the port if you’re already using it for something else).

You can check it out on this demo site.

Batteries Included

  • Vite (for fast building)
  • WebR (duh)
    • r.js which has all the setup code for WebR + some helpers (more coming here, too)
  • Pyodide (initiall disabled)
    • py.js which does not get used but is available if you want to use pyodide
  • Lit (webcomponents) — it ships with 3:
    • one for my usual “loading…” status message (which gets a facelift)
    • one generic webcomponent for Observable Plots
    • one simple “button” webcomponent to trigger simple actions
    • more are coming! The goal is to wrap all the inputs and outputs provided by Bonsai (below). PR’s welcome!
  • A lightweight CSS framework called Bonsai that I added dark-mode support for. The post-create default page is the Bonsai grid & CSS reference. The webcomponents show how to make all the Bonsai styles available to the components (there’s a default full separation of all things from the webcomponents).
  • An example justfile since I’ve grown quite fond of Just

The default/demo “app” demonstrates how all the components work.

light mode version

FIN

This setup should have you up and running with your own apps in no time.

I tested light/dark mode switching in Chrome and Safari (macOS/iOS) and the dark/light switching works as intended. Arc doesn’t respond to it, so I’ll be debugging that.

Drop issues in GH if I need to tweak the dark mode, or if you run into snags.