hrbrmstr, Author at rud.is

Author Archives: hrbrmstr

Don't look at me…I do what he does — just slower. #rstats avuncular • ?Resistance Fighter • Cook • Christian • [Master] Chef des Données de Sécurité @ @rapid7

Keeping Track Of URLs Shared On Bluesky

While the future of Bluesky is nowhere near certain, it is most certainly growing. It’s also the largest community of users for the AT Protocol.

Folks are using Bluesky much the same way as any online forum/chat. One of those ways is to share URLs to content.

For the moment, it is possible to eavesdrop on the Bluesky “firehose” sans authentication. I’ve been curious as to what folks are sharing on the platform and decided to do more than poke at it casually in my hacky terminal firehose viewer.

This GitLab project contains all the code necessary to log URLs seen in the firehose to a local SQLite database. As Bluesky grows, this will definitely not scale, but it’s fine for right now, and scaling just means moving the websocket capture client to a more capable environment than my home server and setting up something like a Kafka stream. Might as well move to Postgres while we’re at it.

But, for now, this lightweight script/database is fine.

NOTE: I’m deliberately not tracking any other data, but the code is easy to modify to log whatever you want from the firehose post.

I’m syncing the data to this server every ~30 minutes or so and have created an Observable notebook which keeps track of the most popular domains.
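As a rough idea of what the domain tally boils down to, here is a minimal base R sketch. It is not the code from the GitLab project or the Observable notebook; the `posts` vector, `extract_urls()`, and `url_domain()` are hypothetical stand-ins, and the URL regex is deliberately naive.

```r
# Hypothetical sample of post text; real input would come from the firehose.
posts <- c(
  "Neat read: https://example.com/a/b?x=1",
  "Two links https://example.org/one and http://example.com/two",
  "No links here"
)

# Pull every http(s) URL out of a character vector (naive pattern:
# everything up to the next whitespace counts as part of the URL).
extract_urls <- function(text) {
  m <- gregexpr("https?://[^[:space:]]+", text)
  unlist(regmatches(text, m))
}

# Reduce each URL to its lowercased host name.
url_domain <- function(urls) {
  tolower(sub("^https?://([^/:?#]+).*$", "\\1", urls))
}

urls <- extract_urls(posts)

# Most-shared domains first.
domain_counts <- sort(table(url_domain(urls)), decreasing = TRUE)
print(domain_counts)
```

A real version would likely want to strip trailing punctuation and expand link shorteners before counting.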

I don’t know what card.syui.ai is (Perplexity had some ideas), but it appears to be some AI-driven “card” game that has AT protocol and ActivityPub integration. Due to the programmatic nature of the posts with URLs containing that domain, I suspect it’ll be in the lead for quite some time.

There are some neat sites in the long tail of the distribution.

I think I’ll set up one to monitor posts with CVEs soon, too.

New R Package For HTTP Headers Hashing

HTTP Headers Hashing (HHHash) is a technique developed by Alexandre Dulaunoy to fingerprint an HTTP server based on the headers it returns. The HHHash value is calculated by concatenating the header keys (not their values), in the order the server returned them, with each key separated by a colon. The SHA256 of this concatenated string is then taken to generate the HHHash value. HHHash also incorporates a version identifier so the scheme can move to new hashing functions later.
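The description above can be sketched in a few lines of R. This is an illustration of the idea, not the {hhhash} source: `sketch_hhhash()` is a hypothetical name, the use of {digest} is my assumption, and the `hhh:<version>:` prefix is inferred from the sample output shown further down in this post.

```r
library(digest)

# Hypothetical helper: join header keys with ":" in the order received,
# SHA256 the resulting string, and prepend a versioned prefix.
sketch_hhhash <- function(header_keys, version = 1L) {
  joined <- paste(header_keys, collapse = ":")
  # serialize = FALSE hashes the string itself, not R's serialized object
  sha <- digest(joined, algo = "sha256", serialize = FALSE)
  sprintf("hhh:%d:%s", version, sha)
}

keys <- c("Date", "Server", "Content-Type")
h <- sketch_hhhash(keys)
print(h)
```

Because the input is an ordered, case-sensitive string, reordering the keys or changing their case yields a different hash.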

While effective, HHHash’s performance relies heavily on the characteristics of the HTTP requests, so correlations are typically only established using the same crawler parameters. Locality-sensitive hashing (LSH) could be used to calculate distances between sets of headers for more efficient comparisons. There are some limitations with some LSH algorithms (such as the need to pad content to a minimum byte length) that make the initial use of SHA256 hashes a bit more straightforward.

Alexandre made a Python library for it, and I cranked out an R package for it as well.

There are three functions exposed by {hhhash}:

build_hash_from_response: Build a hash from headers in a curl response object
build_hash_from_url: Build a hash from headers retrieved from a URL
hash_headers: Build a hash from a vector of HTTP header keys

The build_hash_from_url function relies on {curl} rather than {httr}, since {httr} uses curl::parse_headers() which (rightfully so) lowercases the header keys. We need to preserve both order and case for the hash to be useful.
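To make the "order and case" requirement concrete, here is a small base R sketch that pulls keys out of raw header lines without touching either. `extract_header_keys()` and the sample lines are hypothetical; this is not how {hhhash} does it internally.

```r
# Hypothetical raw header lines, as they might arrive off the wire.
raw_lines <- c(
  "HTTP/1.1 200 OK",
  "Date: Mon, 01 Jan 2024 00:00:00 GMT",
  "Server: Apache",
  "X-Frame-Options: DENY",
  ""
)

# Keep only "Key: value" lines (this drops the status line and blanks),
# then take everything before the first colon, preserving order and case.
extract_header_keys <- function(header_lines) {
  header_lines <- header_lines[grepl(":", header_lines, fixed = TRUE)]
  trimws(sub(":.*$", "", header_lines))
}

header_keys <- extract_header_keys(raw_lines)
print(header_keys)
```

The resulting vector is the kind of input hash_headers() expects.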

Here is some sample usage:

remotes::install_github("hrbrmstr/hhhash")

library(hhhash)

build_hash_from_url("https://www.circl.lu/")
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

res <- curl::curl_fetch_memory("https://www.circl.lu/", curl::new_handle())

build_hash_from_response(res)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

c(
  "Date", "Server", "Strict-Transport-Security",
  "Last-Modified", "ETag", "Accept-Ranges",
  "Content-Length", "Content-Security-Policy",
  "X-Content-Type-Options", "X-Frame-Options",
  "X-XSS-Protection", "Content-Type"
) -> keys

hash_headers(keys)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

Poor Dude’s Janky Bluesky Feed Reader CLI Via R & Python

Lynn (of TITAA and general NLP wizardry fame) was gracious enough to lend me a Bluesky invite, so I could claim my handle on yet-another social media site. I’m still wary of it (as noted in one of this week’s Drops), but the AT protocol, whilst super (lacking a better word) “verbose”, is pretty usable, especially thanks to Ilya Siamionau’s atproto AT Protocol SDK for Python.

Longtime readers know I am most certainly not going to use Python directly, as such practice has been found to cause early onset dementia. But, that module is so well done that I’ll gladly use it from within R.

I whipped up a small R script CLI that will fetch my feed and display it via the terminal. While I also use the web app and the Raycast extension to read the feed, it’s a billion degrees outside, so I used the need to stay indoors as an excuse to add this third way of checking what’s new.

Store your handle and app-password in BSKY_USER and BSKY_KEY, respectively, adjust the shebang accordingly, add execute permissions to the file and 💥, you can do the same.

#!/usr/local/bin/Rscript

suppressPackageStartupMessages({
  library(reticulate, quietly = TRUE, warn.conflicts = FALSE)
  library(lubridate, include.only = c("as.period", "interval"), quietly = TRUE, warn.conflicts = FALSE)
  library(crayon, quietly = TRUE, warn.conflicts = FALSE)
})

# Get where {reticulate} thinks your python is via py_config()$python
# then use that full path to install the atproto module:
#   /full/path/to/python3 -m pip install atproto

atproto <- import("atproto")

client <- atproto$Client()

profile <- client$login(Sys.getenv("BSKY_USER"), Sys.getenv("BSKY_KEY"))

res <- client$bsky$feed$get_timeline(list(algorithm = "reverse-chronological"))

for (item in rev(res$feed)) (
  cat(
    blue(item$post$author$displayName), " • ",
    silver(gsub("\\.[[:digit:]]+", "", tolower(as.character(as.period(interval(item$post$record$createdAt, Sys.time()))))), "ago\n"),
    italic(paste0(strwrap(item$post$record$text, 50), collapse="\n")), "\n",
    ifelse(
      hasName(item$post$record$embed, "images"), 
      sprintf(
        green("[%s IMAGE%s]\n"), 
        length(item$post$record$embed$images),
        ifelse(length(item$post$record$embed$images) > 1, "s", "")
      ),
      ""
    ),
    ifelse(
      hasName(item$post$record$embed, "external"),
      yellow(sprintf(
        "\n%s\n   │\n%s\n\n",
        bold(paste0(strwrap(item$post$embed$external$title, 47, prefix = "   │"), collapse = "\n")),
        italic(paste0(strwrap(item$post$embed$external$description, 47, prefix = "   │"), collapse = "\n"))
      )),
      ""
    ),
    "\n",
    sep = ""
  )
)
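The script above quietly passes empty strings to the login call if BSKY_USER or BSKY_KEY is unset. A small guard like the following (a sketch; `require_env()` is a hypothetical helper, not part of the original script) fails early with a readable message instead:

```r
# Hypothetical helper: stop with a clear message when any required
# environment variable is missing or empty; otherwise return the values.
require_env <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  unset_vars <- vars[!nzchar(values)]
  if (length(unset_vars) > 0) {
    stop(
      "Missing environment variable(s): ",
      paste(unset_vars, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(values)
}

# Usage, before logging in:
#   creds <- require_env(c("BSKY_USER", "BSKY_KEY"))
```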

This is a sample of the output, showing how it handles embeds and images:

Code is on GitLab.

FIN

There’s tons of room for improvement in this hastily-crafted bit of code, and I’ll get it up on GitLab once their servers come back to life.

If you want to experience Bluesky but have no account, the firehose — which Elon charges $40K/month for on the birdsite — is free and can be accessed sans authentication:

library(reticulate)

atproto <- import("atproto")

hose <- atproto$firehose$FirehoseSubscribeReposClient()

handler <- \(msg) {
  res <- atproto$firehose$parse_subscribe_repos_message(msg)
  print(res) # you need to do a bit more than this to get the actual commit type and contents
}

hose$start(handler)

You can find me over on bsky at @hrbrmstr.dev.

Supreme Annotations Plot Redux & An OJS Plot↔ggplot2 Rosetta Stone

Back in 2016, I did a post on {ggplot2} text annotations because it was a tad more challenging to do some of the things in that post back in the day.

Since I’ve been moving back and forth between R and Observable (and JavaScript in general), I decided to recreate that post in OJS Plot, as it is also somewhat challenging to use this nascent plot player in town.

Getting Observable embeds right in a theme that supports both dark/light mode is kind of a pain (either that or I’m just lazy), and I already have enough holes in this site’s content security policy (and, I’d have to poke more holes to get fonts working “properly”); so, you can visit the link to see the crisp SVG version and suffer the PNG below to see that Plot is a very capable companion to {ggplot2}.

Speaking of {ggplot2}, Daily Drop subscribers were given some homework (happens every Friday) that involved working in Observable Plot. The catch was that said work was to recreate {ggplot2} geom_ examples in the R manual pages for {ggplot2}.

You can check out that post, go to a hosted version of the starter “Rosetta Stone” set of examples I provided, poke at the Observable notebook version, or head over to GitHub where the online playground source lives, along with a Quarto document with all the examples.

Start Creating Vanilla JS WebR Apps With Less Inertia

WebR has a template for React, but I’m not a fan of it or Vue (a fact longtime readers are likely tired of hearing right about now). It’s my opinion and experience that Lit webcomponents come closer to “bare metal” webcomponents, which means the “lock-in” with Lit is way less of a potential headache than all the baggage that comes with the other frameworks.

I leaned into exposition for most of my WebR Experiments, but that very likely made it hard to reproduce or even use those repos without some pain. So, I decided to reduce the pain, remove the inertia and make a template (GH) you can use almost immediately from the command line.

The only “inertia” is that you need npm installed. Once it is, cd into the parent directory of the new project you want to make and run:

npx create-webr-vite-lit my-webr-project
cd my-webr-project
npm install
npx vite --port=4000

then, hit `http://localhost:4000/` (change the port if you’re already using it for something else).

You can check it out on this demo site.

Batteries Included

Vite (for fast building)
WebR (duh)
- r.js which has all the setup code for WebR + some helpers (more coming here, too)
Pyodide (initially disabled)
- py.js which does not get used but is available if you want to use pyodide
Lit (webcomponents) — it ships with 3:
- one for my usual “loading…” status message (which gets a facelift)
- one generic webcomponent for Observable Plots
- one simple “button” webcomponent to trigger simple actions
- more are coming! The goal is to wrap all the inputs and outputs provided by Bonsai (below). PRs welcome!
A lightweight CSS framework called Bonsai that I added dark-mode support for. The post-create default page is the Bonsai grid & CSS reference. The webcomponents show how to make all the Bonsai styles available to the components (there’s a default full separation of all things from the webcomponents).
An example justfile since I’ve grown quite fond of Just

The default/demo “app” demonstrates how all the components work.

FIN

This setup should have you up and running with your own apps in no time.

I tested light/dark mode switching in Chrome and Safari (macOS/iOS) and the dark/light switching works as intended. Arc doesn’t respond to it, so I’ll be debugging that.

Drop issues in GH if I need to tweak the dark mode, or if you run into snags.

Make “Solar System” Plots With {ggsolar}

I was cranking out a blog post for work earlier this week that shows off just how many integrations our platform has. I won’t blather about that content here, but as I was working on it, I really wanted to show off all the integrations.

A table seemed far too boring.

Several categorized unordered lists seemed too unwieldy.

Then, it dawned on me that I could make a visual representation of all the integration partners we have by thinking of the entire integrations’ ecosystem as a “universe” with each category being a “solar system” of that universe.

I’ve been leaning more heavily on JavaScript for datavis these days, but I will always be more comfortable in {ggplot2}, so I headed to R to design a way to:

generate concentric orbits for “n” solar systems
randomize the placement of the planets in each ring
make a decent plot!

I worked with one of the most amazing designers on the planet (heh) to come up with some stellar (heh) styling for it, and this was the result:

I took the styling guidance and wrapped the messy, individual functions I had into a new {ggsolar} package, which you can find at https://github.com/hrbrmstr/ggsolar.

It’s pretty raw, and I need to “geomify” it at some point, but it has

a function to generate the concentric circle polygons
another one to identify a random point on each ring
a naive plotting function, and
a theme cleanup function for decent output.

The default is to generate uniformly distributed concentric circles, but you have the option of supplying a custom radii vector to make it more “real”/“solar-system-y”.
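For a feel of the geometry involved, here is a rough base R sketch of the two core ideas: one circle polygon per planet, and one random point on each ring. This is not the {ggsolar} implementation; `sketch_orbits()` and `sketch_positions()` are hypothetical names, and the evenly spaced default radii are my assumption of what "uniformly distributed" means.

```r
# One circle polygon per planet; radii default to 1, 2, 3, ...
sketch_orbits <- function(planets, radii = seq_along(planets), n_points = 360) {
  stopifnot(length(radii) == length(planets))
  theta <- seq(0, 2 * pi, length.out = n_points)
  do.call(rbind, lapply(seq_along(planets), function(i) {
    data.frame(
      planet = planets[i],
      radius = radii[i],
      x = radii[i] * cos(theta),
      y = radii[i] * sin(theta)
    )
  }))
}

# One uniformly random angle per ring gives each planet its position.
sketch_positions <- function(planets, radii = seq_along(planets)) {
  theta <- runif(length(planets), 0, 2 * pi)
  data.frame(planet = planets, x = radii * cos(theta), y = radii * sin(theta))
}

planets <- c("Mercury", "Venus", "Earth")
orbits <- sketch_orbits(planets)

set.seed(1323)
positions <- sketch_positions(planets)
```

Passing a custom `radii` vector to both functions spaces the rings unevenly, which is the same idea as the custom radii option described above.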

Here’s the general flow:

# sol_planets is a built in vector of our system's planet names
sol_orbits <- generate_orbits(sol_planets)

set.seed(1323) # this produced decent placements

# naive but it works! You can specify your own point picker, too.
placed_planets <- randomize_planet_positions(sol_orbits)

# do the thing!
plot_orbits(
  orbits = sol_orbits, 
  planet_positions = placed_planets,
  label_planets = TRUE,
  label_family = hrbrthemes::font_es_bold
) +
  hrbrthemes::theme_ipsum_es(grid="") +
  coord_equal() +
  labs(
    title = "Sol",
    caption = "Pluto is 100% a planet"
  ) +
  theme_enhance_solar()

Random Systems

I included a generate_random_planets() function that uses a hidden Markov model to create believable planetary names, so you can now make your own universe with {ggplot2}!

set.seed(42)
(rando_planets <- generate_random_planets(12))

rando_orbits <- generate_orbits(rando_planets)

set.seed(123) # this produced decent placements

placed_planets <- randomize_planet_positions(rando_orbits)

plot_orbits(
  orbits = rando_orbits, 
  planet_positions = placed_planets,
  label_planets = TRUE,
  label_family = hrbrthemes::font_es_bold
) +
  hrbrthemes::theme_ipsum_es(grid="") +
  coord_equal() +
  labs(
    title = "Rando System"
  ) +
  theme_enhance_solar()
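If you are curious how generated names can come out sounding plausible, here is a toy sketch. To be clear, it swaps in a plain first-order, character-level Markov chain, which is a simpler technique than the hidden Markov model generate_random_planets() uses, and none of it comes from the {ggsolar} source; the seed names and function names are hypothetical.

```r
# Tiny, hypothetical training set; a real one would be much larger.
seed_names <- c(
  "mercury", "venus", "earth", "mars",
  "jupiter", "saturn", "uranus", "neptune"
)

# For every character (plus "^" start and "$" end markers), record
# which characters were observed to follow it.
build_transitions <- function(words) {
  transitions <- list()
  for (w in words) {
    chars <- c("^", strsplit(w, "")[[1]], "$")
    for (i in seq_len(length(chars) - 1)) {
      transitions[[chars[i]]] <- c(transitions[[chars[i]]], chars[i + 1])
    }
  }
  transitions
}

# Walk the chain from "^" until "$" is drawn or max_len is reached.
generate_name <- function(transitions, max_len = 10) {
  out <- character()
  state <- "^"
  while (length(out) < max_len) {
    choices <- transitions[[state]]
    state <- choices[sample.int(length(choices), 1)]
    if (state == "$") break
    out <- c(out, state)
  }
  paste0(toupper(out[1]), paste(out[-1], collapse = ""))
}

transitions <- build_transitions(seed_names)

set.seed(42)
made_up <- generate_name(transitions)
print(made_up)
```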

FIN

Kick the tyres, use {gganimate} to make some animations, and be the ruler of your own universe! (We’re going to try to generate team “org charts” with these later in the week, so be creative, too!).

Introducing WebRIDEr: The WebR “IDE”-ish REPL You Didn’t Know You Needed

The official example WebR REPL is definitely cool and useful to get the feel for WebR. But, it is far from an ideal way to deal with it interactively, even as just a REPL.

As y’all know, I’ve been conducting numerous experiments with WebR and various web technologies. I started doing this for numerous reasons. One was to get folks excited about WebR and try to show there are endless possibilities for it (and hopefully avoid lock-in to prescribed views on how you should work with it). Another was to brush up on rusty web skills and have something fun to do during the continuing long aftermath of my spike protein invasion.

I started poking under the WebR covers this past weekend, and until there’s a more pyodide-like JS bridge on the R side of WebR, I decided to forego said spelunking. Instead, I began a dive into {plot2}, a really neat {ggplot2}-esque enhancement to base R plotting. While I could use any R-compatible IDE (there are many, btw), I wanted to do all the experiments in WebR-proper, since base plots work out of the box and the {ggplot2} ecosystem takes a bit of time to install. The tinkering began just fine, but it became a bit tedious doing browser refreshes (they’re automatic with Vite in dev mode) for small tweaks. There was no way I was using the official REPL given the lack of real interactivity in the console. And, I wanted to avoid repeatedly re-rendering Quarto documents, since that would have been as tedious as the Vite refreshes.

So, I decided to make an “IDE REPL” for WebR, so I could work with it like I would R in Sublime Text, VS Code, or RStudio. I mean, wouldn’t everyone?

You can check it out here, and the source is on GitHub.

I’m not going to take up much more time here, since it comes with some explanations out of the box, but I will reproduce the GH README for it at the end. I will present the structure of the project, here, to make it easier to build upon it (clone/fork away!).

I’m using Monaco, the editor that powers VS Code and the online GitHub editor. It has so many batteries included that it’s hard not to want to use it, even considering how much I despise Microsoft as a company. It is dead simple to use.

The entire project is in vanilla javascript, and there is no builder this time, since I wanted to make this as accessible to as many folks as possible.

This is the project structure:

├── boilerplate.js # text that appears in the source on first load or hard refresh
├── completions.js # a decent number of R completions (I'll add more)
├── index.css      # core CSS
├── index.html     # HTML shell
├── main.js        # Main "app"
├── resizers.js    # We need to keep the panes sized properly
├── r.js           # Some WebR bits
└── rlang.js       # Language stuff for Microsoft's Monaco editor

Rather than adorn the interface with silly buttons and baubles, I am putting functionality into the Monaco command palette.

Here’s what you’ve got with v0.1.0:

Auto-saves current source pane contents to local storage
R syntax highlighting
An oddly decent # of auto-completes
cmd-shift-p to bring up command palette
- WebR: Clear Local Storage — nuke local storage and replace with the default document
- WebR: Save Canvas as PNG — captain obvious
- WebR: Save Source Pane — captain obvious
- WebR: View WebR Environment Summary — captain obvious
- WebR: View WebR History — captain obvious
cmd-shift-i inserts |>
option-shift-minus inserts <-
watches for ?… and will open up a new tab for web help on whatev u searched for (XSS protected)
watches for browseURL(…) and will open up the URL in a new tab (XSS protected)
baked-in install.runiverse(pkg) which will try to install a pkg from R Universe. It is ON YOU to load the deps and ensure all deps and the pkg itself will work in WebR. You can use this tool I made to help you out.

FIN

Apart from making the current functionality more robust/pretty, one big forthcoming advancement will be the ability to save/load the WebR workspace to local browser storage. What that will mean for you is that when you go to an instance of the app, all source changes will automagically be saved/restored to the session between visits. Plus, if you’ve saved the workspace image, it will be auto-restored on the visit, leaving you to just have to re-install/load any necessary packages. This means you can get right back to “work”.

I’ll be adding the ability to load files from your local system and use {svglite} for graphics (Monaco has an amazing SVG viewer), and to actually work in the R Console area (either with some janky input box or janky xterm.js).

Kick the tyres, file bugs, feature enhancements, and PRs, and start playing more with WebR!

Using WebR + Pyodide To Fill In The (Temporary) Package Gaps

I won’t wax long and poetic here since I’ve already posted the experiment that has all the details.

TL;DR: there are still only ~90-ish 📦 in the WebR WASM “CRAN”, but more are absolutely on the way, including the capability to build your own CRAN and dev packages via Docker and host your own WebR WASM pkg repo.

@timelyportfolio created an experimental method to install built base R packages from R Universe, and I enhanced that method in another, recent experiment, but that’s a bit wonky, and you have to do some leg work to figure out if a package can be installed and then do a bunch of manual work (though that Observable notebook will save you time).

The aforelinked new experiment shows how to use Pyodide side-by-side with WebR. While this one doesn’t have them sharing data or emscripten filesystems yet, we’ll get there! This puts SCADS of Python packages at your fingertips to fill in the gap while we wait for more R 📦 to arrive.

Code is up on GitHub but hit the experiment first to see what’s going on.

A small taste of the experiment.