Skip navigation

Author Archives: hrbrmstr

Don't look at me…I do what he does — just slower. #rstats avuncular • ?Resistance Fighter • Cook • Christian • [Master] Chef des Données de Sécurité @ @rapid7

The days are getting shorter and when we were visiting Down East Maine the other week, there was just a hint of some trees starting to change up their leaf palettes. It was a solid reminder to re-up my ~annual “foliage” plotting that I started way back in 2017.

The fine folks over at Smoky Mountains — (“the most authoritative source for restaurants, attractions, & cabin rentals in the Smoky Mountains”) — have been posting an interactive map of ConUS foliage predictions for many years and the dataset they curate and use for that is also very easy to use in R and other contexts.

This year, along with the usual R version, I have also made:

The only real changes to R version were to add some code to make a more usable JSON for the JavaScript versions of the project, and to take advantage of the .progress parameter to {purrr}’s walk function.

The Observable notebook version (one frame of that is above) makes use of Observable Plot’s super handy geo mark, and also shows how to do some shapefile surgery to avoid plotting Alaska & Hawaii (the Smoky Mountains folks only provide predictions for ConUS).

After using the Reveal QMD extension to make the Quarto project, the qmd document rendered fine, but I tweaked the YAML to send the output to the GH Pages-renderable docs/ directory, and combined some of the OJS blocks to tighten up the document. You’ll see some Quarto “error” blocks, briefly, since there the QMD fetches imports from Observable. You can get around that by moving all the imported resources to the Observable notebook before generating the QMD, but that’s an exercise left to the reader.

And, since I’m a fan of both Lit WebComponents and Tachyons CSS, I threw together a version using them (+ Observable Plot) to further encourage folks to get increasingly familiar with core web tech. Tachyons + Plot make it pretty straightforward to create responsive pages, too (resize the browser and toggle system dark/light mode to prove that). The Lit element’s CSS section also shows how to style Plot’s legend a bit.

Hit up the GH page to see the animated gif (I’ve stared at it a bit too much to include it in the post).

Drop any q’s here or in the GH issues, and — if anyone makes a Shiny version — please let me know, and I’ll add all links to any of those here and on the GH page.

FIN

While it is all well and good to plot foliage prediction maps, please also remember to take some time away from your glowing rectangles to go and actually observe the fall palette changes IRL.

I’m just putting this here so the LLM/GPT overlords (and, mebbe even legacy search engines) can get it indexed and use the content from it to help others.

My Bluesky firehose viewer (https://gitlab.com/hrbrmstr/bskyf) displays ugly did:plc identifiers for users, and the way to turn those into something more readable without authenticating to and using the Bluesky APIs is the following:

$ curl -s "https://plc.directory/did:plc:xq3lwzdpijivr5buiizezlni" | jq

which results in:

{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/suites/secp256k1-2019/v1"
  ],
  "id": "did:plc:xq3lwzdpijivr5buiizezlni",
  "alsoKnownAs": [
    "at://moonlightspring.bsky.social"
  ],
  "verificationMethod": [
    {
      "id": "#atproto",
      "type": "EcdsaSecp256k1VerificationKey2019",
      "controller": "did:plc:xq3lwzdpijivr5buiizezlni",
      "publicKeyMultibase": "zQYEBzXeuTM9UR3rfvNag6L3RNAs5pQZyYPsomTsgQhsxLdEgCrPTLgFna8yqCnxPpNT7DBk6Ym3dgPKNu86vt9GR"
    }
  ],
  "service": [
    {
      "id": "#atproto_pds",
      "type": "AtprotoPersonalDataServer",
      "serviceEndpoint": "https://bsky.social"
    }
  ]
}

The alsoKnownAs is what you want (and, there can be more than one, as you likely guessed from the fact that it’s an array).

While the future of Bluesky is nowhere near certain, it is most certainly growing. It’s also the largest community of users for the AT Protocol.

Folks are using Bluesky much the same way as any online forum/chat. One of those ways is to share URLs to content.

For the moment, it is possible to eavesdrop on the Bluesky “firehose” sans authentication. I’ve been curious as to what folks are sharing on the platform and decided to do more than poke at it casually in my hacky terminal firehose viewer.

This GitLab project contains all the code necessary to log URLs seen in the firehose to a local SQLite database. As Bluesky grows, this will definitely not scale, but it’s fine for right now, and scaling just means moving the websocket capture client to a more capable environment than my home server and setting up something like a Kafka stream. Might as well move to Postgres while we’re at it.

But, for now, this lightweight script/database is fine.

NOTE: I’m deliberately not tracking any other data, but the code is easy to modify to log whatever you want from the firehose post.

I’m syncing the data to this server every ~30 minutes or so and have created an Observable notebook which keeps track of the most popular domains.

I don’t know what card.syui.ai is (Perplexity had some ideas), but it appears to be some AI-driven “card” game that has AT protocol and ActivityPub integration. Due to the programmatic nature of the posts with URLs containing that domain, I suspect it’ll be in the lead for quite some time.

There are some neat sites in the long tail of the distribution.

I think I’ll set up one to monitor post with CVE’s, soon, too.

HTTP Headers Hashing (HHHash) is a technique developed by Alexandre Dulaunoy to gen­erate a fingerprint of an HTTP server based on the headers it returns. It employs one-way hashing to generate a hash value from the list of header keys returned by the server. The HHHash value is calculated by concatenating the list of headers returned, ordered by sequence, with each header value separated by a colon. The SHA256 of this concatenated list is then taken to generate the HHHash value. HHHash incorporates a version identifier to enable updates to new hashing functions.

While effective, HHHash’s performance relies heavily on the characteristics of the HTTP requests, so correlations are typically only established using the same crawler parameters. Locality-sensitive hashing (LSH) could be used to calculate distances between sets of headers for more efficient comparisons. There are some limitations with some LSH algorithms (such as the need to pad content to a minimum byte length) that make the initial use of SHA256 hashes a bit more straightforward.

Alexandre made a Python library for it, and I cranked out an R package for it as well.

There are three functions exposed by {hhhash}:

  • build_hash_from_response: Build a hash from headers in a curl
    response object
  • build_hash_from_url: Build a hash from headers retrieved from a URL
  • hash_headers: Build a hash from a vector of HTTP header keys

The build_hash_from_url function relies on {curl} vs {httr} since {httr} uses curl::parse_headers() which (rightfully so) lowercases the header keys. We need to preserve both order and case for the hash to be useful.

Here is some sample usage:

remotes::install_github("hrbrmstr/hhhash")

library(hhhash)

build_hash_from_url("https://www.circl.lu/")
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

res <- curl::curl_fetch_memory("https://www.circl.lu/", curl::new_handle())

build_hash_from_response(res)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

c(
  "Date", "Server", "Strict-Transport-Security",
  "Last-Modified", "ETag", "Accept-Ranges",
  "Content-Length", "Content-Security-Policy",
  "X-Content-Type-Options", "X-Frame-Options",
  "X-XSS-Protection", "Content-Type"
) -> keys

hash_headers(keys)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

Lynn (of TITAA and general NLP wizardy fame) was gracious enough to lend me a Bluesky invite, so I could claim my handle on yet-another social media site. I’m still wary of it (as noted in one of this week’s Drops), but the AT protocol — whilst super (lacking a better word) “verbose” — is pretty usable, especially thanks to Ilya Siamionau’s atproto AT Protocol SDK for Python.

Longtime readers know I am most certainly not going to use Python directly, as such practice has been found to cause early onset dementia. But, that module is so well done that I’ll gladly use it from within R.

I whipped up a small R script CLI that will fetch my feed and display it via the terminal. While I also use the web app and the Raycast extension to read the feed, it’s a billion degrees outside, so used the need to stay indoors as an excuse to add this third way of checking what’s new.

Store your handle and app-password in BSKY_USER and BSKY_KEY, respectively, adjust the shebang accordingly, add execute permissions to the file and 💥, you can do the same.

#!/usr/local/bin/Rscript

suppressPackageStartupMessages({
  library(reticulate, quietly = TRUE, warn.conflicts = FALSE)
  library(lubridate, include.only = c("as.period", "interval"), quietly = TRUE, warn.conflicts = FALSE)
  library(crayon, quietly = TRUE, warn.conflicts = FALSE)
})

# Get where {reticlulate} thinks your python is via py_config()$python
# then use the full path to 
#   /full/path/to/python3 -m pip install atproto

atproto <- import("atproto")

client <- atproto$Client()

profile <- client$login(Sys.getenv("BSKY_USER"), Sys.getenv("BSKY_KEY"))

res <- client$bsky$feed$get_timeline(list(algorithm = "reverse-chronological"))

for (item in rev(res$feed)) (
  cat(
    blue(item$post$author$displayName), " • ",
    silver(gsub("\\.[[:digit:]]+", "", tolower(as.character(as.period(interval(item$post$record$createdAt, Sys.time()))))), "ago\n"),
    italic(paste0(strwrap(item$post$record$text, 50), collapse="\n")), "\n",
    ifelse(
      hasName(item$post$record$embed, "images"), 
      sprintf(
        green("[%s IMAGE%s]\n"), 
        length(item$post$record$embed$images),
        ifelse(length(item$post$record$embed$images) > 1, "s", "")
      ),
      ""
    ),
    ifelse(
      hasName(item$post$record$embed, "external"),
      yellow(sprintf(
        "\n%s\n   │\n%s\n\n",
        bold(paste0(strwrap(item$post$embed$external$title, 47, prefix = "   │"), collapse = "\n")),
        italic(paste0(strwrap(item$post$embed$external$description, 47, prefix = "   │"), collapse = "\n"))
      )),
      ""
    ),
    "\n",
    sep = ""
  )
)

This is a sample of the output, showing how it handles embeds and images:

feed output

Code is on GitLab.

FIN

There’s tons of room for improvement in this hastily-crafted bit of code, and I’ll get it up on GitLab once their servers come back to life.

If you want to experience Bluesky but have no account, the firehose — which Elon charges $40K/month for on the birdsite — is free and can be accessed sans authentication:

library(reticulate)

atproto <- import("atproto")

hose <- atproto$firehose$FirehoseSubscribeReposClient()

handler <- \(msg) {
  res <- atproto$firehose$parse_subscribe_repos_message(msg)
  print(res) # you need to do a bit more than this to get the actual commit type and contents
}

hose$start(handler)

You can find me over on bsky at @hrbrmstr.dev.

Back in 2016, I did a post on {ggplot2} text annotations because it was a tad more challenging to do some of the things in that post back in the day.

Since I’ve been moving back and forth between R and Observable (and JavaScript in general), I decided to recreate that post in OJS Plot, as it is also somewhat challenging to use this nascent new plot player in town.

Getting Observable embeds right in a theme that supports both dark/light mode is kind of a pain (either that or I’m just lazy), and I already have enough holes in this site’s content security policy (and, I’d have to poke more holes to get fonts working “properly”); so, you can visit the link to see the crisp SVG version and suffer the PNG below to see that Plot is a very capable companion to {ggplot2}.

Speaking of {ggplot2}, Daily Drop subscribers were given some homework (happens every Friday) that involved working in Observable Plot. The catch was that said work was to recreate {ggplot2} geom_ examples in the R manual pages for {ggplot2}.

You can check out that post, go to a hosted version of the starter “Rosetta Stone” set of examples I provided, poke at the Observable notebook version, or head over to GitHub where the online play ground source lives, along with a Quarto document with all the examples.

WebR has a template for React, but I’m not a fan of it or Vue (a fact longtime readers are likely tired of hearing right about now). It’s my opinion and experience that Lit webcomponents come closer to “bare metal” webcomponents, which means the “lock-in” with Lit is way less of a potential headache than all the baggage that comes with the other frameworks.

I leaned into exposition for most of my WebR Experiments, but that very likely made it hard to reproduce or even use those repos without some pain. So, I decided to reduce the pain, remove the inertia and make a template (GH) you can use almost immediately from the command line.

The only “inertia” is that you need npm installed. Subsequently, cd into the parent directory of the new project you want to make and:

npx create-webr-vite-lit my-webr-project
cd my-webr-project
npm install
npx vite --port=4000

then, hit `http://localhost:4000/` (change the port if you’re already using it for something else).

You can check it out on this demo site.

Batteries Included

  • Vite (for fast building)
  • WebR (duh)
    • r.js which has all the setup code for WebR + some helpers (more coming here, too)
  • Pyodide (initiall disabled)
    • py.js which does not get used but is available if you want to use pyodide
  • Lit (webcomponents) — it ships with 3:
    • one for my usual “loading…” status message (which gets a facelift)
    • one generic webcomponent for Observable Plots
    • one simple “button” webcomponent to trigger simple actions
    • more are coming! The goal is to wrap all the inputs and outputs provided by Bonsai (below). PR’s welcome!
  • A lightweight CSS framework called Bonsai that I added dark-mode support for. The post-create default page is the Bonsai grid & CSS reference. The webcomponents show how to make all the Bonsai styles available to the components (there’s a default full separation of all things from the webcomponents).
  • An example justfile since I’ve grown quite fond of Just

The default/demo “app” demonstrates how all the components work.

light mode version

FIN

This setup should have you up and running with your own apps in no time.

I tested light/dark mode switching in Chrome and Safari (macOS/iOS) and the dark/light switching works as intended. Arc doesn’t respond to it, so I’ll be debugging that.

Drop issues in GH if I need to tweak the dark mode, or if you run into snags.

I was cranking out a blog post for work earlier this week that shows off just how many integrations our platform has. I won’t blather about that content here, but as I was working on it, I really wanted to show off all the integrations.

A table seemed far too boring.

Several categorized unordered lists seemed too unwieldy.

Then, it dawned on me that I could make a visual representation of all the integration partners we have by thinking of the entire integrations’ ecosystem as a “universe” with each category being a “solar system” of that universe.

I’ve been leaning more heavily on javsascript for datavis these days, but I will always be more comfortable in {ggplot2}, so I headed to R to design a way to:

  • generate concentric orbits for “n” solar systems
  • randomize the placement of the planets in each ring
  • make a decent plot!

I worked with one of the most amazing designers on the planet (heh) to come up with some stellar (heh) styling for it, and this was the result:

5 solar system panels

I took the styling guidance and wrapped the messy, individual functions I had into a new {ggsolar} package, you can find at https://github.com/hrbrmstr/ggsolar.

It’s pretty raw, and I need to “geomify” it at some point, but it has

  • a function to generate the concentric circle polygons
  • another one to identify a random point on each ring
  • a naive plotting function, and
  • a theme cleanup function for decent output.

The default is to generate uniformly distributed concentric circles, but you have the option of supplying a custom radii vector to make it more “real”/“solar-sysetm-y”.

Here’s the general flow:

# sol_planets is a built in vector of our system's planet names
sol_orbits <- generate_orbits(sol_planets)

set.seed(1323) # this produced a decent placements

# naive but it works! You can specify your own point picker, too.
placed_planets <- randomize_planet_positions(sol_orbits)

# do the thing!
plot_orbits(
  orbits = sol_orbits, 
  planet_positions = placed_planets,
  label_planets = TRUE,
  label_family = hrbrthemes::font_es_bold
) +
  hrbrthemes::theme_ipsum_es(grid="") +
  coord_equal() +
  labs(
    title = "Sol",
    caption = "Pluto is 100% a planet"
  ) +
  theme_enhance_solar()

Random Systems

I included a generate_random_planets() function that uses a hidden Markov model to create believable planetary names, so you can now make your own universe with {ggplot2}!

set.seed(42)
(rando_planets <- generate_random_planets(12))

rando_orbits <- generate_orbits(rando_planets)

set.seed(123) # this produced decent placements

placed_planets <- randomize_planet_positions(rando_orbits)

plot_orbits(
  orbits = rando_orbits, 
  planet_positions = placed_planets,
  label_planets = TRUE,
  label_family = hrbrthemes::font_es_bold
) +
  hrbrthemes::theme_ipsum_es(grid="") +
  coord_equal() +
  labs(
    title = "Rando System"
  ) +
  theme_enhance_solar()

random system

FIN

Kick the tyres, use {gganimate} to make some animations, and be the ruler of your own universe! (We’re going to try to generate team “org charts” with these later in the week, so be creative, too!).