
I caught this tweet by Terence Eden about using Twitter image alt-text to “PGP sign” tweets and my mind immediately went to “how can I abuse this for covert communications, malicious command-and-control, and embedding R code in tweets?”.

When you paste or upload an image to a tweet (in the web interface, at least) you have an opportunity to add “alt” text which is — in theory — supposed to help communicate the content of the image to folks using assistive technology. Terence figured out the alt-text limit on Twitter is large (~1K characters), which is plenty of room for useful R code.

I poked around for something to use as an example and settled on using data from COVID Stimulus Watch. The following makes the chart in this tweet — https://twitter.com/hrbrmstr/status/1261641887603179520.

I’m not posting the chart here b/c it’s nothing special, but the code for it is below.

library(hrbrthemes);

x <- read.csv("https://data.covidstimuluswatch.org/prog.php?&detail=export_csv")[,3:5];

x[,3] <- as.numeric(gsub("[$,]","",x[,3]));
x <- x[(x[,1]>20200400)&x[,3]>0,];
x[,1] <- as.Date(as.character(x[,1]),"%Y%m%d");

ggplot(x, aes(Award.Date, Grant.Amount, fill=Award.Type)) +
  geom_col() +
  scale_y_comma(
    labels = c("$0", "$5bn", "$10bn", "$15bn")
  ) +
  labs(
    title = "COVID Stimulus Watch: Grants",
    caption = "Source: https://data.covidstimuluswatch.org/prog.php?detail=opening"
  ) +
  theme_ipsum_es(grid="XY")

Semicolons are necessary b/c newlines are going to get stripped when we paste that code block into the alt-text entry box.

We can read that code back into R with some help from read_html() & {styler}:

library(rtweet)
library(rvest)
library(purrr)    # keep()
library(stringi)
library(magrittr)

pg <- read_html("https://twitter.com/hrbrmstr/status/1261641887603179520")

html_nodes(pg, "img") %>% 
  html_attr("alt") %>% 
  keep(stri_detect_fixed, "library") %>% 
  styler::style_text()
library(hrbrthemes)
x <- read.csv("https://data.covidstimuluswatch.org/prog.php?&detail=export_csv")[, 3:5]
x[, 3] <- as.numeric(gsub("[$,]", "", x[, 3]))
x <- x[(x[, 1] > 20200400) & x[, 3] > 0, ]
x[, 1] <- as.Date(as.character(x[, 1]), "%Y%m%d")
ggplot(x, aes(Award.Date, Grant.Amount, fill = Award.Type)) +
  geom_col() +
  scale_y_comma(
    labels = c("$0", "$5bn", "$10bn", "$15bn")
  ) +
  labs(
    title = "COVID Stimulus Watch: Grants",
    caption = "Source: https://data.covidstimuluswatch.org/prog.php?detail=opening"
  ) +
  theme_ipsum_es(grid = "XY")

Twitter’s API does not seem to return alt-text (see UPDATE):

rtweet::lookup_statuses("1261641887603179520") %>% 
  jsonlite::toJSON(pretty=TRUE)
## [
##   {
##     "user_id": "5685812",
##     "status_id": "1261641887603179520",
##     "created_at": "2020-05-16 12:57:20",
##     "screen_name": "hrbrmstr",
##     "text": "Twitter's img alt-text limit is YUGE! So, we can abuse it for semi-covert comms channels, C2, or for \"embedding\" the code ## that makes this chart!\n\nUse `read_html()` on URL of this tweet; find 'img' nodes w/html_nodes(); extract 'alt' attr text w/## html_attr(). #rstats \n\nh/t @edent https://t.co/v5Ut8TzlRO",
##     "source": "Twitter Web App",
##     "display_text_width": 278,
##     "is_quote": false,
##     "is_retweet": false,
##     "favorite_count": 8,
##     "retweet_count": 2,
##     "hashtags": ["rstats"],
##     "symbols": [null],
##     "urls_url": [null],
##     "urls_t.co": [null],
##     "urls_expanded_url": [null],
##     "media_url": ["http://pbs.twimg.com/media/EYI_W-xWsAAZFeP.png"],
##     "media_t.co": ["https://t.co/v5Ut8TzlRO"],
##     "media_expanded_url": ["https://twitter.com/hrbrmstr/status/1261641887603179520/photo/1"],
##     "media_type": ["photo"],
##     "ext_media_url": ["http://pbs.twimg.com/media/EYI_W-xWsAAZFeP.png"],
##     "ext_media_t.co": ["https://t.co/v5Ut8TzlRO"],
##     "ext_media_expanded_url": ["https://twitter.com/hrbrmstr/status/1261641887603179520/photo/1"],
##     "mentions_user_id": ["14054507"],
##     "mentions_screen_name": ["edent"],
##     "lang": "en",
##     "geo_coords": ["NA", "NA"],
##     "coords_coords": ["NA", "NA"],
##     "bbox_coords": ["NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"],
##     "status_url": "https://twitter.com/hrbrmstr/status/1261641887603179520",
##     "name": "boB • Everywhere is Baltimore • Rudis",
##     "location": "Doors & Corners",
##     "description": "Don't look at me…I do what he does—just slower. 🇷 #rstats avuncular • pampa • #tired • 👨‍🍳 • ✝️ • Prìomh ## Neach-saidheans Dàta @ @rapid7",
##     "url": "https://t.co/RgY1wHjoqM",
##     "protected": false,
##     "followers_count": 11886,
##     "friends_count": 458,
##     "listed_count": 667,
##     "statuses_count": 84655,
##     "favourites_count": 15140,
##     "account_created_at": "2007-05-01 14:04:24",
##     "verified": true,
##     "profile_url": "https://t.co/RgY1wHjoqM",
##     "profile_expanded_url": "https://rud.is/b",
##     "profile_banner_url": "https://pbs.twimg.com/profile_banners/5685812/1398248552",
##     "profile_background_url": "http://abs.twimg.com/images/themes/theme15/bg.png",
##     "profile_image_url": "http://pbs.twimg.com/profile_images/824974380803334144/Vpmh_s3x_normal.jpg"
##   }
## ]

but I still need to poke over at the API docs to figure out if there is a way to get it more programmatically. (see UPDATE)

If we want to be incredibly irresponsible and daft (like a recently semi-shuttered R package installation service) we can throw caution to the wind and just plot it outright:

library(rtweet)
library(rvest)
library(purrr)    # keep()
library(stringi)
library(magrittr)

pg <- read_html("https://twitter.com/hrbrmstr/status/1261641887603179520")

html_nodes(pg, "img") %>% 
  html_attr("alt") %>% 
  keep(stri_detect_fixed, "library") %>% 
  textConnection() %>% 
  source() %>% # THIS IS DANGEROUS DO NOT TRY THIS AT HOME
  print()

Seriously, though, don’t do that. Lots of bad things can happen when you source() from the internet.

UPDATE (2020-05-17)

You can use:

paste(as.character(parse(text = "...")), collapse = "; ")

to “minify” R code for the alt-text.
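
For example, here is a quick sketch of that applied to a script file (the chart.R file name is just a placeholder), plus an nchar() check to make sure the result stays under the ~1K alt-text limit:

minified <- paste(
  as.character(parse(file = "chart.R")), # parse the script, deparse each top-level expression
  collapse = "; "
)

nchar(minified) # keep this under the ~1K alt-text limit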

And, you can use https://github.com/hrbrmstr/rtweet until it is PR’d back into {rtweet} proper to send tweets with image alt-text. The “status” functions also return any alt-text in a new ext_alt_text column.
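
A minimal sketch of reading it back, assuming you have installed that fork and have rtweet API credentials set up:

# remotes::install_github("hrbrmstr/rtweet") # the fork linked above
tw <- rtweet::lookup_statuses("1261641887603179520")
tw$ext_alt_text # any image alt-text for the status, if present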

FIN

Now, you can make your Twitter charts reproducible on-platform (until Twitter does something to thwart this new communication and file-sharing channel).

Since Twitter status URLs are just GET requests, orgs should consider running the content of those URLs through alt-text extractors just in case there’s some funny business going on across user endpoints.
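
A rough sketch of what such an extractor could look like (the extract_alt_text() helper is made up for illustration; it just repeats the scraping steps from earlier):

library(rvest)

# pull any image alt-text out of a single tweet URL
extract_alt_text <- function(tweet_url) {
  pg <- read_html(tweet_url)
  alts <- html_attr(html_nodes(pg, "img"), "alt")
  alts[!is.na(alts) & nzchar(alts)] # drop missing/empty alt attributes
}

extract_alt_text("https://twitter.com/hrbrmstr/status/1261641887603179520")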

Ingredients

  • 453g all-purpose flour (this precision is not 100% necessary, you may need to add more when folding in the wet ingredients)
  • 30-37g sugar (I’d err on the lower side for the first run)
  • 28g baking powder
  • 7g salt
  • 151g butter-flavored shortening (cold!)
  • 76g eggs (I kinda just use 2 jumbo and add a teensy bit more flour)
  • 240ml coconut milk yogurt (unsweetened)
  • egg wash (one egg and a tablespoon of oat milk and a pinch of salt)

TODO

Oven @ 425°F/218°C; sheet pan lined with parchment paper.

Sift dry bits together.

Cut up shortening into small squares and mix into the dry bits with a pastry blender.

Combine eggs and yogurt. Add to ^^.

Fold and knead gently just until the dough forms. Too much working the dough and the biscuits will be tough.

Press out dough on lightly floured surface to ~3cm thickness and use a 5cm cutter to cut (push down, don’t twist).

Place on lined sheet pan.

Brush with egg wash.

Bake ~15m (start watching at 7m, prbly rotate the sheet pan then, and then again at ~12m).

The United States Centers for Disease Control (CDC from now on) has set up two new public surveillance resources for COVID-19. Together, COVIDView and COVID-NET provide weekly surveillance data similar to what FluView does for influenza-like illnesses (ILI).

The COVIDView resources are HTML tables (O_O) and, while the COVID-NET interface provides a “download” button, there is no exposed API to make it easier for the epidemiological community to work with these datasets.

Enter {cdccovidview} — https://cinc.rud.is/web/packages/cdccovidview/ — which scrapes the tables and uses the hidden API in the same way {cdcfluview} (https://cran.rstudio.com/web/packages/cdcfluview/index.html) does for the FluView data.

Weekly case, hospitalization, and mortality data is available at the national, state, and regional levels (where provided), and I tried to normalize the fields across each of the tables/datasets. (I hate to pick on them when they’re down, but these two sites are seriously sub-optimal from a UX and general usability perspective.)

After you follow the above URL for information on how to install the package, it should “just work”. No API keys are needed, but the CDC may change the table layouts and the field structure of the hidden API at any time, so keep an eye out for updates.

Using it is pretty simple: just call one of the functions to grab the data you want and then work with it.

library(cdccovidview)
library(hrbrthemes)
library(tidyverse)

hosp <- laboratory_confirmed_hospitalizations()

hosp
## # A tibble: 4,590 x 8
##    catchment      network   year  mmwr_year mmwr_week age_category cumulative_rate weekly_rate
##    <chr>          <chr>     <chr> <chr>     <chr>     <chr>                  <dbl>       <dbl>
##  1 Entire Network COVID-NET 2020  2020      10        0-4 yr                   0           0  
##  2 Entire Network COVID-NET 2020  2020      11        0-4 yr                   0           0  
##  3 Entire Network COVID-NET 2020  2020      12        0-4 yr                   0           0  
##  4 Entire Network COVID-NET 2020  2020      13        0-4 yr                   0.3         0.3
##  5 Entire Network COVID-NET 2020  2020      14        0-4 yr                   0.6         0.3
##  6 Entire Network COVID-NET 2020  2020      15        0-4 yr                  NA          NA  
##  7 Entire Network COVID-NET 2020  2020      16        0-4 yr                  NA          NA  
##  8 Entire Network COVID-NET 2020  2020      17        0-4 yr                  NA          NA  
##  9 Entire Network COVID-NET 2020  2020      18        0-4 yr                  NA          NA  
## 10 Entire Network COVID-NET 2020  2020      19        0-4 yr                  NA          NA  
## # … with 4,580 more rows

c(
  "0-4 yr", "5-17 yr", "18-49 yr", "50-64 yr", "65+ yr", "65-74 yr", "75-84 yr", "85+"
) -> age_f

mutate(hosp, start = mmwr_week_to_date(mmwr_year, mmwr_week)) %>%
  filter(!is.na(weekly_rate)) %>%
  filter(catchment == "Entire Network") %>%
  select(start, network, age_category, weekly_rate) %>%
  filter(age_category != "Overall") %>%
  mutate(age_category = factor(age_category, levels = age_f)) %>%
  ggplot() +
  geom_line(
    aes(start, weekly_rate)
  ) +
  scale_x_date(
    date_breaks = "2 weeks", date_labels = "%b\n%d"
  ) +
  facet_grid(network~age_category) +
  labs(
    x = NULL, y = "Rates per 100,000 pop",
    title = "COVID-NET Weekly Rates by Network and Age Group",
    caption = sprintf("Source: COVID-NET: COVID-19-Associated Hospitalization Surveillance Network, Centers for Disease Control and Prevention.\n<https://gis.cdc.gov/grasp/COVIDNet/COVID19_3.html>; Accessed on %s", Sys.Date())
  ) +
  theme_ipsum_es(grid="XY")

FIN

This is brand new and — as noted — things may change or break due to CDC site changes. I may have also missed a table or two (it’s a truly terrible site).

If you notice things are missing or would like a different interface to various data endpoints, drop an issue or PR wherever you’re most comfortable.

Stay safe!

Ingredients

  • 2 cups adzuki (not dried)
  • 2 cups (4-6 links) andouille sliced
  • 1 tbsp olive oil
  • 1 medium onion, chopped
  • 2 bay leaves
  • 2 garlic cloves, coarse chopped
  • 0-2 dry hot peppers to taste
  • 1-2 fresh sprigs thyme
  • 4 cups stock (chicken or veg)
  • salt & pepper to taste
  • dash of vinegar

TODO

Brown sausage in oil then remove.

Sauté onion in same oil (add more if dry) til clear.

Add garlic and sauté for 1-2 minutes.

Add back sausage and add in everything else and simmer for 45 minutes.

To thicken, remove and pulse a few tablespoons of beans and add back or stir in 1 tbsp corn starch dissolved in stock or water.

Just a quick note that, thanks to a gentle nudge, an updated version of {uaparserjs} — a package that processes the User Agent strings web clients send to servers — is making its way to all the CRAN mirrors and is also available on CINC. The most significant change is a much-overdue update to the user agent regex dictionary.

It takes something like this Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Ubuntu/11.10 Chromium/15.0.874.106 Chrome/15.0.874.106 Safari/535.2 and turns it into a tidy data frame:

uaparserjs::ua_parse("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Ubuntu/11.10 Chromium/15.0.874.106 Chrome/15.0.874.106 Safari/535.2")
## # A tibble: 1 x 9
##   userAgent                                   ua.family ua.major ua.minor ua.patch os.family os.major os.minor device.family
##   <chr>                                       <chr>     <chr>    <chr>    <chr>    <chr>     <chr>    <chr>    <chr>        
## 1 Mozilla/5.0 (X11; Linux x86_64) AppleWebKi… Chromium  15       0        874      Ubuntu    11       10       Other     

The js on the end of the package name is a nod to the fact that it uses the javascript ua-parser-core module via Jeroen’s seriously awesome {V8} package. Four years ago, {uaparserjs} did not work on Windows due to V8 VM stack limitations on that platform; today, it works on all platforms!

’Tis no slouch, either: it processes 100 user agent strings in ~20ms. No speed demon, but it should get the job done for most use-cases.
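
A rough way to sanity-check that on your own box (timings will vary; this just repeats the user agent string from above 100 times):

library(uaparserjs)

uas <- rep(
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Ubuntu/11.10 Chromium/15.0.874.106 Chrome/15.0.874.106 Safari/535.2",
  100
)

system.time(ua_parse(uas))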

There is an excellent C++-backed R version that is not on CRAN and has some heavy dependencies, but it is faster than the javascript version if you need to process scads of user agent strings. (I tend to use at-scale Scala environments for this now, hence the long delay between updates.)

Jeroen has an excellent writeup on how to use browserify to create an application bundle for javascript-backed R packages or scripts. There are some idiosyncrasies with the ua-parser-core reference implementation that were causing me no end of trouble with that method. On a lark, I tried:

$ webpack --mode="production" index.js -o bundle.js

and it worked perfectly on the first try (both in creating the app bundle and in that bundle working just as before). This is due in no small part to Jeroen getting the {V8} package to work with more recent lib-v8 releases (which is also why it works on Windows now). I’ll try to write up the webpack alternative method and PR it into the vignette as I get time.
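
For the curious, the resulting bundle gets pulled into R the same way as a browserify one: sourced into a {V8} context. A minimal sketch (whether the parser’s entry points end up exposed as globals depends on your webpack output config):

library(V8)

ctx <- v8()
ctx$source("bundle.js") # the webpack output from above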

FIN

As usual, kick the tyres, jump in with PRs or issues and — most of all — be safe out there!

For folks who interact with CRAN or R Core: they’re continuing to support our community during these crazy times, so when you are in exchanges with them, definitely take some time to add an extra nod of thanks for managing to do so whilst juggling the same things we all are.

Waffle House announced it was closing hundreds of stores this week due to SARS-CoV-2 (a.k.a. COVID-19). This move garnered quite a bit of media attention since former FEMA Administrator Craig Fugate used the restaurant chain as an indicator of both the immediate and overall severity of a natural disaster. He’s not the only one (https://www.ehstoday.com/emergency-management/article/21906815/what-do-waffles-have-to-do-with-risk-management). The original concept was pretty straightforward:

For example, if a Waffle House store is open and offering a full menu, the index is green. If it is open but serving from a limited menu, it’s yellow. When the location has been forced to close, the index is red. Because Waffle House is well prepared for disasters, Kouvelis said, it’s rare for the index to hit red. For example, the Joplin, Mo., Waffle House survived the tornado and remained open.
 
“They know immediately which stores are going to be affected and they call their employees to know who can show up and who cannot,” he said. “They have temporary warehouses where they can store food and most importantly, they know they can operate without a full menu. This is a great example of a company that has learned from the past and developed an excellent emergency plan.”

SARS-CoV-2 is not a tropical storm, so conditions are a bit different and a tad more complex when it comes to gauging the severity of this particular disaster (mostly caused by inept politicians across the globe), which gave me an idea for how to make the Waffle House Index a proper index, i.e. a “statistical measure of change in a representative group of individual data points.”

In the case of an outbreak, rather than a simple green/yellow/red condition state, using the ratio of closed to open Waffle House locations as a numeric index — [0-1] — seems to make more sense since it may better help indicate:

  • when shelter-in-place became mandatory where a given restaurant is located
  • the severity of SARS-CoV-2-caused symptoms for a given location
  • disruptions in the supply chain for a given location due to SARS-CoV-2

I kinda desperately needed a covidistraction so I set out to see how hard it would be to build such an index metric.
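
Conceptually, the computation is tiny; something like the following sketch against the daily snapshot (the status column name is a guess, so check the CSV header, and the idea is simply closed locations over total locations):

library(readr)

wh <- read_csv("http://wafflehouseindex.us/data/latest.csv")

# hypothetical column name; (# closed) / (# total), expressed as a percentage
100 * sum(wh$status == "closed") / nrow(wh)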

Waffle House lets you find locations via a standard map/search interface. They provide lots of data via that map which can be used to figure out which stores are open and which are closed. There’s a nascent R package which contains all the recipes necessary for the data gathering. However, you don’t need to use it, since it powers wafflehouseindex.us, which collects the data whenever the store-closings info changes and provides a snapshot of the latest data daily (direct CSV link: http://wafflehouseindex.us/data/latest.csv).

The historical data will make it to a git repo at some point in the near future.

The current index value is 21.2, which climbed quickly from the first value of 18.1 (that event was the catalyst for getting the site up and the package done); the closed locations are on the map at the beginning of the post. I went with three qualitative levels on the gauge mostly to keep things simple.

There will absolutely be more location closings and it will be interesting (and, ultimately, very depressing and likely grave) to see how high the index goes and how long it stays above zero.

FIN

The metric is — for the time being — computed across all stores. As noted earlier, this could be broken down into regional index scores to intuit the aforementioned three indicators on a more local level. The historical data (apart from the first closings announcement) is being saved so it will be possible to go back and compute regional indexes when I’ve got more time.

I shall reiterate that you should grab the data from http://wafflehouseindex.us/data/latest.csv vs using the R package, since there’s no point in dup’ing the gathering, and the historical data will be up and maintained soon.

Stay safe, folks.

Stuff you need

  • 370g all-purpose flour
  • 7g baking powder
  • 300g sugar
  • 80g shortening
  • 7.5g salt
  • 140g eggs (~2 proper jumbo)
  • 75ml coconut milk or cashew milk or almond milk yogurt
  • 75ml oat milk
  • 15ml vanilla
  • 85ml veg oil
  • 10oz bag ghirardelli dark chocolate chips

Stuff you do

Oven @ 375°F.

Paddle sugar, shortening and salt. 3-5 mins.

Whisk eggs, milk, yogurt, vanilla & oil.

In three batches, mix/fold ^^ into the paddled mixture.

Sift together dry ingredients and mix until moist. Don’t over-mix.

Fold in chips.

Let sit for 3 mins.

While ^^, put liners in a 12-cup muffin tin.

Evenly distribute batter. It’s ~100g batter (~1/2 dry measuring cup) per muffin.

22-30m in the oven (it really depends on your oven type). You should not be afraid to skewer to test nor to move the tin around to evenly brown.

Cool on wire rack.

Über Tuesday has come and almost gone (some state results will take a while to coalesce) and I’m relieved to say that {catchpole} did indeed work, with the example code from before producing this on first run:

If we tweak the buffer space around the squares, I think the cartogram looks better:

but, you should likely use a different palette (see this Twitter thread for examples).

I noted in the previous post that borders might be possible. While I haven’t solved that use-case for individual states, I did manage to come up with a method for making a light version of the cartogram usable:

library(sf)
library(hrbrthemes) 
library(catchpole)
library(tidyverse)

delegates <- read_delegates()

candidates_expanded <- expand_candidates()

gsf <- left_join(delegates_map(), candidates_expanded, by = c("state", "idx"))

m <- delegates_map()

# split off each "area" on the map so we can make a border+background
list(
  setdiff(state.abb, c("HI", "AK")),
  "AK", "HI", "DC", "VI", "PR", "MP", "GU", "DA", "AS"
) %>% 
  map(~{
    suppressWarnings(suppressMessages(st_buffer(
      x = st_union(m[m$state %in% .x, ]),
      dist = 0.0001,
      endCapStyle = "SQUARE"
    )))
  }) -> m_borders

gg <- ggplot()
for (mb in m_borders) {
  gg <- gg + geom_sf(data = mb, col = "#2b2b2b", size = 0.125)
}

gg + 
  geom_sf(
    data = gsf,
    aes(fill = candidate),
    col = "white", shape = 22, size = 3, stroke = 0.125
  ) +
  scale_fill_manual(
    name = NULL,
    na.value = "#f0f0f0",
    values = c(
      "Biden" = '#f0027f',
      "Sanders" = '#7fc97f',
      "Warren" = '#beaed4',
      "Buttigieg" = '#fdc086',
      "Klobuchar" = '#ffff99',
      "Gabbard" = '#386cb0',
      "Bloomberg" = '#bf5b17'
    ),
    limits = intersect(unique(delegates$candidate), names(delegates_pal))
  ) +
  guides(
    fill = guide_legend(
      override.aes = list(size = 4)
    )
  ) +
  coord_sf(datum = NA) +
  theme_ipsum_es(grid="") +
  theme(legend.position = "bottom")

{ssdeepr}

Researcher pals over at Binary Edge added web page hashing (pre- and post-javascript scraping) to their platform using ssdeep. This approach is in the category of context triggered piecewise hashes (CTPH), a.k.a. locality-sensitive hashing, similar to my R adaptation/packaging of Trend Micro’s tlsh.

Since I’ll be working with BE’s data off-and-on and the ssdeep project has a well-crafted library (plus we might add ssdeep support at $DAYJOB), I went ahead and packaged that up as well.

I recommend using the hash_con() function if you need to read large blobs since it doesn’t require you to read everything into memory first (though hash_file() doesn’t either, but that’s a direct low-level call to the underlying ssdeep library file reader and is not as flexible as R connections are).
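
Both approaches should yield the same hash for the same content; a tiny sketch (the file name is a placeholder):

library(ssdeepr)

h_file <- hash_file("big-blob.bin")                   # low-level ssdeep file reader
h_conn <- hash_con(file("big-blob.bin", open = "rb")) # any R connection works here

hash_compare(h_file, h_conn) # identical content should compare as 100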

These types of hashes are great at seeing if something has changed on a website (or seeing how similar two things are to each other). For instance, how closely do CRAN mirrors match the mothership?

library(ssdeepr) # see the links above for installation

cran1 <- hash_con(url("https://cran.r-project.org/web/packages/available_packages_by_date.html"))
cran2 <- hash_con(url("https://cran.biotools.fr/web/packages/available_packages_by_date.html"))
cran3 <- hash_con(url("https://cran.rstudio.org/web/packages/available_packages_by_date.html"))

hash_compare(cran1, cran2)
## [1] 0

hash_compare(cran1, cran3)
## [1] 94

I picked on cran.biotools.fr as I saw it was well behind CRAN-proper on the monitoring page.

I noted that BE was doing pre- and post-javascript hashing as well. Why, you may ask? Well, websites behave differently with javascript running, plus they can behave differently when different user-agents are set. Let’s grab a page from Wikipedia a few different ways to show how the results are not alike at all, depending on the retrieval context. First, some web content!

library(httr)
library(ssdeepr)
library(splashr)

# regular grab
h1 <- hash_con(url("https://en.wikipedia.org/wiki/Donald_Knuth"))

# you need Splash running for javascript-enabled scraping this way
sp <- splash(host = "mysplashhost", user = "splashuser", pass = "splashpass")

# js-enabled with one ua
sp %>%
  splash_user_agent(ua_macos_chrome) %>%
  splash_go("https://en.wikipedia.org/wiki/Donald_Knuth") %>%
  splash_wait(2) %>%
  splash_html(raw_html = TRUE) -> js1

# js-enabled with another ua
sp %>%
  splash_user_agent(ua_ios_safari) %>%
  splash_go("https://en.wikipedia.org/wiki/Donald_Knuth") %>%
  splash_wait(2) %>%
  splash_html(raw_html = TRUE) -> js2

h2 <- hash_raw(js1)
h3 <- hash_raw(js2)

# same way {rvest} does it
res <- httr::GET("https://en.wikipedia.org/wiki/Donald_Knuth")

h4 <- hash_raw(content(res, as = "raw"))

Now, let’s compare them:

hash_compare(h1, h4) # {ssdeepr} built-in vs httr::GET() => not surprising that they're equal
## [1] 100

# things look way different with js-enabled

hash_compare(h1, h2)
## [1] 0
hash_compare(h1, h3)
## [1] 0

# and with variations between user-agents

hash_compare(h2, h3)
## [1] 0

hash_compare(h2, h4)
## [1] 0

# only doing this for completeness

hash_compare(h3, h4)
## [1] 0

For this example, just content size would have been enough to tell the difference (mostly; note how h1 and h4 hash as equal despite slightly more bytes coming back with the {httr} method):

length(js1)
## [1] 432914

length(js2)
## [1] 270538

nchar(
  paste0(
    readLines(url("https://en.wikipedia.org/wiki/Donald_Knuth")),
    collapse = "\n"
  )
)
## [1] 373078

length(content(res, as = "raw"))
## [1] 374099

FIN

If you were in a U.S. state with a primary yesterday and were eligible to vote (and had something to vote for, either a (D) candidate or a state/local bit of business) I sure hope you did!

The ssdeep library works on Windows, so I’ll be figuring out how to get that going in {ssdeepr} fairly soon (mostly to try out the Rtools 4.0 toolchain vs deliberately wanting to support legacy platforms).

As usual, drop issues/PRs/feature requests where you’re comfortable for any of these or other packages.