htmlunitjars Updated to 2.34.0

The in-dev htmlunit package for javascript-“enabled” web-scraping without the need for Selenium, Splash or headless Chrome relies on the HtmlUnit library and said library just released version 2.34.0 with a wide array of changes that should make it possible to scrape more gnarly javascript-“enabled” sites. The Chrome emulation is now also on-par with Chrome 72… Continue reading

drat All The 📦! : Enabling Easier Package Discovery and Installation with Your Own CRAN-like Repo for Your Packages

I’ve got a work-in-progress drat-ified CRAN-like repo for (eventually) all my packages over at CINC🔗 (“CINC is not CRAN” and it also sounds like “sync”). This is in parallel with a co-location/migration of all my packages to SourceHut (just waiting for the sr.ht alpha API to be baked) and a self-hosted public Gitea instance. Everything… Continue reading

Cloudy with a chance of Caffeinated Query Orchestration – New rJava Wrappers for AWS Athena SDK for Java

There are two fledgling rJava-based R packages that enable working with the AWS SDK for Athena: awsathena | GL| GH awsathenajars | GL| GH They’re both needed to conform with the way CRAN like rJava-based packages submitted that also have large JAR dependencies. The goal is to eventually have wrappers for anything R folks need… Continue reading

I Just Wanted The Data : Turning Tableau & Tidyverse Tears Into Smiles with Base R (An Encoding Detective Story)

Those outside the Colonies may not know that Payless—a national chain that made footwear affordable for millions of ‘Muricans who can’t spare $100.00 USD for a pair of shoes their 7 year old will outgrow in a year— is closing. CNBC also had a story that featured a choropleth with a tiny button at the… Continue reading

Using the ropendata R Package to Access Petabytes of Free Internet Telemetry Data from Rapid7

I’ve got a post up over at $DAYJOB’s blog on using the ropendata🔗 package to access the ginormous and ever-increasing amount of internet telemetry (scan) data via the Rapid7 Open Data API. It’s super-R-code-heavy but renders surprisingly well in Ghost (the blogging platform we use at work) and covers everything from where to sign up… Continue reading