The in-dev htmlunit package, for javascript-“enabled” web-scraping without the need for Selenium, Splash or headless Chrome, relies on the HtmlUnit library. That library just released version 2.34.0 with a wide array of changes that should make it possible to scrape more gnarly javascript-“enabled” sites. The Chrome emulation is now also on par with Chrome 72… Continue reading
Post Category → R
drat All The 📦! : Enabling Easier Package Discovery and Installation with Your Own CRAN-like Repo for Your Packages
I’ve got a work-in-progress drat-ified CRAN-like repo for (eventually) all my packages over at CINC🔗 (“CINC is not CRAN” and it also sounds like “sync”). This is in parallel with a co-location/migration of all my packages to SourceHut (just waiting for the sr.ht alpha API to be baked) and a self-hosted public Gitea instance. Everything… Continue reading
Cloudy with a chance of Caffeinated Query Orchestration – New rJava Wrappers for AWS Athena SDK for Java
There are two fledgling rJava-based R packages that enable working with the AWS SDK for Athena: awsathena (GL | GH) and awsathenajars (GL | GH). They’re both needed to conform with the way CRAN likes rJava-based packages that also have large JAR dependencies to be submitted. The goal is to eventually have wrappers for anything R folks need… Continue reading
I Just Wanted The Data : Turning Tableau & Tidyverse Tears Into Smiles with Base R (An Encoding Detective Story)
Those outside the Colonies may not know that Payless (a national chain that made footwear affordable for millions of ‘Muricans who can’t spare $100.00 USD for a pair of shoes their 7-year-old will outgrow in a year) is closing. CNBC also had a story that featured a choropleth with a tiny button at the… Continue reading
In Dev: WiGLE Your Way Into A Hotspot with wiglr
WiGLE has been around a while and is a great site to explore the pervasiveness or sparsity of Wi-Fi (and cellular) networks around the globe. While interactive use is fun, WiGLE also has a free API (so long as you obey the EULA and aren’t abusive) that lets you explore a little deeper if you… Continue reading
Conquering Caffeinated Amazon Athena with the metis Trio of Packages
I must preface this post with the posit that if you’re doing anything interactive() with Amazon Athena you should seriously consider just using their free ODBC drivers as it’s the easiest way to wire them up to R DBI- and tidyverse-wise. I’ve said as much in previous posts. Drop a note in the comments if… Continue reading
Using the ropendata R Package to Access Petabytes of Free Internet Telemetry Data from Rapid7
I’ve got a post up over at $DAYJOB’s blog on using the ropendata🔗 package to access the ginormous and ever-increasing amount of internet telemetry (scan) data via the Rapid7 Open Data API. It’s super-R-code-heavy but renders surprisingly well in Ghost (the blogging platform we use at work) and covers everything from where to sign up… Continue reading
Quick Hit: Speeding Up a Slow/Mundane Task with a Little Rcpp
Over at $DAYJOB’s blog I’ve queued up a post that shows how to use our new ropendata package to work with our Open Data portal’s API. I’m not super-sure when it’s going to be posted so keep an RSS reader fixed on https://blog.rapid7.com/ if you’re interested in seeing it (I may make a small note… Continue reading