htmlunitjars Updated to 2.34.0

The in-dev htmlunit package, for javascript-“enabled” web-scraping without the need for Selenium, Splash or headless Chrome, relies on the HtmlUnit library. Said library just released version 2.34.0 with a wide array of changes that should make it possible to scrape more gnarly javascript-“enabled” sites. The Chrome emulation is now also on par with the Chrome 72 series (my Chrome beta is at 73.0.3683.56, so it’s very close to current).

In reality, the update was to the htmlunitjars package where the main project JAR and dependent JARs all received a refresh.

The README and tests were all re-run on both packages and Travis is happy.

If you’ve got a working rJava installation (aye, it’s 2019 and that’s still “a thing”) then you can just do:

install.packages(c("htmlunitjars", "htmlunit"), repos = "https://cinc.rud.is/")

to get them installed and start playing with the DSL or work directly with the Java classes.
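For the "work directly with the Java classes" route, here is a minimal sketch of what that can look like through rJava. It assumes that loading htmlunitjars puts the HtmlUnit JARs on the Java class path; the helper name `fetch_rendered_xml` and the `js_wait_ms` parameter are made up for illustration and are not part of either package.

```r
# Hypothetical helper (not part of htmlunit or htmlunitjars): fetch a page with
# HtmlUnit, let its javascript run, and return the rendered document as XML text.
fetch_rendered_xml <- function(url, js_wait_ms = 2000) {
  # Assumption: loading htmlunitjars adds the HtmlUnit JARs to the class path.
  stopifnot(
    requireNamespace("rJava", quietly = TRUE),
    requireNamespace("htmlunitjars", quietly = TRUE)
  )

  # HtmlUnit's headless browser object, created with its default browser version
  wc <- rJava::.jnew("com.gargoylesoftware.htmlunit.WebClient")
  on.exit(wc$close(), add = TRUE)

  # Gnarly sites often throw script errors; don't abort the fetch on them
  wc$getOptions()$setThrowExceptionOnScriptError(FALSE)

  page <- wc$getPage(url)

  # Give background javascript up to js_wait_ms milliseconds to finish
  wc$waitForBackgroundJavaScript(rJava::.jlong(js_wait_ms))

  page$asXml()
}
```

Something like `xml2::read_html(fetch_rendered_xml("https://example.com/"))` should then hand the rendered page to the usual rvest/xml2 tooling.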

FIN

As usual, use your preferred social coding site to log feature requests or problems.

