

{"id":11416,"date":"2018-08-23T17:32:53","date_gmt":"2018-08-23T22:32:53","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11416"},"modified":"2018-09-24T18:49:44","modified_gmt":"2018-09-24T23:49:44","slug":"introducing-gepetto-a-splash-like-rest-api-to-headless-chrome","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/","title":{"rendered":"Introducing &#8216;gepetto&#8217; \u2014 a Splash-like REST API to Headless Chrome"},"content":{"rendered":"<p>It&#8217;s been over a year since <a href=\"https:\/\/developers.google.com\/web\/updates\/2017\/04\/headless-chrome\">Headless Chrome<\/a> was introduced and it has matured greatly over that time and has acquired a pretty large user base. The TLDR on it is that you can now use Chrome as you would any command-line interface (CLI) program and generate PDFs, images or render javascript-interpreted HTML by supplying some simple parameters. It has a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Read%E2%80%93eval%E2%80%93print_loop\">REPL<\/a> mode for interactive work and can be instrumented through a custom websockets protocol.<\/p>\n<p>R folks have had the <a href=\"https:\/\/gitlab.com\/hrbrmstr\/decapitated\"><code>decapitated<\/code><\/a> package available almost since the launch day of Headless Chrome. It provides a basic wrapper to the CLI. 
The package has been updated more recently to enable the downloading of a custom Chromium binary to use instead of the system Chrome installation (which is a highly recommended practice).<\/p>\n<p>However, that nigh-mundane addition is not the only new feature in <code>decapitated<\/code>.<\/p>\n<h3>Introducing gepetto<\/h3>\n<p>While it would have been possible to create an R wrapper for the Headless Chrome websockets API, the reality is (and this is just my opinion) that it is better to integrate with a more robust and community-supported interface to Headless Chrome instrumentation dubbed <a href=\"https:\/\/github.com\/GoogleChrome\/puppeteer\"><code>puppeteer<\/code><\/a>. <code>Puppeteer<\/code> is a javascript module that adds high-level functions on top of the lower-level API and has a massive amount of functionality that can be easily tapped into.<\/p>\n<p>Now, Selenium <a href=\"https:\/\/developers.google.com\/web\/updates\/2017\/04\/headless-chrome#drivers\">works<\/a> <em>really well<\/em> with Headless Chrome and there&#8217;s little point in trying to reinvent that wheel. Rather, I wanted a way to interact with Headless Chrome the way one can with <a href=\"https:\/\/github.com\/scrapinghub\/splash\">ScrapingHub&#8217;s Splash<\/a> service. That is, a simple REST API. To that end, I&#8217;ve started a project called <a href=\"https:\/\/gitlab.com\/hrbrmstr\/gepetto\"><code>gepetto<\/code><\/a> which aims to do just that.<\/p>\n<p><code>Gepetto<\/code> is a Node.js application which uses <code>puppeteer<\/code> for all the hard work. After seeing that such a REST API interface was possible via the <a href=\"https:\/\/github.com\/cheeaun\/puppetron\"><code>puppetron<\/code> proof of concept<\/a> I set out to build a framework which will (eventually) provide the same feature set that Splash has, substituting <code>puppeteer<\/code>-fueled javascript for the Lua interface.<\/p>\n<p>A REST API has a number of advantages over repeated CLI calls. 
First, each CLI call means one more <code>system()<\/code> call to start up a new process. You also need to manage Chrome binaries in that mode and are fairly limited in what you can do. With a REST API, Chrome loads once and then pages can be created at will with no process startup overhead. Plus (once the API is finished) you&#8217;ll have far more control over what you can do. Again, this is not going to cover the same ground as Selenium, but should be of sufficient utility to add to your web-scraping toolbox.<\/p>\n<h3>Installing gepetto<\/h3>\n<p>There are instructions over at <a href=\"https:\/\/gitlab.com\/hrbrmstr\/gepetto\">the repo<\/a> on installing <code>gepetto<\/code> but R users can try a shortcut by grabbing the latest version of <code>decapitated<\/code> from Git[La|Hu]b and running <code>decapitated::install_gepetto()<\/code> which should (hopefully) go as smoothly as this provided you have a fairly recent version of Node.js installed along with npm:<\/p>\n<p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/rud.is\/gifs\/gepetto\/install-gepetto.gif?w=510&#038;ssl=1\"  ><\/p>\n<p>The installer provides some guidance should things go awry. You&#8217;ll notice <code>gepetto<\/code> installs a current version of Chromium for your platform along with it, which helps to ensure smoother sailing than using the version of Chrome you may use for browsing.<\/p>\n<h3>Working with gepetto<\/h3>\n<p>Before showing off the R interface, it&#8217;s worth a look at the (still minimal) web interface. Bring up a terminal\/command prompt and enter <code>gepetto<\/code>. You should see something like this:<\/p>\n<pre><code class=\"language-bash\">$ gepetto\n? Launch browser!\n? 
gepetto running on: http:\/\/localhost:3000<\/code><\/pre>\n<p>NOTE: You can use a different host\/port by setting the <code>HOST<\/code> and <code>PORT<\/code> environment variables accordingly before startup.<\/p>\n<p>You can then go to <a href=\"http:\/\/localhost:3000\">http:\/\/localhost:3000<\/a> in your browser and should see this:<\/p>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/gweb\/\" rel=\"attachment wp-att-11419\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11419\" data-permalink=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/gweb\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gweb.png?fit=2696%2C1576&amp;ssl=1\" data-orig-size=\"2696,1576\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"gweb\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gweb.png?fit=300%2C175&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gweb.png?fit=510%2C298&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gweb.png?resize=510%2C298&#038;ssl=1\" alt=\"\" width=\"510\" height=\"298\" class=\"aligncenter size-full wp-image-11419\" \/><\/a><\/p>\n<p>Enter a URL into the input field and press the buttons! 
You can do quite a bit just from the web interface.<\/p>\n<p>If you select &#8220;API Docs&#8221; (<a href=\"http:\/\/localhost:3000\/documentation\">http:\/\/localhost:3000\/documentation<\/a>) you&#8217;ll get the Swagger-gen&#8217;d API documentation for all the API endpoints:<\/p>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/gswag\/\" rel=\"attachment wp-att-11420\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11420\" data-permalink=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/gswag\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gswag.png?fit=2784%2C1664&amp;ssl=1\" data-orig-size=\"2784,1664\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"gswag\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gswag.png?fit=300%2C179&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gswag.png?fit=510%2C305&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/gswag.png?resize=510%2C305&#038;ssl=1\" alt=\"\" width=\"510\" height=\"305\" class=\"aligncenter size-full wp-image-11420\" \/><\/a><\/p>\n<p>The Swagger definition JSON is also at <a href=\"http:\/\/localhost:3000\/swagger.json\">http:\/\/localhost:3000\/swagger.json<\/a>.<\/p>\n<p>The API documentation will be a bit more robust as the module&#8217;s corners are 
rounded out.<\/p>\n<h3><em>&#8220;But, this is supposed to be an R post&hellip;&#8221;<\/em><\/h3>\n<p>Yes. Yes it is.<\/p>\n<p>If you followed along in the previous section and started <code>gepetto<\/code> from a command-line interface, kill the running service and fire up your favourite R environment and let&#8217;s scrape some content!<\/p>\n<pre><code class=\"language-r\">library(rvest)\nlibrary(decapitated)\nlibrary(tidyverse)\n\ngpid <- start_gepetto()\n\ngpid\n## PROCESS 'gepetto', running, pid 60827.\n\ngepetto() %>% \n  gep_active()\n## [1] TRUE<\/code><\/pre>\n<p>Anything other than a &#8220;running&#8221; response means there&#8217;s something wrong and you can use the various <code>processx<\/code> methods on that <code>gpid<\/code> object to inspect the error log. If you were able to run <code>gepetto<\/code> from the command line then it should be fine in R, too. The <code>gepetto()<\/code> function builds a connection object and <code>gep_active()<\/code> tests an API endpoint to ensure you can communicate with the server.<\/p>\n<p>Now, let&#8217;s try hitting a website that requires javascript. I&#8217;ll borrow an <a href=\"http:\/\/www.rladiesnyc.org\/post\/scraping-javascript-websites-in-r\/\">example from Brooke Watson<\/a>. 
The data for <a href=\"http:\/\/therapboard.com\/\">http:\/\/therapboard.com\/<\/a> loads via javascript and will not work with <code>xml2::read_html()<\/code>.<\/p>\n<pre><code class=\"language-r\">gepetto() %>% \n  gep_render_html(\"http:\/\/therapboard.com\/\") -> doc\n\nhtml_nodes(doc, xpath=\".\/\/source[contains(@src, 'mp3')]\") %>%  \n  html_attr(\"src\") %>% \n  head(10)\n## [1] \"audio\/2chainz_4.mp3\"        \"audio\/2chainz_yeah2.mp3\"   \n## [3] \"audio\/2chainz_tellem.mp3\"   \"audio\/2chainz_tru.mp3\"     \n## [5] \"audio\/2chainz_unh3.mp3\"     \"audio\/2chainz_watchout.mp3\"\n## [7] \"audio\/2chainz_whistle.mp3\"  \"audio\/2pac_4.mp3\"          \n## [9] \"audio\/2pac_5.mp3\"           \"audio\/2pac_6.mp3\"<\/code><\/pre>\n<p>Even with a Node.js and npm dependency, I think that&#8217;s a bit friendlier than interacting with <code>phantomjs<\/code>.<\/p>\n<p>We can render a screenshot of a site as well. Since we&#8217;re not stealing content this way, I&#8217;m going to cheat a bit and grab the New York Times front page:<\/p>\n<pre><code class=\"language-r\">gepetto() %>% \n  gep_render_magick(\"https:\/\/nytimes.com\/\")\n##   format width height colorspace matte filesize density\n## 1    PNG  1440   6828       sRGB  TRUE        0   72x72<\/code><\/pre>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/nyt\/\" rel=\"attachment wp-att-11422\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11422\" data-permalink=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/nyt\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/nyt.png?fit=1440%2C990&amp;ssl=1\" data-orig-size=\"1440,990\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"nyt\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/nyt.png?fit=300%2C206&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/nyt.png?fit=510%2C351&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/nyt.png?resize=510%2C351&#038;ssl=1\" alt=\"\" width=\"510\" height=\"351\" class=\"aligncenter size-full wp-image-11422\" \/><\/a><\/p>\n<p>Astute readers will notice it returns a <code>magick<\/code> object so you can work with it immediately.<\/p>\n<p>I&#8217;m still working out the interface for image capture and will also be supporting capturing the image of a CSS selector target. I mention that since the <code>gep_render_magick()<\/code> actually captured <em>the entire page<\/em> which you can <a href=\"https:\/\/rud.is\/dl\/nyt.png\">see for yourself<\/a> (the thumbnail doesn&#8217;t do it justice).<\/p>\n<p>Testing <code>gep_render_pdf()<\/code> is an exercise left to the reader.<\/p>\n<h3>FIN<\/h3>\n<p>The <code>gepetto<\/code> REST API is at version 0.1.0 meaning it&#8217;s new, raw and likely to change (quickly, too). Jump on board in whatever repo you&#8217;re more comfortable with and kick the tyres + file issues or PRs (on either or both projects) as you are wont to do.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s been over a year since Headless Chrome was introduced and it has matured greatly over that time and has acquired a pretty large user base. 
The TLDR on it is that you can now use Chrome as you would any command-line interface (CLI) program and generate PDFs, images or render javascript-interpreted HTML by supplying [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91,725],"tags":[],"class_list":["post-11416","post","type-post","status-publish","format-standard","hentry","category-r","category-web-scraping"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Introducing &#039;gepetto&#039; \u2014 a Splash-like REST API to Headless Chrome - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Introducing &#039;gepetto&#039; \u2014 a Splash-like REST API to Headless Chrome - rud.is\" \/>\n<meta property=\"og:description\" content=\"It&#8217;s been over a year since Headless Chrome was introduced and it has matured greatly over that time and has acquired a pretty large user base. 
The TLDR on it is that you can now use Chrome as you would any command-line interface (CLI) program and generate PDFs, images or render javascript-interpreted HTML by supplying [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-08-23T22:32:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-09-24T23:49:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Introducing &#8216;gepetto&#8217; \u2014 a Splash-like REST API to Headless 
Chrome\",\"datePublished\":\"2018-08-23T22:32:53+00:00\",\"dateModified\":\"2018-09-24T23:49:44+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/\"},\"wordCount\":978,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif\",\"articleSection\":[\"R\",\"web scraping\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/\",\"url\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/\",\"name\":\"Introducing 'gepetto' \u2014 a Splash-like REST API to Headless Chrome - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif\",\"datePublished\":\"2018-08-23T22:32:53+00:00\",\"dateModified\":\"2018-09-24T23:49:44+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage\",\"url\":\"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif\",\"contentUrl\":\"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Introducing &#8216;gepetto&#8217; \u2014 a Splash-like REST API to Headless Chrome\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Introducing 'gepetto' \u2014 a Splash-like REST API to Headless Chrome - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/","og_locale":"en_US","og_type":"article","og_title":"Introducing 'gepetto' \u2014 a Splash-like REST API to Headless Chrome - rud.is","og_description":"It&#8217;s been over a year since Headless Chrome was introduced and it has matured greatly over that time and has acquired a pretty large user base. The TLDR on it is that you can now use Chrome as you would any command-line interface (CLI) program and generate PDFs, images or render javascript-interpreted HTML by supplying [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/","og_site_name":"rud.is","article_published_time":"2018-08-23T22:32:53+00:00","article_modified_time":"2018-09-24T23:49:44+00:00","og_image":[{"url":"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Introducing &#8216;gepetto&#8217; \u2014 a Splash-like REST API to Headless Chrome","datePublished":"2018-08-23T22:32:53+00:00","dateModified":"2018-09-24T23:49:44+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/"},"wordCount":978,"commentCount":0,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif","articleSection":["R","web scraping"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/","url":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/","name":"Introducing 'gepetto' \u2014 a Splash-like REST API to Headless Chrome - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif","datePublished":"2018-08-23T22:32:53+00:00","dateModified":"2018-09-24T23:49:44+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#primaryimage","url":"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif","contentUrl":"https:\/\/rud.is\/gifs\/gepetto\/install-gepetto.gif"},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/08\/23\/introducing-gepetto-a-splash-like-rest-api-to-headless-chrome\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Introducing &#8216;gepetto&#8217; \u2014 a Splash-like REST API to Headless Chrome"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2Y8","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":12013,"url":"https:\/\/rud.is\/b\/2019\/02\/28\/htmlunitjars-updated-to-2-34-0\/","url_meta":{"origin":11416,"position":0},"title":"htmlunitjars Updated to 2.34.0","author":"hrbrmstr","date":"2019-02-28","format":false,"excerpt":"The in-dev htmlunit package for javascript-\"enabled\" web-scraping without the need for Selenium, Splash or headless Chrome relies on the HtmlUnit library and said library just released version 2.34.0 with a wide array of changes that should make it possible to scrape more gnarly javascript-\"enabled\" sites. The Chrome emulation is now\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":14475,"url":"https:\/\/rud.is\/b\/2023\/09\/30\/avoid-libwebp-electron-woes-on-macos-with-positron\/","url_meta":{"origin":11416,"position":1},"title":"Avoid libwebp Electron Woes On macOS With positron","author":"hrbrmstr","date":"2023-09-30","format":false,"excerpt":"If you've got ? on this blog (directly, or via syndication) you'd have to have been living under a rock to not know about the libwebp supply chain disaster. 
An unfortunate casualty of inept programming just happened to be any app in the Electron ecosystem that doesn't undergo bleeding-edge updates.\u2026","rel":"","context":"In &quot;Cybersecurity&quot;","block_context":{"text":"Cybersecurity","link":"https:\/\/rud.is\/b\/category\/cybersecurity\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/09\/resource-database-Ix86EQm6HDQ-unsplash.jpg?fit=960%2C1200&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/09\/resource-database-Ix86EQm6HDQ-unsplash.jpg?fit=960%2C1200&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/09\/resource-database-Ix86EQm6HDQ-unsplash.jpg?fit=960%2C1200&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/09\/resource-database-Ix86EQm6HDQ-unsplash.jpg?fit=960%2C1200&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":3605,"url":"https:\/\/rud.is\/b\/2015\/08\/07\/adding-a-cran-search-engine-to-chrome\/","url_meta":{"origin":11416,"position":2},"title":"Adding a CRAN Search Engine to Chrome","author":"hrbrmstr","date":"2015-08-07","format":false,"excerpt":"Riffing off of [the previous post](http:\/\/rud.is\/b\/2015\/08\/05\/speeding-up-your-quests-for-r-stuff\/), here's a way to quickly search CRAN (the @RStudio flavor) from the Chrome search bar. 
- Paste `chrome:\/\/settings\/searchEngines` into your location bar and hit return\/enter - Scroll down until the input boxes show, enabling you to add a search engine - For _\"Add a\u2026","rel":"","context":"In &quot;Chrome&quot;","block_context":{"text":"Chrome","link":"https:\/\/rud.is\/b\/category\/chrome\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":13498,"url":"https:\/\/rud.is\/b\/2022\/07\/10\/rust-cli-for-apples-weatherkit-rest-api\/","url_meta":{"origin":11416,"position":3},"title":"Rust CLI For Apple&#8217;s WeatherKit REST API","author":"hrbrmstr","date":"2022-07-10","format":false,"excerpt":"Apple is in the final stages of shuttering the DarkSky service\/API. They've replaced it with WeatherKit, which has both an xOS framework version as well as a REST API. To use either, you need to be a member of the Apple Developer Program (ADP) \u2014 $99.00\/USD per-year \u2014 and calls\u2026","rel":"","context":"In &quot;Apple&quot;","block_context":{"text":"Apple","link":"https:\/\/rud.is\/b\/category\/apple\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3599,"url":"https:\/\/rud.is\/b\/2015\/08\/05\/speeding-up-your-quests-for-r-stuff\/","url_meta":{"origin":11416,"position":4},"title":"Speeding Up Your Quest(s) For &#8220;R Stuff&#8221;","author":"hrbrmstr","date":"2015-08-05","format":false,"excerpt":"I use Google quite a bit when conjuring up R projects, whether it be in a lazy pursuit of a PDF vignette or to find a package or function to fit a niche need. 
Inevitably, I'll do something like [this](https:\/\/www.google.com\/#q=cran+shapefile) (yeah, I'm still on a mapping kick) and the first\u2026","rel":"","context":"In &quot;Browsers&quot;","block_context":{"text":"Browsers","link":"https:\/\/rud.is\/b\/category\/browsers\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":12718,"url":"https:\/\/rud.is\/b\/2020\/04\/01\/uaparserjs-updated-on-cran-using-webpack-to-make-v8-application-bundles\/","url_meta":{"origin":11416,"position":5},"title":"{uaparserjs} Updated on CRAN &#038; Using webpack to Make {V8}  Application Bundles","author":"hrbrmstr","date":"2020-04-01","format":false,"excerpt":"Just a quick note that thanks to a gentle nudge an updated version of {uaparser} --- a package that processes User Agent strings web clients send to servers --- is making its way to all the CRAN mirrors and is also available on CINC. The most significant change is a\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11416","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11416"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11416\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11416"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11416"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?po
st=11416"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}