

{"id":5031,"date":"2017-02-14T13:19:29","date_gmt":"2017-02-14T18:19:29","guid":{"rendered":"https:\/\/rud.is\/b\/?p=5031"},"modified":"2018-03-10T07:54:27","modified_gmt":"2018-03-10T12:54:27","slug":"spelunking-xhrs-xmlhttprequests-with-splashr","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","title":{"rendered":"Spelunking XHRs (XMLHttpRequests) with splashr"},"content":{"rendered":"<p><a href=\"https:\/\/github.com\/hrbrmstr\/splashr\"><code>splashr<\/code><\/a> has gained some new functionality since <a href=\"https:\/\/rud.is\/b\/2017\/02\/09\/diving-into-dynamic-website-content-with-splashr\/\">the introductory post<\/a>. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? The main request for it was to enable rendering of <code>htmlwidgets<\/code>:<\/p>\n<pre id=\"splash-widget-01\"><code class=\"language-r\">splash_vm &lt;- start_splash(add_tempdir=TRUE)\r\n\r\nDiagrammeR(&quot;\r\n  graph LR\r\n    A--&gt;B\r\n    A--&gt;C\r\n    C--&gt;E\r\n    B--&gt;D\r\n    C--&gt;D\r\n    D--&gt;F\r\n    E--&gt;F\r\n&quot;) %&gt;% \r\n  saveWidget(&quot;\/tmp\/diag.html&quot;)\r\n\r\nsplash(&quot;localhost&quot;) %&gt;% \r\n  render_file(&quot;\/tmp\/diag.html&quot;, output=&quot;html&quot;)\r\n## {xml_document}\r\n## &lt;html&gt;\r\n## [1] &lt;head&gt;\\n&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text\/html; charset=UTF-8&quot;&gt;\\n&lt;meta charset=&quot;utf-8&quot;&gt;\\n&lt;script src= ...\r\n## [2] &lt;body style=&quot;background-color: white; margin: 0px; padding: 40px;&quot;&gt;\\n&lt;div id=&quot;htmlwidget_container&quot;&gt;\\n&lt;div id=&quot;ht ...\r\n\r\nsplash(&quot;localhost&quot;) %&gt;% \r\n  render_file(&quot;\/tmp\/diag.html&quot;, output=&quot;png&quot;, wait=2)<\/code><\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" 
decoding=\"async\" data-attachment-id=\"5032\" data-permalink=\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/diag\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&amp;ssl=1\" data-orig-size=\"384,249\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"diag\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=300%2C195&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?resize=384%2C249&#038;ssl=1\" alt=\"\" width=\"384\" height=\"249\" class=\"aligncenter size-full wp-image-5032\" \/><\/a><\/p>\n<p>But if you use the new Docker image and the <code>add_tempdir=TRUE<\/code> parameter, it can render any local HTML file.<\/p>\n<p>The other new bits are helpers to identify content types in the HAR entries. 
Along with <code>get_content_type()<\/code>:<\/p>\n<pre id=\"splash-get-ctype\"><code class=\"language-r\">library(tidyverse)\r\n\r\nmap_chr(rud_har$log$entries, get_content_type)\r\n##  [1] &quot;text\/html&quot;                &quot;text\/html&quot;                &quot;application\/javascript&quot;   &quot;text\/css&quot;                \r\n##  [5] &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;text\/css&quot;                \r\n##  [9] &quot;text\/css&quot;                 &quot;application\/javascript&quot;   &quot;application\/javascript&quot;   &quot;application\/javascript&quot;  \r\n## [13] &quot;application\/javascript&quot;   &quot;application\/javascript&quot;   &quot;application\/javascript&quot;   &quot;text\/javascript&quot;         \r\n## [17] &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;application\/x-javascript&quot; &quot;application\/x-javascript&quot;\r\n## [21] &quot;application\/x-javascript&quot; &quot;application\/x-javascript&quot; &quot;application\/x-javascript&quot; NA                        \r\n## [25] &quot;text\/css&quot;                 &quot;image\/png&quot;                &quot;image\/png&quot;                &quot;image\/png&quot;               \r\n## [29] &quot;font\/ttf&quot;                 &quot;font\/ttf&quot;                 &quot;text\/html&quot;                &quot;font\/ttf&quot;                \r\n## [33] &quot;font\/ttf&quot;                 &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;image\/svg+xml&quot;           \r\n## [37] &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;image\/gif&quot;                &quot;image\/svg+xml&quot;           \r\n## [41] &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;   \r\n## [45] 
&quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;   \r\n## [49] &quot;text\/css&quot;                 &quot;application\/x-javascript&quot; &quot;image\/gif&quot;                NA                        \r\n## [53] &quot;image\/jpeg&quot;               &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;           \r\n## [57] &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;            &quot;image\/gif&quot;               \r\n## [61] NA                         &quot;application\/x-javascript&quot; NA                         NA<\/code><\/pre>\n<p>there are many <code>is_...()<\/code> functions for logical tests.<\/p>\n<p>One of the more interesting <code>is_...()<\/code> functions is <code>is_xhr()<\/code>. Sites with dynamic content <em>usually<\/em> load said content via an <code>XMLHttpRequest<\/code>, or XHR for short. Modern web apps <em>usually<\/em> return JSON in said requests and, for questions like <a href=\"https:\/\/stackoverflow.com\/questions\/41435049\/scraping-links-from-webpage-with-javascript-in-r\">this one on StackOverflow<\/a>, it&#8217;s <em>usually<\/em> better to grab the JSON and use it for data than it is to scrape the table made from JavaScript calls.<\/p>\n<p>Now, it&#8217;s not <em>too hard<\/em> to open Developer Tools and find those XHRs, but we can also use <code>splashr<\/code> to programmatically find them. We have to do a bit more work and use the new <code>execute_lua()<\/code> function since we need to give the page time to load up all the data. (I&#8217;ll eventually write a mini-R-DSL around this idiom so you don&#8217;t have to grok Lua for non-complex scraping tasks.) 
Here&#8217;s how we&#8217;d answer that StackOverflow question today\u2026<\/p>\n<p>First, we grab the entire HAR contents (including bodies of the individual requests) after waiting a bit:<\/p>\n<pre id=\"splash-ex-har-1\"><code class=\"language-r\">splash_local %&gt;%\r\n  execute_lua(&#039;\r\nfunction main(splash)\r\n  splash.response_body_enabled = true\r\n  splash:go(&quot;http:\/\/www.childrenshospital.org\/directory?state=%7B%22showLandingContent%22%3Afalse%2C%22model%22%3A%7B%22search_specialist%22%3Afalse%2C%22search_type%22%3A%5B%22directoryphysician%22%2C%22directorynurse%22%5D%7D%2C%22customModel%22%3A%7B%22nurses%22%3Atrue%7D%7D&quot;)\r\n  splash:wait(2)\r\n  return splash:har()\r\nend\r\n&#039;) -&gt; res\r\n\r\npg &lt;- as_har(res)<\/code><\/pre>\n<p>then we look for XHRs:<\/p>\n<pre id=\"splash-ex-har-2\"><code class=\"language-r\">map_lgl(pg$log$entries, is_xhr) %&gt;% which()\r\n## 10<\/code><\/pre>\n<p>and, finally, we grab the JSON:<\/p>\n<pre id=\"splash-ex-har-3\"><code class=\"language-r\">pg$log$entries[[10]]$response$content$text %&gt;% \r\n  openssl::base64_decode() %&gt;% \r\n  rawToChar() %&gt;% \r\n  jsonlite::fromJSON() %&gt;% \r\n  glimpse()\r\n## List of 4\r\n##  $ TotalPages  : int 16\r\n##  $ TotalRecords: int 384\r\n##  $ Records     :&#039;data.frame&#039;: 24 obs. 
of  21 variables:\r\n##   ..$ ID            : chr [1:24] &quot;{5E4B0D96-18D3-4FC6-B1AA-345675F3765C}&quot; &quot;{674EEC8B-062A-4268-9467-5C61030B83C9}&quot; ## &quot;{3E6257FE-67A1-4F13-B377-9EA7CCBD50F2}&quot; &quot;{C28479E6-5458-4010-A005-84E5F35B2FEA}&quot; ...\r\n##   ..$ FirstName     : chr [1:24] &quot;Mirna&quot; &quot;Barbara&quot; &quot;Donald&quot; &quot;Victoria&quot; ...\r\n##   ..$ LastName      : chr [1:24] &quot;Aeschlimann&quot; &quot;Angus&quot; &quot;Annino&quot; &quot;Arthur&quot; ...\r\n##   ..$ Image         : chr [1:24] &quot;&quot; &quot;\/~\/media\/directory\/physicians\/ppoc\/angus_barbara.ashx&quot; &quot;\/~\/media\/directory\/physicians\/ppoc\/## annino_donald.ashx&quot; &quot;\/~\/media\/directory\/physicians\/ppoc\/arthur_victoria.ashx&quot; ...\r\n##   ..$ Suffix        : chr [1:24] &quot;MD&quot; &quot;MD&quot; &quot;MD&quot; &quot;MD&quot; ...\r\n##   ..$ Url           : chr [1:24] &quot;http:\/\/www.childrenshospital.org\/doctors\/mirna-aeschlimann&quot; &quot;http:\/\/www.childrenshospital.org\/doctors\/## barbara-angus&quot; &quot;http:\/\/www.childrenshospital.org\/doctors\/donald-annino&quot; &quot;http:\/\/www.childrenshospital.org\/doctors\/victoria-arthur&quot; ...\r\n##   ..$ Gender        : chr [1:24] &quot;female&quot; &quot;female&quot; &quot;male&quot; &quot;female&quot; ...\r\n##   ..$ Latitude      : chr [1:24] &quot;42.468769&quot; &quot;42.235088&quot; &quot;42.463177&quot; &quot;42.447168&quot; ...\r\n##   ..$ Longitude     : chr [1:24] &quot;-71.100558&quot; &quot;-71.016021&quot; &quot;-71.143169&quot; &quot;-71.229734&quot; ...\r\n##   ..$ Address       : chr [1:24] &quot;{&quot;practice_name&quot;:&quot;Pediatrics, Inc.&quot;, &quot;address_1&quot;:&quot;577 Main ## Street&quot;, &quot;city&quot;:&amp;q&quot;| __truncated__ &quot;{&quot;practice_name&quot;:&quot;Crown Colony Pediatrics&quot;, ## &quot;address_1&quot;:&quot;500 Congress Street, Suite 1F&quot;&quot;| __truncated__ 
&quot;{&quot;practice_name&quot;:&quot;Pediatricians ## Inc.&quot;, &quot;address_1&quot;:&quot;955 Main Street&quot;, &quot;city&quot;:&quot;| __truncated__ ## &quot;{&quot;practice_name&quot;:&quot;Lexington Pediatrics&quot;, &quot;address_1&quot;:&quot;19 Muzzey Street, Suite 105&quot;, &amp;qu&quot;| ## __truncated__ ...\r\n##   ..$ Distance      : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ OtherLocations: chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ AcademicTitle : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;Clinical Instructor of Pediatrics - Harvard Medical School&quot; ...\r\n##   ..$ HospitalTitle : chr [1:24] &quot;Pediatrician&quot; &quot;Pediatrician&quot; &quot;Pediatrician&quot; &quot;Pediatrician&quot; ...\r\n##   ..$ Specialties   : chr [1:24] &quot;Primary Care, Pediatrics, General Pediatrics&quot; &quot;Primary Care, Pediatrics, General Pediatrics&quot; &quot;General ## Pediatrics, Pediatrics, Primary Care&quot; &quot;Primary Care, Pediatrics, General Pediatrics&quot; ...\r\n##   ..$ Departments   : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ Languages     : chr [1:24] &quot;English&quot; &quot;English&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ PPOCLink      : chr [1:24] &quot;http:\/\/www.childrenshospital.org\/patient-resources\/provider-glossary&quot; &quot;\/patient-resources\/## provider-glossary&quot; &quot;http:\/\/www.childrenshospital.org\/patient-resources\/provider-glossary&quot; &quot;http:\/\/www.childrenshospital.org\/## patient-resources\/provider-glossary&quot; ...\r\n##   ..$ Gallery       : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ Phone         : chr [1:24] &quot;781-438-7330&quot; &quot;617-471-3411&quot; &quot;781-729-4262&quot; &quot;781-862-4110&quot; ...\r\n##   ..$ Fax           : chr [1:24] &quot;781-279-4046&quot; &quot;(617) 471-3584&quot; &quot;&quot; &quot;(781) 
863-2007&quot; ...\r\n##  $ Synonims    : list()<\/code><\/pre>\n<p><strong>UPDATE<\/strong> So, I wrote a mini-DSL for this:<\/p>\n<pre id=\"splash-dsl-1\"><code class=\"language-r\">splash_local %&gt;%\r\n  splash_response_body(TRUE) %&gt;% \r\n  splash_go(&quot;http:\/\/www.childrenshospital.org\/directory?state=%7B%22showLandingContent%22%3Afalse%2C%22model%22%3A%7B%22search_specialist%22%3Afalse%2C%22search_type%22%3A%5B%22directoryphysician%22%2C%22directorynurse%22%5D%7D%2C%22customModel%22%3A%7B%22nurses%22%3Atrue%7D%7D&quot;) %&gt;% \r\n  splash_wait(2) %&gt;% \r\n  splash_har() -&gt; res<\/code><\/pre>\n<p>which should make it easier to perform basic &#8220;go-wait-retrieve&#8221; operations.<\/p>\n<p>It&#8217;s unlikely we want to rely on a running Splash instance for our production work, so I&#8217;ll be making a helper function to turn HAR XHR requests into <code>httr<\/code> function calls, similar to the way <a href=\"https:\/\/github.com\/hrbrmstr\/curlconverter\"><code>curlconverter<\/code><\/a> works.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>splashr has gained some new functionality since the introductory post. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? 
The main request for it was to enable rendering of htmlwidgets: But if you use the new Docker image and the add_tempdir=TRUE parameter it can render any local HTML [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91,725],"tags":[810],"class_list":["post-5031","post","type-post","status-publish","format-standard","hentry","category-r","category-web-scraping","tag-post"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spelunking XHRs (XMLHttpRequests) with splashr - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spelunking XHRs (XMLHttpRequests) with splashr - rud.is\" \/>\n<meta property=\"og:description\" content=\"splashr has gained some new functionality since the introductory post. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? 
The main request for it was to enable rendering of htmlwidgets: But if you use the new Docker image and the add_tempdir=TRUE parameter it can render any local HTML [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2017-02-14T18:19:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-03-10T12:54:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Spelunking XHRs (XMLHttpRequests) with 
splashr\",\"datePublished\":\"2017-02-14T18:19:29+00:00\",\"dateModified\":\"2018-03-10T12:54:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\"},\"wordCount\":341,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png\",\"keywords\":[\"post\"],\"articleSection\":[\"R\",\"web scraping\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\",\"url\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\",\"name\":\"Spelunking XHRs (XMLHttpRequests) with splashr - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png\",\"datePublished\":\"2017-02-14T18:19:29+00:00\",\"dateModified\":\"2018-03-10T12:54:27+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&ssl=1\",\"width\":384,\"height\":249},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Spelunking XHRs (XMLHttpRequests) with splashr\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spelunking XHRs (XMLHttpRequests) with splashr - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","og_locale":"en_US","og_type":"article","og_title":"Spelunking XHRs (XMLHttpRequests) with splashr - rud.is","og_description":"splashr has gained some new functionality since the introductory post. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? The main request for it was to enable rendering of htmlwidgets: But if you use the new Docker image and the add_tempdir=TRUE parameter it can render any local HTML [&hellip;]","og_url":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","og_site_name":"rud.is","article_published_time":"2017-02-14T18:19:29+00:00","article_modified_time":"2018-03-10T12:54:27+00:00","og_image":[{"url":"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Spelunking XHRs (XMLHttpRequests) with splashr","datePublished":"2017-02-14T18:19:29+00:00","dateModified":"2018-03-10T12:54:27+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/"},"wordCount":341,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png","keywords":["post"],"articleSection":["R","web scraping"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","url":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","name":"Spelunking XHRs (XMLHttpRequests) with splashr - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png","datePublished":"2017-02-14T18:19:29+00:00","dateModified":"2018-03-10T12:54:27+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&ssl=1","width":384,"height":249},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Spelunking XHRs (XMLHttpRequests) with splashr"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1j9","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":5004,"url":"https:\/\/rud.is\/b\/2017\/02\/09\/diving-into-dynamic-website-content-with-splashr\/","url_meta":{"origin":5031,"position":0},"title":"Diving Into Dynamic Website Content with splashr","author":"hrbrmstr","date":"2017-02-09","format":false,"excerpt":"If you do enough web scraping, you'll eventually hit a wall that the trusty httr verbs (that sit beneath rvest) cannot really overcome: dynamically created content (via javascript) on a site. 
If the site was nice enough to use XHR requests to load the dynamic content, you can generally still\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11383,"url":"https:\/\/rud.is\/b\/2018\/08\/13\/in-brief-splashr-update-high-performance-scraping-with-splashr-furrr-teamhg-memexs-aquarium\/","url_meta":{"origin":5031,"position":1},"title":"In-brief: splashr update + High Performance Scraping with splashr, furrr &#038; TeamHG-Memex&#8217;s Aquarium","author":"hrbrmstr","date":"2018-08-13","format":false,"excerpt":"The development version of splashr now support authenticated connections to Splash API instances. Just specify user and pass on the initial splashr::splash() call to use your scraping setup a bit more safely. For those not familiar with splashr and\/or Splash: the latter is a lightweight alternative to tools like Selenium\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11765,"url":"https:\/\/rud.is\/b\/2019\/01\/14\/splashr-0-6-0-now-uses-the-cran-nascent-stevedore-package-for-docker-orchestration\/","url_meta":{"origin":5031,"position":2},"title":"splashr 0.6.0 Now Uses the CRAN-nascent stevedore Package for Docker Orchestration","author":"hrbrmstr","date":"2019-01-14","format":false,"excerpt":"The splashr package [srht|GL|GH] \u2014 an alternative to Selenium for javascript-enabled\/browser-emulated web scraping \u2014 is now at version 0.6.0 (still in dev-mode but on its way to CRAN in the next 14 days). 
The major change from version 0.5.x (which never made it to CRAN) is a swap out of\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6206,"url":"https:\/\/rud.is\/b\/2017\/08\/29\/new-cran-package-announcement-splashr\/","url_meta":{"origin":5031,"position":3},"title":"New CRAN Package Announcement: splashr","author":"hrbrmstr","date":"2017-08-29","format":false,"excerpt":"I'm pleased to announce that splashr is now on CRAN. (That image was generated with splashr::render_png(url = \"https:\/\/cran.r-project.org\/web\/packages\/splashr\/\")). The package is an R interface to the Splash javascript rendering service. It works in a similar fashion to Selenium but is fear more geared to web scraping and has quite a\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":9496,"url":"https:\/\/rud.is\/b\/2018\/04\/08\/dissecting-r-package-utility-belts\/","url_meta":{"origin":5031,"position":4},"title":"Dissecting R Package &#8220;Utility Belts&#8221;","author":"hrbrmstr","date":"2018-04-08","format":false,"excerpt":"Many R package authors (including myself) lump a collection of small, useful functions into some 
type of utils.R file and usually do not export the functions since they are (generally) designed to work on package internals rather than expose their functionality via the exported package API. Just like Batman's utility\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/r-utility-belt-final.png?fit=891%2C375&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/r-utility-belt-final.png?fit=891%2C375&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/r-utility-belt-final.png?fit=891%2C375&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/r-utility-belt-final.png?fit=891%2C375&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":6385,"url":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/","url_meta":{"origin":5031,"position":5},"title":"Pirating Web Content Responsibly With R","author":"hrbrmstr","date":"2017-09-19","format":false,"excerpt":"International Code Talk Like A Pirate Day almost slipped by without me noticing (September has been a crazy busy month), but it popped up in the calendar notifications today and I was glad that I had prepped the meat of a post a few weeks back. 
There will be no\u2026","rel":"","context":"In &quot;data wrangling&quot;","block_context":{"text":"data wrangling","link":"https:\/\/rud.is\/b\/category\/data-wrangling\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=1050%2C600 3x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/5031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=5031"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/5031\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=5031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=5031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=5031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}