

{"id":5031,"date":"2017-02-14T13:19:29","date_gmt":"2017-02-14T18:19:29","guid":{"rendered":"https:\/\/rud.is\/b\/?p=5031"},"modified":"2018-03-10T07:54:27","modified_gmt":"2018-03-10T12:54:27","slug":"spelunking-xhrs-xmlhttprequests-with-splashr","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","title":{"rendered":"Spelunking XHRs (XMLHttpRequests) with splashr"},"content":{"rendered":"<p><a href=\"https:\/\/github.com\/hrbrmstr\/splashr\"><code>splashr<\/code><\/a> has gained some new functionality since <a href=\"https:\/\/rud.is\/b\/2017\/02\/09\/diving-into-dynamic-website-content-with-splashr\/\">the introductory post<\/a>. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? The main request for it was to enable rendering of <code>htmlwidgets<\/code>:<\/p>\n<pre id=\"splash-widget-01\"><code class=\"language-r\">splash_vm &lt;- start_splash(add_tempdir=TRUE)\r\n\r\nDiagrammeR(&quot;\r\n  graph LR\r\n    A--&gt;B\r\n    A--&gt;C\r\n    C--&gt;E\r\n    B--&gt;D\r\n    C--&gt;D\r\n    D--&gt;F\r\n    E--&gt;F\r\n&quot;) %&gt;% \r\n  saveWidget(&quot;\/tmp\/diag.html&quot;)\r\n\r\nsplash(&quot;localhost&quot;) %&gt;% \r\n  render_file(&quot;\/tmp\/diag.html&quot;, output=&quot;html&quot;)\r\n## {xml_document}\r\n## &lt;html&gt;\r\n## [1] &lt;head&gt;\\n&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text\/html; charset=UTF-8&quot;&gt;\\n&lt;meta charset=&quot;utf-8&quot;&gt;\\n&lt;script src= ...\r\n## [2] &lt;body style=&quot;background-color: white; margin: 0px; padding: 40px;&quot;&gt;\\n&lt;div id=&quot;htmlwidget_container&quot;&gt;\\n&lt;div id=&quot;ht ...\r\n\r\nsplash(&quot;localhost&quot;) %&gt;% \r\n  render_file(&quot;\/tmp\/diag.html&quot;, output=&quot;png&quot;, wait=2)<\/code><\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" 
decoding=\"async\" data-attachment-id=\"5032\" data-permalink=\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/diag\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&amp;ssl=1\" data-orig-size=\"384,249\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"diag\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?resize=384%2C249&#038;ssl=1\" alt=\"\" width=\"384\" height=\"249\" class=\"aligncenter size-full wp-image-5032\" \/><\/a><\/p>\n<p>But if you use the new Docker image and the <code>add_tempdir=TRUE<\/code> parameter, it can render any local HTML file.<\/p>\n<p>The other new bits are helpers to identify the content types of HAR entries. 
Along with <code>get_content_type()<\/code>:<\/p>\n<pre id=\"splash-get-ctype\"><code class=\"language-r\">library(tidyverse)\r\n\r\nmap_chr(rud_har$log$entries, get_content_type)\r\n##  [1] &quot;text\/html&quot;                &quot;text\/html&quot;                &quot;application\/javascript&quot;   &quot;text\/css&quot;                \r\n##  [5] &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;text\/css&quot;                \r\n##  [9] &quot;text\/css&quot;                 &quot;application\/javascript&quot;   &quot;application\/javascript&quot;   &quot;application\/javascript&quot;  \r\n## [13] &quot;application\/javascript&quot;   &quot;application\/javascript&quot;   &quot;application\/javascript&quot;   &quot;text\/javascript&quot;         \r\n## [17] &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;application\/x-javascript&quot; &quot;application\/x-javascript&quot;\r\n## [21] &quot;application\/x-javascript&quot; &quot;application\/x-javascript&quot; &quot;application\/x-javascript&quot; NA                        \r\n## [25] &quot;text\/css&quot;                 &quot;image\/png&quot;                &quot;image\/png&quot;                &quot;image\/png&quot;               \r\n## [29] &quot;font\/ttf&quot;                 &quot;font\/ttf&quot;                 &quot;text\/html&quot;                &quot;font\/ttf&quot;                \r\n## [33] &quot;font\/ttf&quot;                 &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;image\/svg+xml&quot;           \r\n## [37] &quot;text\/css&quot;                 &quot;text\/css&quot;                 &quot;image\/gif&quot;                &quot;image\/svg+xml&quot;           \r\n## [41] &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;   \r\n## [45] 
&quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;    &quot;application\/font-woff&quot;   \r\n## [49] &quot;text\/css&quot;                 &quot;application\/x-javascript&quot; &quot;image\/gif&quot;                NA                        \r\n## [53] &quot;image\/jpeg&quot;               &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;           \r\n## [57] &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;            &quot;image\/svg+xml&quot;            &quot;image\/gif&quot;               \r\n## [61] NA                         &quot;application\/x-javascript&quot; NA                         NA<\/code><\/pre>\n<p>there are many <code>is_...()<\/code> functions for logical tests.<\/p>\n<p>One of the more interesting <code>is_...()<\/code> functions is <code>is_xhr()<\/code>. Sites with dynamic content <em>usually<\/em> load said content via an <code>XMLHttpRequest<\/code>, or XHR for short. Modern web apps <em>usually<\/em> return JSON in said requests and, for questions like <a href=\"https:\/\/stackoverflow.com\/questions\/41435049\/scraping-links-from-webpage-with-javascript-in-r\">this one on StackOverflow<\/a>, it&#8217;s <em>usually<\/em> better to grab the JSON and use it for data than it is to scrape the table made from JavaScript calls.<\/p>\n<p>Now, it&#8217;s not <em>too hard<\/em> to open Developer Tools and find those XHR requests, but we can also use <code>splashr<\/code> to programmatically find them. We have to do a bit more work and use the new <code>execute_lua()<\/code> function since we need to give the page time to load up all the data. (I&#8217;ll eventually write a mini-R-DSL around this idiom so you don&#8217;t have to grok Lua for non-complex scraping tasks.) 
Here&#8217;s how we&#8217;d answer that StackOverflow question today\u2026<\/p>\n<p>First, we grab the entire HAR contents (including bodies of the individual requests) after waiting a bit:<\/p>\n<pre id=\"splash-ex-har-1\"><code class=\"language-r\">splash_local %&gt;%\r\n  execute_lua(&#039;\r\nfunction main(splash)\r\n  splash.response_body_enabled = true\r\n  splash:go(&quot;http:\/\/www.childrenshospital.org\/directory?state=%7B%22showLandingContent%22%3Afalse%2C%22model%22%3A%7B%22search_specialist%22%3Afalse%2C%22search_type%22%3A%5B%22directoryphysician%22%2C%22directorynurse%22%5D%7D%2C%22customModel%22%3A%7B%22nurses%22%3Atrue%7D%7D&quot;)\r\n  splash:wait(2)\r\n  return splash:har()\r\nend\r\n&#039;) -&gt; res\r\n\r\npg &lt;- as_har(res)<\/code><\/pre>\n<p>then we look for XHRs:<\/p>\n<pre id=\"splash-ex-har-2\"><code class=\"language-r\">map_lgl(pg$log$entries, is_xhr) %&gt;% which()\r\n## 10<\/code><\/pre>\n<p>and, finally, we grab the JSON:<\/p>\n<pre id=\"splash-ex-har-3\"><code class=\"language-r\">pg$log$entries[[10]]$response$content$text %&gt;% \r\n  openssl::base64_decode() %&gt;% \r\n  rawToChar() %&gt;% \r\n  jsonlite::fromJSON() %&gt;% \r\n  glimpse()\r\n## List of 4\r\n##  $ TotalPages  : int 16\r\n##  $ TotalRecords: int 384\r\n##  $ Records     :&#039;data.frame&#039;: 24 obs. 
of  21 variables:\r\n##   ..$ ID            : chr [1:24] &quot;{5E4B0D96-18D3-4FC6-B1AA-345675F3765C}&quot; &quot;{674EEC8B-062A-4268-9467-5C61030B83C9}&quot; ## &quot;{3E6257FE-67A1-4F13-B377-9EA7CCBD50F2}&quot; &quot;{C28479E6-5458-4010-A005-84E5F35B2FEA}&quot; ...\r\n##   ..$ FirstName     : chr [1:24] &quot;Mirna&quot; &quot;Barbara&quot; &quot;Donald&quot; &quot;Victoria&quot; ...\r\n##   ..$ LastName      : chr [1:24] &quot;Aeschlimann&quot; &quot;Angus&quot; &quot;Annino&quot; &quot;Arthur&quot; ...\r\n##   ..$ Image         : chr [1:24] &quot;&quot; &quot;\/~\/media\/directory\/physicians\/ppoc\/angus_barbara.ashx&quot; &quot;\/~\/media\/directory\/physicians\/ppoc\/## annino_donald.ashx&quot; &quot;\/~\/media\/directory\/physicians\/ppoc\/arthur_victoria.ashx&quot; ...\r\n##   ..$ Suffix        : chr [1:24] &quot;MD&quot; &quot;MD&quot; &quot;MD&quot; &quot;MD&quot; ...\r\n##   ..$ Url           : chr [1:24] &quot;http:\/\/www.childrenshospital.org\/doctors\/mirna-aeschlimann&quot; &quot;http:\/\/www.childrenshospital.org\/doctors\/## barbara-angus&quot; &quot;http:\/\/www.childrenshospital.org\/doctors\/donald-annino&quot; &quot;http:\/\/www.childrenshospital.org\/doctors\/victoria-arthur&quot; ...\r\n##   ..$ Gender        : chr [1:24] &quot;female&quot; &quot;female&quot; &quot;male&quot; &quot;female&quot; ...\r\n##   ..$ Latitude      : chr [1:24] &quot;42.468769&quot; &quot;42.235088&quot; &quot;42.463177&quot; &quot;42.447168&quot; ...\r\n##   ..$ Longitude     : chr [1:24] &quot;-71.100558&quot; &quot;-71.016021&quot; &quot;-71.143169&quot; &quot;-71.229734&quot; ...\r\n##   ..$ Address       : chr [1:24] &quot;{&quot;practice_name&quot;:&quot;Pediatrics, Inc.&quot;, &quot;address_1&quot;:&quot;577 Main ## Street&quot;, &quot;city&quot;:&amp;q&quot;| __truncated__ &quot;{&quot;practice_name&quot;:&quot;Crown Colony Pediatrics&quot;, ## &quot;address_1&quot;:&quot;500 Congress Street, Suite 1F&quot;&quot;| __truncated__ 
&quot;{&quot;practice_name&quot;:&quot;Pediatricians ## Inc.&quot;, &quot;address_1&quot;:&quot;955 Main Street&quot;, &quot;city&quot;:&quot;| __truncated__ ## &quot;{&quot;practice_name&quot;:&quot;Lexington Pediatrics&quot;, &quot;address_1&quot;:&quot;19 Muzzey Street, Suite 105&quot;, &amp;qu&quot;| ## __truncated__ ...\r\n##   ..$ Distance      : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ OtherLocations: chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ AcademicTitle : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;Clinical Instructor of Pediatrics - Harvard Medical School&quot; ...\r\n##   ..$ HospitalTitle : chr [1:24] &quot;Pediatrician&quot; &quot;Pediatrician&quot; &quot;Pediatrician&quot; &quot;Pediatrician&quot; ...\r\n##   ..$ Specialties   : chr [1:24] &quot;Primary Care, Pediatrics, General Pediatrics&quot; &quot;Primary Care, Pediatrics, General Pediatrics&quot; &quot;General ## Pediatrics, Pediatrics, Primary Care&quot; &quot;Primary Care, Pediatrics, General Pediatrics&quot; ...\r\n##   ..$ Departments   : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ Languages     : chr [1:24] &quot;English&quot; &quot;English&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ PPOCLink      : chr [1:24] &quot;http:\/\/www.childrenshospital.org\/patient-resources\/provider-glossary&quot; &quot;\/patient-resources\/## provider-glossary&quot; &quot;http:\/\/www.childrenshospital.org\/patient-resources\/provider-glossary&quot; &quot;http:\/\/www.childrenshospital.org\/## patient-resources\/provider-glossary&quot; ...\r\n##   ..$ Gallery       : chr [1:24] &quot;&quot; &quot;&quot; &quot;&quot; &quot;&quot; ...\r\n##   ..$ Phone         : chr [1:24] &quot;781-438-7330&quot; &quot;617-471-3411&quot; &quot;781-729-4262&quot; &quot;781-862-4110&quot; ...\r\n##   ..$ Fax           : chr [1:24] &quot;781-279-4046&quot; &quot;(617) 471-3584&quot; &quot;&quot; &quot;(781) 
863-2007&quot; ...\r\n##  $ Synonims    : list()<\/code><\/pre>\n<p><strong>UPDATE<\/strong> So, I wrote a mini-DSL for this:<\/p>\n<pre id=\"splash-dsl-1\"><code class=\"language-r\">splash_local %&gt;%\r\n  splash_response_body(TRUE) %&gt;% \r\n  splash_go(&quot;http:\/\/www.childrenshospital.org\/directory?state=%7B%22showLandingContent%22%3Afalse%2C%22model%22%3A%7B%22search_specialist%22%3Afalse%2C%22search_type%22%3A%5B%22directoryphysician%22%2C%22directorynurse%22%5D%7D%2C%22customModel%22%3A%7B%22nurses%22%3Atrue%7D%7D&quot;) %&gt;% \r\n  splash_wait(2) %&gt;% \r\n  splash_har() -&gt; res<\/code><\/pre>\n<p>which should make it easier to perform basic &#8220;go-wait-retrieve&#8221; operations.<\/p>\n<p>It&#8217;s unlikely we want to rely on a running Splash instance for our production work, so I&#8217;ll be making a helper function to turn HAR XHR requests into <code>httr<\/code> function calls, similar to the way <a href=\"https:\/\/github.com\/hrbrmstr\/curlconverter\"><code>curlconverter<\/code><\/a> works.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>splashr has gained some new functionality since the introductory post. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? 
The main request for it was to enable rendering of htmlwidgets: But if you use the new Docker image and the add_tempdir=TRUE parameter it can render any local HTML [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_post_was_ever_published":false},"categories":[91,725],"tags":[810],"class_list":["post-5031","post","type-post","status-publish","format-standard","hentry","category-r","category-web-scraping","tag-post"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spelunking XHRs (XMLHttpRequests) with splashr - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spelunking XHRs (XMLHttpRequests) with splashr - rud.is\" \/>\n<meta property=\"og:description\" content=\"splashr has gained some new functionality since the introductory post. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? 
The main request for it was to enable rendering of htmlwidgets: But if you use the new Docker image and the add_tempdir=TRUE parameter it can render any local HTML [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2017-02-14T18:19:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-03-10T12:54:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Spelunking XHRs (XMLHttpRequests) with 
splashr\",\"datePublished\":\"2017-02-14T18:19:29+00:00\",\"dateModified\":\"2018-03-10T12:54:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/\"},\"wordCount\":341,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/02\\\/diag.png\",\"keywords\":[\"post\"],\"articleSection\":[\"R\",\"web scraping\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/\",\"name\":\"Spelunking XHRs (XMLHttpRequests) with splashr - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/02\\\/diag.png\",\"datePublished\":\"2017-02-14T18:19:29+00:00\",\"dateModified\":\"2018-03-10T12:54:27+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/02\\\/diag.png?fit=384%2C249&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/02\\\/diag.png?fit=384%2C249&ssl=1\",\"width\":384,\"height\":249},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/02\\\/14\\\/spelunking-xhrs-xmlhttprequests-with-splashr\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Spelunking XHRs (XMLHttpRequests) with splashr\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spelunking XHRs (XMLHttpRequests) with splashr - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","og_locale":"en_US","og_type":"article","og_title":"Spelunking XHRs (XMLHttpRequests) with splashr - rud.is","og_description":"splashr has gained some new functionality since the introductory post. First, there&#8217;s a whole new Docker image for it that embeds a local web server. Why? The main request for it was to enable rendering of htmlwidgets: But if you use the new Docker image and the add_tempdir=TRUE parameter it can render any local HTML [&hellip;]","og_url":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","og_site_name":"rud.is","article_published_time":"2017-02-14T18:19:29+00:00","article_modified_time":"2018-03-10T12:54:27+00:00","og_image":[{"url":"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Spelunking XHRs (XMLHttpRequests) with splashr","datePublished":"2017-02-14T18:19:29+00:00","dateModified":"2018-03-10T12:54:27+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/"},"wordCount":341,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png","keywords":["post"],"articleSection":["R","web scraping"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","url":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/","name":"Spelunking XHRs (XMLHttpRequests) with splashr - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png","datePublished":"2017-02-14T18:19:29+00:00","dateModified":"2018-03-10T12:54:27+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/diag.png?fit=384%2C249&ssl=1","width":384,"height":249},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2017\/02\/14\/spelunking-xhrs-xmlhttprequests-with-splashr\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Spelunking XHRs (XMLHttpRequests) with splashr"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1j9","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":5004,"url":"https:\/\/rud.is\/b\/2017\/02\/09\/diving-into-dynamic-website-content-with-splashr\/","url_meta":{"origin":5031,"position":0},"title":"Diving Into Dynamic Website Content with splashr","author":"hrbrmstr","date":"2017-02-09","format":false,"excerpt":"If you do enough web scraping, you'll eventually hit a wall that the trusty httr verbs (that sit beneath rvest) cannot really overcome: dynamically created content (via javascript) on a site. 
If the site was nice enough to use XHR requests to load the dynamic content, you can generally still\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/cerv.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/cerv.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/cerv.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/cerv.png?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":11383,"url":"https:\/\/rud.is\/b\/2018\/08\/13\/in-brief-splashr-update-high-performance-scraping-with-splashr-furrr-teamhg-memexs-aquarium\/","url_meta":{"origin":5031,"position":1},"title":"In-brief: splashr update + High Performance Scraping with splashr, furrr &#038; TeamHG-Memex&#8217;s Aquarium","author":"hrbrmstr","date":"2018-08-13","format":false,"excerpt":"The development version of splashr now support authenticated connections to Splash API instances. Just specify user and pass on the initial splashr::splash() call to use your scraping setup a bit more safely. 
For those not familiar with splashr and\/or Splash: the latter is a lightweight alternative to tools like Selenium\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11765,"url":"https:\/\/rud.is\/b\/2019\/01\/14\/splashr-0-6-0-now-uses-the-cran-nascent-stevedore-package-for-docker-orchestration\/","url_meta":{"origin":5031,"position":2},"title":"splashr 0.6.0 Now Uses the CRAN-nascent stevedore Package for Docker Orchestration","author":"hrbrmstr","date":"2019-01-14","format":false,"excerpt":"The splashr package [srht|GL|GH] \u2014 an alternative to Selenium for javascript-enabled\/browser-emulated web scraping \u2014 is now at version 0.6.0 (still in dev-mode but on its way to CRAN in the next 14 days). The major change from version 0.5.x (which never made it to CRAN) is a swap out of\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6206,"url":"https:\/\/rud.is\/b\/2017\/08\/29\/new-cran-package-announcement-splashr\/","url_meta":{"origin":5031,"position":3},"title":"New CRAN Package Announcement: splashr","author":"hrbrmstr","date":"2017-08-29","format":false,"excerpt":"I'm pleased to announce that splashr is now on CRAN. (That image was generated with splashr::render_png(url = \"https:\/\/cran.r-project.org\/web\/packages\/splashr\/\")). The package is an R interface to the Splash javascript rendering service. 
It works in a similar fashion to Selenium but is fear more geared to web scraping and has quite a\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/splashr.png?fit=1066%2C1108&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":6385,"url":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/","url_meta":{"origin":5031,"position":4},"title":"Pirating Web Content Responsibly With R","author":"hrbrmstr","date":"2017-09-19","format":false,"excerpt":"International Code Talk Like A Pirate Day almost slipped by without me noticing (September has been a crazy busy month), but it popped up in the calendar notifications today and I was glad that I had prepped the meat of a post a few weeks back. 
There will be no\u2026","rel":"","context":"In &quot;data wrangling&quot;","block_context":{"text":"data wrangling","link":"https:\/\/rud.is\/b\/category\/data-wrangling\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1200%2C917&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":6193,"url":"https:\/\/rud.is\/b\/2017\/08\/29\/rpad-domain-repurposed-to-deliver-creepy-and-potentially-malicious-content\/","url_meta":{"origin":5031,"position":5},"title":"Rpad Domain Repurposed To Deliver Creepy (and potentially malicious) Content","author":"hrbrmstr","date":"2017-08-29","format":false,"excerpt":"I was about to embark on setting up a background task to sift through R package PDFs for traces of functions that \"omit NA values\" as a surprise present for Colin Fay and Sir Tierney: [Please RT]#RStats folks, @nj_tierney & I need your help for {naniar}!When does R silently drop\/omit\u2026","rel":"","context":"In &quot;Cybersecurity&quot;","block_context":{"text":"Cybersecurity","link":"https:\/\/rud.is\/b\/category\/cybersecurity\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/Plot_Zoom.png?fit=868%2C1200&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/Plot_Zoom.png?fit=868%2C1200&ssl=1&resize=350%2C200 1x, 
https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/Plot_Zoom.png?fit=868%2C1200&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/08\/Plot_Zoom.png?fit=868%2C1200&ssl=1&resize=700%2C400 2x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/5031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=5031"}],"version-history":[{"count":13,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/5031\/revisions"}],"predecessor-version":[{"id":8951,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/5031\/revisions\/8951"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=5031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=5031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=5031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}