

{"id":6067,"date":"2017-06-05T09:34:48","date_gmt":"2017-06-05T14:34:48","guid":{"rendered":"https:\/\/rud.is\/b\/?p=6067"},"modified":"2018-03-07T17:09:31","modified_gmt":"2018-03-07T22:09:31","slug":"r%e2%81%b6-scraping-images-to-pdfs","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/","title":{"rendered":"R\u2076 \u2014 Scraping Images To PDFs"},"content":{"rendered":"<p>I&#8217;ve been doing intermittent prep work for a follow-up to an earlier <a href=\"https:\/\/rud.is\/b\/2017\/03\/19\/exploring-2017-retail-store-closings-with-r\/\">post on store closings<\/a> and came across <a href=\"http:\/\/money.cnn.com\/interactive\/technology\/radio-shack-closure-list\/index.html\">this CNN Money<\/a> &#8220;article&#8221; on it. Said &#8220;article&#8221; is a deliberately obfuscated or lazily crafted series of GIF images that contain all the Radio Shack impending store closings. It&#8217;s the most comprehensive list I&#8217;ve found, but the format is terrible and there&#8217;s no easy, in-browser way to download them all.<\/p>\n<p>CNN has ToS that prevent automated data gathering from CNN-proper. But, they used Adobe Document Cloud for these images which has no similar restrictions from a quick glance at their ToS. That means <em>you<\/em> get an R\u2076 post on how to grab the individual 38 images and combine them into one PDF. 
I did this all in the hope of OCRing the text, which has not panned out too well since the image quality and font were likely deliberately set to make it hard to do precisely what I&#8217;m trying to do.<\/p>\n<p>If you work through the example, you&#8217;ll get a feel for:<\/p>\n<ul>\n<li>using <code>sprintf()<\/code> to take a template and build a vector of URLs<\/li>\n<li>using <code>dplyr<\/code> progress bars<\/li>\n<li>customizing <code>httr<\/code> verb options to ensure you can get to content<\/li>\n<li>using <code>purrr<\/code> to iterate through a process of turning raw image bytes into image content (via <code>magick<\/code>) and turning a list of images into a PDF<\/li>\n<\/ul>\n<pre id=\"scrape-to-pdf-01\"><code class=\"language-r\">library(httr)\r\nlibrary(magick)\r\nlibrary(tidyverse)\r\n\r\n# %s is filled in with the page number (1 through 38)\r\nurl_template &lt;- &quot;https:\/\/assets.documentcloud.org\/documents\/1657793\/pages\/radioshack-convert-p%s-large.gif&quot;\r\n\r\n# dplyr progress bar with one tick per page\r\npb &lt;- progress_estimated(38)\r\n\r\n# build the 38 page URLs and fetch each one, sending browser-like headers\r\nsprintf(url_template, 1:38) %&gt;%\r\n  map(~{\r\n    pb$tick()$print()\r\n    GET(url = .x,\r\n        add_headers(\r\n          accept = &quot;image\/webp,image\/apng,image\/*,*\/*;q=0.8&quot;,\r\n          referer = &quot;http:\/\/money.cnn.com\/interactive\/technology\/radio-shack-closure-list\/index.html&quot;,\r\n          authority = &quot;assets.documentcloud.org&quot;))\r\n  }) -&gt; store_list_pages\r\n\r\n# extract each response body, read it as an image, join the pages, write one PDF\r\nmap(store_list_pages, content) %&gt;%\r\n  map(image_read) %&gt;%\r\n  reduce(image_join) %&gt;%\r\n  image_write(&quot;combined_pages.pdf&quot;, format = &quot;pdf&quot;)<\/code><\/pre>\n<p>I figured out the Document Cloud links and necessary <code>httr::GET()<\/code> options by using Chrome Developer Tools and my <a href=\"https:\/\/github.com\/hrbrmstr\/curlconverter\"><code>curlconverter<\/code><\/a> package.<\/p>\n<p>If any academic-y folks have a <strike>test subject<\/strike> summer intern with a free hour and would be willing to have them transcribe this list and stick 
it on GitHub, you&#8217;d have my eternal thanks.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been doing intermittent prep work for a follow-up to an earlier post on store closings and came across this CNN Money &#8220;article&#8221; on it. Said &#8220;article&#8221; is a deliberately obfuscated or lazily crafted series of GIF images that contain all the Radio Shack impending store closings. It&#8217;s the most comprehensive list I&#8217;ve found, but [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91,725],"tags":[810,787],"class_list":["post-6067","post","type-post","status-publish","format-standard","hentry","category-r","category-web-scraping","tag-post","tag-r6"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>R\u2076 \u2014 Scraping Images To PDFs - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2017\/06\/05\/r\u2076-scraping-images-to-pdfs\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"R\u2076 \u2014 Scraping Images To PDFs - rud.is\" \/>\n<meta property=\"og:description\" content=\"I&#8217;ve been doing intermittent prep work for a 
follow-up to an earlier post on store closings and came across this CNN Money &#8220;article&#8221; on it. Said &#8220;article&#8221; is a deliberately obfuscated or lazily crafted series of GIF images that contain all the Radio Shack impending store closings. It&#8217;s the most comprehensive list I&#8217;ve found, but [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2017\/06\/05\/r\u2076-scraping-images-to-pdfs\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2017-06-05T14:34:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-03-07T22:09:31+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"R\u2076 \u2014 Scraping Images To 
PDFs\",\"datePublished\":\"2017-06-05T14:34:48+00:00\",\"dateModified\":\"2018-03-07T22:09:31+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/\"},\"wordCount\":287,\"commentCount\":5,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"keywords\":[\"post\",\"r6\"],\"articleSection\":[\"R\",\"web scraping\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/\",\"name\":\"R\u2076 \u2014 Scraping Images To PDFs - rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"datePublished\":\"2017-06-05T14:34:48+00:00\",\"dateModified\":\"2018-03-07T22:09:31+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/06\\\/05\\\/r%e2%81%b6-scraping-images-to-pdfs\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"R\u2076 \u2014 Scraping Images To PDFs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"R\u2076 \u2014 Scraping Images To PDFs - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2017\/06\/05\/r\u2076-scraping-images-to-pdfs\/","og_locale":"en_US","og_type":"article","og_title":"R\u2076 \u2014 Scraping Images To PDFs - rud.is","og_description":"I&#8217;ve been doing intermittent prep work for a follow-up to an earlier post on store closings and came across this CNN Money &#8220;article&#8221; on it. Said &#8220;article&#8221; is a deliberately obfuscated or lazily crafted series of GIF images that contain all the Radio Shack impending store closings. It&#8217;s the most comprehensive list I&#8217;ve found, but [&hellip;]","og_url":"https:\/\/rud.is\/b\/2017\/06\/05\/r\u2076-scraping-images-to-pdfs\/","og_site_name":"rud.is","article_published_time":"2017-06-05T14:34:48+00:00","article_modified_time":"2018-03-07T22:09:31+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"R\u2076 \u2014 Scraping Images To PDFs","datePublished":"2017-06-05T14:34:48+00:00","dateModified":"2018-03-07T22:09:31+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/"},"wordCount":287,"commentCount":5,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"keywords":["post","r6"],"articleSection":["R","web scraping"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/","url":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/","name":"R\u2076 \u2014 Scraping Images To PDFs - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2017-06-05T14:34:48+00:00","dateModified":"2018-03-07T22:09:31+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"R\u2076 \u2014 Scraping Images To 
PDFs"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1zR","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":5907,"url":"https:\/\/rud.is\/b\/2017\/05\/05\/scrapeover-friday-a-k-a-another-r-scraping-makeover\/","url_meta":{"origin":6067,"position":0},"title":"Scrapeover Friday \u2014 a.k.a. 
Another R Scraping Makeover","author":"hrbrmstr","date":"2017-05-05","format":false,"excerpt":"I caught a glimpse of a tweet by @dataandme on Friday: Using R & rvest to explore Malaysian property mkt: \"Web Scraping: The Sequel, Propwall.my\" https:\/\/t.co\/daZOOJJfPN #rstats #rvest pic.twitter.com\/u6QMhm4M3e\u2014 Mara Averick (@dataandme) May 5, 2017 Mara is \u2014 without a doubt \u2014 the best data science promoter in the Twitterverse.\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":5178,"url":"https:\/\/rud.is\/b\/2017\/03\/19\/exploring-2017-retail-store-closings-with-r\/","url_meta":{"origin":6067,"position":1},"title":"Exploring 2017 Retail Store Closings with R","author":"hrbrmstr","date":"2017-03-19","format":false,"excerpt":"A story about one of the retail chains (J.C. Penny) releasing their list of stores closing in 2017 crossed paths with my Feedly reading list today and jogged my memory that there were a number of chains closing many of their doors this year, and I wanted to see the\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=1050%2C600 
3x"},"classes":[]},{"id":1849,"url":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/","url_meta":{"origin":6067,"position":2},"title":"Easier HTML Table-scraping For Scripts With Google Drive","author":"hrbrmstr","date":"2012-12-17","format":false,"excerpt":"We had our first, real, snowfall of the season in Maine today and that usually means school delays\/closings. Our \"local\" station \u2013 @WCHS6 \u2013 has a Storm Center Closings page as well as an SMS notification service. I decided this morning that I needed a command line version (and, eventually,\u2026","rel":"","context":"In &quot;Google Docs&quot;","block_context":{"text":"Google Docs","link":"https:\/\/rud.is\/b\/category\/google-docs\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6134,"url":"https:\/\/rud.is\/b\/2017\/07\/28\/analyzing-wait-delay-settings-in-common-crawl-robots-txt-data-with-r\/","url_meta":{"origin":6067,"position":3},"title":"Analyzing &#8220;Crawl-Delay&#8221; Settings in Common Crawl robots.txt Data with R","author":"hrbrmstr","date":"2017-07-28","format":false,"excerpt":"One of my tweets that referenced an excellent post about the ethics of web scraping garnered some interest: Apologies for a Medium link but if you do ANY web scraping, you need to read this #rstats \/\/ Ethics in Web Scraping https:\/\/t.co\/y5YxvzB8Fd\u2014 boB Rudis (@hrbrmstr) July 26, 2017 If you\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/07\/Cursor_and_RStudio.png?fit=1200%2C620&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/07\/Cursor_and_RStudio.png?fit=1200%2C620&ssl=1&resize=350%2C200 1x, 
https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/07\/Cursor_and_RStudio.png?fit=1200%2C620&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/07\/Cursor_and_RStudio.png?fit=1200%2C620&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/07\/Cursor_and_RStudio.png?fit=1200%2C620&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":5004,"url":"https:\/\/rud.is\/b\/2017\/02\/09\/diving-into-dynamic-website-content-with-splashr\/","url_meta":{"origin":6067,"position":4},"title":"Diving Into Dynamic Website Content with splashr","author":"hrbrmstr","date":"2017-02-09","format":false,"excerpt":"If you do enough web scraping, you'll eventually hit a wall that the trusty httr verbs (that sit beneath rvest) cannot really overcome: dynamically created content (via javascript) on a site. If the site was nice enough to use XHR requests to load the dynamic content, you can generally still\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6164,"url":"https:\/\/rud.is\/b\/2017\/08\/22\/caching-httr-requests-this-means-warc\/","url_meta":{"origin":6067,"position":5},"title":"Caching httr Requests? This means WAR[C]!","author":"hrbrmstr","date":"2017-08-22","format":false,"excerpt":"I've blathered about my crawl_delay project before and am just waiting for a rainy weekend to be able to crank out a follow-up post on it. Working on that project involved sifting through thousands of Web Archive (WARC) files. 
While I have a nascent package on github to work with\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/6067","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=6067"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/6067\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=6067"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=6067"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=6067"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}