

{"id":6385,"date":"2017-09-19T07:28:03","date_gmt":"2017-09-19T12:28:03","guid":{"rendered":"https:\/\/rud.is\/b\/?p=6385"},"modified":"2018-03-07T17:02:05","modified_gmt":"2018-03-07T22:02:05","slug":"pirating-web-content-responsibly-with-r","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/","title":{"rendered":"Pirating Web Content Responsibly With R"},"content":{"rendered":"<p>International <del datetime=\"2017-09-19T11:31:05+00:00\">Code<\/del> Talk Like A Pirate Day almost slipped by without me noticing (September has been a crazy busy month), but it popped up in the calendar notifications today and I was glad that I had prepped the meat of a post a few weeks back.<\/p>\n<p>There will be no &#8216;rrrrrr&#8217; abuse in this post, I&#8217;m afraid, but there will be plenty of R code.<\/p>\n<p>We&#8217;re going to combine pirate day with &#8220;pirating&#8221; data, in the sense that I&#8217;m going to show one way to use the web scraping powers of R <em>responsibly<\/em> to collect data on and explore modern-day pirate encounters.<\/p>\n<h3>Scouring The <del datetime=\"2017-09-19T11:31:05+00:00\">Seas<\/del> Web For Pirate Data<\/h3>\n<p>Interestingly enough, there are many sources for pirate data. I&#8217;ve blogged about a few in the past, but I came across a new (to me) one from the International Chamber of Commerce. 
Their Commercial Crime Services division has something called the <a href=\"https:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/1345\">Live Piracy &amp; Armed Robbery Report<\/a>:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/preview.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"6386\" data-permalink=\"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/preview-2\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/preview.png?fit=1024%2C959&amp;ssl=1\" data-orig-size=\"1024,959\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"preview\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/preview.png?fit=510%2C478&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/preview.png?resize=510%2C478&#038;ssl=1\" alt=\"\" width=\"510\" height=\"478\" class=\"aligncenter size-full wp-image-6386\" \/><\/a><\/p>\n<p>(site png snapshot taken with <code>splashr<\/code>)<\/p>\n<p>I fiddled a bit with the URL and &#8212; sure enough &#8212; if you work a bit you can get data going back to late 2013, all in the same general format, so I jotted down base URLs and start+end record values and filed them away for future use:<\/p>\n<pre id=\"tlap201700\"><code class=\"language-r\">library(V8)\r\nlibrary(stringi)\r\nlibrary(httr)\r\nlibrary(rvest)\r\nlibrary(robotstxt)\r\nlibrary(jwatr) # 
github\/hrbrmstr\/jwatr\r\nlibrary(hrbrthemes)\r\nlibrary(purrrlyr)\r\nlibrary(rprojroot)\r\nlibrary(tidyverse)\r\n\r\nreport_urls &lt;- read.csv(stringsAsFactors=FALSE, header=TRUE, text=&quot;url,start,end\r\nhttps:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/, 1345, 1459\r\nhttps:\/\/www.icc-ccs.org\/piracy-reporting-centre\/live-piracy-report\/details\/151\/, 1137, 1339\r\nhttps:\/\/www.icc-ccs.org\/piracy-reporting-centre\/live-piracy-map\/details\/146\/, 885, 1138\r\nhttps:\/\/www.icc-ccs.org\/piracy-reporting-centre\/live-piracy-report\/details\/144\/, 625, 884\r\nhttps:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/133\/, 337, 623&quot;)\r\n\r\nby_row(report_urls, ~sprintf(.x$url %s+% &quot;%s&quot;, .x$start:.x$end), .to=&quot;url_list&quot;) %&gt;%\r\n  pull(url_list) %&gt;%\r\n  flatten_chr() -&gt; target_urls\r\n\r\nhead(target_urls)\r\n## [1] &quot;https:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/1345&quot;\r\n## [2] &quot;https:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/1346&quot;\r\n## [3] &quot;https:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/1347&quot;\r\n## [4] &quot;https:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/1348&quot;\r\n## [5] &quot;https:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/1349&quot;\r\n## [6] &quot;https:\/\/www.icc-ccs.org\/index.php\/piracy-reporting-centre\/live-piracy-report\/details\/169\/1350&quot;<\/code><\/pre>\n<p>Time to pillage some details!<\/p>\n<h3>But&hellip;Can We Really Do It?<\/h3>\n<p>I poked around the site&#8217;s terms of service\/terms and conditions and automated retrieval was not discouraged. Yet, those aren&#8217;t the only sea mines we have to look out for. 
Perhaps they use their <code>robots.txt<\/code> to stop pirates. Let&#8217;s take a look:<\/p>\n<pre id=\"tlap201701\"><code class=\"language-r\">robotstxt::get_robotstxt(&quot;https:\/\/www.icc-ccs.org\/&quot;)\r\n## # If the Joomla site is installed within a folder such as at\r\n## # e.g. www.example.com\/joomla\/ the robots.txt file MUST be\r\n## # moved to the site root at e.g. www.example.com\/robots.txt\r\n## # AND the joomla folder name MUST be prefixed to the disallowed\r\n## # path, e.g. the Disallow rule for the \/administrator\/ folder\r\n## # MUST be changed to read Disallow: \/joomla\/administrator\/\r\n## #\r\n## # For more information about the robots.txt standard, see:\r\n## # http:\/\/www.robotstxt.org\/orig.html\r\n## #\r\n## # For syntax checking, see:\r\n## # http:\/\/www.sxw.org.uk\/computing\/robots\/check.html\r\n##\r\n## User-agent: *\r\n## Disallow: \/administrator\/\r\n## Disallow: \/cache\/\r\n## Disallow: \/cli\/\r\n## Disallow: \/components\/\r\n## Disallow: \/images\/\r\n## Disallow: \/includes\/\r\n## Disallow: \/installation\/\r\n## Disallow: \/language\/\r\n## Disallow: \/libraries\/\r\n## Disallow: \/logs\/\r\n## Disallow: \/media\/\r\n## Disallow: \/modules\/\r\n## Disallow: \/plugins\/\r\n## Disallow: \/templates\/\r\n## Disallow: \/tmp\/<\/code><\/pre>\n<p>Ahoy! None of those <code>Disallow<\/code> rules cover the piracy report pages, so we&#8217;ve got a license to pillage!<\/p>\n<p>But we don&#8217;t have a license to abuse their site.<\/p>\n<p>While I still haven&#8217;t had time to follow up on an <a href=\"https:\/\/rud.is\/b\/2017\/07\/28\/analyzing-wait-delay-settings-in-common-crawl-robots-txt-data-with-r\/\">earlier post about &#8216;crawl-delay&#8217;<\/a> settings across the internet, I <em>have<\/em> done enough work on it to know that a 5 or 10 second delay is the most common setting (when sites bother to have this directive in their <code>robots.txt<\/code> file). 
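The two robots.txt questions above (is a path disallowed, and is there a Crawl-delay to honor?) can be sketched in a few lines of base R. The robotstxt package loaded earlier has its own paths_allowed() for the first check; the helpers below (star_rules(), path_allowed() and crawl_delay_for()) are hypothetical and only make the logic visible. This is a minimal sketch assuming only the User-agent: * group matters, Disallow values are plain path prefixes, and a 5 second delay is used when no Crawl-delay is set:

```r
# Minimal base-R sketch of two robots.txt checks. The helpers star_rules(),
# path_allowed() and crawl_delay_for() are hypothetical, not the robotstxt
# package API. Simplifications: only the 'User-agent: *' group is read and
# Disallow values are plain path prefixes (no wildcards, no Allow rules).

# Return the field/value pairs that belong to the 'User-agent: *' group.
star_rules <- function(robots_lines) {
  lines <- trimws(sub('#.*$', '', robots_lines))    # drop comments
  lines <- lines[grepl(':', lines, fixed = TRUE)]   # keep field: value lines
  field <- tolower(trimws(sub(':.*$', '', lines)))  # text before first colon
  value <- trimws(sub('^[^:]*:', '', lines))        # text after first colon
  in_star <- FALSE
  keep <- logical(length(lines))
  for (i in seq_along(lines)) {
    if (field[i] == 'user-agent') {
      in_star <- value[i] == '*'   # a new user-agent group starts here
    } else {
      keep[i] <- in_star
    }
  }
  data.frame(field = field[keep], value = value[keep], stringsAsFactors = FALSE)
}

# A path is blocked if it starts with any non-empty Disallow prefix.
path_allowed <- function(robots_lines, path) {
  rules <- star_rules(robots_lines)
  disallow <- rules$value[rules$field == 'disallow' & nzchar(rules$value)]
  !any(startsWith(path, disallow))
}

# Use the first numeric Crawl-delay for the * group, else the assumed default.
crawl_delay_for <- function(robots_lines, default = 5) {
  rules <- star_rules(robots_lines)
  delay <- suppressWarnings(as.numeric(rules$value[rules$field == 'crawl-delay']))
  delay <- delay[!is.na(delay)]
  if (length(delay) == 0) default else delay[1]
}

robots <- c('User-agent: *', 'Disallow: /administrator/', 'Disallow: /cache/')
path_allowed(robots, '/index.php/piracy-reporting-centre/')  # TRUE
path_allowed(robots, '/cache/page.html')                     # FALSE
crawl_delay_for(robots)                                      # 5, the assumed default
```

In the crawl loop that follows, the fixed Sys.sleep(5) could then become Sys.sleep(crawl_delay_for(robots)), with robots read via readLines() from the site's robots.txt URL. Real-world robots.txt files also use wildcards and Allow rules, so a maintained parser such as robotstxt is the safer choice in practice.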
ICC&#8217;s site does not have this setting defined, but we&#8217;ll still <del datetime=\"2017-09-19T11:31:05+00:00\">pirate<\/del> crawl responsibly and use a 5 second delay between requests:<\/p>\n<pre id=\"tlap201702\"><code class=\"language-r\"># wrap httr::GET() so a failed request returns an error object instead of halting the crawl\r\ns_GET &lt;- safely(GET)\r\n\r\npb &lt;- progress_estimated(length(target_urls))\r\nmap(target_urls, ~{\r\n  pb$tick()$print()\r\n  Sys.sleep(5) # be polite: wait 5 seconds between requests\r\n  s_GET(.x)\r\n}) -&gt; httr_raw_responses\r\n\r\n# keep every raw response (even failed ones) so we never have to re-crawl\r\nwrite_rds(httr_raw_responses, &quot;data\/2017-icc-ccs-raw-httr-responses.rds&quot;)\r\n\r\n# a successful request has a non-NULL `result`\r\ngood_responses &lt;- keep(httr_raw_responses, ~!is.null(.x$result))\r\n\r\n# also archive the complete responses in WARC format for non-R users\r\njwatr::response_list_to_warc_file(good_responses, &quot;data\/icc-good&quot;)<\/code><\/pre>\n<p>There are more &#8220;safety&#8221; measures you can use with <code>httr::GET()<\/code>, but this one is usually sufficient: it just prevents the iteration from dying when there are hard retrieval errors.<\/p>\n<p>I also like to save off the crawl results so I can go back to the raw file (if needed) instead of re-scraping the site (this crawl takes a while). I do it two ways here: first as raw <code>httr<\/code> <code>response<\/code> objects (including any &#8220;broken&#8221; ones), then by filtering out the &#8220;complete&#8221; responses and saving them in WARC format, which is easier to share with others who may not use R.<\/p>\n<h3>Digging For Treasure<\/h3>\n<p>Did I mention that while the site looks easy to scrape, it really isn&#8217;t? That nice-looking table is a sea mirage ready to trap unwary <del datetime=\"2017-09-19T11:31:05+00:00\">sailors<\/del> crawlers in a pit of despair. 
The UX is built dynamically from on-page javascript content, a portion of which is below:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/view-source_https___www_icc-ccs_org_index_php_piracy-reporting-centre_live-piracy-report_details_169_1345.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"6392\" data-permalink=\"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/view-source_https___www_icc-ccs_org_index_php_piracy-reporting-centre_live-piracy-report_details_169_1345\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/view-source_https___www_icc-ccs_org_index_php_piracy-reporting-centre_live-piracy-report_details_169_1345.png?fit=2862%2C1582&amp;ssl=1\" data-orig-size=\"2862,1582\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"view-source_https___www_icc-ccs_org_index_php_piracy-reporting-centre_live-piracy-report_details_169_1345\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/view-source_https___www_icc-ccs_org_index_php_piracy-reporting-centre_live-piracy-report_details_169_1345.png?fit=510%2C282&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/view-source_https___www_icc-ccs_org_index_php_piracy-reporting-centre_live-piracy-report_details_169_1345.png?resize=510%2C282&#038;ssl=1\" alt=\"\" width=\"510\" height=\"282\" class=\"aligncenter size-full wp-image-6392\" \/><\/a><\/p>\n<p>Now, you&#8217;re 
likely thinking: <em>&#8220;Don&#8217;t we need to re-scrape the site with <code>seleniumPipes<\/code> or <code>splashr<\/code>?&#8221;<\/em><\/p>\n<p>Fear not, stout yeoman! We can do this with the content we have if we don&#8217;t mind swabbing the decks first. Let&#8217;s put the <del datetime=\"2017-09-19T11:31:05+00:00\">map<\/del> code up first and then dig into the details:<\/p>\n<pre id=\"tlap201703\"><code class=\"language-r\"># make field names great again: lower-case, turn punctuation\/space runs into _, trim stray _, de-duplicate\r\nmfga &lt;- function(x) {\r\n  x &lt;- tolower(x)\r\n  x &lt;- gsub(&quot;[[:punct:][:space:]]+&quot;, &quot;_&quot;, x)\r\n  x &lt;- gsub(&quot;_+&quot;, &quot;_&quot;, x)\r\n  x &lt;- gsub(&quot;(^_|_$)&quot;, &quot;&quot;, x)\r\n  x &lt;- make.unique(x, sep = &quot;_&quot;)\r\n  x\r\n}\r\n\r\n# I know the columns I want and this makes getting them into the types I want easier\r\ncols(\r\n  attack_number = col_character(),\r\n  attack_posn_map = col_character(),\r\n  date = col_datetime(format = &quot;&quot;),\r\n  date_time = col_datetime(format = &quot;&quot;),\r\n  id = col_integer(),\r\n  location_detail = col_character(),\r\n  narrations = col_character(),\r\n  type_of_attack = col_character(),\r\n  type_of_vessel = col_character()\r\n) -&gt; pirate_cols\r\n\r\n# one V8 JavaScript context, reused to evaluate the data extracted from each page\r\nctx &lt;- v8()\r\n\r\n# iterate over the good responses with a progress bar\r\npb &lt;- progress_estimated(length(good_responses))\r\nmap_df(good_responses, ~{\r\n\r\n  pb$tick()$print()\r\n\r\n  # `safely` hides the data under `result` so expose it\r\n  doc &lt;- content(.x$result)\r\n\r\n  # target the `&lt;script&gt;` tag that has our data, carve out the target lines, do some data massaging and evaluate the javascript with V8\r\n  html_nodes(doc, xpath=&quot;.\/\/script[contains(., &#039;requirejs&#039;)]&quot;) %&gt;%\r\n    html_text() %&gt;%\r\n    stri_split_lines() %&gt;%\r\n    .[[1]] %&gt;%\r\n    grep(&quot;narrations_ro&quot;, ., value=TRUE) %&gt;%\r\n    sprintf(&quot;var dat = %s;&quot;, .) 
%&gt;%\r\n    ctx$eval()\r\n\r\n  p &lt;- ctx$get(&quot;dat&quot;, flatten=TRUE)\r\n\r\n  # now, process that data, turning the ugly returned list content into something we can put in a data frame\r\n  keep(p[[1]], is.list) %&gt;%\r\n    map_df(~{\r\n      list(\r\n        field = mfga(.x[[3]]$label),\r\n        value = .x[[3]]$value\r\n      )\r\n    }) %&gt;%\r\n    filter(value != &quot;&quot;) %&gt;%\r\n    distinct(field, .keep_all = TRUE) %&gt;% # keep only the first of any repeated fields (e.g. multiple Date: entries)\r\n    spread(field, value)\r\n\r\n}) %&gt;%\r\n  type_convert(col_types = pirate_cols) %&gt;%\r\n  filter(stri_detect_regex(attack_number, &quot;^[[:digit:]]&quot;)) %&gt;%\r\n  filter(lubridate::year(date) &gt; 2012) %&gt;%\r\n  # manual conversions: split the attack position text into numeric lat\/lng columns\r\n  mutate(\r\n    attack_posn_map = stri_replace_last_regex(attack_posn_map, &quot;:.*$&quot;, &quot;&quot;),\r\n    attack_posn_map = stri_replace_all_regex(attack_posn_map, &quot;[\\\\(\\\\) ]&quot;, &quot;&quot;)\r\n  ) %&gt;%\r\n  separate(attack_posn_map, sep=&quot;,&quot;, into=c(&quot;lat&quot;, &quot;lng&quot;)) %&gt;%\r\n  mutate(lng = as.numeric(lng), lat = as.numeric(lat)) -&gt; pirate_df\r\n\r\nwrite_rds(pirate_df, &quot;data\/pirate_df.rds&quot;)<\/code><\/pre>\n<p>The first bit there is a function to &#8220;make field names great again&#8221;. We&#8217;re processing some ugly list data and it&#8217;s not uniform across all years, so this helps make the data wrangling idiom more generic.<\/p>\n<p>Next, I set up a <code>cols<\/code> object because we&#8217;re going to be extracting everything from the page as text, and I think it&#8217;s cleaner to <code>type_convert<\/code> at the end than to have a slew of <code>as.numeric()<\/code> (et al.) statements in-code (at least for small munging jobs). 
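The extract-as-text, convert-once idea does not depend on readr. As a rough base-R analog of cols() plus type_convert(), a named list of converter functions can describe the target types and be applied in one pass at the end. The convert_with_spec() helper and the sample rows below are hypothetical, made up to mirror a few of the pirate_cols fields:

```r
# Base-R sketch of spec-driven type conversion, a rough analog of readr's
# cols() + type_convert(). convert_with_spec() is a hypothetical helper.

# One converter per column we care about.
pirate_spec <- list(
  id             = as.integer,
  date           = function(x) as.POSIXct(x, format = '%Y-%m-%d %H:%M:%S', tz = 'UTC'),
  type_of_attack = as.character
)

# Apply each converter to its column (if present); other columns are untouched.
convert_with_spec <- function(df, spec) {
  for (nm in intersect(names(spec), names(df))) {
    df[[nm]] <- spec[[nm]](df[[nm]])
  }
  df
}

# Everything scraped from the page arrives as text (made-up sample rows) ...
raw <- data.frame(
  id             = c('1', '2'),
  date           = c('2017-09-12 10:30:00', '2017-09-14 22:05:00'),
  type_of_attack = c('type_a', 'type_b'),
  narrations     = c('free-form text', 'more free-form text'),
  stringsAsFactors = FALSE
)

# ... and is converted once, at the end.
clean <- convert_with_spec(raw, pirate_spec)
str(clean)
```

One design difference worth noting: readr guesses types for columns left out of a cols() spec, while this sketch simply leaves unlisted columns (narrations here) as character.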
You&#8217;ll note that at the end of the munging pipeline I still need to do some manual conversions, such as splitting the attack position into numeric <code>lat<\/code> and <code>lng<\/code> columns.<\/p>\n<p>Now we can iterate over the good (complete) responses.<\/p>\n<p>The <code>purrr::safely<\/code> function shoves the real <code>httr<\/code> response in <code>result<\/code>, so we focus on that, then &#8220;surgically&#8221; extract the target data from the <code>&lt;script&gt;<\/code> tag. Once we have it, we wrap it in a <code>var dat = &hellip;;<\/code> statement, evaluate that with the <code>V8<\/code> JavaScript engine, and retrieve the resulting <code>dat<\/code> object.<\/p>\n<p>Because ICC used the same Joomla plugin over the years, the data format is largely uniform, but some records contain additional fields, so we extract the fields in a generic manner. During the course of data wrangling, I noticed there were often multiple <code>Date:<\/code> fields, so we throw in some logic to help avoid duplicate field names as well.<\/p>\n<p>That whole process goes <em>really<\/em> quickly, but why not save off the clean data at the end for good measure?<\/p>\n<h3>Gotta Have A Pirate Map<\/h3>\n<p>Now we can begin to explore the data. I&#8217;ll leave most of that to you (since I&#8217;m providing the scraped data on GitHub), but here are a few views. 
First, just some simple counts per month:<\/p>\n<pre id=\"tlap201704\"><code class=\"language-r\">mutate(pirate_df, year = lubridate::year(date), year_mon = as.Date(format(date, &quot;%Y-%m-01&quot;))) %&gt;%\r\n  count(year_mon) %&gt;%\r\n  ggplot(aes(year_mon, n)) +\r\n  geom_segment(aes(xend=year_mon, yend=0)) +\r\n  scale_y_comma() +\r\n  labs(x=NULL, y=NULL,\r\n       title=&quot;(Confirmed) Piracy Incidents per Month&quot;,\r\n       caption=&quot;Source: International Chamber of Commerce Commercial Crime Services &lt;https:\/\/www.icc-ccs.org\/&gt;&quot;) +\r\n  theme_ipsum_rc(grid=&quot;Y&quot;)<\/code><\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-1.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"6394\" data-permalink=\"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/plot_zoom-12\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-1.png?fit=1972%2C534&amp;ssl=1\" data-orig-size=\"1972,534\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Plot_Zoom\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-1.png?fit=510%2C138&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-1.png?resize=510%2C138&#038;ssl=1\" alt=\"\" width=\"510\" height=\"138\" class=\"aligncenter size-full wp-image-6394\" \/><\/a><\/p>\n<p>And, finally, a map showing pirate encounters but colored 
by year:<\/p>\n<pre id=\"tlap201705\"><code class=\"language-r\">world &lt;- map_data(&quot;world&quot;)\r\n\r\nmutate(pirate_df, year = lubridate::year(date)) %&gt;%\r\n  arrange(year) %&gt;%\r\n  mutate(year = factor(year)) -&gt; plot_df\r\n\r\nggplot() +\r\n  geom_map(data = world, map = world, aes(x=long, y=lat, map_id=region), fill=&quot;#b2b2b2&quot;) +\r\n  geom_point(data = plot_df, aes(lng, lat, color=year), size=2, alpha=1\/3) +\r\n  ggalt::coord_proj(&quot;+proj=wintri&quot;) +\r\n  viridis::scale_color_viridis(name=NULL, discrete=TRUE) +\r\n  labs(x=NULL, y=NULL,\r\n       title=&quot;Piracy Incidents per Month (Confirmed)&quot;,\r\n       caption=&quot;Source: International Chamber of Commerce Commercial Crime Services &lt;https:\/\/www.icc-ccs.org\/&gt;&quot;) +\r\n  theme_ipsum_rc(grid=&quot;XY&quot;) +\r\n  theme(legend.position = &quot;bottom&quot;)<\/code><\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"6395\" data-permalink=\"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/plot_zoom-13\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&amp;ssl=1\" data-orig-size=\"1818,1390\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Plot_Zoom\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=510%2C390&amp;ssl=1\" 
src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?resize=510%2C390&#038;ssl=1\" alt=\"\" width=\"510\" height=\"390\" class=\"aligncenter size-full wp-image-6395\" \/><\/a><\/p>\n<h3>Taking Up The Mantle of the Dread Pirate Hrbrmstr<\/h3>\n<p>Hopefully this post shed some light on scraping responsibly and on using different techniques to get at data hidden in web pages.<\/p>\n<p>There&#8217;s some free-form text (the incident narrations) to dig into, and more than a few other ways to look at the data. You can find the code and data <a href=\"https:\/\/github.com\/hrbrmstr\/2017-tlapd\">on GitHub<\/a>, and don&#8217;t hesitate to ask questions in the comments or file an issue. If you make something, blog it! Share your ideas and creations with the rest of the R (or other language) communities!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>International Code Talk Like A Pirate Day almost slipped by without me noticing (September has been a crazy busy month), but it popped up in the calendar notifications today and I was glad that I had prepped the meat of a post a few weeks back. 
There will be no &#8216;rrrrrr&#8217; abuse in this post, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":6395,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[764,91,762,725],"tags":[810],"class_list":["post-6385","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-wrangling","category-r","category-tlapd","category-web-scraping","tag-post"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Pirating Web Content Responsibly With R - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Pirating Web Content Responsibly With R - rud.is\" \/>\n<meta property=\"og:description\" content=\"International Code Talk Like A Pirate Day almost slipped by without me noticing (September has been a crazy busy month), but it popped up in the calendar notifications today and I was glad that I had prepped the meat of a post a few weeks back. 
There will be no &#8216;rrrrrr&#8217; abuse in this post, [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2017-09-19T12:28:03+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-03-07T22:02:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"1818\" \/>\n\t<meta property=\"og:image:height\" content=\"1390\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Pirating Web Content Responsibly With R\",\"datePublished\":\"2017-09-19T12:28:03+00:00\",\"dateModified\":\"2018-03-07T22:02:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/\"},\"wordCount\":982,\"commentCount\":20,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/09\\\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1\",\"keywords\":[\"post\"],\"articleSection\":[\"data wrangling\",\"R\",\"TLAPD\",\"web scraping\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/\",\"name\":\"Pirating Web Content Responsibly With R - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/09\\\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1\",\"datePublished\":\"2017-09-19T12:28:03+00:00\",\"dateModified\":\"2018-03-07T22:02:05+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/09\\\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2017\\\/09\\\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1\",\"width\":1818,\"height\":1390},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2017\\\/09\\\/19\\\/pirating-web-content-responsibly-with-r\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Pirating Web Content Responsibly With R\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Pirating Web Content Responsibly With R - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/","og_locale":"en_US","og_type":"article","og_title":"Pirating Web Content Responsibly With R - rud.is","og_description":"International Code Talk Like A Pirate Day almost slipped by without me noticing (September has been a crazy busy month), but it popped up in the calendar notifications today and I was glad that I had prepped the meat of a post a few weeks back. There will be no &#8216;rrrrrr&#8217; abuse in this post, [&hellip;]","og_url":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/","og_site_name":"rud.is","article_published_time":"2017-09-19T12:28:03+00:00","article_modified_time":"2018-03-07T22:02:05+00:00","og_image":[{"width":1818,"height":1390,"url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1","type":"image\/png"}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Pirating Web Content Responsibly With R","datePublished":"2017-09-19T12:28:03+00:00","dateModified":"2018-03-07T22:02:05+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/"},"wordCount":982,"commentCount":20,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1","keywords":["post"],"articleSection":["data wrangling","R","TLAPD","web scraping"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/","url":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/","name":"Pirating Web Content Responsibly With R - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1","datePublished":"2017-09-19T12:28:03+00:00","dateModified":"2018-03-07T22:02:05+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1","width":1818,"height":1390},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2017\/09\/19\/pirating-web-content-responsibly-with-r\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Pirating Web Content Responsibly With R"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Plot_Zoom-2.png?fit=1818%2C1390&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1EZ","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":2685,"url":"https:\/\/rud.is\/b\/2013\/09\/19\/animated-irl-pirate-attacks-in-r\/","url_meta":{"origin":6385,"position":0},"title":"Animated IRL Pirate Attacks In R","author":"hrbrmstr","date":"2013-09-19","format":false,"excerpt":"Avast me hearRties! 
(ok, enough of the pirate speak in a blog post) It wouldn't be TLAPD without some modest code & idea pilfering from Mark Bulling & Simon Raper. While those mateys did a fine job hoisting up some R code (you really didn't think I'd stop with\u2026","rel":"","context":"In &quot;DataVis&quot;","block_context":{"text":"DataVis","link":"https:\/\/rud.is\/b\/category\/datavis-2\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3679,"url":"https:\/\/rud.is\/b\/2015\/09\/19\/a-package-full-o-pirates-makin-interactive-pirate-maps-in-arrrrrrstats\/","url_meta":{"origin":6385,"position":1},"title":"A Package Full o&#8217; Pirates &#038; Makin&#8217; Interactive Pirate Maps in arrrrrRstats","author":"hrbrmstr","date":"2015-09-19","format":false,"excerpt":"Avast, me hearties! It's time four t' annual International Talk Like a Pirate Day #rstats post! (OK, I won't make you suffer continuous pirate-speak for the entire post) I tried to be a bit more practical this year and have two treasuRe chests for you to (hopefully) enjoy. A Package\u2026","rel":"","context":"In &quot;cartography&quot;","block_context":{"text":"cartography","link":"https:\/\/rud.is\/b\/category\/cartography\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":5907,"url":"https:\/\/rud.is\/b\/2017\/05\/05\/scrapeover-friday-a-k-a-another-r-scraping-makeover\/","url_meta":{"origin":6385,"position":2},"title":"Scrapeover Friday \u2014 a.k.a. 
Another R Scraping Makeover","author":"hrbrmstr","date":"2017-05-05","format":false,"excerpt":"I caught a glimpse of a tweet by @dataandme on Friday: Using R & rvest to explore Malaysian property mkt: \"Web Scraping: The Sequel, Propwall.my\" https:\/\/t.co\/daZOOJJfPN #rstats #rvest pic.twitter.com\/u6QMhm4M3e\u2014 Mara Averick (@dataandme) May 5, 2017 Mara is \u2014 without a doubt \u2014 the best data science promoter in the Twitterverse.\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3424,"url":"https:\/\/rud.is\/b\/2015\/05\/18\/scraping-jquery-datatable-programmatic-json-with-r\/","url_meta":{"origin":6385,"position":3},"title":"Scraping jQuery DataTable Programmatic JSON with R","author":"hrbrmstr","date":"2015-05-18","format":false,"excerpt":"School of Data had a recent post how to copy \"every item\" from a multi-page list. While their post did provide a neat hack, their \"words of warning\" are definitely missing some items and the overall methodology can be improved upon with some basic R scripting. First, the technique they\u2026","rel":"","context":"In &quot;Data Analysis&quot;","block_context":{"text":"Data Analysis","link":"https:\/\/rud.is\/b\/category\/data-analysis-2\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":5801,"url":"https:\/\/rud.is\/b\/2017\/04\/13\/come-fly-with-me-well-not-really-comparing-involuntary-disembarking-rates-across-u-s-airlines-in-r\/","url_meta":{"origin":6385,"position":4},"title":"Come Fly With Me (well, not really) \u2014 Comparing Involuntary Disembarking Rates Across U.S. Airlines in R","author":"hrbrmstr","date":"2017-04-13","format":false,"excerpt":"By now, word of the forcible deplanement of a medical professional by United has reached even the remotest of outposts in the #rstats universe. 
Since the news brought this practice to global attention, I found some aggregate U.S. Gov data and made a quick, annual, aggregate look at this soon after\u2026","rel":"","context":"In &quot;data wrangling&quot;","block_context":{"text":"data wrangling","link":"https:\/\/rud.is\/b\/category\/data-wrangling\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6930,"url":"https:\/\/rud.is\/b\/2017\/11\/02\/yet-another-power-outages-post-full-tidyverse-edition\/","url_meta":{"origin":6385,"position":5},"title":"Yet-Another-Power Outages Post : Full Tidyverse Edition","author":"hrbrmstr","date":"2017-11-02","format":false,"excerpt":"This past weekend, violent windstorms raged through New England. We \u2014 along with over 500,000 other Mainers \u2014 went \"dark\" in the wee hours of Monday morning and (this post was published on Thursday AM) we still have no utility-provided power nor high-speed internet access. The children have turned iFeral,\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/11\/plot_zoom_png.png?fit=1200%2C628&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/11\/plot_zoom_png.png?fit=1200%2C628&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/11\/plot_zoom_png.png?fit=1200%2C628&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/11\/plot_zoom_png.png?fit=1200%2C628&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/11\/plot_zoom_png.png?fit=1200%2C628&ssl=1&resize=1050%2C600 
3x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/6385","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=6385"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/6385\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media\/6395"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=6385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=6385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=6385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}