

{"id":1849,"date":"2012-12-17T15:21:50","date_gmt":"2012-12-17T20:21:50","guid":{"rendered":"http:\/\/rud.is\/b\/?p=1849"},"modified":"2018-03-10T07:51:27","modified_gmt":"2018-03-10T12:51:27","slug":"easier-html-table-scraping-for-scripts-with-google-drive","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/","title":{"rendered":"Easier HTML Table-scraping For Scripts With Google Drive"},"content":{"rendered":"<p>We had our first, real, snowfall of the season in Maine today and that usually means school delays\/closings. Our &#8220;local&#8221; station \u2013 @WCHS6 \u2013 has a <span class=\"removed_link\" title=\"http:\/\/www.wcsh6.com\/weather\/severe_weather\/cancellations_closings\/default.aspx\">Storm Center Closings<\/span> page as well as an SMS notification service. I decided this morning that I needed a command line version (and, eventually, a version that sends me a Twitter DM), but I was also tight for time (a lunchtime meeting ending early is responsible for this blog post).<\/p>\n<p>While I&#8217;ve consumed my share of <a href=\"https:\/\/www.crummy.com\/software\/BeautifulSoup\/\">Beautiful Soup<\/a> and can throw down some <a href=\"http:\/\/wwwsearch.sourceforge.net\/mechanize\/\">mechanize<\/a> with the best of them, it came to me that there may be an <em>even easier way<\/em>, and one that may also help sidestep the eventual blocking of such a scraping service.<\/p>\n<p>I set up a <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1xa1nWOv1AQrBM_WUW07Z19aSoEXlWLsUnhqYdgs0nxw\/edit\">Google Drive spreadsheet<\/a> to use the <a href=\"https:\/\/rud.is\/b\/2012\/01\/13\/importhtml\/\">importHTML<\/a> formula to read in the closings table on the page:<\/p>\n<pre lang=\"vb\">=importHTML(\"http:\/\/www.wcsh6.com\/weather\/severe_weather\/cancellations_closings\/default.aspx\",\"table\",0)<\/pre>\n<p>Then I did a <code>File&rarr;Publish to the web<\/code> and set up Sheet 
1 to &#8220;<em>Automatically republish when changes are made<\/em>&#8221; and set the link to point to the CSV version of the data:<\/p>\n<p><center><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"1850\" data-permalink=\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/screenshot-121712-116-pm\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?fit=510%2C466&amp;ssl=1\" data-orig-size=\"510,466\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"Screenshot 12:17:12 1:16 PM\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?fit=300%2C274&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?fit=510%2C466&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?resize=510%2C466&#038;ssl=1\" alt=\"Screenshot 12:17:12 1:16 PM\" width=\"510\" height=\"466\" class=\"aligncenter size-full wp-image-1850\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?w=510&amp;ssl=1 510w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?resize=300%2C274&amp;ssl=1 300w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/a><\/center><\/p>\n<p>The 
raw output looks a bit like:<\/p>\n<pre lang=\"text\">Name,Status,Last Updated\n,,\nWestbook Seniors,Luncheon PPD to January 7th,12\/17\/2012 5:22:51\n,,\nAllied Wheelchair Van Services,Closed,12\/17\/2012 6:49:47\n,,\nAmerican Legion - Dixfield,Bingo cancelled,12\/17\/2012 11:44:12\n,,\nAmerican Legion Post 155 - Naples,Closed,12\/17\/2012 12:49:00<\/pre>\n<p>The conversion has some &#8220;blank&#8221; lines but that&#8217;s easy enough to filter out with some quick <code>bash<\/code>:<\/p>\n<pre lang=\"bash\">curl --silent \"https:\/\/docs.google.com\/spreadsheet\/pub?key=0AlCY1qfmPPZVdFBsX3kzLUVHZl9Mdmw3bS1POWNsWnc&single=true&gid=0&output=csv\" | grep -v \"^,,\"<\/pre>\n<p>Looking for our kids&#8217; specific school(s) is an easy <code>grep<\/code> as well.<\/p>\n<p>The reason this is interesting is that <code>importHTML<\/code> is dynamic and <em>will re-convert the HTML table each time the code retrieves the CSV URL<\/em>. Couple that with the fact that Google is far less likely to be blocked than my IP address(es) and this seems to be a pretty nice alternative to traditional parsing.<\/p>\n<p>If I get some time over the break, I&#8217;ll do a quick benchmark of this method against some Python and Perl scraping\/parsing methods.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We had our first, real, snowfall of the season in Maine today and that usually means school delays\/closings. Our &#8220;local&#8221; station \u2013 @WCHS6 \u2013 has a Storm Center Closings page as well as an SMS notification service. 
I decided this morning that I needed a command line version (and, eventually, a version that sends me [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[88,36,7,640],"tags":[],"class_list":["post-1849","post","type-post","status-publish","format-standard","hentry","category-google-docs","category-html5","category-programming","category-python-2"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Easier HTML Table-scraping For Scripts With Google Drive - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Easier HTML Table-scraping For Scripts With Google Drive - rud.is\" \/>\n<meta property=\"og:description\" content=\"We had our first, real, snowfall of the season in Maine today and that usually means school delays\/closings. Our &#8220;local&#8221; station \u2013 @WCHS6 \u2013 has a Storm Center Closings page as well as an SMS notification service. 
I decided this morning that I needed a command line version (and, eventually, a version that sends me [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2012-12-17T20:21:50+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-03-10T12:51:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Easier HTML Table-scraping For Scripts With Google 
Drive\",\"datePublished\":\"2012-12-17T20:21:50+00:00\",\"dateModified\":\"2018-03-10T12:51:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/\"},\"wordCount\":302,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png\",\"articleSection\":[\"Google Docs\",\"HTML5\",\"Programming\",\"Python\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/\",\"url\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/\",\"name\":\"Easier HTML Table-scraping For Scripts With Google Drive - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png\",\"datePublished\":\"2012-12-17T20:21:50+00:00\",\"dateModified\":\"2018-03-10T12:51:27+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?fit=510%2C466&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?fit=510%2C466&ssl=1\",\"width\":510,\"height\":466},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Easier HTML Table-scraping For Scripts With Google Drive\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Easier HTML Table-scraping For Scripts With Google Drive - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/","og_locale":"en_US","og_type":"article","og_title":"Easier HTML Table-scraping For Scripts With Google Drive - rud.is","og_description":"We had our first, real, snowfall of the season in Maine today and that usually means school delays\/closings. Our &#8220;local&#8221; station \u2013 @WCHS6 \u2013 has a Storm Center Closings page as well as an SMS notification service. I decided this morning that I needed a command line version (and, eventually, a version that sends me [&hellip;]","og_url":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/","og_site_name":"rud.is","article_published_time":"2012-12-17T20:21:50+00:00","article_modified_time":"2018-03-10T12:51:27+00:00","og_image":[{"url":"https:\/\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Easier HTML Table-scraping For Scripts With Google Drive","datePublished":"2012-12-17T20:21:50+00:00","dateModified":"2018-03-10T12:51:27+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/"},"wordCount":302,"commentCount":0,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png","articleSection":["Google Docs","HTML5","Programming","Python"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/","url":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/","name":"Easier HTML Table-scraping For Scripts With Google Drive - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png","datePublished":"2012-12-17T20:21:50+00:00","dateModified":"2018-03-10T12:51:27+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?fit=510%2C466&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2012\/12\/Screenshot-121712-116-PM.png?fit=510%2C466&ssl=1","width":510,"height":466},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2012\/12\/17\/easier-html-table-scraping-for-scripts-with-google-drive\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Easier HTML Table-scraping For Scripts With Google Drive"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-tP","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":6067,"url":"https:\/\/rud.is\/b\/2017\/06\/05\/r%e2%81%b6-scraping-images-to-pdfs\/","url_meta":{"origin":1849,"position":0},"title":"R\u2076 \u2014 Scraping Images To PDFs","author":"hrbrmstr","date":"2017-06-05","format":false,"excerpt":"I've been doing intermittent prep work for a follow-up to an earlier post on store closings and came across this CNN Money \"article\" on it. 
Said \"article\" is a deliberately obfuscated or lazily crafted series of GIF images that contain all the Radio Shack impending store closings. It's the most\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2902,"url":"https:\/\/rud.is\/b\/2014\/02\/11\/live-google-spreadsheet-for-keeping-track-of-sochi-medals\/","url_meta":{"origin":1849,"position":1},"title":"Live Google Spreadsheet For Keeping Track Of Sochi Medals","author":"hrbrmstr","date":"2014-02-11","format":false,"excerpt":"The \"medals\" R post by [TRInker](http:\/\/trinkerrstuff.wordpress.com\/2014\/02\/09\/sochi-olympic-medals-2\/) and re-blogged by [Revolutions](http:\/\/blog.revolutionanalytics.com\/2014\/02\/winter-olympic-medal-standings-presented-by-r.html) were both spiffy and a live example why there's no point in not publishing raw data. You don't need to have R (or any other language) do the scraping, though. The \"`IMPORTHTML`\" function (yes, function names seem to be ALL\u2026","rel":"","context":"In &quot;Google Docs&quot;","block_context":{"text":"Google Docs","link":"https:\/\/rud.is\/b\/category\/google-docs\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2717,"url":"https:\/\/rud.is\/b\/2013\/09\/25\/scraping-content-from-google-groups\/","url_meta":{"origin":1849,"position":2},"title":"Scraping Content From Google Groups","author":"hrbrmstr","date":"2013-09-25","format":false,"excerpt":"I was helping a friend out who wanted to build a word cloud from the text in Google Groups posts. 
If you've made any efforts to try to get content out of Google Groups you know that the only way to do so is to ensure you subscribe to the\u2026","rel":"","context":"In &quot;Development&quot;","block_context":{"text":"Development","link":"https:\/\/rud.is\/b\/category\/development\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/Untitled_and_input_text_not_updated_-_Google_Groups.png?fit=535%2C356&ssl=1&resize=525%2C300 1.5x"},"classes":[]},{"id":5178,"url":"https:\/\/rud.is\/b\/2017\/03\/19\/exploring-2017-retail-store-closings-with-r\/","url_meta":{"origin":1849,"position":3},"title":"Exploring 2017 Retail Store Closings with R","author":"hrbrmstr","date":"2017-03-19","format":false,"excerpt":"A story about one of the retail chains (J.C. 
Penny) releasing their list of stores closing in 2017 crossed paths with my Feedly reading list today and jogged my memory that there were a number of chains closing many of their doors this year, and I wanted to see the\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/bls-1.png?fit=1200%2C1050&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":3558,"url":"https:\/\/rud.is\/b\/2015\/07\/25\/roll-your-own-gist-comments-notifier-in-r\/","url_meta":{"origin":1849,"position":4},"title":"Roll Your Own Gist Comments Notifier in R","author":"hrbrmstr","date":"2015-07-25","format":false,"excerpt":"As I was putting together the [coord_proj](https:\/\/rud.is\/b\/2015\/07\/24\/a-path-towards-easier-map-projection-machinations-with-ggplot2\/) ggplot2 extension I had posted a (https:\/\/gist.github.com\/hrbrmstr\/363e33f74e2972c93ca7) that I shared on Twitter. 
Said gist received a comment (several, in fact) and a bunch of us were painfully reminded of the fact that there is no built-in way to receive notifications from said comment\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11383,"url":"https:\/\/rud.is\/b\/2018\/08\/13\/in-brief-splashr-update-high-performance-scraping-with-splashr-furrr-teamhg-memexs-aquarium\/","url_meta":{"origin":1849,"position":5},"title":"In-brief: splashr update + High Performance Scraping with splashr, furrr &#038; TeamHG-Memex&#8217;s Aquarium","author":"hrbrmstr","date":"2018-08-13","format":false,"excerpt":"The development version of splashr now support authenticated connections to Splash API instances. Just specify user and pass on the initial splashr::splash() call to use your scraping setup a bit more safely. For those not familiar with splashr and\/or Splash: the latter is a lightweight alternative to tools like Selenium\u2026","rel":"","context":"In 
&quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/1849","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=1849"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/1849\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=1849"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=1849"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=1849"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}