

{"id":5841,"date":"2017-04-23T17:27:01","date_gmt":"2017-04-23T22:27:01","guid":{"rendered":"https:\/\/rud.is\/b\/?p=5841"},"modified":"2018-03-07T17:19:23","modified_gmt":"2018-03-07T22:19:23","slug":"decomposing-composers-with-r","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/","title":{"rendered":"Decomposing Composers with R"},"content":{"rendered":"<p>The intrepid @ma_salmon cranked out <a href=\"http:\/\/www.masalmon.eu\/2017\/04\/23\/radioswissclassic\/\">another blog post<\/a>, remixing classical music schedule data from <a href=\"http:\/\/www.radioswissclassic.ch\/en\">Radio Swiss Classic<\/a>. It&#8217;s a fun post and you should read it before continuing here.<\/p>\n<p>Seriously, click the link and go read it before continuing.<\/p>\n<p>No, I mean it. Click the link or the rest of this makes no sense ;-)<\/p>\n<p>OK, good. You finally read her post.<\/p>\n<p>Now, I&#8217;m riffing off of said post here for four reasons. Three of the reasons are short, one is longer.<\/p>\n<p>The first, short one is: <em>be kind to web servers when scraping<\/em>. If you ran a site and suddenly got hit with 3,000+ immediately sequential requests you might not be able to handle it depending on your server config. At a <em>minimum<\/em> add a <code>Sys.sleep(sample(seq(0,1,0.25), 1))<\/code> before each sequential scrape and \u2014 if you can spare the time \u2014 <code>sample(5,1)<\/code> would be even better for a delay.<\/p>\n<p>The second, short one is: <code>purrr::safely()<\/code> is your bff when it comes to <code>xml2::read_html()<\/code> and other network-ops. The internet is fundamentally broken. Nodes die. Pages get lost. Links rot. 
You have to be able to handle exceptions and if you define something like <code>s_read_html &lt;- safely(read_html)<\/code> then when you do <code>s_read_html(\"https:\/\/example.com\/\")<\/code> the <code>$result<\/code> component will be <code>NULL<\/code> if the network request failed but will contain valid, parsed HTML if it succeeds. It is silent by default and works quite well (as we&#8217;ll see below).<\/p>\n<p>The third, short one is: MPGA (Make Progress-bars Great Again). <code>dplyr::progress_estimated()<\/code> can really simplify the usage of progress bars in <code>purrr<\/code> calls (drop a note in the comments if the code is confusing and I&#8217;ll add some expository).<\/p>\n<p>The last requires the code example for context:<\/p>\n<pre id=\"decomposing-01\"><code class=\"language-r\">library(rvest)\r\nlibrary(stringi)\r\nlibrary(lubridate)\r\nlibrary(tidyverse)\r\n\r\ns_read_html &lt;- safely(read_html)\r\n\r\n# helper for brevity\r\nxtract_nodes &lt;- function(node, css) {\r\n  html_nodes(node, css) %&gt;% html_text(trim = TRUE)\r\n}\r\n\r\nget_one_day_program &lt;- function(date=Sys.Date(),\r\n                                base_url=&quot;http:\/\/www.radioswissclassic.ch\/en\/music-programme\/search\/%s&quot;,\r\n                                pb=NULL) {\r\n\r\n  if (!is.null(pb)) pb$tick()$print()\r\n\r\n  Sys.sleep(sample(seq(0,1,0.25), 1)) # ideally, make this sample(5,1)\r\n\r\n  date &lt;- ymd(date) # handles case where input is character ISO date\r\n\r\n  pg &lt;- s_read_html(sprintf(base_url, format(date, &quot;%Y%m%d&quot;)))\r\n\r\n  if (!is.null(pg$result)) {\r\n\r\n    data_frame(\r\n\r\n      date = date,\r\n      duration = xtract_nodes(pg$result, &#039;div[class=&quot;playlist&quot;] *\r\n                                            span[class=&quot;time hidden-xs&quot;]&#039;) %&gt;% hm() %&gt;% as.numeric(),\r\n      artist = xtract_nodes(pg$result, &#039;div[class=&quot;playlist&quot;] * 
span[class=&quot;titletag&quot;]&#039;),\r\n      title = xtract_nodes(pg$result, &#039;div[class=&quot;playlist&quot;] * span[class=&quot;artist&quot;]&#039;),\r\n\r\n      hour = purrr::map(0:23, ~{\r\n        if (.x&lt;23) {\r\n          nod &lt;- html_nodes(pg$result,\r\n                             xpath=sprintf(&quot;.\/\/div[@id=&#039;%02d&#039;]\/following-sibling::div[contains(@class, &#039;item-row&#039;)\r\n                                                                 and (following-sibling::div[@id=&#039;%02d&#039;])]&quot;, .x, .x+1))\r\n        } else {\r\n          nod &lt;- html_nodes(pg$result,\r\n                            xpath=sprintf(&quot;.\/\/div[@id=&#039;%02d&#039;]\/following-sibling::div[contains(@class, &#039;item-row&#039;)]&quot;, .x))\r\n        }\r\n        rep(.x, length(nod))\r\n      }) %&gt;%\r\n        flatten_int()\r\n\r\n    )\r\n\r\n  } else {\r\n    closeAllConnections()\r\n    NULL\r\n  }\r\n\r\n}\r\n\r\nsearch_dates &lt;- seq(from = ymd(&quot;2008-09-01&quot;), to = ymd(&quot;2017-04-22&quot;), by = &quot;1 day&quot;)\r\n\r\npb &lt;- progress_estimated(length(search_dates[1:5]))\r\nprograms_df &lt;- map_df(search_dates[1:5], get_one_day_program, pb=pb)\r\nprograms_df\r\n## # A tibble: 825 \u00d7 5\r\n##          date duration                    artist                                                                         title  hour\r\n##        &lt;date&gt;    &lt;dbl&gt;                     &lt;chr&gt;                                                                         &lt;chr&gt; &lt;int&gt;\r\n## 1  2008-09-01       60   Franz Anton Hoffmeister &quot;Andante grazioso&quot; From Flute Quartet In A Major (After Mozart&#039;s KV 331) (CH)     0\r\n## 2  2008-09-01      360     Johann Nepomuk Hummel                              &quot;Rondo brillante&quot; Op. 56 For Piano And Orchestra     0\r\n## 3  2008-09-01     1380            Franz Schubert                       &quot;Andante con moto&quot; From Symphony No. 
9 In C Major D 944     0\r\n## 4  2008-09-01     2340       Camille Saint-Sa\u00ebns                                       Violin Concerto No. 1 In A Major Op. 20     0\r\n## 5  2008-09-01     3000        Alexander Scriabin                                           Nocturne In A Flat Major Op. posth.     0\r\n## 6  2008-09-01     3180        Alexander Glazunov                                          Valse From &quot;Sc\u00e8nes de ballet&quot; Op. 52     0\r\n## 7  2008-09-01     3540 Carl Philipp Emanuel Bach                                                           Symphony In G Major     0\r\n## 8  2008-09-01     4200            Giuseppe Verdi                      &quot;O Signore, dal tetto natio&quot; From The Opera &quot;I Lombardi&quot;     1\r\n## 9  2008-09-01     4440             Franz Krommer                                 Clarinet Concerto In E Flat Major Op. 36 (CH)     1\r\n## 10 2008-09-01     5820            Georges Onslow             &quot;Andantino molto cantabile&quot; From Symphony No. 4 In G Major Op. 71     1\r\n## # ... with 815 more rows<\/code><\/pre>\n<p>One of the reasons Ma\u00eblle created her post was to use <a href=\"https:\/\/www.w3schools.com\/xml\/xml_xpath.asp\">XPath<\/a>. Now, I was around when XML was defined and I have a sad, long history with the format, so XPath &amp; I are old <strike>friends<\/strike> adversaries. However, there are simpler ways to target some of the nodes.<\/p>\n<p><code>xpath=\"\/\/span[@class='time hidden-xs']\/\/text()\"<\/code> is ++gd XPath but it doesn&#8217;t need to be if we switch to using <code>html_nodes()<\/code> which will automatically translate CSS selectors to XPath for us. That bit of XPath turns into <code>div[class=\"playlist\"] * span[class=\"time hidden-xs\"]<\/code>. Why the extra selector at the beginning? 
Read on!<\/p>\n<p><code>div[class=\"playlist\"] * span[class=\"time hidden-xs\"]<\/code> actually translates to the following XPath:<\/p>\n<pre id=\"decomposing-02\"><code class=\"language-r\">selectr::css_to_xpath(&#039;div[class=&quot;playlist&quot;] * span[class=&quot;time hidden-xs&quot;]&#039;)\r\n## [1] &quot;descendant-or-self::div[@class = &#039;playlist&#039;]\/descendant::*\/descendant::span[@class = &#039;time hidden-xs&#039;]&quot;<\/code><\/pre>\n<p>I use the parent <code>playlist<\/code> <code>&lt;div&gt;<\/code> because a few of the code bits in Ma\u00eblle&#8217;s post have to subtract away the last node: the XPath expression is a bit too greedy and also gets the &#8220;now playing&#8221; info vs just the &#8220;what played that day&#8221; info. It&#8217;s not strictly necessary for the time-code but it is for the artist &amp; title. You can see that it simplifies the scraping a bit.<\/p>\n<p>However, we <em>can<\/em> use XPath to scrape the &#8220;hour the song played&#8221; and use it to fill the resultant data frame.<\/p>\n<p>This <code>.\/\/div[@id='%02d']\/following-sibling::div[contains(@class, 'item-row') and (following-sibling::div[@id='%02d'])]<\/code> is not the most complex XPath but it is pretty gnarly, yet it also shows the power of XPath. 
What we&#8217;re doing in that <code>purrr::map()<\/code> call (which said XPath is in) is:<\/p>\n<ul>\n<li>if the hour is <code>0:22<\/code>, then get all the sibling target nodes between one <code>&lt;div id=\"hh\"&gt;<\/code> and the next <code>&lt;div id=\"hh\"&gt;<\/code>.<\/li>\n<li>if the hour is <code>23<\/code>, then get all the target nodes until there are no more siblings<\/li>\n<li>for either result, make an integer vector containing the hour repeated <code>n<\/code> times (<code>n<\/code> being the number of songs played in the hour)<\/li>\n<li>flatten it all into one big integer vector<\/li>\n<\/ul>\n<p>(also: note that whitespace is your bff as well when it comes to formatting XPath queries)<\/p>\n<p>If any <code>read_html<\/code> request is &#8220;bad&#8221;, <code>NULL<\/code> will be returned instead of a <code>data_frame<\/code>, which <code>purrr::map_df()<\/code> will ignore.<\/p>\n<p>I only did 5 scrapes since I won&#8217;t be using the data, but it&#8217;s working well on other random sequences I tried.<\/p>\n<p>I tossed in a few more alternative ways to get some of the data, which you can pick up on if you compare the code bits to each other.<\/p>\n<p>Drop any questions, jibes or better XPath queries (once you post an XPath query on the internet the XPath wonks \u2014 like me \u2014 come out of hiding to prey on innocent bloggers) in the comments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The intrepid @ma_salmon cranked out another blog post, remixing classical music schedule data from Radio Swiss Classic. It&#8217;s a fun post and you should read it before continuing here. Seriously, click the link and go read it before continuing. No, I mean it. 
Click the link or the rest of this makes no sense ;-) [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[732],"tags":[810],"class_list":["post-5841","post","type-post","status-publish","format-standard","hentry","category-xml","tag-post"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Decomposing Composers with R - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Decomposing Composers with R - rud.is\" \/>\n<meta property=\"og:description\" content=\"The intrepid @ma_salmon cranked out another blog post, remixing classical music schedule data from Radio Swiss Classic. It&#8217;s a fun post and you should read it before continuing here. Seriously, click the link and go read it before continuing. No, I mean it. 
Click the link or the rest of this makes no sense ;-) [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2017-04-23T22:27:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-03-07T22:19:23+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Decomposing Composers with 
R\",\"datePublished\":\"2017-04-23T22:27:01+00:00\",\"dateModified\":\"2018-03-07T22:19:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/\"},\"wordCount\":687,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"keywords\":[\"post\"],\"articleSection\":[\"xml\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/\",\"url\":\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/\",\"name\":\"Decomposing Composers with R - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2017-04-23T22:27:01+00:00\",\"dateModified\":\"2018-03-07T22:19:23+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Decomposing Composers with R\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Decomposing Composers with R - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/","og_locale":"en_US","og_type":"article","og_title":"Decomposing Composers with R - rud.is","og_description":"The intrepid @ma_salmon cranked out another blog post, remixing classical music schedule data from Radio Swiss Classic. It&#8217;s a fun post and you should read it before continuing here. Seriously, click the link and go read it before continuing. No, I mean it. Click the link or the rest of this makes no sense ;-) [&hellip;]","og_url":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/","og_site_name":"rud.is","article_published_time":"2017-04-23T22:27:01+00:00","article_modified_time":"2018-03-07T22:19:23+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Decomposing Composers with R","datePublished":"2017-04-23T22:27:01+00:00","dateModified":"2018-03-07T22:19:23+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/"},"wordCount":687,"commentCount":4,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"keywords":["post"],"articleSection":["xml"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/","url":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/","name":"Decomposing Composers with R - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2017-04-23T22:27:01+00:00","dateModified":"2018-03-07T22:19:23+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2017\/04\/23\/decomposing-composers-with-r\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Decomposing Composers with R"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1wd","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":3141,"url":"https:\/\/rud.is\/b\/2014\/11\/27\/power-outage-impact-choropleths-in-5-steps-in-r-featuring-rvest-rstudio-projects\/","url_meta":{"origin":5841,"position":0},"title":"Power Outage Impact Choropleths In 5 Steps in R (featuring rvest &#038; RStudio &#8220;Projects&#8221;)","author":"hrbrmstr","date":"2014-11-27","format":false,"excerpt":"I and @awpiii were trading news about the power outages in Maine & New Hampshire last night and he tweeted the link to the @PSNH [Outage Map](http:\/\/www.psnh.com\/outage\/). As if the Bing Maps tiles weren't bad enough, the use of a categorical color scale instead of a sequential one[[1](http:\/\/earthobservatory.nasa.gov\/blogs\/elegantfigures\/2011\/05\/20\/qualitative-vs-sequential-color-scales\/)] caused sufficient\u2026","rel":"","context":"In &quot;Charts &amp; Graphs&quot;","block_context":{"text":"Charts &amp; Graphs","link":"https:\/\/rud.is\/b\/category\/charts-graphs\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":7637,"url":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","url_meta":{"origin":5841,"position":1},"title":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &#038; sergeant","author":"hrbrmstr","date":"2017-12-20","format":false,"excerpt":"(For first-timers, R\u2076 tagged posts are short & sweet with minimal expository; R\u2076 feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w\/o working across billions of rows. 
I also use Apache Drill for much of my exploratory work.\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":12120,"url":"https:\/\/rud.is\/b\/2019\/04\/03\/wicked-fast-accurate-quantiles-using-t-digests-in-r-with-the-tdigest-package\/","url_meta":{"origin":5841,"position":2},"title":"Wicked Fast, Accurate Quantiles Using \u2018t-Digests\u2019 in R with the {tdigest} Package","author":"hrbrmstr","date":"2019-04-03","format":false,"excerpt":"@ted_dunning recently updated the t-Digest algorithm he created back in 2013. What is this \"t-digest\"? Fundamentally, it is a probabilistic data structure for estimating any percentile of distributed\/streaming data. Ted explains it quite elegantly in this short video: Said video has a full transcript as well. T-digests have been baked\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":5131,"url":"https:\/\/rud.is\/b\/2017\/03\/10\/making-a-case-for-case_when\/","url_meta":{"origin":5841,"position":3},"title":"Making a Case for case_when","author":"hrbrmstr","date":"2017-03-10","format":false,"excerpt":"This is a brief (and likely obvious, for some folks) post on the dplyr::case_when() function. Part of my work-work is dealing with data from internet scans. 
When we're performing a deeper inspection of a particular internet protocol or service we try to capture as much system and service metadata as\u2026","rel":"","context":"In &quot;dplyr&quot;","block_context":{"text":"dplyr","link":"https:\/\/rud.is\/b\/category\/dplyr\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/Cursor_and_RStudio.png?fit=1200%2C464&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/Cursor_and_RStudio.png?fit=1200%2C464&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/Cursor_and_RStudio.png?fit=1200%2C464&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/Cursor_and_RStudio.png?fit=1200%2C464&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/03\/Cursor_and_RStudio.png?fit=1200%2C464&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":12114,"url":"https:\/\/rud.is\/b\/2019\/03\/26\/rome-was-not-built-in-a-day-but-widgetcard-was\/","url_meta":{"origin":5841,"position":4},"title":"Rome Was Not Built In A Day But widgetcard Was!","author":"hrbrmstr","date":"2019-03-26","format":false,"excerpt":"I saw a second post on turning htmlwidgets into interactive Twitter Player cards and felt somewhat compelled to make creating said entities a bit easier so posited the following: Wld this be useful packaged up, #rstats?https:\/\/t.co\/sfqlWnEeJVhttps:\/\/t.co\/troKzmzTNv(TLDR\/V: Single function to turn an HTML widget into a deployable interactive Twitter card) pic.twitter.com\/uahB52YfE2\u2014\u2026","rel":"","context":"In 
&quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6496,"url":"https:\/\/rud.is\/b\/2017\/09\/30\/identify-analyze-web-site-tech-stacks-with-rappalyzer\/","url_meta":{"origin":5841,"position":5},"title":"Identify &#038; Analyze Web Site Tech Stacks With rappalyzer","author":"hrbrmstr","date":"2017-09-30","format":false,"excerpt":"Modern websites are complex beasts. They house photo galleries, interactive visualizations, web fonts, analytics code and other diverse types of content. Despite the potential for diversity, many web sites share similar \"tech stacks\" --- the components that come together to make them what they are. These stacks consist of web\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Viewer_Zoom.png?fit=1198%2C1200&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Viewer_Zoom.png?fit=1198%2C1200&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Viewer_Zoom.png?fit=1198%2C1200&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Viewer_Zoom.png?fit=1198%2C1200&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/09\/Viewer_Zoom.png?fit=1198%2C1200&ssl=1&resize=1050%2C600 
3x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/5841","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=5841"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/5841\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=5841"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=5841"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=5841"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}