

{"id":11540,"date":"2018-09-19T09:26:17","date_gmt":"2018-09-19T14:26:17","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11540"},"modified":"2018-09-19T11:57:07","modified_gmt":"2018-09-19T16:57:07","slug":"taking-a-tour-of-the-pirate-ship-github-dmca-with-r","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/","title":{"rendered":"Taking a Tour of the Pirate Ship &#8216;GitHub DMCA&#8217; with R"},"content":{"rendered":"<p>Despite having sailed through the core components of this year&#8217;s <a href=\"http:\/\/talklikeapirate.com\/wordpress\/\">Talk Like A Pirate Day<\/a> R post a few months ago, time has been an enemy of late so this will be a short post that others can build off of, especially since there&#8217;s lots more <strike>knife work<\/strike> ground to cover from the data.<\/p>\n<h3>DMC-WhAt?<\/h3>\n<p>Since this is TLAPD, I&#8217;ll pilfer some of the explanation from GitHub itself:<\/p>\n<p>The Digital Millennium Copyright Act (DMCA) <span style=\"font-family:monospace\">&lt;start_of_current_pilfer&gt;<em><\/span>&#8220;provides a safe harbor for service providers that host user-generated content. Since even a single claim of copyright infringement can carry statutory damages of up to $150,000, the possibility of being held liable for user-generated content could be very harmful for service providers. With potential damages multiplied across millions of users, cloud-computing and user-generated content sites like YouTube, Facebook, or GitHub probably never would have existed without the DMCA (or at least not without passing some of that cost downstream to their users).&#8221;<\/em><\/p>\n<p><em>&#8220;The DMCA addresses this issue by creating a copyright liability safe harbor for internet service providers hosting allegedly infringing user-generated content. 
Essentially, so long as a service provider follows the DMCA&#8217;s notice-and-takedown rules, it won&#8217;t be liable for copyright infringement based on user-generated content. Because of this, it is important for GitHub to maintain its DMCA safe-harbor status.&#8221;<\/em><span style=\"font-family:monospace\">&lt;\/end_of_current_pilfer&gt;<\/span><\/p>\n<p>(I&#8217;ll save you from a long fact- and opinion-based diatribe on the DMCA, but suffice it to say it&#8217;s done far more harm than good IMO. Also, hopefully the &#8220;piracy&#8221; connection makes sense now :-)<\/p>\n<p>If your initial reaction was <em>&#8220;What does the DMCA have to do with GitHub?&#8221;<\/em> it likely (quickly) turned to <em>&#8220;Oh&hellip;GitHub is really just a version-controlled file sharing service&hellip;&#8221;<\/em>. As such it has to have <a href=\"https:\/\/help.github.com\/articles\/dmca-takedown-policy\/\">a robust takedown policy<\/a> and process.<\/p>\n<p>I don&#8217;t know if Microsoft is going to keep the practice of being open about DMCA requests now that they own GitHub, nor do I know if they&#8217;ll use the same process on themselves (since, as we&#8217;ll see, they have issued DMCA requests to GitHub in the past). For now, we&#8217;ll assume they will, thus making the code from this post usable in the future to check on the status of DMCA requests over a longer period of time. But first we need the data.<\/p>\n<h3>Hunting for treasure in the data hoard<\/h3>\n<p>Unsurprisingly, GitHub stores DMCA data <a href=\"https:\/\/github.com\/github\/dmca\">on GitHub<\/a>. Ironically, they store it openly, in part, to shine a light on what giant, global megacorps like Microsoft are doing. 
Feel free to use one of the many R packages to clone the repo, but a simple command-line <code>git clone git@github.com:github\/dmca.git<\/code> is quick and efficient (not everything needs to be done from R).<\/p>\n<p>The directory structure looks like this:<\/p>\n<pre><code class=\"language-plain\">\u251c\u2500\u2500 2011\n\u251c\u2500\u2500 2012\n\u251c\u2500\u2500 2013\n\u251c\u2500\u2500 2014\n\u251c\u2500\u2500 2015\n\u251c\u2500\u2500 2016\n\u251c\u2500\u2500 2017\n\u251c\u2500\u2500 2017-02-01-RBoyApps-2.md\n\u251c\u2500\u2500 2017-02-15-DeutscheBank.md\n\u251c\u2500\u2500 2017-03-13-Jetbrains.md\n\u251c\u2500\u2500 2017-06-26-Wipro-Counternotice.md\n\u251c\u2500\u2500 2017-06-30-AdflyLink.md\n\u251c\u2500\u2500 2017-07-28-Toontown-2.md\n\u251c\u2500\u2500 2017-08-31-Tourzan.md\n\u251c\u2500\u2500 2017-09-04-Random-House.md\n\u251c\u2500\u2500 2017-09-05-RandomHouse-2.md\n\u251c\u2500\u2500 2017-09-18-RandomHouse.md\n\u251c\u2500\u2500 2017-09-19-Ragnarok.md\n\u251c\u2500\u2500 2017-10-10-Broadcom.md\n\u251c\u2500\u2500 2018\n\u251c\u2500\u2500 2018-02-01-NihonAdSystems.md\n\u251c\u2500\u2500 2018-03-03-TuneIn.md\n\u251c\u2500\u2500 2018-03-16-Wabg.md\n\u251c\u2500\u2500 2018-05-17-Packt.md\n\u251c\u2500\u2500 2018-06-12-Suning.md\n\u251c\u2500\u2500 2018-07-31-Pearson.md\n\u251c\u2500\u2500 CONTRIBUTING.md\n\u251c\u2500\u2500 data\n\u2514\u2500\u2500 README.md<\/code><\/pre>\n<p>Unfortunately, the <code>data<\/code> directory contains fools&#8217; gold (it&#8217;s just high-level summary data).<\/p>\n<p>We want DMCA filer names, repo names, file names and the DMCA notice text (though we&#8217;ll be leaving NLP projects up to the intrepid readers). For that, it will mean processing the directories of notices.<\/p>\n<p>Notices are named (sadly, with some inconsistency) like this: <code>2018-03-15-Microsoft.md<\/code>. Year, month, date and name of org. 
The contents are text versions of correspondence (usually email text) that have <a href=\"https:\/\/help.github.com\/articles\/guide-to-submitting-a-dmca-takedown-notice\/\">some requirements<\/a> in order to be processed. There&#8217;s also <a href=\"https:\/\/github.com\/contact\/dmca-notice\">an online form<\/a> one can fill out but it&#8217;s pretty much a free text field with some semblance of structure. It&#8217;s up to humans to follow that structure and, as such, there is inconsistency in the text as well. (Perhaps this is a great lesson that non-constrained inputs and human-originated filenames aren&#8217;t a great plan for curating data stores.)<\/p>\n<p>You may have seen what look like takedown files in the top level of the repo. I have no idea if they are legit (since they aren&#8217;t in the structured directories) so we&#8217;ll be ignoring them.<\/p>\n<p>When I took a look at the directories, some files end in <code>.markdown<\/code> but most end in <code>.md<\/code>. We&#8217;ll cover both instances (you&#8217;ll need to replace <code>\/data\/github\/dmca<\/code> with the prefix where you stored the repo):<\/p>\n<pre><code class=\"language-r\">library(tools)\nlibrary(stringi)\nlibrary(hrbrthemes)\nlibrary(tidyverse)\n\nlist.files(\n  path = sprintf(\"\/data\/github\/dmca\/%s\", 2011:2018), \n  pattern = \"\\\\.md$|\\\\.markdown$\",\n  full.names = TRUE\n) -> dmca_files<\/code><\/pre>\n<p>As noted previously, we&#8217;re going to focus on DMCA views over time, look at organizations who filed DMCA notices and the notice content. It turns out the filenames also distinguish whether a notice is a takedown request, a <a href=\"https:\/\/help.github.com\/articles\/guide-to-submitting-a-dmca-counter-notice\/\">counter-notice<\/a> (i.e. a challenge from the user whose content was taken down) or a retraction (i.e. an &#8220;oops&hellip;my bad&hellip;&#8221; by a takedown originator), so we&#8217;ll collect that metadata as well. 
Finally, we&#8217;ll slurp up the text along the way.<\/p>\n<p>Again, I&#8217;ve taken a pass at this and found out the following:<\/p>\n<ul>\n<li>Some dates are coded incorrectly (infrequently enough that a few hard-coded rules can fix them)<\/li>\n<li>Some org names are coded incorrectly (often enough to skew counts, so we need to deal with it)<\/li>\n<li>Counter-notice and retraction tags are inconsistent, so we need to deal with that as well<\/li>\n<\/ul>\n<p>It&#8217;s an ugly pipeline, so I&#8217;ve annotated these initial steps to make what&#8217;s going on a bit clearer:<\/p>\n<pre><code class=\"language-r\">map_df(dmca_files, ~{\n  \n  file_path_sans_ext(.x) %>% # remove extension\n    basename() %>% # get just the filename\n    stri_match_all_regex(\n      \"([[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{1,2})-(.*)\" # try to find the date and the org\n    ) %>% \n    unlist() -> date_org\n  \n  if (is.na(date_org[2])) { # handle a special case where the date pattern above didn't work\n    file_path_sans_ext(.x) %>% \n      basename() %>%\n      stri_match_all_regex(\n        \"([[:digit:]]{4}-[[:digit:]]{2})-(.*)\"\n      ) %>% \n      unlist() -> date_org\n  }\n  \n  # a few files are still broken so we'll deal with them as special cases\n  \n  if (stri_detect_fixed(.x, \"2017\/2017-11-06-1776.md\")) {\n    date_org <- c(\"\", \"2017-11-06\", \"1776\")\n  } else if (stri_detect_fixed(.x, \"2017\/2017-Offensive-Security-7.md\")) {\n    date_org <- c(\"\", \"2017-12-30\", \"Offensive-Security-7\")\n  } else if (stri_detect_fixed(.x, \"2017\/Offensive-Security-6.md\")) {\n    date_org <- c(\"\", \"2017-12-29\", \"Offensive-Security-6\")\n  }\n  \n  # we used a somewhat liberal regex to capture dates since some are \n  # still broken. 
We'll deal with those first, then turn them\n  # into proper Date objects\n  \n  list(\n    notice_day = case_when(\n      date_org[2] == \"2015-12-3\"  ~ \"2015-12-03\",\n      date_org[2] == \"2015-12-7\"  ~ \"2015-12-07\",\n      date_org[2] == \"2016-08\"    ~ \"2016-08-01\",\n      date_org[2] == \"2016-10-7\"  ~ \"2016-10-07\",\n      date_org[2] == \"2016-11-1\"  ~ \"2016-11-01\",\n      date_org[2] == \"2016-11-3\"  ~ \"2016-11-03\",\n      date_org[2] == \"2017-06\"    ~ \"2017-06-01\",\n      date_org[2] == \"0107-05-22\" ~ \"2017-05-22\",\n      date_org[2] == \"2017-11-1\"  ~ \"2017-11-01\",\n      TRUE ~ date_org[2]\n    ) %>% \n      lubridate::ymd(),\n    notice_org = date_org[3] %>% # sometimes the org name is messed up so we need to clean it up\n      stri_replace_last_regex(\"[-]*[[:digit:]]+$\", \"\") %>% \n      stri_replace_all_fixed(\"-\", \" \"),\n    notice_content = list(read_lines(.x)) # grab the content\n  ) -> ret\n  \n  # and there are still some broken org names\n  if (stri_detect_fixed(.x, \"2017\/2017-11-06-1776.md\")) {\n    ret$notice_org <- \"1776\"\n  } \n  \n  ret\n  \n}) -> dmca\n\ndmca\n## # A tibble: 4,460 x 3\n##    notice_day notice_org                   notice_content\n##    <date>     <chr>                        <list>        \n##  1 2011-01-27 sony                         <chr [65]>    \n##  2 2011-01-28 tera                         <chr [73]>    \n##  3 2011-01-31 sony                         <chr [55]>    \n##  4 2011-02-03 sony counternotice           <chr [8]>     \n##  5 2011-02-03 sony                         <chr [2,348]> \n##  6 2011-03-24 oracle                       <chr [9]>     \n##  7 2011-03-30 mentor graphics              <chr [33]>    \n##  8 2011-05-24 cpp virtual world operations <chr [14]>    \n##  9 2011-06-07 sony                         <chr [32]>    \n## 10 2011-06-13 diablominer                  <chr [12]>    \n## # ... with 4,450 more rows<\/code><\/pre>\n<p>Much better. 
We&#8217;ve got more deck-swabbing to do, now, to tag the counter-notices and retractions:<\/p>\n<pre><code class=\"language-r\">mutate(\n  dmca,\n  notice_org = stri_trans_tolower(notice_org), # lower-case first so the tag checks below ignore case\n  counter_notice = stri_detect_regex(notice_org, \"counternotice|counter notice\"), # regex (not fixed) so the alternation handles the inconsistency\n  retraction = stri_detect_fixed(notice_org, \"retraction\"), \n  notice_org = notice_org %>% \n    stri_replace_first_regex(\" *(counternotice|counter notice) *\", \"\") %>% # clean up org names with tags\n    stri_replace_first_regex(\" *retraction *\", \"\")\n) -> dmca\n\ndmca\n## # A tibble: 4,460 x 5\n##    notice_day notice_org        notice_content counter_notice retraction\n##    <date>     <chr>             <list>         <lgl>          <lgl>     \n##  1 2011-01-27 sony              <chr [65]>     FALSE          FALSE     \n##  2 2011-01-28 tera              <chr [73]>     FALSE          FALSE     \n##  3 2011-01-31 sony              <chr [55]>     FALSE          FALSE     \n##  4 2011-02-03 sony              <chr [8]>      FALSE          FALSE     \n##  5 2011-02-03 sony              <chr [2,348]>  FALSE          FALSE     \n##  6 2011-03-24 oracle            <chr [9]>      FALSE          FALSE     \n##  7 2011-03-30 mentor graphics   <chr [33]>     FALSE          FALSE     \n##  8 2011-05-24 cpp virtual worl\u2026 <chr [14]>     FALSE          FALSE     \n##  9 2011-06-07 sony              <chr [32]>     FALSE          FALSE     \n## 10 2011-06-13 diablominer       <chr [12]>     FALSE          FALSE     \n## # ... with 4,450 more rows<\/code><\/pre>\n<p>I&#8217;ve lower-cased the org names to make it easier to wrangle them since we do, indeed, need to wrangle them.<\/p>\n<p>I&#8217;m super-not-proud of the following code block, but I went into it thinking the org name corrections would be infrequent. 
But, as I worked with the supposedly-cleaned data, I kept adding correction rules and eventually created a monster:<\/p>\n<pre><code class=\"language-r\">mutate(\n  dmca,\n  notice_org = case_when(\n    stri_detect_fixed(notice_org, \"accenture\")        ~ \"accenture\",\n    stri_detect_fixed(notice_org, \"adobe\")            ~ \"adobe\",\n    stri_detect_fixed(notice_org, \"amazon\")           ~ \"amazon\",\n    stri_detect_fixed(notice_org, \"ansible\")          ~ \"ansible\",\n    stri_detect_fixed(notice_org, \"aspengrove\")       ~ \"aspengrove\",\n    stri_detect_fixed(notice_org, \"apple\")            ~ \"apple\",\n    stri_detect_fixed(notice_org, \"aws\")              ~ \"aws\",\n    stri_detect_fixed(notice_org, \"blizzard\")         ~ \"blizzard\",\n    stri_detect_fixed(notice_org, \"o reilly\")         ~ \"oreilly\",\n    stri_detect_fixed(notice_org, \"random\")           ~ \"random house\",\n    stri_detect_fixed(notice_org, \"casado\")           ~ \"casadocodigo\",\n    stri_detect_fixed(notice_org, \"ccp\")              ~ \"ccp\",\n    stri_detect_fixed(notice_org, \"cisco\")            ~ \"cisco\",\n    stri_detect_fixed(notice_org, \"cloudsixteen\")     ~ \"cloud sixteen\",\n    stri_detect_fixed(notice_org, \"collinsharper\")    ~ \"collins harper\",\n    stri_detect_fixed(notice_org, \"contentanalytics\") ~ \"content analytics\",\n    stri_detect_fixed(notice_org, \"packt\")            ~ \"packt\",\n    stri_detect_fixed(notice_org, \"penguin\")          ~ \"penguin\",\n    stri_detect_fixed(notice_org, \"wiley\")            ~ \"wiley\",\n    stri_detect_fixed(notice_org, \"wind river\")       ~ \"windriver\",\n    stri_detect_fixed(notice_org, \"windriver\")        ~ \"windriver\",\n    stri_detect_fixed(notice_org, \"wireframe\")        ~ \"wireframe shader\",\n    stri_detect_fixed(notice_org, \"listen\")           ~ \"listen\",\n    stri_detect_fixed(notice_org, \"wpecommerce\")      ~ \"wpecommerce\",\n    
stri_detect_fixed(notice_org, \"yahoo\")            ~ \"yahoo\",\n    stri_detect_fixed(notice_org, \"youtube\")          ~ \"youtube\",\n    stri_detect_fixed(notice_org, \"x pressive\")       ~ \"xpressive\",\n    stri_detect_fixed(notice_org, \"ximalaya\")         ~ \"ximalaya\",\n    stri_detect_fixed(notice_org, \"pragmatic\")        ~ \"pragmatic\",\n    stri_detect_fixed(notice_org, \"evadeee\")          ~ \"evadeee\",\n    stri_detect_fixed(notice_org, \"iaai\")             ~ \"iaai\",\n    stri_detect_fixed(notice_org, \"line corp\")        ~ \"line corporation\",\n    stri_detect_fixed(notice_org, \"mediumrare\")       ~ \"medium rare\",\n    stri_detect_fixed(notice_org, \"profittrailer\")    ~ \"profit trailer\",\n    stri_detect_fixed(notice_org, \"smartadmin\")       ~ \"smart admin\",\n    stri_detect_fixed(notice_org, \"microsoft\")        ~ \"microsoft\",\n    stri_detect_fixed(notice_org, \"monotype\")         ~ \"monotype\",\n    stri_detect_fixed(notice_org, \"qualcomm\")         ~ \"qualcomm\",\n    stri_detect_fixed(notice_org, \"pearson\")          ~ \"pearson\",\n    stri_detect_fixed(notice_org, \"sony\")             ~ \"sony\",\n    stri_detect_fixed(notice_org, \"oxford\")           ~ \"oxford\",\n    stri_detect_fixed(notice_org, \"oracle\")           ~ \"oracle\",\n    stri_detect_fixed(notice_org, \"out fit\")          ~ \"outfit\",\n    stri_detect_fixed(notice_org, \"nihon\")            ~ \"nihon\",\n    stri_detect_fixed(notice_org, \"opencv\")           ~ \"opencv\",\n    stri_detect_fixed(notice_org, \"newsis\")           ~ \"newsis\",\n    stri_detect_fixed(notice_org, \"nostarch\")         ~ \"nostarch\",\n    stri_detect_fixed(notice_org, \"stardog\")          ~ \"stardog\",\n    stri_detect_fixed(notice_org, \"mswindows\")        ~ \"microsoft\",\n    stri_detect_fixed(notice_org, \"moody\")            ~ \"moody\",\n    stri_detect_fixed(notice_org, \"minecraft\")        ~ \"minecraft\",\n    stri_detect_fixed(notice_org, 
\"medinasoftware\")   ~ \"medina software\",\n    stri_detect_fixed(notice_org, \"linecorporation\")  ~ \"line corporation\",\n    stri_detect_fixed(notice_org, \"steroarts\")        ~ \"stereoarts\",\n    stri_detect_fixed(notice_org, \"mathworks\")        ~ \"mathworks\",\n    stri_detect_fixed(notice_org, \"tmssoftware\")      ~ \"tmssoftware\",\n    stri_detect_fixed(notice_org, \"toontown\")         ~ \"toontown\",\n    stri_detect_fixed(notice_org, \"wahoo\")            ~ \"wahoo\",\n    stri_detect_fixed(notice_org, \"webkul\")           ~ \"webkul\",\n    stri_detect_fixed(notice_org, \"whmcs\")            ~ \"whmcs\",\n    stri_detect_fixed(notice_org, \"viber\")            ~ \"viber\",\n    stri_detect_fixed(notice_org, \"totalfree\")        ~ \"totalfreedom\",\n    stri_detect_fixed(notice_org, \"successacademies\") ~ \"success academies\",\n    stri_detect_fixed(notice_org, \"ecgwaves\")         ~ \"ecgwaves\",\n    stri_detect_fixed(notice_org, \"synology\")         ~ \"synology\",\n    stri_detect_fixed(notice_org, \"infistar\")         ~ \"infistar\",\n    stri_detect_fixed(notice_org, \"galleria\")         ~ \"galleria\",\n    stri_detect_fixed(notice_org, \"jadoo\")            ~ \"jadoo\",\n    stri_detect_fixed(notice_org, \"dofustouch\")       ~ \"dofus touch\",\n    stri_detect_fixed(notice_org, \"gravityforms\")     ~ \"gravity forms\",\n    stri_detect_fixed(notice_org, \"fujiannewland\")    ~ \"fujian newland\",\n    stri_detect_fixed(notice_org, \"dk uk\")            ~ \"dk\",\n    stri_detect_fixed(notice_org, \"dk us\")            ~ \"dk\",\n    stri_detect_fixed(notice_org, \"dkuk\")             ~ \"dk\",\n    stri_detect_fixed(notice_org, \"dkus\")             ~ \"dk\",\n    stri_detect_fixed(notice_org, \"facet\")            ~ \"facet\",\n    stri_detect_fixed(notice_org, \"fh admin\")         ~ \"fhadmin\",\n    stri_detect_fixed(notice_org, \"electronicarts\")   ~ \"electronic arts\",\n    stri_detect_fixed(notice_org, 
\"daikonforge\")      ~ \"daikon forge\",\n    stri_detect_fixed(notice_org, \"corgiengine\")      ~ \"corgi engine\",\n    stri_detect_fixed(notice_org, \"epicgames\")        ~ \"epic games\",\n    stri_detect_fixed(notice_org, \"essentialmode\")    ~ \"essentialmode\",\n    stri_detect_fixed(notice_org, \"jetbrains\")        ~ \"jetbrains\",\n    stri_detect_fixed(notice_org, \"foxy\")             ~ \"foxy themes\",\n    stri_detect_fixed(notice_org, \"cambridgemobile\")  ~ \"cambridge mobile\",\n    stri_detect_fixed(notice_org, \"offensive\")        ~ \"offensive security\",\n    stri_detect_fixed(notice_org, \"outfit\")           ~ \"outfit\",\n    stri_detect_fixed(notice_org, \"haihuan\")          ~ \"shanghai haihuan\",\n    stri_detect_fixed(notice_org, \"schuster\")         ~ \"simon & schuster\",\n    stri_detect_fixed(notice_org, \"silicon\")          ~ \"silicon labs\",\n    TRUE ~ notice_org\n  )) %>% \n  arrange(notice_day) -> dmca\n\ndmca\n## # A tibble: 4,460 x 5\n##    notice_day notice_org        notice_content counter_notice retraction\n##    <date>     <chr>             <list>         <lgl>          <lgl>     \n##  1 2011-01-27 sony              <chr [65]>     FALSE          FALSE     \n##  2 2011-01-28 tera              <chr [73]>     FALSE          FALSE     \n##  3 2011-01-31 sony              <chr [55]>     FALSE          FALSE     \n##  4 2011-02-03 sony              <chr [8]>      FALSE          FALSE     \n##  5 2011-02-03 sony              <chr [2,348]>  FALSE          FALSE     \n##  6 2011-03-24 oracle            <chr [9]>      FALSE          FALSE     \n##  7 2011-03-30 mentor graphics   <chr [33]>     FALSE          FALSE     \n##  8 2011-05-24 cpp virtual worl\u2026 <chr [14]>     FALSE          FALSE     \n##  9 2011-06-07 sony              <chr [32]>     FALSE          FALSE     \n## 10 2011-06-13 diablominer       <chr [12]>     FALSE          FALSE     \n## # ... 
with 4,450 more rows<\/code><\/pre>\n<p>You are heartily encouraged to create a translation table in place of that monstrosity.<\/p>\n<p>But, we finally have usable data. You can avoid the above by downloading <a href=\"https:\/\/rud.is\/dl\/github-dmca.json.gz\">https:\/\/rud.is\/dl\/github-dmca.json.gz<\/a> and using <code>jsonlite::stream_in()<\/code> or <code>ndjson::stream_in()<\/code> to get the above data frame.<\/p>\n<h3>Hoisting the mizzen <strike>sail<\/strike>plots<\/h3>\n<p>Let&#8217;s see what the notice submission frequency looks like over time:<\/p>\n<pre><code class=\"language-r\"># assuming you downloaded it as suggested\njsonlite::stream_in(gzfile(\"~\/Data\/github-dmca.json.gz\")) %>% \n  tbl_df() %>% \n  mutate(notice_day = as.Date(notice_day)) -> dmca\n\nfilter(dmca, !retraction) %>% \n  mutate(\n    notice_year = lubridate::year(notice_day),\n    notice_ym = as.Date(format(notice_day, \"%Y-%m-01\"))\n  ) %>% \n  dplyr::count(notice_ym) %>% \n  arrange(notice_ym) %>% \n  ggplot(aes(notice_ym, n)) +\n  ggalt::stat_xspline(\n    geom=\"area\", fill=alpha(ft_cols$blue, 1\/3), color=ft_cols$blue\n  ) +\n  scale_y_comma() +\n  labs(\n    x = NULL, y = \"# Notices\", \n    title = \"GitHub DMCA Notices by Month Since 2011\"\n  ) +\n  theme_ft_rc(grid=\"XY\")<\/code><\/pre>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/github-notices\/\" rel=\"attachment wp-att-11548\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11548\" data-permalink=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/github-notices\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/github-notices.png?fit=2050%2C1022&amp;ssl=1\" data-orig-size=\"2050,1022\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"github-notices\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/github-notices.png?fit=510%2C254&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/github-notices.png?resize=510%2C254&#038;ssl=1\" alt=\"\" width=\"510\" height=\"254\" class=\"aligncenter size-full wp-image-11548\" \/><\/a><\/p>\n<p>I&#8217;m not naive, but that growth was a bit of a shocker, which made me want to jump in and see who the top-filers were:<\/p>\n<pre><code class=\"language-r\">count(dmca, notice_org, sort=TRUE)\n## # A tibble: 1,948 x 2\n##    notice_org             n\n##    <chr>              <int>\n##  1 webkul                92\n##  2 pearson               90\n##  3 stereoarts            86\n##  4 qualcomm              72\n##  5 codility              71\n##  6 random house          62\n##  7 outfit                57\n##  8 offensive security    49\n##  9 sensetime             46\n## 10 penguin               44\n## # ... with 1,938 more rows<\/code><\/pre>\n<p>&#8220;Webkul&#8221; is an enterprise eCommerce (I kinda miss all the dashed &#8220;e-&#8221; prefixes we used to use back in the day) platform. I mention that since I didn&#8217;t know what it was either. 
There are some recognizable names there like &#8220;Pearson&#8221; and &#8220;Random House&#8221; and &#8220;Penguin&#8221; which make sense since it&#8217;s easy to improperly share e-books (modern non-dashed idioms be darned).<\/p>\n<p>Let&#8217;s see the top 15 orgs by year since 2015 (since that&#8217;s when DMCA filings really started picking up and because I like 2&#215;2 grids). We&#8217;ll also leave out counter-notices and retractions and alpha-order it since I want to be able to scan the names more than I want to see rank:<\/p>\n<pre><code class=\"language-r\">filter(dmca, !retraction, !counter_notice, notice_day >= as.Date(\"2015-01-01\")) %>%\n  mutate(\n    notice_year = lubridate::year(notice_day),\n  ) %>% \n  dplyr::count(notice_year, notice_org) %>% \n  group_by(notice_year) %>% \n  top_n(15) %>% \n  slice(1:15) %>% \n  dplyr::ungroup() %>%\n  mutate( # a-z order with \"a\" on top \n    notice_org = factor(notice_org, levels = unique(sort(notice_org, decreasing = TRUE)))\n  ) %>% \n  ggplot(aes(n, notice_org, xend=0, yend=notice_org)) +\n  geom_segment(size = 2, color = ft_cols$peach) +\n  facet_wrap(~notice_year, scales = \"free\") +\n  scale_x_comma(limits=c(0, 60)) +\n  labs(\n    x = NULL, y = NULL,\n    title = \"Top 15 GitHub DMCA Filers by Year Since 2015\"\n  ) +\n  theme_ft_rc(grid=\"X\")<\/code><\/pre>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/top-15-by-year\/\" rel=\"attachment wp-att-11550\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11550\" data-permalink=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/top-15-by-year\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/top-15-by-year.png?fit=2166%2C1468&amp;ssl=1\" data-orig-size=\"2166,1468\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"top-15-by-year\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/top-15-by-year.png?fit=510%2C346&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/top-15-by-year.png?resize=510%2C346&#038;ssl=1\" alt=\"\" width=\"510\" height=\"346\" class=\"aligncenter size-full wp-image-11550\" \/><\/a><\/p>\n<p>Let&#8217;s look at the rogues&#8217; gallery of the pirates themselves:<\/p>\n<pre><code class=\"language-r\">dmca %>% \n  mutate(\n    ghusers = notice_content %>% \n      map(~{\n        stri_match_all_regex(.x, \"http[s]*:\/\/github.com\/([^\/]+)\/.*\") %>% \n          discard(~is.na(.x[,1])) %>% \n          map_chr(~.x[,2]) %>% \n          unique() %>% \n          discard(`==`, \"github\") %>% \n          discard(~grepl(\" \", .x))\n      })\n  ) %>% \n  unnest(ghusers) %>% \n  dplyr::count(ghusers, sort=TRUE) %>% \n  print() -> offenders\n## # A tibble: 18,396 x 2\n##    ghusers           n\n##    <chr>         <int>\n##  1 RyanTech         16\n##  2 sdgdsffdsfff     12\n##  3 gamamaru6005     10\n##  4 ranrolls         10\n##  5 web-padawan      10\n##  6 alexinfopruna     8\n##  7 cyr2242           8\n##  8 liveqmock         8\n##  9 promosirupiah     8\n## 10 RandyMcMillan     8\n## # ... 
with 18,386 more rows<\/code><\/pre>\n<p>As you might expect, most users have only one or two complaints filed against them since it was likely an oversight more than malice on their part:<\/p>\n<pre><code class=\"language-r\">ggplot(offenders, aes(x=\"\", n)) +\n  ggbeeswarm::geom_quasirandom(\n    color = ft_cols$white, fill = alpha(ft_cols$red, 1\/10),\n    shape = 21, size = 3, stroke = 0.125\n  ) +\n  scale_y_comma(breaks=1:16, limits=c(1,16)) +\n  coord_flip() +\n  labs(\n    x = NULL, y = NULL,\n    title = \"Distribution of the Number of GitHub DMCA Complaints Received by a User\"\n  ) +\n  theme_ft_rc(grid=\"X\")<\/code><\/pre>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/dist1\/\" rel=\"attachment wp-att-11557\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11557\" data-permalink=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/dist1\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/dist1.png?fit=2010%2C648&amp;ssl=1\" data-orig-size=\"2010,648\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"dist1\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/dist1.png?fit=510%2C164&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/dist1.png?resize=510%2C164&#038;ssl=1\" alt=\"\" width=\"510\" height=\"164\" class=\"aligncenter size-full wp-image-11557\" \/><\/a><\/p>\n<p>But, there 
<em>are<\/em> hundreds of digital buccaneers, and we can have a bit of fun with them, especially since I noticed quite a few had default (generated) avatars with lots of white in them (presenting this with a <a href=\"https:\/\/livefreeordichotomize.com\/2017\/07\/18\/the-making-of-we-r-ladies\/\">pirate hat-tip<\/a> to Ma\u00eblle &amp; Lucy):<\/p>\n<pre><code class=\"language-r\">library(magick)\n\ndir.create(\"gh-pirates\")\ndir.create(\"gh-pirates-jpeg\")\n\n# this kinda spoils the surprise; i should have renamed it\ndownload.file(\"https:\/\/rud.is\/dl\/jolly-roger.jpeg\", \"jolly-roger.jpeg\")\n\nghs <- safely(gh::gh) # no need to add cruft to our namespace for one function \n\nfilter(offenders, n>2) %>% \n  pull(ghusers) %>% \n  { .pb <<- progress_estimated(length(.)); . } %>% # there are a few hundred of them\n  walk(~{\n    .pb$tick()$print()\n    user <- ghs(sprintf(\"\/users\/%s\", .x))$result # the get-user and then download avatar idiom should help us not bust GH API rate limits\n    if (!is.null(user)) {\n      download.file(user$avatar_url, file.path(\"gh-pirates\", .x), quiet=TRUE) # can't assume avatar file type\n    }\n  })\n\n# we'll convert them all to jpeg and resize them at the same time plus make sure they aren't greyscale\n# (the output directory was already created above)\nlist.files(\"gh-pirates\", full.names = TRUE, recursive = FALSE) %>%\n  walk(~{\n    image_read(.x) %>% \n      image_scale(\"72x72\") %>% \n      image_convert(\"jpeg\", type = \"TrueColor\", colorspace = \"rgb\") %>% \n      image_write(\n        path = file.path(\"gh-pirates-jpeg\", sprintf(\"%s.jpeg\", basename(.x))), \n        format = \"jpeg\"\n      )\n  })\n\nset.seed(20180919) # seemed appropriate for TLAPD\nRsimMosaic::composeMosaicFromImageRandomOptim( # this takes a bit\n  originalImageFileName = \"jolly-roger.jpeg\",\n  outputImageFileName = \"gh-pirates-flag.jpeg\",\n  imagesToUseInMosaic = \"gh-pirates-jpeg\",\n  removeTiles = TRUE,\n  fracLibSizeThreshold = 
0.1\n)<\/code><\/pre>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/gh-pirates-flag\/\" rel=\"attachment wp-att-11555\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11555\" data-permalink=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/gh-pirates-flag\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/gh-pirates-flag.jpeg?fit=1024%2C640&amp;ssl=1\" data-orig-size=\"1024,640\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"gh-pirates-flag\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/gh-pirates-flag.jpeg?fit=510%2C319&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/gh-pirates-flag.jpeg?resize=510%2C319&#038;ssl=1\" alt=\"\" width=\"510\" height=\"319\" class=\"aligncenter size-full wp-image-11555\" \/><\/a><\/p>\n<p>Finally, we&#8217;ll look at the types of pilfered files. 
To do that, we&#8217;ll first naively look for GitHub repo URLs (there are <code>github.io<\/code> ones in there too, though; handling those is an exercise left to ye corsairs):<\/p>\n<pre><code class=\"language-r\">mutate(\n  dmca,\n  files = notice_content %>% \n    map(~{\n      paste0(.x, collapse = \" \") %>% \n        stri_extract_all_regex(gh_url_pattern, omit_no_match=FALSE, opts_regex = stri_opts_regex(TRUE)) %>% \n        unlist() %>% \n        stri_replace_last_regex(\"[[:punct:]]+$\", \"\")\n    })\n) -> dmca_with_files<\/code><\/pre>\n<p>Now, we can see just how many resources\/repos\/files are in a complaint:<\/p>\n<pre><code class=\"language-r\">filter(dmca_with_files, map_lgl(files, ~!is.na(.x[1]))) %>% \n  select(notice_day, notice_org, files) %>% \n  mutate(num_refs = lengths(files)) %>%\n  arrange(desc(num_refs)) %>%  # take a peek at the heavy hitters\n  print() -> files_with_counts\n## # A tibble: 4,020 x 4\n##    notice_day notice_org files         num_refs\n##    <date>     <chr>      <list>           <int>\n##  1 2014-08-27 monotype   <chr [2,504]>     2504\n##  2 2011-02-03 sony       <chr [1,160]>     1160\n##  3 2016-06-08 monotype   <chr [1,015]>     1015\n##  4 2018-04-05 hexrays    <chr [906]>        906\n##  5 2016-06-15 ibo        <chr [877]>        877\n##  6 2016-08-18 jetbrains  <chr [777]>        777\n##  7 2017-10-14 cengage    <chr [611]>        611\n##  8 2016-08-23 yahoo      <chr [556]>        556\n##  9 2017-08-30 altis      <chr [529]>        529\n## 10 2015-09-22 jetbrains  <chr [468]>        468\n## # ... 
with 4,010 more rows\n\nggplot(files_with_counts, aes(x=\"\", num_refs)) +\n  ggbeeswarm::geom_quasirandom(\n    color = ft_cols$white, fill = alpha(ft_cols$red, 1\/10),\n    shape = 21, size = 3, stroke = 0.125\n  ) +\n  scale_y_comma(trans=\"log10\") +\n  coord_flip() +\n  labs(\n    x = NULL, y = NULL,\n    title = \"Distribution of the Number of Files\/Repos per-GitHub DMCA Complaint\",\n    caption = \"Note: Log10 Scale\"\n  ) +\n  theme_ft_rc(grid=\"X\")\n<\/code><\/pre>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/dist2\/\" rel=\"attachment wp-att-11558\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11558\" data-permalink=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/dist2\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/dist2.png?fit=1744%2C706&amp;ssl=1\" data-orig-size=\"1744,706\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"dist2\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/dist2.png?fit=510%2C206&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/dist2.png?resize=510%2C206&#038;ssl=1\" alt=\"\" width=\"510\" height=\"206\" class=\"aligncenter size-full wp-image-11558\" \/><\/a><\/p>\n<p>And, what are the most offensive file types (per-year):<\/p>\n<pre><code class=\"language-r\">mutate(\n  files_with_counts, \n  extensions = map(files, ~tools::file_ext(.x) 
%>% \n    discard(`==` , \"\")\n  )\n) %>% \n  select(notice_day, notice_org, extensions) %>% \n  unnest(extensions) %>% \n  mutate(year = lubridate::year(notice_day)) -> file_types\n\ncount(file_types, year, extensions) %>% \n  filter(year >= 2014) %>% \n  group_by(year) %>% \n  top_n(10) %>% \n  slice(1:10) %>% \n  ungroup() %>% \n  ggplot(aes(year, n)) +\n  ggrepel::geom_text_repel(\n    aes(label = extensions, size=n), \n    color = ft_cols$green, family=font_ps, show.legend=FALSE\n  ) +\n  scale_size(range = c(3, 10)) +\n  labs(\n    x = NULL, y = NULL,\n    title = \"Top 10 File-type GitHub DMCA Takedowns Per-year\"\n  ) +\n  theme_ft_rc(grid=\"X\") +\n  theme(axis.text.y=element_blank())<\/code><\/pre>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/takedown-types\/\" rel=\"attachment wp-att-11559\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11559\" data-permalink=\"https:\/\/rud.is\/b\/2018\/09\/19\/taking-a-tour-of-the-pirate-ship-github-dmca-with-r\/takedown-types\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/takedown-types.png?fit=1556%2C1756&amp;ssl=1\" data-orig-size=\"1556,1756\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"takedown-types\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/takedown-types.png?fit=510%2C576&amp;ssl=1\" 
src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/09\/takedown-types.png?resize=510%2C576&#038;ssl=1\" alt=\"\" width=\"510\" height=\"576\" class=\"aligncenter size-full wp-image-11559\" \/><\/a><\/p>\n<p>It&#8217;s not all code (lots of fonts and books) but there are plenty of source code files in those annual lists.<\/p>\n<h3>FIN<\/h3>\n<p>That&#8217;s it for this year&#8217;s TLAPD post. You&#8217;ve got the data and some starter code so build away! There are plenty more insights left to find and if you do take a stab at finding your own treasure, definitely leave a note in the comments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Despite having sailed through the core components of this year&#8217;s Talk Like A Pirate Day R post a few months ago, time has been an enemy of late so this will be a short post that others can build off of, especially since there&#8217;s lots more knife work ground to cover from the data. DMC-WhAt? [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91,762],"tags":[],"class_list":["post-11540","post","type-post","status-publish","format-standard","hentry","category-r","category-tlapd"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-308","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11540","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11540"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11540\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11540"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11540"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11540"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}