

{"id":9579,"date":"2018-04-12T06:55:36","date_gmt":"2018-04-12T11:55:36","guid":{"rendered":"https:\/\/rud.is\/b\/?p=9579"},"modified":"2018-04-13T14:56:38","modified_gmt":"2018-04-13T19:56:38","slug":"convert-epub-to-text-for-processing-in-r","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/","title":{"rendered":"Convert epub to Text for Processing in R"},"content":{"rendered":"<p>@RMHoge asked the following on Twitter:<\/p>\n<blockquote class=\"twitter-tweet\" data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\">Hello <a href=\"https:\/\/twitter.com\/hashtag\/rstats?src=hash&amp;ref_src=twsrc%5Etfw\">#rstats<\/a> hyve mind! Is there a package that reads epub into R? I can not find any, I now convert to text and parse the text but you sort of lose the structure of the text. Pinging  <a href=\"https:\/\/twitter.com\/dataandme?ref_src=twsrc%5Etfw\">@dataandme<\/a> <a href=\"https:\/\/twitter.com\/hrbrmstr?ref_src=twsrc%5Etfw\">@hrbrmstr<\/a><\/p>\n<p>&mdash; Roel (@RMHoge) <a href=\"https:\/\/twitter.com\/RMHoge\/status\/984345828671344640?ref_src=twsrc%5Etfw\">April 12, 2018<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Here&#8217;s one way to do that which doesn&#8217;t rely on <code>pandoc<\/code> (<code>pandoc<\/code> can easily do this and ships with RStudio but shelling out for this is cheating :-)<\/p>\n<p>We&#8217;ll need some help (NOTE that 2 of these are &#8220;GitHub&#8221; packages)<\/p>\n<pre id=\"epubtotext01\"><code class=\"language-r\">library(archive) # install_github(&quot;jimhester\/archive&quot;) + 3rd party library\r\nlibrary(hgr) # install_github(&quot;hrbrmstr\/hgr&quot;)\r\nlibrary(stringi)\r\nlibrary(tidyverse)<\/code><\/pre>\n<p>We&#8217;ll use one of @hadleywickham&#8217;s books since it&#8217;s O&#8217;Reilly and they do epubs well. 
The <code>archive<\/code> package lets us treat the epub (which is really just a ZIP file) as a mini-filesystem and embraces &#8220;tidy&#8221; so we have lovely data frames to work with:<\/p>\n<pre id=\"epubtotext02\"><code class=\"language-r\">bk_src &lt;- &quot;~\/Data\/R Packages.epub&quot;\r\n\r\nbk &lt;- archive::archive(bk_src)\r\n\r\nbk\r\n## # A tibble: 92 x 3\r\n##    path                           size date               \r\n##    &lt;chr&gt;                         &lt;dbl&gt; &lt;dttm&gt;             \r\n##  1 mimetype                        20. 2015-03-24 21:49:16\r\n##  2 OEBPS\/assets\/cover.png      211616. 2015-06-03 16:16:56\r\n##  3 OEBPS\/content.opf            10193. 2015-03-24 21:49:16\r\n##  4 OEBPS\/toc.ncx                30037. 2015-03-24 21:49:16\r\n##  5 OEBPS\/cover.html               315. 2015-03-24 21:49:16\r\n##  6 OEBPS\/titlepage01.html         466. 2015-03-24 21:49:16\r\n##  7 OEBPS\/copyright-page01.html   3286. 2015-03-24 21:49:16\r\n##  8 OEBPS\/toc01.html             17557. 2015-03-24 21:49:16\r\n##  9 OEBPS\/preface01.html         17784. 2015-03-24 21:49:16\r\n## 10 OEBPS\/part01.html              444. 2015-03-24 21:49:16\r\n## # ... with 82 more rows<\/code><\/pre>\n<p>We care not about crufty bits and only want HTML files (NOTE: I match on the substring <code>html<\/code> rather than the full extension since they can be <code>.xhtml<\/code> files as well):<\/p>\n<pre id=\"epubtotext03\"><code class=\"language-r\">filter(bk, stri_detect_fixed(path, &quot;html&quot;))\r\n## # A tibble: 26 x 3\r\n##    path                          size date               \r\n##    &lt;chr&gt;                        &lt;dbl&gt; &lt;dttm&gt;             \r\n##  1 OEBPS\/cover.html              315. 2015-03-24 21:49:16\r\n##  2 OEBPS\/titlepage01.html        466. 2015-03-24 21:49:16\r\n##  3 OEBPS\/copyright-page01.html  3286. 2015-03-24 21:49:16\r\n##  4 OEBPS\/toc01.html            17557. 2015-03-24 21:49:16\r\n##  5 OEBPS\/preface01.html        17784. 2015-03-24 21:49:16\r\n##  6 OEBPS\/part01.html             444. 
2015-03-24 21:49:16\r\n##  7 OEBPS\/ch01.html             12007. 2015-03-24 21:49:16\r\n##  8 OEBPS\/ch02.html             28633. 2015-03-24 21:49:18\r\n##  9 OEBPS\/part02.html             454. 2015-03-24 21:49:18\r\n## 10 OEBPS\/ch03.html             28629. 2015-03-24 21:49:18\r\n## # ... with 16 more rows<\/code><\/pre>\n<p>Let&#8217;s read in one file (as a test) and convert it to text and show the first few lines of it:<\/p>\n<pre id=\"epubtotext04\"><code class=\"language-r\">archive::archive_read(bk, &quot;OEBPS\/preface01.html&quot;) %&gt;%\r\n  read_lines() %&gt;%\r\n  paste0(collapse = &quot;\\n&quot;) -&gt; chapter\r\n\r\nhgr::clean_text(chapter) %&gt;%\r\n  stri_sub(1, 1000) %&gt;%\r\n  cat()\r\n## Preface\r\n## \r\n## \r\n## In This Book\r\n## \r\n## This book will guide you from being a user of R packages to being a creator of R packages. In , you\u2019ll learn why mastering this skill is so important, and why it\u2019s easier than you think. Next, you\u2019ll learn about the basic structure of a package, and the forms it can take, in . The subsequent chapters go into more detail about each component. They\u2019re roughly organized in order of importance:\r\n## \r\n## \r\n##  The most important directory is R\/, where your R code lives. A package with just this directory is still a useful package. (And indeed, if you stop reading the book after this chapter, you\u2019ll have still learned some useful new skills.)\r\n##  \r\n##  The DESCRIPTION lets you describe what your package needs to work. If you\u2019re sharing your package, you\u2019ll also use the DESCRIPTION to describe what it does, who can use it (the license), and who to contact if things go wrong.\r\n##  \r\n##  If you want other people (including \u201cfuture you\u201d!) to understand how to use the functions in your package, you\u2019<\/code><\/pre>\n<p><code>hgr::clean_text()<\/code> uses some XSLT magic to pull text. 
My <a href=\"https:\/\/github.com\/hrbrmstr\/jericho\"><code>jericho<\/code><\/a> can often do a better job but it&#8217;s <code>rJava<\/code>-based so a bit painful for some folks to get running.<\/p>\n<p>Now, we&#8217;ll convert all the files:<\/p>\n<pre id=\"epubtotext05\"><code class=\"language-r\">filter(bk, stri_detect_fixed(path, &quot;html&quot;)) %&gt;%\r\n  mutate(content = map_chr(path, ~{\r\n    archive::archive_read(bk, .x) %&gt;%\r\n      read_lines() %&gt;%\r\n      paste0(collapse = &quot;\\n&quot;) %&gt;%\r\n      hgr::clean_text()\r\n  })) %&gt;%\r\n  print(n=27)\r\n## # A tibble: 26 x 4\r\n##    path                          size date                content         \r\n##    &lt;chr&gt;                        &lt;dbl&gt; &lt;dttm&gt;              &lt;chr&gt;           \r\n##  1 OEBPS\/cover.html              315. 2015-03-24 21:49:16 Cover           \r\n##  2 OEBPS\/titlepage01.html        466. 2015-03-24 21:49:16 &quot;R Packages\\n\\n\u2026\r\n##  3 OEBPS\/copyright-page01.html  3286. 2015-03-24 21:49:16 &quot;R Packages\\n\\n\u2026\r\n##  4 OEBPS\/toc01.html            17557. 2015-03-24 21:49:16 &quot;navPrefaceIn T\u2026\r\n##  5 OEBPS\/preface01.html        17784. 2015-03-24 21:49:16 &quot;Preface\\n\\n\\nI\u2026\r\n##  6 OEBPS\/part01.html             444. 2015-03-24 21:49:16 Getting Started \r\n##  7 OEBPS\/ch01.html             12007. 2015-03-24 21:49:16 &quot;Introduction\\n\u2026\r\n##  8 OEBPS\/ch02.html             28633. 2015-03-24 21:49:18 &quot;Package Struct\u2026\r\n##  9 OEBPS\/part02.html             454. 2015-03-24 21:49:18 Package Compone\u2026\r\n## 10 OEBPS\/ch03.html             28629. 2015-03-24 21:49:18 &quot;R Code\\n\\nThe \u2026\r\n## 11 OEBPS\/ch04.html             31275. 2015-03-24 21:49:18 &quot;Package Metada\u2026\r\n## 12 OEBPS\/ch05.html             42089. 2015-03-24 21:49:18 &quot;Object Documen\u2026\r\n## 13 OEBPS\/ch06.html             31484. 
2015-03-24 21:49:18 &quot;Vignettes: Lon\u2026\r\n## 14 OEBPS\/ch07.html             28594. 2015-03-24 21:49:18 &quot;Testing\\n\\nTes\u2026\r\n## 15 OEBPS\/ch08.html             30808. 2015-03-24 21:49:18 &quot;Namespace\\n\\nT\u2026\r\n## 16 OEBPS\/ch09.html             12125. 2015-03-24 21:49:18 &quot;External Data\\\u2026\r\n## 17 OEBPS\/ch10.html             42013. 2015-03-24 21:49:18 &quot;Compiled Code\\\u2026\r\n## 18 OEBPS\/ch11.html              8933. 2015-03-24 21:49:18 &quot;Installed File\u2026\r\n## 19 OEBPS\/ch12.html              3897. 2015-03-24 21:49:18 &quot;Other Componen\u2026\r\n## 20 OEBPS\/part03.html             446. 2015-03-24 21:49:18 Best Practices  \r\n## 21 OEBPS\/ch13.html             59493. 2015-03-24 21:49:18 &quot;Git and GitHub\u2026\r\n## 22 OEBPS\/ch14.html             44702. 2015-03-24 21:49:18 &quot;Automated Chec\u2026\r\n## 23 OEBPS\/ch15.html             39450. 2015-03-24 21:49:18 &quot;Releasing a Pa\u2026\r\n## 24 OEBPS\/ix01.html             75277. 2015-03-24 21:49:20 IndexAad hoc te\u2026\r\n## 25 OEBPS\/colophon01.html         974. 2015-03-24 21:49:20 &quot;About the Auth\u2026\r\n## 26 OEBPS\/colophon02.html        1653. 2015-03-24 21:49:20 &quot;Colophon\\n\\nTh\u2026<\/code><\/pre>\n<p><strike>I&#8217;m not wrapping this into a package anytime soon but this is also a pretty basic flow that may not require a package.<\/strike> This has been wrapped into a small package dubbed <a href=\"https:\/\/github.com\/hrbrmstr\/pubcrawl\"><code>pubcrawl<\/code><\/a>.<\/p>\n<p>Drop a note in the comments with your hints\/workflows on converting epub to plaintext!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>@RMHoge asked the following on Twitter: Hello #rstats hyve mind! Is there a package that reads epub into R? I can not find any, I now convert to text and parse the text but you sort of lose the structure of the text. 
Pinging @dataandme @hrbrmstr &mdash; Roel (@RMHoge) April 12, 2018 Here&#8217;s one way [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91],"tags":[787],"class_list":["post-9579","post","type-post","status-publish","format-standard","hentry","category-r","tag-r6"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Convert epub to Text for Processing in R - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Convert epub to Text for Processing in R - rud.is\" \/>\n<meta property=\"og:description\" content=\"@RMHoge asked the following on Twitter: Hello #rstats hyve mind! Is there a package that reads epub into R? I can not find any, I now convert to text and parse the text but you sort of lose the structure of the text. 
Pinging @dataandme @hrbrmstr &mdash; Roel (@RMHoge) April 12, 2018 Here&#8217;s one way [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-04-12T11:55:36+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-04-13T19:56:38+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Convert epub to Text for Processing in 
R\",\"datePublished\":\"2018-04-12T11:55:36+00:00\",\"dateModified\":\"2018-04-13T19:56:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/\"},\"wordCount\":299,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"keywords\":[\"r6\"],\"articleSection\":[\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/\",\"url\":\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/\",\"name\":\"Convert epub to Text for Processing in R - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2018-04-12T11:55:36+00:00\",\"dateModified\":\"2018-04-13T19:56:38+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Convert epub to Text for Processing in R\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Convert epub to Text for Processing in R - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/","og_locale":"en_US","og_type":"article","og_title":"Convert epub to Text for Processing in R - rud.is","og_description":"@RMHoge asked the following on Twitter: Hello #rstats hyve mind! Is there a package that reads epub into R? I can not find any, I now convert to text and parse the text but you sort of lose the structure of the text. Pinging @dataandme @hrbrmstr &mdash; Roel (@RMHoge) April 12, 2018 Here&#8217;s one way [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/","og_site_name":"rud.is","article_published_time":"2018-04-12T11:55:36+00:00","article_modified_time":"2018-04-13T19:56:38+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Convert epub to Text for Processing in R","datePublished":"2018-04-12T11:55:36+00:00","dateModified":"2018-04-13T19:56:38+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/"},"wordCount":299,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"keywords":["r6"],"articleSection":["R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/","url":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/","name":"Convert epub to Text for Processing in R - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2018-04-12T11:55:36+00:00","dateModified":"2018-04-13T19:56:38+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/04\/12\/convert-epub-to-text-for-processing-in-r\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Convert epub to Text for Processing in 
R"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2uv","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":5854,"url":"https:\/\/rud.is\/b\/2017\/04\/30\/r%e2%81%b6-using-pandoc-from-r-a-neat-package-for-reading-subtitles\/","url_meta":{"origin":9579,"position":0},"title":"R\u2076 \u2014 Using pandoc from R + A Neat Package For Reading Subtitles","author":"hrbrmstr","date":"2017-04-30","format":false,"excerpt":"Once I realized that my planned, larger post would not come to fruition today I took the R\u2076 post (i.e. \"minimal expository, keen focus\") route, prompted by a Twitter discussion with some R mates who needed to convert \"lightly formatted\" Microsoft Word (docx) documents to markdown. Something like this: to:\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/04\/flash.png?fit=1200%2C643&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/04\/flash.png?fit=1200%2C643&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/04\/flash.png?fit=1200%2C643&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/04\/flash.png?fit=1200%2C643&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/04\/flash.png?fit=1200%2C643&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":4867,"url":"https:\/\/rud.is\/b\/2017\/01\/10\/knit-directly-to-jupyter-notebooks-from-rstudio\/","url_meta":{"origin":9579,"position":1},"title":"Knit directly to jupyter notebooks from 
RStudio","author":"hrbrmstr","date":"2017-01-10","format":false,"excerpt":"Did you know that you can completely replace the \"knitting\" engine in R Markdown documents? Well, you can! Why would you want to do this? Well, in the case of this post, to commit the unpardonable sin of creating a clunky jupyter notebook from a pristine Rmd file. I'm definitely\u2026","rel":"","context":"In &quot;Python&quot;","block_context":{"text":"Python","link":"https:\/\/rud.is\/b\/category\/python-2\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10111,"url":"https:\/\/rud.is\/b\/2018\/04\/18\/examining-potus-executive-orders\/","url_meta":{"origin":9579,"position":2},"title":"Examining POTUS Executive Orders","author":"hrbrmstr","date":"2018-04-18","format":false,"excerpt":"This week's edition of Data is Plural had two really fun data sets. One is serious fun (the first comprehensive data set on U.S. evictions, and the other I knew about but had forgotten: The Federal Register Executive Order (EO) data set(s). 
The EO data is also comprehensive as the\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/eo-count-2.png?fit=1200%2C635&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/eo-count-2.png?fit=1200%2C635&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/eo-count-2.png?fit=1200%2C635&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/eo-count-2.png?fit=1200%2C635&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/eo-count-2.png?fit=1200%2C635&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":12896,"url":"https:\/\/rud.is\/b\/2021\/01\/24\/calling-compiled-swift-from-r-part-2\/","url_meta":{"origin":9579,"position":3},"title":"Calling [Compiled] Swift from R: Part 2","author":"hrbrmstr","date":"2021-01-24","format":false,"excerpt":"The previous post introduced the topic of how to compile Swift code for use in R using a useless, toy example. This one goes a bit further and makes a case for why one might want to do this by showing how to use one of Apple's machine learning libraries,\u2026","rel":"","context":"In &quot;macOS&quot;","block_context":{"text":"macOS","link":"https:\/\/rud.is\/b\/category\/macos\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4527,"url":"https:\/\/rud.is\/b\/2016\/07\/12\/slaying-cidr-orcs-with-triebeard-a-k-a-fast-trie-based-ipv4-in-cidr-lookups-in-r\/","url_meta":{"origin":9579,"position":4},"title":"Slaying CIDR Orcs with Triebeard (a.k.a. 
fast trie-based &#8216;IPv4-in-CIDR&#8217; lookups in R)","author":"hrbrmstr","date":"2016-07-12","format":false,"excerpt":"The insanely productive elf-lord, @quominus put together a small package ([`triebeard`](https:\/\/github.com\/ironholds\/triebeard)) that exposes an API for [radix\/prefix tries](https:\/\/en.wikipedia.org\/wiki\/Trie) at both the R and Rcpp levels. I know he had some personal needs for this and we both kinda need these to augment some functions in our `iptools` package. Despite `triebeard`\u2026","rel":"","context":"In &quot;Cybersecurity&quot;","block_context":{"text":"Cybersecurity","link":"https:\/\/rud.is\/b\/category\/cybersecurity\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6215,"url":"https:\/\/rud.is\/b\/2017\/09\/04\/readability-redux\/","url_meta":{"origin":9579,"position":5},"title":"Readability Redux","author":"hrbrmstr","date":"2017-09-04","format":false,"excerpt":"I recently posted about using a Python module to convert HTML to usable text. Since then, a new package has hit CRAN dubbed htm2txt that is 100% R and uses regular expressions to strip tags from text. 
I gave it a spin so folks could compare some basic output, but\u2026","rel":"","context":"In &quot;data wrangling&quot;","block_context":{"text":"data wrangling","link":"https:\/\/rud.is\/b\/category\/data-wrangling\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/9579","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=9579"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/9579\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=9579"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=9579"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=9579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}