

{"id":11215,"date":"2018-08-04T09:17:46","date_gmt":"2018-08-04T14:17:46","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11215"},"modified":"2018-08-05T06:51:14","modified_gmt":"2018-08-05T11:51:14","slug":"digging-into-mbox-details-a-tale-of-tm-reticulate","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/","title":{"rendered":"Digging into mbox details: A tale of tm &#038; reticulate"},"content":{"rendered":"<style>sup#x >a { font-size: 8pt !important }<\/style>\n<div style=\"float:right\"><a href=\"#f0\">\u2728<\/a><\/div>\n<p>I had to process a bunch of emails for a <code>$DAYJOB<\/code> task this week and my &#8220;default setting&#8221; is to use R for pretty much everything (this should come as no surprise). Treating mail as data is not an uncommon task and many R packages exist that can reach out and grab mail from servers or work directly with local mail archives.<\/p>\n<h3>Mbox&#8217;in off the rails on a crazy tm<sup id=\"x\"><a href=\"#f1\">1<\/a><\/sup><\/h3>\n<p>This particular mail corpus is in <a href=\"https:\/\/www.loc.gov\/preservation\/digital\/formats\/fdd\/fdd000383.shtml\"><code>mbox<\/code><\/a> format since it was saved via Apple Mail. It&#8217;s one big text file with each message appearing one after the other. 
The format has been around for decades, and R&#8217;s <code>tm<\/code> package &#8212; via the <code>tm.plugin.mail<\/code> plugin package &#8212; can process these <code>mbox<\/code> files.<\/p>\n<p>To demonstrate, we&#8217;ll use an Apple Mail archive excerpt from a set of R mailing list messages as they are not private\/sensitive:<\/p>\n<pre><code class=\"language-r\">library(tm)\nlibrary(tm.plugin.mail)\n\n# point the tm corpus machinery to the mbox file and let it know the timestamp format since it varies\nVCorpus(\n  MBoxSource(\"~\/Data\/test.mbox\/mbox\"),\n  readerControl = list(\n    reader = readMail(DateFormat = \"%a, %e %b %Y %H:%M:%S %z\")\n  )\n) -> mbox\n\nstr(unclass(mbox), 1)\n## List of 3\n##  $ content:List of 198\n##  $ meta   : list()\n##   ..- attr(*, \"class\")= chr \"CorpusMeta\"\n##  $ dmeta  :'data.frame': 198 obs. of  0 variables\n\nstr(unclass(mbox[[1]]), 1)\n## List of 2\n##  $ content: chr [1:476] \"Try this:\" \"\" \"> library(lubridate)\" \"> library(tidyverse)\" ...\n##  $ meta   :List of 9\n##   ..- attr(*, \"class\")= chr \"TextDocumentMeta\"\n\nstr(unclass(mbox[[1]]$meta), 1)\n## List of 9\n##  $ author       : chr \"jim holtman <jholtman@gmail.com>\"\n##  $ datetimestamp: POSIXlt[1:1], format: \"2018-08-01 15:01:17\"\n##  $ description  : chr(0) \n##  $ heading      : chr \"Re: [R] read txt file - date - no space\"\n##  $ id           : chr \"<CAAxdm-5rNfu-+zE2PNkpudM33-coJBaiCrChWuDePtZfFerAjw@mail.gmail.com>\"\n##  $ language     : chr \"en\"\n##  $ origin       : chr(0) \n##  $ header       : chr [1:145] \"Delivered-To: bob@rud.is\" \"Received: by 2002:ac0:e681:0:0:0:0:0 with SMTP id b1-v6csp950182imq;\" \"        Wed, 1 Aug 2018 08:02:23 -0700 (PDT)\" \"X-Google-Smtp-Source: AAOMgpcdgBD4sDApBiF2DpKRfFZ9zi\/4Ao32Igz9n8vT7EgE6InRoa7VZelMIik7OVmrFCRPDBde\" ...\n##  $              : NULL<\/code><\/pre>\n<p>We&#8217;re using <code>unclass()<\/code> since the <code>str()<\/code> output gets a bit crowded with all of the 
<code>tm<\/code> class attributes stuck in the output display.<\/p>\n<p>The <code>tm<\/code> suite is designed for text mining. My task had nothing to do with text mining and I really just needed some header fields and body content in a data frame. If you&#8217;ve been working with R for a while, some things in the <code>str()<\/code> output will no doubt cause a bit of angst. For instance:<\/p>\n<ul>\n<li><code>datetimestamp: POSIXlt[1:1],<\/code> : <code>POSIXlt<\/code> and data frames really don&#8217;t mix well<\/li>\n<li><code>description  : chr(0)<\/code> \/ <code>origin       : chr(0)<\/code>: zero-length character vectors \u2639\ufe0f<\/li>\n<li><code>$              : NULL<\/code> : Blank element name with a <code>NULL<\/code> value\u2026<em>I Don&#8217;t Even<\/em><sup id=\"x\"><a href=\"#f2\">2<\/a><\/sup><\/li>\n<\/ul>\n<p>The <code>tm<\/code> suite is also <em>super opinionated<\/em> and &#8220;helpfully&#8221; left out a ton of headers (though it did keep the source for the complete headers around). 
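<\/p>\n<p>Since the complete header source is kept, a header that <code>tm<\/code> left out can still be pulled out by hand. A minimal sketch, assuming the <code>mbox<\/code> corpus built above; <code>hdr_lines<\/code> and <code>spam_line<\/code> are illustrative names and <code>X-Spam-Score<\/code> is just one example field:<\/p>\n<pre><code class=\"language-r\"># raw header lines tm kept for the first message\nhdr_lines <- mbox[[1]]$meta$header\n\n# keep only the lines that start with the field name we want (assumed example field)\nspam_line <- grep(\"^X-Spam-Score:\", hdr_lines, value = TRUE)\n\n# drop the field name, leaving the value (character(0) if the header is absent)\nsub(\"^X-Spam-Score: *\", \"\", spam_line)<\/code><\/pre>\n<p>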
Still, we can roll up our sleeves and turn that into a data frame:<\/p>\n<pre><code class=\"language-r\">library(dplyr) # provides %>% and glimpse()\n\n# helper function for cleaner\/shorter code\n`%|0|%` <- function(x, y) { if (length(x) == 0) y else x }\n\n# might as well stay old-school since we're using tm\ndo.call(\n  rbind.data.frame,\n  lapply(mbox, function(.x) {\n\n    # we have a few choices, but this one is pretty explicit about what it does\n    # so we'll likely be able to decipher it quickly in 2 years when\/if we come\n    # back to it\n\n    data.frame(\n      author = .x$meta$author %|0|% NA_character_,\n      datetimestamp = as.POSIXct(.x$meta$datetimestamp %|0|% NA),\n      description = .x$meta$description %|0|% NA_character_,\n      heading = .x$meta$heading %|0|% NA_character_,\n      id = .x$meta$id %|0|% NA_character_,\n      language = .x$meta$language %|0|% NA_character_,\n      origin = .x$meta$origin %|0|% NA_character_,\n      header = I(list(.x$meta$header %|0|% NA_character_)),\n      body = I(list(.x$content %|0|% NA_character_)),\n      stringsAsFactors = FALSE\n    )\n\n  })\n) %>%\n  glimpse()\n## Observations: 198\n## Variables: 9\n## $ author        <chr> \"jim holtman <jholtman@gmail.com>\", \"PIKAL Petr ...\n## $ datetimestamp <dttm> 2018-08-01 15:01:17, 2018-08-01 13:09:18, 2018-...\n## $ description   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...\n## $ heading       <chr> \"Re: [R] read txt file - date - no space\", \"Re: ...\n## $ id            <chr> \"<CAAxdm-5rNfu-+zE2PNkpudM33-coJBaiCrChWuDePtZfF...\n## $ language      <chr> \"en\", \"en\", \"en\", \"en\", \"en\", \"en\", \"en\", \"en\", ...\n## $ origin        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...\n## $ header        <I(list)> Delivere...., Delivere...., Delivere...., De...\n## $ body          <I(list)> Try this...., SGkNCg0K...., Dear Pik...., De... 
<\/code><\/pre>\n<p>That wasn&#8217;t a huge effort, but we would now have to re-process the headers and\/or write a custom version of <code>tm.plugin.mail::readMail()<\/code> (the function source is very readable and extendable) to get any extra data out. Here&#8217;s what that might look like:<\/p>\n<pre><code class=\"language-r\"># Custom msg reader\nread_mail <- function(elem, language, id) {\n\n  # extract header val\n  hdr_val <- function(src, pat) {\n    gsub(\n      sprintf(\"%s: \", pat), \"\",\n      grep(sprintf(\"^%s:\", pat), src, \"\", value = TRUE, useBytes = TRUE)\n    ) %|0|% NA\n  }\n\n  mail <- elem$content\n\n  index <- which(mail == \"\")[1]\n  header <- mail[1:index]\n  mid <- hdr_val(header, \"Message-ID\")\n\n  PlainTextDocument(\n    x = mail[(index + 1):length(mail)],\n    author = hdr_val(header, \"From\"),\n\n    spam_score = hdr_val(header, \"X-Spam-Score\"), ### <<==== an extra header!\n\n    datetimestamp = as.POSIXct(hdr_val(header, \"Date\"), format = \"%a, %e %b %Y %H:%M:%S %z\", tz = \"GMT\"),\n    description = NA_character_,\n    header = header,\n    heading = hdr_val(header, \"Subject\"),\n    id = if (length(mid)) mid[1] else id,\n    language = language,\n    origin = hdr_val(header, \"Newsgroups\"),\n    class = \"MailDocument\"\n  )\n\n}\n\nVCorpus(\n  MBoxSource(\"~\/Data\/test.mbox\/mbox\"),\n  readerControl = list(reader = read_mail)\n) -> mbox\n\nstr(unclass(mbox[[1]]$meta), 1)\n## List of 9\n##  $ author       : chr \"jim holtman <jholtman@gmail.com>\"\n##  $ datetimestamp: POSIXct[1:1], format: \"2018-08-01 15:01:17\"\n##  $ description  : chr NA\n##  $ heading      : chr \"Re: [R] read txt file - date - no space\"\n##  $ id           : chr \"<CAAxdm-5rNfu-+zE2PNkpudM33-coJBaiCrChWuDePtZfFerAjw@mail.gmail.com>\"\n##  $ language     : chr \"en\"\n##  $ origin       : chr NA\n##  $ spam_score   : chr \"-3.631\"\n##  $ header       : chr [1:145] \"Delivered-To: bob@rud.is\" \"Received: by 2002:ac0:e681:0:0:0:0:0 with 
SMTP id b1-v6csp950182imq;\" \"        Wed, 1 Aug 2018 08:02:23 -0700 (PDT)\" \"X-Google-Smtp-Source: AAOMgpcdgBD4sDApBiF2DpKRfFZ9zi\/4Ao32Igz9n8vT7EgE6InRoa7VZelMIik7OVmrFCRPDBde\" ...<\/code><\/pre>\n<p>If we wanted all the headers, there are even more succinct ways to solve for that use case.<\/p>\n<h3>Packaging up emails with a reticulated message.mbox<\/h3>\n<p>Since the default functionality of <code>tm.plugin.mail::readMail()<\/code> forced us to work a bit to get what we needed, there&#8217;s some justification in seeking out an alternative path. I&#8217;ve <a href=\"https:\/\/rud.is\/b\/?s=reticulate\">written about <code>reticulate<\/code> before<\/a> and am including it in this post as the Python standard library module <a href=\"https:\/\/docs.python.org\/3.7\/library\/mailbox.html?highlight=mailbox#mbox\"><code>mailbox<\/code><\/a> can also make quick work of <code>mbox<\/code> files.<\/p>\n<p>Two pieces of advice I generally reiterate when I talk about <code>reticulate<\/code> are that I highly recommend using Python 3 (remember, it&#8217;s a fragmented ecosystem) and that I prefer specifying the specific target Python to use via the <code>RETICULATE_PYTHON<\/code> environment variable that I have in <code>~\/.Renviron<\/code> as <code>RETICULATE_PYTHON=\/usr\/local\/bin\/python3<\/code>.<\/p>\n<p>Let&#8217;s bring the <code>mailbox<\/code> module into R:<\/p>\n<pre><code class=\"language-r\">library(reticulate)\nlibrary(tidyverse)\n\nmailbox <- import(\"mailbox\")<\/code><\/pre>\n<p>If you're unfamiliar with a Python module or object, you can get help right in R via <code>reticulate::py_help()<\/code>. 
Et sequitur<sup id=\"x\"><a href=\"#f3\">3<\/a><\/sup>: <code>py_help(mailbox)<\/code> will bring up the text help for that module and <code>py_help(mailbox$mbox)<\/code> (remember, we swap out dots for dollars when referencing Python object components in R) will do the same for the <code>mailbox.mbox<\/code> class.<\/p>\n<p>Text help is great and all, but we can also render it to HTML with this helper function:<\/p>\n<pre><code class=\"language-r\">py_doc <- function(x) {\n  require(\"htmltools\")\n  require(\"reticulate\")\n  pydoc <- reticulate::import(\"pydoc\")\n  htmltools::html_print(\n    htmltools::HTML(\n      pydoc$render_doc(x, renderer=pydoc$HTMLDoc())\n    )\n  )\n}<\/code><\/pre>\n<p>Here's what the text and HTML help for <code>mailbox.mbox<\/code> look like side-by-side:<\/p>\n<p><a href=\"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/py-help-doc\/\" rel=\"attachment wp-att-11221\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11221\" data-permalink=\"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/py-help-doc\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png?fit=2844%2C1520&amp;ssl=1\" data-orig-size=\"2844,1520\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"py-help-doc\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png?fit=510%2C273&amp;ssl=1\" 
src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png?resize=510%2C273&#038;ssl=1\" alt=\"\" width=\"510\" height=\"273\" class=\"aligncenter size-full wp-image-11221\" \/><\/a><\/p>\n<p>We can also use a helper function to view the <a href=\"https:\/\/docs.python.org\">online documentation<\/a>:<\/p>\n<pre><code class=\"language-r\">readthedocs <- function(obj, py_ver=3, check_keywords = \"yes\") {\n  require(\"glue\")\n  query <- obj$`__name__`\n  browseURL(\n    glue::glue(\n      \"https:\/\/docs.python.org\/{py_ver}\/search.html?q={query}&#038;check_keywords={check_keywords}\"\n    )\n  )\n}<\/code><\/pre>\n<p>Et sequitur: <code>readthedocs(mailbox$mbox)<\/code> will take us to <a href=\"https:\/\/docs.python.org\/3\/search.html?q=mbox&#038;check_keywords=yes\">this results page<\/a>.<\/p>\n<p>Going back to the task at hand, we need to cycle through the messages and make a data frame for the bits we (well, <em>I<\/em>) care about. The <code>reticulate<\/code> package does an amazing job making Python objects first-class citizens in R, but Python objects may feel \"opaque\" to R users since we have to use the <code>$<\/code> syntax to get to methods and values and, very often, familiar helpers such as <code>str()<\/code> are less than helpful on these objects. Let's try to look at the first message (remember, Python is <code>0<\/code>-indexed):<\/p>\n<pre><code class=\"language-r\"># open the archive with Python's mailbox.mbox class (this replaces the tm corpus in `mbox`)\nmbox <- mailbox$mbox(\"~\/Data\/test.mbox\/mbox\")\n\nmsg1 <- mbox$get(0)\n\nstr(msg1)\n\nmsg1<\/code><\/pre>\n<p>The output for those last two calls is not shown because they both are just a <em>large<\/em> text dump of the message source. <code>#unhelpful<\/code><\/p>\n<p>We <em>can<\/em> get more details, and we'll wrap some punctuation-filled calls in two small helper functions that have names that will sound familiar:<\/p>\n<pre><code class=\"language-r\">pstr <- function(obj, ...) { str(obj$`__dict__`, ...) 
} # like 'str()'\n\npnames <- function(obj) { import_builtins()$dir(obj) } # like 'names()' but more complete<\/code><\/pre>\n<p>Let's see them in action:<\/p>\n<pre><code class=\"language-r\">pstr(msg1, 1) # we can pass any params str() will take\n## List of 10\n##  $ _from        : chr \"jholtman@gmail.com Wed Aug 01 15:02:23 2018\"\n##  $ policy       :Compat32()\n##  $ _headers     :List of 56\n##  $ _unixfrom    : NULL\n##  $ _payload     : chr \"Try this:\\n\\n> library(lubridate)\\n> library(tidyverse)\\n> input <- read.csv(text =3D \\\"date,str1,str2,str3\\n+ \"| __truncated__\n##  $ _charset     : NULL\n##  $ preamble     : NULL\n##  $ epilogue     : NULL\n##  $ defects      : list()\n##  $ _default_type: chr \"text\/plain\"\n\npnames(msg1)\n##  [1] \"__bytes__\"                 \"__class__\"                \n##  [3] \"__contains__\"              \"__delattr__\"              \n##  [5] \"__delitem__\"               \"__dict__\"                 \n##  [7] \"__dir__\"                   \"__doc__\"                  \n##  [9] \"__eq__\"                    \"__format__\"               \n## [11] \"__ge__\"                    \"__getattribute__\"         \n## [13] \"__getitem__\"               \"__gt__\"                   \n## [15] \"__hash__\"                  \"__init__\"                 \n## [17] \"__init_subclass__\"         \"__iter__\"                 \n## [19] \"__le__\"                    \"__len__\"                  \n## [21] \"__lt__\"                    \"__module__\"               \n## [23] \"__ne__\"                    \"__new__\"                  \n## [25] \"__reduce__\"                \"__reduce_ex__\"            \n## [27] \"__repr__\"                  \"__setattr__\"              \n## [29] \"__setitem__\"               \"__sizeof__\"               \n## [31] \"__str__\"                   \"__subclasshook__\"         \n## [33] \"__weakref__\"               \"_become_message\"          \n## [35] \"_charset\"                  \"_default_type\"            
\n## [37] \"_explain_to\"               \"_from\"                    \n## [39] \"_get_params_preserve\"      \"_headers\"                 \n## [41] \"_payload\"                  \"_type_specific_attributes\"\n## [43] \"_unixfrom\"                 \"add_flag\"                 \n## [45] \"add_header\"                \"as_bytes\"                 \n## [47] \"as_string\"                 \"attach\"                   \n## [49] \"defects\"                   \"del_param\"                \n## [51] \"epilogue\"                  \"get\"                      \n## [53] \"get_all\"                   \"get_boundary\"             \n## [55] \"get_charset\"               \"get_charsets\"             \n## [57] \"get_content_charset\"       \"get_content_disposition\"  \n## [59] \"get_content_maintype\"      \"get_content_subtype\"      \n## [61] \"get_content_type\"          \"get_default_type\"         \n## [63] \"get_filename\"              \"get_flags\"                \n## [65] \"get_from\"                  \"get_param\"                \n## [67] \"get_params\"                \"get_payload\"              \n## [69] \"get_unixfrom\"              \"is_multipart\"             \n## [71] \"items\"                     \"keys\"                     \n## [73] \"policy\"                    \"preamble\"                 \n## [75] \"raw_items\"                 \"remove_flag\"              \n## [77] \"replace_header\"            \"set_boundary\"             \n## [79] \"set_charset\"               \"set_default_type\"         \n## [81] \"set_flags\"                 \"set_from\"                 \n## [83] \"set_param\"                 \"set_payload\"              \n## [85] \"set_raw\"                   \"set_type\"                 \n## [87] \"set_unixfrom\"              \"values\"                   \n## [89] \"walk\"\n\nnames(msg1)\n##  [1] \"add_flag\"                \"add_header\"             \n##  [3] \"as_bytes\"                \"as_string\"              \n##  [5] \"attach\"                  
\"defects\"                \n##  [7] \"del_param\"               \"epilogue\"               \n##  [9] \"get\"                     \"get_all\"                \n## [11] \"get_boundary\"            \"get_charset\"            \n## [13] \"get_charsets\"            \"get_content_charset\"    \n## [15] \"get_content_disposition\" \"get_content_maintype\"   \n## [17] \"get_content_subtype\"     \"get_content_type\"       \n## [19] \"get_default_type\"        \"get_filename\"           \n## [21] \"get_flags\"               \"get_from\"               \n## [23] \"get_param\"               \"get_params\"             \n## [25] \"get_payload\"             \"get_unixfrom\"           \n## [27] \"is_multipart\"            \"items\"                  \n## [29] \"keys\"                    \"policy\"                 \n## [31] \"preamble\"                \"raw_items\"              \n## [33] \"remove_flag\"             \"replace_header\"         \n## [35] \"set_boundary\"            \"set_charset\"            \n## [37] \"set_default_type\"        \"set_flags\"              \n## [39] \"set_from\"                \"set_param\"              \n## [41] \"set_payload\"             \"set_raw\"                \n## [43] \"set_type\"                \"set_unixfrom\"           \n## [45] \"values\"                  \"walk\"\n\n# See the difference between pnames() and names()\n\nsetdiff(pnames(msg1), names(msg1))\n##  [1] \"__bytes__\"                 \"__class__\"                \n##  [3] \"__contains__\"              \"__delattr__\"              \n##  [5] \"__delitem__\"               \"__dict__\"                 \n##  [7] \"__dir__\"                   \"__doc__\"                  \n##  [9] \"__eq__\"                    \"__format__\"               \n## [11] \"__ge__\"                    \"__getattribute__\"         \n## [13] \"__getitem__\"               \"__gt__\"                   \n## [15] \"__hash__\"                  \"__init__\"                 \n## [17] \"__init_subclass__\"         
\"__iter__\"                 \n## [19] \"__le__\"                    \"__len__\"                  \n## [21] \"__lt__\"                    \"__module__\"               \n## [23] \"__ne__\"                    \"__new__\"                  \n## [25] \"__reduce__\"                \"__reduce_ex__\"            \n## [27] \"__repr__\"                  \"__setattr__\"              \n## [29] \"__setitem__\"               \"__sizeof__\"               \n## [31] \"__str__\"                   \"__subclasshook__\"         \n## [33] \"__weakref__\"               \"_become_message\"          \n## [35] \"_charset\"                  \"_default_type\"            \n## [37] \"_explain_to\"               \"_from\"                    \n## [39] \"_get_params_preserve\"      \"_headers\"                 \n## [41] \"_payload\"                  \"_type_specific_attributes\"\n## [43] \"_unixfrom\"<\/code><\/pre>\n<p>Using just <code>names()<\/code> excludes the \"hidden\" builtins for Python objects, but knowing they are there and what they are can be helpful, depending on the program context.<\/p>\n<p>Let's continue on the path to our messaging goal and see what headers are available. 
We'll use some domain knowledge about the <code>_headers<\/code> component, though we won't end up going that route to build a data frame:<\/p>\n<pre><code class=\"language-r\">map_chr(msg1$`_headers`, ~.x[[1]])\n##  [1] \"Delivered-To\"               \"Received\"                  \n##  [3] \"X-Google-Smtp-Source\"       \"X-Received\"                \n##  [5] \"ARC-Seal\"                   \"ARC-Message-Signature\"     \n##  [7] \"ARC-Authentication-Results\" \"Return-Path\"               \n##  [9] \"Received\"                   \"Received-SPF\"              \n## [11] \"Authentication-Results\"     \"Received\"                  \n## [13] \"X-Virus-Scanned\"            \"Received\"                  \n## [15] \"Received\"                   \"Received\"                  \n## [17] \"X-Virus-Scanned\"            \"X-Spam-Flag\"               \n## [19] \"X-Spam-Score\"               \"X-Spam-Level\"              \n## [21] \"X-Spam-Status\"              \"Received\"                  \n## [23] \"Received\"                   \"Received\"                  \n## [25] \"Received\"                   \"DKIM-Signature\"            \n## [27] \"X-Google-DKIM-Signature\"    \"X-Gm-Message-State\"        \n## [29] \"X-Received\"                 \"MIME-Version\"              \n## [31] \"References\"                 \"In-Reply-To\"               \n## [33] \"From\"                       \"Date\"                      \n## [35] \"Message-ID\"                 \"To\"                        \n## [37] \"X-Tag-Only\"                 \"X-Filter-Node\"             \n## [39] \"X-Spam-Level\"               \"X-Spam-Status\"             \n## [41] \"X-Spam-Flag\"                \"Content-Disposition\"       \n## [43] \"Subject\"                    \"X-BeenThere\"               \n## [45] \"X-Mailman-Version\"          \"Precedence\"                \n## [47] \"List-Id\"                    \"List-Unsubscribe\"          \n## [49] \"List-Archive\"               \"List-Post\"                 \n## [51] 
\"List-Help\"                  \"List-Subscribe\"            \n## [53] \"Content-Type\"               \"Content-Transfer-Encoding\" \n## [55] \"Errors-To\"                  \"Sender\"<\/code><\/pre>\n<p>Each message object does provide a <code>get()<\/code> method to retrieve header values, so we'll go that route to build our data frame, but we'll make yet-another helper since doing something like <code>msg1$get(\"this header does not exist\")<\/code> will return <code>NULL<\/code> just like <code>list(a=1)$b<\/code> would. We'll actually make two new helpers since we want to be able to safely work with the payload content and that means ensuring it's in UTF-8 encoding (mail systems are horribly diverse beasts and the R community is international and, remember, we're using R mailing list messages):<\/p>\n<pre><code class=\"language-r\"># execute an object's get() method and return a character string or NA if no value was present for the key\nget_chr <- function(.x, .y) { as.character(.x[[\"get\"]](.y)) %|0|% NA_character_ }\n\n# get the object's value as a valid UTF-8 string\nutf8_decode <- function(.x) { .x[[\"decode\"]](\"utf-8\", \"ignore\") %|0|% NA_character_ }<\/code><\/pre>\n<p>We're also doing this because I get really tired of using the <code>$<\/code> syntax.<\/p>\n<p>We also want the message content or payload. Modern mail messages can be really complex structures with many <a href=\"https:\/\/tools.ietf.org\/html\/rfc1521#page-28\">multiple part entities<\/a>. To put it a different way, there may be HTML, RTF and plaintext versions of a message all in the same envelope. We want the plaintext ones so we'll have to iterate through any multipart messages to (hopefully) get to a plaintext version. 
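<\/p>\n<p>One way to do that iteration is to walk every part of a message and stop at the first <code>text\/plain<\/code> one. A rough sketch, assuming the <code>msg1<\/code> object from earlier; <code>first_plain_part()<\/code> is a hypothetical helper (not part of <code>reticulate<\/code> or <code>mailbox<\/code>) that leans on the <code>walk()<\/code> method listed in the <code>names()<\/code> output and on <code>reticulate::iterate()<\/code>:<\/p>\n<pre><code class=\"language-r\">first_plain_part <- function(msg) {\n  # walk() yields the message itself, then each nested part in turn\n  parts <- reticulate::iterate(msg$walk(), simplify = FALSE)\n  for (part in parts) {\n    if (part$get_content_type() == \"text\/plain\") return(part)\n  }\n  NULL # no plaintext part found\n}\n\nplain <- first_plain_part(msg1)\nis.null(plain) # FALSE when a plaintext part exists<\/code><\/pre>\n<p>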
Since this post is already pretty long and we ignored errors in the <code>tm<\/code> portion, I'll refrain from including any error handling code here as well.<\/p>\n<pre><code class=\"language-r\">map_df(1:py_len(mbox), ~{\n\n  m <- mbox$get(.x-1) # python uses 0-index lists\n\n  list(\n    date = as.POSIXct(get_chr(m, \"date\"), format = \"%a, %e %b %Y %H:%M:%S %z\"),\n    from = get_chr(m, \"from\"),\n    to = get_chr(m, \"to\"),\n    subj = get_chr(m, \"subject\"),\n    spam_score = get_chr(m, \"X-Spam-Score\")\n  ) -> mdf\n\n  content_type <-  m$get_content_maintype() %|0|% NA_character_\n\n  mdf$content_type <- content_type[1] # keep the main type so it shows up as a column\n\n  if (content_type[1] == \"text\") { # we don't want images\n    while (m$is_multipart()) m <- m$get_payload()[[1]] # cycle through until we get to something we can use\n    mtmp <- m$get_payload(decode = TRUE) # get the message text\n    mdf$body <- utf8_decode(mtmp) # make it safe to use\n  }\n\n  mdf\n\n}) -> mbox_df\n\nglimpse(mbox_df)\n## Observations: 198\n## Variables: 7\n## $ date         <dttm> 2018-08-01 11:01:17, 2018-08-01 09:09:18, 20...\n## $ from         <chr> \"jim holtman <jholtman@gmail.com>\", \"PIKAL Pe...\n## $ to           <chr> \"diego.avesani@gmail.com, R mailing list <r-h...\n## $ subj         <chr> \"Re: [R] read txt file - date - no space\", \"R...\n## $ spam_score   <chr> \"-3.631\", \"-3.533\", \"-3.631\", \"-3.631\", \"-3.5...\n## $ content_type <chr> \"text\", \"text\", \"text\", \"text\", \"text\", \"text...\n## $ body         <chr> \"Try this:\\n\\n library(lubridate)\\n library...<\/code><\/pre>\n<h3>FIN<\/h3>\n<p>By now, you've likely figured out this post really had nothing to do with reading <code>mbox<\/code> files. I mean, <em>it did<\/em> (and this was a task I had to do this week), but the real goal was to use a fairly basic task to help R folks edge a bit closer to becoming more friendly with Python in R. 
There are hundreds of thousands of <a href=\"https:\/\/pypi.org\/\">Python packages<\/a> out there and, while I'm one to wax poetic about having R or C[++]-backed R-native packages (and am wont to point out Python's egregiously prolific flaws), sometimes you just need to get something done quickly and wish to avoid reinventing the wheel. The <code>reticulate<\/code> package makes that eminently possible.<\/p>\n<p>I'll be wrapping up some of the <code>reticulate<\/code> helper functions into a small package soon, so keep your eyes on RSS.<\/p>\n<hr \/>\n<p><sup><a name=\"f0\">\u2728<\/a>: You might want to read this even if you're not interested in <code>mbox<\/code> files. FIN (right above this note) might have some clues as to why.<\/sup><br \/>\n<sup><a name=\"f1\">1<\/a>: yes, the section title was a stretch<\/sup><br \/>\n<sup><a name=\"f2\">2<\/a>: am I doing this right, Mara? ;-)<\/sup><br \/>\n<sup><a name=\"f3\">3<\/a>: Make Latin Great Again<\/sup><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u2728 I had to process a bunch of emails for a $DAYJOB task this week and my &#8220;default setting&#8221; is to use R for pretty much everything (this should come as no surprise). 
Treating mail as data is not an uncommon task and many R packages exist that can reach out and grab mail from [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[640,91],"tags":[],"class_list":["post-11215","post","type-post","status-publish","format-standard","hentry","category-python-2","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Digging into mbox details: A tale of tm &amp; reticulate - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Digging into mbox details: A tale of tm &amp; reticulate - rud.is\" \/>\n<meta property=\"og:description\" content=\"\u2728 I had to processes a bunch of emails for a $DAYJOB task this week and my &#8220;default setting&#8221; is to use R for pretty much everything (this should come as no surprise). 
Treating mail as data is not an uncommon task and many R packages exist that can reach out and grab mail from [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-08-04T14:17:46+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-08-05T11:51:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Digging into mbox details: A tale of tm &#038; 
reticulate\",\"datePublished\":\"2018-08-04T14:17:46+00:00\",\"dateModified\":\"2018-08-05T11:51:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/\"},\"wordCount\":1142,\"commentCount\":3,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2018\\\/08\\\/py-help-doc.png\",\"articleSection\":[\"Python\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/\",\"name\":\"Digging into mbox details: A tale of tm & reticulate - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2018\\\/08\\\/py-help-doc.png\",\"datePublished\":\"2018-08-04T14:17:46+00:00\",\"dateModified\":\"2018-08-05T11:51:14+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2018\\\/08\\\/py-help-doc.png?fit=2844%2C1520&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2018\\\/08\\\/py-help-doc.png?fit=2844%2C1520&ssl=1\",\"width\":2844,\"height\":1520},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/08\\\/04\\\/digging-into-mbox-details-a-tale-of-tm-reticulate\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Digging into mbox details: A tale of tm &#038; reticulate\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Digging into mbox details: A tale of tm & reticulate - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/","og_locale":"en_US","og_type":"article","og_title":"Digging into mbox details: A tale of tm & reticulate - rud.is","og_description":"\u2728 I had to process a bunch of emails for a $DAYJOB task this week and my &#8220;default setting&#8221; is to use R for pretty much everything (this should come as no surprise). Treating mail as data is not an uncommon task and many R packages exist that can reach out and grab mail from [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/","og_site_name":"rud.is","article_published_time":"2018-08-04T14:17:46+00:00","article_modified_time":"2018-08-05T11:51:14+00:00","og_image":[{"url":"https:\/\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Digging into mbox details: A tale of tm &#038; reticulate","datePublished":"2018-08-04T14:17:46+00:00","dateModified":"2018-08-05T11:51:14+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/"},"wordCount":1142,"commentCount":3,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png","articleSection":["Python","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/","url":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/","name":"Digging into mbox details: A tale of tm & reticulate - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png","datePublished":"2018-08-04T14:17:46+00:00","dateModified":"2018-08-05T11:51:14+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png?fit=2844%2C1520&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/py-help-doc.png?fit=2844%2C1520&ssl=1","width":2844,"height":1520},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/08\/04\/digging-into-mbox-details-a-tale-of-tm-reticulate\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Digging into mbox details: A tale of tm &#038; reticulate"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2UT","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":13498,"url":"https:\/\/rud.is\/b\/2022\/07\/10\/rust-cli-for-apples-weatherkit-rest-api\/","url_meta":{"origin":11215,"position":0},"title":"Rust CLI For Apple&#8217;s WeatherKit REST API","author":"hrbrmstr","date":"2022-07-10","format":false,"excerpt":"Apple is in the final stages of shuttering the DarkSky service\/API. They've replaced it with WeatherKit, which has both an xOS framework version as well as a REST API. 
To use either, you need to be a member of the Apple Developer Program (ADP) \u2014 $99.00\/USD per-year \u2014 and calls\u2026","rel":"","context":"In &quot;Apple&quot;","block_context":{"text":"Apple","link":"https:\/\/rud.is\/b\/category\/apple\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11427,"url":"https:\/\/rud.is\/b\/2018\/08\/24\/friday-rstats-twofer-finding-macos-32-bit-apps-processing-data-from-system-commands\/","url_meta":{"origin":11215,"position":1},"title":"Friday #rstats twofer: Finding macOS 32-bit apps &#038; Processing Data from System Commands","author":"hrbrmstr","date":"2018-08-24","format":false,"excerpt":"Apple has run the death bell on 32-bit macOS apps and, if you're running a recent macOS version on your Mac (which you should so you can get security updates) you likely see this alert from time-to-time: If you're like me, you click through that and keep working but later\u2026","rel":"","context":"In &quot;Apple&quot;","block_context":{"text":"Apple","link":"https:\/\/rud.is\/b\/category\/apple\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-24-at-4.58.41-AM.png?fit=1200%2C612&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-24-at-4.58.41-AM.png?fit=1200%2C612&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-24-at-4.58.41-AM.png?fit=1200%2C612&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-24-at-4.58.41-AM.png?fit=1200%2C612&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/08\/Screen-Shot-2018-08-24-at-4.58.41-AM.png?fit=1200%2C612&ssl=1&resize=1050%2C600 
3x"},"classes":[]},{"id":5916,"url":"https:\/\/rud.is\/b\/2017\/05\/07\/plot-the-vote-making-u-s-senate-house-cartograms-in-r\/","url_meta":{"origin":11215,"position":2},"title":"Plot the Vote: Making U.S. Senate &#038; House Cartograms in R","author":"hrbrmstr","date":"2017-05-07","format":false,"excerpt":"Political machinations are a tad insane in the U.S. these days & I regularly hit up @ProPublica & @GovTrack sites (& sub to the GovTrack e-mail updates) as I try to be an informed citizen, especially since I've got a Senator and Representative who seem to be in the sway\u2026","rel":"","context":"In &quot;cartography&quot;","block_context":{"text":"cartography","link":"https:\/\/rud.is\/b\/category\/cartography\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/05\/rep_gt-1.png?fit=1200%2C840&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/05\/rep_gt-1.png?fit=1200%2C840&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/05\/rep_gt-1.png?fit=1200%2C840&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/05\/rep_gt-1.png?fit=1200%2C840&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/05\/rep_gt-1.png?fit=1200%2C840&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":3469,"url":"https:\/\/rud.is\/b\/2015\/06\/19\/do-something-nifffty-with-r\/","url_meta":{"origin":11215,"position":3},"title":"DO Something Nifffty with R","author":"hrbrmstr","date":"2015-06-19","format":false,"excerpt":"@briandconnelly (of [pushoverr](http:\/\/crantastic.org\/authors\/4002) fame) made a super-cool post about [connecting R](http:\/\/bconnelly.net\/2015\/06\/connecting-r-to-everything-with-ifttt\/) to @IFTTT via IFTTT's \"Maker\" channel. 
The IFTTT Maker interface to receive events is fairly straightforward and Brian's code worked flawlessly, so it was easy to tweak a bit and [wrap into a package](https:\/\/github.com\/hrbrmstr\/nifffty). To get started, you can\u2026","rel":"","context":"In &quot;Apple Watch&quot;","block_context":{"text":"Apple Watch","link":"https:\/\/rud.is\/b\/category\/apple-watch\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":5097,"url":"https:\/\/rud.is\/b\/2017\/02\/23\/on-watering-holes-trust-defensible-systems-and-data-science-community-security\/","url_meta":{"origin":11215,"position":4},"title":"On Watering Holes, Trust, Defensible Systems and Data Science Community Security","author":"hrbrmstr","date":"2017-02-23","format":false,"excerpt":"I've been threatening to do a series on \"data science community security\" for a while and had cause to issue this inaugural post today. It all started with this: Hey #rstats folks: don't do this. Srsly. Don't do this. Pls. Will blog why. Just don't do this. 
https:\/\/t.co\/qkem5ruEBi\u2014 boB Rudis\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/hieRarchy.png?fit=1200%2C1035&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/hieRarchy.png?fit=1200%2C1035&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/hieRarchy.png?fit=1200%2C1035&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/hieRarchy.png?fit=1200%2C1035&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/02\/hieRarchy.png?fit=1200%2C1035&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":11859,"url":"https:\/\/rud.is\/b\/2019\/02\/03\/r-package-update-urlscan\/","url_meta":{"origin":11215,"position":5},"title":"R Package Update: urlscan","author":"hrbrmstr","date":"2019-02-03","format":false,"excerpt":"The urlscan? package (an interface to the urlscan.io API) is now at version 0.2.0 and supports urlscan.io's authentication requirement when submitting a link for analysis. The service is handy if you want to learn about the details \u2014 all the gory technical details \u2014 for a website. 
For instance, say\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11215","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11215"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11215\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11215"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11215"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11215"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}