{"id":12811,"date":"2020-08-08T08:28:20","date_gmt":"2020-08-08T13:28:20","guid":{"rendered":"https:\/\/rud.is\/b\/?p=12811"},"modified":"2020-08-08T08:28:20","modified_gmt":"2020-08-08T13:28:20","slug":"quick-hit-speeding-up-data-frame-creation","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/","title":{"rendered":"Quick Hit: Speeding Up Data Frame Creation"},"content":{"rendered":"<p>(This is part 2 of <code>n<\/code> &#8220;quick hit&#8221; posts, each walking through some approaches to speeding up components of an iterative operation. Go <a href=\"https:\/\/rud.is\/b\/2020\/08\/07\/quick-hit-comparison-of-whole-file-reading-methods\/\">here<\/a> for part 1).<\/p>\n<p>Thanks to the aforementioned previous post, we now have a super fast way of reading individual text files containing HTTP headers from <code>HEAD<\/code> requests into a character vector:<\/p>\n<pre><code class=\"language-r\">library(Rcpp)\n\nvapply(\n  X = fils, \n  FUN = cpp_read_file, # see previous post for the source for this C++ Rcpp function\n  FUN.VALUE = character(1), \n  USE.NAMES = FALSE\n) -&gt; hdrs\n\nhead(hdrs, 2)\n## [1] \"HTTP\/1.1 200 OK\\r\\nDate: Mon, 08 Jun 2020 14:40:45 GMT\\r\\nServer: Apache\\r\\nLast-Modified: Sun, 26 Apr 2020 00:06:47 GMT\\r\\nETag: \\\"ace-ec1a0-5a4265fd413c0\\\"\\r\\nAccept-Ranges: bytes\\r\\nContent-Length: 967072\\r\\nX-Frame-Options: SAMEORIGIN\\r\\nContent-Type: application\/x-msdownload\\r\\n\\r\\n\"                                   \n## [2] \"HTTP\/1.1 200 OK\\r\\nDate: Mon, 08 Jun 2020 14:43:46 GMT\\r\\nServer: Apache\\r\\nLast-Modified: Wed, 05 Jun 2019 03:52:22 GMT\\r\\nETag: \\\"423-d99a0-58a8b864f8980\\\"\\r\\nAccept-Ranges: bytes\\r\\nContent-Length: 891296\\r\\nX-XSS-Protection: 1; mode=block\\r\\nX-Frame-Options: SAMEORIGIN\\r\\nContent-Type: application\/x-msdownload\\r\\n\\r\\n\"\n<\/code><\/pre>\n<p>However, I need the headers and values broken out so I can 
eventually get to the analysis I need to do, and a data frame of name\/value columns would be the most helpful format. We&#8217;ll use {stringi} to help us build a function (the comments explain what it&#8217;s doing) that turns each unkempt string into a very kempt data frame:<\/p>\n<pre><code class=\"language-r\">library(stringi)\n\nparse_headers &lt;- function(x) {\n\n  # split each record into a character vector of its lines\n  split_hdrs &lt;- stri_split_lines(x, omit_empty = TRUE)\n\n  lapply(split_hdrs, function(lines) {\n\n    # we don't care about the HTTP x\/x ...\n    lines &lt;- lines[-1]\n\n    # make a matrix out of found NAME: VALUE\n    hdrs &lt;- stri_match_first_regex(lines, \"^([^:]*):\\\\s*(.*)$\")\n\n    if (nrow(hdrs) &gt; 0) { # if we have any\n      data.frame(\n        name = stri_replace_all_fixed(stri_trans_tolower(hdrs[,2]), \"-\", \"_\"),\n        value = hdrs[,3]\n      )\n    } else { # if we don't have any\n      NULL\n    }\n\n  })\n\n}\n\nparse_headers(hdrs[1:3])\n## [[1]]\n##              name                         value\n## 1            date Mon, 08 Jun 2020 14:40:45 GMT\n## 2          server                        Apache\n## 3   last_modified Sun, 26 Apr 2020 00:06:47 GMT\n## 4            etag     \"ace-ec1a0-5a4265fd413c0\"\n## 5   accept_ranges                         bytes\n## 6  content_length                        967072\n## 7 x_frame_options                    SAMEORIGIN\n## 8    content_type      application\/x-msdownload\n## \n## [[2]]\n##               name                         value\n## 1             date Mon, 08 Jun 2020 14:43:46 GMT\n## 2           server                        Apache\n## 3    last_modified Wed, 05 Jun 2019 03:52:22 GMT\n## 4             etag     \"423-d99a0-58a8b864f8980\"\n## 5    accept_ranges                         bytes\n## 6   content_length                        891296\n## 7 x_xss_protection                 1; mode=block\n## 8  x_frame_options                    SAMEORIGIN\n## 9  
   content_type      application\/x-msdownload\n## \n## [[3]]\n##           name                         value\n## 1         date Mon, 08 Jun 2020 14:23:53 GMT\n## 2       server                        Apache\n## 3 content_type text\/html; charset=iso-8859-1\n\nparse_headers(hdrs[1])[[1]] # pull the single data frame out of the returned list\n##              name                         value\n## 1            date Mon, 08 Jun 2020 14:40:45 GMT\n## 2          server                        Apache\n## 3   last_modified Sun, 26 Apr 2020 00:06:47 GMT\n## 4            etag     \"ace-ec1a0-5a4265fd413c0\"\n## 5   accept_ranges                         bytes\n## 6  content_length                        967072\n## 7 x_frame_options                    SAMEORIGIN\n## 8    content_type      application\/x-msdownload\n<\/code><\/pre>\n<p>Unfortunately, this takes a painful ~15 seconds to crunch through the ~75K text entries:<\/p>\n<pre><code class=\"language-r\">system.time(tmp &lt;- parse_headers(hdrs))\n##   user  system elapsed \n## 15.033   0.097  15.227 \n<\/code><\/pre>\n<p>since each call takes around 150 microseconds:<\/p>\n<pre><code class=\"language-r\">library(microbenchmark)\n\nmicrobenchmark(\n  ph = parse_headers(hdrs[1]),\n  times = 1000,\n  control = list(warmup = 100)\n)\n## Unit: microseconds\n##  expr     min       lq     mean  median      uq     max neval\n##    ph 143.328 146.8995 154.8609 148.361 158.121 415.332  1000\n<\/code><\/pre>\n<p>A big reason it takes so long is the data frame creation. If you&#8217;ve never looked at the source for <code>data.frame()<\/code>, have a go at it (<a href=\"https:\/\/github.com\/wch\/r-source\/blob\/86532f5aa3d9880f4c1c9e74a417005616846a34\/src\/library\/base\/R\/dataframe.R#L435-L603\">https:\/\/github.com\/wch\/r-source\/blob\/86532f5aa3d9880f4c1c9e74a417005616846a34\/src\/library\/base\/R\/dataframe.R#L435-L603<\/a>) before continuing.<\/p>\n<p>Back? Great! 
The {base} <code>data.frame()<\/code> has tons of guard rails to make sure you&#8217;re getting what you think you asked for across a myriad of use cases. I learned about a trick to make data frame creation faster when I started playing with the {ggplot2} source. Said trick has virtually no guard rails (it just adds a class and a <code>row.names<\/code> attribute to a <code>list<\/code>), so you really should only use it in cases like this where you have a very good idea of the structure and values of the data frame you&#8217;re making. Here&#8217;s an even more simplified version of the function in the {ggplot2} source:<\/p>\n<pre><code class=\"language-r\">fast_frame &lt;- function(x = list()) {\n\n  # column lengths; nothing here checks or recycles them\n  lengths &lt;- vapply(x, length, integer(1))\n  # zero rows if there are no columns or any column is empty, else the longest column\n  n &lt;- if (length(x) == 0 || min(lengths) == 0) 0 else max(lengths)\n  class(x) &lt;- \"data.frame\"\n  attr(x, \"row.names\") &lt;- .set_row_names(n) # help(.set_row_names) for info\n\n  x\n\n}\n<\/code><\/pre>\n<p>Now, we&#8217;ll change <code>parse_headers()<\/code> a bit to use that function instead of <code>data.frame()<\/code>:<\/p>\n<pre><code class=\"language-r\">parse_headers &lt;- function(x) {\n\n  # split each record into a character vector of its lines\n  split_hdrs &lt;- stri_split_lines(x, omit_empty = TRUE)\n\n  lapply(split_hdrs, function(lines) {\n\n    # we don't care about the HTTP x\/x ...\n    lines &lt;- lines[-1]\n\n    # make a matrix out of found NAME: VALUE\n    hdrs &lt;- stri_match_first_regex(lines, \"^([^:]*):\\\\s*(.*)$\")\n\n    if (nrow(hdrs) &gt; 0) { # if we have any\n      fast_frame(\n        list(\n          name = stri_replace_all_fixed(stri_trans_tolower(hdrs[,2]), \"-\", \"_\"),\n          value = hdrs[,3]\n        )\n      )\n    } else { # if we don't have any\n      NULL\n    }\n\n  })\n\n}\n<\/code><\/pre>\n<p>Note that we have to pass it a single <code>list()<\/code> rather than bare name\/value vectors.<\/p>\n<p>How much faster is it? 
Quite a bit (about 5x, going by the medians):<\/p>\n<pre><code class=\"language-r\">microbenchmark(\n  ph = parse_headers(hdrs[1]),\n  times = 1000,\n  control = list(warmup = 100)\n)\n## Unit: microseconds\n##  expr   min      lq     mean median      uq      max neval\n##    ph 27.94 28.7205 34.66066 29.024 29.3785 4144.402  1000\n<\/code><\/pre>\n<p>This speedup means the painful ~15s is now just a tolerable ~3s:<\/p>\n<pre><code class=\"language-r\">system.time(tmp &lt;- parse_headers(hdrs))\n##  user  system elapsed \n## 2.901   0.011   2.918 \n<\/code><\/pre>\n<h3>FIN<\/h3>\n<p>Normally, guard rails are awesome, and you can have even safer code (which means safer and more reproducible analyses) when using {tidyverse} functions. As noted in the previous post, I&#8217;m doing a great deal of iterative work, have more than one set of headers I&#8217;m crunching on, and am testing out different approaches\/theories, so going from ~15 seconds to ~3 seconds does truly speed up my efforts and has an even bigger impact when I process around 3 million raw header records.<\/p>\n<p>I <em>think<\/em> I promised {future} work in this post (asynchronous pun not intended), but we&#8217;ll get to that eventually (probably the next post).<\/p>\n<p>If you have your own favorite way to speed up data frame creation (or to extract target values from raw text records), drop a note in the comments!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(This is part 2 of n &#8220;quick hit&#8221; posts, each walking through some approaches to speeding up components of an iterative operation. Go here for part 1). 
Thanks to the aforementioned previous post, we now have a super fast way of reading individual text files containing HTTP headers from HEAD requests into a character vector: [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_post_was_ever_published":false},"categories":[91],"tags":[],"class_list":["post-12811","post","type-post","status-publish","format-standard","hentry","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Quick Hit: Speeding Up Data Frame Creation - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Quick Hit: Speeding Up Data Frame Creation - rud.is\" \/>\n<meta property=\"og:description\" content=\"(This is part 2 of n &#8220;quick hit&#8221; posts, each walking through some approaches to speeding up components of an iterative operation. Go here for part 1). 
Thanks to the aforementioned previous post, we now have a super fast way of reading individual text files containing HTTP headers from HEAD requests into a character vector: [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2020-08-08T13:28:20+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Quick Hit: Speeding Up Data Frame 
Creation\",\"datePublished\":\"2020-08-08T13:28:20+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/\"},\"wordCount\":481,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"articleSection\":[\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/\",\"name\":\"Quick Hit: Speeding Up Data Frame Creation - rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"datePublished\":\"2020-08-08T13:28:20+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/08\\\/08\\\/quick-hit-speeding-up-data-frame-creation\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Quick Hit: Speeding Up Data Frame Creation\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Quick Hit: Speeding Up Data Frame Creation - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/","og_locale":"en_US","og_type":"article","og_title":"Quick Hit: Speeding Up Data Frame Creation - rud.is","og_description":"(This is part 2 of n &#8220;quick hit&#8221; posts, each walking through some approaches to speeding up components of an iterative operation. Go here for part 1). Thanks to the aforementioned previous post, we now have a super fast way of reading individual text files containing HTTP headers from HEAD requests into a character vector: [&hellip;]","og_url":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/","og_site_name":"rud.is","article_published_time":"2020-08-08T13:28:20+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Quick Hit: Speeding Up Data Frame Creation","datePublished":"2020-08-08T13:28:20+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/"},"wordCount":481,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"articleSection":["R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/","url":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/","name":"Quick Hit: Speeding Up Data Frame Creation - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2020-08-08T13:28:20+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2020\/08\/08\/quick-hit-speeding-up-data-frame-creation\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Quick Hit: Speeding Up Data Frame 
Creation"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-3kD","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12811","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=12811"}],"version-history":[{"count":4,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12811\/revisions"}],"predecessor-version":[{"id":12815,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12811\/revisions\/12815"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=12811"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=12811"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=12811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}