

{"id":6127,"date":"2017-07-27T18:07:24","date_gmt":"2017-07-27T23:07:24","guid":{"rendered":"https:\/\/rud.is\/b\/?p=6127"},"modified":"2018-10-05T11:01:21","modified_gmt":"2018-10-05T16:01:21","slug":"reading-pcap-files-with-apache-drill-and-the-sergeant-r-package","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","title":{"rendered":"Reading PCAP Files with Apache Drill and the sergeant R Package"},"content":{"rendered":"<p>It&#8217;s no secret that I&#8217;m a fan of <a href=\"https:\/\/drill.apache.org\/\">Apache Drill<\/a>. One big strength of the platform is that it normalizes the access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hive, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the same SQL syntax. This also means that I get access to all those platforms in R centrally through the <a href=\"https:\/\/github.com\/hrbrmstr\/sergeant\"><code>sergeant<\/code><\/a> package that rests atop <code>d[b]plyr<\/code>. However, it <em>further<\/em> means that when support for a new file type is added, I get that same functionality without any extra effort.<\/p>\n<p>Why am I calling this out?<\/p>\n<p>Well, the intrepid Drill developers are in the process of finalizing the <a href=\"https:\/\/home.apache.org\/~arina\/drill\/releases\/1.11.0\/\">release candidate for version 1.11.0<\/a> and one feature they&#8217;ve added is the ability to query individual <a href=\"https:\/\/en.wikipedia.org\/wiki\/Pcap\">PCAP files<\/a>, or entire directories full of them, from within Drill. While I provided a link to the Wikipedia article on PCAP files, the TL;DR is that PCAP is an optimized binary file format for recording network activity. 
If you&#8217;re on macOS or a Linux-ish system, go do something like this:<\/p>\n<p><code>sudo tcpdump -ni en0 -s0 -w capture01.pcap<\/code><\/p>\n<p>And wait a bit.<\/p>\n<p>NOTE: Some of you may have to change the <code>en0<\/code> to your main network interface name (a quick search for your platform should get you to the right one to use).<\/p>\n<p>That command will passively record all network traffic seen on that interface until you <code>ctrl-c<\/code> it. The longer it runs, the larger the file gets.<\/p>\n<p>When you&#8217;ve recorded a minute or two of packets, <code>ctrl-c<\/code> the program and then try to look at the PCAP file. It&#8217;s a binary mess. You can re-read it with <code>tcpdump<\/code> or <a href=\"https:\/\/www.wireshark.org\">Wireshark<\/a> and there are many C[++] libraries and other utilities that can read PCAP files. You can even convert them to CSV or XML, but the PCAP format itself requires custom tools to work with effectively. I had started creating <a href=\"https:\/\/github.com\/hrbrmstr\/crafter\"><code>crafter<\/code><\/a> to work with these files but my use case\/project dried up and I haven&#8217;t gone back to it.<\/p>\n<p>Adding the capability into Drill means I don&#8217;t really have to work any further on that specialized package as I can do this:<\/p>\n<pre id=\"drillpcap01\"><code class=\"language-r\">library(sergeant)\r\nlibrary(iptools)\r\nlibrary(tidyverse)\r\nlibrary(cymruservices)\r\n\r\ndb &lt;- src_drill(&quot;localhost&quot;)\r\n\r\nmy_pcaps &lt;- tbl(db, &quot;dfs.caps.`\/capture02.pcap`&quot;)\r\n\r\nglimpse(my_pcaps)\r\n## Observations: 25\r\n## Variables: 12\r\n## $ src_ip          &lt;chr&gt; &quot;192.168.10.100&quot;, &quot;54.159.166.81&quot;, &quot;192.168.10...\r\n## $ src_port        &lt;int&gt; 60025, 443, 60025, 443, 60025, 58976, 443, 535...\r\n## $ tcp_session     &lt;dbl&gt; -2.082796e+17, -2.082796e+17, -2.082796e+17, -...\r\n## $ packet_length   &lt;int&gt; 129, 129, 66, 703, 66, 65, 75, 364, 65, 65, 
75...\r\n## $ data            &lt;chr&gt; &quot;...g9B..c.&lt;..O..@=,0R.`........K..EzYd=.........\r\n## $ src_mac_address &lt;chr&gt; &quot;78:4F:43:77:02:00&quot;, &quot;D4:8C:B5:C9:6C:1B&quot;, &quot;78:...\r\n## $ dst_port        &lt;int&gt; 443, 60025, 443, 60025, 443, 443, 58976, 5353,...\r\n## $ type            &lt;chr&gt; &quot;TCP&quot;, &quot;TCP&quot;, &quot;TCP&quot;, &quot;TCP&quot;, &quot;TCP&quot;, &quot;UDP&quot;, &quot;UDP...\r\n## $ dst_ip          &lt;chr&gt; &quot;54.159.166.81&quot;, &quot;192.168.10.100&quot;, &quot;54.159.166...\r\n## $ dst_mac_address &lt;chr&gt; &quot;D4:8C:B5:C9:6C:1B&quot;, &quot;78:4F:43:77:02:00&quot;, &quot;D4:...\r\n## $ network         &lt;int&gt; 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...\r\n## $ timestamp       &lt;dttm&gt; 2017-07-27 23:54:58, 2017-07-27 23:54:59, 201...\r\n\r\nsummarise(my_pcaps, max = max(timestamp), min = min(timestamp)) %&gt;% \r\n  collect() %&gt;% \r\n  summarise(max - min)\r\n## # A tibble: 1 x 1\r\n##     `max - min`\r\n##          &lt;time&gt;\r\n## 1 1.924583 mins\r\n\r\ncount(my_pcaps, type)\r\n## # Source:   lazy query [?? x 2]\r\n## # Database: DrillConnection\r\n##    type     n\r\n##   &lt;chr&gt; &lt;int&gt;\r\n## 1   TCP  4974\r\n## 2   UDP   774\r\n\r\nfilter(my_pcaps, type==&quot;TCP&quot;) %&gt;% \r\n  count(dst_port, sort=TRUE)\r\n## # Source:     lazy query [?? x 2]\r\n## # Database:   DrillConnection\r\n## # Ordered by: desc(n)\r\n##    dst_port     n\r\n##       &lt;int&gt; &lt;int&gt;\r\n##  1      443  2580\r\n##  2    56202   476\r\n##  3    56229   226\r\n##  4    56147   169\r\n##  5    56215   103\r\n##  6    56143    94\r\n##  7    56085    89\r\n##  8    56203    56\r\n##  9    56205    39\r\n## 10    56209    39\r\n## # ... 
with more rows\r\n\r\nfilter(my_pcaps, type==&quot;TCP&quot;) %&gt;% \r\n  count(dst_ip, sort=TRUE) %&gt;% \r\n  collect() -&gt; dst_ips\r\n\r\nfilter(dst_ips, !is.na(dst_ip)) %&gt;%\r\n  left_join(ips_in_cidrs(.$dst_ip, c(&quot;10.0.0.0\/8&quot;, &quot;172.16.0.0\/12&quot;, &quot;192.168.0.0\/16&quot;)),\r\n            by = c(&quot;dst_ip&quot;=&quot;ips&quot;)) %&gt;%\r\n  filter(!in_cidr) %&gt;%\r\n  left_join(distinct(bulk_origin(.$dst_ip), ip, .keep_all=TRUE), c(&quot;dst_ip&quot; = &quot;ip&quot;)) %&gt;%\r\n  select(dst_ip, n, as_name)\r\n## # A tibble: 37 x 3\r\n##            dst_ip     n                              as_name\r\n##             &lt;chr&gt; &lt;int&gt;                                &lt;chr&gt;\r\n##  1   104.244.42.2   862           TWITTER - Twitter Inc., US\r\n##  2 104.244.46.103   556           TWITTER - Twitter Inc., US\r\n##  3  104.20.60.241   183 CLOUDFLARENET - CloudFlare, Inc., US\r\n##  4     31.13.80.8   160        FACEBOOK - Facebook, Inc., US\r\n##  5  52.218.160.76   100     AMAZON-02 - Amazon.com, Inc., US\r\n##  6  104.20.59.241    79 CLOUDFLARENET - CloudFlare, Inc., US\r\n##  7  52.218.160.92    66     AMAZON-02 - Amazon.com, Inc., US\r\n##  8  199.16.156.81    58           TWITTER - Twitter Inc., US\r\n##  9 104.244.42.193    47           TWITTER - Twitter Inc., US\r\n## 10  52.86.113.212    42    AMAZON-AES - Amazon.com, Inc., US\r\n## # ... with 27 more rows<\/code><\/pre>\n<p>No custom R code. No modification to the <code>sergeant<\/code> package. 
Just query it like any other data source.<\/p>\n<p>One really cool part of this is that &mdash; while similar functionality has been available in various Hadoop contexts for a few years &mdash; we&#8217;re doing this query from a <em>local file system<\/em> outside of a Hadoop context.<\/p>\n<p>I had to add <code>\"pcap\": { \"type\": \"pcap\" }<\/code> to the <code>formats<\/code> section of the <code>dfs<\/code> storage configuration (#ty to the Drill community for helping me figure that out) and I set up a directory that defaults to the <code>pcap<\/code> type. But after that, it <em>just works<\/em>.<\/p>\n<p>Well, <em>kinda<\/em>.<\/p>\n<p>The Java code that the plugin is based on doesn&#8217;t like busted PCAP files (which we get quite a bit of in infosec- &amp; honeypot-lands) and it seems to bork on IPv6 packets a bit. And, my <code>sergeant<\/code> package (for now) can&#8217;t do much with the <code>data<\/code> component (neither can Drill proper). <em>But<\/em>, it&#8217;s a <em>great start<\/em> and I can use it to do bulk parquet file creation of basic protocols &amp; connection information or take a quick look at some honeypot captures whenever I need to, right from R, without converting them first.<\/p>\n<p>Drill 1.11.0 is only at RC0 right now, so some of these issues may be gone by the time the full release is baked. Some fixes may have to wait for 1.12.0. And, much work needs to be done on the UDF side and the <code>sergeant<\/code> side to help make the <code>data<\/code> element more useful.<\/p>\n<p>Even with the issues and limitations, this is an amazing new feature that&#8217;s been added to an incredibly useful tool and many thanks go out to the Drill dev team for sneaking this into 1.11.0.<\/p>\n<p>If you have cause to work with PCAP files, give this a go and see if it helps speed up parts of your workflow.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s no secret that I&#8217;m a fan of Apache Drill. 
One big strength of the platform is that it normalizes the access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hive, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the same SQL syntax. This also [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[819,764,781,798,91],"tags":[810],"class_list":["post-6127","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-data-wrangling","category-drill","category-pcap","category-r","tag-post"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Reading PCAP Files with Apache Drill and the sergeant R Package - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reading PCAP Files with Apache Drill and the sergeant R Package - rud.is\" \/>\n<meta property=\"og:description\" content=\"It&#8217;s no secret that I&#8217;m a fan of Apache Drill. 
One big strength of the platform is that it normalizes the access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hive, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the same SQL syntax. This also [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2017-07-27T23:07:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-05T16:01:21+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Reading PCAP Files with Apache Drill and the sergeant R Package\",\"datePublished\":\"2017-07-27T23:07:24+00:00\",\"dateModified\":\"2018-10-05T16:01:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/\"},\"wordCount\":674,\"commentCount\":10,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"keywords\":[\"post\"],\"articleSection\":[\"Apache 
Drill\",\"data wrangling\",\"drill\",\"pcap\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/\",\"url\":\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/\",\"name\":\"Reading PCAP Files with Apache Drill and the sergeant R Package - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2017-07-27T23:07:24+00:00\",\"dateModified\":\"2018-10-05T16:01:21+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Reading PCAP Files with Apache Drill and the sergeant R Package\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Reading PCAP Files with Apache Drill and the sergeant R Package - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","og_locale":"en_US","og_type":"article","og_title":"Reading PCAP Files with Apache Drill and the sergeant R Package - rud.is","og_description":"It&#8217;s no secret that I&#8217;m a fan of Apache Drill. One big strength of the platform is that it normalizes the access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hive, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the same SQL syntax. This also [&hellip;]","og_url":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","og_site_name":"rud.is","article_published_time":"2017-07-27T23:07:24+00:00","article_modified_time":"2018-10-05T16:01:21+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Reading PCAP Files with Apache Drill and the sergeant R Package","datePublished":"2017-07-27T23:07:24+00:00","dateModified":"2018-10-05T16:01:21+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/"},"wordCount":674,"commentCount":10,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"keywords":["post"],"articleSection":["Apache Drill","data wrangling","drill","pcap","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","url":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","name":"Reading PCAP Files with Apache Drill and the sergeant R Package - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2017-07-27T23:07:24+00:00","dateModified":"2018-10-05T16:01:21+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Reading PCAP Files with Apache Drill and the sergeant R Package"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png
?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1AP","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":4929,"url":"https:\/\/rud.is\/b\/2017\/01\/22\/create-parquet-files-from-r-data-frames-with-sergeant-apache-drill-a-k-a-make-parquet-files-great-again-in-r\/","url_meta":{"origin":6127,"position":0},"title":"Create Parquet Files From R Data Frames With sergeant &#038; Apache Drill (a.k.a. Make Parquet Files Great Again in R)","author":"hrbrmstr","date":"2017-01-22","format":false,"excerpt":"2021-11-04 UPDATE: Just use {arrow}. Apache Drill is a nice tool to have in the toolbox as it provides a SQL front-end to a wide array of database and file back-ends and runs in standalone\/embedded mode on every modern operating system (i.e. you can get started with or play locally\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6091,"url":"https:\/\/rud.is\/b\/2017\/06\/17\/replicating-the-apache-drill-yelp-academic-dataset-with-sergeant\/","url_meta":{"origin":6127,"position":1},"title":"Replicating the Apache Drill &#8216;Yelp&#8217; Academic Dataset Analysis with sergeant","author":"hrbrmstr","date":"2017-06-17","format":false,"excerpt":"The Apache Drill folks have a nice walk-through tutorial on how to analyze the Yelp Academic Dataset with Drill. 
It's a bit out of date (the current Yelp data set structure is different enough that the tutorial will error out at various points), but it's a great example of how\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4753,"url":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","url_meta":{"origin":6127,"position":2},"title":"sergeant : An R Boot Camp for Apache Drill","author":"hrbrmstr","date":"2016-12-20","format":false,"excerpt":"I recently mentioned that I've been working on a development version of an Apache Drill R package called sergeant. Here's a lifted \"TLDR\" on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift,\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11712,"url":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","url_meta":{"origin":6127,"position":3},"title":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata","author":"hrbrmstr","date":"2019-01-02","format":false,"excerpt":"Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [...] without having to create and manage schemas. [...] 
It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST;\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10121,"url":"https:\/\/rud.is\/b\/2018\/04\/20\/painless-odbc-dplyr-connections-to-amazon-athena-and-apache-drill-with-r-odbc\/","url_meta":{"origin":6127,"position":4},"title":"Painless ODBC  + dplyr Connections to Amazon Athena and Apache Drill with R &#038; odbc","author":"hrbrmstr","date":"2018-04-20","format":false,"excerpt":"I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package? which connects R up to Amazon Athena via RJDBC. I'm used to JDBC and have to deal with Java separately from R so I'm\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":7637,"url":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","url_meta":{"origin":6127,"position":5},"title":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R 
&#038; sergeant","author":"hrbrmstr","date":"2017-12-20","format":false,"excerpt":"(For first-timers, R\u2076 tagged posts are short & sweet with minimal expository; R\u2076 feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w\/o working across billions of rows. I also use Apache Drill for much of my exploratory work.\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/6127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=6127"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/6127\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=6127"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=6127"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=6127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}