

{"id":7637,"date":"2017-12-20T08:09:45","date_gmt":"2017-12-20T13:09:45","guid":{"rendered":"https:\/\/rud.is\/b\/?p=7637"},"modified":"2018-10-05T11:01:35","modified_gmt":"2018-10-05T16:01:35","slug":"r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","title":{"rendered":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &#038; sergeant"},"content":{"rendered":"<p><span style=\"font-size:9pt\">(For first-timers, R\u2076 tagged posts are short &amp; sweet with minimal expository; <a rhef=\"https:\/\/rud.is\/b\/tag\/r6\/feed\/\">R\u2076 feed<\/a>)<\/span><\/p>\n<p>At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w\/o working across billions of rows. I also use Apache Drill for much of my exploratory work.<\/p>\n<p>Here&#8217;s how to uniformly sample data from Apache Drill using the <code>sergeant<\/code> package:<\/p>\n<pre id=\"drillsamp01\"><code class=\"language-r\">library(sergeant)\r\n\r\ndb &lt;- src_drill(&quot;sonar&quot;)\r\ntbl &lt;- tbl(db, &quot;dfs.dns.`aaaa.parquet`&quot;)\r\n\r\nsummarise(tbl, n=n())\r\n## # Source:   lazy query [?? x 1]\r\n## # Database: DrillConnection\r\n##          n\r\n##      &lt;int&gt;\r\n## 1 19977415\r\n\r\nmutate(tbl, r=rand()) %&gt;% \r\n  filter(r &lt;= 0.01) %&gt;% \r\n  summarise(n=n())\r\n## # Source:   lazy query [?? x 1]\r\n## # Database: DrillConnection\r\n##        n\r\n##    &lt;int&gt;\r\n## 1 199808\r\n\r\nmutate(tbl, r=rand()) %&gt;% \r\n  filter(r &lt;= 0.50) %&gt;% \r\n  summarise(n=n())\r\n## # Source:   lazy query [?? x 1]\r\n## # Database: DrillConnection\r\n##         n\r\n##     &lt;int&gt;\r\n## 1 9988797<\/code><\/pre>\n<p>And, for groups (using a different\/larger &#8220;database&#8221;):<\/p>\n<pre id=\"drillsamp02\"><code class=\"language-r\">fdns &lt;- tbl(db, &quot;dfs.fdns.`201708`&quot;)\r\n\r\nsummarise(fdns, n=n())\r\n## # Source:   lazy query [?? x 1]\r\n## # Database: DrillConnection\r\n##            n\r\n##        &lt;int&gt;\r\n## 1 1895133100\r\n\r\nfilter(fdns, type %in% c(&quot;cname&quot;, &quot;txt&quot;)) %&gt;% \r\n  count(type)\r\n## # Source:   lazy query [?? x 2]\r\n## # Database: DrillConnection\r\n##    type        n\r\n##   &lt;chr&gt;    &lt;int&gt;\r\n## 1 cname 15389064\r\n## 2   txt 67576750\r\n\r\nfilter(fdns, type %in% c(&quot;cname&quot;, &quot;txt&quot;)) %&gt;% \r\n  group_by(type) %&gt;% \r\n  mutate(r=rand()) %&gt;% \r\n  ungroup() %&gt;% \r\n  filter(r &lt;= 0.15) %&gt;% \r\n  count(type)\r\n## # Source:   lazy query [?? x 2]\r\n## # Database: DrillConnection\r\n##    type        n\r\n##   &lt;chr&gt;    &lt;int&gt;\r\n## 1 cname  2307604\r\n## 2   txt 10132672<\/code><\/pre>\n<p>I will (hopefully) be better at cranking these bite-sized posts more frequently in 2018.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(For first-timers, R\u2076 tagged posts are short &amp; sweet with minimal expository; R\u2076 feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w\/o working across billions of rows. I also use Apache Drill for much of my exploratory work. Here&#8217;s how to uniformly sample [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[819,781,91],"tags":[810,787],"class_list":["post-7637","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-drill","category-r","tag-post","tag-r6"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &amp; sergeant - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2017\/12\/20\/r\u2076-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &amp; sergeant - rud.is\" \/>\n<meta property=\"og:description\" content=\"(For first-timers, R\u2076 tagged posts are short &amp; sweet with minimal expository; R\u2076 feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w\/o working across billions of rows. I also use Apache Drill for much of my exploratory work. Here&#8217;s how to uniformly sample [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2017\/12\/20\/r\u2076-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2017-12-20T13:09:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-05T16:01:35+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &#038; sergeant\",\"datePublished\":\"2017-12-20T13:09:45+00:00\",\"dateModified\":\"2018-10-05T16:01:35+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/\"},\"wordCount\":106,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"keywords\":[\"post\",\"r6\"],\"articleSection\":[\"Apache Drill\",\"drill\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/\",\"url\":\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/\",\"name\":\"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R & sergeant - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2017-12-20T13:09:45+00:00\",\"dateModified\":\"2018-10-05T16:01:35+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &#038; sergeant\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R & sergeant - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2017\/12\/20\/r\u2076-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","og_locale":"en_US","og_type":"article","og_title":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R & sergeant - rud.is","og_description":"(For first-timers, R\u2076 tagged posts are short &amp; sweet with minimal expository; R\u2076 feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w\/o working across billions of rows. I also use Apache Drill for much of my exploratory work. Here&#8217;s how to uniformly sample [&hellip;]","og_url":"https:\/\/rud.is\/b\/2017\/12\/20\/r\u2076-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","og_site_name":"rud.is","article_published_time":"2017-12-20T13:09:45+00:00","article_modified_time":"2018-10-05T16:01:35+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &#038; sergeant","datePublished":"2017-12-20T13:09:45+00:00","dateModified":"2018-10-05T16:01:35+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/"},"wordCount":106,"commentCount":1,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"keywords":["post","r6"],"articleSection":["Apache Drill","drill","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","url":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","name":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R & sergeant - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2017-12-20T13:09:45+00:00","dateModified":"2018-10-05T16:01:35+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &#038; sergeant"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1Zb","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":4753,"url":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","url_meta":{"origin":7637,"position":0},"title":"sergeant : An R Boot Camp for Apache Drill","author":"hrbrmstr","date":"2016-12-20","format":false,"excerpt":"I recently mentioned that I've been working on a development version of an Apache Drill R package called sergeant. Here's a lifted \"TLDR\" on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift,\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4929,"url":"https:\/\/rud.is\/b\/2017\/01\/22\/create-parquet-files-from-r-data-frames-with-sergeant-apache-drill-a-k-a-make-parquet-files-great-again-in-r\/","url_meta":{"origin":7637,"position":1},"title":"Create Parquet Files From R Data Frames With sergeant &#038; Apache Drill (a.k.a. Make Parquet Files Great Again in R)","author":"hrbrmstr","date":"2017-01-22","format":false,"excerpt":"2021-11-04 UPDATE: Just use {arrow}. Apache Drill is a nice tool to have in the toolbox as it provides a SQL front-end to a wide array of database and file back-ends and runs in standalone\/embedded mode on every modern operating system (i.e. you can get started with or play locally\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10121,"url":"https:\/\/rud.is\/b\/2018\/04\/20\/painless-odbc-dplyr-connections-to-amazon-athena-and-apache-drill-with-r-odbc\/","url_meta":{"origin":7637,"position":2},"title":"Painless ODBC  + dplyr Connections to Amazon Athena and Apache Drill with R &#038; odbc","author":"hrbrmstr","date":"2018-04-20","format":false,"excerpt":"I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package? which connects R up to Amazon Athena via RJDBC. I'm used to JDBC and have to deal with Java separately from R so I'm\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":4721,"url":"https:\/\/rud.is\/b\/2016\/12\/16\/minding-the-zookeeper-with-r\/","url_meta":{"origin":7637,"position":3},"title":"Minding the zoo[keeper] with R","author":"hrbrmstr","date":"2016-12-16","format":false,"excerpt":"I've been drafting a new R package \u2014 [`sergeant`](https:\/\/github.com\/hrbrmstr\/sergeant) \u2014 to work with [Apache Drill](http:\/\/drill.apache.org\/) and have found it much easier to manage having Drill operating in a single node cluster vs `drill-embedded` mode (esp when I need to add a couple more nodes for additional capacity). That means running\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6127,"url":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","url_meta":{"origin":7637,"position":4},"title":"Reading PCAP Files with Apache Drill and the sergeant R Package","author":"hrbrmstr","date":"2017-07-27","format":false,"excerpt":"It's no secret that I'm a fan of Apache Drill. One big strength of the platform is that it normalizes the access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hie, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6046,"url":"https:\/\/rud.is\/b\/2017\/05\/31\/drilling-into-csvs-teaser-trailer\/","url_meta":{"origin":7637,"position":5},"title":"Drilling Into CSVs \u2014 Teaser Trailer","author":"hrbrmstr","date":"2017-05-31","format":false,"excerpt":"I used reading a directory of CSVs as the foundational example in my recent post on idioms. During my exchange with Matt, Hadley and a few others -- in the crazy Twitter thread that spawned said post -- I mentioned that I'd personally \"just use Drill\u201d. I'll use this post\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/7637","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=7637"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/7637\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=7637"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=7637"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=7637"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}