

{"id":4753,"date":"2016-12-20T11:01:17","date_gmt":"2016-12-20T16:01:17","guid":{"rendered":"https:\/\/rud.is\/b\/?p=4753"},"modified":"2018-10-05T10:59:59","modified_gmt":"2018-10-05T15:59:59","slug":"sergeant-a-r-boot-camp-for-apache-drill","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","title":{"rendered":"sergeant : An R Boot Camp for Apache Drill"},"content":{"rendered":"<p>I <a href=\"https:\/\/rud.is\/b\/2016\/12\/16\/minding-the-zookeeper-with-r\/\">recently mentioned<\/a> that I&#8217;ve been working on a development version of an <a href=\"https:\/\/drill.apache.org\/\">Apache Drill<\/a> R package called <a href=\"https:\/\/github.com\/hrbrmstr\/sergeant\"><code>sergeant<\/code><\/a>. Here&#8217;s a lifted &#8220;TLDR&#8221; on Drill:<\/p>\n<div style=\"padding-left:36px; font-style:italic; border-left:0px solid #b2b2b255\">\nDrill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.<br \/>\nDrill&#8217;s datastore-aware optimizer automatically restructures a query plan to leverage the datastore&#8217;s internal processing capabilities. In addition, Drill supports data locality, so it&#8217;s a good idea to co-locate Drill and the datastore on the same nodes.\n<\/div>\n<p>It also supports reading formats such as:<\/p>\n<ul>\n<li>Avro<\/li>\n<li>[CTP]SV ([C]omma-, [T]ab-, [P]ipe-Separated-Values)<\/li>\n<li>Parquet<\/li>\n<li>Hadoop Sequence Files<\/li>\n<\/ul>\n<p>It&#8217;s a <em>bit<\/em> like Spark in that you can run it on a single workstation and scale up to a YUGE cluster. 
It lacks the ML components of Spark, but it connects to <em>everything<\/em> without the need to define a schema up front. Said &#8220;everything&#8221; includes parquet files on local filesystems, so if you need to slice through GBs of parquet data and have a beefy enough Linux workstation (I believe Drill runs on Windows, and I know it runs fine on macOS, too, but that&#8217;s $$$ for a bucket of memory &amp; disk space), you can take advantage of the optimized processing power Drill offers on a single system (while also joining the data with any other data format you can think of). You can also seamlessly move the data to a cluster and barely tweak your code to support that capacity expansion.<\/p>\n<h3>Why <code>sergeant<\/code>?<\/h3>\n<p>There&#8217;s already an R package on CRAN to work with Drill: <a href=\"https:\/\/cran.r-project.org\/web\/packages\/DrillR\/index.html\"><code>DrillR<\/code><\/a>. It&#8217;s S4 class-based, has a decent implementation and interfaces with the REST API. However, it sticks <code>httr::verbose()<\/code> <em>everywhere<\/em>: <a href=\"https:\/\/github.com\/cran\/DrillR\/search?utf8=%E2%9C%93&amp;q=verbose\">https:\/\/github.com\/cran\/DrillR\/search?utf8=%E2%9C%93&amp;q=verbose<\/a>.<\/p>\n<p>The <code>sergeant<\/code> package interfaces with the REST API as well, but it also works with the JDBC driver (the dev version bundles the driver with the package, but it will be removed for the eventual CRAN submission) and includes some other niceties, such as viewing and setting Drill options, plus a few other non-SQL bits. Of note: the REST API version shows an <code>httr<\/code> progress bar for data downloads, and you can wrap the calls with <code>httr::with_verbose(\u2026)<\/code> if you <em>really<\/em> like seeing cURL messages.<\/p>\n<p>The other thing <code>sergeant<\/code> has going for it is a nascent <code>dplyr<\/code> interface. 
Presently, this is a hack-ish wrapper around the <code>RJDBC<\/code> <code>JDBCConnection<\/code> presented by the Drill JDBC driver. While basic functionality works, I firmly believe Drill needs its own DBI driver (like its second cousin Presto has) to avoid collisions with any other JDBC connections you might have open. More work also needs to be done under the covers to handle quoting properly and to expose more of Drill&#8217;s built-in SQL functions.<\/p>\n<h3>SQL vs <code>dplyr<\/code><\/h3>\n<p>For some truly complex data machinations you&#8217;re going to want to work at the SQL level, and I think it&#8217;s important to know SQL if you&#8217;re ever going to do data work outside JSON &amp; CSV files, <em>just to appreciate how much gnashing of teeth<\/em> <code>dplyr<\/code> <em>saves you from<\/em>. Using SQL for many light-to-medium aggregation tasks that feed data to R can feel like banging rocks together to make fire when you could just be using your R precision welder. What would you rather write:<\/p>\n<pre id=\"drill-sql-01\"><code class=\"language-sql\">SELECT gender, marital_status, COUNT(*) AS n\r\nFROM cp.`employee.json`\r\nGROUP BY gender, marital_status<\/code><\/pre>\n<p>in a <code>drill-embedded<\/code> or <code>drill-localhost<\/code> SQL shell? 
Or:<\/p>\n<pre id=\"drill-dplyr-01\"><code class=\"language-r\">library(RJDBC)\r\nlibrary(dplyr)\r\nlibrary(sergeant)\r\n\r\nds &lt;- src_drill(&quot;localhost:31010&quot;, use_zk=FALSE)\r\n\r\ndb &lt;- tbl(ds, &quot;cp.`employee.json`&quot;) \r\n\r\ncount(db, gender, marital_status) %&gt;% collect()<\/code><\/pre>\n<p>(NOTE: that SQL statement is what ultimately gets sent to Drill from <code>dplyr<\/code>)<\/p>\n<p>Now, <code>dplyr<\/code> <code>tbl_df<\/code> idioms don&#8217;t translate 1:1 to all other <code>src_<\/code>es, but they are much easier on the eyes and more instructive in analysis code (and, I fully admit that said statement is more opinion than fact).<\/p>\n<h3><code>sergeant<\/code> and <code>dplyr<\/code><\/h3>\n<p>The <code>src_drill()<\/code> function uses the JDBC Drill driver and, hence, has an <code>RJDBC<\/code> dependency. The Presto folks (a &#8220;competing&#8221; offering to Drill) wrapped a <a href=\"https:\/\/github.com\/prestodb\/RPresto\"><code>DBI<\/code> interface<\/a> around their REST API to facilitate the use of <code>dplyr<\/code> idioms. I&#8217;m not sold on whether I&#8217;ll continue with a lightweight DBI wrapper using RJDBC or go the <code>RPresto<\/code> route, but for now the basic functionality works and changing the back-end implementation should not break anything (much).<\/p>\n<h3>You&#8217;ve said &#8220;parquet&#8221; alot&hellip;<\/h3>\n<p>Yes. Yes, I have. Parquet is a &#8220;big data&#8221; compressed columnar storage format that is generally used in Hadoop shops. Parquet is <a href=\"https:\/\/github.com\/wesm\/feather\/issues\/188\">different from &#8216;feather&#8217;<\/a> (&#8216;feather&#8217; is based on another Apache foundation project: <a href=\"https:\/\/arrow.apache.org\/\">Arrow<\/a>). Arrow\/feather is great for things that fit in memory. 
Parquet and the idioms that sit on top of it make large amounts of data available in a cluster for processing with Hadoop \/ Spark \/ Drill \/ Presto (etc.). Parquet is great for storing all kinds of data, including the log and event data I work with quite a bit, and it&#8217;s great being able to prototype on a single workstation and then move the code to hit a production cluster. Plus, it&#8217;s super-easy to, say, convert an entire, nested directory tree of daily JSON log files into parquet with Drill:<\/p>\n<pre id=\"drill-sql-02\"><code class=\"language-sql\">-- CTAS output format follows Drill's store.format option (parquet by default)\r\nCREATE TABLE dfs.destination.`source\/2016\/12\/2016_12_source_event_logs.parquet` AS\r\n  SELECT src_ip, dst_ip, src_port, dst_port, event_message, ts\r\n  FROM dfs.source.`\/log\/dir\/root\/2016\/12\/*\/event_log.json`;<\/code><\/pre>\n<h3>Kick the tyres<\/h3>\n<p>The REST and JDBC functions are solid (I&#8217;ve been using them at work for a while), and the <code>dplyr<\/code> support has handled some preliminary production work well (though, remember, it&#8217;s not fully baked). There are plenty of examples &mdash; including a <code>dplyr::left_join()<\/code> between parquet and JSON data &mdash; in the <a href=\"https:\/\/github.com\/hrbrmstr\/sergeant\">README<\/a>, and all the exposed functions have documentation.<\/p>\n<p>File <a href=\"https:\/\/github.com\/hrbrmstr\/sergeant\/issues\">an issue<\/a> with a feature request or bug report.<\/p>\n<p>I expect to have this CRAN-able in January 2017.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently mentioned that I&#8217;ve been working on a development version of an Apache Drill R package called sergeant. Here&#8217;s a lifted &#8220;TLDR&#8221; on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. 
A [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[819,779,781,91,778],"tags":[810],"class_list":["post-4753","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-dplyr","category-drill","category-r","category-sql","tag-post"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>sergeant : An R Boot Camp for Apache Drill - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"sergeant : An R Boot Camp for Apache Drill - rud.is\" \/>\n<meta property=\"og:description\" content=\"I recently mentioned that I&#8217;ve been working on a development version of an Apache Drill R package called sergeant. Here&#8217;s a lifted &#8220;TLDR&#8221; on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. 
A [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2016-12-20T16:01:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-05T15:59:59+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"sergeant : An R Boot Camp for Apache Drill\",\"datePublished\":\"2016-12-20T16:01:17+00:00\",\"dateModified\":\"2018-10-05T15:59:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/\"},\"wordCount\":942,\"commentCount\":3,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"keywords\":[\"post\"],\"articleSection\":[\"Apache 
Drill\",\"dplyr\",\"drill\",\"R\",\"SQL\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/\",\"name\":\"sergeant : An R Boot Camp for Apache Drill - rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"datePublished\":\"2016-12-20T16:01:17+00:00\",\"dateModified\":\"2018-10-05T15:59:59+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2016\\\/12\\\/20\\\/sergeant-a-r-boot-camp-for-apache-drill\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"sergeant : An R Boot Camp for Apache Drill\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"sergeant : An R Boot Camp for Apache Drill - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","og_locale":"en_US","og_type":"article","og_title":"sergeant : An R Boot Camp for Apache Drill - rud.is","og_description":"I recently mentioned that I&#8217;ve been working on a development version of an Apache Drill R package called sergeant. Here&#8217;s a lifted &#8220;TLDR&#8221; on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A [&hellip;]","og_url":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","og_site_name":"rud.is","article_published_time":"2016-12-20T16:01:17+00:00","article_modified_time":"2018-10-05T15:59:59+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"sergeant : An R Boot Camp for Apache Drill","datePublished":"2016-12-20T16:01:17+00:00","dateModified":"2018-10-05T15:59:59+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/"},"wordCount":942,"commentCount":3,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"keywords":["post"],"articleSection":["Apache Drill","dplyr","drill","R","SQL"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","url":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","name":"sergeant : An R Boot Camp for Apache Drill - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2016-12-20T16:01:17+00:00","dateModified":"2018-10-05T15:59:59+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"sergeant : An R Boot Camp for Apache 
Drill"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-1eF","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true}