

{"id":11392,"date":"2018-08-16T11:49:17","date_gmt":"2018-08-16T16:49:17","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11392"},"modified":"2018-10-05T11:02:56","modified_gmt":"2018-10-05T16:02:56","slug":"updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/","title":{"rendered":"Updates to the sergeant (Apache Drill connector) Package &#038; a look at Apache Drill 1.14.0 release"},"content":{"rendered":"<p>Apache Drill 1.14.0 was <a href=\"https:\/\/drill.apache.org\/blog\/2018\/08\/05\/drill-1.14-released\/\">recently released<\/a>, bringing with it many new features and a temporary incompatibility with the current rev of the MapR ODBC drivers. The Drill community expects new ODBC drivers to arrive shortly. The <a href=\"https:\/\/gitlab.com\/hrbrmstr\/sergeant\"><code>sergeant<\/code> package<\/a> is an alternative to ODBC for R users as it provides a <code>dplyr<\/code> interface to the REST API along with a JDBC interface and functions to work directly with the REST API in a more programmatic fashion.<\/p>\n<h3>First-class dplyr-citizen support for the sergeant JDBC interface<\/h3>\n<p>I&#8217;ve been primarily using the ODBC interface for a while now, since it&#8217;s dead simple to use it with <code>dplyr<\/code> (as <a href=\"https:\/\/rud.is\/books\/drill-sergeant-rstats\/wiring-up-drill-and-r-odbc-style.html\">has been noted<\/a> in my still-unfinished, short cookbook on wiring up Drill and R). The ODBC incompatibility is pretty severe since it&#8217;s at the <a href=\"https:\/\/developers.google.com\/protocol-buffers\/\">protobuf<\/a>-level, but <code>sergeant::src_drill()<\/code> is an easy swap out and does not have any issues since it works against the REST API. Unfortunately, the query endpoint of the REST API mangles the field order when it returns query results. 
This really isn&#8217;t <em>too<\/em> painful since it&#8217;s easy to add in a <code>select()<\/code> call after gathering query results to reorder things. However, it&#8217;s painful enough that it prompted me to round off some of the rough edges in the JDBC interface.<\/p>\n<p><code>sergeant::drill_jdbc()<\/code> now returns a <code>&lt;DrillJDBCConnection&gt;<\/code> object, which was necessary to add <code>dplyr<\/code> classes for just enough bits to enable smooth operation with the <code>tbl()<\/code> function (without breaking all your other RJDBC usage in the same session). The next blog section will use the new JDBC interface with <code>dplyr<\/code> as it introduces one of Drill&#8217;s new features.<\/p>\n<h3>Query Image Metadata with Apache Drill 1.14.0<\/h3>\n<p>There are quite a few <a href=\"https:\/\/github.com\/search?utf8=%E2%9C%93&amp;q=exif++language%3AR&amp;type=Repositories&amp;ref=advsearch&amp;l=R&amp;l=\">R packages for reading image\/media metadata<\/a>. Since that seems to be en vogue, R folks might be interested in Drill&#8217;s new <a href=\"https:\/\/drill.apache.org\/docs\/image-metadata-format-plugin\/\">image metadata format plugin<\/a>. 
Just point Drill at a directory of files and you can use a familiar <code>dplyr<\/code> interface to get the deets on your <strike>pirated torrent archive<\/strike> family photo inventory.<\/p>\n<p>You first need to follow the directions at the aforelinked resource and <a href=\"https:\/\/rud.is\/books\/drill-sergeant-rstats\/adding-or-modifying-drill-formats.html\">add the following format<\/a> to the <code>formats:<\/code> section.<\/p>\n<pre><code class=\"language-json\">formats: {\n     \"image\": {\n       \"type\": \"image\",\n       \"extensions\": [\n         \"jpg\", \"jpeg\", \"jpe\", \"tif\", \"tiff\", \"dng\", \"psd\", \"png\", \"bmp\", \"gif\",\n         \"ico\", \"pcx\", \"wav\", \"wave\", \"avi\", \"webp\", \"mov\", \"mp4\", \"m4a\", \"m4p\",\n         \"m4b\", \"m4r\", \"m4v\", \"3gp\", \"3g2\", \"eps\", \"epsf\", \"epsi\", \"ai\", \"arw\",\n         \"crw\", \"cr2\", \"nef\", \"orf\", \"raf\", \"rw2\", \"rwl\", \"srw\", \"x3f\"\n       ],\n       \"fileSystemMetadata\": true,\n       \"descriptive\": true,\n       \"timeZone\": null\n     }  \n   }<\/code><\/pre>\n<p>Note that the configuration snippet on Drill&#8217;s site (as of the date-stamp on this post) did not have a <code>,<\/code> after the <code>]<\/code> for the <code>extensions<\/code> array, so copy this one instead.<\/p>\n<p>I created a <code>media<\/code> <a href=\"https:\/\/rud.is\/books\/drill-sergeant-rstats\/adding-a-new-workspace-to-drill.html\">workspace<\/a> and set the <code>defaultInputFormat<\/code> to <code>image<\/code>. 
Here&#8217;s a naive first look at what you can get back from a simple query to a <code>jpg<\/code> directory under it (using the new JDBC interface and <code>dplyr<\/code>):<\/p>\n<pre><code class=\"language-r\">library(sergeant)\nlibrary(tidyverse)\n\n(con <- drill_jdbc(\"bigd:2181\"))\n## <DrillJDBCConnection>\n\ntbl(con, \"dfs.media.`\/jpg\/*`\") %>%\n  glimpse()\n## Observations: ??\n## Variables: 28\n## $ FileSize        <chr> \"4412686 bytes\", \"4737696 bytes\", \"4253912 byt...\n## $ FileDateTime    <chr> \"Thu Aug 16 03:04:16 -04:00 2018\", \"Thu Aug 16...\n## $ Format          <chr> \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\"...\n## $ PixelWidth      <chr> \"4032\", \"4032\", \"4032\", \"4032\", \"4032\", \"4032\"...\n## $ PixelHeight     <chr> \"3024\", \"3024\", \"3024\", \"3024\", \"3024\", \"3024\"...\n## $ BitsPerPixel    <chr> \"24\", \"24\", \"24\", \"24\", \"24\", \"24\", \"24\", \"24\"...\n## $ DPIWidth        <chr> \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\"...\n## $ DPIHeight       <chr> \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\"...\n## $ Orientaion      <chr> \"Unknown (0)\", \"Unknown (0)\", \"Unknown (0)\", \"...\n## $ ColorMode       <chr> \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB...\n## $ HasAlpha        <chr> \"false\", \"false\", \"false\", \"false\", \"false\", \"...\n## $ Duration        <chr> \"00:00:00\", \"00:00:00\", \"00:00:00\", \"00:00:00\"...\n## $ VideoCodec      <chr> \"Unknown\", \"Unknown\", \"Unknown\", \"Unknown\", \"U...\n## $ FrameRate       <chr> \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"...\n## $ AudioCodec      <chr> \"Unknown\", \"Unknown\", \"Unknown\", \"Unknown\", \"U...\n## $ AudioSampleSize <chr> \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"...\n## $ AudioSampleRate <chr> \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"...\n## $ JPEG            <chr> 
\"{\\\"CompressionType\\\":\\\"Baseline\\\",\\\"DataPreci...\n## $ JFIF            <chr> \"{\\\"Version\\\":\\\"1.1\\\",\\\"ResolutionUnits\\\":\\\"no...\n## $ ExifIFD0        <chr> \"{\\\"Make\\\":\\\"Apple\\\",\\\"Model\\\":\\\"iPhone 7 Plus...\n## $ ExifSubIFD      <chr> \"{\\\"ExposureTime\\\":\\\"1\/2227 sec\\\",\\\"FNumber\\\":...\n## $ AppleMakernote  <chr> \"{\\\"UnknownTag(0x0001)\\\":\\\"5\\\",\\\"UnknownTag(0x...\n## $ GPS             <chr> \"{\\\"GPSLatitudeRef\\\":\\\"N\\\",\\\"GPSLatitude\\\":\\\"4...\n## $ XMP             <chr> \"{\\\"XMPValueCount\\\":\\\"4\\\",\\\"Photoshop\\\":{\\\"Dat...\n## $ Photoshop       <chr> \"{\\\"CaptionDigest\\\":\\\"48 89 11 77 33 105 192 3...\n## $ IPTC            <chr> \"{\\\"CodedCharacterSet\\\":\\\"UTF-8\\\",\\\"Applicatio...\n## $ Huffman         <chr> \"{\\\"NumberOfTables\\\":\\\"4 Huffman tables\\\"}\", \"...\n## $ FileType        <chr> \"{\\\"DetectedFileTypeName\\\":\\\"JPEG\\\",\\\"Detected...<\/code><\/pre>\n<p>That&#8217;s quite a bit of metadata, but the Drill format plugin page kinda fibs a bit about column types since we see many <code>chr<\/code>s there. 
You may be quick to question the <code>sergeant<\/code> package but this isn&#8217;t using the REST interface and we can use <code>DBI<\/code> calls to ask Drill what it&#8217;s sending us:<\/p>\n<pre><code class=\"language-r\">dbSendQuery(con, \"SELECT * FROM dfs.media.`\/jpg\/*`\") %>%\n  dbColumnInfo()\n##         field.name        field.type data.type            name\n## 1         FileSize CHARACTER VARYING character        FileSize\n## 2     FileDateTime CHARACTER VARYING character    FileDateTime\n## 3           Format CHARACTER VARYING character          Format\n## 4       PixelWidth CHARACTER VARYING character      PixelWidth\n## 5      PixelHeight CHARACTER VARYING character     PixelHeight\n## 6     BitsPerPixel CHARACTER VARYING character    BitsPerPixel\n## 7         DPIWidth CHARACTER VARYING character        DPIWidth\n## 8        DPIHeight CHARACTER VARYING character       DPIHeight\n## 9       Orientaion CHARACTER VARYING character      Orientaion\n## 10       ColorMode CHARACTER VARYING character       ColorMode\n## 11        HasAlpha CHARACTER VARYING character        HasAlpha\n## 12        Duration CHARACTER VARYING character        Duration\n## 13      VideoCodec CHARACTER VARYING character      VideoCodec\n## 14       FrameRate CHARACTER VARYING character       FrameRate\n## 15      AudioCodec CHARACTER VARYING character      AudioCodec\n## 16 AudioSampleSize CHARACTER VARYING character AudioSampleSize\n## 17 AudioSampleRate CHARACTER VARYING character AudioSampleRate\n## 18            JPEG               MAP character            JPEG\n## 19            JFIF               MAP character            JFIF\n## 20        ExifIFD0               MAP character        ExifIFD0\n## 21      ExifSubIFD               MAP character      ExifSubIFD\n## 22  AppleMakernote               MAP character  AppleMakernote\n## 23             GPS               MAP character             GPS\n## 24             XMP               MAP character             XMP\n## 25    
   Photoshop               MAP character       Photoshop\n## 26            IPTC               MAP character            IPTC\n## 27         Huffman               MAP character         Huffman\n## 28        FileType               MAP character        FileType<\/code><\/pre>\n<p>We can still work with the results, but there&#8217;s also a pretty key element missing: the media <em>filename<\/em>. The reason it&#8217;s not in the listing is that <code>filename<\/code> is an <a href=\"https:\/\/drill.apache.org\/docs\/querying-a-file-system-introduction\/#implicit-columns\">implicit column<\/a> that we have to ask for. So, we need to modify our query to be something like this:<\/p>\n<pre><code class=\"language-r\">tbl(con, sql(\"SELECT filename AS fname, * FROM dfs.media.`\/jpg\/*`\")) %>%\n  glimpse()\n## Observations: ??\n## Variables: 29\n## $ fname           <chr> \"IMG_0778.jpg\", \"IMG_0802.jpg\", \"IMG_0793.jpg\"...\n## $ FileSize        <chr> \"4412686 bytes\", \"4737696 bytes\", \"4253912 byt...\n## $ FileDateTime    <chr> \"Thu Aug 16 03:04:16 -04:00 2018\", \"Thu Aug 16...\n## $ Format          <chr> \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\"...\n## $ PixelWidth      <chr> \"4032\", \"4032\", \"4032\", \"4032\", \"4032\", \"4032\"...\n## $ PixelHeight     <chr> \"3024\", \"3024\", \"3024\", \"3024\", \"3024\", \"3024\"...\n## $ BitsPerPixel    <chr> \"24\", \"24\", \"24\", \"24\", \"24\", \"24\", \"24\", \"24\"...\n## $ DPIWidth        <chr> \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\"...\n## $ DPIHeight       <chr> \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\", \"72\"...\n## $ Orientaion      <chr> \"Unknown (0)\", \"Unknown (0)\", \"Unknown (0)\", \"...\n## $ ColorMode       <chr> \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB...\n## $ HasAlpha        <chr> \"false\", \"false\", \"false\", \"false\", \"false\", \"...\n## $ Duration        <chr> \"00:00:00\", \"00:00:00\", \"00:00:00\", 
\"00:00:00\"...\n## $ VideoCodec      <chr> \"Unknown\", \"Unknown\", \"Unknown\", \"Unknown\", \"U...\n## $ FrameRate       <chr> \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"...\n## $ AudioCodec      <chr> \"Unknown\", \"Unknown\", \"Unknown\", \"Unknown\", \"U...\n## $ AudioSampleSize <chr> \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"...\n## $ AudioSampleRate <chr> \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"0\", \"...\n## $ JPEG            <chr> \"{\\\"CompressionType\\\":\\\"Baseline\\\",\\\"DataPreci...\n## $ JFIF            <chr> \"{\\\"Version\\\":\\\"1.1\\\",\\\"ResolutionUnits\\\":\\\"no...\n## $ ExifIFD0        <chr> \"{\\\"Make\\\":\\\"Apple\\\",\\\"Model\\\":\\\"iPhone 7 Plus...\n## $ ExifSubIFD      <chr> \"{\\\"ExposureTime\\\":\\\"1\/2227 sec\\\",\\\"FNumber\\\":...\n## $ AppleMakernote  <chr> \"{\\\"UnknownTag(0x0001)\\\":\\\"5\\\",\\\"UnknownTag(0x...\n## $ GPS             <chr> \"{\\\"GPSLatitudeRef\\\":\\\"N\\\",\\\"GPSLatitude\\\":\\\"4...\n## $ XMP             <chr> \"{\\\"XMPValueCount\\\":\\\"4\\\",\\\"Photoshop\\\":{\\\"Dat...\n## $ Photoshop       <chr> \"{\\\"CaptionDigest\\\":\\\"48 89 11 77 33 105 192 3...\n## $ IPTC            <chr> \"{\\\"CodedCharacterSet\\\":\\\"UTF-8\\\",\\\"Applicatio...\n## $ Huffman         <chr> \"{\\\"NumberOfTables\\\":\\\"4 Huffman tables\\\"}\", \"...\n## $ FileType        <chr> \"{\\\"DetectedFileTypeName\\\":\\\"JPEG\\\",\\\"Detected...<\/code><\/pre>\n<p>We <em>could<\/em> work with the &#8220;map&#8221; columns with Drill&#8217;s SQL, but this is just metadata and even if there are many files, most R folks have sufficient system memory these days to collect it all and work with it locally. There&#8217;s nothing stopping you from working on the SQL side, though, and it may be a better choice if you&#8217;ll be using this to process huge archives. 
But, we&#8217;ll do this in R and convert a bunch of field types along the way:<\/p>\n<pre><code class=\"language-r\">from_map <- function(x) { map(x, jsonlite::fromJSON)}\n\ntbl(con, sql(\"SELECT filename AS fname, * FROM dfs.media.`\/jpg\/*`\")) %>%\n  collect() %>%\n  mutate_at(\n    .vars = vars(\n      JPEG, JFIF, ExifSubIFD, AppleMakernote, GPS, XMP, Photoshop, IPTC, Huffman, FileType\n    ),\n    .funs=funs(from_map)\n  ) %>%\n  mutate_at(\n    .vars = vars(\n      PixelWidth, PixelHeight, DPIWidth, DPIHeight, FrameRate, AudioSampleSize, AudioSampleRate\n    ),\n    .funs=funs(as.numeric)\n  ) %>%\n  glimpse() -> media_df\n## Observations: 11\n## Variables: 29\n## $ fname           <chr> \"IMG_0778.jpg\", \"IMG_0802.jpg\", \"IMG_0793.jpg\"...\n## $ FileSize        <chr> \"4412686 bytes\", \"4737696 bytes\", \"4253912 byt...\n## $ FileDateTime    <chr> \"Thu Aug 16 03:04:16 -04:00 2018\", \"Thu Aug 16...\n## $ Format          <chr> \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\", \"JPEG\"...\n## $ PixelWidth      <dbl> 4032, 4032, 4032, 4032, 4032, 4032, 3024, 4032...\n## $ PixelHeight     <dbl> 3024, 3024, 3024, 3024, 3024, 3024, 4032, 3024...\n## $ BitsPerPixel    <chr> \"24\", \"24\", \"24\", \"24\", \"24\", \"24\", \"24\", \"24\"...\n## $ DPIWidth        <dbl> 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72\n## $ DPIHeight       <dbl> 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72\n## $ Orientaion      <chr> \"Unknown (0)\", \"Unknown (0)\", \"Unknown (0)\", \"...\n## $ ColorMode       <chr> \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB\", \"RGB...\n## $ HasAlpha        <chr> \"false\", \"false\", \"false\", \"false\", \"false\", \"...\n## $ Duration        <chr> \"00:00:00\", \"00:00:00\", \"00:00:00\", \"00:00:00\"...\n## $ VideoCodec      <chr> \"Unknown\", \"Unknown\", \"Unknown\", \"Unknown\", \"U...\n## $ FrameRate       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\n## $ AudioCodec      <chr> \"Unknown\", \"Unknown\", \"Unknown\", \"Unknown\", \"U...\n## $ 
AudioSampleSize <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\n## $ AudioSampleRate <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\n## $ JPEG            <list> [[\"Baseline\", \"8 bits\", \"3024 pixels\", \"4032 ...\n## $ JFIF            <list> [[\"1.1\", \"none\", \"72 dots\", \"72 dots\", \"0\", \"...\n## $ ExifIFD0        <chr> \"{\\\"Make\\\":\\\"Apple\\\",\\\"Model\\\":\\\"iPhone 7 Plus...\n## $ ExifSubIFD      <list> [[\"1\/2227 sec\", \"f\/1.8\", \"Program normal\", \"2...\n## $ AppleMakernote  <list> [[\"5\", \"[558 values]\", \"[104 values]\", \"1\", \"...\n## $ GPS             <list> [[\"N\", \"44\u00b0 19' 6.3\\\"\", \"W\", \"-68\u00b0 11' 22.39\\...\n## $ XMP             <list> [[\"4\", [\"2017-06-22T14:28:04\"], [\"2017-06-22T...\n## $ Photoshop       <list> [[\"48 89 11 77 33 105 192 33 170 252 63 34 43...\n## $ IPTC            <list> [[\"UTF-8\", \"2\", \"14:28:04\", \"2017:06:22\", \"20...\n## $ Huffman         <list> [[\"4 Huffman tables\"], [\"4 Huffman tables\"], ...\n## $ FileType        <list> [[\"JPEG\", \"Joint Photographic Experts Group\",...<\/code><\/pre>\n<p>Now, we can do anything with the data, including getting the average file size:<\/p>\n<pre><code class=\"language-r\">mutate(media_df, FileSize = str_replace(FileSize, \" bytes\", \"\") %>% as.numeric()) %>%\n  summarise(mean(FileSize))\n## # A tibble: 1 x 1\n##   `mean(FileSize)`\n##              <dbl>\n## 1         3878963.<\/code><\/pre>\n<h3>FIN<\/h3>\n<p>The enhancements to the JDBC interface have only been given a light workout but seem to be doing well so far. Kick the tyres and file an issue if you have any problems. 
ODBC users should not have to wait long for new drivers and <code>src_drill()<\/code> aficionados can keep chugging along as before without issue.<\/p>\n<p>For those new to Apache Drill, there&#8217;s now an <a href=\"https:\/\/drill.apache.org\/docs\/running-drill-on-docker\/\">official Docker image<\/a> for it so you can get up and running without adding too much cruft to your local systems. I may add support for spinning up and managing a Drill container to the <code>sergeant<\/code> package, so keep your eyes on pushes to the repo.<\/p>\n<p>Also keep an eye on <a href=\"https:\/\/rud.is\/books\/drill-sergeant-rstats\/\">the mini-cookbook<\/a> as I&#8217;ll be modifying it to account for the new package changes and introduce additional, new Drill features.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Apache Drill 1.14.0 was recently released, bringing with it many new features and a temporary incompatibility with the current rev of the MapR ODBC drivers. The Drill community expects new ODBC drivers to arrive shortly. The sergeant? 
is an alternative to ODBC for R users as it provides a dplyr interface to the REST API [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[819,781,91],"tags":[],"class_list":["post-11392","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-drill","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Updates to the sergeant (Apache Drill connector) Package &amp; a look at Apache Drill 1.14.0 release - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Updates to the sergeant (Apache Drill connector) Package &amp; a look at Apache Drill 1.14.0 release - rud.is\" \/>\n<meta property=\"og:description\" content=\"Apache Drill 1.14.0 was recently released, bringing with it many new features and a temporary incompatibility with the current rev of the MapR ODBC drivers. The Drill community expects new ODBC drivers to arrive shortly. The sergeant? 
is an alternative to ODBC for R users as it provides a dplyr interface to the REST API [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-08-16T16:49:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-05T16:02:56+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Updates to the sergeant (Apache Drill connector) Package &#038; a look at Apache Drill 1.14.0 release\",\"datePublished\":\"2018-08-16T16:49:17+00:00\",\"dateModified\":\"2018-10-05T16:02:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/\"},\"wordCount\":760,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"articleSection\":[\"Apache 
Drill\",\"drill\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/\",\"url\":\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/\",\"name\":\"Updates to the sergeant (Apache Drill connector) Package & a look at Apache Drill 1.14.0 release - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2018-08-16T16:49:17+00:00\",\"dateModified\":\"2018-10-05T16:02:56+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Updates to the sergeant (Apache Drill connector) Package &#038; a look at Apache Drill 1.14.0 release\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Updates to the sergeant (Apache Drill connector) Package & a look at Apache Drill 1.14.0 release - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/","og_locale":"en_US","og_type":"article","og_title":"Updates to the sergeant (Apache Drill connector) Package & a look at Apache Drill 1.14.0 release - rud.is","og_description":"Apache Drill 1.14.0 was recently released, bringing with it many new features and a temporary incompatibility with the current rev of the MapR ODBC drivers. The Drill community expects new ODBC drivers to arrive shortly. The sergeant? is an alternative to ODBC for R users as it provides a dplyr interface to the REST API [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/","og_site_name":"rud.is","article_published_time":"2018-08-16T16:49:17+00:00","article_modified_time":"2018-10-05T16:02:56+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Updates to the sergeant (Apache Drill connector) Package &#038; a look at Apache Drill 1.14.0 release","datePublished":"2018-08-16T16:49:17+00:00","dateModified":"2018-10-05T16:02:56+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/"},"wordCount":760,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"articleSection":["Apache Drill","drill","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/","url":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/","name":"Updates to the sergeant (Apache Drill connector) Package & a look at Apache Drill 1.14.0 release - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2018-08-16T16:49:17+00:00","dateModified":"2018-10-05T16:02:56+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/08\/16\/updates-to-the-sergeant-apache-drill-connector-package-apache-drill-1-14-0-release\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Updates to the sergeant (Apache Drill connector) Package &#038; a look at Apache Drill 1.14.0 release"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2XK","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":10121,"url":"https:\/\/rud.is\/b\/2018\/04\/20\/painless-odbc-dplyr-connections-to-amazon-athena-and-apache-drill-with-r-odbc\/","url_meta":{"origin":11392,"position":0},"title":"Painless ODBC  + dplyr Connections to Amazon Athena and Apache Drill with R &#038; odbc","author":"hrbrmstr","date":"2018-04-20","format":false,"excerpt":"I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package? 
which connects R up to Amazon Athena via RJDBC. I'm used to JDBC and have to deal with Java separately from R so I'm\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":6111,"url":"https:\/\/rud.is\/b\/2017\/07\/17\/ten-hut-the-apache-drill-r-interface-package-sergeant-is-now-on-cran\/","url_meta":{"origin":11392,"position":1},"title":"Ten-HUT! The Apache Drill R interface package \u2014\u00a0sergeant \u2014\u00a0is now on CRAN","author":"hrbrmstr","date":"2017-07-17","format":false,"excerpt":"I'm extremely pleased to announce that the sergeant package is now on CRAN or will be hitting your local CRAN mirror soon. sergeant provides JDBC, DBI and dplyr\/dbplyr interfaces to Apache Drill. 
I've also wrapped a few goodies into the dplyr custom functions that work with Drill and if you\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4753,"url":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","url_meta":{"origin":11392,"position":2},"title":"sergeant : An R Boot Camp for Apache Drill","author":"hrbrmstr","date":"2016-12-20","format":false,"excerpt":"I recently mentioned that I've been working on a development version of an Apache Drill R package called sergeant. Here's a lifted \"TLDR\" on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift,\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11712,"url":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","url_meta":{"origin":11392,"position":3},"title":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata","author":"hrbrmstr","date":"2019-01-02","format":false,"excerpt":"Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [...] without having to create and manage schemas. [...] 
It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST;\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":12772,"url":"https:\/\/rud.is\/b\/2020\/06\/01\/sergeant-0-9-0-is-on-its-way-to-cran-mirrors\/","url_meta":{"origin":11392,"position":4},"title":"{sergeant} 0.9.0 Is On Its Way to CRAN Mirrors!","author":"hrbrmstr","date":"2020-06-01","format":false,"excerpt":"Tis been a long time coming, but a minor change to default S3 parameters in tibbles finally caused a push of {sergeant} \u2014\u00a0the R package that lets you use the Apache Drill REST API via {DBI}, {dplyr}, or directly \u2014 to CRAN. The CRAN automatic processing system approved the release\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":12855,"url":"https:\/\/rud.is\/b\/2020\/11\/20\/updated-apache-drill-r-jdbc-interface-package-sergeant-caffeinated-with-dbplyr-2-x-compatibility\/","url_meta":{"origin":11392,"position":5},"title":"Updated Apache Drill R JDBC Interface Package {sergeant.caffeinated} With {dbplyr} 2.x Compatibility","author":"hrbrmstr","date":"2020-11-20","format":false,"excerpt":"While the future of the Apache Drill ecosystem is somewhat in-play (MapR \u2014 a major sponsoring org for the project \u2014 is kinda dead), I still use it almost daily (on my local home office cluster) to avoid handing over any more money to Amazon than I\/we already do. 
The\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11392","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11392"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11392\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11392"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11392"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}