

{"id":11712,"date":"2019-01-02T17:24:17","date_gmt":"2019-01-02T22:24:17","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11712"},"modified":"2019-01-02T17:24:17","modified_gmt":"2019-01-02T22:24:17","slug":"apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","title":{"rendered":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata"},"content":{"rendered":"<p><a href=\"https:\/\/drill.apache.org\/\">Apache Drill<\/a> is <em>an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [&#8230;] without having to create and manage schemas. [&#8230;] It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST; [is] extremely user and developer friendly; [and, has a] pluggable architecture enables connectivity to multiple datastores<\/em>.<\/p>\n<p>To ring in the new year the Drill team knocked out a new <a href=\"https:\/\/drill.apache.org\/docs\/apache-drill-1-15-0-release-notes\/\">1.15.0 release<\/a> with a cadre of new functionality including:<\/p>\n<ul>\n<li><a href=\"https:\/\/drill.apache.org\/docs\/from-clause\/#join-types\">CROSS JOIN<\/a> support<\/li>\n<li>New metadata tables for <a href=\"https:\/\/drill.apache.org\/docs\/querying-the-information-schema\/#files\">files<\/a>, <a href=\"https:\/\/drill.apache.org\/docs\/querying-system-tables\/#querying-the-functions-table\">system functiosn<\/a>, and <a href=\"https:\/\/drill.apache.org\/docs\/querying-system-tables\/#querying-the-options-table\">system functions<\/a><\/li>\n<li><a href=\"https:\/\/issues.apache.org\/jira\/browse\/DRILL-6179\"><code>pcapng<\/code><\/a> support<\/li>\n<li>enhanced datetime functions<\/li>\n<li>better security<\/li>\n<li>and, web UI improvements<\/li>\n<\/ul>\n<p>One super-helpful new feature of the REST API is that it now returns query results metadata along with the query results themselves. This means REST API endpoints finally know both column order <em>and<\/em> column type. This gave me cause to re-visit the <code>sergeant<\/code> package [<a href=\"https:\/\/gitlab.com\/hrbrmstr\/sergeant\/tree\/0.8.0\">GL<\/a>|<a href=\"https:\/\/github.com\/hrbrmstr\/sergeant\/tree\/0.8.0\">GH<\/a>] and make some accommodations for some of these new features.<\/p>\n<h3>Ushering In A New Order<\/h3>\n<p>Drill REST API queries return a <code>\"columns\"<\/code> field and <code>\"metadata\"<\/code> field with the data itself. We can use that to force an order to the columns as well as <em>mostly<\/em> use proper types (vs JSON-parsed\/guessed types). I say <em>mostly<\/em> since the package still uses <code>jsonlite<\/code> to parse the results and there&#8217;s no support for 64-bit integers in <code>jsonlite<\/code> (more on this later).<\/p>\n<p>We&#8217;ll use the example from <a href=\"https:\/\/issues.apache.org\/jira\/browse\/DRILL-6847\">DRILL-6847<\/a> and use the example provided by Charles Givre in his Jira issue since it will let me demonstrate more of that &#8220;<em>mostly<\/em>&#8221; comment and show off another new feature:<\/p>\n<pre><code class=\"language-r\">library(sergeant) # 0.8.0 branch of sergeant on gitlab or github\nlibrary(tidyverse)\n\ncon &lt;- src_drill(\"localhost\")\n\nx &lt;- tbl(con, \"cp.`employee.json`\")\n\nmutate(x, employee_id = as.integer64(employee_id)) %&gt;% \n  mutate(position_id = as.integer64(position_id)) %&gt;% \n  select(\n    employee_id, full_name, first_name, last_name, \n    position_id, position_title\n  ) -&gt; bigint_result\n<\/code><\/pre>\n<p>The above is (logically):<\/p>\n<pre><code class=\"language-sql\">SELECT \n  CAST (employee_id AS INT) AS employee_id,\n  full_name,\n  first_name, \n  last_name, \n  CAST (position_id AS BIGINT) AS position_id, \n  position_title \nFROM cp.`employee.json`\n<\/code><\/pre>\n<p>What do we get when we take a preview of the result?<\/p>\n<pre><code class=\"language-r\">bigint_result\n## # Source:   lazy query [?? x 6]\n## # Database: DrillConnection\n##    employee_id full_name  first_name last_name position_id position_title \n##          &lt;dbl&gt; &lt;chr&gt;      &lt;chr&gt;      &lt;chr&gt;           &lt;dbl&gt; &lt;chr&gt;          \n##  1           1 Sheri Now\u2026 Sheri      Nowmer              1 President      \n##  2           2 Derrick W\u2026 Derrick    Whelply             2 VP Country Man\u2026\n##  3           4 Michael S\u2026 Michael    Spence              2 VP Country Man\u2026\n##  4           5 Maya Guti\u2026 Maya       Gutierrez           2 VP Country Man\u2026\n##  5           6 Roberta D\u2026 Roberta    Damstra             3 VP Information\u2026\n##  6           7 Rebecca K\u2026 Rebecca    Kanagaki            4 VP Human Resou\u2026\n##  7           8 Kim Brunn\u2026 Kim        Brunner            11 Store Manager  \n##  8           9 Brenda Bl\u2026 Brenda     Blumberg           11 Store Manager  \n##  9          10 Darren St\u2026 Darren     Stanz               5 VP Finance     \n## 10          11 Jonathan \u2026 Jonathan   Murraiin           11 Store Manager  \n## # ... with more rows\n<\/code><\/pre>\n<pre><code class=\"language-plain\">Warning message:\nOne or more columns are of type BIGINT. The sergeant package currently uses jsonlite::fromJSON()\nto process Drill REST API result sets. Since jsonlite does not support 64-bit integers BIGINT \ncolumns are initially converted to numeric since that's how jsonlite::fromJSON() works. This is\nproblematic for many reasons, including trying to use 'dplyr' idioms with said converted \nBIGINT-to-numeric columns. It is recommended that you 'CAST' BIGINT columns to 'VARCHAR' prior to\nworking with them from R\/'dplyr'.\n\nIf you really need BIGINT\/integer64 support, consider using the R ODBC interface to Apache Drill \nwith the MapR ODBC drivers.\n\nThis informational warning will only be shown once per R session and you can disable them from \nappearing by setting the 'sergeant.bigint.warnonce' option to 'FALSE' \n(i.e. options(sergeant.bigint.warnonce = FALSE)). \n<\/code><\/pre>\n<p>The first thing <code>sergeant<\/code> users will notice is proper column order (before it just returned the columns in the order they came back in the JSON <code>rows[]<\/code> structure). The second thing is that we didn&#8217;t get <code>integer64<\/code>s back. Instead, we got <code>double<\/code>s plus an information warning about why and what you can do about it. Said warning only displays once per-session and can be silenced with the option <code>sergeant.bigint.warnonce<\/code>. i.e. just put:<\/p>\n<pre><code class=\"language-r\">options(sergeant.bigint.warnonce = FALSE)\n<\/code><\/pre>\n<p>in your script or <code>~\/.Rprofile<\/code> and you won&#8217;t hear from it again.<\/p>\n<p>The <code>as.integer64()<\/code> we used is not from the <code>bit64<\/code> package but an internal <code>sergeant<\/code> package function that knows how to translate said operation to, e.g. <code>CAST( employee_id AS BIGINT )<\/code>.<\/p>\n<p>You can use the ODBC drivers to gain BIGINT support and there are plans for the 0.8.0 branch to eventually use <code>rapidjsonr<\/code> at the C++-level to provide direct in-package support for BIGINTs as well.<\/p>\n<h3>Better Error Messages<\/h3>\n<p>Drill query errors that the <code>sergeant<\/code> package bubbled up through its various interfaces have not been pretty or all that useful. This has changed with the 0.8.0 branch. Let&#8217;s take a look:<\/p>\n<pre><code class=\"language-r\">tbl(con, \"cp.employees.json\")\n## # Source:   table&lt;cp.employees.json&gt; [?? x 4]\n## # Database: DrillConnection\n<\/code><\/pre>\n<pre><code class=\"language-plain\">Warning message:\nVALIDATION ERROR: From line 2, column 6 to line 2, column 24: Object 'cp.employees.json' not found\n\nOriginal Query:\n\n  1: SELECT *\n  2: FROM `cp.employees.json`\n  3: LIMIT 10\n\nQuery Profile Error Link:\nhttp:\/\/localhost:8047\/profiles\/079fc8cf-19c6-4c78-95a9-0b949a3ecf4c \n<\/code><\/pre>\n<p>As you can see in the above output, you now get a highly-formatted return value with the original SQL query broken into lines (with line numbers) and a full link to the Drill query profile so you can dig in to the gnarly details of complex query issues. As you work with this and find edge cases I missed for messages, drop an issue on your social-coding site of choice.<\/p>\n<h3>SUPPORT ALL THE PCAPs!<\/h3>\n<p>Drill has had packet capture (PCAP) file support for a while now and 1.15.0 adds support for the more modern\/rich <a href=\"http:\/\/pcapng.com\/\"><code>pcapng<\/code> format<\/a>. To enable support for this you need to add <code>\"pcapng\": {\"type\": \"pcapng\", \"extensions\": [\"pcapng\"] },<\/code> to the <code>\"formats\"<\/code> section of your storage plugins and also configure a workspace directory to use that as the default (the principle of which is <a href=\"https:\/\/rud.is\/books\/drill-sergeant-rstats\/adding-or-modifying-drill-formats.html\">covered here<\/a>).<\/p>\n<p>We&#8217;ll use one of the <a href=\"https:\/\/wiki.wireshark.org\/SampleCaptures\">Wireshark example captures<\/a> to demonstrate:<\/p>\n<pre><code class=\"language-r\">pcaps &lt;- tbl(con, \"dfs.caps.`*.pcapng`\")\n\nglimpse(pcaps)\n## Observations: ??\n## Variables: 25\n## $ tcp_flags_ece_ecn_capable            &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ tcp_flags_ece_congestion_experienced &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ tcp_flags_psh                        &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ type                                 &lt;chr&gt; \"TCP\", \"TCP\", \"TCP\", \"TCP...\n## $ tcp_flags_cwr                        &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ dst_ip                               &lt;chr&gt; \"74.125.28.139\", \"10.254....\n## $ src_ip                               &lt;chr&gt; \"10.254.157.208\", \"74.125...\n## $ tcp_flags_fin                        &lt;int&gt; 1, 1, 0, 0, 0, 0, 0, 0, 0...\n## $ tcp_flags_ece                        &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ tcp_flags                            &lt;int&gt; 17, 17, 16, 16, 16, 0, 0,...\n## $ tcp_flags_ack                        &lt;int&gt; 1, 1, 1, 1, 1, 0, 0, 0, 0...\n## $ src_mac_address                      &lt;chr&gt; \"00:05:9A:3C:7A:00\", \"00:...\n## $ tcp_flags_syn                        &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ tcp_flags_rst                        &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ timestamp                            &lt;dttm&gt; 2015-04-14 07:19:25, 201...\n## $ tcp_session                          &lt;dbl&gt; 8.353837e+17, 8.353837e+1...\n## $ packet_data                          &lt;chr&gt; \"\\\"3DU...&lt;z...E..(J.@.......\n## $ tcp_parsed_flags                     &lt;chr&gt; \"ACK|FIN\", \"ACK|FIN\", \"AC...\n## $ tcp_flags_ns                         &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ src_port                             &lt;int&gt; 60268, 443, 60268, 58382,...\n## $ packet_length                        &lt;int&gt; 54, 54, 54, 55, 66, 78, 7...\n## $ tcp_flags_urg                        &lt;int&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0...\n## $ tcp_ack                              &lt;int&gt; 662445631, 1496589825, 66...\n## $ dst_port                             &lt;int&gt; 443, 60268, 443, 29216, 5...\n## $ dst_mac_address                      &lt;chr&gt; \"00:11:22:33:44:55\", \"00:...\n\ncount(pcaps, src_ip, dst_ip, sort=TRUE)\n## # Source:     lazy query [?? x 3]\n## # Database:   DrillConnection\n## # Groups:     src_ip\n## # Ordered by: desc(n)\n##    src_ip         dst_ip             n\n##    &lt;chr&gt;          &lt;chr&gt;          &lt;dbl&gt;\n##  1 10.254.157.208 10.254.158.25    298\n##  2 10.254.158.25  10.254.157.208   204\n##  3 174.137.42.81  10.254.157.208    76\n##  4 10.254.157.208 10.254.158.8      54\n##  5 10.254.158.8   10.254.157.208    49\n##  6 74.125.28.102  10.254.157.208    49\n##  7 10.254.157.208 74.125.28.102     44\n##  8 10.254.157.208 174.137.42.81     41\n##  9 54.84.98.25    10.254.157.208    25\n## 10 157.55.56.168  10.254.157.208    25\n## # ... with more rows\n<\/code><\/pre>\n<p>More work appears to be planned by the Drill team to enable digging into the packet (binary) contents.<\/p>\n<h3>Drill Metadata As Data<\/h3>\n<p>Drill has provided ways to lookup Drill operational information as actual tables but the Drill team has added support for even more metadata-as-data queries.<\/p>\n<p>First up is finally having better access to filesystem information. Prior to 1.15.0 one could get file and path attributes as part of other queries, but now we can treat filesystems as actual data. Let&#8217;s list all the PCAPs in the above workspace:<\/p>\n<pre><code class=\"language-r\">tbl(con, \"information_schema.`schemata`\") %&gt;% \n  filter(SCHEMA_NAME == \"dfs.caps\") %&gt;% \n  print() %&gt;% \n  pull(SCHEMA_NAME) -&gt; pcap_schema\n## # Source:   lazy query [?? x 9]\n## # Database: DrillConnection\n##   CATALOG_NAME SCHEMA_NAME SCHEMA_OWNER TYPE  IS_MUTABLE\n##   &lt;chr&gt;        &lt;chr&gt;       &lt;chr&gt;        &lt;chr&gt; &lt;chr&gt;     \n## 1 DRILL        dfs.caps    &lt;owner&gt;      file  NO\n\ntbl(con, \"information_schema.`files`\") %&gt;% \n  filter(schema_name == pcap_schema) %&gt;% \n  glimpse()\n## Observations: ??\n## Variables: 13\n## $ SCHEMA_NAME       &lt;chr&gt; \"dfs.caps\"\n## $ ROOT_SCHEMA_NAME  &lt;chr&gt; \"dfs\"\n## $ WORKSPACE_NAME    &lt;chr&gt; \"caps\"\n## $ FILE_NAME         &lt;chr&gt; \"dof-short-capture.pcapng\"\n## $ RELATIVE_PATH     &lt;chr&gt; \"dof-short-capture.pcapng\"\n## $ IS_DIRECTORY      &lt;lgl&gt; FALSE\n## $ IS_FILE           &lt;lgl&gt; TRUE\n## $ LENGTH            &lt;dbl&gt; 634280\n## $ OWNER             &lt;chr&gt; \"hrbrmstr\"\n## $ GROUP             &lt;chr&gt; \"staff\"\n## $ PERMISSION        &lt;chr&gt; \"rw-r--r--\"\n## $ ACCESS_TIME       &lt;dttm&gt; 1969-12-31 19:00:00\n## $ MODIFICATION_TIME &lt;dttm&gt; 2019-01-01 19:12:17\n<\/code><\/pre>\n<p>The Drill system <code>options<\/code> table now has full descriptions for the options and also provides a new table that knows about all of Drills functions and all <em>your<\/em> custom UDFs. <code>drill_opts()<\/code> and <code>drill_functions()<\/code> return a data frame of all this info and have an optional <code>browse<\/code> parameter which, if set to <code>TRUE<\/code>, will show a <code>DT<\/code> interactive data table for them. I find this especially handy when I forget something like <code>regexp_like<\/code> syntax (I use <em>alot<\/em> of back-ends and many are wildly different) and can now do this:<\/p>\n<p><a href=\"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/drill-dt\/\" rel=\"attachment wp-att-11713\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"11713\" data-permalink=\"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/drill-dt\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?fit=1822%2C678&amp;ssl=1\" data-orig-size=\"1822,678\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"drill-dt\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?fit=510%2C190&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?resize=510%2C190&#038;ssl=1\" alt=\"\" width=\"510\" height=\"190\" class=\"aligncenter size-full wp-image-11713\" \/><\/a><\/p>\n<h3>FIN<\/h3>\n<p>Keep on the lookout for the <code>rapidjsonr<\/code>\/BIGINT integration and more new features of the <code>sergeant<\/code> package. NOTE: The better error messages have been ported over to the <code>sergeant.caffeinated<\/code> package (the RJDBC interface) and the other niceties will make their way into that package soon as well.<\/p>\n<p>So, make sure you&#8217;re using the <code>0.8.0<\/code> <a href=\"https:\/\/gitlab.com\/hrbrmstr\/sergeant\/tree\/0.8.0\">GL<\/a> \/ <a href=\"https:\/\/github.com\/hrbrmstr\/sergeant\/tree\/0.8.0\">GH<\/a>, kick the tyres, file issues where you&#8217;re most comfortable working.<\/p>\n<p>May your queries all be optimized and results sets complete in the new year!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [&#8230;] without having to create and manage schemas. [&#8230;] It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST; [is] extremely user and developer [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[819,779,781,91],"tags":[],"class_list":["post-11712","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-dplyr","category-drill","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &amp; Mounds of New Metadata - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &amp; Mounds of New Metadata - rud.is\" \/>\n<meta property=\"og:description\" content=\"Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [&#8230;] without having to create and manage schemas. [&#8230;] It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST; [is] extremely user and developer [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2019-01-02T22:24:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata\",\"datePublished\":\"2019-01-02T22:24:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/\"},\"wordCount\":857,\"commentCount\":3,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/01\\\/drill-dt.png\",\"articleSection\":[\"Apache Drill\",\"dplyr\",\"drill\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/\",\"name\":\"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types & Mounds of New Metadata - rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/01\\\/drill-dt.png\",\"datePublished\":\"2019-01-02T22:24:17+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/01\\\/drill-dt.png?fit=1822%2C678&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2019\\\/01\\\/drill-dt.png?fit=1822%2C678&ssl=1\",\"width\":1822,\"height\":678},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2019\\\/01\\\/02\\\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types & Mounds of New Metadata - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","og_locale":"en_US","og_type":"article","og_title":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types & Mounds of New Metadata - rud.is","og_description":"Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [&#8230;] without having to create and manage schemas. [&#8230;] It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST; [is] extremely user and developer [&hellip;]","og_url":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","og_site_name":"rud.is","article_published_time":"2019-01-02T22:24:17+00:00","og_image":[{"url":"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata","datePublished":"2019-01-02T22:24:17+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/"},"wordCount":857,"commentCount":3,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png","articleSection":["Apache Drill","dplyr","drill","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","url":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","name":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types & Mounds of New Metadata - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png","datePublished":"2019-01-02T22:24:17+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?fit=1822%2C678&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?fit=1822%2C678&ssl=1","width":1822,"height":678},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-32U","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":4753,"url":"https:\/\/rud.is\/b\/2016\/12\/20\/sergeant-a-r-boot-camp-for-apache-drill\/","url_meta":{"origin":11712,"position":0},"title":"sergeant : An R Boot Camp for Apache Drill","author":"hrbrmstr","date":"2016-12-20","format":false,"excerpt":"I recently mentioned that I've been working on a development version of an Apache Drill R package called sergeant. Here's a lifted \"TLDR\" on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift,\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11088,"url":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/","url_meta":{"origin":11712,"position":1},"title":"Two new Apache Drill UDFs for Processing UR[IL]s  and Internet Domain Names","author":"hrbrmstr","date":"2018-07-26","format":false,"excerpt":"Continuing the blog's UDF theme of late, there are two new UDF kids in town: drill-url-tools? for slicing & dicing URI\/URLs (just going to use 'URL' from now on in the post) drill-domain-tools? for slicing & dicing internet domain names (IDNs). Now, if you're an Apache Drill fanatic, you're likely\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11082,"url":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/","url_meta":{"origin":11712,"position":2},"title":"New Apache Drill UDF for Processing Twitter Tweet Text","author":"hrbrmstr","date":"2018-07-22","format":false,"excerpt":"There are many ways to gather Twitter data for analysis and many R and Python (et al) libraries make full use of the Twitter API when building a corpus to extract useful metadata for each tweet along with the text of each tweet. However, many corpus archives are minimal and\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6091,"url":"https:\/\/rud.is\/b\/2017\/06\/17\/replicating-the-apache-drill-yelp-academic-dataset-with-sergeant\/","url_meta":{"origin":11712,"position":3},"title":"Replicating the Apache Drill &#8216;Yelp&#8217; Academic Dataset Analysis with sergeant","author":"hrbrmstr","date":"2017-06-17","format":false,"excerpt":"The Apache Drill folks have a nice walk-through tutorial on how to analyze the Yelp Academic Dataset with Drill. It's a bit out of date (the current Yelp data set structure is different enough that the tutorial will error out at various points), but it's a great example of how\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4929,"url":"https:\/\/rud.is\/b\/2017\/01\/22\/create-parquet-files-from-r-data-frames-with-sergeant-apache-drill-a-k-a-make-parquet-files-great-again-in-r\/","url_meta":{"origin":11712,"position":4},"title":"Create Parquet Files From R Data Frames With sergeant &#038; Apache Drill (a.k.a. Make Parquet Files Great Again in R)","author":"hrbrmstr","date":"2017-01-22","format":false,"excerpt":"2021-11-04 UPDATE: Just use {arrow}. Apache Drill is a nice tool to have in the toolbox as it provides a SQL front-end to a wide array of database and file back-ends and runs in standalone\/embedded mode on every modern operating system (i.e. you can get started with or play locally\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10121,"url":"https:\/\/rud.is\/b\/2018\/04\/20\/painless-odbc-dplyr-connections-to-amazon-athena-and-apache-drill-with-r-odbc\/","url_meta":{"origin":11712,"position":5},"title":"Painless ODBC  + dplyr Connections to Amazon Athena and Apache Drill with R &#038; odbc","author":"hrbrmstr","date":"2018-04-20","format":false,"excerpt":"I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package? which connects R up to Amazon Athena via RJDBC. I'm used to JDBC and have to deal with Java separately from R so I'm\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=700%2C400 2x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11712","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11712"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11712\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11712"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11712"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11712"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}