

{"id":11403,"date":"2018-08-17T10:56:00","date_gmt":"2018-08-17T15:56:00","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11403"},"modified":"2018-10-05T11:03:45","modified_gmt":"2018-10-05T16:03:45","slug":"in-brief-using-bro-connection-logs-with-apache-drill","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/","title":{"rendered":"In-brief: Using Bro connection logs with Apache Drill"},"content":{"rendered":"<p>If you&#8217;ve got a directory full of <a href=\"https:\/\/www.bro.org\/\">Bro NSM<\/a> logs, it&#8217;s easy to work with them in Apache Drill since they&#8217;re just tab-separated values (TSV) files by default. The most tedious part is mapping the columns to proper types and hopefully this saves at least one person from typing it out manually:<\/p>\n<pre><code class=\"language-sql\">SELECT \n  TO_TIMESTAMP(CAST(columns[0] AS DOUBLE)) AS ts,\n                    columns[1]             AS uid,\n                    columns[2]             AS id_orig_h,\n                    columns[3]             AS id_orig_p,\n                    columns[4]             AS id_resp_h,\n                    columns[5]             AS id_resp_p,\n                    columns[6]             AS proto,\n                    columns[7]             AS service,\n              CAST( columns[8] AS DOUBLE)  AS duration,\n              CAST( columns[9] AS INTEGER) AS orig_bytes,\n              CAST(columns[10] AS INTEGER) AS resp_bytes,\n                   columns[11]             AS conn_state,\n                   columns[12]             AS local_orig,\n                   columns[13]             AS local_resp,\n              CAST(columns[14] AS INTEGER) AS missed_bytes,\n                   columns[15]             AS history,\n              CAST(columns[16] AS INTEGER) AS orig_packets,\n              CAST(columns[17] AS INTEGER) AS orig_ip_bytes,\n              CAST(columns[18] AS INTEGER) AS resp_pkts,\n              CAST(columns[19] AS INTEGER) AS resp_ip_bytes,\n                   columns[20]             AS tunnel_parents\nFROM dfs.brologs.`\/201808\/*`<\/code><\/pre>\n<p>You can either store them all under a single <a href=\"https:\/\/rud.is\/books\/drill-sergeant-rstats\/adding-a-new-workspace-to-drill.html\">workspace<\/a> with a default input type or soft-link\/rename them to end in <code>.tsv<\/code> (it&#8217;s unlikely you want to change <em>all<\/em> <code>.log<\/code> files to be read as TSV everywhere).<\/p>\n<p>While you <em>could<\/em> just use the logs this way, consider using <a href=\"https:\/\/drill.apache.org\/docs\/create-table-as-ctas\/\">CTAS<\/a> to move them to Parquet. The above will created typed columns and the queries will generally be much faster.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;ve got a directory full of Bro NSM logs, it&#8217;s easy to work with them in Apache Drill since they&#8217;re just tab-separated values (TSV) files by default. The most tedious part is mapping the columns to proper types and hopefully this saves at least one person from typing it out manually: SELECT TO_TIMESTAMP(CAST(columns[0] AS [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[819,781],"tags":[],"class_list":["post-11403","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-drill"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>In-brief: Using Bro connection logs with Apache Drill - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"In-brief: Using Bro connection logs with Apache Drill - rud.is\" \/>\n<meta property=\"og:description\" content=\"If you&#8217;ve got a directory full of Bro NSM logs, it&#8217;s easy to work with them in Apache Drill since they&#8217;re just tab-separated values (TSV) files by default. The most tedious part is mapping the columns to proper types and hopefully this saves at least one person from typing it out manually: SELECT TO_TIMESTAMP(CAST(columns[0] AS [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-08-17T15:56:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-10-05T16:03:45+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"In-brief: Using Bro connection logs with Apache Drill\",\"datePublished\":\"2018-08-17T15:56:00+00:00\",\"dateModified\":\"2018-10-05T16:03:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/\"},\"wordCount\":131,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"articleSection\":[\"Apache Drill\",\"drill\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/\",\"url\":\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/\",\"name\":\"In-brief: Using Bro connection logs with Apache Drill - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2018-08-17T15:56:00+00:00\",\"dateModified\":\"2018-10-05T16:03:45+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"In-brief: Using Bro connection logs with Apache Drill\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"In-brief: Using Bro connection logs with Apache Drill - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/","og_locale":"en_US","og_type":"article","og_title":"In-brief: Using Bro connection logs with Apache Drill - rud.is","og_description":"If you&#8217;ve got a directory full of Bro NSM logs, it&#8217;s easy to work with them in Apache Drill since they&#8217;re just tab-separated values (TSV) files by default. The most tedious part is mapping the columns to proper types and hopefully this saves at least one person from typing it out manually: SELECT TO_TIMESTAMP(CAST(columns[0] AS [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/","og_site_name":"rud.is","article_published_time":"2018-08-17T15:56:00+00:00","article_modified_time":"2018-10-05T16:03:45+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"In-brief: Using Bro connection logs with Apache Drill","datePublished":"2018-08-17T15:56:00+00:00","dateModified":"2018-10-05T16:03:45+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/"},"wordCount":131,"commentCount":1,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"articleSection":["Apache Drill","drill"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/","url":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/","name":"In-brief: Using Bro connection logs with Apache Drill - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2018-08-17T15:56:00+00:00","dateModified":"2018-10-05T16:03:45+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/08\/17\/in-brief-using-bro-connection-logs-with-apache-drill\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"In-brief: Using Bro connection logs with Apache Drill"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2XV","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":11712,"url":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","url_meta":{"origin":11403,"position":0},"title":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata","author":"hrbrmstr","date":"2019-01-02","format":false,"excerpt":"Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [...] without having to create and manage schemas. [...] It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST;\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6046,"url":"https:\/\/rud.is\/b\/2017\/05\/31\/drilling-into-csvs-teaser-trailer\/","url_meta":{"origin":11403,"position":1},"title":"Drilling Into CSVs \u2014 Teaser Trailer","author":"hrbrmstr","date":"2017-05-31","format":false,"excerpt":"I used reading a directory of CSVs as the foundational example in my recent post on idioms. During my exchange with Matt, Hadley and a few others -- in the crazy Twitter thread that spawned said post -- I mentioned that I'd personally \"just use Drill\u201d. I'll use this post\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4929,"url":"https:\/\/rud.is\/b\/2017\/01\/22\/create-parquet-files-from-r-data-frames-with-sergeant-apache-drill-a-k-a-make-parquet-files-great-again-in-r\/","url_meta":{"origin":11403,"position":2},"title":"Create Parquet Files From R Data Frames With sergeant &#038; Apache Drill (a.k.a. Make Parquet Files Great Again in R)","author":"hrbrmstr","date":"2017-01-22","format":false,"excerpt":"2021-11-04 UPDATE: Just use {arrow}. Apache Drill is a nice tool to have in the toolbox as it provides a SQL front-end to a wide array of database and file back-ends and runs in standalone\/embedded mode on every modern operating system (i.e. you can get started with or play locally\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":12561,"url":"https:\/\/rud.is\/b\/2019\/12\/18\/quickly-create-mostly-responsive-html-columns-with-htmltools\/","url_meta":{"origin":11403,"position":3},"title":"Quickly Create (Mostly) Responsive HTML Columns With {htmltools}","author":"hrbrmstr","date":"2019-12-18","format":false,"excerpt":"I had need to present a wall-of-text to show off a giant list of SSL certificate alternate names and needed the entire list to fit on one slide (not really for reading in full, but to show just how many there were in a way that a simple count would\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11737,"url":"https:\/\/rud.is\/b\/2019\/01\/09\/on-the-road-to-0-8-0-some-additional-new-features-coming-in-the-sergeant-package\/","url_meta":{"origin":11403,"position":4},"title":"On the Road to 0.8.0 \u2014 Some Additional New Features Coming in the sergeant Package","author":"hrbrmstr","date":"2019-01-09","format":false,"excerpt":"It was probably not difficult to discern from my previous Drill-themed post that I'm fairly excited about the Apache Drill 1.15.0 release. I've rounded out most of the existing corners for it in preparation for a long-overdue CRAN update and have been concentrating on two helper features: configuring & launching\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6146,"url":"https:\/\/rud.is\/b\/2017\/08\/01\/r%e2%81%b6-reticulating-parquet-files\/","url_meta":{"origin":11403,"position":5},"title":"R\u2076 \u2014 Reticulating Parquet Files","author":"hrbrmstr","date":"2017-08-01","format":false,"excerpt":"The reticulate package provides a very clean & concise interface bridge between R and Python which makes it handy to work with modules that have yet to be ported to R (going native is always better when you can do it). This post shows how to use reticulate to create\u2026","rel":"","context":"In &quot;data wrangling&quot;","block_context":{"text":"data wrangling","link":"https:\/\/rud.is\/b\/category\/data-wrangling\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11403","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11403"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11403\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11403"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11403"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11403"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}