{"id":11082,"date":"2018-07-22T15:47:23","date_gmt":"2018-07-22T20:47:23","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11082"},"modified":"2018-07-22T15:47:23","modified_gmt":"2018-07-22T20:47:23","slug":"new-apache-drill-udf-for-processing-twitter-tweet-text","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/","title":{"rendered":"New Apache Drill UDF for Processing Twitter Tweet Text"},"content":{"rendered":"<p>There are many ways to gather Twitter data for analysis and many R and Python (et al) libraries make full use of the Twitter API when building a corpus to extract useful metadata for each tweet along with the text of each tweet. However, many corpus archives are minimal and only retain a small portion of the metadata &mdash; often just tweet timestamp, the tweet creator and the tweet text &mdash; leaving to the analyst the trudging work of re-extracting hashtags, mentions, URLs (etc).<\/p>\n<p>Twitter provides a tweet-text processing library for many languages. One of these languages is <a href=\"https:\/\/github.com\/twitter\/twitter-text\/tree\/master\/java\">Java<\/a>. Since it makes sense to perform at-scale data operations in Apache Drill, it also seemed to make sense that Apache Drill could use a tweet metadata extraction set of user-defined functions (UDFs). Plus, there just aren&#8217;t enough examples of Drill UDFs out there. 
Thus begat <a href=\"https:\/\/github.com\/hrbrmstr\/drill-twitter-text\"><code>drill-twitter-text<\/code><\/a>.<\/p>\n<h3>What&#8217;s Inside the Tin?<\/h3>\n<p>There are five UDFs in the package:<\/p>\n<ul>\n<li><code>tw_parse_tweet(string)<\/code>: Parses the tweet text and returns a map column with the following named values:\n<ul>\n<li><code>weightedLength<\/code>: (int) the overall length of the tweet with code points weighted per the ranges defined in the configuration file<\/li>\n<li><code>permillage<\/code>: (int) indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value &gt; 1000 indicates input text that is longer than the allowable maximum.<\/li>\n<li><code>isValid<\/code>: (boolean) indicates if input text length corresponds to a valid result.<\/li>\n<li><code>display_start<\/code> \/ <code>display_end<\/code>: (int) indices identifying the inclusive start and exclusive end of the displayable content of the Tweet.<\/li>\n<li><code>valid_start<\/code> \/ <code>valid_end<\/code>: (int) indices identifying the inclusive start and exclusive end of the valid content of the Tweet.<\/li>\n<\/ul>\n<\/li>\n<li><code>tw_extract_hashtags(string)<\/code>: Extracts all hashtags in the tweet text into a list which can be <code>FLATTEN()<\/code>ed.<\/li>\n<li><code>tw_extract_screennames(string)<\/code>: Extracts all screennames in the tweet text into a list which can be <code>FLATTEN()<\/code>ed.<\/li>\n<li><code>tw_extract_urls(string)<\/code>: Extracts all URLs in the tweet text into a list which can be <code>FLATTEN()<\/code>ed.<\/li>\n<li><code>tw_extract_reply_screenname(string)<\/code>: Extracts the reply screenname (if any) from the tweet text into a <code>VARCHAR<\/code>.<\/li>\n<\/ul>\n<p>The repo has all the necessary bits and info to help you compile and load the necessary JARs, but those in a hurry can just copy all the files in the <a 
href=\"https:\/\/github.com\/hrbrmstr\/drill-twitter-text\/tree\/master\/target\"><code>target<\/code><\/a> directory to your local <code>jars\/3rdparty<\/code> directory and restart Drill.<\/p>\n<h3>Usage<\/h3>\n<p>Here&#8217;s an example of how to call each UDF along with the output:<\/p>\n<pre><code class=\"language-sql\">SELECT \n  tw_extract_screennames(tweetText) AS mentions,\n  tw_extract_hashtags(tweetText) AS tags,\n  tw_extract_urls(tweetText) AS urls,\n  tw_extract_reply_screenname(tweetText) AS reply_to,\n  tw_parse_tweet(tweetText) AS tweet_meta\nFROM\n  (SELECT \n     '@youThere Load data from #Apache Drill to @QlikSense - #Qlik Tuesday Tips and Tricks #ApacheDrill #BigData https:\/\/t.co\/fkAJokKF5O https:\/\/t.co\/bxdNCiqdrE' AS tweetText\n   FROM (VALUES((1))))<\/code>\n\n+----------+------+------+----------+------------+\n| mentions | tags | urls | reply_to | tweet_meta |\n+----------+------+------+----------+------------+\n| [\"youThere\",\"QlikSense\"] | [\"Apache\",\"Qlik\",\"ApacheDrill\",\"BigData\"] | [\"https:\/\/t.co\/fkAJokKF5O\",\"https:\/\/t.co\/bxdNCiqdrE\"] | youThere | {\"weightedLength\":154,\"permillage\":550,\"isValid\":true,\"display_start\":0,\"display_end\":153,\"valid_start\":0,\"valid_end\":153} |\n+----------+------+------+----------+------------+<\/pre>\n<h3>FIN<\/h3>\n<p>Kick the tyres and file <a href=\"https:\/\/github.com\/hrbrmstr\/drill-twitter-text\/issues\">issues<\/a> and <a href=\"https:\/\/github.com\/hrbrmstr\/drill-twitter-text\/pulls\">PRs<\/a> as needed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are many ways to gather Twitter data for analysis and many R and Python (et al) libraries make full use of the Twitter API when building a corpus to extract useful metadata for each tweet along with the text of each tweet. 
However, many corpus archives are minimal and only retain a small portion [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":"","jetpack_post_was_ever_published":false},"categories":[819,781,778,655,820],"tags":[],"class_list":["post-11082","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-drill","category-sql","category-twitter-2","category-udf"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>New Apache Drill UDF for Processing Twitter Tweet Text - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"New Apache Drill UDF for Processing Twitter Tweet Text - rud.is\" \/>\n<meta property=\"og:description\" content=\"There are many ways to gather Twitter data for analysis and many R and Python (et al) libraries make full use of the Twitter API when building a corpus to extract useful metadata for each tweet along with the text of each tweet. 
However, many corpus archives are minimal and only retain a small portion [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-07-22T20:47:23+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"New Apache Drill UDF for Processing Twitter Tweet Text\",\"datePublished\":\"2018-07-22T20:47:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/\"},\"wordCount\":362,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"articleSection\":[\"Apache 
Drill\",\"drill\",\"SQL\",\"twitter\",\"UDF\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/\",\"name\":\"New Apache Drill UDF for Processing Twitter Tweet Text - rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"datePublished\":\"2018-07-22T20:47:23+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2018\\\/07\\\/22\\\/new-apache-drill-udf-for-processing-twitter-tweet-text\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"New Apache Drill UDF for Processing Twitter Tweet Text\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"New Apache Drill UDF for Processing Twitter Tweet Text - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/","og_locale":"en_US","og_type":"article","og_title":"New Apache Drill UDF for Processing Twitter Tweet Text - rud.is","og_description":"There are many ways to gather Twitter data for analysis and many R and Python (et al) libraries make full use of the Twitter API when building a corpus to extract useful metadata for each tweet along with the text of each tweet. However, many corpus archives are minimal and only retain a small portion [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/","og_site_name":"rud.is","article_published_time":"2018-07-22T20:47:23+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"New Apache Drill UDF for Processing Twitter Tweet Text","datePublished":"2018-07-22T20:47:23+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/"},"wordCount":362,"commentCount":0,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"articleSection":["Apache Drill","drill","SQL","twitter","UDF"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/","url":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/","name":"New Apache Drill UDF for Processing Twitter Tweet Text - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2018-07-22T20:47:23+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"New Apache Drill UDF for Processing Twitter Tweet Text"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does 
\u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2SK","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":11088,"url":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/","url_meta":{"origin":11082,"position":0},"title":"Two new Apache Drill UDFs for Processing UR[IL]s  and Internet Domain Names","author":"hrbrmstr","date":"2018-07-26","format":false,"excerpt":"Continuing the blog's UDF theme of late, there are two new UDF kids in town: drill-url-tools? for slicing & dicing URI\/URLs (just going to use 'URL' from now on in the post) drill-domain-tools? for slicing & dicing internet domain names (IDNs). Now, if you're an Apache Drill fanatic, you're likely\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6237,"url":"https:\/\/rud.is\/b\/2017\/09\/11\/increasing-output-buffer-size-in-apache-drill-udfs-custom-simple-functions\/","url_meta":{"origin":11082,"position":1},"title":"Increasing Output Buffer Size in Apache Drill UDFs  Custom (Simple) Functions","author":"hrbrmstr","date":"2017-09-11","format":false,"excerpt":"Putting this here to make it easier for others who try to Google this topic to find it w\/o having to find and tediously search through other UDFs (user-defined functions). 
I was\/am making a custom UDF for base64 decoding\/encoding and ran into: It's incredibly easy to \"fix\" (and, if my\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6091,"url":"https:\/\/rud.is\/b\/2017\/06\/17\/replicating-the-apache-drill-yelp-academic-dataset-with-sergeant\/","url_meta":{"origin":11082,"position":2},"title":"Replicating the Apache Drill &#8216;Yelp&#8217; Academic Dataset Analysis with sergeant","author":"hrbrmstr","date":"2017-06-17","format":false,"excerpt":"The Apache Drill folks have a nice walk-through tutorial on how to analyze the Yelp Academic Dataset with Drill. It's a bit out of date (the current Yelp data set structure is different enough that the tutorial will error out at various points), but it's a great example of how\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6127,"url":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","url_meta":{"origin":11082,"position":3},"title":"Reading PCAP Files with Apache Drill and the sergeant R Package","author":"hrbrmstr","date":"2017-07-27","format":false,"excerpt":"It's no secret that I'm a fan of Apache Drill. 
One big strength of the platform is that it normalizes the access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hie, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11712,"url":"https:\/\/rud.is\/b\/2019\/01\/02\/apache-drill-1-15-0-sergeant-0-8-0-pcapng-support-proper-column-types-mounds-of-new-metadata\/","url_meta":{"origin":11082,"position":4},"title":"Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types &#038; Mounds of New Metadata","author":"hrbrmstr","date":"2019-01-02","format":false,"excerpt":"Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores [...] without having to create and manage schemas. [...] It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC\/JDBC, and HTTP[S] REST;\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/01\/drill-dt.png?resize=1400%2C800&ssl=1 
4x"},"classes":[]},{"id":7637,"url":"https:\/\/rud.is\/b\/2017\/12\/20\/r%e2%81%b6-series-random-sampling-from-apache-drill-tables-with-r-sergeant\/","url_meta":{"origin":11082,"position":5},"title":"R\u2076 Series \u2014 Random Sampling From Apache Drill Tables With R &#038; sergeant","author":"hrbrmstr","date":"2017-12-20","format":false,"excerpt":"(For first-timers, R\u2076 tagged posts are short & sweet with minimal expository; R\u2076 feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w\/o working across billions of rows. I also use Apache Drill for much of my exploratory work.\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11082","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11082"}],"version-history":[{"count":4,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11082\/revisions"}],"predecessor-version":[{"id":11086,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11082\/revisions\/11086"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11082"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11082"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11082"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}