

{"id":11088,"date":"2018-07-26T20:28:05","date_gmt":"2018-07-27T01:28:05","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11088"},"modified":"2018-07-26T20:28:05","modified_gmt":"2018-07-27T01:28:05","slug":"two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/","title":{"rendered":"Two new Apache Drill UDFs for Processing UR[IL]s  and Internet Domain Names"},"content":{"rendered":"<p>Continuing the blog&#8217;s UDF theme of late, there are two new UDF kids in town:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/hrbrmstr\/drill-url-tools\"><code>drill-url-tools<\/code><\/a> for slicing &amp; dicing URI\/URLs (just going to use &#8216;URL&#8217; from now on in the post)<\/li>\n<li><a href=\"https:\/\/github.com\/hrbrmstr\/drill-domain-tools\"><code>drill-domain-tools<\/code><\/a> for slicing &amp; dicing internet domain names (IDNs).<\/li>\n<\/ul>\n<p>Now, if you&#8217;re an Apache Drill fanatic, you&#8217;re likely thinking <em>&#8220;Hey hrbrmstr: don&#8217;t you know that Drill has a <a href=\"https:\/\/github.com\/apache\/drill\/blob\/62f14690870568364723dc77494043a9854a0447\/exec\/java-exec\/src\/main\/java\/org\/apache\/drill\/exec\/expr\/fn\/impl\/ParseUrlFunction.java\"><code>parse_url()<\/code><\/a> function already?&#8221;<\/em> My answer is <em>&#8220;Sure, but it&#8217;s based on <code>java.net.URL<\/code> which is <a href=\"http:\/\/galimatias.mola.io\/\">fundamentally broken<\/a>.&#8221;<\/em><\/p>\n<p>Slicing &amp; dicing URLs and IDNs is a large part of the <code>$DAYJOB<\/code> and they go together pretty well, hence the joint UDF release.<\/p>\n<p>Rather than just use boring SQL for an example, we&#8217;ll start with some SQL and use R for a decent example of working with the two new UDFs.<\/p>\n<h3>Counting Lying Lock Icons<\/h3>\n<p>SSL\/TLS is all the rage these days, so 
let&#8217;s see how many distinct sites in the <a href=\"https:\/\/blog.gdeltproject.org\/announcing-gdelt-global-frontpage-graph-gfg\/\">GDELT Global Front Page<\/a> (GFG) data set use port 443 vs port 80 (a good indicator, plus it will help show how the URL tools pick up ports even when they&#8217;re not there).<\/p>\n<p>The aforementioned page explains that the URL of the most current GFG data set can be retrieved by inspecting the contents of <a href=\"http:\/\/data.gdeltproject.org\/gdeltv3\/gfg\/alpha\/lastupdate.txt\">this metadata URL<\/a>.<\/p>\n<p>There are over a million records in that data set but &#8212; as we&#8217;ll see &#8212; not nearly as many distinct hosts.<\/p>\n<p>Let&#8217;s get the data:<\/p>\n<pre><code class=\"language-r\">library(sergeant)\nlibrary(tidyverse)\n\nread_delim(\n  file = \"http:\/\/data.gdeltproject.org\/gdeltv3\/gfg\/alpha\/lastupdate.txt\", \n  delim = \" \", \n  col_names = FALSE,\n  col_types = \"ccc\"\n) -> gfg_update\n\ndl_path <- file.path(\"~\/Data\/gfg_links.tsv.gz\")\n\nif (!file.exists(dl_path)) download.file(gfg_update$X3[1], dl_path)<\/code><\/pre>\n<p>Those operations have placed the GFG data set in a place where my local Drill instance can get to it. It's a tab-separated file (TSV) which &mdash; while not a great data format &mdash; is workable with Drill.<\/p>\n<p>Now we'll set up a SQL query that will parse the URLs and domains, giving us a nice rectangular structure for R &amp; <code>dbplyr<\/code>. 
We'll use the second column since a significant percentage of the URLs in column 6 are malformed:<\/p>\n<pre><code class=\"language-r\">db <- src_drill()\n\ntbl(db, \"(\nSELECT \n  b.host,\n  port,\n  b.rec.hostname AS hostname,\n  b.rec.assigned AS assigned,\n  b.rec.tld AS tld,\n  b.rec.subdomain AS subdomain\nFROM\n  (SELECT\n    host, port, suffix_extract(host) AS rec             -- break the hostname into components\n  FROM\n    (SELECT\n      a.rec.host AS host, a.rec.port AS port\n    FROM\n      (SELECT \n        columns[1] AS url, url_parse(columns[1]) AS rec -- break the URL into components\n      FROM dfs.d.`\/gfg_links.tsv.gz`) a\n    WHERE a.rec.port IS NOT NULL                        -- filter out URL parsing failures\n    )\n  ) b\nWHERE b.rec.tld IS NOT NULL                             -- filter out domain parsing failures\n)\") -> gfg_df\n\ngfg_df\n## # Database: DrillConnection\n##    hostname  port host              subdomain assigned      tld  \n##    <chr>    <int> <chr>             <lgl>     <chr>         <chr>\n##  1 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  2 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  3 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  4 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  5 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  6 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  7 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  8 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n##  9 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n## 10 www         80 www.eestikirik.ee NA        eestikirik.ee ee   \n## # ... 
with more rows<\/code><\/pre>\n<p>While we could have done it all in SQL, we saved some bits for R:<\/p>\n<pre><code class=\"language-r\">distinct(gfg_df, assigned, port) %>% \n  count(port) %>% \n  collect() -> port_counts\n\nport_counts\n# A tibble: 2 x 2\n   port     n\n* <int> <int>\n1    80 20648\n2   443 22178\n<\/code><\/pre>\n<p>You'd think more news-oriented sites would be HTTPS by default given the current global political climate (though those lock icons are no safety panacea by any stretch of the imagination).<\/p>\n<h3>FIN<\/h3>\n<p>Now, R <em>can<\/em> do URL &amp; IDN slicing, but Drill can operate at scale. That is, R's <code>urltools<\/code> package may be fine for single-node, in-memory ops, but Drill can process billions of URLs when running as part of a cluster.<\/p>\n<p>I'm not 100% settled on the <code>galimatias<\/code> library for URL parsing (I need to do some extended testing) and I may add some less-strict IDN slicing &amp; dicing functions as well.<\/p>\n<p>Kick the tyres &amp; file issues &amp; PRs as necessary.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Continuing the blog&#8217;s UDF theme of late, there are two new UDF kids in town: drill-url-tools for slicing &amp; dicing URI\/URLs (just going to use &#8216;URL&#8217; from now on in the post) drill-domain-tools for slicing &amp; dicing internet domain names (IDNs). 
Now, if you&#8217;re an Apache Drill fanatic, you&#8217;re likely thinking &#8220;Hey hrbrmstr: don&#8217;t you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[819,781,91],"tags":[],"class_list":["post-11088","post","type-post","status-publish","format-standard","hentry","category-apache-drill","category-drill","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names - rud.is\" \/>\n<meta property=\"og:description\" content=\"Continuing the blog&#8217;s UDF theme of late, there are two new UDF kids in town: drill-url-tools? for slicing &amp; dicing URI\/URLs (just going to use &#8216;URL&#8217; from now on in the post) drill-domain-tools? for slicing &amp; dicing internet domain names (IDNs). 
Now, if you&#8217;re an Apache Drill fanatic, you&#8217;re likely thinking &#8220;Hey hrbrmstr: don&#8217;t you [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-07-27T01:28:05+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names\",\"datePublished\":\"2018-07-27T01:28:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/\"},\"wordCount\":458,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"articleSection\":[\"Apache 
Drill\",\"drill\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/\",\"url\":\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/\",\"name\":\"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2018-07-27T01:28:05+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/","og_locale":"en_US","og_type":"article","og_title":"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names - rud.is","og_description":"Continuing the blog&#8217;s UDF theme of late, there are two new UDF kids in town: drill-url-tools? for slicing &amp; dicing URI\/URLs (just going to use &#8216;URL&#8217; from now on in the post) drill-domain-tools? for slicing &amp; dicing internet domain names (IDNs). Now, if you&#8217;re an Apache Drill fanatic, you&#8217;re likely thinking &#8220;Hey hrbrmstr: don&#8217;t you [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/","og_site_name":"rud.is","article_published_time":"2018-07-27T01:28:05+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names","datePublished":"2018-07-27T01:28:05+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/"},"wordCount":458,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"articleSection":["Apache Drill","drill","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/","url":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/","name":"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2018-07-27T01:28:05+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/07\/26\/two-new-apache-drill-udfs-for-processing-urils-and-internet-domain-names\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Two new Apache Drill UDFs for Processing UR[IL]s and Internet Domain Names"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=
460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2SQ","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":11082,"url":"https:\/\/rud.is\/b\/2018\/07\/22\/new-apache-drill-udf-for-processing-twitter-tweet-text\/","url_meta":{"origin":11088,"position":0},"title":"New Apache Drill UDF for Processing Twitter Tweet Text","author":"hrbrmstr","date":"2018-07-22","format":false,"excerpt":"There are many ways to gather Twitter data for analysis and many R and Python (et al) libraries make full use of the Twitter API when building a corpus to extract useful metadata for each tweet along with the text of each tweet. However, many corpus archives are minimal and\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":6237,"url":"https:\/\/rud.is\/b\/2017\/09\/11\/increasing-output-buffer-size-in-apache-drill-udfs-custom-simple-functions\/","url_meta":{"origin":11088,"position":1},"title":"Increasing Output Buffer Size in Apache Drill UDFs  Custom (Simple) Functions","author":"hrbrmstr","date":"2017-09-11","format":false,"excerpt":"Putting this here to make it easier for others who try to Google this topic to find it w\/o having to find and tediously search through other UDFs (user-defined functions). 
I was\/am making a custom UDF for base64 decoding\/encoding and ran into: It's incredibly easy to \"fix\" (and, if my\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10121,"url":"https:\/\/rud.is\/b\/2018\/04\/20\/painless-odbc-dplyr-connections-to-amazon-athena-and-apache-drill-with-r-odbc\/","url_meta":{"origin":11088,"position":2},"title":"Painless ODBC  + dplyr Connections to Amazon Athena and Apache Drill with R &#038; odbc","author":"hrbrmstr","date":"2018-04-20","format":false,"excerpt":"I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package? which connects R up to Amazon Athena via RJDBC. I'm used to JDBC and have to deal with Java separately from R so I'm\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":6127,"url":"https:\/\/rud.is\/b\/2017\/07\/27\/reading-pcap-files-with-apache-drill-and-the-sergeant-r-package\/","url_meta":{"origin":11088,"position":3},"title":"Reading PCAP Files with Apache Drill and the sergeant R 
Package","author":"hrbrmstr","date":"2017-07-27","format":false,"excerpt":"It's no secret that I'm a fan of Apache Drill. One big strength of the platform is that it normalizes the access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hie, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4852,"url":"https:\/\/rud.is\/b\/2017\/01\/08\/2017-01-authored-package-updates\/","url_meta":{"origin":11088,"position":4},"title":"2017-01 Authored Package Updates","author":"hrbrmstr","date":"2017-01-08","format":false,"excerpt":"The rest of the month is going to be super-hectic and it's unlikely I'll be able to do any more to help the push to CRAN 10K, so here's a breakdown of CRAN and GitHub new packages & package updates that I felt were worth raising awareness on: epidata I\u2026","rel":"","context":"In &quot;dplyr&quot;","block_context":{"text":"dplyr","link":"https:\/\/rud.is\/b\/category\/dplyr\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/epi2.png?fit=982%2C1200&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/epi2.png?fit=982%2C1200&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/epi2.png?fit=982%2C1200&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2017\/01\/epi2.png?fit=982%2C1200&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":11479,"url":"https:\/\/rud.is\/b\/2018\/09\/09\/driving-drill-dynamically-with-docker-and-updating-storage-configurations-on-the-fly-with-sergeant\/","url_meta":{"origin":11088,"position":5},"title":"Driving Drill Dynamically with Docker and Updating Storage Configurations 
On-the-fly with sergeant","author":"hrbrmstr","date":"2018-09-09","format":false,"excerpt":"The sergeant? package has a minor update that adds REST API coverage for two \"new\" storage endpoints that make it possible to add, update and remove storage configurations on-the-fly without using the GUI or manually updating a config file. This is an especially handy feature when paired with Drill's new,\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11088","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11088"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11088\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11088"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11088"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11088"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}