

{"id":12938,"date":"2021-02-02T23:10:21","date_gmt":"2021-02-03T04:10:21","guid":{"rendered":"https:\/\/rud.is\/b\/?p=12938"},"modified":"2021-02-02T23:10:21","modified_gmt":"2021-02-03T04:10:21","slug":"amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/","title":{"rendered":"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again"},"content":{"rendered":"<p>I was chatting with a fellow Amazon Athena user and the topic of using Presto functions such as <a href=\"https:\/\/prestodb.io\/docs\/0.217\/functions\/aggregate.html#approx_distinct\">approx_distinct()<\/a> via {d[b]plyr} came up and it seems it might not be fully common knowledge that any non-already <a href=\"https:\/\/dbplyr.tidyverse.org\/articles\/translation-function.html\">translated function<\/a> is passed to the destination intact. That means you can just &#8220;use&#8221; <code>approx_distinct()<\/code> and it will work just fine. Here&#8217;s an example using the ODBC {DBI} interface:<\/p>\n<pre><code class=\"language-r\">library(dbplyr)\nlibrary(tidyverse)\n\n# My personal Athena workgroup has been upgraded to \"engine 2\"\n# so Presto 0.217 functions are available. Only noting that for\n# folks who may not keep up with AWS announcements.\n#\n# https:\/\/prestodb.io\/docs\/0.217\/index.html\n\nDBI::dbConnect(\n  odbc::odbc(),\n  driver = \"\/Library\/simba\/athenaodbc\/lib\/libathenaodbc_sbu.dylib\",\n  Schema = \"sampledb\",\n  AwsRegion = \"us-east-1\",\n  AuthenticationType = \"IAM Profile\",\n  AWSProfile = \"personal\",\n  MaxCatalogNameLen = 0L,\n  MaxSchemaNameLen = 0L,\n  MaxColumnNameLen = 0L,\n  MaxTableNameLen = 0L,\n  UseResultsetStreaming = 1L,\n  StringColumnLength = 32 * 1024L,\n  S3OutputLocation = \"s3:\/\/accessible-bucket\/\"\n) -&gt; con\n\n# this comes with Athena\nelb_logs &lt;- tbl(con, \"elb_logs\")\n\nelb_logs\n## # Source:   table&lt;elb_logs&gt; [?? x 16]\n## # Database: Amazon Athena 01.00.0000[@Amazon Athena\/AwsDataCatalog]\n##    timestamp elbname requestip requestport backendip backendport\n##    &lt;chr&gt;     &lt;chr&gt;   &lt;chr&gt;           &lt;int&gt; &lt;chr&gt;           &lt;int&gt;\n##  1 2014-09-\u2026 lb-demo 251.51.8\u2026       17141 251.111.\u2026        8000\n##  2 2014-09-\u2026 lb-demo 244.201.\u2026       17141 244.140.\u2026        8888\n##  3 2014-09-\u2026 lb-demo 242.204.\u2026       17141 255.196.\u2026        8888\n##  4 2014-09-\u2026 lb-demo 251.51.8\u2026       17141 255.129.\u2026        8888\n##  5 2014-09-\u2026 lb-demo 242.241.\u2026       17141 255.129.\u2026        8899\n##  6 2014-09-\u2026 lb-demo 243.198.\u2026       17141 255.129.\u2026        8888\n##  7 2014-09-\u2026 lb-demo 244.119.\u2026       17141 242.89.1\u2026          80\n##  8 2014-09-\u2026 lb-demo 254.173.\u2026       17141 251.51.8\u2026        8000\n##  9 2014-09-\u2026 lb-demo 243.198.\u2026       17141 254.149.\u2026        8888\n## 10 2014-09-\u2026 lb-demo 249.185.\u2026       17141 241.36.2\u2026        8888\n## # \u2026 with more rows, and 10 more variables: requestprocessingtime &lt;dbl&gt;,\n## #   backendprocessingtime &lt;dbl&gt;, clientresponsetime &lt;dbl&gt;,\n## #   elbresponsecode &lt;chr&gt;, backendresponsecode &lt;chr&gt;,\n## #   receivedbytes &lt;int64&gt;, sentbytes &lt;int64&gt;, requestverb &lt;chr&gt;,\n## #   url &lt;chr&gt;, protocol &lt;chr&gt;\n\nelb_logs %&gt;% \n  summarise(d = n_distinct(backendip)) # 0.62 seconds\n## # Source:   lazy query [?? x 1]\n## # Database: Amazon Athena 01.00.0000[@Amazon Athena\/AwsDataCatalog]\n##         d\n##   &lt;int64&gt;\n## 1    2311\n\n# https:\/\/prestodb.io\/docs\/0.217\/functions\/aggregate.html#approx_distinct\n\nelb_logs %&gt;% \n  summarise(d = approx_distinct(backendip)) # 0.49 seconds\n## # Source:   lazy query [?? x 1]\n## # Database: Amazon Athena 01.00.0000[@Amazon Athena\/AwsDataCatalog]\n##         d\n##   &lt;int64&gt;\n## 1    2386\n<\/code><\/pre>\n<p>In this toy example there&#8217;s no real reason to use this alternate function, but on my datasets using the approximator version dramatically reduces query time, reduces query cost, and produces results that by default have a standard error of 2.3% (which is fine for the use-cases I apply this to). There&#8217;s an alternate signature which lets you supply the standard error, as well.<\/p>\n<p>If you&#8217;re curious as to what functions are translated by default, just use <code>sql_translate_env()<\/code> on the connection object:<\/p>\n<pre><code class=\"language-r\">sql_translate_env(con)\n## &lt;sql_variant&gt;\n## scalar:    -, :, !, !=, (, [, [[, {, *, \/, &amp;, &amp;&amp;, %\/%, %%, %&gt;%,\n## scalar:    %in%, ^, +, &lt;, &lt;=, ==, &gt;, &gt;=, |, ||, $, abs, acos,\n## scalar:    as_date, as_datetime, as.character, as.Date,\n## scalar:    as.double, as.integer, as.integer64, as.logical,\n## scalar:    as.numeric, as.POSIXct, asin, atan, atan2, between,\n## scalar:    bitwAnd, bitwNot, bitwOr, bitwShiftL, bitwShiftR,\n## scalar:    bitwXor, c, case_when, ceil, ceiling, coalesce, cos,\n## scalar:    cosh, cot, coth, day, desc, exp, floor, hour, if,\n## scalar:    if_else, ifelse, is.na, is.null, log, log10, mday,\n## scalar:    minute, month, na_if, nchar, now, paste, paste0, pmax,\n## scalar:    pmin, qday, round, second, sign, sin, sinh, sql, sqrt,\n## scalar:    str_c, str_conv, str_count, str_detect, str_dup,\n## scalar:    str_extract, str_extract_all, str_flatten, str_glue,\n## scalar:    str_glue_data, str_interp, str_length, str_locate,\n## scalar:    str_locate_all, str_match, str_match_all, str_order,\n## scalar:    str_pad, str_remove, str_remove_all, str_replace,\n## scalar:    str_replace_all, str_replace_na, str_sort, str_split,\n## scalar:    str_split_fixed, str_squish, str_sub, str_subset,\n## scalar:    str_to_lower, str_to_title, str_to_upper, str_trim,\n## scalar:    str_trunc, str_view, str_view_all, str_which,\n## scalar:    str_wrap, substr, substring, switch, tan, tanh, today,\n## scalar:    tolower, toupper, trimws, wday, xor, yday, year\n## aggregate: cume_dist, cummax, cummean, cummin, cumsum,\n## aggregate: dense_rank, first, lag, last, lead, max, mean, median,\n## aggregate: min, min_rank, n, n_distinct, nth, ntile, order_by,\n## aggregate: percent_rank, quantile, rank, row_number, sd, sum, var\n## window:    cume_dist, cummax, cummean, cummin, cumsum,\n## window:    dense_rank, first, lag, last, lead, max, mean, median,\n## window:    min, min_rank, n, n_distinct, nth, ntile, order_by,\n## window:    percent_rank, quantile, rank, row_number, sd, sum, var\n<\/code><\/pre>\n<p>The release of the latest versions of {d[b]plyr} destroyed a lazy, bad, hack I was using to cast columns to JSON (you&#8217;ll note the lack of a <code>cast()<\/code> function above, which is necessary for Athena since the <a href=\"https:\/\/prestodb.io\/docs\/0.217\/functions\/conversion.html\">syntax is not that of a function call<\/a>). I&#8217;m _very_glad they did since it&#8217;s bad to rely on undocumented functionality and, honestly, it&#8217;s pretty <a href=\"https:\/\/dbplyr.tidyverse.org\/reference\/sql_variant.html\">straightforward to make an &#8220;official&#8221; translation for them<\/a>.<\/p>\n<p>First, we need the class of this Athena ODBC connection:<\/p>\n<pre><code class=\"language-r\">class(con)\n## [1] \"Amazon Athena\"\n## attr(,\"package\")\n## [1] \".GlobalEnv\"\n<\/code><\/pre>\n<p>We&#8217;ll need to write a <code>sql_translation.Amazon Athena()<\/code> function for this connection class and we&#8217;ll start with writing one that doesn&#8217;t handle our casting just to show the basic setup:<\/p>\n<pre><code class=\"language-r\">`sql_translation.Amazon Athena` &lt;- function(x) {\n  sql_variant(\n    dbplyr::base_odbc_scalar,\n    dbplyr::base_odbc_agg,\n    dbplyr::base_odbc_win\n  )\n}\n<\/code><\/pre>\n<p>All that function is doing (now) is setting up the default translators you&#8217;ve seen in the above output listings.<\/p>\n<p>To make it do something else, we need to add casting translator helpers, which fall under the &#8220;scalar&#8221; category. This, too, is pretty straightforward since {dbplyr} makes it possible to just extend a parent set of category translators:<\/p>\n<pre><code class=\"language-r\">sql_translator(\n  .parent = dbplyr::base_odbc_scalar,\n  cast_as = function(x, y) dbplyr::build_sql(\"CAST(\", x, \" AS \", y, \")\"),\n  try_cast_as = function(x, y) dbplyr::build_sql(\"TRY_CAST(\", x, \" AS \", y, \")\")\n) -&gt; athena_scalar\n\n`sql_translation.Amazon Athena` &lt;- function(x) {\n  sql_variant(\n    athena_scalar,\n    dbplyr::base_odbc_agg,\n    dbplyr::base_odbc_win\n  )\n}\n<\/code><\/pre>\n<p>Now, let&#8217;s see if it <em>really<\/em> knows about our new casting functions:<\/p>\n<pre><code class=\"language-r\">sql_translate_env(con)\n## &lt;sql_variant&gt;\n## scalar:    -, :, !, !=, (, [, [[, {, *, \/, &amp;, &amp;&amp;, %\/%, %%, %&gt;%,\n## scalar:    %in%, ^, +, &lt;, &lt;=, ==, &gt;, &gt;=, |, ||, $, abs, acos,\n## scalar:    as_date, as_datetime, as.character, as.Date,\n## scalar:    as.double, as.integer, as.integer64, as.logical,\n## scalar:    as.numeric, as.POSIXct, asin, atan, atan2, between,\n## scalar:    bitwAnd, bitwNot, bitwOr, bitwShiftL, bitwShiftR,\n## scalar:    bitwXor, c, case_when, cast_as, ceil, ceiling,\n## scalar:    coalesce, cos, cosh, cot, coth, day, desc, exp, floor,\n## scalar:    hour, if, if_else, ifelse, is.na, is.null, log, log10,\n## scalar:    mday, minute, month, na_if, nchar, now, paste, paste0,\n## scalar:    pmax, pmin, qday, round, second, sign, sin, sinh, sql,\n## scalar:    sqrt, str_c, str_conv, str_count, str_detect, str_dup,\n## scalar:    str_extract, str_extract_all, str_flatten, str_glue,\n## scalar:    str_glue_data, str_interp, str_length, str_locate,\n## scalar:    str_locate_all, str_match, str_match_all, str_order,\n## scalar:    str_pad, str_remove, str_remove_all, str_replace,\n## scalar:    str_replace_all, str_replace_na, str_sort, str_split,\n## scalar:    str_split_fixed, str_squish, str_sub, str_subset,\n## scalar:    str_to_lower, str_to_title, str_to_upper, str_trim,\n## scalar:    str_trunc, str_view, str_view_all, str_which,\n## scalar:    str_wrap, substr, substring, switch, tan, tanh, today,\n## scalar:    tolower, toupper, trimws, try_cast_as, wday, xor,\n## scalar:    yday, year\n## aggregate: cume_dist, cummax, cummean, cummin, cumsum,\n## aggregate: dense_rank, first, lag, last, lead, max, mean, median,\n## aggregate: min, min_rank, n, n_distinct, nth, ntile, order_by,\n## aggregate: percent_rank, quantile, rank, row_number, sd, sum, var\n## window:    cume_dist, cummax, cummean, cummin, cumsum,\n## window:    dense_rank, first, lag, last, lead, max, mean, median,\n## window:    min, min_rank, n, n_distinct, nth, ntile, order_by,\n## window:    percent_rank, quantile, rank, row_number, sd, sum, var\n<\/code><\/pre>\n<p>Aye! Let&#8217;s test it out.<\/p>\n<p>Unfortunately, this boring, default database has no <code>MAP<\/code> columns to really show this off, but we can convert a simple character column into JSON just to get the idea:<\/p>\n<pre><code class=\"language-r\">elb_logs %&gt;% \n  select(backendip)\n## # Source:   lazy query [?? x 1]\n## # Database: Amazon Athena 01.00.0000[@Amazon Athena\/AwsDataCatalog]\n##    backendip      \n##    &lt;chr&gt;          \n##  1 249.6.80.219   \n##  2 248.178.189.65 \n##  3 254.70.228.23  \n##  4 248.178.189.65 \n##  5 252.0.81.65    \n##  6 248.178.189.65 \n##  7 245.241.133.121\n##  8 244.202.183.67 \n##  9 255.226.190.127\n## 10 246.22.152.210 \n## # \u2026 with more rows\n\nelb_logs %&gt;% \n  select(backendip) %&gt;% \n  mutate(\n    backendip = cast_as(backendip, JSON)\n  )\n## # Source:   lazy query [?? x 1]\n## # Database: Amazon Athena 01.00.0000[@Amazon Athena\/AwsDataCatalog]\n##    backendip            \n##    &lt;chr&gt;                \n##  1 \"\\\"244.238.214.120\\\"\"\n##  2 \"\\\"248.99.214.228\\\"\" \n##  3 \"\\\"243.3.190.175\\\"\"  \n##  4 \"\\\"246.235.181.255\\\"\"\n##  5 \"\\\"241.112.203.216\\\"\"\n##  6 \"\\\"240.147.242.82\\\"\" \n##  7 \"\\\"248.99.214.228\\\"\" \n##  8 \"\\\"248.99.214.228\\\"\" \n##  9 \"\\\"253.161.243.121\\\"\"\n## 10 \"\\\"248.99.214.228\\\"\" \n## # \u2026 with more rows\n<\/code><\/pre>\n<h3>FIN<\/h3>\n<p>Despite the {tidyverse} documentation being written with care and clarity, this part of the R ecosystem is so extensive and evolving that watching out for all the doors and corners can be tricky. It&#8217;s easy for the short paragraph on the &#8220;untranslated function&#8221; capability to be overlooked and it may be hard to fully grok the translation concept without an IRL example.<\/p>\n<p>Hopefully this helped (even if only a little) demystify these two areas of {d[b]plyr}.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was chatting with a fellow Amazon Athena user and the topic of using Presto functions such as approx_distinct() via {d[b]plyr} came up and it seems it might not be fully common knowledge that any non-already translated function is passed to the destination intact. That means you can just &#8220;use&#8221; approx_distinct() and it will work [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91],"tags":[],"class_list":["post-12938","post","type-post","status-publish","format-standard","hentry","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again - rud.is\" \/>\n<meta property=\"og:description\" content=\"I was chatting with a fellow Amazon Athena user and the topic of using Presto functions such as approx_distinct() via {d[b]plyr} came up and it seems it might not be fully common knowledge that any non-already translated function is passed to the destination intact. That means you can just &#8220;use&#8221; approx_distinct() and it will work [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2021-02-03T04:10:21+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again\",\"datePublished\":\"2021-02-03T04:10:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/\"},\"wordCount\":464,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"articleSection\":[\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/\",\"url\":\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/\",\"name\":\"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2021-02-03T04:10:21+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/","og_locale":"en_US","og_type":"article","og_title":"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again - rud.is","og_description":"I was chatting with a fellow Amazon Athena user and the topic of using Presto functions such as approx_distinct() via {d[b]plyr} came up and it seems it might not be fully common knowledge that any non-already translated function is passed to the destination intact. That means you can just &#8220;use&#8221; approx_distinct() and it will work [&hellip;]","og_url":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/","og_site_name":"rud.is","article_published_time":"2021-02-03T04:10:21+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again","datePublished":"2021-02-03T04:10:21+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/"},"wordCount":464,"commentCount":4,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"articleSection":["R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/","url":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/","name":"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2021-02-03T04:10:21+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2021\/02\/02\/amazon-athena-dbplyr-implicit-usage-of-presto-functions-and-making-json-casting-great-again\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Amazon Athena {dbplyr} Implicit Usage of Presto Functions and Making JSON Casting Great Again"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-3mG","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":11070,"url":"https:\/\/rud.is\/b\/2018\/07\/14\/alleviating-aws-athena-aggravation-with-asynchronous-assistance\/","url_meta":{"origin":12938,"position":0},"title":"Alleviating AWS Athena Aggravation with Asynchronous Assistance","author":"hrbrmstr","date":"2018-07-14","format":false,"excerpt":"I've blogged about how to use Amazon Athena with R before and if you are a regular Athena user, you've likely run into a situation where you prepare a dplyr chain, fire off a collect() and then wait. And, wait. And, wait. And, wait. Queries that take significant processing time\u2026","rel":"","context":"In &quot;athena&quot;","block_context":{"text":"athena","link":"https:\/\/rud.is\/b\/category\/athena\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11921,"url":"https:\/\/rud.is\/b\/2019\/02\/17\/conquering-caffeinated-amazon-athena-with-the-metis-trio-of-packages\/","url_meta":{"origin":12938,"position":1},"title":"Conquering Caffeinated Amazon Athena with the metis Trio of Packages","author":"hrbrmstr","date":"2019-02-17","format":false,"excerpt":"I must preface this post with the posit that if you're doing anything interactive() with Amazon Athena you should seriously consider just using their free ODBC drivers as it's the easiest way to wire them up to R DBI- and tidyverse-wise. I've said as much in previous posts. Drop a\u2026","rel":"","context":"In &quot;athena&quot;","block_context":{"text":"athena","link":"https:\/\/rud.is\/b\/category\/athena\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11077,"url":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/","url_meta":{"origin":12938,"position":2},"title":"A new &#8216;boto3&#8217; Amazon Athena client wrapper with dplyr async query support","author":"hrbrmstr","date":"2018-07-20","format":false,"excerpt":"A previous post explored how to deal with Amazon Athena queries asynchronously. The function presented is a beast, though it is on purpose (to provide options for folks). In reality, nobody really wants to use rJava wrappers much anymore and dealing with icky Python library calls directly just feels wrong,\u2026","rel":"","context":"In &quot;athena&quot;","block_context":{"text":"athena","link":"https:\/\/rud.is\/b\/category\/athena\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10121,"url":"https:\/\/rud.is\/b\/2018\/04\/20\/painless-odbc-dplyr-connections-to-amazon-athena-and-apache-drill-with-r-odbc\/","url_meta":{"origin":12938,"position":3},"title":"Painless ODBC  + dplyr Connections to Amazon Athena and Apache Drill with R &#038; odbc","author":"hrbrmstr","date":"2018-04-20","format":false,"excerpt":"I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package? which connects R up to Amazon Athena via RJDBC. I'm used to JDBC and have to deal with Java separately from R so I'm\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":12138,"url":"https:\/\/rud.is\/b\/2019\/04\/10\/lost-in-sql-translation-charting-dbplyr-mapped-sql-function-support-across-all-backends\/","url_meta":{"origin":12938,"position":4},"title":"Lost In [SQL] Translation: Charting d[b]plyr Mapped SQL Function Support Across All Backends","author":"hrbrmstr","date":"2019-04-10","format":false,"excerpt":"Like more posts than I care to admit, this one starts innocently enough with a tweet by @gshotwell: Is there a reference document somewhere of which dplyr commands work on various database backends? #rstats\u2014 Gordon Shotwell (@gshotwell) April 9, 2019 Since I use at least 4 different d[b]plyr backends every\u2026","rel":"","context":"In &quot;dplyr&quot;","block_context":{"text":"dplyr","link":"https:\/\/rud.is\/b\/category\/dplyr\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/backend-heatmap.jpg?fit=1200%2C659&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/backend-heatmap.jpg?fit=1200%2C659&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/backend-heatmap.jpg?fit=1200%2C659&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/backend-heatmap.jpg?fit=1200%2C659&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2019\/04\/backend-heatmap.jpg?fit=1200%2C659&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":6111,"url":"https:\/\/rud.is\/b\/2017\/07\/17\/ten-hut-the-apache-drill-r-interface-package-sergeant-is-now-on-cran\/","url_meta":{"origin":12938,"position":5},"title":"Ten-HUT! The Apache Drill R interface package \u2014\u00a0sergeant \u2014\u00a0is now on CRAN","author":"hrbrmstr","date":"2017-07-17","format":false,"excerpt":"I'm extremely pleased to announce that the sergeant package is now on CRAN or will be hitting your local CRAN mirror soon. sergeant provides JDBC, DBI and dplyr\/dbplyr interfaces to Apache Drill. I've also wrapped a few goodies into the dplyr custom functions that work with Drill and if you\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12938","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=12938"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12938\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=12938"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=12938"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=12938"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}