

{"id":11077,"date":"2018-07-20T10:32:02","date_gmt":"2018-07-20T15:32:02","guid":{"rendered":"https:\/\/rud.is\/b\/?p=11077"},"modified":"2018-07-21T15:09:10","modified_gmt":"2018-07-21T20:09:10","slug":"a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/","title":{"rendered":"A new &#8216;boto3&#8217; Amazon Athena client wrapper with dplyr async query support"},"content":{"rendered":"<p>A <a href=\"https:\/\/rud.is\/b\/2018\/07\/14\/alleviating-aws-athena-aggravation-with-asynchronous-assistance\/\">previous post<\/a> explored how to deal with Amazon Athena queries asynchronously. The function presented is a <em>beast<\/em>, though it is on purpose (to provide options for folks).<\/p>\n<p>In reality, nobody really wants to use <code>rJava<\/code> wrappers much anymore and dealing with icky Python library calls directly just feels wrong, plus Python functions often return truly daft\/ugly data structures. R users deserve better than that.<\/p>\n<p>R is not a first-class citizen in Amazon-land and while the <a href=\"https:\/\/github.com\/cloudyr\"><code>cloudyr<\/code><\/a> project does a fine job building native-R packages for various Amazon services, the fact remains that the <em>official<\/em> Amazon SDKs are in other languages. The <code>reticulate<\/code> package provides an elegant interface to Python so it seemed to make sense to go ahead and wrap the <a href=\"https:\/\/boto3.readthedocs.io\/en\/latest\/reference\/services\/athena.html\"><code>boto3<\/code><\/a> Athena client into something more R-like and toss in the <code>collect_async()<\/code> function for good measure.<\/p>\n<h3>Dependencies<\/h3>\n<p>I forced a dependency on Python 3.5 because friends don&#8217;t let friends rely on dated, fragmented ecosystems. Python versions can gracefully (mostly) coexist so there should be no pain\/angst associated with keeping an updated Python 3 environment around. As noted in the package, I highly recommend adding <code>RETICULATE_PYTHON=\/usr\/local\/bin\/python3<\/code> to your R environment (<code>~\/.Renviron<\/code> is a good place to store it) since it will help <code>reticulate<\/code> find the proper target.<\/p>\n<p>If <code>boto3<\/code> is not installed, you will need to do <code>pip3 install boto3<\/code> to ensure you have the necessary Python module available and associated with your Python 3 installation.<\/p>\n<p>It may seem obvious, but an Amazon AWS account is also required and you should be familiar with the Athena service and AWS services in general. Most of the <code>roto.athena<\/code> functions have a set of optional parameters:<\/p>\n<ul>\n<li><code>aws_access_key_id<\/code><\/li>\n<li><code>aws_secret_access_key<\/code><\/li>\n<li><code>aws_session_token<\/code><\/li>\n<li><code>region_name<\/code><\/li>\n<li><code>profile_name<\/code><\/li>\n<\/ul>\n<p>Ideally, these should be in setup in the proper configuration files and you should let <code>boto3<\/code> handle the details of retrieving them. One parameter you will see used in many of my examples is <code>profile_name = \"personal\"<\/code>. I have numerous AWS accounts and manage them via the profile ids. By ensuring the AWS configuration files are thoroughly populated, I avoid the need to load and pass around the various keys and\/or tokens most AWS SDK API calls require. You can read more about profile management in the official docs: <a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/userguide\/cli-config-files.html\">1<\/a>, <a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/topic\/config-vars.html\">2<\/a>.<\/p>\n<h3>Usage<\/h3>\n<p>The project README and package manual pages are populated and have a smattering of usage examples. It is likely you will really just want to execute a manually prepared SQL query and retrieve the results <em>or<\/em> do the <code>dplyr<\/code> dance and collect the results asynchronously. We&#8217;ll cover both of those use-cases now, starting with a manual SQL query.<\/p>\n<p>If you have not deleted it, your Athena instance comes with a <code>sampledb<\/code> that contains an <code>elb_logs<\/code> table. We&#8217;ll use that for our example queries. First, let&#8217;s get the packages we&#8217;ll be using out of the way:<\/p>\n<pre><code class=\"language-r\">library(odbc)\nlibrary(DBI) # for dplyr access later\nlibrary(odbc) # for dplyr access later\nlibrary(roto.athena) # hrbrmstr\/roto.athena on gh or gl\nlibrary(tidyverse) # b\/c it rocks<\/code><\/pre>\n<p>Now, we&#8217;ll prepare and execute the query. This is a super-simple one:<\/p>\n<pre><code class=\"language-r\">query <- \"SELECT COUNT(requestip) AS ct FROM elb_logs\"\n\nstart_query_execution(\n  query = query,\n  database = \"sampledb\",\n  output_location = \"s3:\/\/aws-athena-query-results-redacted\",\n  profile = \"personal\"\n) -> qex_id<\/code><\/pre>\n<p>The <code>qex_id<\/code> contains the query execution id. We can pass that along to get information on the status of the query:<\/p>\n<pre><code class=\"language-r\">get_query_execution(qex_id, profile = \"personal\") %>%\n  glimpse()\n## Observations: 1\n## Variables: 10\n## $ query_execution_id  <chr> \"7f8d8bd6-9fe6-4a26-a021-ee10470c1048\"\n## $ query               <chr> \"SELECT COUNT(requestip) AS ct FROM elb_logs\"\n## $ output_location     <chr> \"s3:\/\/aws-athena-query-results-redacted\/7f...\n## $ database            <chr> \"sampledb\"\n## $ state               <chr> \"RUNNING\"\n## $ state_change_reason <chr> NA\n## $ submitted           <chr> \"2018-07-20 11:06:06.468000-04:00\"\n## $ completed           <chr> NA\n## $ execution_time_ms   <int> NA\n## $ bytes_scanned       <dbl> NA<\/code><\/pre>\n<p>If the <code>state<\/code> is not <code>SUCCEEDED<\/code> then you&#8217;ll need to be patient before trying to retrieve the results.<\/p>\n<pre><code class=\"language-r\">get_query_results(qex_id, profile = \"personal\")\n## # A tibble: 1 x 1\n##   ct             \n##   <S3: integer64>\n## 1 4229<\/code><\/pre>\n<p>Now, we&#8217;ll use <code>dplyr<\/code> via the Athena ODBC driver:<\/p>\n<pre><code class=\"language-r\">DBI::dbConnect(\n  odbc::odbc(),\n  driver = \"\/Library\/simba\/athenaodbc\/lib\/libathenaodbc_sbu.dylib\",\n  Schema = \"sampledb\",\n  AwsRegion = \"us-east-1\",\n  AwsProfile = \"personal\",\n  AuthenticationType = \"IAM Profile\",\n  S3OutputLocation = \"s3:\/\/aws-athena-query-results-redacted\"\n) -> con\n\nelb_logs <- tbl(con, \"elb_logs\")<\/code><\/pre>\n<p>I've got the ODBC DBI fragment in a parameterized RStudio snippet and others may find that as a time-saver if you're not doing that already.<\/p>\n<p>Now to build and submit the query:<\/p>\n<pre><code class=\"language-r\">mutate(elb_logs, tsday = substr(timestamp, 1, 10)) %>%\n  filter(tsday == \"2014-09-29\") %>%\n  select(requestip, requestprocessingtime) %>%\n  collect_async(\n    database = \"sampledb\",\n    output_location = \"s3:\/\/aws-athena-query-results-redacted\",\n    profile_name = \"personal\"\n  ) -> qex_id<\/code><\/pre>\n<p>As noted in the previous blog post, <code>collect_async()<\/code> turn the <code>dplyr<\/code> chain into a SQL query then fires off the whole thing to <code>start_query_execution()<\/code> for you and returns the query execution id:<\/p>\n<pre><code class=\"language-r\">get_query_execution(qex_id, profile = \"personal\") %>%\n  glimpse()\n## Observations: 1\n## Variables: 10\n## $ query_execution_id  <chr> \"95bd158b-7790-42ba-aa83-e7436c3470fe\"\n## $ query               <chr> \"SELECT \\\"requestip\\\", \\\"requestprocessing...\n## $ output_location     <chr> \"s3:\/\/aws-athena-query-results-redacted\/95...\n## $ database            <chr> \"sampledb\"\n## $ state               <chr> \"RUNNING\"\n## $ state_change_reason <chr> NA\n## $ submitted           <chr> \"2018-07-20 11:06:12.817000-04:00\"\n## $ completed           <chr> NA\n## $ execution_time_ms   <int> NA\n## $ bytes_scanned       <dbl> NA<\/code><\/pre>\n<p>Again, you'll need to be patient and wait for the <code>state<\/code> to be <code>SUCCEEDED<\/code> to retrieve the results.<\/p>\n<pre><code class=\"language-r\">get_query_results(qex_id, profile = \"personal\")\n## # A tibble: 774 x 2\n##    requestip       requestprocessingtime\n##    <chr>                           <dbl>\n##  1 255.48.150.122              0.0000900\n##  2 249.213.227.93              0.0000970\n##  3 245.108.120.229             0.0000870\n##  4 241.112.203.216             0.0000940\n##  5 241.43.107.223              0.0000760\n##  6 249.117.98.137              0.0000830\n##  7 250.134.112.194             0.0000630\n##  8 250.200.171.222             0.0000540\n##  9 248.193.76.218              0.0000820\n## 10 250.57.61.131               0.0000870\n## # ... with 764 more rows<\/code><\/pre>\n<p>You can also use the query execution id to sync the resultant CSV from S3. Which one is more performant is definitely something you'll need to test since it varies with AWS region, result set size, your network connection and other environment variables. One benefit of using <code>get_query_results()<\/code> is that it uses the column types to set the data frame column types appropriately (I still need to setup a full test of all possible types so not all are handled yet).<\/p>\n<h3>Kick the tyres<\/h3>\n<p>The package is up on both <a href=\"https:\/\/gitlab.com\/hrbrmstr\/roto.athena\">GitLab<\/a> and <a href=\"https:\/\/github.com\/hrbrmstr\/roto.athena\">GitHub<\/a> and any and all feedback (i.e. Issues) or tweaks (i.e. PRs) are most welcome.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A previous post explored how to deal with Amazon Athena queries asynchronously. The function presented is a beast, though it is on purpose (to provide options for folks). In reality, nobody really wants to use rJava wrappers much anymore and dealing with icky Python library calls directly just feels wrong, plus Python functions often return [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[818,91],"tags":[],"class_list":["post-11077","post","type-post","status-publish","format-standard","hentry","category-athena","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A new &#039;boto3&#039; Amazon Athena client wrapper with dplyr async query support - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A new &#039;boto3&#039; Amazon Athena client wrapper with dplyr async query support - rud.is\" \/>\n<meta property=\"og:description\" content=\"A previous post explored how to deal with Amazon Athena queries asynchronously. The function presented is a beast, though it is on purpose (to provide options for folks). In reality, nobody really wants to use rJava wrappers much anymore and dealing with icky Python library calls directly just feels wrong, plus Python functions often return [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2018-07-20T15:32:02+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-07-21T20:09:10+00:00\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"A new &#8216;boto3&#8217; Amazon Athena client wrapper with dplyr async query support\",\"datePublished\":\"2018-07-20T15:32:02+00:00\",\"dateModified\":\"2018-07-21T20:09:10+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/\"},\"wordCount\":704,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"articleSection\":[\"athena\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/\",\"url\":\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/\",\"name\":\"A new 'boto3' Amazon Athena client wrapper with dplyr async query support - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"datePublished\":\"2018-07-20T15:32:02+00:00\",\"dateModified\":\"2018-07-21T20:09:10+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"A new &#8216;boto3&#8217; Amazon Athena client wrapper with dplyr async query support\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A new 'boto3' Amazon Athena client wrapper with dplyr async query support - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/","og_locale":"en_US","og_type":"article","og_title":"A new 'boto3' Amazon Athena client wrapper with dplyr async query support - rud.is","og_description":"A previous post explored how to deal with Amazon Athena queries asynchronously. The function presented is a beast, though it is on purpose (to provide options for folks). In reality, nobody really wants to use rJava wrappers much anymore and dealing with icky Python library calls directly just feels wrong, plus Python functions often return [&hellip;]","og_url":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/","og_site_name":"rud.is","article_published_time":"2018-07-20T15:32:02+00:00","article_modified_time":"2018-07-21T20:09:10+00:00","author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"A new &#8216;boto3&#8217; Amazon Athena client wrapper with dplyr async query support","datePublished":"2018-07-20T15:32:02+00:00","dateModified":"2018-07-21T20:09:10+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/"},"wordCount":704,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"articleSection":["athena","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/","url":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/","name":"A new 'boto3' Amazon Athena client wrapper with dplyr async query support - rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"datePublished":"2018-07-20T15:32:02+00:00","dateModified":"2018-07-21T20:09:10+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2018\/07\/20\/a-new-boto3-amazon-athena-client-wrapper-with-dplyr-async-query-support\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"A new &#8216;boto3&#8217; Amazon Athena client wrapper with dplyr async query support"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-2SF","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":4696,"url":"https:\/\/rud.is\/b\/2016\/12\/05\/interacting-with-amazon-athena-from-r\/","url_meta":{"origin":11077,"position":0},"title":"Interacting With Amazon Athena from R","author":"hrbrmstr","date":"2016-12-05","format":false,"excerpt":"This is a short post for those looking to test out Amazon Athena with R. Amazon makes Athena available via JDBC, so you can use RJDBC to query data. All you need is their JAR file and some setup information. Here's how to get the JAR file to the current\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":10121,"url":"https:\/\/rud.is\/b\/2018\/04\/20\/painless-odbc-dplyr-connections-to-amazon-athena-and-apache-drill-with-r-odbc\/","url_meta":{"origin":11077,"position":1},"title":"Painless ODBC  + dplyr Connections to Amazon Athena and Apache Drill with R &#038; odbc","author":"hrbrmstr","date":"2018-04-20","format":false,"excerpt":"I spent some time this morning upgrading the JDBC driver (and changing up some supporting code to account for changes to it) for my metis package? which connects R up to Amazon Athena via RJDBC. I'm used to JDBC and have to deal with Java separately from R so I'm\u2026","rel":"","context":"In &quot;Apache Drill&quot;","block_context":{"text":"Apache Drill","link":"https:\/\/rud.is\/b\/category\/apache-drill\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2018\/04\/today-is-a-good-day-to-query.jpg?fit=700%2C535&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":5954,"url":"https:\/\/rud.is\/b\/2017\/05\/16\/r%e2%81%b6-using-r-with-amazon-athena-awas-temporary-security-credentials\/","url_meta":{"origin":11077,"position":2},"title":"R\u2076 \u2014 Using R With Amazon Athena &#038; AWS Temporary Security Credentials","author":"hrbrmstr","date":"2017-05-16","format":false,"excerpt":"Most of the examples of working with most of the AWS services show basic username & password authentication. That's all well-and-good, but many shops use the AWS Security Token Service to provide temporary credentials and session tokens to limit exposure and provide more uniform multi-factor authentication. At my workplace, Frank\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11978,"url":"https:\/\/rud.is\/b\/2019\/02\/22\/cloudy-with-a-chance-of-caffeinated-query-orchestration-new-rjava-wrappers-for-aws-athena-sdk-for-java\/","url_meta":{"origin":11077,"position":3},"title":"Cloudy with a chance of Caffeinated Query Orchestration &#8211; New rJava Wrappers for AWS Athena SDK for Java","author":"hrbrmstr","date":"2019-02-22","format":false,"excerpt":"There are two fledgling rJava-based R packages that enable working with the AWS SDK for Athena: awsathena | GL| GH awsathenajars | GL| GH They're both needed to conform with the way CRAN like rJava-based packages submitted that also have large JAR dependencies. The goal is to eventually have wrappers\u2026","rel":"","context":"In &quot;Java&quot;","block_context":{"text":"Java","link":"https:\/\/rud.is\/b\/category\/java\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11070,"url":"https:\/\/rud.is\/b\/2018\/07\/14\/alleviating-aws-athena-aggravation-with-asynchronous-assistance\/","url_meta":{"origin":11077,"position":4},"title":"Alleviating AWS Athena Aggravation with Asynchronous Assistance","author":"hrbrmstr","date":"2018-07-14","format":false,"excerpt":"I've blogged about how to use Amazon Athena with R before and if you are a regular Athena user, you've likely run into a situation where you prepare a dplyr chain, fire off a collect() and then wait. And, wait. And, wait. And, wait. Queries that take significant processing time\u2026","rel":"","context":"In &quot;athena&quot;","block_context":{"text":"athena","link":"https:\/\/rud.is\/b\/category\/athena\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":11921,"url":"https:\/\/rud.is\/b\/2019\/02\/17\/conquering-caffeinated-amazon-athena-with-the-metis-trio-of-packages\/","url_meta":{"origin":11077,"position":5},"title":"Conquering Caffeinated Amazon Athena with the metis Trio of Packages","author":"hrbrmstr","date":"2019-02-17","format":false,"excerpt":"I must preface this post with the posit that if you're doing anything interactive() with Amazon Athena you should seriously consider just using their free ODBC drivers as it's the easiest way to wire them up to R DBI- and tidyverse-wise. I've said as much in previous posts. Drop a\u2026","rel":"","context":"In &quot;athena&quot;","block_context":{"text":"athena","link":"https:\/\/rud.is\/b\/category\/athena\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11077","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=11077"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/11077\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=11077"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=11077"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=11077"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}