Most of the examples of working with most of the AWS services show basic username & password authentication. That’s all well-and-good, but many shops use the AWS Security Token Service to provide temporary credentials and session tokens to limit exposure and provide more uniform multi-factor authentication. At my workplace, Frank Mitchell created a nice electron app to make it super easy to create and re-up these credentials. The downside of this is that all AWS service usage for work requires using these credentials and I was having the darndest time trying to get Athena’s JDBC driver working with it (but I wasn’t spending alot of time on it as I tend to mirror research data to a local, beefy Apache Drill server).
I finally noticed the
com.amazonaws.athena.jdbc.shaded.com.amazonaws.auth.EnvironmentVariableCredentialsProvider
class and decided to give the following a go (you will need to point fil
to wherever you have the Athena jar file):
library(RJDBC)
library(tidyverse)
fil <- "~/Drivers/AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")
aws <- ini::read.ini("~/.aws/credentials")
Sys.setenv(AWS_ACCESS_KEY_ID = aws[Sys.getenv("AWS_PROFILE")][[1]]$aws_access_key_id)
Sys.setenv(AWS_SECRET_ACCESS_KEY = aws[Sys.getenv("AWS_PROFILE")][[1]]$aws_secret_access_key)
Sys.setenv(AWS_SESSION_TOKEN = aws[Sys.getenv("AWS_PROFILE")][[1]]$aws_session_token)
provider <- "com.amazonaws.athena.jdbc.shaded.com.amazonaws.auth.EnvironmentVariableCredentialsProvider"
con <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
s3_staging_dir=Sys.getenv("AWS_S3_STAGING_DIR"),
schema_name="DEFAULT_DB_SCHEMA_NAME",
aws_credentials_provider_class=provider)
dbListTables(con)
dbListFields(con, "SOME_TABLE_IN_THE_DEFAULT_DB")
dbGetQuery(con, "SELECT * FROM DEFAULT_DB_SCHEMA_NAME.SOME_TABLE_IN_THE_DEFAULT_DB limit 10;")
YMMV on Windows (comments about what does and does not work on Windows are welcome).
The provider
line was the key element I was missing prior to last night.
The Awsaml utility monitors/maintains entries under it’s purview credentials
file and keeps consistent profile ids, so I keep that AWS_PROFILE
setting in my ~/.Renviron
.
I also keep the default S3 Athena data staging bucket in an environment variable as well.
If you provide a default schema_name
then you can list tables and fields but queries need fully qualified database (Amazon calls them “schemas”) dot table name.
Initial attempts to have this setup “just work” with dplyr
0.6.0 (the forthcoming EPIC release) were unsuccessful but I’ll poke at all this when I get time and likely write a small Athena package to help smooth over rougher areas.
Y’all likely figured all this out way before I did, but in the event someone else is looking for the information, it should be google-able now.
3 Comments
My coworker and I are putting together an R package to wrap the JDBC calls, largely based on this blog post. I’d appreciate any feedback you might have : https://github.com/nfultz/AWR.Athena
Thanks for posting this! It’s just helped me solve a problem.
One update – the provider class has changed in the v2 driver. It should now be
com.simba.athena.amazonaws.auth.EnvironmentVariableCredentialsProvider
Well, I use the latest drivers kinda daily and what in my pkgs still works so…
3 Trackbacks/Pingbacks
[…] article was first published on R – rud.is, and kindly contributed to […]
[…] Most of the examples of working with most of the AWS services show basic username & password authentication. That’s all well-and-good, but many shops use the AWS Security Token Service to provide temporary credentials and session tokens to limit exposure and provide more uniform multi-factor authentication. At my workplace, Frank Mitchell created a nice electron… Continue reading → […]
[…] article was first published on R – rud.is, and kindly contributed to […]