This is a short post for those looking to test out Amazon Athena with R.
Amazon makes Athena available via JDBC, so you can use RJDBC to query data. All you need is their JAR file and some setup information. Here’s how to get the JAR file to the current working directory:
URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
To avoid putting credentials in code, you can store the AWS key and secret you’re using for the queries in ATHENA_USER and ATHENA_PASSWORD environment variables via ~/.Renviron. You’ll also need an S3 bucket writable by those credentials for the Athena staging directory. With that info in hand, it’s easy to connect:
library(RJDBC)
library(dplyr)
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")
con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
s3_staging_dir="s3://accessible-bucket",
user=Sys.getenv("ATHENA_USER"),
password=Sys.getenv("ATHENA_PASSWORD"))
Even if you have no data configured in Athena, you can check out the test data available to all:
dbListTables(con)
## [1] "elb_logs"
If that worked, then you should be able to query data (using the fully qualified table name in this case):
dbGetQuery(con, "SELECT * FROM sampledb.elb_logs LIMIT 10") %>%
dplyr::glimpse()
## Observations: 10
## Variables: 16
## $ timestamp <chr> "2014-09-27T00:00:25.424956Z", "2014-09-27T00:00:56.439218Z", "2014-09-27T00:01:27.441734Z", "2014-09-27T00:01:58.366715Z", "2014-09-27T00:02:29.446363Z", "2014-09-2...
## $ elbname <chr> "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo"
## $ requestip <chr> "241.230.198.83", "252.26.60.51", "250.244.20.109", "247.59.58.167", "254.64.224.54", "245.195.140.77", "245.195.140.77", "243.71.49.173", "240.139.5.14", "251.192.4...
## $ requestport <dbl> 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026
## $ backendip <chr> "251.192.40.76", "249.89.116.3", "251.111.156.171", "251.139.91.156", "251.111.156.171", "254.64.224.54", "254.64.224.54", "250.244.20.109", "247.65.176.249", "250.2...
## $ backendport <dbl> 443, 8888, 8888, 8888, 8000, 8888, 8888, 8888, 8888, 8888
## $ requestprocessingtime <dbl> 9.1e-05, 9.4e-05, 8.4e-05, 9.7e-05, 9.1e-05, 9.3e-05, 9.4e-05, 8.3e-05, 9.0e-05, 9.0e-05
## $ backendprocessingtime <dbl> 0.046598, 0.038973, 0.047054, 0.039845, 0.061461, 0.037791, 0.047035, 0.048792, 0.045724, 0.029918
## $ clientresponsetime <dbl> 4.9e-05, 4.7e-05, 4.9e-05, 4.9e-05, 4.0e-05, 7.7e-05, 7.5e-05, 7.3e-05, 4.0e-05, 6.7e-05
## $ elbresponsecode <chr> "200", "200", "200", "200", "200", "200", "200", "200", "200", "200"
## $ backendresponsecode <chr> "200", "200", "200", "200", "200", "400", "400", "200", "200", "200"
## $ receivedbytes <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ sentbytes <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
## $ requestverb <chr> "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET"
## $ url <chr> "http://www.abcxyz.com:80/jobbrowser/?format=json&state=running&user=20g578y", "http://www.abcxyz.com:80/jobbrowser/?format=json&state=running&user=20g578y", "http:/...
## $ protocol <chr> "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1"
And, you can disconnect when done:
dbDisconnect(con)
You should probably store the JAR file in a central location and refer to it that way in “production” scripts.
Now, you can go crazy querying data and racking up AWS charges ?.


Jumping on the “Banned” Wagon
>_The parent proposed a committee made up of parents and teachers of different cultural backgrounds come up with a list of books that are inclusive for all students._
Where’s the [ACLU](https://www.aclu.org/issues/free-speech/artistic-expression/banned-books)? They’d be right there if this was an alt-right’er asking to ban books with salacious content they deem (rightly or wrongly) “inappropriate” or “harmful” to teens. I hope they step up and fight the fight the good fight here.
We’re rapidly losing — or may have already lost (as a society) — the concept of building resilience & strength through adversity. I’m hopeful that the diversity we bring into this country with immigrants and refugees (if the borders don’t close shut or we scare them away) that know what true adversity is will eventually counteract this downward spiral.
Refs:
– (mp3 backup in the event they take down the audio record)
–
–