Apache Zeppelin is a “notebook” alternative to Jupyter (and other) notebooks. It supports a plethora of kernels/interpreters and can do a ton of things that this post isn’t going to discuss (perhaps future ones will, especially since it’s the first “notebook” environment I’ve been able to tolerate for longer than a week).
One really cool feature of Zeppelin is the ability for it to wire it up to databases via JDBC and use it interactive queries. A future post will provide instructions for Apache Drill, but this one’s about wiring up Amazon Athena and Apache Zeppelin. A big reason to do this is that image at the top of the post. The query interface is far nicer than the Amazon console and — while RStudio is going to have similar features in the 1.2 release — Zeppelin has some advantages over it, especially as 0.9.0 moves to final release.
If you use basic credentials in Athena, this post can help you connect up.
At $DAYJOB we use an open source application that we developed — Awsaml
? at $DAYJOB — which provides automagically rotated temporary AWS credentials every hour after a successful initial multi-factor authentication (you should think about doing this, too).
Because it uses a non default
profile name we need to use a different authentication class when using the Athena JDBC interface.
To somewhat dup the aforelinked post, you’ll need to download the driver that matches your version of the JDK and the JDBC data standards.
- The AthenaJDBC41-2.0.2.jar is compatible with JDBC 4.1 and requires JDK 7.0 or later.
- The AthenaJDBC42-2.0.2.jar is compatible with JDBC 4.2 and requires JDK 8.0 or later.
I like to put JARs like this in /usr/local/jars
(just remember where you put it).
Now, just create a Zeppelin interpreter named athena
(or whatever you like). Set the default.driver
to com.simba.athena.jdbc.Driver
and the JDBC string to this horribly long entity:
jdbc:awsathena://athena.us-east-1.amazonaws.com:443;S3OutputLocation=s3://aws-athena-query-results-something-us-east-1;Schema=default;AwsCredentialsProviderClass=com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider;AwsCredentialsProviderArguments="your-profile-name"
I intentionally left it un-wrapped so it’s easier to copy. Here are the individual parts (separating the bullets at the semicolons):
jdbc:awsathena://athena.us-east-1.amazonaws.com:443
(use what you need to here)S3OutputLocation=s3://aws-athena-query-results-something-us-east-1
(wherever Athena can write to)Schema=default
(the schema you’ll use)AwsCredentialsProviderClass=com.simba.athena.amazonaws.auth.profile.ProfileCredentialsProvider
(this is the proper class to select a profile by name)AwsCredentialsProviderArguments="your-profile-name"
(this is the profile name you want to use)
NOTE: You can use other JDBC driver parameters as well. I just focused on the minimum ones to keep it simple.
Blank-out any username/password fields (which, in theory, won’t be referenced anyway) and then scroll down and add the JAR you’re using an artifact. In my case that’s /usr/local/jars/AthenaJDBC42_2.0.2.jar
.
Now, you can use a stored profile and hopefully rotating creds to work with %athena
interpreter blocks in Zeppelin.