Recipe 1 Adding a New ‘Workspace’ to Drill
1.1 Problem
You have data in a directory structure and would like a Drill “shortcut” reference to it vs entering the full path in queries all the time.
1.2 Solution
Create a Drill “workspace
”.
1.3 Discussion
If you’ve gone through Drill in 10 Minutes or reviewed the recipe that goes into a bit more depth on an introduction to Drill, you know you can get to any location on your local filesystem with dfs.root
filesystem references like this:
dfs.root.`/some/very/long/path/to/a/set/of/files/in/my/coolproject/*.json.gz`
That’s great but it’s also annoying to type each time you work with data in that directory.
Drill lets you define a workspace name as a kind of alias to a filesystem location. They’re very easy to setup by going to http://localhost:8047/storage/dfs and taking a look at the JSON configuration under the dfs
storage plugin. There’s an entry for “workspaces
” and we can add one for the above example like so:
"coolproject" : {
"location" : "/some/very/long/path/to/a/set/of/files/in/my/coolproject/",
"writable" : false,
"defaultInputFormat": null
},
Now, you can use:
dfs.coolproject.`/*.json.gz`
in queries.
If you have custom formats or just know the file format most files will be using in your directory tree, you can also customize the defaultInputFormat
and if you really want to live dangerously you can make your directory tree writable by changing that boolean value to true
. Drill is pretty good about not overwriting files and directories but unless you really need write-ability, leave this false
.