(For first-timers, R⁶ tagged posts are short & sweet with minimal exposition; R⁶ feed) At work-work I mostly deal with medium-to-large-ish data. I often want to poke at new or existing data sets w/o working across billions of rows. I also use Apache Drill for much of my exploratory work. Here’s how to uniformly sample…
Post Category → drill
Increasing Output Buffer Size in Apache Drill UDFs Custom (Simple) Functions
Putting this here to make it easier for others who try to Google this topic to find it w/o having to find and tediously search through other UDFs (user-defined functions). I was/am making a custom UDF for base64 decoding/encoding and ran into: It’s incredibly easy to “fix” (and, if my Java weren’t so rusty I’d…
Reading PCAP Files with Apache Drill and the sergeant R Package
It’s no secret that I’m a fan of Apache Drill. One big strength of the platform is that it normalizes access to diverse data sources down to ANSI SQL calls, which means that I can pull data from parquet, Hive, HBase, Kudu, CSV, JSON, MongoDB and MariaDB with the same SQL syntax. This also…
Ten-HUT! The Apache Drill R interface package — sergeant — is now on CRAN
I’m extremely pleased to announce that the sergeant package is now on CRAN or will be hitting your local CRAN mirror soon. sergeant provides JDBC, DBI and dplyr/dbplyr interfaces to Apache Drill. I’ve also wrapped a few goodies into the dplyr custom functions that work with Drill and if you have Drill UDFs that don’t…
Replicating the Apache Drill ‘Yelp’ Academic Dataset Analysis with sergeant
The Apache Drill folks have a nice walk-through tutorial on how to analyze the Yelp Academic Dataset with Drill. It’s a bit out of date (the current Yelp data set structure is different enough that the tutorial will error out at various points), but it’s a great example of how to work with large, nested…
Drilling Into CSVs — Teaser Trailer
I used reading a directory of CSVs as the foundational example in my recent post on idioms. During my exchange with Matt, Hadley and a few others — in the crazy Twitter thread that spawned said post — I mentioned that I’d personally “just use Drill”. I’ll use this post as a bit of a…
Create Parquet Files From R Data Frames With sergeant & Apache Drill (a.k.a. Make Parquet Files Great Again in R)
2021-11-04 UPDATE: Just use {arrow}. Apache Drill is a nice tool to have in the toolbox as it provides a SQL front-end to a wide array of database and file back-ends and runs in standalone/embedded mode on every modern operating system (i.e. you can get started with or play locally with Drill w/o needing a…
2017-01 Authored Package Updates
The rest of the month is going to be super-hectic and it’s unlikely I’ll be able to do any more to help the push to CRAN 10K, so here’s a breakdown of CRAN and GitHub new packages & package updates that I felt were worth raising awareness on: epidata: I mentioned this one last week…