Lost In [SQL] Translation: Charting d[b]plyr Mapped SQL Function Support Across All Backends

Like more posts than I care to admit, this one starts innocently enough with a tweet by @gshotwell:

"Is there a reference document somewhere of which dplyr commands work on various database backends? #rstats" (Gordon Shotwell, @gshotwell, April 9, 2019)

Since I use at least 4 different d[b]plyr backends every week, this same question…

Apache Drill 1.15.0 + sergeant 0.8.0 = pcapng Support, Proper Column Types & Mounds of New Metadata

Apache Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores […] without having to create and manage schemas. […] It has a schema-free JSON document model similar to MongoDB and Elasticsearch; [a plethora of APIs, including] ANSI SQL, ODBC/JDBC, and HTTP[S] REST; [is] extremely user and developer…

Making a Case for case_when

This is a brief (and likely obvious, for some folks) post on the dplyr::case_when() function. Part of my work-work is dealing with data from internet scans. When we’re performing a deeper inspection of a particular internet protocol or service we try to capture as much system and service metadata as possible. Sifting through said metadata…

sergeant: An R Boot Camp for Apache Drill

I recently mentioned that I’ve been working on a development version of an Apache Drill R package called sergeant. Here’s a lifted “TLDR” on Drill: Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A…