[MonetDBLite](https://www.monetdb.org/blog/monetdblite-r) (for R) was announced/released today and, while the examples they provide are compelling there’s a “gotcha” for potential new folks using SQL in general and SQL + MonetDB + R together. The toy example on the site shows dumping `mtcars` with `dbWriteTable` and then doing things. Real-world CSV files have headers and commas (MonetDB by default expects no headers and `|` as a separator). Also, you need to make a MonetDB table (with a schema) before copying your _giant_ CSV file full of data into it. That’s a pain to do by hand.
Here’s another toy example that shows how to:
– use a specific directory for the embedded MonetDB files
– *auto-generate* the `CREATE TABLE` syntax from a sample of the real-world CSV file
– load the data from the real-world CSV file (i.e. skipping the header and using a `,` as a delimiter
– wire it up to R & dplyr
It’s very similar to the MonetDBLite toy example but may help folks get up and running in the real world with less frustration.
library(MonetDBLite) library(MonetDB.R) library(dplyr) # use built-in mtcars to make a CS File # we're more likely to find a file in this format vs what dbWriteTable produces # i.e. it has a header and commas for separator write.csv(add_rownames(mtcars, "auto"), "mtcars.csv", row.names=FALSE) # make a connection and get rid of the old table if it exists since # we are just playing around. in real life you prbly want to keep # the giant table there vs recreate it every time mdb <- dbConnect(MonetDBLite(), "/full/path/to/your/preferred/monetdb/data/dir") try(invisible(dbSendQuery(mdb, "DROP TABLE mtcars")), silent=TRUE) # now we guess the column types by reading in a small fraction of the rows guess <- read.csv("mtcars.csv", stringsAsFactors=FALSE, nrows=1000) create <- sprintf("CREATE TABLE mtcars ( %s )", paste0(sprintf('"%s" %s', colnames(guess), sapply(guess, dbDataType, dbObj=mdb)), collapse=",")) # we build the table creation dynamically from what we've learned from guessing invisible(dbSendQuery(mdb, create)) # and then we load the data into the database, skipping the header and specifying a comma invisible(dbSendQuery(mdb, "COPY OFFSET 2 INTO mtcars FROM '/full/path/to/where/you/wrote/the/csv/to/mtcars.csv' USING DELIMITERS ','")) # now wire it up to dplyr mdb_src <- src_monetdb(embedded="/full/path/to/your/preferred/monetdb/data/dir") mdb_mtcars <- tbl(mdb_src, "mtcars") # and have some fun count(mdb_mtcars, cyl) ## Source: MonetDB () ## From: <derived table> [?? x 2] ## ## cyl n ## (int) (dbl) ## 1 6 7 ## 2 4 11 ## 3 8 14 ## .. ... ...
3 Comments
Thanks very much for this post. I’be been trying to grok the interface between MonetDB and R and this helped quite a bit. I noticed that MonetDB.R has a monetdb.read.csv() function. I tried it on your example and it seems to work.
Hi, thanks for your post! Quick hint, explicitly including the MonetDBLite package is not required. Otherwise, I also recommend the monetdb.read.csv function which tries to automate the CSV loading process.
Thanks very much for the warning, Bob. I am not even really clear (after watching https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/Efficient-tabular-data-ingestion-and-manipulation-with-MonetDBLite) who is switching to Monet or why.
Nice post, as always.
One Trackback/Pingback
[…] article was first published on rud.is » R, and kindly contributed to […]