---
title: "Awk Quoted CSV Example"
author: "@hrbrmstr"
date: "2023-09-12"
code-tools: true
format:
  html:
    self-contained: true
    embed-resources: true
    theme:
      light: flatly
      dark: darkly
engine: knitr
---

Demonstration that awk's CSV support handles quoted fields.

```{r setup, echo=FALSE}
# define custom knitr engine for our bespoke compiled awk with CSV support
knitr::knit_engines$set(
  cawk = function(opts) {

    # write the chunk body to a temporary awk program file
    code <- paste(opts$code, collapse = '\n')

    tf <- tempfile(fileext = ".awk")
    writeLines(code, tf)
    on.exit(unlink(tf))

    args <- c("-f", tf)

    # enable awk CSV parsing only when the chunk option awk.csv is TRUE
    if (isTRUE(opts[["awk.csv"]])) {
      args <- c(args, "--csv")
    }

    # enable awk variable assignment if requested
    # (each awk.var.NAME chunk option becomes `-v NAME=value`)
    awk_vars <- opts[grepl("^awk\\.var", names(opts))]
    if (length(awk_vars) > 0) {
      for (var_name in names(awk_vars)) {
        args <- c(
          args,
          "-v",
          sprintf(
            "%s=%s",
            sub("awk.var.", "", var_name, fixed=TRUE),
            shQuote(awk_vars[[var_name]])
          )
        )
      }
    }

    # get data file(s) to process
    awk_files <- opts[grepl("^awk\\.file", names(opts))]
    if (length(awk_files) > 0) {
      for (fil in awk_files) {
        args <- c(args, shQuote(fil))
      }
    }

    out <- system2(
      command = Sys.which("cawk"),
      args = args,
      stdout = TRUE
    )

    # so we get syntax highlighting
    opts$engine <- "awk"

    knitr::engine_output(
      options = opts,
      code = code,
      out = out,
      extra = NULL
    )

  }
)
```

Take a peek at the CSV file we're using:

```{r peek}
readLines("spike-01.csv", n=5) |> writeLines()
```

So. Many. Quotes.

We'll do some light processing of that with awk, using a custom {knitr} engine that's included in this qmd file. Use the code tools in the top right corner to see the code chunks.
In the one below, we define variable names for the positional fields and specify the file(s) to operate on in the chunk options.

```{cawk, awk-block, awk.csv=TRUE}
#| awk.var.dt: 1
#| awk.var.hour: 2
#| awk.var.app_protocol: 3
#| awk.var.destination_port: 4
#| awk.var.unique: 5
#| awk.var.n_sensors: 6
#| awk.var.total: 7
#| awk.file.1: "spike-01.csv"
NR > 1 { cum_total_by[$app_protocol ":" $destination_port] += $total }
END {
  for (proto_port in cum_total_by)
    print proto_port " => " cum_total_by[proto_port]
}
```
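
If the named-field trick is unfamiliar: `-v name=N` just stores a column number in an awk variable, so `$name` reads field `N`. Here is a rough sketch of the same aggregation run straight from the shell. The three data rows are made up for illustration (they are not from `spike-01.csv`), and because they contain no quoted fields, a plain `-F,` stands in for the `--csv` flag so that any stock awk can run it. The trailing `sort` is only there because awk's `for (key in array)` order is unspecified.

```sh
# Hypothetical sample rows (NOT from spike-01.csv); NR > 1 skips the header.
sample='dt,hour,app_protocol,destination_port,unique,n_sensors,total
2023-09-01,0,HTTP,80,5,3,10
2023-09-01,1,HTTP,80,4,2,7
2023-09-01,1,SSH,22,2,1,4'

# -v binds each name to a column number, so $total means "field 7".
result=$(printf '%s\n' "$sample" |
  awk -F, -v app_protocol=3 -v destination_port=4 -v total=7 '
    NR > 1 { cum_total_by[$app_protocol ":" $destination_port] += $total }
    END {
      for (proto_port in cum_total_by)
        print proto_port " => " cum_total_by[proto_port]
    }' | sort)

printf '%s\n' "$result"
```

Swapping `-F,` for `--csv` (in an awk build that supports it) is what lets the same program cope with quoted fields like the ones in the real file.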