Awk Quoted CSV Example

Author

@hrbrmstr

Published

September 12, 2023

A short demonstration that awk’s CSV support handles quoted fields correctly.

Take a peek at the CSV file we’re using:

readLines("spike-01.csv", n = 5) |>
  writeLines()

"dt","hour","app_protocol","destination_port","unique","n_sensors","total"
"2022-07-02","2022-07-02 04:00:00.000","TLS","443","6","1","106"
"2022-07-04","2022-07-04 03:00:00.000","HTTPS","443","134","38","6281"
"2022-07-04","2022-07-04 02:00:00.000","HTTPS","80","7","10","326"
"2022-07-02","2022-07-02 17:00:00.000","HTTPS","443","290","36","4763"

So. Many. Quotes.

We’ll do some light processing of that file with awk, using a custom {knitr} engine that’s included in this qmd file. Use the code tools in the top right corner to see the code chunks.

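In the chunk below, the chunk options define a variable name for each positional field and specify the file(s) to operate on. Because each variable holds a field number, an expression like $app_protocol resolves to the field at that position.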

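# Skip the header row, then sum `total` per "protocol:port" key.
NR > 1 {
  cum_total_by[$app_protocol ":" $destination_port] += $total
}
# After the last row, print each key with its total (for-in order is unspecified).
END {
  for (proto_port in cum_total_by)
    print proto_port " => " cum_total_by[proto_port]
}
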
HTTPS:80 => 24206
TLS:443 => 5945
TLS:80 => 10
HTTPS:8090 => 12637
TLS:47001 => 1
HTTPS:443 => 155323
TLS:3389 => 115
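
Outside of knitr, the same chunk can be approximated from a shell. This is a minimal sketch, not the engine used in this post. It assumes an awk with a `--csv` option (gawk 5.3+ or the second-edition one-true-awk), passes the field positions with `-v` where the post uses chunk options, and runs on only the five lines shown above rather than the full spike-01.csv. The function name `sum_by_proto_port`, the capability probe, and the fallback for awks without `--csv` are all illustrative assumptions. The fallback only works because every field in this sample is quoted and none contains a comma or a quote.

```shell
# Sample data: the five lines printed from spike-01.csv earlier in the post.
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
cat > "$csv" <<'EOF'
"dt","hour","app_protocol","destination_port","unique","n_sensors","total"
"2022-07-02","2022-07-02 04:00:00.000","TLS","443","6","1","106"
"2022-07-04","2022-07-04 03:00:00.000","HTTPS","443","134","38","6281"
"2022-07-04","2022-07-04 02:00:00.000","HTTPS","80","7","10","326"
"2022-07-02","2022-07-02 17:00:00.000","HTTPS","443","290","36","4763"
EOF

# Hypothetical helper: sum `total` per protocol:port for the CSV file in $1.
sum_by_proto_port() {
  # The awk program from the post, unchanged apart from layout.
  prog='NR > 1 { cum_total_by[$app_protocol ":" $destination_port] += $total }
END { for (proto_port in cum_total_by) print proto_port " => " cum_total_by[proto_port] }'

  # Probe: in CSV mode the quoted field "a,b" is a single field, so NF is 1.
  if [ "$(printf '"a,b"\n' | awk --csv '{ print NF }' 2>/dev/null)" = "1" ]; then
    # Field positions are passed as variables, so $total means $7.
    awk --csv -v app_protocol=3 -v destination_port=4 -v total=7 "$prog" "$1"
  else
    # Fallback (assumption: all fields quoted, no embedded commas or quotes):
    # strip the outer quotes, which re-splits the record on the "," separator.
    awk -F'","' -v app_protocol=3 -v destination_port=4 -v total=7 \
      '{ gsub(/^"|"$/, "") } '"$prog" "$1"
  fi | LC_ALL=C sort   # for-in order is unspecified, so sort for stable output
}

sum_by_proto_port "$csv"
```

On the four sample rows this prints `HTTPS:443 => 11044`, `HTTPS:80 => 326`, and `TLS:443 => 106`. The totals are smaller than the ones above because only the head of the file is used.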