If you’ve got a directory full of Bro NSM logs, it’s easy to work with them in Apache Drill since they’re just tab-separated values (TSV) files by default. The most tedious part is mapping the columns to proper types and hopefully this saves at least one person from typing it out manually:
SELECT
TO_TIMESTAMP(CAST(columns[0] AS DOUBLE)) AS ts,
columns[1] AS uid,
columns[2] AS id_orig_h,
columns[3] AS id_orig_p,
columns[4] AS id_resp_h,
columns[5] AS id_resp_p,
columns[6] AS proto,
columns[7] AS service,
CAST( columns[8] AS DOUBLE) AS duration,
CAST( columns[9] AS INTEGER) AS orig_bytes,
CAST(columns[10] AS INTEGER) AS resp_bytes,
columns[11] AS conn_state,
columns[12] AS local_orig,
columns[13] AS local_resp,
CAST(columns[14] AS INTEGER) AS missed_bytes,
columns[15] AS history,
CAST(columns[16] AS INTEGER) AS orig_packets,
CAST(columns[17] AS INTEGER) AS orig_ip_bytes,
CAST(columns[18] AS INTEGER) AS resp_pkts,
CAST(columns[19] AS INTEGER) AS resp_ip_bytes,
columns[20] AS tunnel_parents
FROM dfs.brologs.`/201808/*`
You can either store them all under a single workspace with a default input type or soft-link/rename them to end in .tsv
(it’s unlikely you want to change all .log
files to be read as TSV everywhere).
While you could just use the logs this way, consider using CTAS to move them to Parquet. The above will created typed columns and the queries will generally be much faster.
One Trackback/Pingback
[…] *** This is a Security Bloggers Network syndicated blog from rud.is authored by hrbrmstr. Read the original post at: https://rud.is/b/2018/08/17/in-brief-using-bro-connection-logs-with-apache-drill/ […]