Category Archives: Information Security

AI Proofing Your IT/Cyber Career: The Human-Only Capabilities That Matter

2026-01-10 – 15:11
Posted in AI, Commentary, Cybersecurity, Data Analysis, data science, data wrangling, Development, Information Security, Leadership, LLM, Personal, Programming, Risk Assessment, Risk Management, Threat Intelligence, Vulnerabilities
Comments (1)

In the past ~4 weeks I have personally observed some irrefutable things in “AI” that are very likely going to cause massive shocks to employment models in IT, software development, systems administration, and cybersecurity. I know some have already seen minor shocks. They are nothing compared to what’s very probably ahead.

Nobody likely wants to hear this, but you absolutely need to make or take time this year to identify what you can do that AI cannot do and create some of those items if your list is short or empty.

The weavers in the 1800s used violence to get a 20-year pseudo-reprieve before they were pushed into obsolescence. We’ve got ~maybe 18 months. I’m as pushback-on-this-“AI”-thing as makes sense. I’d like for the bubble to burst. Even if it does, the rulers of our clicktatorship will just fuel a quick rebuild.

Four human-only capabilities in security

In my (broad) field, I think there are some things that make humans 110% necessary. Here’s my list — and it’d be great if folks in very subdomain-specific parts of cyber would provide similar ones. I try to stay in my lane.

1. Judgment under uncertainty with real consequences

These new “AI” systems can use tools to analyze a gazillion sessions and cluster payloads, but they do not (or absolutely should not) bear responsibility for the “we’re pulling the plug on production” decision at 3am. This “weight of consequence” shapes human expertise in ways that inform intuition, risk tolerance, and the ability to act decisively with incomplete information.

Organizations will continue needing people who can own outcomes, not just produce analysis.

2. Adversarial creativity and novel problem framing

The more recent “AI” systems are actually darn good at pattern matching against known patterns and recombining existing approaches. They absolutely suck at the “genuinely novel” — the attack vector nobody has documented, the defensive technique that requires understanding how a specific organization actually operates versus how it should operate.

The best security practitioners think like attackers in ways that go beyond “here are common TTPs.”

3. Institutional knowledge and relationship capital

A yuge one.

Understanding that the finance team always ignores security warnings — especially Dave — during quarter-close. That the legacy SCADA system can’t be patched because the vendor went bankrupt in 2019. That the CISO and CTO have a long-running disagreement about cloud migration.

This context shapes what recommendations are actually actionable. Many technically correct analyses are organizationally useless.

4. The ability to build and maintain trust

The biggest one.

When a breach happens, executives don’t want a report from an “AI”. They want someone who can look them in the eye, explain what happened, and take ownership of the path forward. The human element of security leadership is absolutely not going away.

How to develop these capabilities

Develop depth in areas that require your presence or legal accountability. Disciplines such as incident response, compliance attestation, or security architecture for air-gapped or classified environments. These have regulatory and practical barriers to full automation.

Build expertise in the seams between systems. Understanding how a given combination of legacy mainframe, cloud services, and OT environment actually interconnects requires the kind of institutional archaeology (or the powers of a sexton) that doesn’t exist in training data.

Get comfortable being the human in the loop. I know this will get me tapping mute or block a lot, but you’re going to need to get comfortable being the human in the loop for “AI”-augmented workflows. The analyst who can effectively direct tools, validate outputs (b/c these things will always make stuff up), and translate findings for different audiences has a different job than before but still a necessary one.

Learn to ask better questions. Bring your hypotheses, your domain expertise, and a sense of which threads are worth pulling to the table. That editorial judgment about what matters is undervalued, and is going to take a while to infuse into “AI” systems.

We’re all John Henry now

A year ago, even with long covid brain fog, I could out-“John Henry” all of the commercial AI models at programming, cyber, and writing tasks. Both in speed and quality.

Now, with the fog gone, I’m likely ~3 months away from being slower than “AI” on a substantial number of core tasks that it can absolutely do. I’ve seen it. I’ve validated the outputs. It sucks. It really really sucks. And it’s not because I’m feeble or have some other undisclosed brain condition (unlike 47). These systems are being curated to do exactly that: erase all of us John Henrys.

The folks who thrive will be those who can figure out what “AI” capabilities aren’t complete garbage and wield them with uniquely human judgment rather than competing on tasks where “AI” has clear advantages.

The pipeline problem

The very uncomfortable truth: there will be fewer entry-level positions that consist primarily of “look at alerts and escalate.” That pipeline into the field is narrowing at a frightening pace.

What concerns me most isn’t the senior practitioners. We’ll adapt and likely become that much more effective. It’s the junior folks who won’t get the years of pattern exposure that built our intuition in the first place.

That’s a pipeline problem the industry hasn’t seriously grappled with yet — and isn’t likely to b/c of the hot, thin air in the offices and boardrooms of myopic and greedy senior executives.

Are We Becoming Children of the MagentAI?

2025-12-20 – 09:14
Posted in Commentary, Cybersecurity, Development, Information Security, Programming, Software
Comments (2)

(If you’d prefer, you can skip the intro blathering and just download the full white paper)

Back in 1997, a commercial airline captain noticed his fellow pilots had a problem: they’d gotten so used to following the magenta flight path lines on their fancy new navigation screens that they were forgetting how to actually fly the damn plane. He called them “children of the magenta line.”

Fast forward to now, and I can’t shake the feeling we’re watching the same movie play out in tech, except the stakes are higher and there’s no regulatory body forcing us to maintain our skills.

Look, I’m not here to tell you AI is bad. I use these tools daily. They’re genuinely useful in limited contexts. But when Dario Amodei (the dude running Anthropic, the company building Claude) goes on record saying AI could wipe out half of all entry-level white-collar jobs in the next few years and push unemployment to 10-20%, maybe we should pay attention.

“We, as the producers of this technology, have a duty and an obligation to be honest about what is coming,” he told Axios. “I don’t think this is on people’s radar.”

He’s not wrong.

The Data’s Already Ugly

Here’s what caught my attention while pulling this together:

Software developer employment for the 22-25 age bracket? Down almost 20% since ChatGPT dropped. Meanwhile, developers over 30 are doing fine. We’re not replacing jobs—we’re eliminating the ladder people used to climb into them.

More than half of engineering leaders are planning to hire fewer juniors because AI lets their senior folks handle the load. AWS’s CEO called this “one of the dumbest things I’ve ever heard” and asked the obvious question: who exactly is going to know anything in ten years?

And my personal favorite: a controlled study found developers using AI tools took 19% longer to complete tasks—while genuinely believing they were 20% faster. That’s a 39-point gap between vibes and reality.

Oh, and a Replit AI agent deleted someone’s entire production database during an explicit code freeze, then tried to cover its tracks by fabricating thousands of fake records. Cool cool cool.

What I Actually Wrote

The full paper traces this from that 1997 pilot observation through Dan Geer’s 2015 warnings (the man saw this coming a decade early) to the current mess. I dug into:

What the research actually shows vs. what the hype claims
Where aviation’s lessons translate and where we’re in uncharted territory
The security implications of AI-generated code (spoiler: not great)
What orgs, industries, and policymakers can actually do about it

This isn’t a “burn it all down” screed. It’s an attempt to think clearly about a transition that’s moving faster than our institutions can adapt.

The window to shape how this goes is still open. Probably not for long.

Grab the full PDF. Read it, argue with it, tell me where I’m wrong and what I missed in the comments.

New CISA KEV MCP Server

2025-05-02 – 09:16
Posted in AI, Cybersecurity, GPT, Information Security, LLM, MCP
Leave a Comment

MCP servers let you wire up external services/APIs in a standard way for LLM/GPT tool-calling and other forms of automation.

I made a basic, but fairly comprehensive, CISA KEV MCP server that I go into a bit more detail about here.

To test it, I hammered out some questions to it in Claude Desktop (and in oterm with a local Ollama config, which you can see in the aforelinked post). You can read the whole session, which is in pictures below, at https://claude.ai/share/d73aa2be-a536-4c9d-977d-ea80ec6dce15, but these are some of those convos:

CVESky: Monitoring The Bluesky Jetstream For CVE Mentions

2024-12-04 – 06:36
Posted in bluesky, Cybersecurity, Information Security
Leave a Comment

I mentioned this new app over at the newsletter but it deserves a mention on the legacy blog.

CVESky is a tool to explore CVE chatter on Bluesky. At work, we’re ingesting the Bluesky Jetstream and watching for CVE chatter, excluding daft bots that just regurgitate new NVD CVEs.

There are six cards for the current and past five days of chatter, with CVEs displayed in descending order of activity. Tapping on a CVE provides details, and the ability to explore the CVE on Bluesky, Feedly, CIRCL’s Vuln Lookup, and — if present in our data — GreyNoise.
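
As a rough illustration of the counting behind those cards (this is not the actual CVESky code, which isn't shown here), a minimal base R sketch might pull CVE identifiers out of post text and rank them by mentions. The `posts` vector is made-up sample data, and the regular expression is an assumption about what counts as a CVE ID.

```r
# Hypothetical sample posts; real input would come from the Bluesky Jetstream.
posts <- c(
  "Patch now: CVE-2024-0001 is being exploited",
  "More on cve-2024-0001 and CVE-2023-12345",
  "No identifiers in this one"
)

# Assumed shape of a CVE ID: CVE-YYYY-NNNN, with four or more trailing digits.
cve_pattern <- "CVE-[0-9]{4}-[0-9]{4,}"

# Normalise case, then extract every match from every post.
upper_posts <- toupper(posts)
mentions <- unlist(regmatches(upper_posts, gregexpr(cve_pattern, upper_posts)))

# Tally mentions and sort in descending order of activity.
cve_counts <- sort(table(mentions), decreasing = TRUE)
print(cve_counts)
```

Filtering out the bot accounts mentioned above would need to happen before the tally; that step is left out because it depends on data this sketch doesn't have.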

At the bottom of the page is a 30-day heatmap of CVE chatter. Tap on any populated square to see all the Bluesky chatter for that CVE.

This is similar to, but slightly different from, the most excellent CVE Crowd, which monitors the Mastodonverse for CVE chatter.

The code behind the site also maintains a Bluesky list containing all the folks who chatter about CVEs on Bluesky.

Comments? Questions? Bugs? Feature requests? Hit up research@greynoise.io.

Avoid libwebp Electron Woes On macOS With positron

2023-09-30 – 16:45
Posted in Cybersecurity, Go, Golang, Information Security, macOS
Comments (2)

If you’ve got 👀 on this blog (directly, or via syndication) you’d have to have been living under a rock to not know about the libwebp supply chain disaster. An unfortunate casualty of inept programming just happened to be any app in the Electron ecosystem that doesn’t undergo bleeding-edge updates.

Former cow-orker Tom Sellers (one of the best humans in cyber) did a great service to the macOS user community with tips on how to stay safe on macOS. His find + strings + grep combo was superbly helpful and I hope many macOS users did the command line dance to see how negligent their app providers were/are.

But, you still have to know what versions are OK and which ones are not to do that dance. And, having had yet-another immune system invasion (thankfully, not COVID, again) on top of still working through long COVID (#protip: you may be over the pandemic, but I guarantee it’s not done with you/us for a while) which re-sapped mobility energy, I put my sedentary time to less woesome use by hacking together a small, Golang macOS CLI to help ferret out bad Electron-based apps you may have installed.

I named it positron, since that’s kind of the opposite of Electron, and I was pretty creativity-challenged today.

It does virtually the same thing as Tom’s strings and grep combo does, just in a single, lightweight, universal, signed macOS binary.

When I ran it after the final build, all my Electron-based apps were 🔴. After deleting some, and updating others, this is my current status:

$ find /Applications -type f -name "*Electron Framework*" -exec ./positron "{}" \;
/Applications/Signal.app: Chrome/114.0.5735.289 Electron/25.8.4 🟢
/Applications/Keybase.app: Chrome/87.0.4280.141 Electron/11.5.0 🔴
/Applications/Raindrop.io.app: Chrome/102.0.5005.167 Electron/19.0.17 🔴
/Applications/1Password.app: Chrome/114.0.5735.289 Electron/25.8.1 🟢
/Applications/Replit.app: Chrome/116.0.5845.188 Electron/26.2.1 🟢
/Applications/lghub.app: Chrome/104.0.5112.65 Electron/20.0.0 🔴

It’s still on you to do the find (cooler folks run fd) since I’m not about to write a program that’ll rummage across your SSDs or disc drives, but it does all the MachO inspection internally, and then also does the SemVer comparison to let you know which apps still suck at keeping you safe.
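
positron itself is written in Go and its source isn't reproduced here, so what follows is only a sketch of the version-comparison idea, written in base R to match the rest of the code on this blog. The `patched_minimums` table is an assumption (my recollection of the first patched Electron release on each supported major line) and should be checked against Electron's own advisories. `is_patched()` is a hypothetical helper, not part of positron.

```r
# ASSUMPTION: first patched Electron release per major line; verify before use.
patched_minimums <- c("22" = "22.3.24", "24" = "24.8.3", "25" = "25.8.1", "26" = "26.2.1")

# Hypothetical helper: TRUE when `version` is at or above the patched
# release for its major line.
is_patched <- function(version, minimums = patched_minimums) {
  major <- sub("\\..*$", "", version)
  if (!major %in% names(minimums)) {
    # Unlisted majors: only ones newer than every listed line count as patched.
    return(as.integer(major) > max(as.integer(names(minimums))))
  }
  numeric_version(version) >= numeric_version(minimums[[major]])
}

# Electron versions taken from the positron output above.
versions <- c(Signal = "25.8.4", Keybase = "11.5.0", Raindrop = "19.0.17",
              `1Password` = "25.8.1", Replit = "26.2.1", lghub = "20.0.0")
status <- vapply(versions, is_patched, logical(1))
print(status)
```

Comparing with `numeric_version()` rather than as strings matters because, as text, "19.0.17" would sort after "100.0.0".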

FWIW, the Keybase folks did accept a PR for the libwebp thing, but darned if I will spend any time building it (I don’t run it anymore, anyway, so I should just delete it).

The aforementioned signed, universal, macOS binary is in the GitLab releases.

Stay safe out there!

Acoustic: Solving a CyberDefenders PCAP SIP/RTP Challenge with R, Zeek, tshark (& friends)

2021-07-25 – 09:40
Posted in Cybersecurity, Data Analysis, data driven security, data wrangling, Information Security, pcap, R
Comments (3)

Hot on the heels of the previous CyberDefenders Challenge Solution comes this noisy installment which solves their Acoustic challenge.

You can find the source Rmd on GitHub, but I’m also testing the limits of WP’s markdown rendering and putting it in-stream as well.

There’s no long expository this time, since much of the setup and explanatory bits from the previous post apply here as well.

Acoustic

Convert the PCAP
Examine and Process log.txt
Process Zeek Logs
Process Packet Summary
What is the transport protocol being used?
The attacker used a bunch of scanning tools that belong to the same suite. Provide the name of the suite.
“What is the User-Agent of the victim system?”
Which tool was only used against the following extensions: 100, 101, 102, 103, and 111?
Which extension on the honeypot does NOT require authentication?
How many extensions were scanned in total?
There is a trace for a real SIP client. What is the corresponding user-agent? (two words, one space in between)
Multiple real-world phone numbers were dialed. Provide the first 11 digits of the number dialed from extension 101?
What are the default credentials used in the attempted basic authentication? (format is username:password)
Which codec does the RTP stream use? (3 words, 2 spaces in between)
How long is the sampling time (in milliseconds)?
What was the password for the account with username 555?
Which RTP packet header field can be used to reorder out of sync RTP packets in the correct sequence?
The trace includes a secret hidden message. Can you hear it?

This challenge takes us “into the world of voice communications on the internet. VoIP is becoming the de-facto standard for voice communication. As this technology becomes more common, malicious parties have more opportunities and stronger motives to control these systems to conduct nefarious activities. This challenge was designed to examine and explore some of the attributes of the SIP and RTP protocols.”

We have two files to work with:

log.txt which was generated from an unadvertised, passive honeypot located on the internet such that any traffic destined to it must be nefarious. Unknown parties scanned the honeypot with a range of tools, and this activity is represented in the log file.
- The IP address of the honeypot has been changed to “honey.pot.IP.removed”. In terms of geolocation, pick your favorite city.
- The MD5 hash in the authorization digest is replaced with “MD5_hash_removedXXXXXXXXXXXXXXXX”
- Some octets of external IP addresses have been replaced with an “X”
- Several trailing digits of phone numbers have been replaced with an “X”
- Assume the timestamps in the log files are UTC.
Voip-trace.pcap was created by honeynet members for this forensic challenge to allow participants to employ network analysis skills in the VOIP context.
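
Since the timestamps are to be read as UTC, here is a minimal base R sketch of parsing them that way. The two `Datetime:` values are copied from the `log.txt` excerpt further down; the variable names are just for illustration.

```r
# Datetime values copied from the first two records of log.txt shown in this post.
stamps <- c("2010-05-02 01:43:05.606584", "2010-05-02 01:43:12.488811")

# %OS accepts fractional seconds; tz = "UTC" follows the challenge's instruction.
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")

# Seconds elapsed between the two records.
gap <- as.numeric(difftime(parsed[2], parsed[1], units = "secs"))

print(parsed)
print(gap)
```

Setting `tz` explicitly avoids R quietly interpreting the strings in the local time zone of whatever machine runs the analysis.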

There are 14 questions to answer.

If you are not familiar with SIP and/or RTP you should do a bit of research first. A good place to start is RFC 3261 (for SIP) and RFC 3550 (for RTP). Some questions may be answerable just by knowing the details of these protocols.

Convert the PCAP

library(stringi)
library(tidyverse)

We’ll pre-generate Zeek logs. The -C tells Zeek to not bother with checksums, -r tells it to read from a file and the LogAscii::use_json=T means we want JSON output vs the default delimited files. JSON gives us data types (the headers in the delimited files do as well, but we’d have to write something to read those types then deal with it vs get this for free out of the box with JSON).

system("ZEEK_LOG_SUFFIX=json /opt/zeek/bin/zeek -C -r src/Voip-trace.pcap LogAscii::use_json=T HTTP::default_capture_password=T")

We process the PCAP twice with tshark. Once to get the handy (and small) packet summary table, then dump the whole thing to JSON. We may need to run tshark again down the road a bit.

system("tshark -T tabs -r src/Voip-trace.pcap > voip-packets.tsv")
system("tshark -T json -r src/Voip-trace.pcap > voip-trace")

Examine and Process `log.txt`

We aren’t told what format log.txt is in, so let’s take a look:

cd_sip_log <- stri_read_lines("src/log.txt")

cat(head(cd_sip_log, 25), sep="\n")
## Source: 210.184.X.Y:1083
## Datetime: 2010-05-02 01:43:05.606584
## 
## Message:
## 
## OPTIONS sip:100@honey.pot.IP.removed SIP/2.0
## Via: SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;rport
## Content-Length: 0
## From: "sipvicious"<sip:100@1.1.1.1>; tag=X_removed
## Accept: application/sdp
## User-Agent: friendly-scanner
## To: "sipvicious"<sip:100@1.1.1.1>
## Contact: sip:100@127.0.0.1:5061
## CSeq: 1 OPTIONS
## Call-ID: 845752980453913316694142
## Max-Forwards: 70
## 
## 
## 
## 
## -------------------------
## Source: 210.184.X.Y:4956
## Datetime: 2010-05-02 01:43:12.488811
## 
## Message:

These look a bit like HTTP server responses, but we know we’re working in SIP land and if you perused the RFC you’d have noticed that SIP is an HTTP-like ASCII protocol. While some HTTP response parsers might work on these records, it’s pretty straightforward to whip up a bespoke pseudo-parser.
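
To make the "HTTP-like" point concrete, here is a small base R sketch that splits one request line and one header line taken from the first record above. It is only an illustration of the message shape, not a general SIP parser, and the variable names are made up for the example.

```r
# Request line copied from the first record in log.txt.
request <- "OPTIONS sip:100@honey.pot.IP.removed SIP/2.0"

# Method, request URI and protocol version are separated by single spaces.
parts <- strsplit(request, " ", fixed = TRUE)[[1]]
request_line <- list(method = parts[1], uri = parts[2], version = parts[3])

# Header lines use "Name: value", the same shape as HTTP headers.
header <- "User-Agent: friendly-scanner"
kv <- regmatches(header, regexec("^([^:]+):[[:space:]]*(.*)$", header))[[1]]
header_pair <- setNames(kv[3], tolower(kv[2]))

str(request_line)
print(header_pair)
```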

Let’s see how many records there are by counting the number of “Message:” lines (we’re doing this, primarily, to see if we should use the {furrr} package to speed up processing):

cd_sip_log[stri_detect_fixed(cd_sip_log, "Message:")] %>%
  table()
## .
## Message: 
##     4266

There aren’t many, so we’ll avoid parallel processing the data and just use a single thread.

One way to tackle the parsing is to look for the stop and start of each record, extract fields (these have similar formats to HTTP headers), and perhaps have to extract content as well. We know this because there are “Content-Length:” fields. According to the RFC they are supposed to exist for every message. Let’s first see if any “Content-Length:” header records are greater than 0. We’ll do this with a little help from the ripgrep utility as it provides a way to see context before and/or after matched patterns:

cat(system('rg --after-context=10 "^Content-Length: [^0]" src/log.txt', intern=TRUE), sep="\n")
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000
## --
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000
## --
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000
## --
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000
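
If ripgrep isn't installed, a similar check can be sketched in base R. The short `log_lines` vector below is a made-up stand-in for `cd_sip_log`, and the two lines of trailing context are an arbitrary choice for the example.

```r
# Hypothetical stand-in for the lines read from src/log.txt.
log_lines <- c(
  "Content-Length: 0",
  "",
  "Content-Length: 330",
  "",
  "v=0",
  "s=Zoiper_session"
)

# Pull the numeric value out of each Content-Length header line.
is_len <- grepl("^Content-Length:", log_lines)
body_lengths <- as.integer(sub("^Content-Length:[[:space:]]*", "", log_lines[is_len]))

# Line numbers of the headers that announce a non-empty body.
with_body <- which(is_len)[body_lengths > 0]
print(with_body)

# Show each hit with trailing context, similar in spirit to rg --after-context.
for (i in with_body) {
  cat(log_lines[i:min(i + 2, length(log_lines))], sep = "\n")
}
```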

So, we do need to account for content. It’s still pretty straightforward (explanatory comments inline):

starts <- which(stri_detect_regex(cd_sip_log, "^Source:"))
stops <- which(stri_detect_regex(cd_sip_log, "^----------"))

map2_dfr(starts, stops, ~{

  raw_rec <- stri_trim_both(cd_sip_log[.x:.y]) # target the record from the log
  raw_rec <- raw_rec[raw_rec != "-------------------------"] # remove separator

  msg_idx <- which(stri_detect_regex(raw_rec, "^Message:")) # find where "Message:" line is
  source_idx <- which(stri_detect_regex(raw_rec, "^Source: ")) # find where "Source:" line is
  datetime_idx <- which(stri_detect_regex(raw_rec, "^Datetime: ")) # find where "Datetime:" line is
  contents_idx <- which(stri_detect_regex(raw_rec[(msg_idx+2):length(raw_rec)], "^$"))[1] + 2 # get position of the "data"

  source <- stri_match_first_regex(raw_rec[source_idx], "^Source: (.*)$")[,2] # extract source
  datetime <- stri_match_first_regex(raw_rec[datetime_idx], "^Datetime: (.*)$")[,2] # extract datetime
  request <- raw_rec[msg_idx+2] # extract request line

  # build a matrix out of the remaining headers. header key will be in column 2, value will be in column 3
  tmp <- stri_match_first_regex(raw_rec[(msg_idx+3):contents_idx], "^([^:]+):[[:space:]]+(.*)$")
  tmp[,2] <- stri_trans_tolower(tmp[,2]) # lowercase the header key
  tmp[,2] <- stri_replace_all_fixed(tmp[,2], "-", "_") # turn dashes to underscores so we can more easily use the keys as column names

  contents <- raw_rec[(contents_idx+1):length(raw_rec)]
  contents <- paste0(contents[contents != ""], collapse = "\n")

  as.list(tmp[,3]) %>% # turn the header values into a list
    set_names(tmp[,2]) %>% # make their names the transformed keys
    append(c(
      source = source, # add source to the list (etc)
      datetime = datetime,
      request = request,
      contents = contents
    ))

}) -> sip_log_parsed

Let’s see what we have:

sip_log_parsed
## # A tibble: 4,266 x 18
##    via     content_length from    accept  user_agent to     contact cseq  source
##    <chr>   <chr>          <chr>   <chr>   <chr>      <chr>  <chr>   <chr> <chr> 
##  1 SIP/2.… 0              "\"sip… applic… friendly-… "\"si… sip:10… 1 OP… 210.1…
##  2 SIP/2.… 0              "\"342… applic… friendly-… "\"34… sip:34… 1 RE… 210.1…
##  3 SIP/2.… 0              "\"172… applic… friendly-… "\"17… sip:17… 1 RE… 210.1…
##  4 SIP/2.… 0              "\"adm… applic… friendly-… "\"ad… sip:ad… 1 RE… 210.1…
##  5 SIP/2.… 0              "\"inf… applic… friendly-… "\"in… sip:in… 1 RE… 210.1…
##  6 SIP/2.… 0              "\"tes… applic… friendly-… "\"te… sip:te… 1 RE… 210.1…
##  7 SIP/2.… 0              "\"pos… applic… friendly-… "\"po… sip:po… 1 RE… 210.1…
##  8 SIP/2.… 0              "\"sal… applic… friendly-… "\"sa… sip:sa… 1 RE… 210.1…
##  9 SIP/2.… 0              "\"ser… applic… friendly-… "\"se… sip:se… 1 RE… 210.1…
## 10 SIP/2.… 0              "\"sup… applic… friendly-… "\"su… sip:su… 1 RE… 210.1…
## # … with 4,256 more rows, and 9 more variables: datetime <chr>, request <chr>,
## #   contents <chr>, call_id <chr>, max_forwards <chr>, expires <chr>,
## #   allow <chr>, authorization <chr>, content_type <chr>

glimpse(sip_log_parsed)
## Rows: 4,266
## Columns: 18
## $ via            <chr> "SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;r…
## $ content_length <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", …
## $ from           <chr> "\"sipvicious\"<sip:100@1.1.1.1>; tag=X_removed", "\"34…
## $ accept         <chr> "application/sdp", "application/sdp", "application/sdp"…
## $ user_agent     <chr> "friendly-scanner", "friendly-scanner", "friendly-scann…
## $ to             <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "\"3428948518\"<sip:…
## $ contact        <chr> "sip:100@127.0.0.1:5061", "sip:3428948518@honey.pot.IP.…
## $ cseq           <chr> "1 OPTIONS", "1 REGISTER", "1 REGISTER", "1 REGISTER", …
## $ source         <chr> "210.184.X.Y:1083", "210.184.X.Y:4956", "210.184.X.Y:51…
## $ datetime       <chr> "2010-05-02 01:43:05.606584", "2010-05-02 01:43:12.4888…
## $ request        <chr> "OPTIONS sip:100@honey.pot.IP.removed SIP/2.0", "REGIST…
## $ contents       <chr> "Call-ID: 845752980453913316694142\nMax-Forwards: 70", …
## $ call_id        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ max_forwards   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ expires        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ allow          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ authorization  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ content_type   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Looks 👍, but IRL there are edge-cases we’d have to deal with.
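
One such edge case is visible in the `glimpse()` output: for the first record, `contents` holds the `Call-ID` and `Max-Forwards` lines while `call_id` and `max_forwards` are `NA`. That appears to happen because the blank-line position is found relative to the request line and is then used as an index into the whole record. The base R sketch below shows one way to convert that position to an absolute index; it is an illustration using the first record from the log, not a drop-in replacement for the parser above.

```r
# First record from log.txt, as shown earlier (separator already removed).
raw_rec <- c(
  "Source: 210.184.X.Y:1083",
  "Datetime: 2010-05-02 01:43:05.606584",
  "",
  "Message:",
  "",
  "OPTIONS sip:100@honey.pot.IP.removed SIP/2.0",
  "Via: SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;rport",
  "Content-Length: 0",
  "From: \"sipvicious\"<sip:100@1.1.1.1>; tag=X_removed",
  "Accept: application/sdp",
  "User-Agent: friendly-scanner",
  "To: \"sipvicious\"<sip:100@1.1.1.1>",
  "Contact: sip:100@127.0.0.1:5061",
  "CSeq: 1 OPTIONS",
  "Call-ID: 845752980453913316694142",
  "Max-Forwards: 70",
  "", "", "", ""
)

msg_idx <- which(raw_rec == "Message:")
request_idx <- msg_idx + 2  # the request line sits two lines below "Message:"

# Position of the first blank line after the request line, relative to request_idx.
rel_blank <- which(raw_rec[request_idx:length(raw_rec)] == "")[1]
# Convert that relative position to an absolute index into raw_rec.
blank_idx <- request_idx + rel_blank - 1

# Headers are everything between the request line and that blank line.
header_lines <- raw_rec[(request_idx + 1):(blank_idx - 1)]
header_names <- tolower(sub(":.*$", "", header_lines))
print(header_names)
```

A record with no blank line after its headers would leave `rel_blank` as `NA`, so a real parser would still need a guard for that.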

Process Zeek Logs

Because they’re JSON files, and the names are reasonable, we can do some magic incantations to read them all in and shove them into a list we’ll call zeek:

zeek <- list()

list.files(
  pattern = "json$",
  full.names = TRUE
) %>%
  walk(~{
    append(zeek, list(file(.x) %>% 
      jsonlite::stream_in(verbose = FALSE) %>%
      as_tibble()) %>% 
        set_names(tools::file_path_sans_ext(basename(.x)))
    ) ->> zeek
  })

str(zeek, 1)
## List of 7
##  $ conn         : tibble [97 × 18] (S3: tbl_df/tbl/data.frame)
##  $ dpd          : tibble [1 × 9] (S3: tbl_df/tbl/data.frame)
##  $ files        : tibble [38 × 16] (S3: tbl_df/tbl/data.frame)
##  $ http         : tibble [92 × 24] (S3: tbl_df/tbl/data.frame)
##  $ packet_filter: tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
##  $ sip          : tibble [9 × 23] (S3: tbl_df/tbl/data.frame)
##  $ weird        : tibble [1 × 9] (S3: tbl_df/tbl/data.frame)

walk2(names(zeek), zeek, ~{
  cat("File:", .x, "\n")
  glimpse(.y)
  cat("\n\n")
})
## File: conn 
## Rows: 97
## Columns: 18
## $ ts            <dbl> 1272737631, 1272737581, 1272737669, 1272737669, 12727376…
## $ uid           <chr> "Cb0OAQ1eC0ZhQTEKNl", "C2s0IU2SZFGVlZyH43", "CcEeLRD3cca…
## $ id.orig_h     <chr> "172.25.105.43", "172.25.105.43", "172.25.105.43", "172.…
## $ id.orig_p     <int> 57086, 5060, 57087, 57088, 57089, 57090, 57091, 57093, 5…
## $ id.resp_h     <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "172.…
## $ id.resp_p     <int> 80, 5060, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80…
## $ proto         <chr> "tcp", "udp", "tcp", "tcp", "tcp", "tcp", "tcp", "tcp", …
## $ service       <chr> "http", "sip", "http", "http", "http", "http", "http", "…
## $ duration      <dbl> 0.0180180073, 0.0003528595, 0.0245900154, 0.0740420818, …
## $ orig_bytes    <int> 502, 428, 380, 385, 476, 519, 520, 553, 558, 566, 566, 5…
## $ resp_bytes    <int> 720, 518, 231, 12233, 720, 539, 17499, 144, 144, 144, 14…
## $ conn_state    <chr> "SF", "SF", "SF", "SF", "SF", "SF", "SF", "SF", "SF", "S…
## $ missed_bytes  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ history       <chr> "ShADadfF", "Dd", "ShADadfF", "ShADadfF", "ShADadfF", "S…
## $ orig_pkts     <int> 5, 1, 5, 12, 5, 6, 16, 6, 6, 5, 5, 5, 5, 5, 5, 5, 6, 5, …
## $ orig_ip_bytes <int> 770, 456, 648, 1017, 744, 839, 1360, 873, 878, 834, 834,…
## $ resp_pkts     <int> 5, 1, 5, 12, 5, 5, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ resp_ip_bytes <int> 988, 546, 499, 12865, 988, 807, 18339, 412, 412, 412, 41…
## 
## 
## File: dpd 
## Rows: 1
## Columns: 9
## $ ts             <dbl> 1272737798
## $ uid            <chr> "CADvMziC96POynR2e"
## $ id.orig_h      <chr> "172.25.105.3"
## $ id.orig_p      <int> 43204
## $ id.resp_h      <chr> "172.25.105.40"
## $ id.resp_p      <int> 5060
## $ proto          <chr> "udp"
## $ analyzer       <chr> "SIP"
## $ failure_reason <chr> "Binpac exception: binpac exception: string mismatch at…
## 
## 
## File: files 
## Rows: 38
## Columns: 16
## $ ts             <dbl> 1272737631, 1272737669, 1272737676, 1272737688, 1272737…
## $ fuid           <chr> "FRnb7P5EDeZE4Y3z4", "FOT2gC2yLxjfMCuE5f", "FmUCuA3dzcS…
## $ tx_hosts       <list> "172.25.105.40", "172.25.105.40", "172.25.105.40", "17…
## $ rx_hosts       <list> "172.25.105.43", "172.25.105.43", "172.25.105.43", "17…
## $ conn_uids      <list> "Cb0OAQ1eC0ZhQTEKNl", "CFfYtA0DqqrJk4gI5", "CHN4qA4UUH…
## $ source         <chr> "HTTP", "HTTP", "HTTP", "HTTP", "HTTP", "HTTP", "HTTP",…
## $ depth          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ analyzers      <list> [], [], [], [], [], [], [], [], [], [], [], [], [], []…
## $ mime_type      <chr> "text/html", "text/html", "text/html", "text/html", "te…
## $ duration       <dbl> 0.000000e+00, 8.920908e-03, 0.000000e+00, 0.000000e+00,…
## $ is_orig        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, …
## $ seen_bytes     <int> 479, 11819, 479, 313, 17076, 55, 50, 30037, 31608, 1803…
## $ total_bytes    <int> 479, NA, 479, 313, NA, 55, 50, NA, NA, NA, 58, 313, 50,…
## $ missing_bytes  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ overflow_bytes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ timedout       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
## 
## 
## File: http 
## Rows: 92
## Columns: 24
## $ ts                <dbl> 1272737631, 1272737669, 1272737669, 1272737676, 1272…
## $ uid               <chr> "Cb0OAQ1eC0ZhQTEKNl", "CcEeLRD3cca3j4QGh", "CFfYtA0D…
## $ id.orig_h         <chr> "172.25.105.43", "172.25.105.43", "172.25.105.43", "…
## $ id.orig_p         <int> 57086, 57087, 57088, 57089, 57090, 57091, 57093, 570…
## $ id.resp_h         <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "…
## $ id.resp_p         <int> 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, …
## $ trans_depth       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ method            <chr> "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GE…
## $ host              <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "…
## $ uri               <chr> "/maint", "/", "/user/", "/maint", "/maint", "/maint…
## $ referrer          <chr> "http://172.25.105.40/user/", NA, NA, "http://172.25…
## $ version           <chr> "1.1", "1.1", "1.1", "1.1", "1.1", "1.1", "1.1", "1.…
## $ user_agent        <chr> "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9)…
## $ request_body_len  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ response_body_len <int> 479, 0, 11819, 479, 313, 17076, 0, 0, 0, 0, 0, 0, 0,…
## $ status_code       <int> 401, 302, 200, 401, 301, 200, 304, 304, 304, 304, 30…
## $ status_msg        <chr> "Authorization Required", "Found", "OK", "Authorizat…
## $ tags              <list> [], [], [], [], [], [], [], [], [], [], [], [], [],…
## $ resp_fuids        <list> "FRnb7P5EDeZE4Y3z4", <NULL>, "FOT2gC2yLxjfMCuE5f", …
## $ resp_mime_types   <list> "text/html", <NULL>, "text/html", "text/html", "tex…
## $ username          <chr> NA, NA, NA, NA, "maint", "maint", "maint", "maint", …
## $ password          <chr> NA, NA, NA, NA, "password", "password", "password", …
## $ orig_fuids        <list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NU…
## $ orig_mime_types   <list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NU…
## 
## 
## File: packet_filter 
## Rows: 1
## Columns: 5
## $ ts      <dbl> 1627151196
## $ node    <chr> "zeek"
## $ filter  <chr> "ip or not ip"
## $ init    <lgl> TRUE
## $ success <lgl> TRUE
## 
## 
## File: sip 
## Rows: 9
## Columns: 23
## $ ts                <dbl> 1272737581, 1272737768, 1272737768, 1272737768, 1272…
## $ uid               <chr> "C2s0IU2SZFGVlZyH43", "CADvMziC96POynR2e", "CADvMziC…
## $ id.orig_h         <chr> "172.25.105.43", "172.25.105.3", "172.25.105.3", "17…
## $ id.orig_p         <int> 5060, 43204, 43204, 43204, 43204, 43204, 43204, 4320…
## $ id.resp_h         <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "…
## $ id.resp_p         <int> 5060, 5060, 5060, 5060, 5060, 5060, 5060, 5060, 5060
## $ trans_depth       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ method            <chr> "OPTIONS", "REGISTER", "REGISTER", "SUBSCRIBE", "SUB…
## $ uri               <chr> "sip:100@172.25.105.40", "sip:172.25.105.40", "sip:1…
## $ request_from      <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "<sip:555@172.25.…
## $ request_to        <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "<sip:555@172.25.…
## $ response_from     <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "<sip:555@172.25.…
## $ response_to       <chr> "\"sipvicious\"<sip:100@1.1.1.1>;tag=as18cdb0c9", "<…
## $ call_id           <chr> "61127078793469957194131", "MzEwMmYyYWRiYTUxYTBhODY3…
## $ seq               <chr> "1 OPTIONS", "1 REGISTER", "2 REGISTER", "1 SUBSCRIB…
## $ request_path      <list> "SIP/2.0/UDP 127.0.1.1:5060", "SIP/2.0/UDP 172.25.10…
## $ response_path     <list> "SIP/2.0/UDP 127.0.1.1:5060", "SIP/2.0/UDP 172.25.10…
## $ user_agent        <chr> "UNfriendly-scanner - for demo purposes", "X-Lite B…
## $ status_code       <int> 200, 401, 200, 401, 404, 401, 100, 200, NA
## $ status_msg        <chr> "OK", "Unauthorized", "OK", "Unauthorized", "Not fo…
## $ request_body_len  <int> 0, 0, 0, 0, 0, 264, 264, 264, 0
## $ response_body_len <int> 0, 0, 0, 0, 0, 0, 0, 302, NA
## $ content_type      <chr> NA, NA, NA, NA, NA, NA, NA, "application/sdp", NA
## 
## 
## File: weird 
## Rows: 1
## Columns: 9
## $ ts        <dbl> 1272737805
## $ id.orig_h <chr> "172.25.105.3"
## $ id.orig_p <int> 0
## $ id.resp_h <chr> "172.25.105.40"
## $ id.resp_p <int> 0
## $ name      <chr> "truncated_IPv6"
## $ notice    <lgl> FALSE
## $ peer      <chr> "zeek"
## $ source    <chr> "IP"

Process Packet Summary

We won’t process the big JSON file tshark generated for us until we really have to, but we can read in the packet summary table now:

packet_cols <- c("packet_num", "ts", "src", "discard", "dst", "proto", "length", "info")

read_tsv(
  file = "voip-packets.tsv",
  col_names = packet_cols,
  col_types = "ddccccdc"
) %>%
  select(-discard) -> packets

packets
## # A tibble: 4,447 x 7
##    packet_num       ts src      dst     proto length info                       
##         <dbl>    <dbl> <chr>    <chr>   <chr>  <dbl> <chr>                      
##  1          1  0       172.25.… 172.25… SIP      470 Request: OPTIONS sip:100@1…
##  2          2  3.53e-4 172.25.… 172.25… SIP      560 Status: 200 OK |           
##  3          3  5.03e+1 172.25.… 172.25… TCP       74 57086 → 80 [SYN] Seq=0 Win…
##  4          4  5.03e+1 172.25.… 172.25… TCP       74 80 → 57086 [SYN, ACK] Seq=…
##  5          5  5.03e+1 172.25.… 172.25… TCP       66 57086 → 80 [ACK] Seq=1 Ack…
##  6          6  5.03e+1 172.25.… 172.25… HTTP     568 GET /maint HTTP/1.1        
##  7          7  5.03e+1 172.25.… 172.25… TCP       66 80 → 57086 [ACK] Seq=1 Ack…
##  8          8  5.03e+1 172.25.… 172.25… HTTP     786 HTTP/1.1 401 Authorization…
##  9          9  5.03e+1 172.25.… 172.25… TCP       66 80 → 57086 [FIN, ACK] Seq=…
## 10         10  5.03e+1 172.25.… 172.25… TCP       66 57086 → 80 [ACK] Seq=503 A…
## # … with 4,437 more rows

glimpse(packets)
## Rows: 4,447
## Columns: 7
## $ packet_num <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ ts         <dbl> 0.000000, 0.000353, 50.317176, 50.317365, 50.320071, 50.329…
## $ src        <chr> "172.25.105.43", "172.25.105.40", "172.25.105.43", "172.25.…
## $ dst        <chr> "172.25.105.40", "172.25.105.43", "172.25.105.40", "172.25.…
## $ proto      <chr> "SIP", "SIP", "TCP", "TCP", "TCP", "HTTP", "TCP", "HTTP", "…
## $ length     <dbl> 470, 560, 74, 74, 66, 568, 66, 786, 66, 66, 66, 66, 74, 74,…
## $ info       <chr> "Request: OPTIONS sip:100@172.25.105.40 |", "Status: 200 OK…

What is the transport protocol being used?

SIP can use TCP or UDP and which transport it uses will be specified in the Via: header. Let’s take a look:

head(sip_log_parsed$via)
## [1] "SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;rport"
## [2] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-1189344537;rport"
## [3] "SIP/2.0/UDP 127.0.0.1:5066;branch=z9hG4bK-2119091576;rport"
## [4] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-3226446220;rport"
## [5] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-1330901245;rport"
## [6] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-945386205;rport"

Are they all UDP? We can find out by performing some light processing on the via column:

sip_log_parsed %>% 
  select(via) %>% 
  mutate(
    transport = stri_match_first_regex(via, "^([^[:space:]]+)")[,2]
  ) %>% 
  count(transport, sort=TRUE)
## # A tibble: 1 x 2
##   transport       n
##   <chr>       <int>
## 1 SIP/2.0/UDP  4266

Looks like they’re all UDP. Question 1: ✅
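To make the pattern matching a bit more concrete, here is a minimal sketch that applies the same idea to two of the Via values printed earlier. It uses only base R (an assumption made so it runs without stringi), and the `via`, `sent_protocol`, and `transport` names exist only for this example:

```r
# Two Via header values copied from the output above
via <- c(
  "SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;rport",
  "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-1189344537;rport"
)

# Keep everything before the first whitespace: protocol/version/transport
sent_protocol <- sub("^([^[:space:]]+).*$", "\\1", via)

# The transport is the last "/"-separated piece of that token
transport <- sub("^.*/", "", sent_protocol)

table(transport)
```

Splitting off the final piece is optional for this question, but it makes a mixed TCP/UDP capture easier to tabulate.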

The attacker used a bunch of scanning tools that belong to the same suite. Provide the name of the suite.

Don’t you, now, wish you had listened to your parents when they were telling you about the facts of SIP life when you were a wee pup?

We’ll stick with the SIP log to answer this one and peek back at the RFC to see that there’s a “User-Agent:” field which contains information about the client originating the request. Most scanners written by defenders identify themselves in User-Agent fields when those fields are available in a protocol exchange, and a large percentage of naive malicious folks are too daft to change this value (or leave it default to make you think they’re not behaving badly).

If you are a regular visitor to SIP land, you likely know the common SIP scanning tools. These are a few:

Nmap’s SIP library
Mr.SIP, a “SIP-Based Audit and Attack Tool”
SIPVicious, a “set of security tools that can be used to audit SIP based VoIP systems”
Sippts, a “set of tools to audit SIP based VoIP Systems”

(There are many more.)

Let’s see what user-agent was used in this log extract:

count(sip_log_parsed, user_agent, sort=TRUE)
## # A tibble: 3 x 2
##   user_agent           n
##   <chr>            <int>
## 1 friendly-scanner  4248
## 2 Zoiper rev.6751     14
## 3 <NA>                 4

The overwhelming majority are friendly-scanner. Let’s look at a few of those log entries:

sip_log_parsed %>% 
  filter(
    user_agent == "friendly-scanner"
  ) %>% 
  glimpse()
## Rows: 4,248
## Columns: 18
## $ via            <chr> "SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;r…
## $ content_length <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", …
## $ from           <chr> "\"sipvicious\"<sip:100@1.1.1.1>; tag=X_removed", "\"34…
## $ accept         <chr> "application/sdp", "application/sdp", "application/sdp"…
## $ user_agent     <chr> "friendly-scanner", "friendly-scanner", "friendly-scann…
## $ to             <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "\"3428948518\"<sip:…
## $ contact        <chr> "sip:100@127.0.0.1:5061", "sip:3428948518@honey.pot.IP.…
## $ cseq           <chr> "1 OPTIONS", "1 REGISTER", "1 REGISTER", "1 REGISTER", …
## $ source         <chr> "210.184.X.Y:1083", "210.184.X.Y:4956", "210.184.X.Y:51…
## $ datetime       <chr> "2010-05-02 01:43:05.606584", "2010-05-02 01:43:12.4888…
## $ request        <chr> "OPTIONS sip:100@honey.pot.IP.removed SIP/2.0", "REGIST…
## $ contents       <chr> "Call-ID: 845752980453913316694142\nMax-Forwards: 70", …
## $ call_id        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ max_forwards   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ expires        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ allow          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ authorization  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ content_type   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Those from and to fields have an interesting name in them: “sipvicious”. You’ve seen that before, right at the beginning of this section.

Let’s do a quick check over at the SIPvicious repo just to make sure.

count(sip_log_parsed, user_agent)
## # A tibble: 3 x 2
##   user_agent           n
##   <chr>            <int>
## 1 friendly-scanner  4248
## 2 Zoiper rev.6751     14
## 3 <NA>                 4

“What is the User-Agent of the victim system?”

We only have partial data in the text log so we’ll have to look elsewhere (the PCAP) for this information. The “victim” is whatever was the target of this SIP-based attack, and we can look for SIP messages, user agents, and associated IPs in the PCAP thanks to tshark’s rich SIP filter library:

system("tshark -Q -T fields -e ip.src -e ip.dst -e sip.User-Agent -r src/Voip-trace.pcap 'sip.User-Agent'")

That first exchange is all we really need. We see our rude poker talking to 172.25.105.40 and it responding right after.

Which tool was only used against the following extensions: 100, 101, 102, 103, and 111?

The question is a tad vague and is assuming — since we now know the SIPvicious suite was used — that we also know to provide the name of the Python script in SIPvicious that was used. There are five tools:

svmap: this is a sip scanner. When launched against ranges of ip address space, it will identify any SIP servers which it finds on the way. Also has the option to scan hosts on ranges of ports. Usage: https://github.com/EnableSecurity/sipvicious/wiki/SVMap-Usage
svwar: identifies working extension lines on a PBX. A working extension is one that can be registered. Also tells you if the extension line requires authentication or not. Usage: https://github.com/EnableSecurity/sipvicious/wiki/SVWar-Usage
svcrack: a password cracker making use of digest authentication. It is able to crack passwords on both registrar servers and proxy servers. Current cracking modes are either numeric ranges or words from dictionary files. Usage: https://github.com/EnableSecurity/sipvicious/wiki/SVCrack-Usage
svreport: able to manage sessions created by the rest of the tools and export to pdf, xml, csv and plain text. Usage: https://github.com/EnableSecurity/sipvicious/wiki/SVReport-Usage
svcrash: responds to svwar and svcrack SIP messages with a message that causes old versions to crash. Usage: https://github.com/EnableSecurity/sipvicious/wiki/SVCrash-FAQ

The svcrash tool is something defenders can use to help curtail scanner activity. We can cross that off the list. The svreport tool is for working with data generated by svmap, svwar and/or svcrack. One more crossed off. We also know that the attacker scanned the SIP network looking for nodes, which means svmap and svwar were likely not used exclusively against the target extensions. (We technically have enough information right now to answer the question, especially if you look carefully at the answer box on the site, but that’s cheating.)

The SIP request line and header fields like “To:” contain destination information in the form of a SIP URI. Since we only care about the extension component of the URI for this question, we can use a regular expression to isolate it.

Back to the SIP log to see if we can find the identified extensions. We’ll also process the “From:” header just in case we need it.

sip_log_parsed %>% 
  mutate_at(
    vars(request, from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  select(request, from, to)
## # A tibble: 4,266 x 3
##    request    from       to        
##    <chr>      <chr>      <chr>     
##  1 100        100        100       
##  2 3428948518 3428948518 3428948518
##  3 1729240413 1729240413 1729240413
##  4 admin      admin      admin     
##  5 info       info       info      
##  6 test       test       test      
##  7 postmaster postmaster postmaster
##  8 sales      sales      sales     
##  9 service    service    service   
## 10 support    support    support   
## # … with 4,256 more rows

That worked! We can now see which extensions friendly-scanner attempted to authenticate against:

sip_log_parsed %>%
  mutate_at(
    vars(request, from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  filter(
    user_agent == "friendly-scanner",
    stri_detect_fixed(contents, "Authorization")
  ) %>% 
  distinct(to)
## # A tibble: 4 x 1
##   to   
##   <chr>
## 1 102  
## 2 103  
## 3 101  
## 4 111

While we’re missing 100 that’s likely due to it not requiring authentication (svcrack will REGISTER first to determine if a target requires authentication and not send cracking requests if it doesn’t).

Which extension on the honeypot does NOT require authentication?

We know this due to what we found in the previous question. Extension 100 does not require authentication.

How many extensions were scanned in total?

We just need to count the distinct to’s where the user agent is the scanner:

sip_log_parsed %>% 
  mutate_at(
    vars(request, from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  filter(
    user_agent == "friendly-scanner"
  ) %>% 
  distinct(to)
## # A tibble: 2,652 x 1
##    to        
##    <chr>     
##  1 100       
##  2 3428948518
##  3 1729240413
##  4 admin     
##  5 info      
##  6 test      
##  7 postmaster
##  8 sales     
##  9 service   
## 10 support   
## # … with 2,642 more rows
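For readers who want to see the extraction and the distinct count in isolation, here is a small base R sketch. The first value is taken from the log above; the other `to` values are hypothetical, written in the same style purely for illustration:

```r
# Header values in the style of the log above (mostly hypothetical)
to <- c(
  "\"sipvicious\"<sip:100@1.1.1.1>",
  "\"3428948518\"<sip:3428948518@1.1.1.1>",
  "\"admin\"<sip:admin@1.1.1.1>",
  "\"100\"<sip:100@1.1.1.1>"
)

# Capture whatever sits between "sip:" and the "@"
m <- regmatches(to, regexec("sip:([^@]+)@", to))

# Element 1 is the full match, element 2 is the captured extension
extension <- vapply(m, function(x) x[2], character(1))

extension
length(unique(extension))
```

The count of unique values is what the `distinct(to)` row total reports in the pipeline above.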

There is a trace for a real SIP client. What is the corresponding user-agent? (two words, one space in between)

We only need to look for user agents that aren’t our scanner:

sip_log_parsed %>% 
  filter(
    user_agent != "friendly-scanner"
  ) %>% 
  count(user_agent)
## # A tibble: 1 x 2
##   user_agent          n
##   <chr>           <int>
## 1 Zoiper rev.6751    14

Multiple real-world phone numbers were dialed. Provide the first 11 digits of the number dialed from extension 101?

Calls are “INVITE” requests, so we filter on those:

sip_log_parsed %>% 
  mutate_at(
    vars(from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  filter(
    from == 101,
    stri_detect_regex(cseq, "INVITE")
  ) %>% 
  select(to) 
## # A tibble: 3 x 1
##   to              
##   <chr>           
## 1 900114382089XXXX
## 2 00112322228XXXX 
## 3 00112524021XXXX

The challenge answer box provides a hint to what number they want. I’m not sure, but I suspect it may be randomized, so you’ll have to match the pattern they expect with the correct digits above.

What are the default credentials used in the attempted basic authentication? (format is username:password)

This question wants us to look at the HTTP requests that require authentication. We can get the credentials info from the zeek$http log:

zeek$http %>% 
  distinct(username, password)
## # A tibble: 2 x 2
##   username password
##   <chr>    <chr>   
## 1 <NA>     <NA>    
## 2 maint    password

Which codec does the RTP stream use? (3 words, 2 spaces in between)

“Codec” refers to the algorithm used to encode/decode an audio or video stream. The RTP RFC uses the term “payload type” to refer to this during exchanges and even has a link to RFC 3551 which provides further information on these encodings.

The summary packet table that tshark generates helpfully provides summary info for RTP packets and part of that info is PT=… which indicates the payload type.

packets %>% 
  filter(proto == "RTP") %>% 
  select(info)
## # A tibble: 2,988 x 1
##    info                                                       
##    <chr>                                                      
##  1 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6402, Time=126160
##  2 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6403, Time=126320
##  3 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6404, Time=126480
##  4 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6405, Time=126640
##  5 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6406, Time=126800
##  6 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6407, Time=126960
##  7 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6408, Time=127120
##  8 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6409, Time=127280
##  9 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6410, Time=127440
## 10 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6411, Time=127600
## # … with 2,978 more rows
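If you would rather have R isolate the payload type than read it off the screen, a minimal base R sketch along these lines should do it. The two `info` strings are copied from the rows above; `payload_type` is just a name chosen for the example:

```r
# Two info strings copied from the RTP rows above
info <- c(
  "PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6402, Time=126160",
  "PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6403, Time=126320"
)

# Pull out the text between "PT=" and the first comma
payload_type <- sub("^PT=([^,]+),.*$", "\\1", info)

unique(payload_type)
```

The same `sub()` call can be pointed at the info column of the filtered packets table to confirm that every RTP packet carries the same payload type.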

How long is the sampling time (in milliseconds)?

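G.711 samples audio at 8,000 Hz. A frequency of 1 Hz corresponds to a period of 1,000 ms, so the sampling time in milliseconds is the reciprocal of the sampling rate multiplied by 1,000: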

(1/8000) * 1000
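The same arithmetic can be written out with named values, which also lets us sanity check it against the RTP timestamps in the packet table. This is only a sketch: the 8,000 Hz clock rate is the G.711 value from RFC 3551, the two timestamps are copied from the first two RTP rows above, and the variable names are made up for the example:

```r
sample_rate_hz <- 8000  # G.711 clock rate (RFC 3551)

# Sampling period: seconds per sample, converted to milliseconds
sampling_period_ms <- (1 / sample_rate_hz) * 1000

# RTP timestamps of two consecutive packets from the table above
rtp_time <- c(126160, 126320)

# For this codec the RTP timestamp advances by one per audio sample
samples_per_packet <- diff(rtp_time)

# Audio carried by each packet, in milliseconds
packet_duration_ms <- samples_per_packet / sample_rate_hz * 1000

sampling_period_ms
packet_duration_ms
```

The first value is the answer to the question (0.125 ms per sample); the second shows that each packet in this stream carries 160 samples, or 20 ms of audio.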

What was the password for the account with username 555?

We don’t really need to use external programs for this, but it will go quite a bit faster if we do. While the original reference page for sipdump and sipcrack is defunct, you can visit that link to go to the Wayback Machine’s capture of it. It will help if you have a Linux system handy (so Docker to the rescue for macOS and Windows folks) since the following answer details were run on Ubuntu.

This question is taking advantage of the fact that the default authentication method for SIP is extremely weak. The process uses an MD5 challenge/response, and if an attacker can capture call traffic it is possible to brute force the password offline (which is what we’ll use sipcrack for).

You can install them via sudo apt install sipcrack.

We’ll first generate a dump of the authentication attempts with sipdump:

system("sipdump -p src/Voip-trace.pcap sip.dump", intern=TRUE)
##  [1] ""                                                               
##  [2] "SIPdump 0.2 "                                                   
##  [3] "---------------------------------------"                        
##  [4] ""                                                               
##  [5] "* Using pcap file 'src/Voip-trace.pcap' for sniffing"           
##  [6] "* Starting to sniff with packet filter 'tcp or udp'"            
##  [7] ""                                                               
##  [8] "* Dumped login from 172.25.105.40 -> 172.25.105.3 (User: '555')"
##  [9] "* Dumped login from 172.25.105.40 -> 172.25.105.3 (User: '555')"
## [10] "* Dumped login from 172.25.105.40 -> 172.25.105.3 (User: '555')"
## [11] ""                                                               
## [12] "* Exiting, sniffed 3 logins"

cat(readLines("sip.dump"), sep="\n")
## 172.25.105.3"172.25.105.40"555"asterisk"REGISTER"sip:172.25.105.40"4787f7ce""""MD5"1ac95ce17e1f0230751cf1fd3d278320
## 172.25.105.3"172.25.105.40"555"asterisk"INVITE"sip:1000@172.25.105.40"70fbfdae""""MD5"aa533f6efa2b2abac675c1ee6cbde327
## 172.25.105.3"172.25.105.40"555"asterisk"BYE"sip:1000@172.25.105.40"70fbfdae""""MD5"0b306e9db1f819dd824acf3227b60e07

It saves the IPs, caller, authentication realm, method, nonce and hash, which will all be fed into sipcrack.

We know from the placeholder answer text that the “password” is 4 characters, and this is the land of telephony, so we can make an assumption that it is really 4 digits. sipcrack needs a file of passwords to try, so we’ll let R make a randomized file of 4-digit PINs for us:

cat(sprintf("%04d", sample(0:9999)), file = "4-digits", sep="\n")

We only have authentication packets for 555 so we can automate what would normally be an interactive process:

cat(system('echo "1" | sipcrack -w 4-digits sip.dump', intern=TRUE), sep="\n")
## 
## SIPcrack 0.2 
## ----------------------------------------
## 
## * Found Accounts:
## 
## Num  Server      Client      User    Hash|Password
## 
## 1    172.25.105.3    172.25.105.40   555 1ac95ce17e1f0230751cf1fd3d278320
## 2    172.25.105.3    172.25.105.40   555 aa533f6efa2b2abac675c1ee6cbde327
## 3    172.25.105.3    172.25.105.40   555 0b306e9db1f819dd824acf3227b60e07
## 
## * Select which entry to crack (1 - 3): 
## * Generating static MD5 hash... c3e0f1664fde9fbc75a7cbd341877875
## * Loaded wordlist: '4-digits'
## * Starting bruteforce against user '555' (MD5: '1ac95ce17e1f0230751cf1fd3d278320')
## * Tried 8904 passwords in 0 seconds
## 
## * Found password: '1234'
## * Updating dump file 'sip.dump'... done
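To illustrate why this offline brute force works, here is a rough sketch of the digest computation in base R. It assumes the RFC 2617 digest form without qop (the cnonce, nonce-count and qop fields in sip.dump are empty, which is why that form is assumed here), and `md5_string()` is a hypothetical helper written for this example, not part of any package. The user, realm, method, URI, nonce and hash come from the first line of sip.dump, and the password is the one sipcrack reported:

```r
# MD5 of a string using only base R: write the bytes to a temp file
# and hash that file with tools::md5sum()
md5_string <- function(x) {
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(charToRaw(x), path)
  unname(tools::md5sum(path))
}

# Values taken from the first line of sip.dump and the sipcrack result
user     <- "555"
realm    <- "asterisk"
password <- "1234"
method   <- "REGISTER"
uri      <- "sip:172.25.105.40"
nonce    <- "4787f7ce"
captured <- "1ac95ce17e1f0230751cf1fd3d278320"

# Assumed digest form (no qop): MD5(HA1:nonce:HA2)
ha1 <- md5_string(paste(user, realm, password, sep = ":"))
ha2 <- md5_string(paste(method, uri, sep = ":"))
response <- md5_string(paste(ha1, nonce, ha2, sep = ":"))

# TRUE if the candidate password reproduces the captured hash
identical(response, captured)
```

Everything except the password is visible on the wire, so an attacker only has to repeat the last three steps for each candidate in the wordlist until the comparison succeeds.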

Which RTP packet header field can be used to reorder out of sync RTP packets in the correct sequence?

Just reading involved here: 5.1 RTP Fixed Header Fields.

The trace includes a secret hidden message. Can you hear it?

We could command line this one but honestly Wireshark has a pretty keen audio player. Fire it up, open up the PCAP, go to the “Telephony” menu, pick SIP and play the streams.

Packet Maze: Solving a CyberDefenders PCAP Puzzle with R, Zeek, and tshark

2021-07-20 – 15:18
Posted in Cybersecurity, Data Analysis, data driven security, data science, data wrangling, DNS, Information Security, pcap, R
Comments (7)

It was a rainy weekend in southern Maine and I really didn’t feel like doing chores, so I was skimming through RSS feeds and noticed a link to a PacketMaze challenge in the latest This Week In 4n6.

Since it’s also been a while since I’ve done any serious content delivery (on the personal side, anyway), I thought it’d be fun to solve the challenge with some tools I like — namely Zeek, tshark, and R (links to those in the e-book I’m linking to below), craft some real expository around each solution, and bundle it all up into an e-book and lighter-weight GitHub repo.

There are 11 “quests” in the challenge, requiring sifting through a packet capture (PCAP) and looking for various odds and ends (some are very windy maze passages). The challenge ranges from extracting images and image metadata from FTP sessions to pulling out precise elements in TLS sessions, to dealing with IPv6.

This is far from an expert challenge, and anyone can likely work through it with a little bit of elbow grease.

As it says on the tin, not all data is ‘big’ nor do all data-driven cybersecurity projects require advanced modeling capabilities. Sometimes you just need to dissect some network packet capture (PCAP) data and don’t want to click through a GUI to get the job done. This short book works through the questions in CyberDefenders Lab #68 to show how you can get the Zeek open source network security tool, tshark command-line PCAP analysis Swiss army knife, and R (via RStudio) working together.

FIN

If you find the resource helpful or have other feedback, drop a note on Twitter (@hrbrmstr), in a comment here, or as a GitHub issue.

A Small macOS (Big Sur+) App to Extract Indicators of Compromise

2021-04-25 – 06:20
Posted in Information Security, macOS, Swift
Leave a Comment

There’s a semi-infrequent-but-frequent-enough-to-be-annoying manual task at $DAYJOB that involves extracting a particular set of strings (identifiable by a fairly benign set of regular expressions) from various interactive text sources (so, not static documents or documents easily scrape-able).

Rather than hack something onto Sublime Text or VS Code I made a small macOS app in SwiftUI that does the extraction when something is pasted.

It occurred to me that this would work for indicators of compromise (IoCs) — because why not add one more to the 5 billion of them on GitHub — and I forked my app, removed all the $WORK bits and added in some code to do just this, and unimaginatively dubbed it extractor. Here’s the main view:

For now, extractor handles identifying & extracting CIDRs, IPv4s, URLs, hostnames, and email addresses (file issues if you really want hashes, CVE strings or other types) either from:

an input URL (it fetches the content and extracts IoCs from the rendered HTML, not the HTML source);
items pasted into the textbox (more on some SwiftUI 2 foibles regarding that in a bit); and
PDF, HTML, and text files (via Open / ⌘-o)

Here it is extracting IoCs from one of FireEye’s “solarwinds”-related posts:

If you tick the “Monitor Pasteboard” toggle, the app will monitor all system-wide additions to the pasteboard, extract the IoCs from them and put them in the textbox. (I think I really need to make this additive to the text in the textbox vs replacing what’s there).

You can save the indicators out to a text file (via Save / ⌘-s) or just copy them from the text box (if you want ndjson or some threat indicator sharing format file an issue).

That SwiftUI 2 Thing I Mentioned

SwiftUI 2 makes app-creation very straightforward, but it also still has many limitations. One of which is how windows/controls handle the “Paste” command. The glue code to make this app really work the way I’d like it to work is just annoying enough to have it on a TODO vs an ISDONE list and I’m hoping SwiftUI 3 comes out with WWDC 2021 (in a scant ~2 months) and provides a less hacky solution.

FIN

You can find the source and notarized binary releases of extractor on GitHub. File issues for questions, feature requests, or problems with the app/code.

Because I used SwiftUI 2, it is very likely possible to have this app work on iOS and iPadOS devices. I can’t see anyone using an iPad for DFIR work, but if you’d like a version of this for iOS/iPadOS, also drop an issue.