Skip navigation

Category Archives: Cybersecurity

VulnCheck has some new, free API endpoints for the cybersecurity community.

Two extremely useful ones are for their extended version of CISA’s KEV, and an in-situ replacement for NVD’s sad excuse for an API and soon-to-be-removed JSON feeds.

There are two ways to work with these APIs. One is retrieve a “backup” of the entire dataset as a ZIP file, and the other is to use the API to retrieve individual CVEs from each “index”.

You’ll need a free API key from VulnCheck to use these APIs.

All code shown makes the assumption that you’ve stored your API key in an environment variable named VULNCHECK_API_KEY.

After the curl examples, there’s a section on a small Golang CLI I made to make it easier to get combined extended KEV and NVDv2 CVE information in one CLI call for a given CVE.

Backups

Retrieving the complete dataset is a multi-step process. First you make a call to the specific API endpoint for each index to backup. That returns some JSON with a temporary, AWS pre-signed URL (a method to grant temporary access to files stored in AWS S3) to download the ZIP file. Then you download the ZIP file, and finally you extract the contents of the ZIP file into a directory. The output is different for the NVDv2 and extended KEV indexes, but the core process is the same.

NVDv2

Here’s a curl idiom for the NVDv2 index backup. The result is a directory of uncompressed JSON that’s in the same format as the NVDv2 JSON feeds.

# Grab the temporary AWS pre-signed URL for the NVDv2 index and then download the ZIP file.
curl \
  --silent \
  --output vcnvd2.zip --url "$(
    curl \
      --silent \
      --cookie "token=${VULNCHECK_API_KEY}" \
      --header 'Accept: application/json' \
      --url "https://api.vulncheck.com/v3/backup/nist-nvd2" | jq -r '.data[].url'
    )"

rm -rf ./nvd2

# unzip it
unzip -q -o -d ./nvd2 vcnvd2.zip

# uncompress the JSON files
ls ./nvd2/*gz | xargs gunzip

tree ./nvd2
./nvd2
├── nvdcve-2.0-000.json
├── nvdcve-2.0-001.json
├── nvdcve-2.0-002.json
├── nvdcve-2.0-003.json
├── nvdcve-2.0-004.json
├── nvdcve-2.0-005.json
├── nvdcve-2.0-006.json
├── nvdcve-2.0-007.json
├── nvdcve-2.0-008.json
├── nvdcve-2.0-009.json
├── nvdcve-2.0-010.json
├── nvdcve-2.0-011.json
├── nvdcve-2.0-012.json
├── nvdcve-2.0-013.json
├── nvdcve-2.0-014.json
├── nvdcve-2.0-015.json
├── nvdcve-2.0-016.json
├── nvdcve-2.0-017.json
├── nvdcve-2.0-018.json
├── nvdcve-2.0-019.json
├── nvdcve-2.0-020.json
├── nvdcve-2.0-021.json
├── nvdcve-2.0-022.json
├── nvdcve-2.0-023.json
├── nvdcve-2.0-024.json
├── nvdcve-2.0-025.json
├── nvdcve-2.0-026.json
├── nvdcve-2.0-027.json
├── nvdcve-2.0-028.json
├── nvdcve-2.0-029.json
├── nvdcve-2.0-030.json
├── nvdcve-2.0-031.json
├── nvdcve-2.0-032.json
├── nvdcve-2.0-033.json
├── nvdcve-2.0-034.json
├── nvdcve-2.0-035.json
├── nvdcve-2.0-036.json
├── nvdcve-2.0-037.json
├── nvdcve-2.0-038.json
├── nvdcve-2.0-039.json
├── nvdcve-2.0-040.json
├── nvdcve-2.0-041.json
├── nvdcve-2.0-042.json
├── nvdcve-2.0-043.json
├── nvdcve-2.0-044.json
├── nvdcve-2.0-045.json
├── nvdcve-2.0-046.json
├── nvdcve-2.0-047.json
├── nvdcve-2.0-048.json
├── nvdcve-2.0-049.json
├── nvdcve-2.0-050.json
├── nvdcve-2.0-051.json
├── nvdcve-2.0-052.json
├── nvdcve-2.0-053.json
├── nvdcve-2.0-054.json
├── nvdcve-2.0-055.json
├── nvdcve-2.0-056.json
├── nvdcve-2.0-057.json
├── nvdcve-2.0-058.json
├── nvdcve-2.0-059.json
├── nvdcve-2.0-060.json
├── nvdcve-2.0-061.json
├── nvdcve-2.0-062.json
├── nvdcve-2.0-063.json
├── nvdcve-2.0-064.json
├── nvdcve-2.0-065.json
├── nvdcve-2.0-066.json
├── nvdcve-2.0-067.json
├── nvdcve-2.0-068.json
├── nvdcve-2.0-069.json
├── nvdcve-2.0-070.json
├── nvdcve-2.0-071.json
├── nvdcve-2.0-072.json
├── nvdcve-2.0-073.json
├── nvdcve-2.0-074.json
├── nvdcve-2.0-075.json
├── nvdcve-2.0-076.json
├── nvdcve-2.0-077.json
├── nvdcve-2.0-078.json
├── nvdcve-2.0-079.json
├── nvdcve-2.0-080.json
├── nvdcve-2.0-081.json
├── nvdcve-2.0-082.json
├── nvdcve-2.0-083.json
├── nvdcve-2.0-084.json
├── nvdcve-2.0-085.json
├── nvdcve-2.0-086.json
├── nvdcve-2.0-087.json
├── nvdcve-2.0-088.json
├── nvdcve-2.0-089.json
├── nvdcve-2.0-090.json
├── nvdcve-2.0-091.json
├── nvdcve-2.0-092.json
├── nvdcve-2.0-093.json
├── nvdcve-2.0-094.json
├── nvdcve-2.0-095.json
├── nvdcve-2.0-096.json
├── nvdcve-2.0-097.json
├── nvdcve-2.0-098.json
├── nvdcve-2.0-099.json
├── nvdcve-2.0-100.json
├── nvdcve-2.0-101.json
├── nvdcve-2.0-102.json
├── nvdcve-2.0-103.json
├── nvdcve-2.0-104.json
├── nvdcve-2.0-105.json
├── nvdcve-2.0-106.json
├── nvdcve-2.0-107.json
├── nvdcve-2.0-108.json
├── nvdcve-2.0-109.json
├── nvdcve-2.0-110.json
├── nvdcve-2.0-111.json
├── nvdcve-2.0-112.json
├── nvdcve-2.0-113.json
├── nvdcve-2.0-114.json
├── nvdcve-2.0-115.json
├── nvdcve-2.0-116.json
├── nvdcve-2.0-117.json
├── nvdcve-2.0-118.json
├── nvdcve-2.0-119.json
├── nvdcve-2.0-120.json
└── nvdcve-2.0-121.json

1 directory, 122 files

VulnCheck’s Extended KEV

Here’s a curl idiom for the extended KEV index backup. The result is a directory with a single uncompressed JSON that’s in an extended format of what’s in the CISA KEV JSON.s

# Grab the temporary AWS pre-signed URL for the NVDv2 index and then download the ZIP file.
curl \
  --silent \
  --output vckev.zip --url "$(
    curl \
      --silent \
      --cookie "token=${VULNCHECK_API_KEY}" \
      --header 'Accept: application/json' \
      --url "https://api.vulncheck.com/v3/backup/vulncheck-kev" | jq -r '.data[].url'
    )"

rm -rf ./vckev

# unzip it
unzip -q -o -d ./vckev vckev.zip

tree ./vckev
./vckev
└── vulncheck_known_exploited_vulnerabilities.json

1 directory, 1 file

Retrieving Information On Individual CVEs

While there are other, searchable fields for each index, the primary use case for most of us is getting information on individual CVEs. The API calls are virtually identical, apart from the selected index.

NOTE: the examples pipe the output through jq to make the API results easier to read.

NVDv2

curl \
  --silent \
  --cookie "token=${VULNCHECK_API_KEY}" \
  --header 'Accept: application/json' \
  --url "https://api.vulncheck.com/v3/index/nist-nvd2?cve=CVE-2024-23334" | jq
{
  "_benchmark": 0.056277,
  "_meta": {
    "timestamp": "2024-03-23T08:47:17.940032202Z",
    "index": "nist-nvd2",
    "limit": 100,
    "total_documents": 1,
    "sort": "_id",
    "parameters": [
      {
        "name": "cve",
        "format": "CVE-YYYY-N{4-7}"
      },
      {
        "name": "alias"
      },
      {
        "name": "iava",
        "format": "[0-9]{4}[A-Z-0-9]+"
      },
      {
        "name": "threat_actor"
      },
      {
        "name": "mitre_id"
      },
      {
        "name": "misp_id"
      },
      {
        "name": "ransomware"
      },
      {
        "name": "botnet"
      },
      {
        "name": "published"
      },
      {
        "name": "lastModStartDate",
        "format": "YYYY-MM-DD"
      },
      {
        "name": "lastModEndDate",
        "format": "YYYY-MM-DD"
      }
    ],
    "order": "desc",
    "page": 1,
    "total_pages": 1,
    "max_pages": 6,
    "first_item": 1,
    "last_item": 1
  },
  "data": [
    {
      "id": "CVE-2024-23334",
      "sourceIdentifier": "security-advisories@github.com",
      "vulnStatus": "Modified",
      "published": "2024-01-29T23:15:08.563",
      "lastModified": "2024-02-09T03:15:09.603",
      "descriptions": [
        {
          "lang": "en",
          "value": "aiohttp is an asynchronous HTTP client/server framework for asyncio and Python. When using aiohttp as a web server and configuring static routes, it is necessary to specify the root path for static files. Additionally, the option 'follow_symlinks' can be used to determine whether to follow symbolic links outside the static root directory. When 'follow_symlinks' is set to True, there is no validation to check if reading a file is within the root directory. This can lead to directory traversal vulnerabilities, resulting in unauthorized access to arbitrary files on the system, even when symlinks are not present.  Disabling follow_symlinks and using a reverse proxy are encouraged mitigations.  Version 3.9.2 fixes this issue."
        },
        {
          "lang": "es",
          "value": "aiohttp es un framework cliente/servidor HTTP asíncrono para asyncio y Python. Cuando se utiliza aiohttp como servidor web y se configuran rutas estáticas, es necesario especificar la ruta raíz para los archivos estáticos. Además, la opción 'follow_symlinks' se puede utilizar para determinar si se deben seguir enlaces simbólicos fuera del directorio raíz estático. Cuando 'follow_symlinks' se establece en Verdadero, no hay validación para verificar si la lectura de un archivo está dentro del directorio raíz. Esto puede generar vulnerabilidades de directory traversal, lo que resulta en acceso no autorizado a archivos arbitrarios en el sistema, incluso cuando no hay enlaces simbólicos presentes. Se recomiendan como mitigaciones deshabilitar follow_symlinks y usar un proxy inverso. La versión 3.9.2 soluciona este problema."
        }
      ],
      "references": [
        {
          "url": "https://github.com/aio-libs/aiohttp/commit/1c335944d6a8b1298baf179b7c0b3069f10c514b",
          "source": "security-advisories@github.com",
          "tags": [
            "Patch"
          ]
        },
        {
          "url": "https://github.com/aio-libs/aiohttp/pull/8079",
          "source": "security-advisories@github.com",
          "tags": [
            "Patch"
          ]
        },
        {
          "url": "https://github.com/aio-libs/aiohttp/security/advisories/GHSA-5h86-8mv2-jq9f",
          "source": "security-advisories@github.com",
          "tags": [
            "Exploit",
            "Mitigation",
            "Vendor Advisory"
          ]
        },
        {
          "url": "https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/message/ICUOCFGTB25WUT336BZ4UNYLSZOUVKBD/",
          "source": "security-advisories@github.com"
        },
        {
          "url": "https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/message/XXWVZIVAYWEBHNRIILZVB3R3SDQNNAA7/",
          "source": "security-advisories@github.com",
          "tags": [
            "Mailing List"
          ]
        }
      ],
      "metrics": {
        "cvssMetricV31": [
          {
            "source": "nvd@nist.gov",
            "type": "Primary",
            "cvssData": {
              "version": "3.1",
              "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N",
              "attackVector": "NETWORK",
              "attackComplexity": "LOW",
              "privilegesRequired": "NONE",
              "userInteraction": "NONE",
              "scope": "UNCHANGED",
              "confidentialityImpact": "HIGH",
              "integrityImpact": "NONE",
              "availabilityImpact": "NONE",
              "baseScore": 7.5,
              "baseSeverity": "HIGH"
            },
            "exploitabilityScore": 3.9,
            "impactScore": 3.6
          },
          {
            "source": "security-advisories@github.com",
            "type": "Secondary",
            "cvssData": {
              "version": "3.1",
              "vectorString": "CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:H/I:N/A:N",
              "attackVector": "NETWORK",
              "attackComplexity": "HIGH",
              "privilegesRequired": "NONE",
              "userInteraction": "NONE",
              "scope": "UNCHANGED",
              "confidentialityImpact": "HIGH",
              "integrityImpact": "NONE",
              "availabilityImpact": "NONE",
              "baseScore": 5.9,
              "baseSeverity": "MEDIUM"
            },
            "exploitabilityScore": 2.2,
            "impactScore": 3.6
          }
        ]
      },
      "weaknesses": [
        {
          "source": "security-advisories@github.com",
          "type": "Primary",
          "description": [
            {
              "lang": "en",
              "value": "CWE-22"
            }
          ]
        }
      ],
      "configurations": [
        {
          "nodes": [
            {
              "operator": "OR",
              "cpeMatch": [
                {
                  "vulnerable": true,
                  "criteria": "cpe:2.3:a:aiohttp:aiohttp:*:*:*:*:*:*:*:*",
                  "versionStartIncluding": "1.0.5",
                  "versionEndExcluding": "3.9.2",
                  "matchCriteriaId": "CC18B2A9-9D80-4A6E-94E7-8FC010D8FC70"
                }
              ]
            }
          ]
        },
        {
          "nodes": [
            {
              "operator": "OR",
              "cpeMatch": [
                {
                  "vulnerable": true,
                  "criteria": "cpe:2.3:o:fedoraproject:fedora:39:*:*:*:*:*:*:*",
                  "matchCriteriaId": "B8EDB836-4E6A-4B71-B9B2-AA3E03E0F646"
                }
              ]
            }
          ]
        }
      ],
      "_timestamp": "2024-02-09T05:33:33.170054Z"
    }
  ]
}

VulnCheck’s Extended KEV

curl \
  --silent \
  --cookie "token=${VULNCHECK_API_KEY}" \
  --header 'Accept: application/json' \
  --url "https://api.vulncheck.com/v3/index/vulncheck-kev?cve=CVE-2024-23334" | jq
{
  "_benchmark": 0.328855,
  "_meta": {
    "timestamp": "2024-03-23T08:47:41.025967418Z",
    "index": "vulncheck-kev",
    "limit": 100,
    "total_documents": 1,
    "sort": "_id",
    "parameters": [
      {
        "name": "cve",
        "format": "CVE-YYYY-N{4-7}"
      },
      {
        "name": "alias"
      },
      {
        "name": "iava",
        "format": "[0-9]{4}[A-Z-0-9]+"
      },
      {
        "name": "threat_actor"
      },
      {
        "name": "mitre_id"
      },
      {
        "name": "misp_id"
      },
      {
        "name": "ransomware"
      },
      {
        "name": "botnet"
      },
      {
        "name": "published"
      },
      {
        "name": "lastModStartDate",
        "format": "YYYY-MM-DD"
      },
      {
        "name": "lastModEndDate",
        "format": "YYYY-MM-DD"
      },
      {
        "name": "pubStartDate",
        "format": "YYYY-MM-DD"
      },
      {
        "name": "pubEndDate",
        "format": "YYYY-MM-DD"
      }
    ],
    "order": "desc",
    "page": 1,
    "total_pages": 1,
    "max_pages": 6,
    "first_item": 1,
    "last_item": 1
  },
  "data": [
    {
      "vendorProject": "aiohttp",
      "product": "aiohttp",
      "shortDescription": "aiohttp is an asynchronous HTTP client/server framework for asyncio and Python. When using aiohttp as a web server and configuring static routes, it is necessary to specify the root path for static files. Additionally, the option 'follow_symlinks' can be used to determine whether to follow symbolic links outside the static root directory. When 'follow_symlinks' is set to True, there is no validation to check if reading a file is within the root directory. This can lead to directory traversal vulnerabilities, resulting in unauthorized access to arbitrary files on the system, even when symlinks are not present.  Disabling follow_symlinks and using a reverse proxy are encouraged mitigations.  Version 3.9.2 fixes this issue.",
      "vulnerabilityName": "aiohttp aiohttp Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')",
      "required_action": "Apply remediations or mitigations per vendor instructions or discontinue use of the product if remediation or mitigations are unavailable.",
      "knownRansomwareCampaignUse": "Known",
      "cve": [
        "CVE-2024-23334"
      ],
      "vulncheck_xdb": [
        {
          "xdb_id": "231b48941355",
          "xdb_url": "https://vulncheck.com/xdb/231b48941355",
          "date_added": "2024-02-28T22:30:21Z",
          "exploit_type": "infoleak",
          "clone_ssh_url": "git@github.com:ox1111/CVE-2024-23334.git"
        },
        {
          "xdb_id": "f1d001911304",
          "xdb_url": "https://vulncheck.com/xdb/f1d001911304",
          "date_added": "2024-03-19T16:28:56Z",
          "exploit_type": "infoleak",
          "clone_ssh_url": "git@github.com:jhonnybonny/CVE-2024-23334.git"
        }
      ],
      "vulncheck_reported_exploitation": [
        {
          "url": "https://cyble.com/blog/cgsi-probes-shadowsyndicate-groups-possible-exploitation-of-aiohttp-vulnerability-cve-2024-23334/",
          "date_added": "2024-03-15T00:00:00Z"
        }
      ],
      "date_added": "2024-03-15T00:00:00Z",
      "_timestamp": "2024-03-23T08:27:47.861266Z"
    }
  ]
}

vccve

There’s a project on Codeberg that has code and binaries for macOS, Linux, and Windows for a small CLI that gets you combined extended KEV and NVDv2 information all in one call.

The project README has examples and installation instructions.

If you’ve got 👀 on this blog (directly, or via syndication) you’d have to have been living under a rock to not know about the libwebp supply chain disaster. An unfortunate casualty of inept programming just happened to be any app in the Electron ecosystem that doesn’t undergo bleeding-edge updates.

Former cow-orker Tom Sellers (one of the best humans in cyber) did a great service to the macOS user community with tips on how to stay safe on macOS. His find + strings + grep combo was superbly helpful and I hope many macOS users did the command line dance to see how negligent their app providers were/are.

But, you still have to know what versions are OK and which ones are not to do that dance. And, having had yet-another immune system invasion (thankfully, not COVID, again) on top of still working through long COVID (#protip: you may be over the pandemic, but I guarantee it’s not done with you/us for a while) which re-sapped mobility energy, I put my sedentary time to less woesome use by hacking together a small, Golang macOS CLI to help ferret out bad Electron-based apps you may have installed.

I named it positron, since that’s kind of the opposite of Electron, and I was pretty creativity-challenged today.

It does virtually the same thing as Tom’s strings and grep does, just in a single, lightweight, universal, signed macOS binary.

When I ran it after the final build, all my Electron-based apps were 🔴. After deleting some, and updating others, this is my current status:

$ find /Applications -type f -name "*Electron Framework*" -exec ./positron "{}" \;
/Applications/Signal.app: Chrome/114.0.5735.289 Electron/25.8.4 🟢
/Applications/Keybase.app: Chrome/87.0.4280.141 Electron/11.5.0 🔴
/Applications/Raindrop.io.app: Chrome/102.0.5005.167 Electron/19.0.17 🔴
/Applications/1Password.app: Chrome/114.0.5735.289 Electron/25.8.1 🟢
/Applications/Replit.app: Chrome/116.0.5845.188 Electron/26.2.1 🟢
/Applications/lghub.app: Chrome/104.0.5112.65 Electron/20.0.0 🔴

It’s still on you to do the find (cooler folks run fd) since I’m not about to write a program that’ll rummage across your SSDs or disc drives, but it does all the MachO inspection internally, and then also does the SemVer comparison to let you know which apps still suck at keeping you safe.

FWIW, the Keybase folks did accept a PR for the libwebp thing, but darned if I will spend any time building it (I don’t run it anymore, anyway, so I should just delete it).

The aforementioned signed, universal, macOS binary is in the GitLab releases.

Stay safe out there!

Hot on the heels of the previous CyberDefenders Challenge Solution comes this noisy installment which solves their Acoustic challenge.

You can find the source Rmd on GitHub, but I’m also testing the limits of WP’s markdown rendering and putting it in-stream as well.

No longer book expository this time since much of the setup/explanatory bits from it apply here as well).

Acoustic

This challenge takes us “into the world of voice communications on the internet. VoIP is becoming the de-facto standard for voice communication. As this technology becomes more common, malicious parties have more opportunities and stronger motives to control these systems to conduct nefarious activities. This challenge was designed to examine and explore some of the attributes of the SIP and RTP protocols.”

We have two files to work with:

  • log.txt which was generated from an unadvertised, passive honeypot located on the internet such that any traffic destined to it must be nefarious. Unknown parties scanned the honeypot with a range of tools, and this activity is represented in the log file.
    • The IP address of the honeypot has been changed to “honey.pot.IP.removed”. In terms of geolocation, pick your favorite city.
    • The MD5 hash in the authorization digest is replaced with “MD5_hash_removedXXXXXXXXXXXXXXXX
    • Some octets of external IP addresses have been replaced with an “X”
    • Several trailing digits of phone numbers have been replaced with an “X”
    • Assume the timestamps in the log files are UTC.
  • Voip-trace.pcap was created by honeynet members for this forensic challenge to allow participants to employ network analysis skills in the VOIP context.

There are 14 questions to answer.

If you are not familiar with SIP and/or RTP you should do a bit of research first. A good place to start is RTC 3261 (for SIP) and RFC 3550 (for RTC). Some questions may be able to be answered just by knowing the details of these protocols.

Convert the PCAP

library(stringi)
library(tidyverse)

We’ll pre-generate Zeek logs. The -C tells Zeek to not bother with checksums, -r tells it to read from a file and the LogAscii::use_json=T means we want JSON output vs the default delimited files. JSON gives us data types (the headers in the delimited files do as well, but we’d have to write something to read those types then deal with it vs get this for free out of the box with JSON).

system("ZEEK_LOG_SUFFIX=json /opt/zeek/bin/zeek -C -r src/Voip-trace.pcap LogAscii::use_json=T HTTP::default_capture_password=T")

We process the PCAP twice with tshark. Once to get the handy (and small) packet summary table, then dump the whole thing to JSON. We may need to run tshark again down the road a bit.

system("tshark -T tabs -r src/Voip-trace.pcap > voip-packets.tsv")
system("tshark -T json -r src/Voip-trace.pcap > voip-trace")

Examine and Process log.txt

We aren’t told what format log.txt is in, so let’s take a look:

cd_sip_log <- stri_read_lines("src/log.txt")

cat(head(cd_sip_log, 25), sep="\n")
## Source: 210.184.X.Y:1083
## Datetime: 2010-05-02 01:43:05.606584
## 
## Message:
## 
## OPTIONS sip:100@honey.pot.IP.removed SIP/2.0
## Via: SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;rport
## Content-Length: 0
## From: "sipvicious"<sip:100@1.1.1.1>; tag=X_removed
## Accept: application/sdp
## User-Agent: friendly-scanner
## To: "sipvicious"<sip:100@1.1.1.1>
## Contact: sip:100@127.0.0.1:5061
## CSeq: 1 OPTIONS
## Call-ID: 845752980453913316694142
## Max-Forwards: 70
## 
## 
## 
## 
## -------------------------
## Source: 210.184.X.Y:4956
## Datetime: 2010-05-02 01:43:12.488811
## 
## Message:

These look a bit like HTTP server responses, but we know we’re working in SIP land and if you perused the RFC you’d have noticed that SIP is an HTTP-like ASCII protocol. While some HTTP response parsers might work on these records, it’s pretty straightforward to whip up a bespoke pseudo-parser.

Let’s see how many records there are by counting the number of “Message:” lines (we’re doing this, primarily, to see if we should use the {furrr} package to speed up processing):

cd_sip_log[stri_detect_fixed(cd_sip_log, "Message:")] %>%
  table()
## .
## Message: 
##     4266

There are many, so we’ll avoid parallel processing the data and just use a single thread.

One way to tackle the parsing is to look for the stop and start of each record, extract fields (these have similar formats to HTTP headers), and perhaps have to extract content as well. We know this because there are “Content-Length:” fields. According to the RFC they are supposed to exist for every message. Let’s first see if any “Content-Length:” header records are greater than 0. We’ll do this with a little help from the ripgrep utility as it provides a way to see context before and/or after matched patterns:

cat(system('rg --after-context=10 "^Content-Length: [^0]" src/log.txt', intern=TRUE), sep="\n")
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000
## --
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000
## --
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000
## --
## Content-Length: 330
## 
## v=0
## o=Zoiper_user 0 0 IN IP4 89.42.194.X
## s=Zoiper_session
## c=IN IP4 89.42.194.X
## t=0 0
## m=audio 52999 RTP/AVP 3 0 8 110 98 101
## a=rtpmap:3 GSM/8000
## a=rtpmap:0 PCMU/8000
## a=rtpmap:8 PCMA/8000

So,we do need to account for content. It’s still pretty straightforward (explanatory comments inline):

starts <- which(stri_detect_regex(cd_sip_log, "^Source:"))
stops <- which(stri_detect_regex(cd_sip_log, "^----------"))

map2_dfr(starts, stops, ~{

  raw_rec <- stri_trim_both(cd_sip_log[.x:.y]) # target the record from the log
  raw_rec <- raw_rec[raw_rec != "-------------------------"] # remove separator

  msg_idx <- which(stri_detect_regex(raw_rec, "^Message:")) # find where "Message:" line is
  source_idx <- which(stri_detect_regex(raw_rec, "^Source: ")) # find where "Source:" line is
  datetime_idx <- which(stri_detect_regex(raw_rec, "^Datetime: ")) # find where "Datetime:" line is
  contents_idx <- which(stri_detect_regex(raw_rec[(msg_idx+2):length(raw_rec)], "^$"))[1] + 2 # get position of the "data"

  source <- stri_match_first_regex(raw_rec[source_idx], "^Source: (.*)$")[,2] # extract source
  datetime <- stri_match_first_regex(raw_rec[datetime_idx], "^Datetime: (.*)$")[,2] # extract datetime
  request <- raw_rec[msg_idx+2] # extract request line

  # build a matrix out of the remaining headers. header key will be in column 2, value will be in column 3
  tmp <- stri_match_first_regex(raw_rec[(msg_idx+3):contents_idx], "^([^:]+):[[:space:]]+(.*)$")
  tmp[,2] <- stri_trans_tolower(tmp[,2]) # lowercase the header key
  tmp[,2] <- stri_replace_all_fixed(tmp[,2], "-", "_") # turn dashes to underscores so we can more easily use the keys as column names

  contents <- raw_rec[(contents_idx+1):length(raw_rec)]
  contents <- paste0(contents[contents != ""], collapse = "\n")

  as.list(tmp[,3]) %>% # turn the header values into a list
    set_names(tmp[,2]) %>% # make their names the tranformed keys
    append(c(
      source = source, # add source to the list (etc)
      datetime = datetime,
      request = request,
      contents = contents
    ))

}) -> sip_log_parsed

Let’s see what we have:

sip_log_parsed
## # A tibble: 4,266 x 18
##    via     content_length from    accept  user_agent to     contact cseq  source
##    <chr>   <chr>          <chr>   <chr>   <chr>      <chr>  <chr>   <chr> <chr> 
##  1 SIP/2.… 0              "\"sip… applic… friendly-… "\"si… sip:10… 1 OP… 210.1…
##  2 SIP/2.… 0              "\"342… applic… friendly-… "\"34… sip:34… 1 RE… 210.1…
##  3 SIP/2.… 0              "\"172… applic… friendly-… "\"17… sip:17… 1 RE… 210.1…
##  4 SIP/2.… 0              "\"adm… applic… friendly-… "\"ad… sip:ad… 1 RE… 210.1…
##  5 SIP/2.… 0              "\"inf… applic… friendly-… "\"in… sip:in… 1 RE… 210.1…
##  6 SIP/2.… 0              "\"tes… applic… friendly-… "\"te… sip:te… 1 RE… 210.1…
##  7 SIP/2.… 0              "\"pos… applic… friendly-… "\"po… sip:po… 1 RE… 210.1…
##  8 SIP/2.… 0              "\"sal… applic… friendly-… "\"sa… sip:sa… 1 RE… 210.1…
##  9 SIP/2.… 0              "\"ser… applic… friendly-… "\"se… sip:se… 1 RE… 210.1…
## 10 SIP/2.… 0              "\"sup… applic… friendly-… "\"su… sip:su… 1 RE… 210.1…
## # … with 4,256 more rows, and 9 more variables: datetime <chr>, request <chr>,
## #   contents <chr>, call_id <chr>, max_forwards <chr>, expires <chr>,
## #   allow <chr>, authorization <chr>, content_type <chr>
glimpse(sip_log_parsed)
## Rows: 4,266
## Columns: 18
## $ via            <chr> "SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;r…
## $ content_length <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", …
## $ from           <chr> "\"sipvicious\"<sip:100@1.1.1.1>; tag=X_removed", "\"34…
## $ accept         <chr> "application/sdp", "application/sdp", "application/sdp"…
## $ user_agent     <chr> "friendly-scanner", "friendly-scanner", "friendly-scann…
## $ to             <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "\"3428948518\"<sip:…
## $ contact        <chr> "sip:100@127.0.0.1:5061", "sip:3428948518@honey.pot.IP.…
## $ cseq           <chr> "1 OPTIONS", "1 REGISTER", "1 REGISTER", "1 REGISTER", …
## $ source         <chr> "210.184.X.Y:1083", "210.184.X.Y:4956", "210.184.X.Y:51…
## $ datetime       <chr> "2010-05-02 01:43:05.606584", "2010-05-02 01:43:12.4888…
## $ request        <chr> "OPTIONS sip:100@honey.pot.IP.removed SIP/2.0", "REGIST…
## $ contents       <chr> "Call-ID: 845752980453913316694142\nMax-Forwards: 70", …
## $ call_id        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ max_forwards   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ expires        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ allow          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ authorization  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ content_type   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Looks 👍, but IRL there are edge-cases we’d have to deal with.

Process Zeek Logs

Because they’re JSON files, and the names are reasonable, we can do some magic incantations to read them all in and shove them into a list we’ll call zeek:

zeek <- list()

list.files(
  pattern = "json$",
  full.names = TRUE
) %>%
  walk(~{
    append(zeek, list(file(.x) %>% 
      jsonlite::stream_in(verbose = FALSE) %>%
      as_tibble()) %>% 
        set_names(tools::file_path_sans_ext(basename(.x)))
    ) ->> zeek
  })

str(zeek, 1)
## List of 7
##  $ conn         : tibble [97 × 18] (S3: tbl_df/tbl/data.frame)
##  $ dpd          : tibble [1 × 9] (S3: tbl_df/tbl/data.frame)
##  $ files        : tibble [38 × 16] (S3: tbl_df/tbl/data.frame)
##  $ http         : tibble [92 × 24] (S3: tbl_df/tbl/data.frame)
##  $ packet_filter: tibble [1 × 5] (S3: tbl_df/tbl/data.frame)
##  $ sip          : tibble [9 × 23] (S3: tbl_df/tbl/data.frame)
##  $ weird        : tibble [1 × 9] (S3: tbl_df/tbl/data.frame)
walk2(names(zeek), zeek, ~{
  cat("File:", .x, "\n")
  glimpse(.y)
  cat("\n\n")
})
## File: conn 
## Rows: 97
## Columns: 18
## $ ts            <dbl> 1272737631, 1272737581, 1272737669, 1272737669, 12727376…
## $ uid           <chr> "Cb0OAQ1eC0ZhQTEKNl", "C2s0IU2SZFGVlZyH43", "CcEeLRD3cca…
## $ id.orig_h     <chr> "172.25.105.43", "172.25.105.43", "172.25.105.43", "172.…
## $ id.orig_p     <int> 57086, 5060, 57087, 57088, 57089, 57090, 57091, 57093, 5…
## $ id.resp_h     <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "172.…
## $ id.resp_p     <int> 80, 5060, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80…
## $ proto         <chr> "tcp", "udp", "tcp", "tcp", "tcp", "tcp", "tcp", "tcp", …
## $ service       <chr> "http", "sip", "http", "http", "http", "http", "http", "…
## $ duration      <dbl> 0.0180180073, 0.0003528595, 0.0245900154, 0.0740420818, …
## $ orig_bytes    <int> 502, 428, 380, 385, 476, 519, 520, 553, 558, 566, 566, 5…
## $ resp_bytes    <int> 720, 518, 231, 12233, 720, 539, 17499, 144, 144, 144, 14…
## $ conn_state    <chr> "SF", "SF", "SF", "SF", "SF", "SF", "SF", "SF", "SF", "S…
## $ missed_bytes  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ history       <chr> "ShADadfF", "Dd", "ShADadfF", "ShADadfF", "ShADadfF", "S…
## $ orig_pkts     <int> 5, 1, 5, 12, 5, 6, 16, 6, 6, 5, 5, 5, 5, 5, 5, 5, 6, 5, …
## $ orig_ip_bytes <int> 770, 456, 648, 1017, 744, 839, 1360, 873, 878, 834, 834,…
## $ resp_pkts     <int> 5, 1, 5, 12, 5, 5, 16, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
## $ resp_ip_bytes <int> 988, 546, 499, 12865, 988, 807, 18339, 412, 412, 412, 41…
## 
## 
## File: dpd 
## Rows: 1
## Columns: 9
## $ ts             <dbl> 1272737798
## $ uid            <chr> "CADvMziC96POynR2e"
## $ id.orig_h      <chr> "172.25.105.3"
## $ id.orig_p      <int> 43204
## $ id.resp_h      <chr> "172.25.105.40"
## $ id.resp_p      <int> 5060
## $ proto          <chr> "udp"
## $ analyzer       <chr> "SIP"
## $ failure_reason <chr> "Binpac exception: binpac exception: string mismatch at…
## 
## 
## File: files 
## Rows: 38
## Columns: 16
## $ ts             <dbl> 1272737631, 1272737669, 1272737676, 1272737688, 1272737…
## $ fuid           <chr> "FRnb7P5EDeZE4Y3z4", "FOT2gC2yLxjfMCuE5f", "FmUCuA3dzcS…
## $ tx_hosts       <list> "172.25.105.40", "172.25.105.40", "172.25.105.40", "17…
## $ rx_hosts       <list> "172.25.105.43", "172.25.105.43", "172.25.105.43", "17…
## $ conn_uids      <list> "Cb0OAQ1eC0ZhQTEKNl", "CFfYtA0DqqrJk4gI5", "CHN4qA4UUH…
## $ source         <chr> "HTTP", "HTTP", "HTTP", "HTTP", "HTTP", "HTTP", "HTTP",…
## $ depth          <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ analyzers      <list> [], [], [], [], [], [], [], [], [], [], [], [], [], []…
## $ mime_type      <chr> "text/html", "text/html", "text/html", "text/html", "te…
## $ duration       <dbl> 0.000000e+00, 8.920908e-03, 0.000000e+00, 0.000000e+00,…
## $ is_orig        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, …
## $ seen_bytes     <int> 479, 11819, 479, 313, 17076, 55, 50, 30037, 31608, 1803…
## $ total_bytes    <int> 479, NA, 479, 313, NA, 55, 50, NA, NA, NA, 58, 313, 50,…
## $ missing_bytes  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ overflow_bytes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ timedout       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,…
## 
## 
## File: http 
## Rows: 92
## Columns: 24
## $ ts                <dbl> 1272737631, 1272737669, 1272737669, 1272737676, 1272…
## $ uid               <chr> "Cb0OAQ1eC0ZhQTEKNl", "CcEeLRD3cca3j4QGh", "CFfYtA0D…
## $ id.orig_h         <chr> "172.25.105.43", "172.25.105.43", "172.25.105.43", "…
## $ id.orig_p         <int> 57086, 57087, 57088, 57089, 57090, 57091, 57093, 570…
## $ id.resp_h         <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "…
## $ id.resp_p         <int> 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, 80, …
## $ trans_depth       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ method            <chr> "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GE…
## $ host              <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "…
## $ uri               <chr> "/maint", "/", "/user/", "/maint", "/maint", "/maint…
## $ referrer          <chr> "http://172.25.105.40/user/", NA, NA, "http://172.25…
## $ version           <chr> "1.1", "1.1", "1.1", "1.1", "1.1", "1.1", "1.1", "1.…
## $ user_agent        <chr> "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9)…
## $ request_body_len  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ response_body_len <int> 479, 0, 11819, 479, 313, 17076, 0, 0, 0, 0, 0, 0, 0,…
## $ status_code       <int> 401, 302, 200, 401, 301, 200, 304, 304, 304, 304, 30…
## $ status_msg        <chr> "Authorization Required", "Found", "OK", "Authorizat…
## $ tags              <list> [], [], [], [], [], [], [], [], [], [], [], [], [],…
## $ resp_fuids        <list> "FRnb7P5EDeZE4Y3z4", <NULL>, "FOT2gC2yLxjfMCuE5f", …
## $ resp_mime_types   <list> "text/html", <NULL>, "text/html", "text/html", "tex…
## $ username          <chr> NA, NA, NA, NA, "maint", "maint", "maint", "maint", …
## $ password          <chr> NA, NA, NA, NA, "password", "password", "password", …
## $ orig_fuids        <list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NU…
## $ orig_mime_types   <list> <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NU…
## 
## 
## File: packet_filter 
## Rows: 1
## Columns: 5
## $ ts      <dbl> 1627151196
## $ node    <chr> "zeek"
## $ filter  <chr> "ip or not ip"
## $ init    <lgl> TRUE
## $ success <lgl> TRUE
## 
## 
## File: sip 
## Rows: 9
## Columns: 23
## $ ts                <dbl> 1272737581, 1272737768, 1272737768, 1272737768, 1272…
## $ uid               <chr> "C2s0IU2SZFGVlZyH43", "CADvMziC96POynR2e", "CADvMziC…
## $ id.orig_h         <chr> "172.25.105.43", "172.25.105.3", "172.25.105.3", "17…
## $ id.orig_p         <int> 5060, 43204, 43204, 43204, 43204, 43204, 43204, 4320…
## $ id.resp_h         <chr> "172.25.105.40", "172.25.105.40", "172.25.105.40", "…
## $ id.resp_p         <int> 5060, 5060, 5060, 5060, 5060, 5060, 5060, 5060, 5060
## $ trans_depth       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ method            <chr> "OPTIONS", "REGISTER", "REGISTER", "SUBSCRIBE", "SUB…
## $ uri               <chr> "sip:100@172.25.105.40", "sip:172.25.105.40", "sip:1…
## $ request_from      <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "<sip:555@172.25.…
## $ request_to        <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "<sip:555@172.25.…
## $ response_from     <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "<sip:555@172.25.…
## $ response_to       <chr> "\"sipvicious\"<sip:100@1.1.1.1>;tag=as18cdb0c9", "<…
## $ call_id           <chr> "61127078793469957194131", "MzEwMmYyYWRiYTUxYTBhODY3…
## $ seq               <chr> "1 OPTIONS", "1 REGISTER", "2 REGISTER", "1 SUBSCRIB…
## $ request_path      <list> "SIP/2.0/UDP 127.0.1.1:5060", "SIP/2.0/UDP 172.25.10…
## $ response_path     <list> "SIP/2.0/UDP 127.0.1.1:5060", "SIP/2.0/UDP 172.25.10…
## $ user_agent        <chr> "UNfriendly-scanner - for demo purposes", "X-Lite B…
## $ status_code       <int> 200, 401, 200, 401, 404, 401, 100, 200, NA
## $ status_msg        <chr> "OK", "Unauthorized", "OK", "Unauthorized", "Not fo…
## $ request_body_len  <int> 0, 0, 0, 0, 0, 264, 264, 264, 0
## $ response_body_len <int> 0, 0, 0, 0, 0, 0, 0, 302, NA
## $ content_type      <chr> NA, NA, NA, NA, NA, NA, NA, "application/sdp", NA
## 
## 
## File: weird 
## Rows: 1
## Columns: 9
## $ ts        <dbl> 1272737805
## $ id.orig_h <chr> "172.25.105.3"
## $ id.orig_p <int> 0
## $ id.resp_h <chr> "172.25.105.40"
## $ id.resp_p <int> 0
## $ name      <chr> "truncated_IPv6"
## $ notice    <lgl> FALSE
## $ peer      <chr> "zeek"
## $ source    <chr> "IP"

Process Packet Summary

We won’t process the big JSON file tshark generated for us util we really have to, but we can read in the packet summary table now:

packet_cols <- c("packet_num", "ts", "src", "discard", "dst", "proto", "length", "info")

read_tsv(
  file = "voip-packets.tsv",
  col_names = packet_cols,
  col_types = "ddccccdc"
) %>%
  select(-discard) -> packets

packets
## # A tibble: 4,447 x 7
##    packet_num       ts src      dst     proto length info                       
##         <dbl>    <dbl> <chr>    <chr>   <chr>  <dbl> <chr>                      
##  1          1  0       172.25.… 172.25… SIP      470 Request: OPTIONS sip:100@1…
##  2          2  3.53e-4 172.25.… 172.25… SIP      560 Status: 200 OK |           
##  3          3  5.03e+1 172.25.… 172.25… TCP       74 57086 → 80 [SYN] Seq=0 Win…
##  4          4  5.03e+1 172.25.… 172.25… TCP       74 80 → 57086 [SYN, ACK] Seq=…
##  5          5  5.03e+1 172.25.… 172.25… TCP       66 57086 → 80 [ACK] Seq=1 Ack…
##  6          6  5.03e+1 172.25.… 172.25… HTTP     568 GET /maint HTTP/1.1        
##  7          7  5.03e+1 172.25.… 172.25… TCP       66 80 → 57086 [ACK] Seq=1 Ack…
##  8          8  5.03e+1 172.25.… 172.25… HTTP     786 HTTP/1.1 401 Authorization…
##  9          9  5.03e+1 172.25.… 172.25… TCP       66 80 → 57086 [FIN, ACK] Seq=…
## 10         10  5.03e+1 172.25.… 172.25… TCP       66 57086 → 80 [ACK] Seq=503 A…
## # … with 4,437 more rows
glimpse(packets)
## Rows: 4,447
## Columns: 7
## $ packet_num <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ ts         <dbl> 0.000000, 0.000353, 50.317176, 50.317365, 50.320071, 50.329…
## $ src        <chr> "172.25.105.43", "172.25.105.40", "172.25.105.43", "172.25.…
## $ dst        <chr> "172.25.105.40", "172.25.105.43", "172.25.105.40", "172.25.…
## $ proto      <chr> "SIP", "SIP", "TCP", "TCP", "TCP", "HTTP", "TCP", "HTTP", "…
## $ length     <dbl> 470, 560, 74, 74, 66, 568, 66, 786, 66, 66, 66, 66, 74, 74,…
## $ info       <chr> "Request: OPTIONS sip:100@172.25.105.40 |", "Status: 200 OK…

What is the transport protocol being used?

SIP can use TCP or UDP and which transport it uses will be specified in the Via: header. Let’s take a look:

head(sip_log_parsed$via)
## [1] "SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;rport"
## [2] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-1189344537;rport"
## [3] "SIP/2.0/UDP 127.0.0.1:5066;branch=z9hG4bK-2119091576;rport"
## [4] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-3226446220;rport"
## [5] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-1330901245;rport"
## [6] "SIP/2.0/UDP 127.0.0.1:5087;branch=z9hG4bK-945386205;rport"

Are they all UDP? We can find out by performing some light processing
on the via column:

sip_log_parsed %>% 
  select(via) %>% 
  mutate(
    transport = stri_match_first_regex(via, "^([^[:space:]]+)")[,2]
  ) %>% 
  count(transport, sort=TRUE)
## # A tibble: 1 x 2
##   transport       n
##   <chr>       <int>
## 1 SIP/2.0/UDP  4266

Looks like they’re all UDP. Question 1: ✅

The attacker used a bunch of scanning tools that belong to the same suite. Provide the name of the suite.

Don’t you, now, wish you had listen to your parents when they were telling you about the facts of SIP life when you were a wee pup?

We’ll stick with the SIP log to answer this one and peek back at the RFC to see that there’s a “User-Agent:” field which contains information about the client originating the request. Most scanners written by defenders identify themselves in User-Agent fields when those fields are available in a protocol exchange, and a large percentage of naive malicious folks are too daft to change this value (or leave it default to make you think they’re not behaving badly).

If you are a regular visitor to SIP land, you likely know the common SIP scanning tools. These are a few:

  • Nmap’s SIP library
  • Mr.SIP, a “SIP-Based Audit and Attack Tool”
  • SIPVicious, a “set of security tools that can be used to audit SIP based VoIP systems”
  • Sippts, a “set of tools to audit SIP based VoIP Systems”

(There are many more.)

Let’s see what user-agent was used in this log extract:

count(sip_log_parsed, user_agent, sort=TRUE)
## # A tibble: 3 x 2
##   user_agent           n
##   <chr>            <int>
## 1 friendly-scanner  4248
## 2 Zoiper rev.6751     14
## 3 <NA>                 4

The overwhelming majority are friendly-scanner. Let’s look at a few of those log entries:

sip_log_parsed %>% 
  filter(
    user_agent == "friendly-scanner"
  ) %>% 
  glimpse()
## Rows: 4,248
## Columns: 18
## $ via            <chr> "SIP/2.0/UDP 127.0.0.1:5061;branch=z9hG4bK-2159139916;r…
## $ content_length <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", …
## $ from           <chr> "\"sipvicious\"<sip:100@1.1.1.1>; tag=X_removed", "\"34…
## $ accept         <chr> "application/sdp", "application/sdp", "application/sdp"…
## $ user_agent     <chr> "friendly-scanner", "friendly-scanner", "friendly-scann…
## $ to             <chr> "\"sipvicious\"<sip:100@1.1.1.1>", "\"3428948518\"<sip:…
## $ contact        <chr> "sip:100@127.0.0.1:5061", "sip:3428948518@honey.pot.IP.…
## $ cseq           <chr> "1 OPTIONS", "1 REGISTER", "1 REGISTER", "1 REGISTER", …
## $ source         <chr> "210.184.X.Y:1083", "210.184.X.Y:4956", "210.184.X.Y:51…
## $ datetime       <chr> "2010-05-02 01:43:05.606584", "2010-05-02 01:43:12.4888…
## $ request        <chr> "OPTIONS sip:100@honey.pot.IP.removed SIP/2.0", "REGIST…
## $ contents       <chr> "Call-ID: 845752980453913316694142\nMax-Forwards: 70", …
## $ call_id        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ max_forwards   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ expires        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ allow          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ authorization  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ content_type   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Those from and to fields have an interesting name in them: “sipviscious”. You’ve seen that before, right at the beginning of this section.

Let’s do a quick check over at the SIPvicious repo just to make sure.

count(sip_log_parsed, user_agent)
## # A tibble: 3 x 2
##   user_agent           n
##   <chr>            <int>
## 1 friendly-scanner  4248
## 2 Zoiper rev.6751     14
## 3 <NA>                 4

“What is the User-Agent of the victim system?”

We only have partial data in the text log so we’ll have to look elsewhere (the PCAP) for this information. The “victim” is whatever was the target of a this SIP-based attack and we can look for SIP messages, user agents, and associated IPs in the PCAP thanks to tshark’s rich SIP filter library:

system("tshark -Q -T fields -e ip.src -e ip.dst -e sip.User-Agent -r src/Voip-trace.pcap 'sip.User-Agent'")

That first exchange is all we really need. We see our rude poker talking to 172.25.105.40 and it responding right after.

Which tool was only used against the following extensions: 100, 101, 102, 103, and 111?

The question is a tad vague and is assuming — since we now know the SIPvicious suite was used — that we also know to provide the name of the Python script in SIPvicious that was used. There are five tools:

The svcrash tool is something defenders can use to help curtail scanner activity. We can cross that off the list. The svreport tool is for working with data generated by svmap, svwar and/or svcrack. One more crossed off. We also know that the attacker scanned the SIP network looking for nodes, which means svmap and svwar are likely not exclusive tool to the target extensions. (We technically have enough information right now to answer the question especially if you look carefully at the answer box on the site but that’s cheating).

The SIP request line and header field like To: destination information in the form of a SIP URI. Since we only care about the extension component of the URI for this question, we can use a regular expression to isolate them.

Back to the SIP log to see if we can find the identified extensions. We’ll also process the “From:” header just in case we need it.

sip_log_parsed %>% 
  mutate_at(
    vars(request, from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  select(request, from, to)
## # A tibble: 4,266 x 3
##    request    from       to        
##    <chr>      <chr>      <chr>     
##  1 100        100        100       
##  2 3428948518 3428948518 3428948518
##  3 1729240413 1729240413 1729240413
##  4 admin      admin      admin     
##  5 info       info       info      
##  6 test       test       test      
##  7 postmaster postmaster postmaster
##  8 sales      sales      sales     
##  9 service    service    service   
## 10 support    support    support   
## # … with 4,256 more rows

That worked! We can now see what friendly-scanner attempted to authenticate only to our targets:

sip_log_parsed %>%
  mutate_at(
    vars(request, from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  filter(
    user_agent == "friendly-scanner",
    stri_detect_fixed(contents, "Authorization")
  ) %>% 
  distinct(to)
## # A tibble: 4 x 1
##   to   
##   <chr>
## 1 102  
## 2 103  
## 3 101  
## 4 111

While we’re missing 100 that’s likely due to it not requiring authentication (svcrack will REGISTER first to determine if a target requires authentication and not send cracking requests if it doesn’t).

Which extension on the honeypot does NOT require authentication?

We know this due to what we found in the previous question. Extension 100 does not require authentication.

How many extensions were scanned in total?

We just need to count the distinct to’s where the user agent is the scanner:

sip_log_parsed %>% 
  mutate_at(
    vars(request, from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  filter(
    user_agent == "friendly-scanner"
  ) %>% 
  distinct(to)
## # A tibble: 2,652 x 1
##    to        
##    <chr>     
##  1 100       
##  2 3428948518
##  3 1729240413
##  4 admin     
##  5 info      
##  6 test      
##  7 postmaster
##  8 sales     
##  9 service   
## 10 support   
## # … with 2,642 more rows

There is a trace for a real SIP client. What is the corresponding user-agent? (two words, once space in between)

We only need to look for user agent’s that aren’t our scanner:

sip_log_parsed %>% 
  filter(
    user_agent != "friendly-scanner"
  ) %>% 
  count(user_agent)
## # A tibble: 1 x 2
##   user_agent          n
##   <chr>           <int>
## 1 Zoiper rev.6751    14

Multiple real-world phone numbers were dialed. Provide the first 11 digits of the number dialed from extension 101?

Calls are INVITE” requests

sip_log_parsed %>% 
  mutate_at(
    vars(from, to),
    ~stri_match_first_regex(.x, "sip:([^@]+)@")[,2]
  ) %>% 
  filter(
    from == 101,
    stri_detect_regex(cseq, "INVITE")
  ) %>% 
  select(to) 
## # A tibble: 3 x 1
##   to              
##   <chr>           
## 1 900114382089XXXX
## 2 00112322228XXXX 
## 3 00112524021XXXX

The challenge answer box provides hint to what number they want. I’m not sure but I suspect it may be randomized, so you’ll have to match the pattern they expect with the correct digits above.

What are the default credentials used in the attempted basic authentication? (format is username:password)

This question wants us to look at the HTTP requests that require authentication. We can get he credentials info from the zeek$http log:

zeek$http %>% 
  distinct(username, password)
## # A tibble: 2 x 2
##   username password
##   <chr>    <chr>   
## 1 <NA>     <NA>    
## 2 maint    password

Which codec does the RTP stream use? (3 words, 2 spaces in between)

“Codec” refers to the algorithm used to encode/decode an audio or video stream. The RTP RFC uses the term “payload type” to refer to this during exchanges and even has a link to RFC 3551 which provides further information on these encodings.

The summary packet table that tshark generates helpfully provides summary info for RTP packets and part of that info is PT=… which indicates the payload type.

packets %>% 
  filter(proto == "RTP") %>% 
  select(info)
## # A tibble: 2,988 x 1
##    info                                                       
##    <chr>                                                      
##  1 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6402, Time=126160
##  2 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6403, Time=126320
##  3 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6404, Time=126480
##  4 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6405, Time=126640
##  5 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6406, Time=126800
##  6 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6407, Time=126960
##  7 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6408, Time=127120
##  8 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6409, Time=127280
##  9 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6410, Time=127440
## 10 PT=ITU-T G.711 PCMU, SSRC=0xA254E017, Seq=6411, Time=127600
## # … with 2,978 more rows

How long is the sampling time (in milliseconds)?

  • 1 Hz = 1,000 ms
  • 1 ms = 1,000 Hz

(1/8000) * 1000

What was the password for the account with username 555?

We don’t really need to use external programs for this but it will sure go quite a bit faster if we do. While the original reference page for sipdump and sipcrack is defunct, you can visit that link to go to the Wayback machine’s capture of it. It will help if you have a linux system handy (so Docker to the rescue for macOS and Windows folks) since the following answer details are running on Ubunbu.

This question is taking advantage of the fact that the default authentication method for SIP is extremely weak. The process uses an MD5 challenge/response, and if an attacker can capture call traffic it is possible to brute force the password offline (which is what we’ll use sipcrack for).

You can install them via sudo apt install sipcrack.

We’ll first generate a dump of the authentication attempts with sipdump:

system("sipdump -p src/Voip-trace.pcap sip.dump", intern=TRUE)
##  [1] ""                                                               
##  [2] "SIPdump 0.2 "                                                   
##  [3] "---------------------------------------"                        
##  [4] ""                                                               
##  [5] "* Using pcap file 'src/Voip-trace.pcap' for sniffing"           
##  [6] "* Starting to sniff with packet filter 'tcp or udp'"            
##  [7] ""                                                               
##  [8] "* Dumped login from 172.25.105.40 -> 172.25.105.3 (User: '555')"
##  [9] "* Dumped login from 172.25.105.40 -> 172.25.105.3 (User: '555')"
## [10] "* Dumped login from 172.25.105.40 -> 172.25.105.3 (User: '555')"
## [11] ""                                                               
## [12] "* Exiting, sniffed 3 logins"
cat(readLines("sip.dump"), sep="\n")
## 172.25.105.3"172.25.105.40"555"asterisk"REGISTER"sip:172.25.105.40"4787f7ce""""MD5"1ac95ce17e1f0230751cf1fd3d278320
## 172.25.105.3"172.25.105.40"555"asterisk"INVITE"sip:1000@172.25.105.40"70fbfdae""""MD5"aa533f6efa2b2abac675c1ee6cbde327
## 172.25.105.3"172.25.105.40"555"asterisk"BYE"sip:1000@172.25.105.40"70fbfdae""""MD5"0b306e9db1f819dd824acf3227b60e07

It saves the IPs, caller, authentication realm, method, nonce and hash which will all be fed into the sipcrack.

We know from the placeholder answer text that the “password” is 4 characters, and this is the land of telephony, so we can make an assumption that it is really 4 digits. sipcrack needs a file of passwords to try, so We’ll let R make a randomized file of 4 digit pins for us:

cat(sprintf("%04d", sample(0:9999)), file = "4-digits", sep="\n")

We only have authenticaton packets for 555 so we can automate what would normally be an interactive process:

cat(system('echo "1" | sipcrack -w 4-digits sip.dump', intern=TRUE), sep="\n")
## 
## SIPcrack 0.2 
## ----------------------------------------
## 
## * Found Accounts:
## 
## Num  Server      Client      User    Hash|Password
## 
## 1    172.25.105.3    172.25.105.40   555 1ac95ce17e1f0230751cf1fd3d278320
## 2    172.25.105.3    172.25.105.40   555 aa533f6efa2b2abac675c1ee6cbde327
## 3    172.25.105.3    172.25.105.40   555 0b306e9db1f819dd824acf3227b60e07
## 
## * Select which entry to crack (1 - 3): 
## * Generating static MD5 hash... c3e0f1664fde9fbc75a7cbd341877875
## * Loaded wordlist: '4-digits'
## * Starting bruteforce against user '555' (MD5: '1ac95ce17e1f0230751cf1fd3d278320')
## * Tried 8904 passwords in 0 seconds
## 
## * Found password: '1234'
## * Updating dump file 'sip.dump'... done

Which RTP packet header field can be used to reorder out of sync RTP packets in the correct sequence?

Just reading involved here: 5.1 RTP Fixed Header Fields.

The trace includes a secret hidden message. Can you hear it?

We could command line this one but honestly Wireshark has a pretty keen audio player. Fire it up, open up the PCAP, go to the “Telephony” menu, pick SIP and play the streams.

It was a rainy weekend in southern Maine and I really didn’t feel like doing chores, so I was skimming through RSS feeds and noticed a link to a PacketMaze challenge in the latest This Week In 4n6.

Since it’s also been a while since I’ve done any serious content delivery (on the personal side, anyway), I thought it’d be fun to solve the challenge with some tools I like — namely Zeek, tshark, and R (links to those in the e-book I’m linking to below), craft some real expository around each solution, and bundle it all up into an e-book and lighter-weight GitHub repo.

There are 11 “quests” in the challenge, requiring sifting through a packet capture (PCAP) and looking for various odds and ends (some are very windy maze passages). The challenge ranges from extracting images and image metadata from FTP sessions to pulling out precise elements in TLS sessions, to dealing with IPv6.

This is far from an expert challenge, and anyone can likely work through it with a little bit of elbow grease.

As it says on the tin, not all data is ‘big’ nor do all data-driven cybersecurity projects require advanced modeling capabilities. Sometimes you just need to dissect some network packet capture (PCAP) data and don’t want to click through a GUI to get the job done. This short book works through the questions in CyberDefenders Lab #68 to show how you can get the Zeek open source network security tool, tshark command-line PCAP analysis Swiss army knife, and R (via RStudio) working together.

FIN

If you find the resource helpful or have other feedback, drop a note on Twitter (@hrbrmstr), in a comment here, or as a GitHub issue.

On or about Friday evening (May 7, 2021) Edge notified me that the Feedly Mini extension (one of the only extensions I use as extensions are dangerous things) was remove from the store due to “malware”.

Feedly is used by many newshounds, and with 2021 being a very bad year when it comes to supply-chain attacks, seeing a notice about malware in a very popular Chrome extension is more than a little distressing.

I’m posting this blog to get the word “malware” associated with “Feedly” so they are compelled to make some sort of statement. I’ll update it with more information as it is provided.

Greynoise helps security teams focus on potential threats by reducing the noise from logs, alerts, and SIEMs. They constantly watch for badly behaving internet hosts, keep track of the benign ones, and use this research to classify IP addresses. Teams can use these classifications to only focus on things that (potentially) matter.

They also have a generous (10K calls/day), free community API which does not require credentialed access and returns a subset of information that the full API does. This is handy for folks who can’t afford the service or who only need to occasionally poke at IP addresses.

Andrew, GN’s CEO, tweeted out a super-hacky shell one-liner, the other day, that grabs the external IPs of all the ESTABLISHED IPv4 TCP connections and runs them through the community API via curl. Even though I made it a bit less-hacky:

sudo netstat -anp TCP \
  | rg ESTAB \
  | rg "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" -o \
  | rg -v "(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)" \
  | rg -v "$(dig +short viz.greynoise.io @9.9.9.9 | rg '^\d' | tr '\n' '|' | sed -e 's/.$//g')" \
  | sort -u \
  | while read IP; do echo $(curl --silent https://api.greynoise.io/v3/community/$IP); done |
  Rscript -e 'tibble::as_tibble(jsonlite::stream_in(file("stdin"), verbose=FALSE))'

its still a “run-on-demand” process that you could put in a script and launchd, but then you’d still have to keep a terminal up or remember to watch some file. Plus, it relies on full executables.

I decided to make things a bit easier for folks on macOS Big Sur by cranking out a small SwiftUI app I’ve dubbed GreyWatch:

Each list entry show an IP address your Mac previously connected to (since app launch) or currently has established TCP connections to. The three indicator dots show (in order) whether Greynoise has detected scanning behavior from the IP address within the last 30 days, whether it has a “Rule It OuT” (RIOT) classification, and what — if any — classification the IP address has. The app only shows an IP address once even it you continue to connect to it and it puts new connections on top.

If an IP address has a classification, double-clicking it will open your default browser to the Greynoise visualizer, otherwise said double-click will take you to the IPInfo entry for the IP address.

Needless to say, if your Mac is talking to a host Greynoise has classified as horribad, your other 99 problems no longer take precedence. I’ll likely add a notification action if that condition occurrs.

There’s an “Export…” item in the file menu that lets you save a copy of the current IP list (with metadata) to an ndlines formatted JSON file.

The app does not shell out to dig or netstat and has a light memory and energy footprint.

There are pre-built, notarized binaries in the releases section, and I’ll gradually be adding features (submit yours via new issues!). You can also submit bug reports or other questions via GH issues as well.

Many thanks to Andrew and team for their generous free tier, which enables semi-useful community hacks like this one!

Brim Security maintains a free, Electron-based desktop GUI for exploration of PCAPs and select cybersecurity logs:

along with a broad ecosystem of tools which can be used independently of the GUI.

The standalone or embedded zqd server, as well as the zq command line utility let analysts run ZQL (a domain-specific query language) queries on cybersecurity data sources.

The Brim team maintains a Python module that is capable of working with the zqd HTTP API and my nascent {brimr}gitea|gh|gl|bb R package provides a similar API structure to perform similar operations in R, along with a wrapper for the zq commmand line tool.

PCAPs! In! Spaaaaacce[s]!

Brim Desktop organizes input sources into something called “spaces”. We can check for available spaces with brim_spaces():

library(brimr)
library(tibble)

brim_spaces()
##                               id                                                            name
## 1 sp_1p6pwLgtsESYBTHU9PL9fcl2iBn 2021-02-17-Trickbot-gtag-rob13-infection-in-AD-environment.pcap
##                                                                                              data_path storage_kind
## 1 file:///Users/demo/Library/Application%20Support/Brim/data/spaces/sp_1p6pwLgtsESYBTHU9PL9fcl2iBn    filestore

This single space availble is a sample capture of Trickbot

Let’s profile the network connections in this capture:

# ZQL query to fetch Zeek connection data
zql1 <- '_path=conn | count() by id.orig_h, id.resp_h, id.resp_p | sort id.orig_h, id.resp_h, id.resp_p'

space <- "2021-02-17-Trickbot-gtag-rob13-infection-in-AD-environment.pcap"

r1 <- brim_search(space, zql1)

r1
## ZQL query took 0.0000 seconds; 384 records matched; 1,082 records read; 238,052 bytes read

(r1 <- as_tibble(tidy_brim(r1)))
## # A tibble: 74 x 4
##    orig_h      resp_h       resp_p count
##    <chr>       <chr>        <chr>  <int>
##  1 10.2.17.2   10.2.17.101  49787      1
##  2 10.2.17.101 3.222.126.94 80         1
##  3 10.2.17.101 10.2.17.1    445        1
##  4 10.2.17.101 10.2.17.2    53        97
##  5 10.2.17.101 10.2.17.2    88        27
##  6 10.2.17.101 10.2.17.2    123        5
##  7 10.2.17.101 10.2.17.2    135        8
##  8 10.2.17.101 10.2.17.2    137        2
##  9 10.2.17.101 10.2.17.2    138        2
## 10 10.2.17.101 10.2.17.2    389       37
## # … with 64 more rows

Brim auto-processed the PCAP into Zeek log format and _path=conn in query string indicates that’s where we’re going to perform further data operations (the queries are structured a bit like jq filters). We then ask Brim/zqd to summarize and sort source IP, destination IP, and port counts. {brimr} sends this query over to the server. The raw response is a custom data structure that we can turn into a tidy data frame via tidy_brim().

We can do something similar with the Suricata data that Brim also auto-processes for us:

# Z query to fetch Suricata alerts including the count of alerts per source:destination 
zql2 <- "event_type=alert | count() by src_ip, dest_ip, dest_port, alert.severity, alert.signature | sort src_ip, dest_ip, dest_port, alert.severity, alert.signature"

r2 <- brim_search(space, zql2)

r2
## ZQL query took 0.0000 seconds; 47 records matched; 870 records read; 238,660 bytes read

(r2 <- (as_tibble(tidy_brim(r2))))
## # A tibble: 35 x 6
##    src_ip     dest_ip    dest_port severity signature                                                              count
##    <chr>      <chr>          <int>    <int> <chr>                                                                  <int>
##  1 10.2.17.2  10.2.17.1…     49674        3 SURICATA Applayer Detect protocol only one direction                       1
##  2 10.2.17.2  10.2.17.1…     49680        3 SURICATA Applayer Detect protocol only one direction                       1
##  3 10.2.17.2  10.2.17.1…     49687        3 SURICATA Applayer Detect protocol only one direction                       1
##  4 10.2.17.2  10.2.17.1…     49704        3 SURICATA Applayer Detect protocol only one direction                       1
##  5 10.2.17.2  10.2.17.1…     49709        3 SURICATA Applayer Detect protocol only one direction                       1
##  6 10.2.17.2  10.2.17.1…     49721        3 SURICATA Applayer Detect protocol only one direction                       1
##  7 10.2.17.2  10.2.17.1…     50126        3 SURICATA Applayer Detect protocol only one direction                       1
##  8 10.2.17.1… 3.222.126…        80        2 ET POLICY curl User-Agent Outbound                                         1
##  9 10.2.17.1… 36.95.27.…       443        1 ET HUNTING Suspicious POST with Common Windows Process Names - Possib…     1
## 10 10.2.17.1… 36.95.27.…       443        1 ET MALWARE Win32/Trickbot Data Exfiltration                                1
## # … with 25 more rows

Finally, for this toy example, we’ll also generate a visual overview of these connections:

library(igraph)
library(ggraph)
library(tidyverse)

gdf <- count(r1, orig_h, resp_h, wt=count)

count(gdf, node = resp_h, wt=n, name = "in_degree") %>% 
  full_join(
    count(gdf, node = orig_h, name = "out_degree")
  ) %>% 
  mutate_at(
    vars(in_degree, out_degree),
    replace_na, 1
  ) %>% 
  arrange(in_degree) -> vdf

g <- graph_from_data_frame(gdf, vertices = vdf)

ggraph(g, layout = "linear") +
  geom_node_point(
    aes(size = in_degree), shape = 21
  ) +
  geom_edge_arc(
    width = 0.125, 
    arrow = arrow(
      length = unit(5, "pt"),
      type = "closed"
    )
  )

We can also process log files directly (i.e. without any server) with zq_cmd():

zq_cmd(
  c(
    '"* | cut ts,id.orig_h,id.orig_p"', # note the quotes
    system.file("logs", "conn.log.gz", package = "brimr")
   )
 )
##           id.orig_h id.orig_p                          ts
##   1:  10.164.94.120     39681 2018-03-24T17:15:21.255387Z
##   2:    10.47.25.80     50817 2018-03-24T17:15:21.411148Z
##   3:    10.47.25.80     50817 2018-03-24T17:15:21.926018Z
##   4:    10.47.25.80     50813 2018-03-24T17:15:22.690601Z
##   5:    10.47.25.80     50813 2018-03-24T17:15:23.205187Z
##  ---                                                     
## 988: 10.174.251.215     33003 2018-03-24T17:15:21.429238Z
## 989: 10.174.251.215     33003 2018-03-24T17:15:21.429315Z
## 990: 10.174.251.215     33003 2018-03-24T17:15:21.429479Z
## 991:  10.164.94.120     38265 2018-03-24T17:15:21.427375Z
## 992: 10.174.251.215     33003 2018-03-24T17:15:21.433306Z

FIN

This package is less than 24 hrs old (as of the original blog post date) and there are still a few bits missing, which means y’all have the ability to guide the direction it heads in. So kick the tyres and interact where you’re most comfortable.

The incredibly talented folks over at Bishop Fox were quite generous this week, providing a scanner for figuring out PAN-OS GlobalProtect versions. I’ve been using their decoding technique and date-based fingerprint table to keep an eye on patch status (over at $DAYJOB we help customers, organizations, and national cybersecurity centers get ahead of issues as best as we can).

We have at-scale platforms for scanning the internet and aren’t running the panos-scanner repo code, but I since there is Python code for doing this, I thought it might be fun to show R folks how to do the same thing and show folks how to use the {httr} package to build something similar (we won’t hit all the URLs their script assesses, primarily for brevity).

What Are We Doing Again?

Palo Alto makes many things, most of which are built on their custom linux distribution dubbed PAN-OS. One of the things they build is a VPN product that lets users remotely access internal company resources. It’s had quite the number of pretty horrible problems of late.

Folks contracted to assess security defenses (colloquially dubbed “pen-testers” tho no pens are usually involved) and practitioners within an organization who want to gain an independent view of what their internet perimeter looks like often assemble tools to perform
ad-hoc assessments. Sure, there are commercial tools for performing these assessments ($DAYJOB makes some!), but these open source tools make it possible for folks to learn from each other and for makers of products (like PAN-OS) to do a better job securing their creations.

In this case, the creation is a script that lets the caller figure out what version of PAN-OS is running on a given GlobalProtect box.

To follow along at home, you’ll need access to a PAN-OS system, as I’m not providing an IP address of one for you. It’s really not hard to find one (just google a bit or stand up a trial one from the vendor). Throughout the examples I’ll be using {glue} to replace ip and port in various function calls, so let’s get some setup bits out of the way:

library(httr)
library(tidyverse) # for read_fwf() (et al), pluck(), filter(), and %>%

gg <- glue::glue # no need to bring in the entire namespace just for this

Assuming you have a valid ip and port, let’s try making a request against your PAN-OS GlobalProtect (hereafter using “GP” so save keystrokes) system:

httr::HEAD(
  url = gg("https://{ip}:{port}/global-protect/login.esp")
) -> res
## Error in curl::curl_fetch_memory(url, handle = handle) : 
##  SSL certificate problem: self signed certificate

We’re using a HEAD request as we really don’t need the contents of the remote file (unless you need to verify it truly is a PAN-OS GP server), just the metadata about it. You can use a traditional GET request if you like, though.

We immediately run into a snag since these boxes tend to use a self-signed SSL/TLS certificate which web clients aren’t thrilled about dealing with unless explicitly configured to. We can circumvent this with some configuration options, but you should not use the following incantations haphazardly. SSL/TLS no longer really means what it used to (thanks, Let’s Encrypt!) but you have no guarantees of what’s being delivered to you is legitimate if you hit a plaintext web site or one with an invalid certificate. Enough with the soapbox, let’s make the request:

httr::HEAD(
  url = gg("https://{ip}:{port}/global-protect/login.esp"),
  config = httr::config(
    ssl_verifyhost =FALSE, 
    ssl_verifypeer = FALSE
  )
) -> res

httr::status_code(res)
## [1] 200

In that request, we’ve told the underlying {curl} library calls to not verify the validity of the host or peer certificates associated with the service. Again, don’t do this haphazardly to get around generic SSL/TLS problems when making normal API calls or scraping sites.

Since we only made a HEAD request, we’re just getting back headers, so let’s take a look at them:

str(httr::headers(res), 1)
## List of 18
##  $ date                     : chr "Fri, 10 Jul 2020 15:02:32 GMT"
##  $ content-type             : chr "text/html; charset=UTF-8"
##  $ content-length           : chr "11749"
##  $ connection               : chr "keep-alive"
##  $ etag                     : chr "\"7e0d5e2b6add\""
##  $ pragma                   : chr "no-cache"
##  $ cache-control            : chr "no-store, no-cache, must-revalidate, post-check=0, pre-check=0"
##  $ expires                  : chr "Thu, 19 Nov 1981 08:52:00 GMT"
##  $ x-frame-options          : chr "DENY"
##  $ set-cookie               : chr "PHPSESSID=bde5668131c14b765e3e75f8ed5514a0; path=/; secure; HttpOnly"
##  $ set-cookie               : chr "PHPSESSID=bde5668131c14b765e3e75f8ed5514a0; path=/; secure; HttpOnly"
##  $ set-cookie               : chr "PHPSESSID=bde5668131c14b765e3e75f8ed5514a0; path=/; secure; HttpOnly"
##  $ set-cookie               : chr "PHPSESSID=bde5668131c14b765e3e75f8ed5514a0; path=/; secure; HttpOnly"
##  $ set-cookie               : chr "PHPSESSID=bde5668131c14b765e3e75f8ed5514a0; path=/; samesite=lax; secure; httponly"
##  $ strict-transport-security: chr "max-age=31536000;"
##  $ x-xss-protection         : chr "1; mode=block;"
##  $ x-content-type-options   : chr "nosniff"
##  $ content-security-policy  : chr "default-src 'self'; script-src 'self' 'unsafe-inline'; img-src * data:; style-src 'self' 'unsafe-inline';"
##  - attr(*, "class")= chr [1:2] "insensitive" "list"

As an aside, I’ve always found the use of PHP code in security products quite, er, fascinating.

The value we’re really looking for here is etag (which really looks like ETag in the raw response).

Bishop Fox (and others) figured out that that header value contains a timestamp in the last 8 characters. That timestamp maps to the release date of the particular PAN-OS version. Since Palo Alto maintains multiple, supported versions of PAN-OS and generally releases patches for them all at the same time, the mapping to an exact version is not super precise, but it’s sufficient to get an idea of whether that system is at a current, supported patch level.

The last 8 characters of 7e0d5e2b6add are 5e2b6add, which — as Bishop Fox notes in their repo — is just a hexadecimal encoding of the POSIX timestamp, in this case, 1579903709 or 2020-01-24 22:08:29 GMT (we only care about the date, so really 2020-01-24).

We can compute that with R, but first we need to note that the value is surrounded by " quotes, so we’ll have to deal with that during the processing:

httr::headers(res) %>% 
  pluck("etag") %>% 
  gsub('"', '', .) %>% 
  substr(5, 12) %>% 
  as.hexmode() %>% 
  as.integer() %>% 
  anytime::anytime(tz = "GMT") %>% 
  as.Date() -> version_date

version_date
## [1] "2020-01-24"

To get the associated version(s), we need to look the date up in their table, which is in a fixed-width format that we can read via:

read_fwf(
  file = "https://raw.githubusercontent.com/noperator/panos-scanner/master/version-table.txt",
  col_positions = fwf_widths(c(10, 14), c("version", "date")),
  col_types = "cc",
  trim_ws = TRUE
) %>% 
  mutate(
    date = lubridate::mdy(date)
  ) -> panos_trans

panos_trans
## # A tibble: 153 x 2
##    version  date      
##    <chr>    <date>    
##  1 6.0.0    2013-12-23
##  2 6.0.1    2014-02-26
##  3 6.0.2    2014-04-18
##  4 6.0.3    2014-05-29
##  5 6.0.4    2014-07-30
##  6 6.0.5    2014-09-04
##  7 6.0.5-h3 2014-10-07
##  8 6.0.6    2014-10-07
##  9 6.0.7    2014-11-18
## 10 6.0.8    2015-01-13
## # … with 143 more rows

Now, let’s see what version or versions this might be:

filter(panos_trans, date == version_date)
## # A tibble: 2 x 2
##   version date      
##   <chr>   <date>    
## 1 9.0.6   2020-01-24
## 2 9.1.1   2020-01-24

Putting It All Together

We can make a command line script for this (example) scanner:

#!env Rscript
library(purrr)

gg <- glue::glue

# we also use {httr}, {readr}, {lubridate}, {anytime}, and {jsonlite}

args <- commandArgs(trailingOnly = TRUE)

stopifnot(
  c(
    "Must supply both IP address and port" = length(args) == 2
  )
)

ip <- args[1]
port <-  args[2]

httr::HEAD(
  url = gg("https://{ip}:{port}/global-protect/login.esp"),
  config = httr::config(
    ssl_verifyhost =FALSE, 
    ssl_verifypeer = FALSE
  )
) -> res

httr::headers(res) %>% 
  pluck("etag") %>% 
  gsub('"', '', .) %>% 
  substr(5, 12) %>% 
  as.hexmode() %>% 
  as.integer() %>% 
  anytime::anytime(tz = "GMT") %>% 
  as.Date() -> version_date

panos_trans <- readr::read_csv("panos-versions.txt", col_types = "cD")

res <- panos_trans[panos_trans[["date"]] == version_date,]

if (nrow(res) == 0) {
  cat(gg('{{"ip":"{ip}","port":"{port}","version"=null,"date"=null}}\n'))
} else {
  res$ip <- ip
  res$port <- port
  jsonlite::stream_out(res[,c("ip", "port", "version", "date")], verbose = FALSE)
}

Save that as panos-scanner.R and make it executable (provided you’re on a non-legacy operating system that doesn’t support such modern capabilities). Save panos_trans as a CSV file in the same directory and try it against another (sanitized IP/port) system:

./panos-scanner.R 10.20.30.40 5678                                                                                                                                                    1
{"ip":"10.20.30.40","port":"5678","version":"9.1.2","date":"2020-03-30"}

FIN

To be complete, the script should test all the URLs the ones in the script from Bishop Fox does and stand up many more guard rails to handle errors associated with unreachable hosts, getting headers from a system that is not a PAN-OS GP host, and ensuring the ETag is a valid date.

You can grab the code [from this repo](https://git.rud.is/hrbrmstr/2020-07-10-panos-rstats.