New R Package For HTTP Headers Hashing

HTTP Headers Hashing (HHHash) is a technique developed by Alexandre Dulaunoy to generate a fingerprint of an HTTP server based on the headers it returns. It uses one-way hashing over the list of header keys returned by the server: the keys are concatenated in the order they were received, separated by colons, and the SHA256 of that string becomes the HHHash value. HHHash also incorporates a version identifier so the hashing function can be changed later.
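
As a rough illustration of that scheme, here is a minimal R sketch. It assumes the {digest} package for SHA256, and it takes the "hhh:1:" prefix from the sample output further down. The function name hhhash_sketch is made up for this example; this is not the {hhhash} source.

```r
library(digest)

# Hypothetical helper for illustration only; not the {hhhash} implementation.
hhhash_sketch <- function(keys, version = 1L) {
  # join the header keys with ":" in the order the server sent them
  joined <- paste(keys, collapse = ":")
  # hash the string itself (serialize = FALSE), not its R serialization
  sha <- digest(joined, algo = "sha256", serialize = FALSE)
  # prefix with the scheme tag and the version identifier
  paste("hhh", version, sha, sep = ":")
}

hhhash_sketch(c("Date", "Server", "Content-Type"))
hhhash_sketch(c("Server", "Date", "Content-Type")) # same keys, different order
```

Because the keys are hashed as one ordered, case-sensitive string, the two calls above return different values.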

While effective, HHHash results depend heavily on the characteristics of the HTTP requests, so correlations are typically only meaningful between hashes collected with the same crawler parameters. Locality-sensitive hashing (LSH) could be used to calculate distances between sets of headers for more efficient comparisons. However, some LSH algorithms have limitations (such as the need to pad content to a minimum byte length) that make the initial use of SHA256 hashes a bit more straightforward.
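
To make the comparison problem concrete: two header lists that differ by a single key produce unrelated SHA256 values, so the hashes alone say nothing about how close two servers are. The sketch below uses plain Jaccard similarity over the key sets as a stand-in for a distance measure. It is not LSH and not part of HHHash, and header_jaccard is a hypothetical name used only here.

```r
# Hypothetical helper: Jaccard similarity between two sets of header keys.
# It ignores order, so it is a different signal than HHHash itself.
header_jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  length(intersect(a, b)) / length(union(a, b))
}

x <- c("Date", "Server", "ETag", "Content-Type")
y <- c("Date", "Server", "Content-Type")

header_jaccard(x, y)
## [1] 0.75
```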

Alexandre made a Python library for it, and I cranked out an R package for it as well.

There are three functions exposed by {hhhash}:

  • build_hash_from_response: Build a hash from headers in a curl
    response object
  • build_hash_from_url: Build a hash from headers retrieved from a URL
  • hash_headers: Build a hash from a vector of HTTP header keys

The build_hash_from_url function relies on {curl} rather than {httr}, since {httr} uses curl::parse_headers() which (rightfully so) lowercases the header keys. We need to preserve both order and case for the hash to be useful.
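
One way to keep order and case intact is to work from the raw header text directly. The sketch below is a base R illustration under that assumption; header_keys_from_raw is a hypothetical helper, not a function in {hhhash} or {curl}, and the sample header text is made up.

```r
# Hypothetical helper: pull header keys out of raw header text,
# keeping their original order and case.
header_keys_from_raw <- function(raw_headers) {
  lines <- strsplit(raw_headers, "\r?\n")[[1]]
  # keep only "Key: value" lines; this drops the status line and blanks
  lines <- lines[grepl("^[^:[:space:]]+:", lines)]
  # everything before the first colon is the key
  sub(":.*$", "", lines)
}

# made-up response headers for illustration
raw_headers <- paste0(
  "HTTP/1.1 200 OK\r\n",
  "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n",
  "Server: Apache\r\n",
  "ETag: \"abc\"\r\n",
  "\r\n"
)

header_keys_from_raw(raw_headers)
## [1] "Date"   "Server" "ETag"
```

With a real response from curl::curl_fetch_memory(), the raw text would come from rawToChar(res$headers). Note that this sketch does not separate multiple header blocks, so a redirected request would need extra handling.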

Here is some sample usage:

remotes::install_github("hrbrmstr/hhhash")

library(hhhash)

build_hash_from_url("https://www.circl.lu/")
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

res <- curl::curl_fetch_memory("https://www.circl.lu/", curl::new_handle())

build_hash_from_response(res)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"

c(
  "Date", "Server", "Strict-Transport-Security",
  "Last-Modified", "ETag", "Accept-Ranges",
  "Content-Length", "Content-Security-Policy",
  "X-Content-Type-Options", "X-Frame-Options",
  "X-XSS-Protection", "Content-Type"
) -> keys

hash_headers(keys)
## [1] "hhh:1:78f7ef0651bac1a5ea42ed9d22242ed8725f07815091032a34ab4e30d3c3cefc"
