Roll Your Own Federal Government Shutdown-caused SSL Certificate Expiration Monitor in R

By now, even remote villages on uncharted islands in the Pacific know that the U.S. is in the midst of a protracted partial government shutdown. It’s having real impacts on the lives of Federal government workers but they aren’t the only ones. Much of the interaction Federal agencies have with the populace takes place online and the gateway to most of these services/information is a web site.

There are Federal standards that require U.S. government web sites to use SSL/TLS certificates and those certificates have something in common with, say, a loaf of bread you buy at the store: they expire. In all but the best of orgs (or we zany folks who use Let’s Encrypt and further propel internet denizens into a false sense of safety & privacy), renewing certificates involves manual labor/human intervention. For a good chunk of U.S. Federal agencies, those particular humans aren’t around. If a site’s SSL certificate expires and isn’t re-issued, it causes browsers to do funny things, like this:

Now, some of these sites are configured improperly in many ways, including serving pages on both http and https (vs redirecting to https immediately upon receiving an http connection). But, browsers like Chrome will generally try https first and scare you into not viewing the site.

But, how big a problem could this really be? We can find out with a fairly diminutive R script that:

  • grabs a list of Federal agency domains (thanks to the GSA)
  • tries to make an SSL/TLS connection (via the openssl package) to the apex domain or www. prefixed apex domain
  • finds the expiration date for the cert
  • does some simple date math

I’ve commented the script below pretty well so I’ll refrain from further blathering:

library(furrr)
library(openssl)
library(janitor)
library(memoise)
library(hrbrthemes)
library(tidyverse)

# fetch the GSA CSV:

read_csv(
  file = "https://raw.githubusercontent.com/GSA/data/master/dotgov-domains/current-federal.csv",
  col_types = "ccccccc"
) %>% 
  janitor::clean_names() -> xdf

# make openssl::download_ssl_cert calls safer in the event there
# are network/connection issues
.dl_cert <- possibly(openssl::download_ssl_cert, otherwise = NULL)

# memoise the downloader just in case we need to break the iterator
# below or another coding error causes it to break (the cached values
# will go away in a new R session or if you manually purge them)
dl_cert <- memoise::memoise(.dl_cert)

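# we'll do this in parallel to save time (~1,200 domains);
# multisession is used because plan(multiprocess) is deprecated
# in current releases of the future package
plan(multisession)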

# now follow the process described in the bullet points
future_map_dfr(xdf$domain_name, ~{

  who <- .x

  crt <- dl_cert(who)  

  if (!is.null(crt)) {
    # should be the first cert and expires is second validity field
    expires <- crt[[1]]$validity[2] 
  } else {
    crt <- dl_cert(sprintf("www.%s", who)) # may be on www b/c "gov"
    if (!is.null(crt)) {
      expires <- crt[[1]]$validity[2]
    } else {
      expires <- NA_character_  
    }
  }

  # keep a copy of the apex domain, the expiration field and the cert
  # (in the event you want to see just how un-optimized the U.S. IT 
  # infrastructure is by how many stupid vendors they use for certs)
  tibble(
    who = who,
    expires = expires,
    cert = list(crt)
  )

}) -> cdf
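
The apex-then-www fallback inside the iterator can also be pulled out into a small helper, which makes it easier to check without touching the network. This is only a sketch: `first_cert()`, its `fetch` argument and `fake_fetch()` are names made up for illustration, and in real use `fetch` would be something like `openssl::download_ssl_cert`.

```r
# Hypothetical helper (not part of the original script): try the apex
# domain first, then the www-prefixed name, and return the first hit.
# `fetch` is any function that takes a host name and either returns a
# cert chain or throws an error (e.g. openssl::download_ssl_cert).
first_cert <- function(domain, fetch) {
  for (host in c(domain, paste0("www.", domain))) {
    crt <- tryCatch(fetch(host), error = function(e) NULL)
    if (!is.null(crt)) return(list(host = host, cert = crt))
  }
  # nothing answered on either name
  list(host = NA_character_, cert = NULL)
}

# Stand-in fetcher for an offline check: only "www.example.gov" answers
fake_fetch <- function(host) {
  if (host == "www.example.gov") return("pretend-cert")
  stop("connection failed")
}

res <- first_cert("example.gov", fake_fetch)
res$host
```

Returning the host that actually answered alongside the cert also records whether the apex or the www name was used, which the script above does not keep.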

Now, let’s make strings into proper dates, count only the dates starting with the date of the shutdown to the end of 2019 (b/c the reckless human at the helm is borderline insane enough to do that) and plot the timeline:

filter(cdf, !is.na(expires)) %>% 
  mutate(
    expires = as.Date(
      as.POSIXct(expires, format="%b %d %H:%M:%S %Y")
    )
  ) %>% 
  arrange(expires) %>% 
  count(expires) %>% 
  filter(
    expires >= as.Date("2018-12-22"), 
    expires <= as.Date("2019-12-31")
  ) %>% 
  ggplot(aes(expires, n)) +
  geom_vline(
    xintercept = Sys.Date(), linetype="dotted", size=0.25, color = "white"
  ) +
  geom_label(
    data = data.frame(), 
    aes(x = Sys.Date(), y = Inf, label = "Today"),
    color = "black", vjust = 1
  ) +
  geom_segment(aes(xend=expires, yend=0), color = ft_cols$peach) + 
  scale_x_date(name=NULL, date_breaks="1 month", date_labels="%b") +
  scale_y_comma("# Federal Agency Certs") +
  labs(title = "2019 Federal Agency Shutdown Certpocalypse") +
  theme_ft_rc(grid="Y")
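
The date conversion used in the pipeline can be checked on its own. The sketch below assumes the expiry strings look like `Jan 31 12:00:00 2019 GMT` (the sample values are made up, not taken from real certificates) and that the session uses an English locale so `%b` matches the month abbreviations. Passing `tz = "GMT"` is an addition to the original call; without it the string is parsed in the local time zone, which can push a late-evening expiry onto the neighbouring day.

```r
# Made-up sample strings in the format the script expects
expiry <- c("Jan 31 12:00:00 2019 GMT", "Dec 22 23:59:59 2018 GMT")

# Parse as GMT; the trailing " GMT" text is ignored once the
# format string has been fully consumed
expires <- as.Date(
  as.POSIXct(expiry, format = "%b %d %H:%M:%S %Y", tz = "GMT"),
  tz = "GMT"
)

# Simple date math: days from the start of the shutdown to expiry
shutdown_start <- as.Date("2018-12-22")
days_after_start <- as.integer(expires - shutdown_start)

data.frame(expires, days_after_start)
```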

Now, I’m unwarrantedly optimistic that this debacle could be over by the end of January. How many certs (by agency) could go bad by then?

left_join(cdf, xdf, by=c("who"="domain_name")) %>% 
  mutate(
    expires = as.Date(
      as.POSIXct(expires, format="%b %d %H:%M:%S %Y")
    )
  ) %>% 
  filter(
    expires >= as.Date("2018-12-22"),
    expires <= as.Date("2019-01-31")
  ) %>% 
  count(agency, sort = TRUE)
## # A tibble: 10 x 2
##    agency                                          n
##    <chr>                                       <int>
##  1 Government Publishing Office                    8
##  2 Department of Commerce                          4
##  3 Department of Defense                           3
##  4 Department of Housing and Urban Development     3
##  5 Department of Justice                           3
##  6 Department of Energy                            1
##  7 Department of Health and Human Services         1
##  8 Department of State                             1
##  9 Department of the Interior                      1
## 10 Department of the Treasury                      1

Ugh.

FIN

Not every agency is fully shut down and not all workers in charge of cert renewals are furloughed (or being forced to work without pay). But, this is one other area that shows the possible unintended consequences of making rash, partisan decisions (something both Democrats & Republicans excel at).

You can find the contiguous R code at 2018-01-10-shutdown-certpocalypse.R and definitely try to explore the contents of those certificates.
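
One thing to look at when exploring those certificates is the first `validity` field, which holds the start of the validity period (the script above only uses the second, the expiry). A start date on or after the day the shutdown began suggests the cert was renewed during it. The sketch below uses a small made-up table in place of real cert data; the domain names, the dates, the `to_date()` helper and the `created` column name are all assumptions for illustration.

```r
# Made-up stand-in for values pulled from crt[[1]]$validity
certs <- data.frame(
  who     = c("alpha.gov", "beta.gov", "gamma.gov"),
  created = c("Dec 31 00:00:00 2018 GMT",
              "Mar 14 00:00:00 2018 GMT",
              "Jan 10 00:00:00 2019 GMT"),
  expires = c("Dec 31 00:00:00 2019 GMT",
              "Jan 20 00:00:00 2019 GMT",
              "Jan 10 00:00:00 2020 GMT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: validity string -> Date (parsed as GMT)
to_date <- function(x) {
  as.Date(as.POSIXct(x, format = "%b %d %H:%M:%S %Y", tz = "GMT"), tz = "GMT")
}

certs$created <- to_date(certs$created)
certs$expires <- to_date(certs$expires)

shutdown_start <- as.Date("2018-12-22")

# Issued on or after the shutdown began: likely renewed during it
renewed <- certs[certs$created >= shutdown_start, ]

# Issued before the shutdown and expiring after it started
at_risk <- certs[certs$created < shutdown_start &
                 certs$expires >= shutdown_start, ]

renewed$who
at_risk$who
```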


10 Comments

      1. hrbrmstr

        The LE cabal is larger than the EFF and the host of unintended consequences it is having and will have is just terrible. They’ll get their faux privacy at the expense of many other things.

        1. jonocarroll

          “faux privacy”? I thought your argument against LE was that it was making it too easy for anyone to appear genuine (by actually being private, itself being a poor measure of ‘genuine’). Are LE certs not properly robust compared to other CAs? Apologies for the naiveté.

          1. hrbrmstr

            That’s one of the arguments. The use of LE is forcing more organizations to break TLS via transparent (or full) proxies so they can introspect the content. Orgs are terrible at securing data and security teams in orgs tend to be even worse at it. Plus, companies like BlueCoat sell carrier-grade equipment to do the same and it’s not exactly hard for other actors to use god-like ca certs to do the same. Now, you’ve got https://community.letsencrypt.org/t/let-s-encrypt-no-longer-checking-google-safe-browsing/82168 as well (so they really don’t care about users at all).

          2. hrbrmstr

            It’s not LE’s certs more than it is certs in general, too. They’re forcing something to achieve some preconceived nirvana state that won’t really exist.

    1. hrbrmstr

      right, that’s why I mentioned crazy dangerous folks like us who use LE to destroy the internet wouldn’t be impacted. Federal sites are required to use authorized vendors (i.e. pay for the cert). Most legacy-ish systems the U.S. gov uses aren’t easily automatable with the cert providers they use. Some def are and there are other third party products which can monitor and handle automation for a ton of legacy systems, but that add-on costs $ and rarely gets into appropriations.


  2. Kevin Binswanger

    Some of these certificates may be renewed before they’re expired. Have you given any thought to excluding certificates that were issued since the shutdown started?

    1. hrbrmstr

      Keen observation!

      c(
        "achp.gov", "preserveamerica.gov", "ofcm.gov", "sworm.gov", "hispanicheritagemonth.gov", "jewishheritagemonth.gov",
         "nativeamericanheritagemonth.gov", "womenshistorymonth.gov", "ustr.gov", "smokefree.gov", "vote.gov", 
         "worldwar1centennial.gov", "digitalgov.gov", "connect.gov", "hurricanes.gov", "sdr.gov", "isotope.gov", "isotopes.gov", 
         "biometrics.gov", "cbp.gov", "disasterassistance.gov", "e-verify.gov", "everify.gov", "fema.gov", "firstresponder.gov", 
         "firstrespondertraining.gov", "fleta.gov", "fletc.gov", "floodsmart.gov", "globalentry.gov", "homelandsecurity.gov", 
         "ice.gov", "listo.gov", "niem.gov", "ready.gov", "readybusiness.gov", "tsa.gov", "us-cert.gov", "uscis.gov", 
         "apprenticeship.gov", "apprenticeships.gov", "benefits.gov", "govloans.gov", "hirevets.gov", "osha.gov", 
         "whistleblowers.gov", "nativeonestop.gov", "iaf.gov", "abmcscholar.gov", "nationalmall.gov", "fnal.gov", "hanford.gov", 
         "ed.gov", "arm.gov", "energysaver.gov", "energysavers.gov", "sierrawild.gov", "eia.gov", "sen.gov", 
         "pretrialservices.gov", "psa.gov", "plainlanguage.gov", "ameslab.gov", "ornl.gov", "sbst.gov", "cttso.gov", "tswg.gov"
      )
      

      all were renewed during the shutdown (67 in all so far)

      December 31 was a busy day, too:

      ## # A tibble: 15 x 2
      ##    created        n
      ##    <date>     <int>
      ##  1 2018-12-22     4
      ##  2 2018-12-24     4
      ##  3 2018-12-25     1
      ##  4 2018-12-27     3
      ##  5 2018-12-28     1
      ##  6 2018-12-30     1
      ##  7 2018-12-31    33
      ##  8 2019-01-01     1
      ##  9 2019-01-02     4
      ## 10 2019-01-03     5
      ## 11 2019-01-04     1
      ## 12 2019-01-08     1
      ## 13 2019-01-09     3
      ## 14 2019-01-10     3
      ## 15 2019-01-11     2
      

      Automation? I need to see if there’s an easy way to get whether IT staff in a given agency are furloughed or not.
