Skip navigation

The United States Centers for Disease Control (CDC from now on) has setup two new public surveillance resources for COVID-19. Together, COVIDView and COVID-NET provide similar weekly surveillance data as FluView does for influenza-like illnesses (ILI).

The COVIDView resources are HTML tables (O_O) and, while the COVID-NET interface provides a “download” button, there is no exposed API to make it easier for the epidemiological community to work with these datasets.

Enter {cdccovidview} — https://cinc.rud.is/web/packages/cdccovidview/ — which scrapes the tables and uses the hidden API in the same way {cdcfluview}(https://cran.rstudio.com/web/packages/cdcfluview/index.html) does for the FluView data.

Weekly case, hospitalization, and mortality data is available at the national, state and regional levels (where provided) and I tried to normalize the fields across each of the tables/datasets (I hate to pick on them when they’re down, but these two sites are seriously sub-optimal from a UX and just general usage perspective).

After you follow the above URL for information on how to install the package, it should “just work”. No API keys are needed, but the CDC may change the layout of tables and fields structure of the hidden API at any time, so keep an eye out for updates.

Using it is pretty simple, just use one of the functions to grab the data you want and then work with it.

library(cdccovidview)
library(hrbrthemes)
library(tidyverse)

hosp <- laboratory_confirmed_hospitalizations()

hosp
## # A tibble: 4,590 x 8
##    catchment      network   year  mmwr_year mmwr_week age_category cumulative_rate weekly_rate
##    <chr>          <chr>     <chr> <chr>     <chr>     <chr>                  <dbl>       <dbl>
##  1 Entire Network COVID-NET 2020  2020      10        0-4 yr                   0           0  
##  2 Entire Network COVID-NET 2020  2020      11        0-4 yr                   0           0  
##  3 Entire Network COVID-NET 2020  2020      12        0-4 yr                   0           0  
##  4 Entire Network COVID-NET 2020  2020      13        0-4 yr                   0.3         0.3
##  5 Entire Network COVID-NET 2020  2020      14        0-4 yr                   0.6         0.3
##  6 Entire Network COVID-NET 2020  2020      15        0-4 yr                  NA          NA  
##  7 Entire Network COVID-NET 2020  2020      16        0-4 yr                  NA          NA  
##  8 Entire Network COVID-NET 2020  2020      17        0-4 yr                  NA          NA  
##  9 Entire Network COVID-NET 2020  2020      18        0-4 yr                  NA          NA  
## 10 Entire Network COVID-NET 2020  2020      19        0-4 yr                  NA          NA  
## # … with 4,580 more rows

c(
  "0-4 yr", "5-17 yr", "18-49 yr", "50-64 yr", "65+ yr", "65-74 yr", "75-84 yr", "85+"
) -> age_f

mutate(hosp, start = mmwr_week_to_date(mmwr_year, mmwr_week)) %>%
  filter(!is.na(weekly_rate)) %>%
  filter(catchment == "Entire Network") %>%
  select(start, network, age_category, weekly_rate) %>%
  filter(age_category != "Overall") %>%
  mutate(age_category = factor(age_category, levels = age_f)) %>%
  ggplot() +
  geom_line(
    aes(start, weekly_rate)
  ) +
  scale_x_date(
    date_breaks = "2 weeks", date_labels = "%b\n%d"
  ) +
  facet_grid(network~age_category) +
  labs(
    x = NULL, y = "Rates per 100,000 pop",
    title = "COVID-NET Weekly Rates by Network and Age Group",
    caption = sprintf("Source: COVID-NET: COVID-19-Associated Hospitalization Surveillance Network, Centers for Disease Control and Prevention.\n<https://gis.cdc.gov/grasp/COVIDNet/COVID19_3.html>; Accessed on %s", Sys.Date())
  ) +
  theme_ipsum_es(grid="XY")

FIN

This is brand new and — as noted — things may change or break due to CDC site changes. I may have also missed a table or two (it’s a truly terrible site).

If you notice things are missing or would like a different interface to various data endpoints, drop an issue or PR wherever you’re most comfortable.

Stay safe!

7 Comments

  1. Hi Hrbrmstr: It seems as though your package would go well with the new RStudio addition at the link below. https://gt.rstudio.com/

    • Thx. {gt} has def been around for quite a while. I’m actually surprised it made it to CRAN as it was nigh abandoned for just under 18 months.

  2. Hi, your package don’t worl with latest version R :
    package ‘cdccovidview’ is not available (for R version 3.6.3)

    • The blog post clearly states where to go to find instructions for how to install the package. Just re-read it vs skim.

  3. Thank you very much for your package, it’s really cool!
    It seems that it retrieves a previous version of the CDCCOVIDVIEW (cutoff april10th).
    Is there any way we can use the package to retrieve the latest data available (April 24th)?
    Thank you so much.

    • it retrieves what is available. there is no inherent restriction in the package.

      • Except that they changed the URL pattern. @$R@#$@

        Updating the pkg now and #ty.


2 Trackbacks/Pingbacks

  1. […] by data_admin [This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page […]

  2. […] article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page […]

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.