The United States Centers for Disease Control and Prevention (CDC from now on) has set up two new public surveillance resources for COVID-19. Together, COVIDView and COVID-NET provide weekly surveillance data similar to what FluView provides for influenza-like illnesses (ILI).
The COVIDView resources are HTML tables (O_O) and, while the COVID-NET interface provides a “download” button, there is no exposed API to make it easier for the epidemiological community to work with these datasets.
Enter {cdccovidview} (https://cinc.rud.is/web/packages/cdccovidview/), which scrapes the tables and uses the hidden API in the same way {cdcfluview} (https://cran.rstudio.com/web/packages/cdcfluview/index.html) does for the FluView data.
Weekly case, hospitalization, and mortality data are available at the national, state, and regional levels (where provided), and I tried to normalize the fields across the tables and datasets. (I hate to pick on them when they’re down, but these two sites are seriously sub-optimal from a UX and general usage perspective.)
After you follow the above URL for information on how to install the package, it should “just work”. No API keys are needed, but the CDC may change the table layout or the field structure of the hidden API at any time, so keep an eye out for updates.
Using it is pretty simple: call one of the functions to grab the data you want, then work with it.
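As a rough sketch of what that install step might look like: the snippet below assumes https://cinc.rud.is behaves as a CRAN-like repository that serves {cdccovidview} (the package URL above suggests so, but the package page is the authority). The `repos` vector and the `interactive()` guard are illustrative choices, not something the package documentation prescribes.

```r
# Assumption: https://cinc.rud.is is a CRAN-like repository hosting {cdccovidview}.
# CRAN is listed second so that dependencies can still be resolved from it.
repos <- c(cinc = "https://cinc.rud.is", CRAN = "https://cloud.r-project.org")

# Only attempt the install in an interactive session, and only if the
# package is not already available.
if (interactive() && !requireNamespace("cdccovidview", quietly = TRUE)) {
  install.packages("cdccovidview", repos = repos)
}
```

A plain `install.packages("cdccovidview")` only searches the default repositories, which would explain a "package is not available" message if the package is not published there.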
library(cdccovidview)
library(hrbrthemes)
library(tidyverse)
hosp <- laboratory_confirmed_hospitalizations()
hosp
## # A tibble: 4,590 x 8
##    catchment      network   year  mmwr_year mmwr_week age_category cumulative_rate weekly_rate
##    <chr>          <chr>     <chr> <chr>     <chr>     <chr>                  <dbl>       <dbl>
##  1 Entire Network COVID-NET 2020  2020      10        0-4 yr                   0           0
##  2 Entire Network COVID-NET 2020  2020      11        0-4 yr                   0           0
##  3 Entire Network COVID-NET 2020  2020      12        0-4 yr                   0           0
##  4 Entire Network COVID-NET 2020  2020      13        0-4 yr                   0.3         0.3
##  5 Entire Network COVID-NET 2020  2020      14        0-4 yr                   0.6         0.3
##  6 Entire Network COVID-NET 2020  2020      15        0-4 yr                  NA          NA
##  7 Entire Network COVID-NET 2020  2020      16        0-4 yr                  NA          NA
##  8 Entire Network COVID-NET 2020  2020      17        0-4 yr                  NA          NA
##  9 Entire Network COVID-NET 2020  2020      18        0-4 yr                  NA          NA
## 10 Entire Network COVID-NET 2020  2020      19        0-4 yr                  NA          NA
## # … with 4,580 more rows
# Age groups in the order the facets should appear
c(
  "0-4 yr", "5-17 yr", "18-49 yr", "50-64 yr", "65+ yr", "65-74 yr", "75-84 yr", "85+"
) -> age_f

# Turn MMWR year/week into a date, keep the network-wide weekly rates,
# drop the "Overall" roll-up, and plot one panel per age group
mutate(hosp, start = mmwr_week_to_date(mmwr_year, mmwr_week)) %>%
  filter(!is.na(weekly_rate)) %>%
  filter(catchment == "Entire Network") %>%
  select(start, network, age_category, weekly_rate) %>%
  filter(age_category != "Overall") %>%
  mutate(age_category = factor(age_category, levels = age_f)) %>%
  ggplot() +
  geom_line(
    aes(start, weekly_rate)
  ) +
  scale_x_date(
    date_breaks = "2 weeks", date_labels = "%b\n%d"
  ) +
  facet_grid(network~age_category) +
  labs(
    x = NULL, y = "Rates per 100,000 pop",
    title = "COVID-NET Weekly Rates by Network and Age Group",
    caption = sprintf("Source: COVID-NET: COVID-19-Associated Hospitalization Surveillance Network, Centers for Disease Control and Prevention.\n<https://gis.cdc.gov/grasp/COVIDNet/COVID19_3.html>; Accessed on %s", Sys.Date())
  ) +
  theme_ipsum_es(grid="XY")
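The pipeline above leans on `mmwr_week_to_date()` to turn the character `mmwr_year` and `mmwr_week` columns into something plottable. For readers unfamiliar with MMWR weeks, here is a minimal base-R sketch of the idea. It is not the package's implementation: `mmwr_week_start()` is a hypothetical helper, and it assumes the date of interest is the Sunday that opens the week. It relies on the MMWR convention that weeks run Sunday through Saturday and that week 1 is the first week with at least four days in the calendar year, which is the week containing January 4.

```r
# Hypothetical helper (not part of {cdccovidview}): first day (Sunday) of an MMWR week.
mmwr_week_start <- function(year, week) {
  year <- as.integer(year)
  week <- as.integer(week)

  # MMWR week 1 always contains January 4
  jan4 <- as.Date(sprintf("%d-01-04", year))

  # Day of week for January 4: 0 = Sunday ... 6 = Saturday
  dow <- as.POSIXlt(jan4)$wday

  # Step back to the Sunday that opens week 1, then forward by whole weeks
  week1_start <- jan4 - dow
  week1_start + 7L * (week - 1L)
}

# Character input works, matching the column types in the tibble above
mmwr_week_start("2020", "10")
# → "2020-03-01"
```

The helper is vectorized, so it could be dropped into a `mutate()` call in the same way, though the package function is the better choice when it is available.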
FIN
This is brand new and — as noted — things may change or break due to CDC site changes. I may have also missed a table or two (it’s a truly terrible site).
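Since upstream changes are a real possibility, a small guard at the top of an analysis script can make breakage loud instead of silent. The sketch below is one way to do that in base R: `check_columns()` is a hypothetical helper, not a package function, and `expected_cols` simply mirrors the column names printed in the tibble earlier in this post. The `mock` data frame stands in for a real result so the example runs without touching the CDC site.

```r
# Column names taken from the tibble output shown earlier in the post
expected_cols <- c(
  "catchment", "network", "year", "mmwr_year", "mmwr_week",
  "age_category", "cumulative_rate", "weekly_rate"
)

# Hypothetical helper: stop if expected columns are gone, warn about new ones
check_columns <- function(df, expected) {
  absent <- setdiff(expected, names(df))
  extra <- setdiff(names(df), expected)
  if (length(absent) > 0) {
    stop("Missing expected columns: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  if (length(extra) > 0) {
    warning("Unexpected new columns: ", paste(extra, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Mock stand-in for a result such as laboratory_confirmed_hospitalizations()
mock <- data.frame(
  catchment = "Entire Network", network = "COVID-NET",
  year = "2020", mmwr_year = "2020", mmwr_week = "10",
  age_category = "0-4 yr", cumulative_rate = 0, weekly_rate = 0,
  stringsAsFactors = FALSE
)

check_columns(mock, expected_cols)
```

Because the helper returns its input invisibly, it can also sit in the middle of a pipeline.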
If you notice things are missing or would like a different interface to various data endpoints, drop an issue or PR wherever you’re most comfortable.
Stay safe!
7 Comments
Hi Hrbrmstr: It seems as though your package would go well with the new RStudio addition at the link below. https://gt.rstudio.com/
Thx. {gt} has def been around for quite a while. I’m actually surprised it made it to CRAN as it was nigh abandoned for just under 18 months.
Hi, your package doesn’t work with the latest version of R:
package ‘cdccovidview’ is not available (for R version 3.6.3)
The blog post clearly states where to go to find instructions for how to install the package. Just re-read it vs skim.
Thank you very much for your package, it’s really cool!
It seems that it retrieves a previous version of the CDCCOVIDVIEW (cutoff April 10th).
Is there any way we can use the package to retrieve the latest data available (April 24th)?
Thank you so much.
It retrieves what is available. There is no inherent restriction in the package.
Except that they changed the URL pattern. @$R@#$@
Updating the pkg now and #ty.