The family got hit pretty hard with the flu right as the Christmas festivities started, and we were all pretty much bed-ridden zombies up until today (2017-01-02). When you’re in the throes of a very bad ILI (influenza-like illness), it’s easy to imagine that you’re a victim of a severe outbreak, especially with ancillary data from others that they, too, either just had/have the flu or know others who do. Thankfully, I didn’t have to accept this emotional opinion and could turn to the cdcfluview package to see just how this year is measuring up.
Influenza cases are cyclical, and that’s super-easy to see with a longer view of the CDC national data:
library(cdcfluview)
library(tidyverse)
library(stringi)
flu <- get_flu_data("national", years=2010:2016)
# Build a date for each reporting week (Monday of week %U), then reshape to one row per week and age group
mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>%
  # drop the combined 25-64 column; the finer 25-49 and 50-64 groups are plotted instead
  select(-`AGE 25-64`) %>%
  gather(age_group, count, starts_with("AGE")) %>%
  mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>%
  mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>%
  ggplot(aes(week, count, group=age_group)) +
  geom_line(aes(color=age_group)) +
  scale_y_continuous(label=scales::comma, limits=c(0,20000)) +
  facet_wrap(~age_group, scales="free") +
  labs(x=NULL, y="Count of reported ILI cases",
       title="U.S. National ILI Case Counts by Age Group (2010:2011 flu season through 2016:2017)",
       caption="Source: CDC ILInet via CRAN cdcfluview package") +
  ggthemes::scale_color_tableau(name=NULL) +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="none")
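One caveat on the week column above: the CDC numbers its reporting weeks with the MMWR convention (weeks run Sunday through Saturday, and week 1 is the first week with at least four days in the new year), which is close to, but not the same as, what %U computes. A minimal base-R sketch of a stricter conversion follows; mmwr_week_start() is a hypothetical helper name introduced here, not a cdcfluview function.

```r
# Hypothetical helper (not part of cdcfluview): first day (Sunday) of an MMWR week.
# MMWR week 1 always contains January 4, so we anchor on that date.
mmwr_week_start <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  # "%w" is the weekday as a number, 0 = Sunday
  days_past_sunday <- as.integer(format(jan4, "%w"))
  week1_start <- jan4 - days_past_sunday
  week1_start + 7 * (as.integer(week) - 1L)
}

# MMWR week 49 of 2016 is the week ending December 10, 2016
mmwr_week_start(2016, 49)
```

The function is vectorized, so it could stand in for the as.Date(sprintf(...)) call inside mutate() if the year-boundary weeks matter for your chart.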
We can use the same data to zoom in on this season:
# Same reshaping as before, restricted to weeks from July 2016 onward
mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>%
  select(-`AGE 25-64`) %>%
  gather(age_group, count, starts_with("AGE")) %>%
  mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>%
  mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>%
  filter(week >= as.Date("2016-07-01")) %>%
  ggplot(aes(week, count, group=age_group)) +
  geom_line(aes(color=age_group)) +
  scale_y_continuous(label=scales::comma, limits=c(0,20000)) +
  facet_wrap(~age_group, scales="free") +
  labs(x=NULL, y="Count of reported ILI cases",
       title="U.S. National ILI Case Counts by Age Group (2016:2017 flu season)",
       caption="Source: CDC ILInet via CRAN cdcfluview package") +
  ggthemes::scale_color_tableau(name=NULL) +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="none")
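Both charts repeat the same reshaping steps, so one option is to wrap them in a function and reuse the result. This is only a sketch, assuming the column names used above; tidy_ili() is a hypothetical name introduced for illustration, and the toy data frame holds made-up counts so the block runs without hitting the CDC.

```r
library(dplyr)
library(tidyr)
library(stringi)

# Hypothetical helper: the reshaping steps shared by the two pipelines above.
tidy_ili <- function(flu) {
  age_levels <- c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65")
  flu %>%
    mutate(week = as.Date(sprintf("%s %s 1", YEAR, WEEK), format = "%Y %U %u")) %>%
    select(-`AGE 25-64`) %>%
    gather(age_group, count, starts_with("AGE")) %>%
    mutate(age_group = stri_replace_all_fixed(age_group, "AGE", "Ages"),
           age_group = factor(age_group, levels = age_levels))
}

# Toy input with made-up counts, shaped like the columns the pipeline uses.
toy_flu <- tibble(
  YEAR = c(2016, 2016), WEEK = c(48, 49),
  `AGE 0-4` = c(10, 12), `AGE 5-24` = c(20, 25), `AGE 25-49` = c(15, 18),
  `AGE 25-64` = c(NA, NA), `AGE 50-64` = c(8, 9), `AGE 65` = c(5, 7)
)

# One row per week and age group
ili <- tidy_ili(toy_flu)
```

With the real flu data frame, the season view would then reduce to filter(tidy_ili(flu), week >= as.Date("2016-07-01")) ahead of the ggplot() call.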
So, things are trending up, but how severe is this year compared to others? While looking at the number/percentage of ILI cases is one way to understand severity, another is to look at the mortality rate. The cdcfluview package has a get_mortality_surveillance_data() function, but it’s region-based and I’m really only looking at national data in this post. A helpful individual pointed out a new CSV file at https://www.cdc.gov/flu/weekly/index.htm#MS which we can target programmatically and reproducibly (so we don’t have to track filename changes by hand) with:
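library(rvest)
# Fetch the weekly report page and find the link to the NCHS mortality CSV
pg <- read_html("https://www.cdc.gov/flu/weekly/index.htm#MS")
html_nodes(pg, xpath=".//a[contains(@href, 'csv') and contains(@href, 'NCHS')]") %>%
  html_attr("href") -> mort_ref
# This assumes the href is site-relative, so the host is prepended before reading the file
mort_url <- sprintf("https://www.cdc.gov%s", mort_ref)
df <- readr::read_csv(mort_url)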
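The scraping step above assumes exactly one matching link with a site-relative href. A slightly more defensive sketch is below, using a small inline HTML snippet in place of the live page; the link paths in the snippet are invented for illustration and are not real CDC file names.

```r
library(rvest)

# Stand-in for the CDC page; both hrefs below are made-up example paths.
pg <- read_html('<html><body>
  <a href="/example/data/NCHSData_example.csv">View Chart Data</a>
  <a href="/example/other.pdf">Something else</a>
</body></html>')

hrefs <- html_nodes(pg, xpath = ".//a[contains(@href, 'csv') and contains(@href, 'NCHS')]") %>%
  html_attr("href")

# Fail loudly if the page layout changed and the link count is not what we expect.
stopifnot(length(hrefs) == 1)

# url_absolute() resolves relative hrefs against the page URL and leaves absolute ones alone.
mort_url <- xml2::url_absolute(hrefs, "https://www.cdc.gov/flu/weekly/index.htm")
```

Resolving against the page URL rather than pasting the host on by hand means the code keeps working if the site ever switches to absolute links.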
We can, then, take a look at the current “outbreak” status (when real-world mortality events exceed the model threshold):
# One line each for the expected baseline, the threshold, and the observed P&I percentage
mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>%
  select(week, Expected, Threshold, `Percent of Deaths Due to Pneumonia and Influenza`) %>%
  gather(category, percent, -week) %>%
  # rescale to proportions so scales::percent labels the axis correctly
  mutate(percent=percent/100) %>%
  ggplot() +
  geom_line(aes(week, percent, group=category, color=category)) +
  scale_x_date(date_labels="%Y-%U") +
  scale_y_continuous(label=scales::percent) +
  ggthemes::scale_color_tableau(name=NULL) +
  labs(x=NULL, y=NULL, title="U.S. Pneumonia & Influenza Mortality",
       subtitle="Data through week ending December 10, 2016 as of December 28, 2016",
       caption="Source: National Center for Health Statistics Mortality Surveillance System") +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="bottom")
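The “outbreak” criterion described above (observed percentage above the model threshold) can also be checked without a chart. Here is a minimal base-R sketch on a toy data frame; the values are made up for illustration, and only the column names are taken from the code above.

```r
# Made-up values for illustration only; column names follow the CSV code above.
toy <- data.frame(
  Year = c(2016, 2016, 2016),
  Week = c(47, 48, 49),
  Expected = c(6.5, 6.6, 6.7),
  Threshold = c(6.9, 7.0, 7.1),
  `Percent of Deaths Due to Pneumonia and Influenza` = c(6.2, 7.3, 6.0),
  check.names = FALSE
)

# Weeks where the observed percentage exceeds the threshold
observed <- toy[["Percent of Deaths Due to Pneumonia and Influenza"]]
above_threshold <- toy[observed > toy$Threshold, c("Year", "Week")]
```

Swapping toy for the df read from the CDC CSV should list the weeks that cross the threshold, assuming the file keeps those column names.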
That view is for all mortality events from both influenza and pneumonia. We can look at the counts for just influenza as well:
# Weekly count of deaths attributed to influenza alone
mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>%
  select(week, `Influenza Deaths`) %>%
  ggplot() +
  geom_line(aes(week, `Influenza Deaths`), color=ggthemes::tableau_color_pal()(1)) +
  scale_x_date(date_labels="%Y-%U") +
  scale_y_continuous(label=scales::comma) +
  ggthemes::scale_color_tableau(name=NULL) +
  labs(x=NULL, y=NULL, title="U.S. Influenza Mortality (count of mortality events)",
       subtitle="Data through week ending December 10, 2016 as of December 28, 2016",
       caption="Source: National Center for Health Statistics Mortality Surveillance System") +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="bottom")
It’s encouraging that the overall combined mortality rate is trending downwards and that the mortality rate for influenza is very low. Go. Science.
I’ll be adding a function to cdcfluview to retrieve this new data set a bit later in the year.
Hopefully you’ll avoid the flu and enjoy a healthy and prosperous 2017.




North Carolina’s Neighborhood
When I saw the bombastic headline “North Carolina is no longer classified as a democracy” pop up in my RSS feeds today (article link: http://www.newsobserver.com/opinion/op-ed/article122593759.html) I knew it’d help feed the polarization bear that’s been getting fat on ‘Murica for the past decade. Sure enough, others picked it up and ran with it. I can’t wait to see how the opposite extreme reacts (everybody’s gotta feed the bear).
As of this post, neither site linked to the actual data, so here’s an early Christmas present: The Electoral Integrity Project Data. I’m very happy this is public data since this is the new reality for “news” intake:
Data literacy is even more important than it has been.
Back to the title of the post: where exactly does North Carolina fall on the newly assessed electoral integrity spectrum in the U.S.? Right here:
Focusing solely on North Carolina is pretty convenient (I know there’s quite a bit of political turmoil going on down there at the moment, but that’s no excuse for cherry picking) since — frankly — there isn’t much to be proud of on that entire chart. Here’s where the ‘States fit on the global rankings (we’re in the gray box):
You can page through the table to see where our ‘States fall (we’re between Guana & Latvia…srsly). We don’t always have the nicest neighbors:
This post isn’t a commentary on North Carolina; it’s a cautionary note to be very wary of scary headlines that talk about data but don’t really show it. It’s worth pointing out that I’m taking the PEI data as it stands. I haven’t validated the efficacy of their process or checked on how “activist-y” the researchers are outside the report. It’s somewhat sad that this is a necessary next step, since there’s going to be quite a bit of lying with data and even more lying about and/or without data over the next 4+ years on both sides (more than in the past eight combined, probably).
The PEI folks provide methodology information and data. Read/study it. They provide raw and imputed confidence intervals (note how large some of those are in the two graphs) – do the same for your research. If their practices are sound, the ‘States chart is pretty damning. I would hope that all the U.S. states would be well above 75 on the rating scale and the fact that we aren’t is a suggestion that we all have work to do right “here” at home, beginning with ceasing to feed the polarization bear.
If you do download the data, here’s the R code that generated the charts: