**NOTE** If there’s a particular data set from http://www.cdc.gov/flu/weekly/fluviewinteractive.htm that you want and that isn’t in the package, please file it as an issue and be as specific as you can (with a screenshot if possible).
—–
Towards the end of 2014 I had been tinkering with flu data from the [CDC’s FluView portal](http://gis.cdc.gov/grasp/fluview/fluportaldashboard.html) since flu reports began to look like this season was going to go the way of 2009.
While you can track the flu over at [The Washington Post](http://www.washingtonpost.com/graphics/health/flu-tracker/), I like to work with data on my own. However, the CDC’s portal is Flash-driven and there was no obvious way to get the data files programmatically. This is unfortunate, since the data set is updated weekly.
As an information security professional, one of the tools in my arsenal is [Burp Proxy](http://portswigger.net/burp/proxy.html), which is an application that, amongst other things, lets you configure a local proxy server for your browser and inspect all web requests. By using this tool, I was able to discern that the Flash portal calls out to `http://gis.cdc.gov/grasp/fluview/FluViewPhase2CustomDownload.ashx` with custom `POST` form parameters (that I also mapped out) to build the data sets it delivers back to the user.
With that information in hand, I whipped together a small R package, [cdcfluview](https://github.com/hrbrmstr/cdcfluview), to interface with the same server the FluView portal does. It has a single function, `get_flu_data`, that lets you choose between different region/sub-region breakdowns and whether you want data from WHO, ILINet, or both. It also lets you pick which years you want data for.
One reason I wanted to work with the data was to see just how this season differs from previous ones. The view I’ll leave on the blog this time, mostly as an example of how to use the package, is a chart faceted by CDC region that plots the weighted ILI by CDC week, with this season (in red) set against previous ones.
```r
# devtools::install_github("hrbrmstr/cdcfluview") # if necessary
library(cdcfluview)
library(magrittr)
library(dplyr)
library(ggplot2)

# ILINet data for all ten HHS regions
dat <- get_flu_data(region="hhs", sub_region=1:10,
                    data_source="ilinet", years=2000:2014)

# relabel the ten regions with city names, then derive the flu season
# and the week within that season (CDC week 40 becomes week 0)
dat %<>%
  mutate(REGION=factor(REGION,
                       levels=unique(REGION),
                       labels=c("Boston", "New York", "Philadelphia", "Atlanta",
                                "Chicago", "Dallas", "Kansas City", "Denver",
                                "San Francisco", "Seattle"),
                       ordered=TRUE)) %>%
  mutate(season_week=ifelse(WEEK>=40, WEEK-40, WEEK),
         season=ifelse(WEEK<40,
                       sprintf("%d-%d", YEAR-1, YEAR),
                       sprintf("%d-%d", YEAR, YEAR+1)))

# split out the current season so it can be highlighted
prev_years <- dat %>% filter(season != "2014-2015")
curr_year <- dat %>% filter(season == "2014-2015")
curr_week <- tail(dat, 1)$season_week

gg <- ggplot()
# previous seasons: faint grey points
gg <- gg + geom_point(data=prev_years,
                      aes(x=season_week, y=X..WEIGHTED.ILI, group=season),
                      color="#969696", size=1, alpha=0.25)
# current season: red points joined by a line
gg <- gg + geom_point(data=curr_year,
                      aes(x=season_week, y=X..WEIGHTED.ILI, group=season),
                      color="red", size=1.25, alpha=1)
gg <- gg + geom_line(data=curr_year,
                     aes(x=season_week, y=X..WEIGHTED.ILI, group=season),
                     size=1.25, color="#d7301f")
# dashed marker at the most recent week
gg <- gg + geom_vline(xintercept=curr_week, color="#d7301f",
                      size=0.5, linetype="dashed", alpha=0.5)
gg <- gg + facet_wrap(~REGION, ncol=3)
gg <- gg + labs(x=NULL, y="Weighted ILI Index",
                title="ILINet - 1999-2015 year weighted flu index history by CDC region\nWeek Ending Jan 3, 2015 (Red == current season)\n")
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg
```
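One detail of the `season_week` calculation in the code above deserves a second look: weeks 40 and later are shifted to start at 0, while weeks 1 through 39 keep their calendar number, so January's week 1 lands on the same x position as October's week 41. Here is a minimal base R sketch of an alternative mapping that keeps each season in chronological order. It assumes 52-week years, and the helper name `season_week_ordered` is hypothetical (it is not part of cdcfluview):

```r
# Hypothetical helper (not part of cdcfluview): number the weeks of a flu
# season consecutively, starting at 0 for CDC week 40.
# Assumes 52-week years; years with a week 53 would need extra handling.
season_week_ordered <- function(week) {
  ifelse(week >= 40, week - 40, week + 12)
}

# Toy data only: a few CDC weeks spanning the turn of the year.
toy <- data.frame(YEAR = c(2013, 2013, 2014, 2014),
                  WEEK = c(40, 52, 1, 39))
toy$season_week <- season_week_ordered(toy$WEEK)

# Same season label logic as the post: weeks before 40 belong to the
# season that started in the previous calendar year.
toy$season <- ifelse(toy$WEEK < 40,
                     sprintf("%d-%d", toy$YEAR - 1, toy$YEAR),
                     sprintf("%d-%d", toy$YEAR, toy$YEAR + 1))
print(toy)
```

Swapping a mapping like this into the `mutate()` call would change the x positions in the chart, so treat it as something to experiment with rather than a drop-in replacement.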
(You can see an SVG version of that plot [here](http://rud.is/dl/flureport.svg))
Even without looking at the statistics, it’s pretty easy to tell that this is fixing to be a pretty bad season in many regions.
### State-level data
Soon after this post I found the state-level API for the CDC FluView interface and added a `get_state_data` function for it:
```r
library(statebins)

# state-level ILI activity for the week ending Jan 3, 2015
get_state_data() %>%
  filter(WEEKEND=="Jan-03-2015") %>%
  select(state=STATENAME, value=ACTIVITY.LEVEL) %>%
  filter(!(state %in% c("Puerto Rico", "New York City"))) %>% # need to add NYC & PR to statebins
  mutate(value=as.numeric(gsub("Level ", "", value))) %>%     # "Level 7" -> 7
  statebins(brewer_pal="RdPu", breaks=4,
            labels=c("Minimal", "Low", "Moderate", "High"),
            legend_position="bottom", legend_title="ILI Activity Level") +
  ggtitle("CDC State FluView (2015-01-03)")
```
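The `mutate()` step in that pipeline strips the `Level ` prefix so the activity level becomes a number that can be binned. Below is a small base R sketch of that cleaning step on made-up values. The `cut()` call is only an assumption about how four equal-width bins would fall over the observed range, not a statement of what `statebins` does internally:

```r
# Toy values shaped like the ACTIVITY.LEVEL column (made up for illustration).
activity <- c("Level 1", "Level 10", "Level 4", "Level 7")

# Same cleaning step as in the post: drop the prefix, convert to numeric.
value <- as.numeric(gsub("Level ", "", activity))

# Assumption: four equal-width bins over the observed range, via base cut().
bins <- cut(value, breaks = 4,
            labels = c("Minimal", "Low", "Moderate", "High"))
print(data.frame(activity, value, bins))
```

Because the bins are derived from the range of the values present, a week in which no state reaches the top levels would shift the cut points, which is worth keeping in mind when comparing maps across weeks.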
As always, post bugs or feature requests on the [GitHub repo](https://github.com/hrbrmstr/cdcfluview) and drop a note here if you’ve found the package useful or have some other interesting views or analyses to share.
7 Comments
Excellent post! I took the same approach for another institutional Flash website that queried GeoServer. I also made a small library that is not public yet, because one thing that bothers me is that it was somehow “reverse engineering”, and I’m not sure they would give permission for it, or whether they might change the server settings in the meantime. I believe web services are the future (if not the present) for delivering data to the public, but some institutions might not be happy with it. What are your thoughts about that?
This is extremely cool: thanks for putting this together. Some minor notes:
You might remind folks that they can install the package from GitHub: `devtools::install_github("hrbrmstr/cdcfluview")`
There’s a missing `library(ggplot2)` in your code.
Kudos for sharing such a timely and interesting post.
Nick
Thanks! Both items remedied.
Is it possible to get data only for cities using this code?
Unfortunately not. The CDC data is regional. Even state-level data (like NY: https://www.health.ny.gov/diseases/communicable/influenza/surveillance/2015-2016/flureportcurrent_week.pdf) isn’t going to get you city-level data. You’d need to find a reliable source of health data for a given city and probably end up scraping a PDF (like that state-level NY report).
Awesome! Thank you for getting back to me. If you have any ideas where to track city data (for example, Seattle), I would definitely appreciate it. I am just learning R and I had no idea R code could do what your code did. That’s really cool.
I am a high school student who is looking into doing some research on flu occurrences. I am not familiar with R packages. I was wondering whether, if I get the package installed and working, I will be able to access data on flu occurrences at the state level.