Unlike @noamross, I am not an epidemiologist (NOTE: Noam battles pandemics before breakfast, so be super nice to him) but I do like to find kindred methodologies in other disciplines to help foster the growth of cybersecurity into something beyond its current Barnum & Bailey state. I also love finding and exposing hidden APIs and especially enjoy killing Adobe Flash. How does all that relate to cdcfluview?
cdcfluview was one of my first R packages. Someone, somewhere, was trying to do something with Selenium to automate downloading of data from the CDC’s FluView Portal. It was (and some of it still is) a Flash-based site that locked up useful data behind application screens that did little more than burn one’s retinas and force folks to keep Flash alive and, hence, their browsers insecure.
Rather than let the requester suffer under the weight of a pretty significant external dependency, I used the magic of the browser “developer tools” inspector to see that it was making fairly innocuous and useful XHR requests for real data. The package sat on GitHub for a while and eventually made its way to CRAN.
Times change and Flash is dying, so the CDC paid some serious benjamins to have the site re-done in HTML, replicating the horrible UX and terrible visualizations (so. many. pie. charts.). Said revamp also changed the back-end APIs, which broke the package. Craig McGowan jumped to the rescue and fixed some core functionality issues, but so much changed, and so much was added, that I felt it was time for a modern re-write of the package.
This is a pretty solid, real-world example of how dangerous it is to rely on hidden APIs. If Craig hadn’t both notified me and gone the extra mile to make a PR, I’d’ve been in the dark until I tried to commiserate (I always seem to get the flu no matter what I do) with code and found my package erroring out.
So, what changed? Unfortunately, everything, which is one reason I’m writing this post.
First, I’d like folks who are using current-gen cdcfluview to kick the tyres and let me know (via issues) if they need any old API compatibility back. This isn’t anywhere near the most popular package on CRAN, but it does have users (even, I’m told, within the CDC) and I want to do as little to disrupt them as possible. That said, the current package API maps much more closely to the way the revamped portal works and presents data, so I’m hoping it’s a good net-new rather than a crushing blow to productivity.
Speaking of maps, the package now has actual maps! A new cdc_basemap() function returns the GeoJSON files that the CDC uses in their web views as sf objects. And, there are tons of maps and multi-labeled features to tie data to.
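For the impatient, here is a minimal sketch of pulling one basemap and drawing it. It assumes cdcfluview and sf are installed and that the CDC site is reachable. The wrapper name draw_basemap is hypothetical, and "national" is an assumed basemap name, so check ?cdc_basemap for the values the function actually accepts.

```r
# Hypothetical convenience wrapper; not part of cdcfluview itself.
draw_basemap <- function(name = "national") {
  # Fail early with a readable message if a dependency is missing
  for (pkg in c("cdcfluview", "sf")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required for this sketch", call. = FALSE)
    }
  }
  # `name` is an assumed argument value; see ?cdc_basemap for valid ones
  basemap <- cdcfluview::cdc_basemap(name)
  # Draw only the outlines, ignoring the attribute columns
  plot(sf::st_geometry(basemap))
  invisible(basemap)
}

# Usage (needs network access):
# us <- draw_basemap("national")
# names(us)  # inspect the label columns you can join data to
```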
Here’s what’s in the tin:
age_group_distribution: Age Group Distribution of Influenza Positive Tests Reported by Public Health Laboratories
cdc_basemap: Retrieve CDC U.S. Basemaps
geographic_spread: State and Territorial Epidemiologists Reports of Geographic Spread of Influenza
hospitalizations: Laboratory-Confirmed Influenza Hospitalizations
ilinet: Retrieve ILINet Surveillance Data
ili_weekly_activity_indicators: Retrieve weekly state-level ILI indicators per-state for a given season
pi_mortality: Pneumonia and Influenza Mortality Surveillance
state_data_providers: Retrieve metadata about U.S. State CDC Provider Data
surveillance_areas: Retrieve a list of valid sub-regions for each surveillance area.
who_nrevss: Retrieve WHO/NREVSS Surveillance Data
mmwr_week: Convert a Date to an MMWR day+week+year
mmwr_weekday: Convert a Date to an MMWR weekday
mmwr_week_to_date: Convert an MMWR year+week or year+week+day to a Date object
Plus there’s a new data object, mmwrid_map, that makes it super-easy to convert arcane MMWR identifiers to real date objects.
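If MMWR weeks are new to you: they run Sunday through Saturday, and week 1 of a year is the first week with at least four days in that calendar year (that is, the week containing January 4). The base-R sketch below illustrates that rule only. It is not the package's implementation, and the helper names (mmwr_year_start, date_to_mmwr, mmwr_to_date) are made up for this example.

```r
# Sunday that begins MMWR week 1: the Sunday on or before January 4,
# which guarantees week 1 has at least four days in the new year.
mmwr_year_start <- function(year) {
  jan4 <- as.Date(sprintf("%d-01-04", year))
  jan4 - as.integer(format(jan4, "%w"))  # %w: 0 = Sunday ... 6 = Saturday
}

# Convert Dates to MMWR year, week and day (1 = Sunday ... 7 = Saturday).
date_to_mmwr <- function(x) {
  x <- as.Date(x)
  year <- as.integer(format(x, "%Y"))
  # Dates near New Year can belong to the next or the previous MMWR year
  year <- ifelse(x >= mmwr_year_start(year + 1L), year + 1L,
          ifelse(x <  mmwr_year_start(year),      year - 1L, year))
  start <- mmwr_year_start(year)
  data.frame(
    mmwr_year = as.integer(year),
    mmwr_week = as.integer(as.numeric(x - start) %/% 7) + 1L,
    mmwr_day  = as.integer(format(x, "%w")) + 1L
  )
}

# Inverse: MMWR year + week (+ optional day) back to a Date.
mmwr_to_date <- function(year, week, day = 1L) {
  mmwr_year_start(year) + (week - 1L) * 7L + (day - 1L)
}

date_to_mmwr(as.Date("2017-10-01"))  # MMWR year 2017, week 40, day 1
mmwr_to_date(2017, 40)               # "2017-10-01"
```

Note the edge cases around New Year: under this rule 2016-01-02 lands in MMWR week 52 of 2015, and 2014 has a week 53.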
The README has plenty of charts and examples, so I won’t take up post-space with said code or images.
Along the way, I was able to discern that there’s a hidden layer of this new, hidden API. Exposing said layer should be as easy as figuring out the right keyword and I’m hoping a bit of fuzzing will do the trick on that. It will be interesting to see what extra data that unlocks. (Yes, I just said relying on hidden APIs is dangerous; and, relying on hidden, hidden APIs is doubly so. I’m just a glutton for punishment.)
I was also able to discern that multiple people or teams worked on this revamp and said folks did not communicate with each other. The per-app APIs are woefully inconsistent. Furthermore, someone goofed and forgot to expose some pretty critical information from a few data retrieval operations (said data is missing from the clickable download versions, too). Hopefully they’ll be addressing the issue soon (the site is technically in beta release).
If you’ve been a user of cdcfluview please give the new API a try and file issues with anything you see. All contributors (testers, modders, enhancers) will get full DESCRIPTION credit (so, please also include how you’d like to be cited).
Finally, please do check out the CDC FluView Portal. It’s gosh awful horribad. I know there are some spiffy Shiny experts out there who could run rings around that portal and I’ll be glad to add you as a collaborator if you contribute a Shiny app (or two!) to the package. If you’d rather go your own route with a self-contained, self-published package, just let me know what API changes you’d like and I’ll gladly accommodate. The goal is to help epidemiologists and other researchers keep us all safe.
So, go get your flu shot!!! Then, kick the tyres on this package update and don’t hesitate to convey your criticisms, patches or accolades.