Unlike @noamross, I am not an epidemiologist (NOTE: Noam battles pandemics before breakfast, so be super nice to him) but I do like to find kindred methodologies in other disciplines to help foster the growth of cybersecurity into something beyond its current Barnum & Bailey state. I also love finding and exposing hidden APIs and especially enjoy killing Adobe Flash. How does all that relate to cdcfluview?
cdcfluview was one of my first R packages. Someone, somewhere, was trying to do something with Selenium to automate downloading of data from the CDC’s FluView Portal. It was (and some of it still is) a Flash-based site that locked up useful data behind application screens that did little more than burn one’s retinas and force folks to keep Flash alive and, hence, their browsers insecure.
Rather than let the requester suffer under the weight of a pretty significant external dependency, I used the magic of the browser “developer tools” inspector to see that it was making fairly innocuous and useful XHR requests for real data. The package sat on GitHub for a while and eventually made its way to CRAN.
Times change and Flash is dying, so the CDC paid some serious benjamins to have the site re-done in HTML, replicating the horrible UX and terrible visualizations (so. many. pie. charts.). Said revamp also caused changes to the back-end APIs and forced breaking changes. Craig McGowan jumped to the rescue and fixed some core functionality issues, but so much changed (and so much was added) that I felt it was time for a modern re-write of the cdcfluview package.
This is a pretty solid, real-world example of how dangerous it is to rely on hidden APIs. If Craig hadn’t both notified me and gone the extra mile to make a PR, I’d’ve been in the dark until I tried to commiserate (I always seem to get the flu no matter what I do) with code and found my package erroring out.
Enter: cdcfluview 0.7.0.
What’s Different?
Unfortunately, everything; which is one reason I’m writing this post.
First, to have folks that are using current-gen cdcfluview kick the tyres and let me know (via issues) if you need any old API compatibility back. This isn’t anywhere near the most popular package on CRAN, but it does have users (even, I’m told, within the CDC) and I want to make sure I do as little to disrupt them as possible. But, the current package API maps much more closely to the way the revamped portal works and presents data, so I’m hoping it’s a good net-new vs crushing blow to productivity.
Speaking of maps, the package now has actual maps! A new cdc_basemap() function returns the GeoJSON files that the CDC uses in their web views as sf objects. And, there are tons of maps and multi-labeled features to tie data to.
Here’s what’s in the tin:
age_group_distribution: Age Group Distribution of Influenza Positive Tests Reported by Public Health Laboratories
cdc_basemap: Retrieve CDC U.S. Basemaps
geographic_spread: State and Territorial Epidemiologists Reports of Geographic Spread of Influenza
hospitalizations: Laboratory-Confirmed Influenza Hospitalizations
ilinet: Retrieve ILINet Surveillance Data
ili_weekly_activity_indicators: Retrieve weekly state-level ILI indicators per-state for a given season
pi_mortality: Pneumonia and Influenza Mortality Surveillance
state_data_providers: Retrieve metadata about U.S. State CDC Provider Data
surveillance_areas: Retrieve a list of valid sub-regions for each surveillance area
who_nrevss: Retrieve WHO/NREVSS Surveillance Data
mmwr_week: Convert a Date to an MMWR day+week+year
mmwr_weekday: Convert a Date to an MMWR weekday
mmwr_week_to_date: Convert an MMWR year+week or year+week+day to a Date object
Plus there’s a new data object mmwrid_map that makes it super-easy to convert arcane MMWR identifiers to real date objects.
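For anyone wondering what an MMWR date conversion involves, here is a minimal base-R sketch. It is not the package's own code: the helper names (mmwr_year_start, date_to_mmwr, mmwr_to_date) are made up for illustration, and it assumes the usual MMWR rule that weeks run Sunday through Saturday and that week 1 is the first week with at least four of its days in the new calendar year.

```r
# Illustrative only: hypothetical helpers, not cdcfluview's own code.
# Assumes the usual MMWR rule: weeks run Sunday-Saturday, and week 1 is the
# first week with at least four of its days in the new calendar year.

# Sunday that opens MMWR week 1 of `year`
mmwr_year_start <- function(year) {
  jan1 <- as.Date(sprintf("%d-01-01", as.integer(year)))
  wd <- as.integer(format(jan1, "%w")) # 0 = Sunday ... 6 = Saturday
  if (wd <= 3L) jan1 - wd else jan1 + (7L - wd)
}

# Date -> list(year, week, day), with day 1 = Sunday
date_to_mmwr <- function(date) {
  date <- as.Date(date)
  year <- as.integer(format(date, "%Y"))
  start <- mmwr_year_start(year)
  if (date < start) {
    # Early-January dates can belong to the previous MMWR year
    year <- year - 1L
    start <- mmwr_year_start(year)
  } else if (date >= mmwr_year_start(year + 1L)) {
    # Late-December dates can belong to the next MMWR year
    year <- year + 1L
    start <- mmwr_year_start(year)
  }
  days <- as.integer(date - start)
  list(year = year, week = days %/% 7L + 1L, day = days %% 7L + 1L)
}

# MMWR year + week (+ day) -> Date
mmwr_to_date <- function(year, week, day = 1L) {
  mmwr_year_start(year) + (week - 1L) * 7L + (day - 1L)
}
```

Under those assumptions, date_to_mmwr("2017-10-01") lands on MMWR year 2017, week 40, day 1, and mmwr_to_date(2017, 40) goes back to 2017-10-01. In real work, reach for the package's mmwr_week() and mmwr_week_to_date() instead.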
The README has plenty of charts and examples, so I won’t take up post-space with said code or images.
Curiously Enough
Along the way, I was able to discern that there’s a hidden layer of this new, hidden API. Exposing said layer should be as easy as figuring out the right keyword and I’m hoping a bit of fuzzing will do the trick on that. It will be interesting to see what extra data that unlocks. (Yes, I just said relying on hidden APIs is dangerous; and, relying on hidden, hidden APIs is doubly so. I’m just a glutton for punishment.)
I was also able to discern that multiple people or teams worked on this revamp and said folks did not communicate with each other. The per-app APIs are woefully inconsistent. Furthermore, someone goofed and forgot to expose some pretty critical information from a few data retrieval operations (said data is also missing from the clickable download versions). Hopefully they’ll be addressing the issue soon (the site is technically in beta release).
FIN
If you’ve been a user of cdcfluview, please give the new API a try and file issues with anything you see. All contributors (testers, modders, enhancers) will get full DESCRIPTION credit (so, please also include how you’d like to be cited).
Finally, please do check out the CDC FluView Portal. It’s gosh awful horribad. I know there are some spiffy Shiny experts out there who could run rings around that portal and I’ll be glad to add you as a collaborator if you contribute a Shiny app (or two!) to the package. If you’d rather go your own route with a self-contained, self-published package, just let me know what API changes you’d like and I’ll gladly accommodate. The goal is to help epidemiologists and other researchers keep us all safe.
So, go get your flu shot!!! Then, kick the tyres on this package update and don’t hesitate to convey your criticisms, patches or accolades.
Now to get to my promised final review of cyphr (I’ve not forgotten @ma_salmon ;-)