I’ve updated my [metricsgraphics](https://github.com/hrbrmstr/metricsgraphics) package to version [0.7](https://github.com/hrbrmstr/metricsgraphics/releases/tag/v0.7). The core [MetricsGraphics](http://metricsgraphicsjs.org) JavaScript library has been updated to version 2.1.0 (from 1.1.0). Two blog-worthy features since releasing version 0.5 are `mjs_grid` (a `grid.arrange`-like equivalent for `metricsgraphics` plots) and `mjs_add_rollover` (which lets you add your own custom rollover text to the plots).
### The Grid
The `grid.arrange` (and `arrangeGrob`) functions from the `gridExtra` package come in handy when combining `ggplot2` charts. I wanted a similar way to arrange independent or linked `metricsgraphics` charts, hence `mjs_grid` was born.
`mjs_grid` uses the tag functions in `htmltools` to arrange `metricsgraphics` plot objects into an HTML grid for display.
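To make the two features concrete, here's a minimal sketch. The argument names and the rollover JavaScript callback below are assumptions based on the feature descriptions above (a `grid.arrange`-like layout and user-supplied rollover text), so check the package documentation for the exact signatures:

```r
library(metricsgraphics)

# two independent plots...
p1 <- mjs_plot(mtcars, x=wt, y=mpg, width=300, height=250) %>% mjs_point()
p2 <- mjs_plot(mtcars, x=hp, y=mpg, width=300, height=250) %>% mjs_point()

# ...arranged side-by-side, grid.arrange-style
mjs_grid(p1, p2, ncol=2)

# custom rollover text via a JavaScript callback (illustrative snippet;
# the selector and d.point fields are assumptions, not confirmed API)
mjs_plot(mtcars, x=wt, y=mpg) %>%
  mjs_point() %>%
  mjs_add_rollover("function(d, i) {
    $('.mg-active-datapoint').text('wt: ' + d.point.wt + ' / mpg: ' + d.point.mpg);
  }")
```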
Variable | Columns | Type |
---|---|---|
COOP ID | 1-6 | Character |
YEAR | 7-10 | Integer |
MONTH | 11-12 | Integer |
ELEMENT | 13-16 | Character |
VALUE1 | 17-21 | Integer |
MFLAG1 | 22 | Character |
QFLAG1 | 23 | Character |
SFLAG1 | 24 | Character |
VALUE2 | 25-29 | Integer |
MFLAG2 | 30 | Character |
QFLAG2 | 31 | Character |
SFLAG2 | 32 | Character |
… | … | … |
VALUE31 | 257-261 | Integer |
MFLAG31 | 262 | Character |
QFLAG31 | 263 | Character |
SFLAG31 | 264 | Character |
We have fixed-width, wide-format records with 31 days for each month, which proves the existence of truly insidious people in the world. Rather than use `read.fwf` again, we’ll take a different approach (since we ultimately need the data in long format) and use `readLines` to read in all the records from the NH data file, then filter out everything but snowfall entries from the station we’re interested in.
Next, we set up nested `lapply` calls to build a long data frame from each monthly record, then combine them all into a single data frame:
snow <- readLines("data/state27_NH.txt")
snow <- grep("SNOW", snow, value=TRUE)
snow <- grep("^272174", snow, value=TRUE)
snow_dat <- rbindlist(pblapply(snow, function(x) {
  rbindlist(lapply(1:31, function(i) {
    # record format described here:
    # http://cdiac.ornl.gov/ftp/ushcn_daily/data_format.txt
    start <- 17 + (i-1)*8
    list(coop_id=substr(x, 1, 6),
         date=sprintf("%s-%02d-%02d", substr(x, 7, 10),
                      as.numeric(substr(x, 11, 12)), i),
         element=substr(x, 13, 16),
         value=as.numeric(substr(x, start, start+4)),
         mflag=substr(x, start+5, start+5),
         qflag=substr(x, start+6, start+6),
         sflag=substr(x, start+7, start+7))
  }))
}))
Now, we’ll clean up the records even further by removing invalid entries (those with a `value` of `-9999`), converting record dates to actual `Date` objects, and filtering out invalid dates:
snow_dat <- snow_dat %>% filter(value != -9999)

# since the data file has 31 days for each record regardless of whether
# that's valid or not, we take a shortcut to remove invalid dates by doing
# a vectorized Date conversion, then removing records with NA dates
snow_dat$date <- as.Date(snow_dat$date)
snow_dat <- snow_dat %>% filter(!is.na(date))

# having the year extracted is handy for filtering
snow_dat$year <- format(snow_dat$date, "%Y")
Given that winter in the U.S. spans two calendar years, we need a way to keep dates in January-May associated with the previous year (yes, that adds an inherent assumption that no first snow is in June, which might not hold true for Alaska). To facilitate this, we’ll convert each date to its corresponding day-of-year value, then add the total number of days in the starting year to those values for all months up to May. This step really is necessary: in many states the first snowfall often falls in January-March.
snow_dat$doy <- as.numeric(format(snow_dat$date, "%j"))
snow_dat$doy <- ifelse(snow_dat$doy<=180,
                       snow_dat$doy + as.numeric(format(as.Date(sprintf("%s-12-31", snow_dat$year)), "%j")),
                       snow_dat$doy)
Now, the fun begins. We use (mostly) `dplyr` to extract the first snowfall day from each year, then make a dot-line plot from the data:
first <- snow_dat %>%
  filter(value>0) %>%                            # ignore 0 values
  filter(date>=as.Date("1950-01-01")) %>%        # start at 1950 (arbitrary)
  merge(stations, by="coop_id", all.x=TRUE) %>%  # merge station details
  group_by(coop_id, year) %>%                    # group by station and year
  arrange(doy) %>%                               # sort by our munged day of year
  filter(row_number(doy) == 1) %>%               # grab the first entry by group
  select(name, state, date, value, doy)          # we only need some variables

title_1 <- sprintf("First observed snowfall (historical) at %s, %s",
                   stri_trans_totitle(unique(first$name)),
                   unique(first$state))

gg <- ggplot(first, aes(y=year, x=doy))
gg <- gg + geom_segment(aes(xend=min(first$doy)-20, yend=year),
                        color="#9ecae1", size=0.25)
gg <- gg + geom_point(aes(color=coop_id), shape=8, size=3, color="#3182bd")
gg <- gg + geom_text(aes(label=format(date, "%b-%d")), size=3, hjust=-0.2)
gg <- gg + scale_x_continuous(expand=c(0, 0),
                              limits=c(min(first$doy)-20, max(first$doy)+20))
gg <- gg + labs(x=NULL, y=NULL, title=title_1)
gg <- gg + theme_bw()
gg <- gg + theme(legend.position="none")
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.text.y=element_text(color="#08306b"))
by_year <- gg
While that helps us see the diversity across years, it takes quite a bit of eye tracking to pick out the most likely date range for the first snowfall. So we’ll add a boxplot into the mix and use `summary` on the actual date values to figure out the quartiles (for labeling the chart):
wx_range <- summary(as.Date(format(first$date, "2013-%m-%d")))
names(wx_range) <- NULL
min_wx <- gsub("2013-", "", wx_range[2])
max_wx <- gsub("2013-", "", wx_range[5])
title_2 <- sprintf("Most likely first snowfall will be between %s & %s",
                   min_wx, max_wx)

gg <- ggplot(first %>% mutate(name="0000"), aes(name, doy))
gg <- gg + geom_boxplot(fill="#3182bd", color="#08306b", outlier.colour="#08306b")
gg <- gg + scale_y_continuous(expand=c(0, 0),
                              limits=c(min(first$doy)-20, max(first$doy)+20))
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL, title=title_2)
gg <- gg + theme_bw()
gg <- gg + theme(legend.position="none")
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_line(color="white"))
gg <- gg + theme(axis.text.y=element_text(color="white"))
gg <- gg + theme(plot.title=element_text(size=11))
box_wx <- gg
Finally, we’ll combine them together to get the finished product:
grid.arrange(by_year, box_wx, nrow=2, heights=unit(c(0.9, 0.1), "npc"))
And, do the same for Portland:
There are many more analyses and visualizations that can be performed on these data sets, but be wary when creating them as I’ve seen a few files with fixed-width formatting errors and have also noticed missing records for some observer stations.
You can find the complete, commented code up on [github](https://github.com/hrbrmstr/snowfirst).
In a previous post we looked at how to use D3 TopoJSON files with R and make some very D3-esque maps. I mentioned that one thing missing was moving Alaska & Hawaii a bit closer to the continental United States and this post shows you how to do that.
The D3 folks have it easy. They just use the built-in modified Albers composite projection. We R folk have to roll up our sleeves a bit (but not much) to do the same. Thankfully, we can do most of the work with the `elide` (“ih lied”) function from the `maptools` package.
We’ll start with some package imports:
library(maptools)
library(mapproj)
library(rgeos)
library(rgdal)
library(RColorBrewer)
library(ggthemes)
library(ggplot2)
I’m using a GeoJSON file that I made from the 2013 US Census shapefile. I prefer GeoJSON mostly due to it being a single file and the easy conversion to TopoJSON if I ever need to use the same map in a D3 context (I work with information security data most of the time, so I rarely have to use maps at all for the day job). I simplified the polygons a bit (passing `-simplify 0.01` to `ogr2ogr`) to reduce processing time.
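That conversion step might look something like the following (the input shapefile name is illustrative; substitute whichever Census county file you downloaded):

```
# simplify geometry and emit GeoJSON in one pass
ogr2ogr -f GeoJSON -simplify 0.01 us.geojson cb_2013_us_county_20m.shp
```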
We read in that file and then transform the projection to a Lambert azimuthal equal-area one and join the polygon ids to the shapefile data frame:
# https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html
# read U.S. counties moderately-simplified GeoJSON file
us <- readOGR(dsn="data/us.geojson", layer="OGRGeoJSON")
# convert it to a Lambert azimuthal equal-area projection
us_aea <- spTransform(us, CRS("+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs"))
us_aea@data$id <- rownames(us_aea@data)
Now, to move Alaska & Hawaii, we have to:
- extract them from the main shapefile data frame
- perform rotation, scaling and transposing on them
- ensure they have the right projection set
- merge them back into the original spatial data frame
The elide
function has parameters for all the direct shape munging, so we’ll just do that for both states. I took a peek at the D3 source code for the Albers projection to get a feel for the parameters. You can tweak those if you want them in other positions or feel the urge to change the Alaska rotation angle.
# extract, then rotate, shrink & move alaska (and reset projection)
# need to use state IDs via
# https://www.census.gov/geo/reference/ansi_statetables.html
alaska <- us_aea[us_aea$STATEFP=="02",]
alaska <- elide(alaska, rotate=-50)
alaska <- elide(alaska, scale=max(apply(bbox(alaska), 1, diff)) / 2.3)
alaska <- elide(alaska, shift=c(-2100000, -2500000))
proj4string(alaska) <- proj4string(us_aea)
# extract, then rotate & shift hawaii
hawaii <- us_aea[us_aea$STATEFP=="15",]
hawaii <- elide(hawaii, rotate=-35)
hawaii <- elide(hawaii, shift=c(5400000, -1400000))
proj4string(hawaii) <- proj4string(us_aea)
# remove old states and put new ones back in; note the different order
# we're also removing puerto rico in this example but you can move it
# between texas and florida via similar methods to the ones we just used
us_aea <- us_aea[!us_aea$STATEFP %in% c("02", "15", "72"),]
us_aea <- rbind(us_aea, alaska, hawaii)
Rather than just show the resultant plain county map, we’ll add some data to it. The first example uses US drought data (from November 11th, 2014). Drought conditions are pretty severe in some states, but we’ll just show areas that have any type of drought (levels D0-D4). The color ramp shows the % of drought coverage in each county (you’ll need a browser that can display SVGs to see the maps):
# get some data to show...perhaps drought data via:
# http://droughtmonitor.unl.edu/MapsAndData/GISData.aspx
droughts <- read.csv("data/dm_export_county_20141111.csv")
droughts$id <- sprintf("%05d", as.numeric(as.character(droughts$FIPS)))
droughts$total <- with(droughts, (D0+D1+D2+D3+D4)/5)
# get ready for ggplotting it... this takes a cpl seconds
map <- fortify(us_aea, region="GEOID")
# plot it
gg <- ggplot()
gg <- gg + geom_map(data=map, map=map,
aes(x=long, y=lat, map_id=id, group=group),
fill="#ffffff", color="#0e0e0e", size=0.15)
gg <- gg + geom_map(data=droughts, map=map, aes(map_id=id, fill=total),
color="#0e0e0e", size=0.15)
gg <- gg + scale_fill_gradientn(colours=c("#ffffff", brewer.pal(n=9, name="YlOrRd")),
na.value="#ffffff", name="% of County")
gg <- gg + labs(title="U.S. Areas of Drought (all levels, % county coverage)")
gg <- gg + coord_equal()
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg <- gg + theme(plot.title=element_text(size=16))
gg
While that shows Alaska & Hawaii in D3-Albers-style, it would be more convincing if we actually used the FIPS county codes on Alaska & Hawaii, so we’ll riff off the previous post and make a county land-mass area choropleth (since we have the land mass area data available in the GeoJSON file):
gg <- ggplot()
gg <- gg + geom_map(data=map, map=map,
aes(x=long, y=lat, map_id=id, group=group),
fill="white", color="white", size=0.15)
gg <- gg + geom_map(data=us_aea@data, map=map, aes(map_id=GEOID, fill=log(ALAND)),
color="white", size=0.15)
gg <- gg + scale_fill_gradientn(colours=c(brewer.pal(n=9, name="YlGn")),
na.value="#ffffff", name="County Land\nMass Area (log)")
gg <- gg + labs(title="U.S. County Area Choropleth (log scale)")
gg <- gg + coord_equal()
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg <- gg + theme(plot.title=element_text(size=16))
gg
Now, you have one less reason to be envious of the D3 cool kids!
The code & shapefiles are available on github.