D Kelly O’Day did a [great post](https://chartsgraphs.wordpress.com/2015/01/16/nasa-gisss-annual-global-temperature-anomaly-trends/) on charting NASA’s Goddard Institute for Space Studies (GISS) temperature anomaly data, but it sticks with base R for data munging & plotting. While there’s absolutely nothing wrong with base R operations, I thought a modern take on the chart using `dplyr`, `magrittr` & `tidyr` for data manipulation and `ggplot2` for formatting would be helpful for the scores of new folk learning R this year (our little language is becoming [all the rage](http://redmonk.com/sogrady/2015/01/14/language-rankings-1-15/), it seems). I also really enjoy working with weather data.
Before further exposition, here’s the result:
I made liberal use of the “piping” idiom encouraged `magrittr`, `dplyr` and other new R packages, including the forward assignment operator `->` (which may put some folks off a bit). That also meant using `magrittr`’s aliases for `[` and `[[`, which are more readable in pipes.
I don’t use `library(tidyr)` since `tidyr`’s `extract` conflicts with `magrittr`’s, but you’ll see a `tidyr::gather` in the code for wide-to-long data shaping.
I chose to use the monthly temperature anomaly data as a base layer in the chart as a contrast to the monthly- and annual-anomaly means. I also marked the hottest annual- and annual-mean anomalies and framed the decades with vertical markers.
There are no hardcoded years or decades anywhere in the `ggplot2` code, so this should be quite reusable as the data source gets updated.
As I come back to the chart, I think there may be a bit too much “chart junk” on it, but you can tweak it to your own aesthetic preferences (if you do, drop a note in the comments with a link to your creation).
The code is below and in [this gist](https://gist.github.com/hrbrmstr/07ba10fb4c3fe9c9f3a0).
library(httr) library(magrittr) library(dplyr) library(ggplot2) # data retrieval ---------------------------------------------------------- # the user agent string was necessary for me; YMMV pg <- GET("http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt", user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A")) # extract monthly data ---------------------------------------------------- content(pg, as="text") %>% strsplit("\n") %>% extract2(1) %>% grep("^[[:digit:]]", ., value=TRUE) -> lines # extract column names ---------------------------------------------------- content(pg, as="text") %>% strsplit("\n") %>% extract2(1) %>% extract(8) %>% strsplit("\ +") %>% extract2(1) -> lines_colnames # make data frame --------------------------------------------------------- data <- read.table(text=lines, stringsAsFactors=FALSE) colnames(data) <- lines_colnames # transform data frame ---------------------------------------------------- data %>% tidyr::gather(month, value, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec) %>% # wide to long mutate(value=value/100) %>% # convert to degree Celcius change select(year=Year, month, value) %>% # only need these fields mutate(date=as.Date(sprintf("%d-%d-%d", year, month, 1)), # make proper dates decade=year %/% 10, # calc decade start=decade*10, end=decade*10+9) %>% # calc decade start/end group_by(decade) %>% mutate(decade_mean=mean(value)) %>% # calc decade mean group_by(year) %>% mutate(annum_mean=mean(value)) %>% # calc annual mean ungroup -> data # start plot -------------------------------------------------------------- gg <- ggplot() # decade vertical markers ------------------------------------------------- gg <- gg + geom_vline(data=data %>% select(end), aes(xintercept=as.numeric(as.Date(sprintf("%d-12-31", end)))), size=0.5, color="#4575b4", linetype="dotted", alpha=0.5) # monthly data ------------------------------------------------------------ gg <- gg + geom_line(data=data, aes(x=date, y=value, color="monthly anomaly"), size=0.35, alpha=0.25) gg <- gg + geom_point(data=data, aes(x=date, y=value, color"monthly anomaly"), size=0.75, alpha=0.5) # decade mean ------------------------------------------------------------- gg <- gg + geom_segment(data=data %>% distinct(decade, decade_mean, start, end), aes(x=as.Date(sprintf("%d-01-01", start)), xend=as.Date(sprintf("%d-12-31", end)), y=decade_mean, yend=decade_mean, color="decade mean anomaly"), linetype="dashed") # annual data ------------------------------------------------------------- gg <- gg + geom_line(data=data %>% distinct(year, annum_mean), aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean, color="annual mean anomaly"), size=0.5) gg <- gg + geom_point(data=data %>% distinct(year, annum_mean), aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean, color="annual mean anomaly"), size=2) # additional annotations -------------------------------------------------- # max annual mean anomaly horizontal marker/text gg <- gg + geom_hline(yintercept=max(data$annum_mean), alpha=0.9, color="#d73027", linetype="dashed", size=0.25) gg <- gg + annotate("text", x=as.Date(sprintf("%d-12-31", mean(range(data$year)))), y=max(data$annum_mean), color="#d73027", alpha=0.9, hjust=0.25, vjust=-1, size=3, label=sprintf("Max annual mean anomaly %2.1fºC", max(data$annum_mean))) gg <- gg + geom_hline(yintercept=max(data$value), alpha=0.9, color="#7f7f7f", linetype="dashed", size=0.25) # max annual anomaly horizontal marker/text gg <- gg + annotate("text", x=as.Date(sprintf("%d-12-31", mean(range(data$year)))), y=max(data$value), color="#7f7f7f", alpha=0.9, hjust=0.25, vjust=-1, size=3, label=sprintf("Max annual anomaly %2.1fºC", max(data$value))) gg <- gg + annotate("text", x=as.Date(sprintf("%d-12-31", range(data$year)[2])), y=min(data$value), size=3, hjust=1, label="Data: http://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.txt") # set colors -------------------------------------------------------------- gg <- gg + scale_color_manual(name="", values=c("#d73027", "#4575b4", "#7f7f7f")) # set x axis limits ------------------------------------------------------- gg <- gg + scale_x_date(expand=c(0, 1), limits=c(as.Date(sprintf("%d-01-01", range(data$year)[1])), as.Date(sprintf("%d-12-31", range(data$year)[2])))) # add labels -------------------------------------------------------------- gg <- gg + labs(x=NULL, y="GLOBAL Temp Anomalies in 1.0ºC", title=sprintf("GISS Land and Sea Temperature Annual Anomaly Trend (%d to %d)\n", range(data$year)[1], range(data$year)[2])) # theme/legend tweaks ----------------------------------------------------- gg <- gg + theme_bw() gg <- gg + theme(panel.grid=element_blank()) gg <- gg + theme(panel.border=element_blank()) gg <- gg + theme(legend.position=c(0.9, 0.2)) gg <- gg + theme(legend.key=element_blank()) gg <- gg + theme(legend.background=element_blank()) gg
One Trackback/Pingback
[…] D Kelly O’Day did a great post on charting NASA’s Goddard Institute for Space Studies (GISS) temperature anomaly data, but it sticks with base R for data munging & plotting. While there’s absolutely nothing wrong with base R operations, I thought a modern take on the chart using dplyr, magrittr & tidyr for data manipulation […] […]