NASA GISS’s Annual Global Temperature Anomaly Trends (dplyr/ggplot version)

D Kelly O’Day did a [great post]( on charting NASA’s Goddard Institute for Space Studies (GISS) temperature anomaly data, but it sticks with base R for data munging & plotting. While there’s absolutely nothing wrong with base R operations, I thought a modern take on the chart using `dplyr`, `magrittr` & `tidyr` for data manipulation and `ggplot2` for formatting would be helpful for the scores of new folk learning R this year (our little language is becoming [all the rage](, it seems). I also really enjoy working with weather data.

Before further exposition, here’s the result:


I made liberal use of the “piping” idiom encouraged `magrittr`, `dplyr` and other new R packages, including the forward assignment operator `->` (which may put some folks off a bit). That also meant using `magrittr`’s aliases for `[` and `[[`, which are more readable in pipes.

I don’t use `library(tidyr)` since `tidyr`’s `extract` conflicts with `magrittr`’s, but you’ll see a `tidyr::gather` in the code for wide-to-long data shaping.

I chose to use the monthly temperature anomaly data as a base layer in the chart as a contrast to the monthly- and annual-anomaly means. I also marked the hottest annual- and annual-mean anomalies and framed the decades with vertical markers.

There are no hardcoded years or decades anywhere in the `ggplot2` code, so this should be quite reusable as the data source gets updated.

As I come back to the chart, I think there may be a bit too much “chart junk” on it, but you can tweak it to your own aesthetic preferences (if you do, drop a note in the comments with a link to your creation).

The code is below and in [this gist](

# data retrieval ----------------------------------------------------------
# the user agent string was necessary for me; YMMV
pg <- GET("",
          user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A"))
# extract monthly data ----------------------------------------------------
content(pg, as="text") %>%
  strsplit("\n") %>%
  extract2(1) %>%
  grep("^[[:digit:]]", ., value=TRUE) -> lines
# extract column names ----------------------------------------------------
content(pg, as="text") %>%
  strsplit("\n") %>%
  extract2(1) %>%
  extract(8) %>%
  strsplit("\ +") %>%
  extract2(1) -> lines_colnames
# make data frame ---------------------------------------------------------
data <- read.table(text=lines, stringsAsFactors=FALSE)
colnames(data) <- lines_colnames
# transform data frame ----------------------------------------------------
data %>%
  tidyr::gather(month, value, Jan, Feb, Mar, Apr, May, Jun,
                       Jul, Aug, Sep, Oct, Nov, Dec) %>%     # wide to long
  mutate(value=value/100) %>%                                # convert to degree Celcius change
  select(year=Year, month, value) %>%                        # only need these fields
  mutate(date=as.Date(sprintf("%d-%d-%d", year, month, 1)),  # make proper dates
         decade=year %/% 10,                                 # calc decade
         start=decade*10, end=decade*10+9) %>%               # calc decade start/end
  group_by(decade) %>%
    mutate(decade_mean=mean(value)) %>%                      # calc decade mean
  group_by(year) %>%
    mutate(annum_mean=mean(value)) %>%                       # calc annual mean
  ungroup -> data
# start plot --------------------------------------------------------------
gg <- ggplot()
# decade vertical markers -------------------------------------------------
gg <- gg + geom_vline(data=data %>% select(end),
                      aes(xintercept=as.numeric(as.Date(sprintf("%d-12-31", end)))),
                          size=0.5, color="#4575b4", linetype="dotted", alpha=0.5)
# monthly data ------------------------------------------------------------
gg <- gg + geom_line(data=data, aes(x=date, y=value, color="monthly anomaly"),
                     size=0.35, alpha=0.25)
gg <- gg + geom_point(data=data, aes(x=date, y=value, color"monthly anomaly"),
                      size=0.75, alpha=0.5)
# decade mean -------------------------------------------------------------
gg <- gg + geom_segment(data=data %>% distinct(decade, decade_mean, start, end),
                        aes(x=as.Date(sprintf("%d-01-01", start)),
                            xend=as.Date(sprintf("%d-12-31", end)),
                            y=decade_mean, yend=decade_mean,
                            color="decade mean anomaly"),
# annual data -------------------------------------------------------------
gg <- gg + geom_line(data=data %>% distinct(year, annum_mean),
                      aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean,
                          color="annual mean anomaly"),
gg <- gg + geom_point(data=data %>% distinct(year, annum_mean),
                      aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean,
                          color="annual mean anomaly"),
# additional annotations --------------------------------------------------
# max annual mean anomaly horizontal marker/text
gg <- gg + geom_hline(yintercept=max(data$annum_mean),  alpha=0.9,
                      color="#d73027", linetype="dashed", size=0.25)
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", mean(range(data$year)))),
                    color="#d73027", alpha=0.9,
                    hjust=0.25, vjust=-1, size=3,
                    label=sprintf("Max annual mean anomaly %2.1fºC", max(data$annum_mean)))
gg <- gg + geom_hline(yintercept=max(data$value),  alpha=0.9,
                      color="#7f7f7f", linetype="dashed", size=0.25)
# max annual anomaly horizontal marker/text
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", mean(range(data$year)))),
                    color="#7f7f7f",  alpha=0.9,
                    hjust=0.25, vjust=-1, size=3,
                    label=sprintf("Max annual anomaly %2.1fºC", max(data$value)))
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", range(data$year)[2])),
                    y=min(data$value), size=3, hjust=1,
# set colors --------------------------------------------------------------
gg <- gg + scale_color_manual(name="", values=c("#d73027", "#4575b4", "#7f7f7f"))
# set x axis limits -------------------------------------------------------
gg <- gg + scale_x_date(expand=c(0, 1),
                        limits=c(as.Date(sprintf("%d-01-01", range(data$year)[1])),
                                 as.Date(sprintf("%d-12-31", range(data$year)[2]))))
# add labels --------------------------------------------------------------
gg <- gg + labs(x=NULL, y="GLOBAL Temp Anomalies in 1.0ºC",
                title=sprintf("GISS Land and Sea Temperature Annual Anomaly Trend (%d to %d)\n",
                              range(data$year)[1], range(data$year)[2]))
# theme/legend tweaks -----------------------------------------------------
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(legend.position=c(0.9, 0.2))
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(legend.background=element_blank())
Cover image from Data-Driven Security
Amazon Author Page

1 Comment NASA GISS’s Annual Global Temperature Anomaly Trends (dplyr/ggplot version)

  1. Pingback: NASA GISS’s Annual Global Temperature Anomaly Trends (dplyr/ggplot version) |

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.