I woke up this morning to a [headline story from the Washington Post](https://www.washingtonpost.com/news/the-fix/wp/2015/12/10/to-many-christian-terrorists-arent-true-christians-but-muslim-terrorists-are-true-muslims/) on _“Americans are twice as willing to distance Christian extremists from their religion as Muslims”_. This post is not about the content of the headline or the story. It _is_ about the horrible pie chart WaPo led the article with:

[Image: the WaPo pie charts comparing fear of terror attacks in November 2014 and December 2015]

This isn’t just a rant of a madman against pie charts. While I _am_ vehemently opposed to them, we did cover them [in our book](https://books.google.com/books?id=7DqwAgAAQBAJ&pg=PA146&lpg=PA146&dq=data-driven+security+pie+chart&source=bl&ots=Cy1iJylsHd&sig=a6Hz1JB-QYLq6H0VZJpPleJgRkQ&hl=en&sa=X&ved=0ahUKEwj79uqt_tjJAhVG0iYKHS0uDn4Q6AEIMzAH#v=onepage&q=data-driven%20security%20pie%20chart&f=false) and my co-author (@jayjacobs) and the incredibly talented @annkemery both agree there are often cases where they are appropriate. Even using their less-sensitive sensibilities, this would not be one of those cases.

So what, exactly, is the problem? WaPo tried to enable comparison between the pies by exploding them and using colors to indicate similar fear levels, mapping shades to entries in the top legend. Your eye has to move around a bit to take everything in, and you have to remember the color mapping as you focus on each slice (which you will end up doing, since each category is colored differently). Their whole goal was to let the reader see the change in sentiment towards terrorism since this time last year.

Hrm. Two dates. Small set of values. Desire to quickly compare change in value/slope. **This sounds like a job for a slopegraph!**
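A slopegraph line encodes exactly one number per category: how far the share moved between the two survey waves. As a minimal base-R sketch (not part of the makeover code; the variable names are illustrative), the per-category change in percentage points can be computed directly from the values transcribed from the PRRI topline:

```r
# Shares transcribed from the PRRI topline (same values as the makeover code)
fear_level <- c("Very worried", "Somewhat worried", "Not too worried",
                "Not at all", "Don't know/refused")
nov_2014 <- c(0.11, 0.22, 0.35, 0.31, 0.01)
dec_2015 <- c(0.17, 0.30, 0.30, 0.23, 0.00)

# Change in percentage points; round() removes floating-point noise
change_pts <- round((dec_2015 - nov_2014) * 100)

# Direction of each line in the slopegraph
direction <- ifelse(change_pts > 0, "up", ifelse(change_pts < 0, "down", "flat"))

print(data.frame(fear_level, change_pts, direction))
```

With these values, "Very worried" and "Somewhat worried" rise by 6 and 8 points, while "Not too worried", "Not at all" and "Don't know/refused" fall by 5, 8 and 1 points. That up/down split is what the makeover uses for its two line colors.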

The article and graphic are based on a [survey](http://publicreligion.org/research/2015/12/survey-nearly-half-of-americans-worried-that-they-or-their-family-will-be-a-victim-of-terrorism/). Thankfully the [complete survey data was made available](http://publicreligion.org/site/wp-content/uploads/2015/12/December-2015-PRRI-RNS-Topline1.pdf), which made it easy to do a makeover (in R of course). Here’s the result:

[Image: slopegraph makeover, “Fear of terror attacks (change since last year)”]

Each category change is clearly visible, you don’t need to remember color association and you even know the actual values*.

The R code is below and in [this gist](https://gist.github.com/hrbrmstr/9bf4f93dffc1df48fe27). How would you make the WaPo chart better (drop a note in the comments with a link to your own makeover)?

```r
library(tidyr)
library(ggplot2)
library(ggthemes)
library(scales)
library(dplyr)

# Easiest way to transcribe the PDF table.
# slope is sign(Nov 2014 - Dec 2015): -1 means the share went up, 1 means it
# went down. It lets us color the lines/points based on up/down.
# (tibble() replaces the deprecated data_frame() and, like it, can refer to
# columns defined earlier in the same call.)
dat <- tibble(`2014-11-01`=c(0.11, 0.22, 0.35, 0.31, 0.01),
              `2015-12-01`=c(0.17, 0.30, 0.30, 0.23, 0.00),
              slope=factor(sign(`2014-11-01` - `2015-12-01`)),
              fear_level=c("Very worried", "Somewhat worried", "Not too worried",
                           "Not at all", "Don't know/refused"))

# Transform that into something we can use (one row per category per month)
dat <- gather(dat, month, value, -fear_level, -slope)

# We need real dates for the X-axis manipulation
dat <- mutate(dat, month=as.Date(as.character(month)))

# Since 2 categories have the same ending value (30%), we need to
# take care of that (this is one of a few "gotchas" in slopegraph preparation)
# by collapsing tied categories into a single right-hand label
end_lab <- dat %>%
  filter(month==as.Date("2015-12-01")) %>%
  group_by(value) %>%
  summarise(lab=paste(fear_level, collapse=", "))

gg <- ggplot(dat)

# line (linewidth replaced size for lines in ggplot2 3.4.0)
gg <- gg + geom_line(aes(x=month, y=value, color=slope, group=fear_level), linewidth=1)
# points
gg <- gg + geom_point(aes(x=month, y=value, fill=slope, group=fear_level),
                      color="white", shape=21, size=2.5)

# left labels
gg <- gg + geom_text(data=filter(dat, month==as.Date("2014-11-01")),
                     aes(x=month, y=value, label=sprintf("%s — %s  ", fear_level, percent(value))),
                     hjust=1, size=3)
# right labels
gg <- gg + geom_text(data=end_lab,
                     aes(x=as.Date("2015-12-01"), y=value,
                         label=sprintf("  %s — %s", percent(value), lab)),
                     hjust=0, size=3)

# Here we do some slightly tricky x-axis formatting to ensure we have enough
# space for the in-panel labels, only show the months we need and have
# the month labels display properly
gg <- gg + scale_x_date(expand=c(0.125, 0),
                        labels=date_format("%b\n%Y"),
                        breaks=c(as.Date("2014-11-01"), as.Date("2015-12-01")),
                        limits=c(as.Date("2014-02-01"), as.Date("2016-12-01")))
gg <- gg + scale_y_continuous()

# I used colors from the article
gg <- gg + scale_color_manual(values=c("#f0b35f", "#177fb9"))
gg <- gg + scale_fill_manual(values=c("#f0b35f", "#177fb9"))
gg <- gg + labs(x=NULL, y=NULL, title="Fear of terror attacks (change since last year)\n")
gg <- gg + theme_tufte(base_family="Helvetica")
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text.y=element_blank())
gg <- gg + theme(legend.position="none")
gg <- gg + theme(plot.title=element_text(hjust=0.5))
gg
```

* Well, it’s a survey. To add insult to injury, it’s a sentiment-based survey given right after an attack likely to be attributed to terrorism. Also, there is a margin of error that isn’t communicated in either visualization. So while there is “data”, trust it at your own peril.
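To make the margin-of-error point concrete, here is a rough sketch. The post does not quote PRRI's sample size, so `n = 1000` below is a hypothetical per-wave value, and the formula assumes simple random sampling with independent waves, ignoring weighting and design effects (a survey's published margin would account for those). It compares each category's change with the 95% margin of error for a difference of two proportions:

```r
# ASSUMPTION: n is a hypothetical per-wave sample size, not PRRI's actual one.
n <- 1000
nov_2014 <- c(very = 0.11, somewhat = 0.22, not_too = 0.35, not_at_all = 0.31)
dec_2015 <- c(very = 0.17, somewhat = 0.30, not_too = 0.30, not_at_all = 0.23)

# 95% margin of error for one proportion under simple random sampling
moe_single <- function(p, n) 1.96 * sqrt(p * (1 - p) / n)

# 95% margin of error for the difference between two independent proportions
moe_diff <- function(p1, p2, n1, n2) {
  1.96 * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
}

change <- dec_2015 - nov_2014
moe <- moe_diff(nov_2014, dec_2015, n, n)

# Worst-case (p = 0.5) margin for a single estimate, in percentage points
print(round(moe_single(0.5, n) * 100, 1))

# Observed change vs. its margin of error, both in percentage points
print(data.frame(change_pts = round(change * 100, 1),
                 moe_pts = round(moe * 100, 1),
                 beyond_moe = abs(change) > moe))
```

Under that hypothetical n, every shift clears its margin, but the 5-point drop in "Not too worried" sits only about a point above its roughly 4.1-point margin, so a smaller effective sample could change that conclusion.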

12 Comments

  1. This is pretty sad, as the number of deaths caused by modern militarized police forces is orders of magnitude higher.

  2. “So while there is “data”, trust it at your own peril.”

    I’m not a fan of Big Data, but given this sort of topic, I’d be happier if the entire sampling plan were provided. ~1,000 observations is typical of such surveys (Gallup, et al have been in that ballpark for decades), but not, to my mind, sufficient. Just one opinion.

  3. I applaud your efforts at making this clearer, but to be honest, the slope graph doesn’t do it for me. Is it just because I’m not so familiar with them? Maybe labels in the middle rather than duplicating them, and not combining the labels of the categories that both become 30%? Sad as it is, I think a stacked area chart would do a better job of conveying this information to me.

  4. Lovely slope plot, but the change in colours is confusing when making comparisons to the original pie charts. In the end, I think the appropriate solution would have been a likert plot, for which this data is perfectly suited.

  5. Sorry, but I think the original pie chart is clearer and more informative

  6. I do not see a problem with this pie chart. There is a total of four categories which were grouped into “(rather) worried” or “(rather) not worried” by similar colors. The goal of this visualization was not to show the detailed changes inside these four categories over time from November 2014 to now but to show that people are now more worried than before. No need for numbers or detailed information about changes.

    The pie chart shows that “not too worried” stayed mostly the same, while “very worried” and “somewhat worried” gained and “not at all worried” lost.

    The slopegraph on the other hand takes more cognitive effort to understand and is too much for just showing that people are now worried more. Although, the slopegraph does indeed do the title of the pie chart justice “Fear of terror attacks has INCREASED since last year”. “Increase” is a process and the slopegraph shows detailed information about this process.

    “People now more worried of terror attacks than last year” would fit better as a title for the pie chart.

  7. They get worse. Cartographers have long known that use of an area to represent a scalar quantity is dangerous and leads to very poor estimation of the actual and relative magnitudes (check out proportionate symbol mapping) and even suggested an empirical correction for it (Flannery’s law). To compound the problem, consider also the Excel way in which the pies are shown in pseudo-3D, which is really crazy.

  8. The alternative shown 1) serves a different purpose; 2) is more difficult to read and understand; 3) is simply not nice to look at. So what was the purpose of this article? To show other options – ok. To criticize – fail.

  9. I’m afraid I also think the slopegraph isn’t much of an improvement on the pie charts. The combination of “Somewhat worried” and “Not too worried” with the same value for 2015 is particularly confusing. I would just present these data with a straightforward bar chart – since what’s really important is the shape of the distribution of answers this allows the reader to easily compare the two years with a glance. For me at least this is a lot easier to interpret than the slopegraph. I can’t seem to put a figure in the comments but this code will make one.

    fear<-matrix(c(31,23,35,30,22,30,11,17), nrow=2)
    rownames(fear)<-c("2014","2015")
    colnames(fear)<-c("Not worried","Not too worried","Somewhat worried","Very worried")
    barplot(fear, beside=TRUE, ylab="Percentage of responders", col=c("orange","steelblue"),legend=TRUE)

  10. Here’s the code with the carriage returns, I hope – they were stripped out in the previous version.

    ```
    fear<-matrix(c(31,23,35,30,22,30,11,17), nrow=2)
    rownames(fear)<-c("2014","2015")
    colnames(fear)<-c("Not worried","Not too worried","Somewhat worried","Very worried")
    barplot(fear, beside=TRUE, ylab="Percentage of responders", col=c("orange","steelblue"),legend=TRUE)
    ```

  11. I dropped WaPo several months ago. One reason was the poor graphics. They seemed enamoured with ‘pretty’ over communication to the extent that it cut into credibility.

  12. ```
    dat <- data.frame('Nov 2014' = c(0.11, 0.22, 0.35, 0.31, 0.01),
                      'Dec 2015' = c(0.17, 0.30, 0.30, 0.23, 0.00),
                      fear_level = c("Very worried", "Somewhat worried", "Not too worried",
                                     "Not at all", "Don't know/refused"),
                      check.names = FALSE, stringsAsFactors = FALSE)

    # gold for the "worried" answers, blue for "not worried" and "don't know"
    col <- c('gold2', 'dodgerblue2')[grepl('n.t', dat$fear_level, ignore.case = TRUE) + 1L]
    plot(col(dat[, -3]), unlist(dat[, -3]), xlim = c(0, 3), col = col, pch = 16,
         ann = FALSE, axes = FALSE)
    axis(1, at = 1:2, labels = names(dat)[1:2], lwd = 0)
    segments(rep(1, 5), dat$`Nov 2014`, rep(2, 5), dat$`Dec 2015`, col = col, lwd = 2)
    ttl <- sprintf('%s – %s%%', dat$fear_level, dat$`Nov 2014` * 100)
    ttr <- sprintf('%s – %s%%', dat$fear_level, dat$`Dec 2015` * 100)
    text(rep(1, 5), dat$`Nov 2014`, ttl, adj = 1, pos = 2)
    # nudge apart the two right-hand labels that tie at 30%
    text(rep(2, 5), dat$`Dec 2015` + c(0, -.01, .01, 0, 0), ttr, adj = 0, pos = 4)
    ```


3 Trackbacks/Pingbacks

  1. By Fear of WaPo Using Bad Pie Charts Has Increased Since Last Year | OSINFO on 13 Dec 2015 at 12:03 pm

    […] Each category change is clearly visible, you don’t need to remember color association and you even know the actual values*. […]

  3. […] article was first published on R – rud.is, and kindly contributed to […]
