Lies, Damn Lies, “Data Journalism” and Charts That Don’t Start at 0

This tweet by @moorehn (who usually is a superb economic journalist) really bugged me:

I grabbed the raw data from EPI (http://www.epi.org/files/2012/data-swa/jobs-data/Employment%20to%20population%20ratio%20(EPOPs).xls), started the y-axis of the graph at 0 as it should be, and broke out men and women separately (the Excel spreadsheet had the data). It’s a really different picture:

[Figure: empToPop (employment-to-population ratio for men and women, y-axis starting at 0)]

I’m not saying employment is great right now, but it’s nowhere near a “ski jump”. So much for the state of data journalism at the start of 2014.

Here’s the hastily crafted R code:

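library(ggplot2)   # plotting
library(ggthemes)  # provides theme_economist()
library(reshape2)  # provides melt()

# empvyear.csv holds the Year,Men,Women data listed below
a <- read.csv("empvyear.csv")
# wide -> long: one row per (Year, variable, value)
b <- melt(a, id.vars="Year")

# one line per series (Men, Women), colored by series
gg <- ggplot(data=b, aes(x=Year, y=value, group=variable))
gg <- gg + geom_line(aes(color=variable))
# show the full 0-100% range on the y-axis
gg <- gg + ylim(0, 100)
gg <- gg + theme_economist()
gg <- gg + labs(x="Year", y="Employment as share of population (%)", 
                title="Employment-to-population ratio, age 25–54, 1975–2011")
# drop the legend title
gg <- gg + theme(legend.title = element_blank())
gg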

And, here’s the data extracted from the Excel file:

Year,Men,Women
1975,89.0,51.0
1976,89.5,52.9
1977,90.1,54.8
1978,91.0,57.3
1979,91.1,59.0
1980,89.4,60.1
1981,89.0,61.2
1982,86.5,61.2
1983,86.1,62.0
1984,88.4,63.9
1985,88.7,65.3
1986,88.5,66.6
1987,89.0,68.2
1988,89.5,69.3
1989,89.9,70.4
1990,89.1,70.6
1991,87.5,70.1
1992,86.8,70.1
1993,87.0,70.4
1994,87.2,71.5
1995,87.6,72.2
1996,87.9,72.8
1997,88.4,73.5
1998,88.8,73.6
1999,89.0,74.1
2000,89.0,74.2
2001,87.9,73.4
2002,86.6,72.3
2003,85.9,72.0
2004,86.3,71.8
2005,86.9,72.0
2006,87.3,72.5
2007,87.5,72.5
2008,86.0,72.3
2009,81.5,70.2
2010,81.0,69.3
2011,81.4,69.0
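
For anyone who wants to put numbers on the recent drop rather than eyeball it, here is a minimal base-R sketch using four rows copied from the data above. The helper name `change_between` is my own invention for illustration and is not part of any package. It reports the change both in percentage points and as a relative percent, since the two are easy to conflate.

```r
# Inline copy of four rows from the data above (values as published).
emp <- read.csv(text = "Year,Men,Women
2007,87.5,72.5
2009,81.5,70.2
2010,81.0,69.3
2011,81.4,69.0")

# Hypothetical helper: change in one column between two years.
change_between <- function(df, col, from, to) {
  start <- df[df$Year == from, col]
  end   <- df[df$Year == to, col]
  c(points  = end - start,                   # difference in percentage points
    percent = 100 * (end - start) / start)   # relative change, in percent
}

men   <- change_between(emp, "Men", 2007, 2010)
women <- change_between(emp, "Women", 2007, 2010)

# One row per series, rounded to one decimal place.
print(round(rbind(Men = men, Women = women), 1))
```

By this arithmetic the fall from 2007 to 2010 is 6.5 percentage points for men and 3.2 for women. Whether that counts as a “ski jump” is a judgment call that the arithmetic alone does not settle.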

13 Comments

  1. brianatoptimal

    Depends on what you are trying to show.
    Your graph shows all the data, which isn’t super helpful except to examine a long-term trend (does anyone care about employment in 1975???)
    Her graph shows that between ’07 and ’10, employment drops by about 5%, and has only risen by 1% since then.
    Not a tragic statistic, other countries have done worse I’m sure.
    Each graph is relevant, depending on what you are conveying.

  2. Pingback: Lies, Damn Lies, “Data Journalism” and Charts That Don’t Start at 0 | Patient 2 Earn

  3. jim

    Who says charts have to start at zero? Especially when that is essentially an impossible value for the plotted variable to take? It’s certainly useful to see the whole time series for context, but starting the y-axis at zero just makes it hard to see the true variation (it’s the sort of thing you’d do if you wanted a time series to look flatter than it is) – there is zero information in the bottom half of your graph.
    The sharp decline in the combined proportion (males+females) from 2008 to 2010 looks dramatic and unprecedented when you plot it over the full range of time and historical values. I don’t know if it’s a “ski-jump” (whatever that means), but the tweeted plot hardly seems misleading.

  4. anspiess

    Well, I don’t know. Of course the ordinate scale has a major influence on data perception, but when looking at the right end of the red male time series, there is a clear drop. And since a reduction of employment of 5% affects hundreds of thousands of people, I must admit that I prefer the original graph. Would be a different story if the data had been analyzed wrong, but just the scaling…?

    Cheers,
    Andrej

  5. anspiess

    Would be interesting to know why in the mid-80’s there is such a specific increase in employment for women… Anyone have an idea?

  6. Robert Young

    Well, as many know, the field was originally named (and so for centuries) Political Economics. In the 1970’s (yes, I was there), failed math/stat graduate students were quickly turned into Econ Ph.D.s and thence assistant professors at, mostly, 2nd and 3rd tier universities (I was there, too). The micro folks, armed with these math/stat approaches, quickly took over, and Economics with a heavy dose of Social Darwinism, to be expected from those with a micro point of view, was born.

    The “economics is not about value judgment” meme was re-born (for the Nth time) and has run rampant.

    In the current case: if you’re a Social Darwinist, then the 0-based obfuscator is your tool; while if you’re in the Krugman/Stiglitz camp, the original is what you show. If one looks, with intent, at the 0-based graphs, the ski jump is still evident in the male line. Overall, male participation has moved about a fairly level slope, while female clearly has a positive slope.

    How is a journalist, allegedly wedded to “fair and balanced”, to deal with the situation? In the current case, since the point of the article was to show the effects of the Great Recession, its birth, and aftermath, then the journalist did the right thing. If the point of an article is to, additionally, show the Great Recession in historical context, then a graph which is minimum based (not 0, even then) would be appropriate. But such a graph must needs be accompanied by a good deal of copy to provide historical context (changes in labour force participation and the like) for the various macroeconomic periods displayed.

    A picture is worth a thousand words, and the clever can mold the words by painting the right picture.

  7. AlanTeew

    It has to start at zero if you’re trying to show how a change relates to the whole. Without proportion you get distortion.

  8. Gilbert Pétain-Coup

    Saying there’s a “ski jump” means that the difference between 75 and 80 is considered large, which is not a question of statistics. Your graph hides this jump and does not answer the question at all.

  9. Gilbert Pétain-Coup

    The original graph lets you see that the difference is approximately 5%. In the new graph it’s hard to see. And 5% of the population indeed sounds like a ski jump. But I guess you will delete this comment too (instead of deleting your blog post…)

  10. Robert Young

    Two motivators:
    1) by 1975, Women’s Lib had taken sufficient hold that not just doctoring and nursing and teaching and lawyering were part of Women’s Work, so educated women entered more widely into the workforce; certain corners of society blamed all subsequent social ills on this “breakdown of the traditional family”.
    2) in 1981, the newly elected Reagan started the assault on workers generally with the PATCO firings; thus women began to enter the workforce in larger numbers as earned incomes stagnated (as measured by median income; the only measure proper for such a skewed distribution) in order to maintain household income.

  11. John Daschbach

    Evaluating data should always be done in context. Starting a graph of employment at 0 is the wrong context! One can argue for different ordinate ranges, but the most common would be a reasonable range bounded by the max and min over some historical time range. Thus, in the US, one could argue for a range that spanned the available data; 50% to 100% would be reasonable. Starting at 0% is a greater distortion than simply spanning the range of data plotted.

  12. lex

    I usually agree that charts ought to start at zero but not in this case. You can actually see why better on your version. A dip like that actually is very large compared to what is normally seen.

