(ggplot2) Exercising with (ggalt) dumbbells

I follow the most excellent Pew Research folks on Twitter to stay in tune with what’s happening (statistically speaking) with the world. Today, they tweeted this excerpt from their 2015 Global Attitudes survey:

I thought it might be helpful to folks if I made a highly aesthetically tuned version of Pew’s chart (though I chose to go a bit more minimal in terms of styling than they did) with the new geom_dumbbell() in the development version of ggalt. The source (below) is annotated, but please drop a note in the comments if any of the code would benefit from more exposition.

I’ve also switched to using the Prism javascript library starting with this post after seeing how well it works in RStudio’s flexdashboard package. If the “light on black” is hard to read or distracting, drop a note here and I’ll switch the theme if enough folks are having issues.

library(ggplot2) # devtools::install_github("hadley/ggplot2")
library(ggalt)   # devtools::install_github("hrbrmstr/ggalt")
library(dplyr)   # for data_frame(), arrange() & filter()

# I'm not crazy enough to input all the data; this will have to do for the example
df <- data_frame(country=c("Germany", "France", "Vietnam", "Japan", "Poland", "Lebanon",
                           "Australia", "South\nKorea", "Canada", "Spain", "Italy", "Peru",
                           "U.S.", "UK", "Mexico", "Chile", "China", "India"),
                 ages_35=c(0.39, 0.42, 0.49, 0.43, 0.51, 0.57,
                           0.60, 0.45, 0.65, 0.57, 0.57, 0.65,
                           0.63, 0.59, 0.67, 0.75, 0.52, 0.48),
                 ages_18_to_34=c(0.81, 0.83, 0.86, 0.78, 0.86, 0.90,
                                 0.91, 0.75, 0.93, 0.85, 0.83, 0.91,
                                 0.89, 0.84, 0.90, 0.96, 0.73, 0.69),
                 # round() before as.integer() so floating-point error can't truncate e.g. 41.999 to 41
                 diff=sprintf("+%d", as.integer(round((ages_18_to_34-ages_35)*100))))

# we want to keep the order in the plot, so we use a factor for country
df <- arrange(df, desc(diff))
df$country <- factor(df$country, levels=rev(df$country))

# we only want the first line values with "%" symbols (to avoid chart junk)
# quick hack; there is a more efficient way to do this
percent_first <- function(x) {
  x <- sprintf("%d%%", round(x*100))
  x[-1] <- sub("%$", "", x[-1])  # x[-1] is safe even when x has a single element
  x
}

gg <- ggplot()
# doing this vs y axis major grid line
gg <- gg + geom_segment(data=df, aes(y=country, yend=country, x=0, xend=1), color="#b2b2b2", size=0.15)
# dum…dum…dum!bell
gg <- gg + geom_dumbbell(data=df, aes(y=country, x=ages_35, xend=ages_18_to_34),
                         size=1.5, color="#b2b2b2", point.size.l=3, point.size.r=3,
                         point.colour.l="#9fb059", point.colour.r="#edae52")
# text above points (negative vjust moves the labels up)
gg <- gg + geom_text(data=filter(df, country=="Germany"),
                     aes(x=ages_35, y=country, label="Ages 35+"),
                     color="#9fb059", size=3, vjust=-2, fontface="bold", family="Calibri")
gg <- gg + geom_text(data=filter(df, country=="Germany"),
                     aes(x=ages_18_to_34, y=country, label="Ages 18-34"),
                     color="#edae52", size=3, vjust=-2, fontface="bold", family="Calibri")
# text below points (positive vjust moves the values down)
gg <- gg + geom_text(data=df, aes(x=ages_35, y=country, label=percent_first(ages_35)),
                     color="#9fb059", size=2.75, vjust=2.5, family="Calibri")
gg <- gg + geom_text(data=df, color="#edae52", size=2.75, vjust=2.5, family="Calibri",
                     aes(x=ages_18_to_34, y=country, label=percent_first(ages_18_to_34)))
# difference column
gg <- gg + geom_rect(data=df, aes(xmin=1.05, xmax=1.175, ymin=-Inf, ymax=Inf), fill="#efefe3")
gg <- gg + geom_text(data=df, aes(label=diff, y=country, x=1.1125), fontface="bold", size=3, family="Calibri")
gg <- gg + geom_text(data=filter(df, country=="Germany"), aes(x=1.1125, y=country, label="DIFF"),
                     color="#7a7d7e", size=3.1, vjust=-2, fontface="bold", family="Calibri")
gg <- gg + scale_x_continuous(expand=c(0,0), limits=c(0, 1.175))
gg <- gg + scale_y_discrete(expand=c(0.075,0))
gg <- gg + labs(x=NULL, y=NULL, title="The social media age gap",
                subtitle="Adult internet users or reported smartphone owners who\nuse social networking sites",
                caption="Source: Pew Research Center, Spring 2015 Global Attitudes Survey. Q74")
gg <- gg + theme_bw(base_family="Calibri")
gg <- gg + theme(panel.grid.major=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(plot.title=element_text(face="bold"))
gg <- gg + theme(plot.subtitle=element_text(face="italic", size=9, margin=margin(b=12)))
gg <- gg + theme(plot.caption=element_text(size=7, margin=margin(t=12), color="#7a7d7e"))
gg
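
The percent_first() helper in the listing is flagged as a quick hack. Here is a minimal sketch of one tidier alternative. The name percent_first_alt is hypothetical and not part of the original post. It builds the suffix vector directly instead of adding "%" everywhere and then stripping it again.

```r
# Hypothetical alternative to percent_first(): append "%" to the first value only.
percent_first_alt <- function(x) {
  # seq_along(x) == 1 is TRUE for the first element only, so only it gets the "%"
  paste0(round(x * 100), ifelse(seq_along(x) == 1, "%", ""))
}

percent_first_alt(c(0.39, 0.42, 0.49))
# returns c("39%", "42", "49")
```

It should work as a drop-in replacement in the two geom_text() label mappings, assuming the first row of the data is the one that should carry the "%" sign.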


Comments

  3. George

    Nice. But why leave Kenya out in your new chart? Because it is from Africa? Unconscious bias?

    1. hrbrmstr

      I left a ton out of the new chart. And, if you look carefully (vs approach it with malicious insinuation in mind) you’ll see I hand transcribed them in groups of six and stopped at 3 groups kinda b/c I had better things to do with my time. So, please feel free to redirect your typing energy into transcribing them all and provide the data so I can update the instructive post I made freely available. Thanks in advance for the data transcription offer!

  4. Pedro J. Aphalo

    I like the graphic design of your plot a lot more than the original one.
    The positions of comment lines “# text below points” and “# text above points” seem to be swapped in the code listing.
    Thanks for the always interesting posts!

  5. Hansoo

    Great job!!

    I have a question!

    Is “geom_dumbbell” working?
    I got the error message below:

    Error: could not find function “geom_dumbbell”

    How do I fix it?

    1. hrbrmstr

      This:

      library(ggalt) # devtools::install_github("hrbrmstr/ggalt")

      is in the source.

      I put the comment after the library() call to indicate folks probably need to install ggalt from github.

      1. egrason

        Hi, I have ggalt installed and called/loaded, but still get the same error as HANSOO above. I ran the script for the geomdumbbell function on GitHub (https://github.com/hrbrmstr/ggalt/blob/master/R/geomdumbbell.R). When I call the ggplot object after adding the geom_dumbbell, I get the error: could not find function “%||%”.

        I have all the packages installed and loaded from your example (https://gist.github.com/hrbrmstr/0d206070cea01bcb0118) and am running R 3.2.2.

        I’d appreciate your help! Thanks.

        1. hrbrmstr

          Whether Hadley or Thomas will admit it or not, they broke stuff in the latest ggplot2 release and I have yet to scrounge the time (seriously complex “real life” stuff started happening around July) to fix. I’ll try to get to it soon, tho.

  6. @patternproject

    Thanks. This is amazing to learn. I picked up the use of sprintf.

    Can I ask how to fix the custom font issue? I get the following warning:

    Warning message:
    In grid.Call.graphics(L_text, as.graphicsAnnot(x$label), x$x, x$y, :
    font family not found in Windows font database

    If I query the existing installed fonts, available to R as follows:

    windowsFonts()
    $serif
    [1] "TT Times New Roman"

    $sans
    [1] "TT Arial"

    $mono
    [1] "TT Courier New"

    I have tried the showtext package but am still not able to fix it. How does it work in your case?

    Thanks
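
One possible way around the "font family not found in Windows font database" warning is to register the family name with the Windows graphics device before plotting. The sketch below assumes Calibri is installed on the machine. The variable plot_family is a hypothetical name, not from the post. The Windows-only calls are guarded so the snippet also runs on other platforms, where it falls back to the default "sans" family.

```r
# Hypothetical helper variable: the font family to hand to ggplot2
plot_family <- "sans"

if (.Platform$OS.type == "windows") {
  # Map the family name "Calibri" to the installed Calibri font.
  # windowsFonts() and windowsFont() exist in grDevices on Windows only.
  grDevices::windowsFonts(Calibri = grDevices::windowsFont("Calibri"))
  plot_family <- "Calibri"
}

plot_family
```

plot_family could then stand in for the hard-coded "Calibri" in theme_bw(base_family=...) and in each geom_text(family=...) call. On macOS and Linux the way fonts are found depends on the graphics device, so this sketch deliberately does not attempt anything there.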

  7. Tom

    Thank you very much for posting this code, it’s excellent! Very much appreciated by the R community!

  8. Maher

    Great work. I would like to portray time differences with geom_dumbbell, and I have learned a lot from your illustration. Thanks.

  9. Even Solberg

    Nice. I have a more complex version I’m grappling with: I’d love to see an example where the left-hand side is “Before” and the right is “After”, but where the dots are color coded from red via yellow to green on a scale from 0-100%. So one can illustrate “Before it was at 10% (red), after it’s at 75% (yellow-green)”. I’m not sure this can even be done in the current incarnation of geom_dumbbell().
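
A before/after chart with value-coloured dots can be approximated without geom_dumbbell() by combining plain ggplot2 layers. This is only a sketch under assumptions: the scores data frame and its column names are made up for illustration, and it sidesteps geom_dumbbell() rather than showing that the geom supports mapped point colours.

```r
library(ggplot2)

# Hypothetical before/after values on a 0-1 scale (not from the post)
scores <- data.frame(
  item   = c("A", "B", "C"),
  before = c(0.10, 0.40, 0.55),
  after  = c(0.75, 0.60, 0.95)
)

p <- ggplot(scores, aes(y = item)) +
  # the connecting bar of each dumbbell
  geom_segment(aes(x = before, xend = after, yend = item), colour = "#b2b2b2") +
  # both end points share one colour scale, so each dot is coloured by its own value
  geom_point(aes(x = before, colour = before), size = 3) +
  geom_point(aes(x = after, colour = after), size = 3) +
  scale_colour_gradientn(name = "Value", colours = c("red", "yellow", "green"),
                         limits = c(0, 1)) +
  scale_x_continuous(limits = c(0, 1)) +
  labs(x = NULL, y = NULL)
p
```

Fixing the colour limits at 0 and 1 keeps the red-to-green mapping tied to the full 0-100% range rather than to whatever range the data happens to cover.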

  10. Nguyen Chi Dung

    Dear Bob,

    I used your code for my own data set. However, I cannot get the DIFF text to show using the following command:

    geom_text(data = dfEstonia, aes(label = DIFF, x = (h1 + h2) / 2, y = Country), fontface = "bold", family = my_font, vjust = -2).

    Here is the full code: http://rpubs.com/chidungkt/562726.

    Could you tell me why DIFF is not shown by my code?

    Many thanks.
