Putting It All Together

The kind folks over at @RStudio gave a nod to my recently CRAN-released epidata package in their January data package roundup and I thought it might be useful to give it one more showcase using the recent CRAN update to ggalt and the new hrbrthemes (github-only for now) packages.

Labor force participation rate

The U.S. labor force participation rate (LFPR) is an oft-overlooked and under- or mis-reported economic indicator. I’ll borrow the definition from Investopedia:

The participation rate is a measure of the active portion of an economy’s labor force. It refers to the number of people who are either employed or are actively looking for work. During an economic recession, many workers often get discouraged and stop looking for employment, resulting in a decrease in the participation rate.

Population age distributions and other factors are necessary to honestly interpret this statistic. Parties in power usually dismiss/ignore this statistic outright and their opponents tend to wholly embrace it for criticism (it’s an easy target if you’re naive). “Yay” partisan democracy.

Since the LFPR has nuances when looked at categorically, let’s take a look at it by attained education level to see how that particular view has changed over time (at least since the gov-quants have been tracking it).

We can easily grab this data with epidata::get_labor_force_participation_rate()(and, we’ll setup some library() calls while we’re at it:

library(hrbrthemes) # devtools::install_github("hrbrmstr/hrbrthemes")

part_rate <- get_labor_force_participation_rate("e")

## Observations: 457
## Variables: 7
## $ date              <date> 1978-12-01, 1979-01-01, 1979-02-01, 1979-03-01, 1979-04-01, 1979-05-01...
## $ all               <dbl> 0.634, 0.634, 0.635, 0.636, 0.636, 0.637, 0.637, 0.637, 0.638, 0.638, 0...
## $ less_than_hs      <dbl> 0.474, 0.475, 0.475, 0.475, 0.475, 0.474, 0.474, 0.473, 0.473, 0.473, 0...
## $ high_school       <dbl> 0.690, 0.691, 0.692, 0.692, 0.693, 0.693, 0.694, 0.694, 0.695, 0.696, 0...
## $ some_college      <dbl> 0.709, 0.710, 0.711, 0.712, 0.712, 0.713, 0.712, 0.712, 0.712, 0.712, 0...
## $ bachelor's_degree <dbl> 0.771, 0.772, 0.772, 0.773, 0.772, 0.772, 0.772, 0.772, 0.772, 0.773, 0...
## $ advanced_degree   <dbl> 0.847, 0.847, 0.848, 0.848, 0.848, 0.848, 0.847, 0.847, 0.848, 0.848, 0...

One of the easiest things to do is to use ggplot2 to make a faceted line chart by attained education level. But, let’s change the labels so they are a bit easier on the eyes in the facets and switch the facet order from alphabetical to something more useful:

gather(part_rate, category, rate, -date) %>% 
  mutate(category=stri_replace_all_fixed(category, "_", " "),
         category=stri_replace_last_regex(category, "Hs$", "High School"),
         category=factor(category, levels=c("Advanced Degree", "Bachelor's Degree", "Some College", 
                                            "High School", "Less Than High School", "All")))  -> part_rate

Now, we’ll make a simple line chart, tweaking the aesthetics just a bit:

ggplot(part_rate) +
  geom_line(aes(date, rate, group=category)) +
  scale_y_percent(limits=c(0.3, 0.9)) +
  facet_wrap(~category, scales="free") +
  labs(x=paste(format(range(part_rate$date), "%Y-%b"), collapse=" to "), 
       y="Participation rate (%)", 
       title="U.S. Labor Force Participation Rate",
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_ipsum_rc(grid="XY", axis="XY")

The “All” view is interesting in that the LFPR has held fairly “steady” between 60% & 70%. Those individual and fractional percentage points actually translate to real humans, so the “minor” fluctuations do matter.

It’s also interesting to see the direct contrast between the starting historical rate and current rate (you could also do the same with min/max rates, etc.) We can use a “dumbbell” chart to compare the 1978 value to today’s value, but we’ll need to reshape the data a bit first:

group_by(part_rate, category) %>% 
  arrange(date) %>% 
  slice(c(1, n())) %>% 
  spread(date, rate) %>% 
  ungroup() %>% 
  filter(category != "All") %>% 
  mutate(category=factor(category, levels=rev(levels(category)))) -> rate_range

filter(part_rate, category=="Advanced Degree") %>% 
  arrange(date) %>% 
  slice(c(1, n())) %>% 
  mutate(lab=lubridate::year(date)) -> lab_df

(We’ll be using the extra data frame to add labels the chart.)

Now, we can compare the various ranges, once again tweaking aesthetics a bit:

ggplot(rate_range) +
  geom_dumbbell(aes(y=category, x=`1978-12-01`, xend=`2016-12-01`),
                size=3, color="#e3e2e1",
                colour_x = "#5b8124", colour_xend = "#bad744",
                dot_guide=TRUE, dot_guide_size=0.25) +
  geom_text(data=lab_df, aes(x=rate, y=5.25, label=lab), vjust=0) +
  scale_x_percent(limits=c(0.375, 0.9)) +
  labs(x=NULL, y=NULL,
       title=sprintf("U.S. Labor Force Participation Rate %s-Present", lab_df$lab[1]),
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +


One takeaway from both these charts is that it’s probably important to take education level into account when talking about the labor force participation rate. The get_labor_force_participation_rate() function — along with most other functions in the epidata package — also has options to factor the data by sex, race and age, so you can pull in all those views to get a more nuanced & informed understanding of this economic health indicator.

Buy on AmazonDDS Blog
DDS PodcastAmazon Author Page

7 Comments Putting It All Together

  1. Pingback: Putting It All Together – sec.uno

  2. Pingback: Putting It All Together | A bunch of data

  3. Pingback: Putting It All Together – Cyber Security

  4. Charles

    Very interesting!

    I’m on a Mac, I had problems with your “themeipsumrc” call, but using just plain “theme_ipsum” worked fine. Apparently, I don’t have the “Roboto Condensed Light” font family on my computer.

    1. hrbrmstr

      There is a function provided to install the font and the help does say you need to also load it on your Mac as well as in R.

  5. Josh Fagen

    Thanks for posting this! I found it pretty interesting. Really appreciate your run-down of the code and the helpful explanation too…just one question code-wise: do you know why the dates need to be enclosed in marks to define the x-axis limits for the dumbbells in the dumbbell chart?

  6. Pingback: Linkdump #31 | WZB Data Science Blog

Leave a Reply