Setup

The following URL: https://rud.is/210/chicago.csv contains the (surprisingly relatively current) annual salaries of employees in the City of Chicago. This is public information. I did not “hack” it.

You will/may find the following functions of help:

From stringr:

From the tidyverse:

From base/“built-in” R:

Data Cleanup

It’s a decent CSV file, but it needs some work. Here’s what you need to do:

Answering Questions with Data

Here’s what you need to answer:

  1. What is the median salary across all departments?
  2. What is the median salary by department?
  3. Who has the highest overall salary?
  4. Who has the highest overall salary by department? NOTE: Arrange this in order from highest to lowest value.
  5. What is the overall salary budget for each department? NOTE: Arrange this in order from highest to lowest value.
  6. How many employees are in each department? NOTE: Arrange this by lowest to highest value.
  7. An equally recent population estimate for Chicago is 2.719 million people. Divide the output of question 6 by the population to see how many people each official potentially serves.
  8. Visualize the distribution of salaries by department.

The blocks below have the answers shown (so you can check your work) but the code is hidden. Try to solve the problems independently or jointly before peeking (or ask me for help).


Unhide this block to see how to read in the data from the web site.

library(stringr)
library(tidyverse)

chicago <- read_csv("https://rud.is/210/chicago.csv")

glimpse(chicago)
## Observations: 32,062
## Variables: 4
## $ name                   <chr> "AARON,  ELVIA J", "AARON,  JEFFERY M", "AARON,  KARINA", "AARON...
## $ position_title         <chr> "WATER RATE TAKER", "POLICE OFFICER", "POLICE OFFICER", "CHIEF C...
## $ department             <chr> "WATER MGMNT", "POLICE", "POLICE", "GENERAL SERVICES", "WATER MG...
## $ employee_annual_salary <chr> "$90744.00", "$84450.00", "$84450.00", "$89880.00", "$106836.00"...
chicago
## # A tibble: 32,062 × 4
##                     name              position_title       department employee_annual_salary
##                    <chr>                       <chr>            <chr>                  <chr>
## 1        AARON,  ELVIA J            WATER RATE TAKER      WATER MGMNT              $90744.00
## 2      AARON,  JEFFERY M              POLICE OFFICER           POLICE              $84450.00
## 3         AARON,  KARINA              POLICE OFFICER           POLICE              $84450.00
## 4    AARON,  KIMBERLEI R    CHIEF CONTRACT EXPEDITER GENERAL SERVICES              $89880.00
## 5    ABAD JR,  VICENTE M           CIVIL ENGINEER IV      WATER MGMNT             $106836.00
## 6        ABARCA,  ANABEL        ASST TO THE ALDERMAN     CITY COUNCIL              $70764.00
## 7      ABARCA,  EMMANUEL       GENERAL LABORER - DSS    STREETS & SAN              $41849.60
## 8      ABASCAL,  REECE E TRAFFIC CONTROL AIDE-HOURLY             OEMC              $20051.20
## 9   ABBASI,  CHRISTOPHER  STAFF ASST TO THE ALDERMAN     CITY COUNCIL              $49452.00
## 10 ABBATACOLA,  ROBERT J         ELECTRICAL MECHANIC         AVIATION              $93600.00
## # ... with 32,052 more rows

Unhide this block to see how to do the data cleanup:

mutate(chicago,
       employee_annual_salary = str_replace(employee_annual_salary, "\\$", ""),
       employee_annual_salary = as.numeric(employee_annual_salary),
       name = str_to_title(name),
       position_title = str_to_title(position_title),
       department = str_to_title(department)) %>%
  separate(name, into = c("last", "first_mid"), ",") -> chicago

Unhide each section to see the code that produces the output:

What is the median salary across all departments?

scales::dollar(median(chicago$employee_annual_salary))
## [1] "$84,450"

What is the median salary by department?

group_by(chicago, department) %>%
  summarise(median_salary = median(employee_annual_salary)) %>%
  mutate(median_salary = scales::dollar(median_salary)) %>%
  knitr::kable(align="lr")
department median_salary
Admin Hearng $68,028.00
Animal Contrl $56,928.00
Aviation $72,862.40
Board Of Election $48,036.00
Board Of Ethics $81,948.00
Budget & Mgmt $89,340.00
Buildings $97,920.00
Business Affairs $75,960.00
City Clerk $62,004.00
City Council $52,950.00
Community Development $85,764.00
Cultural Affairs $84,168.00
Disabilities $82,044.00
Doit $96,066.00
Family & Support $13,520.00
Finance $68,028.00
Fire $91,362.00
General Services $93,600.00
Health $81,948.00
Human Relations $89,676.00
Human Resources $73,170.00
Inspector Gen $75,036.00
Ipra $89,880.00
Law $71,292.00
License Appl Comm $71,292.00
Mayor’s Office $74,250.00
Oemc $20,051.20
Police $87,384.00
Police Board $79,974.00
Procurement $75,960.00
Public Library $57,696.00
Streets & San $73,840.00
Transportn $81,536.00
Treasurer $88,626.00
Water Mgmnt $82,576.00

Who has the highest overall salary?

filter(chicago, employee_annual_salary == max(employee_annual_salary)) %>%
  mutate(employee_annual_salary = scales::dollar(employee_annual_salary)) %>%
  knitr::kable(align="llllr")
last first_mid position_title department employee_annual_salary
Evans Ginger S Commissioner Of Aviation Aviation $300,000

Who has the highest overall salary by department?

group_by(chicago, department) %>%
  filter(employee_annual_salary == max(employee_annual_salary)) %>%
  select(last, first_mid, department, employee_annual_salary) %>%
  arrange(desc(employee_annual_salary)) %>%
  mutate(employee_annual_salary = scales::dollar(employee_annual_salary)) %>%
  knitr::kable(align="lllr")
last first_mid department employee_annual_salary
Evans Ginger S Aviation $300,000
Emanuel Rahm Mayor’s Office $216,210
Santiago Jose A Fire $202,728
Escalante John J Police $197,724
Morita Julia Y Health $177,000
Reifman David Community Development $175,020
Morrison Butler Lisa V Family & Support $175,002
Patton Stephen R Law $173,664
Holt Alexandra D Budget & Mgmt $169,992
Powers Thomas H Water Mgmnt $169,512
Scheinfeld Rebekah C Transportn $169,500
Schenkel Gary W Oemc $167,796
Rhee Jamie L Procurement $167,220
Bannon Brian A Public Library $167,004
Widawsky Daniel J Finance $165,000
Fairley Sharon R Ipra $161,856
Ferguson Joseph M Inspector Gen $161,856
Kaiden Marla M City Council $160,248
Frydland Judith Buildings $157,092
Guerra Maria Business Affairs $157,092
Reynolds David J General Services $157,092
Williams Charles L Streets & San $157,092
Jackowiak Patricia Admin Hearng $156,420
Boone Michelle T Cultural Affairs $155,040
Berman Brenna M Doit $154,992
Choi Soo Human Resources $151,572
Lafleur Kristi L Treasurer $150,000
Tamley Karen M Disabilities $138,420
Mendoza Susana A City Clerk $133,545
Berlin Steven Board Of Ethics $131,688
Bateman Kelly Anne Board Of Election $129,840
Noriega Mona Human Relations $125,004
Yamaji Peter S Animal Contrl $123,144
Caproni Max A Police Board $97,728
Guzman Flores Michelle License Appl Comm $71,292

What is the overall salary budget for each department?

group_by(chicago, department) %>%
  summarise(total_salary_budget = sum(employee_annual_salary)) %>%
  arrange(desc(total_salary_budget)) %>%
  mutate(total_salary_budget = scales::dollar(total_salary_budget)) %>%
  knitr::kable(align="lr")
department total_salary_budget
Police $1,092,849,505
Fire $458,980,209
Water Mgmnt $157,442,278
Streets & San $148,340,585
Aviation $111,975,153
Transportn $91,304,948
Oemc $81,217,800
General Services $77,608,339
Public Library $54,108,881
Health $44,006,571
Finance $40,812,238
Law $30,924,124
Family & Support $26,367,348
Buildings $25,234,200
City Council $22,843,049
Community Development $17,910,303
Business Affairs $12,468,418
Doit $9,866,184
Mayor’s Office $7,531,149
Ipra $7,385,256
Procurement $6,639,596
Board Of Election $6,572,532
City Clerk $6,185,521
Cultural Affairs $5,950,580
Human Resources $5,313,360
Inspector Gen $4,680,780
Animal Contrl $3,683,492
Budget & Mgmt $3,587,580
Admin Hearng $2,996,364
Disabilities $2,259,888
Treasurer $1,991,577
Human Relations $1,457,052
Board Of Ethics $780,276
Police Board $159,948
License Appl Comm $71,292

How many employees are in each department?

count(chicago, department) %>%
  arrange(n) %>%
  mutate(n = scales::comma(n)) %>%
  knitr::kable(align="lr")
department n
License Appl Comm 1
Police Board 2
Board Of Ethics 9
Human Relations 16
Treasurer 22
Disabilities 28
Admin Hearng 39
Budget & Mgmt 39
Inspector Gen 57
Animal Contrl 67
Human Resources 70
Cultural Affairs 74
Ipra 82
Procurement 83
Mayor’s Office 86
City Clerk 94
Doit 102
Board Of Election 117
Business Affairs 161
Community Development 208
Buildings 262
City Council 396
Law 414
Health 529
Finance 577
Family & Support 654
General Services 930
Public Library 961
Transportn 1,073
Aviation 1,521
Water Mgmnt 1,840
Oemc 1,982
Streets & San 2,152
Fire 4,796
Police 12,618

Department employee/population breakdown:

count(chicago, department) %>%
  mutate(n = as.integer(2719000 / n)) %>%
  arrange(n) %>%
  mutate(n = scales::comma(n)) %>%
  rename(number_of_citizens_per_employee=n) %>%
  knitr::kable(align="lr")
department number_of_citizens_per_employee
Police 215
Fire 566
Streets & San 1,263
Oemc 1,371
Water Mgmnt 1,477
Aviation 1,787
Transportn 2,534
Public Library 2,829
General Services 2,923
Family & Support 4,157
Finance 4,712
Health 5,139
Law 6,567
City Council 6,866
Buildings 10,377
Community Development 13,072
Business Affairs 16,888
Board Of Election 23,239
Doit 26,656
City Clerk 28,925
Mayor’s Office 31,616
Procurement 32,759
Ipra 33,158
Cultural Affairs 36,743
Human Resources 38,842
Animal Contrl 40,582
Inspector Gen 47,701
Admin Hearng 69,717
Budget & Mgmt 69,717
Disabilities 97,107
Treasurer 123,590
Human Relations 169,937
Board Of Ethics 302,111
Police Board 1,359,500
License Appl Comm 2,719,000

Visualize the distribution of salaries by department.

ggplot(chicago, aes(department, employee_annual_salary, group=department)) +
  geom_boxplot(fill="steelblue", color="#b2b2b2") +
  hrbrthemes::scale_y_comma(limits=c(0,310000)) +
  coord_flip() +
  labs(x=NULL, y=NULL, title="City of Chicago Salary Distribution by Department") +
  hrbrthemes::theme_ipsum_rc(grid="X")

This is a different way to view the salary distributions which we’ll be covering in an upcoming class.

ggplot(chicago, aes(department, employee_annual_salary, group=department)) +
  ggbeeswarm::geom_quasirandom(shape=21, color="white", fill="steelblue", size=1.25, alpha=1/2, stroke=0.05) +
  hrbrthemes::scale_y_comma(limits=c(0,310000)) +
  coord_flip() +
  labs(x=NULL, y=NULL, title="City of Chicago Salary Distribution by Department") +
  hrbrthemes::theme_ipsum_rc(grid="X")