52Vis Week 2 (2016 Week #14) – Honing in on the Homeless

> UPDATE: Since I put a “pull request” requirement in, I wanted to include a link to getting started with GitHub. Dr. Jenny Bryan’s @stat545 has a great [section on git](https://stat545-ubc.github.io/git00_index.html) that should hopefully make it a bit less painful.

### Why 52Vis?

In case folks are wondering why I’m doing this, it’s pretty simple. We need a society that has high data literacy and we need folks who are capable of making awesome, truthful data visualizations. The only way to do that is by working with data over, and over, and over, and over again.

Directed projects with some reward are one of the best Pavlovian ways to accomplish that :-)

### This week’s challenge

The Data is Plural folks have [done it again](http://tinyletter.com/data-is-plural/letters/data-is-plural-2016-04-06-edition) and there’s a neat and important data set in this week’s vis challenge.

From their newsletter:

>_Every January, at the behest of the U.S. Department of Housing and Urban Development, volunteers across the country attempt to count the homeless in their communities. The result: HUD’s “point in time” estimates, which are currently available for 2007–2015. The most recent estimates found 564,708 homeless people nationwide, with 75,323 of that count (more than 13%) living in New York City._

I decided to take a look at this data by seeing which states had the worst homeless problem per capita (i.e. per 100K population). I’ve included the population data along with some ready-made wrangling of the HUD data.
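(A quick illustration of the metric, since “per 100K” trips some folks up: divide a raw count by the population, then scale by 100,000. A minimal sketch using the national count from the newsletter quote above and a rough ~321M 2015 U.S. population figure; that’s my ballpark, not a value from the HUD file:)

```r
# "per 100K" = (count / population) * 100,000
# 564,708 is from the newsletter quote; ~321M is a rough 2015 US population
(564708 / 321e6) * 100000
## ~176 homeless per 100K nationally
```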

But, before we do that…

### RULES UPDATE + Last week’s winner

I’ll be announcing the winner on Thursday since I:

- am horribly sick after being exposed to who-knows-what at rOpenSci last week in SFO :-)
- have been traveling like mad this week
- need to wrangle all the answers into the GitHub repo and get @laneharrison (and his students) to validate my choice for winner (I have picked a winner)

Given how hard the wrangling has been, I’m going to need to request that folks both leave a blog comment and file a PR to [the GitHub repo](https://github.com/52vis/2016-14) for this week. Please include the code you used as well as the vis (or a link to a working interactive vis).

### PRIZES UPDATE

Not only can I offer [Data-Driven Security](http://dds.ec/amzn), but Hadley Wickham has offered signed copies of his books as well, and I’ll keep the Amazon gift card in as a catch-all in case you already have those. (NOTE: if any other authors want to offer up their tomes, shoot me a note!)

### No place to roam

Be warned: this is a pretty depressing data set. I went in wanting to know which “states” had the worst problem, and I assumed it’d be California or New York. I had no idea it would be what it was, and the exercise shattered some assumptions.

NOTE: I’ve included U.S. population data for the necessary time period.
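The script below expects that file (`uspop.csv`) to be wide, with one `X`-prefixed column per year. That shape is inferred from the `gather()` call later on, and these values are made up purely for illustration:

```r
library(tidyr)

# hypothetical two-row sketch of the assumed uspop.csv shape
# (the real file covers all states/territories and years 2007-2015)
uspop_demo <- data.frame(
  name=c("Alabama", "Alaska"),
  iso_3166_2=c("AL", "AK"),
  X2007=c(4600000, 680000),  # made-up populations, for illustration only
  X2015=c(4850000, 740000),
  stringsAsFactors=FALSE
)

# wide -> long, one row per state/year (the same call the full script uses)
gather(uspop_demo, year, population, -name, -iso_3166_2)
```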

```r
library(readxl)
library(purrr)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(scales)
library(grid)
library(hrbrmisc)

# grab the HUD homeless data

URL <- "https://www.hudexchange.info/resources/documents/2007-2015-PIT-Counts-by-CoC.xlsx"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil, mode="wb")

# turn the excel tabs into a long data.frame
# (sheet 1 is 2015, sheet 2 is 2014, ..., sheet 9 is 2007)
yrs <- 2015:2007
names(yrs) <- 1:9
homeless <- map_df(names(yrs), function(i) {
  df <- suppressWarnings(read_excel(fil, as.numeric(i)))
  df[,3:ncol(df)] <- suppressWarnings(lapply(df[,3:ncol(df)], as.numeric))
  new_names <- tolower(make.names(colnames(df)))
  new_names <- str_replace_all(new_names, "\\.+", "_")
  df <- setNames(df, str_replace_all(new_names, "_[[:digit:]]+$", ""))
  bind_cols(df, data_frame(year=rep(yrs[i], nrow(df))))
})

# clean it up a bit (the state is the first two letters of the CoC number)
homeless <- mutate(homeless,
                   state=str_match(coc_number, "^([[:alpha:]]{2})")[,2],
                   coc_name=str_replace(coc_name, " CoC$", ""))
homeless <- select(homeless, year, state, everything())
homeless <- filter(homeless, !is.na(state))

# read in the us population data and make it long
uspop <- read.csv("uspop.csv", stringsAsFactors=FALSE)
uspop_long <- gather(uspop, year, population, -name, -iso_3166_2)
uspop_long$year <- sub("X", "", uspop_long$year)

# normalize the values (total homeless per 100K population)
states <- count(homeless, year, state, wt=total_homeless)
states <- left_join(states, albersusa::usa_composite()@data[,3:4], by=c("state"="iso_3166_2"))
states <- ungroup(filter(states, !is.na(name)))
states$year <- as.character(states$year)
states <- mutate(left_join(states, uspop_long), homeless_per_100k=(n/population)*100000)

# we want to order the facets from worst to best
group_by(states, name) %>%
  summarise(mean=mean(homeless_per_100k, na.rm=TRUE)) %>%
  arrange(desc(mean)) -> ordr

states$year <- factor(states$year, levels=as.character(2006:2016))
states$name <- factor(states$name, levels=ordr$name)

# plot
#+ fig.retina=2, fig.width=10, fig.height=15
gg <- ggplot(states, aes(x=year, y=homeless_per_100k))
gg <- gg + geom_segment(aes(xend=year, yend=0), size=0.33)
gg <- gg + geom_point(size=0.5)
gg <- gg + scale_x_discrete(expand=c(0,0),
                            breaks=seq(2007, 2015, length.out=5),
                            labels=c("2007", "", "2011", "", "2015"),
                            drop=FALSE)
gg <- gg + scale_y_continuous(expand=c(0,0), labels=comma, limits=c(0,1400))
gg <- gg + labs(x=NULL, y=NULL,
                title="US Department of Housing & Urban Development (HUD) Total (Estimated) Homeless Population",
                subtitle="Counts aggregated from HUD Communities of Care Regional Surveys (normalized per 100K population)",
                caption="Data from: https://www.hudexchange.info/resource/4832/2015-ahar-part-1-pit-estimates-of-homelessness/")
gg <- gg + facet_wrap(~name, scales="free", ncol=6)
gg <- gg + theme_hrbrmstr_an(grid="Y", axis="", strip_text_size=9)
gg <- gg + theme(axis.text.x=element_text(size=8))
gg <- gg + theme(axis.text.y=element_text(size=7))
gg <- gg + theme(panel.margin=unit(c(10, 10), "pt"))
gg <- gg + theme(panel.background=element_rect(color="#97cbdc44", fill="#97cbdc44"))
gg <- gg + theme(plot.margin=margin(10, 20, 10, 15))
gg
```

(Figure: the resulting small-multiples chart of per-capita homeless estimates, 2007–2015, one panel per state, ordered worst to best)
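If you want to write the chart out at the size the chunk options request (`fig.width=10`, `fig.height=15`), something like this should do it (the filename is just a placeholder):

```r
# save the small-multiples chart at the dimensions the knitr chunk asks for
ggsave("homeless-per-100k.png", gg, width=10, height=15)
```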

I used one of the alternate colors from HUD’s official palette for the panel backgrounds.
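(That fill is an 8-digit hex color: the HUD blue `#97cbdc` with an alpha byte of `0x44`, i.e. 68/255 ≈ 27% opacity. If you’d rather not hand-craft the alpha byte, `scales::alpha()` builds the same string:)

```r
# "#97cbdc44" is "#97cbdc" with alpha 0x44 (68/255, ~27% opaque)
scales::alpha("#97cbdc", 68/255)
## [1] "#97CBDC44"
```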

Remember, this challenge is language/tool-agnostic. Go in with a good question or two, augment the data as you feel you need to, and show us your vis!

Week 2’s contest closes 2016-04-12 at 23:59 EDT.

Contest GitHub Repo: [https://github.com/52vis/2016-14](https://github.com/52vis/2016-14)

