Recipe 22 Visualizing Intersecting Follower Sets with UpSetR

22.1 Problem

You want to examine the intersection of twitter followers between a group of definied twitter handles.

22.2 Solution

  • Scrape all follower ID’s for each handle
  • Combine into one dataframe
  • Create de-duplicated list of all followers
  • Build a logical matrix to indicate if each follower follows each handle or not
  • Plot the intersecting sets with UpSetR

22.3 Discussion

Set visualization, typically done using Venn diagrams, can become challenging when the number of sets exceeds a a trivial threshold. To address this, the UpSet project was born.

A novel visualization technique for the quantitative analysis of sets, their intersections, and aggregates of intersections.

Thankfully, there is an R package version of the project that we can use with follower data pulled with rtweet. UpSetR requires the data to be in a binary matrix format, so there is some data wrangling work to be done before we can visualize.

First we will make a list of twitter handles we want to compare then scrape all of their followers into a one dataframe using a get_followers function inside a purrr::map_df call. Set n to a number => the max follower count in your set and retryonratelimit = TRUE to ensure you capture all followers. This may take some time depending on how may followers you are scraping.

## # A tibble: 6 x 2
##   user_id            account  
##   <chr>              <chr>    
## 1 347516627          dataandme
## 2 86786449           dataandme
## 3 901331045353754624 dataandme
## 4 916260332598906881 dataandme
## 5 245066121          dataandme
## 6 828212430345994240 dataandme
## # A tibble: 6 x 2
##   user_id    account  
##   <chr>      <chr>    
## 1 92966425   thomasp85
## 2 3295717490 thomasp85
## 3 219555432  thomasp85
## 4 113125081  thomasp85
## 5 2282250918 thomasp85
## 6 2336317420 thomasp85

Next we form a binary matrix by using an ifelse inside another map_df to ascertain whether or not each follower in the master list follows each of the twitter handles.

## Observations: 58,813
## Variables: 7
## $ dataandme  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ JennyBryan <dbl> 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, ...
## $ hrbrmstr   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, ...
## $ xieyihui   <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ drob       <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, ...
## $ juliasilge <dbl> 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, ...
## $ thomasp85  <dbl> 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, ...

Finally, we let UpSetR work its magic on the matrix and visualize the intersections…

22.4 See Also