Recipe 11 Creating a Tag Cloud from Tweet Entities

11.1 Problem

You want to make a meaningless word cloud.

11.2 Solution

Use the harvesting techniques shown in previous recipes and pass the cloud-destined entities to an R word cloud package.
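Since the title promises a cloud built from tweet *entities*, here is a minimal sketch of tallying hashtags instead of free text. It assumes tweets arrive with hashtags in a list-column named `hashtags`, as older rtweet releases returned them. Newer rtweet versions nest entities differently, so check your version. The three toy tweets are made up so the example runs offline.

```r
library(tidyverse)

# Hypothetical toy data shaped like an older rtweet result: one list-column of hashtags per tweet
tweets <- tibble(
  status_id = c("1", "2", "3"),
  hashtags = list(
    c("NationalScienceFictionDay", "Asimov"),
    c("nationalsciencefictionday"),
    c("Dune", "Asimov")
  )
)

tweets %>%
  unnest(cols = hashtags) %>%              # one row per (tweet, hashtag) pair
  mutate(tag = str_to_lower(hashtags)) %>% # hashtags are case-insensitive, so fold case
  count(tag, sort = TRUE) -> tag_freqs     # frequency per hashtag, most common first

# Turn the counts back into a space-separated string, the same kind of input
# the recipe below builds from tweet text
tag_txt <- paste0(rep(tag_freqs$tag, tag_freqs$n), collapse = " ")
```

Folding case before counting matters here: without it, `NationalScienceFictionDay` and `nationalsciencefictionday` would show up as two separate words in the cloud.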

11.3 Discussion

Word clouds are virtually devoid of meaning. Neiman Lab went so far as to call them harmful. But this recipe is in the Python version of the book (figures, eh?), and this book was designed to be a 1:1 mapping of that one, so let's proceed.

The following uses some handy text-taming and word cloud packages to make a collage from #NationalScienceFictionDay tweets:

library(rtweet)
library(tidytext)
library(magick)
library(kumojars) # hrbrmstr/kumojars
library(kumo) # hrbrmstr/kumo
library(tidyverse)

scifi <- search_tweets("#NationalScienceFictionDay", n=1500)

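tibble(txt = str_replace_all(scifi$text, regex("#NationalScienceFictionDay", ignore_case = TRUE), "")) %>% # drop the search hashtag in any capitalization
  unnest_tokens(word, txt) %>% # one lowercased word per row
  anti_join(stop_words, by = "word") %>% # common English stop words
  anti_join(rtweet::stopwordslangs, by = "word") %>% # rtweet's multi-language stop words
  anti_join(tibble(word = c("https", "t.co")), by = "word") %>% # need to make a more technical stopwords list or clean up the text better
  filter(nchar(word) > 3) %>% # drop very short tokens
  pull(word) %>%
  paste0(collapse = " ") -> txt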

cloud_img <- word_cloud(txt, width=800, height=500,  min_font_size=10, max_font_size=60, scale="log")

image_write(cloud_img, "data/wordcloud.png")
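The comment in the pipeline above admits the link cleanup is a hack: `https` and `t.co` fragments are filtered out only after tokenizing. Here is a minimal sketch of the "clean up the text better" route. It strips URLs, @-mentions and hashtags with regular expressions *before* tokenizing, then counts words, which is the frequency table a word cloud ultimately encodes. The sample tweets are made up so it runs without API access, and the regexes are simple assumptions, not a complete tweet-cleaning solution.

```r
library(tidyverse)
library(tidytext)

# Hypothetical sample tweets standing in for scifi$text
tweets <- c(
  "Happy #NationalScienceFictionDay! Reading Asimov today https://t.co/abc123",
  "@someone Asimov and Heinlein forever #nationalsciencefictionday",
  "Dune is the greatest science fiction novel https://t.co/xyz789"
)

tibble(txt = tweets) %>%
  mutate(
    txt = str_remove_all(txt, "https?://\\S+"), # drop shortened links before tokenizing
    txt = str_remove_all(txt, "[@#]\\w+")       # drop @-mentions and hashtags
  ) %>%
  unnest_tokens(word, txt) %>%                  # one lowercased word per row
  anti_join(stop_words, by = "word") %>%        # common English stop words
  filter(nchar(word) > 3) %>%                   # same short-token filter as above
  count(word, sort = TRUE) -> word_freqs        # word frequencies, most common first

word_freqs
```

Because the links are gone before tokenizing, no `https`/`t.co` stopword entries are needed, and the link IDs (`abc123`, `xyz789`) never become "words" either.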

But seriously, don’t make word clouds except for fun.