Recipe 11 Creating a Tag Cloud from Tweet Entities

11.1 Problem

You want to make a meaningless word cloud.

11.2 Solution

Use harvesting techniques shown in previous recipes and pass the cloud-destined entities to an R wordcloud package.

11.3 Discussion

Word clouds are virtually devoid of meaning. Neiman Lab went to far as to call them harmful. But, this recipe is in the Python version of the book (figures, eh?) and this was desingned to be a 1:1 mapping of said book, so let’s proceed.

The folowing uses some handy text taming and word cloud packages to make a collage from #NationalScienceFictionDay tweets:

library(kumojars) # hrbrmstr/kumojars
library(kumo) # hrbrmstr/kumo
scifi <- search_tweets("#NationalScienceFictionDay", n=1500)
data_frame(txt=str_replace_all(scifi$text, "#NationalScienceFictionDay", "")) %>%
unnest_tokens(word, txt) %>%
anti_join(stop_words, "word") %>%
anti_join(rtweet::stopwordslangs, "word") %>%
anti_join(data_frame(word=c("https", "")), "word") %>% # need to make a more technical stopwords list or clean up the text better
filter(nchar(word)>3) %>%
pull(word) %>%
paste0(collapse=" ") -> txt
cloud_img <- word_cloud(txt, width=800, height=500, min_font_size=10, max_font_size=60, scale="log")
image_write(cloud_img, "data/wordcloud.png")

But, seriously, don’t make word clouds except for fun.