21 Recipes for Mining Twitter Data with rtweet

Recipe 2 Looking Up the Trending Topics

2.1 Problem

You want to keep track of the trending topics on Twitter over a period of time.

2.2 Solution

Use rtweet::trends_available() to see the available trend regions and rtweet::get_trends() to pull the trends for one of them, then set up a task that retrieves and caches the trend data periodically.

2.3 Discussion

Twitter has extensive information on trending topics, and its API enables you to see topics that are trending globally or regionally. Twitter uses Yahoo! Where On Earth identifiers (WOEIDs) for these regions, and you can obtain them from rtweet::trends_available():

library(rtweet)
library(tidyverse)
(trends_avail <- trends_available())
## # A tibble: 467 x 8
##    name       url      parentid country woeid countryCode  code place_type
##  * <chr>      <chr>       <int> <chr>   <int> <chr>       <int> <chr>     
##  1 Worldwide  http://…        0 ""          1 <NA>           19 Supername 
##  2 Winnipeg   http://… 23424775 Canada   2972 CA              7 Town      
##  3 Ottawa     http://… 23424775 Canada   3369 CA              7 Town      
##  4 Quebec     http://… 23424775 Canada   3444 CA              7 Town      
##  5 Montreal   http://… 23424775 Canada   3534 CA              7 Town      
##  6 Toronto    http://… 23424775 Canada   4118 CA              7 Town      
##  7 Edmonton   http://… 23424775 Canada   8676 CA              7 Town      
##  8 Calgary    http://… 23424775 Canada   8775 CA              7 Town      
##  9 Vancouver  http://… 23424775 Canada   9807 CA              7 Town      
## 10 Birmingham http://… 23424975 United… 12723 GB              7 Town      
## # ... with 457 more rows
glimpse(trends_avail)
## Observations: 467
## Variables: 8
## $ name        <chr> "Worldwide", "Winnipeg", "Ottawa", "Quebec", "Mont...
## $ url         <chr> "http://where.yahooapis.com/v1/place/1", "http://w...
## $ parentid    <int> 0, 23424775, 23424775, 23424775, 23424775, 2342477...
## $ country     <chr> "", "Canada", "Canada", "Canada", "Canada", "Canad...
## $ woeid       <int> 1, 2972, 3369, 3444, 3534, 4118, 8676, 8775, 9807,...
## $ countryCode <chr> NA, "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA"...
## $ code        <int> 19, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7...
## $ place_type  <chr> "Supername", "Town", "Town", "Town", "Town", "Town...
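
If you ever need a raw WOEID directly, you can pull it out of this table. Here is a minimal sketch using dplyr (already loaded via the tidyverse); the value matches the Toronto row shown above:

# look up the WOEID for a specific place from trends_avail
filter(trends_avail, name == "Toronto")$woeid
## [1] 4118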

The Twitter API is somewhat unforgiving and unfriendly when you use it directly, since it requires a WOEID. Michael (Kearney, rtweet’s author) has made life much easier for us all by enabling the use of names or regular expressions when asking for trends from a particular place. That means we don’t even need to care about capitalization:

(us <- get_trends("united states"))
## # A tibble: 50 x 9
##    trend   url         promoted_content query   tweet_volume place   woeid
##  * <chr>   <chr>       <lgl>            <chr>          <int> <chr>   <int>
##  1 Teairr… http://twi… NA               %22Tea…           NA Unite… 2.34e7
##  2 #Wedne… http://twi… NA               %23Wed…       162658 Unite… 2.34e7
##  3 #NetNe… http://twi… NA               %23Net…        79857 Unite… 2.34e7
##  4 #Funny… http://twi… NA               %23Fun…           NA Unite… 2.34e7
##  5 Novart… http://twi… NA               Novart…       139836 Unite… 2.34e7
##  6 #TWcha… http://twi… NA               %23TWc…           NA Unite… 2.34e7
##  7 #Outla… http://twi… NA               %23Out…           NA Unite… 2.34e7
##  8 Haspel  http://twi… NA               Haspel        258046 Unite… 2.34e7
##  9 Annett… http://twi… NA               %22Ann…           NA Unite… 2.34e7
## 10 Carlos… http://twi… NA               %22Car…           NA Unite… 2.34e7
## # ... with 40 more rows, and 2 more variables: as_of <dttm>,
## #   created_at <dttm>
glimpse(us)
## Observations: 50
## Variables: 9
## $ trend            <chr> "Teairra Mari", "#WednesdayWisdom", "#NetNeut...
## $ url              <chr> "http://twitter.com/search?q=%22Teairra+Mari%...
## $ promoted_content <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ query            <chr> "%22Teairra+Mari%22", "%23WednesdayWisdom", "...
## $ tweet_volume     <int> NA, 162658, 79857, NA, 139836, NA, NA, 258046...
## $ place            <chr> "United States", "United States", "United Sta...
## $ woeid            <int> 23424977, 23424977, 23424977, 23424977, 23424...
## $ as_of            <dttm> 2018-05-09 20:06:44, 2018-05-09 20:06:44, 20...
## $ created_at       <dttm> 2018-05-09 20:01:03, 2018-05-09 20:01:03, 20...
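
The tweet_volume column makes it easy to see which of these trends are generating the most activity. A quick sketch (an NA volume means Twitter did not report one for that trend):

# rank the trends by reported tweet volume, highest first
us %>%
  arrange(desc(tweet_volume)) %>%
  select(trend, tweet_volume)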

Twitter’s documentation states that trends are updated every 5 minutes, so there is no point in calling the API more frequently than that. The current rate limit for this endpoint (Twitter restricts how often you can call certain API targets) is 75 requests per 15-minute window.
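
You can check where you stand against that limit with rtweet’s rate_limit() function. A small sketch, assuming your token is configured as in Recipe 1 ("trends/place" is the API endpoint behind get_trends()):

# check remaining calls and the reset time for the trends endpoint
rate_limit(query = "trends/place")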

The rtweet::get_trends() function returns a data frame. Our ultimate goal is to retrieve the trends data on a schedule and cache it. There are numerous, and usually complex, ways to schedule jobs. One cross-platform solution is to use R itself to run a task periodically. This means keeping an R console open and running at all times, so it is far from optimal; see the taskscheduleR package (and the standalone-script sketch after the loop below) for ideas on how to set up more robust scheduled jobs.

In this example, we will:

  • use a SQLite database to store the trends
  • use the DBI and RSQLite packages to work with this database
  • set up a never-ending loop, with Sys.sleep() providing a pause between requests
library(DBI)
library(RSQLite)
library(rtweet) # mkearney/rtweet

repeat {
  message("Retrieving trends...") # optional progress indicator
  us <- get_trends("united states")
  db_con <- dbConnect(RSQLite::SQLite(), "data/us-trends.db")
  # append=TRUE adds rows to the table instead of overwriting it, and
  # also creates the table on the first run if it does not exist yet
  dbWriteTable(db_con, "us_trends", us, append = TRUE)
  dbDisconnect(db_con)
  Sys.sleep(10 * 60) # sleep for 10 minutes
}
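
If you would rather not keep an interactive console open, the body of the loop works equally well as a standalone script that an external scheduler (cron, or Windows Task Scheduler via taskscheduleR) runs every 10 minutes. A minimal sketch; the file name is an assumption:

# fetch-trends.R -- hypothetical file name; schedule this externally
# instead of looping inside a live R session
library(DBI)
library(RSQLite)
library(rtweet)

us <- get_trends("united states")
db_con <- dbConnect(RSQLite::SQLite(), "data/us-trends.db")
dbWriteTable(db_con, "us_trends", us, append = TRUE)
dbDisconnect(db_con)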

Later on, we can look at this data with dplyr/dbplyr:

library(dplyr)

trends_db <- src_sqlite("data/us-trends.db")
us <- tbl(trends_db, "us_trends")
select(us, trend)
## # Source:   lazy query [?? x 1]
## # Database: sqlite 3.22.0
## #   [/Users/hrbrmstr/Development/21-recipes/data/us-trends.db]
##    trend                      
##    <chr>                      
##  1 #TuesdayThoughts           
##  2 #backtowork                
##  3 #SavannahHodaTODAY         
##  4 Justin Timberlake          
##  5 #MyTVShowWasCanceledBecause
##  6 #AM2DM                     
##  7 The Trump Effect           
##  8 Carrie Underwood           
##  9 Sean Ryan                  
## 10 Larry Krasner              
## # ... with more rows
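
Because the table accumulates one row per trend per snapshot, you can also ask which topics stuck around the longest. A quick sketch (dbplyr translates count() into SQL and runs it inside SQLite):

# how many snapshots did each trend appear in?
us %>%
  count(trend, sort = TRUE)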

2.4 See Also

  • RSQLite quick reference
  • Introduction to dbplyr: http://dbplyr.tidyverse.org/articles/dbplyr.html