Skip navigation

Category Archives: twitter

Way back in July of 2009, the first version of the twitteR package was published by Geoff Jentry in CRAN. Since then it has seen 28 updates, finally breaking the 0.x.y barrier into 1.x.y territory in March of 2013 and receiving it’s last update in July of 2015.

For a very long time, the twitteR package was the way to siphon precious nuggets of 140 character data from that platform and is the top hit when one searches for r twitter package. It even ha[sd] it’s own mailing list and is quite popular, judging by RStudio’s CRAN logs total downloads stats .

I blog today to suggest there is a better way to work with Twitter data from R, especially if your central use-case is searching Twitter and mining tweet data. This new way is rtweet by Michael Kearney. It popped up on the scene back in August of 2016 and receives quite a bit of ? from the developer, especially on GitHub.

This post is short and mostly designed to convince you to (a) try out the package and (b) blog and tweet about the package — if you do agree that it’s the best modern way to work with Twitter from R — to raise awareness about it. Because of that focus, I won’t be delving into all of rtweet‘s seekrits, but you can explore them yourself on it’s spiffy pkgdown site.

While both packages have nigh complete access to the Twitter API, I posit that the quintessential use-case for working with Twitter in R is searching through tweets/users and then performing various types of data mining on the retrieved results. To that end, I’m going to show one use-case (out of many potential ones) that will both save you API-time and post-API munging time in order to convince you to switch to rtweet and spread the word about it.

Data-mining 300 #rstats Tweets : A Play in Two Acts

We’ll search Twitter for #rstats-tagged tweets with both twitteR and rtweet, starting with the former:

library(twitteR)

# this relies on you setting up an app in apps.twitter.com
setup_twitter_oauth(
  consumer_key = Sys.getenv("TWITTER_CONSUMER_KEY"), 
  consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET")
)

r_folks <- searchTwitter("#rstats", n=300)

str(r_folks, 1)
## List of 300
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..and 53 methods, of which 39 are  possibly relevant
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..and 53 methods, of which 39 are  possibly relevant
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..and 53 methods, of which 39 are  possibly relevant

str(r_folks[1])
## List of 1
##  $ :Reference class 'status' [package "twitteR"] with 17 fields
##   ..$ text         : chr "RT @historying: Wow. This is an enormously helpful tutorial by @vivalosburros for anyone interested in mapping "| __truncated__
##   ..$ favorited    : logi FALSE
##   ..$ favoriteCount: num 0
##   ..$ replyToSN    : chr(0) 
##   ..$ created      : POSIXct[1:1], format: "2017-10-22 17:18:31"
##   ..$ truncated    : logi FALSE
##   ..$ replyToSID   : chr(0) 
##   ..$ id           : chr "922150185916157952"
##   ..$ replyToUID   : chr(0) 
##   ..$ statusSource : chr "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>"
##   ..$ screenName   : chr "jasonrhody"
##   ..$ retweetCount : num 3
##   ..$ isRetweet    : logi TRUE
##   ..$ retweeted    : logi FALSE
##   ..$ longitude    : chr(0) 
##   ..$ latitude     : chr(0) 
##   ..$ urls         :'data.frame': 0 obs. of  4 variables:
##   .. ..$ url         : chr(0) 
##   .. ..$ expanded_url: chr(0) 
##   .. ..$ dispaly_url : chr(0) 
##   .. ..$ indices     : num(0) 
##   ..and 53 methods, of which 39 are  possibly relevant:
##   ..  getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet, getLatitude, getLongitude, getReplyToSID,
##   ..  getReplyToSN, getReplyToUID, getRetweetCount, getRetweeted, getRetweeters, getRetweets, getScreenName,
##   ..  getStatusSource, getText, getTruncated, getUrls, initialize, setCreated, setFavoriteCount, setFavorited, setId,
##   ..  setIsRetweet, setLatitude, setLongitude, setReplyToSID, setReplyToSN, setReplyToUID, setRetweetCount,
##   ..  setRetweeted, setScreenName, setStatusSource, setText, setTruncated, setUrls, toDataFrame, toDataFrame#twitterObj

Both packages follow the similar idioms and you need to have done some prep-work by creating a Twitter “app” (both packages have instructions for that).

That operation took about 3 seconds on a fast internet connection and wicked fast computer. What you get back is definitely usable data, but it’s in lists of custom objects. This is due to the way that package models the Twitter API on to custom R objects. It’s elegant, but also likely overkill for most operations. You can use something like purrr::map_df(r_folks, as.data.frame) to get that list into a data frame, there are some other “gotchas”, such as text encoding (on a later run of this code both dplyr::glimpse() and str() gave me “invalid multibyte string” errors but that same thing did not happen with rtweet.

Here’s the rtweet version:

library(rtweet)

# this relies on you setting up an app in apps.twitter.com
create_token(
  app = Sys.getenv("TWITTER_APP"),
  consumer_key = Sys.getenv("TWITTER_CONSUMER_KEY"), 
  consumer_secret = Sys.getenv("TWITTER_CONSUMER_SECRET")
) -> twitter_token

saveRDS(twitter_token, "~/.rtweet-oauth.rds")

# ideally put this in ~/.Renviron
Sys.setenv(TWITTER_PAT=path.expand("~/.rtweet-oauth.rds"))

rtweet_folks <- search_tweets("#rstats", n=300)

dplyr::glimpse(rtweet_folks)
## Observations: 300
## Variables: 35
## $ screen_name                    <chr> "AndySugs", "jsbreker", "__rahulgupta__", "AndySugs", "jasonrhody", "sibanjan...
## $ user_id                        <chr> "230403822", "703927710", "752359265394909184", "230403822", "14184263", "863...
## $ created_at                     <dttm> 2017-10-22 17:23:13, 2017-10-22 17:19:48, 2017-10-22 17:19:39, 2017-10-22 17...
## $ status_id                      <chr> "922151366767906819", "922150507745079297", "922150470382125057", "9221504090...
## $ text                           <chr> "RT:  (Rbloggers)Markets Performance after Election: Day 239  https://t.co/D1...
## $ retweet_count                  <int> 0, 0, 9, 0, 3, 1, 1, 57, 57, 103, 10, 10, 0, 0, 0, 34, 0, 0, 642, 34, 1, 1, 1...
## $ favorite_count                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ is_quote_status                <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, ...
## $ quote_status_id                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ is_retweet                     <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, F...
## $ retweet_status_id              <chr> NA, NA, "922085241493360642", NA, "921782329936408576", "922149318550843393",...
## $ in_reply_to_status_status_id   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ in_reply_to_status_user_id     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ in_reply_to_status_screen_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ lang                           <chr> "en", "en", "en", "en", "en", "en", "en", "en", "en", "en", "en", "en", "ro",...
## $ source                         <chr> "IFTTT", "Twitter for iPhone", "GaggleAMP", "IFTTT", "Twitter for Android", "...
## $ media_id                       <chr> NA, "922150500237062144", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "92...
## $ media_url                      <chr> NA, "http://pbs.twimg.com/media/DMwi_oQUMAAdx5A.jpg", NA, NA, NA, NA, NA, NA,...
## $ media_url_expanded             <chr> NA, "https://twitter.com/jsbreker/status/922150507745079297/photo/1", NA, NA,...
## $ urls                           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ urls_display                   <chr> "ift.tt/2xe1xrR", NA, NA, "ift.tt/2xe1xrR", NA, "bit.ly/2yAAL0M", "bit.ly/2yA...
## $ urls_expanded                  <chr> "http://ift.tt/2xe1xrR", NA, NA, "http://ift.tt/2xe1xrR", NA, "http://bit.ly/...
## $ mentions_screen_name           <chr> NA, NA, "DataRobot", NA, "historying vivalosburros", "NoorDinTech ikashnitsky...
## $ mentions_user_id               <chr> NA, NA, "622519917", NA, "18521423 304837258", "2511247075 739773414316118017...
## $ symbols                        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ hashtags                       <chr> "rstats DataScience", "Rstats ACSmtg", "rstats", "rstats DataScience", "rstat...
## $ coordinates                    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ place_id                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ place_type                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ place_name                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ place_full_name                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ country_code                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ country                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ bounding_box_coordinates       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ bounding_box_type              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...

That took about 1.5 seconds and provides a tidy, immediately usable data structure.

But, that’s not all!

Michael has support for accessing the Twitter streaming API and also has some handy plot functions for quickly exploring retrieved Twitter content, enabling you to make pretty spiffy plots like this with almost no effort (this was pirated from Michael’s package website):

Fin

If the legacy twitteR package is already in your workflows, there may be little to gain. But, I would suggest that R folks give rtweet a try and blog about your experiences. It’ll give others a chance to see usage in different contacts and will also help spread the word about this alternative package and help bump up it’s pagerank.

As I said on Twitter:

Make sure to try out the GitHub version as well since it has gained some new functionality not currently in CRAN and don’t hesitate to ping @kearneymw on Twitter (though he may regret me suggesting that :-).

If this feature was in Mountain Lion Developer Previews prior to revision 4, I didn’t notice it, but you can now tweet directly from the Notification Center “pane”.

You first need to make sure said feature is enabled in System Preferences (note the use of the “old” Notification Center icon in the preference pane icon set…perhaps Apple will change that prior to the final Mountain Lion build):

If it is enabled, you now have a Twitter – er – “button” [?] in the Notification Center pane, and clicking it gives you a tweet box where you can drop your 140 from all your connected Twitter accounts.

I suspect there will be similar Facebook posting features integrated, but I’m not part of the Facebook zombie horde and have no way of testing it.

I also noticed that Chrome now has tighter integration with the Notification Center than it did before (as in, I actually saw notifications pop up from Google Mail) and both the Beta channel and Canary builds have integrated the functionality:

I recently finished watching “Japan: Memoirs of a Secret Empire” [ Netflix | PBS | Amazon ]. It’s a three-part documentary that centers around the 16th & 17th centuries (where a vast majority of my favorite samurai movies are set).

In the third part there is a segment on the four-tier class system. It goes into some detail on how the decline of the samurai (due to a time of significant peace) and the increase in trade facilitated the erosion of societal taboos that previously prevented classes from mingling. Apart from the creation of a new merchant-inspired class and an increase in the frequenting of courtesans there was also the mention of “haiku clubs” where (quoting from the documentary) “members chose pen names to obscure their social rank. That way, the classes could mingle freely“.

In a similar way, Twitter (and many forums before it) is the great equalizer with the added similarity of brevity (and often a gravity well for some modern day haiku fanatics). I don’t have any statistics, but I do wonder if Twitter is having a more equalizing impact on societies where class and overt discrimination are still wildly prevalent (think India or Saudi Arabia).

as avatars tweet
modern words become equal
history repeats