
Category Archives: twitter

(this is an unrolled Twitter thread converted to the blog since one never knows how long content will be preserved anywhere anymore)

It looks like @StackPath (NetCDN[.]com redirects to them) is enabling insurrection-mongers. They’re fronting news[.]parler[.]com .

It seems they (Parler) have a second domain dicecrm[.]com with the actual content, too.

dicecrm[.]com is hosted in @awscloud, so it looks like Parler folks are smarter than Bezos’ minions. Amazon might want to take this down before it gets going (again).

They load JS via @Google Tag Manager (you can see it in the HTML src). The GA_MEASUREMENT_ID is “G-P76KHELPLT”.

BGP Info for the IPs associated with the domain

In site source screenshot in the first tweet there’s a reference to twexit[.]com. DNS for it shows they also have leftwexit[.]com (which is a very odd site).

"Twexit" is being enabled by @awscloud @GoDaddy and @WordPress/@automattic plus @StackPath.

While the main page has (unsurprisingly) busted HTML, they’re using their old sitemap[.]xml — https://carbon.now.sh/mdyJbvddCvZaGu2tOnD6 — which has a singular recent (whining) entry: http://dicecrm[.]com/updates/facebook-continues-their-confusing-hypocritical-stifling-of-free-speech-

Looks like @Shareaholic is also enabling Parler. Their “shareaholic:site_id” is “f7b53d75b2e7afdc512ea898bbbff585”.

shareaholic id capture

One of the CDN content refs is this (attached img). It’s loading content for Parler from free[ ]pressers[.]com, which is a pretty nutjob fake news site enabled by @IBMcloud (so IBM is enabling Parler as well). The free[ ]pressers Twitter is equally nutjob.

I suspect Parler is going to keep rejiggering this nutjob-fueled content network, knowing that AWS, IBM (et al.) won't play whack-a-mole and are really just waiting for our collective memory and attention to fade so they can go back to making $ from divisiveness, greed, & hate.

protip: perhaps don’t spin up a new FQDN with such hastily-crafted garbage behind it when you know lots of very technically-resourced 👀 are on you.

Originally tweeted by (@hrbrmstr) on 2021-01-29.

I caught this tweet by Terence Eden about using Twitter image alt-text to “PGP sign” tweets, and my mind immediately went to “how can I abuse this for covert communications, malicious command-and-control, and embedding R code in tweets?”.

When you paste or upload an image into a tweet (in the web interface, at least) you have an opportunity to add “alt” text which is — in theory — supposed to help communicate the content of the image to folks using assistive technology. Terence figured out the alt-text limit on Twitter is large (~1K characters), which is plenty of room for useful R code.

I poked around for something to use as an example and settled on using data from COVID Stimulus Watch. The following makes the chart in this tweet — https://twitter.com/hrbrmstr/status/1261641887603179520.

I’m not posting the chart here b/c it’s nothing special, but the code for it is below.

library(hrbrthemes);

x <- read.csv("https://data.covidstimuluswatch.org/prog.php?&detail=export_csv")[,3:5];

x[,3] <- as.numeric(gsub("[$,]","",x[,3]));
x <- x[(x[,1]>20200400)&x[,3]>0,];
x[,1] <- as.Date(as.character(x[,1]),"%Y%m%d");

ggplot(x, aes(Award.Date, Grant.Amount, fill=Award.Type)) +
  geom_col() +
  scale_y_comma(
    labels = c("$0", "$5bn", "$10bn", "$15bn")
  ) +
  labs(
    title = "COVID Stimulus Watch: Grants",
    caption = "Source: https://data.covidstimuluswatch.org/prog.php?detail=opening"
  ) +
  theme_ipsum_es(grid="XY")

Semicolons are necessary b/c newlines are going to get stripped when we paste that code block into the alt-text entry box.

We can read that code back into R with some help from read_html() & {styler}:

library(rtweet)
library(rvest)
library(stringi)
library(purrr)
library(magrittr)

pg <- read_html("https://twitter.com/hrbrmstr/status/1261641887603179520")

html_nodes(pg, "img") %>% 
  html_attr("alt") %>% 
  keep(stri_detect_fixed, "library") %>% 
  styler::style_text()
library(hrbrthemes)
x <- read.csv("https://data.covidstimuluswatch.org/prog.php?&detail=export_csv")[, 3:5]
x[, 3] <- as.numeric(gsub("[$,]", "", x[, 3]))
x <- x[(x[, 1] > 20200400) & x[, 3] > 0, ]
x[, 1] <- as.Date(as.character(x[, 1]), "%Y%m%d")
ggplot(x, aes(Award.Date, Grant.Amount, fill = Award.Type)) +
  geom_col() +
  scale_y_comma(
    labels = c("$0", "$5bn", "$10bn", "$15bn")
  ) +
  labs(
    title = "COVID Stimulus Watch: Grants",
    caption = "Source: https://data.covidstimuluswatch.org/prog.php?detail=opening"
  ) +
  theme_ipsum_es(grid = "XY")

Twitter’s API does not seem to return alt-text (see UPDATE):

rtweet::lookup_statuses("1261641887603179520") %>% 
  jsonlite::toJSON(pretty=TRUE)
## [
##   {
##     "user_id": "5685812",
##     "status_id": "1261641887603179520",
##     "created_at": "2020-05-16 12:57:20",
##     "screen_name": "hrbrmstr",
##     "text": "Twitter's img alt-text limit is YUGE! So, we can abuse it for semi-covert comms channels, C2, or for \"embedding\" the code ## that makes this chart!\n\nUse `read_html()` on URL of this tweet; find 'img' nodes w/html_nodes(); extract 'alt' attr text w/## html_attr(). #rstats \n\nh/t @edent https://t.co/v5Ut8TzlRO",
##     "source": "Twitter Web App",
##     "display_text_width": 278,
##     "is_quote": false,
##     "is_retweet": false,
##     "favorite_count": 8,
##     "retweet_count": 2,
##     "hashtags": ["rstats"],
##     "symbols": [null],
##     "urls_url": [null],
##     "urls_t.co": [null],
##     "urls_expanded_url": [null],
##     "media_url": ["http://pbs.twimg.com/media/EYI_W-xWsAAZFeP.png"],
##     "media_t.co": ["https://t.co/v5Ut8TzlRO"],
##     "media_expanded_url": ["https://twitter.com/hrbrmstr/status/1261641887603179520/photo/1"],
##     "media_type": ["photo"],
##     "ext_media_url": ["http://pbs.twimg.com/media/EYI_W-xWsAAZFeP.png"],
##     "ext_media_t.co": ["https://t.co/v5Ut8TzlRO"],
##     "ext_media_expanded_url": ["https://twitter.com/hrbrmstr/status/1261641887603179520/photo/1"],
##     "mentions_user_id": ["14054507"],
##     "mentions_screen_name": ["edent"],
##     "lang": "en",
##     "geo_coords": ["NA", "NA"],
##     "coords_coords": ["NA", "NA"],
##     "bbox_coords": ["NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA"],
##     "status_url": "https://twitter.com/hrbrmstr/status/1261641887603179520",
##     "name": "boB • Everywhere is Baltimore • Rudis",
##     "location": "Doors & Corners",
##     "description": "Don't look at me…I do what he does—just slower. 🇷 #rstats avuncular • pampa • #tired • 👨‍🍳 • ✝️ • Prìomh ## Neach-saidheans Dàta @ @rapid7",
##     "url": "https://t.co/RgY1wHjoqM",
##     "protected": false,
##     "followers_count": 11886,
##     "friends_count": 458,
##     "listed_count": 667,
##     "statuses_count": 84655,
##     "favourites_count": 15140,
##     "account_created_at": "2007-05-01 14:04:24",
##     "verified": true,
##     "profile_url": "https://t.co/RgY1wHjoqM",
##     "profile_expanded_url": "https://rud.is/b",
##     "profile_banner_url": "https://pbs.twimg.com/profile_banners/5685812/1398248552",
##     "profile_background_url": "http://abs.twimg.com/images/themes/theme15/bg.png",
##     "profile_image_url": "http://pbs.twimg.com/profile_images/824974380803334144/Vpmh_s3x_normal.jpg"
##   }
## ]

but I still need to poke over at the API docs to figure out if there is a way to get it more programmatically. (see UPDATE)

If we want to be incredibly irresponsible and daft (like a recently semi-shuttered R package installation service) we can throw caution to the wind and just plot it outright:

library(rtweet)
library(rvest)
library(stringi)
library(purrr)
library(magrittr)

pg <- read_html("https://twitter.com/hrbrmstr/status/1261641887603179520")

html_nodes(pg, "img") %>% 
  html_attr("alt") %>% 
  keep(stri_detect_fixed, "library") %>% 
  textConnection() %>% 
  source() %>% # THIS IS DANGEROUS DO NOT TRY THIS AT HOME
  print()

Seriously, though, don’t do that. Lots of bad things can happen when you source() from the internet.

UPDATE (2020-05-17)

You can use:

paste(as.character(parse(text = "...")), collapse = "; ")

to “minify” R code for the alt-text.
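For example, here’s a minimal sketch of that round trip (the snippet and the file name are just stand-ins):

code <- '
library(hrbrthemes)
x <- read.csv("grants.csv")[, 3:5]
x[, 3] <- as.numeric(gsub("[$,]", "", x[, 3]))
'

# parse() splits the text into complete expressions; deparsing each one and
# collapsing with "; " yields a single line that survives newline-stripping
paste(as.character(parse(text = code)), collapse = "; ")
# yields something like: 'library(hrbrthemes); x <- read.csv("grants.csv")[, 3:5]; ...'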

And, you can use https://github.com/hrbrmstr/rtweet until it is PR’d back into {rtweet} proper to send tweets with image alt-text. The “status” functions also return any alt-text in a new ext_alt_text column.

FIN

Now, you can make your Twitter charts reproducible on-platform (until Twitter does something to thwart this new communication and file-sharing channel).

Since Twitter status URLs are just GET requests, orgs should consider running the content of those URLs through alt-text extractors just in case there’s some funny business going on across user endpoints.
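A bare-bones version of such an extractor, using the same read_html() trick from earlier in this post, might look like this (purely a sketch; anything production-grade would want error handling and rate limiting):

library(rvest)

# sketch: return every non-empty image alt-text string found on a tweet page
extract_alt_text <- function(status_url) {
  pg <- read_html(status_url)
  alt <- html_attr(html_nodes(pg, "img"), "alt")
  alt[!is.na(alt) & nzchar(alt)]
}

extract_alt_text("https://twitter.com/hrbrmstr/status/1261641887603179520")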

This past week @propublica linked to a really spiffy resource for getting an overview of a Twitter user’s profile and activity called accountanalysis. It has a beautiful interface that works as well on mobile as it does in a real browser. It also is fully interactive and supports cross-filtering (zoom in on the timeline and the other graphs change). It’s especially great if you’re not a coder, but if you are, @kearneymw’s {rtweet} can get you all this info and more, putting the power of R behind data frames full of tweet inanity.

While we covered quite a bit of {rtweet} ground in the 21 Recipes book, summarizing an account to the degree that accountanalysis does is not in there. To rectify this oversight, I threw together a static clone of accountanalysis that can make standalone HTML reports like this one.

twitter account analysis header

It’s a fully parameterized R markdown document, meaning you can run it as just a function call (or change the parameter and knit it by hand):

rmarkdown::render(
  input = "account-analysis.Rmd", 
  params = list(
    username = "propublica"
  ), 
  output_file = "~/Documents/propublica-analysis.html"
)

It will also, by default, save a date-stamped copy of the user info and retrieved timeline into the directory you generate the report from (add a prefix path to the save portion in the Rmd to store it in a better place).
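The save portion is along these lines (a sketch; the exact object and file names in the Rmd may differ):

# sketch of the kind of date-stamped save described above; prepend a
# directory to the paths to store the copies somewhere more permanent
user_info <- rtweet::lookup_users(params$username)
timeline <- rtweet::get_timeline(params$username, n = 3200)

readr::write_rds(user_info, sprintf("%s-user-info-%s.rds", params$username, Sys.Date()))
readr::write_rds(timeline, sprintf("%s-timeline-%s.rds", params$username, Sys.Date()))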

With all the data available, you can dig in and extract all the information you want/need.
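For example, here’s a quick sketch of pulling the same timeline directly and tallying the most-used hashtags:

library(rtweet)
library(tidyverse)

# sketch: grab the timeline the report is built from and count hashtags
pp <- get_timeline("propublica", n = 3200)

select(pp, created_at, hashtags) %>%
  unnest(hashtags) %>%            # hashtags arrive as a list column
  filter(!is.na(hashtags)) %>%
  count(hashtags, sort = TRUE) %>%
  head(10)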

FIN

You can get the Rmd at your favorite social coding service:

There are many ways to gather Twitter data for analysis and many R and Python (et al) libraries make full use of the Twitter API when building a corpus to extract useful metadata for each tweet along with the text of each tweet. However, many corpus archives are minimal and only retain a small portion of the metadata — often just tweet timestamp, the tweet creator and the tweet text — leaving to the analyst the trudging work of re-extracting hashtags, mentions, URLs (etc).

Twitter provides a tweet-text processing library for many languages, one of which is Java. Since it makes sense to perform at-scale data operations in Apache Drill, it also seemed to make sense that Apache Drill could use a set of tweet-metadata-extraction user-defined functions (UDFs). Plus, there just aren’t enough examples of Drill UDFs out there. Thus begat drill-twitter-text.

What’s Inside the Tin?

There are five UDFs in the package:

  • tw_parse_tweet(string): Parses the tweet text and returns a map column with the following named values:
    • weightedLength: (int) the overall length of the tweet with code points weighted per the ranges defined in the configuration file
    • permillage: (int) indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.
    • isValid: (boolean) indicates if input text length corresponds to a valid result.
    • display_start / display_end: (int) indices identifying the inclusive start and exclusive end of the displayable content of the Tweet.
    • valid_start / valid_end: (int) indices identifying the inclusive start and exclusive end of the valid content of the Tweet.
  • tw_extract_hashtags(string): Extracts all hashtags in the tweet text into a list which can be FLATTEN()ed.
  • tw_extract_screennames(string): Extracts all screennames in the tweet text into a list which can be FLATTEN()ed.
  • tw_extract_urls(string): Extracts all URLs in the tweet text into a list which can be FLATTEN()ed.
  • tw_extract_reply_screenname(string): Extracts the reply screenname (if any) from the tweet text into a VARCHAR.

The repo has all the necessary bits and info to help you compile and load the JARs, but those in a hurry can just copy all the files in the target directory to your local jars/3rdparty directory and restart Drill.

Usage

Here’s an example of how to call each UDF along with the output:

SELECT 
  tw_extract_screennames(tweetText) AS mentions,
  tw_extract_hashtags(tweetText) AS tags,
  tw_extract_urls(tweetText) AS urls,
  tw_extract_reply_screenname(tweetText) AS reply_to,
  tw_parse_tweet(tweetText) AS tweet_meta
FROM
  (SELECT 
     '@youThere Load data from #Apache Drill to @QlikSense - #Qlik Tuesday Tips and Tricks #ApacheDrill #BigData https://t.co/fkAJokKF5O https://t.co/bxdNCiqdrE' AS tweetText
   FROM (VALUES((1))))

+----------+------+------+----------+------------+
| mentions | tags | urls | reply_to | tweet_meta |
+----------+------+------+----------+------------+
| ["youThere","QlikSense"] | ["Apache","Qlik","ApacheDrill","BigData"] | ["https://t.co/fkAJokKF5O","https://t.co/bxdNCiqdrE"] | youThere | {"weightedLength":154,"permillage":550,"isValid":true,"display_start":0,"display_end":153,"valid_start":0,"valid_end":153} |
+----------+------+------+----------+------------+

FIN

Kick the tyres and file issues and PRs as needed.

NOTE: The likelihood of this recipe being added to the recent practice bookdown book is slim, but I’ll try to keep the same format for the blog post.

Problem

You want to collect all the tweets in a Twitter thread.

Solution

Use a few key functions in rtweet to piece the thread elements back together.

Discussion

In Twitterland, a “thread” is a series of tweets by an author that are in a reply chain to each other which enables them to be displayed sequentially to form a larger & (ostensibly) more cohesive message. Even with the recent 280 character tweet-length increase, threads are still popular and used daily. They’re very easy to distinguish on Twitter but there is no Twitter API call to collect up all the pieces of these threads.

Let’s build a function — get_thread() — that will take as input a starting thread URL or status id and return a data frame of all the tweets in the thread (in order). As a bonus, we’ll also include a way to include all first-level retweets and replies to each threaded tweet (that, too, happens quite a bit).

There are documentation snippets in the code block (below), but the essence of the function is:

  • first, finding the tweet that belongs to the status id to get some metadata
  • then doing a search for tweets from the author that occurred after that tweet (we do this to save on API calls and we grab a bunch of them)
  • rather than do a bunch of things by hand, we make from/to pairs to feed in as vertex edges into igraph
  • once that’s done, separate out the graph into unique subgraphs and find the one containing the starting status id
  • since that subgraph is just a set of status ids, rebuild the data frame from it and put it in order.

There may be occasions where we want to grab the replies or RTs of any of the original thread tweets. They aren’t always useful, but when they are it’d be good to have this context. So, we’ll add an option that — if TRUE — will cause the function to go down the list of threaded tweets and pull the first-level replies and RTs (excluding the ones from the author). We’ll do this using the Twitter search API as it’ll ultimately save on API calls and it puts the filtering closer to the data (I’m generally “a fan” of putting computation as close to the data as possible for any given task). If there were any, they’ll be in a replies column which can be unnested at-will.

Here’s the complete function:

get_thread <- function(first_status, include_replies=FALSE, .timeline_history=3000) {

  require(rtweet, quietly=TRUE, warn.conflicts=FALSE)
  require(igraph, quietly=TRUE, warn.conflicts=FALSE)
  require(tidyverse, quietly=TRUE, warn.conflicts=FALSE)

  first_status <- if (str_detect(first_status[1], "^http[s]?://")) basename(first_status[1]) else first_status[1]

  # get first status
  orig <- rtweet::lookup_tweets(first_status)

  # grab the author's timeline to search
  author_timeline <- rtweet::get_timeline(orig$screen_name, n=.timeline_history, since_id=first_status)

  # build a data frame containing from/to pairs (anything the author
  # replied to) that also includes the `first_status` id.
  suppressWarnings(
    dplyr::filter(
      author_timeline,
      (status_id == first_status) | (reply_to_screen_name == orig$screen_name)
    ) %>%
      dplyr::select(status_id, reply_to_status_id) %>%
      igraph::graph_from_data_frame() -> g
  ) # build a graph from this

  # decompose the graph into unique subgraphs and return them to data frames
  igraph::decompose(g) %>%
    purrr::map(igraph::as_data_frame) -> threads_dfs

  # find the thread with our `first_status` ids

  thread_df <- purrr::keep(threads_dfs, ~any(which(unique(unlist(.x, use.names=FALSE)) == first_status)))

  # BONUS: we get them in the order we need!
  thread_order <- purrr::discard(rev(unique(unlist(thread_df))), str_detect, "NA")

  # filter out the thread from the timeline corpus & sort it
  dplyr::filter(author_timeline, status_id %in% pull(thread_df[[1]], from)) %>%
    dplyr::mutate(status_id = factor(status_id, levels=thread_order)) %>%
    dplyr::arrange(status_id) -> tweet_thread

  if (include_replies) {
    # for each status, lookup 1st-level references to it, excluding ones from the original author
    mutate(
      tweet_thread,
      replies = purrr::map(
        as.character(status_id),
        ~rtweet::search_tweets(sprintf("%s -from:%s", .x, orig$screen_name[1]))
      )
    ) -> tweet_thread
  }

  class(tweet_thread) <-  c("tweet_thread", class(tweet_thread))

  return(tweet_thread)

}

Now, if we grab this thread, the function will return the following:

xdf <- get_thread("https://twitter.com/petersagal/status/952910499825451009")

glimpse(select(xdf, 1:5))
## Observations: 10
## Variables: 5
## $ status_id   <fctr> 952910499825451009, 952910695804305408, 952911012990193664, 952911632077852679, 9529121...
## $ created_at  <dttm> 2018-01-15 14:29:02, 2018-01-15 14:29:48, 2018-01-15 14:31:04, 2018-01-15 14:33:31, 201...
## $ user_id     <chr> "14985228", "14985228", "14985228", "14985228", "14985228", "14985228", "14985228", "149...
## $ screen_name <chr> "petersagal", "petersagal", "petersagal", "petersagal", "petersagal", "petersagal", "pet...
## $ text        <chr> "Funny you mention that. I talked to Minniejean (Brown) Trickey, one of the Little Rock ...

purrr::map(xdf$text, strwrap) %>% 
  purrr::map_chr(paste0, collapse="\n") %>% 
  cat(sep="\n\n")
## Funny you mention that. I talked to Minniejean (Brown) Trickey, one of the Little Rock Nine, about
## that very day in front of CHS for my documentary, "Constitution USA." https://t.co/MRwtlfZtvp
## 
## You would think that of all people, she would be satisfied with the government's response to racism
## and hate. Ike sent the 101st Airborne to escort her to class!
## 
## But what I didn't know is that after the 101st left, CHS expelled her on a trumped up charge of
## assault after she spilled some chili on a white student.
## 
## She spilled some chili. After being tripped by another white kid. "We got rid of one of them!" the
## teachers bragged.
## 
## Then, of course, rather than continue to allow black students to attend CHS, the governor of Alabama
## closed the schools. https://t.co/2DfBEI0OTL"The_Lost_Year"
## 
## Ms Brown looked around the country post high school. She saw Jim Crow, firehoses turned on Blacks,
## the murder of the Birmingham Four and the Mississippi Three. She moved to Canada.
## 
## As of 2012, she found herself coming back to Little Rock, a place she told me she never wanted to
## see again. But she had family. And the National Historic Site center was there. She liked to drop
## by, talk to the kids about what happened.
## 
## Now she lives in Little Rock full time. She doesn't care that her name is inscribed on a bench in
## front of the school. She doesn't care that your dad welcomed her back in '99.  She spends time at
## the Center, telling people what really happened. You should go talk to her.
## 
## (Sorry: Arkansas, obviously. Typing too quickly.)
## 
## Here's me, talking to Ms Trickey and Marty Sammon, who served with the 101st at Little Rock. Buddy
## Squiers on camera. CHS is off to the left. https://t.co/ft4LUBf3sr
## 
## https://t.co/EHLbe1finj

The replies data frame looks much the same as the thread data frame — it’s essentially just another rtweet data frame, so we won’t waste electrons showing it.

While that map/map/cat sequence isn’t bad to type, it’d be more convenient if we had a print() method for this structure (this is one reason we added a class to it). It’d be even spiffier if this print() method made it easier to distinguish the main thread from the RTs/replies — but still show those extra bits of info. We’ll use the crayon package for added emphasis:

print.tweet_thread <- function(x, ...) {
  
  cat(crayon::cyan(sprintf("@%s - %s\n\n", x$screen_name[1], x$created_at[1])))
  
  if (!("replies" %in% colnames(x))) x$replies <- purrr::map(1:nrow(x), ~list())
  
  purrr::walk2(x$text, x$replies, ~{
    
    cat(crayon::green(paste0(strwrap(.x), collapse="\n")), "\n\n", sep="")
    
    if (length(.y) > 0) {
      purrr::walk2(.y$screen_name, .y$text, ~{
        sprintf("@%s\n%s", .x, .y) %>%
          strwrap(indent=8, exdent=8) %>%
          paste0(collapse="\n") %>%
          crayon::silver$italic() %>%
          cat("\n\n", sep="")
      })
    }
    
  })
  
}

Let’s re-capture the tweet thread but also include replies this time and print it out:

ydf <- get_thread("https://twitter.com/petersagal/status/952910499825451009", include_replies=TRUE)

ydf
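If you’d rather work with those replies as data than read them in the console, here’s a quick sketch of flattening that column:

library(tidyverse)

# sketch: expand the nested replies, keeping the id of the thread tweet
# each reply/RT refers back to
select(ydf, thread_status_id = status_id, replies) %>%
  unnest(replies) %>%
  select(thread_status_id, screen_name, text)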

See Also

I’ve git-chatted with Sir Kearney to see where to best put this function. I mention that as there are some upcoming posts that kick the aforeblogged tweet_shot() up a notch or two and all of this may work better in a tweetview package.

Regardless, drop a note in the comments if there are other bits of functionality or function options you think belong in get_thread().

(You can find all R⁶ posts here)

UPDATE 2018-01-01 — this has been added to rtweet (GH version).

A Twitter discussion that spawned from Maëlle’s recent look-back post turned into a quick function for capturing an image of a Tweet/thread using webshot, rtweet, magick and glue.

Pass in a status id or a twitter URL and the function will grab an image of the mobile version of the tweet.

The ultimate goal is to make a function that builds a tweet using only R and magick. This will have to do until the new year.

tweet_shot <- function(statusid_or_url, zoom=3) {

  require(glue, quietly=TRUE)
  require(rtweet, quietly=TRUE)
  require(magick, quietly=TRUE)
  require(webshot, quietly=TRUE)

  x <- statusid_or_url[1]

  is_url <- grepl("^http[s]?://", x)

  if (is_url) {

    is_twitter <- grepl("twitter", x)
    stopifnot(is_twitter)

    is_status <- grepl("status", x)
    stopifnot(is_status)

    already_mobile <- grepl("://mobile\\.", x)
    if (!already_mobile) x <- sub("://twi", "://mobile.twi", x)

  } else {

    x <- rtweet::lookup_tweets(x)
    stopifnot(nrow(x) > 0)
    x <- glue_data(x, "https://mobile.twitter.com/{screen_name}/status/{status_id}")

  }

  tf <- tempfile(fileext = ".png")
  on.exit(unlink(tf), add=TRUE)

  webshot(url=x, file=tf, zoom=zoom)

  img <- image_read(tf)
  img <- image_trim(img)

  if (zoom > 1) img <- image_scale(img, scales::percent(1/zoom))

  img

}

Now just do one of these:

tweet_shot("947082036019388416")
tweet_shot("https://twitter.com/jhollist/status/947082036019388416")

to get:

By now, virtually every major media outlet has covered the “280 Apocalypse”™. For those still not “in the know”, Twitter recently moved the tweet character cap to 280 after a “successful” beta test (some of us have different ideas of what “success” looks like).

I had been on a hiatus from the platform for a while and planned to (and did) jump back into the fray today but wanted to see what my timeline looked like tweet-length-wise. It’s a simple endeavour: use rtweet to grab the timeline, count the characters per-tweet and look up the results. I posted the results of said process to — of course — Twitter and some folks asked me for the code.
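The core of it is only a few lines (a sketch using my own timeline; swap in your own handle):

library(rtweet)
library(tidyverse)

# sketch: pull a timeline and see how many tweets cross the old 140 limit
tl <- get_timeline("hrbrmstr", n = 3200)

mutate(tl, tweet_length = nchar(text)) %>%
  summarise(
    total = n(),
    over_140 = sum(tweet_length > 140),
    longest = max(tweet_length)
  )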

Others used it and there were some discussions as to why timelines looked similar (distribution-wise) with not many Tweets going over 140 characters. One posit I had was that it might be due to client-side limitations since I noted that Twitter for macOS — a terrible client they haven’t updated in ages (but there really aren’t any good ones) — still caps tweets at 140 characters. Others, like Buffer on the web, do have support for 280, so I modified the code a bit to look at the distribution by client.

Rather than bore you with my own timeline analysis, and to help the results be a tad more reproducible (which was another discussion that spawned from the tweet-length tweet), here’s a bit of code that tries to grab the last 3,000 tweets with the #rstats hashtag and plots the distribution by Twitter client:

library(rtweet)
library(ggalt)
library(rprojroot)
library(hrbrthemes)
library(tidyverse)

rt <- find_rstudio_root_file()
rstats_tweet_data_file <- file.path(rt, "data", "2017-11-13-rstats-tweet-search-results.rds")

if (!file.exists(rstats_tweet_data_file)) {
  rstats <- search_tweets("#rstats", 3000) # setting up rtweet is an exercise left to the reader
  write_rds(rstats, rstats_tweet_data_file)
} else {
  rstats <- read_rds(rstats_tweet_data_file)
}

rstats <- mutate(rstats, tweet_length=map_int(text, nchar))  # get the tweet length for each tweet

count(rstats, source) %>%
  filter(n > 5) -> usable_sources  # need data for density + I wanted a nice grid :-)

# We want max tweet length & total # of tweets for sorting & labeling facets
filter(rstats, source %in% usable_sources$source) %>%
  group_by(source) %>%
  summarise(max=max(tweet_length), n=n()) %>%
  arrange(desc(max)) -> ordr

# four breaks per panel regardless of the scales (we're using free-y scales)
there_are_FOUR_breaks <- function(limits) { seq(limits[1], limits[2], length.out=4) }

filter(rstats, source %in% usable_sources$source) %>%
  mutate(source = factor(source, levels=ordr$source,
                         labels=sprintf("%s (n=%s)", ordr$source, ordr$n))) %>%
  ggplot(aes(tweet_length)) +
  geom_bkde(aes(color=source, fill=source), bandwidth=5, alpha=2/3) +
  geom_vline(xintercept=140, linetype="dashed", size=0.25) +
  scale_x_comma(breaks=seq(0,280,70), limits=c(0,280)) +
  scale_y_continuous(breaks=there_are_FOUR_breaks, expand=c(0,0)) +
  facet_wrap(~source, scales="free", ncol=5) +
  labs(x="Tweet length", y="Density",
       title="Tweet length distributions by Twitter client (4.5 days #rstats)",
       subtitle="Twitter client facets in decreasing order of ones with >140 length tweets",
       caption="NOTE free Y axis scales\nBrought to you by #rstats, rtweet & ggalt") +
  theme_ipsum_rc(grid="XY", strip_text_face="bold", strip_text_size=8, axis_text_size=7) +
  theme(panel.spacing.x=unit(5, "pt")) +
  theme(panel.spacing.y=unit(5, "pt")) +
  theme(axis.text.x=element_text(hjust=c(0, 0.5, 0.5, 0.5, 1))) +
  theme(axis.text.y=element_blank()) +
  theme(legend.position="none")

FIN

While the 140 barrier has definitely been broken, it has not been abused (yet). The naive character counting is also not perfect, since it doesn’t appear to “count” the same way Twitter-proper does (image “attachments”, for example, are counted as characters here but aren’t counted that way in Twitter clients). Bots are also counted as Twitter clients.
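A rough way to get a bit closer (purely a sketch, and it only addresses the media-link case noted above) is to strip trailing t.co links from the tweet text before counting:

# sketch: drop the trailing t.co media link before counting characters,
# since Twitter clients don't count attachments the same way
adjusted_length <- function(txt) {
  nchar(gsub("\\s*https://t\\.co/\\S+\\s*$", "", txt))
}

rstats <- mutate(rstats, adj_length = map_int(text, adjusted_length))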

It’ll be interesting to track this in a few months as folks start to inch-then-blaze past the former hard-limit.

Give the code (or use your timeline info) a go and post a link with your results! You can find an RStudio project directory over on GitHub.

As many folks know, I live in semi-rural Maine and we were hit pretty hard with a wind+rain storm Sunday into Monday. The hrbrmstr compound has had no power (besides a generator) and no stable/high-bandwidth internet (Verizon LTE was heavily congested) since 0500 Monday and still does not as I write this post.

I’ve played with scraping power outage data from Central Maine Power but there’s a great Twitter account — PowerOutage_us — that has done much of the legwork for the entire country. They don’t cover everything and do not provide easily accessible historical data (likely b/c evil folks would steal it w/o payment or credit) but they do have a site you can poke at and do provide updates via Twitter. As you’ve seen in a previous post, we can use the rtweet package to easily read Twitter data. And, the power outage tweets are regular enough to identify and parse. But raw data is so…raw.

While one could graph data just for one’s self, I decided to marry this power scraping capability with a recent idea I’ve been toying with adding to hrbrthemes or ggalt: gg_tweet(). Imagine being able to take a ggplot2 object and “plot” it to Twitter, fully conforming to Twitter’s stream or card image sizes. By conforming to these size constraints, they don’t get cropped in the timeline view (if you allow images to be previewed in-timeline). This is even more powerful if you have some helper functions for proper theme-ing (font sizes especially need to be tweaked). Enter gg_tweet().

Power Scraping

We’ll cover scraping @PowerOutage_us first, but we’ll start with all the packages we’ll need and a helper function to convert power outage estimates to numeric values:

library(httr)
library(magick)
library(rtweet)
library(stringi)
library(hrbrthemes)
library(tidyverse)

words_to_num <- function(x) {
  map_dbl(x, ~{
    val <- stri_match_first_regex(.x, "^([[:print:]]+) #PowerOutages")[,2]
    mul <- case_when(
      stri_detect_regex(val, "[Kk]") ~ 1000,
      stri_detect_regex(val, "[Mm]") ~ 1000000,
      TRUE ~ 1
    ) 
    val <- stri_replace_all_regex(val, "[^[:digit:]\\.]", "")
    as.numeric(val) * mul 
  })
}

Now, I can’t cover setting up rtweet OAuth here. The vignette and package web site do that well.

The bot tweets infrequently enough that this is really all we need (though, bump up n as you need to):

outage <- get_timeline("PowerOutage_us", n=300)

Yep, that gets the last 300 tweets from said account. It’s amazingly simple.

Now, the outage tweets for the east coast / northeast are not individually uniform but collectively they are (there’s a pattern that may change but you can tweak this if they do):

filter(outage, stri_detect_regex(text, "\\#(EastCoast|NorthEast)")) %>% 
  mutate(created_at = lubridate::with_tz(created_at, 'America/New_York')) %>% 
  mutate(number_out = words_to_num(text)) %>%  
  ggplot(aes(created_at, number_out)) +
  geom_segment(aes(xend=created_at, yend=0), size=5) +
  scale_x_datetime(date_labels = "%Y-%m-%d\n%H:%M", date_breaks="2 hours") +
  scale_y_comma(limits=c(0,2000000)) +
  labs(
    x=NULL, y="# Customers Without Power",
    title="Northeast Power Outages",
    subtitle="Yay! Twitter as a non-blather data source",
    caption="Data via: @PowerOutage_us <https://twitter.com/PowerOutage_us>"
  ) -> gg

That pipe chain looks for key hashtags (for my area), rejiggers the time zone, and calls the helper function to, say, convert 1.2+ Million to 1200000. Finally it builds a mostly complete ggplot2 object (you should make the max Y limit more dynamic).
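One way to make that limit dynamic (a sketch; pad with whatever headroom you like):

# sketch: derive the y-axis ceiling from the parsed outage counts
# instead of hard-coding 2,000,000
ne <- filter(outage, stri_detect_regex(text, "\\#(EastCoast|NorthEast)"))
y_max <- max(words_to_num(ne$text), na.rm = TRUE) * 1.1

# then, in the pipeline above, use:
#   scale_y_comma(limits = c(0, y_max)) +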

You can plot that on your own (print gg). We’re here to tweet, so let’s go into the next section.

Magick Tweeting

@opencpu made it possible to shunt plot output to a magick device. This means we have really precise control over ggplot2 output size as well as the ability to add other graphical components to a ggplot2 plot using magick idioms. One thing we need to take into account is “retina” plots. They are — essentially — double resolution plots (72 => 144 pixels per inch). For the best looking plots we need to go retina, but that also means kicking up base plot theme font sizes a bit. Let’s build on hrbrthemes::theme_ipsum_rc() a bit and make a theme_tweet_rc():

theme_tweet_rc <- function(grid = "XY", style = c("stream", "card"), retina=TRUE) {
  
  style <- match.arg(tolower(style), c("stream", "card"))
  
  switch(
    style, 
    stream = c(24, 18, 16, 14, 12),
    card = c(22, 16, 14, 12, 10)
  ) -> font_sizes
  
  theme_ipsum_rc(
    grid = grid, 
    plot_title_size = font_sizes[1], 
    subtitle_size = font_sizes[2], 
    axis_title_size = font_sizes[3], 
    axis_text_size = font_sizes[4], 
    caption_size = font_sizes[5]
  )
  
}

Now, we just need a way to take a ggplot2 object and shunt it off to Twitter. The following gg_tweet() function does not (for now) use rtweet since I’ll likely add it to either ggalt or hrbrthemes and want to keep dependencies to a minimum. I may end up swapping rtweet in later, since the current method relies on environment variables vs an RDS file for app credential storage. Regardless, one thing I wanted to do here was provide a way to preview the image before tweeting.

So you pass in a ggplot2 object (likely adding the tweet theme to it) and a Twitter status text (there’s a TODO to check the length for 140c compliance) plus choose a style (stream or card, defaulting to stream) and decide on whether you’re cool with the “retina” default.

Unless you tell it to send the tweet it won’t, giving you a chance to preview the image before sending, just in case you want to tweak it a bit before committing it to the Twitterverse. It also returns the magick object it creates in the event you want to do something more with it:

gg_tweet <- function(g, status = "ggplot2 image", style = c("stream", "card"), 
                     retina=TRUE, send = FALSE) {
  
  style <- match.arg(tolower(style), c("stream", "card"))
  
  switch(
    style, 
    stream = c(w=1024, h=512),
    card = c(w=800, h=320)
  ) -> dims
  
  dims["res"] <- 72
  
  if (retina) dims <- dims * 2
  
  fig <- image_graph(width=dims["w"], height=dims["h"], res=dims["res"])
  print(g)
  dev.off()
  
  if (send) {
    
    message("Posting image to twitter...")
    
    tf <- tempfile(fileext = ".png")
    image_write(fig, tf, format="png")
    
    # Create an app at apps.twitter.com w/callback URL of http://127.0.0.1:1410
    # Save the app name, consumer key and secret to the following
    # Environment variables
    
    app <- oauth_app(
      appname = Sys.getenv("TWITTER_APP_NAME"),
      key = Sys.getenv("TWITTER_CONSUMER_KEY"),
      secret = Sys.getenv("TWITTER_CONSUMER_SECRET")
    )
    
    twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), app)
    
    POST(
      url = "https://api.twitter.com/1.1/statuses/update_with_media.json",
      config(token = twitter_token), 
      body = list(
        status = status,
        media = upload_file(path.expand(tf))
      )
    ) -> res
    
    warn_for_status(res)
    
    unlink(tf)
    
  }
  
  fig
  
}

Two Great Tastes That Taste Great Together

We can combine the power outage scraper & plotter with the tweeting code and just do:

gg_tweet(
  gg + theme_tweet_rc(grid="Y"),
  status = "Progress! #rtweet #gg_tweet",
  send=TRUE
)

That was, in-fact, the last power outage tweet I sent.

Next Steps

Ironically, given current levels of U.S. news and public “discourse” on Twitter and some inane machinations in my own area of domain expertise (cyber), gg_tweet() is likely one of the few ways I’ll be interacting with Twitter for a while. You can ping me on Keybase — hrbrmstr — or join the rstats Keybase team via keybase team request-access rstats if you need to poke me for anything for a while.

FIN

Kick the tyres and watch for gg_tweet() ending up in ggalt or hrbrthemes. Don’t hesitate to suggest (or code up) feature requests. This is still an idea in-progress and definitely not ready for prime time without a bit more churning. (Also, words_to_num() can be optimized; it was hastily crafted.)