R⁶ — Capturing [YouTube] Captions

(R⁶ == brief, low-expository posts)

@yoniceedee suggested I look at the Cambridge Analytica “whistleblower” testimony proceedings:

@hrbrmstr giving the term "improving r&d" a whole new meaning … https://t.co/f1KA8U3htT

— yoni sidi (@yoniceedee) March 29, 2018

I value the resources @yoniceedee tosses my way (though they often send me down twisted paths like this one :-), but I really dislike spending any amount of time on YouTube and can consume text content much faster than even accelerated video playback.

Google auto-generated captions for that video. You can display them by clicking the menu below the video (on the right) and enabling the transcript, which slowly (well, in my frame of reference) loads into the upper-right. That’s still sub-optimal since we need to be on the YouTube page to read/scroll. There’s no “export” option, so my initial instinct was to open Developer Tools, find the https://www.youtube.com/service_ajax?name=getTranscriptEndpoint request, “Copy Response” to the clipboard, save it to a file, and then do some JSON/list wrangling (the transcript JSON URL is in the snippet below):

library(tidyverse)

trscrpt <- jsonlite::fromJSON("https://rud.is/dl/ca-transcript.json")

# everything we need lives under the first cue group renderer
cue_groups <- trscrpt$data$actions$openTranscriptAction$transcriptRenderer$transcriptRenderer$body$transcriptBodyRenderer$cueGroups[[1]]$transcriptCueGroupRenderer

runs <- cue_groups$formattedStartOffset$runs # the "MM:SS" time marks
cues <- cue_groups$cues                      # the caption text

# tibble() supersedes the deprecated data_frame()
tibble(
  mark = map_chr(runs, ~.x$text),
  text = map_chr(cues, ~.x$transcriptCueRenderer$cue$runs[[1]]$text)
) %>%
  separate(mark, c("minute", "second"), sep = ":", remove = FALSE, convert = TRUE)
## # A tibble: 3,247 x 4
##    mark  minute second text                                    
##    <chr>  <int>  <int> <chr>                                   
##  1 00:00      0      0 all sort of yeah web of things if it's a
##  2 00:02      0      2 franchise then there's a kind of        
##  3 00:03      0      3 ultimately there's a there's a there's a
##  4 00:05      0      5 coordinator of that franchise or someone
##  5 00:07      0      7 who's a you got a that franchise is well
##  6 00:09      0      9 well when I was there that was Alexander
##  7 00:13      0     13 Nixon Steve banning but that's that's a 
##  8 00:16      0     16 question you should be asking aiq yeah  
##  9 00:18      0     18 yeah and just got to a IQ and the GSR   
## 10 00:24      0     24 state from gts-r that's other Hogan data
## # ... with 3,237 more rows
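The `separate()` step assumes every mark is `MM:SS`. If a long video's marks switch to `H:MM:SS` past the hour (an assumption about how the transcript panel renders times; I haven't checked it against this JSON), `separate()` would warn and drop the extra field. A small base R helper, `mark_to_seconds()` (a name made up for this sketch), folds any number of colon-separated fields into seconds instead:

```r
# Convert "MM:SS" or "H:MM:SS" timestamp strings into seconds.
# Each colon-separated field is treated as a base-60 digit (Horner's rule),
# so the same code handles both layouts.
mark_to_seconds <- function(mark) {
  vapply(
    strsplit(mark, ":", fixed = TRUE),
    function(fields) Reduce(function(acc, x) acc * 60 + x, as.numeric(fields), 0),
    numeric(1)
  )
}

mark_to_seconds(c("00:00", "00:24", "1:02:03"))
# returns c(0, 24, 3723)
```

Against the tibble above, `mutate(secs = mark_to_seconds(mark))` would add a sortable seconds column without caring how many fields each mark has.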

But then I remembered YouTube has an API for captions and threw together a quick script to grab them that way as well:

# the API needs these scopes

c(
  "https://www.googleapis.com/auth/youtube.force-ssl",
  "https://www.googleapis.com/auth/youtubepartner"
) -> scope_list

# oauth dance

httr::oauth_app(
  appname = "google",
  key = Sys.getenv("GOOGLE_APP_KEY"),      # OAuth client ID
  secret = Sys.getenv("GOOGLE_APP_SECRET") # OAuth client secret
) -> captions_app

httr::oauth2.0_token(
  endpoint = httr::oauth_endpoints("google"),
  app = captions_app,
  scope = scope_list,
  cache = TRUE
) -> google_token

# list the available captions for this video
# (captions can be in one or more languages)

httr::GET(
  url = "https://www.googleapis.com/youtube/v3/captions",
  query = list(
    part = "snippet",
    videoId = "f2Sxob3fl0k" # the v=string in the YouTube URL
  ),
  httr::config(token = google_token)
) -> caps_list

# I'm cheating since I know there's only one but you'd want
# to introspect `caps_list` before blindly doing this for 
# other videos.

httr::GET(
  url = sprintf(
    "https://www.googleapis.com/youtube/v3/captions/%s",
    httr::content(caps_list)$items[[1]]$id
  ),
  httr::config(token = google_token)
) -> caps

# strangely enough, the JSON response "feels" better than this
# one, though this is a standard format (SubViewer/.sbv) that parses quite well.

cat(rawToChar(httr::content(caps)))
## 0:00:00.000,0:00:03.659
## all sort of yeah web of things if it's a
## 
## 0:00:02.490,0:00:05.819
## franchise then there's a kind of
## 
## 0:00:03.659,0:00:07.589
## ultimately there's a there's a there's a
## 
## 0:00:05.819,0:00:09.660
## coordinator of that franchise or someone
## 
## 0:00:07.589,0:00:13.139
## who's a you got a that franchise is well
## 
## 0:00:09.660,0:00:16.230
## well when I was there that was Alexander
## ...
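The caption body comes back in what looks like YouTube's SubViewer (.sbv) layout: a `start,end` timestamp line, the cue text, then a blank line. Here's a minimal base R sketch that turns that text into a data frame, assuming every block follows exactly that shape (`parse_sbv()` and `hms_to_sec()` are hypothetical names for this sketch, not functions from any package):

```r
# Convert one "H:MM:SS.mmm" timestamp into seconds.
hms_to_sec <- function(ts) {
  p <- as.numeric(strsplit(ts, ":", fixed = TRUE)[[1]])
  sum(p * 60^(rev(seq_along(p)) - 1))
}

# Parse SBV caption text into one row per cue.
# Assumed layout: "start,end" line, one or more text lines, blank line.
parse_sbv <- function(txt) {
  txt <- gsub("\r\n", "\n", txt, fixed = TRUE)      # normalise line endings
  blocks <- strsplit(trimws(txt), "\n[[:space:]]*\n")[[1]]  # split on blank lines
  rows <- lapply(blocks, function(block) {
    lines <- strsplit(block, "\n", fixed = TRUE)[[1]]
    times <- strsplit(lines[1], ",", fixed = TRUE)[[1]]
    data.frame(
      start = hms_to_sec(times[1]),
      end = hms_to_sec(times[2]),
      text = paste(lines[-1], collapse = " "),      # join multi-line cues
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# the first two cues from the output above
sample_sbv <- "0:00:00.000,0:00:03.659\nall sort of yeah web of things if it's a\n\n0:00:02.490,0:00:05.819\nfranchise then there's a kind of\n"
cues <- parse_sbv(sample_sbv)
```

With the OAuth dance done, `parse_sbv(rawToChar(httr::content(caps)))` should give one row per cue. The cues overlap (each `end` runs past the next `start`), so if all you want is readable text, pasting the `text` column together is usually enough.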

Neither a reflection on active memory nor a quick Duck Duck Go search (I try not to use Google Search anymore) seemed to point to an existing R resource for this, hence the quick post in the event the snippet is helpful to anyone else.

If you do know of an R package/snippet that does this already, please shoot a note into the comments so others can find it.

5 Comments

- batpigandme
- Posted 2018-03-29 at 08:14
This speaks to me on every level as someone who loathes the “pivot to video” in the world and reads infinitely faster than I can watch anything.
- - hrbrmstr
  - Posted 2018-03-29 at 08:42
  Lemme see what I can do re: whipping up a small Shiny app to let you paste in a YT URL and then have a choice of languages popup for the captions :-)
- yoni sidi (@yoniceedee)
- Posted 2018-04-01 at 18:18
but then you would miss the rockin’ pink hair :( … didn’t even know there were transcript options for the videos, thanks for pointing that out.
- - yoni sidi (@yoniceedee)
  - Posted 2018-04-01 at 18:20
  would this suffice for a YT api package? https://cran.r-project.org/web/packages/tuber/tuber.pdf
- 350D
- Posted 2019-04-04 at 20:24
Using Youtube API you can only get captions for your own videos, correct?

rud.is