Regular readers will recall the “utility belt” post from back in April of this year. This is a follow-up to a request made asking for a list of all the %
infix functions in those files.
We’re going to:
- collect up all of the sources
- parse them
- find all the definitions of
%
infix functions - write them to a file
We’ll start by grabbing the data from the previous post and look at it as a refresher:
library(stringi)
library(tidyverse)
utils <- read_rds(url("https://rud.is/dl/utility-belt.rds"))
utils
## # A tibble: 1,746 x 13
## permsissions links owner group size month day year_hr path date pkg fil file_src
## 1 -rw-r--r-- 0 hornik users 1658 Jun 05 2016 AHR/R… 2016-06-05 AHR util… "## \\int f(x)dg(x) …
## 2 -rw-r--r-- 0 ligges users 12609 Dec 13 2016 ALA4R… 2016-12-13 ALA4R util… "## some utility fun…
## 3 -rw-r--r-- 0 hornik users 0 Feb 24 2017 AWR.K… 2017-02-24 AWR.… util… ""
## 4 -rw-r--r-- 0 ligges users 4127 Aug 30 2017 Alpha… 2017-08-30 Alph… util… "#\n#' Assign API ke…
## 5 -rw-r--r-- 0 ligges users 121 Jan 19 2017 Amylo… 2017-01-19 Amyl… util… "make_decision <- fu…
## 6 -rw-r--r-- 0 herbrandt herbrandt 52 Aug 10 2017 BANES… 2017-08-10 BANE… util… "#' @importFrom dply…
## 7 -rw-r--r-- 0 ripley users 36977 Jan 06 2015 BEQI2… 2015-01-06 BEQI2 util… "#' \tRemove Redunda…
## 8 -rw-r--r-- 0 hornik users 34198 May 10 2017 BGDat… 2017-05-10 BGDa… util… "# A more memory-eff…
## 9 -rwxr-xr-x 0 ligges users 3676 Aug 14 2016 BGLR/… 2016-08-14 BGLR util… "\n readBinMat=funct…
## 10 -rw-r--r-- 0 ripley users 2547 Feb 04 2015 BLCOP… 2015-02-04 BLCOP util… "###################…
## # ... with 1,736 more rows
Note that we somewhat expected the file source to potentially come in handy at a later date and also expected the need to revisit that post, so the R data file [←direct link to RDS] included a file_src
column.
Now, let's find all the source files with at least one infix definition, collect them together and parse them so we can do more code spelunking:
filter(utils, stri_detect_fixed(file_src, "`%")) %>% # only find sources with infix definitions
pull(file_src) %>%
paste0(collapse="\n\n") %>%
parse(text = ., keep.source=TRUE) -> infix_src
str(infix_src, 1)
## length 1364 expression(dplyr::`%>%`, `%||%` <- function(a, b) if (is.null(a)) b else a, get_pkg_path <- function(ctx) { pkg_| __truncated__ ...
## - attr(*, "srcref")=List of 1364
## - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
## - attr(*, "wholeSrcref")= 'srcref' int [1:8] 1 0 15768 0 0 0 1 15768
## ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
We can now take all of that lovely parsed source and tokenize it to work with the discrete elements in a very tidy manner:
infix_parsed <- tbl_df(getParseData(infix_src)) # tbl_df() is mainly for pretty printing
infix_parsed
## # A tibble: 118,242 x 9
## line1 col1 line2 col2 id parent token terminal text
## 1 1 1 1 24 1 -10 COMMENT TRUE #' @impor…
## 2 2 1 2 10 4 -10 COMMENT TRUE #' @export
## 3 3 1 3 12 10 0 expr FALSE ""
## 4 3 1 3 5 7 10 SYMBOL_PACKAGE TRUE dplyr
## 5 3 6 3 7 8 10 NS_GET TRUE ::
## 6 3 8 3 12 9 10 SYMBOL TRUE `%>%`
## 7 5 1 5 49 51 0 expr FALSE ""
## 8 5 1 5 6 16 18 SYMBOL TRUE `%||%`
## 9 5 1 5 6 18 51 expr FALSE ""
## 10 5 8 5 9 17 51 LEFT_ASSIGN TRUE <-
## # ... with 118,232 more rows
We just need to find a sequence of tokens that make up a function definition, then whittle those down to ones that look like our %
infix names:
pat <- c("SYMBOL", "expr", "LEFT_ASSIGN", "expr", "FUNCTION") # pattern for function definition
# find all of ^^ sequences (there's a good twitter discussion on this abt a month ago)
idx <- which(infix_parsed$token == pat[1]) # find location of match of start of seq
# look for the rest of the sequences starting at each idx position
map_lgl(idx, ~{
all(infix_parsed$token[.x:(.x+(length(pat)-1))] == pat)
}) -> found
f_defs <- idx[found] # starting indices of all the places where functions are defined
# filter ^^ to only find infix ones
infix_defs <- f_defs[stri_detect_regex(infix_parsed$text[f_defs], "^`\\%")]
# there aren't too many, but remember we're just searching `util` functions
length(infix_defs)
## [1] 106
Now, write it out to a file so we can peruse the infix functions:
# nuke a file and fill it with the function definition
cat("", sep="", file="infix_functions.R")
walk2(
getParseText(infix_parsed, infix_parsed$id[infix_defs]), # extract the infix name
getParseText(infix_parsed, infix_parsed$id[infix_defs + 3]), # extract the function definition body
~{
cat(.x, " <- ", .y, "\n\n", sep="", file="infix_functions.R", append=TRUE)
}
)
There are 106 of them so you can find the extracted ones in this gist.
Here's an overview of what you can expect to find:
# A tibble: 39 x 2 name n 1 `%||%` 47 2 `%+%` 7 3 `%AND%` 4 4 `%notin%` 4 5 `%:::%` 3 6 `%==%` 3 7 `%!=%` 2 8 `%*diag%` 2 9 `%diag*%` 2 10 `%nin%` 2 11 `%OR%` 2 12 `%::%` 1 13 `%??%` 1 14 `%.%` 1 15 `%@%` 1 16 `%&&%` 1 17 `%&%` 1 18 `%+&%` 1 19 `%++%` 1 20 `%+|%` 1 21 `%<<%` 1 22 `%>>%` 1 23 `%~~%` 1 24 `%assert_class%` 1 25 `%contains%` 1 26 `%din%` 1 27 `%fin%` 1 28 `%identical%` 1 29 `%In%` 1 30 `%inr%` 1 31 `%M%` 1 32 `%notchin%` 1 33 `%or%` 1 34 `%p%` 1 35 `%pin%` 1 36 `%R%` 1 37 `%s%` 1 38 `%sub_in%` 1 39 `%sub_nin%` 1
FIN
If any of those are useful, feel free to PR them in to https://github.com/hrbrmstr/freebase/blob/master/inst/templates/infix-helpers.R (and add yourself to the DESCRIPTION
if you do).
Hopefully this provided some further inspiration to continue to use R not only as your language of choice but also as a fun data source.
One Trackback/Pingback
[…] article was first published on R – rud.is, and kindly contributed to […]