Skip navigation

An R user recently had the need to split a “full, human name” into component parts to retrieve first & last names. The full names could be anything from something simple like _”David Regan”_ to more complex & diverse such as _”John Smith Jr.”_, _”Izaque Iuzuru Nagata”_ or _”Christian Schmit de la Breli”_. Despite the face that I’m _pretty good_ at searching GitHub & CRAN for R stuff, my quest came up empty (though a teensy part of me swears I saw this type of thing in a package somewhere). I _did_ manage to find Python & node.js modules that carved up human names but really didn’t have the time to re-implement their functionality from scratch in R (or, preferably, Rcpp).

Rather than rely on the Python bridge to R (yuck) I decided to use @opencpu’s [V8 package](https://cran.rstudio.com/web/packages/V8/index.html) to wrap a part of the node.js [humanparser](https://github.com/chovy/humanparser) module. If you’re not familiar with V8, it provides the ability to run JavaScript code within R and makes it possible to pass variables into JavaScript functions and get data back in return. All the magic happens via a JSON data passing & Rcpp wrappers (and, of course, the super-awesome code Jeroen writes).

Working with JavaScript in R is as simple as creating an instance of the JavaScript V8 interpreter, loading up the JavaScript code that makes the functions work:

library(V8)
 
ct <- new_context()
ct$source(system.file("js/underscore.js", package="V8"))
ct$call("_.filter", mtcars, JS("function(x){return x.mpg < 15}"))
 
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

There are many more examples in the [V8 vignette](https://cran.rstudio.com/web/packages/V8/vignettes/v8_intro.html).

For `humanparser` I needed to use Underscore.js (it comes with V8) and a [function](https://github.com/chovy/humanparser/blob/master/index.js#L5-L74) from `humanparser` that I carved out to work the way I wanted it to. You can look at the innards of the package [on github](https://github.com/hrbrmstr/humanparser)—specifically, [this file](https://github.com/hrbrmstr/humanparser/blob/master/R/humanparser.r) (it’s _really_ small)— and, to use the two new functions the package exposes it’s as simple as doing:

devtools::install_github("hrbrmstr/humanparser")
 
library(humanparser)
 
parse_name("John Smith Jr.")
 
#> $firstName
#> [1] "John"
#> 
#> $suffix
#> [1] "Jr."
#> 
#> $lastName
#> [1] "Smith"
#> 
#> $fullName
#> [1] "John Smith Jr."

or the following to convert a bunch of ’em:

full_names <- c("David Regan", "Izaque Iuzuru Nagata", 
                "Christian Schmit de la Breli", "Peter Doyle", "Hans R.Bruetsch", 
                "Marcus Reichel", "Per-Axel Koch", "Louis Van der Walt", 
                "Mario Adamek", "Ugur Tozsekerli", "Judit Ludvai" )
 
parse_names(full_names)
 
#> Source: local data frame [11 x 4]
#> 
#>    firstName     lastName                     fullName middleName
#> 1      David        Regan                  David Regan         NA
#> 2     Izaque       Nagata         Izaque Iuzuru Nagata     Iuzuru
#> 3  Christian  de la Breli Christian Schmit de la Breli     Schmit
#> 4      Peter        Doyle                  Peter Doyle         NA
#> 5       Hans   R.Bruetsch              Hans R.Bruetsch         NA
#> 6     Marcus      Reichel               Marcus Reichel         NA
#> 7   Per-Axel         Koch                Per-Axel Koch         NA
#> 8      Louis Van der Walt           Louis Van der Walt         NA
#> 9      Mario       Adamek                 Mario Adamek         NA
#> 10      Ugur   Tozsekerli              Ugur Tozsekerli         NA
#> 11     Judit       Ludvai                 Judit Ludvai         NA

Now, the functions in this package won’t win any land-speed records since we’re going from R to C[++] to JavaScript and back, passing JSON-converted data back & forth, so I pwnd @quominus into making a full Rcpp-based human, full-name parser. And, he’s nearly done! So, keep an eye on [humaniformat](https://github.com/Ironholds/humaniformat) since it will no doubt be in CRAN soon.

The real point of this post is that there are _tons_ of JavaScript modules that will work well with the V8 package and let you get immediate functionality for something that might not be in R yet. You can prototype quickly (it took almost no time to make that package and you don’t even need to go that far), then optimize later. So, next time—if you can’t find some functionality directly in R—see if you can get by with a JavaScript shim, then convert to full R/Rcpp when/if you need to go into production.

If you’ve done any creative V8 hacks, drop a note in the comments!

2 Comments

  1. Perhaps a few things in this note can help?


One Trackback/Pingback

  1. […] article was first published on rud.is » R, and kindly contributed to […]

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.