

WebR 0.1.0 was released! I had been git-stalking George (the absolute genius who we all must thank for this) for a while and noticed the GH org and repos being updated earlier this week, so I was already pretty excited.

It dropped today, and you can hit that link for all the details and other links.

I threw together a small demo to show how to get it up and running without worrying about fancy “npm projects” and the like.

View-source on that link, or look below for a very small (so, hopefully accessible) example of how to start working with WASM-ified R in a web context.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>WebR Super Simple Demo</title>
  <link rel="stylesheet" href="/style.css" type="text/css">
  <style>
    li { font-family: monospace; }
    .nospace { margin-bottom: 2px; }
  </style>
</head>
<body>
  <div id="main">

    <p>Simple demo to show the basics of calling the new WebR WASM!!!!</p>
    <p><code>view-source</code> to see how the sausage is made</p>
    <p class="nospace">Input a number, press "Call R" (when it is enabled) and magic will happen.</p>

    <!-- We'll pull the value from here -->
    <input type="text" id="x" value="10">

    <!-- This button is disabled until WebR is loaded -->
    <button disabled="" id="callr">Call R</button>

    <!-- Output goes here -->
    <div id="output"></div>

    <!-- WebR is a module so you have to do this. -->
    <!-- NOTE: Many browsers will not like it if `.mjs` files are served -->
    <!-- with a content-type that isn't text/javascript -->
    <!-- Try renaming it from .mjs to .js if you hit that snag. -->
    <script type="module">
      // https://github.com/r-wasm/webr/releases/download/v0.1.0/webr-0.1.0.tar.gz
      //
      // I was lazy and just left it in one directory
      import { WebR } from '/webr-d3-demo/webr.mjs'; // service workers == full path starting with /

      const webR = new WebR(); // get ready to Rumble
      await webR.init();       // shots fired

      console.log("WebR"); // just for me b/c I don't trust anything anymore

      // we call this function on the button press
      async function callR() {

        let x = document.getElementById('x').value.trim(); // get the value we input; be better than me and do validation

        console.log(`x = ${x}`); // as noted, I don't trust anything

        let result = await webR.evalR(`rnorm(${x},5,1)`); // call some R!
        let output = await result.toArray();              // make the result something JS can work with

        document.getElementById("output").replaceChildren(); // clear out the <div> (this is ugly; be better than me)

        // d3 ops
        d3.select("#output").append("ul");

        const ul = d3.select("ul");

        ul.selectAll("li")
          .data(output)
          .enter()
          .append("li")
          .text(d => d);
      }

      // by the time we get here, WebR is ready, so we tell the button what to do and re-enable it
      document.getElementById('callr').onclick = callR;
      document.getElementById('callr').disabled = false;
    </script>

    <!-- d/l from D3 site or here if you trust me -->
    <script src="d3.min.js"></script>

  </div>
</body>
</html>
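If you just want to kick the tires locally (no nginx, no Node toolchain), any static file server that hands out `.mjs` files as `text/javascript` will do, and R itself has you covered. Here's a minimal sketch using the `servr` package; the `webr-d3-demo/` directory name is an assumption, so point it at wherever you actually unpacked `webr.mjs`, `d3.min.js`, and the HTML above:

library(servr)

# serve the demo directory at http://localhost:8000/
# NOTE: "webr-d3-demo" is a placeholder path; use the directory
# where you put webr.mjs + d3.min.js + the HTML above
servr::httd("webr-d3-demo", port = 8000)

If your server of choice sends the wrong content-type for `.mjs`, the rename-to-`.js` trick in the comments above applies.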

As I was putting together the [coord_proj](https://rud.is/b/2015/07/24/a-path-towards-easier-map-projection-machinations-with-ggplot2/) ggplot2 extension I had posted a [gist](https://gist.github.com/hrbrmstr/363e33f74e2972c93ca7) that I shared on Twitter. Said gist received a comment (several, in fact), and a bunch of us were painfully reminded of the fact that there is no built-in way to receive notifications of said comment activity.

@jennybryan posited that it could be possible to use IFTTT as a broker for these notifications, but after some checking that ended up not being directly doable since there are no “gist comment” triggers to act upon in IFTTT.

There are a few standalone Ruby gems that programmatically retrieve gist comments but I wasn’t interested in managing a Ruby workflow [ugh]. I did find a Heroku-hosted service – https://gh-rss.herokuapp.com/ – that will turn gist comments into an RSS/Atom feed (based on Ruby again). I gave it a shot and hooked it up to IFTTT but my feed is far enough down on the food chain there that it never gets updated. It was possible to deploy that app on my own Heroku instance, but—again—I’m not interested in managing a Ruby workflow.

The Ruby scripts pretty much:

– grab your main gist RSS/Atom feed
– visit each gist in the feed
– extract comments & comment metadata from them (if any)
– return a composite data structure you can do anything with

That’s super-easy to duplicate in R, so I decided to build a small R script that does all that and generates an RSS/Atom file which I added to my Feedly feeds (I’m pretty much always scanning RSS, so really didn’t need the IFTTT notification setup). I put it into a `cron` job that runs every hour. When Feedly refreshes the feed, a new entry will appear whenever there’s a new comment.

The script is below and [on github](https://gist.github.com/hrbrmstr/0ad1ced217edd137de27) (ironically as a gist). Here’s what you’ll grok from the code:

– one way to deal with the “default namespace” issue in R+XML
– one way to deal with error checking for scraping
– how to build an XML file (and, specifically, an RSS/Atom feed) with R
– how to escape XML entities with R
– how to get an XML object as a character string in R

You’ll definitely need to tweak this a bit for your own setup, but it should be a fairly complete starting point for you to work from. To see the output, grab the [generated feed](http://dds.ec/hrbrmstrgcfeed.xml).

# Roll your own GitHub Gist Comments Feed in R
 
library(xml2)    # github version
library(rvest)   # github version
library(stringr) # for str_trim & str_replace
library(dplyr)   # for data_frame & bind_rows
library(pbapply) # free progress bars for everyone!
library(XML)     # to build the RSS feed
 
who <- "hrbrmstr" # CHANGE ME!
 
# Grab the user's gist feed -----------------------------------------------
 
gist_feed <- sprintf("https://gist.github.com/%s.atom", who)
feed_pg <- read_xml(gist_feed)
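# Atom uses a default (unprefixed) namespace, which xml2 auto-names "d1";
# renaming it to "feed" is what makes the XPath expressions below readable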
ns <- xml_ns_rename(xml_ns(feed_pg), d1 = "feed")
 
# Extract the links & titles of the gists in the feed ---------------------
 
links <- xml_attr(xml_find_all(feed_pg, "//feed:entry/feed:link", ns), "href")
titles <- xml_text(xml_find_all(feed_pg, "//feed:entry/feed:title", ns))
 
#' This function does the hard part by iterating over the
#' links/titles and building a tbl_df of all the comments per-gist
get_comments <- function(links, titles) {
 
  bind_rows(pblapply(seq_along(links), function(i) {
 
    # get gist
 
    pg <- read_html(links[i])
 
    # look for comments
 
    ref <- tryCatch(html_attr(html_nodes(pg, "div.timeline-comment-wrapper a[href^='#gistcomment']"), "href"),
                    error=function(e) character(0))
 
    # in theory if 'ref' exists then the rest will
 
    if (length(ref) != 0) {
 
      # if there were comments, get all the metadata we care about
 
      author <- html_text(html_nodes(pg, "div.timeline-comment-wrapper a.author"))
      timestamp <- html_attr(html_nodes(pg, "div.timeline-comment-wrapper time"), "datetime")
      contentpg <- str_trim(html_text(html_nodes(pg, "div.timeline-comment-wrapper div.comment-body")))
 
    } else {
      ref <- author <- timestamp <- contentpg <- character(0)
    }
 
    # bind_rows ignores length 0 tbl_df's
    if (any(lengths(list(ref, author, timestamp, contentpg)) == 0)) {
      return(data_frame())
    }
 
    return(data_frame(title=titles[i], link=links[i],
                      ref=ref, author=author,
                      timestamp=timestamp, contentpg=contentpg))
 
  }))
 
}
 
comments <- get_comments(links, titles)
 
feed <- xmlTree("feed")
feed$addNode("id", sprintf("user:%s", who))
feed$addNode("title", sprintf("%s's gist comments", who))
feed$addNode("icon", "https://assets-cdn.github.com/favicon.ico")
feed$addNode("link", attrs=list(href=sprintf("https://github.com/%s", who)))
feed$addNode("updated", format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz="GMT"))
 
for (i in seq_len(nrow(comments))) {
 
  feed$addNode("entry", close=FALSE)
    feed$addNode("id", sprintf("gist:comment:%s:%s", who, comments$timestamp[i]))
    feed$addNode("link", attrs=list(href=sprintf("%s%s", comments$link[i], comments$ref[i])))
    feed$addNode("title", sprintf("Comment by %s", comments$author[i]))
    feed$addNode("updated", comments$timestamp[i])
    feed$addNode("author", close=FALSE)
      feed$addNode("name", comments$author[i])
    feed$closeTag()
    feed$addNode("content", saveXML(xmlTextNode(as.character(comments$contentpg[i])), prefix=""), 
                 attrs=list(type="html"))
  feed$closeTag()
 
}
 
rss <- str_replace(saveXML(feed), "<feed>", '<feed xmlns="http://www.w3.org/2005/Atom">')
 
writeLines(rss, con="feed.xml")

To get that RSS feed into something that an internet service can process you have to make sure that `feed.xml` is being written to a directory that translates to a publicly accessible web location (mine is at [http://dds.ec/hrbrmstrgcfeed.xml](http://dds.ec/hrbrmstrgcfeed.xml) if you want to see it).

On the internet-facing Ubuntu box that generated the feed I’ve got a `cron` entry:

30  * * * * /home/bob/bin/gengcfeed.R

which means it's going to check github for comment updates once an hour (at half past). Tune said parameters to your liking.
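If you actually want a 30-minute cadence, cron's step syntax handles that:

*/30 * * * * /home/bob/bin/gengcfeed.R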

At the top of `gengcfeed.R` I have an `Rscript` shebang:

#!/usr/bin/Rscript

and the execute bit is set on the file.
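Setting that bit is the standard one-liner:

chmod +x /home/bob/bin/gengcfeed.R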

Run the file by hand, first, and then test the feed via [https://validator.w3.org/feed/](https://validator.w3.org/feed/) to ensure it’s accessible and that it validates correctly. Now you can enter that feed URL into your favorite newsfeed reader (I use @feedly).
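Before (or alongside) the validator round trip, a quick local sanity check with `xml2` (which the script already loads) can save you some head-scratching. Just a sketch, and it assumes `feed.xml` is in the current directory:

library(xml2)

doc <- read_xml("feed.xml") # will throw an error if the XML is malformed
xml_name(doc)               # should be "feed"

# how many comment entries made it into the feed?
length(xml_find_all(doc, "//d1:entry", xml_ns(doc)))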

Thanks to a comment suggestion, the Rforecastio package is now up to version 1.3.0 and has a new parameter which lets you specify which time conversion function you want to use. Details are up on [github](https://github.com/hrbrmstr/Rforecastio).

Not even going to put an `R` category on this since I don’t want to pollute R-bloggers with this tiny post, but I had to provide the option to let folks specify `ssl.verifypeer=FALSE` (so I made it a generic option to pass in any cURL parameters), and I had a couple of gaping bugs that I missed due to not clearing out my environment before building & testing.

ThinkStats (by Allen B. Downey) is a good book to get you familiar with statistics (and even Python, if you’ve done some scripting in other languages).

I thought it would be interesting to present some of the examples & exercises in the book in R. Why? Well, once you’ve gone through the material in a particular chapter the “hard way”, seeing how you’d do the same thing in a language specifically designed for statistical computing should show when it’s best to use such a domain specific language and when you might want to consider a hybrid approach. I am also hoping it helps make R a bit more accessible to folks.

You’ll still need the book and should work through the Python examples to get the most out of these posts.

I’ll try to get at least one example/exercise section up a week.

Please submit all errors, omissions or optimizations in the comments section.

The star of the show in most of the examples (including this one) is the “data frame”. Unlike the Python code in the book, most of the hard work here is figuring out how to get R’s fixed-width file reader (`read.fwf()`) to parse the ugly fields in the CDC data file. By using some tricks, we can approximate the “field start:length” style of the Python code while keeping R’s automatic reading/parsing (including implicit handling of “NA” values).
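The core trick is that `read.fwf()` treats a negative width as “skip this many characters”, so interleaving real field widths with negative gap widths approximates the start:length addressing. A tiny toy illustration (made-up record, nothing to do with the CDC file):

# field one in columns 1-2, a junk column to skip, field two in columns 4-5
read.fwf(textConnection("AB.CD"), widths = c(2, -1, 2))
##   V1 V2
## 1 AB CD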

The power & simplicity of using R’s inherent ability to apply a calculation across a whole column (`pregnancies$agepreg <- pregnancies$agepreg / 100`) should jump out. Unfortunately, not all elements of the examples in R will be as nice or straightforward.

You'll also notice that I cheat and use `str()` for displaying summary data.

Enough explanation! Here's the code:

# ThinkStats in R by @hrbrmstr
# Example 1.2
# File format info: http://www.cdc.gov/nchs/nsfg/nsfg_cycle6.htm
 
# set up a data frame that has the field start/end info
 
pFields <- data.frame(name  = c('caseid', 'nbrnaliv', 'babysex', 'birthwgt_lb', 'birthwgt_oz', 'prglength', 'outcome', 'birthord', 'agepreg', 'finalwgt'), 
                      begin = c(1, 22, 56, 57, 59, 275, 277, 278, 284, 423), 
                      end   = c(12, 22, 56, 58, 60, 276, 277, 279, 287, 440) 
) 
 
# calculate widths so we can pass them to read.fwf()
 
pFields$width <- pFields$end - pFields$begin + 1 
 
# we aren't reading every field (for the book exercises)
 
pFields$skip <- (-c(pFields$begin[-1]-pFields$end[-nrow(pFields)]-1, 0)) 
 
widths <- c(t(pFields[,4:5])) 
widths <- widths[widths != 0] 
 
# read in the file
 
pregnancies <- read.fwf("2002FemPreg.dat", widths) 
 
# assign column names
 
names(pregnancies) <- pFields$name 
 
# divide mother's age by 100
 
pregnancies$agepreg <- pregnancies$agepreg / 100
 
# convert weight at birth from lbs/oz to total ounces
 
pregnancies$totalwgt_oz <- pregnancies$birthwgt_lb * 16 + pregnancies$birthwgt_oz
 
rFields <- data.frame(name  = c('caseid'), 
                      begin = c(1), 
                      end   = c(12) 
) 
 
rFields$width <- rFields$end - rFields$begin + 1 
rFields$skip <- (-c(rFields$begin[-1]-rFields$end[-nrow(rFields)]-1, 0)) 
 
widths <- c(t(rFields[,4:5])) 
widths <- widths[widths != 0] 
 
respondents <- read.fwf("2002FemResp.dat", widths) 
names(respondents) <- rFields$name
 
str(respondents)
str(pregnancies)