Skip navigation

Author Archives: hrbrmstr

Don't look at me…I do what he does — just slower. #rstats avuncular • ?Resistance Fighter • Cook • Christian • [Master] Chef des Données de Sécurité @ @rapid7

UPDATE: `docxtractr` is now [on CRAN](https://cran.rstudio.com/web/packages/docxtractr/index.html)

———————

This is more of a follow-up from [yesterday’s post](http://rud.is/b/2015/08/23/using-r-to-get-data-out-of-word-docs/). The hack and function in said post was fine, but it was limited to uniform tables and made you do more work than you had to. So, there’s now a `devtools`-installable package [on github](https://github.com/hrbrmstr/docxtractr) that makes it way easier to get information about the tables in a Word document and extract them—uniform or not.

There are plenty of examples in the GitHub README and also in the package examples. But, I will show the basic functionality here.

The package ships with four example Word documents, but we’ll work with the last one: `complex.doc`. It has five tables and the last two have varying columns and rows and look like:

complex

Let’s read those two in:

complx <- read_docx(system.file("examples/complex.docx", package="docxtractr"))

docx_tbl_count(complx)
#> [1] 5

docx_describe_tbls(complx)
#> Word document [/Library/Frameworks/R.framework/Versions/3.2/Resources/library/docxtractr/examples/complex.docx]
#> 
#> Table 1
#>   total cells: 16
#>   row count  : 4
#>   uniform    : likely!
#>   has header : likely! => possibly [This, Is, A, Column]
#> 
#> Table 2
#>   total cells: 12
#>   row count  : 4
#>   uniform    : likely!
#>   has header : likely! => possibly [Foo, Bar, Baz]
#> 
#> Table 3
#>   total cells: 14
#>   row count  : 7
#>   uniform    : likely!
#>   has header : likely! => possibly [Foo, Bar]
#> 
#> Table 4
#>   total cells: 11
#>   row count  : 4
#>   uniform    : unlikely => found differing cell counts (3, 2) across some rows 
#>   has header : likely! => possibly [Foo, Bar, Baz]
#> 
#> Table 5
#>   total cells: 21
#>   row count  : 7
#>   uniform    : likely!
#>   has header : unlikely


docx_extract_tbl(complx, 4, header=TRUE)
#> Source: local data frame [3 x 3]
#> 
#>   Foo  Bar Baz
#> 1  Aa BbCc  NA
#> 2  Dd   Ee  Ff
#> 3  Gg   Hh  ii

docx_extract_tbl(complx, 5, header=TRUE)
#> Source: local data frame [6 x 3]
#> 
#>    Foo Bar Baz
#> 1   Aa  Bb  Cc
#> 2   Dd  Ee  Ff
#> 3   Gg  Hh  Ii
#> 4 Jj88  Kk  Ll
#> 5       Uu  Ii
#> 6   Hh  Ii   h

It reads in “uniform” tables properly and will warn you if there is a header marked in Word but not asked for in the extraction.

Next steps are to both allow specifying column types and try to guess column types (`readr` has some nice functions for this) and perhaps return more metadata (if possible).

Feature requests & bug reports are most welcome [on GitHub](https://github.com/hrbrmstr/docxtractr/issues).

NOTE: after reading this post head on over to this new one as it has wrapped this functionality (and more!) into a package.

Also: docxtractr is now on CRAN


This was asked on twitter recently:

The answer is a very cautious “yes”. Much depends on how well-formed and un-formatted the table is.

Take this really simple docx file: data.docx.

It has a single table in it:

data_docx

Now, .docx files are just zipped directories, so rename that to data.zip, unzip it and navigate to data/word/document.xml and you’ll see something like this (though it’ll be more compressed):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mo="http://schemas.microsoft.com/office/mac/office/2008/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:mv="urn:schemas-microsoft-com:mac:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14">
<w:body>
    <w:tbl>
        <w:tblPr>
            <w:tblStyle w:val="TableGrid"/>
            <w:tblW w:w="0" w:type="auto"/>
            <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
        </w:tblPr>
        <w:tblGrid>
            <w:gridCol w:w="2337"/>
            <w:gridCol w:w="2337"/>
            <w:gridCol w:w="2338"/>
            <w:gridCol w:w="2338"/>
        </w:tblGrid>
        <w:tr w:rsidR="00244D8A" w14:paraId="6808A6FE" w14:textId="77777777" w:rsidTr="00244D8A">
            <w:tc>
                <w:tcPr>
                    <w:tcW w:w="2337" w:type="dxa"/>
                </w:tcPr>
                <w:p w14:paraId="7D006905" w14:textId="77777777" w:rsidR="00244D8A" w:rsidRDefault="00244D8A">
                    <w:r>
                        <w:t>This</w:t>
                    </w:r>
                </w:p>
            </w:tc>
            <w:tc>
                <w:tcPr>
                    <w:tcW w:w="2337" w:type="dxa"/>
                </w:tcPr>
                <w:p w14:paraId="13C9E52C" w14:textId="77777777" w:rsidR="00244D8A" w:rsidRDefault="00244D8A">
                    <w:r>
                        <w:t>Is</w:t>
                    </w:r>
                </w:p>
            </w:tc>
...

We can easily make out a table structure with rows and columns. In the simplest cases (which is all I’ll cover in this post) where the rows and columns are uniform it’s pretty easy to grab the data:

library(xml2)

# read in the XML file
doc <- read_xml("data/word/document.xml")

# there is an egregious use of namespaces in these files
ns <- xml_ns(doc)

# extract all the table cells (this is assuming one table in the document)
cells <- xml_find_all(doc, ".//w:tbl/w:tr/w:tc", ns=ns)

# convert the cells to a matrix then to a data.frame)
dat <- data.frame(matrix(xml_text(cells), ncol=4, byrow=TRUE), 
                  stringsAsFactors=FALSE)

# if there are column headers, make them the column name and remove that line
colnames(dat) <- dat[1,]
dat <- dat[-1,]
rownames(dat) <- NULL

dat

##   This      Is     A   Column
## 1    1     Cat   3.4      Dog
## 2    3    Fish 100.3     Bird
## 3    5 Pelican   -99 Kangaroo

You’ll need to clean up the column types, but you have at least freed the data from the evil file format it was in.

If there is more than one table you can use XML node targeting to process each one separately or into a list. I’ve wrapped that functionality into a rudimentary function that will:

  • auto-copy a Word doc to a temporary location
  • rename it to a zip
  • unzip it to a temporary location
  • read in the document.xml
  • auto-determine the number of tables in the document
  • auto-calculate # rows & # columns per table
  • convert each table
  • return all the tables into a list
  • clean up the temporarily created items
library(xml2)

get_tbls <- function(word_doc) {
  
  tmpd <- tempdir()
  tmpf <- tempfile(tmpdir=tmpd, fileext=".zip")
  
  file.copy(word_doc, tmpf)
  unzip(tmpf, exdir=sprintf("%s/docdata", tmpd))
  
  doc <- read_xml(sprintf("%s/docdata/word/document.xml", tmpd))
  
  unlink(tmpf)
  unlink(sprintf("%s/docdata", tmpd), recursive=TRUE)

  ns <- xml_ns(doc)
  
  tbls <- xml_find_all(doc, ".//w:tbl", ns=ns)
  
  lapply(tbls, function(tbl) {
    
    cells <- xml_find_all(tbl, "./w:tr/w:tc", ns=ns)
    rows <- xml_find_all(tbl, "./w:tr", ns=ns)
    dat <- data.frame(matrix(xml_text(cells), 
                             ncol=(length(cells)/length(rows)), 
                             byrow=TRUE), 
                      stringsAsFactors=FALSE)
    colnames(dat) <- dat[1,]
    dat <- dat[-1,]
    rownames(dat) <- NULL
    dat
    
  })
  
}

Using this multi-table Word doc – doc3:

data3

we can extract the three tables thusly:

get_tbls("~/Dropbox/data3.docx")

## [[1]]
##   This      Is     A   Column
## 1    1     Cat   3.4      Dog
## 2    3    Fish 100.3     Bird
## 3    5 Pelican   -99 Kangaroo
## 
## [[2]]
##   Foo Bar Baz
## 1  Aa  Bb  Cc
## 2  Dd  Ee  Ff
## 3  Gg  Hh  ii
## 
## [[3]]
##   Foo Bar
## 1  Aa  Bb
## 2  Dd  Ee
## 3  Gg  Hh
## 4  1    2
## 5  Zz  Jj
## 6  Tt  ii

This function tries to calculate the rows/columns per table but it does rely on a uniform table structure.

Have an alternate method or more feature-complete way of handling Word docs as tabular data sources? Then definitely drop a note in the comments.

An R user recently had the need to split a “full, human name” into component parts to retrieve first & last names. The full names could be anything from something simple like _”David Regan”_ to more complex & diverse such as _”John Smith Jr.”_, _”Izaque Iuzuru Nagata”_ or _”Christian Schmit de la Breli”_. Despite the face that I’m _pretty good_ at searching GitHub & CRAN for R stuff, my quest came up empty (though a teensy part of me swears I saw this type of thing in a package somewhere). I _did_ manage to find Python & node.js modules that carved up human names but really didn’t have the time to re-implement their functionality from scratch in R (or, preferably, Rcpp).

Rather than rely on the Python bridge to R (yuck) I decided to use @opencpu’s [V8 package](https://cran.rstudio.com/web/packages/V8/index.html) to wrap a part of the node.js [humanparser](https://github.com/chovy/humanparser) module. If you’re not familiar with V8, it provides the ability to run JavaScript code within R and makes it possible to pass variables into JavaScript functions and get data back in return. All the magic happens via a JSON data passing & Rcpp wrappers (and, of course, the super-awesome code Jeroen writes).

Working with JavaScript in R is as simple as creating an instance of the JavaScript V8 interpreter, loading up the JavaScript code that makes the functions work:

library(V8)
 
ct <- new_context()
ct$source(system.file("js/underscore.js", package="V8"))
ct$call("_.filter", mtcars, JS("function(x){return x.mpg < 15}"))
 
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#> Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4
#> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4

There are many more examples in the [V8 vignette](https://cran.rstudio.com/web/packages/V8/vignettes/v8_intro.html).

For `humanparser` I needed to use Underscore.js (it comes with V8) and a [function](https://github.com/chovy/humanparser/blob/master/index.js#L5-L74) from `humanparser` that I carved out to work the way I wanted it to. You can look at the innards of the package [on github](https://github.com/hrbrmstr/humanparser)—specifically, [this file](https://github.com/hrbrmstr/humanparser/blob/master/R/humanparser.r) (it’s _really_ small)— and, to use the two new functions the package exposes it’s as simple as doing:

devtools::install_github("hrbrmstr/humanparser")
 
library(humanparser)
 
parse_name("John Smith Jr.")
 
#> $firstName
#> [1] "John"
#> 
#> $suffix
#> [1] "Jr."
#> 
#> $lastName
#> [1] "Smith"
#> 
#> $fullName
#> [1] "John Smith Jr."

or the following to convert a bunch of ’em:

full_names <- c("David Regan", "Izaque Iuzuru Nagata", 
                "Christian Schmit de la Breli", "Peter Doyle", "Hans R.Bruetsch", 
                "Marcus Reichel", "Per-Axel Koch", "Louis Van der Walt", 
                "Mario Adamek", "Ugur Tozsekerli", "Judit Ludvai" )
 
parse_names(full_names)
 
#> Source: local data frame [11 x 4]
#> 
#>    firstName     lastName                     fullName middleName
#> 1      David        Regan                  David Regan         NA
#> 2     Izaque       Nagata         Izaque Iuzuru Nagata     Iuzuru
#> 3  Christian  de la Breli Christian Schmit de la Breli     Schmit
#> 4      Peter        Doyle                  Peter Doyle         NA
#> 5       Hans   R.Bruetsch              Hans R.Bruetsch         NA
#> 6     Marcus      Reichel               Marcus Reichel         NA
#> 7   Per-Axel         Koch                Per-Axel Koch         NA
#> 8      Louis Van der Walt           Louis Van der Walt         NA
#> 9      Mario       Adamek                 Mario Adamek         NA
#> 10      Ugur   Tozsekerli              Ugur Tozsekerli         NA
#> 11     Judit       Ludvai                 Judit Ludvai         NA

Now, the functions in this package won’t win any land-speed records since we’re going from R to C[++] to JavaScript and back, passing JSON-converted data back & forth, so I pwnd @quominus into making a full Rcpp-based human, full-name parser. And, he’s nearly done! So, keep an eye on [humaniformat](https://github.com/Ironholds/humaniformat) since it will no doubt be in CRAN soon.

The real point of this post is that there are _tons_ of JavaScript modules that will work well with the V8 package and let you get immediate functionality for something that might not be in R yet. You can prototype quickly (it took almost no time to make that package and you don’t even need to go that far), then optimize later. So, next time—if you can’t find some functionality directly in R—see if you can get by with a JavaScript shim, then convert to full R/Rcpp when/if you need to go into production.

If you’ve done any creative V8 hacks, drop a note in the comments!

poster image

Danny became the [first hurricane of the 2015 Season](http://www.accuweather.com/en/weather-news/atlantic-gives-birth-to-tropical-depression-four-danny/51857239), so it’s a good time to revisit how one might be able to track them with R.

We’ll pull track data from [Unisys](http://weather.unisys.com/hurricane/atlantic/2015/index.php) and just look at Danny, but it should be easy to extrapolate from the code.

For this visualization, we’ll use [leaflet](http://rstudio.github.io/leaflet/) since it’s all the rage and makes the plots interactive without any real work (thanks to the very real work by the HTML Widgets folks and the Leaflet.JS folks).

Let’s get the library calls out of the way:

library(leaflet)
library(stringi)
library(htmltools)
library(RColorBrewer)

Now, we’ll get the tracks:

danny <- readLines("http://weather.unisys.com/hurricane/atlantic/2015/DANNY/track.dat")

Why aren’t we using `read.csv` or `read.table` directly, you ask? Well, the data is in a _really_ ugly format thanks to the spaces in the `STATUS` column and two prefix lines:

Date: 18-20 AUG 2015
Hurricane-1 DANNY
ADV  LAT    LON      TIME     WIND  PR  STAT
  1  10.60  -36.50 08/18/15Z   30  1009 TROPICAL DEPRESSION
  2  10.90  -37.50 08/18/21Z    -     - TROPICAL DEPRESSION
  3  11.20  -38.80 08/19/03Z    -     - TROPICAL DEPRESSION
  4  11.30  -40.20 08/19/09Z    -     - TROPICAL DEPRESSION
  5  11.20  -41.10 08/19/15Z    -     - TROPICAL DEPRESSION
  6  11.50  -42.00 08/19/21Z    -     - TROPICAL DEPRESSION
  7  12.10  -42.70 08/20/03Z    -     - TROPICAL DEPRESSION
  8  12.20  -43.70 08/20/09Z    -     - TROPICAL DEPRESSION
  9  12.50  -44.80 08/20/15Z    -     - TROPICAL DEPRESSION
+12  13.10  -46.00 08/21/00Z   70     - HURRICANE-1
+24  14.00  -47.60 08/21/12Z   75     - HURRICANE-1
+36  14.70  -49.40 08/22/00Z   75     - HURRICANE-1
+48  15.20  -51.50 08/22/12Z   70     - HURRICANE-1
+72  16.00  -56.40 08/23/12Z   65     - HURRICANE-1
+96  16.90  -61.70 08/24/12Z   65     - HURRICANE-1
+120  18.00  -66.60 08/25/12Z   55     - TROPICAL STORM

But, we can put that into shape pretty easily, using `gsub` to make it easier to read everything with `read.table` and we just skip over the first two lines (we’d use them if we were doing other things with more of the data).

danny_dat <- read.table(textConnection(gsub("TROPICAL ", "TROPICAL_", danny[3:length(danny)])), 
           header=TRUE, stringsAsFactors=FALSE)

Now, let’s make the data a bit prettier to work with:

# make storm type names prettier
danny_dat$STAT <- stri_trans_totitle(gsub("_", " ", danny_dat$STAT))
 
# make column names prettier
colnames(danny_dat) <- c("advisory", "lat", "lon", "time", "wind_speed", "pressure", "status")

Those steps weren’t absolutely necessary, but why do something half-baked (unless it’s chocolate chip cookies)?

Let’s pick better colors than Unisys did. We’ll use a color-blind safe palette from Color Brewer:

danny_dat$color <- as.character(factor(danny_dat$status, 
                          levels=c("Tropical Depression", "Tropical Storm",
                                   "Hurricane-1", "Hurricane-2", "Hurricane-3",
                                   "Hurricane-4", "Hurricane-5"),
                          labels=rev(brewer.pal(7, "YlOrBr"))))

And, now for the map! We’ll make lines for the path that was already traced by Danny, then make interactive points for the forecast locations from the advisory data:

last_advisory <- tail(which(grepl("^[[:digit:]]+$", danny_dat$advisory)), 1)
 
# draw the map
leaflet() %>% 
  addTiles() %>% 
  addPolylines(data=danny_dat[1:last_advisory,], ~lon, ~lat, color=~color) -> tmp_map
 
if (last_advisory < nrow(danny_dat)) {
 
   tmp_map <- tmp_map %>% 
     addCircles(data=danny_dat[last_advisory:nrow(danny_dat),], ~lon, ~lat, color=~color, fill=~color, radius=25000,
             popup=~sprintf("<b>Advisory forecast for +%sh (%s)</b><hr noshade size='1'/>
                           Position: %3.2f, %3.2f<br/>
                           Expected strength: <span style='color:%s'><strong>%s</strong></span><br/>
                           Forecast wind: %s (knots)<br/>Forecast pressure: %s",
                           htmlEscape(advisory), htmlEscape(time), htmlEscape(lon),
                           htmlEscape(lat), htmlEscape(color), htmlEscape(status), 
                           htmlEscape(wind_speed), htmlEscape(pressure)))
}
 
html_print(tmp_map)

Click on one of the circles to see the popup.

The entire source code is in [this gist](https://gist.github.com/hrbrmstr/e3253ddd353f1a489bb4) and, provided you have the proper packages installed, you can run this at any time with:

devtools::source_gist("e3253ddd353f1a489bb4", sha1="00074e03e92c48c470dc182f67c91ccac612107e")

The use of the `sha1` hash parameter will help ensure you aren’t being asked to run a potentially modified & harmful gist, but you should visit the gist first to make sure I’m not messing with you (which, I’m not).

If you riff off of this or have suggestions for improvement, drop a note here or in the gist comments.

I like to turn coincidence into convergence whenever possible. This weekend, a user of [cdcfluview](http://github.com/hrbrmstr/cdcfluview) had a question that caused me to notice a difference in behaviour between the package was interacting with CDC FluView API, so I updated the package to accommodate the change and the user.

Around the same time, @recology_ tweeted:

Finally, the [2015-2016 flu season](http://www.cdc.gov/flu/about/season/flu-season-2015-2016.htm) is also fast approaching (so a three-fer!), making a CRAN leap for `cdcfluview` quite timely.

Since I can’t let @quonimous have all the `#rstats` fun & glory, I added a function and two data sets to `cdcfluview`, did the CRAN dance and it’s now [on CRAN](https://cran.rstudio.com/web/packages/cdcfluview/). I also did the github dance to have it’s entry in the [Web Technologies Task View](https://cran.rstudio.com/web/views/WebTechnologies.html) updated.

The new function lets you grab the XML behind the high-level [Weekly US Map: Influenza Summary Update](http://www.cdc.gov/flu/weekly/usmap.htm) (I’ll be adding a function to make a similar plot that won’t require gosh-awful Flash) and the data sets provide metadata about the composition of HHS regions and Census regions, making it easier to compose factors, add labels to maps or even segment maps/combine polygons. The existing two grab current & historical detailed national & state influenza data.

There’s an example on github and in the

(https://rud.is/b/2015/01/10/new-r-package-cdcfluview-retrieve-flu-data-from-cdcs-fluview-portal/).

If you have any other data you need freed from the confines of the CDC FluView Flash portal, please file an issue & paste a screen shot (if you are comfortable with most browser Developer Tools views, even a dump of the request or “as cURL” URL would be awesome).

Riffing off of [the previous post](http://rud.is/b/2015/08/05/speeding-up-your-quests-for-r-stuff/), here’s a way to quickly search CRAN (the @RStudio flavor) from the Chrome search bar.

– Paste `chrome://settings/searchEngines` into your location bar and hit return/enter
– Scroll down until the input boxes show, enabling you to add a search engine
– For _”Add a new search engine”_ put “`CRAN`”
– For _”Keyword”_ put “`R`”, “`rstats`” or “`CRAN`”, but “`R`” is super easy to type, though it may not be optimal for you :-)
– For _”URL with %s in place of query”_ put the following:

https://www.google.com/search?as_q=%s&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=cran.rstudio.com&as_occt=any&safe=images&as_filetype=&as_rights=&gws_rd=ssl

(you may be able to trim that URL a bit, if desired)

Save that and then in the Chrome location bar hit `R` then `[TAB]` and you’ll be sending the query to a custom Google search that only looks on CRAN (specifically the @RStudio CRAN mirror).

I use Google quite a bit when conjuring up R projects, whether it be in a lazy pursuit of a PDF vignette or to find a package or function to fit a niche need. Inevitably, I’ll do something like [this](https://www.google.com/#q=cran+shapefile) (yeah, I’m still on a mapping kick) and the first (and best) results will come back with `https://cran.r-project.org/`-prefixed URLs. If all this works, what’s the problem? Well, the main CRAN site is, without mincing words, _slow_ much of the time. The switch to `https` on it (and it’s mostly-academic mirrors) has introduced noticeable delays.

Now, these aren’t productivity-crushing delays, but (a) why wait if you don’t have to; and, (b) why not spread the load to a whole [server farm](http://cran.rstudio.com/) dedicated to ensuring fast delivery of content? I was going to write a Chrome extension specifically for this, but I kinda figured this was a solved problem, and it is!

From the plethora of options in the Chrome Store, I grabbed [Switcheroo Redirector](https://chrome.google.com/webstore/detail/switcheroo-redirector/cnmciclhnghalnpfhhleggldniplelbg?hl=en) because (a) it has a decent user base and rating; (b) it’s not super-complex to use; and, (c) the source is [on github](https://github.com/ranjez/Switcheroo) and closely matches what’s in the actual installed extension (some extensions are tricksy/evil and you can even build your own with the source vs trust the Chrome Store one).

So, go install it and come back. We’ll wait.

OK, you back? Good. Let’s continue. You should have a Switcheroo icon near your location bar. Select it and you should see a popup like this:

Fullscreen_8_5_15__9_10_PM

I’ve already made the entry, but you just need to tell the app to substitute all URL occurrences of `cran.r-project.org` with `cran.rstudio.com` when Chrome is trying to load a URL.

Now, when you click one of those links in the above example, it will go (speedily!) to the RStudio CRAN mirror server farm.

Once nice (to security freaks like me) feature is that if you have one of the Switcheroo links open in a new tab (i.e. not directly/immediately visible to you) it will let you know that something is happening out of the ordinary:

Redirect_Notice

This is a tiny (and good) price to pay to know you’re not being whacked by a bad plugin.

If you have another preference (or have suggestions for Safari or Firefox) please drop a note in the comments so others can benefit from your experience!

There was some chatter on the twitters this week about a relatively new D3-based charting library called [TauCharts](http://taucharts.com/) (also @taucharts). The API looked pretty clean and robust, so I started working on an htmlwidget for it and was quickly joined by the Widget Master himself, @timelyportfolio.

TauCharts definitely has a “grammar of graphics” feel about it and the default aesthetics are super-nifty While the developers are actively adding new features and “geoms”, the core points (think scatterplot), lines and bars (including horizontal bars!) geoms are quite robust and definitely ready for your dashboards.

Between the two of us, we have a _substantial_ part of the [charting library API](http://api.taucharts.com/) covered. I think the only major thing left unimplemented is composite charts (i.e. lines + bars + points on the same chart) and some minor tweaks around the edges.

While you can find it on [github](http://github.com/hrbrmstr/taucharts) and do the normal:

devtools::install_github("hrbrmstr/taucharts")

or, even use the official initial release version:

devtools::install_github("hrbrmstr/taucharts@v0.1.0")

I’ll use the `dev` version:

devtools::install_github("hrbrmstr/taucharts@dev"

for the example below, mostly since it includes the data set I want to use to mimic the current, featured example on the [TauCharts homepage](http://taucharts.com/) and also has full documentation with examples.

Here’s all it takes to make a faceted scatterplot with:

– interactive tooltips
– interactive legend
– custom-selectable trendline annotation:

devtools::install_github("hrbrmstr/taucharts@dev")
 
library(taucharts)
 
data(cars_data)
 
tauchart(cars_data) %>% 
  tau_point("milespergallon", c("class", "price"), color="class") %>% 
  tau_guide_padding(bottom=300) %>% 
  tau_legend() %>% 
  tau_trendline() %>% 
  tau_tooltip(c("vehicle", "year", "class", "price", "milespergallon"))


Hybrid cars fuel economy by price and class
It seems expensive cars are less efficient.

There are _tons_ more examples in the [TauCharts RPub](http://rpubs.com/hrbrmstr/taucharts) (and soon-to-be vignette) and @timelyportfolio will be featuring it in his weekly [widget update](http://www.buildingwidgets.com/).