A Step to the Right in R Assignments

I received an out-of-band question on the use of `%<>%` in my [CDC FluView](rud.is/b/2015/01/10/new-r-package-cdcfluview-retrieve-flu-data-from-cdcs-fluview-portal/) post, and took the opportunity to address it in a broader, public fashion.

Anyone using R knows that the two most common methods of assignment are the venerable (and sensible) left arrow `<-` and its lesser cousin `=`. `<-` has an evil sibling, `<<-`, which is used when you want (or need) R to search through parent environments for an existing definition of the variable being assigned (up to the global environment, where the variable is created if no existing definition is found).

Since the introduction of the "piping idiom" (`%>%`), made popular by `magrittr`, `dplyr`, `ggvis` and other packages, I have struggled with the use of `<-` in pipes. Because pipes move data in a virtual forward motion, that LHS (left-hand side) assignment feels awkward. Furthermore, you are often piping from an object with the intent of replacing that object’s contents. For example:

```r
library(magrittr)

iris$Sepal.Length <-
  iris$Sepal.Length %>%
  sqrt
```

(which is from the `magrittr` documentation).

To avoid the repetition of the left-hand side immediately after the assignment operator, Bache & Wickham came up with the `%<>%` operator, which shortens the above to:

```r
iris$Sepal.Length %<>% sqrt
```
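As a quick sanity check (a sketch using a made-up vector `x`, not an example from the `magrittr` docs), the compound pipe produces the same result as piping and then reassigning by hand:

```r
library(magrittr)

# Hypothetical toy vector, used only to illustrate the operator.
x <- c(1, 4, 9)
y <- x

# Long form: pipe `y` through sqrt() and write the result back into `y`.
y <- y %>% sqrt

# Compound assignment pipe: same effect, without repeating `x` on the left.
x %<>% sqrt

stopifnot(identical(x, y))  # both now hold c(1, 2, 3)
```

The `magrittr` documentation notes that `%<>%` has to be the first pipe in a chain; any steps after it use the ordinary `%>%`.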

Try as I might (including in the CDC FluView blog post), that way of assigning variables still _feels_ awkward, and it is definitely confusing to new R users. But what’s the alternative? I believe it’s R’s infrequently used `->` RHS assignment operator.
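For anyone who has never used it: `->` is not a pipe-specific feature but plain assignment with the arrow reversed, and `->>` is the rightward counterpart of `<<-`. The sketch below (the names `three`, `counter` and `bump` are made up for illustration) shows both, along with the fact that R’s parser turns a right assignment into the equivalent left assignment:

```r
# Right assignment: evaluate the left side, store it in the name on the right.
1 + 2 -> three
stopifnot(identical(three, 3))

# The parser rewrites `value -> target` as `target <- value`,
# so the two spellings produce identical calls.
stopifnot(identical(quote(value -> target), quote(target <- value)))

# `->>` behaves like `<<-`: it updates `counter` in the enclosing
# environment instead of creating a local copy inside bump().
counter <- 0
bump <- function() {
  counter + 1 ->> counter
}
bump()
bump()
stopifnot(counter == 2)
```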

Let’s look at that in the context of the somewhat-long pipe in the CDC FluView example:

```r
library(dplyr)

# `dat` is the FluView data frame retrieved in the CDC FluView post
dat %>%
  mutate(REGION=factor(REGION,
                       levels=unique(REGION),
                       labels=c("Boston", "New York",
                                "Philadelphia", "Atlanta",
                                "Chicago", "Dallas",
                                "Kansas City", "Denver",
                                "San Francisco", "Seattle"),
                       ordered=TRUE)) %>%
  mutate(season_week=ifelse(WEEK>=40, WEEK-40, WEEK),
         season=ifelse(WEEK<40,
                       sprintf("%d-%d", YEAR-1, YEAR),
                       sprintf("%d-%d", YEAR, YEAR+1))) -> dat
```
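The CDC data isn’t reproduced here, so this sketch runs just the `season_week`/`season` step of that pipe on a tiny invented data frame (hypothetical values, not FluView data) to show the `-> dat` ending in action. It assumes `dplyr` is installed:

```r
library(dplyr)

# Hypothetical stand-in for the FluView data: three weeks around the
# week-40 season boundary (values invented for illustration).
dat <- data.frame(YEAR = c(2014L, 2014L, 2015L),
                  WEEK = c(40L, 52L, 2L))

dat %>%
  mutate(season_week = ifelse(WEEK >= 40, WEEK - 40, WEEK),
         season = ifelse(WEEK < 40,
                         sprintf("%d-%d", YEAR - 1L, YEAR),
                         sprintf("%d-%d", YEAR, YEAR + 1L))) -> dat

# `dat` now holds the transformed frame, including the two new columns.
dat
```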

That pipe flow says _“take `dat`, change up some columns, make some new columns and reassign into `dat`”_. It’s a very natural flow and reads well, too, since you’re following a process to its final destination. It’s even more natural in pipes that actually transform the data into something else. For example, to get a vector of the number of US male births since 1880, we’d do:

```r
library(magrittr)
library(rvest)

# The original post used html(); later rvest releases deprecated it
# in favour of read_html().
births <- read_html("http://www.ssa.gov/oact/babynames/numberUSbirths.html")

births %>%
  html_nodes("table") %>%
  extract2(2) %>%
  html_table %>%
  use_series(Male) %>%
  gsub(",", "", .) %>%
  as.numeric -> males
```
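One detail makes this style work: in R’s operator precedence, `%>%` (like any `%op%` operator) binds more tightly than `->`, so the entire chain is evaluated before anything is assigned. A minimal sketch with a made-up numeric vector:

```r
library(magrittr)

# The whole chain runs first; only the final value lands in `total`.
c(4, 9, 16) %>% sqrt %>% sum -> total
stopifnot(total == 9)  # sqrt gives 2, 3, 4; their sum is 9

# The same grouping, written out explicitly with parentheses.
(c(4, 9, 16) %>% sqrt %>% sum) -> total2
stopifnot(identical(total, total2))
```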

That’s very readable (one of the benefits of pipes), and the flow, again, makes sense. Compare that to its base R counterpart:

```r
males <- as.numeric(gsub(",", "", html_table(html_nodes(births, "table")[[2]])$Male))
```
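To check that the two styles compute the same thing without hitting the SSA site, this sketch swaps in a small hypothetical data frame (`tbl`, with invented values) for the scraped table and runs both versions side by side:

```r
library(magrittr)

# Hypothetical stand-in for the scraped SSA table (values invented).
tbl <- data.frame(Male = c("1,000", "25,000"), stringsAsFactors = FALSE)

# Nested base R form, read inside-out.
males_nested <- as.numeric(gsub(",", "", tbl$Male))

# Piped form, read top-to-bottom, ending in a right assignment.
tbl %>%
  use_series(Male) %>%
  gsub(",", "", .) %>%
  as.numeric -> males_piped

stopifnot(identical(males_nested, males_piped))
```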

The base R version is short, and the LHS assignment fits well since the values “pop out” of the function calls. But it’s also only readable at a quick glance to veteran R folks. Since code needs to be readable, maintainable and (oftentimes) shared with folks on a team, I believe pipes help increase overall productivity and aid in documenting what that portion of an analysis is trying to achieve (especially when combined with `dplyr` idioms).

Pipes are here to stay, and they are definitely a part of my data analysis workflows. Moving forward, RHS (`->`) assignments at the end of my pipes will be, too.


13 Comments on “A Step to the Right in R Assignments”

  1. andydolman (@andydolman)

    Yes! I’ve kind of got the hang of it now, but when I was first trying to read pipes, the switch in direction made them really unintuitive. %<>% only solves the case where you want to alter the input; -> is much more general.

  2. Eike Petersen

    Interesting, good point that you make! I also wasn’t too happy with the currently practiced solutions, and I had no idea there is a RHS assignment operator either… maybe I’m going to give it a try!

  3. Jamie

    Glad I’m not the only one who’s faced this dilemma! Using the backward assignment operator has become my main solution as well.

    It might be nice for dplyr or magrittr to add an enhanced assign function (assign_to??) whose first argument is the value, second argument is the unquoted name, and third argument is the environment, with a default value of the parent environment.

  4. jasonpbecker

    I prefer %<>% and <- for the simple reason that it’s easy to look along the left gutter and see what I’ve assigned.

    While there’s a logical consistency to “flowing” a pipe to ->, in your example with the assignment to males, I would never know that existed if I were further down in the code. With males locked on the left hand side, it’s easy to see which objects exist. Otherwise, how do I know I assigned it to males versus reassigning to the same starting frame like in the dat example?

  5. M. Edward Borasky (@znmeb)

    I have a special can opener just for cans of worms ;-). But seriously, I’ve been programming a long time and I’ve always been a fan of right-pointing assignment. It’s the STORE operation in assembler, fercryinoutloud!

    But IIRC the R folks deprecated it a while back. Have they un-deprecated it, or are we right-assigners doomed to piss into a gale? ;-)

    1. Alexis Iglauer

      I’m very much with Stefan here — I think the RHS operator is dangerous, especially if one is trying to understand code, as one only discovers at the end whether there is an operation happening or an assignment.

  6. stefflocke

    Where I was using magrittr and using a format like

    noun %>%
    verb %>%
    verb ->
    noun

    it felt quite natural to use the RHS version.

    However, I think in longer chunks of code, function building, etc., it would prove tougher to comprehend. I also try to make code accessible for beginners, so I always try to be consistent by using the LHS assignment (and never =); therefore the RHS will probably not make an appearance outside of quick scripts for me.

  7. Pingback: Which Symbol Should I Use for Assignment? – Jocelyn Ireson-Paine's Blog
