Skip navigation

Category Archives: R

It’s only been a couple days since the initial version of my revamped take on RSwitch but there have been numerous improvements since then worth mentioning.

For starters, there’s a new app icon that uses the blue and gray from the official (modern) R logo to help visually associate it with R:

In similar fashion, the menubar icon now looks better in dark mode (I may still tweak it a bit, tho).

There are also some new features in the menu bar from v1.0.0:

  • numbered shortcuts for each R version
  • handy menubar links to R resources on the internet
  • two menubar links to make it easier to download the latest RStudio dailies by hand (if you’re not using something like Homebrew already for that) and the latest R-devel macOS distribution tarball
  • saner/cleaner alerts

On tap for 1.4.0 is using Notification Center for user messaging vs icky alerts and, perhaps, some TouchBar icons for Mac folk with capable MacBook Pro models.

FIN

As usual, kick the tyres & file issues, questions, feature requests & PRs where you like.

I caught this post on the The Surprising Number Of Programmers Who Can’t Program from the Hacker News RSS feed. Said post links to another, classic post on the same subject and you should read both before continuing.

Back? Great! Let’s dig in.

Why does hrbrmstr care about this?

Offspring #3 completed his Freshman year at UMaine Orono last year but wanted to stay academically active over the summer (he’s majoring in astrophysics and knows he’ll need some programming skills to excel in his field) and took an introductory C++ course from UMaine that was held virtually, with 1 lecture per week (14 weeks IIRC) and 1 assignment due per week with no other grading.

After seeing what passes for a standard (UMaine is not exactly on the top list of institutions to attend if one wants to be a computer scientist) intro C++ course, I’m not really surprised “Johnny can’t code”. Thirteen weeks in the the class finally started covering OO concepts, and the course is ending with a scant intro to polymorphism. Prior to this, most of the assignments were just variations on each other (read from stdin, loop with conditionals, print output) with no program going over 100 LoC (that includes comments and spacing). This wasn’t a “compsci for non-compsci majors” course, either. Anyone majoring in an area of study that requires programming could have taken this course to fulfill one of the requirements, and they’d be set on a path of forever using StackOverflow copypasta to try to get their future work done.

I’m fairly certain most of #3’s classmates could not program fizzbuzz without googling and even more certain most have no idea they weren’t really “coding in C++” most of the course.

If this is how most other middling colleges are teaching the basics of computer programming, it’s no wonder employers are having a difficult time finding qualified talent.

You have an “R” tag — actually, a few language tags — on this post, so where’s the code?

After the article triggered the lament in the previous section, a crazy, @coolbutuseless-esque thought came into my head: “I wonder how many different language FizzBuz solutions can be created from within R?”.

The criteria for that notion is/was that there needed to be some Rcpp::cppFunction(), reticulate::py_run_string(), V8 context eval()-type way to have the code in-R but then run through those far-super-to-any-other-language’s polyglot extensibility constructs.

Before getting lost in the weeds, there were some other thoughts on language inclusion:

  • Should Java be included? I :heart: {rJava}, but cat()-ing Java code out and running system() to compile it first seemed like cheating (even though that’s kinda just what cppFunction() does). Toss a note into a comment if you think a Java example should be added (or add said Java example in a comment or link to it in one!).
  • I think Julia should be in this example list but do not care enough about it to load {JuliaCall} and craft an example (again, link or post one if you can crank it out quickly).
  • I think Lua could be in this example given the existence of {luar}. If you agree, give it a go!
  • Go & Rust compiled code can also be called in R (thanks to Romain & Jeroen) once they’re turned into C-compatible libraries. Should this polyglot example show this as well?
  • What other languages am I missing?

The aforementioned “weeds”

One criteria for each language fizzbuzz example is that they need to be readable, not hacky-cool. That doesn’t mean the solutions still can’t be a bit creative. We’ll lightly go through each one I managed to code up. First we’ll need some helpers:

suppressPackageStartupMessages({
  library(purrr)
  library(dplyr)
  library(reticulate)
  library(V8)
  library(Rcpp)
})

The R, JavaScript, and Python implementations are all in the microbenchmark() call way down below. Up here are C and C++ versions. The C implementation is boring and straightforward, but we’re using Rprintf() so we can capture the output vs have any output buffering woes impact the timings.

cppFunction('
void cbuzz() {

  // super fast plain C

  for (unsigned int i=1; i<=100; i++) {
    if      (i % 15 == 0) Rprintf("FizzBuzz\\n");
    else if (i %  3 == 0) Rprintf("Fizz\\n");
    else if (i %  5 == 0) Rprintf("Buzz\\n");
    else Rprintf("%d\\n", i);
  }

}
')

The cbuzz() example is just fine even in C++ land, but we can take advantage of some C++11 vectorization features to stay formally in C++-land and play with some fun features like lambdas. This will be a bit slower than the C version plus consume more memory, but shows off some features some folks might not be familiar with:

cppFunction('
void cppbuzz() {

  std::vector<int> numbers(100); // will eventually be 1:100
  std::iota(numbers.begin(), numbers.end(), 1); // kinda sorta equiva of our R 1:100 but not exactly true

  std::vector<std::string> fb(100); // fizzbuzz strings holder

  // transform said 1..100 into fizbuzz strings
  std::transform(
    numbers.begin(), numbers.end(), 
    fb.begin(),
    [](int i) -> std::string { // lambda expression are cool like a fez
        if      (i % 15 == 0) return("FizzBuzz");
        else if (i %  3 == 0) return("Fizz");
        else if (i %  5 == 0) return("Buzz");
        else return(std::to_string(i));
    }
  );

  // round it out with use of for_each and another lambda
  // this turns out to be slightly faster than range-based for-loop
  // collection iteration syntax.
  std::for_each(
    fb.begin(), fb.end(), 
    [](std::string s) { Rcout << s << std::endl; }
  );

}
', 
plugins = c('cpp11'))

Both of those functions are now available to R.

Next, we need to prepare to run JavaScript and Python code, so we’ll initialize both of those environments:

ctx <- v8()

py_config() # not 100% necessary but I keep my needed {reticulate} options in env vars for reproducibility

Then, we tell R to capture all the output. Using sink() is a bit better than capture.output() in this use-case since to avoid nesting calls, and we need to handle Python stdout the same way py_capture_output() does to be fair in our measurements:

output_tools <- import("rpytools.output")
restore_stdout <- output_tools$start_stdout_capture()

cap <- rawConnection(raw(0), "r+")
sink(cap)

There are a few implementations below across the tidy and base R multiverse. Some use vectorization; some do not. This will let us compare overall “speed” of solution. If you have another suggestion for a readable solution in R, drop a note in the comments:

microbenchmark::microbenchmark(

  # tidy_vectors_case() is slowest but you get all sorts of type safety 
  # for free along with very readable idioms.

  tidy_vectors_case = map_chr(1:100, ~{ 
    case_when(
      (.x %% 15 == 0) ~ "FizzBuzz",
      (.x %%  3 == 0) ~ "Fizz",
      (.x %%  5 == 0) ~ "Buzz",
      TRUE ~ as.character(.x)
    )
  }) %>% 
    cat(sep="\n"),

  # tidy_vectors_if() has old-school if/else syntax but still
  # forces us to ensure type safety which is cool.

  tidy_vectors_if = map_chr(1:100, ~{ 
    if (.x %% 15 == 0) return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(as.character(.x))
  }) %>% 
    cat(sep="\n"),

  # walk() just replaces `for` but stays in vector-land which is cool

  tidy_walk = walk(1:100, ~{
    if (.x %% 15 == 0) cat("FizzBuzz\n")
    if (.x %%  3 == 0) cat("Fizz\n")
    if (.x %%  5 == 0) cat("Buzz\n")
    cat(.x, "\n", sep="")
  }),

  # vapply() gets us some similiar type assurance, albeit with arcane syntax

  base_proper = vapply(1:100, function(.x) {
    if (.x %% 15 == 0) return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(as.character(.x))
  }, character(1), USE.NAMES = FALSE) %>% 
    cat(sep="\n"),

  # sapply() is def lazy but this can outperform vapply() in some
  # circumstances (like this one) and is a bit less arcane.

  base_lazy = sapply(1:100, function(.x) {
    if (.x %% 15 == 0)  return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(.x)
  }, USE.NAMES = FALSE) %>% 
    cat(sep="\n"),

  # for loops...ugh. might as well just use C

  base_for = for(.x in 1:100) {
    if      (.x %% 15 == 0) cat("FizzBuzz\n")
    else if (.x %%  3 == 0) cat("Fizz\n")
    else if (.x %%  5 == 0) cat("Buzz\n")
    else cat(.x, "\n", sep="")
  },

  # ok, we'll just use C!

  c_buzz = cbuzz(),

  # we can go back to vector-land in C++

  cpp_buzz = cppbuzz(),

  # some <3 for javascript

  js_readable = ctx$eval('
for (var i=1; i <101; i++){
  if      (i % 15 == 0) console.log("FizzBuzz")
  else if (i %  3 == 0) console.log("Fizz")
  else if (i %  5 == 0) console.log("Buzz")
  else console.log(i)
}
'),

  # icky readable, non-vectorized python

  python = reticulate::py_run_string('
for x in range(1, 101):
  if (x % 15 == 0):
    print("Fizz Buzz")
  elif (x % 5 == 0):
    print("Buzz")
  elif (x % 3 == 0):
    print("Fizz")
  else:
    print(x)
')

) -> res

Turn off output capturing:

sink()
if (!is.null(restore_stdout)) invisible(output_tools$end_stdout_capture(restore_stdout))

We used microbenchmark(), so here are the results:

res
## Unit: microseconds
##               expr       min         lq        mean     median         uq       max neval   cld
##  tidy_vectors_case 20290.749 21266.3680 22717.80292 22231.5960 23044.5690 33005.960   100     e
##    tidy_vectors_if   457.426   493.6270   540.68182   518.8785   577.1195   797.869   100  b   
##          tidy_walk   970.455  1026.2725  1150.77797  1065.4805  1109.9705  8392.916   100   c  
##        base_proper   357.385   375.3910   554.13973   406.8050   450.7490 13907.581   100  b   
##          base_lazy   365.553   395.5790   422.93719   418.1790   444.8225   587.718   100 ab   
##           base_for   521.674   545.9155   576.79214   559.0185   584.5250   968.814   100  b   
##             c_buzz    13.538    16.3335    18.18795    17.6010    19.4340    33.134   100 a    
##           cpp_buzz    39.405    45.1505    63.29352    49.1280    52.9605  1265.359   100 a    
##        js_readable   107.015   123.7015   162.32442   174.7860   187.1215   270.012   100 ab   
##             python  1581.661  1743.4490  2072.04777  1884.1585  1985.8100 12092.325   100    d 

Said results are 🤷🏻‍♀️ since this is a toy example, but I wanted to show that Jeroen’s {V8} can be super fast, especially when there’s no value marshaling to be done and that some things you may have thought should be faster, aren’t.

FIN

Definitely add links or code for changes or additions (especially the aforementioned other languages). Hopefully my lament about the computer science program at UMaine is not universally true for all the programming courses there.

At the bottom of the R for macOS Developer’s Page there’s mention of an “other binary” called “RSwitch” that is “a small GUI that allows you to switch between R versions quickly (if you have multiple versions of R framework installed).” Said switching requires you to use the “tar.gz” versions of R from the R for macOS Developer’s Page since the official CRAN binary installers clean up after themselves quite nicely to prevent potentially wacky behavior.

All the RSwitch GUI did was change the Current alias target in /Library/Frameworks/R.framework/Versions to the appropriate version. You can do that from the command line but the switcher GUI was created so that means some folks prefer click-switching and I have found myself using the GUI on occasion (before it stopped working on macOS Vista^wCatalina).

Since I:

  • work on Catalina most of the day
  • play with oldrel and devel versions of R
  • needed to brush up on Swift 5 coding
  • wanted RSwitch as a menubar app vs one with a dialog that I could easily lose across 15 desktops
  • decided to see if it was possible to make it work sandboxed (TLDR: it isn’t)
  • really wanted a different icon for the binary
  • couldn’t sleep last night

there was sufficient justification to create a 64-bit version of this app.

You can clone the project from any of the following social coding sites:

and, you can either compile it yourself — which is recommended since it’s 2019 and the days of even remotely trusting binaries off the internet are long gone — or build it. It should work on 10.14+ since I set that as the target, but file an issue where you like if you have, well, issues with the code or binary.

Once you do have it working, there will be a dial-switch menu in the menubar and a menu that should look something like:

The item with the checkbox is the Current alias.

FIN

Kick the tyres, file issues & PRs as you’re wont to do and prepare for the forthcoming clickpocalypse as Apple nears their GA release of Catalina.

It’s been yet-another weirdly busy summer but I’m finally catching up on noting some recent-ish developments in the blog.

First up is a full rewrite of the {wand} pacakge which still uses magic but is 100% R code (vs a mix of compiled C and R code) and now works on Windows. A newer version will be up on CRAN in a bit that has additional MIME mappings and enables specifying a custom extension mapping data frame. You’ve seen this beast on the blog before, though, by another name.

wand::get_content_type("/etc/syslog.conf")
## [1] "text/plain"

Next is the {ulid} package (which I’ve also previously discussed, here) which is also now on CRAN to meet all your Universally Unique Lexicographically Sortable Identifiers-generation needs.

ulid::ulid_generate()
## [1] "0001EKRGTCRSVA4ACSCQJA61A0"

ulid::unmarshal("0001EKRGTCRSVA4ACSCQJA61A0")
##                    ts              rnd
## 1 2019-07-27 08:27:56 RSVA4ACSCQJA61A0

The [{testthat}] gravity well has caught over 4,000 CRAN packages but it’s not the only testing game in town. The {tinytest} package take slightly more minimalist approach and has the added benefit that the tests come along for the ride with the package, which makes it easier to solicit said test results from package users having problems with your code.

I still use {testthat} in most of my packages but gave {tinytest} a spin for a few of my more recent ones and it’s pretty nifty. The biggest feature I missed when switching to it was the lack of Cmd-Shift-T support for it in RStudio. Since I kinda still want Cmd-Shift-T for all the packages I have that use {testthat} I whipped up an RStudio addin that adds an Addin context menu item (below) and placed it in {hrbraddins}.

If you load up that package you can then bind something like Cmd-Option-Shift-T to that function and have equally quick keystroke access to package tests during development.

> hrbraddins:::run_tiny_test() # from within the {wand} pkg
Running test_wand.R...................   52 tests OK
All ok (52 results)

Finally, I got bit in a recent CRAN submission because I have remote CRAN checks turned off (soooo sloooowww) but had a 404’ing URL in the documentation of one of the methods in the package. Since I have a few more submissions coming up in the next 6-8 weeks I decided to whip up an RStudio addin for an on-the-fly package URL checker that wraps the exact same checks CRAN does for these submissions. (I keybound this to Cmd-Option-Shift-U and you can catch a glimpse of it in the addins menu in the above screenshot).

The output of one run is below. I deliberately modified two working URLs to show what you get output-wise when everything isn’t perfect:

> hrbraddins:::check_package_urls()
Gathering URLs for {wand} (this may take a bit)
- Looking in HTML files...
- Looking in metadata files...
- Looking in news files...
- Looking in Rd files...
- Looking in README files...
- Looking in source files...
- Looking in PDF files...

Checking found URLs (this may also take a bit)
# A tibble: 13 x 4
   url                                                        parent      status is_https
   <chr>                                                      <chr>        <dbl> <lgl>   
 1 https://gitlab.com/hrbrmstr/wand/issuessss                 DESCRIPTION    599 TRUE    
 2 http://gitlab.com/hrbrmstr/wand                            DESCRIPTION    200 FALSE   
 3 https://github.com/jshttp/mime-db                          man/wand.Rd    200 TRUE    
 4 https://github.com/threatstack/libmagic/tree/master/magic/ man/wand.Rd    200 TRUE    
 5 https://ci.appveyor.com/project/hrbrmstr/wand              README.md      200 TRUE    
 6 https://codecov.io/gh/hrbrmstr/wand                        README.md      200 TRUE    
 7 https://cranchecks.info/pkgs/wand                          README.md      200 TRUE    
 8 https://github.com/r-lib/remotes                           README.md      200 TRUE    
 9 https://github.com/threatstack/libmagic/tree/master/magic/ README.md      200 TRUE    
10 https://keybase.io/hrbrmstr                                README.md      200 TRUE    
11 https://travis-ci.org/hrbrmstr/wand                        README.md      200 TRUE    
12 https://www.r-pkg.org/pkg/wand                             README.md      200 TRUE    
13 https://www.repostatus.org/#active                         README.md      200 TRUE  

The {hrbraddins} package itself is just a playground and will never see CRAN, so do not hesitate to yank anything from it and put it in a safer and/or more accessible location for your own work.

FIN

For the packages, file issues and PRs the same way you always would. Same goes for the addins.

Despite being a full-on denizen of all things digital I receive a fair number of dead-tree print magazines as there’s nothing quite like seeing an amazing, large, full-color print data-driven visualization up close and personal. I also like supporting data journalism through the subscriptions since without cash we will only have insane, extreme left/right-wing perspectives out there.

One of these publications is The Economist (I’d subscribe to the Financial Times as well for the non-liberal perspective but I don’t need another mortgage payment right now). The graphics folks at The Economist are Top Notch™ and a great source of inspiration to “do better” when cranking out visuals.

After reading a recent issue, one of their visualization styles stuck in my head. Specifically, the one from this story on the costs of a mammogram. I’ve put it below:

Essentially it’s a boxplot with outliers removed along with significantly different aesthetics than the one we’re all used to seeing. I would not use this for exploratory data analysis or working with other data science team members when poking at a problem but I really like the idea of making “distributions” easier to consume for general audiences and believe The Economist graphics folks have done a superb job focusing on the fundamentals (both statistical and aesthetic).

There are ways to hack something like those out manually in {ggplot2} but it would be nice to just be able to swap out something for geom_boxplot() when deciding to go to “production” with a distribution chart. Thus begat {ggeconodist}.

Since this is just a “quick hit” post we’ll avoid some interim blathering to note that we can use {ggplot2} (and a touch of {grid}) to make the following:

with just a tiny bit of R code:

library(ggeconodist)

ggplot(mammogram_costs, aes(x = city)) +
  geom_econodist(
    aes(ymin = tenth, median = median, ymax = ninetieth), stat = "identity"
  ) +
  scale_y_continuous(expand = c(0,0), position = "right", limits = range(0, 800)) +
  coord_flip() +
  labs(
    x = NULL, y = NULL,
    title = "Mammoscams",
    subtitle = "United States, prices for a mammogram*\nBy metro area, 2016, $",
    caption = "*For three large insurance companies\nSource: Health Care Cost Institute"
  ) +
  theme_econodist() -> gg

grid.newpage()
left_align(gg, c("subtitle", "title", "caption")) %>% 
  add_econodist_legend(econodist_legend_grob(), below = "subtitle") %>% 
  grid.draw()

FIN

A future post (and, even — perhaps — a new screen sharing video) will describe what went into making this new geom, stat, and theme (along with some info on how I managed to reproduce data for the vis since none was provided).

In the interim, hit up the CINC page on {ggeconodist} to learn more about the package. You may want to take a quick look at {hrbrthemes} since it might have some helpers for using all the required theming components.

So kick the tyres, file issues/PRs, and be on the lookout for the director’s cut of the making of {ggeconodist}.

The {waffle} package got some 💙 this week and now has a substantially improved geom_waffle() along with a brand new sibling function geom_pictogram() which has all the powerful new features of geom_waffle() but lets you use Font Awesome 5 brand and solid glyphs to make isotype pictograms.

This slideshow requires JavaScript.

A major new feature is that stat_waffle() (which powers both geoms) has an option to auto-compute proportions so you can use a proper 10×10 grid to show parts of a whole without doing any extra work (works in facet contexts, too).

You can look at a preview of the vignettes below or bust the iframes to see waffles and pictograms in action.

Building Waffle Charts

Building Pictograms

FIN

You can get the updated {waffle} code at your preferred social coding service (See the list for {waffle} over at CINC.

It needs much tyre kicking, especially the pictogram geom. File issues/PRs wherever you’re comfortable.

The first U.S. Democratic debates of the 2020 election season were held over two nights this past week due to the daft number of candidates running for POTUS. The spiffy @NYTgraphics folks took the tallies of time spent blathering by each speaker/topic and made rounded rectangle segmented bar charts ordered by the time the blathering was performed (these aren’t really debates, they’re loosely-prepared for performances) which I have dubbed “chicklet” charts due to a vague resemblance to the semi-popular gum/candy.

You can see each day’s live, javascript-created NYTimes charts here:

and this is a PNG snapshot of one of them:

nytimes chicklet chart for debate day 1

I liked the chicklet aesthetic enough to make a new {ggplot2} geom_chicklet() to help folks make them. To save some blog bytes, you can read how to install the package over at https://cinc.rud.is/web/packages/ggchicklet/.

Making Chicklet Charts

Since the @NYTimes chose to use javascript to make their chart they also kinda made the data available (view the source of both of the aforelinked URLs) which I’ve wrangled a bit and put into the {ggchicklet} package. We’ll use it to progress from producing basic bars to crunching out chicklets and compare all the candidates across both days.

While the @Nytimes chart(s) provide a great deal of information, most media outlets focused on how much blather time each candidate managed to get. We do not need anything fancier than a bar chart or table to do that:

library(hrbrthemes)
library(ggchicklet)
library(tidyverse)

data("debates2019")

count(debates2019, speaker, wt=elapsed, sort=TRUE) %>%
  mutate(speaker = fct_reorder(speaker, n, sum, .desc=FALSE)) %>%
  mutate(speaker = fct_inorder(speaker) %>% fct_rev()) %>%
  ggplot(aes(speaker,n)) +
  geom_col() +
  scale_y_comma(position = "right") +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X")

ordered blather time chart for each spaker

If we want to see the same basic view but include how much time each speaker spent on each topic, we can also do that without much effort:

count(debates2019, speaker, topic, wt=elapsed, sort=TRUE) %>%
  mutate(speaker = fct_reorder(speaker, n, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, n , fill = topic)) +
  geom_col() +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

time spent per topic per speaker

By default geom_col() is going to use the fill aesthetic to group the bars and use the default sort order to stack them together.

We can also get a broken out view by not doing the count() and just letting the segments group together and use a white bar outline to keep them distinct:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, fill = topic)) +
  geom_col(color = "white") +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

grouped distinct topics

While I liked the rounded rectangle aesthetic, I also really liked how the @nytimes ordered the segments by when the topics occurred during the debate. For other types of chicklet charts you don’t need the grouping variable to be a time-y whime-y column, just try to use something that has a sane ordering characteristic to it:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_col(color = "white", position = position_stack(reverse = TRUE)) +
  scale_y_comma(position = "right") +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(x = NULL, y = "Minutes Spoken") +
  theme_ipsum_rc(grid="X") +
  theme(legend.position = "bottom")

distinct topics ordered by when blathered abt during each debate

That last chart is about as far as you could go to reproduce the @nytimes look-and-feel without jumping through some serious gg-hoops.

I had made a rounded rectangle hidden geom to make rounded-corner tiles for the {statebins} package so making a version of ggplot2::geom_col() (which I also added to {ggplot2}) was pretty straightforward. There are some key differences in the defaults of geom_chicklet():

  • a “white” stroke for the chicklet/segment (geom_col() has NA for the stroke)
  • automatic reversing of the group order (geom_col() uses the standard sort order)
  • radius setting of unit(3, "px") (change this as you need)
  • chicklet legend geom (b/c they aren’t bars or points)

You likely just want to see it in action, so here it is without further adieu:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 14, 2),
    labels = c(0, sprintf("%d min.", seq(2, 14, 2)))
  ) +
  ggthemes::scale_fill_tableau("Tableau 20", name = NULL) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "Nights 1 & 2 of the June 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\n\nOriginals <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html?>\n<https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html?>\nby @nytimes Weiyi Cai, Jason Kao, Jasmine C. Lee, Alicia Parlapiano and Jugal K. Patel\n\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "bottom")

chiclet chart with all the topics

Yes, I upped the ggplot2 tweaking a bit to get closer to the @nytimes (FWIW I like the Y gridlines, YMMV) but didn’t have to do much else to replace geom_col() with geom_chicket(). You’ll need to play with the segment width value depending on the size of your own, different plots to get the best look (just like you do with any other geom).

Astute, intrepid readers will note that the above chart has all the topics whereas the @nytimes just has a few. We can do the grouping of non-salient topics into an “Other” category with forcats::fct_other() and make a manual fill scale from the values stolen fromused in homage from the @nytimes:

debates2019 %>%
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  mutate(topic = fct_other(
    topic,
    c("Immigration", "Economy", "Climate Change", "Gun Control", "Healthcare", "Foreign Policy"))
  ) %>%
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 14, 2),
    labels = c(0, sprintf("%d min.", seq(2, 14, 2)))
  ) +
  scale_fill_manual(
    name = NULL,
    values = c(
      "Immigration" = "#ae4544",
      "Economy" = "#d8cb98",
      "Climate Change" = "#a4ad6f",
      "Gun Control" = "#cc7c3a",
      "Healthcare" = "#436f82",
      "Foreign Policy" = "#7c5981",
      "Other" = "#cccccc"
    ),
    breaks = setdiff(unique(debates2019$topic), "Other")
  ) +
  guides(
    fill = guide_legend(nrow = 1)
  ) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "Nights 1 & 2 of the June 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\n\nOriginals <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html?>\n<https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html?>\nby @nytimes Weiyi Cai, Jason Kao, Jasmine C. Lee, Alicia Parlapiano and Jugal K. Patel\n\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "top")

final chicklet chart

FIN

Remember, you can find out how to install {ggchicklet} and also where you can file issues or PRs over at https://cinc.rud.is/web/packages/ggchicklet/. The package has full documentation, including a vignette, but if any usage help is lacking, definitely file an issue.

If you use the package, don’t hesitate to share your creations in a comment or on Twitter so other folks can see how to use the package in different contexts.

The r-project.org domain had some temporary technical difficulties this week (2019-29) that made reaching R-related resources problematic for a bunch of folks for a period of time. Incidents like this underscore the need for regional and network diversity when it comes to ensuring the availability of DNS services. That is, it does no good if you have two DNS servers if they’re both connected to the same power source and/or network connection since if power goes out or the network gets wonky no client will be able to translate r-project.org to an IP address that it can then connect to.

I’m not at-keyboard much this week so only had time to take an external poke at the (new) r-project.org DNS configuration late yesterday and today before the sleepyhead vacationers emerged from slumber. To my surprise, the r-project.org current DNS setup allows full zone transfers, which means you can get the full “database” of r-project.org DNS records if you know the right incantations.

So, I wrote a small R function wrapper for the dig command using {processx}. Folks on the legacy Windows operating system are on your own for getting a copy of dig installed but users of proper, modern operating systems like Linux or macOS should have it installed by-default (or will be an easy package manager grab away).

Wrapping dig

The R-wrapper for the dig command is pretty straightforward:

library(stringi) # string processing
library(processx) # system processes orchestration
library(tidyverse) # good data wrangling idioms

dig <- function(..., cat = TRUE) {

  processx::run(
    command = unname(Sys.which("dig")), 
    args = unlist(list(...)),
  ) -> out

  if (cat) message(out$stdout)

  invisible(out)

}

We expand the ellipses into command arguments, run the command, return the output and optionally display the output via message().

Let’s see if it works by getting the dig help:

dig("-h")
## Usage:  dig [@global-server] [domain] [q-type] [q-class] {q-opt}
##             {global-d-opt} host [@local-server] {local-d-opt}
##             [ host [@local-server] {local-d-opt} [...]]
## Where:  domain    is in the Domain Name System
##         q-class  is one of (in,hs,ch,...) [default: in]
##         q-type   is one of (a,any,mx,ns,soa,hinfo,axfr,txt,...) [default:a]
##                  (Use ixfr=version for type ixfr)
##         q-opt    is one of:
##                  -4                  (use IPv4 query transport only)
##                  -6                  (use IPv6 query transport only)
##                  -b address[#port]   (bind to source address/port)
##                  -c class            (specify query class)
##                  -f filename         (batch mode)
##                  -i                  (use IP6.INT for IPv6 reverse lookups)
##                  -k keyfile          (specify tsig key file)
##                  -m                  (enable memory usage debugging)
##                  -p port             (specify port number)
##                  -q name             (specify query name)
##                  -t type             (specify query type)
##                  -u                  (display times in usec instead of msec)
##                  -x dot-notation     (shortcut for reverse lookups)
##                  -y [hmac:]name:key  (specify named base64 tsig key)
##         d-opt    is of the form +keyword[=value], where keyword is:
##                  +[no]aaonly         (Set AA flag in query (+[no]aaflag))
##                  +[no]additional     (Control display of additional section)
##                  +[no]adflag         (Set AD flag in query (default on))
##                  +[no]all            (Set or clear all display flags)
##                  +[no]answer         (Control display of answer section)
##                  +[no]authority      (Control display of authority section)
##                  +[no]besteffort     (Try to parse even illegal messages)
##                  +bufsize=###        (Set EDNS0 Max UDP packet size)
##                  +[no]cdflag         (Set checking disabled flag in query)
##                  +[no]cl             (Control display of class in records)
##                  +[no]cmd            (Control display of command line)
##                  +[no]comments       (Control display of comment lines)
##                  +[no]crypto         (Control display of cryptographic fields in records)
##                  +[no]defname        (Use search list (+[no]search))
##                  +[no]dnssec         (Request DNSSEC records)
##                  +domain=###         (Set default domainname)
##                  +[no]edns[=###]     (Set EDNS version) [0]
##                  +ednsflags=###      (Set EDNS flag bits)
##                  +[no]ednsnegotiation (Set EDNS version negotiation)
##                  +ednsopt=###[:value] (Send specified EDNS option)
##                  +noednsopt          (Clear list of +ednsopt options)
##                  +[no]expire         (Request time to expire)
##                  +[no]fail           (Don't try next server on SERVFAIL)
##                  +[no]identify       (ID responders in short answers)
##                  +[no]idnout         (convert IDN response)
##                  +[no]ignore         (Don't revert to TCP for TC responses.)
##                  +[no]keepopen       (Keep the TCP socket open between queries)
##                  +[no]multiline      (Print records in an expanded format)
##                  +ndots=###          (Set search NDOTS value)
##                  +[no]nsid           (Request Name Server ID)
##                  +[no]nssearch       (Search all authoritative nameservers)
##                  +[no]onesoa         (AXFR prints only one soa record)
##                  +[no]opcode=###     (Set the opcode of the request)
##                  +[no]qr             (Print question before sending)
##                  +[no]question       (Control display of question section)
##                  +[no]recurse        (Recursive mode)
##                  +retry=###          (Set number of UDP retries) [2]
##                  +[no]rrcomments     (Control display of per-record comments)
##                  +[no]search         (Set whether to use searchlist)
##                  +[no]short          (Display nothing except short
##                                       form of answer)
##                  +[no]showsearch     (Search with intermediate results)
##                  +[no]split=##       (Split hex/base64 fields into chunks)
##                  +[no]stats          (Control display of statistics)
##                  +subnet=addr        (Set edns-client-subnet option)
##                  +[no]tcp            (TCP mode (+[no]vc))
##                  +time=###           (Set query timeout) [5]
##                  +[no]trace          (Trace delegation down from root [+dnssec])
##                  +tries=###          (Set number of UDP attempts) [3]
##                  +[no]ttlid          (Control display of ttls in records)
##                  +[no]vc             (TCP mode (+[no]tcp))
##         global d-opts and servers (before host name) affect all queries.
##         local d-opts and servers (after host name) affect only that lookup.
##         -h                           (print help and exit)
##         -v                           (print version and exit)

To get the DNS records of r-project.org DNS we need to find the nameservers, which we can do via:

ns <- dig("+short", "NS", "@9.9.9.9", "r-project.org")
## ns1.wu-wien.ac.at.
## ns2.urbanek.info.
## ns1.urbanek.info.
## ns3.urbanek.info.
## ns4.urbanek.info.
## ns2.wu-wien.ac.at.

There are six of them (which IIRC is a few more than they had earlier this week). I wanted to see if any supported zone transfers. Here’s one way to do that:

stri_split_lines(ns$stdout, omit_empty = TRUE) %>% # split the response in stdout into lines
  flatten_chr() %>% # turn the list into a character vector
  map_df(~{ # make a data frame out of the following
    tibble(
      ns = .x, # the nameserver we are probing
      res = dig("+noall", "+answer", "AXFR", glue::glue("@{.x}"), "r-project.org", cat = FALSE) %>%  # the dig zone transfer request
        pluck("stdout") # we only want the `stdout` element of the {processx} return value
    )
  }) -> xdf

xdf
## # A tibble: 6 x 2
##   ns              res                                                  
##   <chr>           <chr>                                                
## 1 ns1.wu-wien.ac… "; Transfer failed.\n"                               
## 2 ns2.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 3 ns1.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 4 ns3.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 5 ns4.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 6 ns2.wu-wien.ac… "; Transfer failed.\n" 

(NOTE: You may not get things in the same order if you try this at home due to the way DNS queries and responses work.)

So, two servers did not accept our request but four did. Let’s see what a set of zone transfer records looks like:

cat(xdf[["res"]][[2]]) 
## R-project.org.    7200  IN  SOA ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800 604800 3600
## R-project.org.    7200  IN  NS  ns1.urbanek.info.
## R-project.org.    7200  IN  NS  ns1.wu-wien.ac.at.
## R-project.org.    7200  IN  NS  ns2.urbanek.info.
## R-project.org.    7200  IN  NS  ns2.wu-wien.ac.at.
## R-project.org.    7200  IN  NS  ns3.urbanek.info.
## R-project.org.    7200  IN  NS  ns4.urbanek.info.
## R-project.org.    7200  IN  A 137.208.57.37
## R-project.org.    7200  IN  MX  5 mc1.ethz.ch.
## R-project.org.    7200  IN  MX  5 mc2.ethz.ch.
## R-project.org.    7200  IN  MX  5 mc3.ethz.ch.
## R-project.org.    7200  IN  MX  5 mc4.ethz.ch.
## R-project.org.    7200  IN  TXT "v=spf1 ip4:129.132.119.208/32 ~all"
## cran.at.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## beta.R-project.org. 7200  IN  A 137.208.57.37
## bugs.R-project.org. 7200  IN  CNAME rbugs.urbanek.info.
## cran.ch.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## cloud.R-project.org.  7200  IN  CNAME d3caqzu56oq2n9.cloudfront.net.
## cran.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## ftp.cran.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## www.cran.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## cran-archive.R-project.org. 7200 IN CNAME cran.wu-wien.ac.at.
## developer.R-project.org. 7200 IN  CNAME rdevel.urbanek.info.
## cran.es.R-project.org.  7200  IN  A 137.208.57.37
## ess.R-project.org.  7200  IN  CNAME ess.math.ethz.ch.
## journal.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## mac.R-project.org.  7200  IN  CNAME r.research.att.com.
## portal.R-project.org. 7200  IN  CNAME r-project.org.
## r-forge.R-project.org.  7200  IN  CNAME r-forge.wu-wien.ac.at.
## *.r-forge.R-project.org. 7200 IN  CNAME r-forge.wu-wien.ac.at.
## search.R-project.org. 7200  IN  CNAME finzi.psych.upenn.edu.
## svn.R-project.org.  7200  IN  CNAME svn-stat.math.ethz.ch.
## translation.R-project.org. 7200 IN  CNAME translation.r-project.kr.
## cran.uk.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## cran.us.R-project.org.  7200  IN  A 137.208.57.37
## user2004.R-project.org. 7200  IN  CNAME r-project.org.
## useR2006.R-project.org. 7200  IN  CNAME r-project.org.
## user2007.R-project.org. 7200  IN  CNAME r-project.org.
## useR2008.R-project.org. 7200  IN  CNAME r-project.org.
## useR2009.R-project.org. 7200  IN  CNAME r-project.org.
## user2010.R-project.org. 7200  IN  CNAME r-project.org.
## useR2011.R-project.org. 7200  IN  CNAME r-project.org.
## useR2012.R-project.org. 7200  IN  CNAME r-project.org.
## useR2013.R-project.org. 7200  IN  CNAME r-project.org.
## user2014.R-project.org. 7200  IN  CNAME user2014.github.io.
## useR2015.R-project.org. 7200  IN  CNAME r-project.org.
## useR2016.R-project.org. 7200  IN  CNAME user2016.github.io.
## useR2017.R-project.org. 7200  IN  CNAME r-project.org.
## useR2018.R-project.org. 7200  IN  CNAME user-2018.netlify.com.
## useR2019.R-project.org. 7200  IN  A 5.135.185.16
## wiki.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## win-builder.R-project.org. 7200 IN  A 129.217.207.166
## win-builder.R-project.org. 7200 IN  MX  0 rdevel.urbanek.info.
## www.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## R-project.org.    7200  IN  SOA ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800 604800 3600

That’s not pretty, but it’s wrangle-able. Let’s turn it into a data frame:

xdf[["res"]][[2]] %>% # get the response text
  stri_split_lines(omit_empty = TRUE) %>%  # split it into lines
  flatten_chr() %>%  # turn it into a character vector
  stri_split_regex("[[:space:]]+", n = 5, simplify = TRUE) %>% # split at whitespace, limiting to five fields
  as_tibble(.name_repair = "unique") %>% # make it a tibble
  set_names(c("host", "ttl", "class", "record_type", "value")) %>% # better colnames
  mutate(host = stri_trans_tolower(host)) %>% # case matters not in DNS names
  print(n=nrow(.)) # see our results
## # A tibble: 55 x 5
##    host                     ttl   class record_type value                                                              
##    <chr>                    <chr> <chr> <chr>       <chr>                                                              
##  1 r-project.org.           7200  IN    SOA         ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800 …
##  2 r-project.org.           7200  IN    NS          ns1.urbanek.info.                                                  
##  3 r-project.org.           7200  IN    NS          ns1.wu-wien.ac.at.                                                 
##  4 r-project.org.           7200  IN    NS          ns2.urbanek.info.                                                  
##  5 r-project.org.           7200  IN    NS          ns2.wu-wien.ac.at.                                                 
##  6 r-project.org.           7200  IN    NS          ns3.urbanek.info.                                                  
##  7 r-project.org.           7200  IN    NS          ns4.urbanek.info.                                                  
##  8 r-project.org.           7200  IN    A           137.208.57.37                                                      
##  9 r-project.org.           7200  IN    MX          5 mc1.ethz.ch.                                                     
## 10 r-project.org.           7200  IN    MX          5 mc2.ethz.ch.                                                     
## 11 r-project.org.           7200  IN    MX          5 mc3.ethz.ch.                                                     
## 12 r-project.org.           7200  IN    MX          5 mc4.ethz.ch.                                                     
## 13 r-project.org.           7200  IN    TXT         "\"v=spf1 ip4:129.132.119.208/32 ~all\""                           
## 14 cran.at.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 15 beta.r-project.org.      7200  IN    A           137.208.57.37                                                      
## 16 bugs.r-project.org.      7200  IN    CNAME       rbugs.urbanek.info.                                                
## 17 cran.ch.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 18 cloud.r-project.org.     7200  IN    CNAME       d3caqzu56oq2n9.cloudfront.net.                                     
## 19 cran.r-project.org.      7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 20 ftp.cran.r-project.org.  7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 21 www.cran.r-project.org.  7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 22 cran-archive.r-project.… 7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 23 developer.r-project.org. 7200  IN    CNAME       rdevel.urbanek.info.                                               
## 24 cran.es.r-project.org.   7200  IN    A           137.208.57.37                                                      
## 25 ess.r-project.org.       7200  IN    CNAME       ess.math.ethz.ch.                                                  
## 26 journal.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 27 mac.r-project.org.       7200  IN    CNAME       r.research.att.com.                                                
## 28 portal.r-project.org.    7200  IN    CNAME       r-project.org.                                                     
## 29 r-forge.r-project.org.   7200  IN    CNAME       r-forge.wu-wien.ac.at.                                             
## 30 *.r-forge.r-project.org. 7200  IN    CNAME       r-forge.wu-wien.ac.at.                                             
## 31 search.r-project.org.    7200  IN    CNAME       finzi.psych.upenn.edu.                                             
## 32 svn.r-project.org.       7200  IN    CNAME       svn-stat.math.ethz.ch.                                             
## 33 translation.r-project.o… 7200  IN    CNAME       translation.r-project.kr.                                          
## 34 cran.uk.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 35 cran.us.r-project.org.   7200  IN    A           137.208.57.37                                                      
## 36 user2004.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 37 user2006.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 38 user2007.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 39 user2008.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 40 user2009.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 41 user2010.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 42 user2011.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 43 user2012.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 44 user2013.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 45 user2014.r-project.org.  7200  IN    CNAME       user2014.github.io.                                                
## 46 user2015.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 47 user2016.r-project.org.  7200  IN    CNAME       user2016.github.io.                                                
## 48 user2017.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 49 user2018.r-project.org.  7200  IN    CNAME       user-2018.netlify.com.                                             
## 50 user2019.r-project.org.  7200  IN    A           5.135.185.16                                                       
## 51 wiki.r-project.org.      7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 52 win-builder.r-project.o… 7200  IN    A           129.217.207.166                                                    
## 53 win-builder.r-project.o… 7200  IN    MX          0 rdevel.urbanek.info.                                             
## 54 www.r-project.org.       7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 55 r-project.org.           7200  IN    SOA         ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800

FIN

Zone transfers are a quick way to get all the DNS information for a site. As such, it isn’t generally recommended to allow zone transfers from just anyone (though trying to keep anything secret in public DNS is a path generally fraught with peril given how easy it is to brute-force record lookups). However, if r-project.org zone transfers stay generally open, then you can use this method to keep a local copy of r-project.org host info and make local /etc/hosts (or the Windows equivalent) entries when issues like the one this past week arise.