Category Archives: Javascript

The days are getting shorter and when we were visiting Down East Maine the other week, there was just a hint of some trees starting to change up their leaf palettes. It was a solid reminder to re-up my ~annual “foliage” plotting that I started way back in 2017.

The fine folks over at Smoky Mountains — (“the most authoritative source for restaurants, attractions, & cabin rentals in the Smoky Mountains”) — have been posting an interactive map of ConUS foliage predictions for many years and the dataset they curate and use for that is also very easy to use in R and other contexts.

This year, along with the usual R version, I have also made:

  • an Observable notebook version
  • a Quarto project version
  • a Lit WebComponents + Tachyons version

The only real changes to the R version were to add some code that makes a more usable JSON for the JavaScript versions of the project, and to take advantage of the .progress parameter of {purrr}’s walk() function.

The Observable notebook version (one frame of that is above) makes use of Observable Plot’s super handy geo mark, and also shows how to do some shapefile surgery to avoid plotting Alaska & Hawaii (the Smoky Mountains folks only provide predictions for ConUS).

After using the Reveal QMD extension to make the Quarto project, the qmd document rendered fine, but I tweaked the YAML to send the output to the GH Pages-renderable docs/ directory, and combined some of the OJS blocks to tighten up the document. You’ll see some Quarto “error” blocks, briefly, since the QMD fetches imports from Observable. You can get around that by moving all the imported resources into the Observable notebook before generating the QMD, but that’s an exercise left to the reader.
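For reference, routing renders to docs/ is just a couple of lines in the project YAML (a sketch of the relevant bit, not the full config):

project:
  output-dir: docs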

And, since I’m a fan of both Lit WebComponents and Tachyons CSS, I threw together a version using them (+ Observable Plot) to further encourage folks to get increasingly familiar with core web tech. Tachyons + Plot make it pretty straightforward to create responsive pages, too (resize the browser and toggle system dark/light mode to prove that). The Lit element’s CSS section also shows how to style Plot’s legend a bit.

Hit up the GH page to see the animated gif (I’ve stared at it a bit too much to include it in the post).

Drop any q’s here or in the GH issues, and, if anyone makes a Shiny version, please let me know and I’ll add links to it here and on the GH page.

FIN

While it is all well and good to plot foliage prediction maps, please also remember to take some time away from your glowing rectangles to go and actually observe the fall palette changes IRL.

While the future of Bluesky is nowhere near certain, it is most certainly growing. It’s also the largest community of users for the AT Protocol.

Folks are using Bluesky much the same way as any online forum/chat. One of those ways is to share URLs to content.

For the moment, it is possible to eavesdrop on the Bluesky “firehose” sans authentication. I’ve been curious as to what folks are sharing on the platform and decided to do more than poke at it casually in my hacky terminal firehose viewer.

This GitLab project contains all the code necessary to log URLs seen in the firehose to a local SQLite database. As Bluesky grows, this will definitely not scale, but it’s fine for right now, and scaling just means moving the websocket capture client to a more capable environment than my home server and setting up something like a Kafka stream. Might as well move to Postgres while we’re at it.
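For the curious, here’s a minimal Node.js sketch of the general shape of that capture client. The relay URL is an assumption, and decodePostUrls() is a hypothetical stand-in for the DAG-CBOR frame decoding the real code has to do:

import WebSocket from "ws";             // npm: ws
import Database from "better-sqlite3";  // npm: better-sqlite3

const db = new Database("urls.db");
db.exec("CREATE TABLE IF NOT EXISTS urls (ts TEXT, url TEXT)");
const ins = db.prepare("INSERT INTO urls VALUES (datetime('now'), ?)");

// eavesdrop on the firehose (no auth needed, for the moment)
const ws = new WebSocket("wss://bsky.network/xrpc/com.atproto.sync.subscribeRepos");

ws.on("message", (frame) => {
  // decodePostUrls() (hypothetical) would parse the DAG-CBOR frame and
  // return any URLs found in post facets/embeds
  for (const url of decodePostUrls(frame)) ins.run(url);
});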

But, for now, this lightweight script/database is fine.

NOTE: I’m deliberately not tracking any other data, but the code is easy to modify to log whatever you want from each firehose post.

I’m syncing the data to this server every ~30 minutes and have created an Observable notebook that keeps track of the most popular domains.
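Tallying the most popular domains from the logged URLs is nearly a one-liner in the notebook (a sketch, assuming d3 is loaded and urls is the array of logged URL strings):

domains = d3.rollup(urls, (v) => v.length, (u) => new URL(u).hostname)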

I don’t know what card.syui.ai is (Perplexity had some ideas), but it appears to be some AI-driven “card” game that has AT protocol and ActivityPub integration. Due to the programmatic nature of the posts with URLs containing that domain, I suspect it’ll be in the lead for quite some time.

There are some neat sites in the long tail of the distribution.

I think I’ll set up one to monitor posts with CVEs soon, too.

I have a post coming on using base and {ggplot2} plots in VanillaJS WebR, but after posting some bits on social media regarding how slow {ggplot2} is to deal with, I had some “performance”-related inquiries, which led me down a rabbit hole that I’m now dragging y’all into as well.

First, a preview of the aforementioned plot/graphics:

I encourage you to load both of them before continuing to see why I was curious about package load times.

Getting A Package Into WebR: A Look At {ggplot2}

If we strip away all the cruft, this is the core way to install a package into WebR and make it available to a freshly minted WebR context:

import { WebR } from '/webr/webr.mjs';

// spin up a fresh WebR context (the URLs point at locally hosted WebR assets)
globalThis.webR = new WebR({ WEBR_URL: "/webr/", SW_URL: "/w/bench/" });
await globalThis.webR.init();

// fetch/install the package (and its dependencies), then load it
await globalThis.webR.installPackages(['PACKAGE'])
await globalThis.webR.evalRVoid('library(PACKAGE)')

Let’s look at what happens in the browser during the call to installPackages() when PACKAGE is ggplot2:

Screen capture of DevTools showing ggplot2 dependent packages loading.

Dependent libraries are sequentially loaded until we finally get to ggplot2 (forgoing {} from now on). There are 28 packages in the ggplot2 install (including itself) and they have a really skewed package-size distribution:

Min.   :   6K
1st Qu.: 108K
Median : 481K
Mean   : 950K
3rd Qu.: 1.2M
Max.   : 5.4M

The good thing, though, is that the browser will cache them (for some period of time) so they aren’t re-downloaded every time you need them. Because of this, we’re going to ignore download time from consideration since, as we’ll see below, they’re all yanked from cache in single-digit milliseconds.

When you call library(PACKAGE), R code gets executed, and that takes time. On modern desktops with local R installs, you almost never notice the time passage for this. That is not the case for WebR:

Screen capture of the ggplot2 package loading part of a Developer Tools waterfall chart.

The Matrix, mgcv, and farver packages grind things to a halt. You felt that if you hit up the example at the beginning of the post. Brutal. Painful. Terrible.

This got me curious about all the other packages that are available to WebR (93 as of the date on this post).

Approaching R Package Load/library Benchmarking In A Browser

Much like the skewed package file size distribution of presently available R WASM packages, the per-package dependency distribution is also pretty skewed:

Min.   :  1
1st Qu.:  1
Median :  1
Mean   :  2
3rd Qu.:  2
Max.   : 15

This is good! It means you’re mostly safe to have fun with WebR and do not have to focus on working around an initial slowdown. Still, this did not deter me from a time sink.

I had to figure out a way to test the install/library of each WASM R package independently, in a fresh WebR context.

One obvious way is to make 93 HTML files and load them all by hand.

O_O

There had to be a better way, and I immediately turned to “iframes” as a solution.

While I could have scripted the creation of proper HTML for 93 iframes to be put into a page, that’s not a great idea for a number of reasons:

  • that’ll crash every modern browser: far too many child iframes, each with its own DOM context, sounds horrible
  • 93 “simultaneous” WebR initializations would consume all browser resources and DoS the tab
  • the “simultaneous” loading would skew timing results, even when the package files are cached

The solution was to use dynamically created iframes. One potential “gotcha” for this could have been the modern browser security model. Thanks to some dangerous hardware-level weaknesses that were discovered and exploited a few years back, Chrome and other browsers shored up the safety contracts between iframes and parent pages. Not doing so could have allowed attackers to have some fun at your expense.

If you’ve been following along the past week or so, to get the best performance with WebR, you need to make sure certain HTTP headers are in place so the browser can trust what you’re doing enough to relax some restrictions. Dynamically created iframes have no “headers”, per se, but the clever folks who make browser bits for a living came up with a way to handle this. We just need to mark the frame as credentialless and we’ll get good performance (please read the link to get more context); a minimal sketch of this is just below.
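Spawning such a frame looks something like this (a sketch: bench-frame.html is a hypothetical child page, and the credentialless property is currently Chromium-specific):

function spawnBenchFrame(pkg) {
  const frame = document.createElement("iframe");
  frame.credentialless = true; // relax the COEP embedding requirements for this child
  frame.src = `/bench-frame.html?pkg=${encodeURIComponent(pkg)}`; // hypothetical child page
  document.body.appendChild(frame);
  return frame;
}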

So, we can run a slightly expanded version of the (way) above javascript code to get timer stats, but how do we collect them?

Well, the parent of the iframe can talk to the iframe and vice-versa via postMessage(), so all we need to do is have the iframe send data back to the parent when it is done. This is also a signal we can kill the child iframe, freeing up resources, and then move on to the next one.
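A sketch of that round trip (the message shape and helper names are assumptions, not the exact code on the benchmark page):

// child (inside the iframe), once its timings are collected:
parent.postMessage({ pkg, installMs, libraryMs, error }, "*");

// parent: record the result, kill the frame, move on
window.addEventListener("message", (ev) => {
  recordResult(ev.data);                   // hypothetical results-table recorder
  document.body.removeChild(currentFrame); // free up resources
  runNextBenchmark();                      // hypothetical: spawn the next frame
});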

An Unexpected Twist

It turns out that some WASM-ified R packages are busted. Specifically:

  • fs
  • Hmisc
  • latticeExtra
  • pkgload

Some functions in each of them are needed by one or more other packages, but — as you’ll see if you run the benchmark site — they fail to library() after installation.

This was a “gotcha” I just had to wrap a try/catch block around, and also pass back information about; a sketch of that guard is below.
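Inside each child frame, the guarded timing looks roughly like this (a sketch using the WebR calls from earlier; the variable names are mine):

let error = null;
const t0 = performance.now();
await globalThis.webR.installPackages([pkg]);
const installMs = performance.now() - t0;

let libraryMs = null;
try {
  const t1 = performance.now();
  await globalThis.webR.evalRVoid(`library(${pkg})`);
  libraryMs = performance.now() - t1;
} catch (e) {
  error = e.message; // busted package: record the failure and keep going
}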

Putting It All Together

You can run your own benchmarks at this playground page. View-source on the page to see the code (there’s just index.html and style.css). You can also see it at the WebR Experiments repo.

When the page loads, it fetches the last produced copy of https://rud.is/data/webr-packages.json. This is a JSON file I’m generating every night that contains all the packages available in “WASM notCRAN”. It just steals PACKAGES.rds every day and serializes it to JSON. Feel free to use it (if you get a CORS error lemme know; you shouldn’t but it’s an odd year).

Controls and sample output for the benchmark site.

The first thing your eyes will likely be drawn to is: “✅ Context is cross-origin isolated!”. When I was debugging WebR performance issues early on, George (the Godfather of WebR) noted that we needed certain headers to get those aforementioned safety restrictions loosened up a bit. You can test the global crossOriginIsolated variable to see if you’ve set up the headers correctly and read more about it when you have time. While it’s not needed on that page, I left it in so I could write this paragraph.
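If you’re setting up your own page, the check itself is trivial; it’s the server headers that do the real work:

// the server needs to send these response headers:
//   Cross-Origin-Opener-Policy: same-origin
//   Cross-Origin-Embedder-Policy: require-corp
console.log(crossOriginIsolated ? "✅ Context is cross-origin isolated!" : "❌ not isolated");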

You’ll also see a “download results?” checkbox that is unchecked by default. If checked, you’ll get a JSON file with all the results in the dynamically constructed table.

After you tap “Begin Benchmark”, you can go get a matcha and come back.

You’ll see the results in a table and a surprise Observable Plot histogram (the post’s featured image).

I disable the controls after the run since you really should close the tab and start a fresh one (not just a reload) to get a clean context.

If you use the site and download the JSON, you can hit up this Observable notebook and put the JSON in a fork of it. I would also not mind it if you could post your JSON to the WebR Experiments repo as an issue and include the browser and system config you were using at the time.

FIN

This was a fun distraction, and shows you can use most of the presently available WebR packages without concern.

Make sure to check back for those WebR graphics posts!

I’ve been (mostly) keeping up with annual updates for my R/{sf} U.S. foliage post which you can find on GH. This year, we have Quarto, and it comes with so many batteries included that you’d think it was Christmas. One of those batteries is full support for the Observable runtime. These are used in {ojs} Quarto blocks, and rendered versions can run anywhere.

The Observable platform is great for both tinkering and publishing (we’re using it at work for some quick or experimental vis work), and with a few of the recent posts, here, showing how to turn Observable notebooks into Quarto documents, you’re literally two clicks or one command line away from using any public Observable notebook right in Quarto.

I made a version of the foliage vis in Observable and then did the qmd conversion using the Chrome extension, tweaked the source a bit and published the same in Quarto.

The interactive datavis uses some foundational Observable/D3 libraries:

In the JS code we set some datavis-centric values:

foliage_levels = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
foliage_colors = ["#83A87C", "#FCF6B4", "#FDCC5C", "#F68C3F", "#EF3C23", "#BD1F29", "#98371F"]
foliage_labels = ["No Change", "Minimal", "Patchy", "Partial", "Near Peak", "Peak", "Past Peak"]
week_label = ["Sept 5th", "Sept 12th", "Sept 19th", "Sept 26th", "Oct 3rd", "Oct 10th", "Oct 17th", "Oct 24th", "Oct 31st", "Nov 7th", "Nov 14th", "Nov 21st"]

We then borrow the U.S. Albers-projected topojson file from the Choropleth example notebook and rebuild the outline mesh and county geometry collections, since we need to get rid of Alaska and Hawaii (they’re not present in the source data). We do this by filtering out two FIPS codes:

counties = {
  var cty = topojson.feature(us, us.objects.counties);
  cty.features = cty.features.filter(
    (d) => (d.id.substr(0, 2) != "02") && (d.id.substr(0, 2) != "15")
  );
  return cty;
}

I also ended up modifying the source CSV a bit to account for missing counties.

After that, it was a straightforward call to our imported Choropleth function:

chart = Choropleth(rendered2022, {
  id: (d) => d.id.toString().padStart(5, "0"), // this is needed since the CSV id column is numeric
  value: (d) => d[week_label.indexOf(week) + 1], // this gets the foliage value based on which index the selected week is at
  scale: d3.scaleLinear, // this says to map foliage_levels to foliage_colors directly
  domain: foliage_levels,
  range: foliage_colors,
  title: (f, d) =>
    `${f.properties.name}, ${statemap.get(f.id.slice(0, 2)).properties.name}`, // this makes the county hover text the county + state names
  features: counties, // this is the counties we modified
  borders: statemesh, // this is the statemesh
  width: 975,
  height: 610
})

and placing the legend and scrubbing slider.

The only real difference between the notebook and the qmd is the inclusion of the source functions rather than using Observable’s import (I’ve found that there’s a slight load delay for imports when network conditions aren’t perfect, and the inclusion of the source, WITH copyrights, makes up for that).

I’ve set up the Quarto project so that renders go to the docs/ directory, which makes it easy to publish as a GH page.

FIN

Drop issues on GH if anything needs clarifying or fixing and go experiment! You can’t break anything either on Observable or locally that version control can’t fix (yes, Observable has version control!).

Some things to consider modifying/adding:

  • have a click take you to a (selectable?) mapping service, so folks can get driving directions
  • turn the hover text into a proper tooltip
  • speed up or slow down the animation when ‘Play’ is tapped
  • use different colors
  • bring in older datasets (see the foliage GH repo) and make multiple maps or let the user select them or have them compare across years

The fine folks over at @ObservableHQ released a new javascript exploratory visualization library called Plot last week with great fanfare. It was primarily designed to be used in Observable notebooks and I quickly tested it out there (you can find them at my Observable landing page: https://observablehq.com/@hrbrmstr).

{Plot} doesn’t require Observable, however, and I threw together a small example that dynamically tracks U.S. airline passenger counts by the TSA to demonstrate how to use it in a plain web page.

It’s small enough that I can re-create it here:

TSA Total Traveler Throughput 2021 vs 2020 vs 2019 (same weekday)


and include the (lightly annotated) source:

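// n.b. this assumes d3, arquero (as `aq`), and Plot are already loaded via <script> tags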
fetch(
"https://observable-cors.glitch.me/https://www.tsa.gov/coronavirus/passenger-throughput",
{
  cache: "no-store",
  mode: "cors",
  redirect: "follow"
}
)
.then((response) => response.text()) // we get the text here
.then((html) => {

   var parser = new DOMParser();
   var doc = parser.parseFromString(html, "text/html"); // we turn it into DOM elements here

   // some helpers to make the code less crufty
   // first a function to make proper dates

   var as_date = d3.timeParse("%m/%d/%Y");

   // and, now, a little function to pull a specific <table> column and
   // convert it to a proper numeric array. I would have put this inline
   // if we were only converting one column but there are three of them,
   // so it makes sense to functionize it.

   var col_double = (col_num) => {
     return Array.from(
     doc.querySelectorAll(`table tbody tr td:nth-child(${col_num})`)
     ).map((d) => parseFloat(d.innerText.trim().replace(/,/g, "")));
   };

   // build an arquero table from the scraped columns

   var flights = aq
         .table({
            day: Array.from(
                   doc.querySelectorAll("table tbody tr td:nth-child(1)")
                 ).map((d) => as_date(d.innerText.trim().replace(/.{4}$/g, "2021"))),
            y2021: col_double(2),
            y2020: col_double(3),
            y2019: col_double(4)
        })
        .orderby("day")
        .objects()
        .filter((d) => !isNaN(d.y2021))

   document.getElementById('vis').appendChild(
     Plot.plot({
       marginLeft: 80,
       x: {
         grid: true
       },
       y: {
         grid: true,
         label: "# Passengers"
       },
       marks: [
         Plot.line(flights, { x: "day", y: "y2019", stroke: "#4E79A7" }),
         Plot.line(flights, { x: "day", y: "y2020", stroke: "#F28E2B" }),
         Plot.line(flights, { x: "day", y: "y2021", stroke: "#E15759" })
       ]
    })
  );

})
.catch((err) => err)

FIN

I’ll likely do a more in-depth piece on Plot in the coming weeks (today is Mother’s Day in the U.S. and that’s going to consume most of my attention today), but I highly encourage y’all to play with this new, fun tool.

I caught this post on The Surprising Number Of Programmers Who Can’t Program from the Hacker News RSS feed. Said post links to another, classic post on the same subject and you should read both before continuing.

Back? Great! Let’s dig in.

Why does hrbrmstr care about this?

Offspring #3 completed his Freshman year at UMaine Orono last year but wanted to stay academically active over the summer (he’s majoring in astrophysics and knows he’ll need some programming skills to excel in his field) and took an introductory C++ course from UMaine that was held virtually, with 1 lecture per week (14 weeks IIRC) and 1 assignment due per week with no other grading.

After seeing what passes for a standard (UMaine is not exactly on the top list of institutions to attend if one wants to be a computer scientist) intro C++ course, I’m not really surprised “Johnny can’t code”. Thirteen weeks in, the class finally started covering OO concepts, and the course is ending with a scant intro to polymorphism. Prior to this, most of the assignments were just variations on each other (read from stdin, loop with conditionals, print output) with no program going over 100 LoC (and that includes comments and spacing). This wasn’t a “compsci for non-compsci majors” course, either. Anyone majoring in an area of study that requires programming could have taken this course to fulfill one of the requirements, and they’d be set on a path of forever using StackOverflow copypasta to try to get their future work done.

I’m fairly certain most of #3’s classmates could not program fizzbuzz without googling and even more certain most have no idea they weren’t really “coding in C++” most of the course.

If this is how most other middling colleges are teaching the basics of computer programming, it’s no wonder employers are having a difficult time finding qualified talent.

You have an “R” tag — actually, a few language tags — on this post, so where’s the code?

After the article triggered the lament in the previous section, a crazy, @coolbutuseless-esque thought came into my head: “I wonder how many different language FizzBuzz solutions can be created from within R?”.

The criterion for that notion was that there needed to be some Rcpp::cppFunction(), reticulate::py_run_string(), V8 context eval()-type way to have the code in-R but then run through those far-superior-to-any-other-language’s polyglot extensibility constructs.

Before getting lost in the weeds, there were some other thoughts on language inclusion:

  • Should Java be included? I :heart: {rJava}, but cat()-ing Java code out and running system() to compile it first seemed like cheating (even though that’s kinda just what cppFunction() does). Toss a note into a comment if you think a Java example should be added (or add said Java example in a comment or link to it in one!).
  • I think Julia should be in this example list but do not care enough about it to load {JuliaCall} and craft an example (again, link or post one if you can crank it out quickly).
  • I think Lua could be in this example given the existence of {luar}. If you agree, give it a go!
  • Go & Rust compiled code can also be called in R (thanks to Romain & Jeroen) once they’re turned into C-compatible libraries. Should this polyglot example show this as well?
  • What other languages am I missing?

The aforementioned “weeds”

One criterion for each language’s fizzbuzz example is that it needs to be readable, not hacky-cool. That doesn’t mean the solutions still can’t be a bit creative. We’ll lightly go through each one I managed to code up. First we’ll need some helpers:

suppressPackageStartupMessages({
  library(purrr)
  library(dplyr)
  library(reticulate)
  library(V8)
  library(Rcpp)
})

The R, JavaScript, and Python implementations are all in the microbenchmark() call way down below. Up here are C and C++ versions. The C implementation is boring and straightforward, but we’re using Rprintf() so we can capture the output vs have any output buffering woes impact the timings.

cppFunction('
void cbuzz() {

  // super fast plain C

  for (unsigned int i=1; i<=100; i++) {
    if      (i % 15 == 0) Rprintf("FizzBuzz\\n");
    else if (i %  3 == 0) Rprintf("Fizz\\n");
    else if (i %  5 == 0) Rprintf("Buzz\\n");
    else Rprintf("%d\\n", i);
  }

}
')

The cbuzz() example is just fine even in C++ land, but we can take advantage of some C++11 vectorization features to stay formally in C++-land and play with some fun features like lambdas. This will be a bit slower than the C version plus consume more memory, but shows off some features some folks might not be familiar with:

cppFunction('
void cppbuzz() {

  std::vector<int> numbers(100); // will eventually be 1:100
  std::iota(numbers.begin(), numbers.end(), 1); // kinda sorta the equivalent of our R 1:100, but not exactly

  std::vector<std::string> fb(100); // fizzbuzz strings holder

  // transform said 1..100 into fizbuzz strings
  std::transform(
    numbers.begin(), numbers.end(), 
    fb.begin(),
    [](int i) -> std::string { // lambda expressions are cool like a fez
        if      (i % 15 == 0) return("FizzBuzz");
        else if (i %  3 == 0) return("Fizz");
        else if (i %  5 == 0) return("Buzz");
        else return(std::to_string(i));
    }
  );

  // round it out with use of for_each and another lambda
  // this turns out to be slightly faster than range-based for-loop
  // collection iteration syntax.
  std::for_each(
    fb.begin(), fb.end(), 
    [](std::string s) { Rcout << s << std::endl; }
  );

}
', 
plugins = c('cpp11'))

Both of those functions are now available to R.

Next, we need to prepare to run JavaScript and Python code, so we’ll initialize both of those environments:

ctx <- v8()

py_config() # not 100% necessary but I keep my needed {reticulate} options in env vars for reproducibility

Then, we tell R to capture all the output. Using sink() is a bit better than capture.output() in this use case since it avoids nesting calls, and we need to handle Python stdout the same way py_capture_output() does to be fair in our measurements:

output_tools <- import("rpytools.output")
restore_stdout <- output_tools$start_stdout_capture()

cap <- rawConnection(raw(0), "r+")
sink(cap)

There are a few implementations below across the tidy and base R multiverse. Some use vectorization; some do not. This will let us compare overall “speed” of solution. If you have another suggestion for a readable solution in R, drop a note in the comments:

microbenchmark::microbenchmark(

  # tidy_vectors_case() is slowest but you get all sorts of type safety 
  # for free along with very readable idioms.

  tidy_vectors_case = map_chr(1:100, ~{ 
    case_when(
      (.x %% 15 == 0) ~ "FizzBuzz",
      (.x %%  3 == 0) ~ "Fizz",
      (.x %%  5 == 0) ~ "Buzz",
      TRUE ~ as.character(.x)
    )
  }) %>% 
    cat(sep="\n"),

  # tidy_vectors_if() has old-school if/else syntax but still
  # forces us to ensure type safety which is cool.

  tidy_vectors_if = map_chr(1:100, ~{ 
    if (.x %% 15 == 0) return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(as.character(.x))
  }) %>% 
    cat(sep="\n"),

  # walk() just replaces `for` but stays in vector-land which is cool
  # (n.b. there are no `else`s here, so this variant also prints the number
  # on Fizz/Buzz lines; it's kept as-is for the timing comparison)

  tidy_walk = walk(1:100, ~{
    if (.x %% 15 == 0) cat("FizzBuzz\n")
    if (.x %%  3 == 0) cat("Fizz\n")
    if (.x %%  5 == 0) cat("Buzz\n")
    cat(.x, "\n", sep="")
  }),

  # vapply() gets us some similar type assurance, albeit with arcane syntax

  base_proper = vapply(1:100, function(.x) {
    if (.x %% 15 == 0) return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(as.character(.x))
  }, character(1), USE.NAMES = FALSE) %>% 
    cat(sep="\n"),

  # sapply() is def lazy but this can outperform vapply() in some
  # circumstances (like this one) and is a bit less arcane.

  base_lazy = sapply(1:100, function(.x) {
    if (.x %% 15 == 0)  return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(.x)
  }, USE.NAMES = FALSE) %>% 
    cat(sep="\n"),

  # for loops...ugh. might as well just use C

  base_for = for(.x in 1:100) {
    if      (.x %% 15 == 0) cat("FizzBuzz\n")
    else if (.x %%  3 == 0) cat("Fizz\n")
    else if (.x %%  5 == 0) cat("Buzz\n")
    else cat(.x, "\n", sep="")
  },

  # ok, we'll just use C!

  c_buzz = cbuzz(),

  # we can go back to vector-land in C++

  cpp_buzz = cppbuzz(),

  # some <3 for javascript

  js_readable = ctx$eval('
for (var i=1; i <101; i++){
  if      (i % 15 == 0) console.log("FizzBuzz")
  else if (i %  3 == 0) console.log("Fizz")
  else if (i %  5 == 0) console.log("Buzz")
  else console.log(i)
}
'),

  # icky readable, non-vectorized python

  python = reticulate::py_run_string('
for x in range(1, 101):
  if (x % 15 == 0):
    print("Fizz Buzz")
  elif (x % 5 == 0):
    print("Buzz")
  elif (x % 3 == 0):
    print("Fizz")
  else:
    print(x)
')

) -> res

Turn off output capturing:

sink()
if (!is.null(restore_stdout)) invisible(output_tools$end_stdout_capture(restore_stdout))

We used microbenchmark(), so here are the results:

res
## Unit: microseconds
##               expr       min         lq        mean     median         uq       max neval   cld
##  tidy_vectors_case 20290.749 21266.3680 22717.80292 22231.5960 23044.5690 33005.960   100     e
##    tidy_vectors_if   457.426   493.6270   540.68182   518.8785   577.1195   797.869   100  b   
##          tidy_walk   970.455  1026.2725  1150.77797  1065.4805  1109.9705  8392.916   100   c  
##        base_proper   357.385   375.3910   554.13973   406.8050   450.7490 13907.581   100  b   
##          base_lazy   365.553   395.5790   422.93719   418.1790   444.8225   587.718   100 ab   
##           base_for   521.674   545.9155   576.79214   559.0185   584.5250   968.814   100  b   
##             c_buzz    13.538    16.3335    18.18795    17.6010    19.4340    33.134   100 a    
##           cpp_buzz    39.405    45.1505    63.29352    49.1280    52.9605  1265.359   100 a    
##        js_readable   107.015   123.7015   162.32442   174.7860   187.1215   270.012   100 ab   
##             python  1581.661  1743.4490  2072.04777  1884.1585  1985.8100 12092.325   100    d 

Said results are 🤷🏻‍♀️ since this is a toy example, but I wanted to show that Jeroen’s {V8} can be super fast, especially when there’s no value marshaling to be done and that some things you may have thought should be faster, aren’t.

FIN

Definitely add links or code for changes or additions (especially the aforementioned other languages). Hopefully my lament about the computer science program at UMaine is not universally true for all the programming courses there.

Matt @stiles is a spiffy data journalist at the @latimes and he posted an interesting chart on U.S. Attorneys General longevity (given that the current US AG is on thin ice):

I thought it would be neat (since Matt did the data scraping part already) to look at AG tenure distribution by party, while also pointing out where Sessions falls.

Now, while Matt did scrape the data, it’s tucked away into a javascript variable in an iframe on the page that contains his vis.

It’s still easier to get it from there vs re-scrape Wikipedia (like Matt did) thanks to the V8 package by @opencpu.

The following code:

  • grabs the vis iframe
  • extracts and evaluates the target javascript to get a nice data frame
  • performs some factor re-coding (for better grouping and to make it easier to identify Sessions)
  • plots the distributions using the beeswarm quasirandom algorithm

library(V8)
library(rvest)
library(ggbeeswarm)
library(hrbrthemes)
library(tidyverse)

pg <- read_html("http://mattstiles.org/dailygraphics/graphics/attorney-general-tenure-20172517/child.html?initialWidth=840&childId=pym_0&parentTitle=Chart%3A%20If%20Ousted%2C%20Jeff%20Sessions%20Would%20Have%20a%20Historically%20Short%20Tenure%20%7C%20The%20Daily%20Viz&parentUrl=http%3A%2F%2Fthedailyviz.com%2F2017%2F07%2F25%2Fchart-if-ousted-jeff-sessions-would-have-a-historically-short-tenure%2F")

ctx <- v8()
ctx$eval(html_nodes(pg, xpath=".//script[contains(., 'DATA')]") %>% html_text())

ctx$get("DATA") %>% 
  as_tibble() %>% 
  readr::type_convert() %>% 
  mutate(party = ifelse(is.na(party), "Other", party)) %>% 
  mutate(party = fct_lump(party)) %>% 
  mutate(color1 = case_when(
    party == "Democratic" ~ "#313695",
    party == "Republican" ~ "#a50026",
    party == "Other" ~ "#4d4d4d")
  ) %>% 
  mutate(color2 = ifelse(grepl("Sessions", label), "#2b2b2b", "#00000000")) -> ags

ggplot() + 
  geom_quasirandom(data = ags, aes(party, amt, color = color1)) +
  geom_quasirandom(data = ags, aes(party, amt, color = color2), 
                   fill = "#ffffff00", size = 4, stroke = 0.25, shape = 21) +
  geom_text(data = data_frame(), aes(x = "Republican", y = 100, label = "Jeff Sessions"), 
            nudge_x = -0.15, family = font_rc, size = 3, hjust = 1) +
  scale_color_identity() +
  scale_y_comma(limits = c(0, 4200)) +
  labs(x = "Party", y = "Tenure (days)", 
       title = "U.S. Attorneys General",
       subtitle = "Distribution of tenure in office, by days & party: 1789-2017",
       caption = "Source data/idea: Matt Stiles <bit.ly/2vXAHTM>") +
  theme_ipsum_rc(grid = "XY")

I turned the data into a CSV and stuck it in this gist if folks want to play w/o doing the js scraping.

This made the rounds on social media last week:

One of the original versions was static and was not nearly as popular, but, as you can see, this one went viral.

Despite the public’s infatuation with circles (I’m lookin’ at you, pie charts), I’m not going to reproduce this polar coordinate visualization in ggplot2. I believe others have already done so (or are doing so) and you can mimic the animation pretty easily with `coord_polar()` and @drob’s enhanced ggplot2 animation tools.

NOTE: If you’re more interested in the stats/science than a spirograph or colorful D3 animation (below), Gavin Simpson (@ucfagls) has an [awesome post](http://www.fromthebottomoftheheap.net/2016/03/25/additive-modeling-global-temperature-series-revisited/) with a detailed view of the HadCRUT data set.

## HadCRUT in R

I noticed that [the original data source](http://www.metoffice.gov.uk/hadobs/hadcrut4/) had 12 fields, two of which (columns 11 & 12) are the lower+upper bounds of the 95% confidence interval of the combined effects of all the uncertainties described in the HadCRUT4 error model (measurement and sampling, bias and coverage uncertainties). The spinning vis of doom may be mesmerizing, but it only shows the median. I thought it might be fun to try to make a good looking visualization using the CI as well (you can pick one of the other pairs to try this at home), both in R and then in D3. I chose D3 for the animated version mostly to play with the new 4.0 main branch, but I think it’s possible to do more with dynamic visualizations in D3 than it is with R (and it doesn’t require stop-motion techniques).

The following code:

– reads in the data set (and saves it locally to be nice to their bandwidth bill)
– does some munging to get fields we need
– saves a version out for use with D3
– uses `geom_segment()` + `geom_point()` to do the heavy lifting
– colors the segments by year using the `viridis` palette (the Plasma version)
– labels the plot by decade using facets and some fun facet margin “tricks” to make it look like the x-axis labels are on top

library(readr)    # read_table() / write_csv()
library(dplyr)
library(zoo)      # as.yearmon()
library(ggplot2)  # devtools::install_github("hadley/ggplot2")
library(hrbrmisc) # devtools::install_github("hrbrmstr/hrbrmisc")
library(viridis)

URL <- "http://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/time_series/HadCRUT.4.4.0.0.monthly_ns_avg.txt"
fil <- sprintf("data/%s", basename(URL))
if (!file.exists(fil)) download.file(URL, fil)

global_temps <- read_table(fil, col_names=FALSE)

global_temps %>%
  select(year_mon=1, median=2, lower=11, upper=12) %>%
  mutate(year_mon=as.Date(as.yearmon(year_mon, format="%Y/%m")),
         year=as.numeric(format(year_mon, "%Y")),
         decade=(year %/% 10) * 10,
         month=format(year_mon, "%b")) %>%
  mutate(month=factor(month, levels=month.abb)) %>%
  filter(year != 2016) -> global_temps

# for D3 vis
write_csv(global_temps, "data/temps.csv")

#+ hadcrut, fig.retina=2, fig.width=12, fig.height=6
gg <- ggplot(global_temps)
gg <- gg + geom_segment(aes(x=year_mon, xend=year_mon, y=lower, yend=upper, color=year), size=0.2)
gg <- gg + geom_point(aes(x=year_mon, y=median), color="white", shape=".", size=0.01)
gg <- gg + scale_x_date(name="Median in white", expand=c(0,0.5))
gg <- gg + scale_y_continuous(name=NULL, breaks=c(0, 1.5, 2),
                              labels=c("0°C", "1.5°C", "2.0°C"), limits=c(-1.6, 2.25))
gg <- gg + scale_color_viridis(option="C")
gg <- gg + facet_wrap(~decade, nrow=1, scales="free_x")
gg <- gg + labs(title="Global Temperature Change (1850-2016)",
                subtitle="Using lower and upper bounds of the 95% confidence interval of the combined effects of all the uncertainties described in the HadCRUT4 error model (measurement and sampling, bias and coverage uncertainties; fields 11 & 12)",
                caption="HadCRUT4 (http://www.metoffice.gov.uk/hadobs/hadcrut4/index.html)")
gg <- gg + theme_hrbrmstr_my(grid="XY")
gg <- gg + theme(panel.background=element_rect(fill="black", color="#2b2b2b", size=0.15))
gg <- gg + theme(panel.margin=margin(0,0,0,0))
gg <- gg + theme(panel.grid.major.y=element_line(color="#b2182b", size=0.25))
gg <- gg + theme(strip.text=element_text(hjust=0.5))
gg <- gg + theme(axis.title.x=element_text(hjust=0, margin=margin(t=-10)))
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.text.y=element_text(size=12, color="#b2182b"))
gg <- gg + theme(legend.position="none")
gg <- gg + theme(plot.margin=margin(10, 10, 10, 10))
gg <- gg + theme(plot.caption=element_text(margin=margin(t=-6)))
gg

(Figure: the rendered HadCRUT4 visualization; click through for a larger version)

My `theme_hrbrmstr_my()` required the Myriad Pro font, so you’ll need to use one of the other themes in the `hrbrmisc` package or fill in some `theme()` details on your own.

## HadCRUT in D3

While the static visualization is pretty, we can kick it up a bit with some basic animations. Rather than make a multi-file HTML+js+D3+CSS example, this is all self-contained (apart from the data) in a single `index.html` file (some folks asked for the next D3 example to be self-contained).

Some nice new features of D3 4.0 (that I ended up using here):

– easier to use `scale`s
– less verbose `axis` creation
– `viridis` is now a first-class citizen

Mike Bostock has spent much time refining the API for [D3 4.0](https://github.com/d3/d3) and it shows. I’m definitely looking forward to playing with it over the rest of the year.

The vis is below but you can bust the `iframe` via [https://rud.is/projects/hadcrut4/](https://rud.is/projects/hadcrut4/).

I have it setup as “click to view” out of laziness. It’s not hard to make it trigger on `div` scroll visibility, but this way you also get to repeat the visualization animation without it looping incessantly.

If you end up playing with the D3 code, definitely change the width. I had to make it a bit smaller to fit it into the blog theme.

## Fin

You can find the source for both the R & D3 visualizations [on github](https://github.com/hrbrmstr/hadcrut).