Roll Your Own Stats and Geoms in ggplot2 (Part 1: Splines!)

A huge change is coming to ggplot2 and you can get a preview of it over at Hadley’s github repo. I’ve been keenly interested in this as I will be fixing, finishing & porting coord_proj to it once it’s done.

Hadley & Winston have re-built ggplot2 with an entirely new object-oriented system called ggproto. With ggproto it’s now possible to easily extend ggplot2 from within your own packages (since source() is so last century), often with very little effort.

Before attempting to port coord_proj I wanted to work through adding a Geom and Stat, since I thought it would be cool to have interpolated line charts (and it helps answer some recurring StackOverflow “spline”/ggplot2 questions). I also prefer KernSmooth::bkde over the built-in density function (which geom_density and stat_density both use).

To that end, I’ve made a new github-installable package called ggalt (h/t to @jayjacobs for the better package name than I originally came up with) where I’ll be adding new Geoms, Stats, Coords (et al) as I craft them. For now, let me introduce both geom_xspline() and geom_bkde() to show how easy it is to incorporate new functionality into ggplot2.

While not a requirement, I think it’s going to be a good idea to make both a paired Geom and Stat when adding those types of functionality to ggplot2. I found it easier to work with custom parameters this way and it also makes it feel a bit more like the way ggplot2 itself works. For the interpolated line geom/stat I used R’s graphics::xspline function. Here’s all it took to give ggplot2 lines some curves (you can find the commented version on github):

library(ggplot2)

# User-callable function: builds a layer pairing GeomXspline with the xspline stat
geom_xspline <- function(mapping = NULL, data = NULL, stat = "xspline",
                      position = "identity", show.legend = NA,
                      inherit.aes = TRUE, na.rm = TRUE,
                      spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    geom = GeomXspline,
    mapping = mapping,
    data = data,
    stat = stat,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    # na.rm is passed along too, otherwise the argument above is silently unused
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  na.rm=na.rm,
                  ...)
  )
}

GeomXspline <- ggproto("GeomXspline", GeomLine,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "black", size = 0.5, linetype = 1, alpha = NA)
)

stat_xspline <- function(mapping = NULL, data = NULL, geom = "line",
                     position = "identity", show.legend = NA, inherit.aes = TRUE,
                     spline_shape=-0.25, open=TRUE, rep_ends=TRUE, ...) {
  layer(
    stat = StatXspline,
    data = data,
    mapping = mapping,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(spline_shape=spline_shape,
                  open=open,
                  rep_ends=rep_ends,
                  ...
    )
  )
}

StatXspline <- ggproto("StatXspline", Stat,

  required_aes = c("x", "y"),

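  # Called once per group; returns the interpolated points for GeomLine to draw
  compute_group = function(self, data, scales, params,
                           spline_shape=-0.25, open=TRUE, rep_ends=TRUE) {
    # xspline() needs an open plot, so open a throwaway png device first
    tf <- tempfile(fileext=".png")
    png(tf)
    plot.new()
    # draw=FALSE returns the curve's coordinates instead of drawing it
    tmp <- xspline(data$x, data$y, shape=spline_shape, open=open,
                   repEnds=rep_ends, draw=FALSE, border=NA, col=NA)
    # close the throwaway device and remove its file
    invisible(dev.off())
    unlink(tf)

    data.frame(x=tmp$x, y=tmp$y)
  }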
)

If that seems like a lot of code, it really isn't. What we have there are:

  • two functions that handle the Geom aspects &
  • two functions that handle the Stat aspects.

Let's look at the Stat functions first, though you can also just read the handy vignette.

Adding Stats

In this particular case, we have it easy. We get to use geom_line/GeomLine as the base geom for the layer since all we're doing is generating more points for it to draw line segments between. We create the interface to our new Stat with stat_xspline and add three new parameters with default values:

  • spline_shape
  • open
  • rep_ends
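
To get a feel for what those three parameters do before they ever reach ggplot2, here's a minimal sketch that calls graphics::xspline directly. The control points are made up, and pdf(NULL) (a device that writes no file) stands in for the temporary png file used in the post. Per the xspline documentation, a negative shape makes the curve pass through the control points, a positive one only approaches them, and 0 gives a sharp corner.

```r
# Made-up control points, purely for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 5, 3, 6, 1)

# xspline() needs an open plot; pdf(NULL) is an assumption of this sketch
# (the post opens a temporary png file instead)
pdf(NULL)
plot.new()

# shape < 0: the curve interpolates (passes through) the control points
interp <- xspline(x, y, shape = -0.25, open = TRUE, repEnds = TRUE, draw = FALSE)

# shape > 0: the curve only approximates (approaches) the control points
approx_pts <- xspline(x, y, shape = 0.5, open = TRUE, repEnds = TRUE, draw = FALSE)

# open = FALSE closes the curve back on itself; repEnds is ignored in that case
closed <- xspline(x, y, shape = -0.25, open = FALSE, draw = FALSE)

invisible(dev.off())

# Each result is a list of x and y coordinates for the line segments
lengths(interp)
```

In the Stat, spline_shape, open and rep_ends are simply handed to xspline's shape, open and repEnds arguments.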

"Added three new parameters to what?" you ask. GeomLine/geom_line default to StatIdentity/stat_identity and, if you look at the source code, that Stat just returns the data in the form it came in. We're going to take these three new parameters, pass them to xspline and return entirely new values for ggplot2/grid to draw for us, so we tell ggplot2 to call our new computation engine by passing StatXspline as the stat value in the call to layer. By using GeomLine/geom_line as the geom parameter, all we have to do is ensure we pass back the proper values.

We do that in compute_group since ggplot2 will segment the incoming data into groups (via the group aesthetic) for us. We take each group and run it through xspline with the parameters the user specified. If I didn't have to use the hack (opening and closing a throwaway png device) to work around what seem to be errant plot device issues in xspline, the call would be one line.
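
One thing to keep in mind with the device hack: dev.off() makes some other open device the current one, which may not be the device that was active beforehand. Here's a hedged sketch of a more defensive variant. The helper name xspline_points is hypothetical (it is not part of ggalt or ggplot2), and it uses pdf(NULL) in place of the temporary png file so nothing is written to disk.

```r
# Hypothetical helper: compute xspline points without disturbing whichever
# graphics device the caller had active.
xspline_points <- function(x, y, shape = -0.25, open = TRUE, rep_ends = TRUE) {
  prev_dev <- grDevices::dev.cur()   # device active on entry (1 means none)
  grDevices::pdf(NULL)               # throwaway device that writes no file
  tmp_dev <- grDevices::dev.cur()    # remember which device we opened

  on.exit({
    grDevices::dev.off(tmp_dev)      # close only the device we opened
    if (prev_dev > 1) grDevices::dev.set(prev_dev)  # restore the caller's device
  }, add = TRUE)

  graphics::plot.new()               # xspline() needs an open plot
  pts <- graphics::xspline(x, y, shape = shape, open = open,
                           repEnds = rep_ends, draw = FALSE)
  data.frame(x = pts$x, y = pts$y)
}

# Usage with made-up data
res <- xspline_points(1:5, c(2, 5, 3, 6, 1))
head(res)
```

Inside compute_group, a helper like this could replace the png/plot.new/dev.off lines; whether it is needed depends on which devices are open when the plot is built.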

Adding Geoms

We pair up the Stat with a very basic Geom "shim" so we can use them interchangeably. It's the same idiom, an "object" function and the user-callable function. In this case, it's super-lightweight since we're really having geom_line do all the work for us. In a [very] future post, I'll cover more complex Geoms that require use of the underlying grid graphics system, but I suspect most of your own additions may be able to use the lightweight idiom here (and that's covered in the vignette).
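
Stripped down to its bare bones, the lightweight idiom looks something like the sketch below. GeomShimLine and geom_shim_line are hypothetical names used only for illustration; the point is that the ggproto object inherits all of its drawing behaviour from GeomLine and the user-callable function is just a thin wrapper around layer().

```r
library(ggplot2)

# "Object" half of the idiom: a hypothetical Geom that inherits everything
# (drawing code, default aesthetics, legend key) from GeomLine.
GeomShimLine <- ggproto("GeomShimLine", GeomLine,
  required_aes = c("x", "y")
)

# User-callable half: a thin wrapper around layer(), as in geom_xspline().
geom_shim_line <- function(mapping = NULL, data = NULL, stat = "identity",
                           position = "identity", show.legend = NA,
                           inherit.aes = TRUE, na.rm = FALSE, ...) {
  layer(
    geom = GeomShimLine, mapping = mapping, data = data, stat = stat,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

# The shim is still a GeomLine, so the layer draws the way geom_line() would
lyr <- geom_shim_line()
inherits(lyr$geom, "GeomLine")
```

Swapping stat = "identity" for a custom Stat is all that separates this from geom_xspline.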

Putting Our New Functions To Work

With our new additions to ggplot2, we can compare the output of geom_smooth to geom_xspline with some test data:

set.seed(1492)
dat <- data.frame(x=c(1:10, 1:10, 1:10),
                  y=c(sample(15:30, 10), 2*sample(15:30, 10), 3*sample(15:30, 10)),
                  group=factor(c(rep(1, 10), rep(2, 10), rep(3, 10)))
)

ggplot(dat, aes(x, y, group=group, color=factor(group))) +
  geom_point(color="black") +
  geom_smooth(se=FALSE, linetype="dashed", size=0.5) +
  geom_xspline(size=0.5)

The github page has more examples for the function, but you don't have to be envious of the smooth D3 curves any more.

I realize this particular addition is not extremely helpful/beneficial, but the next one is. We'll look at adding a new/more accurate density Stat/Geom in the next installment and then discuss the "on-steroids" roxygen2 comments you'll end up using for your creations in part 3.

Comments

  1. Alex

    Pretty cool, exactly what I was looking for. In my case, though, combining ggplot() + geom_xspline() with ggsave() or pdf() crashes the plot viewer and does not produce a working pdf document. Any idea what could be wrong?

    1. hrbrmstr

      I just tried it (copy/pasted the code from the blog post and did both ggsave/pdf on my M1 Mac Mini) and it worked w/o any errors, so I would likely need more info on the setup.

      1. Alex

        Hi, thanks a lot for the answer! Here is a short reproducible example:

        library(ggplot2)

        # geom_xspline code from the post goes here

        dat <- matrix(c(1,2,3,4,5,9,20,25,50,120,135,132), ncol=2)
        dat <- as.data.frame(dat)

        ggplot(dat, aes(x = V1, y = V2)) +
          geom_point(colour = "red", size = 3) +
          geom_xspline(size=1, colour = "blue")

        ggsave("test.pdf", height=40, width=90, unit="mm")
        # plot viewer crashes and saves an empty pdf document

        Also session info (if it helps)

        sessionInfo()
        R version 4.0.4 Patched (2021-02-17 r80030)
        Platform: x86_64-w64-mingw32/x64 (64-bit)
        Running under: Windows 10 x64 (build 19042)

        Matrix products: default

        locale:
        [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
        [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

        attached base packages:
        [1] stats graphics grDevices utils datasets methods base

        other attached packages:
        [1] ggplot2_3.3.5

        loaded via a namespace (and not attached):
        [1] rstudioapi_0.13 magrittr_2.0.1 tidyselect_1.1.0 munsell_0.5.0 colorspace_2.0-0 R6_2.5.0 rlang_0.4.12
        [8] fansi_0.4.2 dplyr_1.0.7 tools_4.0.4 grid_4.0.4 gtable_0.3.0 utf8_1.1.4 DBI_1.1.1
        [15] withr_2.4.1 ellipsis_0.3.2 digest_0.6.27 yaml_2.2.1 assertthat_0.2.1 tibble_3.1.6 lifecycle_1.0.0
        [22] crayon_1.4.1 farver_2.0.3 purrr_0.3.4 vctrs_0.3.8 glue_1.4.2 labeling_0.4.2 compiler_4.0.4
        [29] pillar_1.6.4 generics_0.1.0 scales_1.1.1 pkgconfig_2.0.3
