Candy Coated Confidence Intervals

@mrshrbrmstr hinted that she would like this post by @RickWicklin translated into R for her stats class. She’s quite capable of cranking out the translation of the core component of that post (a call to chisq.test), but she wanted to show the entire post in R and really didn’t have time (she’s teaching a full load of classes and is department chair + a mom). I suggested that I, too, was a bit short on time, which resulted in her putting out a call to the twitterverse for assistance, which ultimately coerced me into tackling the problem.

I won’t re-create Rick’s post or my riff on it here, since you can check out the RPubs page for it and also get the source (you can get the source from the Rmd, too, but some folks like gists better).

So, why a blog post if not to present the translation?

Two reasons: I needed tidy Goodman simultaneous confidence intervals (SCIs) and Rick’s final plot was just begging to have “real” M&M’s as the point “geom”.

S[c]imple & Tidy SCIs

We’ve got options for calculating simultaneous CIs in R, and I could have just used DescTools::MultinomCI, except that I wanted a tibble and it returns a matrix, plus it only has three of the more common methods implemented (yes, I am the ultimate package snob). I recalled that the CoinMinD package was tailor-made for working with SCIs and has many more methods implemented, but its output is only ever print()ed to the console.

Yes, I shouted in disbelief at the glowing rectangle in front of me when I noticed that, almost as loudly as you did when you read that sentence.

The algorithms implemented in CoinMinD are just dandy and the package is coming up on its 4th birthday. So, as a present from it (via me) to the R community, I whipped together scimple, which generates tidy tibbles and has a function scimple_ci() that is similar to binom::binom.confint() in that it will generate the SCIs for all the available (non-Bayesian) methods, including Goodman.

Kick the tyres (pls!) and drop issues and/or PRs as you see fit.
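If you're curious what a Goodman interval actually involves, here is a minimal base-R sketch of the textbook formula as I understand it: a Wilson-style score interval per category, with a Bonferroni-adjusted chi-square quantile. It is not the scimple or CoinMinD source. The function name goodman_sci() and the candy counts are made up for illustration (they are not the data from Rick's post), and the result is a plain data frame, one row per category, that should convert to a tibble with tibble::as_tibble() if you want one.

```r
# Goodman-style simultaneous confidence intervals for multinomial proportions.
# `goodman_sci` is a hypothetical helper name, not a function from any package.
goodman_sci <- function(counts, alpha = 0.05) {
  k <- length(counts)  # number of categories
  n <- sum(counts)     # total number of observations

  # Bonferroni-adjusted chi-square quantile (1 degree of freedom)
  a <- qchisq(1 - alpha / k, df = 1)

  # Half-width term shared by the lower and upper limits
  half <- sqrt(a * (a + 4 * counts * (n - counts) / n))

  data.frame(
    category = if (is.null(names(counts))) seq_len(k) else names(counts),
    estimate = counts / n,
    lower    = (a + 2 * counts - half) / (2 * (n + a)),
    upper    = (a + 2 * counts + half) / (2 * (n + a)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Hypothetical counts, for illustration only
candy <- c(red = 20, orange = 25, yellow = 15, green = 22, blue = 28, brown = 10)

sci <- goodman_sci(candy)
print(sci)
```

Because every column is an ordinary vector of the same length, the result drops straight into ggplot2 (say, estimate on one axis with lower and upper feeding an error-bar geom), which is the whole point of wanting tidy output rather than console text.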

You can’t plot just one

Rick’s post analyzes distributions of M&M’s, so I went to the official M&M’s site to grab the official colors for the ones in his data set. I casually went about making the rest of the post with standard points and a superimposed white “m” when it dawned on me that those lentils (yes, it seems the candies are called lentils, or at least their icons are) were all over the M&M’s site. After some site spelunking with Chrome Developer Tools, I had the URLs for the candies in question and managed to use the nascent ggimage::geom_image() to place them on the plot:

The plot is a bit sparse as you have to get the aspect ratio just right to keep those tasty, tiny circles as circles.

The new geom_image() opens up many new possibilities for R visualizations (and not all are good possibilities). I think @mrshrbrmstr’s students got a kick out of a stats-y plot having real M&M’s on it so it worked OK this time. Just be wary of using gratuitous imagery and overdoing your watermarking.

As stated earlier you can get the code and see how you can improve upon Rick’s original post and my attempt at a quick riff. If you do end up cranking something out, drop a comment here or a tweet (@hrbrmstr) to show off your creation(s)!
