Slopegraphs in Python – Exploring Binning/Rounding

One of the last items for the 1.0 release is support for multiple columns of data. That will require some additional refactoring, so I’ve been procrastinating by exploring the recent “fudging” discovery. Despite claims to the contrary on other sites, there are more folks playing with slopegraphs than you might imagine. The inspiration for today’s installment comes from Jon Custer (@stuffisthings). He has a two partTelling Stories with Data” series that does some exploration of export data with slopegraphs. In his “Slopegraph Strikes Back” post, Jon does a spiffy job discussing data visualization fundamentals and walks the reader through his re-design of a chart on commodities ranking, including a commentary on an aspect of slopegraphs that I’ve been noticing as I’ve been doing my exploring: the ‘scale’ problem (which I began to point out in the aforementioned “fudging” post).

The data set Jon is working with allows for a great exploration as to what works best when trying to convey a message with slopegraphs. I took the values from one of the tables he extracted:

and made a “raw” slopegraph from them (focusing on the “top 10”). The graphic won’t even come close to fitting in this post but you can grab the PDF of it and see how scale is the primary enemy of slopegraphs. It does show how gold and precious metal ores have skyrocketed from 1998 to 2007, but it’s hardly an engaging and easy to read visualization (unless you really like using your scroll wheel).

Jon grok’d this point, too, and decided to focus on the power law ranking and use the slopegraph to present the rate of change of each commodity:

While he didn’t “pull a Tufte” and just include values without caveat (see left & right 90° side labels), I still believe that there needs to be either increased annotation or the inclusion of base tabular data. Using my PySlopegraph code (forgot to mention the name change), I worked up a version of Jon’s visualization that I believe provides a clean, honest view of the data (click for larger view):

Because the chart is still based on the percentages that are fairly precise:

  1. "Coconuts, Brazil nuts, cashews",17.93,0.93
    
  2. Coffee,12.93,3.91
    
  3. Fish,7.89,5.04
    
  4. Tobacco,7.25,3.19
    
  5. Gold,6.62,18.63
    
  6. Tea,4.14,1.32
    
  7. Cotton,4.01,1.36
    
  8. Cloves,3.58,0.29
    
  9. Diamonds,3.44,0.58
    
  10. Mounted stones,2.44,1.5
    
  11. Vegetables,1.61,1.73
    
  12. Wheat,0.54,1.38
    
  13. "Precious metal ores",0,6.76

I finally added an option to the PySlopegraph configuration file for rounding (NOTE: rounding != true binning). If you add the “round_precision” option with a value that supports Python’s round function’s little-known second parameter (arbitrary positional rounding), you can have the values round to decimal or tens/hundreds/etc places which will help with scaling issues, but will also group items (in ways that you may not have originally intended). For this chart, if we use a value of “1” (first decimal rounding precision…use negative values for rounding on the whole integer side of the decimal) it’s still unreadable due to the scale it imposes by that precision, so I ended up using the nearest whole integer rounding option (value of “0”) and also included the table of actual values, along with annotating the “rate of change” nature of the slopes.

This (again) defeats the “no wasted ink (pixels?)” component of Tufte’s original creation, but I believe it’s necessary for some types of slopegraphs to ensure the chart can stand on it’s own. I’m definitely becoming more convinced that many slopegraphs are more suited for an interactive visualization where you can encode more information in rollovers/popups/etc plus allow for switching of view from percentage, power-law ranking or raw numeric comparison.

For those interested in playing with this particular data set, it’ll be included in the next github code push, which will also include the rounding feature.

Cover image from Data-Driven Security
Amazon Author Page

1 Comment Slopegraphs in Python – Exploring Binning/Rounding

  1. Pingback: Just When You Thought It Was Safe To Make A Slopegraph | rud.is

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.