(NOTE: The best way to keep up with progress is at github, but you can always search for “slopegraph” here or just check the “slopegraph” tag page regularly.)
I’ve been a bit obsessed with slopegraphs (a.k.a. “Tufte table-chart”) of late and very dissatisfied with the lack of tools that would make this particular visualization more prevalent. While my ultimate goal is a user-friendly modern web app or platform app that makes this as easy as a “drag & drop” of a CSV file, this first foray requires a bit (not much, really!) of elbow grease to use.
For those who want to get right to the code, head on over to github and have a look (I’ll post all updates there). Setup, sample & source are also below.
First, you’ll need a modern Python install. I did all the development on Mac OS X Mountain Lion (beta) with the stock Python 2.7 build. You’ll also need the Cairo 2D graphics library, which built and installed perfectly from source, even on Mountain Lion, so it should work fine for you, plus its Python bindings (pycairo), which provide the cairo module the script imports. If you want output besides PDF, you may need additional libraries, but PDF is decent for hi-res embedding, converting to jpg/png (see below) and tweaking in programs like Illustrator.
If you search for “Gender Comparisons” in the comments on this post at Tufte’s blog, you’ll see what I was trying to reproduce in this bit of skeleton code (below). By changing the CSV file name passed to open() and the row[...] columns assigned to lab, beg and end (the label, starting value and ending value), you should be able to make your own basic slopegraphs without much trouble.
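If you’d rather not hunt for those indexes inside the script, the reading step can be pulled out into a small function. This is a minimal sketch, not the code on github. read_slope_pairs and its label_col/start_col/end_col parameters are hypothetical names. The defaults mirror the television.csv example (row[0] label, row[5] male and row[4] female life span). The sketch assumes there is no header row and that the chosen columns hold numbers.

```python
import csv


def read_slope_pairs(lines, label_col=0, start_col=5, end_col=4):
    """Read label/start/end columns from CSV lines (hypothetical helper).

    `lines` is any iterable of CSV lines, e.g. an open file.
    Returns the per-row (start, end) pairs plus two dicts mapping each
    distinct value to a "; "-joined string of the labels sharing it.
    """
    pairs, starts, ends = [], {}, {}
    for row in csv.reader(lines, delimiter=',', quotechar='"'):
        if not row:
            continue  # skip blank lines
        label = row[label_col]
        beg, end = float(row[start_col]), float(row[end_col])
        pairs.append((beg, end))
        # rows that share a value get one combined label on that side
        starts[beg] = starts[beg] + "; " + label if beg in starts else label
        ends[end] = ends[end] + "; " + label if end in ends else label
    return pairs, starts, ends
```

Feeding it the made-up rows A,70,75 / B,72,75 / C,70,78 with columns 0, 1 and 2 gives starts of {70.0: 'A; C', 72.0: 'B'}. Rows that share a value end up on a single label, which is what the script's starts/ends dicts hold before drawing.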
If you catch any glitches, add a tweak or have a slopegraph “wish list”, let me know here, on Twitter (@hrbrmstr) or over at github.
```python
# slopegraph.py
#
# Author: Bob Rudis (@hrbrmstr)
#
# Basic Python skeleton to do simple two value slopegraphs
# with output to PDF (most useful form for me...Cairo has tons of options)
#
# Find out more about & download Cairo here:
# http://cairographics.org/
#
# 2012-05-28 - 0.5 - Initial github release. Still needs some polish
#

import csv
import cairo

# original data source: http://www.calvin.edu/~stob/data/television.csv

starts = {}     # start value -> label(s) with that value
ends = {}       # end value -> label(s) with that value
startText = {}  # start value -> value as written in the CSV (for display)
endText = {}    # end value -> value as written in the CSV (for display)

# build a base pair array for the final plotting
# wastes memory, but simplifies plotting
pairs = []

# get a CSV file to work with
with open('television.csv') as csvFile:
    for row in csv.reader(csvFile, delimiter=',', quotechar='"'):
        # add chosen values (need start/end for each CSV row)
        # to the final plotting array. Try this sample with
        # row[1] (average life span) instead of row[5] to see some
        # of the scaling in action
        lab = row[0]  # label
        beg = row[5]  # male life span
        end = row[4]  # female life span

        # key everything on numbers so sorting, min/max and the
        # gap calculation below don't compare strings
        b, e = float(beg), float(end)
        pairs.append((b, e))
        startText.setdefault(b, beg)
        endText.setdefault(e, end)

        # combine labels of common values into one string
        starts[b] = starts[b] + "; " + lab if b in starts else lab
        ends[e] = ends[e] + "; " + lab if e in ends else lab

def smallestGap(values):
    # smallest difference between adjacent sorted values
    v = sorted(values)
    gaps = [hi - lo for lo, hi in zip(v, v[1:])]
    return min(gaps) if gaps else None

# the smallest increment we need to use when stacking the
# labels and plotting points (1.0 if every value is the same)
gaps = [g for g in (smallestGap(starts), smallestGap(ends)) if g is not None]
delta = min(gaps) if gaps else 1.0

# absolute min & max values so we know how to scale the plots
lowest = min(min(starts), min(ends))
highest = max(max(starts), max(ends))

# font size, margins, line width and colors; you can change
# these and the plot should scale accordingly
FONT_SIZE = 9
X_MARGIN = 10
Y_MARGIN = 10
SLOPEGRAPH_CANVAS_SIZE = 200
SPACE_WIDTH = 5
LINE_HEIGHT = 15
PLOT_LINE_WIDTH = 0.5
LABEL_COLOR = (0, 0, 0)
LINE_COLOR = (0.75, 0.75, 0.75)

def startLabel(k):
    return starts[k] + " " + startText[k]  # "label value"

def endLabel(k):
    return endText[k] + " " + ends[k]  # "value label"

def setupFont(ctx):
    ctx.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
    ctx.set_font_size(FONT_SIZE)

def textWidth(ctx, text):
    # x_advance (index 4 of text_extents) is how far the pen moves,
    # which is what alignment needs; index 2 is only the inked width
    return ctx.text_extents(text)[4]

# measure every label (the longest one character-wise isn't
# necessarily the widest) on a throwaway 1x1 image surface,
# since the page size depends on the widest label on each side
scratchSurface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 1, 1)
scratch = cairo.Context(scratchSurface)
setupFont(scratch)
sWidth = max(textWidth(scratch, startLabel(k)) for k in starts)
eWidth = max(textWidth(scratch, endLabel(k)) for k in ends)

width = (X_MARGIN * 2) + sWidth + SPACE_WIDTH + SLOPEGRAPH_CANVAS_SIZE + SPACE_WIDTH + eWidth
# one LINE_HEIGHT row per delta step, counting both ends
height = (Y_MARGIN * 2) + ((highest - lowest) / delta + 1) * LINE_HEIGHT

lineStartX = X_MARGIN + sWidth + SPACE_WIDTH
lineEndX = lineStartX + SLOPEGRAPH_CANVAS_SIZE

def yPos(val):
    # highest value on the top row; baseline in the middle of its row
    return Y_MARGIN + (highest - val) / delta * LINE_HEIGHT + LINE_HEIGHT / 2.0

# create the real Cairo surface/canvas
surface = cairo.PDFSurface('slopegraph.pdf', width, height)
cr = cairo.Context(surface)
setupFont(cr)
cr.set_source_rgb(*LABEL_COLOR)

# start labels, right-aligned against the plot area
for k in sorted(starts):
    text = startLabel(k)
    cr.move_to(lineStartX - SPACE_WIDTH - textWidth(cr, text), yPos(k))
    cr.show_text(text)

# end labels, left-aligned after the plot area
for k in sorted(ends):
    cr.move_to(lineEndX + SPACE_WIDTH, yPos(k))
    cr.show_text(endLabel(k))

# do the actual plotting
cr.set_line_width(PLOT_LINE_WIDTH)
cr.set_source_rgb(*LINE_COLOR)
for b, e in pairs:
    cr.move_to(lineStartX, yPos(b))
    cr.line_to(lineEndX, yPos(e))
    cr.stroke()

cr.show_page()
surface.finish()
```
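The script’s vertical layout gives one LINE_HEIGHT row to every delta, where delta is the smallest gap between adjacent values on either side. The highest value sits at the top. The sketch below pulls that mapping out so it can be checked without Cairo. smallest_gap, row_delta, y_position and canvas_height are hypothetical names, and the defaults mirror the script’s Y_MARGIN of 10 and LINE_HEIGHT of 15.

```python
def smallest_gap(values):
    """Smallest difference between adjacent distinct values, or None."""
    v = sorted(set(values))
    gaps = [hi - lo for lo, hi in zip(v, v[1:])]
    return min(gaps) if gaps else None


def row_delta(start_values, end_values):
    """Value step that gets one row: the smallest gap on either side."""
    gaps = [g for g in (smallest_gap(start_values), smallest_gap(end_values))
            if g is not None]
    return min(gaps) if gaps else 1.0  # all values equal: any step works


def y_position(val, highest, delta, y_margin=10, line_height=15):
    """Text baseline (points from the top) for a value; highest is on top."""
    return y_margin + (highest - val) / delta * line_height + line_height / 2.0


def canvas_height(lowest, highest, delta, y_margin=10, line_height=15):
    """Page height: one row per delta step from highest down to lowest."""
    return 2 * y_margin + ((highest - lowest) / delta + 1) * line_height
```

Take made-up values of 70 and 72 on the left and 75 and 78 on the right. The gaps are 2 and 3, so delta is 2.0. That puts 78 at y = 17.5 and 70 at y = 77.5, and makes the page 95 points tall. The row count here is (highest - lowest) / delta + 1, which means one row per delta step with both ends included.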
Using pandas and matplotlib, you can dramatically shrink your code:
https://gist.github.com/pascal-schetelat/7726054