Slopegraphs in Python

(NOTE: The best place to keep up with progress is GitHub, but you can always search for “slopegraph” here or just check the “slopegraph” tag page regularly.)

I’ve been a bit obsessed with slopegraphs (a.k.a. “Tufte table-charts”) of late and very dissatisfied with the lack of tools for making this particular visualization more prevalent. While my ultimate goal is a user-friendly modern web app or platform app that makes it as easy as dragging & dropping a CSV file, this first foray requires a bit (not much, really!) of elbow grease.

For those who want to get right to the code, head on over to GitHub and have a look (I’ll post all updates there). Setup, sample & source are also below.

First, you’ll need a modern Python install. I did all the development on OS X Mountain Lion (beta) with the stock Python 2.7 build. You’ll also need the Cairo 2D graphics library plus its pycairo Python bindings (they provide the cairo module the script imports). Cairo built and installed perfectly from source, even on Mountain Lion, so it should work fine for you. If you want something besides PDF output, you may need additional libraries, but PDF is decent for hi-res embedding, for converting to JPG/PNG (see below) and for tweaking in programs like Illustrator.
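Before running the script, a quick way to confirm both pieces are in place is a check like the following. It is a minimal sketch (written for Python 3, though it should also run on 2.7): check_setup is a hypothetical helper, and the only pycairo call it relies on is cairo_version_string(), which reports the Cairo library version the bindings are using.

```python
import sys

def check_setup():
    """Return (python_version, cairo_version); cairo_version is None
    when the pycairo bindings (the `cairo` module) can't be imported."""
    python_version = "%d.%d" % sys.version_info[:2]
    try:
        import cairo  # provided by pycairo
    except ImportError:
        return python_version, None
    return python_version, cairo.cairo_version_string()

if __name__ == "__main__":
    python_version, cairo_version = check_setup()
    print("Python " + python_version)
    print("Cairo " + (cairo_version or "not found (install Cairo + pycairo)"))
```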

If you search for “Gender Comparisons” in the comments on this post at Tufte’s blog, you’ll see what I was trying to reproduce in this bit of skeleton code (below). By changing the CSV file the script reads (line 21, the open('television.csv', 'rb') call) and which fields are relevant (lines 45-47, the lab, beg and end assignments), you should be able to make your own basic slopegraphs without much trouble.
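The script picks its columns by position (row[0], row[5], row[4]). If your CSV has a header row, a sketch like the one below lets you choose the label, start and end fields by name instead. It is written for Python 3; read_slope_rows is a hypothetical helper, and the column names (country, male, female) and two-row sample are made up purely to show the call.

```python
import csv
import io

def read_slope_rows(fileobj, label_field, start_field, end_field):
    """Return a list of (label, start, end) tuples from a CSV with a header row.

    Values are converted to float so they sort and scale numerically.
    """
    rows = []
    for record in csv.DictReader(fileobj):
        rows.append((record[label_field],
                     float(record[start_field]),
                     float(record[end_field])))
    return rows

# tiny made-up sample with hypothetical column names
sample = io.StringIO("country,male,female\nA,70,75\nB,68,74\n")
print(read_slope_rows(sample, "country", "male", "female"))
# → [('A', 70.0, 75.0), ('B', 68.0, 74.0)]
```

For a real file on Python 3 you would pass something like open('your.csv', newline='') and use that file's actual header names.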

If you catch any glitches, add some tweak or have a slopegraph “wish list”, let me know here, on Twitter (@hrbrmstr) or over at GitHub.

# slopegraph.py
#
# Author: Bob Rudis (@hrbrmstr)
#
# Basic Python skeleton to do simple two value slopegraphs
# with output to PDF (most useful form for me...Cairo has tons of options)
#
# Find out more about & download Cairo here:
# http://cairographics.org/
#
# 2012-05-28 - 0.5 - Initial github release. Still needs some polish
#
 
import csv
import cairo
 
# original data source: http://www.calvin.edu/~stob/data/television.csv
 
# get a CSV file to work with 
 
slopeReader = csv.reader(open('television.csv', 'rb'), delimiter=',', quotechar='"')
 
starts = {} # starting "points"
ends = {} # ending "points"
 
# Need to refactor label max width into font calculations
# as there's no guarantee the longest (character-wise)
# label is the widest one
 
startLabelMaxLen = 0
endLabelMaxLen = 0
 
# build a base pair array for the final plotting
# wastes memory, but simplifies plotting
 
pairs = []
 
for row in slopeReader:
 
	# add chosen values (need start/end for each CSV row)
	# to the final plotting array. Try this sample with 
	# row[1] (average life span) instead of row[5] to see some
	# of the scaling in action
 
	lab = row[0] # label
	beg = row[5] # male life span
	end = row[4] # female life span
 
	pairs.append( (float(beg), float(end)) )
 
	# combine labels of common values into one string
	# also (as noted previously, inappropriately) find the
	# longest one
 
	if beg in starts:
		starts[beg] = starts[beg] + "; " + lab
	else:
		starts[beg] = lab
 
	if ((len(starts[beg]) + len(beg)) > startLabelMaxLen):
		startLabelMaxLen = len(starts[beg]) + len(beg)
		s1 = starts[beg]
 
 
	if end in ends:
		ends[end] = ends[end] + "; " + lab
	else:
		ends[end] = lab
 
	if ((len(ends[end]) + len(end)) > endLabelMaxLen):
		endLabelMaxLen = len(ends[end]) + len(end)
		e1 = ends[end]
 
# sort all the values numerically (in the event the CSV wasn't) so
# we can determine the smallest increment we need to use
# when stacking the labels and plotting points. The keys are
# strings, so sort with key=float or "100" would land before "99"
 
startKeys = sorted(starts.keys(), key=float)
endKeys = sorted(ends.keys(), key=float)
 
# we also need to find the absolute min & max values
# so we know how to scale the plots
 
lowest = min(float(startKeys[0]), float(endKeys[0]))
highest = max(float(startKeys[-1]), float(endKeys[-1]))
 
# start from the full value range (or 1 if every value is the same)
# and shrink delta to the smallest gap between adjacent values
# on either side of the graph
 
delta = (highest - lowest) or 1.0
for keys in (startKeys, endKeys):
	for i in range(len(keys) - 1):
		currDelta = float(keys[i+1]) - float(keys[i])
		if (0 < currDelta < delta):
			delta = currDelta
 
# just making sure everything's a number
# probably should move some of this to the csv reader section
 
delta = float(delta)
lowest = float(lowest)
highest = float(highest)
startLabelMaxLen = float(startLabelMaxLen)
endLabelMaxLen = float(endLabelMaxLen)
 
# setup line width and font-size for the Cairo
# you can change these and the constants should
# scale the plots accordingly
 
FONT_SIZE = 9
LINE_WIDTH = 0.5
 
# there has to be a better way to get a base "surface"
# to do font calculations besides this. we're just making
# this Cairo surface so we know the max pixel width 
# (font extents) of the labels in order to scale the graph
# accurately (since width/height are based, in part, on it)
 
filename = 'slopegraph.pdf'
surface = cairo.PDFSurface (filename, 8.5*72, 11*72)
cr = cairo.Context (surface)
cr.save()
cr.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
cr.set_font_size(FONT_SIZE)
cr.set_line_width(LINE_WIDTH)
xbearing, ybearing, sWidth, sHeight, xadvance, yadvance = (cr.text_extents(s1))
xbearing, ybearing, eWidth, eHeight, xadvance, yadvance = (cr.text_extents(e1))
xbearing, ybearing, spaceWidth, spaceHeight, xadvance, yadvance = (cr.text_extents(" "))
cr.restore()
cr.show_page()
surface.finish()
 
# setup some more constants for plotting
# all of these are malleable and should cascade nicely
 
X_MARGIN = 10
Y_MARGIN = 10
SLOPEGRAPH_CANVAS_SIZE = 200
spaceWidth = 5 # overrides the measured width of " " from above
LINE_HEIGHT = 15
PLOT_LINE_WIDTH = 0.5
 
width = (X_MARGIN * 2) + sWidth + spaceWidth + SLOPEGRAPH_CANVAS_SIZE + spaceWidth + eWidth
height = (Y_MARGIN * 2) + (((highest - lowest + 1) / delta) * LINE_HEIGHT)
 
# create the real Cairo surface/canvas
 
filename = 'slopegraph.pdf'
surface = cairo.PDFSurface (filename, width, height)
cr = cairo.Context (surface)
 
cr.save()
 
cr.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
cr.set_font_size(FONT_SIZE)
 
cr.set_line_width(LINE_WIDTH)
cr.set_source_rgba (0, 0, 0) # need to make this a constant
 
# draw start labels at the correct positions
# cheating a bit here as the code doesn't (yet) line up 
# the actual data values
 
for k in sorted(startKeys):
 
	label = starts[k]
	xbearing, ybearing, lWidth, lHeight, xadvance, yadvance = (cr.text_extents(label))
 
	val = float(k)
 
	cr.move_to(X_MARGIN + (sWidth - lWidth), Y_MARGIN + (highest - val) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
	cr.show_text(label + " " + k)
	cr.stroke()
 
# draw end labels at the correct positions
# cheating a bit here as the code doesn't (yet) line up 
# the actual data values
 
for k in sorted(endKeys):
 
	label = ends[k]
	xbearing, ybearing, lWidth, lHeight, xadvance, yadvance = (cr.text_extents(label))
 
	val = float(k)
 
	cr.move_to(width - X_MARGIN - eWidth - (4*spaceWidth), Y_MARGIN + (highest - val) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
	cr.show_text(k + " " + label)
	cr.stroke()
 
# do the actual plotting
 
cr.set_line_width(PLOT_LINE_WIDTH)
cr.set_source_rgba (0.75, 0.75, 0.75) # need to make this a constant
 
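# one line per CSV row, from its start value to its end value
# (loop names kept distinct from the s1/e1 labels measured above)
for startVal, endVal in pairs:
	cr.move_to(X_MARGIN + sWidth + spaceWidth + 20, Y_MARGIN + (highest - startVal) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
	cr.line_to(width - X_MARGIN - eWidth - spaceWidth - 20, Y_MARGIN + (highest - endVal) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
	cr.stroke()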
 
cr.restore()
cr.show_page()
surface.finish()
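The vertical layout in the script boils down to three steps: merge labels that share a value, find the smallest gap between adjacent values (delta), and map each value to a y-coordinate so that one delta corresponds to one LINE_HEIGHT. The sketch below isolates that arithmetic with no Cairo dependency so you can experiment with it. The function names are hypothetical, it works on floats rather than the script's string keys, and falling back to a gap of 1.0 when there is nothing to compare is an assumption of this sketch.

```python
def merge_labels(pairs):
    """Map each value to its label(s), joining labels that share a value with '; '."""
    merged = {}
    for label, value in pairs:
        if value in merged:
            merged[value] = merged[value] + "; " + label
        else:
            merged[value] = label
    return merged

def smallest_gap(*value_lists):
    """Smallest positive gap between adjacent sorted values within each list.

    Falls back to 1.0 when no list has two distinct values (an assumption).
    """
    gaps = []
    for values in value_lists:
        ordered = sorted(values)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]) if b > a)
    return min(gaps) if gaps else 1.0

def y_position(value, highest, delta, y_margin=10, line_height=15):
    """Mirror of the script's y formula: higher values sit nearer the top,
    and values one `delta` apart end up one `line_height` apart."""
    return y_margin + (highest - value) * line_height * (1.0 / delta) + line_height / 2.0

starts = merge_labels([("A", 70.0), ("B", 68.0), ("C", 70.0)])
ends = merge_labels([("A", 75.0), ("B", 74.0), ("C", 75.0)])
delta = smallest_gap(starts, ends)  # start gap 2.0, end gap 1.0, so 1.0
highest = max(max(starts), max(ends))
for value in sorted(starts, reverse=True):
    print(starts[value], value, y_position(value, highest, delta))
# A; C 70.0 92.5
# B 68.0 122.5
```

The half-line offset here uses true division; under Python 2.7 the script's LINE_HEIGHT/2 is integer division (7 rather than 7.5), so its positions sit half a point higher.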