Slopegraphs in Python

(NOTE: You can keep up with progress best at github, but can always search on “slopegraph” here or just hit the tag page: “slopegraph” regularly)

I’ve been a bit obsessed with slopegraphs (a.k.a “Tufte table-chart”) of late and very dissatisfied with the lack of tools to make this particular visualization tool more prevalent. While my ultimate goal is to have a user-friendly modern web app or platform app that’s as easy as a “drag & drop” of a CSV file, this first foray will require a bit (not much, really!) of elbow grease to be used.

For those who want to get right to the code, head on over to github and have a look (I’ll post all updates there). Setup, sample & source are also below.

First, you’ll need a modern Python install. I did all the development on Mac OS Mountain Lion (beta) with the stock Python 2.7 build. You’ll also need the Cairo 2D graphics library which built and installed perfectly from source, even on ML, so it should work fine for you. If you want something besides PDF rendering, you may need additional libraries, but PDF is decent for hi-res embedding, converting to jpg/png (see below) and tweaking in programs like Illustrator.

If you search for “Gender Comparisons” in the comments on this post at Tufte’s blog, you’ll see what I was trying to reproduce in this bit of skeleton code (below). By modifying the CSV file you’re using [line 21] and then which fields are relevant [lines 45-47] you should be able to make your own basic slopegraphs without much trouble.

If you catch any glitches, add some tweak or have a slopegraph “wish list”, let me know here, twitter (@hrbrmstr) or over at github.

  1. # slopegraph.py
  2. #
  3. # Author: Bob Rudis (@hrbrmstr)
  4. #
  5. # Basic Python skeleton to do simple two value slopegraphs
  6. # with output to PDF (most useful form for me...Cairo has tons of options)
  7. #
  8. # Find out more about & download Cairo here:
  9. # http://cairographics.org/
  10. #
  11. # 2012-05-28 - 0.5 - Initial github release. Still needs some polish
  12. #
  13.  
  14. import csv
  15. import cairo
  16.  
  17. # original data source: http://www.calvin.edu/~stob/data/television.csv
  18.  
  19. # get a CSV file to work with 
  20.  
  21. slopeReader = csv.reader(open('television.csv', 'rb'), delimiter=',', quotechar='"')
  22.  
  23. starts = {} # starting "points"/
  24. ends = {} # ending "points"
  25.  
  26. # Need to refactor label max width into font calculations
  27. # as there's no guarantee the longest (character-wise)
  28. # label is the widest one
  29.  
  30. startLabelMaxLen = 0
  31. endLabelMaxLen = 0
  32.  
  33. # build a base pair array for the final plotting
  34. # wastes memory, but simplifies plotting
  35.  
  36. pairs = []
  37.  
  38. for row in slopeReader:
  39.  
  40. 	# add chosen values (need start/end for each CSV row)
  41. 	# to the final plotting array. Try this sample with 
  42. 	# row[1] (average life span) instead of row[5] to see some
  43. 	# of the scaling in action
  44.  
  45. 	lab = row[0] # label
  46. 	beg = row[5] # male life span
  47. 	end = row[4] # female life span
  48.  
  49. 	pairs.append( (float(beg), float(end)) )
  50.  
  51. 	# combine labels of common values into one string
  52. 	# also (as noted previously, inappropriately) find the
  53. 	# longest one
  54.  
  55. 	if beg in starts:
  56. 		starts[beg] = starts[beg] + "; " + lab
  57. 	else:
  58. 		starts[beg] = lab
  59.  
  60. 	if ((len(starts[beg]) + len(beg)) > startLabelMaxLen):
  61. 		startLabelMaxLen = len(starts[beg]) + len(beg)
  62. 		s1 = starts[beg]
  63.  
  64.  
  65. 	if end in ends:
  66. 		ends[end] = ends[end] + "; " + lab
  67. 	else:
  68. 		ends[end] = lab
  69.  
  70. 	if ((len(ends[end]) + len(end)) > endLabelMaxLen):
  71. 		endLabelMaxLen = len(ends[end]) + len(end)
  72. 		e1 = ends[end]
  73.  
  74. # sort all the values (in the event the CSV wasn't) so
  75. # we can determine the smallest increment we need to use
  76. # when stacking the labels and plotting points
  77.  
  78. startSorted = [(k, starts[k]) for k in sorted(starts)]
  79. endSorted = [(k, ends[k]) for k in sorted(ends)]
  80.  
  81. startKeys = sorted(starts.keys())
  82. delta = max(startSorted)
  83. for i in range(len(startKeys)):
  84. 	if (i+1 <= len(startKeys)-1):
  85. 		currDelta = float(startKeys[i+1]) - float(startKeys[i])
  86. 		if (currDelta < delta):
  87. 			delta = currDelta
  88.  
  89. endKeys = sorted(ends.keys())
  90. for i in range(len(endKeys)):
  91. 	if (i+1 <= len(endKeys)-1):
  92. 		currDelta = float(endKeys[i+1]) - float(endKeys[i])
  93. 		if (currDelta < delta):
  94. 			delta = currDelta
  95.  
  96. # we also need to find the absolute min & max values
  97. # so we know how to scale the plots
  98.  
  99. lowest = min(startKeys)
  100. if (min(endKeys) < lowest) : lowest = min(endKeys)
  101.  
  102. highest = max(startKeys)
  103. if (max(endKeys) > highest) : highest = max(endKeys)
  104.  
  105. # just making sure everything's a number
  106. # probably should move some of this to the csv reader section
  107.  
  108. delta = float(delta)
  109. lowest = float(lowest)
  110. highest = float(highest)
  111. startLabelMaxLen = float(startLabelMaxLen)
  112. endLabelMaxLen = float(endLabelMaxLen)
  113.  
  114. # setup line width and font-size for the Cairo
  115. # you can change these and the constants should
  116. # scale the plots accordingly
  117.  
  118. FONT_SIZE = 9
  119. LINE_WIDTH = 0.5
  120.  
  121. # there has to be a better way to get a base "surface"
  122. # to do font calculations besides this. we're just making
  123. # this Cairo surface to we know the max pixel width 
  124. # (font extents) of the labels in order to scale the graph
  125. # accurately (since width/height are based, in part, on it)
  126.  
  127. filename = 'slopegraph.pdf'
  128. surface = cairo.PDFSurface (filename, 8.5*72, 11*72)
  129. cr = cairo.Context (surface)
  130. cr.save()
  131. cr.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
  132. cr.set_font_size(FONT_SIZE)
  133. cr.set_line_width(LINE_WIDTH)
  134. xbearing, ybearing, sWidth, sHeight, xadvance, yadvance = (cr.text_extents(s1))
  135. xbearing, ybearing, eWidth, eHeight, xadvance, yadvance = (cr.text_extents(e1))
  136. xbearing, ybearing, spaceWidth, spaceHeight, xadvance, yadvance = (cr.text_extents(" "))
  137. cr.restore()
  138. cr.show_page()
  139. surface.finish()
  140.  
  141. # setup some more constants for plotting
  142. # all of these are malleable and should cascade nicely
  143.  
  144. X_MARGIN = 10
  145. Y_MARGIN = 10
  146. SLOPEGRAPH_CANVAS_SIZE = 200
  147. spaceWidth = 5
  148. LINE_HEIGHT = 15
  149. PLOT_LINE_WIDTH = 0.5
  150.  
  151. width = (X_MARGIN * 2) + sWidth + spaceWidth + SLOPEGRAPH_CANVAS_SIZE + spaceWidth + eWidth
  152. height = (Y_MARGIN * 2) + (((highest - lowest + 1) / delta) * LINE_HEIGHT)
  153.  
  154. # create the real Cairo surface/canvas
  155.  
  156. filename = 'slopegraph.pdf'
  157. surface = cairo.PDFSurface (filename, width, height)
  158. cr = cairo.Context (surface)
  159.  
  160. cr.save()
  161.  
  162. cr.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
  163. cr.set_font_size(FONT_SIZE)
  164.  
  165. cr.set_line_width(LINE_WIDTH)
  166. cr.set_source_rgba (0, 0, 0) # need to make this a constant
  167.  
  168. # draw start labels at the correct positions
  169. # cheating a bit here as the code doesn't (yet) line up 
  170. # the actual data values
  171.  
  172. for k in sorted(startKeys):
  173.  
  174. 	label = starts[k]
  175. 	xbearing, ybearing, lWidth, lHeight, xadvance, yadvance = (cr.text_extents(label))
  176.  
  177. 	val = float(k)
  178.  
  179. 	cr.move_to(X_MARGIN + (sWidth - lWidth), Y_MARGIN + (highest - val) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  180. 	cr.show_text(label + " " + k)
  181. 	cr.stroke()
  182.  
  183. # draw end labels at the correct positions
  184. # cheating a bit here as the code doesn't (yet) line up 
  185. # the actual data values
  186.  
  187. for k in sorted(endKeys):
  188.  
  189. 	label = ends[k]
  190. 	xbearing, ybearing, lWidth, lHeight, xadvance, yadvance = (cr.text_extents(label))
  191.  
  192. 	val = float(k)
  193.  
  194. 	cr.move_to(width - X_MARGIN - eWidth - (4*spaceWidth), Y_MARGIN + (highest - val) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  195. 	cr.show_text(k + " " + label)
  196. 	cr.stroke()
  197.  
  198. # do the actual plotting
  199.  
  200. cr.set_line_width(PLOT_LINE_WIDTH)
  201. cr.set_source_rgba (0.75, 0.75, 0.75) # need to make this a constant
  202.  
  203. for s1,e1 in pairs:
  204. 	cr.move_to(X_MARGIN + sWidth + spaceWidth + 20, Y_MARGIN + (highest - s1) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  205. 	cr.line_to(width - X_MARGIN - eWidth - spaceWidth - 20, Y_MARGIN + (highest - e1) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  206. 	cr.stroke()
  207.  
  208. cr.restore()
  209. cr.show_page()
  210. surface.finish()
Cover image from Data-Driven Security
Amazon Author Page

2 Comments Slopegraphs in Python

  1. Pingback: Slopegraph Workbench/Workshop in D3 — rud.is

Leave a Reply to Pascal Schetelat Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.