Slopegraphs in Python – More Output Tweaks

The best way to explain this release is to walk you through an updated configuration file:

  1. {
  2.  
  3. "label_font_family" : "Palatino",
  4. "label_font_size" : "9",
  5.  
  6. "header_font_family" : "Palatino",
  7. "header_font_size" : "10",
  8.  
  9. "x_margin" : "20",
  10. "y_margin" : "30",
  11.  
  12. "line_width" : "0.5",
  13.  
  14. "slope_length" : "150",
  15.  
  16. "labels" : [ "1970", "1979" ],
  17.  
  18. "header_color" : "000000",
  19. "background_color" : "FFFFFF",
  20. "label_color" : "111111",
  21. "value_color" : "999999",
  22. "slope_color" : "AAAAAA",
  23.  
  24. "value_format_string" : "%2d",
  25.  
  26. "input" : "receipts.csv",
  27. "output" : "receipts",
  28. "format" : "svg",
  29.  
  30. "description" : "Current Receipts of Government as a Percentage of Gross Domestic Product, 1970 & 1979",
  31. "source" : "Tufte, Edward. The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press; 1983; p. 158"
  32.  
  33. }

I added the ability to include column headers, and I split the font specifications so the column data/labels and the headers are set separately (Lines 2-7). Headers are optional. If you don’t want them, leave out the header font specification and the “labels” option (Line 16). The code decides whether to draw headers only by checking for “header_font_family”, so without it “labels” and “header_color” are never read. You can color the headers via the “header_color” option (Line 18).
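To make the “keys off the font spec” behavior concrete, here is a minimal stand-alone sketch of the decision made in the posted code’s `__init__` and `calculateExtents`. Headers are enabled only when “header_font_family” is present. The extra vertical space reserved for them is the header font size plus two label line heights, where a line height is 1.5× the label font size. The function name `header_space` is hypothetical. It is not part of the posted code.

```python
def header_space(config):
    """Return the extra vertical space reserved for column headers.

    Hypothetical helper mirroring the slopegraph code: headers are drawn
    only when "header_font_family" appears in the config.
    """
    if "header_font_family" not in config:
        return 0.0  # no headers: "labels" and "header_color" are never read
    label_size = float(config["label_font_size"])
    line_height = label_size + label_size / 2.0   # LINE_HEIGHT in the class
    header_size = float(config["header_font_size"])
    return header_size + 2 * line_height          # HEADER_SPACE in the class


with_headers = {"label_font_size": "9", "header_font_family": "Palatino",
                "header_font_size": "10", "labels": ["1970", "1979"]}
without_headers = {"label_font_size": "9"}

print(header_space(with_headers))     # 37.0
print(header_space(without_headers))  # 0.0
```

With the example config’s 9-point labels and 10-point headers, the page grows by 10 + 2 × 13.5 = 37 units.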

If you set the “background_color” option (Line 19) to the keyword “transparent”, the background fill is skipped entirely (this example uses a white fill instead). That is useful for blog posts or for embedding the graph in other documents. It works best with PNG and PDF output.
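The color options are six-digit “RRGGBB” hex strings. The code splits each one into two-character pairs and scales each pair to cairo’s 0.0 to 1.0 range. Below is a small sketch of that conversion with the “transparent” check folded in. `parse_color` is a hypothetical helper, not a function in the posted code.

```python
def parse_color(value):
    """Convert an "RRGGBB" hex string to cairo-style (r, g, b) floats in [0, 1].

    Returns None for the keyword "transparent", meaning "skip the fill".
    """
    if value == "transparent":
        return None
    if len(value) != 6:
        raise ValueError("expected an RRGGBB hex color, got %r" % value)
    # "AABBCC" -> "AA", "BB", "CC" -> 0..255 -> 0.0..1.0
    return tuple(int(value[i:i + 2], 16) / 255.0 for i in (0, 2, 4))


print(parse_color("FFFFFF"))       # (1.0, 1.0, 1.0)
print(parse_color("transparent"))  # None
```

When a helper like this returns `None`, the drawing code would simply skip its `cr.rectangle(...)` / `cr.fill()` background step. Everything else is drawn as usual.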

To change the horizontal space allotted to the slope lines, set the “slope_length” option (Line 14). If you omit it, the code falls back to 300. This sets the stage for multi-column slopegraphs.
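The overall canvas width is a simple sum, so it is easy to see where “slope_length” fits in. The sketch below follows the width formula in the posted `calculateExtents`: margins, four half-font-size gaps, the measured label and value columns, and the slope area. In the real code the text widths come from cairo’s `text_extents`. Here they are passed in as plain numbers, and the values in the example are made up. The function name `canvas_width` is hypothetical. Units are cairo user units, which means points for PDF/PS/SVG and pixels for PNG.

```python
def canvas_width(x_margin, label_font_size, start_label_w, start_value_w,
                 end_value_w, end_label_w, slope_length=300.0):
    """Total drawing width, following the layout used by the slopegraph code.

    Left to right:
    margin | start labels | gap | start values | gap | slopes |
    gap | end values | gap | end labels | margin
    """
    gap = label_font_size / 2.0  # SPACE_WIDTH in the class
    return (2 * x_margin
            + start_label_w + start_value_w
            + end_value_w + end_label_w
            + 4 * gap
            + slope_length)


# illustrative text widths; the real ones come from cairo's text_extents()
print(canvas_width(20, 9, 60, 10, 10, 60, slope_length=150))  # 348.0
print(canvas_width(20, 9, 60, 10, 10, 60))                    # 498.0 (default 300)
```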

While trading messages with @jayjacobs about slopegraphs, and seeing his spiffy use of them for incident data analysis, I realized I needed a way to format the column data values. So there’s now a “value_format_string” option that takes Python %-style (sprintf-like) format strings such as “%2d”.
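Here is a quick sketch of what the format string does to the values. The same string is applied both when the code measures the value columns and when it draws them, so the widest formatted value sets the column width. `format_column` is hypothetical. It uses character count as a rough stand-in for the rendered width that the posted code gets from cairo. The sample values are illustrative, not from the receipts data.

```python
def format_column(values, fmt):
    """Format each value with a %-style string and return (strings, widest).

    "Widest" is judged by character count here; the slopegraph code
    measures the rendered text with cairo instead.
    """
    formatted = [fmt % v for v in values]
    widest = max(formatted, key=len)
    return formatted, widest


values = [42.0, 38.5, 9.3]              # illustrative sample values
print(format_column(values, "%2d"))     # (['42', '38', ' 9'], '42')
print(format_column(values, "%.1f%%"))  # (['42.0%', '38.5%', '9.3%'], '42.0%')
```

Note that `%d` truncates floats (38.5 becomes 38). If the fractional part matters, use something like `%.1f`.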

Finally, I added “description” and “source” options. The code does not process them yet, but they let you document the configuration a bit, since JSON has no syntax for comments.
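Because JSON has no comments, “documentation” keys are just ordinary keys that the program ignores. Below is a minimal sketch of a loader that parses a config and sets those keys aside. `load_config` and `DOC_KEYS` are hypothetical names, and the posted code simply never reads these keys.

```python
import json

# Keys that only document the config; the slopegraph code never reads them.
DOC_KEYS = ("description", "source")


def load_config(text):
    """Parse a JSON config; return (options, docs) with doc-only keys split out."""
    config = json.loads(text)
    docs = {k: config.pop(k) for k in DOC_KEYS if k in config}
    return config, docs


sample = '''{
  "input" : "receipts.csv",
  "format" : "svg",
  "description" : "Example slopegraph"
}'''

options, docs = load_config(sample)
print(sorted(options))  # ['format', 'input']
print(docs)             # {'description': 'Example slopegraph'}
```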

As always, the code’s up on GitHub and also below:

    import argparse
    import csv
    import json

    import cairo

    def split(input, size):
        # break a string into fixed-size chunks, e.g. "AABBCC" -> ["AA", "BB", "CC"]
        return [input[start:start+size] for start in range(0, len(input), size)]

    class Slopegraph:

        def readCSV(self, filename):

            # each CSV row is: label, start value, end value
            # (newline='' is how the csv module expects files to be opened in Python 3)
            with open(filename, newline='') as csvFile:
                slopeReader = csv.reader(csvFile, delimiter=',', quotechar='"')

                for row in slopeReader:

                    lab = row[0]         # label
                    beg = float(row[1])  # left vals
                    end = float(row[2])  # right vals

                    # add chosen values to the final plotting array
                    self.pairs.append((beg, end))

                    # combine labels of common values into one string

                    if beg in self.starts:
                        self.starts[beg] = self.starts[beg] + "; " + lab
                    else:
                        self.starts[beg] = lab

                    if end in self.ends:
                        self.ends[end] = self.ends[end] + "; " + lab
                    else:
                        self.ends[end] = lab

        def sortKeys(self):

            # sort all the values (in the event the CSV wasn't) so
            # we can determine the smallest increment we need to use
            # when stacking the labels and plotting points

            self.startKeys = sorted(self.starts.keys())
            self.endKeys = sorted(self.ends.keys())

            # smallest gap between adjacent values in either column
            gaps = [b - a for a, b in zip(self.startKeys, self.startKeys[1:])]
            gaps += [b - a for a, b in zip(self.endKeys, self.endKeys[1:])]

            # fall back to 1.0 when each column holds only one distinct value
            self.delta = min(gaps) if gaps else 1.0

        def findExtremes(self):

            # we also need to find the absolute min & max values
            # so we know how to scale the plots

            self.lowest = float(min(min(self.startKeys), min(self.endKeys)))
            self.highest = float(max(max(self.startKeys), max(self.endKeys)))
            self.delta = float(self.delta)

        def calculateExtents(self, filename, format, valueFormatString):

            # measure text on a scratch page-sized surface of the same type as the
            # final output (any file it writes is overwritten by makeSlopegraph)

            if (format == "pdf"):
                surface = cairo.PDFSurface(filename, 8.5*72, 11*72)
            elif (format == "ps"):
                surface = cairo.PSSurface(filename, 8.5*72, 11*72)
                surface.set_eps(True)
            elif (format == "svg"):
                surface = cairo.SVGSurface(filename, 8.5*72, 11*72)
            elif (format == "png"):
                surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, int(8.5*72), int(11*72))
            else:
                surface = cairo.PDFSurface(filename, 8.5*72, 11*72)

            cr = cairo.Context(surface)
            cr.save()
            cr.select_font_face(self.LABEL_FONT_FAMILY, cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
            cr.set_font_size(self.LABEL_FONT_SIZE)
            cr.set_line_width(self.LINE_WIDTH)

            # find the *real* maximum label width (not just based on number of chars);
            # text_extents() returns (x_bearing, y_bearing, width, height, x_advance, y_advance)

            self.sWidth = 0               # widest start label
            self.startMaxLabelWidth = 0   # widest formatted start value
            for k in self.startKeys:
                self.sWidth = max(self.sWidth, cr.text_extents(self.starts[k])[2])
                self.startMaxLabelWidth = max(self.startMaxLabelWidth, cr.text_extents(valueFormatString % k)[2])

            self.eWidth = 0               # widest end label
            self.endMaxLabelWidth = 0     # widest formatted end value
            for k in self.endKeys:
                self.eWidth = max(self.eWidth, cr.text_extents(self.ends[k])[2])
                self.endMaxLabelWidth = max(self.endMaxLabelWidth, cr.text_extents(valueFormatString % k)[2])

            cr.restore()
            cr.show_page()
            surface.finish()

            # margin | start labels | start values | slopes | end values | end labels | margin,
            # with SPACE_WIDTH padding between the inner columns
            self.width = (self.X_MARGIN + self.sWidth + self.SPACE_WIDTH + self.startMaxLabelWidth
                          + self.SPACE_WIDTH + self.SLOPE_LENGTH + self.SPACE_WIDTH
                          + self.endMaxLabelWidth + self.SPACE_WIDTH + self.eWidth + self.X_MARGIN)
            self.height = (self.Y_MARGIN * 2) + (((self.highest - self.lowest) / self.delta) * self.LINE_HEIGHT)

            # reserve room for the column headers, if any
            self.HEADER_SPACE = 0.0
            if (self.HEADER_FONT_FAMILY is not None):
                self.HEADER_SPACE = self.HEADER_FONT_SIZE + 2*self.LINE_HEIGHT
                self.height += self.HEADER_SPACE

        def yPos(self, val):
            # baseline y for a value: higher values sit nearer the top of the page
            return self.Y_MARGIN + self.HEADER_SPACE + (self.highest - val) * self.LINE_HEIGHT / self.delta

        def makeSlopegraph(self, filename, config):

            # colors are "RRGGBB" hex strings; cairo wants 0.0-1.0 floats

            (lab_r,lab_g,lab_b) = split(config["label_color"],2)
            LAB_R = (int(lab_r, 16)/255.0)
            LAB_G = (int(lab_g, 16)/255.0)
            LAB_B = (int(lab_b, 16)/255.0)

            (val_r,val_g,val_b) = split(config["value_color"],2)
            VAL_R = (int(val_r, 16)/255.0)
            VAL_G = (int(val_g, 16)/255.0)
            VAL_B = (int(val_b, 16)/255.0)

            (line_r,line_g,line_b) = split(config["slope_color"],2)
            LINE_R = (int(line_r, 16)/255.0)
            LINE_G = (int(line_g, 16)/255.0)
            LINE_B = (int(line_b, 16)/255.0)

            if (config["background_color"] != "transparent"):
                (bg_r,bg_g,bg_b) = split(config["background_color"],2)
                BG_R = (int(bg_r, 16)/255.0)
                BG_G = (int(bg_g, 16)/255.0)
                BG_B = (int(bg_b, 16)/255.0)

            if (config['format'] == "pdf"):
                surface = cairo.PDFSurface(filename, self.width, self.height)
            elif (config['format'] == "ps"):
                surface = cairo.PSSurface(filename, self.width, self.height)
                surface.set_eps(True)
            elif (config['format'] == "svg"):
                surface = cairo.SVGSurface(filename, self.width, self.height)
            elif (config['format'] == "png"):
                surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, int(self.width), int(self.height))
            else:
                surface = cairo.PDFSurface(filename, self.width, self.height)

            cr = cairo.Context(surface)

            cr.save()

            cr.set_line_width(self.LINE_WIDTH)

            # "transparent" means: skip the background fill entirely
            if (config["background_color"] != "transparent"):
                cr.set_source_rgb(BG_R,BG_G,BG_B)
                cr.rectangle(0,0,self.width,self.height)
                cr.fill()

            # draw headers (if present)

            if (self.HEADER_FONT_FAMILY is not None):

                (header_r,header_g,header_b) = split(config["header_color"],2)
                HEADER_R = (int(header_r, 16)/255.0)
                HEADER_G = (int(header_g, 16)/255.0)
                HEADER_B = (int(header_b, 16)/255.0)

                cr.save()

                cr.select_font_face(self.HEADER_FONT_FAMILY, cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_BOLD)
                cr.set_font_size(self.HEADER_FONT_SIZE)
                cr.set_source_rgb(HEADER_R,HEADER_G,HEADER_B)

                # left header: right-aligned with the start labels
                hWidth = cr.text_extents(config["labels"][0])[2]
                cr.move_to(self.X_MARGIN + self.sWidth - hWidth, self.Y_MARGIN + self.HEADER_FONT_SIZE)
                cr.show_text(config["labels"][0])

                # right header: left-aligned with the end labels
                cr.move_to(self.width - self.X_MARGIN - self.SPACE_WIDTH - self.eWidth, self.Y_MARGIN + self.HEADER_FONT_SIZE)
                cr.show_text(config["labels"][1])

                cr.restore()

            # draw start labels and values at the correct positions

            cr.select_font_face(self.LABEL_FONT_FAMILY, cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
            cr.set_font_size(self.LABEL_FONT_SIZE)

            valueFormatString = config["value_format_string"]

            for k in self.startKeys:

                val = float(k)
                label = self.starts[k]
                lWidth = cr.text_extents(label)[2]
                kWidth = cr.text_extents(valueFormatString % val)[2]

                # label, right-aligned in its column
                cr.set_source_rgb(LAB_R,LAB_G,LAB_B)
                cr.move_to(self.X_MARGIN + (self.sWidth - lWidth), self.yPos(val))
                cr.show_text(label)

                # value, right-aligned in its column
                cr.set_source_rgb(VAL_R,VAL_G,VAL_B)
                cr.move_to(self.X_MARGIN + self.sWidth + self.SPACE_WIDTH + (self.startMaxLabelWidth - kWidth), self.yPos(val))
                cr.show_text(valueFormatString % val)

            # draw end values and labels at the correct positions

            for k in self.endKeys:

                val = float(k)
                label = self.ends[k]

                cr.set_source_rgb(VAL_R,VAL_G,VAL_B)
                cr.move_to(self.width - self.X_MARGIN - self.SPACE_WIDTH - self.eWidth - self.SPACE_WIDTH - self.endMaxLabelWidth, self.yPos(val))
                cr.show_text(valueFormatString % val)

                cr.set_source_rgb(LAB_R,LAB_G,LAB_B)
                cr.move_to(self.width - self.X_MARGIN - self.SPACE_WIDTH - self.eWidth, self.yPos(val))
                cr.show_text(label)

            # do the actual plotting

            cr.set_line_width(self.LINE_WIDTH)
            cr.set_source_rgb(LINE_R, LINE_G, LINE_B)

            # x positions where every slope line starts and ends
            xStart = self.X_MARGIN + self.sWidth + self.SPACE_WIDTH + self.startMaxLabelWidth + self.LINE_START_DELTA
            xEnd = self.width - self.X_MARGIN - self.eWidth - self.SPACE_WIDTH - self.endMaxLabelWidth - self.LINE_START_DELTA

            for s1,e1 in self.pairs:
                # shift up a quarter line so the line sits near the middle of the text, not its baseline
                cr.move_to(xStart, self.yPos(s1) - self.LINE_HEIGHT/4)
                cr.line_to(xEnd, self.yPos(e1) - self.LINE_HEIGHT/4)
                cr.stroke()

            cr.restore()
            cr.show_page()

            # image surfaces aren't tied to a file, so write the PNG explicitly
            if (config['format'] == "png"):
                surface.write_to_png(filename)

            surface.finish()

        def __init__(self, config):

            # per-instance storage; class-level dicts/lists would be shared by every Slopegraph
            self.starts = {}  # starting "points": value -> combined labels
            self.ends = {}    # ending "points": value -> combined labels
            self.pairs = []   # (start, end) pairs for the final plotting

            # since some methods need these, make them local to the class

            self.LABEL_FONT_FAMILY = config["label_font_family"]
            self.LABEL_FONT_SIZE = float(config["label_font_size"])

            # headers are drawn only if a header font is configured
            if "header_font_family" in config:
                self.HEADER_FONT_FAMILY = config["header_font_family"]
                self.HEADER_FONT_SIZE = float(config["header_font_size"])
            else:
                self.HEADER_FONT_FAMILY = None
                self.HEADER_FONT_SIZE = None

            self.X_MARGIN = float(config["x_margin"])
            self.Y_MARGIN = float(config["y_margin"])
            self.LINE_WIDTH = float(config["line_width"])

            if "slope_length" in config:
                self.SLOPE_LENGTH = float(config["slope_length"])
            else:
                self.SLOPE_LENGTH = 300

            self.SPACE_WIDTH = self.LABEL_FONT_SIZE / 2.0
            self.LINE_HEIGHT = self.LABEL_FONT_SIZE + (self.LABEL_FONT_SIZE / 2.0)
            self.LINE_START_DELTA = 1.5*self.SPACE_WIDTH

            OUTPUT_FILE = config["output"] + "." + config["format"]

            # process the values & make the slopegraph

            self.readCSV(config["input"])
            self.sortKeys()
            self.findExtremes()
            self.calculateExtents(OUTPUT_FILE, config["format"], config["value_format_string"])
            self.makeSlopegraph(OUTPUT_FILE, config)

    def main():

        parser = argparse.ArgumentParser(description="Creates a slopegraph from a CSV source")
        parser.add_argument("--config", required=True,
                            help="config file name to use for slopegraph creation")
        args = parser.parse_args()

        with open(args.config) as json_data:
            config = json.load(json_data)

        Slopegraph(config)

        return 0

    if __name__ == "__main__":
        main()
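The vertical layout in `sortKeys`, `findExtremes` and the drawing loops comes down to one idea. Find the smallest gap between adjacent distinct values in either column, and give that gap exactly one line height so stacked labels never collide. The sketch below is a stand-alone version of that mapping. `smallest_gap` and `y_positions` are hypothetical names. The 1.0 fallback for columns that each hold a single value is an assumption, and the sample values are made up.

```python
def smallest_gap(values):
    """Smallest difference between adjacent distinct sorted values, or None."""
    keys = sorted(set(values))
    gaps = [b - a for a, b in zip(keys, keys[1:])]
    return min(gaps) if gaps else None


def y_positions(starts, ends, y_margin, line_height, header_space=0.0):
    """Map start/end values to baseline y coordinates (y grows downward)."""
    gaps = [g for g in (smallest_gap(starts), smallest_gap(ends)) if g is not None]
    delta = min(gaps) if gaps else 1.0  # assumed fallback: one value per column
    highest = max(max(starts), max(ends))

    def y(v):
        # the highest value lands at the top margin; each delta step drops one line
        return y_margin + header_space + (highest - v) * line_height / delta

    return [y(v) for v in starts], [y(v) for v in ends]


print(y_positions([40, 30, 35], [45, 30, 38], y_margin=30, line_height=10))
# ([40.0, 60.0, 50.0], [30.0, 60.0, 44.0])
```

The trade-off of this design is that one pair of very close values stretches the whole graph vertically, since every other gap is scaled by the same smallest delta.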