Category Archives: Open Source

While you can (and should) view [all the presentations](https://speakerdeck.com/pyconslides) from #PyCon2013, here are my picks for the ones that interested me the most, as they focus on scaling, mapping, automation (both web & electronics) and data analysis:

– [Chef: Why you should automate your web infrastructure](https://speakerdeck.com/pyconslides/chef-why-you-should-automate-your-web-infrastructure-by-kate-heddleston) by Kate Heddleston
– [Messaging at Scale at Instagram](https://speakerdeck.com/pyconslides/messaging-at-scale-at-instagram-by-rick-branson) by Rick Branson
– [Python at Netflix](https://speakerdeck.com/pyconslides/python-at-netflix-by-jeremy-edberg-corey-bertram-and-roy-rapoport) by Jeremy Edberg, Corey Bertram, and Roy Rapoport
– [Real-time Tracking and Mapping of Geographic Objects](https://speakerdeck.com/pyconslides/real-time-tracking-and-mapping-of-geographic-objects-by-ragi-burhum) by Ragi Burhum
– [Scaling Realtime at DISQUS](https://speakerdeck.com/pyconslides/scaling-realtime-at-disqus-by-adam-hitchcock) by Adam Hitchcock
– [A Crash Course in MongoDB](https://speakerdeck.com/pyconslides/a-crash-course-in-mongodb)
– [Server Log Analysis with Pandas](https://speakerdeck.com/pyconslides/server-log-analysis-with-pandas-by-taavi-burns) by Taavi Burns
– [Who’s There – Home Automation with Arduino and RaspberryPi](https://speakerdeck.com/pyconslides/whos-there-home-automation-with-arduino-and-raspberrypi-by-rupa-dachere) by Rupa Dachere
– [Why you should use Python 3 for text processing](https://speakerdeck.com/pyconslides/why-you-should-use-python-3-for-text-processing-by-david-mertz) by David Mertz
– [Awesome Big Data Algorithms](https://speakerdeck.com/pyconslides/awesome-big-data-algorithms-by-titus-brown) by Titus Brown

A huge thanks to the speakers and conference organizers for making these resources freely available, especially to those of us who were not able to attend the conference.

Many thanks to all who attended the talk @jayjacobs & I gave at RSA on Tuesday, February 26, 2013. It was really great to be able to talk to so many of you afterwards as well.

We mentioned quite a bit of information in the presentation that wasn’t on the slides, and we wanted to aggregate it into a blog post so you can viz along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at [Coursera](https://www.coursera.org/).

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up, and it may have some duplicates of the ones below.

People Mentioned

– [Stephen Few’s Perceptual Edge blog](http://www.perceptualedge.com/) : Start from the beginning to learn from a giant in information visualization
– [Andy Kirk’s Visualising Data blog](http://www.visualisingdata.com/) (@visualisingdata) : Perhaps the quintessential leader in the modern visualization movement.
– [Mike Bostock’s blog](http://bost.ocks.org/mike/) (@mbostock) : Creator of D3 and producer of amazing, interactive graphics for the @NYTimes
– [Edward Tufte’s blog](http://www.edwardtufte.com/tufte/) : The father of what we would now identify as our core visualization principles & practices

Tools Mentioned

– [R](http://www.r-project.org/) : Jay & I probably use this a bit too much as a hammer (i.e. treat every data project as a nail) but it’s just far too flexible and powerful not to use as a go-to resource
– [RStudio](http://www.rstudio.com/) : An *amazing* IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to set up a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating [reproducible research](http://cran.r-project.org/web/views/ReproducibleResearch.html) a joy with built-in easy access to tools like [knitr](http://yihui.name/knitr/).
– [IPython](http://ipython.org/) : This interactive environment takes an already amazing language and kicks it up a few notches. It brings Python up to the level of R+RStudio, especially with its knitr-like [IPython Notebooks](http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html) for–again–reproducible research.
– [Mondrian](http://www.theusrus.de/Mondrian/) : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster than typing R commands or fumbling in Excel.
– [Tableau](http://www.tableausoftware.com/) : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
– [Processing](http://processing.org/) : A tool that was designed from the ground up to help journalists create powerful, interactive data visualizations that you can slipstream directly onto the web via the [Processing.js](http://processingjs.org/) library.
– [D3](http://d3js.org/) : The foundation of modern, data-driven visualization on the web.
– [Gephi](https://gephi.org/) : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
– [MongoDB](http://www.mongodb.org/) : NoSQL database that’s highly & easily scalable without a steep learning curve.
– [CRUSH Tools by Google](https://code.google.com/p/crush-tools/) : Kicks up your command-line data munging.

(NOTE: The best place to keep up with progress is github, but you can always search for “slopegraph” here or check the “slopegraph” tag page regularly.)

I’ve been a bit obsessed with slopegraphs (a.k.a. “Tufte table-charts”) of late and very dissatisfied with the lack of tools that would make this particular visualization type more prevalent. While my ultimate goal is to have a user-friendly modern web app or platform app that’s as easy as a “drag & drop” of a CSV file, this first foray requires a bit (not much, really!) of elbow grease to use.

For those who want to get right to the code, head on over to github and have a look (I’ll post all updates there). Setup, sample & source are also below.

First, you’ll need a modern Python install. I did all the development on OS X Mountain Lion (beta) with the stock Python 2.7 build. You’ll also need the Cairo 2D graphics library (plus its Python bindings, since the script imports cairo), which built and installed perfectly from source, even on ML, so it should work fine for you. If you want something besides PDF rendering, you may need additional libraries, but PDF is decent for hi-res embedding, converting to jpg/png (see below) and tweaking in programs like Illustrator.

If you search for “Gender Comparisons” in the comments on this post at Tufte’s blog, you’ll see what I was trying to reproduce in this bit of skeleton code (below). By modifying which CSV file you’re using [line 21] and which fields are relevant [lines 45-47], you should be able to make your own basic slopegraphs without much trouble.
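
For example (a purely hypothetical file and column layout, just for illustration), if your CSV were named mydata.csv with a label in the first column and the two values you want to compare in the second and third columns, those three lines would become something like:

    # hypothetical CSV: label,start_value,end_value
    slopeReader = csv.reader(open('mydata.csv', 'rb'), delimiter=',', quotechar='"')  # line 21

    lab = row[0] # label (left-hand text)      -- line 45
    beg = row[1] # value for the left column   -- line 46
    end = row[2] # value for the right column  -- line 47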

If you catch any glitches, add a tweak or have a slopegraph “wish list”, let me know here, on twitter (@hrbrmstr) or over at github.

  1. # slopegraph.py
  2. #
  3. # Author: Bob Rudis (@hrbrmstr)
  4. #
  5. # Basic Python skeleton to do simple two value slopegraphs
  6. # with output to PDF (most useful form for me...Cairo has tons of options)
  7. #
  8. # Find out more about & download Cairo here:
  9. # http://cairographics.org/
  10. #
  11. # 2012-05-28 - 0.5 - Initial github release. Still needs some polish
  12. #
  13.  
  14. import csv
  15. import cairo
  16.  
  17. # original data source: http://www.calvin.edu/~stob/data/television.csv
  18.  
  19. # get a CSV file to work with 
  20.  
  21. slopeReader = csv.reader(open('television.csv', 'rb'), delimiter=',', quotechar='"')
  22.  
  23. starts = {} # starting "points"
  24. ends = {} # ending "points"
  25.  
  26. # Need to refactor label max width into font calculations
  27. # as there's no guarantee the longest (character-wise)
  28. # label is the widest one
  29.  
  30. startLabelMaxLen = 0
  31. endLabelMaxLen = 0
  32.  
  33. # build a base pair array for the final plotting
  34. # wastes memory, but simplifies plotting
  35.  
  36. pairs = []
  37.  
  38. for row in slopeReader:
  39.  
  40. 	# add chosen values (need start/end for each CSV row)
  41. 	# to the final plotting array. Try this sample with 
  42. 	# row[1] (average life span) instead of row[5] to see some
  43. 	# of the scaling in action
  44.  
  45. 	lab = row[0] # label
  46. 	beg = row[5] # male life span
  47. 	end = row[4] # female life span
  48.  
  49. 	pairs.append( (float(beg), float(end)) )
  50.  
  51. 	# combine labels of common values into one string
  52. 	# also (as noted previously, inappropriately) find the
  53. 	# longest one
  54.  
  55. 	if beg in starts:
  56. 		starts[beg] = starts[beg] + "; " + lab
  57. 	else:
  58. 		starts[beg] = lab
  59.  
  60. 	if ((len(starts[beg]) + len(beg)) > startLabelMaxLen):
  61. 		startLabelMaxLen = len(starts[beg]) + len(beg)
  62. 		s1 = starts[beg]
  63.  
  64.  
  65. 	if end in ends:
  66. 		ends[end] = ends[end] + "; " + lab
  67. 	else:
  68. 		ends[end] = lab
  69.  
  70. 	if ((len(ends[end]) + len(end)) > endLabelMaxLen):
  71. 		endLabelMaxLen = len(ends[end]) + len(end)
  72. 		e1 = ends[end]
  73.  
  74. # sort all the values (in the event the CSV wasn't) so
  75. # we can determine the smallest increment we need to use
  76. # when stacking the labels and plotting points
  77.  
  78. startSorted = [(k, starts[k]) for k in sorted(starts)]
  79. endSorted = [(k, ends[k]) for k in sorted(ends)]
  80.  
  81. startKeys = sorted(starts.keys(), key=float)
  82. delta = float('inf') # start impossibly large; any real gap will be smaller
  83. for i in range(len(startKeys)):
  84. 	if (i+1 <= len(startKeys)-1):
  85. 		currDelta = float(startKeys[i+1]) - float(startKeys[i])
  86. 		if (currDelta < delta):
  87. 			delta = currDelta
  88.  
  89. endKeys = sorted(ends.keys(), key=float)
  90. for i in range(len(endKeys)):
  91. 	if (i+1 <= len(endKeys)-1):
  92. 		currDelta = float(endKeys[i+1]) - float(endKeys[i])
  93. 		if (currDelta < delta):
  94. 			delta = currDelta
  95.  
  96. # we also need to find the absolute min & max values
  97. # so we know how to scale the plots
  98.  
  99. lowest = min(float(k) for k in startKeys)
  100. if (min(float(k) for k in endKeys) < lowest) : lowest = min(float(k) for k in endKeys)
  101.  
  102. highest = max(float(k) for k in startKeys)
  103. if (max(float(k) for k in endKeys) > highest) : highest = max(float(k) for k in endKeys)
  104.  
  105. # just making sure everything's a number
  106. # probably should move some of this to the csv reader section
  107.  
  108. delta = float(delta)
  109. lowest = float(lowest)
  110. highest = float(highest)
  111. startLabelMaxLen = float(startLabelMaxLen)
  112. endLabelMaxLen = float(endLabelMaxLen)
  113.  
  114. # setup line width and font-size for the Cairo
  115. # you can change these and the constants should
  116. # scale the plots accordingly
  117.  
  118. FONT_SIZE = 9
  119. LINE_WIDTH = 0.5
  120.  
  121. # there has to be a better way to get a base "surface"
  122. # to do font calculations besides this. we're just making
  123. # this Cairo surface so we know the max pixel width 
  124. # (font extents) of the labels in order to scale the graph
  125. # accurately (since width/height are based, in part, on it)
  126.  
  127. filename = 'slopegraph.pdf'
  128. surface = cairo.PDFSurface (filename, 8.5*72, 11*72)
  129. cr = cairo.Context (surface)
  130. cr.save()
  131. cr.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
  132. cr.set_font_size(FONT_SIZE)
  133. cr.set_line_width(LINE_WIDTH)
  134. xbearing, ybearing, sWidth, sHeight, xadvance, yadvance = (cr.text_extents(s1))
  135. xbearing, ybearing, eWidth, eHeight, xadvance, yadvance = (cr.text_extents(e1))
  136. xbearing, ybearing, spaceWidth, spaceHeight, xadvance, yadvance = (cr.text_extents(" "))
  137. cr.restore()
  138. cr.show_page()
  139. surface.finish()
  140.  
  141. # setup some more constants for plotting
  142. # all of these are malleable and should cascade nicely
  143.  
  144. X_MARGIN = 10
  145. Y_MARGIN = 10
  146. SLOPEGRAPH_CANVAS_SIZE = 200
  147. spaceWidth = 5
  148. LINE_HEIGHT = 15
  149. PLOT_LINE_WIDTH = 0.5
  150.  
  151. width = (X_MARGIN * 2) + sWidth + spaceWidth + SLOPEGRAPH_CANVAS_SIZE + spaceWidth + eWidth
  152. height = (Y_MARGIN * 2) + (((highest - lowest + 1) / delta) * LINE_HEIGHT)
  153.  
  154. # create the real Cairo surface/canvas
  155.  
  156. filename = 'slopegraph.pdf'
  157. surface = cairo.PDFSurface (filename, width, height)
  158. cr = cairo.Context (surface)
  159.  
  160. cr.save()
  161.  
  162. cr.select_font_face("Sans", cairo.FONT_SLANT_NORMAL, cairo.FONT_WEIGHT_NORMAL)
  163. cr.set_font_size(FONT_SIZE)
  164.  
  165. cr.set_line_width(LINE_WIDTH)
  166. cr.set_source_rgba (0, 0, 0) # need to make this a constant
  167.  
  168. # draw start labels at the correct positions
  169. # cheating a bit here as the code doesn't (yet) line up 
  170. # the actual data values
  171.  
  172. for k in sorted(startKeys):
  173.  
  174. 	label = starts[k]
  175. 	xbearing, ybearing, lWidth, lHeight, xadvance, yadvance = (cr.text_extents(label))
  176.  
  177. 	val = float(k)
  178.  
  179. 	cr.move_to(X_MARGIN + (sWidth - lWidth), Y_MARGIN + (highest - val) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  180. 	cr.show_text(label + " " + k)
  181. 	cr.stroke()
  182.  
  183. # draw end labels at the correct positions
  184. # cheating a bit here as the code doesn't (yet) line up 
  185. # the actual data values
  186.  
  187. for k in sorted(endKeys):
  188.  
  189. 	label = ends[k]
  190. 	xbearing, ybearing, lWidth, lHeight, xadvance, yadvance = (cr.text_extents(label))
  191.  
  192. 	val = float(k)
  193.  
  194. 	cr.move_to(width - X_MARGIN - eWidth - (4*spaceWidth), Y_MARGIN + (highest - val) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  195. 	cr.show_text(k + " " + label)
  196. 	cr.stroke()
  197.  
  198. # do the actual plotting
  199.  
  200. cr.set_line_width(PLOT_LINE_WIDTH)
  201. cr.set_source_rgba (0.75, 0.75, 0.75) # need to make this a constant
  202.  
  203. for s1,e1 in pairs:
  204. 	cr.move_to(X_MARGIN + sWidth + spaceWidth + 20, Y_MARGIN + (highest - s1) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  205. 	cr.line_to(width - X_MARGIN - eWidth - spaceWidth - 20, Y_MARGIN + (highest - e1) * LINE_HEIGHT * (1/delta) + LINE_HEIGHT/2)
  206. 	cr.stroke()
  207.  
  208. cr.restore()
  209. cr.show_page()
  210. surface.finish()
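
Running the script (python slopegraph.py) drops slopegraph.pdf next to it. For the jpg/png conversion mentioned above, one quick option on OS X (just one way to do it; ImageMagick or Ghostscript work equally well) is to rasterize the PDF with the built-in sips tool, which you can even do straight from Python:

    # optional: rasterize the generated PDF to a PNG using OS X's sips utility
    import subprocess

    subprocess.call(["sips", "-s", "format", "png",
                     "slopegraph.pdf", "--out", "slopegraph.png"])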

Starting sometime mid-year in 2011, I began having more ‘stuff’ to do than even my eidetic memory could help with. It’s not that I forgot things, per se, but the ability to mentally recall and prioritize work, family, personal and other tasks finally required some external assistance and I resolved to find a GTD system by the end of January.

Being an OS X user, I have some great choices out there (and the major ones have iOS sister-apps, too). However, I’m not just an OS X user. As I was saying to @myrcurial (and even @reillyusa) the other day, I dislike being locked in to proprietary solutions. Plus, the $120 price tag for OmniFocus (OS X + iPad) seemed like a king’s ransom, especially since I am also an Android user (OmniFocus only has an iOS app) and pay for both Dropbox and various virtual hosts. Believing that I still have some usable skills left, I decided to — as @hatlessec characterized my solution — cobble something together on my own.

Once upon a time, I did maintain a .plan file (back when I had sysadmin duties), but I really doubted the efficacy of it (and finger) in the age of the modern web. The thought of wrangling SQLite databases, parsing XML files or even digesting bits of JSON seemed like overkill for my purposes. Searching through my Evernote clippings, my memory was drawn back to one of my favorite sites, Lifehacker, which has regular GTD coverage. After re-poking around a bit, I decided to settle on @ginatrapani’s @todotxtapps, since it met the following requirements (in order):

  • It uses a plain text file with a simple structure – (no exposition necessary…the link is a quick read, the format will become second nature after a glance, and there’s a short sample sketched after this list)
  • It is Free (mostly) – mobile apps are ~$2.00USD each and if you need more than free Dropbox hosting and want a web interface, there are potential hosting costs. If you count your setup time as money, then add that in, too.
  • It runs on OS X, BSD, Windows & Linux – no platform lock-in
  • It has a thriving community – without being backed by a vendor (like the really #spiffy @omnigroup), a strong developer & user community is extremely important to ensure the longevity of the codebase. Todo.txt has very passionate developers and users who are very active on all fronts.
  • It is very extensible & integrable – I used @alfredapps to give me a quick OS X “GUI CLI” to the todo.sh commands. I built an Alfred keyword for my most used Todo.txt functions along with a generic one to bring up vim in a Terminal.app window for a free-form edit. Alfred’s shell-commands also give me @growlmac integration (so I get some feedback after working with tasks).

    I also integrated it with @geektool. I won’t steal the thunder from other GeekTool/Todo.txt integration posts (like this one). The GeekTool integration puts my todos right in front of me all the time on all my desktops.

    By storing my todo directory in @dropbox, it also makes syncing to my web site and mobile devices a snap.

    On my server, I have a simple cron job set up to e-mail me my todos at the beginning of the day (again, so they’re in front of me wherever I look); there’s a sample crontab entry after this list as well.

  • It runs on iOS AND Android – again, no platform lock-in
  • There’s an optional web interface – the one I linked to (there are others) is far from ideal, but it was quick to set up and has no overt security issues. Properly protected behind nginx or apache, you should have no issues if you need to have a web version handy.
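
To give a sense of the “simple structure”, here are a few (entirely made-up) lines from a todo.txt file: a leading “(A)” is a priority, “+” tags a project, “@” tags a context, and completed items start with “x” and the completion date:

    (A) 2013-03-01 Finish the RSA follow-up post +blog @computer
    (B) Order a new relay board for the Arduino +homeautomation @online
    Call the vet @phone
    x 2013-02-27 Renew domain registrations +hosting @online

The morning e-mail cron job is nothing fancy either; something along these lines (the path and address are placeholders, not my actual setup) mails the file out at 6:00 every day:

    0 6 * * * cat /path/to/Dropbox/todo/todo.txt | mail -s "Today's todos" you@example.com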

So, while the setup is a bit more than just downloading two commercial apps, it has many other benefits and isn’t too much more work if you already have some of the other pieces in place. If you want more info on the Alfred scripts or any other setup component, drop me a note in the comments.

While I’ve read about many GTD solutions and seen many user-stories of how they met their GTD needs, I’d be interested in what tools you use to ‘get things done’…