
Author Archives: hrbrmstr

Don't look at me…I do what he does, just slower. #rstats avuncular • Resistance Fighter • Cook • Christian • [Master] Chef des Données de Sécurité @ @rapid7

Earlier this evening, I somewhat half-heartedly challenged @jayjacobs to join me in generating one data visualization per day. I didn’t specify anything else (well, at least nothing I can disclose publicly, for now), but I think I’m going to try to formalize a few of the ‘rules’ before I get some shut-eye:

– The datavis _must_ be posted to one of our blogs (i.e., both it and the data behind it must be shareable). Alternative: we set up a blog just for this.
– The data behind the datavis _must_ also be public data and either referenced or published with the datavis.
– The datavis _must_ answer a question. No random generation of numbers for a lazy bar chart, etc. The question must be posed in the blog post alongside the datavis, (hopefully) with a short story/explanation.
– The datavis cannot be a blatant repeat of a previous datavis.
– The datavis does not have to break new ground (i.e. bar charts are #spiffy).
– The datavis _must_ be open for comments.
– There are no restrictions on what tools/languages can be used (e.g., Jay can cheat and make Tableau robocharts).
– There are no restrictions on the type of data being analyzed & visualized. Ideally, it will be from infosec or IT, but restricting it to those areas might make the challenge more difficult (the ‘public’ bit).

I’ll sleep on that and, perhaps, reduce the requirement to one per week after talking to Jay again this week.

Your thoughts & input on this challenge are most welcome in the comments, especially if you want to suggest things we can visualize. Also, feel free to volunteer to join us in this, once we start it.

Now that I’m back in the US and relaxing, I can take time for one final blather on the [PC Maker Slopegraph](http://rud.is/b/2013/04/11/ugly-tables-vs-slopegraphs-pc-maker-shipments-marketshare/) post from earlier in the week.

Slopegraphs can get quite long depending on the increment between discrete entries (as I’ve [pointed out before](http://rud.is/b/2012/06/07/slopegraphs-in-python-exploring-binningrounding/)). To make up for the length, you need to bin/round the values, change the scale or add annotations to the chart. Binning/rounding seems to make the most sense, since you can add a table for precision while still giving the reader a good view of what you’re trying to communicate in as compact a form as possible.
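To make the binning/rounding trade-off concrete, here is a minimal Python sketch. It is not the slopegraph code linked above; it simply assumes a layout with one vertical label slot per `step` increment between the lowest and highest value. The helper names `snap` and `rows_needed` and the maker figures are hypothetical, illustrative numbers, not the Ars data.

```python
# Minimal sketch: how rounding shrinks the vertical extent of a slopegraph.
# Assumption: one vertical label slot per `step` between the smallest and
# largest (rounded) value. Helper names and data are hypothetical.


def snap(values, step):
    """Round each value to the nearest multiple of `step`.

    Note: Python's round() sends exact .5 ties to the even integer.
    """
    return {label: round(v / step) * step for label, v in values.items()}


def rows_needed(values, step):
    """Count vertical slots from the lowest to the highest snapped value."""
    snapped = snap(values, step).values()
    return (max(snapped) - min(snapped)) // step + 1


# Hypothetical shipment figures (in thousands) for three unnamed makers.
shipments = {"Maker A": 15234, "Maker B": 11871, "Maker C": 9450}

print(rows_needed(shipments, 1))     # 5785 slots, one per thousand units
print(rows_needed(shipments, 1000))  # 7 slots, one per million units
print(snap(shipments, 1000))         # {'Maker A': 15000, 'Maker B': 12000, 'Maker C': 9000}
```

The catch with coarser steps is that two makers can land on the same rounded value and collide on one line; that is where the companion precision table earns its keep.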

I’ll again ask the reader: what tells you which PC maker is on top, this table:

[Screenshot: the original Ars Technica PC maker table]

or these slopegraphs:

[Slopegraph: PC Maker Shipments (in thousands, rounded to nearest thousand)]

[Slopegraph: PC Maker Market Share (rounded to nearest %)]

Labeled properly, the rounding makes for a much more compact chart without detracting from the message, especially when I also include a much prettier, quick precision reference via Google Fusion Tables (though its column sort feature seems a bit wonky for some reason…).

Given that the focus was on the top individual maker, the “Other” category is just noise, so excluding it is also not an issue. If we wanted to tell the story of how well individual makers are performing against that bucket of contenders or point-players, then we would include that data and use other visualizations to communicate whatever conclusions we want to lead the reader to.

Remember, data tables and visualizations should be there to help tell your story, not detract from it or require real work/effort to grok (unless you’re breaking new visualization ground, which is most definitely not happening in the Ars story).

While poking at some new features [Datawrapper](http://datawrapper.de/) announced recently, I noticed it was possible to make a pretty decent (if not perfect) slopegraph there. To try it out, I re-created one of the charts from my [most recent](http://rud.is/b/2013/04/11/ugly-tables-vs-slopegraphs-pc-maker-shipments-marketshare/) blog post.

If they had an option to do away with the gray horizontal lines, it wouldn’t be a bad slopegraph at all. I’m not sure how it’d handle overlaps, but if you have some basic data and don’t feel like messing with my Python or R code (and don’t want to do any machinations in Excel), Datawrapper might not be a bad choice.

Andrew Cunningham (@IT_AndrewC) posted an article, “If you make PCs and you’re not Lenovo, you might be in trouble,” on the always #spiffy @arstechnica, and it had this horrid table in it:

[Screenshot: the original Ars Technica PC maker table]

That table was not made by Andrew (so don’t blame him), but the Ars graphics folk *could* have made the post a bit more readable.

I’m not going to bother making a prettier table (it’s possible, but table formatting is not the point of this post), but I am going to show two slopegraphs that communicate the point of the post (that Lenovo is sitting pretty) much better:

[Slopegraph: PC Maker Market Share]

[Slopegraph: PC Maker Shipments (in thousands)]

They’re a little long (a problem I’ve noted with slopegraphs previously), but I think they convey the message intended by the story much better. I may try to tweak them a bit or even finish the D3 port of my slopegraph library when I’m back from Bahrain.

For those finding this post from the Bahrain eGov conference, I’d like to re-extend a hearty “Thank you!” for being one of the most engaging, interactive and intelligent audiences I’ve ever experienced. I truly enjoyed talking with all of you.

You can find the slides on my Dropbox [PDF], and please do not hesitate to bounce any questions my way here or on Twitter (@hrbrmstr).


Thanks to a prod by @djbphaedrus, I’m off to the Bahrain International eGovernment Forum this week to host a two-hour workshop on “information risk reality”. As a result, blogging & tweeting will be at significantly reduced levels, so enjoy the brief respite from my blatherings while you can :-)

If you happen to be in Bahrain while I’m there, drop me a note and I’m sure I can find time between Tuesday night and Thursday afternoon to say hello!

[Image: “it’s about the people…”]

The basic technique of cybercrime statistics—measuring the incidence of a given phenomenon (DDoS, trojan, APT) as a percentage of overall population size—had entered the mainstream of cybersecurity thought only in the previous decade. Cybersecurity as a science was still in its infancy, as many of its basic principles had yet to be established.

At the same time, the scientific method rarely intersected with the development and testing of new detection & prevention regimens. When you read through the endless stream of quack cybercures published daily on the Internet and at conferences like RSA, what strikes you most is not that they are all, almost without exception, based on anecdotal or woefully small samples of evidence. What’s striking is that they never apologize for the shortcoming. They never pause to say, “Of course, this is all based on anecdotal evidence, but hear me out.” There’s no shame in these claims, no awareness of the imperfection of the methods, precisely because it seems so eminently reasonable that the local observation of a handful of minuscule cases might serve as the silver bullet for cybercrime, if you look hard enough.


But cybercrime couldn’t be studied in isolation. It was as much a product of the Internet’s expansion as news and social media, where it was so uselessly anatomized. To understand the beast, you needed to think on the scale of the enterprise, from the hacker’s-eye view. You needed to look at the problem from the perspective of Henry Mayhew’s balloon. And you needed a way to persuade others to join you there.

Sadly, that’s not a modern story. It’s adapted from chapter 4 (pp. 97-98, paperback) of The Ghost Map by Steven Johnson, a book on the 1854 London cholera epidemic.

I won’t ruin the book or continue my attempt at analogy any further. Suffice it to say, you should read the book (if you haven’t already) and join me in calling for the John Snow of our cyber-time to arrive.