
Category Archives: Charts & Graphs

Benchmarking/profiling is one of the fundamental practices for tech folk, and Feng Shen’s recent post to Hacker News continues this fine tradition with a look at startup & run times for “fibonacci(40)” in seven computer languages (two ‘C’ variants, Clojure, Go, Python, Node & Java).

Good, quick project, but I didn’t think the chart he made (check the site) comparing the results did proper justice to each language (though Python is the clear loser, even with Clojure’s horrid startup time), so I charted out the startup times separately:

Here’s a Google Drive spreadsheet for the data that also has larger views of the charts.
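For reference, the workload in question is presumably the classic doubly recursive Fibonacci (check Feng Shen’s post for his exact sources); in Python it looks something like this:

```python
import time

def fib(n):
    # the naive doubly recursive version -- the usual benchmark punching bag
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

start = time.time()
print(fib(40))                        # 102334155
print(time.time() - start, "seconds")
```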

In @jayjacobs’ latest post on SSH honeypot password analysis, he shows some spiffy visualizations from crunching the data with Tableau. While I’ve joked with him and called them “robocharts”, the reality is that Tableau does let you work on visualizing the answers to questions quickly without having to go into “code mode” (and that doesn’t make it wrong).

I’ve been using Jay’s honeypot data both for attack analysis and as an excuse to compare data-crunching and visualization tools (so far I’ve poked at it with R and Python) in an effort to see which tools are good for exploring various types of questions.

A question that came to mind recently was “Hmmm…I wonder if there is a pattern to the timings of probes/attacks?”, and I posited that a time-series view across the days would help illustrate that. To that end, I came up with the idea of breaking the attacks into one-hour chunks and building a day-stacked heatmap which could be filtered by country. Something like this:

I’ve been wanting to play with D3 and exploring this concept with it seemed to be a good fit.

Given that working with the real data would entail loading a ~4MB file every time someone viewed this blog post, I put the working example on a separate page where you can do a “view source” to see the code. Without the added complexity of a popup selector and loading spinner, the core code is about 50 lines, much of which could be condensed even further since it’s just chained calls in JavaScript. I cheated a bit and used jQuery, too, plus made some of it dependent on WebKit (the legend may look weird in Firefox) due to time constraints.
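For the data-prep side of that (the D3 code just renders the grid), here’s a rough sketch of the one-hour binning in Python — the file and column names are hypothetical stand-ins, not Jay’s actual schema:

```python
import csv
from collections import Counter
from datetime import datetime

# bucket each probe into a (day, hour-of-day) cell, filterable by country
counts = Counter()
with open("honeypot.csv") as f:              # hypothetical file/schema
    for row in csv.DictReader(f):
        if row["country"] != "CN":           # example country filter
            continue
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        counts[(ts.date().isoformat(), ts.hour)] += 1

# emit one record per day/hour cell for D3 to consume
with open("heatmap.csv", "w", newline="") as out:
    w = csv.writer(out)
    w.writerow(["day", "hour", "count"])
    for (day, hour), n in sorted(counts.items()):
        w.writerow([day, hour, n])
```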

The library is wicked simple to grok and makes it easy to come up with new ways to look at data (as you can see from the examples gallery on the D3 site).

Unfortunately, no real patterns emerged, but I’m going to take a stab at taking the timestamps (which are recorded at the destination of the attack) and aligning them to the origin to see if that makes a difference in the view. If that turns up anything interesting, I’ll make another quick post on it.

Given that much of data analysis (“big” or otherwise) is domain-knowledgeable folk asking interesting questions, are there any folks out there who have questions they’d like to see explored with this data set?

I had a few moments this past weekend to play with an idea for visualizing the passwords used against the honeypot @jayjacobs set up. While it’s not as informative as Jay’s weekend endeavors:

it is pretty, and it satisfied my need to make a word cloud out of useful data.

The image below shows the top 500 passwords used against the honeypot; it requires an SVG-capable browser and some horizontal scrolling, so you can view or download it standalone if there are any issues. For those generally SVG-challenged, there’s also a slightly less #spiffy PNG version to view as well.

[Word cloud: the top 500 passwords used against the honeypot, led by “123456”, “password”, “1234”, “123”, “12345”, “test” and “qwerty”]
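If you want to roll your own version, something like the Python wordcloud package can size passwords by frequency (an assumption on tooling — it’s not necessarily what produced the image above):

```python
from collections import Counter
from wordcloud import WordCloud  # pip install wordcloud

# hypothetical input: one attempted password per line from the honeypot logs
with open("passwords.txt") as f:
    freqs = Counter(line.strip() for line in f if line.strip())

# size each of the top 500 passwords by how often it was tried
wc = WordCloud(width=1600, height=600, max_words=500)
wc.generate_from_frequencies(dict(freqs))
wc.to_file("password_cloud.png")
```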

This is the inaugural post for @MetricsHulk, on the condition that there are few – if any – “ALL CAPS” bits. Q3 & Q4 tend to be “report season”, and @MetricsHulk usually has some critiques, praises, opines and suggestions (some smashes, too) to offer as we are inundated with a blitz of infographics.

The always #spiffy @WhiteHatSec released their 2011 Web Site Security stats report [direct link (PDF)] last week (here’s one of their teaser tweets):

With over 7,000 sites and hundreds of diverse organizations represented in the report, it is a great resource for folks to see how they stack up (more on that in a bit). Security folks should also take some encouragement from the report since:

  • Real vulnerabilities are down (significantly)
  • WAFs can help
  • Vulnerabilities are getting fixed faster (when found)

@WhiteHatSec does a fine job summarizing key & extended findings (hint: read the report), and they are awesomely up-front and honest with regard to the findings (see pages 4 & 5 for their analysis on why the ‘good stats’ might be so good).

The report is chock-full of data. Real. Data. The only way it could have been better data-wise is if they had provided a Google Docs bundle of the raw numbers. (NOTE: I didn’t get all the data in there, but it has a decent amount from the report.)

I do think there is some room for improvement. Take, for example, the – sigh – donut chart on page 9. I might be inclined to refrain from comment if this were one of those hipster infographics that seem to be everywhere these days. A pie chart isn’t much better, but at least we’re able to process the relative sizes a bit better when the actual angles are present. Here’s a before/after makeover for your comparison/opine (click for larger version):

We get an immediate sense of scale from the bars, and it removes the need for the “Frosted Lucky Charms” color-wheel effect. The @WhiteHatSec folk use bars (very appropriately) almost everywhere else, so I’m not sure what the design decision was for deviating in this one part of the report.

The next bit that confused me was Figure 18 (page 15). I’m having difficulty both figuring out where the “79” value comes from (I can’t get to it by averaging the values presented) and grokking the magnitude of the differences from the bubbles. So, here’s another before/after makeover for your comparison/opine (click for larger version):

Finally, I think Figures 23 & 24 could do with a bit of a slopegraph makeover, as the spirit of the visualization is to show year-over-year differences. The first two slopegraphs use the “Tufte binning technique”, so you’ll need to refer to the companion data tables if you want exact numbers for comparison (the trend is more important, IMO).

Average Days Open

Average Days to Close

Remediation Rates by Year

(You can also download easier-to-read PDFs of the slopegraphs.)

Absolutely no one should take the makeover suggestions as report slander. As stated at the beginning of the post, @WhiteHatSec is open about the efficacy of their data and analysis, plus they provide actual data. The presentation of stats & trending by industry and vulnerability type should help any organization with an appsec program figure out whether they are doing better or worse than others in their sector and see if they are smashing bugs with similar success. It also gives the general infosec community a view we would otherwise not have. I would encourage other organizations to follow @WhiteHatSec’s example, even if it means more donut charts (mmm…donuts).

What information did you glean from the WhiteHat report, or what makeovers would you encourage for the next one?

UPDATE: I had to remove the Google Insights widgets and replace them with static images; they loaded inconsistently far too often in non-Chrome browsers. Click on the graphs to go to the Google Insights detail pages for more interaction with the data.

Information security breaches have been the “new black” over the past eighteen months, with the latest fashion trend being the LinkedIn passwords fiasco. This got me thinking: what is the “half-life” of a breach? It’s becoming obvious that users do not see the security of their information as a service differentiator or even a tier one decision point when choosing to use a new social network or online application. (How many of you closed out your LinkedIn accounts?) But, just how quickly does their attention wane from a breach event? Pretty quickly, if one formulates a conclusion based on Google Insights search data.

Let’s start with LinkedIn

We have a burst that – if one is generous – captures interest for about a week. Even more interesting is that it seems said interest was limited to very specific geographic regions:

Plus, the incident continues to help show the lack of impact breaches have on stock price:

But, LinkedIn is not exactly a broad-reaching service (i.e. it’s no Facebook).

Breaches Don’t Stop The Shopping

Investor exuberance notwithstanding, LinkedIn is kinda boring since folks use it to actually publish personal data to the world. While it has some private messaging and may hold some financial account information, it’s not like Zappos, which holds payment information and shopping history and was also breached this year. How long did they get attention?

While there is a longer, flat tail, attention is still about seven days (you can interact with the chart and zoom in to verify that claim) and Zappos’ overall consumer interest does not seem to have waned:

Sownage Revisited

The word “Sony” is now almost synonymous with “breach” in the minds of most information security folk. It’s our “go to” example when talking with executives and application teams. Unfortunately, for the purposes of comparative analysis, it wasn’t just one breach. So, while the chart shows closer to a ten-week interest period, that makes sense when one considers there were over ten news stories (one for each new breach):

I won’t go into the details as to why including a stock price chart has little efficacy in determining breach effect for Sony (it’s been analyzed to death), but a comparative look at “PlayStation” (with an added factor for “iPad”) shows (to me) that the breaches had far less impact on interest in the PlayStation (one of the main breach targets) than the iPad had:

Breaches Spook The Spooks

So, if breaches are of little interest to the consumer, they must have greater impact on the community that has some skin in the game, right? Kinda. If we look at the RSA & Lockheed breaches:

We see that the Lockheed breach kept attention from mid-April to about mid-July (12 weeks), and RSA spiked twice for about four weeks each time. Both were intertwined in the news, and RSA had numerous (to be blunt) PR events that helped keep focus on both.

RSA is part of EMC, so a stock-view analysis has many other complexities that make it less than ideal, but neither company (EMC nor Lockheed) seemed to suffer from the extended initial breach interest:

Only One View

I mentioned at the beginning of the post that this was intended to be a single-factor analysis, limited to what insights Google gleans from what folks are searching for. It doesn’t provide a view into enterprise contractual agreements, service usage patterns or even blogger/social media sentiment analysis. Yet, folks search for what they are interested in and when I add a few parameters to the LinkedIn chart:

we see that people are far more interested in Scarlett Johansson, gas prices and even Snooki than they are in LinkedIn insecurity. Perhaps breaches just aren’t sexy enough or personally impacting enough to truly matter…even to security professionals.

The Fund For Peace (FFP) and Foreign Policy jointly released the 2012 version of the “failed states index” (FSI). From the FFP site, the FSI:

…focuses on the indicators of risk and is based on thousands of articles and reports that are processed by our CAST Software from electronically available sources.

I read it every year (mostly due to being an ardent reader of Foreign Policy magazine) and find the rankings, methodology & insights quite intriguing. With my recent work on slopegraphs, I thought this would be a good data set to play with to determine what – if any – features were necessary to support rank order (and to provide some impetus to finally refactor the code to support multi-column slopegraphs…more on that later).

However, I was not looking forward to transcribing the data from the Flash visualization on the Foreign Policy web site. There are HTML grids on the FFP site but I really just wanted the overall rankings (i.e. no sub-indices) and noticed this interesting scrollable mini-grid on one of the FFP FSI pages:

Thankfully[?] it’s an IFRAME and I was able to pull 2010, 2011 & 2012 data in a very usable format by manipulating this URL: http://www.fundforpeace.org/global/tables/fsiindex2010_sml.htm.

After some quick transformations, I had two CSV files for a 2010-2012 comparison and a 2011-2012 comparison.
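If you’re curious, that pull-and-transform step is only a few lines; a minimal sketch, assuming pandas and that the mini-grid parses as a plain HTML table (the real page may need more massaging):

```python
import pandas as pd

# the mini-grid IFRAME serves a plain HTML table per year; swap the year
# embedded in the filename to grab 2010, 2011 or 2012
url = "http://www.fundforpeace.org/global/tables/fsiindex2010_sml.htm"
tables = pd.read_html(url)   # needs lxml or html5lib under the hood
fsi = tables[0]              # assume the rankings grid is the first table
fsi.to_csv("fsi_2010.csv", index=False)
```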

(Before continuing, I feel the need to point out that the data, methodology, etc is 100% Copyright © 2012 The Fund for Peace as they overtly point out many times on their site.)

When I threw the data into the slopegraph tool, it was immediately obvious that I was missing something important: the ability to specify the sort order for the data. For most slopegraphs, the code works well since our brains expect the larger values on top. For a rank-order slopegraph, that sort order should (for the most part) be ascending rather than descending to best represent changes in rank position. It does feel odd that being “#1” in the FSI actually means you’re really a loser, but I didn’t make the rules for their index.
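The change itself is conceptually tiny: sort ascending on the left column’s rank so that “#1” lands at the top of the graph. A toy illustration with made-up rows (not PySlopegraph’s actual code):

```python
# made-up rows: (country, rank_2010, rank_2012)
rows = [("Aland", 5, 2), ("Borduria", 1, 3), ("Cordova", 2, 1)]

# value slopegraph: biggest value belongs on top, so sort descending
by_value = sorted(rows, key=lambda r: r[1], reverse=True)

# rank-order slopegraph: rank #1 belongs on top, so sort ascending
by_rank = sorted(rows, key=lambda r: r[1])
print([r[0] for r in by_rank])   # ['Borduria', 'Cordova', 'Aland']
```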

So, PySlopegraph now handles two-column rank-order slopegraphs and, as you’ll see in part two, also handles multi-column slopegraphs (though that bit needs some work). The code will be up on GitHub in a couple of days, as I’ve also got some half-finished support for Processing.js and Paper.js that I want to finish before another push. If anyone needs it sooner, just @ or DM me.

Now, For The Data

The “Top 25” (that sounds way too positive for what it really means) slopegraph is the easiest to read (as it’s the smallest). It is also where Foreign Policy & FFP focus some dataviz effort as well (though they do have visualizations for all the data). Here’s the slopegraph showing the rank-order change from 2010 to 2012:

The full slopegraphs are quite tall (I’ve been prototyping some ways to make tall ones more useful, but that’s nowhere near ready for public consumption), so you may just want to grab the two PDFs and look there rather than in this post:

Rank Order Comparison :: 2010/2012


Rank Order Comparison :: 2011/2012

While it requires scrolling, the changes in rank are immediately noticeable, as is the fact that the FFP folk allow for ties that leave “holes” in the table. I think you really get a feel for which countries are stable, improving and declining very quickly with the slopegraph version, but I’d like to hear your thoughts if you have an opine you’d like to share.

Stay tuned for part two!

UPDATE: It seems my use of <script async> optimization for Raphaël busted the inline slopegraph generation. I’ll work on tweaking the example posts to wait for Raphaël to load when I get some time.

So, I had to alter this to start after a user interaction. It loaded fine as a static, local page but seems to get a bit wonky when embedded in a complex page. I also see some artifacts in Chrome but not in Safari. Still, not a bad foray into basic animation.



Between the eye-catching glitches in the experimental JavaScript support and the ugly large-number display in the spam example post, I felt compelled to make a couple of formatting tweaks to the code. I also didn’t have time to do “real” work on the codebase this weekend.

So, along with spacing adjustments, there’s now an optional “add_commas” setting that will toss commas into large numbers so they’re easier to read. Here’s an example of the new output (both the Raphaël display and the commas):


As usual, it’s up on GitHub.
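For the curious, comma insertion is close to a one-liner in Python; a guess at the flavor of “add_commas” (not its actual source — that’s in the repo):

```python
def add_commas(n):
    # 1234567 -> "1,234,567" via Python's format mini-language
    return "{:,}".format(n)

assert add_commas(1234567) == "1,234,567"
```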