
Sadly, I could not make it to this year’s Workshop on the Economics of Information Security. However, the intrepid conference organizers were quick to post the papers that were presented, and I had a chance to sift through them to pick out what I believe to be the best of the best (they are all worth reading).

A Focus On The Bottom Line

First up is “Measuring the Cost of Cybercrime” by Ross Anderson, Chris Barton, Rainer Böhme, Richard Clayton, Michel J.G. van Eeten, Michael Levi, Tyler Moore & Stefan Savage. They developed an interesting framework:

which tries to cover all angles of loss (including costs of defense) as well as that of gain by the criminals. They don’t just talk theory & math. They did actual investigations and have produced a great breakdown of costs & criminal gains on page 24 of the paper (click for larger image):

Beyond the details of their methodology, I include them in this list – in part – because of this paragraph:

The straightforward conclusion to draw on the basis of the comparative figures collected in this study is that we should perhaps spend less in anticipation of computer crime (on antivirus, firewalls etc.) but we should certainly spend an awful lot more on catching and punishing the perpetrators.

What a great, data-backed discussion-starter at your next security conference!

Might As Well Jump

Next up is a very maths-y offering by Adrian Baldwin, Iffat Gheyas, Christos Ioannidis, David Pym & Julian Williams on “Contagion in Cybersecurity Attacks“.

If you’re put off by math symbols, jump to the bottom of page four to start your reading (right after reading the abstract & introduction). The authors used DShield data and focused on ten services (DNS, ssh, Oracle [they got the port #’s wrong], SQL, LDAP, http/s, SMB, IMAP/S, SMTP) sampled daily for the period 1 January 2003 to 28 February 2011. You can read the paper for their particular findings in this data set, but this extract homes in on the utility of their methodology:

Security threats to data, its quality and accessibility, represent potential losses to the integrity of the operations of the organization. Security managers, in assessing the potential risks, should be interested in the relationship between the contagious threats to these different security attributes. The nature of the inter-relationship between the threats provides additional information to assist managers in making their choices of mitigating responses. For example, if the inter-relationship between threats is constant, independently of the frequency and intensity of threats, security managers can adopt smooth mitigation profiles to meet the threat. In the absence of such stable relationships, the managers’ responses must be adjusted dynamically: for given temporal relationships between the number of attacks, their change (or ‘jump’) in frequency, and their change in size (extent of impact).

I can envision some product extensions incorporating this threat analysis into their offering or even service providers such as Akamai (they have deep, active threat intel) creating a broad, anonymized “contagion” report for public consumption with private, direct (paid) offerings for their clients.

That Is The Question

Lukas Demetz & Daniel Bachlechner hope to help security managers choose investment analysis strategies in their work on “To invest or not to invest? Assessing the economic viability of a policy and security configuration management tool“. They take eleven economic investment models and work through each of them for a selected tool/technology investment, pointing out the strengths & weaknesses of each (click for larger version of the summary table):

Unsurprisingly (at least for me), none were optimal, but this is the perfect paper for anyone who ever wanted to look at a summary/overview of the “should we invest?” work with an eye on real practicality.

Physician, Secure Thy Data

Martin S. Gaynor, Muhammad Zia Hydari & Rahul Telang aim to assess the impact of market competition on information security and privacy in their work on “Is Patient Data Better Protected in Competitive Healthcare Markets?“.

I first have to hand it to these researchers for including the “WORK IN PROGRESS – PLEASE DO NOT QUOTE” tag right up front in the paper. Our industry seems to be one that jumps on “facts” way too soon, and this should give any infosec pundits pause.

However (ignoring my own advice from the previous paragraph), if the authors’ continued analysis does end up supporting their initial conclusion that increased competition is associated with a decline in the quality of patient data protection, it may show that security has an uphill battle getting onto the “service differentiator” list.

The authors do take a moment to theorize as to why there seems to be an inverse relationship between competition & security:

We posit that hospitals in more competitive markets may be inclined to shift resources to more consumer visible activities from the less consumer visible activity of data protection

Is That A USB Of Patches In Your Pocket?

In “Online Promiscuity: Prophylactic Patching and the Spread of Computer Transmitted Infections“, Timothy Kelley & L. Jean Camp examine the efficacy of various aggregate patching and recovery behaviors using real world data and a plethora of interesting simulations.

If you listened to the SFS “Front Porch” conversation with @joshcorman, @armorguy & yours truly, you’ll know how I feel about patching, and I believe this paper helps support a somewhat progressive approach: there is a need for patching, but also a need for intelligent patching (with the latter also requiring #spiffy incident response). The authors may say it best, tho:

We show, using our model and a real world data set, that small increases in patch rates and recovery speed are the most effective approaches to reduce system wide vulnerabilities due to unprotected computers. Our results illustrate that a public health approach may be feasible, as what is required is that a subpopulation adopt prophylactic actions rather than near-universal immunization.
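To play with the intuition behind that conclusion, here is a toy susceptible/infected/patched simulation sketch. To be clear: this is emphatically not the authors’ model, and every parameter in it is invented purely for illustration; it just lets you nudge the patch and recovery rates and watch the cumulative infection fraction respond.

    # Toy illustration only (NOT the paper's model): a crude discrete-time
    # susceptible/infected/patched loop to poke at the "small increases in
    # patch rates and recovery speed" intuition. All parameters are made up.
    def simulate(patch_rate, recovery_rate, beta=0.35, n=10_000, steps=365):
        s, i = n - 10.0, 10.0            # susceptible, infected hosts
        ever_infected = i
        for _ in range(steps):
            new_inf = min(beta * s * i / n, s)            # hosts infected this step
            new_patch = min(patch_rate * s, s - new_inf)  # susceptibles that patch
            new_recov = recovery_rate * i                 # infected hosts cleaned up
            s -= new_inf + new_patch
            i += new_inf - new_recov
            ever_infected += new_inf
        return ever_infected / n

    for rate in (0.01, 0.02, 0.03):      # modest bumps in the daily patch rate
        print(f"patch_rate={rate:.2f} -> fraction ever infected: "
              f"{simulate(rate, recovery_rate=0.05):.2f}")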

What About The Green Jack?

Finally getting to the coding side of the security economics equation, Stephan Neuhaus & Bernhard Plattner look at whether software vulnerability fix rates decrease and if the time between successive fixes goes up as vulnerabilities become fewer and harder to fix in “Software Security Economics: Theory, in Practice“.

They chose Mozilla, Apache httpd and Apache Tomcat as targets of examination and did a thorough investigation of both vulnerability findings and code commits for each product using well-described and documented statistical methods (pretty graphs, too :-).

Here are the salient bits in their own words:

Our findings do not support the hypothesis that vulnerability fix rates decline. It seems as if the supply of easily fixable vulnerabilities is not running out and returns are not diminishing (yet).

and:

With this data and this analysis, we cannot confirm a Red Queen race.

Folks may not be too surprised with the former, but I suspect the latter will also be good conference debate fuel.

Law & Order : DBU (Data Breach Unit)

Sasha Romanosky, David Hoffman & Alessandro Acquisti analyzed court dockets for over 230 federal data breach lawsuits from 2000 to 2010 for their work on “Empirical Analysis of Data Breach Litigation“.

Why look at breach litigation outcomes? For starters, such analysis “can help provide firms with prescriptive guidance regarding the relative chances of being sued, and having to settle.” For insurance companies, this type of analysis can also be of help in crafting cyberinsurance policies. It can also help companies that have customer data as their primary asset/product better understand their obligations as custodians of such information.

But, you want to know what they found, so here’s the skinny:

Our results suggest that the odds of a firm being sued are 3.5 times greater when individuals suffer financial harm, but 6 times lower when the firm provides free credit monitoring. Moreover, defendants settle 30% more often when plaintiffs allege financial loss, or when faced with a certified class action suit. By providing the first comprehensive empirical analysis of data breach litigation, these findings offer insights in the debate over privacy litigation versus privacy regulation.

It’s a quick read and should be something you forward to your legal & compliance folk.

Achievement: Unlocked

On a topic close to home, Toshihiko Takemura & Ayako Komatsu investigate “Who Sometimes Violates the Rule of the Organizations?: Empirical Study on Information Security Behaviors and Awareness“.

The authors develop a behavioral model based on:

  • Attitude
  • Motivation toward the behavior
  • Information security awareness
  • Workplace environment

and use a survey-based approach to acquire their data.

The “money quote” (IMO) is this:

With regard to the information security awareness, in many cases it is found that the higher the awareness is, the less the tendency to violate the rule is.

Get cranking on your awareness programs!

(If you made it this far and went through these or other WEIS 2012 papers, which ones were most impactful for you?)

[@hrbrmstr starts working in javascript again]
The Internets: What do you think?
@hrbrmstr: It’s vile.
The Internets: I know. It’s so bubbly and cloying and happy.
@hrbrmstr: Just like the Federation.
The Internets: And you know what’s really frightening? If you develop with it enough, you begin to like it.
@hrbrmstr: It’s insidious.
The Internets: Just like the Federation.

(With apologies to ST:DS9)

UPDATE: It seems my use of <script async> optimization for Raphaël busted the inline slopegraph generation. Will work on tweaking the example posts to wait for Raphaël to load when I get some time.

So, I had to alter this to start after a user interaction. It loaded fine as a static, local page but seems to get a bit wonky embedded in a complex page. I also see some artifacts in Chrome but not in Safari. Still, not a bad foray into basic animation.

Animate Slopegraph


There were enough eye-catching glitches in the experimental javascript support and the ugly large-number display in the spam example post that I felt compelled to make a couple formatting tweaks in the code. I also didn’t have time to do “real” work on the codebase this weekend.

So, along with spacing adjustments, there’s now an optional “add_commas” setting that will toss commas into large numbers so they’re easier to read. Here’s an example of the new output (both the Raphaël display and commas):


As usual, it’s up on github
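For the curious, the comma insertion itself is nothing exotic; Python’s format mini-language does the heavy lifting. A minimal sketch (the helper below is illustrative and not necessarily how the option is wired into the codebase):

    def add_commas(value):
        """Illustrative helper: 1234567 -> '1,234,567', 9876543.21 -> '9,876,543.21'."""
        return format(value, ",")

    print(add_commas(1234567))
    print(add_commas(9876543.21))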

Not much progress over the weekend on my latest obsession (been busy enjoying some non-rainy days here in Maine). So, here are some other slopegraph implementations/resources I’ve found through mining the internets:

In preparation for the upcoming 1.0 release and with the hopes of laying a foundation for more interactive slopegraphs, I threw together some rudimentary output support over lunch today for Raphaël, which means that all you have to do is generate a new slopegraph with the “js” output type and include the salient portions of the generated html/css/javascript into a web page (along with including the Raphaël script code).
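For those wondering what “rudimentary output support” looks like in practice, the general technique is just emitting Raphaël drawing calls (text labels plus connecting paths) as strings from the computed coordinates. Here’s a hypothetical sketch of that idea; it is not the actual PySlopegraph emitter, and the names and layout values are mine:

    # Hypothetical sketch of emitting Raphaël JavaScript from Python coordinates.
    # (Not the actual PySlopegraph "js" output code; names/values are illustrative.)
    def emit_raphael_js(rows, width=400, height=300, left_x=120, right_x=280):
        js = ['var paper = Raphael("slopegraph", %d, %d);' % (width, height)]
        for label, y_left, y_right in rows:
            js.append('paper.text(%d, %d, "%s");' % (left_x - 15, y_left, label))
            js.append('paper.path("M%d,%d L%d,%d");' % (left_x, y_left, right_x, y_right))
            js.append('paper.text(%d, %d, "%s");' % (right_x + 15, y_right, label))
        return "\n".join(js)

    # y-coordinates here would come from the slopegraph layout pass
    print(emit_raphael_js([("Gold", 220, 60), ("Coffee", 90, 150)]))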

The next github push will have this update. Here’s an example of the output, using the classic Tufte example chart:


It’s definitely a bit rough around the edges (my eyes immediately fixate upon spacing discrepancies) and lacking any interactivity, but the basic building blocks are in place. It also does not render on my Android phone (HTC Incredible 2) but it does render in Chrome, Safari & on my iPad. Embedding a Raphaël graphic in a web page will definitely have advantages over a PNG or PDF in most situations even if it’s not interactive, so I’ll probably keep the support in regardless of whether I continue to improve upon it.

As I was playing with the code, I kept thinking how neat it would be if there was a Raphaël Cairo “surface” option. Perhaps that will be a side project if all goes well, since it would not be that much more complicated (in fact, it may be less complicated) than the Cairo SVG surface code.
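For context on what a Cairo “surface” is (and why swapping one in is mostly a matter of changing the surface type), here’s a generic pycairo snippet; it is illustrative only and not PySlopegraph’s actual drawing code:

    import cairo  # pycairo

    # Generic example: the surface type determines the output format;
    # the drawing calls on the context stay the same.
    surface = cairo.SVGSurface("slope.svg", 200, 100)  # cf. PDFSurface, ImageSurface
    ctx = cairo.Context(surface)
    ctx.set_line_width(1)
    ctx.move_to(20, 80)    # left-hand value position
    ctx.line_to(180, 20)   # right-hand value position
    ctx.stroke()
    surface.finish()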

Given the focus on actual development of the PySlopegraph tool in most of the blog posts of late, folks may be wondering why an infosec/inforisk guy is obsessing so much on a tool and not talking security. Besides the fixation on filling a void and promoting an underused visualization tool, I do believe there is a place for slopegraphs in infosec data analysis and will utilize some data from McAfee’s recent Q1 2012 Threat Report [PDF] to illustrate how one might use slopegraphs in interpreting the “Spam Volume” data presented in the “Messaging Threats” section (pages 11 & 12 of the report).

The report shows individual graphs of spam volume per country from April of 2011 through March of 2012. Each individual graph conveys useful information, but I put together two slopegraphs that each show alternate and aggregate views which let you compare spam volume data relative to each country (versus just in-country).

When first doing this exploration, the scale problem reared its ugly head again since the United States is a huge spam outlier and causes the chart to be as tall as my youngest son when printed. I really wanted to show relative spam volume between countries as well as the increase or decrease between years in one chart and — after chatting with @maximumyin a bit — decided to test out using a log scale option for the charting (click for larger image):

This chart — Spam Volume by Country — instantly shows that:

  • overall volume has declined for most countries
  • two countries have remained steady
  • one country (Germany) has increased
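The log scale option itself boils down to positioning each country by the logarithm of its value rather than the raw count, which keeps one huge outlier from stretching the chart. A minimal sketch of the idea (the numbers below are made up and are not the McAfee figures, and this is not the tool’s actual layout code):

    import math

    # Made-up volumes purely for illustration (NOT the McAfee report's numbers).
    volumes = {"United States": 9_000_000, "Germany": 400_000, "India": 350_000}

    def y_position(value, chart_height=600, max_value=10_000_000, log_scale=True):
        """Map a value to a vertical position; log scaling compresses big outliers."""
        frac = (math.log10(value) / math.log10(max_value)) if log_scale else value / max_value
        return round(chart_height * (1 - frac))   # bigger values sit nearer the top

    for country, v in volumes.items():
        print(f"{country:14s} log: {y_position(v):3d}   linear: {y_position(v, log_scale=False):3d}")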

The next chart – Spam Volume Percentage by Country — also needed to be presented on a log scale and has some equally compelling information:

Despite holding steady count-wise, the United States’ percentage of global spam actually increased, and it is joined by seven other countries, with Germany having the second largest percentage increase. Both charts present an opportunity to further explore why the values changed (since the best metrics are supposed to both inform and be actionable in some way).

I’m going to extract some more data from the McAfee report and some other security reports to show how slopegraphs can be used to interpret the data. Feedback on both the views and the use of the log scale would be greatly appreciated by general data scientists as well as those in the infosec community.

One of the last items for the 1.0 release is support for multiple columns of data. That will require some additional refactoring, so I’ve been procrastinating by exploring the recent “fudging” discovery. Despite claims to the contrary on other sites, there are more folks playing with slopegraphs than you might imagine. The inspiration for today’s installment comes from Jon Custer (@stuffisthings). He has a two-part “Telling Stories with Data” series that does some exploration of export data with slopegraphs. In his “Slopegraph Strikes Back” post, Jon does a spiffy job discussing data visualization fundamentals and walks the reader through his re-design of a chart on commodities ranking, including a commentary on an aspect of slopegraphs that I’ve been noticing as I’ve been doing my exploring: the ‘scale’ problem (which I began to point out in the aforementioned “fudging” post).

The data set Jon is working with allows for a great exploration as to what works best when trying to convey a message with slopegraphs. I took the values from one of the tables he extracted:

and made a “raw” slopegraph from them (focusing on the “top 10”). The graphic won’t even come close to fitting in this post, but you can grab the PDF of it and see how scale is the primary enemy of slopegraphs. It does show how gold and precious metal ores have skyrocketed from 1998 to 2007, but it’s hardly an engaging and easy-to-read visualization (unless you really like using your scroll wheel).

Jon grok’d this point, too, and decided to focus on the power law ranking and use the slopegraph to present the rate of change of each commodity:

While he didn’t “pull a Tufte” and just include values without caveat (see left & right 90° side labels), I still believe that there needs to be either increased annotation or the inclusion of base tabular data. Using my PySlopegraph code (forgot to mention the name change), I worked up a version of Jon’s visualization that I believe provides a clean, honest view of the data (click for larger view):

Because the chart is still based on the percentages that are fairly precise:

  1. "Coconuts, Brazil nuts, cashews",17.93,0.93
    
  2. Coffee,12.93,3.91
    
  3. Fish,7.89,5.04
    
  4. Tobacco,7.25,3.19
    
  5. Gold,6.62,18.63
    
  6. Tea,4.14,1.32
    
  7. Cotton,4.01,1.36
    
  8. Cloves,3.58,0.29
    
  9. Diamonds,3.44,0.58
    
  10. Mounted stones,2.44,1.5
    
  11. Vegetables,1.61,1.73
    
  12. Wheat,0.54,1.38
    
  13. "Precious metal ores",0,6.76

I finally added an option to the PySlopegraph configuration file for rounding (NOTE: rounding != true binning). If you add the “round_precision” option with a value that works as the little-known second parameter to Python’s round function (arbitrary positional rounding), the values will be rounded to a given decimal place or to the tens/hundreds/etc. place, which helps with scaling issues but will also group items (in ways that you may not have originally intended).

For this chart, a value of “1” (first-decimal-place precision; use negative values to round on the whole-integer side of the decimal) still leaves the chart unreadable due to the scale that precision imposes, so I ended up using the nearest-whole-integer rounding option (a value of “0”) and also included the table of actual values, along with annotating the “rate of change” nature of the slopes.
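If that second parameter to round is new to you, here’s what the different precision values do in plain Python (the same value you would hand to “round_precision”):

    # round()'s second argument picks the digit position to round to.
    print(round(17.93, 1))     # 17.9     -> first decimal place (round_precision = 1)
    print(round(17.93, 0))     # 18.0     -> nearest whole number (round_precision = 0)
    print(round(1234567, -3))  # 1235000  -> nearest thousand (negative values round
                               #             on the whole-integer side of the decimal)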

This (again) defeats the “no wasted ink (pixels?)” component of Tufte’s original creation, but I believe it’s necessary for some types of slopegraphs to ensure the chart can stand on its own. I’m definitely becoming more convinced that many slopegraphs are better suited for an interactive visualization where you can encode more information in rollovers/popups/etc and allow for switching views between percentage, power-law ranking, or raw numeric comparison.

For those interested in playing with this particular data set, it’ll be included in the next github code push, which will also include the rounding feature.