Archive for the ‘Metrics’ Category

Data-Driven Security (The Book) Update #ShamelessSelfPromotion

If I made a Venn diagram of the cross-section of readers of this blog and the Data Driven Security web sites it might be indistinguishable from a pure circle. However, just in case there are a few stragglers out there, I figured one more post on the fact that the new book by @jayjacobs & me is available now in electronic form (not pre-order) wouldn’t hurt. The print book is still making its way from dead trees to store shelves and should be ready for the expected February 17th debut.

Here’s the list of links to e-tailers (man, I hate that term) who have it available for the various e-readers out there.

If you happen to catch it out in the wild and not on this list, drop me (@hrbrmstr) a note #pls.

And, a huge thank you! to everyone for their kind accolades yesterday (esp to those who’ve purchased the book :-)

Speaking At RSA Conference 2013!

Earlier this week, @jayjacobs & I both received our acceptance notice for the talk we submitted to the RSA CFP! [W00t!] Now the hard part: crank out a compelling presentation in the next six weeks! If you’re interested at all in doing more with your security data, this talk is for you. Full track/number & details below:

Session Track: Governance, Risk & Compliance
Session Code: GRC-T18
Scheduled Date: 02/26/2013
Scheduled Time: 2:30 PM – 3:30 PM
Session Length: 1 hr
Session Title: Data Analysis and Visualization for Security Professionals
Session Classification: Intermediate
Session Keywords: metrics, visualization, risk management, research
Short Abstract: You have a deluge of security-related data coming from all directions and may even have a fancy dashboard full of pretty charts. However, unless you know the right questions to ask and how to ask them, all you really have are compliance artifacts. Move beyond the checkbox and learn techniques for collecting, exploring and visualizing the stories within our security data.

DIY ZeroAccess GeoIP Analysis : So What?

NOTE: A great deal of this post comes from @jayjacobs as he took a conversation we were having about thoughts on ways to look at the data and just ran like the Flash with it.

Did you know that – if you’re a US citizen – you have approximately a 1 in 5 chance of getting the flu this year? If you’re a male (no regional bias for this one), you have a 1 in 400 chance of developing Hodgkin’s Disease and a 1 in 5,000 chance of dying from testicular cancer.

Moving away from medical stats, if you’re a NJ resident, you have a 1 in 1,000 chance of winning $275 in the straight “Pick 3” lottery and a 1 in 13,983,816 chance of jackpotting the “Pick 6”.
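Both lottery figures fall out of simple counting. Here is a quick sanity check in R, assuming the Pick 6 draws 6 numbers from a pool of 49 (the pool size is an assumption; it is not stated above):

```r
# Assumption: "Pick 6" draws 6 numbers from a pool of 49 (pool size not stated in the post)
pick6_combos <- choose(49, 6)

# Straight "Pick 3": three digits, each 0-9, order matters
pick3_combos <- 10^3

# Express both as "1 in N" odds
cat("Pick 3 straight: 1 in", format(pick3_combos, big.mark = ","), "\n")
cat("Pick 6 jackpot:  1 in", format(pick6_combos, big.mark = ","), "\n")
```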

What does this have to do with botnets? Well, we’ve determined that – if you’re a US resident – you have a 1 in 6,000 chance of getting the ZeroAccess flu (or winning the ZeroAccess lottery, whichever makes you feel better). Don’t believe me? Let’s look at the data.

For starters, we’re working with this file which is a summary file by US state that includes actual state population, the number of internet users in that state and the number of bots in that state (data is from Internet World Statistics). As an example, Maine has:

  • 1,332,155 residents
  • 1,102,933 internet users
  • 219 bot infections

(To aspiring security data scientists out there, I should point out that we’ve had to gather or crunch through on our own much of the data we’re using. While @fsecure gave us a great beginning, there’s no free data lunch)

Where’d we get the 1 : 6000 figure? We can do some quick R math and view the histogram and summary data:

# read in the summary data
df <- read.csv("zerogeo.csv", header=TRUE)
# calculate how many internet users there are per bot infection in each state:
df$per <- round(df$intUsers/df$bots)
# print the summary statistics (min, quartiles, median, mean, max)
summary(df$per)
# plot histogram of the spread
hist(df$per, breaks=10, col="#CCCCFF", freq=TRUE, main="Internet Users per Bot Infection")
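To make the per-state calculation concrete, here is a minimal sketch that runs it on the Maine figures quoted earlier. The column name `pop` and the `per100k` rate are illustrative additions, not part of the summary file as described:

```r
# Figures for Maine quoted above; "pop" is a hypothetical column name
maine <- data.frame(state = "Maine", pop = 1332155, intUsers = 1102933, bots = 219)

# Internet users per one bot infection (same calculation as df$per)
maine$per <- round(maine$intUsers / maine$bots)

# Same rate expressed as infections per 100,000 internet users
maine$per100k <- maine$bots / maine$intUsers * 1e5

print(maine)
```

For Maine this works out to roughly 1 infection per 5,036 internet users, a bit worse than the 1 : 6,000 overall figure.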

Along with the infection rate/risk, we can also do a quick linear regression to see if there’s a correlation between the number of internet users in a state and the infection rate of that state:

# "lm" is an R function that, amongst other things, can be used for linear regression
# so we use it to perform a quick regression on how internet users describe bot infections
users <- lm(df$bots~df$intUsers)
# and, R makes it easy to plot that model
plot(df$intUsers, df$bots, xlab="Internet Users", ylab="Bots", pch=19, cex=0.7, col="#3333AA")
abline(users, col="#3333AA")
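The plot shows the fit, but it helps to put a number on how strong the relationship is. The sketch below uses a tiny made-up data frame (not the ZeroAccess figures) purely so it can run on its own; the same calls would apply to the real `df`:

```r
# Hypothetical toy data (NOT the ZeroAccess figures), only to make the sketch runnable
toy <- data.frame(
  intUsers = c(500000, 1100000, 2500000, 4000000, 9000000),
  bots     = c(90, 219, 400, 700, 1500)
)

# Fit bots as a linear function of internet users
fit <- lm(bots ~ intUsers, data = toy)

# R-squared: share of the variance in bots explained by internet users
r2 <- summary(fit)$r.squared

# For a one-predictor model, R-squared equals the squared Pearson correlation
r <- cor(toy$intUsers, toy$bots)

cat("R-squared:", round(r2, 3), " correlation:", round(r, 3), "\n")
```

An R-squared close to 1 is what would back up a claim of a "very strong" relationship; `summary(fit)` also prints the coefficient estimates and their p-values.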

Apart from some outliers (more on that in another post), there is – as Jay puts it – “very strong (statistical) relationship between the population of internet users and the infection rate in the states.” Some of you may be saying “Duh?!” right about now, but all we’ve had up until this point are dots or colors on a map. We’ve taken that superficial view (yes, it’s just really eye candy) and given it some depth and meaning.

We’re pulling some demographic data from the US Census and will be doing another data summarization at the ZIP code level to see what other aspects are worth exploring (I’m really focused on analyzing median income by ZIP code to see if/how that describes bot presence).

If you made it this far, I’d really like to know what you would have thought the ZeroAccess “flu” chances were before seeing that it’s 1 : 6,000 (since your guesstimate was probably based on the map views).

Finally, Jay used the summary data to work up a choropleth in R:

# setup our environment
library(ggplot2)
library(maps)
# read the data
zero <- read.csv("zerogeo.csv", header=TRUE)
# extract state geometries from maps library
states <- map_data("state")
# this "cleans up the data" to make it easier to merge with the built in state data
# assumption: perBot is internet users per bot, computed as in the histogram code
zero.clean <- data.frame(region=tolower(zero$state),
                         perBot=round(zero$intUsers/zero$bots))
choro <- merge(states, zero.clean, sort = FALSE, by = "region")
choro <- choro[order(choro$order),]
# "bin" the data to enable us to use a better set of colors
choro$botBreaks <- cut(choro$perBot, 10)
# get the plot
c1 = qplot(long, lat, data = choro, group = group, fill = botBreaks, geom = "polygon",
      main="Population of Internet Users to One Zero Access Botnet Infection")
# display it with modified color scheme (we hate the default ggplot2 blue)
c1 + scale_fill_brewer(palette = "Reds")
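The binning step deserves a closer look, since it decides which states share a color. `cut()` with a single number makes equal-width bins; a quantile-based split is a common alternative. The values below are hypothetical (only 5036 comes from the Maine figures) and the variable names are illustrative:

```r
# Hypothetical "internet users per bot" values; 5036 is the Maine figure, the rest are made up
per_bot <- c(2500, 4000, 5036, 5500, 6000, 6500, 7000, 9000)

# Equal-width bins, which is what cut(x, n) does when given a single number
equal_width <- cut(per_bot, 4)

# Alternative: quantile-based bins, so each color covers about the same number of states
quartiles <- quantile(per_bot, probs = seq(0, 1, by = 0.25))
equal_count <- cut(per_bot, breaks = quartiles, include.lowest = TRUE)

print(table(equal_width))
print(table(equal_count))
```

Equal-width bins keep the legend easy to read but can leave a single outlier state alone in its own color; quantile bins spread states evenly across colors at the cost of uneven intervals.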

2012 WhiteHat Security Website Security Statistics Report Redux

This is an inaugural post for @MetricsHulk, on the condition that there are few – if any – “ALL CAPS” bits. Q3&4 tend to be “report season”, and @MetricsHulk usually has some critiques, praises, opines and suggestions (some smashes, too) to offer as we are inundated with a blitz of infographics.

The always #spiffy @WhiteHatSec released their 2011 Web Site Security stats report [direct link (PDF)] last week (here’s one of their teaser tweets):

With over 7,000 sites and hundreds of diverse organizations represented in the report, it is a great resource for folks to see how they stack up (more on that in a bit). Security folks should also take some encouragement from the report since:

  • Real vulnerabilities are down (significantly)
  • WAFs can help
  • Vulnerabilities are getting fixed faster (when found)

@WhiteHatSec does a fine job summarizing key & extended findings (hint: read the report), and they are awesomely up-front and honest with regard to the findings (see pages 4 & 5 for their analysis on why the ‘good stats’ might be so good).

The report is chock-full of data. Real. Data. The only way it could have been better data-wise is if they provided a Google Docs bundle of raw numbers. (NOTE: I didn’t get all the data in there, but it has a decent amount from the report)

I do think there is some room for improvement. Take, for example, the – sigh – donut chart on page 9. I might be inclined to refrain from comment if this was one of those hipster infographics that seem to be everywhere these days. A pie chart isn’t much better, but at least we’re able to process the relative sizes a bit better when the actual angles are present. Here’s a before/after makeover for your comparison/opine:

We get an immediate sense of scale from the bars and it removes the need for the “Frosted Lucky Charms” color-wheel effect. The @WhiteHatSec folk use bars (very appropriately) almost everywhere else, so I’m not sure what the design decision was for deviating for this part of the report.

The next bit that confused me was Figure 18 (page 15). I’m having difficulty both figuring out where the “79” value comes from (I can’t get to it by averaging the values present) and grok’ing the magnitude of the differences from the bubbles. So, here’s another before/after makeover for your comparison/opine:

Finally, I think Figure 23 & 24 could do with a bit of a slopegraph makeover, as the spirit of the visualization is to show year-over-year differences. The first two slopegraphs used the “Tufte binning technique“, so you’ll need to refer to the companion data tables if you want exact numbers for comparison (the trend is more important, IMO).

Average Days Open

Average Days to Close

Remediation Rates by Year

(You can also download easier to read PDFs of the slopegraphs)

Absolutely no one should take the makeover suggestions as report slander. As stated at the beginning of the post, @WhiteHatSec is open about the efficacy of their data and analysis, plus they provide actual data. The presentation of stats & trending by industry and vulnerability type should help any organization with an appsec program figure out if they are doing better or worse than the others in their sector and see if they are smashing bugs with similar success. It also gives the general infosec community a view that we would otherwise not have. I would encourage other organizations to follow @WhiteHatSec’s example, even if it means more donut charts (mmm…donuts).

What information did you glean from the WhiteHat report, or what makeovers would you encourage for the next one?

Google Spreadsheet “importHTML” Rocks For Quick Analytics

I usually take a peek at the Internet Traffic Report (ITR) a couple times a day as part of my routine and was a bit troubled by all of the red today:

I wanted to do some crunching on the data, and I deliberately do not have Word or Excel on my new MacBook Pro (for reasons I can detail if asked). A SELECT / CUT / PASTE into TextWrangler did not really thrill me and I knew there had to be a way to get non-marked-up, columnar data into a format I could mangle and share easily.

Enter, Google Spreadsheet’s importHTML function.

If you don’t have the formula bar enabled in Google Spreadsheets, just go to View->Formula Bar to enable it. Once there, enter the following in the formula bar to get the data from the ITR into a set of columns that will auto-update every time you reference the sheet.


(as you can see, it’s not case sensitive, either)

Yes, I know Excel can do this. I could have done a quick script to whack the pasted data in TextWrangler. You can do something similar in R with htmlTreeParse + xpathApply and Perl has HTML::TableContentParser (and other handy modules), but this was a fast, easy way to get me to a point where I could do the basic analytics I wanted to perform (and, sometimes, all you need is quick & easy).
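For the curious, the R route might look roughly like the sketch below. It assumes the XML package is installed and parses a small inline HTML table rather than the live ITR page; the table contents and column names are placeholders, not the ITR's actual layout:

```r
library(XML)

# Placeholder HTML standing in for a page with one data table (not the real ITR markup)
html <- "<html><body><table>
<tr><th>Router</th><th>Index</th></tr>
<tr><td>example-a</td><td>95</td></tr>
<tr><td>example-b</td><td>40</td></tr>
</table></body></html>"

# Parse the markup into an internal document that XPath queries can run against
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# For every table row that has data cells, pull out the text of each cell
rows <- xpathApply(doc, "//table//tr[td]",
                   function(tr) xpathSApply(tr, "./td", xmlValue))

# Stack the rows into a data frame and take the column names from the header cells
itr <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(itr) <- xpathSApply(doc, "//table//tr/th", xmlValue)
itr$Index <- as.numeric(itr$Index)

print(itr)
```

Passing a URL instead of the inline string (and dropping `asText`) would be the equivalent of pointing importHTML at a live page.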

Official Google Help page on importHTML.

Businessweek Infographic Illustrates The Pounding We Took In 2011

Another #spiffy tip from @MetricsHulk:


Evan Applegate put together a great & simple infographic for Businessweek that illustrates the number and size of 2011 data breaches pretty well.


The summary data (below the timeline bubble chart) shows there was a 37.4% increase in reported incidents and over 260 million records exposed/stolen for the year. It will be interesting to see how this compares with the DBIR.

Improve Your Security Metrics For $14.00USD

IT Security Metrics : A Practical Framework for Measuring Security & Protecting Data has solid reviews by Richard Bejtlich (@TaoSecurity), David J. Elfering (@icxc) & Dr. Anton Chuvakin (@anton_chuvakin), amongst others. You can get it (for a short time) for just about fourteen Washingtons by doing the following.

First, go to this Amazon link and enter “ETXTBOOK” (no quotes) as the code, and you’ll get a credit of $10.00USD for Amazon Kindle textbooks. That credit expires on January 9th, 2012, btw.

Now, if you view IT Security Metrics : A Practical Framework for Measuring Security & Protecting Data on Amazon and order it (again, by January 9th, 2012), it will cost you a whole ~$14.00USD.

Micropwns :: Risk Microprobabilities for Infosec?

NOTE: This is a re-post from a topic I started on the SecurityMetrics & SIRA mailing lists. Wanted to broaden the discussion to anyone not on those (and, why aren’t you on them?)

I had not heard the term micromort prior to listening to David Spiegelhalter’s Do Lecture and the concept of it really stuck in my (albeit thick) head all week.

I didn’t grab the paper yet, but the abstract for “Microrisks for Medical Decision Analysis” seems to be able to extrapolate directly to the risks we face in infosec:

“Many would agree on the need to inform patients about the risks of medical conditions or treatments and to consider those risks in making medical decisions. The question is how to describe the risks and how to balance them with other factors in arriving at a decision. In this article, we present the thesis that part of the answer lies in defining an appropriate scale for risks that are often quite small. We propose that a convenient unit in which to measure most medical risks is the microprobability, a probability of 1 in 1 million. When the risk consequence is death, we can define a micromort as one microprobability of death. Medical risks can be placed in perspective by noting that we live in a society where people face about 270 micromorts per year from interactions with motor vehicles. Continuing risks or hazards, such as are posed by following unhealthful practices or by the side-effects of drugs, can be described in the same micromort framework. If the consequence is not death, but some other serious consequence like blindness or amputation, the microrisk structure can be used to characterize the probability of disability. Once the risks are described in the microrisk form, they can be evaluated in terms of the patient’s willingness-to-pay to avoid them. The suggested procedure is illustrated in the case of a woman facing a cranial arteriogram of a suspected arterio-venous malformation. Generic curves allow such analyses to be performed approximately in terms of the patient’s sex, age, and economic situation. More detailed analyses can be performed if desired. Microrisk analysis is based on the proposition that precision in language permits the soundness of thought that produces clarity of action and peace of mind.”
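Since a microprobability is just a probability scaled by one million, converting between the two is simple arithmetic. Here is a small sketch using the 270 micromorts per year figure from the abstract; the helper function names are made up for illustration:

```r
# A microprobability is a probability of 1 in 1 million
micro_to_prob <- function(micro) micro / 1e6
prob_to_micro <- function(p) p * 1e6

# 270 micromorts per year from motor vehicles (figure quoted in the abstract)
p_vehicle <- micro_to_prob(270)

# The same risk expressed as "1 in N"
one_in_n <- round(1 / p_vehicle)

cat("Annual probability:", p_vehicle, "\n")
cat("Roughly 1 in", one_in_n, "\n")
```

So 270 micromorts is an annual probability of 0.00027, or roughly 1 in 3,704.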

When my CC is handy and I feel like giving up some privacy I’ll grab the whole paper, but the correlations seem pretty clear from just that bit.

I must have missed Schneier’s blog post about it earlier this month where he links to which links to (apologies for the link leapfrogging, but it provides background context that I did not have prior).

At a risk to my credibility, I’ll add another link to a Wikipedia article that lists some actual micromorts and include a small sample here:

Risks that increase the annual death risk by one micromort, and their associated cause of death:
  • smoking 1.4 cigarettes (cancer, heart disease)
  • drinking 0.5 liter of wine (cirrhosis of the liver)
  • spending 1 hour in a coal mine (black lung disease)
  • spending 3 hours in a coal mine (accident)
  • living 2 days in New York or Boston (air pollution)

I asked on Twitter if anyone thought we had an equivalent – a “micropwn“, say – for our discipline. Do we have enough high level data to produce a generic micropwn for something like:

  • 1 micropwn for every 3 consecutive days of missed DAT updates
  • 1 micropwn for every 10 Windows desktops with users with local Administrator privileges
  • 1 micropwn for every 5 consecutive days of missed IDS/IDP signature updates
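If rates like these existed, tallying exposure for an environment would be straightforward. The sketch below uses the hypothetical rates from the list above and assumes the exposures simply add up, which is a simplification; the function and argument names are invented for illustration:

```r
# Rates taken from the hypothetical list above; they are illustrative, not measured
# Assumption: exposures add up linearly and independently
micropwns <- function(missed_dat_days = 0, admin_desktops = 0, missed_ids_days = 0) {
  missed_dat_days / 3 + admin_desktops / 10 + missed_ids_days / 5
}

# Example: 6 days of stale DATs, 50 local-admin desktops, 10 days of stale IDS signatures
total <- micropwns(missed_dat_days = 6, admin_desktops = 50, missed_ids_days = 10)

cat("Estimated exposure:", total, "micropwns\n")
```

The arithmetic is the easy part; the hard part is whether we have the data to justify the denominators.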

Just like with the medical side of things, the micropwn calculation can be increased depending on the level of detail. For example (these are all made up for medicine):

  • 1 micromort for smoking 0.5 cigarettes if you are an overweight man in his 50’s
  • 1 micromort for smoking 0.25 cigarettes if you are an overweight man in his 50’s with a family genetic history of lung cancer

(again, I don’t have the paper, but the abstract seems to suggest this is how medical micromorts work)

Similarly, the micropwn calculation could get more granular by factoring in type of industry, geographic locations, breach history, etc.

Also, a micropwn (just like micromort) doesn’t necessarily mean “catastrophic” breach (I dislike that word as I think of it as a broad term when most folks associate it directly with sensitive record loss). Could mean successful malware infection in my view.

So, to further refine the question I originally posed on Twitter: Do we have enough broad data to provide input for micropwn calculations and can we define a starter-list of micropwns that would prove valuable in helping articulate risk within and outside our discipline?

Metricon: Verification versus Validation

Speaker: Jennifer Bayuk


Based on work for Stevens Institute of Technology.

How do professional systems engineers work?


  1. Mainframe
  2. physical security (punch cards)
  3. cables to terminals
  4. network to workstations (some data moves there & on floppies) *spike in misuse & abuse
  5. modems and dedicated links to external providers/partners
  6. added midrange servers (including e-mail)
  7. added dial-back procedures to modem
  8. e-mail & other issues begat firewalls
  9. firewalls begat the “port 80” problem
  10. modems expanded to the remote access issue
  11. remote access issue begat multi-factor auth
  12. then an explosion of midrange begat more malware
  13. internal infestation from web sites & more e-mail
  14. added proxy servers
  15. made anti-virus ubiquitous
  16. kicked in SSL on web servers that now host critical biz apps
  17. (VPN sneaks in for vendors & remote access)
  18. more customers begat identity management
  19. increasing attacks begat IDS
  20. formalized “policies” in technical security enforcement devices
  21. now we have data & access everywhere, begets log management
  22. data loss begat disk encryption on servers & workstations
  23. increasingly common app vulns begat WAFs


Reference: Stevens Inst. “systems thinking”

Use systemogram to show what systems are supposed to do (very cool visualization for differing views of “security systems thinking”)

Applied that systemogram model to a real-world example: the Stevens school computer lab


Shows the “Vee Model” (her diagram is more thorough – GET THE PRESENTATION)


Advantages of this approach include:

  • Manage complexity
  • Top-down requirements tracing
  • Black box modeling
  • Logical flow analysis
  • Documentation
  • Peer review
  • Detailed Communication

Must advance and move beyond threat->countermeasure insidious cycle.


Traditional requirements process involves gathering functional requirements, interface definition and system-wide “ilities” – need to get it in before the interface level (high-level “black box”)

The major vulnerabilities are at the functional decompositional level

Many security vulns are introduced at the interface level as well

Unfortunately, it’s usually put at the system-wide level (as they do with availability, etc.)


What Do Security Requirements Look Like Today?

  • Functional – what is necessary for mission assurance
  • Nonfunctional: what is necessary for system survival
  • V&V: what is necessary to ensure requirements are met


V&V: Verification: did we build it right? Validation: did we build the right thing? (akin to correctness & effectiveness)

There are more similarities than system architects really want to believe or understand.


Much of security metrics are really verification vs validation


Validation Criteria

  • content
  • face
  • criterion
  • construct
