Speaker: Chris Eng / Veracode
Every major infosec company publishes quarterly/yearly summary reports. Some are based on surveys, some on real captured data.
Recognizing the Narrative
Every fancy-looking infosec metrics report is a marketing vehicle. Each has a different perspective and there is no consistency between them, but you can often figure out the framing by looking at the exec summary or ToC; other times it takes real digging. You need to understand "what they are selling". The text in the report is there to back up the narrative.
Veracode Report Narrative
- More than half of all software failed to achieve an acceptable level of security
- 3rd party apps had the lowest security quality
- No single method of testing is accurate
(goal: use Veracode to analyze third party apps :-)
Trustwave Report Narrative
- 2010 incident response investigations
- attack vector evolution
- 11 strategic initiatives for 2011
(goal: “we can help…we are good at this stuff”)
WhiteHat Report Narrative
- Which web programming languages are most secure
(differs in goal from previous WH reports)
Bottom line: try to understand the framing & goal when reviewing the narrative
Using Stats Responsibly
Sample distribution review/discussion
Normal distribution curves can still vary (in standard deviation and average), but the overall shape remains the same
Bimodal distribution (two peaks): you may miss it entirely if you report only averages
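A minimal sketch of the bimodal point, using invented flaw counts (not data from any report): both samples below have the same mean and median, and only the spread and a simple histogram reveal that one of them has two peaks. The bin width of 5 is an arbitrary choice for illustration.

```python
from statistics import mean, median, stdev

# Hypothetical flaw counts per application (illustrative only, not report data).
unimodal = [8, 9, 10, 10, 10, 11, 12]
bimodal = [1, 2, 2, 10, 18, 18, 19]

def summarize(data):
    """Return (mean, median, sample stdev) with the stdev rounded for display."""
    return mean(data), median(data), round(stdev(data), 2)

def histogram(data, bin_width=5):
    """Count values per bin so that multiple peaks become visible."""
    counts = {}
    for x in data:
        start = (x // bin_width) * bin_width  # left edge of the bin
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

for name, data in [("unimodal", unimodal), ("bimodal", bimodal)]:
    print(name, summarize(data), histogram(data))
```

Both rows report a mean and median of 10; a report that stops at the average would describe these two populations identically.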
[game: Guess the Report Jeopardy! used primarily to show the pervasiveness of the use of averages]
[side-talk: discussion about different distributions by different sources]
(/me: this is very interesting)
Would a table of # of flaws per 1K lines of code per language be enough?
Would adding the 1st quartile, median and 3rd quartile provide more insight?
Will this help understand the anomalies? Will it help prioritize?
How do we ensure normalized data for comparison?
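The questions above can be sketched in a few lines: normalize raw flaw counts by application size (flaws per 1K lines of code), then report n, Q1, median and Q3 per language instead of a single average. Everything here is a hypothetical illustration: the scan records and the `flaws_per_kloc` / `summarize_by_language` helpers are invented, and the sketch assumes every app's line count was measured the same way (itself a contested choice, as the side-talk notes).

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical scan results: (language, flaw_count, lines_of_code).
# Invented for illustration; not Veracode (or anyone's) data.
SCANS = [
    ("Java", 12, 40_000), ("Java", 3, 25_000), ("Java", 30, 50_000), ("Java", 8, 16_000),
    ("PHP", 40, 20_000), ("PHP", 5, 10_000), ("PHP", 90, 30_000), ("PHP", 2, 8_000),
]

def flaws_per_kloc(flaws, loc):
    """Normalize a raw flaw count by application size (flaws per 1,000 lines)."""
    if loc <= 0:
        raise ValueError("lines_of_code must be positive")
    return flaws / (loc / 1000)

def summarize_by_language(scans):
    """Group normalized densities by language and report n, Q1, median, Q3."""
    by_lang = defaultdict(list)
    for lang, flaws, loc in scans:
        by_lang[lang].append(flaws_per_kloc(flaws, loc))
    table = {}
    for lang, densities in sorted(by_lang.items()):
        # quantiles(..., n=4) returns the three quartile cut points.
        q1, q2, q3 = quantiles(densities, n=4)
        table[lang] = {"n": len(densities), "q1": q1, "median": q2, "q3": q3}
    return table

for lang, row in summarize_by_language(SCANS).items():
    print(lang, row)
```

Reporting n alongside the quartiles also makes it obvious when a language's row rests on only a handful of apps.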
[side-talk: what’s a “line of code”…same problem in app bug analysis]
[side-talk: Truth in stats: “What’s the question? What matters?”]
Can you overdo it? Yes.
Power analysis can be used to determine the sample size required for statistically significant results, i.e. to keep the probability of error at an acceptable level.
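As one concrete instance (the talk did not specify a method), here is a sketch of the textbook normal-approximation sample-size formula for comparing two proportions with a two-sided test, without continuity correction. The scenario is hypothetical: say you want to tell whether 50% vs 60% of apps fail a security policy in two groups, with alpha = 0.05 and 80% power.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size to detect p1 vs p2 with a two-sided z-test.

    Normal-approximation formula, no continuity correction.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("need 0 < p1, p2 < 1 and p1 != p2")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile for the desired power (1 - beta)
    p_bar = (p1 + p2) / 2               # pooled proportion under the null
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)  # round up to whole apps

print(sample_size_two_proportions(0.50, 0.60))
```

For this scenario the formula returns 388 apps per group; a smaller difference or a higher power target pushes the number up quickly, which is exactly why thin slices of a dataset often can't support a claim.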
Should you really include non-statistically significant data? “To asterisk or not to asterisk?”
It’s hard to un-see something after you see it (/me: good point)
[side-talk: show cell counts as well as percentages; don't use a bar chart when a crosstab is more useful]
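The crosstab advice (and the "to asterisk or not" question) can be sketched together: each cell shows its count and row percentage, and rows below a minimum size get an asterisk. The records are invented, and `MIN_ROW_N = 30` is an assumed rule-of-thumb threshold, not something from the talk; a power analysis is the better way to pick it.

```python
from collections import Counter

# Hypothetical (row, column) observations, e.g. (industry, policy result).
# Invented for illustration only.
RECORDS = ([("Finance", "pass")] * 30 + [("Finance", "fail")] * 20
           + [("Retail", "pass")] * 2 + [("Retail", "fail")] * 6)

MIN_ROW_N = 30  # assumed threshold for flagging small rows; not from the talk

def crosstab(records):
    """Return (row_labels, col_labels, counts) for a list of (row, col) pairs."""
    counts = Counter(records)
    rows = sorted({r for r, _ in records})
    cols = sorted({c for _, c in records})
    return rows, cols, counts

def render(records, min_row_n=MIN_ROW_N):
    """Format cells as 'count (row %)' and asterisk rows with fewer than min_row_n."""
    rows, cols, counts = crosstab(records)
    lines = ["\t".join(["", *cols, "n"])]
    for r in rows:
        row_n = sum(counts[(r, c)] for c in cols)
        cells = [f"{counts[(r, c)]} ({100 * counts[(r, c)] / row_n:.0f}%)" for c in cols]
        flag = "*" if row_n < min_row_n else ""
        lines.append("\t".join([r + flag, *cells, str(row_n)]))
    lines.append(f"* n < {min_row_n}; treat percentages with caution")
    return "\n".join(lines)

print(render(RECORDS))
```

A bar chart of the percentages alone would show Retail's 75% failure rate as the headline; the counts and the asterisk show it rests on 8 apps.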
[side-talk: we should follow guidance from the social sciences in terms of how to present data for action]
Storytelling Via Omission
[side-talk: no report provided raw data]
What unwanted assumptions might result if the “wrong” data is included?
We need to provide access to raw data even though most consumers of the reports don't want it.
Veracode will open up its analytics platform to security researchers :: vercode.com/analytics
[side-talk: Every company that publishes a report needs to publish the name and contact info of their stats person, who can back up the process & data used]
[side-talk: is "truth" really what infosec companies want to promote in their reports? @alexhutton: isn't that #RSAC?]