
Category Archives: Breach

The basic technique of cybercrime statistics—measuring the incidence of a given phenomenon (DDoS, trojan, APT) as a percentage of overall population size—had entered the mainstream of cybersecurity thought only in the previous decade. Cybersecurity as a science was still in its infancy, as many of its basic principles had yet to be established.

At the same time, the scientific method rarely intersected with the development and testing of new detection & prevention regimens. When you read through that endless stream of quack cybercures published daily on the Internet and at conferences like RSA, what strikes you most is not that they are all, almost without exception, based on anecdotal evidence or woefully small samples. What’s striking is that they never apologize for the shortcoming. They never pause to say, “Of course, this is all based on anecdotal evidence, but hear me out.” There’s no shame in these claims, no awareness of the imperfection of the methods, precisely because it seems so eminently reasonable that the local observation of a handful of minuscule cases might serve as the silver bullet for cybercrime, if you look hard enough.


But, cybercrime couldn’t be studied in isolation. It was as much a product of the internet expansion as news and social media, where it was so uselessly anatomized. To understand the beast, you needed to think on the scale of the enterprise, from the hacker’s-eye view. You needed to look at the problem from the perspective of Henry Mayhew’s balloon. And you needed a way to persuade others to join you there.

Sadly, that’s not a modern story. It’s an adapted quote from chapter 4 (pp. 97-98, paperback) of The Ghost Map, by Steven Johnson, a book on the cholera epidemic of 1854.

I won’t ruin the book nor continue my attempt at analogy any further. Suffice it to say, you should read the book—if you haven’t already—and join me in calling for the John Snow of our cyber-time to arrive.

If you haven’t viewed/read Wendy Nather’s (@451Wendy) insightful [Living Below The Security Poverty Line](https://451research.com/t1r-insight-living-below-the-security-poverty-line) you really need to do that before continuing (we’ll still be here when you get back).

Unfortunately, the catalyst for this post came from two recent, real-world events: my renewed exposure to the apparently ever-increasing homelessness problem in San Francisco (a side effect of choosing a hotel 10 blocks away from Moscone) and the hacking of a [small, local establishment](http://www.tnhonline.com/works-bakery-customers-targeted-by-cyber-thieves-1.2988390#.UTMuF-tASS0) resulting in exposure of customer credit cards.

If you do any mom-and-pop, brick-and-mortar shopping you’ve seen it: the Windows-based point-of-sale terminal that is the *only* computer for the owners. Your credit card will be scanned on the same machine where cat videos are viewed and e-mail is read. In many small shops, that machine is also where accounting functions are performed.

These truly small business (TSB) owners aren’t living below the security poverty line, they are security hobos. They *kinda* know they need to care about the safety of their data, but their focus is on their business or creative processes. When they do have time to care about security, that part of their world is so complex that it’s far easier to ignore it than to do something about it. If your immediate reaction was to disagree with my complexity posit, here are just a few tasks a TSB owner must face in a world of modern commerce:

– Updating operating system patches
– Updating browser software
– Updating Flash
– Updating Java
– Maintaining web site/Twitter/Facebook presence securely
– Recognizing phishing e-mails/posts/tweets
– Understanding browser security
– Keeping signature anti-malware up-to-date
– Remembering passwords for system, POS vendor, government sites, e-mail, etc.
– Maintaining a secure Wi-Fi network and Internet firewall
– Maintaining physical security (e.g. cameras)

Those tasks may be as automatic as breathing for security folk and technically-savvy users, but they are extraneous tasks that are confusing for most TSBs and may often cause instability issues with the wretched POS software options out in the marketplace. These folks also cannot afford to hire security consultants to do this work for them.

Verizon’s 2012 DBIR & Trustwave’s 2012 report both showed that [these types of businesses](http://www.slate.com/articles/technology/technology/2012/03/verizon_s_data_breach_investigations_report_reveals_that_restaurants_are_the_easiest_target_for_hackers_.single.html#pagebreak_anchor_2) were part of the groups most targeted by criminals, yet the best our industry can do is dress up folks in schoolgirl costumes at @RSAConference whilst telling TSBs to keep their systems up-to-date and not re-use passwords. It’s the security equivalent of walking by a truly desperate person on the street without even making eye contact as your body language exudes the “get a job” vibe.

We have to do better than this.

Until software and hardware vendors start to—or are forced to—actually care about security, it will be up to security professionals to create the digital equivalent of a soup kitchen to make the situation better. What can you do?

– speak at local Chamber of Commerce meetings and provide practical take-aways for those who attend
– discuss security topics with friends or relatives who are TSB owners
– have your [ISSA|ISC2|NAISG] chapter set up a booth at conventions which attract TSBs (y’know…get out of the echo chamber, mebbe?)
– raise awareness through blogging and other media outlets
– produce & distribute awareness materials—a great example would be @Veracode’s non-domain [infographics](http://www.veracode.com/blog/category/infographics/)
– demand better (in general) out of your security vendors
– lobby government for better security standards

It may not seem like much, but we have to start somewhere if we’re going to find a way to help protect those who are most vulnerable, especially since it will also mean helping to keep *our own* information safe.

In case you are a truly small business owner who is reading this post, there are some things you can do to help ensure you won’t be a victim:

– Use a dedicated machine for your POS work—an iPad with [Square](https://squareup.com/) is a good option but doesn’t work for everyone
– Do not perform any Internet activities on the system you use for accounting tasks
– Use @1Password to create, store & manage all your passwords on all your systems/devices
– Use [Secunia PSI](http://secunia.com/vulnerability_scanning/personal/) to help keep your Windows systems up-to-date
– Set all operating system and anti-malware software to auto-update
– Do not put your security cameras on the Internet; if you do, password protect them
– Research what your responsibilities are and what actions you’ll need to take in the event you do discover that your business or customer information has been exposed

If you’re not on the SecurityMetrics.org mailing list you missed an interaction about the Privacy Rights Clearinghouse Chronology of Data Breaches data source started by Lance Spitzner (@lspitzner). You’ll need to subscribe to the list to see the thread, but one innocent question put me down the path of taking a look at the aggregated data with the intent of helping folks understand its overall utility/efficacy when trying to craft messages from it.

Before delving into the data, please note that PRC does an excellent job detailing source material for the data. They fully acknowledge some of the challenges with it, but a picture (or two) is worth a thousand caveats. (NOTE: Charts & numbers have been produced from January 20th, 2013 data).

The first thing I did was try to get a feel for overall volume:

Total breach record entries across all years (2005-present): 3573
Number of entries with ‘Total Records Lost’ filled in: 751
% of entries with ‘Total Records Lost’ filled in: 21.0%

Takeaway #1: Be very wary of using any “Total Records Breached” data from this data set.
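(For the curious, those top-line numbers fall out of a few lines of R run against a local download of the chronology CSV. The file name and column name below are my guesses at the export format, so treat this as a sketch rather than a recipe.)

[code light="true"]# Sketch: gauge how complete the "Total Records Lost" field is in the PRC export.
# The CSV file name and column name are assumptions about the download format.
prc <- read.csv("prc-breach-chronology.csv", stringsAsFactors = FALSE,
                check.names = FALSE)

total_entries <- nrow(prc)
has_records   <- sum(!is.na(prc[["Total Records Lost"]]) &
                     prc[["Total Records Lost"]] != "")

cat(total_entries, "entries;", has_records, "with a record count (",
    round(100 * has_records / total_entries, 1), "%)\n")[/code]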

It may help to see that computation broken down by reporting source over the years that the data file spans:

(chart: complete records by source across years)

This view also gives us:

Takeaway #2: Not all data sources span all years and some have very little data.

However, Lance’s original goal was to compare “human error” vs “technical hack”. To do this, he combined DISC, PHYS, PORT & STAT into one category (accidental/human :: ACC-HUM) and HACK, CARD & INSD into another (malicious/attack :: MAL-ATT). Here’s what that looks like when broken down across reporting sources across time:
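(Before the chart: the grouping itself is a one-liner in R. A minimal sketch, where the breach-type column name is my guess at the header; the mapping is exactly Lance's.)

[code light="true"]# Sketch: collapse PRC breach-type codes into Lance's two meta types.
# DISC/PHYS/PORT/STAT -> accidental/human; HACK/CARD/INSD -> malicious/attack.
meta_map <- c(DISC = "ACC-HUM", PHYS = "ACC-HUM", PORT = "ACC-HUM", STAT = "ACC-HUM",
              HACK = "MAL-ATT", CARD = "MAL-ATT", INSD = "MAL-ATT")

prc$meta_type <- unname(meta_map[prc[["Type of breach"]]])
prc$meta_type[is.na(prc$meta_type)] <- "UNKN"   # anything unmapped stays unknown

table(prc$meta_type)[/code]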

(chart: breach counts by meta-type by year, per reporting source)

This view provides another indicator that one might not want to place a great deal of faith in the PRC’s aggregation efforts. Why? It’s highly unlikely that DatalossDB had virtually no breach recordings in 2011 (in fact, it’s more than unlikely, it’s not true). Further views will show some other potential misses from DatalossDB.

Takeaway #3: Do not assume the components of this aggregated data set are complete.

We can get a further feel for data quality and also which reporting sources are weighted more heavily (i.e. which ones have more records, thus implicitly placing a greater reliance on them for any calculations) by looking at how many records they each contributed to the aggregated population each year:

(charts: records contributed per reporting source per year)
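(Those per-source, per-year counts are just a cross-tabulation. A sketch, with my guesses at the date format and the name of the reporting-source column:)

[code light="true"]# Sketch: how many records each reporting source contributes per year.
# The date format and "Information Source" column name are assumptions.
prc$year <- format(as.Date(prc[["Date Made Public"]], format = "%m/%d/%Y"), "%Y")

table(source = prc[["Information Source"]], year = prc$year)[/code]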

I’m not sure why 2008 & 2009 have such small bars for Databreaches.net and PHIPrivacy.net, and you can see the 2011 gap in the DatalossDB graph.

At this point, I’d (maybe) trust some aggregate analysis of the HHS (via PHI), CA Attorney General & Media data, but would need to caveat any conclusions with the obvious biases introduced by each.

Even with these issues, I really wanted a “big picture” view of the entire set and ended up creating the following two charts:

(charts: breaches by type by month, 2005-2013, per reporting source)

(You’ll probably want to view the PDF documents of each: [1] [2] given how big they are.)

Those charts show the number of breaches-by-type by month across the 2005-2013 span by reporting source. The only difference between the two is that the latter one is grouped by Lance’s “meta type” definition. These views enable us to see gaps in reporting by month (note the additional aggregation issue at the tail end of 2010 for DatalossDB) and also to get a feel for the trends of each band (note the significant increase in “unknown” in 2012 for DatalossDB).

Takeaway #4: Do not ignore the “unknown” classification when performing analysis with this data set.

We can see other data issues if we look at it from other angles, such as the state the breach was recorded in:

(charts: breaches by state)

We can see at least three issues (missing values and occurrences recorded outside the US) from this view, but it seems the number of breaches mostly aligns with population (discrepancies make sense given the lack of uniform breach reporting requirements).

It’s also very difficult to do any organizational analysis (I’m a big fan of looking at “repeat offenders” in general) with this data set without some significant data cleansing/normalization. For example, all of these are “Bank of America”:

[1] "Bank of America"                                                             
[2] "Wachovia, Bank of America, PNC Financial Services Group and Commerce Bancorp"
[3] "Bank of America Corp."                                                       
[4] "Citigroup, Inc., Bank of America, Corp."

Without any cleansing, here are the orgs with two or more reported breaches since 2005:

(apologies for the IFRAME but Google’s Fusion Tables are far too easy to use when embedding data tables)

Takeaway #5: Do not assume that just because a data set has been aggregated by someone and published that it’s been scrubbed well.

Even if the above sets of issues were resolved, the real details are in the “breach details” field, which is a free-form text field providing more information on who/what/when/where/why/how (with varying degrees of consistency & completeness). This is actually the information you really need. The HACK attribute is all well-and-good, but what kind of hack was it? This is one area where VERIS shines. What advice are you going to give financial services (BSF) orgs from this extract:

(charts: breach types by year for financial services (BSF) organizations)

HACKs are up in 2012 from 2010 & 2011, but what type of HACKs against what size organizations? Should smaller orgs be worried about desktop security, or should larger orgs start focusing more on web app security? You can’t make that determination without mining that free-form text field. (NOTE: as I have time, I’m trying to craft a repeatable text analysis function I can perform on that field to see what can be automatically extracted)
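(As a rough starting point for that, here is a sketch of keyword-flagging the free-form details field. The patterns are simplistic and the column name is my guess at the header.)

[code light="true"]# Sketch: flag common hack sub-types by keyword in the free-form details field.
details <- tolower(prc[["Description of incident"]])   # column name is an assumption

hack_flags <- data.frame(
  sqli     = grepl("sql injection", details),
  skimmer  = grepl("skimm(er|ing)", details),
  malware  = grepl("malware|trojan|keylogger", details),
  phishing = grepl("phish", details),
  webapp   = grepl("web ?site|web app", details)
)

colSums(hack_flags)   # rough counts of each sub-type mentioned in the text[/code]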

Takeaway #6: This data set is pretty much what the PRC says it is: a chronology of data breaches. More in-depth analysis is not advised without a massive clean-up effort.

Finally, hypothesizing that the PRC’s aggregation could have resulted in duplicate records, I created a subset of the records based solely on breach “Date Made Public” + “Organization Name” and then sifted manually through the breach text details; six duplicate entries were found. Interestingly enough, only one instance of duplicate records was found across reporting databases (my hunch was that DatalossDB or DataBreaches.NET would have had records that other, smaller databases also included; however, this particular duplicate detection mechanism does not rule that out given the quality of the data).
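(The candidate pull itself is the easy part; a sketch using the field names from the post. The manual sifting through the details text is where the real work is.)

[code light="true"]# Sketch: find rows sharing the same "Date Made Public" + "Organization Name",
# then eyeball the free-text details of each candidate group for true duplicates.
key  <- paste(prc[["Date Made Public"]], prc[["Organization Name"]], sep = "||")
cand <- prc[key %in% key[duplicated(key)], ]
cand <- cand[order(cand[["Date Made Public"]], cand[["Organization Name"]]), ]

nrow(cand)   # number of candidate rows to review by hand[/code]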

Conclusion/Acknowledgements

Despite the criticisms above, the efforts by the PRC and their sources for aggregation are to be commended. Without their work to document breaches we would only have the mega-media-frenzy stories and labor-intensive artifacts like the DBIR to work with. Just because the data isn’t perfect right now doesn’t mean we won’t get to a point where we have the ability to record and share this breach information the way the CDC does for diseases (which also isn’t perfect, btw).

I leave you with another column of numbers that shows—if broken down by organization type and breach type—there is an average of two breaches per org-type/breach-type combination per year (according to this data):

(The complete table includes the mean, median and standard deviation for each type.)
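(That table is a straightforward group-wise summary. A sketch of reproducing it, again with guessed column names and the year column derived earlier:)

[code light="true"]# Sketch: breaches per org-type/breach-type/year, then summary stats per combination.
counts <- aggregate(
  list(n = rep(1, nrow(prc))),
  by = list(org_type    = prc[["Type of organization"]],
            breach_type = prc[["Type of breach"]],
            year        = prc$year),
  FUN = sum
)

aggregate(counts$n,
          by  = list(org_type = counts$org_type, breach_type = counts$breach_type),
          FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x)))[/code]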

Lance’s final question to me (on the list) was “Bob, what do you recommend as the next step to answer the question – What percentage of publicly known data breaches are deliberate cyber attacks, and what percentage are human-based accidental loss/disclosure?”

I’d first start with a look at the DBIR (especially this year’s) and then see if I could get a set of grad students to convert a complete set of DatalossDB records (and, perhaps, the other sources) into VERIS format for proper analysis. If any security vendors are reading this, I guarantee you’d gain significant capital/accolades within the security practitioner community if you were to sponsor such an effort.

Comments, corrections & constructive criticisms are heartily welcomed. Data crunching & graphing scripts available both on request and perhaps uploaded to my github repository once I clean them up a bit.

UPDATE: I had to remove the Google Insight widgets and replace them with static images. There was inconsistent loading far too often in non-Chrome browsers. Click on the graphs to go to the Google Insights detail pages for more interaction with the data.

Information security breaches have been the “new black” for the past eighteen months, with the latest fashion trend being the LinkedIn passwords fiasco. This got me thinking: what is the “half-life” of a breach? It’s becoming obvious that users do not see the security of their information as a service differentiator or even a tier-one decision point when choosing to use a new social network or online application. (How many of you closed out your LinkedIn accounts?) But just how quickly does their attention wane from a breach event? Pretty quickly, if one formulates a conclusion based on Google Insights search data.
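(If you want to pull this kind of search-interest series programmatically rather than through the Insights UI, here is a hedged sketch using the gtrendsR package as a stand-in; the package and the structure of its result are assumptions on my part, and this is not how the charts below were produced.)

[code light="true"]# Sketch: pull relative search interest for a term over a window and eyeball the decay.
library(gtrendsR)   # assumed available; a stand-in for the Google Insights UI

res  <- gtrends(keyword = "linkedin password", time = "2012-05-01 2012-08-01")
hits <- res$interest_over_time   # assumed structure: one row per date with a "hits" column

plot(hits$date, hits$hits, type = "l",
     xlab = "date", ylab = "relative search interest")[/code]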

Let’s start with LinkedIn

We have a burst that – if one is generous – captures interest for about a week. Even more interesting is that it seems said interest was limited to very specific geographic regions:

Plus, the incident continues to help show the lack of impact breaches have on stock price:

But, LinkedIn is not exactly a broad-reaching service (i.e. it’s no Facebook).

Breaches Don’t Stop The Shopping

Investor exuberance notwithstanding, LinkedIn is kinda boring since folks use it to actually publish personal data to the world. While it has some private messaging and may hold some financial account information, it’s not like Zappos, which holds payment information and shopping history and which was also breached this year. How long did they get attention?

While there is a longer, flat tail, attention is still about seven days (you can interact with the chart and zoom in to verify that claim) and Zappos’ overall consumer interest does not seem to have waned:

Sownage Revisited

The word “Sony” is now almost synonymous with “breach” in the minds of most information security folk. It’s our “go to” example when talking with executives and application teams. Unfortunately, for the purposes of comparative analysis, it wasn’t just one breach. So, while the chart shows closer to a ten week interest period, that makes sense when one considers there were over ten news stories (one for each new breach):

I won’t go into the details as to why including a stock price chart has little efficacy in determining breach effect for Sony (it’s been analyzed to death), but a comparative look at “PlayStation” (with an added factor for “iPad”) shows (to me) that the breaches had far less impact on interest in the PlayStation (one of the main breach targets) than the iPad had:

Breaches Spook The Spooks

So, if breaches are of little interest to the consumer, they must have greater impact on the community that has some skin in the game, right? Kinda. If we look at the RSA & Lockheed breaches:

We see that the Lockheed breach kept attention from mid-April to about mid-July (12 weeks) and RSA spiked twice for about four weeks each time. Both of them were intertwined in the news and RSA had numerous (to be blunt) PR-events that helped keep focus on both.

RSA is part of EMC, so a stock view analysis has many other complexities that make it less than ideal, but both companies (EMC & Lockheed) did not seem to suffer from the extended initial breach interest:

Only One View

I mentioned at the beginning of the post that this was intended to be a single-factor analysis, limited to what insights Google gleans from what folks are searching for. It doesn’t provide a view into enterprise contractual agreements, service usage patterns or even blogger/social media sentiment analysis. Yet, folks search for what they are interested in and when I add a few parameters to the LinkedIn chart:

we see that people are far more interested in Scarlett Johansson, gas prices and even Snooki than they are in LinkedIn insecurity. Perhaps breaches just aren’t sexy enough or personally impactful enough to truly matter…even to security professionals.

I didn’t read through the Massachusetts 2011 Report on Data Breach Notifications [PDF] until recently, but once I went through the report my brain kept telling me “something is wrong”. Not something earth shattering, but more of a “something is off” signal. This happens more than I’d like as I tend to constantly background process what I intake visually.

As Twitter followers may lament, I have been known to transcribe useful tabular information from reports such as these, especially when I need to communicate them internally and I have done so with this report [gdocs] as well.

After working through the whole document, the last page of data is where I found the “off by one” error (see figure below). Someone performed “head math” vs copying & formatting from a spreadsheet. Never a good idea if you aren’t going to double-check the report thoroughly.

 

Off By One

My transcription (“Lost Stolen Misplaced” tab in the aforelinked workbook) assumes the “5” and “48” are correct and has the correct total (“53”). One of the problems when an error like this crops up is that you do not know where the error occurred, but since the sums of “12” and “277” are both correct in the spreadsheet and in the report, I think I’ve found the culprit. Unfortunately, a computational error such as this does cast suspicion on the accuracy of the rest of the report data.
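(A trivial sketch of the kind of cross-footing check that would catch this before publication, run against a transcription like mine; the file name and component column names are assumptions.)

[code light="true"]# Sketch: verify each row's reported total equals the sum of its component columns.
tab <- read.csv("lost-stolen-misplaced.csv", check.names = FALSE)  # my transcription

parts_sum <- rowSums(tab[, c("Electronic", "Paper")])   # assumed component columns
tab[which(parts_sum != tab[["Total"]]), ]               # rows needing a second look[/code]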

It’s a lesson report writers should heed well: compute twice, publish once. Errant data can cut as deeply as a saw blade.

While I Have Your Attention

Since there aren’t many visualizations in the Massachusetts 2011 Report on Data Breach Notifications (3D numbers do not count), here are a few I made that I found helpful during my interpretation (2011 data unless otherwise specified):

# Residents Impacted By Breach Org

Number Of Breaches By Org

Number of Breaches by Type 2008-2011

Residents Impacted By Breach Type

Lost/Stolen/Misplaced

Malicious/Non-Malicious

 

 

Another #spiffy tip from @MetricsHulk:

Evan Applegate put together a great & simple infographic for Businessweek that illustrates the number and size of 2011 data breaches pretty well.



The summary data (below the timeline bubble chart) shows there was a 37.4% increase in reported incidents and over 260 million records exposed/stolen for the year. It will be interesting to see how this compares with the DBIR.

I’m on a “three things” motif for 2012, as it’s really difficult for most folks to focus on more than three core elements well. This is especially true for web developers as they have so much to contend with on a daily basis, whether it be new features, bug reports, user help requests or just ensuring proper caffeine levels are maintained.

In 2011, web sites took more hits than they ever have and—sadly—most attacks could have been prevented. I fear that the pastings will continue in 2012, but there are some steps you can take to help make your site less of a target.

Bookmark & Use OWASP’s Web Site Regularly

I’d feel a little sorry for hacked web sites if it weren’t for resources like OWASP, tools like IronBee and principles like Rugged being in abundance, with many smart folks associated with them being more than willing to offer counsel and advice.

If you run a web site or develop web applications and have not inhaled all the information OWASP has to provide, then you are engaging in the Internet equivalent of driving a Ford Pinto (the exploding kind) without seat belts, airbags, doors and a working dashboard console. There is so much good information and advice out there with solid examples that prove some truly effective security measures can really be implemented in a single line of code.

Make it a point to read, re-read and keep-up-to-date on new articles and resources that OWASP provides. I know you also need to beat the competition to new features and crank out “x” lines of code per day, but you also need to do what it takes to avoid joining the ranks of those in DataLossDB.

Patch & Properly Configure Your Bootstrap Components

Your web app uses frameworks, runs in some type of web container and sits on top of an operating system. Unfortunately, vulnerabilities pop up in each of those components from time to time and you need to keep on top of those and determine which ones you will patch and when. Sites like Secunia and US-CERT aggregate patch information pretty well for operating systems and popular server software components, but it’s best to also subscribe to release and security mailing lists for your frameworks and other bootstrap components.

Configuring your bootstrap environment securely is also important and you can use the handy guides over at the Center for Internet Security and the National Vulnerability Database (which is also good for vulnerability reports). The good news is that you probably only need to double-check this a couple times a year and can also integrate secure configuration baselines into tools like Chef & Puppet.

Secure Data Appropriately

I won’t belabor this point (especially if you promise to read the OWASP guidance on this thoroughly) but you need to look at the data being stored and how it is accessed and determine the most appropriate way to secure it. Don’t store more than you absolutely need to. Hash passwords with something far stronger than a plain MD5 digest and encrypt other sensitive data. Don’t store any credit card numbers (really, just don’t); if you absolutely must, tokenize them (but you really don’t need to store them). Keep data off the front-end environment and watch the database and application logs with a service like Loggly (to see if there’s anything fishy going on).
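(To make the password point concrete, here is a minimal sketch using a bcrypt binding. It is shown in R only because that is what I have open; the package and functions are assumptions, and the same idea applies in whatever language your application is written in.)

[code light="true"]# Sketch: store a salted, slow bcrypt hash instead of a plain MD5 digest.
library(bcrypt)   # assumed available; any bcrypt/scrypt binding works the same way

stored_hash <- hashpw("correct horse battery staple", gensalt(12))

# At login time, compare the submitted password against the stored hash:
checkpw("correct horse battery staple", stored_hash)   # TRUE
checkpw("password123", stored_hash)                    # FALSE[/code]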

I’m going to cheat and close with a fourth resolution for you: Create (and test) a data breach response plan. If any security professional is being honest, it’s virtually impossible to prevent a breach if a hacker is determined enough and the best thing you can do for your user base is to respond well when it happens. The only way to do that is have a plan and to test it (so you know what you are doing when the breach occurs). And, you should run your communications plan by other folks to make sure it’s adequate (ping @securitytwits for suggestions for good resources).

You want to be able to walk away from a breach with your reputation as intact as possible (so you’ll have to keep the other three resolutions anyway) and with your users feeling fully informed and assured that you did everything you could to prevent it.

What other security-related resolutions are you making this year as a web developer or web site owner and what other tools/services are you using to secure your sites?

What can the @lulzsec senate.gov dump tell us about how the admins maintained their system/site?

[code light=”true”]SunOS a-ess-wwwi 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise[/code]

means they haven’t kept up with OS patches. [-1 patch management]

[code light=”true”]celerra:/wwwdata 985G 609G 376G 62% /net/celerra/wwwdata[/code]

tells us they use EMC NAS kit for web content.

The ‘last‘ dump shows they were good about using normal logins and (probably) ‘sudo‘, and used ‘root‘ only on the console. [+1 privileged id usage]

They didn’t show the running Apache version (just the config file…I guess I could have tried to profile that to figure out a range of version numbers). There’s a decent likelihood that it was not at the latest patch version (based on not patching the OS) or major vendor version.

[code light=”true”]Alias /CFIDE /WEBAPPS/Apache/htdocs/CFIDE
Alias /coldfusion /WEBAPPS/Apache/htdocs/coldfusion
LoadModule jrun_module /WEBAPPS/coldfusionmx8/runtime/lib/wsconfig/1/mod_jrun22.so
JRunConfig Bootstrap 127.0.0.1:51800[/code]

Those and other entries say they are running ColdFusion, an Adobe web application server/framework, on the same system. The “mx8” suggests an out-of-date, insecure version. [-1 layered product lifecycle management]

[code light=”true”] SSLEngine on
SSLCertificateFile /home/Apache/bin/senate.gov.crt
SSLCertificateKeyFile /home/Apache/bin/senate.gov.key
SSLCACertificateFile /home/Apache/bin/sslintermediate.crt[/code]

(along with the file system listing) suggests the @lulzsec folks have everything they need to host fake SSL web sites impersonating senate.gov.

Sadly,

[code light=”true”]LoadModule security_module modules/mod_security.so

<IfModule mod_security.c>
# Turn the filtering engine On or Off
SecFilterEngine On

# Make sure that URL encoding is valid
SecFilterCheckURLEncoding On

# Unicode encoding check
SecFilterCheckUnicodeEncoding Off

# Only allow bytes from this range
SecFilterForceByteRange 0 255

# Only log suspicious requests
SecAuditEngine RelevantOnly

# The name of the audit log file
SecAuditLog logs/audit_log

# Debug level set to a minimum
SecFilterDebugLog logs/modsec_debug_log    
SecFilterDebugLevel 0

# Should mod_security inspect POST payloads
SecFilterScanPOST On

# By default log and deny suspicious requests
# with HTTP status 500
SecFilterDefaultAction "deny,log,status:500"

</IfModule>[/code]

shows they had a built-in WAF available, but either did not configure it well enough or did not view the logs from it. [-10 checkbox compliance vs security]

[code light=”true”]-rw-r--r-- 1 cfmx 102 590654 Feb 3 2006 66_00064d.jpg[/code]

(many entries with ‘102’ instead of a group name) shows they did not do identity & access management configurations well. [-1 IDM]

The apache config file discloses pseudo-trusted IP addresses & hosts (and we can assume @lulzsec has the passwords as well).

As I tweeted in the wee hours of the morning, this was a failure on many levels since they did not:

  • Develop & use secure configuration of their servers & layered products + web applications
  • Patch their operating systems
  • Patch their layered products

They did have a WAF, but it wasn’t configured well and they did not look at the WAF logs or – again, most likely – any system logs. This may well have been a case where those “white noise port scans” everyone ignores were the intelligence probes that helped bring this box down.

Is this a terrible breach of government security? No. It’s a public web server with public data. They may have gotten to a firewalled zone, but it’s pretty much a given that no sensitive systems were on that same segment. This is just an embarrassment with a bit of extra badness in that the miscreants have SSL certs. It does show just how important it is to make sure server admins maintain systems well (note, I did not say security admins) and that application teams keep current, too. It also shows that we should be looking at all that log content we collect.

This wasn’t the first @lulzsec hack and it will not be the last. They are providing a good reminder to organizations to take their external network presence seriously.