Archive for the ‘Information Security’ Category

Secure360 (@Secure360) Data Analysis & Visualization Talk Resources #Sec360

Many thanks to all who attended the talk @jayjacobs & I gave at @Secure360 on Wednesday, May 15, 2013. As promised, here are the slides.

We’ve enumerated quite a bit of non-slide-but-in-presentation information that we wanted to aggregate into a blog post so you can vi[sz] along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at Coursera.

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up, so there may be some overlap with the ones below.

People Mentioned

Tools Mentioned

  • R : Jay & I probably use this a bit too much as a hammer (i.e. treat every data project as a nail) but it’s just far too flexible and powerful to not use as a go-to resource
  • RStudio : An amazing IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to set up a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating reproducible research a joy with built-in easy access to tools like knitr.
  • IPython : This version of Python takes an already amazing language and kicks it up a few notches. It brings it up to the level of R+RStudio, especially with its knitr-like IPython Notebooks for, again, reproducible research.
  • SecViz : Security-centric Visualization Site & Tools by @raffaelmarty
  • Mondrian : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster than typing R commands or fumbling in Excel.
  • Tableau : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
  • Processing : A tool that was designed from the ground up to help artists and designers create powerful, interactive data visualizations that you can slipstream directly onto the web via the Processing.js library.
  • D3 : The foundation of modern, data-driven visualization on the web.
  • Gephi : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
  • MongoDB : NoSQL database that’s highly & easily scalable without a steep learning curve.
  • CRUSH Tools by Google : Kicks up your command-line data munging.

Slopegraph As A Service

@adammontville posited that Figure 15 from this year’s DBIR could use some slopegraph love. As I am not one to back down from a reasonable challenge, I obliged.

Here’s the original chart (produced by @jayjacobs):

[Figure: original DBIR Figure 15 chart]

and, here’s a very quick slopegraph version of it:

[Figure: multi-column slopegraph version of DBIR Figure 15]

You can click on both/either for a larger version. If I had more time, I could have made the slopegraph version nicer, but it conveys a story fairly well the way it is, especially with the highlight on the two biggest changes between 2008 & 2012.

Two problems with the modified visualization are (a) multi-column slopegraphs blend into a parallel coordinate or plain old line graph pretty quickly (thus, reducing their slopegraph-y goodness); and, (b) the diversity of the year-over-year DBIR data set makes the comparison between years almost pointless (as the DBIR itself points out).

I also generated a proper/traditional slopegraph, comparing 2008 to 2012:

[Figure: traditional two-column slopegraph, 2008 vs. 2012]

The visualization is far more compact and, if the goal was to show the change between 2008 and 2012, it provides a much clearer view of what has and has not changed.
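For readers who want to try the two-column form themselves, here is a minimal sketch in plain Python that emits a slopegraph as an SVG string. It is not the code used to produce the charts above. The function name `slopegraph_svg`, its parameters, and the sample rows are all hypothetical, and the numbers are placeholders, not DBIR data.

```python
from xml.sax.saxutils import escape


def slopegraph_svg(rows, left_label, right_label, width=480, height=320, pad=40):
    """Return an SVG string for a two-column slopegraph.

    rows: list of (name, left_value, right_value) tuples (hypothetical input shape).
    """
    values = [v for _, a, b in rows for v in (a, b)]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when every value is equal

    def y(v):
        # Larger values are drawn nearer the top of the chart.
        return pad + (hi - v) / span * (height - 2 * pad)

    label_w = 140  # horizontal room reserved for the text on each side
    x_left, x_right = label_w, width - label_w
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<text x="{x_left}" y="{pad // 2}" text-anchor="end">{escape(left_label)}</text>',
        f'<text x="{x_right}" y="{pad // 2}" text-anchor="start">{escape(right_label)}</text>',
    ]
    for name, a, b in rows:
        ya, yb = y(a), y(b)
        # One sloped line per item, labelled with its value on both ends.
        parts.append(
            f'<line x1="{x_left}" y1="{ya:.1f}" x2="{x_right}" y2="{yb:.1f}" stroke="gray"/>'
        )
        parts.append(
            f'<text x="{x_left - 6}" y="{ya:.1f}" text-anchor="end">{escape(name)} {a}</text>'
        )
        parts.append(
            f'<text x="{x_right + 6}" y="{yb:.1f}" text-anchor="start">{b} {escape(name)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Placeholder values for illustration only (NOT taken from the DBIR).
sample_rows = [("Category A", 10, 40), ("Category B", 35, 30), ("Category C", 20, 20)]
svg = slopegraph_svg(sample_rows, "2008", "2012")
```

Saving `svg` to a file and opening it in a browser shows one line per item. Because both columns share a single vertical scale, the steepest lines are the biggest changes, which is the whole point of the form. Labels that land on nearly the same value will overlap in this sketch; a real implementation would need to nudge them apart.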

Wait Wait…Don’t Pwn Me! (SOURCE Boston) Transcript & Thank You!

For those who wanted to play along at home, I’ve cleaned up the text and made the Wait Wait…Don’t Pwn Me! closing segment of SOURCE Boston 2013 available for download [PDF]. The video crew had cameras running, so keep checking the @SOURCEconf web site as it’ll probably get posted as they crank through all of the conference session videos (give them time, tho, as there are a ton of vids to process).

I also wanted to, again, thank @selenakyle for her most excellent job playing Carl Kasell; the awesome panelists: @451Wendy, @innismir & @andrewsmhay; @joshcorman for—yet again—putting up with me picking on him (and getting all the questions right); and our volunteers: @ra6bit, @Gmanfunky (and three more who I need Twitter handles from :-).

I only hope that @petersagal & the WWDTM crew can forgive me if they ever read the transcript or view the video of the segment.

SOURCE Boston (@SOURCEConf) Data Analysis & Visualization Talk Resources #srcbos13

Many thanks to all who attended the talk @jayjacobs & I gave at @SOURCEconf on Thursday, April 18, 2013. As promised, here are the slides, which should be much less washed out than the projector version :-)

We’ve enumerated quite a bit of non-slide-but-in-presentation information that we wanted to aggregate into a blog post so you can viz along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at Coursera.

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up, so there may be some overlap with the ones below.

People Mentioned

Tools Mentioned

  • R : Jay & I probably use this a bit too much as a hammer (i.e. treat every data project as a nail) but it’s just far too flexible and powerful to not use as a go-to resource
  • RStudio : An amazing IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to set up a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating reproducible research a joy with built-in easy access to tools like knitr.
  • IPython : This version of Python takes an already amazing language and kicks it up a few notches. It brings it up to the level of R+RStudio, especially with its knitr-like IPython Notebooks for, again, reproducible research.
  • SecViz : Security-centric Visualization Site & Tools by @raffaelmarty
  • Mondrian : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster than typing R commands or fumbling in Excel.
  • Tableau : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
  • Processing : A tool that was designed from the ground up to help artists and designers create powerful, interactive data visualizations that you can slipstream directly onto the web via the Processing.js library.
  • D3 : The foundation of modern, data-driven visualization on the web.
  • Gephi : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
  • MongoDB : NoSQL database that’s highly & easily scalable without a steep learning curve.
  • CRUSH Tools by Google : Kicks up your command-line data munging.

Bahrain eGov Conference “Risk Reality” Slides

For those finding this post from the Bahrain eGov conference, I’d like to re-extend a hearty “Thank you!” for being one of the most engaging, interactive and intelligent audiences I’ve ever experienced. I truly enjoyed talking with all of you.

You can find the slides on my Dropbox [PDF] and please do not hesitate to bounce any questions here or on Twitter (@hrbrmstr).

Off to Bahrain

Thanks to a prod by @djbphaedrus, I’m off to the Bahrain International eGovernment Forum this week to host a two-hour workshop on “information risk reality”. As a result, blogging & tweeting will be at significantly reduced levels, so enjoy the brief respite from my blatherings while you can :-)

If you happen to be in Bahrain while I’m there, drop me a note and I’m sure I can find time between Tuesday night and Thursday afternoon to say hello!

A Wish for Snow in Spring

The basic technique of cybercrime statistics—measuring the incidence of a given phenomenon (DDoS, trojan, APT) as a percentage of overall population size—had entered the mainstream of cybersecurity thought only in the previous decade. Cybersecurity as a science was still in its infancy, as many of its basic principles had yet to be established.

At the same time, the scientific method rarely intersected with the development and testing of new detection & prevention regimens. When you read through that endless stream of quack cybercures published daily on the Internet and at conferences like RSA, what strikes you most is not that they are all, almost without exception, based on anecdotal or woefully inadequate evidence. What’s striking is that they never apologize for the shortcoming. They never pause to say, “Of course, this is all based on anecdotal evidence, but hear me out.” There’s no shame in these claims, no awareness of the imperfection of the methods, precisely because it seems so eminently reasonable that the local observation of a handful of minuscule cases might serve up the silver bullet for cybercrime, if you look hard enough.

But, cybercrime couldn’t be studied in isolation. It was as much a product of the internet expansion as news and social media, where it was so uselessly anatomized. To understand the beast, you needed to think on the scale of the enterprise, from the hacker’s-eye view. You needed to look at the problem from the perspective of Henry Mayhew’s balloon. And you needed a way to persuade others to join you there.

Sadly, that’s not a modern story. It’s an adapted quote from chapter 4 (pp. 97-98, paperback) of The Ghost Map, by Steven Johnson, a book on the cholera epidemic of 1854.

I won’t ruin the book nor continue my attempt at analogy any further. Suffice it to say, you should read the book—if you haven’t already—and join me in calling out for the need for the John Snow of our cyber-time to arrive.

π, Awareness, DataVis, VAST 2013, Moar data! & GReader Machinations

Far too many interesting bits to spam on Twitter individually but each is worth getting the word out on:

tumblr_m0wabdueuR1qd78lno1_1280

*Image via davincismurf

Visualizing Risky Words — Part 4 (D3 Word Trees)

This is the fourth post in my Visualizing Risky Words series. You’ll need to read starting from that link for context if you’re just jumping in now.

I was going to create a rudimentary version of an interactive word tree for this, but the extremely talented @jasondavies (I marvel especially at his cartographic work) just posted what is probably the best online word tree generator ever made…and in D3 no less.

Word_Tree

A word tree is a “visual interactive concordance” and was created back in 2007 by Martin Wattenberg and Fernanda Viégas. You can read more about this technique on your own, but a good summary (from their site) is:

A word tree is a visual search tool for unstructured text, such as a book, article, speech or poem. It lets you pick a word or phrase and shows you all the different contexts in which it appears. The contexts are arranged in a tree-like branching structure to reveal recurrent themes and phrases.

I pasted the VZ RISK INTSUM texts into Jason’s tool so you could investigate the corpus to your heart’s content. I would suggest exploring “patch”, “vulnerability”, “adobe”, “breach” & “malware” (for starters).
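To make the idea concrete, here is a minimal sketch in plain Python of the data structure behind a word tree: find every occurrence of a root word and fold the words that follow it into a counted, nested tree. This is not Jason’s implementation. The names `tokenize`, `build_word_tree` and `print_tree` are hypothetical, the tokenizer is deliberately naive (it ignores sentence boundaries), and the sample text is made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_word_tree(text, root, depth=3):
    """Build a nested tree of the words that follow each occurrence of `root`.

    Every node is {"count": n, "children": {word: node, ...}}.
    """
    def new_node():
        return {"count": 0, "children": defaultdict(new_node)}

    root = root.lower()
    tokens = tokenize(text)
    tree = new_node()
    for i, tok in enumerate(tokens):
        if tok != root:
            continue
        tree["count"] += 1
        node = tree
        # Walk the next `depth` tokens, sharing branches between occurrences.
        for nxt in tokens[i + 1 : i + 1 + depth]:
            node = node["children"][nxt]
            node["count"] += 1
    return tree


def print_tree(node, label, indent=0):
    """Print the tree as an indented outline, most frequent branches first."""
    print("  " * indent + f"{label} ({node['count']})")
    ordered = sorted(node["children"].items(), key=lambda kv: -kv[1]["count"])
    for word, child in ordered:
        print_tree(child, word, indent + 1)


# Made-up sample text, not taken from the INTSUM corpus.
sample = (
    "Apply the patch now. The patch for Adobe Reader is out. "
    "The patch for Java is delayed."
)
tree = build_word_tree(sample, "patch")
print_tree(tree, "patch")
```

Branches shared by many occurrences accumulate higher counts, which is what an interactive word tree renders as larger text. A fuller version would also handle multi-word root phrases and the words that precede the root.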

Jason’s implementation is nothing short of beautiful. He uses SVG text tspans to make the individual text elements not just selectable but easily scalable with browser window resize events.

The actual word tree D3 JavaScript code shows just how powerful the combination of the language and @mbostock’s library is. He has, in essence, built a completely cross-platform tokenizer and interactive visualization tool in ~340 lines of JavaScript. Working through that code until you understand it will really help improve your D3 skills.
