
Earlier this week, @jayjacobs & I both received our acceptance notice for the talk we submitted to the RSA CFP! [W00t!] Now the hard part: crank out a compelling presentation in the next six weeks! If you’re interested at all in doing more with your security data, this talk is for you. Full track/number & details below:

Session Track: Governance, Risk & Compliance
Session Code: GRC-T18
Scheduled Date: 02/26/2013
Scheduled Time: 2:30 PM – 3:30 PM
Session Length: 1 hr
Session Title: Data Analysis and Visualization for Security Professionals
Session Classification: Intermediate
Session Keywords: metrics, visualization, risk management, research
Short Abstract: You have a deluge of security-related data coming from all directions and may even have a fancy dashboard full of pretty charts. However, unless you know the right questions to ask and how to ask them, all you really have are compliance artifacts. Move beyond the checkbox and learn techniques for collecting, exploring and visualizing the stories within our security data.

UPDATE: As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

I will (most likely) not be littering the blog with any more updates to the ‘Sandy’ code unless they are really significant. You can follow along at home with any changes over at github.

Of note (in terms of changes) is the addition of the 5-day forecast cone and some annotations:

UPDATE: As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

NOTE: There is significantly updated code on github for the Sandy ‘R’ dataviz.

This is a follow-up post to the quickly crafted Watch Sandy in “R” post last night. I noticed that Google provided the KML on their crisis map and wanted to show how easy it is to add it to the previous code.

I added comments inline to make it easier to follow along.

library(maps)
library(maptools)
library(rgdal)
 
# need this to remove the trailing spaces that paste() leaves behind
trim.trailing <- function (x) sub("\\s+$", "", x)
 
# get track data from Unisys
wx = read.table(file="http://weather.unisys.com/hurricane/atlantic/2012/SANDY/track.dat", skip=3,fill=TRUE)
 
# join last two columns (type of storm) that read.table didn't parse as one
# and give the data frame real column names
wx$V7 = trim.trailing(paste(wx$V7,wx$V8," "))
wx$V8 = NULL
colnames(wx) = c("Advisory","Latitude","Longitude","Time","WindSpeed","Pressure","Status")
 
# annotate the name with forecast (+12/+24/etc) projections
wx$Advisory = unlist(strsplit(toString(wx$Advisory),", "))
wx$a = ""
wx$a[grep("\\+",wx$Advisory)] = wx$Advisory[grep("\\+",wx$Advisory)]
wx$Status = trim.trailing(paste(wx$Status,wx$a," "))
 
# cheap way to make past plots one color and forecast plots another
wx$color = "red"
wx$color[grep("\\+",wx$Advisory)] = "orange"
 
# only want part of the map
xlim=c(-90,-60)
 
# plot the map itself
map("state", interior = FALSE, xlim=xlim)
map("state", boundary = FALSE, col="gray", add = TRUE,xlim=xlim)
 
# plot the track with triangles and colors
lines(x=wx$Longitude,y=wx$Latitude,col="black",cex=0.75)
points(x=wx$Longitude,y=wx$Latitude,col=wx$color,pch=17,cex=0.75)
 
# annotate it with the current & projected strength status + forecast
text(x=wx$Longitude,y=wx$Latitude,col='blue',labels=wx$Status,adj=c(-.15),cex=0.33)
 
# get the forecast "cone" from the KML that google's crisis map provides
# NOTE: you need curl & unzip (with funzip) on your system
# NOTE: Google didn't uniquely name the cone they provide, so this will break
#       post-Sandy. I suggest saving the last KML & track files out if you
#       want to save this for posterity
 
# this gets the cone and makes it into a string that getKMLcoordinates can process
coneKML = paste(unlist(system("curl -s -o - http://mw1.google.com/mw-weather/maps_hurricanes/three_day_cone.kmz | funzip", intern = TRUE)),collapse="")
 
# process the coneKML and make a data frame out of it
coords = getKMLcoordinates(textConnection(coneKML))
coords = data.frame(SpatialPoints(coords, CRS('+proj=longlat')))
 
# this plots the cone and leaves the track visible
polygon(coords$X1,coords$X2, pch=16, border="red", col='#FF000033',cex=0.25)
 
# no one puts Sandy in a box (well, except us)
box()
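
As the code comments note, Google didn’t uniquely name the cone file, so this will break post-Sandy. If you do want to keep the data for posterity, a couple of download.file() calls will save the source files out (a minimal sketch; the local filenames here are just my choices):

# save the Unisys track data and Google's cone KMZ locally for posterity
# (destination filenames are arbitrary choices)
download.file("http://weather.unisys.com/hurricane/atlantic/2012/SANDY/track.dat",
              destfile="sandy-track.dat")
download.file("http://mw1.google.com/mw-weather/maps_hurricanes/three_day_cone.kmz",
              destfile="sandy-cone.kmz", mode="wb")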

UPDATE:

  • Significantly updated code on github
  • Well, a couple folks asked how to make it more “centered” on the hurricane and stop the labels from being chopped off, so I modified the previous code a bit to show how to do that.

As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

While this will pale in comparison to the #spiffy Sandy graphics on Weather Underground and other sites (including Google’s “Crisis Center” maps), I thought it might be interesting to show folks how they can keep an eye on “Sandy” using R.

library(maps)
 
trim.trailing <- function (x) sub("\\s+$", "", x)
 
wx = read.table(file="http://weather.unisys.com/hurricane/atlantic/2012/SANDY/track.dat", skip=3,fill=TRUE)
wx$V7 = trim.trailing(paste(wx$V7,wx$V8," "))
wx$V8 = NULL
colnames(wx) = c("Advisory","Latitude","Longitude","Time","WindSpeed","Pressure","Status")
 
wx$Advisory = unlist(strsplit(toString(wx$Advisory),", "))
wx$a = ""
wx$a[grep("\\+",wx$Advisory)] = wx$Advisory[grep("\\+",wx$Advisory)]
wx$Status = trim.trailing(paste(wx$Status,wx$a," "))
 
wx$color = "red"
wx$color[grep("\\+",wx$Advisory)] = "orange"
 
xlim=c(-90,-60)
 
map("state", interior = FALSE, xlim=xlim)
map("state", boundary = FALSE, col="gray", add = TRUE,xlim=xlim)
 
points(x=wx$Longitude,y=wx$Latitude,col=wx$color,pch=17,cex=0.75)
 
text(x=wx$Longitude,y=wx$Latitude,col='blue',labels=wx$Status,adj=c(-.15),cex=0.33)
box()

This short script reads in data from the Unisys Hurricane Weather Center and plots colored symbols on a US map with the forecast projections. I may be extending this, but it’s a good starting point for anyone who wants to play with mapping relatively live data from the internet (or learn some more R).
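
To keep it “relatively live”, one minimal sketch (the script filename and polling interval below are arbitrary choices, not part of the original code) is to save the code above to a file and have R re-run it on a timer:

# assumes the script above was saved as 'sandy.R'; 1800 seconds = 30 minutes
watch.sandy <- function(script="sandy.R", interval=1800) {
  repeat {
    source(script)      # re-fetch the track data and redraw the map
    Sys.sleep(interval) # wait a bit before hitting Unisys again
  }
}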

I played around with OSE Firewall for WordPress for a couple days to see if it was worth switching to from the plugin I was previously using. It’s definitely not as full-featured, and I didn’t see any WP database extensions where it kept a log I could review/analyze, so I whipped up a little script to extract all the alert data from the Gmail account I set up for it to log to.

The script below – while focused on getting OSE Firewall alert data – can be easily modified to search for other types of automated/formatted e-mails and build a CSV file with the results. Remember, tho, that you’re going to be putting your e-mail credentials in this file (if you end up using it) so either use a mailbox you don’t care about or make sure you use sane permissions on the script and keep it somewhere safe.

I tested it on linux boxes, but it should work anywhere you have Python and mailbox access.

I highly doubt there will be any updates to this version (I’m not using OSE Firewall anymore), but you can grab the source below or on github. There should be sufficient annotation in the code comments, but if you have any questions, drop a note in the comments.

# oswfw.py - extract WordPress OSE Firewall mail alerts to CSV
# 
# Author: @hrbrmstr
#

import imaplib
import datetime
import re

# compute yesterday's date (for the optional SENTSINCE search below, in the
# event you are just reporting on today's hits)
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")

# setup IMAP connection

gmail = imaplib.IMAP4_SSL('imap.gmail.com',993) # use your IMAP server if not Gmail
gmail.login("YOUR_IMAP_USERNAME","YOUR_PASSWORD")
gmail.select('[Gmail]/All Mail') # Your IMAP's "all mail" if not using Gmail

# now search for all mails with "OSE Firewall" in the subject

# uncomment this line and comment out the next one to just get results from 'today'
#result, data = gmail.uid('search', None, '(SENTSINCE {date} HEADER Subject "OSE Firewall*")'.format(date=date))
result, data = gmail.uid('search', None, '(HEADER Subject "OSE Firewall*")')

# setup CSV file for output

f = open("osefw.csv", "w+")
f.write("Date,IP,URI,Method,UserAgent,Referer\n") ;

# cycle through result set from IMAP search query, extracting salient info
# from headers/body of each found message

for msg in data[0].split():

    # fetch the msg for the UID
    res, msg_txt = gmail.uid('fetch', msg, '(RFC822)')

    # get rid of carriage returns
    body = re.sub(re.compile('\r', re.MULTILINE), '', msg_txt[0][1])

    # extract salient fields from the message body/header
    DATE = re.findall('^Date: (.*?)$', body, re.M)
    IP = re.findall('^FROM IP: http:\/\/whois.domaintools.com\/(.*?)$', body, re.M)
    URI = re.findall('^URI: (.*?)$', body, re.M)
    METHOD = re.findall('^METHOD: (.*?)$', body, re.M)
    USERAGENT= re.findall('^USERAGENT: (.*?)$', body, re.M)
    REFERER = re.findall('^REFERER: (.*?)$', body, re.M)

    # format for CSV output
    ose_log  = "%s,%s,%s,%s,%s,%s\n" % (DATE, IP, URI, METHOD, USERAGENT, REFERER)

    # quicker to replace array output brackets than to deal with non-array results checking
    f.write(re.sub("[\[\]]*", "", ose_log))

    f.flush()

gmail.logout()
f.close()

There’s a good FAQ on how to do the MongoDB query -> R data frame but I wanted to post a more complete example that included the database connection and query setup since I suspect there are folks new to Mongo who would appreciate the end-to-end view. The code is fully annotated with comments, and I’ll caveat that this was for pulling data from my solar radiation sensor (it provides some context for the query and values).

library(rmongodb)
library(chron) # NOTE: you don't need this for Mongo; it's for the sensor readings plot
 
# connect to mongodb server on host and connect to db
mongo = mongo.create(host="MONGODB_HOST",db="DATABASE_NAME")
 
if (mongo.is.connected(mongo)) {
 
  # this sets up the query (there are other "buffer.append…" functions)
  today = format(Sys.time(), "%Y-%m-%d")
  buf = mongo.bson.buffer.create()
  mongo.bson.buffer.append.string(buf,"date",today)
  query = mongo.bson.from.buffer(buf)
 
  # run the query and get total results & the starting db cursor  
  todays.readings.count = mongo.count(mongo,"solar.readings",query)
  todays.readings.cursor = mongo.find(mongo,"solar.readings",query)
 
  # setup some vectors to hold our results  
  time = vector("character",todays.readings.count)
  lux = vector("numeric",todays.readings.count)
  full = vector("numeric",todays.readings.count)
  IR = vector("numeric",todays.readings.count)
 
  i = 1
 
  # iterate over the results with the cursor    
  while (mongo.cursor.next(todays.readings.cursor)) {
 
    # get the values of the current record
    cval = mongo.cursor.value(todays.readings.cursor)
 
    # split it out into our vectors    
    time[i] = mongo.bson.value(cval,"time")
    full[i] = mongo.bson.value(cval,"Full")
    lux[i] = mongo.bson.value(cval,"Lux")
    IR[i] = mongo.bson.value(cval,"IR")
 
    i = i + 1
 
  }
 
  # packages all our values up into a data frame  
  df = as.data.frame(list(time=time,full=full,lux=lux,IR=IR))
 
  # (for my wx data, I need 'time' as an actual time value)  
  df$Time = times(df$time)
  df$time = NULL
 
  par(mfrow=c(3,1))
  plot(df$full~df$Time,type="l",col="blue",lwd="1",xlab="",ylab="Full Spectrum",main=paste(today," Solar Radiation Readings"))
  plot(df$lux~df$Time,type="l",col="blue",lwd="1",xlab="",ylab="Lux (calculated)")
  plot(df$IR~df$Time,type="l",col="blue",lwd="1",xlab="Time",ylab="IR")
  par(mfrow=c(1,1))
 
}
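
As the comment in the query setup notes, there are other buffer-append functions. For example, here’s a rough sketch (untested; it assumes the date field is a “YYYY-MM-DD” string as above, so lexicographic comparison works) of building a date-range query instead of the exact match:

# build {"date": {"$gte": "2012-10-01", "$lte": "2012-10-31"}}
# (placeholder dates -- substitute your own range)
buf = mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf,"date")
mongo.bson.buffer.append.string(buf,"$gte","2012-10-01")
mongo.bson.buffer.append.string(buf,"$lte","2012-10-31")
mongo.bson.buffer.finish.object(buf)
query = mongo.bson.from.buffer(buf)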

HP & the Ponemon Institute have released their third annual “Cost of Cybercrime” report and the web wizards at HP have given us an infographic from it:


(You can see the full-size one at the above link)

While some designers may think that infographic visualizations are not subject to the same scrutiny as “real” charts & graphs, I vehemently disagree. In this particular infographic, my eyes were immediately drawn to the donut chart since it’s usually a poor choice to begin with and many designers make the same error as this one did.

“What’s wrong with it?”, you ask? Well, the donut chart is designed to be a modified pie chart which itself is supposed to display components of a whole (i.e. the circle represents 100% and the slices are the fractional components of said 100%). Donut charts are really hard to read since your eye misses the area & angular cues that help to make the distinction between slices.

Since the whole purpose of the “attacks” chart is to give you an idea of how 2010, 2011 & 2012 compare to each other, it would be better served with a bar chart:
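
Something along these lines would do it (the values are made-up placeholders; substitute the actual figures from the infographic):

# made-up placeholder values -- use the real per-year figures from the report
attacks = c("2010"=50, "2011"=75, "2012"=100)
barplot(attacks, col="steelblue", ylab="Attacks")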

You can immediately see the distinction & increase over time much more easily than in the donut. And, if you’re still not convinced that a pie would give you a better visual indicator (while still being a bad choice, since we’re not really comparing parts of a whole), you be the judge:
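
Using the same made-up placeholder values:

attacks = c("2010"=50, "2011"=75, "2012"=100) # same placeholders as above
pie(attacks)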

But, what I may be most upset about is the fact that they released the chart at the end of October, included a spider in the upper-left but didn’t include a radar plot anywhere in the infographic! #spidersarescary :-)

All infographic criticism aside, I do thank HP & Ponemon for providing the results of their research free of charge for the benefit of the entire infosec community. I look forward to digesting the whole report.

I’m not entirely sure what happened to cause it, but the “Show in Finder…” feature of a plethora of OS X apps broke on me just over a month ago. I poked around a bit, deleted a preference file here or there and un-tweaked some tweaks to see if it would help, but nothing resolved the issue.

So, I finally had some spare minutes to try the good ol’ “Repair Permissions” feature of Disk Utility. Ten minutes plus one reboot later and—BOOM!—I’m back in business again.

Of note were literally hundreds of issues fixed while permissions were being repaired. Methinks it may have to do with a system crash or two that happened during some of the 10.8 beta update releases I was running.

Hopefully this helps even one other person looking to fix this tedious issue.