
Author Archives: hrbrmstr


Earlier this week, @jayjacobs & I both received our acceptance notice for the talk we submitted to the RSA CFP! [W00t!] Now the hard part: crank out a compelling presentation in the next six weeks! If you’re interested at all in doing more with your security data, this talk is for you. Full track/number & details below:

Session Track: Governance, Risk & Compliance
Session Code: GRC-T18
Scheduled Date: 02/26/2013
Scheduled Time: 2:30 PM – 3:30 PM
Session Length: 1 hr
Session Title: Data Analysis and Visualization for Security Professionals
Session Classification: Intermediate
Session Keywords: metrics, visualization, risk management, research
Short Abstract: You have a deluge of security-related data coming from all directions and may even have a fancy dashboard full of pretty charts. However, unless you know the right questions to ask and how to ask them, all you really have are compliance artifacts. Move beyond the checkbox and learn techniques for collecting, exploring and visualizing the stories within our security data.

UPDATE: As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

I will (most likely) not be littering the blog with any more updates to the ‘Sandy’ code unless they are really significant. You can follow along with any changes over at github.

Of note (in terms of changes) is the addition of the 5-day forecast cone and some annotations:

As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

NOTE: There is significantly updated code on github for the Sandy ‘R’ dataviz.

This is a follow-up post to the quickly crafted Watch Sandy in “R” post last night. I noticed that Google provided the KML on their crisis map and wanted to show how easy it is to add it to the previous code.

I added comments inline to make it easier to follow along.

library(maps)
library(maptools)
library(rgdal)
 
# need this for handling "paste" extra spaces
trim.trailing <- function (x) sub("\\s+$", "", x)
 
# get track data from Unisys
wx = read.table(file="http://weather.unisys.com/hurricane/atlantic/2012/SANDY/track.dat", skip=3,fill=TRUE)
 
# join last two columns (type of storm) that read.table didn't parse as one
# and give the data frame real column names
wx$V7 = trim.trailing(paste(wx$V7,wx$V8," "))
wx$V8 = NULL
colnames(wx) = c("Advisory","Latitude","Longitude","Time","WindSpeed","Pressure","Status")
 
# annotate the name with forecast (+12/+24/etc) projections
wx$Advisory = unlist(strsplit(toString(wx$Advisory),", "))
wx$a = ""
wx$a[grep("\\+",wx$Advisory)] = wx$Advisory[grep("\\+",wx$Advisory)]
wx$Status = trim.trailing(paste(wx$Status,wx$a," "))
 
# cheap way to make past plots one color and forecast plots another
wx$color = "red"
wx$color[grep("\\+",wx$Advisory)] = "orange"
 
# only want part of the map
xlim=c(-90,-60)
 
# plot the map itself
map("state", interior = FALSE, xlim=xlim)
map("state", boundary = FALSE, col="gray", add = TRUE,xlim=xlim)
 
# plot the track with triangles and colors
lines(x=wx$Longitude,y=wx$Latitude,col="black",cex=0.75)
points(x=wx$Longitude,y=wx$Latitude,col=wx$color,pch=17,cex=0.75)
 
# annotate it with the current & projected strength status + forecast
text(x=wx$Longitude,y=wx$Latitude,col='blue',labels=wx$Status,adj=c(-.15),cex=0.33)
 
# get the forecast "cone" from the KML that Google's crisis map provides
# NOTE: you need curl & unzip (with funzip) on your system
# NOTE: Google didn't uniquely name the cone file they provide, so this will
#       break post-Sandy. I suggest saving the last KML & track files out if
#       you want to keep this for posterity
 
# this gets the cone and makes it into a string that getKMLcoordinates can process
coneKML = paste(unlist(system("curl -s -o - http://mw1.google.com/mw-weather/maps_hurricanes/three_day_cone.kmz | funzip", intern = TRUE)),collapse="")
 
# process the coneKML and make a data frame out of it
coords = getKMLcoordinates(textConnection(coneKML))
coords = data.frame(SpatialPoints(coords, CRS('+proj=longlat')))
 
# this plots the cone and leaves the track visible
polygon(coords$X1,coords$X2, border="red", col='#FF000033')
 
# no one puts Sandy in a box (well, except us)
box()
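Since the Google-hosted KMZ is going away (per the update above), one stopgap is to read the forecast cone from the NHC GIS archive instead. The snippet below is only a sketch of that approach: the download URL and layer name are placeholders (Sandy should be Atlantic storm “al182012”, but verify), so check the NHC GIS pages for the actual file and layer names before using it.

# rough sketch: use an NHC archived forecast cone instead of the Google KMZ
# NOTE: the URL and the layer name below are placeholders -- check the NHC GIS
#       archive (http://www.nhc.noaa.gov/gis/) for the real names before running
library(rgdal)
 
cone.zip = tempfile(fileext=".zip")
download.file("http://www.nhc.noaa.gov/gis/PLACEHOLDER/al182012_5day_cone.zip", cone.zip)
cone.dir = tempdir()
unzip(cone.zip, exdir=cone.dir)
 
# see what layers came in the zip, then read the cone polygon layer
print(ogrListLayers(cone.dir))
cone = readOGR(dsn=cone.dir, layer="al182012_5day_pgn") # placeholder layer name
 
# overlay it on the map drawn above, styled like the Google cone
plot(cone, add=TRUE, border="red", col="#FF000033")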

UPDATE:

  • Significantly updated code on github
  • Well, a couple folks asked how to make the plot more “centered” on the hurricane and keep the labels from being chopped off, so I modified the previous code a bit to show how to do that (a rough sketch of the idea follows the code below).

As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

While this will pale in comparison to the #spiffy Sandy graphics on Weather Underground and other sites (including Google’s “Crisis Center” maps), I thought it might be interesting to show folks how they can keep an eye on “Sandy” using R.

library(maps)
 
trim.trailing <- function (x) sub("\\s+$", "", x)
 
wx = read.table(file="http://weather.unisys.com/hurricane/atlantic/2012/SANDY/track.dat", skip=3,fill=TRUE)
wx$V7 = trim.trailing(paste(wx$V7,wx$V8," "))
wx$V8 = NULL
colnames(wx) = c("Advisory","Latitude","Longitude","Time","WindSpeed","Pressure","Status")
 
wx$Advisory = unlist(strsplit(toString(wx$Advisory),", "))
wx$a = ""
wx$a[grep("\\+",wx$Advisory)] = wx$Advisory[grep("\\+",wx$Advisory)]
wx$Status = trim.trailing(paste(wx$Status,wx$a," "))
 
wx$color = "red"
wx$color[grep("\\+",wx$Advisory)] = "orange"
 
xlim=c(-90,-60)
 
map("state", interior = FALSE, xlim=xlim)
map("state", boundary = FALSE, col="gray", add = TRUE,xlim=xlim)
 
points(x=wx$Longitude,y=wx$Latitude,col=wx$color,pch=17,cex=0.75)
 
text(x=wx$Longitude,y=wx$Latitude,col='blue',labels=wx$Status,adj=c(-.15),cex=0.33)
box()

This short script reads in data from the Unisys Hurricane Weather Center and plots colored symbols on a US map with the forecast projections. I may be extending this, but it’s a good starting point for anyone who wants to play with mapping relatively live data from the internet (or learn some more R).
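As for the “centering” tweak mentioned in the update above, the actual change is in the GitHub version, but the gist is simple: derive the plot limits from the track itself (plus a little padding) instead of hard-coding xlim, which also leaves room so the labels aren’t chopped off. A rough sketch:

# sketch of the "centering" tweak: compute the bounding box from the track
# data (with some padding) rather than using a fixed xlim
xlim = range(wx$Longitude) + c(-5, 5)
ylim = range(wx$Latitude) + c(-2, 2)
 
map("state", interior = FALSE, xlim=xlim, ylim=ylim)
map("state", boundary = FALSE, col="gray", add = TRUE, xlim=xlim, ylim=ylim)
points(x=wx$Longitude,y=wx$Latitude,col=wx$color,pch=17,cex=0.75)
text(x=wx$Longitude,y=wx$Latitude,col='blue',labels=wx$Status,adj=c(-.15),cex=0.33)
box()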

I played around with OSE Firewall for WordPress for a couple days to see if it was worth switching to from the plugin I was previously using. It’s definitely not as full-featured, and I didn’t see anywhere in the WordPress database where it keeps a log I could review/analyze, so I whipped up a little script to extract all the alert data from the Gmail account I set up for it to log to.

The script below – while focused on getting OSE Firewall alert data – can be easily modified to search for other types of automated/formatted e-mails and build a CSV file with the results. Remember, though, that you’re going to be putting your e-mail credentials in this file (if you end up using it), so either use a mailbox you don’t care about or make sure you use sane permissions on the script and keep it somewhere safe.

I tested it on Linux boxes, but it should work anywhere you have Python and mailbox access.

I highly doubt there will be any updates to this version (I’m not using OSE Firewall anymore), but you can grab the source below or on github. There should be sufficient annotation in the comments, but if you have any questions, drop a note in the comments.

# oswfw.py - extract WordPress OSE Firewall mail alerts to CSV
# 
# Author: @hrbrmstr
#

import imaplib
import datetime
import re

# compute yesterday's date (used by the optional SENTSINCE search below to limit results to recent hits)
date = (datetime.date.today() - datetime.timedelta(1)).strftime("%d-%b-%Y")

# setup IMAP connection

gmail = imaplib.IMAP4_SSL('imap.gmail.com',993) # use your IMAP server if not Gmail
gmail.login("YOUR_IMAP_USERNAME","YOUR_PASSWORD")
gmail.select('[Gmail]/All Mail') # Your IMAP's "all mail" if not using Gmail

# now search for all mails with "OSE Firewall" in the subject

# uncomment this line and comment out the next one to just get results from 'today'
#result, data = gmail.uid('search', None, '(SENTSINCE {date} HEADER Subject "OSE Firewall*")'.format(date=date))
result, data = gmail.uid('search', None, '(HEADER Subject "OSE Firewall*")')

# setup CSV file for output

f = open("osefw.csv", "w+")
f.write("Date,IP,URI,Method,UserAgent,Referer\n")

# cycle through result set from IMAP search query, extracting salient info
# from headers/body of each found message

for msg in data[0].split():

    # fetch the msg for the UID
    res, msg_txt = gmail.uid('fetch', msg, '(RFC822)')

    # get rid of carriage returns
    body = re.sub(re.compile('\r', re.MULTILINE), '', msg_txt[0][1])

    # extract salient fields from the message body/header
    DATE = re.findall('^Date: (.*?)$', body, re.M)
    IP = re.findall('^FROM IP: http:\/\/whois.domaintools.com\/(.*?)$', body, re.M)
    URI = re.findall('^URI: (.*?)$', body, re.M)
    METHOD = re.findall('^METHOD: (.*?)$', body, re.M)
    USERAGENT= re.findall('^USERAGENT: (.*?)$', body, re.M)
    REFERER = re.findall('^REFERER: (.*?)$', body, re.M)

    # format for CSV output
    ose_log  = "%s,%s,%s,%s,%s,%s\n" % (DATE, IP, URI, METHOD, USERAGENT, REFERER)

    # quicker to replace array output brackets than to deal with non-array results checking
    f.write(re.sub("[\[\]]*", "", ose_log))

    f.flush()

gmail.logout()
f.close()
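Once the CSV exists, a quick look at the results in R only takes a couple of lines. This is just a sketch against the osefw.csv produced above (column names as written by the script):

# quick-and-dirty summary of the extracted OSE Firewall alerts
ose = read.csv("osefw.csv", stringsAsFactors=FALSE)
head(sort(table(ose$IP), decreasing=TRUE), 10)  # top 10 offending IPs
head(sort(table(ose$URI), decreasing=TRUE), 10) # most-probed URIs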

There’s a good FAQ on how to do the MongoDB query -> R data frame but I wanted to post a more complete example that included the database connection and query setup since I suspect there are folks new to Mongo who would appreciate the end-to-end view. The code is fully annotated with comments, and I’ll caveat that this was for pulling data from my solar radiation sensor (it provides some context for the query and values).

library(rmongodb)
library(chron) # NOTE: you don't need this for Mongo; it's for the sensor readings plot
 
# connect to mongodb server on host and connect to db
mongo = mongo.create(host="MONGODB_HOST",db="DATABASE_NAME")
 
if (mongo.is.connected(mongo)) {
 
  # this sets up the query (there are other "buffer.append…" functions for other value types)
  today = format(Sys.time(), "%Y-%m-%d")
  buf = mongo.bson.buffer.create()
  mongo.bson.buffer.append.string(buf,"date",today)
  query = mongo.bson.from.buffer(buf)
 
  # run the query and get total results & the starting db cursor  
  todays.readings.count = mongo.count(mongo,"solar.readings",query)
  todays.readings.cursor = mongo.find(mongo,"solar.readings",query)
 
  # setup some vectors to hold our results  
  time = vector("character",todays.readings.count)
  lux = vector("numeric",todays.readings.count)
  full = vector("numeric",todays.readings.count)
  IR = vector("numeric",todays.readings.count)
 
  i = 1
 
  # iterate over the results with the cursor    
  while (mongo.cursor.next(todays.readings.cursor)) {
 
    # get the values of the current record
    cval = mongo.cursor.value(todays.readings.cursor)
 
    # split it out into our vectors    
    time[i] = mongo.bson.value(cval,"time")
    full[i] = mongo.bson.value(cval,"Full")
    lux[i] = mongo.bson.value(cval,"Lux")
    IR[i] = mongo.bson.value(cval,"IR")
 
    i = i + 1
 
  }
 
  # packages all our values up into a data frame  
  df = as.data.frame(list(time=time,full=full,lux=lux,IR=IR))
 
  # (for my wx data, I need 'time' as an actual time value)  
  df$Time = times(df$time)
  df$time = NULL
 
  par(mfrow=c(3,1))
  plot(df$full~df$Time,type="l",col="blue",lwd="1",xlab="",ylab="Full Spectrum",main=paste(today," Solar Radiation Readings"))
  plot(df$lux~df$Time,type="l",col="blue",lwd="1",xlab="",ylab="Lux (calculated)")
  plot(df$IR~df$Time,type="l",col="blue",lwd="1",xlab="Time",ylab="IR")
  par(mfrow=c(1,1))
 
}
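As the comment in the code hints, there are other mongo.bson.buffer.append.* functions (int, double, etc.), and you can nest objects to build richer queries. For example, here’s a minimal sketch of a date-range query against the same collection, using MongoDB’s $gte/$lte operators (collection and field names follow the example above):

# builds { "date" : { "$gte" : "2012-10-01", "$lte" : "2012-10-07" } }
buf = mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "date")
mongo.bson.buffer.append.string(buf, "$gte", "2012-10-01")
mongo.bson.buffer.append.string(buf, "$lte", "2012-10-07")
mongo.bson.buffer.finish.object(buf)
range.query = mongo.bson.from.buffer(buf)
 
# same find() pattern as above, just with the range query
week.cursor = mongo.find(mongo, "solar.readings", range.query)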

I’m not entirely sure what happened to cause it, but the “Show in Finder…” feature of a plethora of OS X apps broke on me just over a month ago. I poked around a bit, deleted a preference file here or there, and un-tweaked some tweaks to see if it would help, but nothing resolved the issue.

So, I finally had some spare minutes to try the good ol’ “Repair Permissions” feature of Disk Utility. Ten minutes plus one reboot later and —BOOM!—I’m back in business again.

Of note were literally hundreds of issues fixed while permissions were being repaired. Methinks it may have to do with a system crash or two that happened during some of the 10.8 beta update releases I was running.

Hopefully this helps even one other person looking to fix this tedious issue.

UPDATE: While the cautionary advice still (IMO) holds true, it turns out that, once I actually looked at the lat/lng pair being returned for the anomaly presented below, the weird results come from the horribly coarse precision of the initial IP address → lat/lng conversion (which isn’t the fault of @fslabs, but of the service they used). It’s hard to get a ZIP code right, or more precise, when you only have integer resolution (38.0, -97.0).

We’re still crunching through some of the ZeroAccess data and have some (hopefully) interesting results to present, but a weird GeoIP anomaly has come up that I wanted to quickly share.

To get more granular data, I’m using the GeoNames API to map the latitude/longitude pairs to US ZIP codes to facilitate additional analysis. During this exercise (which hasn’t finished as of this blog post, since I need to pace the API calls), it has become quite noticeable that GeoIP-coding definitely has flaws. Take, for example, Potwin, KS:

This cozy little town (population ~450) has the largest collection of bots so far: 800. Yes, 800 bots (computers) in a 128-acre town of 450 people. (#unlikely)
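For the curious, the kind of per-point lookup involved is roughly the sketch below (not the exact code from my crunching run). It assumes the GeoNames findNearbyPostalCodesJSON endpoint and a registered GeoNames account; "USERNAME" is a placeholder, and the Sys.sleep() is there to pace the API calls:

library(RJSONIO) # any JSON parser will do; RJSONIO assumed here
 
# hypothetical helper: nearest US ZIP code for a lat/lng pair via GeoNames
nearest.zip = function(lat, lng, user="USERNAME") {
  u = sprintf("http://api.geonames.org/findNearbyPostalCodesJSON?lat=%.4f&lng=%.4f&maxRows=1&country=US&username=%s",
              lat, lng, user)
  res = fromJSON(paste(readLines(u, warn=FALSE), collapse=""))
  if (length(res$postalCodes) > 0) res$postalCodes[[1]][["postalCode"]] else NA
}
 
nearest.zip(38.0, -97.0) # the coarse, integer-resolution point from the update above
Sys.sleep(2)             # pause between lookups to pace the API calls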

Either there’s some weirdness in the way @fslabs is tracking the bots (which is possible, since we only have a lat/long file with no other context/data to look at) or we need to treat GeoIP results very lightly (or at least do some post-processing validation), since I suspect a decent portion of the 800 bots are actually in a neighbor to the southwest:

I know GeoIP translation is not an exact science and is dependent upon a whole host of factors, but this one was just pretty humorous. It has caused me to question the @fslabs data a bit, but I’m comfortable assuming they did sufficient due diligence before crafting an IP address list to geocode.

In case you’re wondering what the other “Top US Bots” are (with 7K more to crunch):