
We had our first real snowfall of the season in Maine today, and that usually means school delays/closings. Our “local” station – @WCSH6 – has a Storm Center Closings page as well as an SMS notification service. I decided this morning that I needed a command line version (and, eventually, a version that sends me a Twitter DM), but I was also tight for time (a lunchtime meeting ending early is responsible for this blog post).

While I’ve consumed my share of Beautiful Soup and can throw down some mechanize with the best of them, it occurred to me that there may be an even easier way – one that may also hold up better if the site eventually starts blocking scrapers.

I set up a Google Drive spreadsheet that uses the importHTML formula to read in the closings table on the page:

=importHTML("http://www.wcsh6.com/weather/severe_weather/cancellations_closings/default.aspx","table",0)

Then I did a File→Publish to the web, set Sheet 1 to “Automatically republish when changes are made” and set the link to point at the CSV version of the data:

[Screenshot: the “Publish to the web” settings (automatic republish, CSV output)]

The raw output looks a bit like:

Name,Status,Last Updated
,,
Westbook Seniors,Luncheon PPD to January 7th,12/17/2012 5:22:51
,,
Allied Wheelchair Van Services,Closed,12/17/2012 6:49:47
,,
American Legion - Dixfield,Bingo cancelled,12/17/2012 11:44:12
,,
American Legion Post 155 - Naples,Closed,12/17/2012 12:49:00

The conversion has some “blank” lines, but that’s easy enough to filter out with some quick bash:

curl --silent "https://docs.google.com/spreadsheet/pub?key=0AlCY1qfmPPZVdFBsX3kzLUVHZl9Mdmw3bS1POWNsWnc&single=true&gid=0&output=csv" | grep -v "^,,"

And, looking for our kids’ specific school(s) is an easy grep as well.
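
For completeness, the same filtering is just as easy from R, which fits the rest of this blog’s toolbox. A minimal sketch that pulls the same published-CSV URL, drops the blank rows and filters on a school name (“Falmouth” is just a placeholder; substitute your own school):

# read the published CSV straight from Google (same URL as the curl example)
closings <- read.csv(
  "https://docs.google.com/spreadsheet/pub?key=0AlCY1qfmPPZVdFBsX3kzLUVHZl9Mdmw3bS1POWNsWnc&single=true&gid=0&output=csv",
  stringsAsFactors = FALSE
)

# drop the "blank" rows the table-to-CSV conversion introduces
closings <- closings[closings$Name != "", ]

# look for a specific school ("Falmouth" is a placeholder)
closings[grepl("Falmouth", closings$Name, ignore.case = TRUE), ]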

The reason this is interesting is that importHTML is dynamic and will re-convert the HTML table each time the CSV URL is retrieved. Couple that with the fact that Google is far less likely to be blocked than my IP address(es), and this seems to be a pretty nice alternative to traditional parsing.

If I get some time over the break, I’ll do a quick benchmark of this method against some python and perl scraping/parsing methods.

Back in 2011, @joshcorman posited “HD Moore’s Law” which is basically:

Casual Attacker power grows at the rate of Metasploit

I am officially submitting the ‘fing’ corollary to said law:

Fundamental defender efficacy can be ascertained within 10 ‘fings’

The tool ‘fing’ (http://overlooksoft.com/fing) is a very lightweight-yet-wicked-functional network & services scanner that runs on everything from the linux command line to your iPhone. I have a permanent ‘sensor’ always running at home and have it loaded on every device I can. While the fine folks at Overlook Software would love you to death for buying a fingbox subscription, it can be used quite nicely in standalone mode to great effect.

I break out ‘fing’ during tedious meetings, bus/train/plane rides or trips to stores (like Home Depot [hint]) just to see who/what else is on the Wi-Fi network and to also get an idea of how the network itself is configured.

What’s especially fun at—um—*your* workplace is to run it from the WLAN (iOS/Android) to see how many hosts it finds on the broadcast domain, then pick pseudo-random (or just interesting looking) hosts to see what services (ports) are up and then use the one-click-access mechanism to see what’s running behind the port (especially browser-based services).

How does this relate to HD Moore’s Law? What makes my corollary worthy of an extension?

If I can use ‘fing’ to do a broadcast domain discovery, select ten endpoints and find at least one insecure configuration (e.g. telnet on routers, port 80 admin login screens, a highly promiscuous number of open ports), you should not consider yourself a responsible defender. The “you” is a bit of a broad term, but if your multi-million dollar (assuming an enterprise) security program can be subverted internally with just ‘fing’, you really won’t be able to handle metasploit, let alone a real attacker.

While metasploit is pretty straightforward to run even for a non-security professional, ‘fing’ is even easier and is something you should show your network and server admins (and developers) how to use on their own, even if you can’t get it officially sanctioned (yes, I said that). Many (most) non-security IT professionals just don’t believe us when we tell them how easy it is for attackers to find things to exploit, and this is a great, free way to show them. Or, to put it another way: “demos speak louder than risk assessments”.

If ‘fing’ isn’t in your toolbox, get it in there. If you aren’t running it regularly at work/home/out-and-about, do so. If you aren’t giving your non-security colleagues simple tools to help them be responsible defenders: start now.

And, finally, if you’re using ‘fing’ or any other simple tool in a similar capacity, drop a note in the comments (always looking for useful ways to improve security).

Naomi Robbins is running a graph makeover challenge over at her Forbes blog, and this is my entry for the B2B/B2C Traffic Sources one:

And, here’s the R source for how to generate it:

library(ggplot2)
 
df = read.csv("b2bb2c.csv")
 
# horizontal bars, faceted by venue (B2B vs B2C), with % labels on the bars
ggplot(data=df,aes(x=Site,y=Percentage,fill=Site)) + 
  geom_bar(stat="identity") + 
  facet_grid(Venue ~ .) + 
  coord_flip() + 
  opts(legend.position = "none", title="Social Traffic Sources for B2B & B2C Companies") + 
  stat_bin(geom="text", aes(label=sprintf("%d%%",Percentage), vjust=0, hjust=-0.2, size = 10))

And, here’s the data:

Site,Venue,Percentage
Facebook,B2B,72
LinkedIn,B2B,16
Twitter,B2B,12
Facebook,B2C,84
LinkedIn,B2C,1
Twitter,B2C,15

I chose to go with a latticed bar chart as I think it helps show the relative portions within each category (B2B/B2C) and also enables quick comparisons across categories for all three factors.
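
The code above uses the ggplot2 syntax of its day; opts() has since been removed from the package. On a current ggplot2, a roughly equivalent sketch (same b2bb2c.csv data, with theme()/labs() in place of opts() and geom_text() for the percentage labels) would look something like this:

library(ggplot2)

df <- read.csv("b2bb2c.csv")

# same faceted, flipped bar chart as above, in post-opts() ggplot2 syntax
ggplot(df, aes(x = Site, y = Percentage, fill = Site)) +
  geom_bar(stat = "identity") +
  facet_grid(Venue ~ .) +
  coord_flip() +
  geom_text(aes(label = sprintf("%d%%", Percentage)), hjust = -0.2, size = 3) +
  labs(title = "Social Traffic Sources for B2B & B2C Companies", x = NULL, y = NULL) +
  theme(legend.position = "none")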

For those inclined to click, I was interviewed by Fahmida Rashid (@fahmiwrite) over at Sourceforge’s HTML5 center a few weeks ago (right after the elections) due to my tweets on the use of HTML5 tech over Flash. Here’s one of them:

https://twitter.com/hrbrmstr/status/266006111256207361

While a tad inaccurate (one site did use Flash with an HTML fallback and some international sites are still stuck in the 1990s), it is still a good sign of how the modern web is progressing.

I can honestly say I’ve never seen my last name used so many times in one article :-)

Earlier this week, @jayjacobs & I both received our acceptance notice for the talk we submitted to the RSA CFP! [W00t!] Now the hard part: crank out a compelling presentation in the next six weeks! If you’re interested at all in doing more with your security data, this talk is for you. Full track/number & details below:

Session Track: Governance, Risk & Compliance
Session Code: GRC-T18
Scheduled Date: 02/26/2013
Scheduled Time: 2:30 PM – 3:30 PM
Session Length: 1 hr
Session Title: Data Analysis and Visualization for Security Professionals
Session Classification: Intermediate
Session Keywords: metrics, visualization, risk management, research
Short Abstract: You have a deluge of security-related data coming from all directions and may even have a fancy dashboard full of pretty charts. However, unless you know the right questions to ask and how to ask them, all you really have are compliance artifacts. Move beyond the checkbox and learn techniques for collecting, exploring and visualizing the stories within our security data.

UPDATE: As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

I will (most likely) not be littering the blog with any more updates to the ‘Sandy’ code unless they are really significant. You can follow along at home with any changes over at github.

Of note (in terms of changes) is the addition of the 5-day forecast cone and some annotations.


NOTE: There is significantly updated code on github for the Sandy ‘R’ dataviz.

This is a follow-up post to the quickly crafted Watch Sandy in “R” post from last night. I noticed that Google provided the KML on their crisis map and wanted to show how easy it is to add it to the previous code.

I added comments inline to make it easier to follow along.

library(maps)
library(maptools)
library(rgdal)
 
# need this for handling "paste" extra spaces
trim.trailing <- function (x) sub("\\s+$", "", x)
 
# get track data from Unisys
wx = read.table(file="http://weather.unisys.com/hurricane/atlantic/2012/SANDY/track.dat", skip=3,fill=TRUE)
 
# join last two columns (type of storm) that read.table didn't parse as one
# and give the data frame real column names
wx$V7 = trim.trailing(paste(wx$V7,wx$V8," "))
wx$V8 = NULL
colnames(wx) = c("Advisory","Latitude","Longitude","Time","WindSpeed","Pressure","Status")
 
# annotate the name with forecast (+12/+24/etc) projections
wx$Advisory = unlist(strsplit(toString(wx$Advisory),", "))
wx$a = ""
wx$a[grep("\\+",wx$Advisory)] = wx$Advisory[grep("\\+",wx$Advisory)]
wx$Status = trim.trailing(paste(wx$Status,wx$a," "))
 
# cheap way to make past plots one color and forecast plots another
wx$color = "red"
wx$color[grep("\\+",wx$Advisory)] = "orange"
 
# only want part of the map
xlim=c(-90,-60)
 
# plot the map itself
map("state", interior = FALSE, xlim=xlim)
map("state", boundary = FALSE, col="gray", add = TRUE,xlim=xlim)
 
# plot the track with triangles and colors
lines(x=wx$Longitude,y=wx$Latitude,col="black",cex=0.75)
points(x=wx$Longitude,y=wx$Latitude,col=wx$color,pch=17,cex=0.75)
 
# annotate it with the current & projected strength/status + forecast
text(x=wx$Longitude,y=wx$Latitude,col='blue',labels=wx$Status,adj=c(-.15),cex=0.33)
 
# get the forecast "cone" from the KML that google's crisis map provides
# NOTE: you need curl & unzip (with funzip) on your system
# NOTE: Google didn't uniquely name the cone they provide, so this will break
#       post-Sandy. I suggest saving the last KML & track files out if you
#       want to save this for posterity
 
# this gets the cone and makes it into a string that getKMLcoordinates can process
coneKML = paste(unlist(system("curl -s -o - http://mw1.google.com/mw-weather/maps_hurricanes/three_day_cone.kmz | funzip", intern = TRUE)),collapse="")
 
# process the coneKML and make a data frame out of it
coords = getKMLcoordinates(textConnection(coneKML))
coords = data.frame(SpatialPoints(coords, CRS('+proj=longlat')))
 
# this plots the cone and leaves the track visible
polygon(coords$X1,coords$X2, pch=16, border="red", col='#FF000033',cex=0.25)
 
# no one puts Sandy in a box (well, except us)
box()

UPDATE:

  • Significantly updated code on github
  • Well, a couple of folks asked how to make it more “centered” on the hurricane and keep the labels from being chopped off, so I modified the previous code a bit to show how to do that.


While this will pale in comparison to the #spiffy Sandy graphics on Weather Underground and other sites (including Google’s “Crisis Center” maps), I thought it might be interesting to show folks how they can keep an eye on “Sandy” using R.

library(maps)
 
# strip trailing whitespace introduced by paste() below
trim.trailing <- function (x) sub("\\s+$", "", x)
 
# read the 'Sandy' track data from Unisys
wx = read.table(file="http://weather.unisys.com/hurricane/atlantic/2012/SANDY/track.dat", skip=3,fill=TRUE)
# re-join the two storm-type columns and give the data frame real column names
wx$V7 = trim.trailing(paste(wx$V7,wx$V8," "))
wx$V8 = NULL
colnames(wx) = c("Advisory","Latitude","Longitude","Time","WindSpeed","Pressure","Status")
 
# tag the forecast (+12/+24/etc) advisories so they can be labeled separately
wx$Advisory = unlist(strsplit(toString(wx$Advisory),", "))
wx$a = ""
wx$a[grep("\\+",wx$Advisory)] = wx$Advisory[grep("\\+",wx$Advisory)]
wx$Status = trim.trailing(paste(wx$Status,wx$a," "))
 
# past positions in red, forecast positions in orange
wx$color = "red"
wx$color[grep("\\+",wx$Advisory)] = "orange"
 
# only want part of the map
xlim=c(-90,-60)
 
# plot the base map, then the track points and status labels
map("state", interior = FALSE, xlim=xlim)
map("state", boundary = FALSE, col="gray", add = TRUE,xlim=xlim)
 
points(x=wx$Longitude,y=wx$Latitude,col=wx$color,pch=17,cex=0.75)
 
text(x=wx$Longitude,y=wx$Latitude,col='blue',labels=wx$Status,adj=c(-.15),cex=0.33)
box()

This short script reads in data from the Unisys Hurricane Weather Center and plots colored symbols on a US map with the forecast projections. I may be extending this, but it’s a good starting point for anyone who wants to play with mapping relatively live data from the internet (or learn some more R).
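
Since the Unisys track file gets updated as new advisories are issued, one low-effort way to keep the plot current is to just re-run the script on a timer. A trivial sketch (it assumes the code above is saved as plot_sandy.R; that filename is just an example, not anything standard):

# re-fetch the track data and redraw the map every 30 minutes,
# writing the result to a PNG each pass
# ("plot_sandy.R" is assumed to contain the script above)
while (TRUE) {
  png("sandy_track.png", width = 800, height = 600)
  source("plot_sandy.R")
  dev.off()
  Sys.sleep(30 * 60)  # seconds
}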