
Author Archives: hrbrmstr


I updated the code to use ggsave and tweaked some of the font & line size values for more consistent (and pretty) output. This also means that I really need to get this up on github.

If you even remotely follow this blog, you’ll see that I’m kinda obsessed with slopegraphs. While I’m pretty happy with my Python implementation, I do quite a bit of data processing & visualization in R these days and had a few free hours on a recent trip to Seattle, so I whipped up some R code to do traditional and multi-column rank-order slopegraphs, prompted mostly by a post over at Microsoft’s security blog.

#
# multicolumn-rankorder-slopegraph.R
#
# 2013-01-12 - formatting tweaks
# 2013-01-10 - Initial version - boB Rudis - @hrbrmstr
#
# Pretty much explained by the script title. This is an R script which is designed to produce
# 2+ column rank-order slopegraphs with the ability to highlight meaningful patterns
#
 
library(ggplot2)
library(reshape2)
 
# transcription of table from:
# http://blogs.technet.com/b/security/archive/2013/01/07/operating-system-infection-rates-the-most-common-malware-families-on-each-platform.aspx
#
# You can download it from: 
# https://docs.google.com/spreadsheet/ccc?key=0AlCY1qfmPPZVdHpwYk0xYkh3d2xLN0lwTFJrWXppZ2c
 
df = read.csv("~/Desktop/malware.csv")
 
# For this slopegraph, we care that #1 is at the top and that higher value #'s are at the bottom, so we 
# negate the rank values in the table we just read in
 
df$Rank.Win7.SP1 = -df$Rank.Win7.SP1
df$Rank.Win7.RTM = -df$Rank.Win7.RTM
df$Rank.Vista = -df$Rank.Vista
df$Rank.XP = -df$Rank.XP
 
# Also, we are really comparing the end state (ultimately) so sort the list by the end state.
# In this case, it's the Windows 7 malware data.
 
df$Family = with(df, reorder(Family, Rank.Win7.SP1))
 
# We need to melt the multiple rank columns into long form (Family, variable, value)
# for line-graph processing

dfm = melt(df)
 
# We define our color palette manually so we can highlight the lines that "matter".
# This means you'll need to generate the slopegraph at least one time prior to determine
# which lines need coloring. This should be something we pick up algorithmically, eventually
 
sgPalette = c("#990000", "#990000",  "#CCCCCC", "#CCCCCC", "#CCCCCC","#CCCCCC", "#990000", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC")
#sgPalette = c("#CCCCCC", "#CCCCCC",  "#CCCCCC", "#CCCCCC", "#CCCCCC","#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC", "#CCCCCC")
#sgPalette = c("#000000", "#000000",  "#000000", "#000000", "#000000","#000000", "#000000", "#000000", "#000000", "#000000", "#000000", "#000000", "#000000", "#000000", "#000000")
 
 
# start the plot
#
# we do a ton of customisations to the plain ggplot canvas, but it's not rocket science
 
sg = ggplot(dfm, aes(factor(variable), value, 
                     group = Family, 
                     colour = Family, 
                     label = Family)) +
  scale_colour_manual(values=sgPalette) +
  theme(legend.position = "none", 
        axis.text.x = element_text(size=5),
        axis.text.y=element_blank(), 
        axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        axis.ticks=element_blank(),
        axis.line=element_blank(),
        panel.grid.major = element_line("black", size = 0.1),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.background = element_blank())
 
# plot the right-most labels
 
sg1 = sg + geom_line(size=0.15) + 
  geom_text(data = subset(dfm, variable == "Rank.Win7.SP1"), 
            aes(x = factor(variable), label=sprintf(" %-2d %s",-(value),Family)), size = 1.75, hjust = 0) 
 
# plot the left-most labels
 
sg1 = sg1 + geom_text(data = subset(dfm, variable == "Rank.XP"), 
                     aes(x = factor(variable), label=sprintf("%s %2d ",Family,-(value))), size = 1.75, hjust = 1)
 
# this ratio seems to work well for png output
# you'll need to tweak font size for PDF output, but PDF will make post-processing in 
# Illustrator or Inkscape much easier.
 
ggsave("~/Desktop/malware.pdf",sg1,w=8,h=5,dpi=150)

[Figure: multi-column rank-order slopegraph of malware family ranks across Windows XP, Vista, Windows 7 RTM and Windows 7 SP1]

I really didn’t think the table told a story well and I truly believe slopegraphs are pretty good at telling stories.

This bit of R code is far from generic and requires the data investigator to do some work to make it an effective visualization, but (I think) it’s one of the better starts at a slopegraph library in R. It suffers from the same issues I’ve pointed out before, but it’s far fewer lines of code than my Python version and it handles multi-column slopegraphs quite nicely.

To be truly effective, you’ll need to plot the slopegraph first and then figure out which nodes to highlight and change the sgPalette accordingly to help the reader/viewer focus on what’s important.
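
Here’s a rough sketch of how that highlighting could eventually be picked up algorithmically (per the comment in the script): colour only the families whose rank shifted by more than some threshold between the XP and Windows 7 SP1 columns. The threshold value, the NA handling and the named-palette trick below are placeholders to tune per dataset, not part of the script above:

# flag families whose rank moved more than 'threshold' places between the
# first (XP) and last (Win7 SP1) columns; everything else stays grey
threshold = 3
rank.shift = abs(df$Rank.Win7.SP1 - df$Rank.XP)  # ranks were negated earlier; abs() of the diff is unaffected
rank.shift[is.na(rank.shift)] = threshold + 1    # treat families missing from one list as "moved"
sgPalette = setNames(ifelse(rank.shift > threshold, "#990000", "#CCCCCC"),
                     as.character(df$Family))    # named, so scale_colour_manual() matches by Family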

I’ll post everything on github once I recover from cross-country travel and, as always, welcome feedback and fixes.

I’m not sure why I never did this earlier, but a post on LifeHacker gave me the idea to add location bar quick searches for CVEs (Common Vulnerabilities and Exposures), no doubt due to their example of searching LifeHacker for “security”.

My two favorite sites for searching CVE specifics are, at present, Risk I/O’s and CVE Details.

I’m fairly certain anyone in security reading this can figure out the rest, but as I’m ever a slave to minutiae, here are the two shortcuts I’ve set up in Chrome:

Title: CVE Details
Search URL: http://cvedetails.com/cve-details.php?cve_id=%s
Shortcut: cved

Title: Risk I/O Vulnerability Search
Search URL: https://db.risk.io/?q=%s
Shortcut: cvedb

Here’s what the location bar changes to when I use cvedb to search for 2012‑4774:

[Screenshot: Chrome location bar showing the Risk I/O Vulnerability Search keyword in use for 2012‑4774]

In reality, this is only saving a scroll and a click since entering CVE‑2012‑4774 into an unoptimized location bar would have just searched Google and given me most of the usual suspects in the first few links. Still, it saves some time and immediately gets me the vulnerability data from the sites I prefer.

I may start poking to see what other security-related searches I can setup in the location bar.

We had our first real snowfall of the season in Maine today and that usually means school delays/closings. Our “local” station – @WCSH6 – has a Storm Center Closings page as well as an SMS notification service. I decided this morning that I needed a command line version (and, eventually, a version that sends me a Twitter DM), but I was also tight on time (a lunchtime meeting ending early is responsible for this blog post).

While I’ve consumed my share of Beautiful Soup and can throw down some mechanize with the best of them, it occurred to me that there may be an even easier way, one that may also help if the site ever gets around to blocking this kind of scraping.

I set up a Google Drive spreadsheet that uses the importHTML formula to read in the closings table on the page:

=importHTML("http://www.wcsh6.com/weather/severe_weather/cancellations_closings/default.aspx","table",0)

Then I did a File→Publish to the web, set Sheet 1 to “Automatically republish when changes are made” and had the link point to the CSV version of the data:

[Screenshot: Google Drive "Publish to the web" settings with automatic republishing enabled and the CSV link selected]

The raw output looks a bit like:

Name,Status,Last Updated
,,
Westbook Seniors,Luncheon PPD to January 7th,12/17/2012 5:22:51
,,
Allied Wheelchair Van Services,Closed,12/17/2012 6:49:47
,,
American Legion - Dixfield,Bingo cancelled,12/17/2012 11:44:12
,,
American Legion Post 155 - Naples,Closed,12/17/2012 12:49:00

The conversion has some “blank” lines but that’s easy enough to filter out with some quick bash:

curl --silent "https://docs.google.com/spreadsheet/pub?key=0AlCY1qfmPPZVdFBsX3kzLUVHZl9Mdmw3bS1POWNsWnc&single=true&gid=0&output=csv" | grep -v "^,,"

And, looking for the specific school(s) of our kids is an easy grep as well.
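
If you’d rather stay in R for this, here’s a minimal sketch of the same filter; it assumes your R install can read https URLs directly, and the school name is just a hypothetical placeholder:

# read the published CSV, drop the blank separator rows, then look for a school
closings_url = "https://docs.google.com/spreadsheet/pub?key=0AlCY1qfmPPZVdFBsX3kzLUVHZl9Mdmw3bS1POWNsWnc&single=true&gid=0&output=csv"
closings = read.csv(closings_url, stringsAsFactors=FALSE)
closings = closings[closings$Name != "", ]
subset(closings, grepl("Falmouth", Name, ignore.case=TRUE))  # "Falmouth" is a hypothetical school name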

The reason this is interesting is that importHTML is dynamic and will re-convert the HTML table each time the CSV URL is retrieved. Couple that with the fact that Google is far less likely to be blocked than my IP address(es), and this seems to be a pretty nice alternative to traditional parsing.

If I get some time over the break, I’ll do a quick benchmark of using this method over some python and perl scraping/parsing methods.

Back in 2011, @joshcorman posited “HD Moore’s Law” which is basically:

Casual Attacker power grows at the rate of Metasploit

I am officially submitting the ‘fing’ corollary to said law:

Fundamental defender efficacy can be ascertained within 10 ‘fings’

The tool ‘fing’ (http://overlooksoft.com/fing) is a very lightweight-yet-wicked-functional network & services scanner that runs on everything from the Linux command line to your iPhone. I have a permanent ‘sensor’ always running at home and have it loaded on every device I can. While the fine folks at Overlook Software would love you to death for buying a fingbox subscription, it can be used quite nicely in standalone mode to great effect.

I break out ‘fing’ during tedious meetings, bus/train/plane rides or trips to stores (like Home Depot [hint]) just to see who/what else is on the Wi-Fi network and to also get an idea of how the network itself is configured.

What’s especially fun at—um—*your* workplace is to run it from the WLAN (iOS/Android) to see how many hosts it finds on the broadcast domain, then pick pseudo-random (or just interesting looking) hosts to see what services (ports) are up and then use the one-click-access mechanism to see what’s running behind the port (especially browser-based services).

How does this relate to HD Moore’s Law? What makes my corollary worthy of an extension?

If I can use ‘fing’ to do a broadcast domain discovery, select ten endpoints and discover at least one insecure configuration (e.g. telnet enabled on routers, port 80 admin login screens, a highly promiscuous number of open ports), you should not consider yourself to be a responsible defender. The “you” is a bit of a broad term, but if your multi-million dollar (assuming an enterprise) security program can be subverted internally with just ‘fing’, you really won’t be able to handle metasploit, let alone a real attacker.

While metasploit is pretty straightforward to run even for a non-security professional, ‘fing’ is even easier and should be something you show your network and server admins (and developers) how to use on their own, even if you can’t get it officially sanctioned (yes, I said that). Many (most) non-security IT professionals just don’t believe us when we tell them how easy it is for attackers to find things to exploit, and this is a great, free way to show them. Or, to put it another way: “demos speak louder than risk assessments”.

If ‘fing’ isn’t in your toolbox, get it in there. If you aren’t running it regularly at work/home/out-and-about, do so. If you aren’t giving your non-security colleagues simple tools to help them be responsible defenders: start now.

And, finally, if you’re using ‘fing’ or any other simple tool in a similar capacity, drop a note in the comments (always looking for useful ways to improve security).

Naomi Robbins is running a graph makeover challenge over at her Forbes blog and this is my entry for the B2B/B2C Traffic Sources one:

[Figure: faceted bar chart of social traffic sources for B2B vs B2C companies]

And, here’s the R source for how to generate it:

library(ggplot2)
 
df = read.csv("b2bb2c.csv")
 
ggplot(data=df,aes(x=Site,y=Percentage,fill=Site)) + 
  geom_bar(stat="identity") + 
  facet_grid(Venue ~ .) + 
  coord_flip() + 
  labs(title = "Social Traffic Sources for B2B & B2C Companies") +
  theme(legend.position = "none") +
  geom_text(aes(label = sprintf("%d%%", Percentage)), vjust = 0, hjust = -0.2, size = 3.5)

And, here’s the data:

Site,Venue,Percentage
Facebook,B2B,72
LinkedIn,B2B,16
Twitter,B2B,12
Facebook,B2C,84
LinkedIn,B2C,1
Twitter,B2C,15

I chose to go with a latticed bar chart as I think it helps show the relative portions within each category (B2B/B2C) and also enables quick comparisons across categories for all three factors.

For those inclined to click, I was interviewed by Fahmida Rashid (@fahmiwrite) over at SourceForge’s HTML5 Center a few weeks ago (right after the elections) due to my tweets on the use of HTML5 tech over Flash. Here’s one of them:

https://twitter.com/hrbrmstr/status/266006111256207361

While a tad inaccurate (one site did use Flash with an HTML fallback and some international sites are still stuck in the 1990s), it is still a good sign of how the modern web is progressing.

I can honestly say I’ve never seen my last name used so many times in one article :-)

Earlier this week, @jayjacobs & I both received our acceptance notice for the talk we submitted to the RSA CFP! [W00t!] Now the hard part: crank out a compelling presentation in the next six weeks! If you’re interested at all in doing more with your security data, this talk is for you. Full track/number & details below:

Session Track: Governance, Risk & Compliance
Session Code: GRC-T18
Scheduled Date: 02/26/2013
Scheduled Time: 2:30 PM – 3:30 PM
Session Length: 1 hr
Session Title: Data Analysis and Visualization for Security Professionals
Session Classification: Intermediate
Session Keywords: metrics, visualization, risk management, research
Short Abstract: You have a deluge of security-related data coming from all directions and may even have a fancy dashboard full of pretty charts. However, unless you know the right questions to ask and how to ask them, all you really have are compliance artifacts. Move beyond the checkbox and learn techniques for collecting, exploring and visualizing the stories within our security data.

UPDATE: As indicated in the code comments, Google took down the cone KML files. I’ll be changing the code to use the NHC archived cone files later tonight.

I will (most likely) not be littering the blog with any more updates to the ‘Sandy’ code unless they are really significant. You can follow along at home with any changes over at github.

Of note (in terms of changes) is the addition of the 5-day forecast cone and some annotations: