
Author Archives: hrbrmstr


In the spirit of the previous example, this one shows you how to do a quick, country-based choropleth in D3/jQuery, with some help from the command line, since not everyone is equipped to kick out some R and most folks I know are very handy at a terminal prompt.

I took the ZeroAccessGeoIPs.csv file and ran it through a quick *nix one-liner to get a JSON-ish associative array of country abbreviations to botnet counts in that country:

cat ZeroAccessGeoIPs.csv | cut -f1,1 -d\,| sort | uniq -c | sort -n | tr "[:upper:]" "[:lower:]" | while read a b ; do echo "{ \"$b\" : \"$a\" }," ; done > botcounts.js
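If the shell isn't your thing, the same per-country tally can be sketched in a few lines of Python (a hypothetical equivalent, not the code I used; the sample rows below are made up, and column one is assumed to be the country code):

```python
from collections import Counter

# a few sample rows in the same shape as ZeroAccessGeoIPs.csv
rows = [
    'CL,"-34.9833","-71.2333"',
    'US,"42.4163","-70.9969"',
    'US,"40.7128","-74.0060"',
]

# count entries per (lowercased) country code, mirroring the one-liner
counts = Counter(line.split(",")[0].lower() for line in rows)

# emit the same JSON-ish associative-array entries, smallest count first
for code, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print('{ "%s" : "%d" },' % (code, n))
```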

I found a suitable SVG world map on Wikipedia that had id="ABBREV" country groupings. This is not the most #spiffy map (and, if you dig a bit more than I did, you’ll probably find a better one) but it works for this example and shows you how quickly you can find the bits you need to get a visualization together.

With that data and the SVG file pasted into the same HTML document, it’s a simple matter of generating a gradient with some d3 magic:

color = d3.scale.log().domain([1,47880]).range(["#FFEB38","#F54747"]);

and, then, looping over the associative array while using said color range to fill in the country shapes:

$(document).ready(function() {
  $.each(botcounts, function(key, value) {
    $('#' + key).css('fill', color(value));
  });
});
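For the curious, the log-scale color mapping d3 is doing here can be approximated in a few lines of stdlib Python (a rough sketch for illustration; d3's actual interpolation differs in the details):

```python
import math

def log_color(value, lo=1, hi=47880, start="#FFEB38", end="#F54747"):
    """Interpolate between two hex colors on a log scale, roughly
    what d3.scale.log().domain([1,47880]).range(...) does here."""
    t = (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))
    t = min(max(t, 0.0), 1.0)  # clamp to the domain
    s = [int(start[i:i+2], 16) for i in (1, 3, 5)]
    e = [int(end[i:i+2], 16) for i in (1, 3, 5)]
    return "#%02X%02X%02X" % tuple(round(a + (b - a) * t) for a, b in zip(s, e))

print(log_color(1))      # bottom of the domain -> "#FFEB38"
print(log_color(47880))  # top of the domain    -> "#F54747"
```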

we get:

You can view the full, larger example on this separate page where you can do a view-source to see the entire code. I really encourage you to do this as you’ll see that there are just a handful of lines of your own code necessary to make a compelling visualization. Sure, you’ll want to add a legend and some other styling, but the basics can be done in – literally – minutes, leaving customized details to your imagination & creativity.

The entire map could have been done in D3, but I only spent about 5 minutes on the entire exercise (including the one-liner) and am still more comfortable in jQuery than I am in D3. I did this also to show that it’s perfectly fine (as Mainers are wont to say) to do pre-processing and hard-coding when cranking out visualizations. The goal is to communicate something to your audience and there are no hard-and-fast rules governing this process. As with any coding, if you think you’ll be doing this again it is a wise idea to make the solution more generic, but there’s nothing wrong with taking valid shortcuts to get results out as quickly as possible.

Definitely feel invited to share your creations in the comments, especially if you find a better map!

Since F-Secure was #spiffy enough to provide us with GeoIP data for mapping the scope of the ZeroAccess botnet, I thought that some aspiring infosec data scientists might want to see how to use something besides Google Maps & Google Earth to view the data.

If you look at the CSV file, it’s formatted as such (this is a small portion…the file is ~140K lines):

CL,"-34.9833","-71.2333"
PT,"38.679","-9.1569"
US,"42.4163","-70.9969"
BR,"-21.8667","-51.8333"

While that’s useful, we don’t need the quotes, and a header would be nice (especially for some of the tools I’ll be showing), so a quick cleanup in vi gives us:

Code,Latitude,Longitude
CL,-34.9833,-71.2333
PT,38.679,-9.1569
US,42.4163,-70.9969
BR,-21.8667,-51.8333
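vi works fine, but if you’d rather script the cleanup, a quick Python pass does the same job (a hedged sketch, not what I actually ran; the csv module drops the unneeded quotes on re-write and we prepend a header):

```python
import csv
import io

# stand-in for the raw file contents (two sample rows)
raw = '''CL,"-34.9833","-71.2333"
PT,"38.679","-9.1569"
'''

reader = csv.reader(io.StringIO(raw))
out = io.StringIO()
writer = csv.writer(out)

# add the header, then re-emit each row; QUOTE_MINIMAL (the default)
# leaves plain numeric fields unquoted
writer.writerow(["Code", "Latitude", "Longitude"])
for row in reader:
    writer.writerow(row)

print(out.getvalue())
```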

With just this information, we can see how much of the United States is covered in ZeroAccess with just a few lines of R:

# read in the csv file
bots = read.csv("ZeroAccessGeoIPs.csv")
 
# load the maps library
library(maps)
 
# draw the US outline in black and state boundaries in gray
map("state", interior = FALSE)
map("state", boundary = FALSE, col="gray", add = TRUE)
 
# plot the latitude & longitudes with a small dot
points(x=bots$Longitude,y=bots$Latitude,col='red',cex=0.25)

Can you pwn me now?

Click for larger map

If you want to see how bad your state is, it’s just as simple. Using my state (Maine) it’s just a matter of swapping out the map statements with more specific data:

bots = read.csv("ZeroAccessGeoIPs.csv")
library(maps)
 
# draw Maine state boundary in black and counties in gray
map("state","maine",interior=FALSE)
map("county","maine",boundary=FALSE,col="gray",add=TRUE)
 
points(x=bots$Longitude,y=bots$Latitude,col='red',cex=0.25)

We’re either really tech/security-savvy or don’t do much computin’ up here

Click for larger map

Because of the way the maps library handles geo-plotting, there are points outside the actual map boundaries.

You can even get a quick and dirty geo-heatmap without too much trouble:

bots = read.csv("ZeroAccessGeoIPs.csv")
 
# load the ggplot2 library
library(ggplot2)
 
# create an plot object for the heatmap
zeroheat <- qplot(x=bots$Longitude, y=bots$Latitude, data=bots,
                  xlab="Longitude", ylab="Latitude",
                  main="ZeroAccess Botnet", geom="blank") +
  stat_bin2d(bins=300, aes(fill=log1p(..count..)))
 
# display the heatmap
zeroheat


Click for larger map

Try playing around with the bins parameter to see how it impacts the plot; stat_bin2d(…) divides the “map” into “buckets” (or bins), and the count in each bin informs the plot how to color-code the output.
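To make the binning idea concrete, here’s a toy, stdlib-only Python sketch of the kind of bucketing stat_bin2d() performs (the real ggplot2 implementation differs in the details; points and bin count here are made up):

```python
import math
from collections import Counter

def bin2d(points, bins, xr=(-180.0, 180.0), yr=(-90.0, 90.0)):
    """Drop each (lon, lat) point into one of bins x bins rectangular
    buckets and count occupancy, then log1p-scale the counts so a few
    dense cells don't wash out the rest of the color range."""
    counts = Counter()
    for x, y in points:
        i = min(int((x - xr[0]) / (xr[1] - xr[0]) * bins), bins - 1)
        j = min(int((y - yr[0]) / (yr[1] - yr[0]) * bins), bins - 1)
        counts[(i, j)] += 1
    return {cell: math.log1p(n) for cell, n in counts.items()}

# three sample points: one in Chile, two close together in Massachusetts
heat = bin2d([(-71.2, -34.9), (-70.9, 42.4), (-70.99, 42.41)], bins=300)
```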

If you were to pre-process the data a bit, or craft some ugly R code, a more traditional choropleth can easily be created as well. The interesting part about using a non-boundaried plot is that this ZeroAccess network almost defines every continent for us (which is kinda scary).

That’s just a taste of what you can do with just a few, simple lines of R. If I have some time, I’ll toss up some examples in Python as well. Definitely drop a note in the comments if you put together some #spiffy visualizations with the data they provided.

@mroesch asked how to find the “Downloads” folder for the Mac App Store. Apple changed this from the time I posted the one and only app I ever did to MAS (a while ago). So, here’s how to do it now.

First, open up iTerm or Terminal and enter:

defaults write com.apple.appstore ShowDebugMenu -bool true

(don’t forget to hit return)

[Re]Start the MAS app and you should now see a Debug menu with the option to show the MAS Downloads folder. There are many other interesting actions/options there as well.

Businessweek’s bleeding-edge approach to typography, layout and overall design is one of the features that keeps me reading the magazine in print form. The design team also often delves into experiments with data visualization and short-form infographics, and the most recent issue (Sept 3, 2012) is no exception. Given my proclivity towards slopegraphs, I felt compelled to comment on their “U.S. Stocks Lead the World” slopegraph:

I think they did a fine job combining some of the aspects of a bubble chart with a rank-order slopegraph. Normally, annotation would be necessary as most slopegraphs are comparing values instead of position; however there is sufficient labeling and consistent use of sizing, colors and other visualization hints to overcome most — if not all — of the problems usually found when using a slopegraph.

Kudos to both Lu Wang and Rita Nazareth!

Dan Kaminsky (@dakami) tweeted a cool, small (11-instruction) TRNG called Jytter on Friday:

The authors have Windows & Linux ports but no OS X port (and, I play mostly on OS X when not in virtual environments). So, I threw together a quick port of it to OS X. It should work on Snow Leopard or Lion, but YMMV.

You need to install nasm via a quick brew install then copy linux32_build.sh to osx_build.sh and change the nasm and gcc lines to be:

nasm -D_32_ -O0 -fmacho -ojytter.o jytter.asm
nasm -D_32_ -O0 -fmacho -otimestamp.o timestamp.asm
gcc -D_32_ -m32 jytter.o timestamp.o -o demo -O2 demo.c

I ran the resultant demo binary and got:

True random integers:
 
64-bit: 10DE4A7EA676A869
128-bit: E2B9F86CADC854B540090E125A7C7611
256-bit: 7F3AC590F6EE2AC13F136B802BEBCC8323CB26665BC354CDAC488ED86E153641
 
True random passwords:
 
66-bit: OEqQaY8UQeO
132-bit: Gwi9DCMtFy7XzHWHII37Hj
258-bit: TPzqJfLL84Mjq3VZXpQDW0.WhWSFq2HA9X6FL7GSjaX
 
Execution time in CPU ticks:
 
000000004397D59A

which tracks with the output I received from the demo program on one of my non-VPS Linux nodes (obviously not the same values).

Russell Leidich (the author) did some really impressive work here. I did virtually nothing (just enabled playing with it on OS X). The posts at the Jytter site are well worth the time spent absorbing.

Thanks to a nice call-out post link on Flowing Data in my RSS feeds this morning, I found Naomi Robbins’ Effective Graphs Forbes blog, perused the archives a bit and came across her post on arrow charts.

She presented a nice comparison between (ugh) pie charts, arrow charts and slopegraphs. Sadly, both the NPR slopegraph and Peltier’s slopegraph included in the article committed some of the cardinal sins of slopegraphs I have pointed out previously. Let’s take a look (click on each graphic to make them bigger):


  • Use of binning/rounding without annotation
  • Use of binning/rounding but not to show rate of change
  • Stacking labels (presenting rank where none exists)

More faithful representations would be explicit rounding/binning (to only show rate of change):

or the full scale version (warning: huge slopegraph) to accurately show the value differences and rate of change:

The data set is small, so transcription is not really an issue, but here it is for you if you want to play with it some more.

This is definitely a case where her arrow charts are a solid alternative to slopegraphs, so definitely check out her post.

You may not be aware of the fact that the #spiffy Verizon Biz folk have some VERIS open source components, one of which is the XML schema for the “Vocabulary for Event Recording and Incident Sharing”.

While most Java backends will readily slurp up and spit back archaic XML data, the modern web is a JSON world, and I wanted to take a stab at encoding the sample incident in JSON format since I’m pretty convinced this type of data is a NoSQL candidate and that JSON is the future.

I didn’t run this past the VZB folk prior to the post, but I think I got it right (well, it validates, at least :-) :

{
  "VERIS_community": {
    "incident": {
      "incident_uid": "String",
      "handler_id": "String",
      "security_compromise": "String",
      "related_incidents": { "related_incident_id": "String" },
      "summary": "String",
      "notes": "String",
      "victim": {
        "victim_id": "String",
        "industry": "000",
        "employee_count": "25,001 to 50,000",
        "location": {
          "country": "String",
          "region": "String"
        },
        "revenue": {
          "amount": "0",
          "iso_currency_code": "USD"
        },
        "security_budget": {
          "amount": "0",
          "iso_currency_code": "USD"
        },
        "notes": "String"
      },
      "agent": [
        {
          "motive": "String",
          "role": "String",
          "notes": "String"
        },
        {
          "type": "External",
          "motive": "String",
          "role": "String",
          "notes": "String",
          "external_variety": "String",
          "origins": {
            "origin": {
              "country": "String",
              "region": "String"
            }
          },
          "ips": { "ip": "String" }
        },
        {
          "type": "Internal",
          "motive": "String",
          "role": "String",
          "notes": "String",
          "internal_variety": "String"
        },
        {
          "type": "Partner",
          "motive": "String",
          "role": "String",
          "notes": "String",
          "industry": "0000",
          "origins": {
            "origin": {
              "country": "String",
              "region": "String"
            }
          }
        }
      ],
      "action": [
        { "notes": "Some notes about a generic action." },
        {
          "type": "Malware",
          "notes": "String",
          "malware_function": "String",
          "malware_vector": "String",
          "cves": { "cve": "String" },
          "names": { "name": "String" },
          "filenames": { "filename": "String" },
          "hash_values": { "hash_value": "String" },
          "outbound_IPs": { "outbound_IP": "String" },
          "outbound_URLs": { "outbound_URL": "String" }
        },
        {
          "type": "Hacking",
          "notes": "String",
          "hacking_method": "String",
          "hacking_vector": "String",
          "cves": { "cve": "String" }
        },
        {
          "type": "Social",
          "notes": "String",
          "social_tactic": "String",
          "social_channel": "String",
          "email": {
            "addresses": { "address": "String" },
            "subject_lines": { "subject_line": "String" },
            "urls": { "url": "String" }
          }
        },
        {
          "type": "Misuse",
          "notes": "Notes for a misuse action.",
          "misuse_variety": "String",
          "misuse_venue": "String"
        },
        {
          "type": "Physical",
          "notes": "Notes for a physical action.",
          "physical_variety": "String",
          "physical_location": "String",
          "physical_access": "String"
        },
        {
          "type": "Error",
          "notes": "Notes for an error action.",
          "error_variety": "String",
          "error_reason": "String"
        },
        {
          "type": "Environmental",
          "notes": "Notes for an environmental action.",
          "environmental_variety": "String"
        }
      ],
      "assets": {
        "asset_variety": "String",
        "asset_ownership": "String",
        "asset_hosting": "String",
        "asset_management": "String",
        "os": "String",
        "notes": "String"
      },
      "attribute": [
        { "notes": "String" },
        {
          "type": "ConfidentialityPossession",
          "notes": "String",
          "data_disclosure": "String",
          "data": {
            "data_variety": "String",
            "amount": "0"
          },
          "data_state": "String"
        },
        {
          "type": "AvailabilityUtility",
          "notes": "String",
          "availability_utility_variety": "String",
          "availability_utility_duration": "String"
        }
      ],
      "timeline": {
        "timestamp_first_known_action": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_data_exfiltration": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_incident_discovery": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_containment": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_initial_compromise": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_investigation": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        }
      },
      "discovery_method": "String",
      "control_failure": "String",
      "corrective_action": "String",
      "loss": {
        "loss_variety": "String",
        "loss_amount": {
          "amount": "0",
          "iso_currency_code": "USD"
        }
      },
      "impact_rating": "String",
      "impact_estimate": {
        "amount": "0",
        "iso_currency_code": "USD"
      },
      "certainty": "String"
    }
  }
}

I believe I’d advocate for the “timestamps” to be more timestamp-y in the JSON version (the dashes do not make much sense to me even in the XML version) and for any fields with min/max range values to be separated into actual min & max fields. I’m going to try to find some cycles to mock up a MongoDB / Node.js sample to show how this JSON format would work. At a minimum, even a rough conversion from XML to JSON when requested by a browser would make it easier for client-side data rendering/manipulation.
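To show what “more timestamp-y” might look like, here’s a small Python sketch (a hypothetical helper, not part of the schema) that folds the sample incident’s year/“--MM”/“---DD”/time fields into a single ISO 8601 string:

```python
from datetime import datetime, timezone

def veris_timestamp(ts):
    """Fold the schema's separate year / "--MM" / "---DD" / time fields
    into one ISO 8601 timestamp; field names follow the sample incident."""
    return datetime(
        int(ts["year"]),
        int(ts["month"].lstrip("-")),   # "--12"  -> 12
        int(ts["day"].lstrip("-")),     # "---17" -> 17
        *map(int, ts["time"].rstrip("Z").split(".")[0].split(":")),
        tzinfo=timezone.utc,
    ).isoformat()

print(veris_timestamp(
    {"year": "2001", "month": "--12", "day": "---17", "time": "14:20:00.0Z"}
))  # -> 2001-12-17T14:20:00+00:00
```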

If you’re not thinking about using VERIS for documenting incidents or hounding your vendors to enable easier support for it, you should be. If you’re skittish about recording incidents anonymously into the VERIS Community, you should get over it (barring capacity constraints).

In @jayjacobs’ latest post on SSH honeypot password analysis he shows some spiffy visualizations from crunching the data with Tableau. While I’ve joked with him and called them “robocharts”, the reality is that Tableau does let you work on visualizing the answers to questions quickly without having to go into “code mode” (and that doesn’t make it wrong).

I’ve been using Jay’s honeypot data for both attack analysis as well as an excuse to compare data crunching and visualization tools (so far I’ve poked at it with R and python) in an effort to see what tools are good for exploring various types of questions.

A question that came to mind recently was “Hmmm…I wonder if there is a pattern to the timings of probes/attacks?” and I posited that a time-series view across the days would help illustrate that. To that end, I came up with the idea of breaking the attacks into one-hour chunks and building a day-stacked heatmap which could be filtered by country. Something like this:

I’ve been wanting to play with D3 and exploring this concept with it seemed to be a good fit.

Given that working with the real data would entail loading a ~4MB file every time someone viewed this blog post, I put the working example in a separate page where you can do a “view source” to see the code. Without the added complexity of a popup selector and loading spinner, the core code is about 50 lines, much of which could be condensed even further since it’s just chaining calls in javascript. I cheated a bit and used jQuery, too, plus made some of it dependent on WebKit (the legend may look weird in Firefox) due to time constraints.

The library is wicked simple to grok and makes it easy to come up with new ways to look at data (as you can see from the examples gallery on the D3 site).

Unfortunately, no real patterns emerged, but I’m going to take a stab at taking the timestamps (which reflect the time at the destination of the attack) and aligning them to the origin to see if that makes a difference in the view. If that turns up anything interesting, I’ll make another quick post on it.
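The hour-chunk binning (plus the origin-alignment idea) can be sketched in Python along these lines (a hypothetical helper; the timestamps and the fixed offset below are made up, and a real pass would derive per-attack offsets from the GeoIP data):

```python
from collections import Counter
from datetime import datetime, timedelta

def day_hour_buckets(timestamps, utc_offset_hours=0):
    """Bucket attack timestamps into (date, hour-of-day) cells for a
    day-stacked heatmap; the offset shifts destination-clock times
    toward the attack origin's local time."""
    counts = Counter()
    for ts in timestamps:
        local = ts + timedelta(hours=utc_offset_hours)
        counts[(local.date().isoformat(), local.hour)] += 1
    return counts

# two late-night probes at the destination, shifted 3 hours toward origin
hits = [datetime(2012, 9, 3, 23, 15), datetime(2012, 9, 3, 23, 45)]
buckets = day_hour_buckets(hits, utc_offset_hours=3)
```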

Given that much of data analysis (“big” or otherwise) is domain-knowledgeable folks asking interesting questions, are there any folks out there who have questions they’d like to see explored with this data set?