
Category Archives: Javascript

Data Driven Security launches in February 2014. @jayjacobs & I have seen half of the book in PDF form so far and it’s hard to believe the journey is almost over.

[Screenshot: Data Driven Security — Amazon Sales Rank Tracker]

We set up a live Amazon “sales rank” tracker over at the book’s web site and provided some Python and JavaScript code to show folks how to use the AWS API in conjunction with the dygraphs charting library to do the same for any ISBN. In the coming weeks, we’ll have a Google App Engine component you can clone to set up something similar without the need for your own server(s).

I was helping a friend out who wanted to build a word cloud from the text in Google Groups posts. If you’ve made any efforts to try to get content out of Google Groups you know that the only way to do so is to ensure you subscribe to the group posts via e-mail, then extract all those messages. If you don’t e-mail subscribe to a group, there really is no way to create an archive of the content.

After hacking around a bit and failing, I pulled up the mobile version of the group. You can do that for any Google Group by using the following URL and filling in GROUPNAME for the group you’re interested in: https://groups.google.com/forum/m/#!topic/GROUPNAME.

[Screenshot: mobile Google Groups view]

Then, you’ll need to navigate to a thread, use the double-down arrow to expand all the items in the thread, open the browser’s element inspector on one of the posts and look for <div dir="ltr">. If that surrounds the post, the following hack will work. Google only seems to add this left-to-right attribute on newer groups, so if you have an older group you need to work with, you’ll need to figure out a different selector (more on that in a bit).

With all of the posts expanded, paste the following code into the JavaScript console:

var nl = document.querySelectorAll('[dir=ltr]');
var s = "";
for (var i = 0; i < nl.length; i++) {
  s = s + nl[i].textContent + "<br/><br/>";
}
var nw = window.open();
var nd = nw.document;
nd.write(s);
nd.close();

and hit return (I have it spaced out in the code above just for clarity; it will all fit on one line, which makes it easier to execute in the console).

[Screenshot: new window containing the extracted post text]

You should get a new browser window (so you may need to temporarily enable popups on Google Groups for this to work) with the text of all the posts in it. I only put the double <br/> tags in there for the purposes of this example. I just needed the raw text, but you can mark up the posts any way you’d like.

You can tweak this hack in many ways to pull as much post metadata as you need, since it’s all wrapped in heavily marked-up <div>s, and the base technique should work in a GreaseMonkey or TamperMonkey userscript for those of you with time to code one up.
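If you do go the userscript route, the core of the console hack factors neatly into a small helper. This is just a sketch: `collectPosts` is a made-up name, and it still assumes the `[dir=ltr]` selector discussed above matches your group’s posts.

```javascript
// Join the text of all expanded post nodes into one string.
// collectPosts is a hypothetical helper name; the separator is arbitrary.
function collectPosts(nodes, sep) {
  var parts = [];
  for (var i = 0; i < nodes.length; i++) {
    parts.push(nodes[i].textContent);
  }
  return parts.join(sep);
}

// In a TamperMonkey/GreaseMonkey userscript body you would then do:
// var text = collectPosts(document.querySelectorAll('[dir=ltr]'), "<br/><br/>");
// var w = window.open();
// w.document.write(text);
// w.document.close();
```

Keeping the extraction logic separate from the DOM/window plumbing also makes it trivial to swap in a different selector for older groups.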

This hack only lessens the tedium a small amount. You still need to go topic by topic in the group if you want all the content. There’s probably a way to get that navigation automation coded into the script as well. Thankfully, I didn’t need to do that this time around.

If you have other ways to free Google Groups content, drop a note in the comments.

NOTE: Parts [2], [3] & [4] are also now up.

Inspired by a post by @bfist who created the following word cloud in Ruby from VZ RISK INTSUM posts (visit the link or select the visualization to go to the post):

[Image: VZ RISK INTSUM word cloud by @bfist — links to post]

I ♥ word clouds as much as anyone and usually run Presidential proclamations & SOTU addresses through a word cloud generator just to see what the current year’s foci are.

However, word clouds rarely convey what the creator of the visualization intends. Without performing more rigorous corpus analysis, one is really just getting a font-based frequency counter. While pretty, it’s not a good idea to derive too much meaning from a simple frequency count, since there are phrase and sentence-structure components to consider, as well as concepts such as stemming (e.g. “risks” and “risk” are most likely the same term; one is just the plural. That’s a simplistic example, but it illustrates the point).
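To make the stemming point concrete, here’s a deliberately naive suffix-stripper. This is a toy, not a real algorithm like the Porter stemmer, and `naiveStem` is a made-up name for illustration only:

```javascript
// Toy stemmer: strip a trailing "s" unless the word ends in "ss".
// Real stemmers (e.g. the Porter stemmer) handle far more morphology.
function naiveStem(word) {
  if (word.length > 2 && word.slice(-1) === "s" && word.slice(-2) !== "ss") {
    return word.slice(0, -1);
  }
  return word;
}

// naiveStem("risks") → "risk", so "risks" and "risk" count as one term;
// naiveStem("class") → "class" (no change).
```

Even this crude folding changes a frequency count noticeably, which is why raw word clouds can mislead.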

I really liked Jim Vallandingham’s Building a Bubble Cloud walkthrough on how he made a version of @mbostock’s NYTimes convention word counts, so I decided to run a rudimentary stemmer over the VZ RISK INTSUM corpus along with a principal component analysis [PDF] to find the core text influencers, feeding the output to a modified version of the bubble cloud:

[Screenshot: bubble cloud of VZ RISK INTSUM terms]

You can select the graphic to go to the “interactive”/larger version of it. I had intended to make selecting a circle bring up the relevant documents from the post corpus, but that will have to be a task for another day.

It’s noteworthy that both @bfist’s work and this modified version share many of the same core “important” words. With some stemming refinement and further stopword removal (e.g. “week” was in the original run of this visualization and is of no value for this risk-oriented visualization, so I made it part of the stopword list), this could be a really good way to get an overview of what the risky year was all about.

I won’t promise anything, but I’ll try to get the R code cleaned up enough to post. It’s really basic tm & PCA work, so no rocket-science degree is required. Fork @vlandham’s github repo & follow the aforelinked tutorial for the crunchy D3-goodness bits.

For those inclined to click, I was interviewed by Fahmida Rashid (@fahmiwrite) over at SourceForge’s HTML5 center a few weeks ago (right after the elections), due to my tweets on the use of HTML5 tech over Flash. Here’s one of them:

https://twitter.com/hrbrmstr/status/266006111256207361

While a tad inaccurate (one site did use Flash with an HTML fallback and some international sites are still stuck in the 1990s), it is still a good sign of how the modern web is progressing.

I can honestly say I’ve never seen my last name used so many times in one article :-)

In the spirit of the previous example, this one shows you how to build a quick, country-based choropleth in D3/jQuery with some help from the command line, since not everyone is equipped to kick out some R, and most folks I know are very handy at a terminal prompt.

I took the ZeroAccessGeoIPs.csv file and ran it through a quick *nix one-liner to get a JSON-ish associative array of country abbreviations to botnet counts in that country:

cut -f1 -d, ZeroAccessGeoIPs.csv | sort | uniq -c | sort -n | tr "[:upper:]" "[:lower:]" | while read a b ; do echo "{ \"$b\" : \"$a\" }," ; done > botcounts.js
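If you would rather stay in JavaScript, the same counting can be sketched in Node, producing one valid JSON object instead of JSON-ish fragments. `countByCountry` is a hypothetical helper, and it assumes simple CSV rows with the country code in the first field (no quoted commas):

```javascript
// Build a { country: count } tally from CSV lines where the first
// field is a country abbreviation. Rough equivalent of the one-liner.
function countByCountry(lines) {
  var counts = {};
  lines.forEach(function (line) {
    var cc = line.split(",")[0].trim().toLowerCase();
    if (cc) counts[cc] = (counts[cc] || 0) + 1;
  });
  return counts;
}

// Usage (Node):
// var fs = require("fs");
// var lines = fs.readFileSync("ZeroAccessGeoIPs.csv", "utf8").trim().split("\n");
// fs.writeFileSync("botcounts.js",
//   "var botcounts = " + JSON.stringify(countByCountry(lines)) + ";");
```

The JSON.stringify output drops straight into the page as a real associative array, no hand-assembled braces required.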

I found a suitable SVG world map on Wikipedia that had id="ABBREV" country groupings. This is not the most #spiffy map (and, if you dig a bit more than I did, you’ll probably find a better one) but it works for this example and shows you how quickly you can find the bits you need to get a visualization together.

With that data and the SVG file pasted into the same HTML document, it’s a simple matter of generating a gradient with some d3 magic:

color = d3.scale.log().domain([1,47880]).range(["#FFEB38","#F54747"]);

and, then, looping over the associative array while using said color range to fill in the country shapes:

$.each(botcounts, function(key, value) {
  $('#' + key).css('fill', color(value));
});

we get:

You can view the full, larger example on this separate page where you can do a view-source to see the entire code. I really encourage you to do this as you’ll see that there are just a handful of lines of your own code necessary to make a compelling visualization. Sure, you’ll want to add a legend and some other styling, but the basics can be done in – literally – minutes, leaving customized details to your imagination & creativity.

The entire map could have been done in D3, but I only spent about 5 minutes on the entire exercise (including the one-liner) and am still more comfortable in jQuery than I am in D3. I did this also to show that it’s perfectly fine (as Mainers are wont to say) to do pre-processing and hard-coding when cranking out visualizations. The goal is to communicate something to your audience and there are no hard-and-fast rules governing this process. As with any coding, if you think you’ll be doing this again it is a wise idea to make the solution more generic, but there’s nothing wrong with taking valid shortcuts to get results out as quickly as possible.

Definitely feel invited to share your creations in the comments, especially if you find a better map!

You may not be aware of the fact that the #spiffy Verizon Biz folk have some VERIS open source components, one of which is the XML schema for the “Vocabulary for Event Recording and Incident Sharing”.

While most Java backends will readily slurp up and spit back archaic XML data, the modern web is a JSON world, and I wanted to take a stab at encoding the sample incident in JSON format, since I’m pretty convinced this type of data is a NoSQL candidate and that JSON is the future.

I didn’t run this past the VZB folk prior to the post, but I think I got it right (well, it validates, at least :-) :

{
  "VERIS_community": {
    "incident": {
      "incident_uid": "String",
      "handler_id": "String",
      "security_compromise": "String",
      "related_incidents": { "related_incident_id": "String" },
      "summary": "String",
      "notes": "String",
      "victim": {
        "victim_id": "String",
        "industry": "000",
        "employee_count": "25,001 to 50,000",
        "location": {
          "country": "String",
          "region": "String"
        },
        "revenue": {
          "amount": "0",
          "iso_currency_code": "USD"
        },
        "security_budget": {
          "amount": "0",
          "iso_currency_code": "USD"
        },
        "notes": "String"
      },
      "agent": [
        {
          "motive": "String",
          "role": "String",
          "notes": "String"
        },
        {
          "type": "External",
          "motive": "String",
          "role": "String",
          "notes": "String",
          "external_variety": "String",
          "origins": {
            "origin": {
              "country": "String",
              "region": "String"
            }
          },
          "ips": { "ip": "String" }
        },
        {
          "type": "Internal",
          "motive": "String",
          "role": "String",
          "notes": "String",
          "internal_variety": "String"
        },
        {
          "type": "Partner",
          "motive": "String",
          "role": "String",
          "notes": "String",
          "industry": "0000",
          "origins": {
            "origin": {
              "country": "String",
              "region": "String"
            }
          }
        }
      ],
      "action": [
        { "notes": "Some notes about a generic action." },
        {
          "type": "Malware",
          "notes": "String",
          "malware_function": "String",
          "malware_vector": "String",
          "cves": { "cve": "String" },
          "names": { "name": "String" },
          "filenames": { "filename": "String" },
          "hash_values": { "hash_value": "String" },
          "outbound_IPs": { "outbound_IP": "String" },
          "outbound_URLs": { "outbound_URL": "String" }
        },
        {
          "type": "Hacking",
          "notes": "String",
          "hacking_method": "String",
          "hacking_vector": "String",
          "cves": { "cve": "String" }
        },
        {
          "type": "Social",
          "notes": "String",
          "social_tactic": "String",
          "social_channel": "String",
          "email": {
            "addresses": { "address": "String" },
            "subject_lines": { "subject_line": "String" },
            "urls": { "url": "String" }
          }
        },
        {
          "type": "Misuse",
          "notes": "Notes for a misuse action.",
          "misuse_variety": "String",
          "misuse_venue": "String"
        },
        {
          "type": "Physical",
          "notes": "Notes for a physical action.",
          "physical_variety": "String",
          "physical_location": "String",
          "physical_access": "String"
        },
        {
          "type": "Error",
          "notes": "Notes for an error action.",
          "error_variety": "String",
          "error_reason": "String"
        },
        {
          "type": "Environmental",
          "notes": "Notes for an environmental action.",
          "environmental_variety": "String"
        }
      ],
      "assets": {
        "asset_variety": "String",
        "asset_ownership": "String",
        "asset_hosting": "String",
        "asset_management": "String",
        "os": "String",
        "notes": "String"
      },
      "attribute": [
        { "notes": "String" },
        {
          "type": "ConfidentialityPossession",
          "notes": "String",
          "data_disclosure": "String",
          "data": {
            "data_variety": "String",
            "amount": "0"
          },
          "data_state": "String"
        },
        {
          "type": "AvailabilityUtility",
          "notes": "String",
          "availability_utility_variety": "String",
          "availability_utility_duration": "String"
        }
      ],
      "timeline": {
        "timestamp_first_known_action": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_data_exfiltration": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_incident_discovery": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_containment": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_initial_compromise": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        },
        "timestamp_investigation": {
          "year": "2001",
          "month": "--12",
          "day": "---17",
          "time": "14:20:00.0Z"
        }
      },
      "discovery_method": "String",
      "control_failure": "String",
      "corrective_action": "String",
      "loss": {
        "loss_variety": "String",
        "loss_amount": {
          "amount": "0",
          "iso_currency_code": "USD"
        }
      },
      "impact_rating": "String",
      "impact_estimate": {
        "amount": "0",
        "iso_currency_code": "USD"
      },
      "certainty": "String"
    }
  }
}

I believe I’d advocate for the “timestamps” to be more timestamp-y in the JSON version (the dashes do not make much sense to me even in the XML version) and for any fields with min/max range values to be separated into actual min & max fields. I’m going to try to find some cycles to mock up a MongoDB / Node.js sample to show how this JSON format would work. At a minimum, even a rough conversion from XML to JSON when requested by a browser would make it easier for client-side data rendering/manipulation.
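As a sketch of what “more timestamp-y” could look like, the split year/month/day/time fields (with their XML Schema-style `--`/`---` prefixes) fold into a single ISO 8601 string easily. `verisToISO` is a hypothetical helper name, not part of VERIS:

```javascript
// Fold a VERIS timeline entry { year, month, day, time } into one
// ISO 8601 timestamp, stripping the XML Schema gMonth/gDay dashes.
function verisToISO(ts) {
  var month = ts.month.replace(/^-+/, "");
  var day = ts.day.replace(/^-+/, "");
  return ts.year + "-" + month + "-" + day + "T" + ts.time;
}

// verisToISO({ year: "2001", month: "--12", day: "---17", time: "14:20:00.0Z" })
// → "2001-12-17T14:20:00.0Z"
```

A single sortable string like that would also index more naturally in a document store such as MongoDB.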

If you’re not thinking about using VERIS for documenting incidents or hounding your vendors to enable easier support for it, you should be. If you’re skittish about recording incidents anonymously into the VERIS Community, you should get over it (barring capacity constraints).

[@hrbrmstr starts working in javascript again]
The Internets: What do you think?
@hrbrmstr: It’s vile.
The Internets: I know. It’s so bubbly and cloying and happy.
@hrbrmstr: Just like the Federation.
The Internets: And you know what’s really frightening? If you develop with it enough, you begin to like it.
@hrbrmstr: It’s insidious.
The Internets: Just like the Federation.

(With apologies to ST:DS9)

UPDATE: It seems my use of <script async> optimization for Raphaël busted the inline slopegraph generation. Will work on tweaking the example posts to wait for Raphaël to load when I get some time.

So, I had to alter this to start after a user interaction. It loaded fine as a static, local page but seems to get a bit wonky embedded in a complex page. I also see some artifacts in Chrome but not in Safari. Still, not a bad foray into basic animation.

Animate Slopegraph