Archive for the ‘Information Security’ Category

Moving From system() calls to Rcpp Interfaces

Over on the Data Driven Security Blog there’s a post on how to use Rcpp to interface with an external library (in this case ldns for DNS lookups). It builds on another post which uses system() to make a call to dig to lookup DNS TXT records.

The core code is below and at both the aforementioned blog post and this gist. The post walks you though creating a simple interface and a future post will cover how to build a full package interface to an external library.

Using Twitter as a Data Source For Monitoring Password Dumps

I shot a quick post over at the Data Driven Security blog explaining how to separate Twitter data gathering from R code via the Ruby t (github repo) command. Using t frees R code from having to be a Twitter processor and lets the analyst focus on analysis and visualization, plus you can use t as a substitute for Twitter GUIs if you’d rather play at the command-line:

$ t timeline ddsecblog
   Monitoring Credential Dumps Plus Using Twitter As a Data Source
   Nice intro to R + stats // Data Analysis and Statistical Inference free @datacamp_com course
   Very accessible paper & cool approach to detection // Nazca: Detecting Malware Distribution in
   Large-Scale Networks
   Start of a new series by new contributing blogger @spttnnh! // @AlienVault rep db Longitudinal
   Study Part 1 :

The DDSec post shows how to mine the well-formatted output from the @dumpmon Twitter bot to visualize dump trends over time:

and has the code in-line and over at the DDSec github repo [R].

Data Driven Security Roundup: betaPERT, Shiny, Honeypots, Passwords & Reproducible Research

Jay Jacobs (@jayjacobs)—my co-author of the soon-to-be-released book Data-Driven Security—& I have been hard at work over at the book’s sister-blog cranking out code to help security domain experts delve into the dark art of data science.

We’ve covered quite a bit of ground since January 1st, but I’m using this post to focus more on what we’ve produced using R, since that’s our go-to language.

Jay used the blog to do a long-form answer to a question asked by @dseverski on the SIRA mailing list and I piled on by adding a Shiny app into the mix (both posts make for a pretty #spiffy introduction to expert-opinion risk analyses in R).

Jay continued by releasing a new honeypot data set and corresponding two-part[1,2] post series to jump start analyses on that data. (There’s a D3 geo-visualization stuck in-between those posts if you’re into that sort of thing).

I got it into my head to start a project to build a password dump analytics tool in R (with much more coming soon on that, including a full-on R package + Shiny app combo) and also continue the discussion we started in the book on the need for the infusion of reproducible research principles and practices in the information security domain by building off of @sucuri_security’s Darkleech botnet research.

You can follow along at home with the blog via it’s RSS feed or via the @ddsecblog Twitter account. You can also play along at home if you feel you have something to contribute. It’s as simple as a github pull request and some really straightforward markdown. Take a look the blog’s github repo and hit me up (@hrbrmstr) for details if you’ve got something to share.

Data-Driven Security (The Book) Update #ShamelessSelfPromotion

Data-Driven-SecurityIf I made a Venn diagram of the cross-section of readers of this blog and the Data Driven Security web sites it might be indistinguishable from a pure circle. However, just in case there are a few stragglers out there, I figured one more post on the fact that the new book by @jayjacobs & me is available now in electronic form (not pre-order) wouldn’t hurt. The print book is still making it’s way from dead trees to store shelves and should be ready for the expected February 17th debut.

Here’s the list of links to e-tailers (man, I hate that term) who have it available for the various e-readers out there.

If you happen to catch it out in the wild and not on this list, drop me (@hrbrmstr) a note #pls.

And, a huge thank you! to everyone for their kind accolades yesterday (esp to those who’ve purchased the book :-)

Preparing For The February 2014 Book Launch

Data Driven Security launches in February 2014. @jayjacobs & I have seen half of the book in PDF form so far and it’s almost unbelievable that this journey is almost over.


We setup a live Amazon “sales rank” tracker over at the book’s web site and provided some Python and JavaScript code to show folks how use the AWS API in conjunction with the dygraphs charting library to do the same for any ISBN. In the coming weeks, we’ll have a Google App Engine component you can clone to setup something similar without the need for your own server(s).

ZeroAccess Bots Desperately Seeking Freedom (Visualization)

I’ve been doing a bit of graphing (with real, non-honeypot network data) as part of the research for the book I’m writing with @jayjacobs and thought one of the images was worth sharing (especially since it may not make it into the book :-).

Click image for larger view

This is a static screen capture of a D3 force-directed graph made with R, igraph & Vega of four ZeroAccess infected nodes desperately (each node tried ~200K times over a couple days) trying to break free of a firewall over the course of 11 days. The red nodes are unique destination IPs and purple ones are in the AlienVault IP Reputation database. Jay & I have read and blogged a great deal about ZeroAccess over the past year and finally had the chance to see a live slice of how pervasive (and, noisy) the network is even with just a view from a few infected nodes.

While the above graphic is the composite view of all 11 days, the following one is from just a single day with only two infected nodes trying to communicate out (this is a pure, hastily-crafted R/igraph image):

Two ZeroAccess Infected Nodes
Click image for larger view

There are some common destinations among the two, but each has a large list of unique ones; even the best, open IP reputation database on the planet only included a handful of the malicious endpoints, which means you really need to be looking at holistic behavior modeling vs port/destination alone (I filtered out legit destination traffic for these views) if you’re trying to find egressing badness (but you hopefully already knew that).

Reverse IP Address Lookups With R (From Simple To Bulk/Asynchronous)

R lacks some of the more “utilitarian” features found in other scripting languages that were/are more geared—at least initially—towards systems administration. One of the most frustrating missing pieces for security data scientists is the lack of ability to perform basic IP address manipulations, including reverse DNS resolution (even though it has nsl() which is just glue to gethostbyname()!).

If you need to perform reverse resolution, the only two viable options available are to (a) pre-resolve a list of IP addresses or (b) whip up something in R that takes advantage of the ability to perform system calls. Yes, one could write a C/C++ API library that accesses native resolver routines, but that becomes a pain to maintain across platforms. System calls also create some cross-platform issues, but they are usually easier for the typical R user to overcome.

Assuming the dig command is available on your linux, BSD or Mac OS system, it’s pretty trivial to pass in a list of IP addresses to a simple sapply() one-liner:

resolved = sapply(ips, function(x) system(sprintf("dig -x %s +short",x), intern=TRUE))

That works for fairly small lists of addresses, but doesn’t scale well to hundreds or thousands of addresses. (Also, @jayjacobs kinda hates my one-liners #true.)

A better way is to generate a batch query to dig, but the results will be synchronous, which could take A Very Long Time depending on the size of the list and types of results.

The best way (IMO) to tackle this problem is to perform an asynchronous batch query and post-process the results, which we can do with a little help from adns (which homebrew users can install with a quick “brew install adns“).

Once adns is installed, it’s just a matter of writing out a query list, performing the asynchronous batch lookup, parsing the results and re-integrating with the original IP list (which is necessary since errant or unresponsive reverse queries will not be returned by the adns system call).

#pretend this is A Very Long List of IPs
ip.list = c("", "", "", "", "", 
  "", "", "", "", 
  "", "", "", "", "", 
  "", "", "", "", 
  "", "", "", "", 
  "", "", "", "", 
  "", "", "", "", 
# "ips" is a list of IP addresses <- function(ips) {
  # save out a list of IP addresses in adnshost reverse query format
  # if you're going to be using this in "production", you *might*
  # want to consider using tempfile() #justsayin
  writeLines(laply(ips, function(x) sprintf("-i%s",x)),"/tmp/")
  # call adnshost with the file
  # requires adnshost ::
  system.output <- system("cat /tmp/ | adnshost -f",intern=TRUE)
  # keep our file system tidy
  # clean up the result
  cleaned.result <- gsub("\\.in-addr\\.arpa","",system.output)
  # split the reply
  split.result <- strsplit(cleaned.result," PTR ")
  # make a data frame of the reply
  result.df <- data.frame(, lapply(split.result, rbind)))
  colnames(result.df) <- c("IP","hostname")
  # reverse the octets in the IP address list
  result.df$IP <- sapply(as.character(result.df$IP), function(x) {
    y <- unlist(strsplit(x,"\\."))
  # fill errant lookups with "NA"
  final.result <- merge(ips,result.df,by.x="x",by.y="IP",all.x=TRUE)
  colnames(final.result) = c("IP","hostname")
resolved.df <-
                IP                                   hostname
1                                       <NA>
2                                       <NA>
3                                       <NA>
5                                       <NA>
6                                       <NA>
7                                       <NA>
8                                       <NA>
9                                       <NA>

If you wish to suppress adns error messages and any resultant R warnings, you can add an “ignore.stderr=TRUE” to the system() call and an “options(warn=-1)” to the function itself (remember to get/reset the current value). I kinda like leaving them in, though, as it shows progress is being made.

Whether you end up using a one-liner or the asynchronous function, it would be a spiffy idea to setup a local caching server, such as Unbound, to speed up subsequent queries (because you will undoubtedly have subsequent queries unless your R scripts are perfect on the first go-round).

If you’ve solved the “efficient reverse DNS query problem” a different way in R, drop a note in the comments! I know quite a few folks who’d love to buy you tasty beverage!

You can find similar, handy IP address and other security-oriented R code in our (me & @jayjacobs’) upcoming book on security data analysis and visualization.

IP Intelligence Lookup Chrome Extension

The topic of “IP intelligence” gets a nod in the book that @jayjacobs & I are writing and it was interesting to see just how many sites purport to “know something” about an IP address. I shamelessly admit to being a Chrome user and noticed there were no tools that made it possible to right-click on an IP address and do a simultaneous lookup across a these resources. So, I threw one together (it’s pretty trivial to write a contextMenus extension). It will create a new window and run search queries on the following OSI sources in new tabs:

  • www.sophos.ocm

(I’m kinda partial to the AlienVault IP Reputation database, tho.)

The source is up on github, but—if you’re in an organization that controls which Chrome add-ons you are allowed to use—I also published it to the Chrome Web Store (it’s free) so you can request a review and add by your endpoint management/security team if you find it handy.


I’m definitely open to suggestions/additions/rotten tomatoes being hurled in my direction.

Summer 2013 @GraniteSec Is ON!


What’s missing from that picture? YOU!

Like an aging action hero, @GraniteSec is back in action after an unexpected hiatus. Join us on August 17th for food and fun at the beautiful Fort Foster in Kittery Point, Maine.

The water is chilly, the hiking trails are easy-peasy and you can’t get any better company than the regular attendees of @GraniteSec.

Hit up for all the details and to sign up!

Re-imagining @panda_security’s Q1 2013 Report Pie Charts

We infosec folk eat up industry reports and most of us have no doubt already gobbled up @panda_security’s recently released Q1 2013 Report [PDF]. It’s a good read (so go ahead and read it, we’ll still be here!) and I was really happy to see a nicely stylized chart in the early pages:


However, I quickly became a #sadpanda when I happened across some explosive 3D pie charts later on. Rather than deride, I thought a re-imagining would be a better use of time and let you decide which visualizations both communicate better and are more appealing.

I chose to use @Datawrapper to showcase how easy it is to build and publish pleasing and informative visualizations without even leaving your browser.

Figure 4, Original:

Panda Labs Q1 2013 Report Fig 5 (Orig)

Figure 4, Alternative:

Figure 5, Original

Fig 4: New malware strains In Q1 2013, by Type (orig)

Figure 5, Alternative (horizontal vs vertical, just to mix it up a bit):

If the charts had been closer together in the report, I would have opted for vertical design for both and probably kept malware-type ordering vs sort by highest percentage.

How would you re-imagine the pie charts? Post a link to your creations in the comments and I’ll make sure they show up embedded with the post.

Optimization WordPress Plugins & Solutions by W3 EDGE