Skip navigation

Category Archives: data driven security

I shot a quick post over at the [Data Driven Security blog](http://bit.ly/1hyqJiT) explaining how to separate Twitter data gathering from R code via the Ruby `t` ([github repo](https://github.com/sferik/t)) command. Using `t` frees R code from having to be a Twitter processor and lets the analyst focus on analysis and visualization, plus you can use `t` as a substitute for Twitter GUIs if you’d rather play at the command-line:

$ t timeline ddsecblog
   @DDSecBlog
   Monitoring Credential Dumps Plus Using Twitter As a Data Source http://t.co/ThYbjRI9Za
 
   @DDSecBlog
   Nice intro to R + stats // Data Analysis and Statistical Inference free @datacamp_com course
   http://t.co/FC44FF9DSp
 
   @DDSecBlog
   Very accessible paper & cool approach to detection // Nazca: Detecting Malware Distribution in
   Large-Scale Networks http://t.co/fqrSaFvUK2
 
   @DDSecBlog
   Start of a new series by new contributing blogger @spttnnh! // @AlienVault rep db Longitudinal
   Study Part 1 : http://t.co/XM7m4zP0tr
 
   ...

The DDSec post shows how to mine the well-formatted output from the @dumpmon Twitter bot to visualize dump trends over time:

and has the code in-line and over at the [DDSec github repo](https://github.com/ddsbook/blog/blob/master/extra/src/R/dumpmon.R) [R].

I’m posting this mostly to show how to:

– use the Google spreadsheet data-munging “hack” from the [previous post](http://rud.is/b/2014/02/11/live-google-spreadsheet-for-keeping-track-of-sochi-medals/) in a Shiny context
– include it seamlessly into a web page, and
– run it locally without a great deal of wrangling

The code for the app is [in this gist](https://gist.github.com/hrbrmstr/8949172). It is unsurprisingly just like [some spiffy other code](http://www.r-bloggers.com/winter-olympic-medal-standings-presented-by-r/) you’ve seen apart from my aesthetic choices (Sochi blue! lines+dots! and, current rankings next to country names).

I won’t regurgitate the code here since it’s just as easy to view on [github](https://gist.github.com/hrbrmstr/8949172). You’re seeing the live results of the app below (unless you’ve been more conservative than most folks with your browser security settings),

but the app is actually hosted over at [Data Driven Security](http://shiny.dds.ec/sochi2014/), a blog and (woefully underpowered so reload if it coughs up blood, pls) Shiny server that I run with @jayjacobs. It appears in this WordPress post with the help of an `IFRAME`. It’s essentially the same technique the RStudio/Shiny folks use in many of their own examples.

The app uses [bootstrapPage()](http://www.rdocumentation.org/packages/shiny/functions/bootstrapPage) to help make a more responsive layout which will react nicely in an `IFRAME` setting (since you won’t know the width of the browser area you’re trying to fit the Shiny output into).

In the `ui.R` file, I have the [plotOutput()](http://www.rdocumentation.org/packages/shiny/functions/plotOutput) configured to scale to 100% of container width:

plotOutput("medalsPlot", width="100%")

and then create a seamless `IFRAME` that also sizes to max-width:

<iframe src="http://shiny.dds.ec/sochi2014/" 
        style="max-width:100%" 
        width="100%"
        height="500px"
        scrolling="no" 
        frameborder="no" 
        seamless="seamless">
</iframe>

The *really cool* part (IMO) about many Shiny apps is that you don’t need to rely on the external server to work with the visualization/output. Provided that:

– the authors have coded their app to support local execution…
– and presented the necessary `ui.R`, `server.R`, `global.R`, HTML/CSS & data files either as a github gist or a zip/gz/tar.gz file…
– and **you** have the necessary libraries installed

then, you can start the app with a simple [Rscript](http://www.rdocumentation.org/packages/utils/functions/Rscript) one-liner:

Rscript -e "shiny::runGist(8949172, launch.browser=TRUE)"

or

Rscript -e "shiny::runUrl('http://dds.ec/apps/sochi2014.tar.gz', launch.browser=TRUE)"

There is *some* danger doing this if you haven’t read through the R code prior, since it’s possible to stick some fairly malicious operations in an R script (hey, I’m an infosec professional, so we’re always paranoid :-). But, if you stick with using a gist and do examine the code, you should be fine.

Jay Jacobs (@jayjacobs)—my co-author of the soon-to-be-released book [Data-Driven Security](http://amzn.to/ddsec)—& I have been hard at work over at the book’s [sister-blog](http://dds.ec/blog) cranking out code to help security domain experts delve into the dark art of data science.

We’ve covered quite a bit of ground since January 1st, but I’m using this post to focus more on what we’ve produced using R, since that’s our go-to language.

Jay used the blog to do a [long-form answer](http://datadrivensecurity.info/blog/posts/2014/Jan/severski/) to a question asked by @dseverski on the [SIRA](http://societyinforisk.org) mailing list and I piled on by adding a [Shiny app](http://datadrivensecurity.info/blog/posts/2014/Jan/solvo-mediocris/) into the mix (both posts make for a pretty `#spiffy` introduction to expert-opinion risk analyses in R).

Jay continued by [releasing a new honeypot data set](http://datadrivensecurity.info/blog/data/2014/01/marx.gz) and corresponding two-part[[1](http://datadrivensecurity.info/blog/posts/2014/Jan/blander-part1/),[2](http://datadrivensecurity.info/blog/posts/2014/Jan/blander-part2/)] post series to jump start analyses on that data. (There’s a D3 geo-visualization stuck in-between those posts if you’re into that sort of thing).

I got it into my head to start a project to build a [password dump analytics tool](http://datadrivensecurity.info/blog/posts/2014/Feb/ripal/) in R (with **much** more coming soon on that, including a full-on R package + Shiny app combo) and also continue the discussion we started in the book on the need for the infusion of reproducible research principles and practices in the information security domain by building off of @sucuri_security’s [Darkleech botnet](http://datadrivensecurity.info/blog/posts/2014/Feb/reproducible-research-sucuri-darkleech-data/) research.

You can follow along at home with the blog via it’s [RSS feed](http://datadrivensecurity.info/blog/feeds/all.atom.xml) or via the @ddsecblog Twitter account. You can also **play** along at home if you feel you have something to contribute. It’s as simple as a github pull request and some really straightforward markdown. Take a look the blog’s [github repo](https://github.com/ddsbook/blog) and hit me up (@hrbrmstr) for details if you’ve got something to share.

Data-Driven-SecurityIf I made a Venn diagram of the cross-section of readers of this blog and the [Data Driven Security](http://dds.ec/) web sites it might be indistinguishable from a pure circle. However, just in case there are a few stragglers out there, I figured one more post on the fact that the new book by @jayjacobs & me is available _now_ in electronic form (not pre-order) wouldn’t hurt. The print book is still making it’s way from dead trees to store shelves and should be ready for the expected February 17th debut.

Here’s the list of links to e-tailers (man, I hate that term) who have it available for the various e-readers out there.

– [Amazon/Kindle](http://www.amazon.com/gp/product/B00I1Y7THY/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B00I1Y7THY&linkCode=as2&tag=rudisdotnet-20)
– [B&N/Nook](http://www.barnesandnoble.com/w/data-driven-security-jay-jacobs/1117239036?ean=9781118793725)
– [Google Books](https://play.google.com/store/books/details?id=LQigAgAAQBAJ)
– [Kobo](http://store.kobobooks.com/en-US/ebook/data-driven-security)

If you happen to catch it out in the wild and not on this list, drop me (@hrbrmstr) a note `#pls`.

And, a huge thank you! to everyone for their kind accolades yesterday (esp to those who’ve purchased the book :-)

dds-header-image

While you’re waiting for the [book](http://amzn.to/ddsec) by @jayjacobs & @hrbrmstr to hit the shelves, why not head on over to the inaugural post of the [Data Driven Security Blog](http://datadrivensecurity.info/blog) & give a listen to the first episode of the [Data Driven Security Podcast](http://datadrivensecurity.info/podcast).

The Data Driven Security Blog aspires to be your go-to “how to” and “what’s new?” resource for all facets of security data science and the Data Driven Security Podcast aims to expand on the blog and showcase all the practitioners leading the way in, well, data-driven security.

We start the blog with some D3 visualizations of our book’s [github logs](http://datadrivensecurity.info/blog/posts/2014/Jan/dds-github/), and the [inaugural podcast episode](http://datadrivensecurity.info/podcast/data-driven-security-episode-0.html) gives a bit of background on Jay, Bob, the book and the goal of the podcast.

Join in the discussion on @ddsecblog, @ddsecpodcast and on [Google+](https://plus.google.com/+DatadrivensecurityInfo1/posts). If you prefer e-mail updates over Twitter and/or RSS/Atom feeds, you can also sign up for our non-spammy [newsletter](http://datadrivensecurity.info/blog/pages/subscribe.html).

We welcome all feedback, suggestions and ideas, so don’t be shy!

I decided to forego the D3 map mentioned in the previous post in favor of a Shiny one since I had 90% of the mapping code written.

I binned the ranges into three groups, changed the color over to something more pleasant (with RColorBrewer), added an interactive table for the counties with outage and have the elements updating every minute.

You can see the Live Outage Map over at it’s live Shiny server. Source is below or over at github if you’ve got blockers enabled.

Data Driven Security launches in February 2014. @jayjacobs & I have seen half of the book in PDF form so far and it’s almost unbelievable that this journey is almost over.

Data_Driven_Security___Amazon_Sales_Rank_Tracker

We setup a live Amazon “sales rank” tracker over at the book’s web site and provided some Python and JavaScript code to show folks how use the AWS API in conjunction with the dygraphs charting library to do the same for any ISBN. In the coming weeks, we’ll have a Google App Engine component you can clone to setup something similar without the need for your own server(s).

Since @jayjacobs & I are down to the home stretch on Data Driven Security, I thought it would be interesting to do some post-writing pseudo-analyses of the book itself. I won’t have exact page or word counts for a bit, but I wanted to see how many R packages we ended up relying on for the examples in the chapters. It was fairly straightforward to run a grep for calls to library() or require() across all the source files, and I grouped the results into four categories: “analysis”, “core”, “munging” and “visualization”.

Since I <3 D3 circular dendrograms, I figured that would be a fun way to show the groupings. For those who dislike spinning your noggin around, a more traditional one is also presented. You'll need an SVG-capable browser to see the visualizations (below). Stay on the lookout for more "behind the scenes" posts.

visualizationaplpackcolorspaceggdendroggplot2ggthemesgridExtraigraphmapsmaptoolsRColorBrewervcdanalysisbinomcareffectsportfoliosplinesscalesverisrzoorgdalcoredevtoolsstatsmungingbitopsgdatareshapeplyrrjsonRJSONIO

visualizationaplpackcolorspaceggdendroggplot2ggthemesgridExtraigraphmapsmaptoolsRColorBrewervcdanalysisbinomcareffectsportfoliosplinesscalesverisrzoorgdalcoredevtoolsstatsmungingbitopsgdatareshapeplyrrjsonRJSONIO