
Author Archives: hrbrmstr


Having received a couple of follow-ups to the OS X notifications on RStudio Desktop for the Mac post, I was determined to find a quick hack that gets remote notifications to OS X working from (at least) RStudio Server instances running on the same network. It turns out the hack was pretty straightforward with a combination of Growl and gntp-send.

To preempt detractors: Yes, Growl isn’t free for newer versions of OS X; but $3.99USD is worth skipping a frappuccino for if you desire this functionality (IMO). I’ve had Growl running since before there was an app store and it’s far more hackable than the Notification Center is (as demonstrated by this post).

You’ll need to configure Growl to listen for incoming connections (with an optional password, which is a good idea if you’re fairly mobile).

[Screenshot: Growl network preferences, with listening for incoming notifications enabled]

Plus, you’ll also want to decide whether you want Notification Center integration or have Growl work independently. My preference is integrated, but YMMV.

The gntp-send app should build without issues; I built it from source (download / configure / make / make install) on a recent-ish Ubuntu box.

Then it's just a matter of sourcing a version of this function. You'll most likely want to turn more of the parameters into defaults (one approach is sketched a bit further down). Windows users will need to tweak this a bit to make it work, but I'm betting most RStudio Server instances are running on Linux variants. I have it automatically setting the title and including which RStudio Server host the notice came from.

# send a Growl notification to a remote workstation via gntp-send
#   message : text to display
#   server  : host running Growl
#   port    : GNTP listener port (Growl's default is 23053)
notify.gntp <- function(message, server, port=23053) {
  system(sprintf("/usr/local/bin/gntp-send -a 'RStudio Server' -s %s:%s '[running on %s]' '%s'",
                 server, port, as.character(Sys.info()["nodename"]), message),
         ignore.stdout=TRUE, ignore.stderr=TRUE, wait=FALSE)
}
 
# test it out 
WORKSTATION_RUNNING_GROWL = "10.0.1.5"
notify.gntp("ddply() finished", WORKSTATION_RUNNING_GROWL)

[Screenshot: the resulting Growl notification on the OS X workstation]

You are going to need to do some additional work/coding if your IP address(es) change. I was going to hack something together that parses netstat output to guess which node was the originating OS X system, but it should be quick enough to change the client IP address by hand, especially since this hack is intended for long-running jobs.
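If you use a Growl password (a good idea per the preferences note above) or your workstation address changes often, an expanded version of the function can handle both. This is a minimal sketch, assuming gntp-send's -p password option (verify with gntp-send's usage output on your build) and a GROWL_HOST environment variable that is purely my own convention for this example:

# sketch: notify.gntp with a default server pulled from a (made-up)
# GROWL_HOST environment variable and an optional Growl password
notify.gntp <- function(message,
                        server=Sys.getenv("GROWL_HOST", "10.0.1.5"),
                        port=23053, password=NULL) {
  # add the password flag only when one was supplied
  pass.opt <- if (is.null(password)) "" else sprintf("-p '%s' ", password)
  system(sprintf("/usr/local/bin/gntp-send %s-a 'RStudio Server' -s %s:%s '[running on %s]' '%s'",
                 pass.opt, server, port, as.character(Sys.info()["nodename"]), message),
         ignore.stdout=TRUE, ignore.stderr=TRUE, wait=FALSE)
}

Set GROWL_HOST in ~/.Renviron (or via Sys.setenv()) and update it whenever your workstation's address changes.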

It'd be #spiffy if RStudio Server supported the browser notifications API and/or exposed HTTP header info from within the R session to make hacks like this easier or unnecessary.

Thanks to a comment, I tweaked the data retrieval to ignore SSL cert errors. You can change that tweak back if you go through the pain of updating the SSL libraries on your Windows boxes (it doesn’t seem to be an issue on OS X/Linux).

I also changed the date routines to use as.POSIXlt instead of ISOdatetime as the latter seemed to cause issues for some folks.

All changes have been pushed to the github repo.

2013-09-16 UPDATE: I took suggestions from a couple comments, expanded the function a bit and stuck it in a gist. See this comment for details.

The data retrieval and computation operations are taking longer and longer as we start cranking through more security data, and I'll often let tasks run in the background whilst performing more mundane tasks or wasting time on Twitter. For folks using RStudio Desktop on a Mac, you can use the #spiffy terminal-notifier from Julien Blanchard (@julienXX), wrapped in a cozy little R function, to alert you when your long-running jobs are complete.

After a quick “gem install terminal-notifier” you just need to add this notify() function to your R repertoire:

# pop a clickable OS X Notification Center message via terminal-notifier;
# clicking the notification activates RStudio
notify <- function(message="Operation complete") {
  system(sprintf("/usr/bin/terminal-notifier -title 'RStudio' -message '%s' -sender org.rstudio.RStudio -activate org.rstudio.RStudio",
                 message),
         ignore.stdout=TRUE, ignore.stderr=TRUE, wait=FALSE)
}

and add a call to it right after a potentially long-running operation to get a clickable notification right in the Notification Center:

system("sleep 10")
notify("long computation complete")

[Screenshot: the notification in OS X Notification Center]
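If you share scripts with folks (or machines) that may not have terminal-notifier installed, you can make the function degrade gracefully. A minimal sketch, using Sys.which() to locate the binary instead of hard-coding the path:

# sketch: only shell out to terminal-notifier when it's on the PATH;
# otherwise fall back to a plain console message
notify <- function(message="Operation complete") {
  tn <- Sys.which("terminal-notifier")
  if (nzchar(tn)) {
    system(sprintf("%s -title 'RStudio' -message '%s' -sender org.rstudio.RStudio -activate org.rstudio.RStudio",
                   tn, message),
           ignore.stdout=TRUE, ignore.stderr=TRUE, wait=FALSE)
  } else {
    message(message)  # no notifier available; just print it
  }
}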

I’m working on a way to send notifications from RStudio Server when using one of the standalone clients mentioned in a previous post, so stay tuned if you need that functionality as well.

It doesn't get much better for me than when I can combine R and weather data in new ways. I've got something brewing with my Nest thermostat and needed to get some current wx readings plus forecast data. I could have chosen any number of sources or APIs, but I wanted to play with the data over at forecast.io (if you haven't loaded their free weather "app" on your phone/tablet, you should do that NOW), so I whipped together a small R package to fetch and process the JSON to make it easier to work with in R.

The package contains a single function, and the magic is all in the conversion of the JSON hourly/minutely weather data into R data frames, which is dirt simple to do since RJSONIO and sapply do all the hard work for us:

library(RJSONIO)
 
# take the JSON blob we got from forecast.io and make an R list from it
fio <- fromJSON(fio.json)
 
# extract hourly forecast data into a data frame
# (times arrive as UNIX epoch seconds, so convert from the 1970-01-01 origin)
fio.hourly.df <- data.frame(
  time = as.POSIXct(sapply(fio$hourly$data,"[[","time"), origin="1970-01-01"),
  summary = sapply(fio$hourly$data,"[[","summary"),
  icon = sapply(fio$hourly$data,"[[","icon"),
  precipIntensity = sapply(fio$hourly$data,"[[","precipIntensity"),
  temperature = sapply(fio$hourly$data,"[[","temperature"),
  apparentTemperature = sapply(fio$hourly$data,"[[","apparentTemperature"),
  dewPoint = sapply(fio$hourly$data,"[[","dewPoint"),
  windSpeed = sapply(fio$hourly$data,"[[","windSpeed"),
  windBearing = sapply(fio$hourly$data,"[[","windBearing"),
  cloudCover = sapply(fio$hourly$data,"[[","cloudCover"),
  humidity = sapply(fio$hourly$data,"[[","humidity"),
  pressure = sapply(fio$hourly$data,"[[","pressure"),
  visibility = sapply(fio$hourly$data,"[[","visibility"),
  ozone = sapply(fio$hourly$data,"[[","ozone")
)

You can view the full code over at github and there’s some sample usage below.

library("devtools")
install_github("Rforecastio", "hrbrmstr")
 
library(Rforecastio)
library(ggplot2)
 
# NEVER put credentials or api keys in script bodies or github repos!!
# the "config" file has one thing in it, the api key string on one line
# this is all it takes to read it in
fio.api.key = readLines("~/.forecast.io")
 
my.latitude = "43.2673"
my.longitude = "-70.8618"
 
fio.list <- fio.forecast(fio.api.key, my.latitude, my.longitude)
 
# setup "forecast" highlight plot area
 
forecast.x.min <- Sys.time()  # readings before now are history; after now, forecast
forecast.x.max <- max(fio.list$hourly.df$time)
if (forecast.x.min > forecast.x.max) forecast.x.min <- forecast.x.max
fio.forecast.range.df <- data.frame(xmin=forecast.x.min, xmax=forecast.x.max,
                                    ymin=-Inf, ymax=+Inf)
 
# plot the readings
 
fio.gg <- ggplot(data=fio.list$hourly.df,aes(x=time, y=temperature))
fio.gg <- fio.gg + labs(y="Readings", x="Time")
fio.gg <- fio.gg + geom_rect(data=fio.forecast.range.df,
                             aes(xmin=xmin, xmax=xmax,
                                 ymin=ymin, ymax=ymax), 
                             fill="yellow", alpha=(0.15),
                             inherit.aes = FALSE)
fio.gg <- fio.gg + geom_line(aes(y=humidity*100), color="green")
fio.gg <- fio.gg + geom_line(aes(y=temperature), color="red")
fio.gg <- fio.gg + geom_line(aes(y=dewPoint), color="blue")
fio.gg <- fio.gg + theme_bw()
fio.gg

[Screenshot: sample plot of temperature, humidity & dew point with the forecast window highlighted]
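The function also hands back the minutely data (when forecast.io supplies it for your location) in the same data-frame form, so near-term precipitation plots are just a variation on the above. A quick sketch, assuming the returned list names that slot minutely.df and it is populated (check str(fio.list) on your own results):

# sketch: near-term precipitation intensity from the minutely data frame
# (forecast.io doesn't return minutely data for every location, so check first)
if (!is.null(fio.list$minutely.df)) {
  precip.gg <- ggplot(data=fio.list$minutely.df, aes(x=time, y=precipIntensity))
  precip.gg <- precip.gg + geom_line(color="steelblue")
  precip.gg <- precip.gg + labs(x="Time", y="Precipitation intensity")
  precip.gg <- precip.gg + theme_bw()
  precip.gg
}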

I've had a Nest thermostat for a while now and it's been an overall positive experience. It's given me more visibility into our heating/cooling system usage, patterns and preferences; plus, it actually saved us money last winter.

We try to avoid running the A/C during the summer, and it would have been really helpful if Nest had supported notifications (or had a proper API) for events such as “A/C turned on/off” for the few times it kicked in when we were away and had left the windows open (yes, we could have made “away” mode a bit less susceptible to big temperature swings). So, I decided to whip up a notification system and data logger using Scott Baker’s pynest library (and a little help from redis, mongo and pushover.net).

If you have a Nest thermostat, have an always-on Linux box (this script should work nicely on a Raspberry Pi) and want this functionality,

  • grab the code over at github
  • create a Pushover app so you can point the API interface there
  • install and start mongo and redis (both are very easy to setup)
  • create the config file
  • tell the script where to find the config file
  • set up a cron job. Every 5 minutes should work nicely:
    */5 * * * * /opt/nest/nizdos.py

Mongo is used for storing the readings (temp and humidity for the moment; you can change the code to log whatever you want, though) since it sends nice JSON to D3 without having to whip it into shape.

Redis is used for storing and updating the last known state of the heat/AC system. Technically you could use mongo or a flat file or memcached or sqlite or MySQL (you get the idea) for that, but I have redis running for other things and it's just far too easy to set up and use.

Pushover is used for iOS and Android notifications (I really hope they add OS X soon :-)
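Since the last-known state lives in redis, you can also peek at it from R between runs. A minimal sketch using the rredis package; the key name below is purely hypothetical, so check nizdos.py/your config for the one actually used:

library(rredis)
 
# connect to the same redis instance the script writes to
redisConnect(host="localhost", port=6379)
 
# hypothetical key name; substitute whatever nizdos.py actually sets
state <- redisGet("nizdos:last.state")
# values written by non-R clients can come back as raw bytes
if (is.raw(state)) state <- rawToChar(state)
print(state)
 
redisClose()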

Once @jayjacobs & I are done with our book in November, I’ll be doing another post and adding some code to the github repo to show how to do data analysis and visualization on all this data you’re logging.

If you’re wondering where the name nizdos came from and haven’t googled it yet, it’s an ancient Indo-European word for nest.

Drop me a note here or on github if you use the script (pls)! Send me a pull request on github if you fork the code and make any cool changes. Definitely leave a bug report on github if you find any glaring errors.

For those who want the alerting without the overhead of actually dealing with this script, drop me a tweet (@hrbrmstr). I'm pretty close to having just the alerting function working in Google's AppEngine, which won't require much setup for those without the infrastructure or time to use this script.

I’m jumping around analytics environments these days and have to leave the comfort of my Mac’s RStudio Desktop application to use various RStudio Server instances via browser. While I prefer to use Chrome, the need to have a “dedicated” RStudio Server client outweighs the utility of my favorite browser. This is where Fluid (@FluidApp by @iTod) comes in.

Fluid lets you build separate, dedicated, Safari/WebKit engine application wrappers for any web resource. As the web site puts it: “Fluid lets you create a Real Mac App (or “Fluid App”) out of any website or web application, effectively turning your favorite web apps into OS X desktop apps.” This means you can build something that will behave almost like the Desktop client and make one for any RStudio Server instance you use.

[Screenshot: the Fluid "Create a Fluid App" dialog]

It’s far too easy to perform this useful feat. Just download Fluid, point the “URL:” field in the “Create a Fluid App” dialog to an RStudio Server instance, name it what you like (something that lets you know which RStudio Server instance you’re using would be #spiffy), pick an icon (select the RStudio Desktop application to use that one) and go! You can now start a separate app for each RStudio instance you use, complete with its own cookie storage, fullscreen capability and more (provided you pay the quite reasonable $4.99USD).

Here’s a screen shot of what it ends up looking like (sans MacOS menu bar):

[Screenshot: an RStudio Server instance running as a dedicated Fluid app]

If you currently use Fluid this way for RStudio Server instances, or give this suggestion a try and come up with any helpful configuration options, Userscripts or Userstyles, drop a note in the comments!

I’ve been doing a bit of graphing (with real, non-honeypot network data) as part of the research for the book I’m writing with @jayjacobs and thought one of the images was worth sharing (especially since it may not make it into the book :-).

[Figure: composite force-directed graph of ZeroAccess traffic over all 11 days (click for larger view)]

This is a static screen capture of a D3 force-directed graph, made with R, igraph & Vega, of four ZeroAccess-infected nodes desperately trying to break free of a firewall over the course of 11 days (each node tried ~200K times over a couple of days). The red nodes are unique destination IPs and the purple ones are in the AlienVault IP Reputation database. Jay & I have read and blogged a great deal about ZeroAccess over the past year and finally had the chance to see a live slice of how pervasive (and noisy) the network is, even with just a view from a few infected nodes.

While the above graphic is the composite view of all 11 days, the following one is from just a single day with only two infected nodes trying to communicate out (this is a pure, hastily-crafted R/igraph image):

[Figure: single-day R/igraph graph of two ZeroAccess-infected nodes (click for larger view)]

There are some common destinations between the two, but each has a large list of unique ones. Even the best open IP reputation database on the planet included only a handful of the malicious endpoints, which means you really need to be looking at holistic behavior modeling vs port/destination alone if you're trying to find egressing badness (I filtered out legit destination traffic for these views). But you hopefully already knew that.
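For the curious, here's roughly the shape of the igraph code behind a plot like that. This is a minimal sketch with made-up data (not the actual book code): it assumes you have a two-column src/dst connection log and a vector of reputation-flagged IPs, both hypothetical:

library(igraph)
 
# hypothetical connection log: infected nodes -> destination IPs
conns <- data.frame(src=c("10.0.0.5","10.0.0.5","10.0.0.9"),
                    dst=c("1.2.3.4","5.6.7.8","1.2.3.4"),
                    stringsAsFactors=FALSE)
# hypothetical destinations found in a reputation database
bad.ips <- c("1.2.3.4")
 
g <- graph.data.frame(conns, directed=TRUE)
 
# color vertices: infected sources gray, reputation-flagged purple, the rest red
V(g)$color <- ifelse(V(g)$name %in% conns$src, "gray",
                     ifelse(V(g)$name %in% bad.ips, "purple", "red"))
 
plot(g, layout=layout.fruchterman.reingold, vertex.size=4,
     vertex.label=NA, edge.arrow.size=0.3)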

R lacks some of the more "utilitarian" features found in other scripting languages that were (at least initially) geared more towards systems administration. One of the most frustrating missing pieces for security data scientists is the inability to perform basic IP address manipulations, including reverse DNS resolution (even though R has nsl(), which is just glue to gethostbyname() and only resolves names to addresses!).

If you need to perform reverse resolution, the only two viable options are to (a) pre-resolve a list of IP addresses or (b) whip up something in R that takes advantage of the ability to perform system calls. Yes, one could write a C/C++ library wrapping the native resolver routines, but that becomes a pain to maintain across platforms. System calls also create some cross-platform issues, but they are usually easier for the typical R user to overcome.

Assuming the dig command is available on your Linux, BSD or OS X system, it's pretty trivial to pass a list of IP addresses to a simple sapply() one-liner:

resolved = sapply(ips, function(x) system(sprintf("dig -x %s +short",x), intern=TRUE))

That works for fairly small lists of addresses, but doesn’t scale well to hundreds or thousands of addresses. (Also, @jayjacobs kinda hates my one-liners #true.)

A better way is to generate a batch query to dig, but the results will be synchronous, which could take A Very Long Time depending on the size of the list and types of results.
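For reference, the synchronous batch version is only a few lines; this sketch assumes your dig build supports the -f (batch file) option, where each line of the file is treated as the arguments to one query:

# sketch: synchronous batch reverse lookups via dig -f
batch.dig <- function(ips) {
  query.file <- tempfile()
  writeLines(sprintf("-x %s +short", ips), query.file)
  result <- system(sprintf("dig -f %s", query.file), intern=TRUE)
  unlink(query.file)
  result
}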

The best way (IMO) to tackle this problem is to perform an asynchronous batch query and post-process the results, which we can do with a little help from adns (which homebrew users can install with a quick “brew install adns“).

Once adns is installed, it’s just a matter of writing out a query list, performing the asynchronous batch lookup, parsing the results and re-integrating with the original IP list (which is necessary since errant or unresponsive reverse queries will not be returned by the adns system call).

#pretend this is A Very Long List of IPs
ip.list = c("1.1.1.1", "2.3.4.99", "1.1.1.2", "2.3.4.100", "70.196.7.32", 
  "146.160.21.171", "146.160.21.172", "146.160.21.186", "2.3.4.101", 
  "216.167.176.93", "1.1.1.3", "2.3.4.5", "2.3.4.88", "2.3.4.9", 
  "98.208.205.1", "24.34.218.80", "159.127.124.209", "70.196.198.151", 
  "70.192.72.48", "173.192.34.24", "65.214.243.208", "173.45.242.179", 
  "184.106.97.102", "198.61.171.18", "71.184.118.37", "70.215.200.159", 
  "184.107.87.105", "174.121.93.90", "172.17.96.139", "108.59.250.112", 
  "24.63.14.4")
 
# "ips" is a list of IP addresses
ip.to.host <- function(ips) {
  # save out a list of IP addresses in adnshost reverse query format
  # if you're going to be using this in "production", you *might*
  # want to consider using tempfile() #justsayin
  writeLines(sprintf("-i%s",ips), "/tmp/ips.in") # sprintf() is vectorized; no need for plyr's laply()
  # call adnshost with the file
  # requires adnshost :: http://www.chiark.greenend.org.uk/~ian/adns/
  system.output <- system("cat /tmp/ips.in | adnshost -f",intern=TRUE)
  # keep our file system tidy
  unlink("/tmp/ips.in")
  # clean up the result
  cleaned.result <- gsub("\\.in-addr\\.arpa","",system.output)
  # split the reply
  split.result <- strsplit(cleaned.result," PTR ")
  # make a data frame of the reply
  result.df <- data.frame(do.call(rbind, lapply(split.result, rbind)))
  colnames(result.df) <- c("IP","hostname")
  # reverse the octets in the IP address list
  result.df$IP <- sapply(as.character(result.df$IP), function(x) {
    y <- unlist(strsplit(x,"\\."))
    sprintf("%s.%s.%s.%s",y[4],y[3],y[2],y[1])
  })
  # fill errant lookups with "NA"
  final.result <- merge(ips,result.df,by.x="x",by.y="IP",all.x=TRUE)
  colnames(final.result) = c("IP","hostname")
  return(final.result)
}
 
resolved.df <- ip.to.host(ip.list)
head(resolved.df,n=10)
 
                IP                                   hostname
1          1.1.1.1                                       <NA>
2          1.1.1.2                                       <NA>
3          1.1.1.3                                       <NA>
4   108.59.250.112      vps-1068142-5314.manage.myhosting.com
5   146.160.21.171                                       <NA>
6   146.160.21.172                                       <NA>
7   146.160.21.186                                       <NA>
8  159.127.124.209                                       <NA>
9    172.17.96.139                                       <NA>
10   173.192.34.24 173.192.34.24-static.reverse.softlayer.com

If you wish to suppress adns error messages and any resultant R warnings, you can add an "ignore.stderr=TRUE" to the system() call and an "options(warn=-1)" to the function itself (remember to save and restore the current value; see the sketch below). I kinda like leaving them in, though, as it shows progress is being made.
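If you do decide to silence them, the save/restore dance for the warn option looks like this sketch (the wrapper name is my own):

# sketch: run ip.to.host() with R warnings suppressed, restoring the
# caller's warn setting afterwards (even if an error occurs)
quiet.ip.to.host <- function(ips) {
  old.warn <- getOption("warn")
  on.exit(options(warn=old.warn))
  options(warn=-1)
  ip.to.host(ips)
}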

Whether you end up using a one-liner or the asynchronous function, it would be a spiffy idea to set up a local caching server, such as Unbound, to speed up subsequent queries (because you will undoubtedly have subsequent queries, unless your R scripts are perfect on the first go-round).

If you've solved the "efficient reverse DNS query problem" a different way in R, drop a note in the comments! I know quite a few folks who'd love to buy you a tasty beverage!

You can find similar, handy IP address and other security-oriented R code in our (me & @jayjacobs’) upcoming book on security data analysis and visualization.