
It doesn’t get much better for me than when I can combine R and weather data in new ways. I’ve got something brewing with my Nest thermostat and needed to get some current wx readings plus forecast data. I could have chosen a number of different sources or APIs, but I wanted to play with the data over at forecast.io (if you haven’t loaded their free weather “app” on your phone/tablet, you should do that NOW), so I whipped together a small R package to fetch and process the JSON to make it easier to work with in R.
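
For context, grabbing the raw JSON yourself is just an HTTP GET against the forecast.io REST endpoint. A minimal sketch (assuming you have RCurl installed; the API key and coordinates below are placeholders):

library(RCurl)
# placeholder API key & coordinates; the response is a single JSON blob
# with currently/minutely/hourly/daily blocks
fio.json <- getURL(sprintf("https://api.forecast.io/forecast/%s/%s,%s",
                           "YOUR_API_KEY", "43.2673", "-70.8618"))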

The package contains a single function, and the magic is all in the conversion of the JSON hourly/minutely weather data into R data frames, which is dirt simple to do since RJSONIO and sapply do all the hard work for us:

library(RJSONIO)
 
# take the JSON blob we got from forecast.io and make an R list from it
fio <- fromJSON(fio.json)
 
# extract hourly forecast data  
fio.hourly.df <- data.frame(
  # forecast.io times are UNIX epoch seconds (i.e., since 1970-01-01 UTC)
  time = ISOdatetime(1970,1,1,0,0,0,tz="GMT") + sapply(fio$hourly$data,"[[","time"),
  summary = sapply(fio$hourly$data,"[[","summary"),
  icon = sapply(fio$hourly$data,"[[","icon"),
  precipIntensity = sapply(fio$hourly$data,"[[","precipIntensity"),
  temperature = sapply(fio$hourly$data,"[[","temperature"),
  apparentTemperature = sapply(fio$hourly$data,"[[","apparentTemperature"),
  dewPoint = sapply(fio$hourly$data,"[[","dewPoint"),
  windSpeed = sapply(fio$hourly$data,"[[","windSpeed"),
  windBearing = sapply(fio$hourly$data,"[[","windBearing"),
  cloudCover = sapply(fio$hourly$data,"[[","cloudCover"),
  humidity = sapply(fio$hourly$data,"[[","humidity"),
  pressure = sapply(fio$hourly$data,"[[","pressure"),
  visibility = sapply(fio$hourly$data,"[[","visibility"),
  ozone = sapply(fio$hourly$data,"[[","ozone")
)
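
If all that repetition bothers you, the same data frame can be built by looping over the field names. Here’s a sketch that assumes every hourly record carries every field (the verbose version above does make it easier to drop or tweak individual columns):

fields <- c("summary", "icon", "precipIntensity", "temperature",
            "apparentTemperature", "dewPoint", "windSpeed", "windBearing",
            "cloudCover", "humidity", "pressure", "visibility", "ozone")
fio.hourly.df <- data.frame(
  time = ISOdatetime(1970,1,1,0,0,0,tz="GMT") + sapply(fio$hourly$data, "[[", "time"),
  setNames(lapply(fields, function(f) sapply(fio$hourly$data, "[[", f)), fields)
)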

You can view the full code over at github and there’s some sample usage below.

library("devtools")
install_github("Rforecastio", "hrbrmstr")
 
library(Rforecastio)
library(ggplot2)
 
# NEVER put credentials or api keys in script bodies or github repos!!
# the "config" file has one thing in it, the api key string on one line
# this is all it takes to read it in
fio.api.key = readLines("~/.forecast.io")
 
my.latitude = "43.2673"
my.longitude = "-70.8618"
 
fio.list <- fio.forecast(fio.api.key, my.latitude, my.longitude)
 
# setup "forecast" highlight plot area
 
forecast.x.min <- ISOdatetime(1970,1,1,0,0,0,tz="GMT") + unclass(Sys.time())  # "now"
forecast.x.max <- max(fio.list$hourly.df$time)
if (forecast.x.min > forecast.x.max) forecast.x.min <- forecast.x.max
fio.forecast.range.df <- data.frame(xmin=forecast.x.min, xmax=forecast.x.max,
                                    ymin=-Inf, ymax=+Inf)
 
# plot the readings
 
fio.gg <- ggplot(data=fio.list$hourly.df,aes(x=time, y=temperature))
fio.gg <- fio.gg + labs(y="Readings", x="Time")
fio.gg <- fio.gg + geom_rect(data=fio.forecast.range.df,
                             aes(xmin=xmin, xmax=xmax,
                                 ymin=ymin, ymax=ymax), 
                             fill="yellow", alpha=(0.15),
                             inherit.aes = FALSE)
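# humidity comes back as a 0-1 fraction, so scale it up to share the axis with the temperature readings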
fio.gg <- fio.gg + geom_line(aes(y=humidity*100), color="green")
fio.gg <- fio.gg + geom_line(aes(y=temperature), color="red")
fio.gg <- fio.gg + geom_line(aes(y=dewPoint), color="blue")
fio.gg <- fio.gg + theme_bw()
fio.gg


I’ve had a Nest thermostat for a while now and it’s been an overall positive experience. It’s given me more visibility into our heating/cooling system usage, patterns and preferences; plus, it actually saved us money last winter.

We try to avoid running the A/C during the summer, and it would have been really helpful if Nest had supported notifications (or had a proper API) for events such as “A/C turned on/off” for the few times it kicked in when we were away and had left the windows open (yes, we could have made “away” mode a bit less susceptible to big temperature swings). So, I decided to whip up a notification system and data logger using Scott Baker’s pynest library (and a little help from redis, mongo and pushover.net).

If you have a Nest thermostat, have an always-on Linux box (this script should work nicely on a Raspberry Pi) and want this functionality,

  • grab the code over at github
  • create a Pushover app so you can point the API interface there
  • install and start mongo and redis (both are very easy to set up)
  • create the config file
  • tell the script where to find the config file
  • set up a cron job; every 5 minutes should work nicely:
    */5 * * * * /opt/nest/nizdos.py

Mongo is used for storing the readings (temp and humidity, for the moment; you can change the code to log whatever you want, tho) since it sends nice JSON to D3 without having to whip it into shape.

Redis is used for storing and updating the last known state of the heat/AC system. Technically you could use mongo or a flat file or memcached or sqlite or MySQL (you get the idea) for that, but I have redis running for other things and it’s just far too easy to setup and use.

Pushover is used for iOS and Android notifications (I really hope they add OS X soon :-)
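
nizdos itself is Python, but Pushover’s API is just a form POST, so you can poke at it from R, too. A minimal sketch (assuming RCurl; the token and user key are placeholders you get when you register an app at pushover.net):

library(RCurl)
postForm("https://api.pushover.net/1/messages.json",
         token   = "YOUR_APP_TOKEN",
         user    = "YOUR_USER_KEY",
         message = "nizdos: A/C just kicked on while you're away")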

Once @jayjacobs & I are done with our book in November, I’ll be doing another post and adding some code to the github repo to show how to do data analysis and visualization on all this data you’re logging.

If you’re wondering where the name nizdos came from and haven’t googled it yet, it’s an ancient Indo-European word for nest.

Drop me a note here or on github if you use the script (pls)! Send me a pull request on github if you fork the code and make any cool changes. Definitely leave a bug report on github if you find any glaring errors.

For those who want the alerting without the overhead of actually dealing with this script, drop me a tweet (@hrbrmstr). I’m pretty close to just having the alerting function working in Google’s AppEngine, which won’t require much setup for those without the infrastructure or time to use this script.

I’m jumping around analytics environments these days and have to leave the comfort of my Mac’s RStudio Desktop application to use various RStudio Server instances via browser. While I prefer to use Chrome, the need to have a “dedicated” RStudio Server client outweighs the utility of my favorite browser. This is where Fluid (@FluidApp by @iTod) comes in.

Fluid lets you build separate, dedicated, Safari/WebKit engine application wrappers for any web resource. As the web site puts it: “Fluid lets you create a Real Mac App (or “Fluid App”) out of any website or web application, effectively turning your favorite web apps into OS X desktop apps.” This means you can build something that will behave almost like the Desktop client and make one for any RStudio Server instance you use.

[Image: Fluid]

It’s far too easy to perform this useful feat. Just download Fluid, point the “URL:” field in the “Create a Fluid App” dialog to an RStudio Server instance, name it what you like (something that lets you know which RStudio Server instance you’re using would be #spiffy), pick an icon (select the RStudio Desktop application to use that one) and go! You can now start a separate app for each RStudio instance you use, complete with its own cookie storage, fullscreen capability and more (provided you pay the quite reasonable $4.99 USD).

Here’s a screen shot of what it ends up looking like (sans OS X menu bar):

[Image: RStudio]

If you currently use Fluid this way for RStudio Server instances, or give this suggestion a try and come up with any helpful configuration options, Userscripts or Userstyles, drop a note in the comments!

I’ve been doing a bit of graphing (with real, non-honeypot network data) as part of the research for the book I’m writing with @jayjacobs and thought one of the images was worth sharing (especially since it may not make it into the book :-).

[Image: Threat_View]

This is a static screen capture of a D3 force-directed graph made with R, igraph & Vega of four ZeroAccess-infected nodes desperately trying to break free of a firewall over the course of 11 days (each node tried ~200K times over a couple of days). The red nodes are unique destination IPs and the purple ones are in the AlienVault IP Reputation database. Jay & I have read and blogged a great deal about ZeroAccess over the past year and finally had the chance to see a live slice of how pervasive (and noisy) the network is, even with just a view from a few infected nodes.

While the above graphic is the composite view of all 11 days, the following one is from just a single day with only two infected nodes trying to communicate out (this is a pure, hastily-crafted R/igraph image):

[Image: Two ZeroAccess Infected Nodes]

There are some common destinations between the two, but each has a large list of unique ones. Even the best open IP reputation database on the planet only included a handful of the malicious endpoints, which means you really need to be looking at holistic behavior modeling vs port/destination alone (I filtered out legitimate destination traffic for these views) if you’re trying to find egressing badness (but you hopefully already knew that).
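
For the curious, the igraph side of a plot like the one above takes surprisingly little code. A minimal sketch with made-up flow data (not the real capture):

library(igraph)
# hypothetical flows: infected internal nodes & the destinations they hit
flows <- data.frame(
  src = c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.2"),
  dst = c("1.2.3.4", "5.6.7.8", "1.2.3.4", "9.10.11.12")
)
g <- graph.data.frame(flows, directed=TRUE)
# color the internal (10.x) nodes differently from the destination IPs
V(g)$color <- ifelse(grepl("^10\\.", V(g)$name), "purple", "red")
plot(g, layout=layout.fruchterman.reingold,
     vertex.size=5, vertex.label=NA, edge.arrow.size=0.3)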

R lacks some of the more “utilitarian” features found in other scripting languages that were (at least initially) geared more towards systems administration. One of the most frustrating missing pieces for security data scientists is the inability to perform basic IP address manipulations, including reverse DNS resolution (even though R has nsl(), which is just glue to gethostbyname()!).

If you need to perform reverse resolution, the only two viable options available are to (a) pre-resolve a list of IP addresses or (b) whip up something in R that takes advantage of the ability to perform system calls. Yes, one could write a C/C++ API library that accesses native resolver routines, but that becomes a pain to maintain across platforms. System calls also create some cross-platform issues, but they are usually easier for the typical R user to overcome.

Assuming the dig command is available on your Linux, BSD or Mac OS system, it’s pretty trivial to pass in a list of IP addresses to a simple sapply() one-liner:

resolved = sapply(ips, function(x) system(sprintf("dig -x %s +short",x), intern=TRUE))

That works for fairly small lists of addresses, but doesn’t scale well to hundreds or thousands of addresses. (Also, @jayjacobs kinda hates my one-liners #true.)

A better way is to generate a batch query to dig, but the results will be synchronous, which could take A Very Long Time depending on the size of the list and types of results.
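
For reference, that batch approach might look like the following (assuming ip.list is a character vector of addresses; dig’s -f flag reads one query per line from a file):

# synchronous batch lookup: one dig process handles every query in the file
writeLines(sprintf("-x %s +short", ip.list), "/tmp/dig.batch")
batch.out <- system("dig -f /tmp/dig.batch", intern=TRUE)
unlink("/tmp/dig.batch")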

The best way (IMO) to tackle this problem is to perform an asynchronous batch query and post-process the results, which we can do with a little help from adns (which homebrew users can install with a quick “brew install adns“).

Once adns is installed, it’s just a matter of writing out a query list, performing the asynchronous batch lookup, parsing the results and re-integrating with the original IP list (which is necessary since errant or unresponsive reverse queries will not be returned by the adns system call).

#pretend this is A Very Long List of IPs
ip.list = c("1.1.1.1", "2.3.4.99", "1.1.1.2", "2.3.4.100", "70.196.7.32", 
  "146.160.21.171", "146.160.21.172", "146.160.21.186", "2.3.4.101", 
  "216.167.176.93", "1.1.1.3", "2.3.4.5", "2.3.4.88", "2.3.4.9", 
  "98.208.205.1", "24.34.218.80", "159.127.124.209", "70.196.198.151", 
  "70.192.72.48", "173.192.34.24", "65.214.243.208", "173.45.242.179", 
  "184.106.97.102", "198.61.171.18", "71.184.118.37", "70.215.200.159", 
  "184.107.87.105", "174.121.93.90", "172.17.96.139", "108.59.250.112", 
  "24.63.14.4")
 
# "ips" is a list of IP addresses
ip.to.host <- function(ips) {
  # save out a list of IP addresses in adnshost reverse query format
  # if you're going to be using this in "production", you *might*
  # want to consider using tempfile() #justsayin
  writeLines(sprintf("-i%s", ips), "/tmp/ips.in")
  # call adnshost with the file
  # requires adnshost :: http://www.chiark.greenend.org.uk/~ian/adns/
  system.output <- system("cat /tmp/ips.in | adnshost -f",intern=TRUE)
  # keep our file system tidy
  unlink("/tmp/ips.in")
  # clean up the result
  cleaned.result <- gsub("\\.in-addr\\.arpa","",system.output)
  # split the reply
  split.result <- strsplit(cleaned.result," PTR ")
  # make a data frame of the reply
  result.df <- data.frame(do.call(rbind, lapply(split.result, rbind)))
  colnames(result.df) <- c("IP","hostname")
  # reverse the octets in the IP address list
  result.df$IP <- sapply(as.character(result.df$IP), function(x) {
    y <- unlist(strsplit(x,"\\."))
    sprintf("%s.%s.%s.%s",y[4],y[3],y[2],y[1])
  })
  # fill errant lookups with "NA"
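  # (merge() coerces the "ips" vector into a one-column data frame whose
  #  column is named "x", which is why by.x="x" works below)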
  final.result <- merge(ips,result.df,by.x="x",by.y="IP",all.x=TRUE)
  colnames(final.result) = c("IP","hostname")
  return(final.result)
}
 
resolved.df <- ip.to.host(ip.list)
head(resolved.df,n=10)
 
                IP                                   hostname
1          1.1.1.1                                       <NA>
2          1.1.1.2                                       <NA>
3          1.1.1.3                                       <NA>
4   108.59.250.112      vps-1068142-5314.manage.myhosting.com
5   146.160.21.171                                       <NA>
6   146.160.21.172                                       <NA>
7   146.160.21.186                                       <NA>
8  159.127.124.209                                       <NA>
9    172.17.96.139                                       <NA>
10   173.192.34.24 173.192.34.24-static.reverse.softlayer.com

If you wish to suppress adns error messages and any resultant R warnings, you can add an “ignore.stderr=TRUE” to the system() call and an “options(warn=-1)” to the function itself (remember to get/reset the current value). I kinda like leaving them in, though, as it shows progress is being made.
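
In concrete terms, that tweak to the function above would look something like this:

# inside ip.to.host(), wrap the system() call like so:
old.warn <- getOption("warn")  # save the current warning level
options(warn=-1)               # silence R warnings
system.output <- system("cat /tmp/ips.in | adnshost -f",
                        intern=TRUE, ignore.stderr=TRUE)
options(warn=old.warn)         # restore it on the way out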

Whether you end up using a one-liner or the asynchronous function, it would be a spiffy idea to set up a local caching server, such as Unbound, to speed up subsequent queries (because you will undoubtedly have subsequent queries unless your R scripts are perfect on the first go-round).

If you’ve solved the “efficient reverse DNS query problem” a different way in R, drop a note in the comments! I know quite a few folks who’d love to buy you a tasty beverage!

You can find similar, handy IP address and other security-oriented R code in our (me & @jayjacobs’) upcoming book on security data analysis and visualization.

The topic of “IP intelligence” gets a nod in the book that @jayjacobs & I are writing, and it was interesting to see just how many sites purport to “know something” about an IP address. I shamelessly admit to being a Chrome user and noticed there were no tools that made it possible to right-click on an IP address and do a simultaneous lookup across these resources. So, I threw one together (it’s pretty trivial to write a contextMenus extension). It will create a new window and run search queries on the following OSINT sources in new tabs:

– whois.domaintools.com
– www.mywot.com
– www.tcpiputils.com
– labs.alienvault.com
– www.projecthoneypot.org
– www.virustotal.com
– www.senderbase.com
– www.mcafee.com
– www.sophos.com
– www.ipvoid.com

(I’m kinda partial to the AlienVault IP Reputation database, tho.)

The source is up on github, but (if you’re in an organization that controls which Chrome add-ons you are allowed to use) I also published it to the Chrome Web Store (it’s free), so you can ask your endpoint management/security team to review and approve it if you find it handy.

[Image: ip-intel-cap]

I’m definitely open to suggestions/additions/rotten tomatoes being hurled in my direction.

[Image: Beach-Chairs-Double]

What’s missing from that picture? YOU!

Like an aging action hero, @GraniteSec is back in action after an unexpected hiatus. Join us on August 17th for food and fun at the beautiful Fort Foster in Kittery Point, Maine.

The water is chilly, the hiking trails are easy-peasy and you can’t get any better company than the regular attendees of @GraniteSec.

Hit up granitesec.org for all the details and to sign up!

When I am out of the office for an extended time, I try to post a “crypto” challenge for work-folk to do while I’m gone, with the added bonus of winning fabulous prizes. There were no answers submitted for the clues during our ANP (Acadia National Park) trip, but I’m re-posting all of the clues here (with some hints) while we’re climbing Katahdin this week and opening it up to all takers.

Here are the clues (in chronological order):

– 2013-06-23 : 2C 22 5E 55 ☚ 44 12 45 13 0116;0114;0097;0105;0108
– 2013-06-24 : ttp://rud.is/trident.png #OoO #clue
– 2013-06-25 : http://pastebin.com/t1Aqs0fj #OoO #clue
– 2013-06-26 : 34 34 20 32 32 20 31 38 ☚ 36 38 20 33 33 20 37 33 108;105;103;104;116;104;111;117;115;101; #OoO #clue
– 2013-06-27 : 0o37350o60o33 NEEEEE RSS FRR 792 [08] s/i//I #OoO #clue
– 2013-06-28 : Context is the key to your path forward… #OoO #clue
– 2013-06-29 : http://www.barharbormaine.gov/document/0002/2102.pdf #OoO #clue

There’s a solution for each day and submissions (to bob at rudis dot net) must include each day’s answer.

Here are additional hints (in chronological order):

– The hand points west
– We were in ANP, not midcoast Maine, so there’s definitely more than a visually appealing picture; plus, I left off the ‘h’ just for kicks
– Don’t over-think what you find. Focus on finding out what is missing.
– The hand points west (but you might be wrong to only use what you know from 2013-06-23)
– Octal, then piece things together and go to a familiar resource location to find something
– When did I tweet that again? Mebbe take some direction from the previous day’s hint
– Far too simple to provide a hint #true

Prizes, you say? Well, yes. For the determined folks who submit something correct (with a bit of explanation of what you did to solve each one), the first person to do so wins an ANP mug and shot glass (you can use the former for beer or soda and the latter for espresso or harder substances).

I’ll do a write-up on the answers when [if?] I get back.