Skip navigation

Category Archives: DNS

I noticed that the @rOpenSci folks had an interface to [ip-api.com](http://ip-api.com/) on their [ToDo](https://github.com/ropensci/webservices/wiki/ToDo) list so I whipped up a small R package to fill said gap.

Their IP Geolocation API will take an IPv4, IPv6 or FQDN and kick back a ASN, lat/lon, address and more. The [ipapi package](https://github.com/hrbrmstr/ipapi) exposes one function – `geolocate` which takes in a character vector of any mixture of IPv4/6 and domains and returns a `data.table` of results. Since `ip-api.com` has a restriction of 250 requests-per-minute, the package also tries to help ensure you don’t get your own IP address banned (there’s a form on their site you can fill in to get it unbanned if you do happen to hit the limit). Overall, there’s nothing fancy in the package, but it gets the job done.

I notified the rOpenSci folks about it, so hopefully it’ll be one less thing on that particular to-do list.

You can see it in action in combination with the super-spiffy [leaflet](http://www.htmlwidgets.org/showcase_leaflet.html) htmlwidget:

library(leaflet)
library(ipapi)
library(maps)
 
# get top 500 domains
sites <- read.csv("http://moz.com/top500/domains/csv", stringsAsFactors=FALSE)
 
# make reproducible
set.seed(1492)
 
# pick out a random 50
sites <- sample(sites$URL, 50) 
sites <- gsub("/", "", sites)
locations <- geolocate(sites)
 
# take a quick look
dplyr::glimpse(locations)
 
## Observations: 50
## Variables:
## $ as          (fctr) AS2635 Automattic, Inc, AS15169 Google Inc., AS3561...
## $ city        (fctr) San Francisco, Mountain View, Chesterfield, Mountai...
## $ country     (fctr) United States, United States, United States, United...
## $ countryCode (fctr) US, US, US, US, US, US, JP, US, US, IT, US, US, US,...
## $ isp         (fctr) Automattic, Google, Savvis, Google, Level 3 Communi...
## $ lat         (dbl) 37.7484, 37.4192, 38.6631, 37.4192, 38.0000, 33.7516...
## $ lon         (dbl) -122.4156, -122.0574, -90.5771, -122.0574, -97.0000,...
## $ org         (fctr) Automattic, Google, Savvis, Google, AddThis, Peer 1...
## $ query       (fctr) 192.0.80.242, 74.125.227.239, 206.132.6.134, 74.125...
## $ region      (fctr) CA, CA, MO, CA, , GA, 13, MA, TX, , MA, TX, CA, , ,...
## $ regionName  (fctr) California, California, Missouri, California, , Geo...
## $ status      (fctr) success, success, success, success, success, succes...
## $ timezone    (fctr) America/Los_Angeles, America/Los_Angeles, America/C...
## $ zip         (fctr) 94110, 94043, 63017, 94043, , 30303, , 02142, 78218...
 
# all i want is the world!
world <- map("world", fill = TRUE, plot = FALSE) 
 
# kick out a a widget
leaflet(data=world) %>% 
  addTiles() %>% 
  addCircleMarkers(locations$lon, locations$lat, 
                   color = '#ff0000', popup=sites)

50 Random Top Sites

R lacks some of the more “utilitarian” features found in other scripting languages that were/are more geared—at least initially—towards systems administration. One of the most frustrating missing pieces for security data scientists is the lack of ability to perform basic IP address manipulations, including reverse DNS resolution (even though it has nsl() which is just glue to gethostbyname()!).

If you need to perform reverse resolution, the only two viable options available are to (a) pre-resolve a list of IP addresses or (b) whip up something in R that takes advantage of the ability to perform system calls. Yes, one could write a C/C++ API library that accesses native resolver routines, but that becomes a pain to maintain across platforms. System calls also create some cross-platform issues, but they are usually easier for the typical R user to overcome.

Assuming the dig command is available on your linux, BSD or Mac OS system, it’s pretty trivial to pass in a list of IP addresses to a simple sapply() one-liner:

resolved = sapply(ips, function(x) system(sprintf("dig -x %s +short",x), intern=TRUE))

That works for fairly small lists of addresses, but doesn’t scale well to hundreds or thousands of addresses. (Also, @jayjacobs kinda hates my one-liners #true.)

A better way is to generate a batch query to dig, but the results will be synchronous, which could take A Very Long Time depending on the size of the list and types of results.

The best way (IMO) to tackle this problem is to perform an asynchronous batch query and post-process the results, which we can do with a little help from adns (which homebrew users can install with a quick “brew install adns“).

Once adns is installed, it’s just a matter of writing out a query list, performing the asynchronous batch lookup, parsing the results and re-integrating with the original IP list (which is necessary since errant or unresponsive reverse queries will not be returned by the adns system call).

#pretend this is A Very Long List of IPs
ip.list = c("1.1.1.1", "2.3.4.99", "1.1.1.2", "2.3.4.100", "70.196.7.32", 
  "146.160.21.171", "146.160.21.172", "146.160.21.186", "2.3.4.101", 
  "216.167.176.93", "1.1.1.3", "2.3.4.5", "2.3.4.88", "2.3.4.9", 
  "98.208.205.1", "24.34.218.80", "159.127.124.209", "70.196.198.151", 
  "70.192.72.48", "173.192.34.24", "65.214.243.208", "173.45.242.179", 
  "184.106.97.102", "198.61.171.18", "71.184.118.37", "70.215.200.159", 
  "184.107.87.105", "174.121.93.90", "172.17.96.139", "108.59.250.112", 
  "24.63.14.4")
 
# "ips" is a list of IP addresses
ip.to.host <- function(ips) {
  # save out a list of IP addresses in adnshost reverse query format
  # if you're going to be using this in "production", you *might*
  # want to consider using tempfile() #justsayin
  writeLines(laply(ips, function(x) sprintf("-i%s",x)),"/tmp/ips.in")
  # call adnshost with the file
  # requires adnshost :: http://www.chiark.greenend.org.uk/~ian/adns/
  system.output <- system("cat /tmp/ips.in | adnshost -f",intern=TRUE)
  # keep our file system tidy
  unlink("/tmp/ips.in")
  # clean up the result
  cleaned.result <- gsub("\\.in-addr\\.arpa","",system.output)
  # split the reply
  split.result <- strsplit(cleaned.result," PTR ")
  # make a data frame of the reply
  result.df <- data.frame(do.call(rbind, lapply(split.result, rbind)))
  colnames(result.df) <- c("IP","hostname")
  # reverse the octets in the IP address list
  result.df$IP <- sapply(as.character(result.df$IP), function(x) {
    y <- unlist(strsplit(x,"\\."))
    sprintf("%s.%s.%s.%s",y[4],y[3],y[2],y[1])
  })
  # fill errant lookups with "NA"
  final.result <- merge(ips,result.df,by.x="x",by.y="IP",all.x=TRUE)
  colnames(final.result) = c("IP","hostname")
  return(final.result)
}
 
resolved.df <- ip.to.host(ip.list)
head(resolved.df,n=10)
 
                IP                                   hostname
1          1.1.1.1                                       <NA>
2          1.1.1.2                                       <NA>
3          1.1.1.3                                       <NA>
4   108.59.250.112      vps-1068142-5314.manage.myhosting.com
5   146.160.21.171                                       <NA>
6   146.160.21.172                                       <NA>
7   146.160.21.186                                       <NA>
8  159.127.124.209                                       <NA>
9    172.17.96.139                                       <NA>
10   173.192.34.24 173.192.34.24-static.reverse.softlayer.com

If you wish to suppress adns error messages and any resultant R warnings, you can add an “ignore.stderr=TRUE” to the system() call and an “options(warn=-1)” to the function itself (remember to get/reset the current value). I kinda like leaving them in, though, as it shows progress is being made.

Whether you end up using a one-liner or the asynchronous function, it would be a spiffy idea to setup a local caching server, such as Unbound, to speed up subsequent queries (because you will undoubtedly have subsequent queries unless your R scripts are perfect on the first go-round).

If you’ve solved the “efficient reverse DNS query problem” a different way in R, drop a note in the comments! I know quite a few folks who’d love to buy you tasty beverage!

You can find similar, handy IP address and other security-oriented R code in our (me & @jayjacobs’) upcoming book on security data analysis and visualization.

The FBI made a tool to help you determine if you were a victim of the DNSChanger malware.

If you’re like many casual Internet users, you have no idea how to get the information to plug into the input box.

Unfortunately, the security model of most modern browsers makes it impossible to easily retrieve this information. However, it is possible to grab the DNS entries if the user is willing to trust the requesting source.

To help make it easier to determine if you’re infected, I wrote DNSChanger Detector. It’s a small Java applet that requires the user to allow it to have privileged access to the DNS entries via a call to sun.net.dns.ResolverConfiguration to get the nameservers. Once it has them, there is some jQuery glue in place to let Javascript access the results.

I understand why the FBI didn’t attempt to go this route, but it will hopefully be useful to folks who don’t wish to walk their friends and family through the process.