SHODAN API in R (With Examples)

Folks may debate the merits of the SHODAN tool, but in my opinion it’s a valuable resource, especially if used for “good”. What is SHODAN? I think ThreatPost summed it up nicely:

“Shodan is a Web based search engine that discovers Internet facing computers, including desktops, servers and routers. The engine, created by programmer John Matherly, allows users to filter searches for systems running a specific type of application (say, Apache Web servers or FTP) and filter results by geographic region. The search engine indexes host ’banners,’ which include meta-data sent between a server and client and includes information such as the type of software run, what services are available and so on.”

I’m in R quite a bit these days and thought it would be useful to have access to the SHODAN API in R. I have a very rudimentary version of the API (search only) up on github which can be integrated into your R environment thus:

library(devtools)
install_github("Rshodan","hrbrmstr")
library(shodan)
help(shodan) # you don't really need to do this cmd

It’ll eventually be in CRAN, but I have some cleanup work to do before the maintainers will accept the submission. If you are new to R, there are a slew of dependencies you’ll need to add to the base R installation. Here’s a good description of how to do that on pretty much every platform.

After I tweeted the above reference, @shawnmer asked the following:

https://twitter.com/shawnmer/status/290904140782137344

That is not an unreasonable request, especially if one is new to R (or SHODAN). I had been working on this post and a more extended example and finally able to get enough code done to warrant publishing it. You can do far more in R than these simple charts & graphs. Imagine taking data from multiple searches–either across time or across ports–and doing a statistical comparison. Or, use some the image processing & recognition libraries within R as well as a package such as RCurl to fetch images from open webcams and attempt to identify people or objects. The following should be enough for most folks to get started.

You can cut/paste the source code here or download the whole source file.

The fundamental shortcut this library provides over just trying to code it yourself is taking the JSON response from SHODAN and turning it into an R data frame. That is not as overtly trivial as you might think and you may want to look at the source code for the library to see where I grabbed some of that code from. I’m also not 100% convinced it’s going to work under all circumstances (hence part of the 0.1 status).

library(shodan)
library(ggplot2)
library(xtable)
library(maps)
library(rworldmap)
library(ggthemes)
 
 
# if you're behind a proxy, setting this will help
# but it's strongly suggested/encouraged that you stick the values in a file and 
# read them in vs paste them in a script
# options(RCurlOptions = list(proxy="proxyhost", proxyuserpwd="user:pass"))
 
setSHODANKey("~/.shodankey")
 
# query example taken from Michael “theprez98” Schearer's DEFCON 18 presentation
# https://www.defcon.org/images/defcon-18/dc-18-presentations/Schearer/DEFCON-18-Schearer-SHODAN.pdf
 
# find all Cisco IOS devies that may have an unauthenticated admin login
# setting trace to be TRUE to see the progress of the query
result = SHODANQuery(query="cisco last-modified www-authenticate",trace=TRUE)
 
#find the first 100 found memcached instances
#result = SHODANQuery(query='port:11211',limit=100,trace=TRUE)
 
df = result$matches
 
# aggregate result by operating system
# you can use this one if you want to filter out NA's completely
#df.summary.by.os = ddply(df, .(os), summarise, N=sum(as.numeric(factor(os))))
#this one provides count of NA's (i.e. unidentified systems)
df.summary.by.os = ddply(df, .(os), summarise, N=length(os))
 
# sort & see the results in a text table
df.summary.by.os = transform(df.summary.by.os, os = reorder(os, -N))
df.summary.by.os

That will yield:

FALSE                 os   N
FALSE 1      Linux 2.4.x  60
FALSE 2      Linux 2.6.x   6
FALSE 3 Linux recent 2.4   2
FALSE 4     Windows 2000   2
FALSE 5   Windows 7 or 8  10
FALSE 6       Windows XP   8
FALSE 7             <NA> 112

You can plot it with:

# plot a bar chart of them
(ggplot(df.summary.by.os,aes(x=os,y=N,fill=os)) + 
   geom_bar(stat="identity") + 
   theme_few() +
   labs(y="Count",title="SHODAN Search Results by OS"))

to yield:

and:

world = map_data("world")
(ggplot() +
   geom_polygon(data=world, aes(x=long, y=lat, group=group)) +
   geom_point(data=df, aes(x=longitude, y=latitude), colour="#EE760033",size=1.75) +
   labs(x="",y="") +
   theme_few())

You can easily do the same by country:

# sort & view the results by country
# see above if you don't want to filter out NA's
df.summary.by.country_code = ddply(df, .(country_code, country_name), summarise, N=sum(!is.na(country_code)))
df.summary.by.country_code = transform(df.summary.by.country_code, country_code = reorder(country_code, -N))
 
df.summary.by.country_code

##    country_code              country_name  N
## 1            AR                 Argentina  2
## 2            AT                   Austria  2
## 3            AU                 Australia  2
## 4            BE                   Belgium  2
## 5            BN         Brunei Darussalam  2
## 6            BR                    Brazil 14
## 7            CA                    Canada 16
## 8            CN                     China  6
## 9            CO                  Colombia  4
## 10           CZ            Czech Republic  2
## 11           DE                   Germany 12
## 12           EE                   Estonia  4
## 13           ES                     Spain  4
## 14           FR                    France 10
## 15           HK                 Hong Kong  2
## 16           HU                   Hungary  2
## 17           IN                     India 10
## 18           IR Iran, Islamic Republic of  4
## 19           IT                     Italy  4
## 20           LV                    Latvia  4
## 21           MX                    Mexico  2
## 22           PK                  Pakistan  4
## 23           PL                    Poland 16
## 24           RU        Russian Federation 14
## 25           SG                 Singapore  2
## 26           SK                  Slovakia  2
## 27           TW                    Taiwan  6
## 28           UA                   Ukraine  2
## 29           US             United States 28
## 30           VE                 Venezuela  2
## 31         <NA>                      <NA>  0

(ggplot(df.summary.by.country_code,aes(x=country_code,y=N)) + 
  geom_bar(stat="identity") +
  theme_few() +
  labs(y="Count",x="Country",title="SHODAN Search Results by Country"))

And, easily generate the must-have choropleth:

# except make a choropleth
# using the very simple rworldmap process
shodanChoropleth = joinCountryData2Map( df.summary.by.country_code, joinCode = "ISO2", nameJoinColumn = "country_code")
par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i")
mapCountryData(shodanChoropleth, nameColumnToPlot="N",colourPalette="terrain",catMethod="fixedWidth")

Again, producing pretty pictures is all well-and-good, but it’s best to start with some good questions you need answering to make any visualization worthwhile. In the coming weeks, I’ll do some posts that show what types of questions you may want to ask/answer with R & SHODAN.

I encourage folks that have issues, concerns or requests to use github vs post in the comments, but I’ll try to respond to either as quickly as possible.

rud.is