Skip navigation

Category Archives: Cybersecurity

Google recently [announced](https://developers.google.com/speed/public-dns/docs/dns-over-https) their DNS-over-HTTPS API, which _”enhances privacy and security between a client and a recursive resolver, and complements DNSSEC to provide end-to-end authenticated DNS lookups”_. The REST API they provided was pretty simple to [wrap into a package](https://github.com/hrbrmstr/gdns) and I tossed in some [SPF](http://www.openspf.org/SPF_Record_Syntax) functions that I had lying around to bulk it up a bit.

### Why DNS-over-HTTPS?

DNS machinations usually happen over UDP (and sometimes TCP). Unless you’re using some fairly modern DNS augmentations, these exchanges happen in cleartext, meaning your query and the response are exposed during transport (and they are already exposed to the server you’re querying for a response).

DNS queries over HTTPS will be harder to [spoof](http://www.veracode.com/security/spoofing-attack) and the query + response will be encrypted, so you gain transport privacy when, say, you’re at Starbucks or from your DSL, FiOS, Gogo Inflight, or cable internet provider (yes, they all snoop on your DNS queries).

You end up trusting Google quite a bit with this API, but if you were currently using `8.8.8.8` or `8.8.4.4` (or their IPv6 equivalents) you were already trusting Google (and it’s likely Google knows what you’re doing on the internet anyway given all the trackers and especially if you’re using Chrome).

One additional item you gain using this API is more control over [`EDNS0`](https://tools.ietf.org/html/draft-vandergaast-edns-client-ip-00) settings. `EDNS0` is a DNS protocol extension that, for example, enables the content delivery networks to pick the “closest” server farm to ensure speedy delivery of your streaming Game of Thrones binge watch. They get to know a piece of your IP address so they can make this decision, but you end up giving away a bit of privacy (though you lose the privacy in the end since the target CDN servers know precisely where you are).

Right now, there’s no way for most clients to use DNS-over-HTTPS directly, but the API can be used in a programmatic fashion, which may be helpful in situations where you need to do some DNS spelunking but UDP is blocked or you’re on a platform that can’t build the [`resolv`](https://github.com/hrbrmstr/resolv) package.

You can learn a bit more about DNS and privacy in this [IETF paper](https://www.ietf.org/mail-archive/web/dns-privacy/current/pdfWqAIUmEl47.pdf) [PDF].

### Mining DNS with `gdns`

The `gdns` package is pretty straightforward. Use the `query()` function to get DNS info for a single entity:

library(gdns)
 
query("apple.com")
## $Status  
## [1] 0          # NOERROR - Standard DNS response code (32 bit integer)
## 
## $TC
## [1] FALSE      # Whether the response is truncated
## 
## $RD
## [1] TRUE       # Should always be true for Google Public DNS
## 
## $RA
## [1] TRUE       # Should always be true for Google Public DNS
## 
## $AD
## [1] FALSE      # Whether all response data was validated with DNSSEC
## 
## $CD
## [1] FALSE      # Whether the client asked to disable DNSSEC
## 
## $Question
##         name type
## 1 apple.com.    1
## 
## $Answer
##         name type  TTL          data
## 1 apple.com.    1 1547 17.172.224.47
## 2 apple.com.    1 1547  17.178.96.59
## 3 apple.com.    1 1547 17.142.160.59
## 
## $Additional
## list()
## 
## $edns_client_subnet
## [1] "0.0.0.0/0"

The `gdns` lookup functions are set to use an `edns_client_subnet` of `0.0.0.0/0`, meaning your local IP address or subnet is not leaked outside of your connection to Google (you can override this behavior).

You can do reverse lookups as well (i.e. query IP addresses):

query("17.172.224.47", "PTR")
## $Status
## [1] 0
## 
## $TC
## [1] FALSE
## 
## $RD
## [1] TRUE
## 
## $RA
## [1] TRUE
## 
## $AD
## [1] FALSE
## 
## $CD
## [1] FALSE
## 
## $Question
##                          name type
## 1 47.224.172.17.in-addr.arpa.   12
## 
## $Answer
##                            name type  TTL                           data
## 1   47.224.172.17.in-addr.arpa.   12 1073               webobjects.info.
## 2   47.224.172.17.in-addr.arpa.   12 1073                   yessql.info.
## 3   47.224.172.17.in-addr.arpa.   12 1073                 apples-msk.ru.
## 4   47.224.172.17.in-addr.arpa.   12 1073                     icloud.se.
## 5   47.224.172.17.in-addr.arpa.   12 1073                     icloud.es.
## 6   47.224.172.17.in-addr.arpa.   12 1073                     icloud.om.
## 7   47.224.172.17.in-addr.arpa.   12 1073                   icloudo.com.
## 8   47.224.172.17.in-addr.arpa.   12 1073                     icloud.ch.
## 9   47.224.172.17.in-addr.arpa.   12 1073                     icloud.fr.
## 10  47.224.172.17.in-addr.arpa.   12 1073                   icloude.com.
## 11  47.224.172.17.in-addr.arpa.   12 1073          camelspaceeffect.com.
## 12  47.224.172.17.in-addr.arpa.   12 1073                 camelphat.com.
## 13  47.224.172.17.in-addr.arpa.   12 1073              alchemysynth.com.
## 14  47.224.172.17.in-addr.arpa.   12 1073                    openni.org.
## 15  47.224.172.17.in-addr.arpa.   12 1073                      swell.am.
## 16  47.224.172.17.in-addr.arpa.   12 1073                  appleweb.net.
## 17  47.224.172.17.in-addr.arpa.   12 1073       appleipodsettlement.com.
## 18  47.224.172.17.in-addr.arpa.   12 1073                    earpod.net.
## 19  47.224.172.17.in-addr.arpa.   12 1073                 yourapple.com.
## 20  47.224.172.17.in-addr.arpa.   12 1073                    xserve.net.
## 21  47.224.172.17.in-addr.arpa.   12 1073                    xserve.com.
## 22  47.224.172.17.in-addr.arpa.   12 1073            velocityengine.com.
## 23  47.224.172.17.in-addr.arpa.   12 1073           velocity-engine.com.
## 24  47.224.172.17.in-addr.arpa.   12 1073            universityarts.com.
## 25  47.224.172.17.in-addr.arpa.   12 1073            thinkdifferent.com.
## 26  47.224.172.17.in-addr.arpa.   12 1073               theatremode.com.
## 27  47.224.172.17.in-addr.arpa.   12 1073               theatermode.com.
## 28  47.224.172.17.in-addr.arpa.   12 1073           streamquicktime.net.
## 29  47.224.172.17.in-addr.arpa.   12 1073           streamquicktime.com.
## 30  47.224.172.17.in-addr.arpa.   12 1073                ripmixburn.com.
## 31  47.224.172.17.in-addr.arpa.   12 1073              rip-mix-burn.com.
## 32  47.224.172.17.in-addr.arpa.   12 1073        quicktimestreaming.net.
## 33  47.224.172.17.in-addr.arpa.   12 1073        quicktimestreaming.com.
## 34  47.224.172.17.in-addr.arpa.   12 1073                  quicktime.cc.
## 35  47.224.172.17.in-addr.arpa.   12 1073                      qttv.net.
## 36  47.224.172.17.in-addr.arpa.   12 1073                      qtml.com.
## 37  47.224.172.17.in-addr.arpa.   12 1073                     qt-tv.net.
## 38  47.224.172.17.in-addr.arpa.   12 1073          publishingsurvey.org.
## 39  47.224.172.17.in-addr.arpa.   12 1073          publishingsurvey.com.
## 40  47.224.172.17.in-addr.arpa.   12 1073        publishingresearch.org.
## 41  47.224.172.17.in-addr.arpa.   12 1073        publishingresearch.com.
## 42  47.224.172.17.in-addr.arpa.   12 1073         publishing-survey.org.
## 43  47.224.172.17.in-addr.arpa.   12 1073         publishing-survey.com.
## 44  47.224.172.17.in-addr.arpa.   12 1073       publishing-research.org.
## 45  47.224.172.17.in-addr.arpa.   12 1073       publishing-research.com.
## 46  47.224.172.17.in-addr.arpa.   12 1073                  powerbook.cc.
## 47  47.224.172.17.in-addr.arpa.   12 1073             playquicktime.net.
## 48  47.224.172.17.in-addr.arpa.   12 1073             playquicktime.com.
## 49  47.224.172.17.in-addr.arpa.   12 1073           nwk-apple.apple.com.
## 50  47.224.172.17.in-addr.arpa.   12 1073                   myapple.net.
## 51  47.224.172.17.in-addr.arpa.   12 1073                  macreach.net.
## 52  47.224.172.17.in-addr.arpa.   12 1073                  macreach.com.
## 53  47.224.172.17.in-addr.arpa.   12 1073                   macmate.com.
## 54  47.224.172.17.in-addr.arpa.   12 1073         macintoshsoftware.com.
## 55  47.224.172.17.in-addr.arpa.   12 1073                    machos.net.
## 56  47.224.172.17.in-addr.arpa.   12 1073                   mach-os.net.
## 57  47.224.172.17.in-addr.arpa.   12 1073                   mach-os.com.
## 58  47.224.172.17.in-addr.arpa.   12 1073                   ischool.com.
## 59  47.224.172.17.in-addr.arpa.   12 1073           insidemacintosh.com.
## 60  47.224.172.17.in-addr.arpa.   12 1073             imovietheater.com.
## 61  47.224.172.17.in-addr.arpa.   12 1073               imoviestage.com.
## 62  47.224.172.17.in-addr.arpa.   12 1073             imoviegallery.com.
## 63  47.224.172.17.in-addr.arpa.   12 1073               imacsources.com.
## 64  47.224.172.17.in-addr.arpa.   12 1073        imac-applecomputer.com.
## 65  47.224.172.17.in-addr.arpa.   12 1073                imac-apple.com.
## 66  47.224.172.17.in-addr.arpa.   12 1073                     ikids.com.
## 67  47.224.172.17.in-addr.arpa.   12 1073              ibookpartner.com.
## 68  47.224.172.17.in-addr.arpa.   12 1073                   geoport.com.
## 69  47.224.172.17.in-addr.arpa.   12 1073                   firewire.cl.
## 70  47.224.172.17.in-addr.arpa.   12 1073               expertapple.com.
## 71  47.224.172.17.in-addr.arpa.   12 1073              edu-research.org.
## 72  47.224.172.17.in-addr.arpa.   12 1073               dvdstudiopro.us.
## 73  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.org.
## 74  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.net.
## 75  47.224.172.17.in-addr.arpa.   12 1073             dvdstudiopro.info.
## 76  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.com.
## 77  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.biz.
## 78  47.224.172.17.in-addr.arpa.   12 1073          developercentral.com.
## 79  47.224.172.17.in-addr.arpa.   12 1073             desktopmovies.org.
## 80  47.224.172.17.in-addr.arpa.   12 1073             desktopmovies.net.
## 81  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.org.
## 82  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.net.
## 83  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.com.
## 84  47.224.172.17.in-addr.arpa.   12 1073          darwinsourcecode.com.
## 85  47.224.172.17.in-addr.arpa.   12 1073              darwinsource.org.
## 86  47.224.172.17.in-addr.arpa.   12 1073              darwinsource.com.
## 87  47.224.172.17.in-addr.arpa.   12 1073                darwincode.com.
## 88  47.224.172.17.in-addr.arpa.   12 1073                carbontest.com.
## 89  47.224.172.17.in-addr.arpa.   12 1073              carbondating.com.
## 90  47.224.172.17.in-addr.arpa.   12 1073                 carbonapi.com.
## 91  47.224.172.17.in-addr.arpa.   12 1073           braeburncapital.com.
## 92  47.224.172.17.in-addr.arpa.   12 1073                  applexpo.net.
## 93  47.224.172.17.in-addr.arpa.   12 1073                  applexpo.com.
## 94  47.224.172.17.in-addr.arpa.   12 1073                applereach.net.
## 95  47.224.172.17.in-addr.arpa.   12 1073                applereach.com.
## 96  47.224.172.17.in-addr.arpa.   12 1073            appleiservices.com.
## 97  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.org.
## 98  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.net.
## 99  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.com.
## 100 47.224.172.17.in-addr.arpa.   12 1073            applefilmmaker.com.
## 101 47.224.172.17.in-addr.arpa.   12 1073             applefilmaker.com.
## 102 47.224.172.17.in-addr.arpa.   12 1073                appleenews.com.
## 103 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.org.
## 104 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.net.
## 105 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.com.
## 106 47.224.172.17.in-addr.arpa.   12 1073         applecomputerimac.com.
## 107 47.224.172.17.in-addr.arpa.   12 1073        applecomputer-imac.com.
## 108 47.224.172.17.in-addr.arpa.   12 1073                  applecare.cc.
## 109 47.224.172.17.in-addr.arpa.   12 1073               applecarbon.com.
## 110 47.224.172.17.in-addr.arpa.   12 1073                 apple-inc.net.
## 111 47.224.172.17.in-addr.arpa.   12 1073               apple-enews.com.
## 112 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.org.
## 113 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.net.
## 114 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.com.
## 115 47.224.172.17.in-addr.arpa.   12 1073                  mobileme.com.
## 116 47.224.172.17.in-addr.arpa.   12 1073                ipa-iphone.net.
## 117 47.224.172.17.in-addr.arpa.   12 1073               jetfuelapps.com.
## 118 47.224.172.17.in-addr.arpa.   12 1073                jetfuelapp.com.
## 119 47.224.172.17.in-addr.arpa.   12 1073                   burstly.net.
## 120 47.224.172.17.in-addr.arpa.   12 1073             appmediagroup.com.
## 121 47.224.172.17.in-addr.arpa.   12 1073             airsupportapp.com.
## 122 47.224.172.17.in-addr.arpa.   12 1073            burstlyrewards.com.
## 123 47.224.172.17.in-addr.arpa.   12 1073        surveys-temp.apple.com.
## 124 47.224.172.17.in-addr.arpa.   12 1073               appleiphone.com.
## 125 47.224.172.17.in-addr.arpa.   12 1073                       asto.re.
## 126 47.224.172.17.in-addr.arpa.   12 1073                 itunesops.com.
## 127 47.224.172.17.in-addr.arpa.   12 1073                     apple.com.
## 128 47.224.172.17.in-addr.arpa.   12 1073     st11p01ww-apple.apple.com.
## 129 47.224.172.17.in-addr.arpa.   12 1073                      apple.by.
## 130 47.224.172.17.in-addr.arpa.   12 1073                 airtunes.info.
## 131 47.224.172.17.in-addr.arpa.   12 1073              applecentre.info.
## 132 47.224.172.17.in-addr.arpa.   12 1073         applecomputerinc.info.
## 133 47.224.172.17.in-addr.arpa.   12 1073                appleexpo.info.
## 134 47.224.172.17.in-addr.arpa.   12 1073             applemasters.info.
## 135 47.224.172.17.in-addr.arpa.   12 1073                 applepay.info.
## 136 47.224.172.17.in-addr.arpa.   12 1073 applepaymerchantsupplies.info.
## 137 47.224.172.17.in-addr.arpa.   12 1073         applepaysupplies.info.
## 138 47.224.172.17.in-addr.arpa.   12 1073              applescript.info.
## 139 47.224.172.17.in-addr.arpa.   12 1073               appleshare.info.
## 140 47.224.172.17.in-addr.arpa.   12 1073                   macosx.info.
## 141 47.224.172.17.in-addr.arpa.   12 1073                powerbook.info.
## 142 47.224.172.17.in-addr.arpa.   12 1073                 powermac.info.
## 143 47.224.172.17.in-addr.arpa.   12 1073            quicktimelive.info.
## 144 47.224.172.17.in-addr.arpa.   12 1073              quicktimetv.info.
## 145 47.224.172.17.in-addr.arpa.   12 1073                 sherlock.info.
## 146 47.224.172.17.in-addr.arpa.   12 1073            shopdifferent.info.
## 147 47.224.172.17.in-addr.arpa.   12 1073                 skyvines.info.
## 148 47.224.172.17.in-addr.arpa.   12 1073                     ubnw.info.
## 
## $Additional
## list()
## 
## $edns_client_subnet
## [1] "0.0.0.0/0"

And, you can go “easter egg” hunting:

cat(query("google-public-dns-a.google.com", "TXT")$Answer$data)
## "http://xkcd.com/1361/"

Note that Google DNS-over-HTTPS supports [all the RR types](http://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-4).

If you have more than a few domains to lookup and are querying for the same RR record, you can use the `bulk_query()` function:

hosts <- c("rud.is", "dds.ec", "r-project.org", "rstudio.com", "apple.com")
bulk_query(hosts)
## Source: local data frame [7 x 4]
## 
##             name  type   TTL            data
##            (chr) (int) (int)           (chr)
## 1        rud.is.     1  3599 104.236.112.222
## 2        dds.ec.     1   299   162.243.111.4
## 3 r-project.org.     1  3601   137.208.57.37
## 4   rstudio.com.     1  3599    45.79.156.36
## 5     apple.com.     1  1088   17.172.224.47
## 6     apple.com.     1  1088    17.178.96.59
## 7     apple.com.     1  1088   17.142.160.59

Note that this function only returns a `data_frame` (none of the status fields).

### More DNSpelunking with `gdns`

DNS records contain a treasure trove of data (at least for cybersecurity researchers). Say you have a list of base, primary domains for the Fortune 1000:

library(readr)
library(urltools)
 
URL <- "https://gist.githubusercontent.com/hrbrmstr/ae574201af3de035c684/raw/2d21bb4132b77b38f2992dfaab99649397f238e9/f1000.csv"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
 
f1k <- read_csv(fil)
 
doms1k <- suffix_extract(domain(f1k$website))
doms1k <- paste(doms1k$domain, doms1k$suffix, sep=".")
 
head(doms1k)
## [1] "walmart.com"           "exxonmobil.com"       
## [3] "chevron.com"           "berkshirehathaway.com"
## [5] "apple.com"             "gm.com"

We can get all the `TXT` records for them:

library(parallel)
library(doParallel) # parallel ops will make this go faster
library(foreach)
library(dplyr)
library(ggplot2)
library(grid)
library(hrbrmrkdn)
 
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
 
f1k_l <- foreach(dom=doms1k) %dopar% gdns::bulk_query(dom, "TXT")
f1k <- bind_rows(f1k_l)
 
length(unique(f1k$name))
## [1] 858
 
df <- count(count(f1k, name), `Number of TXT records`=n)
df <- bind_rows(df, data_frame(`Number of TXT records`=0, n=142))
 
gg <- ggplot(df, aes(`Number of TXT records`, n))
gg <- gg + geom_bar(stat="identity", width=0.75)
gg <- gg + scale_x_continuous(expand=c(0,0), breaks=0:13)
gg <- gg + scale_y_continuous(expand=c(0,0))
gg <- gg + labs(y="# Orgs", 
                title="TXT record count per Fortune 1000 Org")
gg <- gg + theme_hrbrmstr(grid="Y", axis="xy")
gg <- gg + theme(axis.title.x=element_text(margin=margin(t=-22)))
gg <- gg + theme(axis.title.y=element_text(angle=0, vjust=1, 
                                           margin=margin(r=-49)))
gg <- gg + theme(plot.margin=margin(t=10, l=30, b=30, r=10))
gg <- gg + theme(plot.title=element_text(margin=margin(b=20)))
gg

Fullscreen_4_11_16__12_35_AM

We can see that 858 of the Fortune 1000 have `TXT` records and more than a few have between 2 and 5 of them. Why look at `TXT` records? Well, they can tell us things like who uses cloud e-mail services, such as Outlook365:

sort(f1k$name[which(grepl("(MS=|outlook)", spf_includes(f1k$data), ignore.case=TRUE))])
##   [1] "21cf.com."                  "77nrg.com."                 "abbott.com."                "acuitybrands.com."         
##   [5] "adm.com."                   "adobe.com."                 "alaskaair.com."             "aleris.com."               
##   [9] "allergan.com."              "altria.com."                "amark.com."                 "ameren.com."               
##  [13] "americantower.com."         "ametek.com."                "amkor.com."                 "amphenol.com."             
##  [17] "amwater.com."               "analog.com."                "anixter.com."               "apachecorp.com."           
##  [21] "archrock.com."              "archrock.com."              "armstrong.com."             "aschulman.com."            
##  [25] "assurant.com."              "autonation.com."            "autozone.com."              "axiall.com."               
##  [29] "bd.com."                    "belk.com."                  "biglots.com."               "bio-rad.com."              
##  [33] "biomet.com."                "bloominbrands.com."         "bms.com."                   "borgwarner.com."           
##  [37] "boydgaming.com."            "brinks.com."                "brocade.com."               "brunswick.com."            
##  [41] "cabotog.com."               "caleres.com."               "campbellsoupcompany.com."   "carefusion.com."           
##  [45] "carlyle.com."               "cartech.com."               "cbrands.com."               "cbre.com."                 
##  [49] "chemtura.com."              "chipotle.com."              "chiquita.com."              "churchdwight.com."         
##  [53] "cinemark.com."              "cintas.com."                "cmc.com."                   "cmsenergy.com."            
##  [57] "cognizant.com."             "colfaxcorp.com."            "columbia.com."              "commscope.com."            
##  [61] "con-way.com."               "convergys.com."             "couche-tard.com."           "crestwoodlp.com."          
##  [65] "crowncastle.com."           "crowncork.com."             "csx.com."                   "cummins.com."              
##  [69] "cunamutual.com."            "dana.com."                  "darlingii.com."             "deanfoods.com."            
##  [73] "dentsplysirona.com."        "discoverfinancial.com."     "disney.com."                "donaldson.com."            
##  [77] "drhorton.com."              "dupont.com."                "dyn-intl.com."              "dynegy.com."               
##  [81] "ea.com."                    "eastman.com."               "ecolab.com."                "edgewell.com."             
##  [85] "edwards.com."               "emc.com."                   "enablemidstream.com."       "energyfutureholdings.com." 
##  [89] "energytransfer.com."        "eogresources.com."          "equinix.com."               "expeditors.com."           
##  [93] "express.com."               "fastenal.com."              "ferrellgas.com."            "fisglobal.com."            
##  [97] "flowserve.com."             "fmglobal.com."              "fnf.com."                   "g-iii.com."                
## [101] "genpt.com."                 "ggp.com."                   "gilead.com."                "goodyear.com."             
## [105] "grainger.com."              "graphicpkg.com."            "hanes.com."                 "hanover.com."              
## [109] "harley-davidson.com."       "harsco.com."                "hasbro.com."                "hbfuller.com."             
## [113] "hei.com."                   "hhgregg.com."               "hnicorp.com."               "homedepot.com."            
## [117] "hpinc.com."                 "hubgroup.com."              "iac.com."                   "igt.com."                  
## [121] "iheartmedia.com."           "insperity.com."             "itt.com."                   "itw.com."                  
## [125] "jarden.com."                "jcpenney.com."              "jll.com."                   "joyglobal.com."            
## [129] "juniper.net."               "kellyservices.com."         "kennametal.com."            "kiewit.com."               
## [133] "kindermorgan.com."          "kindredhealthcare.com."     "kodak.com."                 "lamresearch.com."          
## [137] "lansingtradegroup.com."     "lennar.com."                "levistrauss.com."           "lithia.com."               
## [141] "manitowoc.com."             "manpowergroup.com."         "marathonoil.com."           "marathonpetroleum.com."    
## [145] "mastec.com."                "mastercard.com."            "mattel.com."                "maximintegrated.com."      
## [149] "mednax.com."                "mercuryinsurance.com."      "mgmresorts.com."            "micron.com."               
## [153] "mohawkind.com."             "molsoncoors.com."           "mosaicco.com."              "motorolasolutions.com."    
## [157] "mpgdriven.com."             "mscdirect.com."             "mtb.com."                   "murphyoilcorp.com."        
## [161] "mutualofomaha.com."         "mwv.com."                   "navistar.com."              "nbty.com."                 
## [165] "newellrubbermaid.com."      "nexeosolutions.com."        "nike.com."                  "nobleenergyinc.com."       
## [169] "o-i.com."                   "oge.com."                   "olin.com."                  "omnicomgroup.com."         
## [173] "onsemi.com."                "owens-minor.com."           "paychex.com."               "peabodyenergy.com."        
## [177] "pepboys.com."               "pmi.com."                   "pnkinc.com."                "polaris.com."              
## [181] "polyone.com."               "postholdings.com."          "ppg.com."                   "prudential.com."           
## [185] "qg.com."                    "quantaservices.com."        "quintiles.com."             "rcscapital.com."           
## [189] "rexnord.com."               "roberthalf.com."            "rushenterprises.com."       "ryland.com."               
## [193] "sandisk.com."               "sands.com."                 "scansource.com."            "sempra.com."               
## [197] "sonoco.com."                "spiritaero.com."            "sprouts.com."               "stanleyblackanddecker.com."
## [201] "starwoodhotels.com."        "steelcase.com."             "stryker.com."               "sunedison.com."            
## [205] "sunpower.com."              "supervalu.com."             "swifttrans.com."            "synnex.com."               
## [209] "taylormorrison.com."        "techdata.com."              "tegna.com."                 "tempursealy.com."          
## [213] "tetratech.com."             "theice.com."                "thermofisher.com."          "tjx.com."                  
## [217] "trueblue.com."              "ufpi.com."                  "ulta.com."                  "unfi.com."                 
## [221] "unifiedgrocers.com."        "universalcorp.com."         "vishay.com."                "visteon.com."              
## [225] "vwr.com."                   "westarenergy.com."          "westernunion.com."          "westrock.com."             
## [229] "wfscorp.com."               "whitewave.com."             "wpxenergy.com."             "wyndhamworldwide.com."     
## [233] "xilinx.com."                "xpo.com."                   "yum.com."                   "zimmerbiomet.com."

That’s 236 of them outsourcing some part of e-mail services to Microsoft.

We can also see which ones have terrible mail configs (`+all` or `all` passing):

f1k[which(passes_all(f1k$data)),]$name
## [1] "wfscorp.com."      "dupont.com."       "group1auto.com."   "uhsinc.com."      
## [5] "bigheartpet.com."  "pcconnection.com."

or are configured for Exchange federation services:

sort(f1k$name[which(grepl("==", f1k$data))])
## sort(f1k$name[which(grepl("==", f1k$data))])
##   [1] "21cf.com."                 "aarons.com."               "abbott.com."               "abbvie.com."              
##   [5] "actavis.com."              "activisionblizzard.com."   "acuitybrands.com."         "adm.com."                 
##   [9] "adobe.com."                "adt.com."                  "advanceautoparts.com."     "aecom.com."               
##  [13] "aetna.com."                "agilent.com."              "airproducts.com."          "alcoa.com."               
##  [17] "aleris.com."               "allergan.com."             "alliancedata.com."         "amcnetworks.com."         
##  [21] "amd.com."                  "americantower.com."        "amfam.com."                "amgen.com."               
##  [25] "amtrustgroup.com."         "amtrustgroup.com."         "amtrustgroup.com."         "amtrustgroup.com."        
##  [29] "anadarko.com."             "analog.com."               "apachecorp.com."           "applied.com."             
##  [33] "aptar.com."                "aramark.com."              "aramark.com."              "arcb.com."                
##  [37] "archcoal.com."             "armstrong.com."            "armstrong.com."            "arrow.com."               
##  [41] "asburyauto.com."           "autonation.com."           "avnet.com."                "ball.com."                
##  [45] "bankofamerica.com."        "baxter.com."               "bc.com."                   "bd.com."                  
##  [49] "bd.com."                   "bd.com."                   "belden.com."               "bemis.com."               
##  [53] "bestbuy.com."              "biogen.com."               "biomet.com."               "bloominbrands.com."       
##  [57] "bms.com."                  "boeing.com."               "bonton.com."               "borgwarner.com."          
##  [61] "brinks.com."               "brocade.com."              "brunswick.com."            "c-a-m.com."               
##  [65] "ca.com."                   "cabelas.com."              "cabotog.com."              "caleres.com."             
##  [69] "caleres.com."              "caleres.com."              "calpine.com."              "capitalone.com."          
##  [73] "cardinal.com."             "carlyle.com."              "carlyle.com."              "cartech.com."             
##  [77] "cbre.com."                 "celgene.com."              "centene.com."              "centurylink.com."         
##  [81] "cerner.com."               "cerner.com."               "cfindustries.com."         "ch2m.com."                
##  [85] "chevron.com."              "chipotle.com."             "chiquita.com."             "chk.com."                 
##  [89] "chrobinson.com."           "chs.net."                  "chsinc.com."               "chubb.com."               
##  [93] "ciena.com."                "cigna.com."                "cinemark.com."             "cit.com."                 
##  [97] "cmc.com."                  "cmegroup.com."             "coach.com."                "cognizant.com."           
## [101] "cokecce.com."              "colfaxcorp.com."           "columbia.com."             "commscope.com."           
## [105] "con-way.com."              "conagrafoods.com."         "conocophillips.com."       "coopertire.com."          
## [109] "core-mark.com."            "crbard.com."               "crestwoodlp.com."          "crowncastle.com."         
## [113] "crowncork.com."            "csx.com."                  "danaher.com."              "darden.com."              
## [117] "darlingii.com."            "davita.com."               "davita.com."               "davita.com."              
## [121] "dentsplysirona.com."       "diebold.com."              "diplomat.is."              "dish.com."                
## [125] "disney.com."               "donaldson.com."            "dresser-rand.com."         "dstsystems.com."          
## [129] "dupont.com."               "dupont.com."               "dyn-intl.com."             "dyn-intl.com."            
## [133] "dynegy.com."               "ea.com."                   "ea.com."                   "eastman.com."             
## [137] "ebay.com."                 "echostar.com."             "ecolab.com."               "edmc.edu."                
## [141] "edwards.com."              "elcompanies.com."          "emc.com."                  "emerson.com."             
## [145] "energyfutureholdings.com." "energytransfer.com."       "eogresources.com."         "equinix.com."             
## [149] "essendant.com."            "esterline.com."            "evhc.net."                 "exelisinc.com."           
## [153] "exeloncorp.com."           "express-scripts.com."      "express.com."              "express.com."             
## [157] "exxonmobil.com."           "familydollar.com."         "fanniemae.com."            "fastenal.com."            
## [161] "fbhs.com."                 "ferrellgas.com."           "firstenergycorp.com."      "firstsolar.com."          
## [165] "fiserv.com."               "flowserve.com."            "fmc.com."                  "fmglobal.com."            
## [169] "fnf.com."                  "freddiemac.com."           "ge.com."                   "genpt.com."               
## [173] "genworth.com."             "ggp.com."                  "grace.com."                "grainger.com."            
## [177] "graphicpkg.com."           "graybar.com."              "guess.com."                "hain.com."                
## [181] "halliburton.com."          "hanes.com."                "hanes.com."                "harley-davidson.com."     
## [185] "harman.com."               "harris.com."               "harsco.com."               "hasbro.com."              
## [189] "hcahealthcare.com."        "hcc.com."                  "hei.com."                  "henryschein.com."         
## [193] "hess.com."                 "hhgregg.com."              "hnicorp.com."              "hollyfrontier.com."       
## [197] "hologic.com."              "honeywell.com."            "hospira.com."              "hp.com."                  
## [201] "hpinc.com."                "hrblock.com."              "iac.com."                  "igt.com."                 
## [205] "iheartmedia.com."          "imshealth.com."            "ingrammicro.com."          "intel.com."               
## [209] "interpublic.com."          "intuit.com."               "ironmountain.com."         "jacobs.com."              
## [213] "jarden.com."               "jcpenney.com."             "jll.com."                  "johndeere.com."           
## [217] "johndeere.com."            "joyglobal.com."            "juniper.net."              "karauctionservices.com."  
## [221] "kbhome.com."               "kemper.com."               "keurig.com."               "khov.com."                
## [225] "kindredhealthcare.com."    "kkr.com."                  "kla-tencor.com."           "labcorp.com."             
## [229] "labcorp.com."              "lamresearch.com."          "lamresearch.com."          "landolakesinc.com."       
## [233] "lansingtradegroup.com."    "lear.com."                 "leggmason.com."            "leidos.com."              
## [237] "level3.com."               "libertymutual.com."        "lilly.com."                "lithia.com."              
## [241] "livenation.com."           "lkqcorp.com."              "loews.com."                "magellanhealth.com."      
## [245] "manitowoc.com."            "marathonoil.com."          "marathonpetroleum.com."    "markelcorp.com."          
## [249] "markwest.com."             "marriott.com."             "martinmarietta.com."       "masco.com."               
## [253] "massmutual.com."           "mastec.com."               "mastercard.com."           "mattel.com."              
## [257] "maximintegrated.com."      "mckesson.com."             "mercuryinsurance.com."     "meritor.com."             
## [261] "metlife.com."              "mgmresorts.com."           "micron.com."               "microsoft.com."           
## [265] "mohawkind.com."            "molsoncoors.com."          "monsanto.com."             "mosaicco.com."            
## [269] "motorolasolutions.com."    "mscdirect.com."            "murphyoilcorp.com."        "nasdaqomx.com."           
## [273] "navistar.com."             "nbty.com."                 "ncr.com."                  "netapp.com."              
## [277] "newfield.com."             "newscorp.com."             "nike.com."                 "nov.com."                 
## [281] "nrgenergy.com."            "ntenergy.com."             "nucor.com."                "nustarenergy.com."        
## [285] "o-i.com."                  "oaktreecapital.com."       "ocwen.com."                "omnicare.com."            
## [289] "oneok.com."                "oneok.com."                "onsemi.com."               "outerwall.com."           
## [293] "owens-minor.com."          "owens-minor.com."          "oxy.com."                  "packagingcorp.com."       
## [297] "pall.com."                 "parexel.com."              "paychex.com."              "pcconnection.com."        
## [301] "penskeautomotive.com."     "pepsico.com."              "pfizer.com."               "pg.com."                  
## [305] "polaris.com."              "polyone.com."              "pplweb.com."               "principal.com."           
## [309] "protective.com."           "publix.com."               "qg.com."                   "questdiagnostics.com."    
## [313] "quintiles.com."            "rcscapital.com."           "realogy.com."              "regmovies.com."           
## [317] "rentacenter.com."          "republicservices.com."     "rexnord.com."              "reynoldsamerican.com."    
## [321] "reynoldsamerican.com."     "rgare.com."                "roberthalf.com."           "rpc.net."                 
## [325] "rushenterprises.com."      "safeway.com."              "saic.com."                 "sandisk.com."             
## [329] "scana.com."                "scansource.com."           "seaboardcorp.com."         "selective.com."           
## [333] "selective.com."            "sempra.com."               "servicemaster.com."        "servicemaster.com."       
## [337] "servicemaster.com."        "sjm.com."                  "sm-energy.com."            "spectraenergy.com."       
## [341] "spiritaero.com."           "sprouts.com."              "spx.com."                  "staples.com."             
## [345] "starbucks.com."            "starwoodhotels.com."       "statestreet.com."          "steelcase.com."           
## [349] "steeldynamics.com."        "stericycle.com."           "stifel.com."               "stryker.com."             
## [353] "sunedison.com."            "sungard.com."              "supervalu.com."            "symantec.com."            
## [357] "symantec.com."             "synnex.com."               "synopsys.com."             "taylormorrison.com."      
## [361] "tdsinc.com."               "teamhealth.com."           "techdata.com."             "teledyne.com."            
## [365] "tempursealy.com."          "tenethealth.com."          "teradata.com."             "tetratech.com."           
## [369] "textron.com."              "thermofisher.com."         "tiaa-cref.org."            "tiffany.com."             
## [373] "timewarner.com."           "towerswatson.com."         "treehousefoods.com."       "tribunemedia.com."        
## [377] "trimble.com."              "trinet.com."               "trueblue.com."             "ugicorp.com."             
## [381] "uhsinc.com."               "ulta.com."                 "unifiedgrocers.com."       "unisys.com."              
## [385] "unum.com."                 "usfoods.com."              "varian.com."               "verizon.com."             
## [389] "vfc.com."                  "viacom.com."               "visa.com."                 "vishay.com."              
## [393] "visteon.com."              "wabtec.com."               "walmart.com."              "wecenergygroup.com."      
## [397] "wecenergygroup.com."       "west.com."                 "westarenergy.com."         "westlake.com."            
## [401] "westrock.com."             "weyerhaeuser.com."         "wholefoodsmarket.com."     "williams.com."            
## [405] "wm.com."                   "wnr.com."                  "wpxenergy.com."            "wyndhamworldwide.com."    
## [409] "xerox.com."                "xpo.com."                  "yrcw.com."                 "yum.com."                 
## [413] "zoetis.com."

And, even go so far as to see what are the most popular third-party mail services:

incl <- suffix_extract(sort(unlist(spf_includes(f1k$data))))
incl <- data.frame(table(paste(incl$domain, incl$suffix, sep=".")), stringsAsFactors=FALSE)
incl <- head(arrange(incl, desc(Freq)), 20)
incl <- mutate(incl, Var1=factor(Var1, Var1))
incl <- rename(incl, Service=Var1, Count=Freq)
 
gg <- ggplot(incl, aes(Service, Count))
gg <- gg + geom_bar(stat="identity", width=0.75)
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 250))
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL, 
                title="Most popular services used by the F1000",
                subtitle="As determined by SPF record configuration")
gg <- gg + theme_hrbrmstr(grid="X", axis="y")
gg <- gg + theme(plot.margin=margin(t=10, l=10, b=20, r=10))
gg

Fullscreen_4_11_16__1_10_AM

### Fin

There are more `TXT` records to play with than just SPF ones and many other hidden easter eggs. I need to add a few more functions into `gdns` before shipping it off to CRAN, so if you have any feature requests, now’s the time to file a [github issue](https://github.com/hrbrmstr/gdns/issues).

The [`iptools` package](https://github.com/hrbrmstr/iptools)—a toolkit for manipulating, validating and testing IP addresses and ranges, along with datasets relating to IP addresses—is flying through the internets and hitting a CRAN mirror near you, soon.

### What’s fixed?

[Tim Smith](https://github.com/tdsmith) fixed [a bug](https://github.com/hrbrmstr/iptools/issues/26) in `ip_in_range()` that occurred when the netmask was `/32` (thanks, Tim!).

### What’s new?

The `range_boundaries()` function now returns the three new fields that are pretty obvious once you see it in action:

range_boundaries("172.18.0.0/28")
##   minimum_ip  maximum_ip min_numeric max_numeric         range
## 1 172.18.0.0 172.18.0.15  2886860800  2886860815 172.18.0.0/28

They are tacked on the end, so if you were using positional or named columns previously, you’re still good to go.

We’ve added a new `country_ranges()` function to return all “assigned” CIDR blocks in a country. You just give it character vector of one or more ISO 3166-1 alpha-2 codes and you get back the CIDRs:

country_ranges("TO")
## $TO
## [1] "43.255.148.0/22"  "103.239.160.0/22" "103.242.126.0/23" "103.245.160.0/22" "175.176.144.0/21" "202.43.8.0/21"   
## [7] "202.134.24.0/21"

This data is updated daily and there’s some session caching built-into the function to speed up subsequent calls if you forgot to save the output. You can flush the session cache with `flush_country_cidrs()` and query it with `cached_country_cidrs()`.

### What’s next?

We’re waiting until the R 3.3.0 Windows toolchain is stable to add in MaxMind ASN lookups. If there are any IP-related functions you need added, drop us an [issue](https://github.com/hrbrmstr/iptools/issues). We’re at nearly 1,700 downloads from the RStudio mirror, which (IMO) is kinda cool for such a niche package. Many thanks to all our users and one more thank you to Dirk for the `AsioHeaders` package.

### Fin

If you want some “bad” IP addresses to play around with in `iptools`, check out the [`blocklist`](https://github.com/hrbrmstr/blocklist) package, which provides an interface to a subset of the [blocklist.de](http://www.blocklist.de/en/index.html) API.

`iptools` is a set of tools for working with IP addresses. Not just work, but work _fast_. It’s backed by `Rcpp` and now uses the [AsioHeaders](http://dirk.eddelbuettel.com/blog/2016/01/07/#asioheaders_1.11.0-1) package by Dirk Eddelbuettel, which means it no longer needs to _link_ against the monolithic Boost libraries and *works on Windows*!

What can you do with it? One thing you can do is take a vector of domain names and turn them into IP addresses:

library(iptools)
 
hostname_to_ip(c("rud.is", "dds.ec", "ironholds.org", "google.com"))
 
## [[1]]
## [1] "104.236.112.222"
## 
## [[2]]
## [1] "162.243.111.4"
## 
## [[3]]
## [1] "104.131.2.226"
## 
## [[4]]
##  [1] "2607:f8b0:400b:80a::100e" "74.125.226.101"           "74.125.226.102"          
##  [4] "74.125.226.100"           "74.125.226.96"            "74.125.226.104"          
##  [7] "74.125.226.99"            "74.125.226.103"           "74.125.226.105"          
## [10] "74.125.226.98"            "74.125.226.97"            "74.125.226.110"

That means you can pump a bunch of domain names from logs into `iptools` and get current IP address allocations out for them.

You can also do the reverse:

library(magrittr)
library(purrr)
library(iptools)
 
hostname_to_ip(c("rud.is", "dds.ec", "ironholds.org", "google.com")) %>% 
  flatten_chr() %>% 
  ip_to_hostname() %>% 
  flatten_chr()
 
##  [1] "104.236.112.222"           "dds.ec"                    "104.131.2.226"            
##  [4] "yyz08s13-in-x0e.1e100.net" "yyz08s13-in-f5.1e100.net"  "yyz08s13-in-f6.1e100.net" 
##  [7] "yyz08s13-in-f4.1e100.net"  "yyz08s13-in-f0.1e100.net"  "yyz08s13-in-f8.1e100.net" 
## [10] "yyz08s13-in-f3.1e100.net"  "yyz08s13-in-f7.1e100.net"  "yyz08s13-in-f9.1e100.net" 
## [13] "yyz08s13-in-f2.1e100.net"  "yyz08s13-in-f1.1e100.net"  "yyz08s13-in-f14.1e100.net"

Notice that it handled IPv6 addresses and also cases where no reverse mapping existed for an IP address.

You can convert IPv4 addresses to and from long integer format (the 4 octet version of IPv4 addresses is primarily to make them easier for humans to grok), generate random IP addresses for testing, test IP addresses for validity and type and also reference data sets with registered assignments (so you can see allocated IP groups). Plus, it includes `xff_extract()` which can help identify an actual IP address (helpful when connections come from behind proxies).

We can’t thank Dirk enough for cranking out `AsioHeaders` since it means there will be many more network/”cyber” packages coming for R and available on every platform.

You can find `iptools` version `0.3.0` [on CRAN](https://cran.r-project.org/web/packages/iptools/) now (it may take your mirror a bit to catch up), grab the source [release](https://github.com/hrbrmstr/iptools/releases/tag/v0.3.0) on GitHub or check out the [repo](https://github.com/hrbrmstr/iptools/), poke around, submit issues and/or contribute!

Isn’t it great when an R package can help you with resolutions in the new year?

Gone are the days when one had a single computer plugged directly into a modem (cable, DSL or good ol’ Hayes). Even the days when there were just one or two computers connected via wires or invisible multi-gigahertz waves passing through the air are in the long gone by. Today (as you’ll see in the February 2016 [OUCH! newsletter](http://securingthehuman.sans.org/resources/newsletters/ouch/2016)), there are scads of devices of all kinds on your home network. How can you keep track of them all?

Some router & wireless access point vendors provide tools on their device “admin” pages to see what’s connected, but they are inconsistent at best (and usually pretty ugly & cumbersome to navigate to). Thankfully, app purveyors have jumped in to fill the gap. Here’s a list of free or “freemium” (basic features for free, advanced features cost extra) tools for mobile devices and Windows or OS X (if you’re running Linux at home, I’m assuming you’re familiar with the tools available for Linux).

### iOS

– Fing –
– iNet – Network Scanner –
– Network Analyzer Lite – wifi scanner, ping & net info –

### Android

– Fing – (google); (amazon)
– Pamn (google); (direct source)

### Windows

– Advanced IP Scanner
– Angry IP Scanner –
– Fing –
– MiTec Network Scanner –
– nmap –

### OS X

– Angry IP Scanner –
– Fing –
– IP Scanner –
– LanScan –
– nmap –

Some of these tools are easier to work with than others, but they all install pretty easily (though “Fing” and “nmap” work at the command-line on Windows & OS X, so if you’re not a “power user”, you may want to use other tools on those platforms). In most cases, it’s up to you to keep a copy of the output and perform your own “diffs”. One “pro” option for tools like “Fing” is the ability to have the tool store scan results “in the cloud” and perform this comparison for you.

Drop a note in the comments if you have other suggestions, but _vendors be warned_: I’ll be moderating all comments to help ensure no evil links or blatant product shilling makes it to reader eyeballs.

image

Apple made the @justgetflux folks remove their [iOS sideloaded app](https://justgetflux.com/sideload/) due to the use of private APIs (which are a violation of the Apple Developer agreement). The ZIP archive has been pulled from their site (and it really has, too).

This “sideloading”—i.e. installing directly to your device after compiling it from source—_was_ an interesting way to distribute the app. I visually scanned the source code before sideloading (I code for iOS in both Objective C and Swift) and there seemed to be nothing nefarious in it and it works _really_ well. HOWEVER, I’m 100% sure an Xcode project ZIP archive of f.lux is going to hit the torrent and file-sharing sites pretty quickly. But, I’m also 99% sure that folks who want to do Really Bad Things™ to and with your iOS devices will gladly add some code to that Xcode project and most folks won’t take the time (or do not have the knowledge/experience) to validate the veracity of the code before using it.

So, first I implore iOS users to _not_ grab this sideloaded project from torrent or file-sharing sites since you will be putting your devices at risk if you do so.

Since some folks (but not very many, I suspect, since it does involve real work) will no doubt leave my warning unheeded, *please* run:

shasum -a 256 f.lux-xcode-master.zip

from an Terminal or iTerm2 prompt. If you don’t get:

38f463ee5780a4f2b0160f9fa21dbe3c78c5d80d3c093e4ff553aaca230e2898

as a result *DO NOT SIDE LOAD THE APP*. It means the good/safe f.lux source code/project has been modified by someone since November 11th, 2015 and that you are putting your device (i.e. the confidentiality, integrity and availability of your private data) at risk by installing it. You can also post it to [this site](http://hash.online-convert.com/sha256-generator) (I verified that it produces the correct result).

And, please, do not jailbreak your devices. You take a relatively safe operating system and pretty much turn it into an Android (check out the [2015 DBIR](http://verizonenterprise.com/DBIR) for more info on iOS vs Android security based on real data vs hype) .

Cybersecurity is a domain that really likes surveys, or at the very least it has many folks within it that like to conduct and report on surveys. One recent survey on threat intelligence is in it’s second year, so it sets about comparing answers across years. Rather than go into the many technical/statistical issues with this survey, I’d like to focus on alternate ways to visualize the comparison across years.

We’ll use the data that makes up this chart (Figure 3 from the report):

surveybars

since it’s pretty representative of the remainder of the figures.

Let’s start by reproducing this figure with ggplot2:

library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(scales)
library(ggthemes)
library(extrafont)

loadfonts(quiet=TRUE)

read.csv("question.csv", stringsAsFactors=FALSE) %>%
  gather(year, value, -belief) %>%
  mutate(year=factor(sub("y", "", year)),
         belief=str_wrap(belief, 40)) -> question

beliefs <- unique(question$belief)
question$belief <- factor(beliefs, levels=rev(beliefs[c(1,2,4,5,3,7,6)]))

gg <- ggplot(question, aes(belief, value, group=year))
gg <- gg + geom_bar(aes(fill=year), stat="identity", position="dodge",
                    color="white", width=0.85)
gg <- gg + geom_text(aes(label=percent(value)), hjust=-0.15,
                     position=position_dodge(width=0.8), size=3)
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), label=percent, limits=c(0,0.8))
gg <- gg + scale_fill_tableau(name="")
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL, title="Fig 3: Reasons for fully participating\n")
gg <- gg + theme_tufte(base_family="Arial Narrow")
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(plot.title=element_text(hjust=0))
gg

Now, the survey does caveat the findings and talks about non-response bias, sampling-frame bias and self-reporting bias. However, nowhere does it talk about the margin of error or anything relating to uncertainty. Thankfully, both the 2014 and 2015 reports communicate population and sample sizes, so we can figure out the margin of error:

library(samplesize4surveys)

moe_2014 <- e4p(19915, 701, 0.5)
## With the parameters of this function: N = 19915 n =  701 P = 0.5 DEFF =  1 conf = 0.95 . 
## The estimated coefficient of variation is  3.709879 . 
## The margin of error is 3.635614 . 
## 

moe_2015 <- e4p(18705, 692, 0.5)
## With the parameters of this function: N = 18705 n =  692 P = 0.5 DEFF =  1 conf = 0.95 . 
## The estimated coefficient of variation is  3.730449 . 
## The margin of error is 3.655773 .

They are both roughly 3.65% so let's take a look at our dodged bar chart again with this new information:

mutate(question, ymin=value-0.0365, ymax=value+0.0365) -> question

gg <- ggplot(question, aes(belief, value, group=year))
gg <- gg + geom_bar(aes(fill=year), stat="identity",
                    position=position_dodge(0.85),
                    color="white", width=0.85)
gg <- gg + geom_linerange(aes(ymin=ymin, ymax=ymax),
                         position=position_dodge(0.85),
                         size=1.5, color="#bdbdbd")
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), label=percent, limits=c(0,0.85))
gg <- gg + scale_fill_tableau(name="")
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL, title="Fig 3: Reasons for fully participating\n")
gg <- gg + theme_tufte(base_family="Arial Narrow")
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(plot.title=element_text(hjust=0))
gg

Hrm. There seems to be a bit of overlap. Let's just focus on that:

gg <- ggplot(question, aes(belief, value, group=year))
gg <- gg + geom_pointrange(aes(ymin=ymin, ymax=ymax),
                         position=position_dodge(0.25),
                         size=1, color="#bdbdbd", fatten=1)
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), label=percent, limits=c(0,1))
gg <- gg + scale_fill_tableau(name="")
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL, title="Fig 3: Reasons for fully participating\n")
gg <- gg + theme_tufte(base_family="Arial Narrow")
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(plot.title=element_text(hjust=0))
gg

The report actually makes hard claims based on the year-over-year change in the answers to many of the questions (not just this chart). Most have these overlapping intervals. Now, I understand that when a paying customer says they want a report that they wouldn't really be satisfied with a one-pager saying "See last years's report", but not communicating the uncertainty in these results seems like a significant omission.

But, I digress. There are better (or at least alternate) ways than bars to show this comparison. One is a "dumbbell chart".

question %>%
  group_by(belief) %>%
  mutate(line_col=ifelse(diff(value)<0, "2015", "2014"),
         hjust=ifelse(diff(value)<0, -0.5, 1.5)) %>%
  ungroup() -> question

gg <- ggplot(question)
gg <- gg + geom_path(aes(x=value, y=belief, group=belief, color=line_col))
gg <- gg + geom_point(aes(x=value, y=belief, color=year))
gg <- gg + geom_text(data=filter(question, year=="2015"),
                     aes(x=value, y=belief, label=percent(value),
                         hjust=hjust), size=2.5)
gg <- gg + scale_x_continuous(expand=c(0,0), limits=c(0,0.8))
gg <- gg + scale_color_tableau(name="")
gg <- gg + labs(x=NULL, y=NULL, title="Fig 3: Reasons for fully participating\n")
gg <- gg + theme_tufte(base_family="Arial Narrow")
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(plot.title=element_text(hjust=0))
gg

I've used line color to indicate whether the 2015 value increased or decreased from 2014.

But, we still have the issue of communicating the margin of error. One way I came up with (which is not perfect) is to superimpose the dot-plot on top of the entire margin of error interval. While it doesn't show the discrete start/end margin for each year it does help to show that making definitive statements on the value comparisons is not exactly a good idea:

group_by(question, belief) %>%
  summarize(xmin=min(ymin), xmax=max(ymax)) -> band

gg <- ggplot(question)
gg <- gg + geom_segment(data=band,
                        aes(x=xmin, xend=xmax, y=belief, yend=belief),
                        color="#bdbdbd", alpha=0.5, size=3)
gg <- gg + geom_path(aes(x=value, y=belief, group=belief, color=line_col),
                     show.legend=FALSE)
gg <- gg + geom_point(aes(x=value, y=belief, color=year))
gg <- gg + geom_text(data=filter(question, year=="2015"),
                     aes(x=value, y=belief, label=percent(value),
                         hjust=hjust), size=2.5)
gg <- gg + scale_x_continuous(expand=c(0,0), limits=c(0,0.8))
gg <- gg + scale_color_tableau(name="")
gg <- gg + labs(x=NULL, y=NULL, title="Fig 3: Reasons for fully participating\n")
gg <- gg + theme_tufte(base_family="Arial Narrow")
gg <- gg + theme(axis.ticks.x=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(axis.ticks.y=element_blank())
gg <- gg + theme(legend.position="bottom")
gg <- gg + theme(plot.title=element_text(hjust=0))
gg

Finally, the year-to-year nature of the data was just begging for a slopegraph:

question %>% mutate(vjust=0.5) -> question
question[(question$belief=="Makes threat data more actionable") &
           (question$year=="2015"),]$vjust <- -1
question[(question$belief=="Reduces the cost of detecting and\npreventing cyber attacks") &
           (question$year=="2015"),]$vjust <- 1.5

question$year <- factor(question$year, levels=c("2013", "2014", "2015", "2016", "2017", "2018"))

gg <- ggplot(question)
gg <- gg + geom_path(aes(x=year, y=value, group=belief, color=line_col))
gg <- gg + geom_point(aes(x=year, y=value), shape=21, fill="black", color="white")
gg <- gg + geom_text(data=filter(question, year=="2015"),
                     aes(x=year, y=value,
                         label=sprintf("\u2000%s %s", percent(value),
                                       gsub("\n", " ", belief)),
                         vjust=vjust), hjust=0, size=3)
gg <- gg + geom_text(data=filter(question, year=="2014"),
                     aes(x=year, y=value, label=percent(value)),
                     hjust=1.3, size=3)
gg <- gg + scale_x_discrete(expand=c(0,0.1), drop=FALSE)
gg <- gg + scale_color_tableau(name="")
gg <- gg + labs(x=NULL, y=NULL, title="Fig 3: Reasons for fully participating\n")
gg <- gg + theme_tufte(base_family="Arial Narrow")
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text=element_blank())
gg <- gg + theme(legend.position="none")
gg <- gg + theme(plot.title=element_text(hjust=0.5))
gg <- gg + theme(plot.title=element_text(hjust=0))
gg

It doesn't help communicate uncertainty but it's a nice alternative to bars.

Hopefully this helps provide some alternatives to bars for these types of comparisons and also ways to communicate uncertainty without confusing the reader (communicating uncertainty to a broad audience is hard).

Perhaps those conducting surveys (or data analyses in general) could subscribe to a "data visualizers" paraphrase of a quote from Epidemics, Book I, of the Hippocratic school:

"Practice two things in your dealings with data: either help or do not harm the reader."

The full Rmd and data for this post is in this gist.

The basic technique of cybercrime statistics—measuring the incidence of a given phenomenon (DDoS, trojan, APT) as a percentage of overall population size—had entered the mainstream of cybersecurity thought only in the previous decade. Cybersecurity as a science was still in its infancy, as many of its basic principles had yet to be established.

At the same time, the scientific method rarely intersected with the development and testing of new detection & prevention regimens. When you read through that endless stream of quack cybercures published daily on the Internet and at conferences like RSA, what strikes you most is not that they are all, almost without exception, based on anecdotal or woefully inadequately small evidence. What’s striking is that they never apologize for the shortcoming. They never pause to say, “Of course, this is all based on anecdotal evidence, but hear me out.” There’s no shame in these claims, no awareness of the imperfection of the methods, precisely because it seems to eminently reasonable that the local observation of a handful of minuscule cases might serve the silver bullet for cybercrime, if you look hard enough.


But, cybercrime couldn’t be studied in isolation. It was as much a product of the internet expansion as news and social media, where it was so uselessly anatomized. To understand the beast, you needed to think on the scale of the enterprise, from the hacker’s-eye view. You needed to look at the problem from the perspective of Henry Mayhew’s balloon. And you needed a way to persuade others to join you there.

Sadly, that’s not a modern story. It’s an adapted quote from chapter 4 (pp. 97-98, paperback) of The Ghost Map, by Steven Johnson, a book on the cholera epidemic of 1854.

I won’t ruin the book nor continue my attempt at analogy any further. Suffice it to say, you should read the book—if you haven’t already—and join me in calling out for the need for the John Snow of our cyber-time to arrive.

Far too many interesting bits to spam on Twitter individually but each is worth getting the word out on:

tumblr_m0wabdueuR1qd78lno1_1280

– It’s [π Day](https://www.google.com/search?q=pi+day)*
– Unless you’re living in a hole, you probably know that [Google Reader is on a death march](http://www.bbc.co.uk/news/technology-21785378). I’m really liking self-hosting [Tiny Tiny RSS](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDEQFjAA&url=https%3A%2F%2Fgithub.com%2Fgothfox%2FTiny-Tiny-RSS&ei=YtlBUfOLJvLe4AOHtoDIAQ&usg=AFQjCNGwtEr8slx-i0vNzhQi4b4evRVXFA&bvm=bv.43287494,d.dmg) so far, and will follow up with a standalone blog post on it.
– @jayjacobs & I are speaking soon at both [SOURCE Boston](http://www.sourceconference.com/boston/) & [Secure360](http://secure360.org/). We hope to have time outside the talk to discuss security visualization & analysis with you, so go on and register!
– Speaking of datavis, [VizSec 2013](http://www.vizsec.org/) is on October 14th in Atlanta, GA this year and the aligned [VAST 2013 Challenge](http://vacommunity.org/VAST+Challenge+2013) (via @ieeevisweek) will have another infosec-themed mini-challenge. Watch that space, grab the data and start analyzing & visualizing!
– If you’re in or around Boston, it’d be great to meet you at @openvisconf on May 16-17.
– Majestic SEO has released a new data set for folks to play with: [a list of the top 100,000 websites broken down by the country in which they are hosted](http://blog.majesticseo.com/research/top-websites-by-country/). It requires a free reg to get the link, but no spam so far. (h/t @teamcymru)
– And, finally, @MarketplaceAPM aired a [good, accessible story](http://www.marketplace.org/topics/tech/who-pays-bill-cyber-war) on “Who pays the bill for a cyber war?”. I always encourage infosec industry folk to listen to shows like Marketplace to see how we sound to non-security folk and to glean ways we can communicate better with the people we are ultimately trying to help.

*Image via davincismurf