
Shortly after I added lollipop charts to ggalt I had a few requests for a dumbbell geom. It wasn’t difficult to modify the underlying lollipop Geoms to make a geom_dumbbell(). Here it is in action:

library(ggplot2)
library(ggalt) # devtools::install_github("hrbrmstr/ggalt")
library(dplyr)
library(scales) # for comma() in the x-axis labels below

# from: https://plot.ly/r/dumbbell-plots/
URL <- "https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)

df <- read.csv(fil, stringsAsFactors=FALSE)
df <- arrange(df, desc(Men))
df <- mutate(df, School=factor(School, levels=rev(School)))

gg <- ggplot(df, aes(x=Women, xend=Men, y=School))
gg <- gg + geom_dumbbell(colour="#686868",
                         point.colour.l="#ffc0cb",
                         point.colour.r="#0000ff",
                         point.size.l=2.5,
                         point.size.r=2.5)
gg <- gg + scale_x_continuous(breaks=seq(60, 160, by=20),
                              labels=sprintf("$%sK", comma(seq(60, 160, by=20))))
gg <- gg + labs(x="Annual Salary", y=NULL,
                title="Gender Earnings Disparity",
                caption="Data from plotly")
gg <- gg + theme_bw()
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.title.x=element_text(hjust=1, face="italic", margin=margin(t=-24)))
gg <- gg + theme(plot.caption=element_text(size=8, margin=margin(t=24)))
gg

(Figure: the resulting dumbbell chart of the gender earnings disparity data.)

The API isn't locked in, so definitely file an issue if you want different or additional functionality. One issue I personally still have is how to identify the left/right points (blue is male and pink is female in this one).

### Working Out With Dumbbells

I thought folks might like to see behind the ggcurtain. It really only took the addition of two components to ggalt: geom_dumbbell() (which you call directly) and GeomDumbbell, which does the work behind the scenes.

There are a few additional, custom parameters to geom_dumbbell(), and the mapped stat and position are hardcoded in the layer() call. We also pass these new parameters into the params list.

geom_dumbbell <- function(mapping = NULL, data = NULL, ...,
                          point.colour.l = NULL, point.size.l = NULL,
                          point.colour.r = NULL, point.size.r = NULL,
                          na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {

  layer(
    data = data,
    mapping = mapping,
    stat = "identity",
    geom = GeomDumbbell,
    position = "identity",
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      point.colour.l = point.colour.l,
      point.size.l = point.size.l,
      point.colour.r = point.colour.r,
      point.size.r = point.size.r,
      ...
    )
  )
}

The exposed function eventually calls its paired Geom. There we get to tell it which aes parameters are required and which aren't, plus set some defaults.

We automagically add yend to the data in setup_data() (which gets called by the ggplot2 API).

Then, in draw_group() we create additional data.frames and return a list of three Geom layers (two points and one segment). Finally, we provide a default legend symbol.

GeomDumbbell <- ggproto("GeomDumbbell", Geom,
  required_aes = c("x", "xend", "y"),
  non_missing_aes = c("size", "shape",
                      "point.colour.l", "point.size.l",
                      "point.colour.r", "point.size.r"),
  default_aes = aes(
    shape = 19, colour = "black", size = 0.5, fill = NA,
    alpha = NA, stroke = 0.5
  ),

  setup_data = function(data, params) {
    transform(data, yend = y)
  },

  draw_group = function(data, panel_scales, coord,
                        point.colour.l = NULL, point.size.l = NULL,
                        point.colour.r = NULL, point.size.r = NULL) {

    points.l <- data
    points.l$colour <- point.colour.l %||% data$colour
    points.l$size <- point.size.l %||% (data$size * 2.5)

    points.r <- data
    points.r$x <- points.r$xend
    points.r$colour <- point.colour.r %||% data$colour
    points.r$size <- point.size.r %||% (data$size * 2.5)

    gList(
      ggplot2::GeomSegment$draw_panel(data, panel_scales, coord),
      ggplot2::GeomPoint$draw_panel(points.l, panel_scales, coord),
      ggplot2::GeomPoint$draw_panel(points.r, panel_scales, coord)
    )

  },

  draw_key = draw_key_point
)
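
The `%||%` operator used in draw_group() above is the "null default" helper: use the left-hand value unless it is NULL, in which case fall back to the right-hand value. It isn't exported from ggplot2, so if you're experimenting outside a package you'll want your own copy; a minimal sketch:

# "null default": return `a` unless it is NULL, otherwise `b`
`%||%` <- function(a, b) if (is.null(a)) b else a

point.size.l <- NULL
point.size.l %||% 2.5  # 2.5 (falls back to the default)
3 %||% 2.5             # 3 (a supplied value wins)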

In essence, this new geom saves calls to three separate geoms (two points and a segment), but it does add more parameters, so it's not entirely clear how much typing it actually saves.

If you end up making anything interesting with geom_dumbbell() I encourage you to drop a note in the comments with a link.

Google recently [announced](https://developers.google.com/speed/public-dns/docs/dns-over-https) their DNS-over-HTTPS API, which _”enhances privacy and security between a client and a recursive resolver, and complements DNSSEC to provide end-to-end authenticated DNS lookups”_. The REST API they provided was pretty simple to [wrap into a package](https://github.com/hrbrmstr/gdns) and I tossed in some [SPF](http://www.openspf.org/SPF_Record_Syntax) functions that I had lying around to bulk it up a bit.

### Why DNS-over-HTTPS?

DNS machinations usually happen over UDP (and sometimes TCP). Unless you’re using some fairly modern DNS augmentations, these exchanges happen in cleartext, meaning your query and the response are exposed during transport (and they are already exposed to the server you’re querying for a response).

DNS queries over HTTPS will be harder to [spoof](http://www.veracode.com/security/spoofing-attack) and the query + response will be encrypted, so you gain transport privacy when, say, you’re at Starbucks or on your DSL, FiOS, Gogo Inflight, or cable internet connection (yes, those providers all snoop on your DNS queries).

You end up trusting Google quite a bit with this API, but if you were currently using `8.8.8.8` or `8.8.4.4` (or their IPv6 equivalents) you were already trusting Google (and it’s likely Google knows what you’re doing on the internet anyway given all the trackers and especially if you’re using Chrome).

One additional item you gain using this API is more control over [`EDNS0`](https://tools.ietf.org/html/draft-vandergaast-edns-client-ip-00) settings. `EDNS0` is a DNS protocol extension that, for example, enables content delivery networks to pick the “closest” server farm to ensure speedy delivery of your streaming Game of Thrones binge watch. They get to know a piece of your IP address so they can make this decision, but you end up giving away a bit of privacy (though you lose that privacy in the end anyway, since the target CDN servers know precisely where you are).

Right now, there’s no way for most clients to use DNS-over-HTTPS directly, but the API can be used in a programmatic fashion, which may be helpful in situations where you need to do some DNS spelunking but UDP is blocked or you’re on a platform that can’t build the [`resolv`](https://github.com/hrbrmstr/resolv) package.
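
If you'd like to see what `gdns` is wrapping, here's a minimal sketch of hitting the JSON endpoint directly with `httr` (the `dns.google.com/resolve` URL and parameter names come from Google's announcement docs; the package handles all of this for you):

library(httr)
library(jsonlite)

# ask Google's resolver for the A record of rud.is, masking the client subnet
res <- GET("https://dns.google.com/resolve",
           query=list(name="rud.is",
                      type="A",
                      edns_client_subnet="0.0.0.0/0"))

str(fromJSON(content(res, as="text", encoding="UTF-8")))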

You can learn a bit more about DNS and privacy in this [IETF paper](https://www.ietf.org/mail-archive/web/dns-privacy/current/pdfWqAIUmEl47.pdf) [PDF].

### Mining DNS with `gdns`

The `gdns` package is pretty straightforward. Use the `query()` function to get DNS info for a single entity:

library(gdns)
 
query("apple.com")
## $Status  
## [1] 0          # NOERROR - Standard DNS response code (32 bit integer)
## 
## $TC
## [1] FALSE      # Whether the response is truncated
## 
## $RD
## [1] TRUE       # Should always be true for Google Public DNS
## 
## $RA
## [1] TRUE       # Should always be true for Google Public DNS
## 
## $AD
## [1] FALSE      # Whether all response data was validated with DNSSEC
## 
## $CD
## [1] FALSE      # Whether the client asked to disable DNSSEC
## 
## $Question
##         name type
## 1 apple.com.    1
## 
## $Answer
##         name type  TTL          data
## 1 apple.com.    1 1547 17.172.224.47
## 2 apple.com.    1 1547  17.178.96.59
## 3 apple.com.    1 1547 17.142.160.59
## 
## $Additional
## list()
## 
## $edns_client_subnet
## [1] "0.0.0.0/0"

The `gdns` lookup functions are set to use an `edns_client_subnet` of `0.0.0.0/0`, meaning your local IP address or subnet is not leaked outside of your connection to Google (you can override this behavior).
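
If you do want the CDN-friendly behavior back, you can pass a subnet hint yourself; a sketch, assuming the `edns_client_subnet` argument exposed by `query()` (check `?query` in your version of `gdns`):

# hypothetical subnet; substitute (part of) your own network prefix
query("apple.com", "A", edns_client_subnet="192.0.2.0/24")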

You can do reverse lookups as well (i.e. query IP addresses):

query("17.172.224.47", "PTR")
## $Status
## [1] 0
## 
## $TC
## [1] FALSE
## 
## $RD
## [1] TRUE
## 
## $RA
## [1] TRUE
## 
## $AD
## [1] FALSE
## 
## $CD
## [1] FALSE
## 
## $Question
##                          name type
## 1 47.224.172.17.in-addr.arpa.   12
## 
## $Answer
##                            name type  TTL                           data
## 1   47.224.172.17.in-addr.arpa.   12 1073               webobjects.info.
## 2   47.224.172.17.in-addr.arpa.   12 1073                   yessql.info.
## 3   47.224.172.17.in-addr.arpa.   12 1073                 apples-msk.ru.
## 4   47.224.172.17.in-addr.arpa.   12 1073                     icloud.se.
## 5   47.224.172.17.in-addr.arpa.   12 1073                     icloud.es.
## 6   47.224.172.17.in-addr.arpa.   12 1073                     icloud.om.
## 7   47.224.172.17.in-addr.arpa.   12 1073                   icloudo.com.
## 8   47.224.172.17.in-addr.arpa.   12 1073                     icloud.ch.
## 9   47.224.172.17.in-addr.arpa.   12 1073                     icloud.fr.
## 10  47.224.172.17.in-addr.arpa.   12 1073                   icloude.com.
## 11  47.224.172.17.in-addr.arpa.   12 1073          camelspaceeffect.com.
## 12  47.224.172.17.in-addr.arpa.   12 1073                 camelphat.com.
## 13  47.224.172.17.in-addr.arpa.   12 1073              alchemysynth.com.
## 14  47.224.172.17.in-addr.arpa.   12 1073                    openni.org.
## 15  47.224.172.17.in-addr.arpa.   12 1073                      swell.am.
## 16  47.224.172.17.in-addr.arpa.   12 1073                  appleweb.net.
## 17  47.224.172.17.in-addr.arpa.   12 1073       appleipodsettlement.com.
## 18  47.224.172.17.in-addr.arpa.   12 1073                    earpod.net.
## 19  47.224.172.17.in-addr.arpa.   12 1073                 yourapple.com.
## 20  47.224.172.17.in-addr.arpa.   12 1073                    xserve.net.
## 21  47.224.172.17.in-addr.arpa.   12 1073                    xserve.com.
## 22  47.224.172.17.in-addr.arpa.   12 1073            velocityengine.com.
## 23  47.224.172.17.in-addr.arpa.   12 1073           velocity-engine.com.
## 24  47.224.172.17.in-addr.arpa.   12 1073            universityarts.com.
## 25  47.224.172.17.in-addr.arpa.   12 1073            thinkdifferent.com.
## 26  47.224.172.17.in-addr.arpa.   12 1073               theatremode.com.
## 27  47.224.172.17.in-addr.arpa.   12 1073               theatermode.com.
## 28  47.224.172.17.in-addr.arpa.   12 1073           streamquicktime.net.
## 29  47.224.172.17.in-addr.arpa.   12 1073           streamquicktime.com.
## 30  47.224.172.17.in-addr.arpa.   12 1073                ripmixburn.com.
## 31  47.224.172.17.in-addr.arpa.   12 1073              rip-mix-burn.com.
## 32  47.224.172.17.in-addr.arpa.   12 1073        quicktimestreaming.net.
## 33  47.224.172.17.in-addr.arpa.   12 1073        quicktimestreaming.com.
## 34  47.224.172.17.in-addr.arpa.   12 1073                  quicktime.cc.
## 35  47.224.172.17.in-addr.arpa.   12 1073                      qttv.net.
## 36  47.224.172.17.in-addr.arpa.   12 1073                      qtml.com.
## 37  47.224.172.17.in-addr.arpa.   12 1073                     qt-tv.net.
## 38  47.224.172.17.in-addr.arpa.   12 1073          publishingsurvey.org.
## 39  47.224.172.17.in-addr.arpa.   12 1073          publishingsurvey.com.
## 40  47.224.172.17.in-addr.arpa.   12 1073        publishingresearch.org.
## 41  47.224.172.17.in-addr.arpa.   12 1073        publishingresearch.com.
## 42  47.224.172.17.in-addr.arpa.   12 1073         publishing-survey.org.
## 43  47.224.172.17.in-addr.arpa.   12 1073         publishing-survey.com.
## 44  47.224.172.17.in-addr.arpa.   12 1073       publishing-research.org.
## 45  47.224.172.17.in-addr.arpa.   12 1073       publishing-research.com.
## 46  47.224.172.17.in-addr.arpa.   12 1073                  powerbook.cc.
## 47  47.224.172.17.in-addr.arpa.   12 1073             playquicktime.net.
## 48  47.224.172.17.in-addr.arpa.   12 1073             playquicktime.com.
## 49  47.224.172.17.in-addr.arpa.   12 1073           nwk-apple.apple.com.
## 50  47.224.172.17.in-addr.arpa.   12 1073                   myapple.net.
## 51  47.224.172.17.in-addr.arpa.   12 1073                  macreach.net.
## 52  47.224.172.17.in-addr.arpa.   12 1073                  macreach.com.
## 53  47.224.172.17.in-addr.arpa.   12 1073                   macmate.com.
## 54  47.224.172.17.in-addr.arpa.   12 1073         macintoshsoftware.com.
## 55  47.224.172.17.in-addr.arpa.   12 1073                    machos.net.
## 56  47.224.172.17.in-addr.arpa.   12 1073                   mach-os.net.
## 57  47.224.172.17.in-addr.arpa.   12 1073                   mach-os.com.
## 58  47.224.172.17.in-addr.arpa.   12 1073                   ischool.com.
## 59  47.224.172.17.in-addr.arpa.   12 1073           insidemacintosh.com.
## 60  47.224.172.17.in-addr.arpa.   12 1073             imovietheater.com.
## 61  47.224.172.17.in-addr.arpa.   12 1073               imoviestage.com.
## 62  47.224.172.17.in-addr.arpa.   12 1073             imoviegallery.com.
## 63  47.224.172.17.in-addr.arpa.   12 1073               imacsources.com.
## 64  47.224.172.17.in-addr.arpa.   12 1073        imac-applecomputer.com.
## 65  47.224.172.17.in-addr.arpa.   12 1073                imac-apple.com.
## 66  47.224.172.17.in-addr.arpa.   12 1073                     ikids.com.
## 67  47.224.172.17.in-addr.arpa.   12 1073              ibookpartner.com.
## 68  47.224.172.17.in-addr.arpa.   12 1073                   geoport.com.
## 69  47.224.172.17.in-addr.arpa.   12 1073                   firewire.cl.
## 70  47.224.172.17.in-addr.arpa.   12 1073               expertapple.com.
## 71  47.224.172.17.in-addr.arpa.   12 1073              edu-research.org.
## 72  47.224.172.17.in-addr.arpa.   12 1073               dvdstudiopro.us.
## 73  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.org.
## 74  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.net.
## 75  47.224.172.17.in-addr.arpa.   12 1073             dvdstudiopro.info.
## 76  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.com.
## 77  47.224.172.17.in-addr.arpa.   12 1073              dvdstudiopro.biz.
## 78  47.224.172.17.in-addr.arpa.   12 1073          developercentral.com.
## 79  47.224.172.17.in-addr.arpa.   12 1073             desktopmovies.org.
## 80  47.224.172.17.in-addr.arpa.   12 1073             desktopmovies.net.
## 81  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.org.
## 82  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.net.
## 83  47.224.172.17.in-addr.arpa.   12 1073              desktopmovie.com.
## 84  47.224.172.17.in-addr.arpa.   12 1073          darwinsourcecode.com.
## 85  47.224.172.17.in-addr.arpa.   12 1073              darwinsource.org.
## 86  47.224.172.17.in-addr.arpa.   12 1073              darwinsource.com.
## 87  47.224.172.17.in-addr.arpa.   12 1073                darwincode.com.
## 88  47.224.172.17.in-addr.arpa.   12 1073                carbontest.com.
## 89  47.224.172.17.in-addr.arpa.   12 1073              carbondating.com.
## 90  47.224.172.17.in-addr.arpa.   12 1073                 carbonapi.com.
## 91  47.224.172.17.in-addr.arpa.   12 1073           braeburncapital.com.
## 92  47.224.172.17.in-addr.arpa.   12 1073                  applexpo.net.
## 93  47.224.172.17.in-addr.arpa.   12 1073                  applexpo.com.
## 94  47.224.172.17.in-addr.arpa.   12 1073                applereach.net.
## 95  47.224.172.17.in-addr.arpa.   12 1073                applereach.com.
## 96  47.224.172.17.in-addr.arpa.   12 1073            appleiservices.com.
## 97  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.org.
## 98  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.net.
## 99  47.224.172.17.in-addr.arpa.   12 1073     applefinalcutproworld.com.
## 100 47.224.172.17.in-addr.arpa.   12 1073            applefilmmaker.com.
## 101 47.224.172.17.in-addr.arpa.   12 1073             applefilmaker.com.
## 102 47.224.172.17.in-addr.arpa.   12 1073                appleenews.com.
## 103 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.org.
## 104 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.net.
## 105 47.224.172.17.in-addr.arpa.   12 1073               appledarwin.com.
## 106 47.224.172.17.in-addr.arpa.   12 1073         applecomputerimac.com.
## 107 47.224.172.17.in-addr.arpa.   12 1073        applecomputer-imac.com.
## 108 47.224.172.17.in-addr.arpa.   12 1073                  applecare.cc.
## 109 47.224.172.17.in-addr.arpa.   12 1073               applecarbon.com.
## 110 47.224.172.17.in-addr.arpa.   12 1073                 apple-inc.net.
## 111 47.224.172.17.in-addr.arpa.   12 1073               apple-enews.com.
## 112 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.org.
## 113 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.net.
## 114 47.224.172.17.in-addr.arpa.   12 1073              apple-darwin.com.
## 115 47.224.172.17.in-addr.arpa.   12 1073                  mobileme.com.
## 116 47.224.172.17.in-addr.arpa.   12 1073                ipa-iphone.net.
## 117 47.224.172.17.in-addr.arpa.   12 1073               jetfuelapps.com.
## 118 47.224.172.17.in-addr.arpa.   12 1073                jetfuelapp.com.
## 119 47.224.172.17.in-addr.arpa.   12 1073                   burstly.net.
## 120 47.224.172.17.in-addr.arpa.   12 1073             appmediagroup.com.
## 121 47.224.172.17.in-addr.arpa.   12 1073             airsupportapp.com.
## 122 47.224.172.17.in-addr.arpa.   12 1073            burstlyrewards.com.
## 123 47.224.172.17.in-addr.arpa.   12 1073        surveys-temp.apple.com.
## 124 47.224.172.17.in-addr.arpa.   12 1073               appleiphone.com.
## 125 47.224.172.17.in-addr.arpa.   12 1073                       asto.re.
## 126 47.224.172.17.in-addr.arpa.   12 1073                 itunesops.com.
## 127 47.224.172.17.in-addr.arpa.   12 1073                     apple.com.
## 128 47.224.172.17.in-addr.arpa.   12 1073     st11p01ww-apple.apple.com.
## 129 47.224.172.17.in-addr.arpa.   12 1073                      apple.by.
## 130 47.224.172.17.in-addr.arpa.   12 1073                 airtunes.info.
## 131 47.224.172.17.in-addr.arpa.   12 1073              applecentre.info.
## 132 47.224.172.17.in-addr.arpa.   12 1073         applecomputerinc.info.
## 133 47.224.172.17.in-addr.arpa.   12 1073                appleexpo.info.
## 134 47.224.172.17.in-addr.arpa.   12 1073             applemasters.info.
## 135 47.224.172.17.in-addr.arpa.   12 1073                 applepay.info.
## 136 47.224.172.17.in-addr.arpa.   12 1073 applepaymerchantsupplies.info.
## 137 47.224.172.17.in-addr.arpa.   12 1073         applepaysupplies.info.
## 138 47.224.172.17.in-addr.arpa.   12 1073              applescript.info.
## 139 47.224.172.17.in-addr.arpa.   12 1073               appleshare.info.
## 140 47.224.172.17.in-addr.arpa.   12 1073                   macosx.info.
## 141 47.224.172.17.in-addr.arpa.   12 1073                powerbook.info.
## 142 47.224.172.17.in-addr.arpa.   12 1073                 powermac.info.
## 143 47.224.172.17.in-addr.arpa.   12 1073            quicktimelive.info.
## 144 47.224.172.17.in-addr.arpa.   12 1073              quicktimetv.info.
## 145 47.224.172.17.in-addr.arpa.   12 1073                 sherlock.info.
## 146 47.224.172.17.in-addr.arpa.   12 1073            shopdifferent.info.
## 147 47.224.172.17.in-addr.arpa.   12 1073                 skyvines.info.
## 148 47.224.172.17.in-addr.arpa.   12 1073                     ubnw.info.
## 
## $Additional
## list()
## 
## $edns_client_subnet
## [1] "0.0.0.0/0"

And, you can go “easter egg” hunting:

cat(query("google-public-dns-a.google.com", "TXT")$Answer$data)
## "http://xkcd.com/1361/"

Note that Google DNS-over-HTTPS supports [all the RR types](http://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml#dns-parameters-4).

If you have more than a few domains to look up and are querying for the same RR record, you can use the `bulk_query()` function:

hosts <- c("rud.is", "dds.ec", "r-project.org", "rstudio.com", "apple.com")
bulk_query(hosts)
## Source: local data frame [7 x 4]
## 
##             name  type   TTL            data
##            (chr) (int) (int)           (chr)
## 1        rud.is.     1  3599 104.236.112.222
## 2        dds.ec.     1   299   162.243.111.4
## 3 r-project.org.     1  3601   137.208.57.37
## 4   rstudio.com.     1  3599    45.79.156.36
## 5     apple.com.     1  1088   17.172.224.47
## 6     apple.com.     1  1088    17.178.96.59
## 7     apple.com.     1  1088   17.142.160.59

Note that this function only returns a `data_frame` (none of the status fields).

### More DNSpelunking with `gdns`

DNS records contain a treasure trove of data (at least for cybersecurity researchers). Say you have a list of base, primary domains for the Fortune 1000:

library(readr)
library(urltools)
 
URL <- "https://gist.githubusercontent.com/hrbrmstr/ae574201af3de035c684/raw/2d21bb4132b77b38f2992dfaab99649397f238e9/f1000.csv"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
 
f1k <- read_csv(fil)
 
doms1k <- suffix_extract(domain(f1k$website))
doms1k <- paste(doms1k$domain, doms1k$suffix, sep=".")
 
head(doms1k)
## [1] "walmart.com"           "exxonmobil.com"       
## [3] "chevron.com"           "berkshirehathaway.com"
## [5] "apple.com"             "gm.com"

We can get all the `TXT` records for them:

library(parallel)
library(doParallel) # parallel ops will make this go faster
library(foreach)
library(dplyr)
library(ggplot2)
library(grid)
library(hrbrmrkdn)
 
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
 
f1k_l <- foreach(dom=doms1k) %dopar% gdns::bulk_query(dom, "TXT")
f1k <- bind_rows(f1k_l)
 
length(unique(f1k$name))
## [1] 858
 
df <- count(count(f1k, name), `Number of TXT records`=n)
df <- bind_rows(df, data_frame(`Number of TXT records`=0, n=142))
 
gg <- ggplot(df, aes(`Number of TXT records`, n))
gg <- gg + geom_bar(stat="identity", width=0.75)
gg <- gg + scale_x_continuous(expand=c(0,0), breaks=0:13)
gg <- gg + scale_y_continuous(expand=c(0,0))
gg <- gg + labs(y="# Orgs", 
                title="TXT record count per Fortune 1000 Org")
gg <- gg + theme_hrbrmstr(grid="Y", axis="xy")
gg <- gg + theme(axis.title.x=element_text(margin=margin(t=-22)))
gg <- gg + theme(axis.title.y=element_text(angle=0, vjust=1, 
                                           margin=margin(r=-49)))
gg <- gg + theme(plot.margin=margin(t=10, l=30, b=30, r=10))
gg <- gg + theme(plot.title=element_text(margin=margin(b=20)))
gg

(Figure: bar chart of TXT record counts per Fortune 1000 org.)

We can see that 858 of the Fortune 1000 have `TXT` records and more than a few have between 2 and 5 of them. Why look at `TXT` records? Well, they can tell us things like who uses cloud e-mail services, such as Outlook365:

sort(f1k$name[which(grepl("(MS=|outlook)", spf_includes(f1k$data), ignore.case=TRUE))])
##   [1] "21cf.com."                  "77nrg.com."                 "abbott.com."                "acuitybrands.com."         
##   [5] "adm.com."                   "adobe.com."                 "alaskaair.com."             "aleris.com."               
##   [9] "allergan.com."              "altria.com."                "amark.com."                 "ameren.com."               
##  [13] "americantower.com."         "ametek.com."                "amkor.com."                 "amphenol.com."             
##  [17] "amwater.com."               "analog.com."                "anixter.com."               "apachecorp.com."           
##  [21] "archrock.com."              "archrock.com."              "armstrong.com."             "aschulman.com."            
##  [25] "assurant.com."              "autonation.com."            "autozone.com."              "axiall.com."               
##  [29] "bd.com."                    "belk.com."                  "biglots.com."               "bio-rad.com."              
##  [33] "biomet.com."                "bloominbrands.com."         "bms.com."                   "borgwarner.com."           
##  [37] "boydgaming.com."            "brinks.com."                "brocade.com."               "brunswick.com."            
##  [41] "cabotog.com."               "caleres.com."               "campbellsoupcompany.com."   "carefusion.com."           
##  [45] "carlyle.com."               "cartech.com."               "cbrands.com."               "cbre.com."                 
##  [49] "chemtura.com."              "chipotle.com."              "chiquita.com."              "churchdwight.com."         
##  [53] "cinemark.com."              "cintas.com."                "cmc.com."                   "cmsenergy.com."            
##  [57] "cognizant.com."             "colfaxcorp.com."            "columbia.com."              "commscope.com."            
##  [61] "con-way.com."               "convergys.com."             "couche-tard.com."           "crestwoodlp.com."          
##  [65] "crowncastle.com."           "crowncork.com."             "csx.com."                   "cummins.com."              
##  [69] "cunamutual.com."            "dana.com."                  "darlingii.com."             "deanfoods.com."            
##  [73] "dentsplysirona.com."        "discoverfinancial.com."     "disney.com."                "donaldson.com."            
##  [77] "drhorton.com."              "dupont.com."                "dyn-intl.com."              "dynegy.com."               
##  [81] "ea.com."                    "eastman.com."               "ecolab.com."                "edgewell.com."             
##  [85] "edwards.com."               "emc.com."                   "enablemidstream.com."       "energyfutureholdings.com." 
##  [89] "energytransfer.com."        "eogresources.com."          "equinix.com."               "expeditors.com."           
##  [93] "express.com."               "fastenal.com."              "ferrellgas.com."            "fisglobal.com."            
##  [97] "flowserve.com."             "fmglobal.com."              "fnf.com."                   "g-iii.com."                
## [101] "genpt.com."                 "ggp.com."                   "gilead.com."                "goodyear.com."             
## [105] "grainger.com."              "graphicpkg.com."            "hanes.com."                 "hanover.com."              
## [109] "harley-davidson.com."       "harsco.com."                "hasbro.com."                "hbfuller.com."             
## [113] "hei.com."                   "hhgregg.com."               "hnicorp.com."               "homedepot.com."            
## [117] "hpinc.com."                 "hubgroup.com."              "iac.com."                   "igt.com."                  
## [121] "iheartmedia.com."           "insperity.com."             "itt.com."                   "itw.com."                  
## [125] "jarden.com."                "jcpenney.com."              "jll.com."                   "joyglobal.com."            
## [129] "juniper.net."               "kellyservices.com."         "kennametal.com."            "kiewit.com."               
## [133] "kindermorgan.com."          "kindredhealthcare.com."     "kodak.com."                 "lamresearch.com."          
## [137] "lansingtradegroup.com."     "lennar.com."                "levistrauss.com."           "lithia.com."               
## [141] "manitowoc.com."             "manpowergroup.com."         "marathonoil.com."           "marathonpetroleum.com."    
## [145] "mastec.com."                "mastercard.com."            "mattel.com."                "maximintegrated.com."      
## [149] "mednax.com."                "mercuryinsurance.com."      "mgmresorts.com."            "micron.com."               
## [153] "mohawkind.com."             "molsoncoors.com."           "mosaicco.com."              "motorolasolutions.com."    
## [157] "mpgdriven.com."             "mscdirect.com."             "mtb.com."                   "murphyoilcorp.com."        
## [161] "mutualofomaha.com."         "mwv.com."                   "navistar.com."              "nbty.com."                 
## [165] "newellrubbermaid.com."      "nexeosolutions.com."        "nike.com."                  "nobleenergyinc.com."       
## [169] "o-i.com."                   "oge.com."                   "olin.com."                  "omnicomgroup.com."         
## [173] "onsemi.com."                "owens-minor.com."           "paychex.com."               "peabodyenergy.com."        
## [177] "pepboys.com."               "pmi.com."                   "pnkinc.com."                "polaris.com."              
## [181] "polyone.com."               "postholdings.com."          "ppg.com."                   "prudential.com."           
## [185] "qg.com."                    "quantaservices.com."        "quintiles.com."             "rcscapital.com."           
## [189] "rexnord.com."               "roberthalf.com."            "rushenterprises.com."       "ryland.com."               
## [193] "sandisk.com."               "sands.com."                 "scansource.com."            "sempra.com."               
## [197] "sonoco.com."                "spiritaero.com."            "sprouts.com."               "stanleyblackanddecker.com."
## [201] "starwoodhotels.com."        "steelcase.com."             "stryker.com."               "sunedison.com."            
## [205] "sunpower.com."              "supervalu.com."             "swifttrans.com."            "synnex.com."               
## [209] "taylormorrison.com."        "techdata.com."              "tegna.com."                 "tempursealy.com."          
## [213] "tetratech.com."             "theice.com."                "thermofisher.com."          "tjx.com."                  
## [217] "trueblue.com."              "ufpi.com."                  "ulta.com."                  "unfi.com."                 
## [221] "unifiedgrocers.com."        "universalcorp.com."         "vishay.com."                "visteon.com."              
## [225] "vwr.com."                   "westarenergy.com."          "westernunion.com."          "westrock.com."             
## [229] "wfscorp.com."               "whitewave.com."             "wpxenergy.com."             "wyndhamworldwide.com."     
## [233] "xilinx.com."                "xpo.com."                   "yum.com."                   "zimmerbiomet.com."

That’s 236 of them outsourcing some part of e-mail services to Microsoft.

We can also see which ones have terrible mail configs (`+all` or `all` passing):

f1k[which(passes_all(f1k$data)),]$name
## [1] "wfscorp.com."      "dupont.com."       "group1auto.com."   "uhsinc.com."      
## [5] "bigheartpet.com."  "pcconnection.com."

or are configured for Exchange federation services:

sort(f1k$name[which(grepl("==", f1k$data))])
##   [1] "21cf.com."                 "aarons.com."               "abbott.com."               "abbvie.com."              
##   [5] "actavis.com."              "activisionblizzard.com."   "acuitybrands.com."         "adm.com."                 
##   [9] "adobe.com."                "adt.com."                  "advanceautoparts.com."     "aecom.com."               
##  [13] "aetna.com."                "agilent.com."              "airproducts.com."          "alcoa.com."               
##  [17] "aleris.com."               "allergan.com."             "alliancedata.com."         "amcnetworks.com."         
##  [21] "amd.com."                  "americantower.com."        "amfam.com."                "amgen.com."               
##  [25] "amtrustgroup.com."         "amtrustgroup.com."         "amtrustgroup.com."         "amtrustgroup.com."        
##  [29] "anadarko.com."             "analog.com."               "apachecorp.com."           "applied.com."             
##  [33] "aptar.com."                "aramark.com."              "aramark.com."              "arcb.com."                
##  [37] "archcoal.com."             "armstrong.com."            "armstrong.com."            "arrow.com."               
##  [41] "asburyauto.com."           "autonation.com."           "avnet.com."                "ball.com."                
##  [45] "bankofamerica.com."        "baxter.com."               "bc.com."                   "bd.com."                  
##  [49] "bd.com."                   "bd.com."                   "belden.com."               "bemis.com."               
##  [53] "bestbuy.com."              "biogen.com."               "biomet.com."               "bloominbrands.com."       
##  [57] "bms.com."                  "boeing.com."               "bonton.com."               "borgwarner.com."          
##  [61] "brinks.com."               "brocade.com."              "brunswick.com."            "c-a-m.com."               
##  [65] "ca.com."                   "cabelas.com."              "cabotog.com."              "caleres.com."             
##  [69] "caleres.com."              "caleres.com."              "calpine.com."              "capitalone.com."          
##  [73] "cardinal.com."             "carlyle.com."              "carlyle.com."              "cartech.com."             
##  [77] "cbre.com."                 "celgene.com."              "centene.com."              "centurylink.com."         
##  [81] "cerner.com."               "cerner.com."               "cfindustries.com."         "ch2m.com."                
##  [85] "chevron.com."              "chipotle.com."             "chiquita.com."             "chk.com."                 
##  [89] "chrobinson.com."           "chs.net."                  "chsinc.com."               "chubb.com."               
##  [93] "ciena.com."                "cigna.com."                "cinemark.com."             "cit.com."                 
##  [97] "cmc.com."                  "cmegroup.com."             "coach.com."                "cognizant.com."           
## [101] "cokecce.com."              "colfaxcorp.com."           "columbia.com."             "commscope.com."           
## [105] "con-way.com."              "conagrafoods.com."         "conocophillips.com."       "coopertire.com."          
## [109] "core-mark.com."            "crbard.com."               "crestwoodlp.com."          "crowncastle.com."         
## [113] "crowncork.com."            "csx.com."                  "danaher.com."              "darden.com."              
## [117] "darlingii.com."            "davita.com."               "davita.com."               "davita.com."              
## [121] "dentsplysirona.com."       "diebold.com."              "diplomat.is."              "dish.com."                
## [125] "disney.com."               "donaldson.com."            "dresser-rand.com."         "dstsystems.com."          
## [129] "dupont.com."               "dupont.com."               "dyn-intl.com."             "dyn-intl.com."            
## [133] "dynegy.com."               "ea.com."                   "ea.com."                   "eastman.com."             
## [137] "ebay.com."                 "echostar.com."             "ecolab.com."               "edmc.edu."                
## [141] "edwards.com."              "elcompanies.com."          "emc.com."                  "emerson.com."             
## [145] "energyfutureholdings.com." "energytransfer.com."       "eogresources.com."         "equinix.com."             
## [149] "essendant.com."            "esterline.com."            "evhc.net."                 "exelisinc.com."           
## [153] "exeloncorp.com."           "express-scripts.com."      "express.com."              "express.com."             
## [157] "exxonmobil.com."           "familydollar.com."         "fanniemae.com."            "fastenal.com."            
## [161] "fbhs.com."                 "ferrellgas.com."           "firstenergycorp.com."      "firstsolar.com."          
## [165] "fiserv.com."               "flowserve.com."            "fmc.com."                  "fmglobal.com."            
## [169] "fnf.com."                  "freddiemac.com."           "ge.com."                   "genpt.com."               
## [173] "genworth.com."             "ggp.com."                  "grace.com."                "grainger.com."            
## [177] "graphicpkg.com."           "graybar.com."              "guess.com."                "hain.com."                
## [181] "halliburton.com."          "hanes.com."                "hanes.com."                "harley-davidson.com."     
## [185] "harman.com."               "harris.com."               "harsco.com."               "hasbro.com."              
## [189] "hcahealthcare.com."        "hcc.com."                  "hei.com."                  "henryschein.com."         
## [193] "hess.com."                 "hhgregg.com."              "hnicorp.com."              "hollyfrontier.com."       
## [197] "hologic.com."              "honeywell.com."            "hospira.com."              "hp.com."                  
## [201] "hpinc.com."                "hrblock.com."              "iac.com."                  "igt.com."                 
## [205] "iheartmedia.com."          "imshealth.com."            "ingrammicro.com."          "intel.com."               
## [209] "interpublic.com."          "intuit.com."               "ironmountain.com."         "jacobs.com."              
## [213] "jarden.com."               "jcpenney.com."             "jll.com."                  "johndeere.com."           
## [217] "johndeere.com."            "joyglobal.com."            "juniper.net."              "karauctionservices.com."  
## [221] "kbhome.com."               "kemper.com."               "keurig.com."               "khov.com."                
## [225] "kindredhealthcare.com."    "kkr.com."                  "kla-tencor.com."           "labcorp.com."             
## [229] "labcorp.com."              "lamresearch.com."          "lamresearch.com."          "landolakesinc.com."       
## [233] "lansingtradegroup.com."    "lear.com."                 "leggmason.com."            "leidos.com."              
## [237] "level3.com."               "libertymutual.com."        "lilly.com."                "lithia.com."              
## [241] "livenation.com."           "lkqcorp.com."              "loews.com."                "magellanhealth.com."      
## [245] "manitowoc.com."            "marathonoil.com."          "marathonpetroleum.com."    "markelcorp.com."          
## [249] "markwest.com."             "marriott.com."             "martinmarietta.com."       "masco.com."               
## [253] "massmutual.com."           "mastec.com."               "mastercard.com."           "mattel.com."              
## [257] "maximintegrated.com."      "mckesson.com."             "mercuryinsurance.com."     "meritor.com."             
## [261] "metlife.com."              "mgmresorts.com."           "micron.com."               "microsoft.com."           
## [265] "mohawkind.com."            "molsoncoors.com."          "monsanto.com."             "mosaicco.com."            
## [269] "motorolasolutions.com."    "mscdirect.com."            "murphyoilcorp.com."        "nasdaqomx.com."           
## [273] "navistar.com."             "nbty.com."                 "ncr.com."                  "netapp.com."              
## [277] "newfield.com."             "newscorp.com."             "nike.com."                 "nov.com."                 
## [281] "nrgenergy.com."            "ntenergy.com."             "nucor.com."                "nustarenergy.com."        
## [285] "o-i.com."                  "oaktreecapital.com."       "ocwen.com."                "omnicare.com."            
## [289] "oneok.com."                "oneok.com."                "onsemi.com."               "outerwall.com."           
## [293] "owens-minor.com."          "owens-minor.com."          "oxy.com."                  "packagingcorp.com."       
## [297] "pall.com."                 "parexel.com."              "paychex.com."              "pcconnection.com."        
## [301] "penskeautomotive.com."     "pepsico.com."              "pfizer.com."               "pg.com."                  
## [305] "polaris.com."              "polyone.com."              "pplweb.com."               "principal.com."           
## [309] "protective.com."           "publix.com."               "qg.com."                   "questdiagnostics.com."    
## [313] "quintiles.com."            "rcscapital.com."           "realogy.com."              "regmovies.com."           
## [317] "rentacenter.com."          "republicservices.com."     "rexnord.com."              "reynoldsamerican.com."    
## [321] "reynoldsamerican.com."     "rgare.com."                "roberthalf.com."           "rpc.net."                 
## [325] "rushenterprises.com."      "safeway.com."              "saic.com."                 "sandisk.com."             
## [329] "scana.com."                "scansource.com."           "seaboardcorp.com."         "selective.com."           
## [333] "selective.com."            "sempra.com."               "servicemaster.com."        "servicemaster.com."       
## [337] "servicemaster.com."        "sjm.com."                  "sm-energy.com."            "spectraenergy.com."       
## [341] "spiritaero.com."           "sprouts.com."              "spx.com."                  "staples.com."             
## [345] "starbucks.com."            "starwoodhotels.com."       "statestreet.com."          "steelcase.com."           
## [349] "steeldynamics.com."        "stericycle.com."           "stifel.com."               "stryker.com."             
## [353] "sunedison.com."            "sungard.com."              "supervalu.com."            "symantec.com."            
## [357] "symantec.com."             "synnex.com."               "synopsys.com."             "taylormorrison.com."      
## [361] "tdsinc.com."               "teamhealth.com."           "techdata.com."             "teledyne.com."            
## [365] "tempursealy.com."          "tenethealth.com."          "teradata.com."             "tetratech.com."           
## [369] "textron.com."              "thermofisher.com."         "tiaa-cref.org."            "tiffany.com."             
## [373] "timewarner.com."           "towerswatson.com."         "treehousefoods.com."       "tribunemedia.com."        
## [377] "trimble.com."              "trinet.com."               "trueblue.com."             "ugicorp.com."             
## [381] "uhsinc.com."               "ulta.com."                 "unifiedgrocers.com."       "unisys.com."              
## [385] "unum.com."                 "usfoods.com."              "varian.com."               "verizon.com."             
## [389] "vfc.com."                  "viacom.com."               "visa.com."                 "vishay.com."              
## [393] "visteon.com."              "wabtec.com."               "walmart.com."              "wecenergygroup.com."      
## [397] "wecenergygroup.com."       "west.com."                 "westarenergy.com."         "westlake.com."            
## [401] "westrock.com."             "weyerhaeuser.com."         "wholefoodsmarket.com."     "williams.com."            
## [405] "wm.com."                   "wnr.com."                  "wpxenergy.com."            "wyndhamworldwide.com."    
## [409] "xerox.com."                "xpo.com."                  "yrcw.com."                 "yum.com."                 
## [413] "zoetis.com."

And you can even go so far as to see which third-party mail services are most popular:

incl <- suffix_extract(sort(unlist(spf_includes(f1k$data))))
incl <- data.frame(table(paste(incl$domain, incl$suffix, sep=".")), stringsAsFactors=FALSE)
incl <- head(arrange(incl, desc(Freq)), 20)
incl <- mutate(incl, Var1=factor(Var1, Var1))
incl <- rename(incl, Service=Var1, Count=Freq)
 
gg <- ggplot(incl, aes(Service, Count))
gg <- gg + geom_bar(stat="identity", width=0.75)
gg <- gg + scale_x_discrete(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0), limits=c(0, 250))
gg <- gg + coord_flip()
gg <- gg + labs(x=NULL, y=NULL, 
                title="Most popular services used by the F1000",
                subtitle="As determined by SPF record configuration")
gg <- gg + theme_hrbrmstr(grid="X", axis="y")
gg <- gg + theme(plot.margin=margin(t=10, l=10, b=20, r=10))
gg

(Figure: bar chart of the most popular services used by the F1000, as determined by SPF record configuration.)

### Fin

There are more `TXT` records to play with than just SPF ones and many other hidden easter eggs. I need to add a few more functions into `gdns` before shipping it off to CRAN, so if you have any feature requests, now’s the time to file a [github issue](https://github.com/hrbrmstr/gdns/issues).

>UPDATE: Changed code to reflect the new `horizontal` parameter for `geom_lollipop()`

I make a fair share of bar charts throughout the day and really like switching to lollipop charts to mix things up a bit and enhance the visual appeal. They’re easy to do in `ggplot2`: just use your traditional `x` & `y` mapping for `geom_point()`, then add a `geom_segment()` (you probably want to call this first, actually) mapping the `yend` aesthetic to `0` and the `xend` aesthetic to the same thing you used for `x`. But that’s a lot of typing. Hence, the need for `geom_lollipop()`.
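
For reference, here's a minimal long-hand sketch of that segment-plus-point recipe (toy data, purely for illustration):

library(ggplot2)

# toy data
df_toy <- data.frame(category=c("A", "B", "C"), value=c(3, 7, 5))

# segment first (so the point draws on top of it), then the point;
# xend repeats x and yend is pinned to 0
gg <- ggplot(df_toy, aes(x=category, y=value))
gg <- gg + geom_segment(aes(xend=category, yend=0))
gg <- gg + geom_point(size=3)
gg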

I’ll build this example from one [provided by Stephanie Evergreen](http://stephanieevergreen.com/lollipop/) (that one’s in Excel). It’s not much code:

df <- read.csv(text="category,pct
Other,0.09
South Asian/South Asian Americans,0.12
Interngenerational/Generational,0.21
S Asian/Asian Americans,0.25
Muslim Observance,0.29
Africa/Pan Africa/African Americans,0.34
Gender Equity,0.34
Disability Advocacy,0.49
European/European Americans,0.52
Veteran,0.54
Pacific Islander/Pacific Islander Americans,0.59
Non-Traditional Students,0.61
Religious Equity,0.64
Caribbean/Caribbean Americans,0.67
Latino/Latina,0.69
Middle Eastern Heritages and Traditions,0.73
Trans-racial Adoptee/Parent,0.76
LBGTQ/Ally,0.79
Mixed Race,0.80
Jewish Heritage/Observance,0.85
International Students,0.87", stringsAsFactors=FALSE, sep=",", header=TRUE)

# devtools::install_github("hrbrmstr/ggalt")
library(ggplot2)
library(ggalt)
library(scales)

gg <- ggplot(df, aes(y=reorder(category, pct), x=pct))
gg <- gg + geom_lollipop(point.colour="steelblue", point.size=3, horizontal=TRUE)
gg <- gg + scale_x_continuous(expand=c(0,0), labels=percent,
                              breaks=seq(0, 1, by=0.2), limits=c(0, 1))
gg <- gg + labs(x=NULL, y=NULL, 
                title="SUNY Cortland Multicultural Alumni survey results",
                subtitle="Ranked by race, ethnicity, home land and orientation\namong the top areas of concern",
                caption="Data from http://stephanieevergreen.com/lollipop/")
gg <- gg + theme_minimal(base_family="Arial Narrow")
gg <- gg + theme(panel.grid.major.y=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(axis.line.y=element_line(color="#2b2b2b", size=0.15))
gg <- gg + theme(axis.text.y=element_text(margin=margin(r=-5, l=0)))
gg <- gg + theme(plot.margin=unit(rep(30, 4), "pt"))
gg <- gg + theme(plot.title=element_text(face="bold"))
gg <- gg + theme(plot.subtitle=element_text(margin=margin(b=10)))
gg <- gg + theme(plot.caption=element_text(size=8, margin=margin(t=10)))
gg

And, I’ll reiterate Stephanie’s note that the data is fake.

(Figure: the resulting lollipop chart.)

Compare it with its sister bar chart:

(Figure: the equivalent bar chart.)

to see which one you think works better (it really does come down to personal aesthetic choice).

You can find it in the development version of [`ggalt`](https://github.com/hrbrmstr/ggalt). The API is not locked in yet so definitely provide feedback in the issues.

>UPDATE: Since I put in a “pull request” requirement, I intended to put in a link to getting started with GitHub. Dr. Jenny Bryan’s @stat545 has a great [section on git](https://stat545-ubc.github.io/git00_index.html) that should hopefully make it a bit less painful.

### Why 52Vis?

In case folks are wondering why I’m doing this, it’s pretty simple. We need a society that has high data literacy and we need folks who are capable of making awesome, truthful data visualizations. The only way to do that is by working with data over, and over, and over, and over again.

Directed projects with some reward are one of the best Pavlovian ways to accomplish that :-)

### This week’s challenge

The Data is Plural folks have [done it again](http://tinyletter.com/data-is-plural/letters/data-is-plural-2016-04-06-edition) and there’s a neat and important data set in this week’s vis challenge.

From their newsletter:

>_Every January, at the behest of the U.S. Department of Housing and Urban Development, volunteers across the country attempt to count the homeless in their communities. The result: HUD’s “point in time” estimates, which are currently available for 2007–2015. The most recent estimates found 564,708 homeless people nationwide, with 75,323 of that count (more than 13%) living in New York City._

I decided to take a look at this data by seeing which states had the worst homeless problem per-capita (i.e. per 100K population). I’ve included the population data along with some ready-made wrangling of the HUD data.

But, before we do that…

### RULES UPDATE + Last week’s winner

I’ll be announcing the winner on Thursday since I:

– am horribly sick after being exposed to who knows what after rOpenSci last week in SFO :-)
– have been traveling like mad this week
– need to wrangle all the answers into the github repo and get @laneharrison (and his students) to validate my choice for winner (I have picked a winner)

Given how hard the wrangling has been, I’m going to need to request that folks both leave a blog comment and file a PR to [the github repo](https://github.com/52vis/2016-14) for this week. Please include the code you used as well as the vis (or a link to a working interactive vis).

### PRIZES UPDATE

Not only can I offer [Data-Driven Security](http://dds.ec/amzn), but Hadley Wickham has offered signed copies of his books as well, and I’ll keep the Amazon gift card in as a catch-all if you rack up more wins (NOTE: if any other authors want to offer up their tomes, shoot me a note!).

### No place to roam

Be warned: this was a pretty depressing data set. I went in with the question of wanting to know which “states” had the worst problem and I assumed it’d be California or New York. I had no idea it would be what it was and the exercise shattered some assumptions.

NOTE: I’ve included U.S. population data for the necessary time period.

library(readxl)
library(purrr)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(scales)
library(grid)
library(hrbrmisc)
 
# grab the HUD homeless data
 
URL <- "https://www.hudexchange.info/resources/documents/2007-2015-PIT-Counts-by-CoC.xlsx"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil, mode="wb")
 
# turn the excel tabs into a long data.frame
yrs <- 2015:2007
names(yrs) <- 1:9
homeless <- map_df(names(yrs), function(i) {
  df <- suppressWarnings(read_excel(fil, as.numeric(i)))
  df[,3:ncol(df)] <- suppressWarnings(lapply(df[,3:ncol(df)], as.numeric))
  new_names <- tolower(make.names(colnames(df)))
  new_names <- str_replace_all(new_names, "\\.+", "_")
  df <- setNames(df, str_replace_all(new_names, "_[[:digit:]]+$", ""))
  bind_cols(df, data_frame(year=rep(yrs[i], nrow(df))))
})
 
# clean it up a bit
homeless <- mutate(homeless,
                   state=str_match(coc_number, "^([[:alpha:]]{2})")[,2],
                   coc_name=str_replace(coc_name, " CoC$", ""))
homeless <- select(homeless, year, state, everything())
homeless <- filter(homeless, !is.na(state))
 
# read in the us population data
uspop <- read.csv("uspop.csv", stringsAsFactors=FALSE)
uspop_long <- gather(uspop, year, population, -name, -iso_3166_2)
uspop_long$year <- sub("X", "", uspop_long$year)
 
# normalize the values
states <- count(homeless, year, state, wt=total_homeless)
states <- left_join(states, albersusa::usa_composite()@data[,3:4], by=c("state"="iso_3166_2"))
states <- ungroup(filter(states, !is.na(name)))
states$year <- as.character(states$year)
states <- mutate(left_join(states, uspop_long), homeless_per_100k=(n/population)*100000)
 
# we want to order from worst to best
group_by(states, name) %>%
  summarise(mean=mean(homeless_per_100k, na.rm=TRUE)) %>%
  arrange(desc(mean)) -> ordr
 
states$year <- factor(states$year, levels=as.character(2006:2016))
states$name <- factor(states$name, levels=ordr$name)
 
# plot
#+ fig.retina=2, fig.width=10, fig.height=15
gg <- ggplot(states, aes(x=year, y=homeless_per_100k))
gg <- gg + geom_segment(aes(xend=year, yend=0), size=0.33)
gg <- gg + geom_point(size=0.5)
gg <- gg + scale_x_discrete(expand=c(0,0),
                            breaks=seq(2007, 2015, length.out=5),
                            labels=c("2007", "", "2011", "", "2015"),
                            drop=FALSE)
gg <- gg + scale_y_continuous(expand=c(0,0), labels=comma, limits=c(0,1400))
gg <- gg + labs(x=NULL, y=NULL,
                title="US Department of Housing & Urban Development (HUD) Total (Estimated) Homeless Population",
                subtitle="Counts aggregated from HUD Communities of Care Regional Surveys (normalized per 100K population)",
                caption="Data from: https://www.hudexchange.info/resource/4832/2015-ahar-part-1-pit-estimates-of-homelessness/")
gg <- gg + facet_wrap(~name, scales="free", ncol=6)
gg <- gg + theme_hrbrmstr_an(grid="Y", axis="", strip_text_size=9)
gg <- gg + theme(axis.text.x=element_text(size=8))
gg <- gg + theme(axis.text.y=element_text(size=7))
gg <- gg + theme(panel.margin=unit(c(10, 10), "pt"))
gg <- gg + theme(panel.background=element_rect(color="#97cbdc44", fill="#97cbdc44"))
gg <- gg + theme(plot.margin=margin(10, 20, 10, 15))
gg

(Figure: small-multiples chart of estimated homeless population per 100K, by state.)

I used one of HUD’s alternate, official color palette colors for the panel backgrounds.

Remember, this is language/tool-agnostic: go in with a good question or two, augment the data as you feel you need to, and show us your vis!

Week 2’s content closes 2016-04-12 23:59 EDT

Contest GitHub Repo:

The [`iptools` package](https://github.com/hrbrmstr/iptools)—a toolkit for manipulating, validating and testing IP addresses and ranges, along with datasets relating to IP addresses—is flying through the internets and hitting a CRAN mirror near you, soon.

### What’s fixed?

[Tim Smith](https://github.com/tdsmith) fixed [a bug](https://github.com/hrbrmstr/iptools/issues/26) in `ip_in_range()` that occurred when the netmask was `/32` (thanks, Tim!).

### What’s new?

The `range_boundaries()` function now returns three new fields that are pretty obvious once you see them in action:

range_boundaries("172.18.0.0/28")
##   minimum_ip  maximum_ip min_numeric max_numeric         range
## 1 172.18.0.0 172.18.0.15  2886860800  2886860815 172.18.0.0/28

They are tacked on the end, so if you were using positional or named columns previously, you’re still good to go.
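
So, for example, both of these keep working (values taken from the example output above):

rb <- range_boundaries("172.18.0.0/28")

rb$minimum_ip  # "172.18.0.0"    (same column as before)
rb[["range"]]  # "172.18.0.0/28" (one of the new columns)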

We’ve added a new `country_ranges()` function to return all “assigned” CIDR blocks in a country. You just give it a character vector of one or more ISO 3166-1 alpha-2 country codes and you get back the CIDRs:

country_ranges("TO")
## $TO
## [1] "43.255.148.0/22"  "103.239.160.0/22" "103.242.126.0/23" "103.245.160.0/22" "175.176.144.0/21" "202.43.8.0/21"   
## [7] "202.134.24.0/21"

This data is updated daily and there’s some session caching built into the function to speed up subsequent calls if you forgot to save the output. You can flush the session cache with `flush_country_cidrs()` and query it with `cached_country_cidrs()`.
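
A quick sketch of working with that cache (the helper names come straight from the package):

country_ranges("TO")    # first call in a session fetches the data
country_ranges("TO")    # repeat calls are served from the session cache
cached_country_cidrs()  # inspect what has been cached
flush_country_cidrs()   # clear it to force a fresh fetch next time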

### What’s next?

We’re waiting until the R 3.3.0 Windows toolchain is stable to add in MaxMind ASN lookups. If there are any IP-related functions you need added, drop us an [issue](https://github.com/hrbrmstr/iptools/issues). We’re at nearly 1,700 downloads from the RStudio mirror, which (IMO) is kinda cool for such a niche package. Many thanks to all our users and one more thank you to Dirk for the `AsioHeaders` package.

### Fin

If you want some “bad” IP addresses to play around with in `iptools`, check out the [`blocklist`](https://github.com/hrbrmstr/blocklist) package, which provides an interface to a subset of the [blocklist.de](http://www.blocklist.de/en/index.html) API.

>UPDATE: Deadline is now 2016-04-05 23:59 EDT; next vis challenge is 2016-04-06!

Per a suggestion, I’m going to try to find a neat data set (prbly one from @jsvine) to feature each week, toss up some sample code (99% of the time prbly in R) and offer up a vis challenge. Just reply in the comments with a link to a gist/repo/rpub/blog/etc (or post directly, though inserting code requires some markup that you can ping me abt) post containing the code & vis with a brief explanation. I’ll gather up everything into a new github organization I made for this contest. You can also submit a PR right to [this week’s repo](https://github.com/52vis/2016-13).

Winners get a free digital copy of [Data-Driven Security](http://amzn.to/ddsec), and if you win more than once I’ll come up with other stuff to give away (either an Amazon gift card, a book or something Captain America related).

Submissions should include a story/angle/question you were trying to answer, any notes or “gotchas” that the code/comments don’t explain, and a [beautiful] vis. You can use whatever language or tool (even Excel or _ugh_ Tableau), but you’ll have to describe what you did step-by-step for the GUI tools or record a video, since the main point of this contest is to help folks learn about asking questions, munging data and making visualizations. Excel & Tableau lock that knowledge in, and Tableau even locks the data in.

### Droning on and on

Today’s data source comes from this week’s Data Is Plural newsletter and is all about drones. @jsvine linked to the [main FAA site](http://www.faa.gov/uas/law_enforcement/uas_sighting_reports/) for drone sightings, and there are enough ways to slice the data that it should make for some interesting story angles.

I will remove one of those angles with a simple bar chart of unmanned aircraft (UAS) sightings by week, using an FAA site color for the bars. I wanted to see if there were any overt visual patterns by time of year, or whether the registration requirement at the end of 2015 caused any changes (I didn’t crunch the numbers to see if any actual patterns could be found statistically, but that’s something y’all can do). I am curious as to what caused the “spike” in August/September 2015, and the report text may have that data.

I’ve put this week’s example code & data into the [52 vis repo](https://github.com/52vis/2016-13).

library(ggplot2)
library(ggalt)
library(ggthemes)
library(readxl)
library(dplyr)
library(hrbrmisc)
library(grid)
 
# get copies of the data locally
 
URL1 <- "http://www.faa.gov/uas/media/UAS_Sightings_report_21Aug-31Jan.xlsx"
URL2 <- "http://www.faa.gov/uas/media/UASEventsNov2014-Aug2015.xls"
 
fil1 <- basename(URL1)
fil2 <- basename(URL2)
 
if (!file.exists(fil1)) download.file(URL1, fil1)
if (!file.exists(fil2)) download.file(URL2, fil2)
 
# read it in
 
xl1 <- read_excel(fil1)
xl2 <- read_excel(fil2)
 
# munge it a bit so we can play with it by various calendrical options
 
drones <- setNames(bind_rows(xl2[,1:3],
                             xl1[,c(1,3,4)]), 
                   c("ts", "city", "state"))
drones <- mutate(drones, 
                 year=format(ts, "%Y"), 
                 year_mon=format(ts, "%Y%m"), 
                 ymd=as.Date(ts), 
                 yw=format(ts, "%Y%V"))
 
# let's see them by week
by_week <- mutate(count(drones, yw), wk=as.Date(sprintf("%s1", yw), "%Y%U%u")-7)
 
# this looks like bad data but I didn't investigate it too much
by_week <- arrange(filter(by_week, wk>=as.Date("2014-11-10")), wk)
 
# plot
 
gg <- ggplot(by_week, aes(wk, n))
gg <- gg + geom_bar(stat="identity", fill="#937206")
gg <- gg + annotate("text", by_week$wk[1], 49, label="# reports", 
                    hjust=0, vjust=1, family="Cabin-Italic", size=3)
gg <- gg + scale_x_date(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0))
gg <- gg + labs(y=NULL,
                title="Weekly U.S. UAS (drone) sightings",
                subtitle="As reported to the Federal Aviation Administration",
                caption="Data from: http://www.faa.gov/uas/law_enforcement/uas_sighting_reports/")
gg <- gg + theme_hrbrmstr(grid="Y", axis="X")
gg <- gg + theme(axis.title.x=element_text(margin=margin(t=-6)))
gg

RStudioScreenSnapz024

### Fin

I’ll still keep up a weekly vis from the Data Is Plural weekly collection even if this whole contest thing doesn’t take root with folks. You can never have too many examples for budding data folks to review.

Folks who’ve been tracking this blog on R-bloggers probably remember [this post](https://rud.is/b/2014/11/16/moving-the-earth-well-alaska-hawaii-with-r/) where I showed how to create a composite U.S. map with an Albers projection (which is commonly referred to as AlbersUSA these days thanks to D3).

I’m not sure why I didn’t think of this earlier, but you don’t _need_ to do those geographical machinations every time you want a prettier & more inclusive map (Alaska & Hawaii have been states for a while, so perhaps we should make more of an effort to include them in both data sets and maps). After doing the map transformations, the composite shape can be saved out to a shapefile, preferably GeoJSON since (a) you can use `geojsonio::geojson_write()` to save it and (b) it’s a single file vs a ZIP/directory.
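A minimal sketch of that save-once step (here `us_composite` is a placeholder name for the SpatialPolygonsDataFrame produced by the transformations in that earlier post):

library(geojsonio)

# write the already-transformed composite map out once so the
# projection gymnastics never have to be repeated
geojson_write(us_composite, file="composite_us_states.geojson")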

I did just that and saved both state and county maps out with FIPS codes and other useful data slot bits and created a small data package : [`albersusa`](https://github.com/hrbrmstr/albersusa) : with some helper functions. It’s not on CRAN yet, so you need to `devtools::install_github("hrbrmstr/albersusa")` to use it. The github repo has some basic examples; here’s a slightly more complex one.

### Mapping Obesity

I grabbed an [obesity data set](http://www.cdc.gov/diabetes/data/county.html) from the CDC and put together a compact example of how to make a composite U.S. county choropleth showing obesity rates per county (for 2012, the most recent data available). I read in the Excel file, pull out the county FIPS code and 2012 obesity rate, then build the choropleth. It’s not a whole lot of code, but that’s one of the main reasons for the package!

library(readxl)
library(rgeos)
library(maptools)
library(ggplot2)   # devtools::install_github("hadley/ggplot2") only if you want subtitles/captions
library(ggalt)
library(ggthemes)
library(albersusa) # devtools::install_github("hrbrmstr/albersusa")
library(viridis)
library(scales)
 
# get the data and be nice to the server and keep a copy of the data for offline use
 
URL <- "http://www.cdc.gov/diabetes/atlas/countydata/OBPREV/OB_PREV_ALL_STATES.xlsx"
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil)
 
# it's not a horrible Excel file, but we do need to hunt for the data
# and clean it up a bit. we just need FIPS & 2012 percent info
 
wrkbk <- read_excel(fil)
obesity_2012 <- setNames(wrkbk[-1, c(2, 61)], c("fips", "pct"))
obesity_2012$pct <- as.numeric(obesity_2012$pct) / 100
 
# I may make a version of this that returns a fortified data.frame but
# for now, we just need to read the built-in saved shapefile and turn it
# into something ggplot2 can handle
 
cmap <- fortify(counties_composite(), region="fips")
 
# and this is all it takes to make the map below
 
gg <- ggplot()
gg <- gg + geom_map(data=cmap, map=cmap,
                    aes(x=long, y=lat, map_id=id),
                    color="#2b2b2b", size=0.05, fill=NA)
gg <- gg + geom_map(data=obesity_2012, map=cmap,
                    aes(fill=pct, map_id=fips),
                    color="#2b2b2b", size=0.05)
gg <- gg + scale_fill_viridis(name="Obesity", labels=percent)
gg <- gg + coord_proj(us_laea_proj)
gg <- gg + labs(title="U.S. Obesity Rate by County (2012)",
                subtitle="Content source: Centers for Disease Control and Prevention",
           caption="Data from http://www.cdc.gov/diabetes/atlas/countydata/County_ListofIndicators.html")
gg <- gg + theme_map(base_family="Arial Narrow")
gg <- gg + theme(legend.position=c(0.8, 0.25))
gg <- gg + theme(plot.title=element_text(face="bold", size=14, margin=margin(b=6)))
gg <- gg + theme(plot.subtitle=element_text(size=10, margin=margin(b=-14)))
gg

Fullscreen_3_29_16__9_06_AM

### Fin

Note that some cartographers think of this particular map view the way I look at a pie chart, but it’s a compact & convenient way to keep the states/counties together and will make it easier to include Alaska & Hawaii in your cartographic visualizations.

The composite GeoJSON files are in:

– `system.file("extdata/composite_us_states.geojson.gz", package="albersusa")`
– `system.file("extdata/composite_us_counties.geojson.gz", package="albersusa")`

if you want to use them in another program/context.
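One rough way to pull one of those into a plain `sp` object outside of the package’s own helpers (decompressing to a temp file first is just one approach to the `.gz` wrapper):

library(geojsonio)

# locate the gzipped GeoJSON shipped with the package
gz <- system.file("extdata/composite_us_counties.geojson.gz", package="albersusa")

# decompress to a temporary plain GeoJSON file, then read it in as an sp object
tmp <- tempfile(fileext=".geojson")
writeLines(readLines(gzfile(gz)), tmp)
counties <- geojson_read(tmp, what="sp")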

Drop an issue [on github](https://github.com/hrbrmstr/albersusa) if you want any more default fields in the data slot and if you “need” territories (I’d rather have a PR for the latter tho :-).

As I said, I’m kinda obsessed with the “nuclear” data set. So much so that I made a D3 version that’s similar to the R version I made the other day. I tried not to code much today (too much Easter fun going on), so I left off the size & color legends, but it drops the bombs and fills the radiation meter bars as each year progresses.

Working in raw D3 reminds me how nice it is to work in R. Much cruft is abstracted away, and the “API” in R (at least list ops and ggplot2) seems more consistent, or at least less verbose.

One big annoyance (besides tweaking projection settings) is that I had to set:

script-src 'self' 'unsafe-eval';

in the [content security policy](http://content-security-policy.com/), since D3 uses “eval” a bit.

It’s embedded below and you can bust the frame via [this link](/projects/nucleard3/index.html). I made no effort to have it fit on tiny screens (it just requires fidgeting with the heights/widths of things), but it’s pretty complete code (including browser/touch icons, content security policy & open graph tags) and it’s annotated to help R folks map between ggplot2 and D3. If you build anything on it, drop a note in the comments.