Waffle Geoms & Other Miscellaneous In-Development Package Updates

More than just sergeant has been hacked on recently, so here’s a run-down of various ? updates:

waffle

The square pie chart generating waffle? package now contains a nascent geom_waffle() so you can do things like this:

library(hrbrthemes)
library(waffle)
library(tidyverse)

tibble(
  parts = factor(rep(month.abb[1:3], 3), levels=month.abb[1:3]),
  values = c(10, 20, 30, 6, 14, 40, 30, 20, 10),
  fct = c(rep("Thing 1", 3), rep("Thing 2", 3), rep("Thing 3", 3))
) -> xdf

ggplot(xdf, aes(fill=parts, values=values)) +
  geom_waffle(color = "white", size=1.125, n_rows = 6) +
  facet_wrap(~fct, ncol=1) +
  scale_x_discrete(expand=c(0,0)) +
  scale_y_discrete(expand=c(0,0)) +
  ggthemes::scale_fill_tableau(name=NULL) +
  coord_equal() +
  labs(
    title = "Faceted Waffle Geoms"
  ) +
  theme_ipsum_rc(grid="") +
  theme_enhance_waffle()

and get:

It’s super brand new so pls file issues (wherev you like besides blog comments as they’re not conducive to package triaging) if anything breaks or you need more aesthetic configuration options. NOTE: You need to use the 1.0.0 branch as noted in the master branch README.

markdowntemplates

I had to take a quick peek at markdowntemplates? due to a question from a blog reader about the Jupyter notebook generation functionality. While I was in the code I added two new bits to the knit: markdowntemplates::to_jupyter code. First is the option to specify a run: parameter in the YAML header so you can just knit the document to a Jupyter notebook without executing the chunks:

---
title: "ggplot2 example"
knit: markdowntemplates::to_jupyter
run: false
--- 

If run is not present it defaults to true.

The other add is a bit of intelligence to whether it should include %load_ext rpy2.ipython (the Jupyter “magic” that lets it execute R chunks). If no R code chunks are present, rpy2.ipython will not be loaded.

securitytrails

SecurityTrails is a service for cybersecurity researchers & defenders that provides tools and an API to aid in querying for all sorts of current and historical information on domains and IP addresses. It now (finally) has a mostly-complete R package securitytrails?. They’re research partners of $DAYJOB and their API is ?? so give it a spin if you are looking to broaden your threat-y API collection.

astools

Keeping the cyber theme going for a bit, next up is astools)? which are “Tools to Work With Autonomous System (‘AS’) Network and Organization Data”. Autonomous Systems (AS) are at the core of the internet (we all live in one) and this package provides tools to fetch AS data/metadata from various sources and work with it in R. For instance, we can grab the latest RouteViews data:

(rv_df <- routeviews_latest())
## # A tibble: 786,035 x 6
##    cidr         asn   minimum_ip maximum_ip  min_numeric max_numeric
##    <chr>        <chr> <chr>      <chr>             <dbl>       <dbl>
##  1 1.0.0.0/24   13335 1.0.0.0    1.0.0.255      16777216    16777471
##  2 1.0.4.0/22   56203 1.0.4.0    1.0.7.255      16778240    16779263
##  3 1.0.4.0/24   56203 1.0.4.0    1.0.4.255      16778240    16778495
##  4 1.0.5.0/24   56203 1.0.5.0    1.0.5.255      16778496    16778751
##  5 1.0.6.0/24   56203 1.0.6.0    1.0.6.255      16778752    16779007
##  6 1.0.7.0/24   56203 1.0.7.0    1.0.7.255      16779008    16779263
##  7 1.0.16.0/24  2519  1.0.16.0   1.0.16.255     16781312    16781567
##  8 1.0.64.0/18  18144 1.0.64.0   1.0.127.255    16793600    16809983
##  9 1.0.128.0/17 23969 1.0.128.0  1.0.255.255    16809984    16842751
## 10 1.0.128.0/18 23969 1.0.128.0  1.0.191.255    16809984    16826367
## # ... with 786,025 more rows

That, in turn, can work with iptools::ip_to_asn() so we can figure out which AS an IP address lives in:

rv_trie <- as_asntrie(rv_df)

iptools::ip_to_asn(rv_trie, "174.62.167.97")
## [1] "7922"

It can also fetch AS name info:

asnames_current()
## # A tibble: 63,453 x 4
##    asn   handle       asinfo                                                iso2c
##    <chr> <chr>        <chr>                                                 <chr>
##  1 1     LVLT-1       Level 3 Parent, LLC                                   US   
##  2 2     UDEL-DCN     University of Delaware                                US   
##  3 3     MIT-GATEWAYS Massachusetts Institute of Technology                 US   
##  4 4     ISI-AS       University of Southern California                     US   
##  5 5     SYMBOLICS    Symbolics, Inc.                                       US   
##  6 6     BULL-HN      Bull HN Information Systems Inc.                      US   
##  7 7     DSTL         DSTL                                                  GB   
##  8 8     RICE-AS      Rice University                                       US   
##  9 9     CMU-ROUTER   Carnegie Mellon University                            US   
## 10 10    CSNET-EXT-AS CSNET Coordination and Information Center (CSNET-CIC) US   
## # ... with 63,443 more rows

which we can use for further enrichment:

routeviews_latest() %>% 
  left_join(asnames_current())
## Joining, by = "asn"

## # A tibble: 786,035 x 9
##    cidr         asn   minimum_ip maximum_ip  min_numeric max_numeric handle            asinfo                     iso2c
##    <chr>        <chr> <chr>      <chr>             <dbl>       <dbl> <chr>             <chr>                      <chr>
##  1 1.0.0.0/24   13335 1.0.0.0    1.0.0.255      16777216    16777471 CLOUDFLARENET     Cloudflare, Inc.           US   
##  2 1.0.4.0/22   56203 1.0.4.0    1.0.7.255      16778240    16779263 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  3 1.0.4.0/24   56203 1.0.4.0    1.0.4.255      16778240    16778495 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  4 1.0.5.0/24   56203 1.0.5.0    1.0.5.255      16778496    16778751 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  5 1.0.6.0/24   56203 1.0.6.0    1.0.6.255      16778752    16779007 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  6 1.0.7.0/24   56203 1.0.7.0    1.0.7.255      16779008    16779263 GTELECOM-AUSTRAL… Gtelecom-AUSTRALIA         AU   
##  7 1.0.16.0/24  2519  1.0.16.0   1.0.16.255     16781312    16781567 VECTANT           ARTERIA Networks Corporat… JP   
##  8 1.0.64.0/18  18144 1.0.64.0   1.0.127.255    16793600    16809983 AS-ENECOM         Energia Communications,In… JP   
##  9 1.0.128.0/17 23969 1.0.128.0  1.0.255.255    16809984    16842751 TOT-NET           TOT Public Company Limited TH   
## 10 1.0.128.0/18 23969 1.0.128.0  1.0.191.255    16809984    16826367 TOT-NET           TOT Public Company Limited TH   
## # ... with 786,025 more rows

Note that routeviews_latest() and asnames_current() cache the data so there is no re-downloading unless you clear the local cache.

docxtractr

The docxtractr? package recently got a CRAN push due to some changes in the tibble ? but it also include a new feature that lets you accept or reject “tracked changes” before trying to extract tables/comments from a document without harming/changing the original document.

ednstest

DNS Flag Day is fast approaching. What is “DNS Flag Day”? It’s a day when yet-another cabal of large-scale DNS providers and tech heavy hitters decided that they know what’s best for the internet and are mandating compliance with RFC 6891 (EDNS). Honestly, there’s no good reason to run crappy DNS servers and no good reason not to support EDNS.

You could just go to the flag day site and test your provider (by entering your domain name, if you have one). But, you can also load the package, and run it locally (it still calls their API since it’s open and provides a very detailed results page if your DNS server isn’t compliant). You can just run it to get compact output and an auto-load of the report page in your browser or save off the returned object and inspect it to see what tests failed.

I ran it on a few domains that are likely familiar to readers and this is what it showed:

edns_test("rud.is")
## EDNS compliance test for [rud.is] has ✔ PASSED!
## Report URL: https://ednscomp.isc.org/ednscomp/60049cb032

edns_test("rstudio.com")
## EDNS compliance test for [rstudio.com] has ✖ FAILED
## Report URL: https://ednscomp.isc.org/ednscomp/54e2057229

edns_test("r-project.org")
## EDNS compliance test for [r-project.org] has ✔ PASSED!
## Report URL: https://ednscomp.isc.org/ednscomp/839ee9c9af

The print() function in the package also has some minimal cli? and crayon? usage in it if you’re looking to jazz up your R console output.

ulid

Finally, there’s ulid? which is a package to make “Universally Unique Lexicographically Sortable Identifiers in R”. These ULIDs have the following features:

  • 128-bit compatibility with UUID
  • 1.21e+24 unique ULIDs per millisecond
  • Lexicographically sortable!
  • Canonically encoded as a 26 character string, as opposed to the 36 character UUID
  • Uses Crockford’s base32 for better efficiency and readability (5 bits per character)
  • Case insensitive
  • No special characters (URL safe)
  • Monotonic sort order (correctly detects and handles the same millisecond)

They’re made up of

 01AN4Z07BY      79KA1307SR9X4MV3

|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits

The timestamp is a 48 bit integer representing UNIX-time in milliseconds and the randomness is an 80 bit cryptographically secure source of randomness (where possible). Read more in the full specification.

You can get one ULID easily:

ulid::ULIDgenerate()
## [1] "0001E2ERKHVPKZJ6FA6ZWHH1KS"

Generate a whole bunch of ’em:

(u <- ulid::ULIDgenerate(20))
##  [1] "0001E2ERKHVX5QF5D59SX2E65T" "0001E2ERKHKD6MHKYB1G8JHN5X" "0001E2ERKHTK0XEHVV2G5877K9" "0001E2ERKHKFGG5NPN24PC1N0W"
##  [5] "0001E2ERKH3F48CAKJCVMSCBKS" "0001E2ERKHF3N0B94VK05GTXCW" "0001E2ERKH24GCJ2CT3Z5WM1FD" "0001E2ERKH381RJ232KK7SMWQW"
##  [9] "0001E2ERKH7NAZ1T4HR4ZRQRND" "0001E2ERKHSATC17G2QAPYXE0C" "0001E2ERKH76R83NFST3MZNW84" "0001E2ERKHFKS52SD8WJ8FHXMV"
## [13] "0001E2ERKHQM6VBM5JB235JJ1W" "0001E2ERKHXG2KNYWHHFS8X69Z" "0001E2ERKHQW821KPRM4GQFANJ" "0001E2ERKHD5KWTM5S345A3RP4"
## [17] "0001E2ERKH0D901W6KX66B1BHE" "0001E2ERKHKPHZBFSC16FC7FFC" "0001E2ERKHQQH7315GMY8HRYXV" "0001E2ERKH016YBAJAB7K9777T"

and “unmarshal” them (which gets you the timestamp back):

unmarshal(u)
##                     ts              rnd
## 1  2018-12-29 07:02:57 VX5QF5D59SX2E65T
## 2  2018-12-29 07:02:57 KD6MHKYB1G8JHN5X
## 3  2018-12-29 07:02:57 TK0XEHVV2G5877K9
## 4  2018-12-29 07:02:57 KFGG5NPN24PC1N0W
## 5  2018-12-29 07:02:57 3F48CAKJCVMSCBKS
## 6  2018-12-29 07:02:57 F3N0B94VK05GTXCW
## 7  2018-12-29 07:02:57 24GCJ2CT3Z5WM1FD
## 8  2018-12-29 07:02:57 381RJ232KK7SMWQW
## 9  2018-12-29 07:02:57 7NAZ1T4HR4ZRQRND
## 10 2018-12-29 07:02:57 SATC17G2QAPYXE0C
## 11 2018-12-29 07:02:57 76R83NFST3MZNW84
## 12 2018-12-29 07:02:57 FKS52SD8WJ8FHXMV
## 13 2018-12-29 07:02:57 QM6VBM5JB235JJ1W
## 14 2018-12-29 07:02:57 XG2KNYWHHFS8X69Z
## 15 2018-12-29 07:02:57 QW821KPRM4GQFANJ
## 16 2018-12-29 07:02:57 D5KWTM5S345A3RP4
## 17 2018-12-29 07:02:57 0D901W6KX66B1BHE
## 18 2018-12-29 07:02:57 KPHZBFSC16FC7FFC
## 19 2018-12-29 07:02:57 QQH7315GMY8HRYXV
## 20 2018-12-29 07:02:57 016YBAJAB7K9777T

and can even supply your own timestamp:

(ut <- ts_generate(as.POSIXct("2017-11-01 15:00:00", origin="1970-01-01")))
## [1] "0001CZM6DGE66RJEY4N05F5R95"

unmarshal(ut)
##                    ts              rnd
## 1 2017-11-01 15:00:00 E66RJEY4N05F5R95

FIN

Kick the tyres & file issues/PRs as needed and definitely give sr.ht a spin for your code-hosting needs. It’s 100% free and open source software made up of mini-services that let you use only what you need. Zero javacript on site and no tracking/adverts. Plus, no evil giant megacorps doing heaven knows what with your browser, repos, habits and intellectual property.

Cover image from Data-Driven Security
Amazon Author Page

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.