
The `knitr`/R markdown system is a great way to organize reports and analyses. However, the built-in templates (the ones that come with RStudio/the `rmarkdown` package) rely on Bootstrap and also use jQuery. There's nothing wrong with that, but the generated standalone HTML documents (which are a great way to distribute reports) don't really need all that cruft, and it's fun & informative to check out new frameworks from time to time. Also, jQuery is a heavy crutch I'm working hard to not need anymore.

To that end, I created a package — [`markdowntemplates`](https://github.com/hrbrmstr/markdowntemplates) — that contains three alternate templates you can use out of the box or (hopefully) clone, customize and use in your future work. One template is based on the [Bulma CSS framework](http://bulma.io), another is based on the [Skeleton CSS framework](http://getskeleton.com) and the last one is a super-minimal template with no formatting (i.e. it's a good one to build on).
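Using one of the templates is just a matter of pointing the `output` format of your `Rmd` at the package. A minimal YAML front matter sketch (this assumes the exported format functions share the template names, per the repo README):

---
title: "Sample Report"
output: markdowntemplates::skeleton
---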

The github repo has screenshots of the basic formatting.

I tried to stick to the base formatting of each theme, but I went a bit crazy and showed how to have a fixed banner in the Skeleton version.

### How it works

There are three directories inside `inst/rmarkdown/templates`, each with a similar structure:

– a `resources` directory with CSS (and potentially javascript)
– a `skeleton` directory which holds example `Rmd` “skeleton” files
– a `base.html` file which is the parameterized HTML template for the Rmd
– a `template.yaml` file which is how RStudio/`knitr` knows there's one or more R markdown templates in your package (see the example below)
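For reference, a minimal `template.yaml` needs little more than a name and a description (the field names follow the standard `rmarkdown` template layout; the values here are illustrative):

name: Skeleton
description: An R markdown template based on the Skeleton CSS framework
create_dir: false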

The `minimal` `base.html` is small enough to include here:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"$if(lang)$ lang="$lang$" xml:lang="$lang$"$endif$>
 
<head>
 
<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1">
 
<title>$if(title)$$title$$endif$</title>
 
$for(header-includes)$
$header-includes$
$endfor$
 
$if(highlightjs)$
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet"
      href="$highlightjs$/$if(highlightjs-theme)$$highlightjs-theme$$else$default$endif$.css"
      $if(html5)$$else$type="text/css" $endif$/>
<script src="$highlightjs$/highlight.js"></script>
<script type="text/javascript">
if (window.hljs && document.readyState && document.readyState === "complete") {
   window.setTimeout(function() {
      hljs.initHighlighting();
   }, 0);
}
</script>
$endif$
 
$if(highlighting-css)$
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
$highlighting-css$
</style>
$endif$
 
$for(css)$
<link rel="stylesheet" href="$css$" $if(html5)$$else$type="text/css" $endif$/>
$endfor$
 
</head>
 
<body>
<div class="container">
 
<h1>$if(title)$$title$$endif$</h1>
 
$for(include-before)$
$include-before$
$endfor$
 
$if(toc)$
<div id="$idprefix$TOC">
$toc$
</div>
$endif$
 
$body$
 
$for(include-after)$
$include-after$
$endfor$
 
</div>
 
$if(mathjax-url)$
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "$mathjax-url$";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>
$endif$
 
</body>
</html>

I kept a bit of the RStudio template code for source code formatting, but grokking the actual template language should be pretty straightforward. You should be able to pick out `$title$` in there and you can add as many parameters to the `Rmd` YAML section as you like (with corresponding counterparts in that template). I added a corresponding, exported R function for each supported template to show how easy it is to customize the parameters while also accepting further customizations in the YAML of each `Rmd`.
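For example, adding a hypothetical `navlink` parameter would mean declaring it in the `Rmd` YAML front matter (both snippets here are illustrative):

title: "Sample Report"
navlink: "http://example.com/"
output: markdowntemplates::minimal

and then referencing it wherever you want it rendered in `base.html`:

$if(navlink)$<a href="$navlink$">$navlink$</a>$endif$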

Imagine building a base template with your personal or organization’s branding *and* having it set apart from the cookie-cutter RStudio `rmarkdown` Bootstrap/jQuery template. Or, create course-specific templates like the [`s20x` package](https://github.com/cran/s20x). Once you peek behind the curtain to see how things are done, it’s not so complex and the sky is the limit.

I'll try to get these on CRAN as soon as possible. If you have a preference for another CSS framework (I'm thinking of adding a "Metro" CSS kit and a Google web starter CSS kit), shoot me an issue or PR and I'll incorporate it. The more examples we have, the easier it will be for folks to create new templates.

Any & all feedback is most welcome.

(If you don't know what XML is, you should probably [read a primer](https://en.wikipedia.org/wiki/XML) before reading this post.)

When working with data, one inevitably comes across things encoded in XML. I’m in the “anti-XML” camp, but deal with my fair share of XML in “cyber” and help out enough people who have to work with XML that I’ve become pretty proficient when slicing & dicing it.

R has two main packages to deal with XML: the original `XML` package and the more lightweight and modern `xml2` package. If you really need all the power of `libxml2` (the C library that powers both packages) or are _creating_ XML from R, then you probably know your way around the `XML` package and are pretty self-sufficient.

Most folks can get by with the `xml2` package if their goal is to work with XML data. By "work with" I mean read in files or data from APIs that come in XML format and have to find nuggets of gold in between all those `<` and `>` tags. To do so requires finding what you need, and that means using a query language called `XPath` to pinpoint the node(s) you are after. Working with `XPath` can be pretty daunting for those who went to school to ultimately cure diseases, build high-performing stock portfolios, target advertising to everyone or perform a host of other real work. Becoming an expert in `XPath` was not something on the bucket list, but to work with XML you will need to be familiar with it.
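For example, here's the sort of `xml2` + `XPath` combination you end up writing to grab all the item titles out of an RSS feed:

library(xml2)

# read in an RSS feed and pull the text of every <title> that
# lives under an <item> via a simple XPath expression
doc <- read_xml("http://www.npr.org/rss/rss.php?id=1001")
xml_text(xml_find_all(doc, ".//item/title"))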

The [`xmlview`](https://github.com/hrbrmstr/xmlview) package provides a way to visually inspect XML and interactively test out `XPath` expressions. It’s as simple to use as:

devtools::install_github("ramnathv/htmlwidgets") # we use some bleeding edge features
devtools::install_github("hrbrmstr/xmlview")
library(xml2)
library(xmlview)
 
# plain text XML
xml_view("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>")
 
# read-in XML document
doc <- read_xml("http://www.npr.org/rss/rss.php?id=1001")
xml_view(doc, add_filter=TRUE)

(There’s also an experimental `xml_tree_view()` in there by @timelyportfolio that we’ll be adding features to at a pretty rapid pace.)

Here’s a screenshot of it in action:

(screenshot: `xml_view()` displaying formatted XML in the RStudio viewer)

There are options to change the CSS styling for the formatted code. Yep, it will format and highlight XML for you so it’s easier to work with. There’s an animated gif of a screencast over [on github](https://github.com/hrbrmstr/xmlview) as well.

Once you perfect your `XPath` expression, hit the “R” button and it will generate the code you can copy back into RStudio. It understands namespaces but try not to stuff a huge XML document in there as browsers don’t work well with large data elements (the viewer is an `htmlwidget` and is, hence, browser-based).

It works with plain character XML/HTML and many `xml2` data types. I have no current plans for `XML` package object support, but toss up an issue on github if you really need it (or, better yet, a PR). If there are other desired features (especially from educators), please post a request in a github issue as well.

Watch for more features in the coming weeks and a CRAN release once the bleeding edge `htmlwidgets` package makes it to CRAN.

Despite being a cybersecurity professional, it's pretty easy to social engineer me (I'll note that @jayjacobs does it to me all the time).

I took Thorsten's tweet as a challenge to ggplot2-ize the Bloomberg visualizations as closely as possible.

All the code is [on github](https://github.com/hrbrmstr/forceaccounted) and you can see the finished product (knitted from an Rmd file) [on this project page](http://rud.is/projects/force_accounted.html) or mini-scroll below in the `iframe`.

I encourage folks to look at the project (it’s actually a package) source as it has quite a bit of data munging and ggplot2 “tricks” that could be useful in “real” visualizations.

`iptools` is a set of tools for working with IP addresses. Not just work, but work _fast_. It’s backed by `Rcpp` and now uses the [AsioHeaders](http://dirk.eddelbuettel.com/blog/2016/01/07/#asioheaders_1.11.0-1) package by Dirk Eddelbuettel, which means it no longer needs to _link_ against the monolithic Boost libraries and *works on Windows*!

What can you do with it? One thing you can do is take a vector of domain names and turn them into IP addresses:

library(iptools)
 
hostname_to_ip(c("rud.is", "dds.ec", "ironholds.org", "google.com"))
 
## [[1]]
## [1] "104.236.112.222"
## 
## [[2]]
## [1] "162.243.111.4"
## 
## [[3]]
## [1] "104.131.2.226"
## 
## [[4]]
##  [1] "2607:f8b0:400b:80a::100e" "74.125.226.101"           "74.125.226.102"          
##  [4] "74.125.226.100"           "74.125.226.96"            "74.125.226.104"          
##  [7] "74.125.226.99"            "74.125.226.103"           "74.125.226.105"          
## [10] "74.125.226.98"            "74.125.226.97"            "74.125.226.110"

That means you can pump a bunch of domain names from logs into `iptools` and get current IP address allocations out for them.

You can also do the reverse:

library(magrittr)
library(purrr)
library(iptools)
 
hostname_to_ip(c("rud.is", "dds.ec", "ironholds.org", "google.com")) %>% 
  flatten_chr() %>% 
  ip_to_hostname() %>% 
  flatten_chr()
 
##  [1] "104.236.112.222"           "dds.ec"                    "104.131.2.226"            
##  [4] "yyz08s13-in-x0e.1e100.net" "yyz08s13-in-f5.1e100.net"  "yyz08s13-in-f6.1e100.net" 
##  [7] "yyz08s13-in-f4.1e100.net"  "yyz08s13-in-f0.1e100.net"  "yyz08s13-in-f8.1e100.net" 
## [10] "yyz08s13-in-f3.1e100.net"  "yyz08s13-in-f7.1e100.net"  "yyz08s13-in-f9.1e100.net" 
## [13] "yyz08s13-in-f2.1e100.net"  "yyz08s13-in-f1.1e100.net"  "yyz08s13-in-f14.1e100.net"

Notice that it handled IPv6 addresses and also cases where no reverse mapping existed for an IP address.

You can convert IPv4 addresses to and from long integer format (the 4-octet version of IPv4 addresses exists primarily to make them easier for humans to grok), generate random IP addresses for testing, test IP addresses for validity and type, and also reference data sets with registered assignments (so you can see allocated IP groups). Plus, it includes `xff_extract()` which can help identify an actual IP address (helpful when connections come from behind proxies).
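A few quick sketches of those capabilities (the commented output is illustrative):

library(iptools)

ip_to_numeric("192.168.0.1") # 3232235521
numeric_to_ip(3232235521)    # "192.168.0.1"

ip_random(3) # three random IPv4 addresses for testing

ip_classify(c("192.168.0.1", "2607:f8b0:400b:80a::100e", "HELLO.THERE"))
# "IPv4" "IPv6" "Invalid"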

We can’t thank Dirk enough for cranking out `AsioHeaders` since it means there will be many more network/”cyber” packages coming for R and available on every platform.

You can find `iptools` version `0.3.0` [on CRAN](https://cran.r-project.org/web/packages/iptools/) now (it may take your mirror a bit to catch up), grab the source [release](https://github.com/hrbrmstr/iptools/releases/tag/v0.3.0) on GitHub or check out the [repo](https://github.com/hrbrmstr/iptools/), poke around, submit issues and/or contribute!

Isn’t it great when an R package can help you with resolutions in the new year?

Gone are the days when one had a single computer plugged directly into a modem (cable, DSL or good ol' Hayes). Even the days when there were just one or two computers connected via wires or invisible multi-gigahertz waves passing through the air are long gone. Today (as you'll see in the February 2016 [OUCH! newsletter](http://securingthehuman.sans.org/resources/newsletters/ouch/2016)), there are scads of devices of all kinds on your home network. How can you keep track of them all?

Some router & wireless access point vendors provide tools on their device “admin” pages to see what’s connected, but they are inconsistent at best (and usually pretty ugly & cumbersome to navigate to). Thankfully, app purveyors have jumped in to fill the gap. Here’s a list of free or “freemium” (basic features for free, advanced features cost extra) tools for mobile devices and Windows or OS X (if you’re running Linux at home, I’m assuming you’re familiar with the tools available for Linux).

### iOS

– Fing
– iNet – Network Scanner
– Network Analyzer Lite – wifi scanner, ping & net info

### Android

– Fing (Google Play; Amazon)
– Pamn (Google Play; direct source)

### Windows

– Advanced IP Scanner
– Angry IP Scanner
– Fing
– MiTec Network Scanner
– nmap

### OS X

– Angry IP Scanner
– Fing
– IP Scanner
– LanScan
– nmap

Some of these tools are easier to work with than others, but they all install pretty easily (though “Fing” and “nmap” work at the command-line on Windows & OS X, so if you’re not a “power user”, you may want to use other tools on those platforms). In most cases, it’s up to you to keep a copy of the output and perform your own “diffs”. One “pro” option for tools like “Fing” is the ability to have the tool store scan results “in the cloud” and perform this comparison for you.

Drop a note in the comments if you have other suggestions, but _vendors be warned_: I’ll be moderating all comments to help ensure no evil links or blatant product shilling makes it to reader eyeballs.

Moritz Stefaner started off 2016 with a [very spiffy post](http://truth-and-beauty.net/experiments/ach-ingen-zell/) on _”a visual exploration of the spatial patterns in the endings of German town and village names”_. Moritz was [exploring some new data processing & visualization tools](https://github.com/moritzstefaner/ach-ingen-zell) for the post, but when I saw what he was doing I wondered how hard it would be to do something similar in R and also used it as an opportunity to start practicing a new habit in 2016: packages vs projects.

To state the goals for this homage more precisely, the plan was to:

– use as close to the same data sets as Moritz has in his github repo, _including_ the ones in pure javascript
– generate an HTML page as output that is as close as possible to the style of Moritz's visualization
– use R for _everything_ (i.e. no “cheating” by sneaking in some javascript via `htmlwidgets`)
– bundle everything into a package to take advantage of all the good stuff that comes with R package validation

You may want to [take a look at the result](http://rud.is/zellingenach.html) to see if you want to continue reading (I hope you will!).

### The Setup
By using an R package as the framework for the visualization, it's possible to keep the data with the code and also organize and document the code in a way that makes it easy for folks to use and explore without cutting and pasting (or `source`ing) code. It also makes it possible to list all the dependencies for the project and help ensure they'll be installed when someone tries to work with it.

While I _could_ have converted Moritz's processed data into R data files, I left both the CSV and the javascript file of suffix groupings intact to show that R is extremely flexible when it comes to data processing (which is a "duh" for most folks by this point, but the use of javascript data structures might give some folks ideas as to how to reduce data duplication between projects). Both these files get stored in the `inst/alt` folder of the source package. I also end up using some CSS for the final visualization and placed that into a file in the same directory, which makes the code that generates the HTML a bit cleaner.

Because R processes some things automatically (like `.onAttach`) when it interacts with a package, one can have it provide helpful instructions (in this case, how to generate the visualization) in similar fashion to the `ggplot2` loading messages.
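That hook is only a few lines; here's a sketch of the idea (the actual message text in the package may differ):

.onAttach <- function(libname, pkgname) {
  # shown when the package is attached, much like ggplot2's startup notes
  packageStartupMessage("Use display_maps() to generate the visualization.")
}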

Similarly, both the package itself and the package functions have documentation to help folks understand what the package and each component are doing.
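That documentation is plain roxygen, e.g. something along these lines (illustrative, not the package's verbatim docs):

#' Read in the German place names and attach suffix group ids
#'
#' @return a data.frame of place names, coordinates and matched suffix groups
#' @export
read_places <- function() {
  # ... (see the package source)
}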

### The Fun Stuff
The CSV file of places looks something like this:

name,latitude,longitude
Nierskanal,49.01,13.23
Zwiefelhof,49.22,11.18
Zwiefaltendorf,48.21,9.51
Zwiefalten,48.23,9.46
Zwiedorf,53.69,13.05
Zwickgabel,48.58,8.31
Zwickau,50.72,12.48
Zwethau,51.58,13.04
Zwesten,51.05,9.17

and, the suffix groupings list looks like this:

const suffixList = [
  ["ach", "a", "aa", "ah"],
  ["ar", "ahr"],
  ["ate", "te", "nit", "net"],
  ["au", "aue", "oog", "ooge", "ohe", "oie"],
  ["bach", "bach", "bek", "beken", "beck", "bke"],
  ["berg", "bergen", "barg", "bargen"],
  ["born", "bronn"],
  ["bruch", "broich", "brook", "brock", "brauk"],
  ["bruck", "brück", "brügge"],
  ...
];

While `read.csv` (no need for `readr` as the file is small) can handle the CSV file, we use the `V8` package to source the javascript and convert it to an R object:

library(V8)

# create a V8 context, source the javascript file from the package,
# then pull the suffixList object into R as a list
ct <- v8()
ct$source(system.file("alt/suffixlist.js", package="zellingenach"))
ct$get("suffixList")

We actually turn that into a vector of regular expressions (for town name ending checking) and a list of vectors (for the HTML visualization creation). Check out `suffix_regex()` and `suffix_names()` in the source code.
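Conceptually, the regex building boils down to something like this (a sketch, not the package's exact code):

# one "ends with any of these endings" regular expression per suffix group
suffix_regex <- function(suffix_list) {
  vapply(suffix_list, function(endings) {
    sprintf("(%s)$", paste(endings, collapse="|"))
  }, character(1))
}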

The `read_places()` function builds a `data.frame` of the places combined with the suffix grouping(s) they belong to:

library(stringi) # stri_detect_regex()
library(dplyr)   # filter() & mutate()

# `suf` is the vector of suffix-group regular expressions from suffix_regex()

# read in the file
plc <- read.csv(system.file("alt/placenames_de.tsv", package="zellingenach"),
                stringsAsFactors=FALSE)

# iterate over each suffix and identify which place names match the grouping
lapply(suf, function(regex) {
  which(stri_detect_regex(plc$name, regex))
}) -> matched_endings

plc$found <- ""

# add which grouping(s) the place was found in to a new column
for(i in seq_along(matched_endings)) {
  where_found <- matched_endings[[i]]
  plc$found[where_found] <-
    paste0(plc$found[where_found], sprintf("%d|", i))
}

# some don't match so get rid of them
mutate(filter(plc, found != ""), found=sub("\\|$", "", found))

I do something a bit different than Moritz in that I allow towns to be part of multiple suffix groups, since:

– I’m neither a historian nor expert in German town naming conventions, and
– the javascript version and this R version both take a naive approach to suffix mapping.

This means my numbers (for the _”#### places”_ label) will be different for some of my maps.

R has similar shortcut functions (Moritz uses D3) to make hexgrids out of shapefiles. Here's the entirety of `create_hexgrid()`:

library(raster) # getData()
library(sp)     # spsample() & HexPoints2SpatialPolygons()

de_shp <- getData("GADM", country="DEU", level=0, path=tempdir())

de_hex_pts <- spsample(de_shp, type="hexagonal", n=10000, cellsize=0.19,
                       offset=c(0.5, 0.5), pretty=TRUE)

HexPoints2SpatialPolygons(de_hex_pts)

You can play with `cellsize` to change the number of hexes. I tried to find a good number to get close to the number in Moritz's maps.

This all gets put together in `make_maps()` where we use `ggplot2` to build 52 gridded heatmaps (one for each suffix grouping). I used a log of the counts to map to a binned viridis color scale, so my colors come out a bit different than Moritz’s but the overall patterns are on par with his.

Finally, `display_maps()` takes the list created by `make_maps()` and builds out an HTML page using the `htmltools` package for the page framework and `svglite::htmlSVG` to make SVGs of the ggplot objects. NOTE that you can use the `output_file` option of `display_maps()` to send the HTML to a file as well as display it in the viewer/browser.
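The core idea is simple enough to sketch (the real `display_maps()` does more; `svglite::xmlSVG()` is the sibling of `htmlSVG()` that returns the SVG markup instead of displaying it):

library(htmltools)
library(svglite)
library(ggplot2)

# render a ggplot object to inline SVG markup
gg <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
svg_markup <- as.character(xmlSVG(print(gg), width=4, height=3))

# wrap the SVG (plus any CSS) in an HTML fragment and show it
html_print(tagList(tags$h3("panel title"), HTML(svg_markup)))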

### Fin
Because the project is in a package, we can run package checks to see if we're missing anything, including other package dependencies, function documentation and other details that the package tools are gleeful to point out. We can also include code to test out our various components to ensure they are behaving as expected (i.e. generating the right data/output).

One nice thing about the output is that it's "responsive", which means it handles multiple screen sizes quite well. So, if your screen is huge, you'll have many map boxes on one line and if it's small (like the `iframe` below) it will have fewer.

You'll see that my maps are a bit bigger than Moritz's. This is due to both the hex grid size and the fact that the SVG output is just slightly larger overall than the ones made by D3. Of note: I noticed some suffix subtitle components wrapped at the "-" so I converted the plain dashes to non-breaking ones (`&#8209;`).

The one downside to using a package for this is that it’s harder to post complete code into a blog post, but you can [clone the repo](https://github.com/hrbrmstr/zellingenach) to look at the code and skip the dissection and just generate the visualization locally via:

install.packages("ggalt")
# OR: devtools::install_github("hrbrmstr/ggalt") 
devtools::install_github("hrbrmstr/zellingenach")
display_maps()

By targeting SVG & HTML, we can make a cross-platform, crisp and responsive visualization all without leaving RStudio.

If you caught any errors or made something cool with any of the code, please drop an issue on github and a note in the comments (respectively)!

If you prefer a single- `source`-able version, please see [this gist](https://gist.github.com/hrbrmstr/f3d2568ad0f27b2384d3).

Happy New YeaR!

James Austin (@awhstin) made some #spiffy 4-panel maps with base R graphics but also posited he didn’t use ggplot2 because:

> ggplot2 and maps currently do not support world maps at this point, which does not give us a great overall view.

That is certainly a box I would not put ggplot2 into, especially with the newly updated R maps (et al) packages, ggplot2 2.0 and my (still in development) ggalt package (though this was all possible before ggplot2 2.0 and ggalt). NOTE: I have no idea why I get so defensive about ggplot2 besides the fact that it's one of the best visualization tools ever created.

Here’s all you need to use the built-in facet options of ggplot2 to make the 4-panel plot (as James points out, you can get the data file from here: CLIWOC15.csv):

library(ggplot2)  # FYI you need v2.0
library(dplyr)    # yes, i could have not done this and just used 'subset' instead of 'filter'
library(ggalt)    # devtools::install_github("hrbrmstr/ggalt")
library(ggthemes) # theme_map and tableau colors
 
world <- map_data("world")
world <- world[world$region != "Antarctica",] # intercourse antarctica
 
dat <- read.csv("CLIWOC15.csv")        # having factors here by default isn't a bad thing
dat <- filter(dat, Nation != "Sweden") # I kinda feel bad for Sweden but 4 panels look better than 5 and it doesn't have much data
 
gg <- ggplot()
gg <- gg + geom_map(data=world, map=world,
                    aes(x=long, y=lat, map_id=region),
                    color="white", fill="#7f7f7f", size=0.05, alpha=1/4)
gg <- gg + geom_point(data=dat, 
                      aes(x=Lon3, y=Lat3, color=Nation), 
                      size=0.15, alpha=1/100)
gg <- gg + scale_color_tableau()
gg <- gg + coord_proj("+proj=wintri")
gg <- gg + facet_wrap(~Nation)
gg <- gg + theme_map()
gg <- gg + theme(strip.background=element_blank())
gg <- gg + theme(legend.position="none")
gg

(figure: the four faceted world maps, one panel per nation)

You can use a separate shapefile if you want, but this is quite minimalist (a feature James suggests is desirable) and emphasizes the routes quite nicely IMO.

It’s been a while since I’ve updated my [metricsgraphics package](https://cran.r-project.org/web/packages/metricsgraphics/index.html). The hit list for changes includes:

– Fixes for the new ggplot2 release (metricsgraphics uses the `movies` data set, which is now in `ggplot2movies`)
– Updated all javascript libraries to the most recent versions
– Borrowed the ability to add CSS rules to a widget from taucharts (`mjs_add_css_rule`)
– Added a metricsgraphics plugin to enable line chart region annotation (`mjs_annotate_region`)
– Enabled explicit coloring of line/area charts (it was a new feature in the underlying Metrics-Graphics library)
– You can use bare or quoted names when specifying the x & y accessors and can also use a variable name
– You can now use the metricsgraphics title & description capabilities, but doing so voids any predictable/specified widget height/width and the description functionality is really only suited for bootstrap templates

I think all that can be demonstrated in the following snippet:

library(metricsgraphics)
library(dplyr) # for filter()

dat <- read.csv("http://real-chart.finance.yahoo.com/table.csv?s=AAPL&a=07&b=9&c=1996&d=11&e=21&f=2015&g=d&ignore=.csv",
                stringsAsFactors=FALSE)

DATE <- "Date" # x accessor via a variable...

dat %>%
  filter(Date>="2008-01-01") %>%
  mjs_plot(DATE, y="Low", title="AAPL Stock (2008-Present)", width=800, height=500) %>%
  mjs_line(color="#6a3d9a") %>%
  mjs_add_line(High, color="#ff7f00") %>% # ...and a bare name for the second line
  mjs_axis_x(xax_format="date") %>%
  mjs_add_css_rule("{{ID}} .blk { fill:black }") %>% # CSS rule support borrowed from taucharts
  mjs_annotate_region("2013-01-01", "2013-12-31", "Volatility", "blk") %>% # region annotation plugin
  mjs_add_marker("2014-06-09", "Split") %>%
  mjs_add_marker("2012-09-12", "iPhone 5") %>%
  mjs_add_legend(c("Low", "High"))

NOTE: I’m still trying to figure out why WebKit on Safari renders the em dashes and Chrome does not.