Skip navigation

Category Archives: phantomjs

>UPDATE: I rejiggered the function to actually now, y’know, do what it says it should do :-)

A friend, we’ll call him _Alen_ put a call out for some function that could take an image and produce a per-row “histogram” along the edge for the number of filled-in points. That requirement eventually scope-creeped to wanting “histograms” on both the edge and bottom. In, essence there was a desire to be able to compare the number of pixels in each row/line to each other.

Now, you’re all like _”Well, you used ggplot to make the image so…”_ Yeah, not so much. They had done some basic charting in D3. And, it turns out, that it would be handy to compare the data between different images since they had different sets of data they were charting in the same place.

I can’t show you their images as they are part of super seekrit research which will eventually solve world hunger and land a family on Mars. But, I _can_ do a minor re-creation. I made a really simple D3 page that draws random lines in a specified color. Like this:



You can view the source of to see the dead-simple D3 that generates that. You’ll see something different in that image every time since it’s javascript and js has no decent built-in random routines (well it does _now_ but the engine functionality in browsers hasn’t caught up yet). So, you won’t be able to 100% replicate the results below but it will work.

First, we need to be able to get the image from the `div` into a bitmap so we can do some pixel counting. We’ll use the new `webshot` package for that.

library(webshot)
 
tmppng1 <- tempfile(fileext=".png")
webshot("http://rud.is/projects/randomlines.html?linecol=f6743d", 
        file=tmppng1,
        selector="#vis")

The image that produced looks like this:

img1

To make the “histograms” on the right and bottom, we’ll use the `raster` capabilities in R to let us treat the data like a matrix so we can easily add columns and rows. I made a function (below) that takes in a `png` file and either a list of colors to look for or a list of colors to exclude and the color you want the “histograms” to be drawn in. This way you can just exclude the background and annotation colors or count specific sets of colors. The counting is fueled by `fastmatch` which makes for super-fast comparisons.

#' Make a "row color density" histogram for an image file
#' 
#' Takes a file path to a png and returns displays it with a histogram of 
#' pixel density
#' 
#' @param img_file path to png file
#' @param target_colors,ignore_colors colors to count or ignore. Either one should be 
#'        \code{NULL} or \code{ignore_colors} should be \code{NULL}. Whichever is
#'        not \code{NULL} should be a vector of hex strings (can be huge vector of 
#'        hex strings as it uses \code{fastmatch}). The alpha channel is thrown away 
#'        if any, so you only need to specify \code{#rrggbb} hex strings
#' @param color to use for the density histogram line
selective_image_color_histogram <- function(img_file, 
                                            target_colors=NULL,
                                            ignore_colors=c("#ffffff", "#000000"),
                                            hist_col="steelblue",
                                            plot=TRUE) {
 
  require(png)
  require(grid)
  require(raster)
  require(fastmatch)
  require(gridExtra)
 
  "%fmin%" <- function(x, table) { fmatch(x, table, nomatch = 0) > 0 }
  "%!fmin%" <- function(x, table) { !fmatch(x, table, nomatch = 0) > 0 }
 
  if (is.null(target_colors) & is.null(ignore_colors)) {
    stop("Only one of 'target_colors' or 'ignore_colors' can be 'NULL'", call.=FALSE)
  }
 
  # clean up params
  target_colors <- tolower(target_colors)  
  ignore_colors <- tolower(ignore_colors)  
 
  # read in file and convert to usable data structure  
  png_file <- readPNG(img_file)
  img <- substr(tolower(as.matrix(as.raster(png_file))), 1, 7)
 
  if (length(target_colors)==0) {
    tf_img <- matrix(img %!fmin% ignore_colors, nrow=nrow(img), ncol=ncol(img))
  } else {
    tf_img <- matrix(img %fmin% target_colors, nrow=nrow(img), ncol=ncol(img))
  }  
 
  # count the pixels
  wvals <- rowSums(tf_img)
  hvals <- colSums(tf_img)
 
  # add a slight right & bottom margin
  wdth <- max(wvals) + round(0.1*max(wvals))
  hght <- max(hvals) + round(0.1*max(hvals))
 
  # create the "histogram" 
  col_mat <- matrix(rep("#ffffff", wdth*nrow(img)), nrow=nrow(img), ncol=wdth)
  for (row in 1:nrow(img)) { 
    col_mat[row, 1:wvals[row]] <- hist_col
  }
 
  # make bigger image
  new_img <- cbind(img, col_mat)
 
  # create the "histogram"
  row_mat <- matrix(rep("#ffffff", hght*ncol(new_img)), ncol=ncol(new_img), nrow=hght)
  for (col in 1:ncol(img)) { 
    row_mat[1:hvals[col], col] <- hist_col
  }
 
  # make a new bigger image and turn it into something we can use with 
  # grid since we can also use it with ggplot this way if we really wanted to
  # and friends don't let friends use base graphics
  rg1 <- rasterGrob(rbind(new_img, row_mat))
 
  # if we want to plot it, now is the time
  if (plot) grid.arrange(rg1)
 
  # return a list with each "histogram"
  return(list(row_hist=wvals, col_hist=hvals))
 
}

After reading in the `png` as a raster, the function counts up all the specified pixels by row and extends the matrix width-wise. Then it does the same by column and extends the matrix height-wise. Finally, it makes a `rasterGrob` (b/c friends don’t let friends use base graphics) and optionally plots the output. It also returns the counts by row and by column. That will let us compare between images.

Now we can do:

a <- selective_image_color_histogram(tmppng, hist_col="#f6743d", plot=TRUE)

hist1

And, make a counterpart image for it:

tmppng2 <- tempfile(fileext=".png")
webshot("http://rud.is/projects/randomlines.html?linecol=80b1d4", 
        file=tmppng2,
        selector="#vis")
 
b <- selective_image_color_histogram(tmppng2, hist_col="#80b1d4", plot=TRUE)

hist2

You can definitely visually compare to see which ones had more “activity” in which row(s) (or column(s)) but why not let R do that for you (you’ll probably need to change the font to something boring like `”Helvetica”`)?

library(ggplot2)
library(dplyr)
 
gg <- ggplot(data_frame(x=1:length(a$row_hist),
                        diff=a$row_hist - b$row_hist,
                        `A vs B`=factor(sign(diff), levels=c(-1, 0, 1), 
                                        labels=c("A", "Neutral", "B"))))
gg <- gg + geom_segment(aes(x=x, xend=x, y=0, yend=diff, color=`A vs B`))
gg <- gg + scale_x_continuous(expand=c(0,0))
gg <- gg + scale_y_continuous(expand=c(0,0))
gg <- gg + scale_color_manual(values=c("#f6743d", "#2b2b2b", "#80b1d4"))
gg <- gg + labs(x="Row", y="Difference")
gg <- gg + coord_flip()
gg <- gg + ggthemes::theme_tufte(base_family="URW Geometric Semi Bold")
gg <- gg + theme(panel.grid=element_line(color="#2b2b2b", size=0.15))
gg <- gg + theme(panel.grid.major.y=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text.y=element_blank())
gg <- gg + theme(axis.title.x=element_text(hjust=0))
gg <- gg + theme(axis.title.y=element_text(hjust=0))
gg

vertdif

This way, you let the _power of data science_ show you the answer. (The column processing chart is an exercise left to the reader).

The code may only be useful to _Alen_, but it was a fun and quick enough exercise that I thought it might be useful to the broader community.

Poke holes or improve upon it at will and tell me how horrible my code is in the comments (I have not looked to see if I subtracted in the right direction as I’m on solo dad duty for a cpl days and #4 is hungry).

NOTE: you won’t need to use this function if you use the [development version](https://github.com/yihui/knitr) of `knitr`


Winston Chang released his [`webshot`](https://github.com/wch/webshot) package to CRAN this past week. The package wraps the immensely useful [`phantomjs`](http://phantomjs.org/) utility and makes it dirt simple to capture whole or partial web pages in R. One beautiful bonus feature of `webshot` is that you can install `phamtomjs` with it (getting `phantomjs` to work on Windows is a pain).

You can do many things with the `webshot` package but I hastily drafted this post to put forth a means to generate a static image from an `htmlwidget`. I won’t elaborate much since I included a fully `roxygen`-doc’d function below, but the essence of `capture_widget()` is to pass in an `htmlwidget` object and have it rendered for you to a `png` file and get back either:

– a file system `path` reference (e.g. `/path/to/widget.png`)
– a `markdown` image reference (e.g. `![](file:///path/to/widget.png)`)
– an `html` image reference (e.g. ``), or
– an `inline` base64 encoded HTML imgage reference (e.g. ``)

which you can then use in R markdown documents knitted to PDF (or in any other context).

Take a look at the function, poke the tyres and drop suggestions in the comments. I’ll add this to one of my widgets soon so folks can submit complaints or enhancements via issues & PRs on github).

To use the function, just pipe a sized widget to it and use the output from it.

#' Capture a static (png) version of a widget (e.g. for use in a PDF knitr document)
#'
#' Widgets are generally interactive beasts rendered in an HTML DOM with
#' javascript. That makes them unusable in PDF documents. However, many widgets
#' initial views would work well as static images. This function renders a widget
#' to a file and make it usable in a number of contexts.
#'
#' What is returned depends on the value of \code{output}. By default (\code{"path"}),
#' the full disk path will be returned. If \code{markdown} is specified, a markdown
#' string will be returned with a \code{file:///...} URL. If \code{html} is
#' specified, an \code{<img src='file:///...'/>} tag will be returned and if
#' \code{inline} is specified, a base64 encoded \code{<img>} tag will be returned
#' (just like you'd see in a self-contained HTML file from \code{knitr}).
#'
#' @importFrom webshot webshot
#' @importFrom base64 img
#' @param wdgt htmlwidget to capture
#' @param output how to return the results of the capture (see Details section)
#' @param height,width it's important for many widget to be responsive in HTML
#'        documents. PDFs are static beasts and having a fixed image size works
#'        better for them. \code{height} & \code{width} will be passed into the
#'        rendering process, which means you should probably specify similar
#'        values in your widget creation process so the captured \code{<div>}
#'        size matches the size you specify here.
#' @param png_render_path by default, this will be a temporary file location but
#'        a fully qualified filename (with extension) can be specified. It's up to
#'        the caller to free the storage when finished with the resource.
#' @return See Details
#' @export
capture_widget <- function(wdgt,
                           output=c("path", "markdown", "html", "inline"),
                           height, width,
                           png_render_path=tempfile(fileext=".png")) {
 
  wdgt_html_tf <- tempfile(fileext=".html")
 
  htmlwidgets::saveWidget(vl, wdgt_html_tf)
 
  webshot::webshot(url=sprintf("file://%s", wdgt_html_tf),
                   selector="#htmlwidget_container",
                   file=wdgt_png_tf,
                   vwidth=width, vheight=height)
 
  # done with HTML
  unlink(wdgt_html_tf)
 
  switch(match.arg(output, c("path", "markdown", "html", "inline")),
             `path`=png_render_path,
         `markdown`=sprintf("![widget](file://%s)", png_render_path),
             `html`=sprintf("<img src='file://%s'/>", png_render_path),
           `inline`=base64::img(wdgt_png_tf))
 
}

I’m an avid NPR listener also follow a number of their programs and people on Twitter. I really dig their [quotable](https://github.com/nprapps/quotable) tweets. Here’s a sample of a recent one:

I asked @brianboyer & @alykat (two member of the _awesome_ NPR Visuals team) if they had a workflow for this (they publish quite a bit of open source tools) and, well [of course they did](https://twitter.com/hrbrmstr/status/596656939720962048).

After poking around a bit, I determined that the workflow would be great for something like NPR but was a bit overkill for me and also did not meet all of my requirements. I really only needed something that would:

– work from the command-line (on OS X)
– take a JSON file for the quote
– generate an SVG (for blogging) and PNG (for tweeting)

Thus, [quotebox](http://github.com/hrbrmstr/quotebox) was born.

For the time being, `quotebox` is a python script that lets you pass in a JSON config file with quote components and spits out some files to use in blog posts & tweets. I can see it morphing into more, but for now you can create a quote file like:

{
  "quote" : "Making these quotes by hand is fine &amp; NPR Visuals web app is cool, but this has some advantages, too.",
  "source" : "Data-Driven Security co-author Bob Rudis on making “NPR-like quoteboxes” at the command-line.", 
  "logo" : "ddsec.png"
}

which will generate:

test

or (SVG version):

test

The script auto-adds “smart” quotes and an em-dash plus base64 encodes the logo file to a data URI so you can easily use the SVG file without needing to bring along the associated logo asset. The SVG itself has a couple (at least to me) interesting components. First, it uses the [Lato](https://www.google.com/fonts/specimen/Lato) font from Google Fonts via `defs`:

<defs>
  <style type="text/css">@import url('http://fonts.googleapis.com/css?family=Lato:300,900');</style>
</defs>

It then uses two `foreignObject` tags for the main quote and the attribution line. Here’s an example:

<foreignObject x="25" y="25" width="590" height="165">
  <body xmlns="http://www.w3.org/1999/xhtml"
    style="background: white; font-family: Lato; font-weight:900; font-stretch:normal; font-size:36px; line-height:36px; text-indent:-16px; margin-left:20px; color:#333333">
    <p>&#8220;Making these quotes by hand is fine &amp; NPR Visuals web app is cool, but this has some advantages, too.&#8221;</p>
  </body>
</foreignObject>

Like most SVG elements, you have full precision as to how you place and size the `foreignObject` and this object contains XHTML, so all text or graphics in them can be fully styled. Not every browser or command-line rendering engine supports this, but the latest Chrome & Safari browsers do along with the >=2.0.0 of [phantomjs](http://phantomjs.org/).

Once the SVG is built, the script uses phantomjs to render the resultant png file. You now have two assets (SVG & png) that can be used wherever you like, but posting it as a Twitter image post will require the use of png.

### Requirements & Usage

You’ll need python, [Pillow](https://python-pillow.github.io/) and [phantomjs](http://phantomjs.org/) to run the script and just need to execute

python quotebox.py test.json

to generate `test.svg` and `test.png`. On OS X I was using

python quotebox.py test.json && open test.svg && open test.png

to test out the image generation & conversion.

Note that many (most?) folks should use the NPR Visuals [quotable](https://github.com/nprapps/quotable) instead of this since it has full workflow management. This script was created primarily to oversimplify local usage (I don’t work in a newsroom) and play with SVG.

The python is not bullet proof (no error checking, lack of use of tempfiles/dirs) but it’s a decent basic foundation.

### Future work

There will definitely be command line switches for some options (quotes or no quotes) and possibly an extension to the JSON config file for fonts and colors. I’ll think supporting grabbing a logo from a URL is needed and automagic resizing to a series of constraints. I can also envision spinning up a quotebox app on Heroku or Google Apps and bridging it to a bookmarklet, but fork away and PR as you see fit. The creativity & ingenuity of the community knows no bounds.