Visualizing Risky Words — Part 3

The DST changeover in the US has made today a fairly strange one, especially when combined with a very busy non-computing day yesterday. That strangeness manifest as a need to take the D3 heatmap idea mentioned in the [previous post](http://rud.is/b/2013/03/09/visualizing-risky-words-part-2/) and actually (mostly) implement it. Folks just coming to this thread may want to start with the [first post](http://rud.is/b/2013/03/06/visualizing-risky-words/) in the series.

I did a quick extraction of the R TermDocumentMatrix with nested for loops and then extracted the original texts of the corpus and put them into some javascript variables along with some D3 code to show how to do a [rudimentary interactive heatmap](http://rud.is/d3/vzwordshm/).

(click for larger)

(click for live demo)

As you mouse over each of the tiles they will be highlighted and the word/document will be displayed along with the frequency count. Click on the tile and the text will appear with the highlighted word.

Some caveats:

– The heatmap looks a bit different from the R one in part 2 as the terms/keywords are in alphabetical order.
– There are no column or row headers. I won’t claim anything but laziness, though the result does look a bit cleaner this way.
– I didn’t bother extracting the stemming results (it’s kind of a pain), so not all of the keywords will be highlighted when you select a tile.
– It’s one, big HTML document complete with javascript & data. That’s not a recommended practice in production code but it will make it easier for folks to play around with.
– The HTML is not commented well, but there’s really not much of it. The code that makes the heatmap is pretty straightforward if you even have a rough familiarity with D3. Basically, enumerate the data, generating a colored tile for each row/col entry and then mapping click & mouseover/out events to each generated SVG element.
– I also use jQuery in the code, but that’s not a real dependency. I just like using jQuery selectors for non-D3 graphics work. It bulks up the document, so use them together wisely. NOTE: If you don’t want to use jQuery, you’ll need to change the selector code.

Drop a note in the comments if you do use the base example and improve on it or if you have any questions on the code. For those interested, I’ll be putting all the code from the “Visualizing Risky Words” posts into a github repository at the end of the series.

Cover image from Data-Driven Security
Amazon Author Page

1 Comment Visualizing Risky Words — Part 3

  1. Pingback: Visualizing Risky Words | rud.is

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.