Category Archives: d3

UPDATE: Added some extra visualization elements since this post went live. New select menu and hover text for individual job impact detail lines in the table.

I was reviewing RSS feeds when I came across this story about “ObamaCare Employer Mandate: A List Of Cuts To Work Hours, Jobs” over on Investors.com. Efficacy of the law notwithstanding, I thought it might be interesting to visualize the data since the folks over at Investors.com provided a handy spreadsheet that they seem to maintain pretty well (link is in the article).

The spreadsheet is organized by date and lists each state where the jobs were impacted along with the employer, employer type (public/private), reason and number of jobs impacted (if available). They also have links to news stories related to each entry.

My first thought was to compare impact across states by date, so I threw together a quick R script to build a faceted bar chart:

library(ggplot2)
library(plyr)
 
# Source for job impact data:
# http://news.investors.com/politics-obamacare/092513-669013-obamacare-employer-mandate-a-list-of-cuts-to-work-hours-jobs.htm
 
emp.f <- read.csv("~/employers.csv", stringsAsFactors=FALSE)
colnames(emp.f) <- c("State","Employer","Type","Action","Jobs.Cut","Action.Date")
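# fill missing job counts with the median; round so the later sprintf("%d") calls get whole numbers
emp.f$Jobs.Cut[is.na(emp.f$Jobs.Cut)] <- round(median(emp.f$Jobs.Cut, na.rm=TRUE))
# vector-style assignment is a harmless no-op (not an error) when no rows match
emp.f$State[emp.f$State=="Virgina"] <- "Virginia"
emp.f$State[emp.f$State=="Washington DC"] <- "District of Columbia"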

Yes, they really spelled “Virginia” wrong, at least in the article text where I initially scraped the data from before I saw there was a spreadsheet available. Along with fixing “Virginia”, I also changed the name of “Washington DC” to “District of Columbia” for reasons you’ll see later on in this post. I’m finding it very helpful to do as much of the data cleanup in-code (R or Python) whenever possible since it makes the process far more repeatable than performing the same tasks by hand in a text editor and is essential if you know the data is going to change/expand.

After reading in the data, it was trivial to get a ggplot of the job impacts by state (click image for larger version):

p <- ggplot(emp.f, aes(x=Action.Date, y=Jobs.Cut))
p <- p + geom_bar(aes(fill=State), stat="identity")
p <- p + facet_wrap(~State)
p <- p + theme_bw()
p <- p + theme(legend.position="none", axis.text.x = element_text(angle = 90))
p <- p + labs(x="Action Date", y="# Jobs Cut")
p

[Figure: faceted bar chart of jobs cut by state and action date]

That visualization provided some details, but I decided to expand the scope a bit and wanted to make an interactive “bubble chart” (since folks seem to love bubbles) with circle size relative to the total job cuts per state and circle color reflecting the conservative/liberal leaning of each state (i.e. ‘red’ vs ‘blue’) to see if there was any visual correlation by that attribute. I found the political data over at Gallup and went to work prepping the data with some additional R code. (NOTE: The Gallup data was the reason for the “DC” name change since Gallup uses “District of Columbia” in their data set.)

# aggregate state data
emp.state.sum.df <- count(emp.f,c("State"),c("Jobs.Cut"))
colnames(emp.state.sum.df) <- c("State","Total.Jobs.Cut")
 
# get total (estimated) jobs impacted
total.jobs <- sum(emp.state.sum.df$Total.Jobs.Cut)
 
# Source for the red v blue state data:
# http://www.gallup.com/poll/125066/State-States.aspx
# read political leanings
red.blue.df <- read.csv("~/red-blue.csv", stringsAsFactors=FALSE)
 
# join the jobs and leaning data together
s <- join(emp.state.sum.df, red.blue.df, by="State")
 
# cheat and get leaning range for manual input into the datavis
leaning.range <- range(s$Conservative.Advantage)
 
# build the JSON data file. store state summary data for the bubbles, but also include
# the detail level for extra data for the viz
# need to clean up this file post-write and definitely run it through http://jsonlint.com/
jsfile = file("states.tmp","w")
by(s, 1:nrow(s), function(row) {
  writeLines(sprintf('      {"name": "%s", "size":%d, "leaning":%2.1f, "detail":[',row$State,row$Total.Jobs.Cut,row$Conservative.Advantage),jsfile)
  employers = emp.f[emp.f$State == row$State,]
  by(employers, 1:nrow(employers), function(emp.row) {
    writeLines(sprintf('          { "employer":"%s", "emptype":"%s", "actiondetail":"%s", "jobsimpacted":%d, "when":"%s"},',
                       emp.row$Employer, emp.row$Type, gsub('"',"'",emp.row$Action), emp.row$Jobs.Cut, emp.row$Action.Date),jsfile)
 
  })
  writeLines("]},\n",jsfile)   
})
close(jsfile)

I know the comments point out the need to tweak the resulting JSON a bit (mostly to remove “errant” commas, which is one of the annoying bits about JSON), but I wanted to re-emphasize the huge utility of JSONlint as it can save you a great deal of time debugging large amounts of gnarly JSON data.
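One way to sidestep the errant-comma cleanup is to build the nested structure in memory and let a serializer write it out. Below is a minimal JavaScript sketch of that idea, not the script used for this post: the sample rows and the `buildStates` helper are made up for illustration, and the `leaning` field is left out for brevity. The same approach should carry over to R with a JSON package.

```javascript
// Hypothetical rows, mirroring the columns used in the R code above.
const rows = [
  { State: "Virginia", Employer: "Example County", Type: "Public",
    Action: 'Cut hours to "29"', JobsCut: 10, ActionDate: "2013-08-01" },
  { State: "Virginia", Employer: "Example Diner", Type: "Private",
    Action: "Reduced shifts", JobsCut: 5, ActionDate: "2013-09-01" },
  { State: "Ohio", Employer: "Example College", Type: "Public",
    Action: "Capped adjunct hours", JobsCut: 20, ActionDate: "2013-07-15" }
];

// Group detail rows under one object per state and total the job counts.
function buildStates(rows) {
  const byState = new Map();
  for (const r of rows) {
    if (!byState.has(r.State)) {
      byState.set(r.State, { name: r.State, size: 0, detail: [] });
    }
    const state = byState.get(r.State);
    state.size += r.JobsCut;
    state.detail.push({
      employer: r.Employer,
      emptype: r.Type,
      actiondetail: r.Action,
      jobsimpacted: r.JobsCut,
      when: r.ActionDate
    });
  }
  return Array.from(byState.values());
}

const states = buildStates(rows);
// JSON.stringify escapes embedded quotes and never emits trailing commas.
const json = JSON.stringify(states, null, 2);
console.log(json);
```

Because the serializer handles quoting, the embedded double quotes in the action text survive intact instead of needing to be swapped for single quotes.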

With the data prepped, I threw together a D3 visualization that shows the bubbles on the left and details by date and employer on the right.

[Figure: screenshot of the D3 bubble chart and detail table]

Since it’s D3, there’s no need to put the source code in the blog post. Just do a “view-source” on the resulting visualization or poke around the github repository. I will, however, point out a couple useful/interesting bits from the code.

First, coloring circles by political leaning took exactly one line of code since D3 provides a means to map a range of values to colors:

var ramp = d3.scale.linear().domain([-21,36]).range(["#253494","#B30000"]);

I chose the colors with Color Brewer but cheated (as I indicated in the R code) by pre-computing the range of the values for the palette. You can see the tiny District of Columbia’s very blue circle in the lower-left of the field of circles. Hopefully Investors.com will maintain the data set and we can look at changes over a larger period of time.
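For anyone curious what that one-liner is doing, here is a rough plain-JavaScript sketch of a linear color ramp: a straight RGB interpolation between the two endpoint colors. It is an approximation for illustration, not D3's implementation, and the helper names (`hexToRgb`, `rgbToHex`, `colorRamp`) are invented here.

```javascript
// Parse "#RRGGBB" into [r, g, b].
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Format [r, g, b] back to lowercase "#rrggbb".
function rgbToHex(rgb) {
  return "#" + rgb.map(v => v.toString(16).padStart(2, "0")).join("");
}

// Build a function mapping a value in [d0, d1] to a color between the two
// range colors. No clamping: values outside the domain are not handled.
function colorRamp(domain, range) {
  const [d0, d1] = domain;
  const from = hexToRgb(range[0]);
  const to = hexToRgb(range[1]);
  return function (value) {
    const t = (value - d0) / (d1 - d0);
    return rgbToHex(from.map((c, i) => Math.round(c + (to[i] - c) * t)));
  };
}

// Same domain and colors as the D3 line above.
const leaningRamp = colorRamp([-21, 36], ["#253494", "#B30000"]);
console.log(leaningRamp(-21), leaningRamp(0), leaningRamp(36));
```

The two ends of the domain map straight to the two palette colors, and everything in between is a blend, which is why the range of the leaning values has to be known up front.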

Second, you get rudimentary “popups” for free via element “title” tags on the SVG circles, so no need for custom tooltip code:

node.append("title")
   .text(function(d) { return d.stateName + ": " + format(d.value) + " jobs impacted"; });

I could have tweaked the display a bit more, added links to the stories and provided a means to sort the “# Jobs” column by count or date, but I took enough time away from the book to scratch this visualization itch and it came out pretty much the way I wanted it to.

If you do hack at it and build something better (which should not be terribly difficult), drop a note in the comments or over at github.

Many thanks to all who attended the talk @jayjacobs & I gave at @Secure360 on Wednesday, May 15, 2013. As promised, here are the [slides](https://dl.dropboxusercontent.com/u/43553/Secure360-2013.pdf).

We’ve enumerated quite a bit of non-slide-but-in-presentation information that we wanted to aggregate into a blog post so you can vi[sz] along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at [Coursera](https://www.coursera.org/).

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up, and the bundle may duplicate some of the ones below.

People Mentioned

– [Stephen Few’s Perceptual Edge blog](http://www.perceptualedge.com/) : Start from the beginning to learn from a giant in information visualization
– [Andy Kirk’s Visualising Data blog](http://www.visualisingdata.com/) (@visualisingdata) : Perhaps the quintessential leader in the modern visualization movement.
– [Mike Bostock’s blog](http://bost.ocks.org/mike/) (@mbostock) : Creator of D3 and producer of amazing, interactive graphics for the @NYTimes
– [Edward Tufte’s blog](http://www.edwardtufte.com/tufte/) : The father of what we would now identify as our core visualization principles & practices.
– [Nathan Yau’s Flowing Data blog](http://flowingdata.com/) : Making visualization accessible, practical and repeatable.
– [Data Stories Podcast](http://datastori.es/) : Yes, you can learn much about data visualization from an audio podcast (@datastories)
– [storytelling with data](http://www.storytellingwithdata.com/) (@storywithdata) : Extremely practical blog by Cole Nussbaumer that will especially help folks “stuck” in Excel
– [Jay’s blog](http://beechplane.wordpress.com/)
– [My {this} blog](http://rud.is/b)

Tools Mentioned

– [R](http://www.r-project.org/) : Jay & I probably use this a bit too much as a hammer (i.e. treat every data project as a nail) but it’s just far too flexible and powerful to not use as a go-to resource
– [RStudio](http://www.rstudio.com/) : An *amazing* IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to set up a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating [reproducible research](http://cran.r-project.org/web/views/ReproducibleResearch.html) a joy with built-in easy access to tools like [knitr](http://yihui.name/knitr/).
– [IPython](http://ipython.org/) : This interactive shell for Python takes an already amazing language and kicks it up a few notches. It brings it up to the level of R+RStudio, especially with its knitr-like [IPython Notebooks](http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html) for–again–reproducible research.
– [SecViz](http://secviz.org/) : Security-centric Visualization Site & Tools by @raffaelmarty
– [Mondrian](http://www.theusrus.de/Mondrian/) : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster than typing R commands or fumbling in Excel.
– [Tableau](http://www.tableausoftware.com/) : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
– [Processing](http://processing.org/) : A tool that was designed from the ground up to help journalists create powerful, interactive data visualizations that you can slipstream directly onto the web via the [Processing.js](http://processingjs.org/) library.
– [D3](http://d3js.org/) : The foundation of modern, data-driven visualization on the web.
– [Gephi](https://gephi.org/) : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
– [MongoDB](http://www.mongodb.org/) : NoSQL database that’s highly & easily scalable without a steep learning curve.
– [CRUSH Tools by Google](https://code.google.com/p/crush-tools/) : Kicks up your command-line data munging.

Many thanks to all who attended the talk @jayjacobs & I gave at @SOURCEconf on Thursday, April 18, 2013. As promised, here are the [slides](https://dl.dropboxusercontent.com/u/43553/SOURCE-Boston-2013.pdf) which should be much less washed out than the projector version :-)

We’ve enumerated quite a bit of non-slide-but-in-presentation information that we wanted to aggregate into a blog post so you can viz along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at [Coursera](https://www.coursera.org/).

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up, and the bundle may duplicate some of the ones below.

People Mentioned

– [Stephen Few’s Perceptual Edge blog](http://www.perceptualedge.com/) : Start from the beginning to learn from a giant in information visualization
– [Andy Kirk’s Visualising Data blog](http://www.visualisingdata.com/) (@visualisingdata) : Perhaps the quintessential leader in the modern visualization movement.
– [Mike Bostock’s blog](http://bost.ocks.org/mike/) (@mbostock) : Creator of D3 and producer of amazing, interactive graphics for the @NYTimes
– [Edward Tufte’s blog](http://www.edwardtufte.com/tufte/) : The father of what we would now identify as our core visualization principles & practices.
– [Nathan Yau’s Flowing Data blog](http://flowingdata.com/) : Making visualization accessible, practical and repeatable.
– [Jay’s blog](http://beechplane.wordpress.com/)
– [My {this} blog](http://rud.is/b)

Tools Mentioned

– [R](http://www.r-project.org/) : Jay & I probably use this a bit too much as a hammer (i.e. treat every data project as a nail) but it’s just far too flexible and powerful to not use as a go-to resource
– [RStudio](http://www.rstudio.com/) : An *amazing* IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to set up a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating [reproducible research](http://cran.r-project.org/web/views/ReproducibleResearch.html) a joy with built-in easy access to tools like [knitr](http://yihui.name/knitr/).
– [IPython](http://ipython.org/) : This interactive shell for Python takes an already amazing language and kicks it up a few notches. It brings it up to the level of R+RStudio, especially with its knitr-like [IPython Notebooks](http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html) for–again–reproducible research.
– [SecViz](http://secviz.org/) : Security-centric Visualization Site & Tools by @raffaelmarty
– [Mondrian](http://www.theusrus.de/Mondrian/) : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster than typing R commands or fumbling in Excel.
– [Tableau](http://www.tableausoftware.com/) : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
– [Processing](http://processing.org/) : A tool that was designed from the ground up to help journalists create powerful, interactive data visualizations that you can slipstream directly onto the web via the [Processing.js](http://processingjs.org/) library.
– [D3](http://d3js.org/) : The foundation of modern, data-driven visualization on the web.
– [Gephi](https://gephi.org/) : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
– [MongoDB](http://www.mongodb.org/) : NoSQL database that’s highly & easily scalable without a steep learning curve.
– [CRUSH Tools by Google](https://code.google.com/p/crush-tools/) : Kicks up your command-line data munging.

This is the fourth post in my [Visualizing Risky Words](http://rud.is/b/2013/03/06/visualizing-risky-words/) series. You’ll need to read starting from that link for context if you’re just jumping in now.

I was going to create a rudimentary version of an interactive word tree for this, but the extremely talented @jasondavies (I marvel especially at his cartographic work) just posted what is probably the best online [word tree generator](https://www.jasondavies.com/wordtree/) ever made…and in D3 no less.

[Figure: word tree]

A word tree is a “visual interactive concordance” and was created back in 2007 by Martin Wattenberg and Fernanda Viégas. You can [read more about](http://hint.fm/projects/wordtree/) this technique on your own, but a good summary (from their site) is:

A word tree is a visual search tool for unstructured text, such as a book, article, speech or poem. It lets you pick a word or phrase and shows you all the different contexts in which it appears. The contexts are arranged in a tree-like branching structure to reveal recurrent themes and phrases.
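To make the branching idea concrete, here is a toy JavaScript sketch that collects the words following each occurrence of a root word into a nested tree. It is only an illustration of the concept, not Jason’s code or the original IBM implementation; the sample sentence and the `buildWordTree` helper are invented, and sentence boundaries are ignored.

```javascript
// Split text into lowercase word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Null-prototype children avoid clashes with words like "constructor".
function makeNode(word) {
  return { word: word, count: 0, children: Object.create(null) };
}

// Build a nested tree of the words that follow each occurrence of `root`,
// up to `depth` words deep. Each node records how often that branch occurs.
function buildWordTree(text, root, depth) {
  const tokens = tokenize(text);
  const tree = makeNode(root);
  tokens.forEach((tok, i) => {
    if (tok !== root) return;
    tree.count += 1;
    let node = tree;
    for (const next of tokens.slice(i + 1, i + 1 + depth)) {
      if (!node.children[next]) {
        node.children[next] = makeNode(next);
      }
      node = node.children[next];
      node.count += 1;
    }
  });
  return tree;
}

const sample = "Apply the patch now. The patch fixes a flaw. The patch fixes nothing else.";
const wordTree = buildWordTree(sample, "patch", 2);
console.log(JSON.stringify(wordTree, null, 2));
```

Branches that share a prefix (here, “patch fixes …”) are merged and counted, which is what lets a word tree scale the font size of recurring phrases.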

I pasted the VZ RISK INTSUM texts into Jason’s tool so you could investigate the corpus to your heart’s content. I would suggest exploring “patch”, “vulnerability”, “adobe”, “breach” & “malware” (for starters).

Jason’s implementation is nothing short of beautiful. He uses SVG text tspans to make the individual text elements not just selectable but easily scalable with browser window resize events.

The actual [word tree D3 JavaScript code](http://www.jasondavies.com/wordtree/wordtree.js?20130312.1) shows just how powerful the combination of the language and @mbostock’s library is. He has, in essence, built a completely cross-platform tokenizer and interactive visualization tool in ~340 lines of JavaScript. Working through that code until you understand it will really help improve your D3 skills.

The DST changeover in the US has made today a fairly strange one, especially when combined with a very busy non-computing day yesterday. That strangeness manifested as a need to take the D3 heatmap idea mentioned in the [previous post](http://rud.is/b/2013/03/09/visualizing-risky-words-part-2/) and actually (mostly) implement it. Folks just coming to this thread may want to start with the [first post](http://rud.is/b/2013/03/06/visualizing-risky-words/) in the series.

I did a quick extraction of the R TermDocumentMatrix with nested for loops and then extracted the original texts of the corpus and put them into some javascript variables along with some D3 code to show how to do a [rudimentary interactive heatmap](http://rud.is/d3/vzwordshm/).

As you mouse over each of the tiles they will be highlighted and the word/document will be displayed along with the frequency count. Click on the tile and the text will appear with the highlighted word.

Some caveats:

– The heatmap looks a bit different from the R one in part 2 as the terms/keywords are in alphabetical order.
– There are no column or row headers. I won’t claim anything but laziness, though the result does look a bit cleaner this way.
– I didn’t bother extracting the stemming results (it’s kind of a pain), so not all of the keywords will be highlighted when you select a tile.
– It’s one big HTML document complete with javascript & data. That’s not a recommended practice in production code but it will make it easier for folks to play around with.
– The HTML is not commented well, but there’s really not much of it. The code that makes the heatmap is pretty straightforward if you have even a rough familiarity with D3. Basically, it enumerates the data, generates a colored tile for each row/col entry and then maps click & mouseover/out events to each generated SVG element.
– I also use jQuery in the code, but that’s not a real dependency. I just like using jQuery selectors for non-D3 graphics work. It bulks up the document, so use them together wisely. NOTE: If you don’t want to use jQuery, you’ll need to change the selector code.
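The “enumerate the data, generate a tile per entry” step can be sketched in plain JavaScript as below. This is a simplified illustration rather than the code in the demo: the tiny term/document matrix, the tile size and the white-to-red color rule are all made-up assumptions.

```javascript
// Hypothetical term-by-document counts (rows = terms, columns = documents).
const terms = ["adobe", "breach", "patch"];
const docs = ["doc1", "doc2"];
const counts = [
  [0, 3],
  [2, 1],
  [4, 0]
];

const tileSize = 12; // pixels per tile (arbitrary)
const maxCount = Math.max(...counts.map(row => Math.max(...row)));

// Map a count to a white-to-red fill; higher counts give a stronger red.
function fillFor(count) {
  const level = Math.round(255 * (1 - count / maxCount));
  return `rgb(255,${level},${level})`;
}

// One tile per term/document cell, with the position and color a renderer needs.
const tiles = [];
terms.forEach((term, row) => {
  docs.forEach((doc, col) => {
    tiles.push({
      term: term,
      doc: doc,
      count: counts[row][col],
      x: col * tileSize,
      y: row * tileSize,
      fill: fillFor(counts[row][col])
    });
  });
});

console.log(tiles);
```

With D3 in the page, an array like `tiles` could then be bound to `rect` elements through a data join, with click and mouseover/out handlers attached to each one via `.on()`.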

Drop a note in the comments if you do use the base example and improve on it or if you have any questions on the code. For those interested, I’ll be putting all the code from the “Visualizing Risky Words” posts into a github repository at the end of the series.

NOTE: Parts [2], [3] & [4] are also now up.

This post was inspired by a post by @bfist, who created the following word cloud in Ruby from VZ RISK INTSUM posts (visit the link or select the visualization to go to the post):

[Figure: @bfist’s word cloud of VZ RISK INTSUM posts]

I ♥ word clouds as much as anyone and usually run Presidential proclamations & SOTU addresses through a word cloud generator just to see what the current year’s foci are.

However, word clouds rarely provide what the creator of the visualization intends. Without performing more strict corpus analysis, one is really just getting a font-based frequency counter. While pretty, it’s not a good idea to derive too much meaning from a simple frequency count since there are phrase & sentence structure components to consider as well as concepts such as stemming (e.g. “risks” & “risk” are most likely the same thing, one is just plural…that’s a simplistic definition/example, though).
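To show how much (and how little) a bare frequency count plus simplistic stemming gives you, here is a small JavaScript sketch. It is deliberately naive and is not the R analysis behind this post: the stopword list, the sample sentence and the strip-a-trailing-“s” rule are all assumptions made for illustration.

```javascript
// Hypothetical stopword list; a real analysis would use a much longer one.
const stopwords = new Set(["the", "a", "of", "and", "is", "this", "week"]);

// Very naive "stemmer": strips one trailing "s" from words longer than
// three letters (but leaves words ending in "ss" alone).
function naiveStem(word) {
  return word.length > 3 && word.endsWith("s") && !word.endsWith("ss")
    ? word.slice(0, -1)
    : word;
}

// Count stemmed, non-stopword terms and return [term, count] pairs,
// most frequent first.
function termFrequencies(text) {
  const freq = new Map();
  for (const raw of text.toLowerCase().match(/[a-z]+/g) || []) {
    if (stopwords.has(raw)) continue;
    const stem = naiveStem(raw);
    freq.set(stem, (freq.get(stem) || 0) + 1);
  }
  return [...freq.entries()].sort((a, b) => b[1] - a[1]);
}

const text = "The risk of a breach is a risk. Risks and breaches this week.";
const frequencies = termFrequencies(text);
console.log(frequencies);
```

“risk” and “risks” collapse into one term as intended, but “breaches” becomes “breache” and is counted separately from “breach”, which is exactly the kind of miss that makes a proper stemming algorithm worth using.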

I really liked Jim Vallandingham’s Building a Bubble Cloud walkthrough on how he made a version of @mbostock’s NYTimes convention word counts, so I decided to run both a rudimentary stem and a principal component analysis [PDF] on the VZ RISK INTSUM corpus to find the core text influencers, then feed the output to a modified version of the bubble cloud:

[Figure: bubble cloud of the stemmed VZ RISK INTSUM terms]

You can select the graphic to go to the “interactive”/larger version of it. I had intended to make selecting a circle bring up the relevant documents from the post corpus, but that will have to be a task for another day.

It’s noteworthy that both @bfist’s work and this modified version share many of the same core “important” words. With some stemming refinement and further stopword removal (e.g. “week” was in the original run of this visualization and is of no value for this risk-oriented visualization, so I made it part of the stopword list), this could be a really good way to get an overview of what the risky year was all about.

I won’t promise anything, but I’ll try to get the R code cleaned up enough to post. It’s really basic tm & PCA work, so no rocket-science degree is required. Fork @vlandham’s github repo & follow the aforelinked tutorial for the crunchy D3-goodness bits.

Many thanks to all who attended the talk @jayjacobs & I gave at RSA on Tuesday, February 26, 2013. It was really great to be able to talk to so many of you afterwards as well.

We’ve enumerated quite a bit of non-slide-but-in-presentation information that we wanted to aggregate into a blog post so you can viz along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at [Coursera](https://www.coursera.org/).

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up, and the bundle may duplicate some of the ones below.

People Mentioned

– [Stephen Few’s Perceptual Edge blog](http://www.perceptualedge.com/) : Start from the beginning to learn from a giant in information visualization
– [Andy Kirk’s Visualising Data blog](http://www.visualisingdata.com/) (@visualisingdata) : Perhaps the quintessential leader in the modern visualization movement.
– [Mike Bostock’s blog](http://bost.ocks.org/mike/) (@mbostock) : Creator of D3 and producer of amazing, interactive graphics for the @NYTimes
– [Edward Tufte’s blog](http://www.edwardtufte.com/tufte/) : The father of what we would now identify as our core visualization principles & practices

Tools Mentioned

– [R](http://www.r-project.org/) : Jay & I probably use this a bit too much as a hammer (i.e. treat every data project as a nail) but it’s just far too flexible and powerful to not use as a go-to resource
– [RStudio](http://www.rstudio.com/) : An *amazing* IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to set up a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating [reproducible research](http://cran.r-project.org/web/views/ReproducibleResearch.html) a joy with built-in easy access to tools like [knitr](http://yihui.name/knitr/).
– [IPython](http://ipython.org/) : This interactive shell for Python takes an already amazing language and kicks it up a few notches. It brings it up to the level of R+RStudio, especially with its knitr-like [IPython Notebooks](http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html) for–again–reproducible research.
– [Mondrian](http://www.theusrus.de/Mondrian/) : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster than typing R commands or fumbling in Excel.
– [Tableau](http://www.tableausoftware.com/) : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
– [Processing](http://processing.org/) : A tool that was designed from the ground up to help journalists create powerful, interactive data visualizations that you can slipstream directly onto the web via the [Processing.js](http://processingjs.org/) library.
– [D3](http://d3js.org/) : The foundation of modern, data-driven visualization on the web.
– [Gephi](https://gephi.org/) : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
– [MongoDB](http://www.mongodb.org/) : NoSQL database that’s highly & easily scalable without a steep learning curve.
– [CRUSH Tools by Google](https://code.google.com/p/crush-tools/) : Kicks up your command-line data munging.

The small igraph visualization in the previous post shows the basics of what you can do with the BulkOrigin & BulkPeer functions, and I thought a larger example with some basic D3 tossed in might be even more useful.

Assuming you have the previous functions in your environment, the following builds a larger graph structure (the IPs came from an overnight sample of pcap captured communication between my MacBook Pro & cloud services) and plots a similar circular graph:

library(igraph)
 
ips = c("100.43.81.11","100.43.81.7","107.20.39.216","108.166.87.63","109.152.4.217","109.73.79.58","119.235.237.17","128.12.248.13","128.221.197.57","128.221.197.60","128.221.224.57","129.241.249.6","134.226.56.7","137.157.8.253","137.69.117.58","142.56.86.35","146.255.96.169","150.203.4.24","152.62.109.57","152.62.109.62","160.83.30.185","160.83.30.202","160.83.72.205","161.69.220.1","168.159.192.57","168.244.164.254","173.165.182.190","173.57.120.151","175.41.236.5","176.34.78.244","178.85.44.139","184.172.0.214","184.72.187.192","193.164.138.35","194.203.96.184","198.22.122.158","199.181.136.59","204.191.88.251","204.4.182.15","205.185.121.149","206.112.95.181","206.47.249.246","207.189.121.46","207.54.134.4","209.221.90.250","212.36.53.166","216.119.144.209","216.43.0.10","23.20.117.241","23.20.204.157","23.20.9.81","23.22.63.190","24.207.64.10","24.64.233.203","37.59.16.223","49.212.154.200","50.16.130.169","50.16.179.34","50.16.29.33","50.17.13.221","50.17.43.219","50.18.234.67","63.71.9.108","64.102.249.7","64.31.190.1","65.210.5.50","65.52.1.12","65.60.80.199","66.152.247.114","66.193.16.162","66.249.71.143","66.249.71.47","66.249.72.76","66.41.34.181","69.164.221.186","69.171.229.245","69.28.149.29","70.164.152.31","71.127.49.50","71.41.139.254","71.87.20.2","74.112.131.127","74.114.47.11","74.121.22.10","74.125.178.81","74.125.178.82","74.125.178.88","74.125.178.94","74.176.163.56","76.118.2.138","76.126.174.105","76.14.60.62","76.168.198.238","76.22.130.45","77.79.6.37","81.137.59.193","82.132.239.186","82.132.239.97","8.28.16.254","83.111.54.154","83.251.15.145","84.61.15.10","85.90.76.149","88.211.53.36","89.204.182.67","93.186.30.114","96.27.136.169","97.107.138.192","98.158.20.231","98.158.20.237")
origin = BulkOrigin(ips)
peers = BulkPeer(ips)
 
g = graph.empty() + vertices(ips,size=10,color="red",group=1)
g = g + vertices(unique(c(peers$Peer.AS, origin$AS)),size=10,color="lightblue",group=2)
V(g)$label = c(ips, unique(c(peers$Peer.AS, origin$AS)))
ip.edges = lapply(ips,function(x) {
  c(x,origin[origin$IP==x,]$AS)
})
bgp.edges = lapply(unique(origin$BGP.Prefix),function(x) {
  startAS = unique(origin[origin$BGP.Prefix==x,]$AS)
  pAS = peers[peers$BGP.Prefix==x,]$Peer.AS
  lapply(pAS,function(y) {
    c(startAS,y)
  })
})
g = g + edges(unlist(ip.edges))
g = g + edges(unlist(bgp.edges))
E(g)$weight = 1
g = simplify(g, edge.attr.comb=list(weight="sum"))
E(g)$arrow.size = 0
g$layout = layout.circle
plot(g)

I’ll let you run that to see how horrid a large, style-/layout-unmodified circular layout graph looks.

Thanks to a snippet on StackOverflow, it’s really easy to get this into D3:

library(RJSONIO) 
temp<-cbind(V(g)$name,V(g)$group)
colnames(temp)<-c("name","group")
js1<-toJSON(temp)
write.graph(g,"/tmp/edgelist.csv",format="edgelist")
edges<-read.csv("/tmp/edgelist.csv",sep=" ",header=F)
colnames(edges)<-c("source","target")
edges<-as.matrix(edges)
js2<-toJSON(edges)
asn<-paste('{"nodes":',js1,',"links":',js2,'}',sep="")
write(asn,file="/tmp/asn.json")

We can take the resulting asn.json file and use it as a drop-in replacement for one of the example D3 force-directed layout building blocks and produce this:

[Figure: D3 force-directed layout of the IP and ASN graph]

Rather than view a static image, you can view the resulting D3 visualization (warning: it’s fairly big).
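For reference, the force-directed examples expect a `nodes` array plus a `links` array whose `source` and `target` are zero-based positions in `nodes`, which is the shape the R conversion above writes out. The JavaScript sketch below builds the same shape from name pairs. It is only an illustration: the documentation-range IPs and AS numbers, the `AS` prefix used to tell the two node types apart, and the `toForceGraph` helper are all assumptions, not part of the original code.

```javascript
// Hypothetical edges: IP -> origin AS, then origin AS -> peer AS.
const namedEdges = [
  ["192.0.2.10", "AS64500"],
  ["192.0.2.20", "AS64500"],
  ["AS64500", "AS64501"]
];

// Convert name pairs to the index-based {nodes, links} layout.
function toForceGraph(edges) {
  const index = new Map();
  const nodes = [];
  // Return the node index for a name, adding the node on first sight.
  const idFor = name => {
    if (!index.has(name)) {
      index.set(name, nodes.length);
      // group 1 = IP address, group 2 = AS, mirroring the grouping in the R code;
      // the "AS" prefix test is this sketch's own convention.
      nodes.push({ name: name, group: name.startsWith("AS") ? 2 : 1 });
    }
    return index.get(name);
  };
  const links = edges.map(([source, target]) => ({
    source: idFor(source),
    target: idFor(target)
  }));
  return { nodes: nodes, links: links };
}

const graph = toForceGraph(namedEdges);
console.log(JSON.stringify(graph));
```

Keeping the `group` attribute on each node is what lets the D3 side color IPs and ASNs differently.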

Both the conversion snippet and the D3 code can be easily tweaked to add more detail and be a tad more interactive/informative, but I’m hoping this larger example provides further inspiration for folks looking to do computer network analysis & visualization with R and may also help some others build more linkages between R & D3.