Skip navigation

Category Archives: infographics

We infosec folk eat up industry reports and most of us have no doubt already gobbled up @panda_security’s recently released [Q1 2013 Report](http://press.pandasecurity.com/wp-content/uploads/2010/05/PandaLabs-Quaterly-Report.pdf) [PDF]. It’s a good read (so go ahead and read it, we’ll still be here!) and I was really happy to see a nicely stylized chart in the early pages:

Screenshot_5_24_13_8_14_AM

However, I quickly became a #sadpanda when I happened across some explosive 3D pie charts later on. Rather than deride, I thought a re-imagining would be a better use of time and let you decide which visualizations both communicate better and are more appealing.

I chose to use @Datawrapper to showcase how easy it is to build and publish pleasing and informative visualizations without even leaving your browser.

Figure 4, Original:

Panda Labs Q1 2013 Report Fig 5 (Orig)

Figure 4, Alternative:

Figure 5, Original

Fig 4: New malware strains In Q1 2013, by Type (orig)

Figure 5, Alternative (horizontal vs vertical, just to mix it up a bit):

If the charts had been closer together in the report, I would have opted for vertical design for both and probably kept malware-type ordering vs sort by highest percentage.

How would you re-imagine the pie charts? Post a link to your creations in the comments and I’ll make sure they show up embedded with the post.

Many thanks to all who attended the talk @jayjacobs & I gave at @SOURCEconf on Thursday, April 18, 2013. As promised, here are the [slides](https://dl.dropboxusercontent.com/u/43553/SOURCE-Boston-2013.pdf) which should be much less washed out than the projector version :-)

We’ve enumerated quite a bit of non-slide-but-in-presentation information that we wanted to aggregate into a blog post so you can viz along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at [Coursera](https://www.coursera.org/).

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up and may have some duplicates to the ones below.

People Mentioned

– [Stephen Few’s Perceptual Edge blog](http://www.perceptualedge.com/) : Start from the beginning to learn from a giant in information visualization
– [Andy Kirk’s Visualising Data blog](http://www.visualisingdata.com/) (@visualisingdata) : Perhaps the quintessential leader in the modern visualization movement.
– [Mike Bostock’s blog](http://bost.ocks.org/mike/) (@mbostock) : Creator of D3 and producer of amazing, interactive graphics for the @NYTimes
– [Edward Tufte’s blog](http://www.edwardtufte.com/tufte/) : The father of what we would now identify as our core visualization principles & practices.
– [Nathan Yau’s Flowing Data blog](http://flowingdata.com/) : Making visualization accessible, practical and repeatable.
– [Jay’s blog](http://beechplane.wordpress.com/)
– [My {this} blog](http://rud.is/b)

Tools Mentioned

– [R](http://www.r-project.org/) : Jay & I probably use this a bit too much as a hammer (i.e. treat ever data project as a nail) but it’s just far too flexible and powerful to not use as a go-to resource
– [RStudio](http://www.rstudio.com/) : An *amazing* IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to setup a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating [reproducible research](http://cran.r-project.org/web/views/ReproducibleResearch.html) a joy with built-in easy access to tools like [kintr](http://yihui.name/knitr/).
– [iPython](http://ipython.org/) : This version of Python takes an already amazing language and kicks it up a few notches. It brings it up to the level of R+RStudio, especially with it’s knitr-like [iPython Notebooks](http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html) for–again–reproducible research.
– [SecViz](http://secviz.org/) : Security-centric Visualization Site & Tools by @raffaelmarty
– [Mondrian](http://www.theusrus.de/Mondrian/) : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster then typing R commands or fumbling in Excel.
– [Tableau](http://www.tableausoftware.com/) : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
– [Processing](http://processing.org/) : A tool that was designed from the ground up to help journalists create powerful, interactive data visualizations that you can slipstream directly onto the web via the [Processing.js](http://processingjs.org/) library.
– [D3](http://d3js.org/) : The foundation of modern, data-driven visualization on the web.
– [Gephi](https://gephi.org/) : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
– [MongoDB](http://www.mongodb.org/) : NoSQL database that’s highly & easily scaleable without a steep learning curve.
– [CRUSH Tools by Google](https://code.google.com/p/crush-tools/) : Kicks up your command-line data munging.

Many thanks to all who attended the talk @jayjacobs & I gave at RSA on Tuesday, February 26, 2013. It was really great to be able to talk to so many of you afterwards as well.

We’ve enumerated quite a bit of non-slide-but-in-presentation information that we wanted to aggregate into a blog post so you can viz along at home. If you need more of a guided path, I strongly encourage you to take a look at some of the free courses over at [Coursera](https://www.coursera.org/).

For starters, here’s a bit.ly bundle of data analysis & visualization bookmarks that @dseverski & I maintain. We’ve been doing (IMO) a pretty good job adding new resources as they come up and may have some duplicates to the ones below.

People Mentioned

– [Stephen Few’s Perceptual Edge blog](http://www.perceptualedge.com/) : Start from the beginning to learn from a giant in information visualization
– [Andy Kirk’s Visualising Data blog](http://www.visualisingdata.com/) (@visualisingdata) : Perhaps the quintessential leader in the modern visualization movement.
– [Mike Bostock’s blog](http://bost.ocks.org/mike/) (@mbostock) : Creator of D3 and producer of amazing, interactive graphics for the @NYTimes
– [Edward Tufte’s blog](http://www.edwardtufte.com/tufte/) : The father of what we would now identify as our core visualization principles & practices

Tools Mentioned

– [R](http://www.r-project.org/) : Jay & I probably use this a bit too much as a hammer (i.e. treat ever data project as a nail) but it’s just far too flexible and powerful to not use as a go-to resource
– [RStudio](http://www.rstudio.com/) : An *amazing* IDE for R. I, personally, usually despise IDEs (yes, I even dislike Xcode), but RStudio truly improves workflow by several orders of magnitude. There are both desktop and server versions of it; the latter gives you the ability to setup a multi-user environment and use the IDE from practically anywhere you are. RStudio also makes generating [reproducible research](http://cran.r-project.org/web/views/ReproducibleResearch.html) a joy with built-in easy access to tools like [kintr](http://yihui.name/knitr/).
– [iPython](http://ipython.org/) : This version of Python takes an already amazing language and kicks it up a few notches. It brings it up to the level of R+RStudio, especially with it’s knitr-like [iPython Notebooks](http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html) for–again–reproducible research.
– [Mondrian](http://www.theusrus.de/Mondrian/) : This tool needs far more visibility. It enables extremely quick visualization of even very large data sets. The interface takes a bit of getting used to, but it’s faster then typing R commands or fumbling in Excel.
– [Tableau](http://www.tableausoftware.com/) : This tool may be one of the most accessible, fast & flexible ways to explore data sets to get an idea of where you need to/can do further analysis.
– [Processing](http://processing.org/) : A tool that was designed from the ground up to help journalists create powerful, interactive data visualizations that you can slipstream directly onto the web via the [Processing.js](http://processingjs.org/) library.
– [D3](http://d3js.org/) : The foundation of modern, data-driven visualization on the web.
– [Gephi](https://gephi.org/) : A very powerful tool when you need to explore networks & create beautiful, publication-worthy visualizations.
– [MongoDB](http://www.mongodb.org/) : NoSQL database that’s highly & easily scaleable without a steep learning curve.
– [CRUSH Tools by Google](https://code.google.com/p/crush-tools/) : Kicks up your command-line data munging.

Naomi Robbins is running a graph makeover challenge over at her Forbes blog and this is my entry for the B2B/B2C Traffic Sources one (click for larger version):

And, here’s the R source for how to generate it:

library(ggplot2)
 
df = read.csv("b2bb2c.csv")
 
ggplot(data=df,aes(x=Site,y=Percentage,fill=Site)) + 
  geom_bar(stat="identity") + 
  facet_grid(Venue ~ .) + 
  coord_flip() + 
  opts(legend.position = "none", title="Social Traffic Sources for B2B & B2C Companies") + 
  stat_bin(geom="text", aes(label=sprintf("%d%%",Percentage), vjust=0, hjust=-0.2, size = 10))

And, here’s the data:

Site     Venue	Percentage
Facebook B2B	72
LinkedIn B2B	16
Twitter	 B2B	12
Facebook B2C	84
LinkedIn B2C	1
Twitter	 B2C	15

I chose to go with a latticed bar chart as I think it helps show the relative portions within each category (B2B/B2C) and also enables quick comparisons across categories for all three factors.

Earlier this week, @jayjacobs & I both received our acceptance notice for the talk we submitted to the RSA CFP! [W00t!] Now the hard part: crank out a compelling presentation in the next six weeks! If you’re interested at all in doing more with your security data, this talk is for you. Full track/number & details below:

Session Track: Governance, Risk & Compliance
Session Code: GRC-T18
Scheduled Date: 02/26/2013
Scheduled Time: 2:30 PM – 3:30 PM
Session Length: 1 hr
Session Title: Data Analysis and Visualization for Security Professionals
Session Classification: Intermediate
Session Keywords: metrics, visualization, risk management, research
Short Abstract: You have a deluge of security-related data coming from all directions and may even have a fancy dashboard full of pretty charts. However, unless you know the right questions to ask and how to ask them, all you really have are compliance artifacts. Move beyond the checkbox and learn techniques for collecting, exploring and visualizing the stories within our security data.

HP & the Ponemon Institute have released their third annual “Cost of Cybercrime” report and the web wizards at HP have given us an infographic from it:


(You can see the full size one at the above link)

While some designers may think that infographic visualizations are not subject to the same scrutiny as “real” charts & graphs, I vehemently disagree. In this particular infographic, my eyes were immediately drawn to the donut chart since it’s usually a poor choice to begin with and many designers make the same error as this one did.

What’s wrong with it?“, you ask? Well, the donut chart is designed to be a modified pie chart which itself is supposed to display components of a whole (i.e. the circle represents 100% and the slices are the fractional components of said 100%). Donut charts are really hard to read since your eye misses the area & angular cues that help to make the distinction between slices.

Since the whole purpose of the “attacks” chart is to give you an idea of how bad/worse 2010/11/12 are from each other, it would be better served with a bar chart:

You can immediately see the distinction & increase over time much easier than in the donut. And, if you’re still not believing that a pie would give you a better visual indicator (yet still be a bad choice since we’re not really comparing parts of a whole) you be the judge:

But, what I may be most upset about is the fact that they released the chart at the end of October, included a spider in the upper-left but didn’t include a radar plot anywhere in the infographic! #spidersarescary :-)

All infographic criticism aside, I do thank HP & Ponemon for providing the results of their research free of charge for the benefit of the entire infosec community. I look forward to digesting the whole report.