

{"id":2262,"date":"2013-03-09T07:29:35","date_gmt":"2013-03-09T12:29:35","guid":{"rendered":"http:\/\/rud.is\/b\/?p=2262"},"modified":"2017-04-02T22:51:49","modified_gmt":"2017-04-03T03:51:49","slug":"visualizing-risky-words-part-2","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/","title":{"rendered":"Visualizing Risky Words \u2014 Part 2"},"content":{"rendered":"<p>This is a follow-up to my [Visualizing Risky Words](http:\/\/rud.is\/b\/2013\/03\/06\/visualizing-risky-words\/) post. You&#8217;ll need to read that for context if you&#8217;re just jumping in now. Full R code for the generated images (which are pretty large) is at the end.<\/p>\n<p>Aesthetics are the primary reason for using a word cloud, though one <i>can<\/i> pretty quickly recognize what words were more important on well crafted ones. An interactive bubble chart is a tad better as it lets you explore the corpus elements that contained the terms (a feature I have not added yet).<\/p>\n<p>I would posit that a simple bar chart can be of similar use if one is trying to get a feel for overall word use across a corpus:<\/p>\n<p><center><a target=\"_blank\" href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"2264\" data-permalink=\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/freq-bars\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?fit=2550%2C3300&amp;ssl=1\" data-orig-size=\"2550,3300\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"freq-bars\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?fit=231%2C300&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?fit=510%2C659&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?resize=510%2C659&#038;ssl=1\" alt=\"freq-bars\" width=\"510\" height=\"659\" class=\"aligncenter size-large wp-image-2264\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?resize=530%2C685&amp;ssl=1 530w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?resize=115%2C150&amp;ssl=1 115w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?resize=231%2C300&amp;ssl=1 231w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?resize=535%2C692&amp;ssl=1 535w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?w=1020&amp;ssl=1 1020w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?w=1530&amp;ssl=1 1530w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><br \/>(click for larger version)<\/a><\/center><\/p>\n<p>It&#8217;s definitely not as sexy as a word cloud, but it may be a better visualization choice if you&#8217;re trying to do analysis vs just make a pretty picture.<\/p>\n<p>If you are trying to analyze a corpus, you might want to see which elements influenced the term frequencies the most, primarily to see if there were any 
outliers (i.e. strong influencers). With that in mind, I took @bfist&#8217;s [corpus](http:\/\/securityblog.verizonbusiness.com\/2013\/03\/06\/2012-intsum-word-cloud\/) and generated a heat map from the top terms\/keywords:<\/p>\n<p><center><a target=\"_blank\" href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"2263\" data-permalink=\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/risk-hm\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?fit=3300%2C2550&amp;ssl=1\" data-orig-size=\"3300,2550\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;}\" data-image-title=\"risk-hm\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?fit=300%2C231&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?fit=510%2C394&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?resize=510%2C394&#038;ssl=1\" alt=\"risk-hm\" width=\"510\" height=\"394\" class=\"aligncenter size-large wp-image-2263\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?resize=530%2C409&amp;ssl=1 530w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?resize=150%2C115&amp;ssl=1 150w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?resize=300%2C231&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?resize=535%2C413&amp;ssl=1 535w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?w=1020&amp;ssl=1 1020w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/risk-hm.png?w=1530&amp;ssl=1 1530w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><br \/>(click for larger version)<\/a><\/center><\/p>\n<p>There are some stronger influencers, but there is a pattern of general, regular usage of the terms across each corpus component. This is to be expected for this particular set as each post is going to be talking about the same types of security threats, vulnerabilities &#038; issues.<\/p>\n<p>The R code below is fully annotated, but it&#8217;s important to highlight a few items in it and on the analysis as a whole:<\/p>\n<p>&#8211; The extra, corpus-specific stopword list : <em>&#8220;week&#8221;, &#8220;report&#8221;, &#8220;security&#8221;, &#8220;weeks&#8221;, &#8220;tuesday&#8221;, &#8220;update&#8221;, &#8220;team&#8221;<\/em> : was designed after manually inspecting the initial frequency breakdowns and applying my own judgment about the efficacy (or lack thereof) of including those terms. I&#8217;m sure another iteration would add more (like <em>&#8220;released&#8221;<\/em> and <em>&#8220;reported&#8221;<\/em>). Your expert view needs to shape the analysis and&mdash;in most cases&mdash;that analysis is far from a static\/one-off exercise.<br \/>\n&#8211; Another judgment call was the choice of <code>0.7<\/code> in the <code>removeSparseTerms(tdm, sparse=0.7)<\/code> call. I started at <code>0.5<\/code> and worked up through <code>0.8<\/code>, inspecting the results at each iteration. Playing around with that number and re-generating the heatmap might be an interesting exercise to perform (hint).<br \/>\n&#8211; Same as the above for the choice of <code>10<\/code> in <code>subset(tf, tf>=10)<\/code>. 
Tweak the value and re-do the bar chart vis!<br \/>\n&#8211; After the initial &#8220;<em>ooh! ahh!<\/em>&#8221; from a word cloud or even the above bar chart (though bar charts tend not to evoke emotional reactions), the next step is to ask yourself <i>&#8220;so what?&#8221;<\/i>. There&#8217;s nothing inherently wrong with generating a visualization just to make one, but it&#8217;s way cooler to actually have a reason or a question in mind. One possible answer to a &#8220;<i>so what?<\/i>&#8221; for the bar chart is to take the high-frequency terms and do a bigram\/digraph breakdown on them and even do a larger cross-term frequency association analysis (both of which we&#8217;ll do in another post).<br \/>\n&#8211; The heat map would be far more useful as a D3 visualization where you could select a tile and view the corpus elements with the term highlighted or even select a term on the Y axis and view an extract from all the corpus elements that make it up. That <i>might<\/i> make it to the TODO list, but no promises.<\/p>\n<p>I deliberately tried to make this as simple as possible for those new to R to show how straightforward and brief text corpus analysis can be (there are fewer than 20 lines of code excluding library imports, whitespace, comments and the unnecessary expansion of some of the <code>tm<\/code> function calls that could have been combined into one). Furthermore, this is really just a basic demonstration of <code>tm<\/code> package functionality. The post\/code is also aimed pretty squarely at the information security crowd as we tend to not like examples that aren&#8217;t in our domain. 
Hopefully it makes a good starting point for folks and, as always, questions\/comments are heartily encouraged.<\/p>\n<pre lang=\"rsplus\"># need this NOAWT setting if you're running it on Mac OS; doesn't hurt on others\r\nSys.setenv(NOAWT=TRUE)\r\nlibrary(ggplot2)\r\nlibrary(ggthemes)\r\nlibrary(tm)\r\nlibrary(Snowball) \r\nlibrary(RWeka) \r\nlibrary(reshape)\r\n\r\n# input the raw corpus text\r\n# you could read directly from @bfist's source : http:\/\/l.rud.is\/10tUR65\r\na = readLines(\"intext.txt\")\r\n\r\n# convert raw text into a Corpus object\r\n# each line will be a different \"document\"\r\nc = Corpus(VectorSource(a))\r\n\r\n# clean up the corpus (function calls are obvious)\r\nc = tm_map(c, tolower)\r\nc = tm_map(c, removePunctuation)\r\nc = tm_map(c, removeNumbers)\r\n\r\n# remove common stopwords\r\nc = tm_map(c, removeWords, stopwords())\r\n\r\n# remove custom stopwords (I made this list after inspecting the corpus)\r\nc = tm_map(c, removeWords, c(\"week\",\"report\",\"security\",\"weeks\",\"tuesday\",\"update\",\"team\"))\r\n\r\n# perform basic stemming : background: http:\/\/l.rud.is\/YiKB9G\r\n# save original corpus\r\nc_orig = c\r\n\r\n# do the actual stemming\r\nc = tm_map(c, stemDocument)\r\nc = tm_map(c, stemCompletion, dictionary=c_orig)\r\n\r\n# create term document matrix : http:\/\/l.rud.is\/10tTbcK : from corpus\r\ntdm = TermDocumentMatrix(c, control = list(minWordLength = 1))\r\n\r\n# remove the sparse terms (requires trial->inspection cycle to get sparse value \"right\")\r\ntdm.s = removeSparseTerms(tdm, sparse=0.7)\r\n\r\n# we'll need the TDM as a matrix\r\nm = as.matrix(tdm.s)\r\n\r\n# datavis time\r\n\r\n# convert matrix to data frame\r\nm.df = data.frame(m)\r\n\r\n# quick hack to make keywords - which got stuck in row.names - into a variable\r\nm.df$keywords = rownames(m.df)\r\n\r\n# \"melt\" the data frame ; ?melt at R console for info\r\nm.df.melted = melt(m.df)\r\n\r\n# not necessary, but I like decent column 
names\r\ncolnames(m.df.melted) = c(\"Keyword\",\"Post\",\"Freq\")\r\n\r\n# generate the heatmap\r\nhm = ggplot(m.df.melted, aes(x=Post, y=Keyword)) + \r\n  geom_tile(aes(fill=Freq), colour=\"white\") + \r\n  scale_fill_gradient(low=\"black\", high=\"darkorange\") + \r\n  labs(title=\"Major Keyword Use Across VZ RISK INTSUM 2012 Corpus\") + \r\n  theme_few() +\r\n  theme(axis.text.x  = element_text(size=6))\r\nggsave(plot=hm,filename=\"risk-hm.png\",width=11,height=8.5)\r\n\r\n# not done yet\r\n\r\n# better? way to view frequencies\r\n# sum rows of the tdm to get term freq count\r\ntf = rowSums(as.matrix(tdm))\r\n# we don't want all the words, so choose ones with 10+ freq\r\ntf.10 = subset(tf, tf>=10)\r\n\r\n# wimping out and using qplot so I don't have to make another data frame\r\nbf = qplot(names(tf.10), tf.10, geom=\"bar\") + \r\n  coord_flip() + \r\n  labs(title=\"VZ RISK INTSUM Keyword Frequencies\", x=\"Keyword\",y=\"Frequency\") + \r\n  theme_few()\r\nggsave(plot=bf,filename=\"freq-bars.png\",width=8.5,height=11)<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>This is a follow-up to my [Visualizing Risky Words](http:\/\/rud.is\/b\/2013\/03\/06\/visualizing-risky-words\/) post. You&#8217;ll need to read that for context if you&#8217;re just jumping in now. Full R code for the generated images (which are pretty large) is at the end. 
Aesthetics are the primary reason for using a word cloud, though one can pretty quickly recognize what [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":true,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[24,677,678,673,674,3,91],"tags":[],"class_list":["post-2262","post","type-post","status-publish","format-standard","hentry","category-charts-graphs","category-data-analysis-2","category-data-visualization","category-datavis-2","category-dataviz","category-information-security","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Visualizing Risky Words \u2014 Part 2 - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Visualizing Risky Words \u2014 Part 2 - rud.is\" \/>\n<meta property=\"og:description\" content=\"This is a follow-up to my [Visualizing Risky Words](http:\/\/rud.is\/b\/2013\/03\/06\/visualizing-risky-words\/) post. You&#8217;ll need to read that for context if you&#8217;re just jumping in now. Full R code for the generated images (which are pretty large) is at the end. 
Aesthetics are the primary reason for using a word cloud, though one can pretty quickly recognize what [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2013-03-09T12:29:35+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-04-03T03:51:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars-530x685.png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"Visualizing Risky Words \u2014 Part 2\",\"datePublished\":\"2013-03-09T12:29:35+00:00\",\"dateModified\":\"2017-04-03T03:51:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/\"},\"wordCount\":702,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars-530x685.png\",\"articleSection\":[\"Charts &amp; 
Graphs\",\"Data Analysis\",\"Data Visualization\",\"DataVis\",\"DataViz\",\"Information Security\",\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/\",\"url\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/\",\"name\":\"Visualizing Risky Words \u2014 Part 2 - rud.is\",\"isPartOf\":{\"@id\":\"https:\/\/rud.is\/b\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars-530x685.png\",\"datePublished\":\"2013-03-09T12:29:35+00:00\",\"dateModified\":\"2017-04-03T03:51:49+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?fit=2550%2C3300&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?fit=2550%2C3300&ssl=1\",\"width\":2550,\"height\":3300},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/rud.is\/b\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Visualizing Risky Words 
\u2014 Part 2\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/rud.is\/b\/#website\",\"url\":\"https:\/\/rud.is\/b\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/rud.is\/b\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\/\/rud.is\"],\"url\":\"https:\/\/rud.is\/b\/author\/hrbrmstr\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Visualizing Risky Words \u2014 Part 2 - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/","og_locale":"en_US","og_type":"article","og_title":"Visualizing Risky Words \u2014 Part 2 - rud.is","og_description":"This is a follow-up to my [Visualizing Risky Words](http:\/\/rud.is\/b\/2013\/03\/06\/visualizing-risky-words\/) post. You&#8217;ll need to read that for context if you&#8217;re just jumping in now. Full R code for the generated images (which are pretty large) is at the end. Aesthetics are the primary reason for using a word cloud, though one can pretty quickly recognize what [&hellip;]","og_url":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/","og_site_name":"rud.is","article_published_time":"2013-03-09T12:29:35+00:00","article_modified_time":"2017-04-03T03:51:49+00:00","og_image":[{"url":"https:\/\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars-530x685.png","type":"","width":"","height":""}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"Visualizing Risky Words \u2014 Part 2","datePublished":"2013-03-09T12:29:35+00:00","dateModified":"2017-04-03T03:51:49+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/"},"wordCount":702,"commentCount":2,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars-530x685.png","articleSection":["Charts &amp; Graphs","Data Analysis","Data Visualization","DataVis","DataViz","Information Security","R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/","url":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/","name":"Visualizing Risky Words \u2014 Part 2 - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage"},"thumbnailUrl":"https:\/\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars-530x685.png","datePublished":"2013-03-09T12:29:35+00:00","dateModified":"2017-04-03T03:51:49+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?fit=2550%2C3300&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/03\/freq-bars.png?fit=2550%2C3300&ssl=1","width":2550,"height":3300},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"Visualizing Risky Words \u2014 Part 2"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p23idr-Au","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":2245,"url":"https:\/\/rud.is\/b\/2013\/03\/06\/visualizing-risky-words\/","url_meta":{"origin":2262,"position":0},"title":"Visualizing Risky Words","author":"hrbrmstr","date":"2013-03-06","format":false,"excerpt":"NOTE: Parts [2], [3] & [4] are also now up. 
Inspired by a post by @bfist who created the following word cloud in Ruby from VZ RISK INTSUM posts (visit the link or select the visualization to go to the post): I \u2665 word clouds as much as anyone and\u2026","rel":"","context":"In &quot;d3&quot;","block_context":{"text":"d3","link":"https:\/\/rud.is\/b\/category\/d3\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2288,"url":"https:\/\/rud.is\/b\/2013\/03\/10\/visualizing-risky-words-part-3\/","url_meta":{"origin":2262,"position":1},"title":"Visualizing Risky Words \u2014 Part 3","author":"hrbrmstr","date":"2013-03-10","format":false,"excerpt":"The DST changeover in the US has made today a fairly strange one, especially when combined with a very busy non-computing day yesterday. That strangeness manifest as a need to take the D3 heatmap idea mentioned in the [previous post](http:\/\/rud.is\/b\/2013\/03\/09\/visualizing-risky-words-part-2\/) and actually (mostly) implement it. Folks just coming to this\u2026","rel":"","context":"In &quot;d3&quot;","block_context":{"text":"d3","link":"https:\/\/rud.is\/b\/category\/d3\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2314,"url":"https:\/\/rud.is\/b\/2013\/03\/12\/visualizing-risky-words-part-4-d3-word-trees\/","url_meta":{"origin":2262,"position":2},"title":"Visualizing Risky Words \u2014 Part 4 (D3 Word Trees)","author":"hrbrmstr","date":"2013-03-12","format":false,"excerpt":"This is a fourth post in my [Visualizing Risky Words](http:\/\/rud.is\/b\/2013\/03\/06\/visualizing-risky-words\/) series. You'll need to read starting from that link for context if you're just jumping in now. 
I was going to create a rudimentary version of an interactive word tree for this, but the extremely talented @jasondavies (I marvel especially\u2026","rel":"","context":"In &quot;d3&quot;","block_context":{"text":"d3","link":"https:\/\/rud.is\/b\/category\/d3\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3010,"url":"https:\/\/rud.is\/b\/2014\/09\/10\/r-version-of-an-exploratory-technique-for-visualizing-the-distributions-of-100-variables\/","url_meta":{"origin":2262,"position":3},"title":"R version of &#8220;An exploratory technique for visualizing the distributions of 100 variables:&#8221;","author":"hrbrmstr","date":"2014-09-10","format":false,"excerpt":"Rick Wicklin (@[RickWicklin](https:\/\/twitter.com\/RickWicklin)) made a recent post to the SAS blog on [An exploratory technique for visualizing the distributions of 100 variables](http:\/\/blogs.sas.com\/content\/iml\/). It's a very succinct tutorial on both the power of boxplots and how to make them in SAS (of course). I'm not one to let R be \"out-boxed\",\u2026","rel":"","context":"In &quot;Charts &amp; Graphs&quot;","block_context":{"text":"Charts &amp; Graphs","link":"https:\/\/rud.is\/b\/category\/charts-graphs\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3775,"url":"https:\/\/rud.is\/b\/2015\/11\/08\/visualizing_survey_data\/","url_meta":{"origin":2262,"position":4},"title":"Visualizing Survey Data : Comparison Between Observations","author":"hrbrmstr","date":"2015-11-08","format":false,"excerpt":"Cybersecurity is a domain that really likes surveys, or at the very least it has many folks within it that like to conduct and report on surveys. One recent survey on threat intelligence is in it's second year, so it sets about comparing answers across years. 
Rather than go into\u2026","rel":"","context":"In &quot;Cybersecurity&quot;","block_context":{"text":"Cybersecurity","link":"https:\/\/rud.is\/b\/category\/cybersecurity\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2015\/11\/Visualizing_Survey_Data___Comparison_Between_Observations.png?fit=1200%2C721&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2015\/11\/Visualizing_Survey_Data___Comparison_Between_Observations.png?fit=1200%2C721&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2015\/11\/Visualizing_Survey_Data___Comparison_Between_Observations.png?fit=1200%2C721&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2015\/11\/Visualizing_Survey_Data___Comparison_Between_Observations.png?fit=1200%2C721&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2015\/11\/Visualizing_Survey_Data___Comparison_Between_Observations.png?fit=1200%2C721&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":2728,"url":"https:\/\/rud.is\/b\/2013\/09\/28\/obamacare-jobs-r-d3\/","url_meta":{"origin":2262,"position":5},"title":"Visualizing &#8220;ObamaCare-related&#8221; Job Cuts","author":"hrbrmstr","date":"2013-09-28","format":false,"excerpt":"UPDATE: Added some extra visualization elements since this post went live. New select menu and hover text for individual job impact detail lines in the table. 
I was reviewing RSS feeds when I came across this story about \"ObamaCare Employer Mandate: A List Of Cuts To Work Hours, Jobs\" over\u2026","rel":"","context":"In &quot;Charts &amp; Graphs&quot;","block_context":{"text":"Charts &amp; Graphs","link":"https:\/\/rud.is\/b\/category\/charts-graphs\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/oc-snap.png.png?fit=945%2C660&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/oc-snap.png.png?fit=945%2C660&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/oc-snap.png.png?fit=945%2C660&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2013\/09\/oc-snap.png.png?fit=945%2C660&ssl=1&resize=700%2C400 2x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/2262","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=2262"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/2262\/revisions"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=2262"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=2262"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=2262"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}