

{"id":12684,"date":"2020-03-04T07:36:26","date_gmt":"2020-03-04T12:36:26","guid":{"rendered":"https:\/\/rud.is\/b\/?p=12684"},"modified":"2020-03-04T07:36:26","modified_gmt":"2020-03-04T12:36:26","slug":"catchpole-redux-and-hashing-files-websites-with-ssdeepr","status":"publish","type":"post","link":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/","title":{"rendered":"{catchpole} Redux and Hashing Files &#038; Websites with {ssdeepr}"},"content":{"rendered":"<p>\u00dcber Tuesday has come and <em>almost<\/em> gone (some state results will take a while to coalesce) and I&#8217;m relieved to say that {catchpole} did indeed work, with the example code from <a href=\"https:\/\/rud.is\/b\/2020\/03\/02\/make-wsj-esque-uber-tuesday-democrat-delegate-cartograms-in-r-with-catchpole\/\">before<\/a> producing this on first run:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"12685\" data-permalink=\"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/esqnpozwkaaerlx\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?fit=1812%2C1328&amp;ssl=1\" data-orig-size=\"1812,1328\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"ESQNPozWkAAErlX\" data-image-description=\"\" data-image-caption=\"\" 
data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?fit=510%2C373&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=510%2C374&#038;ssl=1\" alt=\"\" width=\"510\" height=\"374\" class=\"aligncenter size-full wp-image-12685\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?w=1812&amp;ssl=1 1812w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=300%2C220&amp;ssl=1 300w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=530%2C388&amp;ssl=1 530w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=150%2C110&amp;ssl=1 150w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=768%2C563&amp;ssl=1 768w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=1536%2C1126&amp;ssl=1 1536w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=500%2C366&amp;ssl=1 500w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=1200%2C879&amp;ssl=1 1200w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=400%2C293&amp;ssl=1 400w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=800%2C586&amp;ssl=1 800w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?resize=200%2C147&amp;ssl=1 200w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQNPozWkAAErlX.png?w=1020&amp;ssl=1 1020w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/a><\/p>\n<p>If we tweak the buffer space around the squares, I think the cartogram looks better:<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?ssl=1\"><img 
data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"12686\" data-permalink=\"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/esqezslxyaa4hdb\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?fit=2382%2C1630&amp;ssl=1\" data-orig-size=\"2382,1630\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"ESQezsLXYAA4hdB\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?fit=510%2C349&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=510%2C349&#038;ssl=1\" alt=\"\" width=\"510\" height=\"349\" class=\"aligncenter size-full wp-image-12686\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?w=2382&amp;ssl=1 2382w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=300%2C205&amp;ssl=1 300w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=530%2C363&amp;ssl=1 530w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=150%2C103&amp;ssl=1 150w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=768%2C526&amp;ssl=1 768w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=1536%2C1051&amp;ssl=1 1536w, 
https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=2048%2C1401&amp;ssl=1 2048w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=500%2C342&amp;ssl=1 500w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=1200%2C821&amp;ssl=1 1200w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=400%2C274&amp;ssl=1 400w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=800%2C547&amp;ssl=1 800w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?resize=200%2C137&amp;ssl=1 200w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/ESQezsLXYAA4hdB.jpeg?w=1020&amp;ssl=1 1020w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/a><\/p>\n<p>but you should likely use a different palette (see <a href=\"https:\/\/twitter.com\/hrbrmstr\/status\/1235126950739484672?s=20\">this Twitter thread<\/a> for examples).<\/p>\n<p>I noted in the previous post that borders might be possible. 
While I haven&#8217;t solved that use-case for individual states, I did manage to come up with a method for making a light version of the cartogram usable:<\/p>\n<pre><code class=\"language-r\">library(sf)\nlibrary(hrbrthemes) \nlibrary(catchpole)\nlibrary(tidyverse)\n\ndelegates &lt;- read_delegates()\n\ncandidates_expanded &lt;- expand_candidates()\n\ngsf &lt;- left_join(delegates_map(), candidates_expanded, by = c(\"state\", \"idx\"))\n\nm &lt;- delegates_map()\n\n# split off each \"area\" on the map so we can make a border+background\nlist(\n  setdiff(state.abb, c(\"HI\", \"AK\")),\n  \"AK\", \"HI\", \"DC\", \"VI\", \"PR\", \"MP\", \"GU\", \"DA\", \"AS\"\n) %&gt;% \n  map(~{\n    suppressWarnings(suppressMessages(st_buffer(\n      x = st_union(m[m$state %in% .x, ]),\n      dist = 0.0001,\n      endCapStyle = \"SQUARE\"\n    )))\n  }) -&gt; m_borders\n\ngg &lt;- ggplot()\nfor (mb in m_borders) {\n  gg &lt;- gg + geom_sf(data = mb, col = \"#2b2b2b\", size = 0.125)\n}\n\ngg + \n  geom_sf(\n    data = gsf,\n    aes(fill = candidate),\n    col = \"white\", shape = 22, size = 3, stroke = 0.125\n  ) +\n  scale_fill_manual(\n    name = NULL,\n    na.value = \"#f0f0f0\",\n    values = c(\n      \"Biden\" = '#f0027f',\n      \"Sanders\" = '#7fc97f',\n      \"Warren\" = '#beaed4',\n      \"Buttigieg\" = '#fdc086',\n      \"Klobuchar\" = '#ffff99',\n      \"Gabbard\" = '#386cb0',\n      \"Bloomberg\" = '#bf5b17'\n    ),\n    limits = intersect(unique(delegates$candidate), names(delegates_pal))\n  ) +\n  guides(\n    fill = guide_legend(\n      override.aes = list(size = 4)\n    )\n  ) +\n  coord_sf(datum = NA) +\n  theme_ipsum_es(grid=\"\") +\n  theme(legend.position = \"bottom\")\n<\/code><\/pre>\n<p><a href=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?ssl=1\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"12687\" 
data-permalink=\"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/catchpole-white\/\" data-orig-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&amp;ssl=1\" data-orig-size=\"2080,1482\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"catchpole-white\" data-image-description=\"\" data-image-caption=\"\" data-large-file=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=510%2C364&amp;ssl=1\" src=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=510%2C363&#038;ssl=1\" alt=\"\" width=\"510\" height=\"363\" class=\"aligncenter size-full wp-image-12687\" srcset=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?w=2080&amp;ssl=1 2080w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=300%2C214&amp;ssl=1 300w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=530%2C378&amp;ssl=1 530w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=150%2C107&amp;ssl=1 150w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=768%2C547&amp;ssl=1 768w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=1536%2C1094&amp;ssl=1 1536w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=2048%2C1459&amp;ssl=1 2048w, 
https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=500%2C356&amp;ssl=1 500w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=1200%2C855&amp;ssl=1 1200w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=400%2C285&amp;ssl=1 400w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=800%2C570&amp;ssl=1 800w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?resize=200%2C143&amp;ssl=1 200w, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?w=1020&amp;ssl=1 1020w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/a><\/p>\n<h3>{ssdeepr}<\/h3>\n<p>Researcher pals over at Binary Edge <a href=\"https:\/\/blog.binaryedge.io\/2020\/03\/02\/webv2-the-future-for-web-scanning-at-binaryedge\/\">added web page hashing<\/a> (pre- and post-javascript scraping) to their platform using <a href=\"https:\/\/ssdeep-project.github.io\/\">ssdeep<\/a>. 
This approach is in the category of context-triggered piecewise hashes (CTPH) (or locality-sensitive hashing) similar to <a href=\"https:\/\/cinc.rud.is\/web\/packages\/tlsh\/\">my R adaptation\/packaging<\/a> of Trend Micro&#8217;s <a href=\"https:\/\/github.com\/trendmicro\/tlsh\">tlsh<\/a>.<\/p>\n<p>Since I&#8217;ll be working with BE&#8217;s data off-and-on and the ssdeep project has a well-crafted library (plus we might add ssdeep support at $DAYJOB), I went ahead and <a href=\"https:\/\/cinc.rud.is\/web\/packages\/ssdeepr\/\">packaged that up as well<\/a>.<\/p>\n<p>I recommend using the <code>hash_con()<\/code> function if you need to read large blobs since it doesn&#8217;t require you to read everything into memory first (though <code>hash_file()<\/code> doesn&#8217;t either, but that&#8217;s a direct low-level call to the underlying ssdeep library file reader and not as flexible as R connections are).<\/p>\n<p>These types of hashes are great at seeing if something has changed on a website (or seeing how similar two things are to each other). For instance, how closely do CRAN mirrors match the mothership?<\/p>\n<pre><code class=\"language-r\">library(ssdeepr) # see the links above for installation\n\ncran1 &lt;- hash_con(url(\"https:\/\/cran.r-project.org\/web\/packages\/available_packages_by_date.html\"))\ncran2 &lt;- hash_con(url(\"https:\/\/cran.biotools.fr\/web\/packages\/available_packages_by_date.html\"))\ncran3 &lt;- hash_con(url(\"https:\/\/cran.rstudio.org\/web\/packages\/available_packages_by_date.html\"))\n\nhash_compare(cran1, cran2)\n## [1] 0\n\nhash_compare(cran1, cran3)\n## [1] 94\n<\/code><\/pre>\n<p>I picked on <code>cran.biotools.fr<\/code> as I saw they were well behind CRAN proper on the monitoring page.<\/p>\n<p>I noted that BE was doing pre- and post-javascript hashing as well. Why, you may ask? Well, websites behave differently with javascript running, plus they can behave differently when different user-agents are set. 
Let&#8217;s grab a page from Wikipedia a few different ways to show how the results are not alike at all, depending on the retrieval context. First, let&#8217;s grab some web content!<\/p>\n<pre><code class=\"language-r\">library(httr)\nlibrary(ssdeepr)\nlibrary(splashr)\n\n# regular grab\nh1 &lt;- hash_con(url(\"https:\/\/en.wikipedia.org\/wiki\/Donald_Knuth\"))\n\n# you need Splash running for javascript-enabled scraping this way\nsp &lt;- splash(host = \"mysplashhost\", user = \"splashuser\", pass = \"splashpass\")\n\n# js-enabled with one ua\nsp %&gt;%\n  splash_user_agent(ua_macos_chrome) %&gt;%\n  splash_go(\"https:\/\/en.wikipedia.org\/wiki\/Donald_Knuth\") %&gt;%\n  splash_wait(2) %&gt;%\n  splash_html(raw_html = TRUE) -&gt; js1\n\n# js-enabled with another ua\nsp %&gt;%\n  splash_user_agent(ua_ios_safari) %&gt;%\n  splash_go(\"https:\/\/en.wikipedia.org\/wiki\/Donald_Knuth\") %&gt;%\n  splash_wait(2) %&gt;%\n  splash_html(raw_html = TRUE) -&gt; js2\n\nh2 &lt;- hash_raw(js1)\nh3 &lt;- hash_raw(js2)\n\n# same way {rvest} does it\nres &lt;- httr::GET(\"https:\/\/en.wikipedia.org\/wiki\/Donald_Knuth\")\n\nh4 &lt;- hash_raw(content(res, as = \"raw\"))\n<\/code><\/pre>\n<p>Now, let&#8217;s compare them:<\/p>\n<pre><code class=\"language-r\">hash_compare(h1, h4) # {ssdeepr} built-in vs httr::GET() =&gt; not surprising that they're equal\n## [1] 100\n\n# things look way different with js-enabled\n\nhash_compare(h1, h2)\n## [1] 0\nhash_compare(h1, h3)\n## [1] 0\n\n# and with variations between user-agents\n\nhash_compare(h2, h3)\n## [1] 0\n\nhash_compare(h2, h4)\n## [1] 0\n\n# only doing this for completeness\n\nhash_compare(h3, h4)\n## [1] 0\n<\/code><\/pre>\n<p>For this example, just content size would have been enough to tell the difference (mostly; note that <code>h1<\/code> and <code>h4<\/code> match even though the {httr} method reports a larger size, which is a byte count from <code>length()<\/code> on a raw vector rather than the character count <code>nchar()<\/code> returns):<\/p>\n<pre><code class=\"language-r\">length(js1)\n## [1] 432914\n\nlength(js2)\n## [1] 270538\n\nnchar(\n  paste0(\n    
readLines(url(\"https:\/\/en.wikipedia.org\/wiki\/Donald_Knuth\")),\n    collapse = \"\\n\"\n  )\n)\n## [1] 373078\n\nlength(content(res, as = \"raw\"))\n## [1] 374099\n<\/code><\/pre>\n<h3>FIN<\/h3>\n<p>If you were in a U.S. state with a primary yesterday and were eligible to vote (and had something to vote for, either a (D) candidate or a state\/local bit of business) I sure hope you did!<\/p>\n<p>The ssdeep library works on Windows, so I&#8217;ll be figuring out how to get that going in {ssdeepr} fairly soon (mostly to try out the Rtools 4.0 toolchain vs deliberately wanting to support legacy platforms).<\/p>\n<p>As usual, drop issues\/PRs\/feature requests where you&#8217;re comfortable for any of these or other packages.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u00dcber Tuesday has come and almost gone (some state results will take a while to coalesce) and I&#8217;m relieved to say that {catchpole} did indeed work, with the example code from before producing this on first run: If we tweak the buffer space around the squares, I think the cartogram looks better: but, you should [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":12687,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"activitypub_content_warning":"","activitypub_content_visibility":"","activitypub_max_image_attachments":3,"activitypub_interaction_policy_quote":"anyone","activitypub_status":"","footnotes":""},"categories":[91],"tags":[],"class_list":["post-12684","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-r"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - 
https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>{catchpole} Redux and Hashing Files &amp; Websites with {ssdeepr} - rud.is<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"{catchpole} Redux and Hashing Files &amp; Websites with {ssdeepr} - rud.is\" \/>\n<meta property=\"og:description\" content=\"\u00dcber Tuesday has come and almost gone (some state results will take a while to coalesce) and I&#8217;m relieved to say that {catchpole} did indeed work, with the example code from before producing this on first run: If we tweak the buffer space around the squares, I think the cartogram looks better: but, you should [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/\" \/>\n<meta property=\"og:site_name\" content=\"rud.is\" \/>\n<meta property=\"article:published_time\" content=\"2020-03-04T12:36:26+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"2080\" \/>\n\t<meta property=\"og:image:height\" content=\"1482\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"hrbrmstr\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"hrbrmstr\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/\"},\"author\":{\"name\":\"hrbrmstr\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"headline\":\"{catchpole} Redux and Hashing Files &#038; Websites with {ssdeepr}\",\"datePublished\":\"2020-03-04T12:36:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/\"},\"wordCount\":488,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2020\\\/03\\\/catchpole-white.png?fit=2080%2C1482&ssl=1\",\"articleSection\":[\"R\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/\",\"name\":\"{catchpole} Redux and Hashing Files & Websites with {ssdeepr} - 
rud.is\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2020\\\/03\\\/catchpole-white.png?fit=2080%2C1482&ssl=1\",\"datePublished\":\"2020-03-04T12:36:26+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2020\\\/03\\\/catchpole-white.png?fit=2080%2C1482&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2020\\\/03\\\/catchpole-white.png?fit=2080%2C1482&ssl=1\",\"width\":2080,\"height\":1482},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/2020\\\/03\\\/04\\\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/rud.is\\\/b\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"{catchpole} Redux and Hashing Files &#038; Websites with {ssdeepr}\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#website\",\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/\",\"name\":\"rud.is\",\"description\":\"&quot;In God we trust. 
All others must bring data&quot;\",\"publisher\":{\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/rud.is\\\/b\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/rud.is\\\/b\\\/#\\\/schema\\\/person\\\/d7cb7487ab0527447f7fda5c423ff886\",\"name\":\"hrbrmstr\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\",\"width\":460,\"height\":460,\"caption\":\"hrbrmstr\"},\"logo\":{\"@id\":\"https:\\\/\\\/i0.wp.com\\\/rud.is\\\/b\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/ukr-shield.png?fit=460%2C460&ssl=1\"},\"description\":\"Don't look at me\u2026I do what he does \u2014 just slower. #rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7\",\"sameAs\":[\"http:\\\/\\\/rud.is\"],\"url\":\"https:\\\/\\\/rud.is\\\/b\\\/author\\\/hrbrmstr\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"{catchpole} Redux and Hashing Files & Websites with {ssdeepr} - rud.is","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/","og_locale":"en_US","og_type":"article","og_title":"{catchpole} Redux and Hashing Files & Websites with {ssdeepr} - rud.is","og_description":"\u00dcber Tuesday has come and almost gone (some state results will take a while to coalesce) and I&#8217;m relieved to say that {catchpole} did indeed work, with the example code from before producing this on first run: If we tweak the buffer space around the squares, I think the cartogram looks better: but, you should [&hellip;]","og_url":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/","og_site_name":"rud.is","article_published_time":"2020-03-04T12:36:26+00:00","og_image":[{"width":2080,"height":1482,"url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&ssl=1","type":"image\/png"}],"author":"hrbrmstr","twitter_card":"summary_large_image","twitter_misc":{"Written by":"hrbrmstr","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#article","isPartOf":{"@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/"},"author":{"name":"hrbrmstr","@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"headline":"{catchpole} Redux and Hashing Files &#038; Websites with {ssdeepr}","datePublished":"2020-03-04T12:36:26+00:00","mainEntityOfPage":{"@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/"},"wordCount":488,"commentCount":0,"publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"image":{"@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&ssl=1","articleSection":["R"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/","url":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/","name":"{catchpole} Redux and Hashing Files & Websites with {ssdeepr} - 
rud.is","isPartOf":{"@id":"https:\/\/rud.is\/b\/#website"},"primaryImageOfPage":{"@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#primaryimage"},"image":{"@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&ssl=1","datePublished":"2020-03-04T12:36:26+00:00","breadcrumb":{"@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#primaryimage","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&ssl=1","width":2080,"height":1482},{"@type":"BreadcrumbList","@id":"https:\/\/rud.is\/b\/2020\/03\/04\/catchpole-redux-and-hashing-files-websites-with-ssdeepr\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/rud.is\/b\/"},{"@type":"ListItem","position":2,"name":"{catchpole} Redux and Hashing Files &#038; Websites with {ssdeepr}"}]},{"@type":"WebSite","@id":"https:\/\/rud.is\/b\/#website","url":"https:\/\/rud.is\/b\/","name":"rud.is","description":"&quot;In God we trust. 
All others must bring data&quot;","publisher":{"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/rud.is\/b\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/rud.is\/b\/#\/schema\/person\/d7cb7487ab0527447f7fda5c423ff886","name":"hrbrmstr","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","contentUrl":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1","width":460,"height":460,"caption":"hrbrmstr"},"logo":{"@id":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2023\/10\/ukr-shield.png?fit=460%2C460&ssl=1"},"description":"Don't look at me\u2026I do what he does \u2014 just slower. 
#rstats avuncular \u2022 ?Resistance Fighter \u2022 Cook \u2022 Christian \u2022 [Master] Chef des Donn\u00e9es de S\u00e9curit\u00e9 @ @rapid7","sameAs":["http:\/\/rud.is"],"url":"https:\/\/rud.is\/b\/author\/hrbrmstr\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/catchpole-white.png?fit=2080%2C1482&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p23idr-3iA","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":12667,"url":"https:\/\/rud.is\/b\/2020\/03\/02\/make-wsj-esque-uber-tuesday-democrat-delegate-cartograms-in-r-with-catchpole\/","url_meta":{"origin":12684,"position":0},"title":"Make WSJ-esque \u00dcber Tuesday Democrat Delegate Cartograms in R with {catchpole}","author":"hrbrmstr","date":"2020-03-02","format":false,"excerpt":"For folks who are smart enough not to go near Twitter, I've been on a hiatus from the platform insofar as reading the Twitter feed goes. \"Why\" isn't the subject of this post so I won't go into it, but I've broken this half-NYE resolution on more than one occasion\u2026","rel":"","context":"In &quot;R&quot;","block_context":{"text":"R","link":"https:\/\/rud.is\/b\/category\/r\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/my-map-2.png?fit=1200%2C784&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/my-map-2.png?fit=1200%2C784&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/my-map-2.png?fit=1200%2C784&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/my-map-2.png?fit=1200%2C784&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/rud.is\/b\/wp-content\/uploads\/2020\/03\/my-map-2.png?fit=1200%2C784&ssl=1&resize=1050%2C600 
3x"},"classes":[]}],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12684","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/comments?post=12684"}],"version-history":[{"count":0,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/posts\/12684\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media\/12687"}],"wp:attachment":[{"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/media?parent=12684"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/categories?post=12684"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rud.is\/b\/wp-json\/wp\/v2\/tags?post=12684"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}