
Category Archives: Cybersecurity

I pen this mini-tome on “GDPR Enforcement Day”. The spirit of GDPR is great, but it’s just going to be another Potemkin Village in most organizations, much like PCI or SOX. For now, the only things GDPR has done are make GDPR consulting companies rich, increase the use of JavaScript on web sites so they can pop up useless banners we keep telling users not to click on, and increase the size of email messages to include mandatory postscripts (which should really be at the beginning of the message, but, hey, faux privacy is faux privacy).

Those are just a few of the “unintended consequences” of GDPR. Just like Let’s Encrypt & “HTTPS Everywhere” turned into “Let’s Enable Criminals and Hurt Real People With Successful Phishing Attacks”, GDPR is going to cause a great deal of downstream issues that either the designers never thought of or decided — in their infinite, superior wisdom — were completely acceptable to make themselves feel better.

Today’s installment of “GDPR Unintended Consequences” is WordPress.

WordPress “powers” a substantial part of the internet. As such, it is a perma-target of attackers.

Since the GDPR Intelligentsia provided a far-too-long lead-time on both the inaugural and mandated enforcement dates for GDPR and also created far more confusion with the regulations than clarity, WordPress owners are flocking to “single button install” solutions to make them magically GDPR compliant (#protip that’s not “a thing”). Here’s a short list of plugins and active installation counts (no links since I’m not going to encourage attack surface expansion):

  • WP GDPR Compliance : 50,000+ active installs
  • GDPR : 10,000+ active installs
  • The GDPR Framework : 6,000+ installs
  • GDPR Cookie Compliance : 10,000+ active installs
  • GDPR Cookie Consent : 200,000+ active installs
  • WP GDPR : 4,000 active installs
  • Cookiebot | GDPR Compliant Cookie Consent and Notice : 10,000+ active installations
  • GDPR Tools : 500+ active installs
  • Surbma — GDPR Proof Cookies : 400+ installs
  • Social Media Share Buttons & Social Sharing Icons (which “enhanced” GDPR compatibility) : 100,000+ active installs
  • iubenda Cookie Solution for GDPR : 10,000+ active installs
  • Cookie Consent : 100,000+ active installs

I’m somewhat confident that a fraction of those publishers follow secure coding guidelines (it may be a small fraction). But, if I were an attacker, I’d be poking pretty hard at a few of those with six-figure installs to see if I could find a usable exploit.

GDPR just gave attackers a huge footprint of homogeneous resources to attempt at-scale exploits. They will very likely succeed (over-and-over-and-over again). This means that GDPR just increased the likelihood of losing your data privacy…the complete opposite of the intent of the regulation.

There are more unintended consequences and I’ll pepper the blog with them as the year and pain progresses.

RIPE 76 is going on this week and — as usual — there are scads of great talks. The selected ones below are just my (slightly) thinner slice at what may have broader appeal outside pure networking circles.

Do not read anything more into the order than the end-number of the “Main URL” since this was auto-generated from a script that processed my Firefox tab URLs.

Artyom Gavrichenkov – Memcache Amplification DDoS: Lessons Learned

Erik Bais – Why Do We Still See Amplification DDOS Traffic

Jordi Palet Martinez – A New Internet Intro to HTTP/2, QUIC, DOH and DNS over QUIC

Sara Dickinson – DNS Privacy BCP

Jordi Palet Martinez – Email Servers on IPv6

Martin Winter – Real-Time BGP Toolkit: A New BGP Monitor Service

Job Snijders – Practical Data Sources For BGP Routing Security

Charles Eckel – Combining Open Source and Open Standards

Kostas Zorbadelos – Towards IPv6 Only: A large scale lw4o6 deployment (rfc7596) for broadband users @AS6799

Louis Poinsignon – Internet Noise (Announcing 1.1.1.0/24)

Filiz Yilmaz – Current Policy Topics – Global Policy Proposals

Geoff Huston – Measuring ATR

Moritz Muller, SIDN – DNSSEC Rollovers

Anand Buddhdev – DNS Status Report

Victoria Risk – A Survey on DNS Privacy

Baptiste Jonglez – High-Performance DNS over TCP

Sara Dickinson – Latest Measurements on DNS Privacy

Willem Toorop – Sunrise DNS-over-TLS! Sunset DNSSEC – Who Needs Reasons, When You’ve Got Heroes

Laurenz Wagner – A Modern Chatbot Approach for Accessing the RIPE Database

I apologize up-front for using bad words in this post.

Said bad words include “Facebook”, “Mark Zuckerberg” and many referrals to entities within the U.S. Government. Given the topic, it cannot be helped.

I’ve also left the R tag on this despite only showing some ggplot2 plots and Markdown tables. See the end of the post for how to get access to the code & data. R was used solely and extensively for the work behind the words.


This week Congress put on a show as they summoned the current Facebook CEO — Mark Zuckerberg — down to Washington, D.C. to demonstrate how little most of them know about how the modern internet and social networks actually work plus chest-thump to prove to their constituents they really and truly care about you.

These Congress-critters offered such proof in the guise of railing against Facebook for how they’ve handled your data. Note that I should really say our data since they do have an extensive profile database on me and most everyone else even if they’re not Facebook platform users (full disclosure: I do not have a Facebook account).

Ostensibly, this data-mishandling impacted your privacy. Most of the committee members wanted any constituent viewers to come away believing they and their fellow Congress-critters truly care about your privacy.

Fortunately, we have a few ways to measure this “caring” and the remainder of this post will explore how much members of the U.S. House and Senate care about your privacy when you visit their official .gov web sites. Future posts may explore campaign web sites and other metrics, but what better place to show they care about you than right there in their digital houses.

Privacy Primer

When you visit a web site with any browser, the main URL pulls in resources to aid in the composition and functionality of the page. These could be:

  • HTML (the main page is very likely HTML unless it’s just a media URL)
  • images (png, jpg, gif, “svg”, etc),
  • fonts
  • CSS (the “style sheet” that tells the browser how to decorate and position elements on the page)
  • binary objects (such as embedded PDF files or “protocol buffer” content)
  • XML or JSON
  • JavaScript

(plus some others)

When you go to, say, www.example.com the site does not have to load all the resources from example.com domains. In fact, it’s rare to find a modern site which does not use resources from one or more third party sites.

When each resource is loaded (generally) some information about you goes along for the ride. At a minimum, the request time and source (your) IP address are exposed and — unless you’re really careful/paranoid — the referring site, browser configuration and even cookies are available to the third party sites. It does not take many of these data points to (pretty much) uniquely identify you. And, this is just for “benign” content like images. We’ll get to JavaScript in a bit.
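If you want to see exactly what tags along, watching a single request from R makes it concrete (a tiny sketch using httr’s verbose output; example.com is just a placeholder URL):

library(httr)

# verbose() prints the outgoing request headers (->) and the response
# headers (<-) for a single fetch, so you can see what gets handed over
invisible(GET("https://example.com/", verbose()))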

As you move along the web, these third-party touch-points add up. To demonstrate this, I did my best to de-privatize my browser and OS configuration and visited 12 web sites while keeping a fresh install of Firefox Lightbeam running. Here’s the result:

Each main circle is a distinct/main site and the triangles are resources the site tried to load. The red triangles indicate a common third-party resource that was loaded by two or more sites. Each of those red triangles knows where you’ve been (again, unless you’ve been very careful/paranoid) and can use that information to enhance their knowledge about you.

It gets a bit worse with JavaScript content since a much stronger fingerprint can be created for you (you can learn more about fingerprints at this spiffy EFF site). Plus, JavaScript code can try to pilfer cookies, “hack” the browser, serve up malicious adverts, measure time-on-site, and even enlist you in a cryptomining army.

There are other issues with trusting loaded browser content, but we’ll cover that a bit further into the investigation.

Measuring “Caring”

The word “privacy” was used over 100 times each day by both Zuckerberg and our Congress-critters. Senators and House members made it pretty clear Facebook should care more about your privacy. Implicit in said posit is that they, themselves, must care about your privacy. I’m sure they’ll be glad to point out all along the midterm campaign trails just how much they’re doing to protect your privacy.

We don’t just have to take their word for it. After berating Facebook’s chief college dropout and chastising the largest social network on the planet, we can see just how much of “you” these representatives give to Facebook (and other sites) and also how much they protect you when you decide to pay them a digital visit.

For this metrics experiment, I built a crawler using R and my splashr package which, in turn, uses Scrapinghub’s open source Splash. Splash is an automation framework that lets you programmatically visit a site just like a human would with a real browser.

Normally when one scrapes content from the internet they’re just grabbing the plain, single HTML file that is at the target of a URL. Splash lets us behave like a browser and capture all the resources — images, CSS, fonts, JavaScript — the site loads and will also execute any JavaScript, so it will also capture resources each script may itself load.
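Here is roughly what that capture step can look like with splashr (a minimal sketch, not the actual crawler: it assumes a Splash instance running locally, uses a placeholder target URL, and assumes the captured HAR entries expose request$url in the usual list structure):

library(splashr)    # talks to a running Splash instance (e.g. via Docker)
library(urltools)   # domain() pulls the host out of each URL
library(tidyverse)

sp  <- splash("localhost")                          # assumes Splash is listening locally
har <- render_har(sp, "https://www.example.gov/")   # placeholder target page

# each HAR entry records one resource request; pull the URLs and tally
# the non-.gov hosts the page touched
tibble(url = map_chr(har$log$entries, ~.x$request$url)) %>%
  mutate(host = domain(url)) %>%
  filter(!grepl("\\.gov$", host)) %>%
  count(host, sort = TRUE)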

By capturing the entire browser experience for the main page of each member of Congress we can get a pretty good idea of just how much each one cares about your digital privacy, and just how much they secretly love Facebook.

Let’s take a look, first, at where you go when you digitally visit a Congress-critter.

Network/Hosting/DNS

Each House and Senate member has an official (not campaign) site that is hosted on a .gov domain and served up from a handful of IP addresses across the following (n is the number of Congress-critter web sites):

ASN       AS Organization                        n (sites)
AS5511    Orange                                 425
AS7016    Comcast Cable Communications, LLC      95
AS20940   Akamai International B.V.              13
AS1999    U.S. House of Representatives          6
AS7843    Time Warner Cable Internet LLC         1
AS16625   Akamai Technologies, Inc.              1

“Orange” is really Akamai and Akamai is a giant content delivery network which helps web sites efficiently provide content to your browser and can offer Denial of Service (DoS) protection. Most sites are behind Akamai, which means you “touch” Akamai every time you visit the site. They know you were there, but I know a sufficient body of folks who work at Akamai and I’m fairly certain they’re not too evil. Virtually no representative solely uses House/Senate infrastructure, but this is almost a necessity given how easy it is to take down a site with a DoS attack and how polarized politics is in America.
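If you want to reproduce that ASN mapping yourself, resolving a site and handing the addresses to Team Cymru’s lookup service is enough (a quick sketch using the curl and cymruservices packages; www.house.gov is just an illustrative target):

library(curl)           # nslookup()
library(cymruservices)  # bulk_origin() queries Team Cymru's IP-to-ASN service

ips <- nslookup("www.house.gov", multiple = TRUE)  # illustrative host
bulk_origin(ips)  # ASN, BGP prefix, country & AS name for each address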

To get to those IP addresses, a DNS name like www.king.senate.gov (the site of one of the Senators from my state) needs to be translated to an IP address. DNS queries are also data gold mines and everyone from your ISP to the DNS server that knows the name-to-IP mapping likely sees your IP address. Here are the DNS servers that serve up the directory lookups for all of the House and Senate domains:

nameserver gov_hosted
e4776.g.akamaiedge.net. FALSE
wc.house.gov.edgekey.net. FALSE
e509.b.akamaiedge.net. FALSE
evsan2.senate.gov.edgekey.net. FALSE
e485.b.akamaiedge.net. FALSE
evsan1.senate.gov.edgekey.net. FALSE
e483.g.akamaiedge.net. FALSE
evsan3.senate.gov.edgekey.net. FALSE
wwwhdv1.house.gov. TRUE
firesideweb02cc.house.gov. TRUE
firesideweb01cc.house.gov. TRUE
firesideweb03cc.house.gov. TRUE
dchouse01cc.house.gov. TRUE
c3pocc.house.gov. TRUE
ceweb.house.gov. TRUE
wwwd2-cdn.house.gov. TRUE
45press.house.gov. TRUE
gopweb1a.house.gov. TRUE
eleven11web.house.gov. TRUE
frontierweb.house.gov. TRUE
primitivesocialweb.house.gov. TRUE

Akamai kinda does need to serve up DNS for the sites they host, so this list also makes sense. But, you’ve now had two touch-points logged and we haven’t even loaded a single web page yet.

Safe? & Secure? Connections

When we finally make a connection to a Congress-critter’s site, it is going to be over SSL/TLS. They all support it (which is a good thing, but SSL/TLS confidentiality is not as bullet-proof as many “HTTPS Everywhere” proponents would like to con you into believing). However, I took a look at the SSL certificates for House and Senate sites. Here’s a sampling from, again, my state (one House representative):

The *.house.gov “Common Name (CN)” is a wildcard certificate. Many SSL certificates have just one valid CN, but it’s also possible to list alternate, valid “alt” names that can all use the same, single certificate. Wildcard certificates ease the burden of administration, but they also mean that if, say, I managed to get my hands on the certificate chain and private key file, I could set up vladimirputin.house.gov somewhere and your browser would think it’s A-OK. Granted, there are far more Representatives than there are Senators and their tenure length is pretty erratic these days, so I can sort of forgive them for taking the easy route, but I also in no way, shape or form believe they protect those chains and private keys well.
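You can pull and inspect the presented certificates yourself from R (a small sketch using the openssl package; it assumes the leaf certificate is the first one in the returned chain):

library(openssl)

# fetch the certificate chain presented by one House site
chain <- download_ssl_cert("www.house.gov", 443)
leaf  <- as.list(chain[[1]])   # the site (leaf) certificate

leaf$subject     # shows the wildcard CN (*.house.gov)
leaf$alt_names   # any subject alternative names embedded in the certificate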

In contrast, the Senate can and does embed the alt-names:

Are We There Yet?

We’ve got the IP address of the site and established a “secure” connection. Now it’s time to grab the index page and all the rest of the resources that come along for the ride. As noted in the Privacy Primer (above), the loading of third-party resources is problematic from a privacy (and security) perspective. Just how many third party resources do House and Senate member sites rely on?

To figure that out, I tallied up all of the non-.gov resources loaded by each web site and plotted the distribution of House and Senate (separately) in a “beeswarm” plot with a boxplot shadowing underneath so you can make out the pertinent quantiles:
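The plot itself is straightforward to rebuild once you have the per-site tallies (a sketch using ggbeeswarm; third_party is a hypothetical data frame with one row per member site):

library(tidyverse)
library(ggbeeswarm)   # geom_quasirandom() supplies the "beeswarm" layer

# third_party: hypothetical data frame with columns
#   chamber ("House"/"Senate") and n_third_party (# of non-.gov resources loaded)
ggplot(third_party, aes(chamber, n_third_party)) +
  geom_boxplot(width = 0.2, outlier.shape = NA, alpha = 0.4) +
  geom_quasirandom(size = 1, alpha = 0.6) +
  labs(x = NULL, y = "# of non-.gov resources loaded per site")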

As noted, the median is around 30 for both House and Senate member sites. In other words, they value your browsing privacy so little that most Congress-critters gladly share your browser session with many other sites.

We also talked about confidentiality above. If an https site loads http resources, the contents of what you see on the page cannot be guaranteed. So, how responsible are they when it comes to at least ensuring these third-party resources are loaded over https?
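One way to answer that from crawl data is to tally resource URLs by scheme (again a sketch against a hypothetical resources data frame, one row per loaded third-party resource, with chamber and url columns):

library(tidyverse)

# resources: hypothetical data frame with columns
#   chamber ("House"/"Senate") and url (the full resource URL)
resources %>%
  mutate(scheme = if_else(startsWith(url, "https://"), "https", "http")) %>%
  count(chamber, scheme) %>%
  group_by(chamber) %>%
  mutate(pct = scales::percent(n / sum(n)))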

You’re mostly covered from a pseudo-confidentiality perspective, but what are they serving up to you? Here’s a summary of the MIME types being delivered to you:

MIME Type Number of Resources Loaded
image/jpeg 6,445
image/png 3,512
text/html 2,850
text/css 1,830
image/gif 1,518
text/javascript 1,512
font/ttf 1,266
video/mp4 974
application/json 673
application/javascript 670
application/x-javascript 353
application/octet-stream 187
application/font-woff2 99
image/bmp 44
image/svg+xml 39
text/plain 33
application/xml 15
image/jpeg, video/mp2t 12
application/x-protobuf 9
binary/octet-stream 5
font/woff 4
image/jpg 4
application/font-woff 2
application/vnd.google.gdata.error+xml 1

We’ll cover some of these in more detail a bit further into the post.

Facebook & “Friends”

Facebook started all this, so just how cozy are these Congress-critters with Facebook?

Turns out that both Senators and House members are very comfortable letting you give Facebook a love-tap when you come visit their sites since over 60% of House and 40% of Senate sites use 2 or more Facebook resources. Not all Facebook resources are created equal[ly evil] and we’ll look at some of the more invasive ones soon.

Facebook is not the only devil out there. I added in the public filter list from Disconnect and the numbers go up from 60% to 70% for the House and from 40% to 60% for the Senate when it comes to a larger corpus of known tracking sites/resources.

Here’s a list of some (first 20) of the top domains (with one of Twitter’s media-serving domains taking the individual top-spot):

Main third-party domain # of ‘pings’ %
twimg.com 764 13.7%
fbcdn.net 655 11.8%
twitter.com 573 10.3%
google-analytics.com 489 8.8%
doubleclick.net 462 8.3%
facebook.com 451 8.1%
gstatic.com 385 6.9%
fonts.googleapis.com 270 4.9%
youtube.com 246 4.4%
google.com 183 3.3%
maps.googleapis.com 144 2.6%
webtrendslive.com 95 1.7%
instagram.com 75 1.3%
bootstrapcdn.com 68 1.2%
cdninstagram.com 63 1.1%
fonts.net 51 0.9%
ajax.googleapis.com 50 0.9%
staticflickr.com 34 0.6%
translate.googleapis.com 34 0.6%
sharethis.com 32 0.6%

So, when you go to check out what your representative is ‘officially’ up to, you’re being served…up on a silver platter to a plethora of sites where you are the product.

It’s starting to look like Congress-folk aren’t as sincere about your privacy as they may have led us all to believe this week.

A [Java]Script for Success[ful Privacy Destruction]

As stated earlier, not all third-party content is created equally malicious. JavaScript resources run code in your browser on your device and while there are limits to what it can do, those limits diminish weekly as crafty coders figure out more ways to use JavaScript to collect information and perform shady or malicious deeds.

So, how many House/Senate sites load one or more third-party JavaScript resources?

Virtually all of them.

To make matters worse, no .gov or third-party resource of any kind was loaded using subresource integrity validation. Subresource integrity validation means that the site owner — at some point — ensured that the resource being loaded was not malicious and then created a fingerprint for it and told your browser what that fingerprint is so it can compare it to what got loaded. If the fingerprints don’t match, the content is not loaded/executed. Using subresource integrity is not trivial since it requires a top-notch content management team and failure to synchronize/checkpoint third-party content fingerprints will result in resources failing to load.
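For reference, the integrity value is just a base64-encoded digest of the resource, which is easy enough to generate (a sketch with the curl and openssl packages; the script URL is a placeholder):

library(curl)
library(openssl)

# fetch the exact bytes of the third-party script you intend to reference
res <- curl_fetch_memory("https://example.com/widget.js")   # placeholder URL

# sha384 digest, base64-encoded, in the form browsers expect in the
# integrity="" attribute of a <script> or <link> tag
paste0("sha384-", base64_encode(sha384(res$content)))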

Congress was quick to demand that Facebook implement stronger policies and controls, but they, themselves, cannot be bothered.

Future Work

There are plenty more avenues to explore in this data set (such as “security headers” — they all 100% use strict-transport-security pretty well, but are deeply deficient in others) and more targets for future works, such as the campaign sites of House and Senate members. I may follow up with a look at a specific slice from this data set (the members of the committees who were berating Zuckerberg this week).
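For the curious, eyeballing those headers for any one site is quick with httr (an illustrative peek, not the tallying code behind the claim above):

library(httr)

# peek at the security-related response headers for one member site
# (www.king.senate.gov is the Senate site mentioned earlier)
hdrs <- headers(HEAD("https://www.king.senate.gov/"))

hdrs[["strict-transport-security"]]
hdrs[["content-security-policy"]]
hdrs[["x-frame-options"]]
hdrs[["x-content-type-options"]]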

The bottom line is that while the beating Facebook took this week was just, those inflicting the pain have a long way to go themselves before they can truly judge what other social media and general internet sites do when it comes to ensuring the safety and privacy of their visitors.

In other words: “Legislator, regulate thyself” before regulating others.

FIN

Apart from some egregiously bad (or benign) examples, I tried not to “name and shame”. I also won’t answer any questions about facets by party since that really doesn’t matter too much as they’re all pretty bad when it comes to understanding and implementing privacy and safety on their sites.

The data set can be found over at Zenodo (alternately, click/tap/select the badge below). I converted the R data frame to ndjson/streaming JSON/jsonlines (however you refer to the format) and tested it out in Apache Drill.

I’ll toss up some R code using data extracts later this week (meaning by April 20th).

DOI

The 2018 IEEE Security & Privacy Conference is in May, but they’ve posted their full proceedings and it’s better to grab them early than to wait for them to become part of a paid journal offering.

There are a lot of papers. Not all match my interests but (fortunately?) many did and I’ve filtered down a list of the more interesting (to me) ones. It’s encouraging to see academic cybersecurity researchers branching out across a whole host of areas.

I can’t promise a “the morning paper”-esque daily treatment of these on the blog, but I’ll likely exposit a few of them over the coming weeks. I’ve emoji’d a few that stood out. The order is simply the order I read them in (no other meaning to it).

What’s Up?

The NPR Visuals Team created and maintains a JavaScript library, pym.js, that makes it super easy to embed iframes on web pages and have said embeds still be responsive.

The widgetframe R htmlwidget uses pym.js to bring this (much needed) functionality into widgets and (eventually) shiny apps.

NPR reported a critical vulnerability in this library on February 15th, 2018 with no details (said details will be coming next week).

Per NPR’s guidance, any production code using pym.js needs to be pulled or updated to use this new library.

I created an issue & pushed up a PR that incorporates the new version. NOTE that the YAML config file in the existing CRAN package and GitHub dev version incorrectly has 1.3.2 as the version (it’s really the 1.3.1 dev version).

A look at the diff suggests that the library was not performing URL sanitization (and now is).

Watch Out For Standalone Docs

Any R markdown docs compiled in “standalone” mode will need to be recompiled and re-published as the vulnerable pym.js library comes along for the ride in those documents.

Regardless of “standalone mode”, if you used widgetframe in any context, anything created with it is vulnerable.
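If you are not sure where widgetframe output might be lurking, a quick grep over your rendered HTML will surface anything that references pym.js (a rough sketch; the output path is hypothetical):

library(purrr)

# ~/blog/public is a hypothetical output directory; point this at wherever your
# rendered HTML (blogdown/bookdown/rmarkdown output, RPubs sources, etc.) lives
html_files <- list.files("~/blog/public", pattern = "\\.html?$",
                         recursive = TRUE, full.names = TRUE)

# flag any file that mentions pym.js
keep(html_files, ~any(grepl("pym(\\.v1)?(\\.min)?\\.js", readLines(.x, warn = FALSE))))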

FIN

Once the final details are released I’ll update this post and may do a new post. Until then:

  • check if you’ve used widgetframe (directly or indirectly)
  • REMOVE ALL VULNERABLE DOCS from RPubs, GitHub pages, your web site (etc) today
  • regenerate all standalone documents ASAP
  • regenerate your blogs, books, dashboards, etc ASAP with the patched code; DO THIS FOR INTERNAL as well as internet-facing content.
  • monitor this space

NOTE: This is mainly for those of us in the Colonies, but some tips apply globally.

Black Friday / Cyber Monday / Cyber November / Holiday Shopping is upon us. You’re going to buy stuff. You’re going to use digital transactions to do so. Here are some tips in a semi-coherent order:

  • Sign up for a “reputable” credit card (is there such a thing? FinServs are pretty evil) with a low interest rate/cash back, multi-factor authentication on their web/app, and limits on both total credit and per-transaction amounts. This card is just for shopping. Pay for petrol and groceries with something else.
  • Assign that to your PayPal, Amazon, Apple Pay, et al accounts and keep that as your only physical & digital card for your shopping sprees until the season ends.
  • Setup multi-factor auth on PayPal, Amazon, Apple Pay and anywhere else you shop. Don’t shop where you can’t do this.
  • Use Amazon or a site that accepts PayPal, Apple Pay, or Amazon payments. Yes, all those orgs are evil. But they do a better job than most when it comes to account security.
  • Use Quantum Firefox or the latest Chrome Betas to shop online. Nothing else. Check for updates daily & apply when they are out.
  • Double-check URLs when shopping. Make sure you’re on the site you want to be on. Let’s Encrypt made it super easy for attackers to pwn you this season. You can afford an extra 5 minutes since that’ll save you years battling identity theft or account bankruptcy.
  • Type all URLs into Google’s safety net — https://transparencyreport.google.com/safe-browsing/search — if at all possible before even considering trusting them.
  • Don’t use any storefront that uses a Let’s Encrypt certificate. Any.
  • Never let sites store your credit card or bank info.
  • Never shop on a site that has any errors associated with their SSL/TLS certificates. Let’s Encrypt killed the integrity of the lock icon, and well-resourced adversaries can thwart the encryption, but the opportunistic attackers likely to try to pwn you are going to be stopped.
  • Avoid shopping with Apps. App developers are generally daft and have wretched security practices baked into their apps.
  • Use “Private Browsing” mode to shop if at all possible and start new browser sessions per-site. Your shopping habits and purchase info are as valuable as (or even more valuable than) your card digits, esp to trackers.
  • Use Ublock Origin or other reputable ad-blockers and tracking blockers to prevent orgs from tracking you as you shop. A good hosts file wouldn’t hurt, either.
  • Use Quad9 as your DNS provider starting now.
  • Never shop online from public Wi-Fi.
  • Don’t shop online from your company’s network (even the “guest” network). They track you. They all do or at least send data (whether they know it or not) to security appliance and “cloud” services that will use it against you or to profit off of you.
  • Absolutely do not use a store’s Wi-Fi to shop.
  • If using Amazon, avoid third-party sellers if at all possible. Scammers abound.
  • Never use social networks to share what you just purchased.
  • Never “SQUEEE” on social media that any shipments are “arriving today” and you’re “so excited!”.
  • Don’t use that daft, new Amazon video-delivery-bluetooth-alexa lock thing. Ever.
  • If you can afford it, use an in-home (not cloud-based) security camera pointed at the place where deliveries come and review the footage daily if you are expecting deliveries.
  • In-person/brick-and-mortar shopping should be done at chip+pin establishments or use cash at all others.
  • Review your day’s purchases online at the end of the day or the next morning.
  • Report all issues immediately to authorities then the establishments.

Why this particular slice of advice?

The U.S. moved to chip & signature in October of 2016. This has forced attackers to find different, creative ways to get your credit card info. Yes, there were scads of breaches this year, but a good chunk of digital crime is plain ol’ theft. Web sites make great targets. Public Wi-Fi makes a great target. You need to protect yourself since no store, org, bank, politician or authority really cares that your identity was stolen. If they did, we wouldn’t be in the breach mess we’re in now.

Attackers know you’re in deep “breach fatigue” and figure you’re all in a “Meh. Nothing matters” mood. Don’t be pwnd! A wrong move could put you in identity theft limbo for years.

The Identity Theft Resource Center — http://www.idtheftcenter.org/ — is a great resource and can definitely help you in the right direction if you don’t follow the above advice and run into issues.

Stay safe this shopping season!

insert(post, "{ 'standard_disclaimer' : 'My opinion, not my employer\'s' }")

This is a post about the fictional company FredCo. If the context or details presented by the post seem familiar, it’s purely coincidental. This is, again, a fictional story.

Let’s say FredCo had a pretty big breach that (fictionally) garnered media, Twitterverse, tech-world and Government-level attention and that we have some spurious details that let us sit back in our armchairs to opine about. What might have helped create the debacle at FredCo?

Despite (fictional) endless mainstream media coverage and a good chunk of ‘on background’ infosec-media clandestine blatherings we know very little about the breach itself (though it’s been fictionally, officially blamed on failure to patch Apache Struts). We know even less (fictionally officially) about the internal reach of the breach (apart from the limited consumer impact official disclosures). We know even less than that (fictionally officially) about how FredCo operates internally (process-wise).

But, I’ve (fictionally) seen:

  • a detailed breakdown of the number of domains, subdomains, and hosts FredCo “manages”.
  • the open port/service configurations of the public components of those domains
  • public information from individuals who are more willing to (fictionally) violate the CFAA than I am to get more than just port configuration information
  • a 2012/3 SAS 1 Type II report about FredCo controls
  • testimonies from FredCo execs regarding the efficacy of $SECURITY_TECHNOLOGY and 3 videos purporting to be indicative of expert opining on how to use BIIGG DATERZ to achieve cybersecurity success
  • the board & management structure + senior management bonus structures, complete with incentive-based objectives they were graded on

so, I’m going to blather a bit about how this fictional event should finally tear down the Potemkin village that is the combination of the Regulatory+Audit Industrial Complex and the Cybersecurity Industrial Complex.

“Tear down” in the sense that the goal is to help individuals understand that a significant portion of the organizations you entrust with your data are not incentivized or equipped to protect it, that these same conditions exist in more critical areas — such as transportation, health care, and critical infrastructure — and that you should expect a failure on the scale of FredCo (only with real, harmful impact) if nothing ends up changing soon.

From the top

There is boilerplate mention of “security” in the objectives of the senior executives in the 2015 & 2016 14A filings:

  • CEO: “Employing advanced analytics and technology to help drive client growth, security, efficiency and profitability.”
  • CFO: “Continuing to advance and execute global enterprise risk management processes, including directing increased investment in data security, disaster recovery and regulatory compliance capabilities.”
  • CLO: “Continuing to refine and build out the Company’s global security organization.”
  • President, Workforce Solutions: None
  • CHRO: None
  • President – US Information Services: None

You’ll be happy to know that they all received either “Distinguished” or “Exceeds” on their appraisals and received a multiplier of their bonus & compensation targets as a result.

Furthermore, there is no one in the make-up of FredCo’s board of directors who has shown an interest or specialization in cybersecurity.

From the camera-positioned 50-yard line on instant replay, the board and shareholders of FredCo did not think protection of your identity and extremely personal information was important enough to include in three top executives’ directives and performance measures, and it was given little more than boilerplate mention for the others. Investigators who look into FredCo’s breach should dig deep into the last decade of the detailed measures for these objectives. I have first-hand experience with how these types of HR processes are managed in large orgs, which is why I’m encouraging this area for investigation.

“Security” is a terrible term, but it only works when it is an emergent property of the business processes of an organization. That means it must be contextual for every worker. Some colleagues suggest individual workers should not have to care about cybersecurity when making decisions or doing work, but even minimum-wage retail and grocery store clerks are educated about shoplifting risks and are given tools, tips and techniques to prevent loss. When your HR organization is not incentivized to help create and maintain a cybersecurity-aware culture from the top, you’re going to have problems, and when there are no cybersecurity-oriented targets for the CIO or even business process owners, don’t expect your holey screen door to keep out predators.

Awwwdit, Part I

NOTE: I’m not calling out any particular audit organization as I’ve only seen one fictional official report.

The Regulatory+Audit Industrial Complex is a lucrative business cabal. Governments and large business meta-agencies create structures where processes can be measured, verified and given a big green ✅. This validation exercise is generally done in one or more ways:

  • simple questionnaire, very high level questions, no veracity validation
  • more detailed questionnaire, mid-level questions, usually some in-person lightweight checking
  • detailed questionnaire, but with topics that can be sliced-and-diced by the legal+technical professions to mean literally anything, measured in-person by (usually) extremely junior reviewers with little-to-no domain expertise who follow review playbooks, get overwhelmed with log entries and scope-refinement+reduction and who end up being steered towards “important” but non-material findings

Sure, there are good audits and good auditors, but I will posit they are the rare diamonds in a bucket of zirconia.

We need to cover some technical ground before covering this further, though.

Shocking Struts

We’ll take the stated breach cause at face-value: failure to patch a remotely accessible vulnerability in Apache Struts. This was presented as the singular issue enabling attackers to walk (with crutches) away with scads of identity-theft-enabling personal data, administrator passwords, database passwords, and the recipe for the winning entry in the macaroni salad competition at last year’s HR annual picnic. Who knew one Java library had so much power!

We don’t know the architecture of all the web apps at FredCo. However, your security posture should not be a Jenga game tower, easily destroyed by removing one peg. These are all (generally) components of externally-facing applications at the scale of FredCo:

  • routers
  • switches
  • firewalls
  • load balancers
  • operating systems
  • application servers
  • middleware servers
  • database servers
  • customized code

These are mimicked (to varying levels of efficacy) across:

  • development
  • test
  • staging
  • production

environments.

They may coexist (in various layers of the network) with:

  • HR systems
  • Finance systems
  • Intranet servers
  • Active Directory
  • General user workstations
  • Executive workstations
  • Developer workstations
  • Mobile devices
  • Remote access infrastructure (i.e. VPNs)

A properly incentivized organization ensures there are logical and physical separation between/isolation of “stuff that matters” and that varying levels of authentication & authorization are applied to ensure access is restricted.

Keeping all that “secure” requires:

  • managing thousands of devices (servers, network components, laptops, desktops, mobile devices)
  • managing thousands of identities
  • managing thousands of configurations across systems, networks and devices
  • managing hundreds to thousands of connections between internal and external networks
  • managing thousands of rules
  • managing thousands of vulnerabilities (as they become known)
  • managing a secure development life cycle across hundreds or thousands of applications

Remember, though, that FredCo ostensibly managed all of that well and the data loss was solely due to one Java library.

If your executives (all of them) and workers (all of them) are not incentivized with that list in mind, you will have problems, but let’s talk about the security challenges back in the context of the audit role.

Awwwdit, Part II

The post is already long, so we’ll make this quick.

If I dropped you off — yes, you, because you’re likely as capable as the auditors mentioned in the previous section on audit — into that environment once a year, do you think you’d be able to ferret out issues based on convoluted network diagrams, poorly documented firewall rules and source code, and non-standard checklists of user access management processes?

Let’s say I dropped you in months before the known Struts vulnerability and re-answer the question.

The burden placed on internal and — especially — external auditors is great and they are pretty much set up for failure from engagement number one.

Couple IT complexity with the fact that many orgs like FredCo aren’t required to do more than ensure financial reporting processes are ✅.

But, even if there were more technical, security-oriented audits performed, you’d likely have ten different report findings by as many firms or auditors, especially if they were point-in-time audits. Furthermore, FredCo has had decades of point-in-time audits by hundreds of auditors and dozens of firms. The conditions of the breach were likely not net-new, so how did decades of systemic IT failures go unnoticed by this cabal?

IT audit functions are a multi-billion dollar business. FredCo is partially the result of the built-in cracks in the way verification is performed in orgs. In other words, I posit the Regulatory+Audit Industrial Complex bears some of the responsibility for FredCo’s breach.

Divisive Devices

From the (now removed) testimonials & videos, it was clear there may have been a “blinky light” problem in the mindset of those responsible for cybersecurity at FredCo. Relying solely on the capabilities of one or more devices (they are usually appliances with blinky lights) and thinking that storing petabytes of log data is going to stop “bad guys” is a great recipe for a breach parfait.

But, the Cybersecurity Industrial Complex continues to dole out LED-laden boxes with the fervor of a U.S. doctor handing out opioids. Sure, they are just giving orgs what they want, but it doesn’t make it responsible behaviour. Just like the opioid problem, the “device” issue is likely causing cyber-sickness in more organizations than you’d like to admit. You may even know someone who works at an org with a box addiction.

I posit the Cybersecurity Industrial Complex bears some of the responsibility for FredCo’s breach, especially when you consider the hundreds of marketing e-mails I’ve seen post-FredCo breach telling me how CyberBox XJ9-11 would have stopped FredCo’s attackers cold.

A Matter of Trust

If removing a Struts peg from FredCo’s IT Jenga board caused the fictional tower to crash:

  • What do you think the B2B infrastructure looks like?
  • How do you think endpoints are managed?
  • What isolation, segmentation and access controls really exist?
  • How effective do you think their security awareness program is?
  • How many apps are architected & managed as poorly as the breached one?
  • How many shadow IT deployments exist in the ☁️ with your data in it?
  • How can you trust FredCo with anything of importance?

Fictional FIN

In this fictional world I’ve created, one ending is:

  • all B2B connections to FredCo have been severed
  • lawyers at a thousand firms are working on language for filings to cancel all B2B contracts with FredCo
  • FredCo was de-listed from exchanges
  • FredCo executives are defending against a slew of criminal and civil charges
  • The U.S. Congress and U.K. Parliament have come together to undertake a joint review of regulatory and audit practices spanning both countries (since it impacted both countries and the Reg+Audit cabal spans both countries they decided to save time and money) resulting in sweeping changes
  • The SEC has mandated detailed cybersecurity objectives be placed on all senior management executives at all public companies and have forced results of those objectives assessments to be part of a new filing requirement.
  • The SEC has also mandated that at least one voting board member of public companies must have demonstrated experience with cybersecurity
  • The FTC creates and enforces standards on cybersecurity product advertising practices
  • You have understood that nobody has your back when it comes to managing your sensitive, personal data and that you must become an active participant in helping to ensure your elected representatives hold all organizations accountable when it comes to taking their responsibilities seriously.

but, another is:

  • FredCo’s stock bounces back
  • FredCo loses no business partners
  • FredCo’s current & former execs faced no civil or criminal charges
  • Congress makes a bit of opportunistic, temporary bluster for the sake of 2018 elections but doesn’t do anything more than berate FredCo publicly
  • You’re so tired of all these breaches and data loss that you go back to playing “Clash of Clans” on your mobile phone and do nothing.

I was about to embark on setting up a background task to sift through R package PDFs for traces of functions that “omit NA values” as a surprise present for Colin Fay and Sir Tierney, when I got distracted by a PDF in the CRAN doc/contrib directory: Short-refcard.pdf. I’m not a big reference card user but students really like them and after seeing what it was I remembered having seen the document ages ago, but never associated it with CRAN before.

I saw:

by Tom Short, EPRI PEAC, tshort@epri-peac.com 2004-11-07 Granted to the public domain. See www. Rpad. org for the source and latest version. Includes material from R for Beginners by Emmanuel Paradis (with permission).

at the top of the card. The link (which I’ve made unclickable for reasons you’ll see in a sec — don’t visit that URL) was clickable and I tapped it as I wanted to see if it had changed since 2004.

You can open that image in a new tab to see the full, rendered site and take a moment to see if you can find the section that links to objectionable — and, potentially malicious — content. It’s easy to spot.

I made a likely correct assumption that Tom Short had nothing to do with this and wanted to dig into it a bit further. So, don your bestest deerstalker and follow along as we see when this may have happened.

Digging In Domain Land

We’ll need some helpers to poke around this data in a safe manner:

library(wayback) # devtools::install_github("hrbrmstr/wayback")
library(ggTimeSeries) # devtools::install_github("AtherEnergy/ggTimeSeries")
library(splashr) # devtools::install_github("hrbrmstr/splashr")
library(passivetotal) # devtools::install_github("hrbrmstr/passivetotal")
library(cymruservices)
library(magick)
library(hrbrthemes) # theme_ipsum_rc() used in the plots below
library(tidyverse)

(You’ll need to get a RiskIQ PassiveTotal key to use those functions. Also, please donate to Archive.org if you use the wayback package.)

Now, let’s see if the main Rpad content URL is in the wayback machine:

glimpse(archive_available("http://www.rpad.org/Rpad/"))
## Observations: 1
## Variables: 5
## $ url        <chr> "http://www.rpad.org/Rpad/"
## $ available  <lgl> TRUE
## $ closet_url <chr> "http://web.archive.org/web/20170813053454/http://ww...
## $ timestamp  <dttm> 2017-08-13
## $ status     <chr> "200"

It is! Let’s see how many versions of it are in the archive:

x <- cdx_basic_query("http://www.rpad.org/Rpad/")  # all Wayback captures of the URL

ts_range <- range(x$timestamp)

# simple timeline: one segment per capture timestamp
count(x, timestamp) %>%
  ggplot(aes(timestamp, n)) +
  geom_segment(aes(xend=timestamp, yend=0)) +
  labs(x=NULL, y="# changes in year", title="rpad.org Wayback Change Timeline") +
  theme_ipsum_rc(grid="Y")

# calendar heatmap of captures per day, faceted by year
count(x, timestamp) %>%
  mutate(Year = lubridate::year(timestamp)) %>%
  complete(timestamp=seq(ts_range[1], ts_range[2], "1 day"))  %>%
  filter(!is.na(timestamp), !is.na(Year)) %>%
  ggplot(aes(date = timestamp, fill = n)) +
  stat_calendar_heatmap() +
  viridis::scale_fill_viridis(na.value="white", option = "magma") +
  facet_wrap(~Year, ncol=1) +
  labs(x=NULL, y=NULL, title="rpad.org Wayback Change Timeline") +
  theme_ipsum_rc(grid="") +
  theme(axis.text=element_blank()) +
  theme(panel.spacing = grid::unit(0.5, "lines"))

There’s a big span between 2008/9 and 2016/17. Let’s poke around there a bit. First 2016:

tm <- get_timemap("http://www.rpad.org/Rpad/")

(rurl <- filter(tm, lubridate::year(anytime::anydate(datetime)) == 2016))
## # A tibble: 1 x 5
##       rel                                                                   link  type
##     <chr>                                                                  <chr> <chr>
## 1 memento http://web.archive.org/web/20160629104907/http://www.rpad.org:80/Rpad/  <NA>
## # ... with 2 more variables: from <chr>, datetime <chr>

(p2016 <- render_png(url = rurl$link))

Hrm. Could be server or network errors.

Let’s go back to 2009.

(rurl <- filter(tm, lubridate::year(anytime::anydate(datetime)) == 2009))
## # A tibble: 4 x 5
##       rel                                                                  link  type
##     <chr>                                                                 <chr> <chr>
## 1 memento     http://web.archive.org/web/20090219192601/http://rpad.org:80/Rpad  <NA>
## 2 memento http://web.archive.org/web/20090322163146/http://www.rpad.org:80/Rpad  <NA>
## 3 memento http://web.archive.org/web/20090422082321/http://www.rpad.org:80/Rpad  <NA>
## 4 memento http://web.archive.org/web/20090524155658/http://www.rpad.org:80/Rpad  <NA>
## # ... with 2 more variables: from <chr>, datetime <chr>

(p2009 <- render_png(url = rurl$link[4]))

If you poke around that, it looks like the original Rpad content, so it was “safe” back then.

(rurl <- filter(tm, lubridate::year(anytime::anydate(datetime)) == 2017))
## # A tibble: 6 x 5
##       rel                                                                link  type
##     <chr>                                                               <chr> <chr>
## 1 memento  http://web.archive.org/web/20170323222705/http://www.rpad.org/Rpad  <NA>
## 2 memento http://web.archive.org/web/20170331042213/http://www.rpad.org/Rpad/  <NA>
## 3 memento http://web.archive.org/web/20170412070515/http://www.rpad.org/Rpad/  <NA>
## 4 memento http://web.archive.org/web/20170518023345/http://www.rpad.org/Rpad/  <NA>
## 5 memento http://web.archive.org/web/20170702130918/http://www.rpad.org/Rpad/  <NA>
## 6 memento http://web.archive.org/web/20170813053454/http://www.rpad.org/Rpad/  <NA>
## # ... with 2 more variables: from <chr>, datetime <chr>

(p2017 <- render_png(url = rurl$link[1]))

I won’t break your browser and add another giant image, but that one has the icky content. So, it’s a relatively recent takeover and it’s likely that whoever added the icky content links did so to try to ensure those domains and URLs have both good SEO and a positive reputation.

Let’s see if they were dumb enough to make their info public:

rwho <- passive_whois("rpad.org")
str(rwho, 1)
## List of 18
##  $ registryUpdatedAt: chr "2016-10-05"
##  $ admin            :List of 10
##  $ domain           : chr "rpad.org"
##  $ registrant       :List of 10
##  $ telephone        : chr "5078365503"
##  $ organization     : chr "WhoisGuard, Inc."
##  $ billing          : Named list()
##  $ lastLoadedAt     : chr "2017-03-14"
##  $ nameServers      : chr [1:2] "ns-1147.awsdns-15.org" "ns-781.awsdns-33.net"
##  $ whoisServer      : chr "whois.publicinterestregistry.net"
##  $ registered       : chr "2004-06-15"
##  $ contactEmail     : chr "411233718f2a4cad96274be88d39e804.protect@whoisguard.com"
##  $ name             : chr "WhoisGuard Protected"
##  $ expiresAt        : chr "2018-06-15"
##  $ registrar        : chr "eNom, Inc."
##  $ compact          :List of 10
##  $ zone             : Named list()
##  $ tech             :List of 10

Nope. #sigh

Is this site considered “malicious”?

(rclass <- passive_classification("rpad.org"))
## $everCompromised
## NULL

Nope. #sigh

What’s the hosting history for the site?

rdns <- passive_dns("rpad.org")             # passive DNS resolution history for the domain
rorig <- bulk_origin(rdns$results$resolve)  # map each resolved IP to its origin ASN

tbl_df(rdns$results) %>%
  type_convert() %>%
  select(firstSeen, resolve) %>%
  left_join(select(rorig, resolve=ip, as_name=as_name)) %>%
  arrange(firstSeen) %>%
  print(n=100)
## # A tibble: 88 x 3
##              firstSeen        resolve                                              as_name
##                 <dttm>          <chr>                                                <chr>
##  1 2009-12-18 11:15:20  144.58.240.79      EPRI-PA - Electric Power Research Institute, US
##  2 2016-06-19 00:00:00 208.91.197.132 CONFLUENCE-NETWORK-INC - Confluence Networks Inc, VG
##  3 2016-07-29 00:00:00  208.91.197.27 CONFLUENCE-NETWORK-INC - Confluence Networks Inc, VG
##  4 2016-08-12 20:46:15  54.230.14.253                     AMAZON-02 - Amazon.com, Inc., US
##  5 2016-08-16 14:21:17  54.230.94.206                     AMAZON-02 - Amazon.com, Inc., US
##  6 2016-08-19 20:57:04  54.230.95.249                     AMAZON-02 - Amazon.com, Inc., US
##  7 2016-08-26 20:54:02 54.192.197.200                     AMAZON-02 - Amazon.com, Inc., US
##  8 2016-09-12 10:35:41   52.84.40.164                     AMAZON-02 - Amazon.com, Inc., US
##  9 2016-09-17 07:43:03  54.230.11.212                     AMAZON-02 - Amazon.com, Inc., US
## 10 2016-09-23 18:17:50 54.230.202.223                     AMAZON-02 - Amazon.com, Inc., US
## 11 2016-09-30 19:47:31 52.222.174.253                     AMAZON-02 - Amazon.com, Inc., US
## 12 2016-10-24 17:44:38  52.85.112.250                     AMAZON-02 - Amazon.com, Inc., US
## 13 2016-10-28 18:14:16 52.222.174.231                     AMAZON-02 - Amazon.com, Inc., US
## 14 2016-11-11 10:44:22 54.240.162.201                     AMAZON-02 - Amazon.com, Inc., US
## 15 2016-11-17 04:34:15 54.192.197.242                     AMAZON-02 - Amazon.com, Inc., US
## 16 2016-12-16 17:49:29   52.84.32.234                     AMAZON-02 - Amazon.com, Inc., US
## 17 2016-12-19 02:34:32 54.230.141.240                     AMAZON-02 - Amazon.com, Inc., US
## 18 2016-12-23 14:25:32  54.192.37.182                     AMAZON-02 - Amazon.com, Inc., US
## 19 2017-01-20 17:26:28  52.84.126.252                     AMAZON-02 - Amazon.com, Inc., US
## 20 2017-02-03 15:28:24   52.85.94.225                     AMAZON-02 - Amazon.com, Inc., US
## 21 2017-02-10 19:06:07   52.85.94.252                     AMAZON-02 - Amazon.com, Inc., US
## 22 2017-02-17 21:37:21   52.85.63.229                     AMAZON-02 - Amazon.com, Inc., US
## 23 2017-02-24 21:43:45   52.85.63.225                     AMAZON-02 - Amazon.com, Inc., US
## 24 2017-03-05 12:06:32  54.192.19.242                     AMAZON-02 - Amazon.com, Inc., US
## 25 2017-04-01 00:41:07 54.192.203.223                     AMAZON-02 - Amazon.com, Inc., US
## 26 2017-05-19 00:00:00   13.32.246.44                     AMAZON-02 - Amazon.com, Inc., US
## 27 2017-05-28 00:00:00    52.84.74.38                     AMAZON-02 - Amazon.com, Inc., US
## 28 2017-06-07 08:10:32  54.230.15.154                     AMAZON-02 - Amazon.com, Inc., US
## 29 2017-06-07 08:10:32  54.230.15.142                     AMAZON-02 - Amazon.com, Inc., US
## 30 2017-06-07 08:10:32  54.230.15.168                     AMAZON-02 - Amazon.com, Inc., US
## 31 2017-06-07 08:10:32   54.230.15.57                     AMAZON-02 - Amazon.com, Inc., US
## 32 2017-06-07 08:10:32   54.230.15.36                     AMAZON-02 - Amazon.com, Inc., US
## 33 2017-06-07 08:10:32  54.230.15.129                     AMAZON-02 - Amazon.com, Inc., US
## 34 2017-06-07 08:10:32   54.230.15.61                     AMAZON-02 - Amazon.com, Inc., US
## 35 2017-06-07 08:10:32   54.230.15.51                     AMAZON-02 - Amazon.com, Inc., US
## 36 2017-07-16 09:51:12 54.230.187.155                     AMAZON-02 - Amazon.com, Inc., US
## 37 2017-07-16 09:51:12 54.230.187.184                     AMAZON-02 - Amazon.com, Inc., US
## 38 2017-07-16 09:51:12 54.230.187.125                     AMAZON-02 - Amazon.com, Inc., US
## 39 2017-07-16 09:51:12  54.230.187.91                     AMAZON-02 - Amazon.com, Inc., US
## 40 2017-07-16 09:51:12  54.230.187.74                     AMAZON-02 - Amazon.com, Inc., US
## 41 2017-07-16 09:51:12  54.230.187.36                     AMAZON-02 - Amazon.com, Inc., US
## 42 2017-07-16 09:51:12 54.230.187.197                     AMAZON-02 - Amazon.com, Inc., US
## 43 2017-07-16 09:51:12 54.230.187.185                     AMAZON-02 - Amazon.com, Inc., US
## 44 2017-07-17 13:10:13 54.239.168.225                     AMAZON-02 - Amazon.com, Inc., US
## 45 2017-08-06 01:14:07  52.222.149.75                     AMAZON-02 - Amazon.com, Inc., US
## 46 2017-08-06 01:14:07 52.222.149.172                     AMAZON-02 - Amazon.com, Inc., US
## 47 2017-08-06 01:14:07 52.222.149.245                     AMAZON-02 - Amazon.com, Inc., US
## 48 2017-08-06 01:14:07  52.222.149.41                     AMAZON-02 - Amazon.com, Inc., US
## 49 2017-08-06 01:14:07  52.222.149.38                     AMAZON-02 - Amazon.com, Inc., US
## 50 2017-08-06 01:14:07 52.222.149.141                     AMAZON-02 - Amazon.com, Inc., US
## 51 2017-08-06 01:14:07 52.222.149.163                     AMAZON-02 - Amazon.com, Inc., US
## 52 2017-08-06 01:14:07  52.222.149.26                     AMAZON-02 - Amazon.com, Inc., US
## 53 2017-08-11 19:11:08 216.137.61.247                     AMAZON-02 - Amazon.com, Inc., US
## 54 2017-08-21 20:44:52  13.32.253.116                     AMAZON-02 - Amazon.com, Inc., US
## 55 2017-08-21 20:44:52  13.32.253.247                     AMAZON-02 - Amazon.com, Inc., US
## 56 2017-08-21 20:44:52  13.32.253.117                     AMAZON-02 - Amazon.com, Inc., US
## 57 2017-08-21 20:44:52  13.32.253.112                     AMAZON-02 - Amazon.com, Inc., US
## 58 2017-08-21 20:44:52   13.32.253.42                     AMAZON-02 - Amazon.com, Inc., US
## 59 2017-08-21 20:44:52  13.32.253.162                     AMAZON-02 - Amazon.com, Inc., US
## 60 2017-08-21 20:44:52  13.32.253.233                     AMAZON-02 - Amazon.com, Inc., US
## 61 2017-08-21 20:44:52   13.32.253.29                     AMAZON-02 - Amazon.com, Inc., US
## 62 2017-08-23 14:24:15 216.137.61.164                     AMAZON-02 - Amazon.com, Inc., US
## 63 2017-08-23 14:24:15 216.137.61.146                     AMAZON-02 - Amazon.com, Inc., US
## 64 2017-08-23 14:24:15  216.137.61.21                     AMAZON-02 - Amazon.com, Inc., US
## 65 2017-08-23 14:24:15 216.137.61.154                     AMAZON-02 - Amazon.com, Inc., US
## 66 2017-08-23 14:24:15 216.137.61.250                     AMAZON-02 - Amazon.com, Inc., US
## 67 2017-08-23 14:24:15 216.137.61.217                     AMAZON-02 - Amazon.com, Inc., US
## 68 2017-08-23 14:24:15  216.137.61.54                     AMAZON-02 - Amazon.com, Inc., US
## 69 2017-08-25 19:21:58  13.32.218.245                     AMAZON-02 - Amazon.com, Inc., US
## 70 2017-08-26 09:41:34   52.85.173.67                     AMAZON-02 - Amazon.com, Inc., US
## 71 2017-08-26 09:41:34  52.85.173.186                     AMAZON-02 - Amazon.com, Inc., US
## 72 2017-08-26 09:41:34  52.85.173.131                     AMAZON-02 - Amazon.com, Inc., US
## 73 2017-08-26 09:41:34   52.85.173.18                     AMAZON-02 - Amazon.com, Inc., US
## 74 2017-08-26 09:41:34   52.85.173.91                     AMAZON-02 - Amazon.com, Inc., US
## 75 2017-08-26 09:41:34  52.85.173.174                     AMAZON-02 - Amazon.com, Inc., US
## 76 2017-08-26 09:41:34  52.85.173.210                     AMAZON-02 - Amazon.com, Inc., US
## 77 2017-08-26 09:41:34   52.85.173.88                     AMAZON-02 - Amazon.com, Inc., US
## 78 2017-08-27 22:02:41  13.32.253.169                     AMAZON-02 - Amazon.com, Inc., US
## 79 2017-08-27 22:02:41  13.32.253.203                     AMAZON-02 - Amazon.com, Inc., US
## 80 2017-08-27 22:02:41  13.32.253.209                     AMAZON-02 - Amazon.com, Inc., US
## 81 2017-08-29 13:17:37 54.230.141.201                     AMAZON-02 - Amazon.com, Inc., US
## 82 2017-08-29 13:17:37  54.230.141.83                     AMAZON-02 - Amazon.com, Inc., US
## 83 2017-08-29 13:17:37  54.230.141.30                     AMAZON-02 - Amazon.com, Inc., US
## 84 2017-08-29 13:17:37 54.230.141.193                     AMAZON-02 - Amazon.com, Inc., US
## 85 2017-08-29 13:17:37 54.230.141.152                     AMAZON-02 - Amazon.com, Inc., US
## 86 2017-08-29 13:17:37 54.230.141.161                     AMAZON-02 - Amazon.com, Inc., US
## 87 2017-08-29 13:17:37  54.230.141.38                     AMAZON-02 - Amazon.com, Inc., US
## 88 2017-08-29 13:17:37 54.230.141.151                     AMAZON-02 - Amazon.com, Inc., US

Unfortunately, I expected this. The owner keeps moving it around on AWS infrastructure.

So What?

This was an innocent link in a document on CRAN that went to a site that looked legit. A clever individual or organization found the dead domain and saw an opportunity to legitimize some fairly nasty stuff.

Now, I realize nobody is likely using “Rpad” anymore, but this type of situation can happen to any registered domain. If this individual or organization were doing more than trying to make objectionable content legit, they likely could have succeeded, especially if they enticed you with a shiny new devtools::install_…() link with promises of statistically sound animated cat emoji gif creation tools. They did an eerily good job of making this particular site still seem legit.

There’s nothing most folks can do to “fix” that site or have it removed. I’m not sure CRAN should remove the helpful PDF, but given the clickable link, it might be a good thing to suggest.

You’ll see that I used the splashr package (which has been submitted to CRAN but not there yet). It’s a good way to work with potentially malicious web content since you can “see” it and mine content from it without putting your own system at risk.

After going through this, I’ll see what I can do to put some bows on some of the devel-only packages and get them into CRAN so there’s a bit more assurance around using them.

I’m an army of one when it comes to fielding R-related security issues, but if you do come across suspicious items (like this or icky/malicious in other ways) don’t hesitate to drop me an @ or DM on Twitter.