I went completely daft this week and broke my months-long Twitter break due to the domestic terror event at my nation’s Capitol. I’ll likely be resuming the break starting today.
Whilst keeping up with the final descent of the U.S. into a fully failed state, I also noticed that a debate from months ago on CRAN URL checks was still going strong.
I briefly chimed in, both back then and again this week, on the dangers of short URLs. That was not exactly the core topic of the debate, which centered on HTTP URL redirects: a feature of the protocol that URL shorteners happen to take advantage of.
Short URLs make it easier to type out or remember a URL (if you can still get a decent, short keyword to use after the /), but they’re dangerous. In case you’re one of the R folks who challenge my security chops, perhaps you’ll believe Bruce.
NOTE: Regular ol’ URLs can be (and are) dangerous, too, especially if they’re used in an http:// context vs. an https:// context, or if the sites behind them are run by daft folks who think they’re capable of making a system fully impervious to attackers.
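Short URLs typically work by answering a request for the short link with an HTTP 3xx redirect whose Location header points at the real destination (some services may instead use an HTML or JavaScript redirect, which this does not cover). One way to see where a short URL leads without visiting that destination is to fetch only the first response's headers and read that header yourself. Below is a minimal sketch in base R; `location_of()` is a hypothetical helper, and the offline demo parses hand-written headers rather than a real response. The commented-out live call uses base R's `curlGetHeaders()` with `redirect = FALSE` so it stops at the first response.

```r
# Hypothetical helper: return the value of the first Location header in a
# character vector of raw HTTP header lines (the shape curlGetHeaders()
# returns), or NA if there is none.
location_of <- function(header_lines) {
  loc <- grep("^location:", header_lines, ignore.case = TRUE, value = TRUE)
  if (length(loc) == 0) return(NA_character_)
  # drop the header name, then any surrounding whitespace and CR/LF
  trimws(sub("^location:[[:space:]]*", "", loc[1], ignore.case = TRUE))
}

# Offline demo: hand-written headers, NOT a captured response
fake_headers <- c(
  "HTTP/1.1 301 Moved Permanently\r\n",
  "Location: https://example.com/some/long/path\r\n",
  "\r\n"
)
location_of(fake_headers)
# returns "https://example.com/some/long/path"

# Live use (needs network access):
# hdrs <- curlGetHeaders("http://bit.ly/2UaiYbo", redirect = FALSE)
# attr(hdrs, "status")  # typically a 3xx code for a short link
# location_of(hdrs)
```

Reading the header instead of following it means you learn the destination without your browser or R session ever talking to it.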
The pandemic has made “cyber” fairly hectic, so my plan to wrap up a safety checker and a local package URL re-writer into a small, usable tool/package has no ETA. However, that doesn’t mean you can’t gain visibility into the number, types, and safety of URLs in your locally installed packages.
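The re-writer half of that plan is simple to sketch, with one big caveat. The minimal base R version below (`upgrade_http()` and `upgrade_file()` are hypothetical, illustrative names, not functions from any package) swaps `http://` for `https://` in a file's lines. It assumes every affected host serves the same resource over HTTPS, which is not guaranteed, so a real tool would need to verify each upgraded URL (the safety-checker half) before writing anything back. It also only touches lowercase `http://` prefixes.

```r
# Hypothetical helper: rewrite plain-text http:// prefixes to https://.
# "https://" is left alone because it does not contain the substring "http://".
upgrade_http <- function(txt) {
  gsub("http://", "https://", txt, fixed = TRUE)
}

# Hypothetical file-level wrapper: rewrite one file in place and
# (invisibly) return how many lines changed.
upgrade_file <- function(path) {
  old <- readLines(path, warn = FALSE)
  new <- upgrade_http(old)
  writeLines(new, path)
  invisible(sum(old != new))
}

# Offline demo on a throwaway file (never point this at an installed
# package without a backup)
tmp <- tempfile(fileext = ".Rd")
writeLines(c("\\url{http://example.com/docs}", "\\url{https://example.org}"), tmp)
changed <- upgrade_file(tmp)
changed         # 1
readLines(tmp)  # both lines now use https://
```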
The code below has exposition in the comments (and you can find it here as well), so I’ll close with it vs. my usual “FIN”.
Stay safe out there, folks; and — to my not-so-‘United’-after-all States readers — stay strong! The nightmare of the last four years is almost over (though the cleanup — now both physical and metaphorical — is going to take a long time).
library(urltools)
library(stringi)
library(tidyverse)
# we're also using {clipr} and {tools} but via ::: and ::
# fairly comprehensive list of URL shorteners
shorteners <- read_lines("https://github.com/sambokai/ShortURL-Services-List/raw/master/shorturl-services-list.txt")
# opaque function baked into {tools}
# NOTE: this can take a while
db <- tools:::url_db_from_installed_packages(rownames(installed.packages()), verbose = TRUE)
as_tibble(db) %>%
  distinct() %>% # yep, even w/in a pkg there may be dups from ^^
  mutate(
    scheme = scheme(URL), # https or not
    dom = domain(URL) # need this later to be able to compute apex domain
  ) %>%
  filter(
    dom != "..", # prbly legit since it will be a relative "go up one directory"
    !is.na(dom) # the {tools} url_db_from_installed_packages() is not perfect
  ) %>%
  bind_cols(
    suffix_extract(.$dom) # break them all down into component atoms
  ) %>%
  select(-dom) %>% # this is now 'host' from ^^
  mutate(
    apex = sprintf("%s.%s", domain, suffix) # apex domain
  ) %>%
  mutate(
    is_short = (host %in% shorteners) | (apex %in% shorteners) # does it use a shortener?
  ) -> db
db
## # A tibble: 12,623 x 9
## URL Parent scheme host subdomain domain suffix apex is_short
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 https://g… albersus… https gith… NA github com gith… FALSE
## 2 https://g… albersus… https gith… NA github com gith… FALSE
## 3 https://w… AnomalyD… https www.… www usenix org usen… FALSE
## 4 https://w… AnomalyD… https www.… www jstor org jsto… FALSE
## 5 https://w… AnomalyD… https www.… www usenix org usen… FALSE
## 6 https://w… AnomalyD… https www.… www jstor org jsto… FALSE
## 7 https://g… AnomalyD… https gith… NA github com gith… FALSE
## 8 https://g… AnomalyD… https gith… NA github com gith… FALSE
## 9 https://g… AnomalyD… https gith… NA github com gith… FALSE
## 10 https://g… AnomalyD… https gith… NA github com gith… FALSE
## # … with 12,613 more rows
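The apex step is there because a shortener-list entry like bit.ly should also catch subdomains of that service. `suffix_extract()` gets the registrable domain right by consulting the Public Suffix List. For contrast, here is a deliberately naive base R stand-in (`naive_apex()` is hypothetical) that just keeps the last two labels of a hostname, followed by the pipeline's host-or-apex shortener test run on made-up toy hostnames. The naive split works for most .com hosts but breaks on multi-label suffixes like co.uk, which is why the pipeline leans on {urltools}.

```r
# Hypothetical, deliberately naive apex extractor: keep the last two
# dot-separated labels. This is NOT what urltools::suffix_extract() does.
naive_apex <- function(host) {
  vapply(strsplit(host, ".", fixed = TRUE), function(labels) {
    n <- length(labels)
    if (n < 2) return(NA_character_)
    paste(labels[(n - 1):n], collapse = ".")
  }, character(1))
}

naive_apex(c("docs.aws.amazon.com", "github.com", "www.bbc.co.uk"))
# "amazon.com" "github.com" "co.uk"   (the last should be "bbc.co.uk")

# The pipeline's shortener test on toy data (made-up hostnames and a
# two-entry stand-in for the real shortener list)
toy_shorteners <- c("bit.ly", "goo.gl")
toy_hosts <- c("bit.ly", "www.github.com", "x.goo.gl")
toy_apex <- naive_apex(toy_hosts)
(toy_hosts %in% toy_shorteners) | (toy_apex %in% toy_shorteners)
# TRUE FALSE TRUE: the third host only matches via its apex
```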
# what packages do i have installed that use short URLs?
# a nice thing to do would be to file a PR to these authors
filter(db, is_short) %>%
  select(
    URL,
    Parent,
    scheme
  )
## # A tibble: 5 x 3
## URL Parent scheme
## <chr> <chr> <chr>
## 1 https://goo.gl/5KBjL5 fpp2/man/goog.Rd https
## 2 http://bit.ly/2016votecount geofacet/man/election.Rd http
## 3 http://bit.ly/SnLi6h knitr/man/knit.Rd http
## 4 https://bit.ly/magickintro magick/man/magick.Rd https
## 5 http://bit.ly/2UaiYbo ssh/doc/intro.html http
# what protocols are in use? (you'll note that some are borked and
# others got mangled by the {tools} function)
count(db, scheme, sort=TRUE)
## # A tibble: 5 x 2
## scheme n
## <chr> <int>
## 1 https 10007
## 2 http 2498
## 3 NA 113
## 4 ftp 4
## 5 `https 1
# what are the most used top-level sites?
count(db, host, sort=TRUE) %>%
  mutate(pct = n/sum(n))
## # A tibble: 1,108 x 3
## host n pct
## <chr> <int> <dbl>
## 1 docs.aws.amazon.com 3859 0.306
## 2 github.com 2954 0.234
## 3 cran.r-project.org 450 0.0356
## 4 en.wikipedia.org 220 0.0174
## 5 aws.amazon.com 204 0.0162
## 6 doi.org 181 0.0143
## 7 wikipedia.org 132 0.0105
## 8 developers.google.com 114 0.00903
## 9 stackoverflow.com 101 0.00800
## 10 gitlab.com 86 0.00681
## # … with 1,098 more rows
# same as ^^ but apex
count(db, apex, sort=TRUE) %>%
  mutate(pct = n/sum(n))
## # A tibble: 743 x 3
## apex n pct
## <chr> <int> <dbl>
## 1 amazon.com 4180 0.331
## 2 github.com 2997 0.237
## 3 r-project.org 563 0.0446
## 4 wikipedia.org 352 0.0279
## 5 doi.org 221 0.0175
## 6 google.com 179 0.0142
## 7 tidyverse.org 151 0.0120
## 8 r-lib.org 137 0.0109
## 9 rstudio.com 117 0.00927
## 10 stackoverflow.com 102 0.00808
## # … with 733 more rows
# See all the eavesdroppable, interceptable,
# content-mutable-by-evil-MITM-network-operator URLs
# A nice thing to do would be to fix these and issue PRs
filter(db, scheme == "http") %>%
  select(URL, Parent)
## # A tibble: 2,498 x 2
## URL Parent
## <chr> <chr>
## 1 http://www.winfield.demon.nl antiword/DESCRIPTION
## 2 http://github.com/ropensci/antiword/issues antiword/DESCRIPTION
## 3 http://dirk.eddelbuettel.com/code/anytime.html anytime/DESCRIPTION
## 4 http://arrayhelpers.r-forge.r-project.org/ arrayhelpers/DESCRI…
## 5 http://arrow.apache.org/blog/2019/01/25/r-spark-im… arrow/doc/arrow.html
## 6 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/accelera…
## 7 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/accelera…
## 8 http://docs.aws.amazon.com/AmazonS3/latest/dev/acl… aws.s3/man/acl.Rd
## 9 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/bucket_e…
## 10 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/bucketli…
## # … with 2,488 more rows
# find the abusers of "http" URLs
filter(db, scheme == "http") %>%
  select(URL, Parent) %>%
  mutate(
    pkg = stri_match_first_regex(Parent, "(^[^/]+)")[,2]
  ) %>%
  count(pkg, sort=TRUE)
## # A tibble: 265 x 2
## pkg n
## <chr> <int>
## 1 paws.security.identity 258
## 2 paws.management 152
## 3 XML 129
## 4 paws.analytics 78
## 5 stringi 70
## 6 paws 57
## 7 RCurl 51
## 8 igraph 49
## 9 base 47
## 10 aws.s3 44
## # … with 255 more rows
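The `stri_match_first_regex()` call above captures everything before the first `/` in `Parent`, which is the package directory name. If you'd rather not pull in {stringi} for that, a base R `sub()` does the same job for paths like these. Here it is on a few `Parent` values that appear in the output above (`pkg_of()` is a hypothetical name), plus a base R analogue of `count(pkg, sort=TRUE)`.

```r
# Hypothetical helper: package name = everything before the first "/"
pkg_of <- function(parent) sub("/.*$", "", parent)

parents <- c("fpp2/man/goog.Rd", "knitr/man/knit.Rd", "ssh/doc/intro.html",
             "antiword/DESCRIPTION", "antiword/DESCRIPTION")
pkg_of(parents)
# "fpp2" "knitr" "ssh" "antiword" "antiword"

# base R analogue of count(pkg, sort=TRUE)
sort(table(pkg_of(parents)), decreasing = TRUE)
```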
# send all the apex domains to the clipboard
clipr::write_clip(unique(db$apex))
# go here to paste them into the domain search box
# most domain/URL checker APIs aren't free for more
# than a cpl dozen URLs/domains
browseURL("https://www.bulkblacklist.com")
# paste what you clipped into the box and wait a while
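If a checker limits how many domains you can paste at once, splitting the list into batches keeps each paste under the cap. A small sketch, demonstrated on made-up domains: `batch_domains()` is a hypothetical helper, and the default batch size of 25 is an arbitrary assumption, not any service's documented limit.

```r
# Hypothetical helper: de-duplicate, then split into batches of at most n
batch_domains <- function(domains, n = 25) {
  domains <- unique(domains)
  split(domains, ceiling(seq_along(domains) / n))
}

demo <- sprintf("site%02d.example", 1:60)  # made-up domains
batches <- batch_domains(demo, n = 25)
lengths(batches)  # 25, 25, 10

# With the real data, clip one batch at a time, e.g.:
# clipr::write_clip(batch_domains(db$apex)[[1]])
```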
Nos Autem Non In Antebellum; Bella Iam Inceperat
(Leading this with the periodic warning/reminder that this blog occasionally breaks from technical content and has category-based RSS feeds, which can be used to ensure one never sees non-technical content.)
Every decent human (which excludes the 74,222,958 🇺🇸 who voted for this, now 100% undeniable, traitor) with knowledge of this past week’s tragic events is likely still processing what happened, and will be for a while; I am no exception. The Feedly board I set up to save content I’ve been poring over has 113 articles in it so far.
Different aspects of the costume-clad, treasonous chaos have hit me daily, if not hourly.
Two newspaper paragraphs, each one about a different victim, have bubbled up to the surface of my thoughts more often than most of the other stories of the week.
One is about Erin Schaff, a brave, talented journalist from the New York Times:
No one came.
People just watched.
While I am deeply shocked, outraged, and saddened by what happened to Erin, I am not surprised, given that the President of the United States wants journalists to be executed for regularly giving his 2017-2020 reality show bad reviews by stating undeniable facts. Furthermore, he has continually cultivated disdain and hatred for the media in his regiment of cult followers.
Erin is lucky to be alive, even if that means living in the ruins of a failed, so-called democracy.
President Trump is responsible for Erin’s assault, and he is going to get away with it.
The other is about Ashli Babbitt, the troubled insurrectionist who died assaulting the Capitol:
After the February 2020 impeachment proceedings failed to do anything substantive, the President boasted of feeling “untouchable”; and, at a 2016 campaign rally, Trump, then the Republican presidential candidate, boasted that he could “shoot somebody and not lose any voters.”
Trump has taken the lives of hundreds of thousands of Americans. And, while the gun wasn’t in his stubby hand, he is fully responsible for this woman’s shooting and death.
So, Trump was right: he is going to get away with it.
Mike Pence, Ted Cruz, Josh Hawley, Lindsey Graham, Susan Collins, Mitch McConnell, and a few hundred other evil, self-serving, elected cowards are all unindicted co-conspirators in Erin’s assault and Ashli’s death, as are countless “news” and talk show hosts.
Beyond what happened to these two women, this traitorous cabal also helped orchestrate this week’s current crescendo to Trump’s term in office.
I say “current” because there’s a non-zero chance of increased violence and bloodshed before January 20th despite Trump being deplatformed.
I have lost all hope that Trump will face any tangible consequences for his actions, which will only serve to embolden other wanna-be dictators like Cruz and Hawley.
What’s worse is that even after Biden’s victory was finally 100% sealed and one of America’s most cherished institutions was ransacked, the Trump supporters near me (rural-ish Maine) still proudly have their 2020 Trump campaign signs up and were very likely laughing and cheering on the insurrection while Erin was being assaulted and Ashli’s life was ebbing away.
Even the Court Evangelicals have doubled-down in their support of Trump.
I (literally) pray I’m wrong, but it seems inevitable that the violence and bloodshed will continue through and after the 20th. As Biden tries to (also, literally) heal America by bringing science-fueled, centralized, enforced standards to quell the carnage of Covid, we will very likely and regularly see regional repeats of this week’s contemptible acts. As he and his administration attempt to right the many, many wrongs of the past four years (and more), these necessary actions will further push the ilk of this week to regularly manifest their entitlement-fueled rage.
Nos autem non in antebellum; bella iam inceperat. (But we are not in an antebellum; the war has already begun.)