I went completely daft this week and broke my months-long Twitter break due to the domestic terror event in my nation’s Capitol. I’ll likely be resuming the break starting today.
Whilst keeping up with the final descent of the U.S. into a fully failed state, I also noticed that a debate from months ago on CRAN URL checks was still going strong.
I briefly chimed in those months ago, and again this week, on the dangers of short URLs. That was not exactly the core topic of the debate, which centered on HTTP URL redirects, a feature of the protocol that URL shorteners happen to take advantage of.
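To make the redirect mechanism concrete: a shortener answers a request with a 3xx status and a `Location` header naming the real destination, and the client follows it. The sketch below is an illustration, not code from the original post. `redirect_chain()` is a hypothetical helper that tabulates each hop from a list shaped like the `all_headers` element of an {httr} response; the example chain is hand-built and its destination URL is made up.

```r
# Hypothetical helper (not from the original post): summarise the hops in a
# redirect chain. `hops` is a list with one element per HTTP response, each
# holding a `status` code and a `headers` list (assumed to mirror the shape
# of the `all_headers` element on an {httr} response object).
redirect_chain <- function(hops) {
  data.frame(
    hop = seq_along(hops),
    status = vapply(hops, function(h) as.integer(h$status), integer(1)),
    location = vapply(
      hops,
      function(h) {
        loc <- h$headers[["location"]]
        # the final hop has no Location header, so record NA there
        if (is.null(loc)) NA_character_ else as.character(loc)
      },
      character(1)
    ),
    stringsAsFactors = FALSE
  )
}

# A hand-built two-hop chain standing in for a real response: the shortener
# answers 301 with a Location header, then the (made-up) target answers 200.
example_hops <- list(
  list(status = 301L, headers = list(location = "https://example.com/a/long/path")),
  list(status = 200L, headers = list())
)

chain <- redirect_chain(example_hops)
print(chain)

# With network access and {httr} installed, a real chain could be inspected
# along these lines (not run here):
#   res <- httr::HEAD("https://bit.ly/magickintro")
#   redirect_chain(res$all_headers)
```

The point of tabulating the hops is that the short URL itself tells you nothing about where you end up; only the `Location` values do, and whoever runs the shortener controls those.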
Short URLs make it easier to type a URL out or remember a URL (if you can still get a decent, short keyword to use after the /) but they’re dangerous. In case you’re one of the R folks who challenge my security chops, perhaps you’ll believe Bruce.
NOTE: Regular ol’ URLs can be, and are, dangerous too, especially if they’re used in an http:// context vs an https:// context or run by daft folks who think they’re capable of making a system fully impervious to attackers.
The pandemic has made “cyber” fairly hectic, so my plan to wrap up a safety checker and local package URLs re-writer into a small, usable tool/package has no ETA on completion. However, that doesn’t mean you can’t gain visibility into the number, types, and safety of URLs in your locally installed packages.
The code below has exposition in the comments – and you can find it here as well — so I’ll close with it vs my usual “FIN”.
Stay safe out there, folks; and — to my not-so-‘United’-after-all States readers — stay strong! The nightmare of the last four years is almost over (though the cleanup — now both physical and metaphorical — is going to take a long time).
library(urltools)
library(stringi)
library(tidyverse)
# we're also using {clipr} and {tools} but via ::: and ::
# fairly comprehensive list of URL shorteners
shorteners <- read_lines("https://github.com/sambokai/ShortURL-Services-List/raw/master/shorturl-services-list.txt")
# opaque function baked into {tools}
# NOTE: this can take a while
db <- tools:::url_db_from_installed_packages(rownames(installed.packages()), verbose = TRUE)
as_tibble(db) %>%
  distinct() %>% # yep, even w/in a pkg there may be dups from ^^
  mutate(
    scheme = scheme(URL), # https or not
    dom = domain(URL) # need this later to be able to compute apex domain
  ) %>%
  filter(
    dom != "..", # prbly legit since it will be a relative "go up one directory"
    !is.na(dom) # the {tools} url_db_from_installed_packages() is not perfect
  ) %>%
  bind_cols(
    suffix_extract(.$dom) # break them all down into component atoms
  ) %>%
  select(-dom) %>% # this is now 'host' from ^^
  mutate(
    apex = sprintf("%s.%s", domain, suffix) # apex domain
  ) %>%
  mutate(
    is_short = (host %in% shorteners) | (apex %in% shorteners) # does it use a shortener?
  ) -> db
db
## # A tibble: 12,623 x 9
##    URL        Parent    scheme host  subdomain domain suffix apex  is_short
##    <chr>      <chr>     <chr>  <chr> <chr>     <chr>  <chr>  <chr> <lgl>
##  1 https://g… albersus… https  gith… NA        github com    gith… FALSE
##  2 https://g… albersus… https  gith… NA        github com    gith… FALSE
##  3 https://w… AnomalyD… https  www.… www       usenix org    usen… FALSE
##  4 https://w… AnomalyD… https  www.… www       jstor  org    jsto… FALSE
##  5 https://w… AnomalyD… https  www.… www       usenix org    usen… FALSE
##  6 https://w… AnomalyD… https  www.… www       jstor  org    jsto… FALSE
##  7 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE
##  8 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE
##  9 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE
## 10 https://g… AnomalyD… https  gith… NA        github com    gith… FALSE
## # … with 12,613 more rows
# what packages do i have installed that use short URLS?
# a nice thing to do would be to file a PR to these authors
filter(db, is_short) %>%
  select(
    URL,
    Parent,
    scheme
  )
## # A tibble: 5 x 3
##   URL                         Parent                   scheme
##   <chr>                       <chr>                    <chr>
## 1 https://goo.gl/5KBjL5       fpp2/man/goog.Rd         https
## 2 http://bit.ly/2016votecount geofacet/man/election.Rd http
## 3 http://bit.ly/SnLi6h        knitr/man/knit.Rd        http
## 4 https://bit.ly/magickintro  magick/man/magick.Rd     https
## 5 http://bit.ly/2UaiYbo       ssh/doc/intro.html       http
# what protocols are in use? (you'll note that some are borked and
# others got mangled by the {tools} function)
count(db, scheme, sort=TRUE)
## # A tibble: 5 x 2
##   scheme     n
##   <chr>  <int>
## 1 https  10007
## 2 http    2498
## 3 NA       113
## 4 ftp        4
## 5 `https     1
# what are the most used top-level sites?
count(db, host, sort=TRUE) %>%
  mutate(pct = n/sum(n))
## # A tibble: 1,108 x 3
##    host                      n     pct
##    <chr>                 <int>   <dbl>
##  1 docs.aws.amazon.com    3859 0.306
##  2 github.com             2954 0.234
##  3 cran.r-project.org      450 0.0356
##  4 en.wikipedia.org        220 0.0174
##  5 aws.amazon.com          204 0.0162
##  6 doi.org                 181 0.0143
##  7 wikipedia.org           132 0.0105
##  8 developers.google.com   114 0.00903
##  9 stackoverflow.com       101 0.00800
## 10 gitlab.com               86 0.00681
## # … with 1,098 more rows
# same as ^^ but apex
count(db, apex, sort=TRUE) %>%
  mutate(pct = n/sum(n))
## # A tibble: 743 x 3
##    apex                  n     pct
##    <chr>             <int>   <dbl>
##  1 amazon.com         4180 0.331
##  2 github.com         2997 0.237
##  3 r-project.org       563 0.0446
##  4 wikipedia.org       352 0.0279
##  5 doi.org             221 0.0175
##  6 google.com          179 0.0142
##  7 tidyverse.org       151 0.0120
##  8 r-lib.org           137 0.0109
##  9 rstudio.com         117 0.00927
## 10 stackoverflow.com   102 0.00808
## # … with 733 more rows
# See all the eavesdroppable, interceptable,
# content-mutable-by-evil-MITM-network-operator URLs
# A nice thing to do would be to fix these and issue PRs
filter(db, scheme == "http") %>%
  select(URL, Parent)
## # A tibble: 2,498 x 2
##    URL                                                 Parent
##    <chr>                                               <chr>
##  1 http://www.winfield.demon.nl                        antiword/DESCRIPTION
##  2 http://github.com/ropensci/antiword/issues          antiword/DESCRIPTION
##  3 http://dirk.eddelbuettel.com/code/anytime.html      anytime/DESCRIPTION
##  4 http://arrayhelpers.r-forge.r-project.org/          arrayhelpers/DESCRI…
##  5 http://arrow.apache.org/blog/2019/01/25/r-spark-im… arrow/doc/arrow.html
##  6 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/accelera…
##  7 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/accelera…
##  8 http://docs.aws.amazon.com/AmazonS3/latest/dev/acl… aws.s3/man/acl.Rd
##  9 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/bucket_e…
## 10 http://docs.aws.amazon.com/AmazonS3/latest/API/RES… aws.s3/man/bucketli…
## # … with 2,488 more rows
# find the abusers of "http" URLs
filter(db, scheme == "http") %>%
  select(URL, Parent) %>%
  mutate(
    pkg = stri_match_first_regex(Parent, "(^[^/]+)")[,2]
  ) %>%
  count(pkg, sort=TRUE)
## # A tibble: 265 x 2
##    pkg                        n
##    <chr>                  <int>
##  1 paws.security.identity   258
##  2 paws.management          152
##  3 XML                      129
##  4 paws.analytics            78
##  5 stringi                   70
##  6 paws                      57
##  7 RCurl                     51
##  8 igraph                    49
##  9 base                      47
## 10 aws.s3                    44
## # … with 255 more rows
# send all the apex domains to the clipboard
clipr::write_clip(unique(db$apex))
# go here to paste them into the domain search box
# most domain/URL checker APIs aren't free for more
# than a cpl dozen URLs/domains
browseURL("https://www.bulkblacklist.com")
# paste what you clipped into the box and wait a while
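As a possible first step toward the PRs suggested in the comments above, here is a minimal sketch of generating candidate https:// replacements for http:// URLs. `to_https()` is a hypothetical helper, not part of the original code, and it only rewrites the scheme. Each candidate would still need checking by hand, since not every host serves the same content (or anything at all) over HTTPS.

```r
# Hypothetical helper (not from the original post): swap a leading "http://"
# for "https://" and leave everything else (https, ftp, NA) untouched.
to_https <- function(urls) {
  sub("^http://", "https://", urls, ignore.case = TRUE)
}

# Two URLs taken from the short-URL listing above, plus a made-up ftp URL
# as a control that should pass through unchanged.
urls <- c(
  "http://bit.ly/SnLi6h",
  "https://goo.gl/5KBjL5",
  "ftp://example.com/file.txt"
)

candidates <- data.frame(
  original = urls,
  candidate = to_https(urls),
  stringsAsFactors = FALSE
)

# flag only the rows where the scheme was actually rewritten
candidates$changed <- candidates$original != candidates$candidate

print(candidates)
```

Run against the full `db`, something like `to_https(db$URL[db$scheme == "http"])` would give the list of candidates to verify before touching any package source.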
It’s [Almost] Over; Much Damage Has Been Done; But I [We] Have A Call To Unexpected Action
NOTE: There’s a unique feed URL for R/tech stuff: https://rud.is/b/category/r/feed/. If you hit the generic “subscribe” button b/c the vast majority of posts have been on that, this isn’t one of those posts and you should probably delete it and move on with more important things than the rantings of a silly man with a Captain America shield.
The last 4+ years — especially the last ~10 months — had taken a bigger personal toll than I realized. I spent much of President-Elect Joseph R. Biden Jr.’s and Vice President-elect Kamala Harris’ first speeches as duly & honestly selected leaders of this nation unabashedly tear-filled. The wave of relief was overwhelming. Hearing kind, vibrant, uplifting, and articulately + professionally delivered words was like the finest symphonic production compared to the ALL CAPS productions that we’ve been forced to consume for so long.
The outgoing loser (perhaps a neologism, “unpresidented”, should be used since so much of what this person did was criminally unprecedented) did damage our nation severely, but I’m ashamed to admit just how much damage I let him, and those who support and detract from him, do to me.
President-elect Biden said this as part of his speech last night:
He went on to say:
And, still, further on:
What President-elect Biden did was socially engineer a Matthew 18:21-35 on me/us since what he’s calling on us (me) to do is forgive.
Forgive the Resident in Chief.
Forgive his supporters.
Forgive the right and left radicals whose severely flawed agendas have brought us to the brink of yet-another antebellum.
Forgive the Evangelicals who sold out American Christianity for a chance to be court evangelicals and wield even greater earthly power than they already did.
Forgive owners of establishments and organizations that showed support for MAGA and the outgoing POTUS.
Forgive the extended family on my spouse’s side who proudly supported and still support what is obviously evil.
And, forgive myself for, amongst a myriad of other things, just how much my un-Christ-like hate, disdain, and despair have increasingly consumed me and my words/actions over the past 4+ years.
I wish I could say I’m eager to do this. I am not. The self-righteous, smug, superior hate and disdain feels pretty good, doesn’t it? It’s kinda warm and fiery in a wretched country bourbon sort of way. It feels soothingly justified, too, doesn’t it? I mean, hundreds of thousands of living, breathing, amazing humans in America died directly because of “these people” (ah, how comforting acerbic tribal terminology can be), didn’t they? How can I possibly forgive that?
Fortunately — yes, fortunately — I have to, and if you’re still reading this and feel similarly to the preceding paragraph, I would strongly suggest you have to as well.
I have to because it is the foundation of my Faith (which I seem to have let evil convince me to forget for a while) and because it’s a cancer that will eventually subsume me if I let it (and I already beat physical cancer once, so I’m not letting a spiritual, emotional, and intellectual one win either).
We all have to — on all sides, since “right” and “left” are far too large buckets — if Joe and Kamala have even a remote chance to lead America into healing.
Now, I am not naive. The road ahead is long and fraught with peril. We are a deeply divided nation. Repair will take decades if it happens at all.
I’ll start by striving to take Colossians 3:12-17 more seriously and faithfully than I have ever taken it before and be ready to perform whatever actions are necessary to help this be a time for myself and our nation to heal.
I say “strive” as I had planned to conclude with some “I forgive…”s, but I quite literally cannot type anything but ellipses after those two words yet. Hopefully it won’t take too long to get past that for most of the above list. I’m not sure forgiving the last item on it will happen any time soon, though.
Stay safe. Wear a mask. Be kind.