Most modern operating systems keep secrets from you in many ways. One of these ways is by associating extended file attributes with files. These attributes can serve useful purposes. For instance, macOS uses them to identify when files have passed through Gatekeeper or to store the URLs of files that were downloaded via Safari (though most other browsers add the com.apple.metadata:kMDItemWhereFroms attribute now, too).
Attributes are nothing more than a series of key/value pairs. The key must be a character value and unique, and it’s fairly standard practice to keep the value component under 4K. Apart from that, you can put anything in the value: text, binary content, etc.
When you’re in a terminal session you can tell that a file has extended attributes by looking for an @ sign near the permissions column:
$ cd ~/Downloads
$ ls -l
total 264856
-rw-r--r--@ 1 user staff    169062 Nov 27  2017 1109.1968.pdf
-rw-r--r--@ 1 user staff    171059 Nov 27  2017 1109.1968v1.pdf
-rw-r--r--@ 1 user staff    291373 Apr 27 21:25 1804.09970.pdf
-rw-r--r--@ 1 user staff   1150562 Apr 27 21:26 1804.09988.pdf
-rw-r--r--@ 1 user staff    482953 May 11 12:00 1805.01554.pdf
-rw-r--r--@ 1 user staff 125822222 May 14 16:34 RStudio-1.2.627.dmg
-rw-r--r--@ 1 user staff   2727305 Dec 21 17:50 athena-ug.pdf
-rw-r--r--@ 1 user staff     90181 Jan 11 15:55 bgptools-0.2.tar.gz
-rw-r--r--@ 1 user staff   4683220 May 25 14:52 osquery-3.2.4.pkg
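If you'd rather spot those files programmatically than by eye, the marker is easy to test for. Here's a minimal sketch in Python (which shows up later in this post anyway). The function name is made up for illustration, the second sample line is invented to show a file without the marker, and the code assumes the mode string is the first whitespace-separated field and that file names contain no spaces:

```python
from typing import List


def flagged_files(ls_output: str) -> List[str]:
    """Return the names of entries whose mode column ends with '@'.

    Assumes `ls -l` style lines: mode string first, file name last,
    and no spaces inside file names.
    """
    names = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip blank lines and the leading "total N" summary line.
        if len(fields) < 9:
            continue
        if fields[0].endswith("@"):
            names.append(fields[-1])
    return names


sample = """total 264856
-rw-r--r--@ 1 user staff 169062 Nov 27 2017 1109.1968.pdf
-rw-r--r--  1 user staff   1024 Jan  1 2018 plain.txt
"""
print(flagged_files(sample))  # ['1109.1968.pdf']
```

Scraping ls output is fragile, so treat this as a quick check rather than something to build on.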
You can work with extended attributes from the terminal with the xattr command, but do you really want to go to the terminal every time you want to examine these secret settings (now that you know your OS is keeping secrets from you)?
I didn’t think so. Thus begat the xattrs package.
Exploring Past Downloads
Data scientists are (generally) inquisitive folk and tend to accumulate things. We grab papers, data, programs (etc.) and some of those actions are performed in browsers. Let’s use the xattrs package to rebuild a list of download URLs from the extended attributes on the files located in ~/Downloads (if you’ve chosen a different default for your browsers, use that directory).
We’re not going to work with the entire package in this post (it’s really straightforward to use and has a README on the GitHub site along with extensive examples) but I’ll use one of the example files from the directory listing above to demonstrate a couple functions before we get to the main example.
First, let’s see what is hidden with the RStudio disk image:
library(xattrs)
library(reticulate) # not 100% necessary but you'll see why later
library(tidyverse) # we'll need this later
list_xattrs("~/Downloads/RStudio-1.2.627.dmg")
## [1] "com.apple.diskimages.fsck" "com.apple.diskimages.recentcksum"
## [3] "com.apple.metadata:kMDItemWhereFroms" "com.apple.quarantine"
There are four keys we can poke at, but the one that will help transition us to a larger example is com.apple.metadata:kMDItemWhereFroms. This is the key Apple has standardized on to store the source URL of a downloaded item. Let’s take a look:
get_xattr_raw("~/Downloads/RStudio-1.2.627.dmg", "com.apple.metadata:kMDItemWhereFroms")
## [1] 62 70 6c 69 73 74 30 30 a2 01 02 5f 10 4c 68 74 74 70 73 3a 2f 2f 73 33 2e 61 6d 61
## [29] 7a 6f 6e 61 77 73 2e 63 6f 6d 2f 72 73 74 75 64 69 6f 2d 69 64 65 2d 62 75 69 6c 64
## [57] 2f 64 65 73 6b 74 6f 70 2f 6d 61 63 6f 73 2f 52 53 74 75 64 69 6f 2d 31 2e 32 2e 36
## [85] 32 37 2e 64 6d 67 5f 10 2c 68 74 74 70 73 3a 2f 2f 64 61 69 6c 69 65 73 2e 72 73 74
## [113] 75 64 69 6f 2e 63 6f 6d 2f 72 73 74 75 64 69 6f 2f 6f 73 73 2f 6d 61 63 2f 08 0b 5a
## [141] 00 00 00 00 00 00 01 01 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00
## [169] 00 00 00 89
Why “raw”? Well, as noted above, the value component of these attributes can store anything and this one definitely has embedded nul[l]s (0x00) in it. We can try to read it as a string, though:
get_xattr("~/Downloads/RStudio-1.2.627.dmg", "com.apple.metadata:kMDItemWhereFroms")
## [1] "bplist00\xa2\001\002_\020Lhttps://s3.amazonaws.com/rstudio-ide-build/desktop/macos/RStudio-1.2.627.dmg_\020,https://dailies.rstudio.com/rstudio/oss/mac/\b\vZ"
So, we can kinda figure out the URL but it’s definitely not pretty. The general practice of Safari (and other browsers) is to use a binary property list to store metadata in the value component of an extended attribute (at least for these URL references).
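To make the "raw vs string" distinction concrete, here's a small Python sketch that checks a value before treating it as text. The two helper names are hypothetical, and the byte strings are just the first 14 and last 4 bytes of the raw output above. It assumes that "safe text" means valid UTF-8 with no embedded NUL bytes, which is one reasonable definition, not the only one:

```python
# First 14 and last 4 bytes of the raw attribute value printed above.
head = bytes.fromhex("62 70 6c 69 73 74 30 30 a2 01 02 5f 10 4c")
tail = bytes.fromhex("00 00 00 89")


def looks_like_binary_plist(value: bytes) -> bool:
    """Binary property lists begin with the ASCII magic 'bplist'."""
    return value[:6] == b"bplist"


def is_safe_text(value: bytes) -> bool:
    """True only if the value is valid UTF-8 and has no embedded NULs."""
    if b"\x00" in value:
        return False
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_binary_plist(head))  # True
print(is_safe_text(head))             # False: 0xa2 is not valid UTF-8 here
print(is_safe_text(tail))             # False: embedded NUL bytes
```

A check like this is a cheap way to decide whether a value should go to a string reader or to a property list parser.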
There will eventually be a native Rust-backed property list reading package for R, but for now we can work with that binary plist data in two ways: via the read_bplist() function that comes with the xattrs package, which wraps Linux/BSD or macOS system utilities (and is super expensive, since it also means writing the data out to a file each time), or by turning to Python, which already has this capability. We’re going to use the latter.
I like to prime the Python setup with invisible(py_config()) but that is not really necessary (I do it mostly b/c I have a wild number of Python installs (don’t judge) and use the RETICULATE_PYTHON env var for the one I use with R). You’ll need to install the biplist module via pip3 install biplist or pip install biplist, depending on your setup. I highly recommend using Python 3.x vs 2.x, though.
biplist <- import("biplist", as="biplist")
biplist$readPlistFromString(
  get_xattr_raw(
    "~/Downloads/RStudio-1.2.627.dmg", "com.apple.metadata:kMDItemWhereFroms"
  )
)
## [1] "https://s3.amazonaws.com/rstudio-ide-build/desktop/macos/RStudio-1.2.627.dmg"
## [2] "https://dailies.rstudio.com/rstudio/oss/mac/"
That's much better.
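As an aside, if you're on Python 3 you may not need a third-party module at all, since the standard library's plistlib can also read binary property lists. Here's a pure-Python sketch that decodes the exact bytes printed by get_xattr_raw() earlier. The hex dump is copied from that output, and the variable names are mine:

```python
import plistlib

# Hex dump of the com.apple.metadata:kMDItemWhereFroms value, copied from
# the get_xattr_raw() output earlier in the post (172 bytes in total).
RAW_HEX = """
62 70 6c 69 73 74 30 30 a2 01 02 5f 10 4c 68 74 74 70 73 3a 2f 2f 73 33 2e 61 6d 61
7a 6f 6e 61 77 73 2e 63 6f 6d 2f 72 73 74 75 64 69 6f 2d 69 64 65 2d 62 75 69 6c 64
2f 64 65 73 6b 74 6f 70 2f 6d 61 63 6f 73 2f 52 53 74 75 64 69 6f 2d 31 2e 32 2e 36
32 37 2e 64 6d 67 5f 10 2c 68 74 74 70 73 3a 2f 2f 64 61 69 6c 69 65 73 2e 72 73 74
75 64 69 6f 2e 63 6f 6d 2f 72 73 74 75 64 69 6f 2f 6f 73 73 2f 6d 61 63 2f 08 0b 5a
00 00 00 00 00 00 01 01 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 89
"""

# Strip all whitespace before converting so this works on any Python 3.
raw = bytes.fromhex("".join(RAW_HEX.split()))

# loads() detects the binary format from the "bplist00" header.
urls = plistlib.loads(raw)
for url in urls:
    print(url)
```

That prints the same two URLs biplist gave us. I'm sticking with biplist via reticulate for the rest of the post, but swapping in plistlib should be straightforward if you prefer fewer dependencies.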
Let's work with metadata for the whole directory:
list.files("~/Downloads", full.names = TRUE) %>%
  keep(has_xattrs) %>%
  set_names(basename(.)) %>%
  map_df(read_xattrs, .id="file") -> xdf
xdf
## # A tibble: 24 x 4
## file name size contents
##
## 1 1109.1968.pdf com.apple.lastuseddate#PS 16
## 2 1109.1968.pdf com.apple.metadata:kMDItemWhereFroms 110
## 3 1109.1968.pdf com.apple.quarantine 74
## 4 1109.1968v1.pdf com.apple.lastuseddate#PS 16
## 5 1109.1968v1.pdf com.apple.metadata:kMDItemWhereFroms 116
## 6 1109.1968v1.pdf com.apple.quarantine 74
## 7 1804.09970.pdf com.apple.metadata:kMDItemWhereFroms 86
## 8 1804.09970.pdf com.apple.quarantine 82
## 9 1804.09988.pdf com.apple.lastuseddate#PS 16
## 10 1804.09988.pdf com.apple.metadata:kMDItemWhereFroms 104
## # ... with 14 more rows
count(xdf, name, sort=TRUE)
## # A tibble: 5 x 2
## name n
##
## 1 com.apple.metadata:kMDItemWhereFroms 9
## 2 com.apple.quarantine 9
## 3 com.apple.lastuseddate#PS 4
## 4 com.apple.diskimages.fsck 1
## 5 com.apple.diskimages.recentcksum 1
Now we can focus on the task at hand: recovering the URLs:
list.files("~/Downloads", full.names = TRUE) %>%
  keep(has_xattrs) %>%                                   # only files with extended attributes
  set_names(basename(.)) %>%
  map_df(read_xattrs, .id="file") %>%
  filter(name == "com.apple.metadata:kMDItemWhereFroms") %>%
  mutate(where_from = map(contents, biplist$readPlistFromString)) %>%
  select(file, where_from) %>%
  unnest(where_from) %>%                                 # newer tidyr wants the column named
  filter(where_from != "")                               # drop empty URL entries
## # A tibble: 15 x 2
## file where_from
##
## 1 1109.1968.pdf https://arxiv.org/pdf/1109.1968.pdf
## 2 1109.1968.pdf https://www.google.com/
## 3 1109.1968v1.pdf https://128.84.21.199/pdf/1109.1968v1.pdf
## 4 1109.1968v1.pdf https://www.google.com/
## 5 1804.09970.pdf https://arxiv.org/pdf/1804.09970.pdf
## 6 1804.09988.pdf https://arxiv.org/ftp/arxiv/papers/1804/1804.09988.pdf
## 7 1805.01554.pdf https://arxiv.org/pdf/1805.01554.pdf
## 8 athena-ug.pdf http://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf
## 9 athena-ug.pdf https://www.google.com/
## 10 bgptools-0.2.tar.gz http://nms.lcs.mit.edu/software/bgp/bgptools/bgptools-0.2.tar.gz
## 11 bgptools-0.2.tar.gz http://nms.lcs.mit.edu/software/bgp/bgptools/
## 12 osquery-3.2.4.pkg https://osquery-packages.s3.amazonaws.com/darwin/osquery-3.2.4.p…
## 13 osquery-3.2.4.pkg https://osquery.io/downloads/official/3.2.4
## 14 RStudio-1.2.627.dmg https://s3.amazonaws.com/rstudio-ide-build/desktop/macos/RStudio…
## 15 RStudio-1.2.627.dmg https://dailies.rstudio.com/rstudio/oss/mac/
(There are multiple URL entries due to the fact that some browsers preserve the path you traversed to get to the final download.)
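If the unnest-then-filter step feels opaque, here's the same reshaping written out by hand in Python. This is only a sketch of the idea, not what the R pipeline does internally. The function name is hypothetical, and the empty string in the second entry is an assumption added to show the filter doing something:

```python
from typing import Dict, List, Tuple


def flatten_where_from(per_file: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn {file: [urls]} into one (file, url) row per non-empty URL."""
    rows = []
    for name, urls in per_file.items():
        for url in urls:
            if url:  # drop empty entries, like filter(where_from != "")
                rows.append((name, url))
    return rows


parsed = {
    "1109.1968.pdf": [
        "https://arxiv.org/pdf/1109.1968.pdf",
        "https://www.google.com/",
    ],
    # The empty second entry is assumed here for illustration.
    "1804.09970.pdf": ["https://arxiv.org/pdf/1804.09970.pdf", ""],
}

for row in flatten_where_from(parsed):
    print(row)
```

Each file contributes as many rows as it has non-empty URLs, which is why some files show up twice in the table above and others only once.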
Note: if Python is not an option for you, you can use the hack-y read_bplist() function in the package, but it will be much, much slower and you'll need to deal with an ugly list object vs some quaint text vectors.
FIN
Have some fun exploring what other secrets your OS may be hiding from you and if you're on Windows, give this a go. I have no idea if it will compile or work there, but if it does, definitely report back!
Remember that the package lets you set and remove extended attributes as well, so you can use them to store metadata with your data files (they don't always survive file or OS transfers but if you keep things local they can be an interesting way to tag your files) or clean up items you do not want stored.
GDPR Unintended Consequences Part 1 — Increasing WordPress Blog Exposure
I pen this mini-tome on “GDPR Enforcement Day”. The spirit of GDPR is great, but it’s just going to be another Potemkin village in most organizations, much like PCI or SOX. For now, the only things GDPR has done are make GDPR consulting companies rich, increase the use of JavaScript on web sites so they can pop up useless banners we keep telling users not to click on, and increase the size of email messages to include mandatory postscripts (that should really be at the beginning of the message, but, hey, faux privacy is faux privacy).
Those are just a few of the “unintended consequences” of GDPR. Just like Let’s Encrypt & “HTTPS Everywhere” turned into “Let’s Enable Criminals and Hurt Real People With Successful Phishing Attacks”, GDPR is going to cause a great deal of downstream issues that either the designers never thought of or decided — in their infinite, superior wisdom — were completely acceptable to make themselves feel better.
Today’s installment of “GDPR Unintended Consequences” is WordPress.
WordPress “powers” a substantial part of the internet. As such, it is a perma-target of attackers.
Since the GDPR Intelligentsia provided a far-too-long lead-time on both the inaugural and mandated enforcement dates for GDPR and also created far more confusion with the regulations than clarity, WordPress owners are flocking to “single button install” solutions to make them magically GDPR compliant (#protip: that’s not “a thing”). Here’s a short list of plugins and active installation counts (no links since I’m not going to encourage attack surface expansion):
I’m somewhat confident that a fraction of those publishers follow secure coding guidelines (it may be a small fraction). But, if I were an attacker, I’d be poking pretty hard at a few of those with six-figure installs to see if I could find a usable exploit.
GDPR just gave attackers a huge footprint of homogeneous resources to attempt at-scale exploits. They will very likely succeed (over-and-over-and-over again). This means that GDPR just increased the likelihood of losing your data privacy…the complete opposite of the intent of the regulation.
There are more unintended consequences and I’ll pepper the blog with them as the year and pain progresses.