Skip navigation

Category Archives: macOS

If you’re an R/RStudio user who has migrated to Mojave (macOS 10.14) or are contemplating migrating to it, you will likely eventually run into an issue where you’re trying to access resources that are in Apple’s new hardened filesystem sandboxes.

Rather than reinvent the wheel by blathering about what that means, give these links a visit if you want more background info:

and/or Google/DuckDuckGo for macos privacy tcc full disk access as there have been a ton of blogs on this topic.

The TLDR for getting your R/RStudio bits to work in these sandboxed areas is to grant them “Full Disk Access” (if that sounds scary, it is) permissions. You can do that by adding both the R executable and RStudio executable (drag their icons) to the Full Disk Access pane under the Privacy tab of System Preferences => Security & Privacy:

I also used the Finder’s “Go To” command to head on over /usr/local/bin and use the “Show Original” popup on both R and Rscript and dragged their fully qualified path binaries into the pane as well (you can’t see them b/c the pane is far too small). The symbolic links might be sufficient, but I’ve been running this way since the betas and just re-drag the versioned R/Rscript binaries each time I upgrade (or rebuild) R.

If you do grant FDA to R/RStudio just make sure be even more careful about trusting R code you run from the internet or R packages you install from untrusted sources (like GitHub) since R/RStudio are now potential choice conduits for malicious code that wants to get at your seekret things.

Photo by Alexander Dummer on Unsplash

Apple has run the death bell on 32-bit macOS apps and, if you’re running a recent macOS version on your Mac (which you should so you can get security updates) you likely see this alert from time-to-time:

If you’re like me, you click through that and keep working but later ponder just how many of those apps you have. They are definitely going away, so knowing if your favourite app is on the chopping block is likely a good idea.

You can get this information via the “About This Mac”͢”System Report” app and sorting via one of the table fields.

R folks are data folks and we know we can do better than that. But, first we need to get the data. Thankfully, we can get this via the system_profiler command-line utility since it can both display user-friendly information in the terminal and also generate an XML version of the information to work with. We won’t need to head to the terminal for this work, though, since there are many ways to execute the command from R and read the generated output.

Executing System Calls from R

Base R provides two core methods for issuing a system call:

  • system()
  • system2()

Note that there are other functions provided with a base R installation that can also issue system commands and process the “piped” output, but we’ll focus on these deliberate invocation ones.

Functions in two other packages can also assist with this task and we’ll include a look at:

  • processx::run()
  • sys::exec_internal()

as well.

Why leave base R for this task? Truthfully, we really don’t need to, but both sys and processx have other tasks which do make them handy tools in your package toolbox. Having said that, keep reading since we’re going to end up not choosing the built-in functions for this task.

This is the command line we need to execute:

system_profiler -xml -detailLevel full SPApplicationsDataType

Let’s load up all the packages we’ll be needing and execute this command-line all four ways, then briefly discuss the differences:

library(sys)
library(processx)
library(microbenchmark)
library(xml2)
library(tidyverse)

system(
  command = "system_profiler -xml -detailLevel full SPApplicationsDataType",
  intern = TRUE
) -> apps_system

str(apps_system)
##  chr [1:10665] "" ...

system2(
  command = "system_profiler",
  args = c("-xml", "-detailLevel", "full", "SPApplicationsDataType"),
  stdout = TRUE
) -> apps_system2

str(apps_system2)
##  chr [1:10665] "" ...

processx::run(
  command = "system_profiler",
  args = c("-xml", "-detailLevel", "full", "SPApplicationsDataType"),
  spinner = TRUE
) -> apps_processx_run

str(apps_processx_run)
## List of 4
##  $ status : int 0
##  $ stdout : chr "XML STRING THAT prism.js won't let me show"
##  $ stderr : chr ""
##  $ timeout: logi FALSE

sys::exec_internal(
  cmd = "system_profiler",
  args = c("-xml", "-detailLevel", "full", "SPApplicationsDataType")
) -> apps_sys_exec_internal

str(apps_sys_exec_internal)
## List of 3
##  $ status: int 0
##  $ stdout: raw [1:331133] 3c 3f 78 6d ...
##  $ stderr: raw(0) 

The core difference between system() and the rest is that you need to shQuote()? for system() whereas that’s taken care of for you by the others (so they’re a bit safer by default since you’re more than likely going to forget to shQuote()).

You can definitely notice the main differences in return objects. The built-in functions just give us the character data from the standard output stream (stdout) and the last two return a more structured object that provides more explicit information about the job we just executed. The base ones can provide this detail, but it’s a twisty maze of remembering which options do what vs the more (IMO) straightforward approach both processx and sys take.

You’ll also notice a difference in stdout between processx and sys with the latter giving us a raw vector vs a character vector. This gives us a great deal of power and flexibility. It also turns out to be a great choice for processing command-line-generated XML data. Here’s why:

microbenchmark(
  sys = xml2::read_xml(apps_sys_exec_internal$stdout),
  processx = xml2::read_xml(apps_processx_run$stdout)
)
## Unit: milliseconds
##      expr      min       lq      mean    median        uq      max neval
##       sys 4.086492  4.60078  9.085143  5.508814  5.906942 207.6495   100
##  processx 9.510356 10.98282 14.275499 12.054810 13.292234 163.9870   100

It turns out xml2::read_xml() makes much quicker work of the raw vector data (though, I mean, really—are we really going to care about those ~5ms IRL?).

We’ll move on to the real reason for the post, but definitely explore both sys and processx since they are both super-handy packages.

“Can we please just find the 32-bit apps already?”

No problem. Well, actually, there is a minor annoyance. These are property list XML files and I’ll confess that I truly hate this format. There are “dictionary arrays” of key and value nodes, but those nodes are siblings vs directly associated pairs. So, we have to use the sibling relationship to work with them. It’s not hard, per se, just (again, IMO) suboptimal.

Let’s take a look at it:

apps <- read_xml(apps_sys_exec_internal$stdout)

xml_find_all(apps, "//array/dict")
## {xml_nodeset (476)}
##  [1] \n  _SPCommandLineArguments\n  \n     ...
##  [2] \n  _name\n  Sublime Text\n  h ...
##  [3] \n  _name\n  System Preferences\n   ...
##  [4] \n  _name\n  Google Chrome Canary\n ...
##  [5] \n  _name\n  Google Chrome\n   ...
##  [6] \n  _name\n  Dropbox\n  has64B ...
##  [7] \n  _name\n  Keypad\n  has64Bi ...
##  [8] \n  _name\n  Garmin WebUpdater\n  < ...
##  [9] \n  _name\n  LaTeXiT\n  has64B ...
## [10] \n  _name\n  CocoaPacketAnalyzer\n  ...
## [11] \n  _name\n  Janetter\n  has64 ...
## [12] \n  _name\n  VMware Fusion\n   ...
## [13] \n  _name\n  Photo Library Migration Utility ...
## [14] \n  _name\n  Setup Assistant\n  \n  _name\n  Siri\n  has64BitI ...
## [16] \n  _name\n  Software Update\n  \n  _name\n  Spotlight\n  has6 ...
## [18] \n  _name\n  Stocks\n  has64Bi ...
## [19] \n  _name\n  SystemUIServer\n  \n  _name\n  UniversalAccessControl ...
## ...

Let’s look at a sample record using xml_view() from the htmltidy package:

xml_find_all(apps, "//array/dict[key='_name']")[1] %>% 
  htmltidy::xml_view()

Be wary of using xml_view() on giant XML structures since it’ll freeze up RStudio for a bit and even slows down Chrome since the resultant, composed DOM object can get ginormous.

Now we know we can use has64BitIntelCode for filtering once we get to the data. Let’s read in all the apps, cherry-picking the fields and then just look at the 32-bit apps:

xml_find_all(apps, "//array/dict[key='_name']") %>% 
  map_df(~{
    list(
      name = xml_find_first(.x, ".//string") %>% xml_text(),
      path = xml_find_first(.x, ".//key[.='path']/following-sibling::string") %>% xml_text(),
      is_64bit = xml_find_first(.x, ".//key[.='has64BitIntelCode']/following-sibling::string") %>% xml_text() 
    )
  }) %>% 
  filter(is_64bit == "no") %>% 
  arrange(name) %>% 
  select(-is_64bit)
## # A tibble: 30 x 2
##    name                      path                                           
##                                                                   
##  1 AAM Registration Notifier /Applications/Utilities/Adobe Application Mana…
##  2 AAM Registration Notifier /Applications/Utilities/Adobe Application Mana…
##  3 AAM Updates Notifier      /Applications/Utilities/Adobe Application Mana…
##  4 AAMLauncherUtil           /Applications/Utilities/Adobe Application Mana…
##  5 ACR_9_10                  /Library/Application Support/Adobe/Uninstall/A…
##  6 Adobe Application Manager /Applications/Utilities/Adobe Application Mana…
##  7 adobe_licutil             /Applications/Utilities/Adobe Application Mana…
##  8 Audacity                  /Applications/Audacity.app                     
##  9 COCM_1_0_32               /Library/Application Support/Adobe/Uninstall/C…
## 10 COPS_1_0_32               /Library/Application Support/Adobe/Uninstall/C…
## # ... with 20 more rows

The Adobe helper apps are longstanding 32-bit “offenders”. Many of these death-row apps fall into the “helper” category and will hopefully get some attention by their developers. I do find it amusing that Apple kinda wants us to prod the developers to get their collective acts together.

FIN

This exhaustive search finds all of the 32-bit apps residing on your system. If you just want to see the one’s you’ve executed and macOS has kept track of, you can drop to a command-line and do:

sudo Rscript -e 'knitr::kable((dplyr::select(dplyr::tbl(dplyr::src_sqlite("/var/db/SystemPolicyConfiguration/ExecPolicy"), "legacy_exec_history_v3"), responsible_path)))'

You need elevated privileges to access those files, so please read that whole line to make sure I’m not having you rm -rf /.

Remember to take some time to explore the sys and processx packages and perhaps bundle up the salient bits of this post into a script so you can occasionally check to see the 32-bit eradication progress.

(General reminder abt “R⁶” posts in that they are heavy on code-examples, minimal on expository. I try to design them with 2-3 “nuggets” embedded for those who take the time to walk through the code examples on their systems. I’ll always provide further expository if requested in a comment, so don’t hesitate to ask if something is confusing.)

I had to check something on the macOS systems across the abode today and — on a lark — decided to do all the “shell” scripting in R vs bash for a change. After performing the tasks, it occurred to me that not all R users on macOS realize there are hidden gems of information spread across the “boring” parts of the filesystem in SQLite databases. So, I put together a small example that:

  • identifies all the top-level apps in /Applications
  • extracts code signing information from them into a tibble
  • grabs the Gatekeeper whitelist database
  • uses an internal SQLite transformation function inside a dplyr chain
  • merges the info and gives you a basis to explore your apps
  • shows off the wicked cool processx package by @gaborcsardi that you should be using in place of system() / system2()

A quick note about this Gatekeeper database: If an app is not already recognized by macOS and you allow it to run your local system is updated to “whitelist” this app so you don’t get the notification in the future. Apple maintains a large signature list that gets updated throughout the year on your system and your own list is merged with it.

A second quick note (I guess I’m doing more expository than promised :-) about QUOTE(): In certain dplyr / dbplyr contexts, what looks like local function calls are actually passed over to the SQL side vs used locally. Even though quote() is an internal R function, the use of the lowercase version of it would still go over to the SQL side vs be eval’d locally. However, to avoid confusing others, I try to uppercase conflicts like this when they occur.

library(processx)
library(stringi)
library(docxtractr) # install_github("hrbrmstr/docxtractr")
library(tidyverse)

list.files("/Applications", pattern = "app$", full.names = TRUE) %>% 
  map_df(~{
    message(.x)
    res <- run("codesign", args = c("-dvvv", .x), error_on_status = FALSE)
    if (any(grepl("not signed at all", res$stderr))) {
      list(Executable = .x)
    } else {
      stri_split_lines(res$stderr)[[1]] %>% 
        keep(~grepl("=", .)) %>% 
        stri_split_fixed("=", 2, simplify = TRUE) -> res
      as.list(set_names(res[,2], res[,1]))
    }
  }) %>% 
  mcga() %>% 
  mutate(short_name = stri_replace_last_fixed(basename(executable), ".app", "")) -> my_apps

glimpse(my_apps)
## Observations: 102
## Variables: 20
## $ executable                  <chr> "/Applications/1Password 6.app/Con...
## $ identifier                  <chr> "com.agilebits.onepassword4", "com...
## $ format                      <chr> "app bundle with Mach-O thin (x86_...
## $ codedirectory_v             <chr> "20200 size=29594 flags=0x0(none) ...
## $ hash_type                   <chr> "sha256 size=32", "sha256 size=32"...
## $ candidatecdhash_sha1        <chr> "e21ac2a66473feec6276b448fc518678d...
## $ candidatecdhash_sha256      <chr> "8cf4cb4bdbea3b4d4f9e293e1aee1edb7...
## $ hash_choices                <chr> "sha1,sha256", "sha1,sha256", "sha...
## $ cdhash                      <chr> "8cf4cb4bdbea3b4d4f9e293e1aee1edb7...
## $ signature_size              <chr> "8915", "8936", "4610", "8528", "4...
## $ authority                   <chr> "Apple Root CA", "Apple Root CA", ...
## $ timestamp                   <chr> "Jul 17, 2017, 10:18:00 AM", "Jul ...
## $ info_plist_entries          <chr> "32", "31", "30", "17", "30", "18"...
## $ teamidentifier              <chr> "2BUA8C4S2C", "XZZXE9SED4", "94KV3...
## $ sealed_resources_version    <chr> "2 rules=12 files=2440", "2 rules=...
## $ internal_requirements_count <chr> "1 size=220", "1 size=192", "1 siz...
## $ signed_time                 <chr> NA, NA, "Jul 13, 2017, 6:40:13 PM"...
## $ library_validation_warning  <chr> NA, NA, NA, "OS X SDK version befo...
## $ platform_identifier         <chr> NA, NA, NA, NA, "2", NA, NA, "2", ...
## $ short_name                  <chr> "1Password 6", "Alfred 3", "Amazon...

# this is macOS atekeeper whitelist db
db <- src_sqlite("/private/var/db/gkopaque.bundle/Contents/Resources/gkopaque.db")

db
## src:  sqlite 3.19.3 [/private/var/db/gkopaque.bundle/Contents/Resources/gkopaque.db]
## tbls: conditions, merged, whitelist
whitelist <- tbl(db, "whitelist")

# they are binary blobs, but we need them as text, so use SQLite's "QUOTE()" function
mutate(whitelist, current = QUOTE(current)) %>% 
  select(current) %>% 
  collect() %>% 
  mutate(
    cdhash = stri_replace_first_fixed(current, "X'", "") %>% 
      stri_replace_last_fixed("'", "") %>% 
      stri_trans_tolower()
  ) -> gatekeeper_whitelist_hashes

my_apps <- left_join(my_apps, gatekeeper_whitelist_hashes)

# Unsigned apps
filter(my_apps, is.na(cdhash)) %>% 
  select(short_name)

# you can see your own output here ;-)

# App organization
select(my_apps, identifier) %>% 
  mutate(
    identifier = stri_split_fixed(identifier, ".", simplify = TRUE)[,2],
    identifier = ifelse(identifier == "", "UNSPECIFIED", identifier)
  ) %>% 
  count(identifier, sort=TRUE)
## # A tibble: 47 x 2
##        identifier     n
##             <chr> <int>
##  1          apple    37
##  2    UNSPECIFIED    10
##  3      microsoft     6
##  4  eclecticlight     3
##  5         google     3
##  6         amazon     2
##  7      agilebits     1
##  8 antonycourtney     1
##  9       appscape     1
## 10   audacityteam     1
## # ... with 37 more rows


# Which apps have code-signing validation warnings
filter(my_apps, !is.na(library_validation_warning)) %>% 
  select(short_name, library_validation_warning)
## # A tibble: 7 x 2
##        short_name
##             <chr>
## 1    Amazon Music
## 2        Audacity
## 3         firefox
## 4         RStudio
## 5             VLC
## 6 Webcam Settings
## 7     WordService
## # ... with 1 more variables: library_validation_warning <chr>

NOTE: One could easily extend this example to look for apps across the filesystem.

I caught the 0.7.0 release of dplyr on my home CRAN server early Friday morning and immediately set out to install it since I’m eager to finish up my sergeant package and get it on CRAN. “Tidyverse” upgrades aren’t trivial for me as I tinker quite a bit with the tidyverse and create packages that depend on various components. The sergeant package provides — amongst other things — a dplyr back-end for Apache Drill, so it has more tidyverse tendrils than other bits of code I maintain.

macOS binaries weren’t available yet (it generally takes 24-48 hrs for that) so I did an install.packages("dplyr", type="source") and was immediately hit with gcc 7 compilation errors. This seemed odd, but switching back to clang worked fine.

I, then, proceeded to run chunks in an Rmd I’m working on and hit “Encoding” errors on mutate() calls. Not having time to debug further I reverted to 0.5.0 of dplyr and went about my day and promised the tidyverse maintainers that I’d work on a reproducible example after work.

I made R data files from the data frames that were tossing errors and extracted & tweaked a code snippet that consistently generated the error and created a rocker container on one of my linux boxes to validate that this was an error and a cross-platform one. The rocker container used a full fresh-from-source copy of the tidyverse including dplyr 0.7.0. The code worked and no error was generated, so I immediately suspected package rot on my main dev macOS box.

Now, my situation is complicated by an insanely hasty migration to macOS 10.13β1 (I refuse to use the Apple macOS catchy names anymore since the most recent one is just silly) and a move to the gcc 7 toolchain (initially prompted to both get rJava working nicely and reproduce some CRAN noted errors with some packages). Further complications were also created by many invocations of install_github() of various packages regularly overwriting bits of the tidyverse over the past few weeks since the R 3.4.0 release. In other words, the integrity of the “tidyverse” was in serious question on my system and it was time for the Clean Slate Protocol.

Rather than itemize package versions and surgically nipping and tucking, I opted to use packrat to get to my desired end-state of a full-integrity tidyverse install. There are many ways to do this. Feel free to “one-up” me and show your l33t method in the comments. This one will likely be accessible to most — if not all — R users.

I started a new RStudio project in a new session and told it to use packrat. In the new project console, I did install.packages("tidyverse", type="source") and let it go for many minutes. I, then, navigated to the packrat subdirectory where the 3.4 package binaries are housed (just follow the project packrat tree down to the R version directory) and moved all 51 packages (yes, 51 O_o) to the main R library path (which you can figure out by running .libPaths() in any non-packrat-maintained project).

After doing that, I fired up the originally failing Rmd and everything worked fine. ?

I don’t do the Clean Slate Protocol too often (we all get to for new R dot-releases) but it came in handy this time. If you run into errors when trying to get the new dplyr working, you may benefit from the Clean Slate Protocol as well.

If you haven’t seen the changes in 0.6.0/0.7.0 you should check them out and give it a go.

An R user asked a question regarding whether it’s possible to have the RStudio pipe (%>%) shortcut (Cmd-Shift-M) available in other macOS applications. If you’re using Alfred then you can use this workflow for said task (IIRC this requires an Alfred license which is reasonably cheap).

When you add it to Alfred you must edit it to make Cmd-Shift-M the hotkey since Alfred strips the keys on import (for good reasons). I limited the workflow to a few apps (Safari, Chrome, Sublime Text, iTerm) and I think it makes sense to limit the apps it applies to (you don’t need the operator everywhere, at least IMO).

I can’t believe I didn’t do this earlier. I use R in the terminal a bit and mis-hit Cmd-Shift-M all the time when I do (since RStudio is my primary editor for R code and muscle memory is scarily powerful). I also have to use (ugh) Jupyter notebooks on occasion and this will help there, too. If you end up modifying or using the workflow, drop a note in the comments.