Apple has had Screen Time on macOS for a while now, which means it has to store that data somewhere. Thankfully, Sarah Edwards has foraged through the macOS filesystem for us and explained where these bits of knowledge live in her post, Knowledge is Power! Using the macOS/iOS knowledgeC.db Database to Determine Precise User and Application Usage, which ultimately reveals that the data lurks in ~/Library/Application Support/Knowledge/knowledgeC.db. Sarah also has a neat little Python utility dubbed APOLLO (Apple Pattern of Life Lazy Output’er) with a smattering of canned knowledgeC.db SQL queries that cover a myriad of tracked items.
Today, we’ll show how to work with this database in R and the {tidyverse} to paint our own pictures of application usage.
There are quite a number of tables in the knowledgeC.db SQLite 3 database:
That visual schema was created in OmniGraffle via a small R script that uses the OmniGraffle automation framework. The OmniGraffle source files are also available upon request.
Most of the interesting bits (for any tracking-related spelunking) are in the ZOBJECT table, and to get a full picture of usage we’ll need to join it with some other tables that are connected via a few foreign keys:
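If you want to poke at the schema yourself before committing to a query, DBI plus RSQLite make that straightforward. A minimal sketch, shown here against a throwaway in-memory database with simplified stand-in tables (the real file lives at ~/Library/Application Support/Knowledge/knowledgeC.db, and the real tables have many more columns):

```r
library(DBI)

# in-memory db with simplified stand-ins; point dbConnect() at the real
# knowledgeC.db path to inspect the actual schema
con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbExecute(con, "CREATE TABLE ZSOURCE (Z_PK INTEGER PRIMARY KEY, ZDEVICEID TEXT)")
dbExecute(con, "CREATE TABLE ZOBJECT (Z_PK INTEGER PRIMARY KEY,
                                      ZSOURCE INTEGER REFERENCES ZSOURCE(Z_PK))")

tbls <- dbListTables(con)                                  # all table names
fks  <- dbGetQuery(con, "PRAGMA foreign_key_list(ZOBJECT)") # where each FK points

tbls
fks
```

PRAGMA foreign_key_list() is handy for figuring out which Z_PK columns the left joins below need to line up with.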
There are a few ways to do this in {tidyverse} R. The first is an extended straight SQL riff off of one of Sarah’s original queries:
library(hrbrthemes) # for ggplot2 machinations
library(tidyverse)
# source the knowledge db
kdb <- src_sqlite("~/Library/Application Support/Knowledge/knowledgeC.db")
tbl(
kdb,
sql('
SELECT
ZOBJECT.ZVALUESTRING AS "app",
(ZOBJECT.ZENDDATE - ZOBJECT.ZSTARTDATE) AS "usage",
CASE ZOBJECT.ZSTARTDAYOFWEEK
WHEN "1" THEN "Sunday"
WHEN "2" THEN "Monday"
WHEN "3" THEN "Tuesday"
WHEN "4" THEN "Wednesday"
WHEN "5" THEN "Thursday"
WHEN "6" THEN "Friday"
WHEN "7" THEN "Saturday"
END "dow",
ZOBJECT.ZSECONDSFROMGMT/3600 AS "tz",
DATETIME(ZOBJECT.ZSTARTDATE + 978307200, \'UNIXEPOCH\') as "start_time",
DATETIME(ZOBJECT.ZENDDATE + 978307200, \'UNIXEPOCH\') as "end_time",
DATETIME(ZOBJECT.ZCREATIONDATE + 978307200, \'UNIXEPOCH\') as "created_at",
CASE ZMODEL
WHEN ZMODEL THEN ZMODEL
ELSE "Other"
END "source"
FROM
ZOBJECT
LEFT JOIN
ZSTRUCTUREDMETADATA
ON ZOBJECT.ZSTRUCTUREDMETADATA = ZSTRUCTUREDMETADATA.Z_PK
LEFT JOIN
ZSOURCE
ON ZOBJECT.ZSOURCE = ZSOURCE.Z_PK
LEFT JOIN
ZSYNCPEER
ON ZSOURCE.ZDEVICEID = ZSYNCPEER.ZDEVICEID
WHERE
ZSTREAMNAME = "/app/usage"'
)) -> usage
usage
## # Source: SQL [?? x 8]
## # Database: sqlite 3.29.0 [/Users/johndoe/Library/Application Support/Knowledge/knowledgeC.db]
## app usage dow tz start_time end_time created_at source
## <chr> <int> <chr> <int> <chr> <chr> <chr> <chr>
## 1 com.bitrock.appinstaller 15 Friday -4 2019-10-05 01:11:27 2019-10-05 01:11:42 2019-10-05 01:11:… MacBookPro13…
## 2 com.tinyspeck.slackmacg… 4379 Tuesday -4 2019-10-01 13:19:24 2019-10-01 14:32:23 2019-10-01 14:32:… Other
## 3 com.tinyspeck.slackmacg… 1167 Tuesday -4 2019-10-01 18:19:24 2019-10-01 18:38:51 2019-10-01 18:38:… Other
## 4 com.tinyspeck.slackmacg… 1316 Tuesday -4 2019-10-01 19:13:49 2019-10-01 19:35:45 2019-10-01 19:35:… Other
## 5 com.tinyspeck.slackmacg… 12053 Thursday -4 2019-10-03 12:25:18 2019-10-03 15:46:11 2019-10-03 15:46:… Other
## 6 com.tinyspeck.slackmacg… 1258 Thursday -4 2019-10-03 15:50:16 2019-10-03 16:11:14 2019-10-03 16:11:… Other
## 7 com.tinyspeck.slackmacg… 2545 Thursday -4 2019-10-03 16:24:30 2019-10-03 17:06:55 2019-10-03 17:06:… Other
## 8 com.tinyspeck.slackmacg… 303 Thursday -4 2019-10-03 17:17:10 2019-10-03 17:22:13 2019-10-03 17:22:… Other
## 9 com.tinyspeck.slackmacg… 9969 Thursday -4 2019-10-03 17:33:38 2019-10-03 20:19:47 2019-10-03 20:19:… Other
## 10 com.tinyspeck.slackmacg… 2813 Thursday -4 2019-10-03 20:19:52 2019-10-03 21:06:45 2019-10-03 21:06:… Other
## # … with more rows
Before explaining what that query does, let’s rewrite it {dbplyr}-style:
tbl(kdb, "ZOBJECT") %>%
mutate(
created_at = datetime(ZCREATIONDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
start_dow = case_when(
ZSTARTDAYOFWEEK == 1 ~ "Sunday",
ZSTARTDAYOFWEEK == 2 ~ "Monday",
ZSTARTDAYOFWEEK == 3 ~ "Tuesday",
ZSTARTDAYOFWEEK == 4 ~ "Wednesday",
ZSTARTDAYOFWEEK == 5 ~ "Thursday",
ZSTARTDAYOFWEEK == 6 ~ "Friday",
ZSTARTDAYOFWEEK == 7 ~ "Saturday"
),
start_time = datetime(ZSTARTDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
end_time = datetime(ZENDDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
usage = (ZENDDATE - ZSTARTDATE),
tz = ZSECONDSFROMGMT/3600
) %>%
left_join(tbl(kdb, "ZSTRUCTUREDMETADATA"), c("ZSTRUCTUREDMETADATA" = "Z_PK")) %>%
left_join(tbl(kdb, "ZSOURCE"), c("ZSOURCE" = "Z_PK")) %>%
left_join(tbl(kdb, "ZSYNCPEER"), "ZDEVICEID") %>%
filter(ZSTREAMNAME == "/app/usage") %>%
select(
app = ZVALUESTRING, created_at, start_dow, start_time, end_time, usage, tz, source = ZMODEL
) %>%
mutate(source = ifelse(is.na(source), "Other", source)) %>%
collect() %>%
mutate_at(vars(created_at, start_time, end_time), as.POSIXct) -> usage
What we’re doing is pulling out the day of week, start/end usage times and timezone info, the app bundle id, the source of the app interactions, and the total usage time for each entry, along with when that entry was created. We need to do some maths since Apple stores timey-wimey info in its own custom format (seconds since 2001-01-01, hence the 978307200 offset), plus we need to convert the numeric day of week to a labeled one.
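That 978307200 constant is just the gap between the Unix epoch (1970-01-01) and Apple’s Core Data/Cocoa epoch (2001-01-01), which you can derive in base R rather than hard-coding it. A small sketch (apple_to_posix is my own helper name, not anything from the database):

```r
# Apple stores these timestamps as seconds since 2001-01-01 00:00:00 UTC,
# so converting to POSIX time is just an epoch shift
apple_epoch_offset <- as.numeric(as.POSIXct("2001-01-01", tz = "UTC"))

apple_to_posix <- function(x, tz = "UTC") {
  as.POSIXct(x + apple_epoch_offset, origin = "1970-01-01", tz = tz)
}

apple_epoch_offset # 978307200
apple_to_posix(0)  # 2001-01-01 00:00:00 UTC
```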
The bundle ids are pretty readable, but they’re not really intended for human consumption, so we’ll make a translation table from bundle id to app name by using the mdls command.
list.files(
c("/Applications", "/System/Library/CoreServices", "/Applications/Utilities", "/System/Applications"), # main places apps are stored (there are potentially more but this is sufficient for our needs)
pattern = "\\.app$",
full.names = TRUE
) -> apps
x <- sys::exec_internal("mdls", c("-name", "kMDItemCFBundleIdentifier", "-r", apps))
# mdls null (\0) terminates each entry so we have to do some raw surgery to get it into a format we can use
x$stdout[x$stdout == as.raw(0)] <- as.raw(0x0a)
tibble(
name = gsub("\\.app$", "", basename(apps)),
app = read_lines(x$stdout)
) -> app_trans
app_trans
## # A tibble: 270 x 2
## name app
## <chr> <chr>
## 1 1Password 7 com.agilebits.onepassword7
## 2 Adium com.adiumX.adiumX
## 3 Agenda com.momenta.agenda.macos
## 4 Alfred 4 com.runningwithcrayons.Alfred
## 5 Amazon Music com.amazon.music
## 6 Android File Transfer com.google.android.mtpviewer
## 7 Awsaml com.rapid7.awsaml
## 8 Bartender 2 com.surteesstudios.Bartender
## 9 BBEdit com.barebones.bbedit
## 10 BitdefenderVirusScanner com.bitdefender.BitdefenderVirusScanner
## # … with 260 more rows
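If the raw-vector surgery above looks odd, here is a tiny self-contained illustration of what it’s doing, using made-up bytes rather than real mdls output and base R string splitting in place of read_lines():

```r
# mdls -r separates values with NUL (0x00) bytes, which line readers
# can't split on, so the trick is to swap each NUL for a newline (0x0a)
raw_out <- as.raw(c(0x61, 0x00, 0x62, 0x63, 0x00)) # bytes for "a\0bc\0"
raw_out[raw_out == as.raw(0)] <- as.raw(0x0a)      # now bytes for "a\nbc\n"

ids <- strsplit(rawToChar(raw_out), "\n")[[1]]
ids # "a" "bc"
```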
The usage info goes back ~30 days, so let’s do a quick summary of the top 10 apps and their total usage (in hours):
usage %>%
group_by(app) %>%
summarise(first = min(start_time), last = max(end_time), total = sum(usage, na.rm=TRUE)) %>%
ungroup() %>%
mutate(total = total / 60 / 60) %>% # hours
arrange(desc(total)) %>%
left_join(app_trans) -> overall_usage
overall_usage %>%
slice(1:10) %>% # overall_usage already carries app names from the earlier join
mutate(name = fct_inorder(name) %>% fct_rev()) %>%
ggplot(aes(x=total, y=name)) +
geom_segment(aes(xend=0, yend=name), size=5, color = ft_cols$slate) +
scale_x_comma(position = "top") +
labs(
x = "Total Usage (hrs)", y = NULL,
title = glue::glue('App usage in the past {round(as.numeric(max(usage$end_time) - min(usage$start_time), "days"))} days')
) +
theme_ft_rc(grid="X")
There’s a YUGE flaw in the current way macOS tracks application usage. Unlike iOS, where apps really don’t run simultaneously (with iPadOS they kinda can/do, now), macOS apps are usually started and left open along with other apps. Apple doesn’t do a great job of identifying only active app usage, so many of these usage numbers are heavily inflated. Hopefully that will be fixed by macOS 10.15.
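In the meantime, one crude way to gauge the inflation is to flag implausibly long “sessions” before summarising. A hedged sketch (flag_long_sessions and the 8-hour cutoff are my own inventions, not anything Screen Time provides):

```r
# flag usage entries whose duration (in seconds) exceeds a plausibility cutoff
flag_long_sessions <- function(usage_secs, max_hours = 8) {
  usage_secs > max_hours * 60 * 60
}

# e.g. a 5-minute, a ~3.3-hour, and an ~11-hour entry
suspect <- flag_long_sessions(c(300, 12053, 40000))
suspect # FALSE FALSE TRUE
```

You could then filter() or down-weight the flagged rows, depending on how charitable you want to be to your own work habits.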
We have more data at our disposal, so let’s see when these apps get used. To do that, we’ll use segments to plot individual usage tracks and color them by weekday/weekend usage (still limiting to top 10 for blog brevity):
usage %>%
filter(app %in% overall_usage$app[1:10]) %>%
left_join(app_trans) %>%
mutate(name = factor(name, levels = rev(overall_usage$name[1:10]))) %>%
ggplot() +
geom_segment(
aes(
x = start_time, xend = end_time, y = name, yend = name,
color = ifelse(start_dow %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
),
size = 10
) +
scale_x_datetime(position = "top") +
scale_colour_manual(
name = NULL,
values = c(
"Weekend" = ft_cols$light_blue,
"Weekday" = ft_cols$green
)
) +
guides(
colour = guide_legend(override.aes = list(size = 1))
) +
labs(
x = NULL, y = NULL,
title = glue::glue('Top 10 App usage on this Mac in the past {round(as.numeric(max(usage$end_time) - min(usage$start_time), "days"))} days'),
subtitle = "Each segment represents that app being 'up' (Open to Quit).\nUnfortunately, this is what Screen Time uses for its calculations on macOS"
) +
theme_ft_rc(grid="X") +
theme(legend.position = c(1, 1.25)) +
theme(legend.justification = "right")
I’m not entirely sure “on this Mac” is completely accurate since I think this syncs across all active Screen Time devices due to this (n is in seconds):
count(usage, source, wt=usage, sort=TRUE)
## # A tibble: 2 x 2
## source n
## <chr> <int>
## 1 Other 4851610
## 2 MacBookPro13,2 1634137
The “Other” appears to be the work-dev Mac but it doesn’t have the identifier mapped so I think that means it’s the local one and that the above chart is looking at Screen Time across all devices. I literally (right before this sentence) enabled Screen Time on my iPhone so we’ll see if that ends up in the database and I’ll post a quick update if it does.
We’ll take one last look by day of week and use a heatmap to see the results:
count(usage, start_dow, app, wt=usage/60/60) %>%
left_join(app_trans) %>%
filter(app %in% overall_usage$app[1:10]) %>%
mutate(name = factor(name, levels = rev(overall_usage$name[1:10]))) %>%
mutate(start_dow = factor(start_dow, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))) %>%
ggplot() +
geom_tile(aes(start_dow, name, fill = n), color = "#252a32", size = 0.75) +
scale_x_discrete(expand = c(0, 0.5), position = "top") +
scale_y_discrete(expand = c(0, 0.5)) +
scale_fill_viridis_c(direction = -1, option = "magma", name = "Usage (hrs)") +
labs(
x = NULL, y = NULL,
title = "Top 10 App usage by day of week"
) +
theme_ft_rc(grid="")
I really need to get into the habit of using the RStudio Server access features of RSwitch over Chrome so I can get RSwitch into the top 10, but some habits (and bookmarks) die hard.
FIN
Apple’s Screen Time also tracks “category”, which is something we can pick up from each application’s embedded metadata. We’ll do that in a follow-up post along with seeing whether we can capture iOS usage now that I’ve enabled Screen Time on those devices as well.
Keep spelunking the knowledgeC.db table(s) and blog about, or reply in the comments with, any interesting nuggets you find.