
Category Archives: R

Apple brought Screen Time to macOS a while back, and that means it has to store this data somewhere. Thankfully, Sarah Edwards has foraged through the macOS filesystem for us and explained where these bits of knowledge live in her post, Knowledge is Power! Using the macOS/iOS knowledgeC.db Database to Determine Precise User and Application Usage, which ultimately reveals that the data lurks in ~/Library/Application Support/Knowledge/knowledgeC.db. Sarah also has a neat little Python utility dubbed APOLLO (Apple Pattern of Life Lazy Output’er) with a smattering of canned knowledgeC.db SQL queries that cover a myriad of tracked items.

Today, we’ll show how to work with this database in R and the {tidyverse} to paint our own pictures of application usage.

There are quite a number of tables in the knowledgeC.db SQLite 3 database:

That visual schema was created in OmniGraffle via a small R script that uses the OmniGraffle automation framework. The OmniGraffle source files are also available upon request.
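
If you’d like to enumerate the tables yourself, here’s a minimal {DBI} sketch (assuming the database exists at the default path noted above):

library(DBI)

con <- dbConnect(RSQLite::SQLite(), path.expand("~/Library/Application Support/Knowledge/knowledgeC.db"))
dbListTables(con)
dbDisconnect(con)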

Most of the interesting bits (for any tracking-related spelunking) are in the ZOBJECT table and to get a full picture of usage we’ll need to join it with some other tables that are connected via a few foreign keys:

There are a few ways to do this in {tidyverse} R. The first is an extended straight SQL riff off of one of Sarah’s original queries:

library(hrbrthemes) # for ggplot2 machinations
library(tidyverse)

# source the knowledge db
kdb <- src_sqlite("~/Library/Application Support/Knowledge/knowledgeC.db")

tbl(
  kdb, 
  sql('
SELECT
  ZOBJECT.ZVALUESTRING AS "app", 
    (ZOBJECT.ZENDDATE - ZOBJECT.ZSTARTDATE) AS "usage",  
    CASE ZOBJECT.ZSTARTDAYOFWEEK 
      WHEN "1" THEN "Sunday"
      WHEN "2" THEN "Monday"
      WHEN "3" THEN "Tuesday"
      WHEN "4" THEN "Wednesday"
      WHEN "5" THEN "Thursday"
      WHEN "6" THEN "Friday"
      WHEN "7" THEN "Saturday"
    END "dow",
    ZOBJECT.ZSECONDSFROMGMT/3600 AS "tz",
    DATETIME(ZOBJECT.ZSTARTDATE + 978307200, \'UNIXEPOCH\') as "start_time", 
    DATETIME(ZOBJECT.ZENDDATE + 978307200, \'UNIXEPOCH\') as "end_time",
    DATETIME(ZOBJECT.ZCREATIONDATE + 978307200, \'UNIXEPOCH\') as "created_at", 
    -- ZMODEL comes from the ZSYNCPEER join; it is NULL for rows with no
    -- sync peer match, and NULL never equals itself, so those fall to "Other"
    CASE ZMODEL
      WHEN ZMODEL THEN ZMODEL
      ELSE "Other"
    END "source"
  FROM
    ZOBJECT 
    LEFT JOIN
      ZSTRUCTUREDMETADATA 
    ON ZOBJECT.ZSTRUCTUREDMETADATA = ZSTRUCTUREDMETADATA.Z_PK 
    LEFT JOIN
      ZSOURCE 
    ON ZOBJECT.ZSOURCE = ZSOURCE.Z_PK 
    LEFT JOIN
      ZSYNCPEER
    ON ZSOURCE.ZDEVICEID = ZSYNCPEER.ZDEVICEID
  WHERE
    ZSTREAMNAME = "/app/usage"'
  )) -> usage

usage
## # Source:   SQL [?? x 8]
## # Database: sqlite 3.29.0 [/Users/johndoe/Library/Application Support/Knowledge/knowledgeC.db]
##    app                      usage dow         tz start_time          end_time            created_at         source       
##    <chr>                    <int> <chr>    <int> <chr>               <chr>               <chr>              <chr>        
##  1 com.bitrock.appinstaller    15 Friday      -4 2019-10-05 01:11:27 2019-10-05 01:11:42 2019-10-05 01:11:… MacBookPro13…
##  2 com.tinyspeck.slackmacg…  4379 Tuesday     -4 2019-10-01 13:19:24 2019-10-01 14:32:23 2019-10-01 14:32:… Other        
##  3 com.tinyspeck.slackmacg…  1167 Tuesday     -4 2019-10-01 18:19:24 2019-10-01 18:38:51 2019-10-01 18:38:… Other        
##  4 com.tinyspeck.slackmacg…  1316 Tuesday     -4 2019-10-01 19:13:49 2019-10-01 19:35:45 2019-10-01 19:35:… Other        
##  5 com.tinyspeck.slackmacg… 12053 Thursday    -4 2019-10-03 12:25:18 2019-10-03 15:46:11 2019-10-03 15:46:… Other        
##  6 com.tinyspeck.slackmacg…  1258 Thursday    -4 2019-10-03 15:50:16 2019-10-03 16:11:14 2019-10-03 16:11:… Other        
##  7 com.tinyspeck.slackmacg…  2545 Thursday    -4 2019-10-03 16:24:30 2019-10-03 17:06:55 2019-10-03 17:06:… Other        
##  8 com.tinyspeck.slackmacg…   303 Thursday    -4 2019-10-03 17:17:10 2019-10-03 17:22:13 2019-10-03 17:22:… Other        
##  9 com.tinyspeck.slackmacg…  9969 Thursday    -4 2019-10-03 17:33:38 2019-10-03 20:19:47 2019-10-03 20:19:… Other        
## 10 com.tinyspeck.slackmacg…  2813 Thursday    -4 2019-10-03 20:19:52 2019-10-03 21:06:45 2019-10-03 21:06:… Other        
## # … with more rows

Before explaining what that query does, let’s rewrite it {dbplyr}-style:

tbl(kdb, "ZOBJECT") %>% 
  mutate(
    created_at = datetime(ZCREATIONDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
    start_dow = case_when(
      ZSTARTDAYOFWEEK == 1 ~ "Sunday",
      ZSTARTDAYOFWEEK == 2 ~ "Monday",
      ZSTARTDAYOFWEEK == 3 ~ "Tuesday",
      ZSTARTDAYOFWEEK == 4 ~ "Wednesday",
      ZSTARTDAYOFWEEK == 5 ~ "Thursday",
      ZSTARTDAYOFWEEK == 6 ~ "Friday",
      ZSTARTDAYOFWEEK == 7 ~ "Saturday"
    ),
    start_time = datetime(ZSTARTDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
    end_time = datetime(ZENDDATE + 978307200, "UNIXEPOCH", "LOCALTIME"),
    usage = (ZENDDATE - ZSTARTDATE),
    tz = ZSECONDSFROMGMT/3600 
  ) %>% 
  left_join(tbl(kdb, "ZSTRUCTUREDMETADATA"), c("ZSTRUCTUREDMETADATA" = "Z_PK")) %>% 
  left_join(tbl(kdb, "ZSOURCE"), c("ZSOURCE" = "Z_PK")) %>% 
  left_join(tbl(kdb, "ZSYNCPEER"), "ZDEVICEID") %>% 
  filter(ZSTREAMNAME == "/app/usage")  %>% 
  select(
    app = ZVALUESTRING, created_at, start_dow, start_time, end_time, usage, tz, source = ZMODEL
  ) %>% 
  mutate(source = ifelse(is.na(source), "Other", source)) %>% 
  collect() %>% 
  mutate_at(vars(created_at, start_time, end_time), as.POSIXct) -> usage

What we’re doing is pulling out the day of week, start/end usage times & timezone info, app bundle id, source of the app interactions and the total usage time for each entry, along with when that entry was created. We need to do some maths since Apple stores timey-wimey info in its own custom format, plus we need to convert the numeric day of week to a labeled one.
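
A quick sanity check on that magic 978307200 constant: Apple stores these timestamps as seconds since 2001-01-01 00:00:00 UTC (the Core Data epoch), so the offset to the UNIX epoch is:

as.numeric(as.POSIXct("2001-01-01", tz = "UTC"))
## [1] 978307200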

The bundle ids are pretty readable, but they’re not really intended for human consumption, so we’ll build a translation table mapping bundle id to app name using the mdls command.

list.files(
  c("/Applications", "/System/Library/CoreServices", "/Applications/Utilities", "/System/Applications"), # main places apps are stored (there are potentially more but this is sufficient for our needs)
  pattern = "\\.app$", 
  full.names = TRUE
) -> apps

x <- sys::exec_internal("mdls", c("-name", "kMDItemCFBundleIdentifier", "-r", apps))

# mdls null (\0) terminates each entry so we have to do some raw surgery to get it into a format we can use
x$stdout[x$stdout == as.raw(0)] <- as.raw(0x0a)

tibble(
  name = gsub("\\.app$", "", basename(apps)),
  app = read_lines(x$stdout) 
) -> app_trans

app_trans
## # A tibble: 270 x 2
##    name                    app                                    
##    <chr>                   <chr>                                  
##  1 1Password 7             com.agilebits.onepassword7             
##  2 Adium                   com.adiumX.adiumX                      
##  3 Agenda                  com.momenta.agenda.macos               
##  4 Alfred 4                com.runningwithcrayons.Alfred          
##  5 Amazon Music            com.amazon.music                       
##  6 Android File Transfer   com.google.android.mtpviewer           
##  7 Awsaml                  com.rapid7.awsaml                      
##  8 Bartender 2             com.surteesstudios.Bartender           
##  9 BBEdit                  com.barebones.bbedit                   
## 10 BitdefenderVirusScanner com.bitdefender.BitdefenderVirusScanner
## # … with 260 more rows

The usage info goes back ~30 days, so let’s do a quick summary of the top 10 apps and their total usage (in hours):

usage %>% 
  group_by(app) %>% 
  summarise(first = min(start_time), last = max(end_time), total = sum(usage, na.rm=TRUE)) %>% 
  ungroup() %>% 
  mutate(total = total / 60 / 60) %>% # hours
  arrange(desc(total)) %>% 
  left_join(app_trans) -> overall_usage

overall_usage %>% 
  slice(1:10) %>% 
  mutate(name = fct_inorder(name) %>% fct_rev()) %>%
  ggplot(aes(x=total, y=name)) + 
  geom_segment(aes(xend=0, yend=name), size=5, color = ft_cols$slate) +
  scale_x_comma(position = "top") +
  labs(
    x = "Total Usage (hrs)", y = NULL,
    title = glue::glue('App usage in the past {round(as.numeric(max(usage$end_time) - min(usage$start_time), "days"))} days')
  ) +
  theme_ft_rc(grid="X")

There’s a YUGE flaw in the current way macOS tracks application usage. Unlike iOS, where apps really don’t run simultaneously (with iPadOS they kinda can/do, now), macOS apps are usually started and left open along with other apps. Apple doesn’t do a great job identifying only active app usage activity, so many of these usage numbers are heavily inflated. Hopefully that will be fixed in macOS 10.15.

We have more data at our disposal, so let’s see when these apps get used. To do that, we’ll use segments to plot individual usage tracks and color them by weekday/weekend usage (still limiting to top 10 for blog brevity):

usage %>% 
  filter(app %in% overall_usage$app[1:10]) %>% 
  left_join(app_trans) %>%
  mutate(name = factor(name, levels = rev(overall_usage$name[1:10]))) %>% 
  ggplot() +
  geom_segment(
    aes(
      x = start_time, xend = end_time, y = name, yend = name, 
      color = ifelse(start_dow %in% c("Saturday", "Sunday"), "Weekend", "Weekday")
    ),
    size = 10
  ) +
  scale_x_datetime(position = "top") +
  scale_colour_manual(
    name = NULL,
    values = c(
      "Weekend" = ft_cols$light_blue, 
      "Weekday" = ft_cols$green
    )
  ) +
  guides(
    colour = guide_legend(override.aes = list(size = 1))
  ) +
  labs(
    x = NULL, y = NULL,
    title = glue::glue('Top 10 App usage on this Mac in the past {round(as.numeric(max(usage$end_time) - min(usage$start_time), "days"))} days'),
    subtitle = "Each segment represents that app being 'up' (Open to Quit).\nUnfortunately, this is what Screen Time uses for its calculations on macOS"
  ) +
  theme_ft_rc(grid="X") +
  theme(legend.position = c(1, 1.25)) +
  theme(legend.justification = "right")

I’m not entirely sure “on this Mac” is completely accurate since I think this syncs across all active Screen Time devices due to this (n is in seconds):

count(usage, source, wt=usage, sort=TRUE)
## # A tibble: 2 x 2
##   source               n
##   <chr>            <int>
## 1 Other          4851610
## 2 MacBookPro13,2 1634137

The “Other” appears to be the work-dev Mac, but since it doesn’t have the identifier mapped I think that means it’s the local one and that the above chart is looking at Screen Time across all devices. I literally (right before this sentence) enabled Screen Time on my iPhone, so we’ll see if that ends up in the database and I’ll post a quick update if it does.

We’ll take one last look by day of week and use a heatmap to see the results:

count(usage, start_dow, app, wt=usage/60/60) %>% 
  left_join(app_trans) %>%
  filter(app %in% overall_usage$app[1:10]) %>% 
  mutate(name = factor(name, levels = rev(overall_usage$name[1:10]))) %>% 
  mutate(start_dow = factor(start_dow, c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))) %>% 
  ggplot() +
  geom_tile(aes(start_dow, name, fill = n), color = "#252a32", size = 0.75) +
  scale_x_discrete(expand = c(0, 0.5), position = "top") +
  scale_y_discrete(expand = c(0, 0.5)) +
  scale_fill_viridis_c(direction = -1, option = "magma", name = "Usage (hrs)") +
  labs(
    x = NULL, y = NULL,
    title = "Top 10 App usage by day of week"
  ) +
  theme_ft_rc(grid="")

I really need to get into the habit of using the RStudio Server access features of RSwitch over Chrome so I can get RSwitch into the top 10, but some habits (and bookmarks) die hard.

FIN

Apple’s Screen Time also tracks “category”, which is something we can pick up from each application’s embedded metadata. We’ll do that in a follow-up post along with seeing whether we can capture iOS usage now that I’ve enabled Screen Time on those devices as well.
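
If you want a head start on that, the category can likely be grabbed the same way we built the bundle id table; here’s a hedged sketch reusing apps and the mdls trick from above (kMDItemAppStoreCategory is my assumption for the attribute name, and it can be empty for non-App Store apps):

x <- sys::exec_internal("mdls", c("-name", "kMDItemAppStoreCategory", "-r", apps))
x$stdout[x$stdout == as.raw(0)] <- as.raw(0x0a) # same \0-termination surgery as before
app_categories <- read_lines(x$stdout)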

Keep spelunking the knowledgeC.db table(s) and blog about any interesting nuggets you find (or share them in the comments).

I posted a visualization of email safety status (a.k.a. DMARC) of the Fortune 500 (2017 list) the other day on Twitter and received this spiffy request from @MarkAltosaar:

There are many ways to achieve this result. I’ll show one here and walk through the process starting with the data (this is the 2018 DMARC evaluation run):

library(hrbrthemes) # CRAN or fav social coding site using hrbrmstr/pkgname
library(ggchicklet) # fav social coding site using hrbrmstr/pkgname
library(tidyverse)

f500_dmarc <- read_csv("https://rud.is/dl/f500-industry-dmarc.csv.gz", col_types = "cc")

f500_dmarc
## # A tibble: 500 x 2
##    industry               p         
##    <chr>                  <chr>     
##  1 Retailing              Reject    
##  2 Technology             None      
##  3 Health Care            Reject    
##  4 Wholesalers            None      
##  5 Retailing              Quarantine
##  6 Motor Vehicles & Parts None      
##  7 Energy                 None      
##  8 Wholesalers            None      
##  9 Retailing              None      
## 10 Telecommunications     Quarantine
## # … with 490 more rows

The p column is the DMARC classification for each organization (org names have been withheld to protect the irresponsible) and comes from the p=… value in the DMARC DNS TXT record field. It has a limited set of values, so let’s enumerate them and assign some colors:
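
For context, these values live in a DNS TXT record at _dmarc.<domain>; here’s a quick way to peek at one yourself (dig must be on your PATH, and the domain is just a stand-in):

system2("dig", c("+short", "txt", "_dmarc.example.com"), stdout = TRUE)
## a typical answer looks like: "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"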

dmarc_levels <- c("No DMARC", "None", "Quarantine", "Reject")
dmarc_cols <- set_names(c(ft_cols$slate, "#a6dba0", "#5aae61", "#1b7837"), dmarc_levels)

We want the aggregate value of each p, thus we need to do some counting with count():

(dmarc_summary <- count(f500_dmarc, industry, p))
## # A tibble: 63 x 3
##    industry            p              n
##    <chr>               <chr>      <int>
##  1 Aerospace & Defense No DMARC       9
##  2 Aerospace & Defense None           3
##  3 Aerospace & Defense Quarantine     1
##  4 Apparel             No DMARC       4
##  5 Apparel             None           1
##  6 Business Services   No DMARC       9
##  7 Business Services   None           7
##  8 Business Services   Reject         4
##  9 Chemicals           No DMARC      12
## 10 Chemicals           None           2
## # … with 53 more rows

We’re also going to want to sort the industries by those with the most DMARC (sorted bars/chicklets FTW!). We’ll need a factor for that, so let’s make one:

(dmarc_summary %>% 
  filter(p != "No DMARC") %>% # we don't care abt this `p` value
  count(industry, wt=n, sort=TRUE) -> industry_levels)
## # A tibble: 21 x 2
##    industry                      n
##    <chr>                     <int>
##  1 Financials                   54
##  2 Technology                   25
##  3 Health Care                  24
##  4 Retailing                    23
##  5 Wholesalers                  16
##  6 Energy                       12
##  7 Transportation               12
##  8 Business Services            11
##  9 Industrials                   8
## 10 Food, Beverages & Tobacco     6
## # … with 11 more rows

Now, we can make the chart:

dmarc_summary %>% 
  mutate(p = factor(p, levels = rev(dmarc_levels))) %>% 
  mutate(industry = factor(industry, rev(industry_levels$industry))) %>% 
  ggplot(aes(industry, n)) +
  geom_chicklet(aes(fill = p)) +
  scale_fill_manual(name = NULL, values = dmarc_cols) +
  scale_y_continuous(expand = c(0,0), position = "right") +
  coord_flip() +
  labs(
    x = NULL, y = NULL,
    title = "DMARC Status of Fortune 500 (2017 List; 2018 Measurement) Primary Email Domains"
  ) +
  theme_ipsum_rc(grid = "X") +
  theme(legend.position = "top")

Doh! We rly want them to be 100% width. Thankfully, {ggplot2} has a position_fill() we can use instead of the default stacking:

dmarc_summary %>% 
  mutate(p = factor(p, levels = rev(dmarc_levels))) %>% 
  mutate(industry = factor(industry, rev(industry_levels$industry))) %>% 
  ggplot(aes(industry, n)) +
  geom_chicklet(aes(fill = p), position = position_fill()) +
  scale_fill_manual(name = NULL, values = dmarc_cols) +
  scale_y_continuous(expand = c(0,0), position = "right") +
  coord_flip() +
  labs(
    x = NULL, y = NULL,
    title = "DMARC Status of Fortune 500 (2017 List; 2018 Measurement) Primary Email Domains"
  ) +
  theme_ipsum_rc(grid = "X") +
  theme(legend.position = "top")

Doh! Besides forgetting to use reverse = TRUE in the call to position_fill(), everything is out of order. Kinda. It’s in the order we told it to be in, but that’s not right b/c we need it ordered by the in-industry percentages. If each industry had the same number of organizations, there would not have been an issue. Unfortunately, the folks who make up these lists care not about our time. Let’s re-compute the industry factor by computing the percents:

(dmarc_summary %>% 
  group_by(industry) %>% 
  mutate(pct = n/sum(n)) %>% 
  ungroup() %>% 
  filter(p != "No DMARC") %>% 
  count(industry, wt=pct, sort=TRUE) -> industry_levels)
## # A tibble: 21 x 2
##    industry               n
##    <chr>              <dbl>
##  1 Transportation     0.667
##  2 Technology         0.641
##  3 Wholesalers        0.615
##  4 Financials         0.614
##  5 Health Care        0.6  
##  6 Business Services  0.55 
##  7 Food & Drug Stores 0.5  
##  8 Retailing          0.5  
##  9 Industrials        0.444
## 10 Telecommunications 0.375
## # … with 11 more rows

Now, we can go back to using position_fill(), this time with reverse = TRUE:

dmarc_summary %>% 
  mutate(p = factor(p, levels = rev(dmarc_levels))) %>% 
  mutate(industry = factor(industry, rev(industry_levels$industry))) %>% 
  ggplot(aes(industry, n)) +
  geom_chicklet(aes(fill = p), position = position_fill(reverse = TRUE)) +
  scale_fill_manual(name = NULL, values = dmarc_cols) +
  scale_y_percent(expand = c(0, 0.001), position = "right") +
  coord_flip() +
  labs(
    x = NULL, y = NULL,
    title = "DMARC Status of Fortune 500 (2017 List; 2018 Measurement) Primary Email Domains"
  ) +
  theme_ipsum_rc(grid = "X") +
  theme(legend.position = "top")

FIN

As noted, this is one way to handle this situation. I’m not super happy with the final visualization here since it doesn’t have the counts next to the industry labels, and I’d like the ordering to consider both count and the more secure configuration (so, conditional on a higher prevalence of Quarantine or Reject when there are ties). That is an exercise left to the reader 😎.

An RSwitch user, lcolladotor, filed a most-welcome issue letting me know that the core functionality of the switcher was busted 😱. After testing out the 1.5.1 release candidate I had made a “harmless” & “clever” change to reduce some redundancy in the code that handled switching, which resulted in busted symbolic link creation. ’Tis fixed now.

To somewhat make amends for said error, James Balamuta’s excellent “R Compiler Tools for Rcpp on macOS” resource (https://thecoatlessprofessor.com/programming/cpp/r-compiler-tools-for-rcpp-on-macos/) has been added to the available web resources links.

I’ve also set up a mailing list for RSwitch over at sourcehut where you can sign up directly with a (free) sourcehut account (signup requires just an email) and see the archives. The RSwitch menu has a new link to the mailing list.

There’s also a new blog category for RSwitch which has its own RSS feed (https://rud.is/b/category/rswitch/feed/) to make it easier to keep up with RSwitch-only updates.

“Check for updates” will get you to the new release or you can grab it directly from the RSwitch site.

This past week @propublica linked to a really spiffy resource for getting an overview of a Twitter user’s profile and activity called accountanalysis. It has a beautiful interface that works as well on mobile as it does in a real browser. It’s also fully interactive and supports cross-filtering (zoom in on the timeline and the other graphs change). It’s especially great if you’re not a coder, but if you are, @kearneymw’s {rtweet} can get you all this info and more, putting the power of R behind data frames full of tweet inanity.
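
For the coders, here’s a minimal {rtweet} sketch of the raw ingredients such a summary needs (the account name is just an example; both calls require API tokens configured for {rtweet}):

library(rtweet)

user_info <- lookup_users("propublica")           # profile metadata
timeline  <- get_timeline("propublica", n = 3200) # the standard API timeline cap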

While we covered quite a bit of {rtweet} ground in the 21 Recipes book, summarizing an account to the degree that accountanalysis does is not in there. To rectify this oversight, I threw together a static clone of accountanalysis that can make standalone HTML reports like this one.

It’s a fully parameterized R markdown document, meaning you can run it as just a function call (or change the parameter and knit it by hand):

rmarkdown::render(
  input = "account-analysis.Rmd", 
  params = list(
    username = "propublica"
  ), 
  output_file = "~/Documents/propublica-analysis.html"
)

It will also, by default, save a date-stamped copy of the user info and retrieved timeline into the directory you generate the report from (add a prefix path to the save portion in the Rmd to store it in a better place).
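
Since the report is just a parameterized render, batching it over several accounts is a small loop; here’s a sketch (the account names and output paths are arbitrary):

for (u in c("propublica", "nytimes")) {
  rmarkdown::render(
    input = "account-analysis.Rmd",
    params = list(username = u),
    output_file = sprintf("~/Documents/%s-analysis.html", u)
  )
}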

With all the data available, you can dig in and extract all the information you want/need.

FIN

You can get the Rmd at your favorite social coding service:

RSwitch is a macOS menubar application that works on macOS 10.14+ and provides handy shortcuts for developing with R on macOS. Version 1.5.0 brings a reorganized menu system and the ability to manage and make connections to RStudio Server instances. Here’s a quick peek at the new setup:

All books, links, and other reference resources are under a single submenu system:

If there’s a resource you’d like added, follow the links on the main RSwitch site to file PRs where you’re most comfortable.

You can also setup automatic checks and notifications for when new RStudio Dailies are available (you can still always check manually and this check feature is off by default):

But, the biggest new feature is the ability to manage and launch RStudio Server connections right from RSwitch:


These RStudio Server browser connections are kept separate from your internet browsing and are one menu selection away. RSwitch also remembers the size and position of your RStudio Server session windows, so everything should be where you want/need/expect. This is somewhat of an experimental feature so definitely file issues if you run into any problems or would like things to work differently.

FIN

Kick the tyres, file issues or requests and, if so inclined, let me know how you’re liking RSwitch!

The latest round of the 2020 Democratic debates is over and the data from all the 2019 editions of the debates have been added to {ggchicklet}. The structure of the debates2019 built-in dataset has changed a bit:

library(ggchicklet)
library(hrbrthemes)
library(tidyverse)

debates2019
## # A tibble: 641 x 7
##    elapsed timestamp speaker   topic   debate_date debate_group night
##      <dbl> <time>    <chr>     <chr>   <date>             <dbl> <dbl>
##  1   1.04  21:03:05  Warren    Economy 2019-09-13             1     1
##  2   1.13  21:04:29  Klobuchar Economy 2019-09-13             1     1
##  3   1.13  21:06:02  O'Rourke  Economy 2019-09-13             1     1
##  4   0.226 21:07:20  O'Rourke  Economy 2019-09-13             1     1
##  5   1.06  21:07:54  Booker    Economy 2019-09-13             1     1
##  6   0.600 21:09:08  Booker    Economy 2019-09-13             1     1
##  7   0.99  21:09:50  Warren    Economy 2019-09-13             1     1
##  8   0.872 21:11:03  Castro    Economy 2019-09-13             1     1
##  9   1.07  21:12:00  Gabbard   Economy 2019-09-13             1     1
## 10   1.11  21:13:20  de Blasio Economy 2019-09-13             1     1
## # … with 631 more rows

There are now debate_date, debate_group and night columns to make it easier to segment out or group together the debate nights.
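
A quick count shows how those columns break out the rounds and nights:

count(debates2019, debate_date, debate_group, night)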

The topic names across the online JavaScript data for the June, July and September debates weren’t uniform so they’ve been cleaned up as well:

distinct(debates2019, topic) %>% 
  arrange(topic) %>% 
  print(n=nrow(.))
## # A tibble: 26 x 1
##    topic                  
##    <chr>                  
##  1 Abortion               
##  2 Age                    
##  3 Campaign Finance Reform
##  4 Civil Rights           
##  5 Climate                
##  6 Closing                
##  7 Economy                
##  8 Education              
##  9 Elections Reform       
## 10 Foreign Policy         
## 11 Gun Control            
## 12 Healthcare             
## 13 Immigration            
## 14 Lead                   
## 15 Opening                
## 16 Other                  
## 17 Party Strategy         
## 18 Politics               
## 19 Race                   
## 20 Resilience             
## 21 Socialism              
## 22 Statement              
## 23 Trade                  
## 24 Trump                  
## 25 Veterans               
## 26 Women's Rights 

This should make it easier to compare speaker times per-topic across the debates.
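
For example, here’s a one-liner for a single candidate’s per-topic speaking time across all the rounds:

debates2019 %>% 
  filter(speaker == "Warren") %>% 
  count(debate_group, topic, wt = elapsed, sort = TRUE)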

Here’s how to generate the chart in the featured image slot for the September debate:

debates2019 %>%
  filter(debate_group == 3) %>% 
  mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>%
  mutate(topic = fct_inorder(topic)) %>% 
  ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) +
  geom_chicklet(width = 0.75) +
  scale_y_continuous(
    expand = c(0, 0.0625),
    position = "right",
    breaks = seq(0, 18, 2),
    labels = c(0, sprintf("%d min.", seq(2, 18, 2))),
    limits = c(0, 18)
  ) +
  ggthemes::scale_fill_tableau("Tableau 20") +
  guides(
    fill = guide_legend(nrow = 2)
  ) +
  coord_flip() +
  labs(
    x = NULL, y = NULL, fill = NULL,
    title = "How Long Each Candidate Spoke",
    subtitle = "September 2019 Democratic Debates",
    caption = "Each bar segment represents the length of a candidate’s response to a question.\nOriginal <https://www.nytimes.com/interactive/2019/09/12/us/elections/debate-speaking-time.html>\n#rstats reproduction by @hrbrmstr"
  ) +
  theme_ipsum_rc(grid="X") +
  theme(axis.text.x = element_text(color = "gray60", size = 10)) +
  theme(legend.position = "top")

Now that the field has been thinned a bit (yes, others are still running, but really?), we can see who has blathered the most on stage so far:

debates2019 %>%
  filter(debate_group == 3) %>% 
  distinct(speaker) %>% 
  left_join(debates2019) %>% 
  count(speaker, wt=elapsed, sort=TRUE) %>% 
  mutate(speaker = fct_inorder(speaker) %>% fct_rev()) %>% 
  ggplot(aes(speaker, n)) +
  geom_col(fill = ft_cols$slate, width=0.55) +
  coord_flip() +
  scale_y_continuous(expand = c(0, 0.55), position = "right") +
  labs(
    x = NULL, y = "Speaking time (minutes)",
    title = "Total Speaking Time Across All 2019 Debates\nfor Those Left Standing in September"
  ) +
  theme_ipsum_es(grid="X")


And, here’s what they’ve all blathered about:

debates2019 %>%
  filter(debate_group == 3) %>% 
  distinct(speaker) %>% 
  left_join(debates2019) %>% 
  count(topic, wt=elapsed, sort=TRUE) %>% 
  mutate(topic = fct_inorder(topic) %>% fct_rev()) %>% 
  ggplot(aes(topic, n)) +
  geom_col(fill = ft_cols$slate, width=0.55) +
  coord_flip() +
  scale_y_continuous(expand = c(0, 0.25), position = "right") +
  labs(
    x = NULL, y = "Topic time (minutes)",
    title = "Total Topic Time Across All 2019 Debates\nfor Those Left Standing in September"
  ) +
  theme_ipsum_es(grid="X")

A minor update to RSwitch has been released. Apart from some internal code reorganization there are three user-facing changes.

First, RSwitch is now notarized! That means you won’t get a notice about it being from an “unidentified developer”, nor will folks on Catalina see a warning about macOS being unable to check the download for malware. You can use {macthekinfe} to check out the application signature and notarization info:

check_sig("/Applications/RSwitch.app") %>% 
  print(n=nrow(.))
## # A tibble: 25 x 2
##    key                         value                                                               
##    <chr>                       <chr>                                                               
##  1 Executable                  /Applications/RSwitch.app/Contents/MacOS/RSwitch                    
##  2 Identifier                  is.rud.bob.RSwitch                                                  
##  3 Format                      app bundle with Mach-O thin (x86_64)                                
##  4 CodeDirectory v             20500 size=1342 flags=0x10000(runtime) hashes=33+5 location=embedded
##  5 VersionPlatform             1                                                                   
##  6 VersionMin                  658944                                                              
##  7 VersionSDK                  659200                                                              
##  8 Hash type                   sha256 size=32                                                      
##  9 CandidateCDHash sha256      efa512a9daabfb9402af8a91697f008b89ffa81e                            
## 10 CandidateCDHashFull sha256  efa512a9daabfb9402af8a91697f008b89ffa81ea014452821e39a5365d80fe6    
## 11 Hash choices                sha256                                                              
## 12 CMSDigest                   efa512a9daabfb9402af8a91697f008b89ffa81ea014452821e39a5365d80fe6    
## 13 CMSDigestType               2                                                                   
## 14 Page size                   4096                                                                
## 15 CDHash                      efa512a9daabfb9402af8a91697f008b89ffa81e                            
## 16 Signature size              8968                                                                
## 17 Authority                   Developer ID Application: Bob Rudis (CBY22P58G8)                    
## 18 Authority                   Developer ID Certification Authority                                
## 19 Authority                   Apple Root CA                                                       
## 20 Timestamp                   Sep 1, 2019 at 08:46:41                                             
## 21 Info.plist entries          26                                                                  
## 22 TeamIdentifier              CBY22P58G8                                                          
## 23 Runtime Version             10.15.0                                                             
## 24 Sealed Resources version    2 rules=13 files=26                                                 
## 25 Internal requirements count 1 size=212
check_notarization("/Applications/RSwitch.app")
## # A tibble: 4 x 2
##   key         value                                           
##   <chr>       <chr>                                           
## 1 application /Applications/RSwitch.app                       
## 2 status      accepted                                        
## 3 source      Notarized Developer ID                          
## 4 origin      Developer ID Application: Bob Rudis (CBY22P58G8)

Note: you may (I’m working on installing it in a fresh Catalina VM to know definitively) need to ensure RSwitch is granted “Full Disk Access” in System Preferences -> Security & Privacy -> [Privacy] tab to ensure it can operate where it needs to:


Next, since it’s possible to have an old set of just package libraries at a given /Library/Frameworks/R.framework/Versions/#.# path but no R binary in said location, RSwitch now does what the R Core RSwitch app (Simon was kind enough to forward that R-Forge SVN web link) does and performs some extra validation to see if a given R version directory is, indeed, switchable to. Directories that aren’t switchable are shown but grayed out (as in the image, below) and marked as “incomplete”. You probably should clean out those old paths.
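
If you’re curious which of your own version directories would pass, here’s a rough R approximation of that validation (the Resources/bin/R layout is the standard CRAN framework install, so treat this as a sketch):

vers <- list.dirs("/Library/Frameworks/R.framework/Versions", recursive = FALSE)
data.frame(
  version = basename(vers),
  switchable = file.exists(file.path(vers, "Resources", "bin", "R"))
)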

Finally, in the same image keen observers will see a few more relevant links have been added to the “bookmarks”. I added them because I frequent them as I work on R-related things.

Folks who are running 1.4.0 should be able to use the “Check for update…” menu item to get to the releases page. You can also get it from the RSwitch landing page or download it directly via: https://rud.is/rswitch/releases/RSwitch-1.4.1.app.zip.

FIN

RSwitch feels feature-complete, so the pace of development and releases will likely slow a bit. Some spiffy folks have offered a new app icon and requested an easier way to switch between running RStudio/R GUI instances, and I’m working on incorporating both of those ideas into the app. If you do have a problem, question, or feature request, definitely file an issue on your favorite social coding site. Links to where RSwitch source code can be found to file said issue(s) are at the bottom of the RSwitch landing page.

Swift 5 has been so much fun to hack on that there’s a new update to the macOS R-focused menubar utility RSwitch available. Along with the app comes a new dedicated RSwitch landing page and a new user’s guide, since it has enough features to warrant such documentation. Here’s the new menu:

The core changes/additions include:

  • a reorganized menu (see above)
  • the use of notifications instead of alerts
  • disabling of download menu entries while download is in progress
  • the ability to start new R GUI or RStudio instances
  • the ability to switch to and make running R GUI or RStudio instances active
  • additional “bookmarks” in the reorganized web resources submenu
  • a built-in check for updates

To make RSwitch launch at startup, just add it as a login item to your user in the “Users & Groups” pane of “System Preferences”.

The guide has information on how all the existing and new features work plus provides documentation on how to install the alternate R versions available at the R for macOS Developer’s Page. There’s also a slightly expanded set of information on how to contribute to RSwitch development.

FIN

As usual, kick the tyres, file feature requests or bug reports where you’re comfortable, & — if you’re macOS-dev-curious — join in the Swift 5 fun (it really is a pretty fun language).