Category Archives: Swift

After a Twitter convo about weather stations I picked up a WeatherFlow Tempest. Setup was quick, but the sensor package died within 24 hours. I was going to give up on it, but I had already written an R package (for the REST API & UDP broadcast interfaces) and a C++ utility (for just the UDP broadcast interface), and the support staff were both friendly and competent and sent me a replacement super quick.

I’ve blathered about the R package already (on Twitter) so am not going to tag that here, but will link to a few repositories (in various languages) that receive the UDP broadcast messages and at least shove them to stdout.

The C++ one is mostly C but gets the job done (it just posts the messages to stdout). It should run everywhere, but I’ve only tested it on macOS & Linux.

The Golang one has some structured types to consume about half of the JSON messages (I’ve only seen four in the broadcasts so far, and will add more as I see new ones). It’s only more verbose than the C++ one due to the various record type handling. This should run everywhere, though.
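A rough Rust sketch of the "structured types" idea: one enum variant per record type seen so far, plus a catch-all so an unfamiliar type doesn't break anything. The four names are the `type` values in the sample broadcasts later in this post; `MessageKind` and `message_kind` are names made up for illustration, and the field lookup is a naive string search standing in for a real JSON parser.

```rust
/// Record types seen in the UDP broadcasts so far, plus a catch-all.
#[derive(Debug, PartialEq)]
enum MessageKind {
    HubStatus,
    RapidWind,
    DeviceStatus,
    ObsSt,
    Other(String),
}

/// Naively pull the value of the "type" field out of one message.
/// Assumes compact JSON as broadcast (no spaces around the colon).
fn message_kind(msg: &str) -> Option<MessageKind> {
    let key = "\"type\":\"";
    let start = msg.find(key)? + key.len();
    let len = msg[start..].find('"')?;
    Some(match &msg[start..start + len] {
        "hub_status" => MessageKind::HubStatus,
        "rapid_wind" => MessageKind::RapidWind,
        "device_status" => MessageKind::DeviceStatus,
        "obs_st" => MessageKind::ObsSt,
        other => MessageKind::Other(other.to_string()),
    })
}

fn main() {
    let msg = r#"{"serial_number":"ST-00055227","type":"rapid_wind","hub_sn":"HB-00069665","ob":[1643246013,0.00,0]}"#;
    // prints Some(RapidWind)
    println!("{:?}", message_kind(msg));
}
```

The catch-all variant is the design choice that matters here: new record types can show up in the broadcasts at any time, and the receiver should keep running when they do.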

For kicks, I threw together a Swift one that is really just Swift-ified C and is a Frankenstein monster that likely shouldn’t be used. (I’ll be making a SwiftUI macOS/iOS/iPadOS app for the UDP broadcast messages soon, though.)

To round out my obsession I also made a Rust version which I’m just in 💙 with (not because of any skill of my own). It’s the smallest source file and is pretty elegant (100% due to Rust, and, again, not me).

All the code/projects are super small, but the Rust source is so tiny that it won’t be too intrusive to post here:

use std::net::UdpSocket;

fn main() -> std::io::Result<()> {

  let mut buf = [0; 1024]; // 1024 byte buffer is plenty
  let s = UdpSocket::bind("0.0.0.0:50222").expect(r#"{"message":"Could not bind to address/port."}"#);

  loop {

    let (n, _) = s.recv_from(&mut buf).expect(r#"{"message":"No broadcasts received."}"#);

    println!("{}", String::from_utf8(buf[..n].to_vec()).unwrap())

  }

}
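If there's no Tempest hub on the network, the same receive-and-print logic can be exercised over loopback. This is only a sketch: `receive_one` is a helper name made up here, the port is whatever the OS hands out rather than 50222, and the payload is one of the sample messages shown later in this post. `String::from_utf8_lossy` is swapped in for `from_utf8(...).unwrap()` so a malformed datagram doesn't cause a panic.

```rust
use std::net::UdpSocket;

/// Block until one datagram arrives on `socket` and return it as text.
/// Invalid UTF-8 is replaced rather than causing a panic.
fn receive_one(socket: &UdpSocket) -> std::io::Result<String> {
    let mut buf = [0u8; 1024];
    let (n, _) = socket.recv_from(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port; the real listener binds 0.0.0.0:50222.
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;

    let sample = r#"{"serial_number":"ST-00055227","type":"rapid_wind","hub_sn":"HB-00069665","ob":[1643246013,0.00,0]}"#;
    sender.send_to(sample.as_bytes(), receiver.local_addr()?)?;

    // Echo the datagram to stdout, as the listener above does.
    println!("{}", receive_one(&receiver)?);
    Ok(())
}
```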

FIN

If you’re interested in a low-cost weather station with great DIY programming support, I’d definitely (so far, at least) recommend the Tempest. We’ll see if it survives the forthcoming snowpocalypse.

These are the JSON messages it slings over UDP:

{"serial_number":"HB-00069665","type":"hub_status","firmware_revision":"177","uptime":728643,"rssi":-50,"timestamp":1643246011,"reset_flags":"BOR,PIN,POR","seq":72787,"radio_stats":[25,1,0,3,16637],"mqtt_stats":[10,108]}
{"serial_number":"ST-00055227","type":"rapid_wind","hub_sn":"HB-00069665","ob":[1643246013,0.00,0]}
{"serial_number":"ST-00055227","type":"rapid_wind","hub_sn":"HB-00069665","ob":[1643246015,0.00,0]}
{"serial_number":"ST-00055227","type":"device_status","hub_sn":"HB-00069665","timestamp":1643246016,"uptime":106625,"voltage":2.683,"firmware_revision":165,"rssi":-72,"hub_rssi":-66,"sensor_status":655364,"debug":0}
{"serial_number":"ST-00055227","type":"obs_st","hub_sn":"HB-00069665","obs":[[1643246016,0.00,0.00,0.00,0,3,1024.56,-12.82,47.84,0,0.00,0,0.000000,0,0,0,2.683,1]],"firmware_revision":165}
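To make the payloads a little more concrete, here's a minimal sketch that pulls the `ob` triple out of a `rapid_wind` message. It assumes the field order given in WeatherFlow's UDP reference (epoch seconds, wind speed in m/s, wind direction in degrees); the `RapidWind` struct and `parse_rapid_wind` function are made-up names, and the string slicing stands in for a proper JSON parser.

```rust
/// One `rapid_wind` observation (field meanings assumed from the UDP reference).
#[derive(Debug, PartialEq)]
struct RapidWind {
    epoch: u64,         // seconds since the Unix epoch
    speed_mps: f64,     // wind speed, metres per second
    direction_deg: u16, // wind direction, degrees
}

/// Extract the `"ob":[epoch,speed,direction]` array from a compact JSON message.
/// Returns None for other record types or anything that doesn't parse.
fn parse_rapid_wind(msg: &str) -> Option<RapidWind> {
    if !msg.contains("\"type\":\"rapid_wind\"") {
        return None;
    }
    let key = "\"ob\":[";
    let start = msg.find(key)? + key.len();
    let end = start + msg[start..].find(']')?;
    let mut fields = msg[start..end].split(',').map(str::trim);
    Some(RapidWind {
        epoch: fields.next()?.parse::<u64>().ok()?,
        speed_mps: fields.next()?.parse::<f64>().ok()?,
        direction_deg: fields.next()?.parse::<u16>().ok()?,
    })
}

fn main() {
    let msg = r#"{"serial_number":"ST-00055227","type":"rapid_wind","hub_sn":"HB-00069665","ob":[1643246013,0.00,0]}"#;
    println!("{:?}", parse_rapid_wind(msg));
}
```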

WWDC 2021 is on this week and many new fun things are being introduced, including some data science-friendly additions to the frameworks that come with Xcode 13 and are available on macOS 12+ (and its *OS cousins).

Specifically, Apple has made tabular data a first-class citizen with the new TabularData framework.

A future post will have some more exposition, but here’s a sample of core operations including:

  • reading in tabular data from CSV or JSON
  • examining the structure
  • working with columns and/or rows
  • grouping and filtering operations
  • transforming and removing columns

I’ve tagged this with rstats as there are R equivalents included for each operation so R folks can translate any Swift code they see in the future.

import Foundation // for URL
import TabularData

// define some basic formatting options for data frame output
let dOpts = FormattingOptions(maximumLineWidth: 80, maximumCellWidth: 10, maximumRowCount: 20, includesColumnTypes: true)

// read in a CSV file
// R: xdf <- read.csv("mtcars.csv")
var xdf = try! DataFrame(contentsOfCSVFile: URL(fileURLWithPath: "mtcars.csv"))

// take a look at it
// R: print(xdf) # no more print() in further R equivalents; just assume interactive or wrap with print
print(xdf.description(options: dOpts))

┏━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃    ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃    ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0  │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.62     │      ┆
│ 1  │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.875    │      ┆
│ 2  │ 22.8     │ 4     │ 108.0    │ 93    │ 3.85     │ 2.32     │      ┆
│ 3  │ 21.4     │ 6     │ 258.0    │ 110   │ 3.08     │ 3.215    │      ┆
│ 4  │ 18.7     │ 8     │ 360.0    │ 175   │ 3.15     │ 3.44     │      ┆
│ 5  │ 18.1     │ 6     │ 225.0    │ 105   │ 2.76     │ 3.46     │      ┆
│ 6  │ 14.3     │ 8     │ 360.0    │ 245   │ 3.21     │ 3.57     │      ┆
│ 7  │ 24.4     │ 4     │ 146.7    │ 62    │ 3.69     │ 3.19     │      ┆
│ 8  │ 22.8     │ 4     │ 140.8    │ 95    │ 3.92     │ 3.15     │      ┆
│ 9  │ 19.2     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 10 │ 17.8     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 11 │ 16.4     │ 8     │ 275.8    │ 180   │ 3.07     │ 4.07     │      ┆
│ 12 │ 17.3     │ 8     │ 275.8    │ 180   │ 3.07     │ 3.73     │      ┆
│ 13 │ 15.2     │ 8     │ 275.8    │ 180   │ 3.07     │ 3.78     │      ┆
│ 14 │ 10.4     │ 8     │ 472.0    │ 205   │ 2.93     │ 5.25     │      ┆
│ 15 │ 10.4     │ 8     │ 460.0    │ 215   │ 3.0      │ 5.424    │      ┆
│ 16 │ 14.7     │ 8     │ 440.0    │ 230   │ 3.23     │ 5.345    │      ┆
│ 17 │ 32.4     │ 4     │ 78.7     │ 66    │ 4.08     │ 2.2      │      ┆
│ 18 │ 30.4     │ 4     │ 75.7     │ 52    │ 4.93     │ 1.615    │      ┆
│ 19 │ 33.9     │ 4     │ 71.1     │ 65    │ 4.22     │ 1.835    │      ┆
┢╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 12 more                                                               ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛

// dimensions
// R: dim(xdf)
print(xdf.shape)

(rows: 32, columns: 11)

// head
// R: head(xdf)
print(xdf.prefix(5).description(options: dOpts))

┏━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃   ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃   ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.62     │      ┆
│ 1 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.875    │      ┆
│ 2 │ 22.8     │ 4     │ 108.0    │ 93    │ 3.85     │ 2.32     │      ┆
│ 3 │ 21.4     │ 6     │ 258.0    │ 110   │ 3.08     │ 3.215    │      ┆
│ 4 │ 18.7     │ 8     │ 360.0    │ 175   │ 3.15     │ 3.44     │      ┆
└───┴──────────┴───────┴──────────┴───────┴──────────┴──────────┴╌╌╌╌╌╌┘

// tail
// R: tail(xdf)
print(xdf.suffix(5).description(options: dOpts))

┏━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃    ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃    ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 27 │ 30.4     │ 4     │ 95.1     │ 113   │ 3.77     │ 1.513    │      ┆
│ 28 │ 15.8     │ 8     │ 351.0    │ 264   │ 4.22     │ 3.17     │      ┆
│ 29 │ 19.7     │ 6     │ 145.0    │ 175   │ 3.62     │ 2.77     │      ┆
│ 30 │ 15.0     │ 8     │ 301.0    │ 335   │ 3.54     │ 3.57     │      ┆
│ 31 │ 21.4     │ 4     │ 121.0    │ 109   │ 4.11     │ 2.78     │      ┆
└────┴──────────┴───────┴──────────┴───────┴──────────┴──────────┴╌╌╌╌╌╌┘

// column summaries
// R: summary(xdf)
print(xdf.summaryOfAllColumns().description(options: dOpts))

┏━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳╍╍╍╍╍╍┓
┃   ┃ count(mpg) ┃ uniqueCou… ┃ top(mpg) ┃ topFreque… ┃ count(cyl) ┃ 39   ┇
┃   ┃ <Int>      ┃ <Int>      ┃ <Double> ┃ <Int>      ┃ <Int>      ┃ more ┇
┡━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0 │ 32         │ 25         │ 21.4     │ 2          │ 32         │      ┆
└───┴────────────┴────────────┴──────────┴────────────┴────────────┴╌╌╌╌╌╌┘

// sort it
// R: library(tidyverse) # assume this going forward for R examples
// R: arrange(xdf, cyl)
xdf.sort(on: "cyl")

print(xdf.description(options: dOpts))

┏━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃    ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃    ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0  │ 22.8     │ 4     │ 108.0    │ 93    │ 3.85     │ 2.32     │      ┆
│ 1  │ 24.4     │ 4     │ 146.7    │ 62    │ 3.69     │ 3.19     │      ┆
│ 2  │ 22.8     │ 4     │ 140.8    │ 95    │ 3.92     │ 3.15     │      ┆
│ 3  │ 32.4     │ 4     │ 78.7     │ 66    │ 4.08     │ 2.2      │      ┆
│ 4  │ 30.4     │ 4     │ 75.7     │ 52    │ 4.93     │ 1.615    │      ┆
│ 5  │ 33.9     │ 4     │ 71.1     │ 65    │ 4.22     │ 1.835    │      ┆
│ 6  │ 21.5     │ 4     │ 120.1    │ 97    │ 3.7      │ 2.465    │      ┆
│ 7  │ 27.3     │ 4     │ 79.0     │ 66    │ 4.08     │ 1.935    │      ┆
│ 8  │ 26.0     │ 4     │ 120.3    │ 91    │ 4.43     │ 2.14     │      ┆
│ 9  │ 30.4     │ 4     │ 95.1     │ 113   │ 3.77     │ 1.513    │      ┆
│ 10 │ 21.4     │ 4     │ 121.0    │ 109   │ 4.11     │ 2.78     │      ┆
│ 11 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.62     │      ┆
│ 12 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.875    │      ┆
│ 13 │ 21.4     │ 6     │ 258.0    │ 110   │ 3.08     │ 3.215    │      ┆
│ 14 │ 18.1     │ 6     │ 225.0    │ 105   │ 2.76     │ 3.46     │      ┆
│ 15 │ 19.2     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 16 │ 17.8     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 17 │ 19.7     │ 6     │ 145.0    │ 175   │ 3.62     │ 2.77     │      ┆
│ 18 │ 18.7     │ 8     │ 360.0    │ 175   │ 3.15     │ 3.44     │      ┆
│ 19 │ 14.3     │ 8     │ 360.0    │ 245   │ 3.21     │ 3.57     │      ┆
┢╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 12 more                                                               ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛

// read in a JSON File
// R: xdf2 <- jsonlite::fromJSON("mtcars.json")
var xdf2 = try! DataFrame(contentsOfJSONFile: URL(fileURLWithPath: "mtcars.json"))

// bind the rows together
// R: xdf <- bind_rows(xdf, xdf2)
xdf.append(xdf2)

// get the new summary
// R: summary(xdf)
print(xdf.summaryOfAllColumns().description(options: dOpts))

┏━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳╍╍╍╍╍╍┓
┃   ┃ count(mpg) ┃ uniqueCou… ┃ top(mpg) ┃ topFreque… ┃ count(cyl) ┃ 39   ┇
┃   ┃ <Int>      ┃ <Int>      ┃ <Double> ┃ <Int>      ┃ <Int>      ┃ more ┇
┡━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0 │ 64         │ 25         │ 21.4     │ 4          │ 64         │      ┆
└───┴────────────┴────────────┴──────────┴────────────┴────────────┴╌╌╌╌╌╌┘

// basic filtering
// R: filter(xdf, cyl == 6)
print( xdf.filter(on: "cyl", Int.self) { (val) in val == 6 } )

┏━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃    ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃    ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 11 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.62     │      ┆
│ 12 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.875    │      ┆
│ 13 │ 21.4     │ 6     │ 258.0    │ 110   │ 3.08     │ 3.215    │      ┆
│ 14 │ 18.1     │ 6     │ 225.0    │ 105   │ 2.76     │ 3.46     │      ┆
│ 15 │ 19.2     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 16 │ 17.8     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 17 │ 19.7     │ 6     │ 145.0    │ 175   │ 3.62     │ 2.77     │      ┆
│ 32 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.62     │      ┆
│ 33 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.875    │      ┆
│ 35 │ 21.4     │ 6     │ 258.0    │ 110   │ 3.08     │ 3.215    │      ┆
┢╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 4 more                                                                ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛

// group by a column
// R: group_by(xdf, cyl)
print(xdf.grouped(by: "cyl"))

4
┏━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃   ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃   ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0 │ 22.8     │ 4     │ 108.0    │ 93    │ 3.85     │ 2.32     │      ┆
│ 1 │ 24.4     │ 4     │ 146.7    │ 62    │ 3.69     │ 3.19     │      ┆
│ 2 │ 22.8     │ 4     │ 140.8    │ 95    │ 3.92     │ 3.15     │      ┆
│ 3 │ 32.4     │ 4     │ 78.7     │ 66    │ 4.08     │ 2.2      │      ┆
│ 4 │ 30.4     │ 4     │ 75.7     │ 52    │ 4.93     │ 1.615    │      ┆
│ 5 │ 33.9     │ 4     │ 71.1     │ 65    │ 4.22     │ 1.835    │      ┆
│ 6 │ 21.5     │ 4     │ 120.1    │ 97    │ 3.7      │ 2.465    │      ┆
│ 7 │ 27.3     │ 4     │ 79.0     │ 66    │ 4.08     │ 1.935    │      ┆
│ 8 │ 26.0     │ 4     │ 120.3    │ 91    │ 4.43     │ 2.14     │      ┆
│ 9 │ 30.4     │ 4     │ 95.1     │ 113   │ 3.77     │ 1.513    │      ┆
┢╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 12 more                                                              ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛

6
┏━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃    ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃    ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 11 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.62     │      ┆
│ 12 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.875    │      ┆
│ 13 │ 21.4     │ 6     │ 258.0    │ 110   │ 3.08     │ 3.215    │      ┆
│ 14 │ 18.1     │ 6     │ 225.0    │ 105   │ 2.76     │ 3.46     │      ┆
│ 15 │ 19.2     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 16 │ 17.8     │ 6     │ 167.6    │ 123   │ 3.92     │ 3.44     │      ┆
│ 17 │ 19.7     │ 6     │ 145.0    │ 175   │ 3.62     │ 2.77     │      ┆
│ 32 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.62     │      ┆
│ 33 │ 21.0     │ 6     │ 160.0    │ 110   │ 3.9      │ 2.875    │      ┆
│ 35 │ 21.4     │ 6     │ 258.0    │ 110   │ 3.08     │ 3.215    │      ┆
┢╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 4 more                                                                ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛

8
┏━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃    ┃ mpg      ┃ cyl   ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ 5    ┇
┃    ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 18 │ 18.7     │ 8     │ 360.0    │ 175   │ 3.15     │ 3.44     │      ┆
│ 19 │ 14.3     │ 8     │ 360.0    │ 245   │ 3.21     │ 3.57     │      ┆
│ 20 │ 16.4     │ 8     │ 275.8    │ 180   │ 3.07     │ 4.07     │      ┆
│ 21 │ 17.3     │ 8     │ 275.8    │ 180   │ 3.07     │ 3.73     │      ┆
│ 22 │ 15.2     │ 8     │ 275.8    │ 180   │ 3.07     │ 3.78     │      ┆
│ 23 │ 10.4     │ 8     │ 472.0    │ 205   │ 2.93     │ 5.25     │      ┆
│ 24 │ 10.4     │ 8     │ 460.0    │ 215   │ 3.0      │ 5.424    │      ┆
│ 25 │ 14.7     │ 8     │ 440.0    │ 230   │ 3.23     │ 5.345    │      ┆
│ 26 │ 15.5     │ 8     │ 318.0    │ 150   │ 2.76     │ 3.52     │      ┆
│ 27 │ 15.2     │ 8     │ 304.0    │ 150   │ 3.15     │ 3.435    │      ┆
┢╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 18 more                                                               ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛

// number of groups
// R: group_by(xdf, cyl) %>% group_keys() %>% nrow()
print(xdf.grouped(by: "cyl").count)

3

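// group, manipulate (in this case, filter), and re-combine
// R: group_by(xdf, cyl) %>% filter(mpg < 20) %>% ungroup()
// NB: `.base` hands back the slice's whole underlying data frame, which is why the
// output below is not actually filtered; DataFrame(slice) would keep just the matches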
print(
  xdf.grouped(by: "cyl").mapGroups { (val) in
    val.filter(on: "mpg", Double.self) { (val) in val! < 20 }.base
  }.ungrouped()
)

┏━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃   ┃ mpg      ┃ disp     ┃ hp    ┃ drat     ┃ wt       ┃ qsec     ┃ 5    ┇
┃   ┃ <Double> ┃ <Double> ┃ <Int> ┃ <Double> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0 │ 22.8     │ 108.0    │ 93    │ 3.85     │ 2.32     │ 18.61    │      ┆
│ 1 │ 24.4     │ 146.7    │ 62    │ 3.69     │ 3.19     │ 20.0     │      ┆
│ 2 │ 22.8     │ 140.8    │ 95    │ 3.92     │ 3.15     │ 22.9     │      ┆
│ 3 │ 32.4     │ 78.7     │ 66    │ 4.08     │ 2.2      │ 19.47    │      ┆
│ 4 │ 30.4     │ 75.7     │ 52    │ 4.93     │ 1.615    │ 18.52    │      ┆
│ 5 │ 33.9     │ 71.1     │ 65    │ 4.22     │ 1.835    │ 19.9     │      ┆
│ 6 │ 21.5     │ 120.1    │ 97    │ 3.7      │ 2.465    │ 20.01    │      ┆
│ 7 │ 27.3     │ 79.0     │ 66    │ 4.08     │ 1.935    │ 18.9     │      ┆
│ 8 │ 26.0     │ 120.3    │ 91    │ 4.43     │ 2.14     │ 16.7     │      ┆
│ 9 │ 30.4     │ 95.1     │ 113   │ 3.77     │ 1.513    │ 16.9     │      ┆
┢╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 182 more                                                                ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛
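The intent of the step above (split by `cyl`, keep rows with `mpg < 20` inside each group, then stitch the groups back together) can be sketched without a data frame library. This is plain Rust over a handful of `(cyl, mpg)` pairs copied from the tables earlier in the post, not TabularData or dplyr, and `group_filter_ungroup` is a made-up helper name.

```rust
use std::collections::BTreeMap;

/// Group (cyl, mpg) rows by cyl, keep rows with mpg below `limit`
/// within each group, then flatten back into one list ordered by cyl.
fn group_filter_ungroup(rows: &[(u8, f64)], limit: f64) -> Vec<(u8, f64)> {
    // "group_by": bucket the rows by their cyl value (BTreeMap keeps keys sorted)
    let mut groups: BTreeMap<u8, Vec<(u8, f64)>> = BTreeMap::new();
    for &row in rows {
        groups.entry(row.0).or_default().push(row);
    }

    // "filter" inside each group, then "ungroup" by concatenating the groups
    groups
        .into_values()
        .flat_map(|group| group.into_iter().filter(move |&(_, mpg)| mpg < limit))
        .collect()
}

fn main() {
    // a few (cyl, mpg) rows taken from the mtcars output shown earlier
    let rows: [(u8, f64); 6] = [(6, 21.0), (4, 22.8), (8, 18.7), (6, 18.1), (8, 14.3), (6, 19.2)];
    println!("{:?}", group_filter_ungroup(&rows, 20.0));
}
```

With these six rows, the lone 4-cylinder car drops out entirely and only the sub-20 mpg rows of the 6- and 8-cylinder groups survive.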

// look at one column
// R: xdf$cyl
print( xdf["cyl"] )

┏━━━━━━━┓
┃ cyl   ┃
┃ <Int> ┃
┡━━━━━━━┩
│ 4     │
│ 4     │
│ 4     │
│ 4     │
│ 4     │
│ 4     │
│ 4     │
│ 4     │
│ 4     │
│ 4     │
┢╍╍╍╍╍╍╍┪
┇ 54 m… ┇
┗╍╍╍╍╍╍╍┛

// combine two columns and look at it
// R: mutate(xdf, cyl_mpg = sprintf("%s:%s", cyl, mpg)) %>% select(-cyl, -mpg)
// R: unite(xdf, cyl_mpg, cyl, mpg, sep = ":") # alternate way
xdf.combineColumns("cyl", "mpg", into: "cyl_mpg") { (val1: Int?, val2: Double?) -> String in
  String(val1 ?? 0) + ":" + String(val2 ?? 0.0)
}

print(xdf["cyl_mpg"])

┏━━━━━━━━━━┓
┃ cyl_mpg  ┃
┃ <String> ┃
┡━━━━━━━━━━┩
│ 4:22.8   │
│ 4:24.4   │
│ 4:22.8   │
│ 4:32.4   │
│ 4:30.4   │
│ 4:33.9   │
│ 4:21.5   │
│ 4:27.3   │
│ 4:26.0   │
│ 4:30.4   │
┢╍╍╍╍╍╍╍╍╍╍┪
┇ 54 more  ┇
┗╍╍╍╍╍╍╍╍╍╍┛

// look at the colnames (^^ removes "cyl" and "mpg")
// R: colnames(xdf)
print(xdf.columns.map{ col in col.name })

["cyl_mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"]

// turn an Int into a Double
// R: xdf$hp <- as.double(xdf$hp) # or use dplyr::mutate()
xdf.transformColumn("hp") { (val1: Int?) -> Double? in
  Double(val1 ?? 0)
}

print(xdf["hp"])

┏━━━━━━━━━━┓
┃ hp       ┃
┃ <Double> ┃
┡━━━━━━━━━━┩
│ 93.0     │
│ 62.0     │
│ 95.0     │
│ 66.0     │
│ 52.0     │
│ 65.0     │
│ 97.0     │
│ 66.0     │
│ 91.0     │
│ 113.0    │
┢╍╍╍╍╍╍╍╍╍╍┪
┇ 54 more  ┇
┗╍╍╍╍╍╍╍╍╍╍┛

// look at the coltypes
// R: sapply(xdf, typeof)
print(xdf.columns.map{ col in col.wrappedElementType })

[Swift.String, Swift.Double, Swift.Double, Swift.Double, Swift.Double, Swift.Double, Swift.Int, Swift.Int, Swift.Int, Swift.Int]

// distinct horsepower
// R: distinct(xdf, hp)
print(xdf["hp"].distinct())

┏━━━━━━━━━━┓
┃ hp       ┃
┃ <Double> ┃
┡━━━━━━━━━━┩
│ 93.0     │
│ 62.0     │
│ 95.0     │
│ 66.0     │
│ 52.0     │
│ 65.0     │
│ 97.0     │
│ 91.0     │
│ 113.0    │
│ 109.0    │
┢╍╍╍╍╍╍╍╍╍╍┪
┇ 12 more  ┇
┗╍╍╍╍╍╍╍╍╍╍┛

// row slices
// R: xdf[11,] # Swift rows are 0-indexed, R rows are 1-indexed
print(xdf.rows[10])

┏━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃    ┃ cyl_mpg  ┃ disp     ┃ hp       ┃ drat     ┃ wt       ┃ qsec     ┃ 4    ┇
┃    ┃ <String> ┃ <Double> ┃ <Double> ┃ <Double> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 10 │ 4:21.4   │ 121.0    │ 109.0    │ 4.11     │ 2.78     │ 18.6     │      ┆
└────┴──────────┴──────────┴──────────┴──────────┴──────────┴──────────┴╌╌╌╌╌╌┘

// R: xdf[4:11,] # same 0- vs 1-based offset
print(xdf.rows[3...10])

Rows(base: 
┏━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳╍╍╍╍╍╍┓
┃   ┃ cyl_mpg  ┃ disp     ┃ hp       ┃ drat     ┃ wt       ┃ qsec     ┃ 4    ┇
┃   ┃ <String> ┃ <Double> ┃ <Double> ┃ <Double> ┃ <Double> ┃ <Double> ┃ more ┇
┡━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇╍╍╍╍╍╍┩
│ 0 │ 4:22.8   │ 108.0    │ 93.0     │ 3.85     │ 2.32     │ 18.61    │      ┆
│ 1 │ 4:24.4   │ 146.7    │ 62.0     │ 3.69     │ 3.19     │ 20.0     │      ┆
│ 2 │ 4:22.8   │ 140.8    │ 95.0     │ 3.92     │ 3.15     │ 22.9     │      ┆
│ 3 │ 4:32.4   │ 78.7     │ 66.0     │ 4.08     │ 2.2      │ 19.47    │      ┆
│ 4 │ 4:30.4   │ 75.7     │ 52.0     │ 4.93     │ 1.615    │ 18.52    │      ┆
│ 5 │ 4:33.9   │ 71.1     │ 65.0     │ 4.22     │ 1.835    │ 19.9     │      ┆
│ 6 │ 4:21.5   │ 120.1    │ 97.0     │ 3.7      │ 2.465    │ 20.01    │      ┆
│ 7 │ 4:27.3   │ 79.0     │ 66.0     │ 4.08     │ 1.935    │ 18.9     │      ┆
│ 8 │ 4:26.0   │ 120.3    │ 91.0     │ 4.43     │ 2.14     │ 16.7     │      ┆
│ 9 │ 4:30.4   │ 95.1     │ 113.0    │ 3.77     │ 1.513    │ 16.9     │      ┆
┢╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍╍╍╍╍┷╍╍╍╍╍╍┪
┇ 54 more                                                                    ┇
┗╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍┛
, subranges: _RangeSet(3..<11))

My {cdcfluview} package started tossing errors on CRAN just over a week ago when the CDC added an extra parameter to one of the hidden API endpoints that the package wraps. After a fairly hectic set of days since said NOTE came, I had time this morning to poke at a fix. There are a lot of tests, so after a successful debugging session I was awaiting CRAN checks on various remotes as well as README builds and figured I’d keep up some practice with another, nascent, package of mine, {swiftr}, which makes it dead simple to build R functions from Swift code, in similar fashion to what Rcpp::cppFunction() does for C/C++ code.

macOS comes with a full set of machine learning/AI libraries/frameworks that definitely have “batteries included” (i.e. you can almost just make one function call to get 90-95% of what you want without even training new models). One of these is text recognition in Apple’s computer vision framework, Vision. I thought it’d be a fun and quick “wait mode” distraction to wrap a VNRecognizeTextRequest in a function and use it from R.

To show how capable the default model is, I pulled a semi-complex random image from DDG’s image search:

[Image: yellow street signs against a clear blue sky, pointing in different directions; each sign carries a term such as unsure, muddled, or confused.]

Let’s build the function (you need to be on macOS for this; exposition inline):

library(swiftr) # github.com/hrbrmstr/swiftr

swift_function(
  code = '
import Foundation
import CoreImage
import Cocoa
import Vision

@_cdecl ("detect_text")
public func detect_text(path: SEXP) -> SEXP {

  // turn R string into Swift String so we can use it
  let fileName = String(cString: R_CHAR(STRING_ELT(path, 0)))

  var res: String = ""
  var out: SEXP = R_NilValue

  // get image into the right format
  if let ciImage = CIImage(contentsOf: URL(fileURLWithPath:fileName)) {

    let context = CIContext(options: nil)
    if let img = context.createCGImage(ciImage, from: ciImage.extent) {

      // set up computer vision request
      let requestHandler = VNImageRequestHandler(cgImage: img)

      // start recognition
      let request = VNRecognizeTextRequest()
      do {
        try requestHandler.perform([request])

        // if we have results
        if let observations = request.results as? [VNRecognizedTextObservation] {

          // paste them together
          let recognizedStrings = observations.compactMap { observation in
            observation.topCandidates(1).first?.string
          }
          res = recognizedStrings.joined(separator: "\\n")
        }
      } catch {
        debugPrint("\\(error)")
      }
    }
  }

  res.withCString { cstr in out = Rf_mkString(cstr) }

  return(out)
}
')

The detect_text() function is now available in R, so let’s see how it performs on that image of signs:

detect_text(path.expand("~/Data/signs.jpeg")) %>% 
  stringi::stri_split_lines() %>% 
  unlist()
##  [1] "BEWILDERED" "UNCLEAR"    "nAZEU"      "UNCERTAIN"  "VISA"       "INSURE"    
##  [7] "ATED"       "MUDDLED"    "LOsT"       "DISTRACTED" "PERPLEXED"  "CONFUSED"  
## [13] "PUZZLED" 

It works super-fast and gets far more correct than I would have expected.

Toy examples aside, it also works pretty well (as one would expect) on “real” text images, such as this example from the Tesseract test suite:

[Image: newspaper clipping from the Tesseract project’s example text images]

detect_text(path.expand("~/Data/tesseract/news.3B/0/8200_006.3B.tif")) %>% 
  stringi::stri_split_lines() %>% 
  unlist()
##  [1] "Tobacco chiefs still refuse to see the truth abou"                           
##  [2] "even of America's least conscionable"                                        
##  [3] "The tobacco industry would like to promote"                                  
##  [4] "men sat together in Washington last"                                         
##  [5] "under the conditions they are used.'"                                        
##  [6] "week to do what they do best: blow"                                          
##  [7] "the specter of prohibition."                                                 
##  [8] "panel\" of toxicologists as \"not hazardous"                                 
##  [9] "smoke at the truth about cigarettes."                                        
## [10] "'If cigarettes are too dangerous to be sold,"                                
## [11] "then ban them. Some smokers will obey the"                                   
## [12] "People not paid by the tobacco companies"                                    
## [13] "aren't so sure. The list includes several"                                   
## [14] "The CEOs of the nation's largest tobacco"                                    
## [15] "firms told congressional panel that nicotine"                                
## [16] "law, but many will not. People will be selling"                              
## [17] "iS not addictive, that they are unconvinced"                                 
## [18] "cigarettes out of the trunks of cars, cigarettes"                            
## [19] "substances the government does not allow in"                                 
## [20] "foods or classifies as potentially toxic. They"                              
## [21] "that smoking causes lung cancer or any other"                                
## [22] "made by who knows who, made of who knows include ammonia, a pesticide called"
## [23] "illness, and that smoking is no more harmful"                                
## [24] "what,\" said James Johnston of R.J. Reynolds."                               
## [25] "than drinking coffee or eating Twinkies."                                    
## [26] "It's a ruse. He knows cigarettes are not"                                    
## [27] "methoprene, and ethyl furoate, which has"                                    
## [28] "They said these things with straight taces."                                 
## [29] "going to be banned, at leasi not in his lifetime."                           
## [30] "caused liver damage in rats."                                                
## [31] "The list \"begs a number of important"                                       
## [32] "They said them in the face of massive"                                       
## [33] "STEVE WILSON"                                                                
## [34] "What he really fears are new taxes, stronger"                                
## [35] "questions about the safety of these additives,\""                            
## [36] "scientific evidence that smoking is responsible"                             
## [37] "anti-smoking campaigns, further smoking"                                     
## [38] "said a joint statement from the American"                                    
## [39] "for more than 400,000 deaths every year."                                    
## [40] "restrictions, limits on secondhand smoke and"                                
## [41] "Rep. Henry Waxman, D-Calif., put that"                                       
## [42] "Republic Columnist"                                                          
## [43] "Lung, Cancer and Heart associations. The"                                    
## [44] "limits on tar and nicotine."                                                 
## [45] "statement added that substances safe to eat"                                 
## [46] "frightful statistic another way:"                                            
## [47] "Collectively, these steps can accelerate the"                                
## [48] "\"Imagine our nation's outrage if two fully"                                 
## [49] "He and the others played dumb for the"                                       
## [50] "current 5 percent annual decline in cigarette"                               
## [51] "aren't necessarily safe to inhale."                                          
## [52] "The 50-page list can be obtained free by"                                    
## [53] "loaded jumbo jets crashed each day, killing all"                             
## [54] "entire six hours, but really didn't matter."                                 
## [55] "use and turn the tobacco business from highly"                               
## [56] "calling 1-800-852-8749."                                                     
## [57] "aboard. That's the same number of Americans"                                 
## [58] "The game i nearly over, and the tobacco"                                     
## [59] "profitable to depressed."                                                    
## [60] "Johnson's comment about cigarettes \"made"                                   
## [61] "Here are just the 44 ingredients that start"                                 
## [62] "that cigarettes kill every 24 hours.'"                                       
## [63] "executives know it."                                                         
## [64] "with the letter \"A\":"                                                      
## [65] "The CEOs were not impressed."                                                
## [66] "The hearing marked a turning point in the"                                   
## [67] "of who knows what\" was comical."                                            
## [68] "Acetanisole, acetic acid, acetoin,"                                          
## [69] "\"We have looked at the data."                                               
## [70] "It does"                                                                     
## [71] "nation's growing aversion to cigarettes. No"                                 
## [72] "The day before the hearing, the tobacco"                                     
## [73] "acetophenone,6-acetoxydihydrotheaspirane,"                                   
## [74] "not convince me that smoking causes death,\""                                
## [75] "2-acetyl-3-ethylpyrazine, 2-acetyl-5-"                                       
## [76] "said Andrew Tisch of the Lorillard Tobacco"                                  
## [77] "longer hamstrung by tobacco-state seniority"                                 
## [78] "companies released a long-secret list of 599"                                
## [79] "Co."                                                                         
## [80] "and the deep-pocketed tobacco lobby,"                                        
## [81] "methylfuran, acetylpyrazine, 2-acetylpyridine,"                              
## [82] "Congress is taking aim at cigarette makers."                                 
## [83] "additives used in cigarettes. The companies"                                 
## [84] "said all are certified by an \"independent"                                  
## [85] "3-acetylpyridine, 2-acetylthiazole, aconitic"   

(You can compare that on your own with the Tesseract results.)

FIN

{cdcfluview} checks are done, and the fixed functions are back on CRAN! Just in time to close out this post.

If you’re on macOS, definitely check out the various ML/AI frameworks Apple has to offer via Swift and have some fun playing with integrating them into R (or build some small, command line utilities if you want to keep Swift and R apart).

There’s a semi-infrequent-but-frequent-enough-to-be-annoying manual task at $DAYJOB that involves extracting a particular set of strings (identifiable by a fairly benign set of regular expressions) from various interactive text sources (so, not static documents or documents easily scrape-able).

Rather than hack something onto Sublime Text or VS Code I made a small macOS app in SwiftUI that does the extraction when something is pasted.

It occurred to me that this would work for indicators of compromise (IoCs) — because why not add one more to the 5 billion of them on GitHub — and I forked my app, removed all the $WORK bits and added in some code to do just this, and unimaginatively dubbed it extractor. Here’s the main view:

macOS GUI window showing the extractor main view

For now, extractor handles identifying & extracting CIDRs, IPv4s, URLs, hostnames, and email addresses (file issues if you really want hashes, CVE strings or other types) either from:

  • an input URL (it fetches the content and extracts IoCs from the rendered HTML, not the HTML source);
  • items pasted into the textbox (more on some SwiftUI 2 foibles regarding that in a bit); and
  • PDF, HTML, and text files (via Open / ⌘-o)
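As a rough illustration of the regex-driven extraction described above, here is a minimal Go sketch. It is not the app's Swift code: the patterns, the `patterns` map and the `extractIoCs` name are all made up for this example, and the expressions are deliberately loose (they do not validate octet ranges, handle defanged indicators, or cover URLs and hostnames).

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Hypothetical, deliberately simple patterns. Real IoC extraction needs
// stricter validation than this.
var patterns = map[string]*regexp.Regexp{
	"cidr":  regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}\b`),
	"ipv4":  regexp.MustCompile(`\b(?:\d{1,3}\.){3}\d{1,3}\b`),
	"email": regexp.MustCompile(`\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b`),
}

// extractIoCs returns the unique matches for each pattern, sorted.
func extractIoCs(text string) map[string][]string {
	out := make(map[string][]string)
	for kind, re := range patterns {
		seen := make(map[string]bool)
		for _, m := range re.FindAllString(text, -1) {
			if !seen[m] {
				seen[m] = true
				out[kind] = append(out[kind], m)
			}
		}
		sort.Strings(out[kind])
	}
	return out
}

func main() {
	sample := "Beacon to 203.0.113.7 inside 203.0.113.0/24; contact abuse@example.com"
	for kind, hits := range extractIoCs(sample) {
		fmt.Println(kind, hits)
	}
}
```

Note that the loose IPv4 pattern also picks up the network portion of a CIDR, so `203.0.113.0` shows up under both types in this sketch.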

Here it is extracting IoCs from one of FireEye’s “solarwinds”-related posts:

macOS GUI window showing extracted IoCs from a blog post

If you tick the “Monitor Pasteboard” toggle, the app will monitor all system-wide additions to the pasteboard, extract the IoCs from them, and put them in the textbox. (I think I really need to make this additive to the text in the textbox vs replacing what’s there.)

You can save the indicators out to a text file (via Save / ⌘-s) or just copy them from the text box (if you want ndjson or some threat indicator sharing format file an issue).

That SwiftUI 2 Thing I Mentioned

SwiftUI 2 makes app-creation very straightforward, but it still has many limitations. One of them is how windows/controls handle the “Paste” command. The glue code to make this app really work the way I’d like it to work is just annoying enough to keep it on a TODO vs an ISDONE list, and I’m hoping SwiftUI 3 comes out with WWDC 2021 (in a scant ~2 months) and provides a less hacky solution.

FIN

You can find the source and notarized binary releases of extractor on GitHub. File issues for questions, feature requests, or problems with the app/code.

Because I used SwiftUI 2, it is very likely possible to have this app work on iOS and iPadOS devices. I can’t see anyone using an iPad for DFIR work, but if you’d like a version of this for iOS/iPadOS, also drop an issue.

There are a plethora of amazingly useful Golang libraries, and it has been possible for quite some time to use Go libraries with Swift. The arrival of the new Apple Silicon/M1/arm64 architecture for macOS created the need for a new round of “fat”/“universal” binaries and libraries to bridge the gap between legacy Intel Macs and the new breed of Macs.

I didn’t see an “all-in-one-place” snippet for how to build cross-platform + fat/universal Golang static libraries and then use them in Swift to make a fat/universal macOS binary. It is likely I’m not the only one who wished this existed, so here’s a snippet that takes the HTML escaper static library example from Young Dynasty and shows how to prepare the static library for use in Swift, then how to build a Swift universal binary. This is a command line app, and we’re building everything without the Xcode editor to keep it concise and straightforward.

The rest of the explanatory text is in the comments in the code block.

# make a space to play in
mkdir universal-static-test
cd universal-static-test

# libhtmlescaper via: https://youngdynasty.net/posts/writing-mac-apps-in-go/
# make the small HTML escaper library Golang source
cat > main.go << EOF
package main

import (
  "C"
  "html"
)

//export escape_html
func escape_html(input *C.char) *C.char {
  s := html.EscapeString(C.GoString(input))
  return C.CString(s)
}

//export unescape_html
func unescape_html(input *C.char) *C.char {
  s := html.UnescapeString(C.GoString(input))
  return C.CString(s)
}

// We need an entry point; it's ok for this to be empty
func main() {}
EOF

# build the Go library for ARM
CGO_ENABLED=1 GOOS=darwin GOARCH=arm64 go build --buildmode=c-archive -o libhtmlescaper-arm64.a

# build the Go library for AMD64 (Intel)
CGO_ENABLED=1 GOOS=darwin GOARCH=amd64 go build --buildmode=c-archive -o libhtmlescaper-amd64.a

# Make a universal static archive
lipo -create libhtmlescaper-arm64.a libhtmlescaper-amd64.a -o libhtmlescaper.a

# we don't need this anymore
rm libhtmlescaper-amd64.h

# this is a better name
mv libhtmlescaper-arm64.h libhtmlescaper.h

# make the objective-c bridging header so we can use the library in Swift
cat > bridge.h <<EOF
#include "libhtmlescaper.h"
EOF

# create a lame/super basic test swift file that uses the Go library
cat > main.swift <<EOF
print(String(cString: escape_html(strdup("<b>bold</b>"))))
EOF

# make the swift executable for amd64
swiftc -target x86_64-apple-macos11.0 -import-objc-header bridge.h main.swift libhtmlescaper.a -o main-amd64

# make the swift executable for arm64
swiftc -target arm64-apple-macos11.0 -import-objc-header bridge.h main.swift libhtmlescaper.a -o main-arm64

# Make a universal binary
lipo -create main-amd64 main-arm64 -o main

# Make sure it's universal
file main
## main: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64]
## main (for architecture x86_64): Mach-O 64-bit executable x86_64
## main (for architecture arm64):  Mach-O 64-bit executable arm64

# try it out
./main
## &lt;b&gt;bold&lt;/b&gt;
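A quick way to see what the two exported functions wrap is to call Go’s `html` package directly, with no cgo or Swift involved. This is only a sanity-check sketch; `roundTrip` is a hypothetical helper name and not part of the library above.

```go
package main

import (
	"fmt"
	"html"
)

// roundTrip escapes s and then unescapes the result, mirroring what
// escape_html and unescape_html do minus the C string conversions.
func roundTrip(s string) (escaped, restored string) {
	escaped = html.EscapeString(s)
	restored = html.UnescapeString(escaped)
	return escaped, restored
}

func main() {
	escaped, restored := roundTrip("<b>bold</b>")
	fmt.Println(escaped)  // &lt;b&gt;bold&lt;/b&gt;
	fmt.Println(restored) // <b>bold</b>
}
```

One thing worth keeping in mind for anything longer-lived than a test binary: `C.CString` allocates with `malloc`, so the caller owns the returned pointer (as well as the `strdup`’d input) and should free both.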

The last post showed how to work with the macOS mdls command line XML output, but with {swiftr} we can avoid the command line round trip by bridging the low-level Spotlight API (which mdls uses) directly in R via Swift.

If you’ve already played with {swiftr} before but were somewhat annoyed at various boilerplate elements you’ve had to drag along with you every time you used swift_function() you’ll be pleased that I’ve added some SEXP conversion helpers to the {swiftr} package, so there’s less cruft when using swift_function().

Let’s add an R↔Swift bridge function to retrieve all available Spotlight attributes for a macOS file:

library(swiftr)

swift_function('

// Add an extension to URL which will retrieve the spotlight
// attributes as an array of Swift Strings
extension URL {

  var mdAttributes: [String]? {

    get {
      guard isFileURL else { return nil }
      let item = MDItemCreateWithURL(kCFAllocatorDefault, self as CFURL)
      let attrs = MDItemCopyAttributeNames(item)!
      return(attrs as? [String])
    }

  }

}

@_cdecl ("file_attrs")
public func file_attrs(path: SEXP) -> SEXP {

  // Grab the attributes
  let outAttr = URL(fileURLWithPath: String(path)!).mdAttributes!

  // send them to R
  return(outAttr.SEXP!)

}
')

And, then try it out:

fil <-  "/Applications/RStudio.app"

file_attrs(fil)
##  [1] "kMDItemContentTypeTree"                 "kMDItemContentType"                    
##  [3] "kMDItemPhysicalSize"                    "kMDItemCopyright"                      
##  [5] "kMDItemAppStoreCategory"                "kMDItemKind"                           
##  [7] "kMDItemDateAdded_Ranking"               "kMDItemDocumentIdentifier"             
##  [9] "kMDItemContentCreationDate"             "kMDItemAlternateNames"                 
## [11] "kMDItemContentModificationDate_Ranking" "kMDItemDateAdded"                      
## [13] "kMDItemContentCreationDate_Ranking"     "kMDItemContentModificationDate"        
## [15] "kMDItemExecutableArchitectures"         "kMDItemAppStoreCategoryType"           
## [17] "kMDItemVersion"                         "kMDItemCFBundleIdentifier"             
## [19] "kMDItemInterestingDate_Ranking"         "kMDItemDisplayName"                    
## [21] "_kMDItemDisplayNameWithExtensions"      "kMDItemLogicalSize"                    
## [23] "kMDItemUsedDates"                       "kMDItemLastUsedDate"                   
## [25] "kMDItemLastUsedDate_Ranking"            "kMDItemUseCount"                       
## [27] "kMDItemFSName"                          "kMDItemFSSize"                         
## [29] "kMDItemFSCreationDate"                  "kMDItemFSContentChangeDate"            
## [31] "kMDItemFSOwnerUserID"                   "kMDItemFSOwnerGroupID"                 
## [33] "kMDItemFSNodeCount"                     "kMDItemFSInvisible"                    
## [35] "kMDItemFSTypeCode"                      "kMDItemFSCreatorCode"                  
## [37] "kMDItemFSFinderFlags"                   "kMDItemFSHasCustomIcon"                
## [39] "kMDItemFSIsExtensionHidden"             "kMDItemFSIsStationery"                 
## [41] "kMDItemFSLabel"   

No system() (et al.) round trip!

Now, let’s make an R↔Swift bridge function to retrieve the value of an attribute.

Before we do that, let me be up-front that relying on debugDescription (which makes a string representation of a Swift object) is a terrible hack that I’m using just to make the example as short as possible. We should do far more error checking and then further check the type of the object coming from the Spotlight API call and return an R-compatible version of that type. This mdAttr() method will almost certainly break depending on the item being returned.

swift_function('
extension URL {

  // Add an extension to URL which will retrieve the spotlight 
  // attribute value as a String. This will almost certainly die 
  // under various value conditions.

  func mdAttr(_ attr: String) -> String? {
    guard isFileURL else { return nil }
    let item = MDItemCreateWithURL(kCFAllocatorDefault, self as CFURL)
    return(MDItemCopyAttribute(item, attr as CFString).debugDescription!)
  }

}

@_cdecl ("file_attr")
public func file_attr(path: SEXP, attr: SEXP) -> SEXP {

  // file path as Swift String
  let xPath = String(cString: R_CHAR(Rf_asChar(path)))

  // attribute we want as a Swift String
  let xAttr = String(cString: R_CHAR(Rf_asChar(attr)))

  // the Swift debug string value of the attribute
  let outAttr = URL(fileURLWithPath: xPath).mdAttr(xAttr)

  // returned as an R string
  return(Rf_mkString(outAttr))
}
')

And try this out on some carefully selected attributes:

file_attr(fil, "kMDItemDisplayName")
## [1] "RStudio.app"

file_attr(fil, "kMDItemAppStoreCategory")
## [1] "Developer Tools"

file_attr(fil, "kMDItemVersion")
## [1] "1.4.1651"

Note that if we try to get fancy and retrieve an attribute value that is something like an array of strings, it doesn’t work so well:

file_attr(fil, "kMDItemExecutableArchitectures")
## [1] "<__NSSingleObjectArrayI 0x7fe1f6d19bf0>(\nx86_64\n)\n"

Again, ideally, we’d make a small package wrapper vs use swift_function() for this in production, but I wanted to show how straightforward it can be to get access to some fun and potentially powerful features of macOS right in R with just a tiny bit of Swift glue code.

Also, I hadn’t tried {swiftr} on the M1 Mini before and it seems I need to poke a bit to see what needs doing to get it to work properly in the arm64 RStudio rsession.

UPDATE (2021-04-14 a bit later)

It dawned on me that a minor tweak to the Swift mdAttr() function would make the method more resilient (but still hacky):

  func mdAttr(_ attr: String) -> String {
    guard isFileURL else { return "" }
    let item = MDItemCreateWithURL(kCFAllocatorDefault, self as CFURL)
    let x = MDItemCopyAttribute(item, attr as CFString)
    if (x == nil) {
      return("")
    } else {
      return("\(x!)")
    }
  }

Now we can (more) safely do something like this:

str(as.list(sapply(
  file_attrs(fil),
  function(attr) {
    file_attr(fil, attr)
  }
)), 1)
## List of 41
##  $ kMDItemContentTypeTree                : chr "(\n    \"com.apple.application-bundle\",\n    \"com.apple.application\",\n    \"public.executable\",\n    \"com"| __truncated__
##  $ kMDItemContentType                    : chr "com.apple.application-bundle"
##  $ kMDItemPhysicalSize                   : chr "767619072"
##  $ kMDItemCopyright                      : chr "RStudio 1.4.1651, © 2009-2021 RStudio, PBC"
##  $ kMDItemAppStoreCategory               : chr "Developer Tools"
##  $ kMDItemKind                           : chr "Application"
##  $ kMDItemDateAdded_Ranking              : chr "2021-04-09 00:00:00 +0000"
##  $ kMDItemDocumentIdentifier             : chr "0"
##  $ kMDItemContentCreationDate            : chr "2021-03-25 23:08:34 +0000"
##  $ kMDItemAlternateNames                 : chr "(\n    \"RStudio.app\"\n)"
##  $ kMDItemContentModificationDate_Ranking: chr "2021-03-25 00:00:00 +0000"
##  $ kMDItemDateAdded                      : chr "2021-04-09 13:25:11 +0000"
##  $ kMDItemContentCreationDate_Ranking    : chr "2021-03-25 00:00:00 +0000"
##  $ kMDItemContentModificationDate        : chr "2021-03-25 23:08:34 +0000"
##  $ kMDItemExecutableArchitectures        : chr "(\n    \"x86_64\"\n)"
##  $ kMDItemAppStoreCategoryType           : chr "public.app-category.developer-tools"
##  $ kMDItemVersion                        : chr "1.4.1651"
##  $ kMDItemCFBundleIdentifier             : chr "org.rstudio.RStudio"
##  $ kMDItemInterestingDate_Ranking        : chr "2021-04-15 00:00:00 +0000"
##  $ kMDItemDisplayName                    : chr "RStudio.app"
##  $ _kMDItemDisplayNameWithExtensions     : chr "RStudio.app"
##  $ kMDItemLogicalSize                    : chr "763253198"
##  $ kMDItemUsedDates                      : chr "(\n    \"2021-03-26 04:00:00 +0000\",\n    \"2021-03-30 04:00:00 +0000\",\n    \"2021-04-02 04:00:00 +0000\",\n"| __truncated__
##  $ kMDItemLastUsedDate                   : chr "2021-04-15 00:21:45 +0000"
##  $ kMDItemLastUsedDate_Ranking           : chr "2021-04-15 00:00:00 +0000"
##  $ kMDItemUseCount                       : chr "12"
##  $ kMDItemFSName                         : chr "RStudio.app"
##  $ kMDItemFSSize                         : chr "763253198"
##  $ kMDItemFSCreationDate                 : chr "2021-03-25 23:08:34 +0000"
##  $ kMDItemFSContentChangeDate            : chr "2021-03-25 23:08:34 +0000"
##  $ kMDItemFSOwnerUserID                  : chr "501"
##  $ kMDItemFSOwnerGroupID                 : chr "80"
##  $ kMDItemFSNodeCount                    : chr "1"
##  $ kMDItemFSInvisible                    : chr "0"
##  $ kMDItemFSTypeCode                     : chr "0"
##  $ kMDItemFSCreatorCode                  : chr "0"
##  $ kMDItemFSFinderFlags                  : chr "0"
##  $ kMDItemFSHasCustomIcon                : chr ""
##  $ kMDItemFSIsExtensionHidden            : chr "1"
##  $ kMDItemFSIsStationery                 : chr ""
##  $ kMDItemFSLabel                        : chr "0"

We’re still better off (in the long run) checking for and using proper types.

FIN

I hope to be able to carve out some more time in the not-too-distant-future for both {swiftr} and the in-progress guide on using Swift and R, but hopefully this post [re-]piqued interest in this topic for some R and/or Swift users.

Greynoise helps security teams focus on potential threats by reducing the noise from logs, alerts, and SIEMs. They constantly watch for badly behaving internet hosts, keep track of the benign ones, and use this research to classify IP addresses. Teams can use these classifications to only focus on things that (potentially) matter.

They also have a generous (10K calls/day), free community API which does not require credentialed access and returns a subset of the information the full API provides. This is handy for folks who can’t afford the service or who only need to occasionally poke at IP addresses.

Andrew, GN’s CEO, tweeted out a super-hacky shell one-liner, the other day, that grabs the external IPs of all the ESTABLISHED IPv4 TCP connections and runs them through the community API via curl. Even though I made it a bit less-hacky:

sudo netstat -anp TCP \
  | rg ESTAB \
  | rg "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" -o \
  | rg -v "(^127\.)|(^10\.)|(^172\.1[6-9]\.)|(^172\.2[0-9]\.)|(^172\.3[0-1]\.)|(^192\.168\.)" \
  | rg -v "$(dig +short viz.greynoise.io @9.9.9.9 | rg '^\d' | tr '\n' '|' | sed -e 's/.$//g')" \
  | sort -u \
  | while read IP; do echo $(curl --silent https://api.greynoise.io/v3/community/$IP); done |
  Rscript -e 'tibble::as_tibble(jsonlite::stream_in(file("stdin"), verbose=FALSE))'

it’s still a “run-on-demand” process that you could put in a script and run via launchd, but then you’d still have to keep a terminal up or remember to watch some file. Plus, it relies on full executables.

I decided to make things a bit easier for folks on macOS Big Sur by cranking out a small SwiftUI app I’ve dubbed GreyWatch:

Each list entry shows an IP address your Mac previously connected to (since app launch) or currently has established TCP connections to. The three indicator dots show (in order) whether Greynoise has detected scanning behavior from the IP address within the last 30 days, whether it has a “Rule It OuT” (RIOT) classification, and what classification, if any, the IP address has. The app only shows an IP address once even if you continue to connect to it, and it puts new connections on top.
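The “show each address once, newest on top” behavior boils down to an ordered set. Here is a small Go sketch of that idea; the `connLog` type and its `add` method are hypothetical names for illustration and say nothing about how the SwiftUI app actually stores its list.

```go
package main

import "fmt"

// connLog remembers each IP once and keeps the newest entries first.
type connLog struct {
	order []string        // newest first
	seen  map[string]bool // membership check for de-duplication
}

func newConnLog() *connLog {
	return &connLog{seen: make(map[string]bool)}
}

// add records ip if it has not been seen before and reports whether
// the list changed.
func (l *connLog) add(ip string) bool {
	if l.seen[ip] {
		return false
	}
	l.seen[ip] = true
	l.order = append([]string{ip}, l.order...)
	return true
}

func main() {
	log := newConnLog()
	for _, ip := range []string{"17.57.144.10", "93.184.216.34", "17.57.144.10"} {
		log.add(ip)
	}
	fmt.Println(log.order) // [93.184.216.34 17.57.144.10]
}
```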

If an IP address has a classification, double-clicking it will open your default browser to the Greynoise visualizer, otherwise said double-click will take you to the IPInfo entry for the IP address.

Needless to say, if your Mac is talking to a host Greynoise has classified as horribad, your other 99 problems no longer take precedence. I’ll likely add a notification action if that condition occurs.

There’s an “Export…” item in the file menu that lets you save a copy of the current IP list (with metadata) to a newline-delimited JSON (ndjson) file.

The app does not shell out to dig or netstat and has a light memory and energy footprint.

There are pre-built, notarized binaries in the releases section, and I’ll gradually be adding features (submit yours via new issues!). You can also submit bug reports or other questions via GH issues.

Many thanks to Andrew and team for their generous free tier, which enables semi-useful community hacks like this one!

The past two posts have (lightly) introduced how to use compiled Swift code in R, but they’ve involved a bunch of “scary” command line machinations and incantations.

One feature of {Rcpp} I’ve always 💙 is the cppFunction() (“r-lib” zealots have a similar cpp11::cpp_function()) which lets one experiment with C[++] code in R with as little friction as possible. To make it easier to start experimenting with Swift, I’ve built an extremely fragile swift_function() in {swiftr} that intends to replicate this functionality. Explaining it will be easier with an example.

Reading Property Lists in R With Swift

macOS relies heavily on property lists for many, many things. These can be plain text (XML) or binary files and there are command-line tools and Python libraries (usable via {reticulate}) that can read them along with the good ol’ XML::readKeyValueDB(). We’re going to create a Swift function to read property lists and return JSON which we can use back in R via {jsonlite}.

This time around there’s no need to create extra files, just install {swiftr}, open your favorite R IDE, and enter the following (the exposition comes after the code):

library(swiftr)

swift_function(
  code = '

func ignored() {
  print("""
this will be ignored by swift_function() but you could use private
functions as helpers for the main public Swift function which will be 
made available to R.
""")
}  

@_cdecl ("read_plist")
public func read_plist(path: SEXP) -> SEXP {

  var out: SEXP = R_NilValue

  do {
    // read in the raw plist
    let plistRaw = try Data(contentsOf: URL(fileURLWithPath: String(cString: R_CHAR(STRING_ELT(path, 0)))))

    // convert it to a PropertyList  
    let plist = try PropertyListSerialization.propertyList(from: plistRaw, options: [], format: nil) as! [String:Any]

    // serialize it to JSON
    let jsonData = try JSONSerialization.data(withJSONObject: plist , options: .prettyPrinted)

    // setup the JSON string return
    String(data: jsonData, encoding: .utf8)?.withCString { 
      cstr in out = Rf_mkString(cstr) 
    }

  } catch {
    debugPrint("\\(error)")
  }

  return(out)

}
')

This new swift_function() function — for the moment (the API is absolutely going to change) — is defined as:

swift_function(
  code,
  env = globalenv(),
  imports = c("Foundation"),
  cache_dir = tempdir()
)

where:

  • code is a length 1 character vector of Swift code
  • env is the environment to expose the function in (defaults to the global environment)
  • imports is a character vector of any extra Swift frameworks that need to be imported
  • cache_dir is where all the temporary files will be created and the compiled dylib will be stored. It defaults to a temporary directory, so specify your own directory (that exists) if you want to keep the files around after you close the R session

Folks familiar with cppFunction() will notice some (on-purpose) similarities.

The function expects you to expose only one public Swift function which also (for the moment) needs to have the @_cdecl decorator before it. You can have as many other valid Swift helper functions as you like, but are restricted to one function that will be turned into an R function automagically.

In this example, swift_function() will see public func read_plist(path: SEXP) -> SEXP { and be able to identify

  • the function name (read_plist)
  • the number of parameters (they all need to be SEXP, for now)
  • the names of the parameters

A complete source file with all the imports will be created and a pre-built bridging header (which comes along for the ride with {swiftr}) will be included in the compilation step and a dylib will be built and loaded into the R session. Finally, an R function that wraps a .Call() will be created and will have the function name of the Swift function as well as all the parameter names (if any).

In the case of our example, above, the built R function is:

function(path) {
  .Call("read_plist", path)
}
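To make the signature-scraping step more concrete, here is a rough Go sketch that pulls the function name and parameter names out of a `public func … -> SEXP` line and emits the text of the corresponding R wrapper. It is not how {swiftr} is implemented (that lives in R); `sigRe`, `parseSignature` and `rWrapper` are hypothetical names, and the regular expression ignores Swift features such as external argument labels (`_ attr: SEXP`).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matches a single public Swift function that returns SEXP.
var sigRe = regexp.MustCompile(`public\s+func\s+(\w+)\s*\(([^)]*)\)\s*->\s*SEXP`)

// parseSignature extracts the function name and its parameter names.
func parseSignature(code string) (name string, params []string, ok bool) {
	m := sigRe.FindStringSubmatch(code)
	if m == nil {
		return "", nil, false
	}
	for _, p := range strings.Split(m[2], ",") {
		p = strings.TrimSpace(p)
		if p == "" {
			continue
		}
		label, _, _ := strings.Cut(p, ":")
		params = append(params, strings.TrimSpace(label))
	}
	return m[1], params, true
}

// rWrapper renders the R function that forwards its arguments to .Call().
func rWrapper(name string, params []string) string {
	args := strings.Join(params, ", ")
	call := fmt.Sprintf("%q", name)
	if len(params) > 0 {
		call += ", " + args
	}
	return fmt.Sprintf("function(%s) {\n  .Call(%s)\n}", args, call)
}

func main() {
	name, params, ok := parseSignature("public func read_plist(path: SEXP) -> SEXP {")
	if !ok {
		fmt.Println("no public SEXP function found")
		return
	}
	fmt.Println(rWrapper(name, params))
}
```

Run against the `read_plist` signature, this prints the same three-line wrapper shown above.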

There’s a good chance you’re using RStudio, so we can test this with its property list, or you can substitute any other application’s property list (or any .plist you have) to test this out:

library(magrittr) # provides %>%

read_plist("/Applications/RStudio.app/Contents/Info.plist") %>% 
  jsonlite::fromJSON() %>% 
  str(1)
## List of 32
##  $ NSPrincipalClass                     : chr "NSApplication"
##  $ NSCameraUsageDescription             : chr "R wants to access the camera."
##  $ CFBundleIdentifier                   : chr "org.rstudio.RStudio"
##  $ CFBundleShortVersionString           : chr "1.4.1093-1"
##  $ NSBluetoothPeripheralUsageDescription: chr "R wants to access bluetooth."
##  $ NSRemindersUsageDescription          : chr "R wants to access the reminders."
##  $ NSAppleEventsUsageDescription        : chr "R wants to run AppleScript."
##  $ NSHighResolutionCapable              : logi TRUE
##  $ LSRequiresCarbon                     : logi TRUE
##  $ NSPhotoLibraryUsageDescription       : chr "R wants to access the photo library."
##  $ CFBundleGetInfoString                : chr "RStudio 1.4.1093-1, © 2009-2020 RStudio, PBC"
##  $ NSLocationWhenInUseUsageDescription  : chr "R wants to access location information."
##  $ CFBundleInfoDictionaryVersion        : chr "6.0"
##  $ NSSupportsAutomaticGraphicsSwitching : logi TRUE
##  $ CSResourcesFileMapped                : logi TRUE
##  $ CFBundleVersion                      : chr "1.4.1093-1"
##  $ OSAScriptingDefinition               : chr "RStudio.sdef"
##  $ CFBundleLongVersionString            : chr "1.4.1093-1"
##  $ CFBundlePackageType                  : chr "APPL"
##  $ NSContactsUsageDescription           : chr "R wants to access contacts."
##  $ NSCalendarsUsageDescription          : chr "R wants to access calendars."
##  $ NSMicrophoneUsageDescription         : chr "R wants to access the microphone."
##  $ CFBundleDocumentTypes                :'data.frame':  16 obs. of  8 variables:
##  $ NSPhotoLibraryAddUsageDescription    : chr "R wants write access to the photo library."
##  $ NSAppleScriptEnabled                 : logi TRUE
##  $ CFBundleExecutable                   : chr "RStudio"
##  $ CFBundleSignature                    : chr "Rstd"
##  $ NSHumanReadableCopyright             : chr "RStudio 1.4.1093-1, © 2009-2020 RStudio, PBC"
##  $ CFBundleName                         : chr "RStudio"
##  $ LSApplicationCategoryType            : chr "public.app-category.developer-tools"
##  $ CFBundleIconFile                     : chr "RStudio.icns"
##  $ CFBundleDevelopmentRegion            : chr "English"

FIN

A source_swift() function is on the horizon as is adding a ton of checks/validations to swift_function(). I’ll likely be adding some of the SEXP and R Swift utility functions I’ve demonstrated in the [unfinished] book to make it fairly painless to interface Swift and R code in this new and forthcoming function.

As usual, kick the tyres, submit feature requests and bugs in any forum that’s comfortable, and stay strong, wear a 😷, and stay socially distanced when out and about.