Skip navigation

Category Archives: C++

After a Twitter convo about weather stations I picked up a WeatherFlow Tempest. Setup was quick, but the sensor package died within 24 hours. I was going to give up on it but I had written an R package (for the REST API & UDP broadcast interfaces) and C++ utility (for just the UDP broadcast interface), and the support staff were both friendly and competent and sent me a replacement super quick.

I’ve blathered about the R package already (on Twitter) so am not going to tag that here, but will link to a few repositories (in various languages) that receive the UDP broadcast messages and at least shove them to stdout.

The C++ one is mostly C but gets the job done (it just posts the messages to stdout). It should run everywhere but I only tested on macOS & Linux, because Windows is a terrible operating system nobody should use.

The Golang one has some structured types to consume about half of the JSON messages (I’ve only seen four in the broadcasts so far, and will add more as I see new ones). It’s only more verbose than the C++ one due to the various record type handling. This should run everywhere, though.

For kicks, I threw together a Swift one that is really just Swift-ified C and is a Frankenstein monster that likely shouldn’t be used. (I’ll be making a SwiftUI macOS/iOS/iPadOS app for the UDP broadcast messages, though, soon).

To round out my obsession I also made a Rust version which I’m just in 💙 with (not because of any skill of my own). It’s the smallest source file and is pretty elegant (100% due to Rust, and, again, not me).

All the code/projects are super small, but the Rust source is so tiny that it won’t be too intrusive to post here:

use std::net::UdpSocket;

fn main() -> std::io::Result<()> {

  let mut buf = [0; 1024]; // 1024 byte buffer is plenty
  let s = UdpSocket::bind("0.0.0.0:50222").expect(r#"{"message":"Could not bind to address/port."}"#);

  loop {

    let (n, _) = s.recv_from(&mut buf).expect(r#"{"message":"No broadcasts received."}"#);

    println!("{}", String::from_utf8(buf[..n].to_vec()).unwrap())

  }

}

FIN

If you’re interested in a low-cost weather station with great DIY programming support, I’d definitely (so far, at least) recommend the Tempest. We’ll see if it survives the forthcoming snowpocalypse.

These are the JSON messages it slings over UDP:

{"serial_number":"HB-00069665","type":"hub_status","firmware_revision":"177","uptime":728643,"rssi":-50,"timestamp":1643246011,"reset_flags":"BOR,PIN,POR","seq":72787,"radio_stats":[25,1,0,3,16637],"mqtt_stats":[10,108]}
{"serial_number":"ST-00055227","type":"rapid_wind","hub_sn":"HB-00069665","ob":[1643246013,0.00,0]}
{"serial_number":"ST-00055227","type":"rapid_wind","hub_sn":"HB-00069665","ob":[1643246015,0.00,0]}
{"serial_number":"ST-00055227","type":"device_status","hub_sn":"HB-00069665","timestamp":1643246016,"uptime":106625,"voltage":2.683,"firmware_revision":165,"rssi":-72,"hub_rssi":-66,"sensor_status":655364,"debug":0}
{"serial_number":"ST-00055227","type":"obs_st","hub_sn":"HB-00069665","obs":[[1643246016,0.00,0.00,0.00,0,3,1024.56,-12.82,47.84,0,0.00,0,0.000000,0,0,0,2.683,1]],"firmware_revision":165}

I caught this post on the The Surprising Number Of Programmers Who Can’t Program from the Hacker News RSS feed. Said post links to another, classic post on the same subject and you should read both before continuing.

Back? Great! Let’s dig in.

Why does hrbrmstr care about this?

Offspring #3 completed his Freshman year at UMaine Orono last year but wanted to stay academically active over the summer (he’s majoring in astrophysics and knows he’ll need some programming skills to excel in his field) and took an introductory C++ course from UMaine that was held virtually, with 1 lecture per week (14 weeks IIRC) and 1 assignment due per week with no other grading.

After seeing what passes for a standard (UMaine is not exactly on the top list of institutions to attend if one wants to be a computer scientist) intro C++ course, I’m not really surprised “Johnny can’t code”. Thirteen weeks in the the class finally started covering OO concepts, and the course is ending with a scant intro to polymorphism. Prior to this, most of the assignments were just variations on each other (read from stdin, loop with conditionals, print output) with no program going over 100 LoC (that includes comments and spacing). This wasn’t a “compsci for non-compsci majors” course, either. Anyone majoring in an area of study that requires programming could have taken this course to fulfill one of the requirements, and they’d be set on a path of forever using StackOverflow copypasta to try to get their future work done.

I’m fairly certain most of #3’s classmates could not program fizzbuzz without googling and even more certain most have no idea they weren’t really “coding in C++” most of the course.

If this is how most other middling colleges are teaching the basics of computer programming, it’s no wonder employers are having a difficult time finding qualified talent.

You have an “R” tag — actually, a few language tags — on this post, so where’s the code?

After the article triggered the lament in the previous section, a crazy, @coolbutuseless-esque thought came into my head: “I wonder how many different language FizzBuz solutions can be created from within R?”.

The criteria for that notion is/was that there needed to be some Rcpp::cppFunction(), reticulate::py_run_string(), V8 context eval()-type way to have the code in-R but then run through those far-super-to-any-other-language’s polyglot extensibility constructs.

Before getting lost in the weeds, there were some other thoughts on language inclusion:

  • Should Java be included? I :heart: {rJava}, but cat()-ing Java code out and running system() to compile it first seemed like cheating (even though that’s kinda just what cppFunction() does). Toss a note into a comment if you think a Java example should be added (or add said Java example in a comment or link to it in one!).
  • I think Julia should be in this example list but do not care enough about it to load {JuliaCall} and craft an example (again, link or post one if you can crank it out quickly).
  • I think Lua could be in this example given the existence of {luar}. If you agree, give it a go!
  • Go & Rust compiled code can also be called in R (thanks to Romain & Jeroen) once they’re turned into C-compatible libraries. Should this polyglot example show this as well?
  • What other languages am I missing?

The aforementioned “weeds”

One criteria for each language fizzbuzz example is that they need to be readable, not hacky-cool. That doesn’t mean the solutions still can’t be a bit creative. We’ll lightly go through each one I managed to code up. First we’ll need some helpers:

suppressPackageStartupMessages({
  library(purrr)
  library(dplyr)
  library(reticulate)
  library(V8)
  library(Rcpp)
})

The R, JavaScript, and Python implementations are all in the microbenchmark() call way down below. Up here are C and C++ versions. The C implementation is boring and straightforward, but we’re using Rprintf() so we can capture the output vs have any output buffering woes impact the timings.

cppFunction('
void cbuzz() {

  // super fast plain C

  for (unsigned int i=1; i<=100; i++) {
    if      (i % 15 == 0) Rprintf("FizzBuzz\\n");
    else if (i %  3 == 0) Rprintf("Fizz\\n");
    else if (i %  5 == 0) Rprintf("Buzz\\n");
    else Rprintf("%d\\n", i);
  }

}
')

The cbuzz() example is just fine even in C++ land, but we can take advantage of some C++11 vectorization features to stay formally in C++-land and play with some fun features like lambdas. This will be a bit slower than the C version plus consume more memory, but shows off some features some folks might not be familiar with:

cppFunction('
void cppbuzz() {

  std::vector<int> numbers(100); // will eventually be 1:100
  std::iota(numbers.begin(), numbers.end(), 1); // kinda sorta equiva of our R 1:100 but not exactly true

  std::vector<std::string> fb(100); // fizzbuzz strings holder

  // transform said 1..100 into fizbuzz strings
  std::transform(
    numbers.begin(), numbers.end(), 
    fb.begin(),
    [](int i) -> std::string { // lambda expression are cool like a fez
        if      (i % 15 == 0) return("FizzBuzz");
        else if (i %  3 == 0) return("Fizz");
        else if (i %  5 == 0) return("Buzz");
        else return(std::to_string(i));
    }
  );

  // round it out with use of for_each and another lambda
  // this turns out to be slightly faster than range-based for-loop
  // collection iteration syntax.
  std::for_each(
    fb.begin(), fb.end(), 
    [](std::string s) { Rcout << s << std::endl; }
  );

}
', 
plugins = c('cpp11'))

Both of those functions are now available to R.

Next, we need to prepare to run JavaScript and Python code, so we’ll initialize both of those environments:

ctx <- v8()

py_config() # not 100% necessary but I keep my needed {reticulate} options in env vars for reproducibility

Then, we tell R to capture all the output. Using sink() is a bit better than capture.output() in this use-case since to avoid nesting calls, and we need to handle Python stdout the same way py_capture_output() does to be fair in our measurements:

output_tools <- import("rpytools.output")
restore_stdout <- output_tools$start_stdout_capture()

cap <- rawConnection(raw(0), "r+")
sink(cap)

There are a few implementations below across the tidy and base R multiverse. Some use vectorization; some do not. This will let us compare overall “speed” of solution. If you have another suggestion for a readable solution in R, drop a note in the comments:

microbenchmark::microbenchmark(

  # tidy_vectors_case() is slowest but you get all sorts of type safety 
  # for free along with very readable idioms.

  tidy_vectors_case = map_chr(1:100, ~{ 
    case_when(
      (.x %% 15 == 0) ~ "FizzBuzz",
      (.x %%  3 == 0) ~ "Fizz",
      (.x %%  5 == 0) ~ "Buzz",
      TRUE ~ as.character(.x)
    )
  }) %>% 
    cat(sep="\n"),

  # tidy_vectors_if() has old-school if/else syntax but still
  # forces us to ensure type safety which is cool.

  tidy_vectors_if = map_chr(1:100, ~{ 
    if (.x %% 15 == 0) return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(as.character(.x))
  }) %>% 
    cat(sep="\n"),

  # walk() just replaces `for` but stays in vector-land which is cool

  tidy_walk = walk(1:100, ~{
    if (.x %% 15 == 0) cat("FizzBuzz\n")
    if (.x %%  3 == 0) cat("Fizz\n")
    if (.x %%  5 == 0) cat("Buzz\n")
    cat(.x, "\n", sep="")
  }),

  # vapply() gets us some similiar type assurance, albeit with arcane syntax

  base_proper = vapply(1:100, function(.x) {
    if (.x %% 15 == 0) return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(as.character(.x))
  }, character(1), USE.NAMES = FALSE) %>% 
    cat(sep="\n"),

  # sapply() is def lazy but this can outperform vapply() in some
  # circumstances (like this one) and is a bit less arcane.

  base_lazy = sapply(1:100, function(.x) {
    if (.x %% 15 == 0)  return("FizzBuzz")
    if (.x %%  3 == 0) return("Fizz")
    if (.x %%  5 == 0) return("Buzz")
    return(.x)
  }, USE.NAMES = FALSE) %>% 
    cat(sep="\n"),

  # for loops...ugh. might as well just use C

  base_for = for(.x in 1:100) {
    if      (.x %% 15 == 0) cat("FizzBuzz\n")
    else if (.x %%  3 == 0) cat("Fizz\n")
    else if (.x %%  5 == 0) cat("Buzz\n")
    else cat(.x, "\n", sep="")
  },

  # ok, we'll just use C!

  c_buzz = cbuzz(),

  # we can go back to vector-land in C++

  cpp_buzz = cppbuzz(),

  # some <3 for javascript

  js_readable = ctx$eval('
for (var i=1; i <101; i++){
  if      (i % 15 == 0) console.log("FizzBuzz")
  else if (i %  3 == 0) console.log("Fizz")
  else if (i %  5 == 0) console.log("Buzz")
  else console.log(i)
}
'),

  # icky readable, non-vectorized python

  python = reticulate::py_run_string('
for x in range(1, 101):
  if (x % 15 == 0):
    print("Fizz Buzz")
  elif (x % 5 == 0):
    print("Buzz")
  elif (x % 3 == 0):
    print("Fizz")
  else:
    print(x)
')

) -> res

Turn off output capturing:

sink()
if (!is.null(restore_stdout)) invisible(output_tools$end_stdout_capture(restore_stdout))

We used microbenchmark(), so here are the results:

res
## Unit: microseconds
##               expr       min         lq        mean     median         uq       max neval   cld
##  tidy_vectors_case 20290.749 21266.3680 22717.80292 22231.5960 23044.5690 33005.960   100     e
##    tidy_vectors_if   457.426   493.6270   540.68182   518.8785   577.1195   797.869   100  b   
##          tidy_walk   970.455  1026.2725  1150.77797  1065.4805  1109.9705  8392.916   100   c  
##        base_proper   357.385   375.3910   554.13973   406.8050   450.7490 13907.581   100  b   
##          base_lazy   365.553   395.5790   422.93719   418.1790   444.8225   587.718   100 ab   
##           base_for   521.674   545.9155   576.79214   559.0185   584.5250   968.814   100  b   
##             c_buzz    13.538    16.3335    18.18795    17.6010    19.4340    33.134   100 a    
##           cpp_buzz    39.405    45.1505    63.29352    49.1280    52.9605  1265.359   100 a    
##        js_readable   107.015   123.7015   162.32442   174.7860   187.1215   270.012   100 ab   
##             python  1581.661  1743.4490  2072.04777  1884.1585  1985.8100 12092.325   100    d 

Said results are 🤷🏻‍♀️ since this is a toy example, but I wanted to show that Jeroen’s {V8} can be super fast, especially when there’s no value marshaling to be done and that some things you may have thought should be faster, aren’t.

FIN

Definitely add links or code for changes or additions (especially the aforementioned other languages). Hopefully my lament about the computer science program at UMaine is not universally true for all the programming courses there.

Over at $DAYJOB’s blog I’ve queued up a post that shows how to use our new ropendata? package to work with our Open Data portal’s API. I’m not super-sure when it’s going to be posted so keep an RSS reader fixed on https://blog.rapid7.com/ if you’re interested in seeing it (I may make a small note of it here so it can wind its way into R Weekly & R-bloggers).

The example data used in the post is the public version of what I talked about in a recent post here, namely the devices discovered exposing the Ubiquity Discovery Protocol.

I’m quite blessed at work since we have virtually all of our icky payload data pre-processed and in parquet map columns in Athena so I don’t really have to do much data wrangling once we’ve fully baked a new study.

The format of the public data for the Ubiquiti discovery protocol scan results is a bit different than the base64 encoded data in the previous post in that the payload response is a hex-encoded character string; e.g.

0100009302000a002722bccf9db126fa9a02000a002722bdcf9dc0a80101010006002722bccf9d0a000400006ae40b000c626a732e6572656e696c646f0c00064147352d48500d00104d6f72726f5f446f757261646f5f30330e000102030022584d2e6172373234302e76352e362e332e32383539312e3135313133302e31373439100002e24514000d41697247726964204d35204850

So, every two characters is a byte (e.g. "01" is 0x01).

R has a nice strtoi() function for converting a hex-encoded byte into a raw value but it only works for one byte. We can split a string (like the one above) into a character vector of length 2 hex strings in many ways, one of which is using helper functions from the stringi package:

library(stringi)
library(magrittr) # for %>%

x <- "0100009302000a002722bccf9db126fa9a02000a002722bdcf9dc0a80101010006002722bccf9d0a000400006ae40b000c626a732e6572656e696c646f0c00064147352d48500d00104d6f72726f5f446f757261646f5f30330e000102030022584d2e6172373234302e76352e362e332e32383539312e3135313133302e31373439100002e24514000d41697247726964204d35204850"

stri_sub(x, seq(1, stri_length(x), by = 2), length = 2)
##   [1] "01" "00" "00" "93" "02" "00" "0a" "00" "27" "22" "bc" "cf" "9d" "b1" "26" "fa" "9a"
##  [18] "02" "00" "0a" "00" "27" "22" "bd" "cf" "9d" "c0" "a8" "01" "01" "01" "00" "06" "00"
##  [35] "27" "22" "bc" "cf" "9d" "0a" "00" "04" "00" "00" "6a" "e4" "0b" "00" "0c" "62" "6a"
##  [52] "73" "2e" "65" "72" "65" "6e" "69" "6c" "64" "6f" "0c" "00" "06" "41" "47" "35" "2d"
##  [69] "48" "50" "0d" "00" "10" "4d" "6f" "72" "72" "6f" "5f" "44" "6f" "75" "72" "61" "64"
##  [86] "6f" "5f" "30" "33" "0e" "00" "01" "02" "03" "00" "22" "58" "4d" "2e" "61" "72" "37"
## [103] "32" "34" "30" "2e" "76" "35" "2e" "36" "2e" "33" "2e" "32" "38" "35" "39" "31" "2e"
## [120] "31" "35" "31" "31" "33" "30" "2e" "31" "37" "34" "39" "10" "00" "02" "e2" "45" "14"
## [137] "00" "0d" "41" "69" "72" "47" "72" "69" "64" "20" "4d" "35" "20" "48" "50"

We still need to run that through strtoi() and turn it into a raw vector (at least for this use-case):

stri_sub(x, seq(1, stri_length(x), by = 2), length = 2) %>%
  strtoi(base = 16) %>%
  as.raw()
##   [1] 01 00 00 93 02 00 0a 00 27 22 bc cf 9d b1 26 fa 9a 02 00 0a 00 27 22 bd cf 9d c0 a8 01
##  [30] 01 01 00 06 00 27 22 bc cf 9d 0a 00 04 00 00 6a e4 0b 00 0c 62 6a 73 2e 65 72 65 6e 69
##  [59] 6c 64 6f 0c 00 06 41 47 35 2d 48 50 0d 00 10 4d 6f 72 72 6f 5f 44 6f 75 72 61 64 6f 5f
##  [88] 30 33 0e 00 01 02 03 00 22 58 4d 2e 61 72 37 32 34 30 2e 76 35 2e 36 2e 33 2e 32 38 35
## [117] 39 31 2e 31 35 31 31 33 30 2e 31 37 34 39 10 00 02 e2 45 14 00 0d 41 69 72 47 72 69 64
## [146] 20 4d 35 20 48 50

On one of my systems, an individual use of that full processing pipeline with the sample string takes about 170μs which is not bad. But, what if we have half a million of them (as was the case with the blog post for work)? I mean, sure, it’s only about a minute and a half of processing time (with some variance as each bit of input will be of different lengths), but that’s a painful interactive 1.5 minutes and we still need to wrap that bit of code in a function with some vectorization so it can be used easily.

This is a good example of where the complexity introduced by using a little C++/Rcpp may be warranted, especially since the BH package—which brings us a ton of capabilities from the Boost C++ library—has some handy string utilities, including an boost::algorithm::unhex() function.

Here’s one way to attack the problem in C++/Rcpp within a plain ol’ R session:

library(Rcpp)

cppFunction(depends = "BH", '
  List dehexify_cpp(StringVector input) {

    List out(input.size()); // make room for our return value

    for (unsigned int i=0; i<input.size(); i++) { // iterate over the input 

      if (StringVector::is_na(input[i]) || (input[i].size() == 0)) {
        out[i] = StringVector::create(NA_STRING); // bad input
      } else if (input[i].size() % 2 == 0) { // likey to be ok input

        RawVector tmp(input[i].size() / 2); // only need half the space
        std::string h = boost::algorithm::unhex(Rcpp::as<std::string>(input[i])); // do the work
        std::copy(h.begin(), h.end(), tmp.begin()); // copy it to our raw vector

        out[i] = tmp; // save it to the List

      } else {
        out[i] =  StringVector::create(NA_STRING); // bad input
      }

    }

    return(out);

  }
', includes = c('#include <boost/algorithm/hex.hpp>')
)

Now, we have a dehexify_cpp() function in our environment, so we can use it on any valid R data. Let’s see if we get the same results as the stringi R version:

dehexify_cpp(x)
## [[1]]
##   [1] 01 00 00 93 02 00 0a 00 27 22 bc cf 9d b1 26 fa 9a 02 00 0a 00 27 22 bd cf 9d c0 a8 01
##  [30] 01 01 00 06 00 27 22 bc cf 9d 0a 00 04 00 00 6a e4 0b 00 0c 62 6a 73 2e 65 72 65 6e 69
##  [59] 6c 64 6f 0c 00 06 41 47 35 2d 48 50 0d 00 10 4d 6f 72 72 6f 5f 44 6f 75 72 61 64 6f 5f
##  [88] 30 33 0e 00 01 02 03 00 22 58 4d 2e 61 72 37 32 34 30 2e 76 35 2e 36 2e 33 2e 32 38 35
## [117] 39 31 2e 31 35 31 31 33 30 2e 31 37 34 39 10 00 02 e2 45 14 00 0d 41 69 72 47 72 69 64
## [146] 20 4d 35 20 48 50

Apart from it being a list (since we took care of vectorization at the same time) it is, indeed, the same data.

With that tiny bit of fairly straightforward Rcpp/C++ code we get a substantially faster execution time of around 4μs. Yep, that’s not a typo: four microseconds.

We’ll give it a real world test with the payload data from work:

# This assumes you have a "~/Data" directory. Put it somewhere
# else if you don't have a "~/Data" directory.

if (!file.exists("~/Data/dehexify-sample-data.txt.gz")) {
  download.file(
    url = "https://rud.is/dl/dehexify-sample-data.txt.gz", 
    destfile = "~/Data/dehexify-sample-data.txt.gz"
  )
}

char_hex_lines <- readr::read_lines("~/Data/dehexify-sample-data.txt.gz")

length(char_hex_lines)
## [1] 501926

res <- dehexify_cpp(char_hex_lines)

That took just over a second to run on my main development system. But, did it really work? I chose index 998 at random so let’s poke at it with the tool from the other blog post:

udpprobe::parse_ubnt_discovery_response(res[[998]])
## [Model: N5N; Firmware: XW.ar934x.v5.5.9.21734.140403.1801; Uptime: 13.1 (hrs)

Aye, it did, indeed, work.

FIN

It’s still early in 2019 and if you haven’t settled on any resolutions yet or want to substitute out one that isn’t working so well (who wants to drive to the gym anyway?) with another, perhaps add “experiment with Rcpp” to the list since a tiny dose of it can go a very long way into speeding up some tasks.