Command Line Skimmer

The {skimr} package is great for getting a comprehensive overview of your data. If you are not familiar with it, check out this vignette before continuing.

Let’s build a skimr CLI tool that will let folks skim a CSV file.

Path Fulfilled

The source for our CLI tool is in the support/ch-10/webr-skim directory. It looks much like the boilerplate we’ve been using, which is why the next chapter will make it easier to start a new WebR CLI project.

You will need to run pkgtrap skimr readr, since both packages are dependencies (this step is already in the sub-project’s justfile).

We need to be able to tell readr::read_csv where to find the file the caller names on the command line. To do that, we’ll perform some directory and file operations in JavaScript before handing the work off to WebR/skimr:

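import * as path from 'path';
// …
// Resolve the caller's (possibly relative) filename to an absolute host path,
// then split it into its directory and bare filename.
const fullPathToFile = path.resolve(filename);
const fullDirToFile = path.dirname(fullPathToFile);
const justFilename = path.basename(fullPathToFile);
//…
// Mount the file's host directory at /input inside WebR (project helper).
await makeAndMount(webR, fullDirToFile, '/input');
//…
// Expose the bare filename to R as the global variable input_file.
await webR.objs.globalEnv.bind('input_file', justFilename);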
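To make the mapping concrete, here is a minimal, self-contained sketch of the same path arithmetic. The function name locateInput and its return shape are hypothetical (they are not from the project source), and it uses require instead of the chapter’s import syntax only so it runs as a standalone Node script. It computes the host directory to mount, the bare filename to bind into R, and the path R will see under the /input mount point.

```javascript
const path = require('node:path');

// Hypothetical helper: split a caller-supplied filename into the pieces the
// CLI needs. `cwd` defaults to the process's working directory, which is
// what path.resolve() uses when given a single relative path.
function locateInput(filename, cwd = process.cwd()) {
  const fullPathToFile = path.resolve(cwd, filename); // absolute host path
  const justFilename = path.basename(fullPathToFile);
  return {
    fullDirToFile: path.dirname(fullPathToFile), // host dir to mount at /input
    justFilename,                                // value bound to input_file in R
    // R runs inside WebR's virtual filesystem, which uses POSIX-style paths,
    // so join with path.posix regardless of the host OS.
    rPath: path.posix.join('/input', justFilename),
  };
}

console.log(locateInput('static/blueberries.csv', '/home/me/webr-skim'));
```

On Linux or macOS, the example logs fullDirToFile as /home/me/webr-skim/static, justFilename as blueberries.csv, and rPath as /input/blueberries.csv, which is the same location the R code reaches with file.path("/input", input_file).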

Wide Load

In a WebR context, the embedded R that powers our CLI has no way of knowing the terminal’s width, so we’ll use some JavaScript to supply that information.

The terminal-size package provides a robust, cross-platform way to determine the width (and height) of a terminal. We’ll use it to set R’s width option:

import terminalSize from 'terminal-size';
//…
// Ask the host terminal for its dimensions.
const { columns, rows } = terminalSize();
// Tell R to wrap printed output at the terminal's width.
await webR.evalRVoid(`options(width=${columns})`);
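R’s options(width = ...) only accepts values from 10 to 10000, so a defensive variant might clamp whatever column count comes back before handing it to R, and fall back to a default if the value is not a number. widthOption and the 80-column fallback are assumptions for illustration, not part of the project; the helper only builds the R code string.

```javascript
// R's options(width = ...) rejects values outside 10..10000.
const R_MIN_WIDTH = 10;
const R_MAX_WIDTH = 10000;
const FALLBACK_WIDTH = 80; // assumption: a conventional default terminal width

// Hypothetical helper: turn a reported column count into R code that sets
// the print width, clamped to the range R accepts.
function widthOption(columns) {
  const n = Number.isFinite(columns) ? Math.floor(columns) : FALLBACK_WIDTH;
  const width = Math.min(R_MAX_WIDTH, Math.max(R_MIN_WIDTH, n));
  return `options(width=${width})`;
}

console.log(widthOption(143));       // options(width=143)
console.log(widthOption(4));         // options(width=10)
console.log(widthOption(undefined)); // options(width=80)
```

With a helper like this, the existing call would become await webR.evalRVoid(widthOption(columns)).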

Skimming Data

The core R bits that power this are small enough to run without sourcing an external, package-local R script:

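# Read the mounted CSV; suppressMessages() hides readr's column-type report.
input_csv <- suppressMessages(readr::read_csv(file.path("/input", input_file)))

# Skim the data, labeling the summary with the original filename.
skimr::skim(input_csv, .data_name = input_file) |>
  print()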

I did an npm install -g . in the package directory, so let’s do a test run with a sample CSV file I included with the project:

$ skimr static/blueberries.csv
── Data Summary ────────────────────────
                           Values         
Name                       blueberries.csv
Number of rows             32             
Number of columns          7              
_______________________                   
Column type frequency:                    
  character                1              
  numeric                  6              
________________________                  
Group variables            None           

── Variable type: character ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate min max empty n_unique whitespace
1 country               0             1   4  18     0       32          0

── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable            n_missing complete_rate      mean       sd      p0      p25      p50       p75      p100 hist 
1 production_tons                  0             1 26050.    65253.    40      268.    1075     11060     308760    ▇▁▁▁▁
2 production_per_person_kg         0             1     0.527     1.13   0.002    0.024    0.106     0.455      4.73 ▇▁▁▁▁
3 acreage_hectare                  0             1  3816.    10026.    19       80      250      2055.     41560    ▇▁▁▁▁
4 yield_kg_hectare                 0             1  5171.     3652.   922.    2575     4613.     5985.     16752.   ▇▇▂▁▁
5 lat                              0             1    40.3      24.6  -41.3     41.1     47.2      52.4       60.2  ▁▁▁▁▇
6 lng                              0             1    13.3      53.2  -99.1      3.86    15.2      24.3      175.   ▂▃▇▁▁

I’ve also published skimr on npm, so you can install this new command line utility with:

$ npm install -g skimr

Things To Try

  • Allow for skimming multiple files
  • Add support for skimming JSON files
  • Let callers use a --width parameter to specify the output width
  • The {skimr} package has much more functionality, and skimr::skim() itself accepts additional parameters. Consider exposing some of them as CLI options.
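Two of the items above, skimming multiple files and a --width flag, mostly come down to argument parsing. Here is a minimal sketch using Node’s built-in util.parseArgs; parseCli, its usage message, and its return shape are hypothetical, and it uses require only so it runs as a standalone script.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical argument handling for a future version of the CLI:
// any number of positional file paths plus an optional --width override.
function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      width: { type: 'string' }, // parsed as a string; converted below
    },
    allowPositionals: true,
  });

  if (positionals.length === 0) {
    throw new Error('usage: skimr [--width N] <file> [<file> ...]');
  }

  let width; // undefined means "use the detected terminal width"
  if (values.width !== undefined) {
    width = Number.parseInt(values.width, 10);
    if (!Number.isInteger(width)) {
      throw new Error(`--width expects an integer, got "${values.width}"`);
    }
  }

  return { files: positionals, width };
}

console.log(parseCli(['--width', '100', 'a.csv', 'b.csv']));
```

Each entry in files could then be located, mounted, and skimmed in turn, and an explicit width would take precedence over the value from terminal-size.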

More Information

The {summarytools} package also has solid exploratory data analysis (EDA) tooling. Consider combining it with the skimr Node package to offer more options.

Next Up

So far you’ve either been following along with the source from this book or duplicating directories to bootstrap WebR CLI projects. Next, we’ll look at a new bootstrap utility that helps jumpstart new WebR CLI projects.