Command Line Skimmer
The {skimr} package is great for getting a comprehensive overview of your data. If you are not familiar with it, check out this vignette before continuing.
Let’s build a skimr CLI tool that lets folks skim a CSV file.
Path Fulfilled
The source for our CLI tool is in the support/ch-10/webr-skim directory. It looks much like the boilerplate we’ve been using, which is why the next chapter will make it easier to start a WebR CLI project.
You will need to pkgtrap both skimr and readr, as they are dependencies of the tool (this step is included in the sub-project’s justfile).
We need to be able to tell readr::read_csv where to find the file the caller passes in. To do that, we’ll perform some directory/file operations in JavaScript before handing off the work to WebR/skimr:
import * as path from 'path'
// …
// Resolve the caller-supplied filename to an absolute path, then split it
const fullPathToFile = path.resolve(filename);
const fullDirToFile = path.dirname(fullPathToFile);
const justFilename = path.basename(fullPathToFile);
//…
// Mount the file's host directory at /input inside WebR
await makeAndMount(webR, fullDirToFile, '/input')
//…
// Expose the bare filename to R as the global variable input_file
await webR.objs.globalEnv.bind('input_file', justFilename)
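Splitting the path matters because, inside WebR, R works against its own virtual filesystem: the host directory is mounted at /input, and R rebuilds the path from that mount point plus the bare filename with file.path("/input", input_file). A minimal sketch of that mapping follows. toWebRPaths is a hypothetical helper, not part of the project, and it uses path.posix with an explicit working directory so the result is the same on any OS:

```javascript
import * as path from 'node:path';

// Hypothetical helper mirroring the resolve/dirname/basename steps above.
// path.posix plus an explicit cwd keeps the result identical on every OS.
function toWebRPaths(filename, cwd, mountPoint = '/input') {
  const fullPathToFile = path.posix.resolve(cwd, filename); // absolute host path
  const inputFile = path.posix.basename(fullPathToFile);   // value bound to `input_file` in R
  return {
    hostDir: path.posix.dirname(fullPathToFile),           // directory that gets mounted
    inputFile,
    rPath: path.posix.join(mountPoint, inputFile),         // what file.path("/input", input_file) gives R
  };
}

const paths = toWebRPaths('static/blueberries.csv', '/home/me/webr-skim');
console.log(paths.hostDir);   // /home/me/webr-skim/static
console.log(paths.inputFile); // blueberries.csv
console.log(paths.rPath);     // /input/blueberries.csv
```

Relative inputs such as ../data/x.csv work the same way: only the mounted directory changes, while R always looks under /input.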
Wide Load
The embedded R that powers our CLI knows not of the width of the terminal in a WebR context, so we’ll need to use some JavaScript to provide that information.
The terminal-size package provides a robust, cross-platform way to determine the width (and height) of a terminal. We’ll use it to populate the width option in R:
import terminalSize from 'terminal-size';
//…
// Ask the terminal for its dimensions; only the column count is used here
const { columns, rows } = terminalSize();
await webR.evalRVoid(`options(width=${columns})`)
Skimming Data
The core R bits that power this are small enough to use without sourcing an external, package-local R script:
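# Read the CSV from the /input mount, silencing readr's column-specification messages
input_csv <- suppressMessages(readr::read_csv(file.path("/input", input_file)))
# Label the summary with the filename rather than the variable name
skimr::skim(input_csv, .data_name = input_file) |>
  print()
I did an npm install -g . in the package directory, so let’s do a test run with a sample CSV file I included with the project: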
$ skimr static/blueberries.csv
── Data Summary ────────────────────────
                           Values
Name                       blueberries.csv
Number of rows             32
Number of columns          7
_______________________
Column type frequency:
  character                1
  numeric                  6
________________________
Group variables            None
── Variable type: character ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate min max empty n_unique whitespace
1 country               0             1   4  18     0       32          0
── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable            n_missing complete_rate      mean       sd      p0      p25      p50       p75      p100 hist
1 production_tons                  0             1 26050.    65253.    40      268.    1075     11060     308760    ▇▁▁▁▁
2 production_per_person_kg         0             1     0.527     1.13   0.002    0.024    0.106     0.455      4.73 ▇▁▁▁▁
3 acreage_hectare                  0             1  3816.    10026.    19       80      250      2055.     41560    ▇▁▁▁▁
4 yield_kg_hectare                 0             1  5171.     3652.   922.    2575     4613.     5985.     16752.   ▇▇▂▁▁
5 lat                              0             1    40.3      24.6  -41.3     41.1     47.2      52.4      60.2  ▁▁▁▁▇
6 lng                              0             1    13.3      53.2  -99.1      3.86    15.2      24.3     175.   ▂▃▇▁▁
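How wide R lets printed output like this get is governed by the width option we set in the Wide Load step. R only accepts values from 10 to 10000 for that option, and a detected column count can be missing (Node’s own process.stdout.columns, for example, is undefined when output is piped rather than sent to a TTY). A small guard before building the options() call is a cheap safety net. This is a sketch: rWidthOption and its 80-column fallback are hypothetical choices, not part of the project:

```javascript
// Hypothetical helper: turn a detected column count into an R options() call.
// R rejects `width` values outside 10..10000, so clamp into that range, and
// fall back to 80 columns when no usable count was detected.
function rWidthOption(columns, fallback = 80) {
  const detected = Number.isFinite(columns) && columns > 0 ? Math.floor(columns) : fallback;
  const width = Math.min(10000, Math.max(10, detected));
  return `options(width=${width})`;
}

console.log(rWidthOption(142));       // options(width=142)
console.log(rWidthOption(4));         // options(width=10)
console.log(rWidthOption(undefined)); // options(width=80)
```

Wired into the CLI, this would become await webR.evalRVoid(rWidthOption(columns)) in place of the inline template string.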
I’ve also published skimr on npm, so you can install and run this new command line utility as well:
$ npm install -g skimr
Things To Try
- Allow for skimming multiple files
- Add support for skimming JSON files
- Let callers use a --width parameter to specify the output width
- The {skimr} package has tons more functionality, and the skimr::skim function also takes some parameters. Consider adding support for that.
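For the first three ideas, one possible starting point is to parse the arguments up front and pick an R reader per file. This sketch uses Node’s built-in util.parseArgs; parseCli and the READERS table are hypothetical, and the JSON entry assumes jsonlite::fromJSON (which returns a data frame for an array of records) has been made available to WebR the same way skimr and readr are:

```javascript
import { parseArgs } from 'node:util';
import * as path from 'node:path';

// Hypothetical mapping from file extension to the R function used to read it.
// The JSON entry assumes jsonlite is available inside WebR.
const READERS = {
  '.csv': 'readr::read_csv',
  '.json': 'jsonlite::fromJSON',
};

// Parse `skimr [--width N] FILE...` into a plan the CLI could loop over.
function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: { width: { type: 'string' } },
    allowPositionals: true, // every non-option argument is a file to skim
  });

  if (positionals.length === 0) {
    throw new Error('usage: skimr [--width N] FILE...');
  }

  // Leave width undefined when not given, so the detected terminal width can be used
  let width;
  if (values.width !== undefined) {
    width = Number.parseInt(values.width, 10);
    if (!Number.isInteger(width) || width <= 0) {
      throw new Error(`--width must be a positive integer, got "${values.width}"`);
    }
  }

  // Choose a reader per file based on its (case-insensitive) extension
  const files = positionals.map((file) => {
    const reader = READERS[path.extname(file).toLowerCase()];
    if (reader === undefined) {
      throw new Error(`don't know how to read ${file}`);
    }
    return { file, reader };
  });

  return { width, files };
}

// In the CLI this would be parseCli(process.argv.slice(2))
const plan = parseCli(['--width', '100', 'static/blueberries.csv', 'data.json']);
console.log(plan.width, plan.files);
```

Each entry in plan.files would then go through the same mount, bind, and skim steps as the single-file version, with the chosen reader swapped in for readr::read_csv.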
More Information
The {summarytools} package also has some solid EDA tooling. Consider fusing it with the skimr Node package and adding more options.
Next Up
You’ve either been playing along with the source from this book or duplicating directories to bootstrap WebR CLI projects. Next, we’ll look at a new bootstrap utility that will help jumpstart WebR CLI projects.