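import * as path from 'path';
// …
// Resolve the caller-supplied filename to an absolute path, then split it
// into its directory and its bare filename.
const fullPathToFile = path.resolve(filename);
const fullDirToFile = path.dirname(fullPathToFile);
const justFilename = path.basename(fullPathToFile);
//…
// Mount the file's directory at /input inside WebR.
await makeAndMount(webR, fullDirToFile, '/input');
//…
// Expose the bare filename to R as `input_file`.
await webR.objs.globalEnv.bind('input_file', justFilename);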
Command Line Skimmer
The {skimr} package is great for getting a comprehensive overview of your data. If you are not familiar with it, check out this vignette before continuing.
Let’s build a skimr CLI tool that will let folks skim a CSV file.
Path Fulfilled
The source for our CLI tool is in the support/ch-10/webr-skim directory. It looks much like the boilerplate we’ve been using, which is why the next chapter makes it easier to start a WebR CLI project.
You will need to pkgtrap skimr readr, as both packages are dependencies (this step is included in the sub-project’s justfile).
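We need to be able to tell readr::read_csv where to find the file the caller passes in. To do that, we’ll perform some directory/file operations in JavaScript before handing off the work to WebR/skimr: resolve the full path, split it into a directory and a filename, mount the directory at /input, and bind the filename to input_file in R’s global environment.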
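As a rough illustration of what those path operations produce, here is a minimal, self-contained sketch. The helper name splitInputPath is hypothetical (it is not part of the project), and the sample path is simply the CSV used later in this chapter.

```javascript
import * as path from 'path';

// Hypothetical helper (not in the project): split a caller-supplied filename
// into the directory to mount and the bare filename to hand to R.
function splitInputPath(filename) {
  const full = path.resolve(filename); // absolute, relative to the current directory
  return { dir: path.dirname(full), base: path.basename(full) };
}

const { dir, base } = splitInputPath('static/blueberries.csv');
console.log(dir);  // absolute path of the directory that would be mounted at /input
console.log(base); // blueberries.csv
```

With the directory mounted at /input, R can reach the file via file.path("/input", input_file) no matter where the caller ran the command from.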
Wide Load
The embedded R that powers our CLI knows not of the width of the terminal in a WebR context, so we’ll need to use some JavaScript to provide that information.
The terminal-size package provides a robust, cross-platform way to determine the width (and height) of a terminal. We’ll use it to populate the width option in R:
import { default as terminalSize } from 'terminal-size';
//…
const { columns, rows } = terminalSize();
await webR.evalRVoid(`options(width=${columns})`)
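One thing to watch is that the value interpolated into options(width=...) should be an integer R will accept; R documents valid widths as 10 through 10000. The sketch below is one way to guard that, not the project's code: widthOptionCode and the 80-column fallback are assumptions made for illustration, and it reads process.stdout.columns rather than terminal-size so that it runs without any dependencies.

```javascript
// R's documented valid range for options(width=).
const R_MIN_WIDTH = 10;
const R_MAX_WIDTH = 10000;
// Assumed fallback for when no usable width is available (e.g. piped output).
const FALLBACK_WIDTH = 80;

// Hypothetical helper: build the R statement that sets the output width,
// clamping the requested value into the range R accepts.
function widthOptionCode(columns) {
  const requested = Number.isInteger(columns) ? columns : FALLBACK_WIDTH;
  const clamped = Math.min(R_MAX_WIDTH, Math.max(R_MIN_WIDTH, requested));
  return `options(width=${clamped})`;
}

// process.stdout.columns is undefined when stdout is not a TTY.
console.log(widthOptionCode(process.stdout.columns));
```

The returned string could then be passed to webR.evalRVoid() in place of the direct interpolation above.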
Skimming Data
The core R bits that power this are small enough to use without sourcing an external, package-local R script:
input_csv <- suppressMessages(readr::read_csv(file.path("/input", input_file)))

skimr::skim(input_csv, .data_name = input_file) |>
  print()
I did an npm install -g . in the package directory, so let’s do a test run with a sample CSV file I included with the project:
skimr static/blueberries.csv
── Data Summary ────────────────────────
Values
Name blueberries.csv
Number of rows 32
Number of columns 7
_______________________
Column type frequency:
character 1
numeric 6
________________________
Group variables None
── Variable type: character ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate min max empty n_unique whitespace
1 country 0 1 4 18 0 32 0
── Variable type: numeric ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
1 production_tons 0 1 26050. 65253. 40 268. 1075 11060 308760 ▇▁▁▁▁
2 production_per_person_kg 0 1 0.527 1.13 0.002 0.024 0.106 0.455 4.73 ▇▁▁▁▁
3 acreage_hectare 0 1 3816. 10026. 19 80 250 2055. 41560 ▇▁▁▁▁
4 yield_kg_hectare 0 1 5171. 3652. 922. 2575 4613. 5985. 16752. ▇▇▂▁▁
5 lat 0 1 40.3 24.6 -41.3 41.1 47.2 52.4 60.2 ▁▁▁▁▇
6 lng 0 1 13.3 53.2 -99.1 3.86 15.2 24.3 175. ▂▃▇▁▁
I’ve also published skimr on NPM, so you can install it globally and run this new command line utility that way as well:
$ npm install -g skimr
Things To Try
- Allow for skimming multiple files
- Add support for skimming JSON files
- Let callers use a --width parameter to specify the output width
- The {skimr} package has tons more functionality, and the skimr::skim function also takes some parameters. Consider adding support for that.
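As a possible starting point for the first and third items, here is a hedged sketch of hand-rolled argument parsing that accepts any number of files plus an optional --width. The function name parseCliArgs and the shape of its return value are assumptions for illustration only; the published tool may handle arguments differently.

```javascript
// Hypothetical helper: separate an optional --width value from the list of
// files to skim. Accepts both "--width 100" and "--width=100".
function parseCliArgs(argv) {
  const files = [];
  let width; // stays undefined when the caller wants auto-detection

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    let raw = null;
    if (arg === '--width') {
      raw = argv[++i]; // the value is the next argument
    } else if (arg.startsWith('--width=')) {
      raw = arg.slice('--width='.length);
    } else {
      files.push(arg); // anything else is treated as a file to skim
      continue;
    }
    const value = Number.parseInt(raw, 10);
    if (Number.isNaN(value)) {
      throw new Error('--width expects an integer');
    }
    width = value;
  }
  return { width, files };
}

const { width, files } = parseCliArgs(process.argv.slice(2));
console.log({ width, files });
// Each entry in `files` would then go through the same resolve/mount/skim steps.
```

Keeping width undefined when the flag is absent lets the existing terminal-size detection remain the default.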
More Information
The {summarytools} package also has some solid EDA tooling. Consider fusing it with the skimr Node package and adding more options.
Next Up
You’ve either been playing along with the source from this book, or have been duplicating directories to bootstrap WebR CLI projects. We’ll look at a new WebR CLI bootstrap utility that will help jumpstart new WebR CLI projects.