Introduction

A journey worthy of embarkation demands preparation. The destination must be identified; supplies must be gathered; maps must be made; and courses must be charted. Let’s go over some foundational elements that should be in place if you are going to get the most out of this expedition. You’ll be well-prepared for what is to come once you’ve examined or completed each of the following sections.

0.1 Curiosity

By reading this book you’re already a bit curious as to how to make ggplot2 extensions. Don’t stop there. As you make your way through the examples and start to forge your own creations, try to foster a mindset of exploration by constantly asking questions and then finding answers. Parts of various chapters will try to help you develop this mindset along the way, but it is best never to assume anything, and never to transcribe or run code without working out at least how it works and why it works the way it does.

0.2 Minimum Viable R Setup

While it may go without saying that you need R and ggplot2, you will also, at some point, end up creating a package that wraps up your new ggplot2 extension. It is highly suggested that you have the components outlined in the Getting Started section of Hadley Wickham’s R Packages book set up and working before you reach that point.

0.3 RStudio

There is a high probability that you are already an RStudio user. If not, then you should take this opportunity to explore the features and capabilities of this integrated development environment (IDE) for R.

RStudio is not, per se, required. Common alternatives to RStudio include Sublime Text, Atom, R Tools for Visual Studio, and Jupyter notebooks. You can create ggplot2 extensions in any of those environments. However, there are references to RStudio features and operations throughout the book, and you are on your own when it comes to translating those references to the idioms of your own R coding environment.

0.4 Command-line Familiarity

Graphical user interfaces (GUIs) and IDEs are great, but some things are better done (and done faster) on the command-line. Newer versions of RStudio come with a built-in terminal. Unless you’re already familiar and comfortable with native Windows, macOS or Linux-ish terminals you should use the RStudio terminal pane whenever you see suggestions to type some command at the command-line.

0.5 git

You will be making extensive use of the git version control system and GitHub (a web-based hosting service for version control using git). Jenny Bryan has an excellent git resource that you may want to keep handy as you work through the various examples and tutorials in this book. In fact, you’re about to use it a bit later in this introductory chapter.

0.6 GitHub

A GitHub account is not 100% necessary, but you will be referred to it many times and should consider publishing your creations there since it is free and has a large nexus of R users and developers.

0.7 Organization

The examples in the book will move (quickly) from small functions in single R files to many functions spread throughout many R files (which all eventually get wrapped up into a package). It is essential that you are very comfortable navigating across files and folders (both in a GUI and at the command-line) and that you work in a “project” mindset. That may seem obvious to some readers, but you may be surprised at just how unfamiliar this topic can be to those moving from other ecosystems into R.

RStudio has the concept of a project. A project is nothing more than a directory with a special project-name.Rproj file in it (you can even open any .Rproj file with a text editor to see the settings that are stored there). Good projects are also created with source code control enabled so that you can save your work in stages, revert to previous versions of your code, and collaborate with others by syncing your local code directory up to sites like GitHub.

Getting into a project mindset up-front also helps prepare you for the transition from code encapsulated in a project directory to shipping spiffy functions in a bona fide package.

It is recommended that you — at a minimum — have:

  • a projects directory where you organize your projects (those being in subdirectories)
  • a packages directory where you organize packages you write (again, in subdirectories)
  • a references directory where you keep copies of useful projects or packages written by others that you use for reference

Those can all be individual directories underneath your home directory or stored in some other filesystem area that’s convenient and familiar to you. Note that various examples in the book will refer to these projects, packages and references directories from time-to-time.
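As a minimal sketch of that layout from the terminal, the following creates the three directories side by side. The BASE_DIR variable is a hypothetical name introduced here for illustration; it defaults to your home directory, so set it first if you keep your work somewhere else.

```shell
# Hypothetical base location (an assumption); defaults to your home directory
BASE_DIR="${BASE_DIR:-$HOME}"

# -p creates any missing parent directories and does not complain
# if a directory already exists, so this is safe to re-run
mkdir -p "$BASE_DIR/projects" "$BASE_DIR/packages" "$BASE_DIR/references"

# List the three directories to confirm they are in place
ls -d "$BASE_DIR/projects" "$BASE_DIR/packages" "$BASE_DIR/references"
```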

Ultimately, choose an organizational system that works best for you (but do choose one!).

0.8 A local copy of ggplot2 package source

Yes, you read that section header correctly: you should keep a copy of the ggplot2 package source code locally.

Why?

You are going to want (truthfully, need) to examine how existing Geoms, Stats, Coords (et al.) are made as you manifest your own creations. While you could just reference the ggplot2 code on GitHub when needed, it is much easier to have the ggplot2 package up in one RStudio window (session) while working on your own project in another.

Assuming you’ve made the aforementioned references directory, use the RStudio terminal to change the current working directory to it and type the following in it:

or
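As a sketch of what the two alternatives might look like, assuming the source is cloned from the tidyverse/ggplot2 repository on GitHub: the first form uses HTTPS and needs no extra setup, while the commented-out second form uses SSH and requires an SSH key registered with your GitHub account. The variable names are hypothetical and only there for readability.

```shell
# Run this from inside your references directory.
# Assumed repository location: the tidyverse organization on GitHub
GGPLOT2_HTTPS="https://github.com/tidyverse/ggplot2.git"
GGPLOT2_SSH="git@github.com:tidyverse/ggplot2.git"

# Option 1: clone over HTTPS (works without any SSH setup)
git clone "$GGPLOT2_HTTPS" || echo "Clone failed; check your network connection"

# Option 2: clone over SSH instead (needs an SSH key registered with GitHub)
# git clone "$GGPLOT2_SSH"
```

Either form creates a ggplot2 directory in the current working directory.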

You’ll see an .Rproj file in there which you can open with RStudio. You should do that now.

Inside that package project will be an R directory which contains all the magic behind ggplot2. You’ll be referencing four files quite a bit (at least initially):

  • geom-.r, which contains the foundational code for the ggplot2 object that is responsible for rendering data in plots (e.g. geom_point())
  • stat-.r, which also contains foundational code for rendering data in plots but is also logically responsible for performing statistical or basic computational transformations on data that is passed in to ggplot2 (e.g. stat_identity() which is a Stat you rarely consciously see typed out but use all the time)
  • scale.r, which contains the basis for creating Scale objects, which convert data values to perceptual properties and influence the creation of guides (legends and axes)
  • coord-.r, which has the core components for building coordinate systems which take inputs that then determine the position of points, lines or other geometric elements on the canvas (e.g. coord_cartesian() which is a Coord you rarely consciously see typed out but use all the time)
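To locate the four files listed above from the terminal, a small sketch along these lines may help. It assumes the ggplot2 source sits in a ggplot2 directory under the current working directory (SRC_DIR is a hypothetical variable name), and it matches file names loosely because their exact spelling and letter case can differ between ggplot2 versions.

```shell
# Assumed location of the R directory inside your local ggplot2 copy
SRC_DIR="${SRC_DIR:-ggplot2/R}"

if [ -d "$SRC_DIR" ]; then
  # -i ignores case (.r vs .R) and the optional hyphen tolerates
  # small naming differences between ggplot2 versions
  ls "$SRC_DIR" | grep -i -E '^(geom|stat|scale|coord)-?\.r$' \
    || echo "No matching files found in $SRC_DIR"
else
  echo "No ggplot2 source found at $SRC_DIR"
fi
```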

Other ggplot2 source files will also be referenced as you progress through the chapters.

The more time you spend getting to know ggplot2 internally, the easier it will be to make more complex extensions.