Introduction
A journey worthy of embarkation demands preparation. The destination must be identified; supplies must be gathered; maps must be made; and courses must be charted. Let’s go over some foundational elements that should be in place if you are going to get the most out of this expedition. You’ll be well-prepared for what is to come once you’ve examined or completed each of the following sections.
0.1 Curiosity
By reading this book you’re already a bit curious as to how to make ggplot2 extensions. Don’t stop there. As you make your way through examples and start to forge your own creations, try to foster a mindset of exploration by constantly asking questions and then finding answers. Parts of various chapters will try to help you develop this mindset along the way, but it is best never to assume anything, nor to transcribe or run code without working out at least how it works and why it works the way it does.
0.2 Minimum Viable R Setup
While it may go without saying that you need R and ggplot2, you will also, at some point, end up creating a package that wraps up your new ggplot2 extension. It is highly suggested that you ensure you have the components outlined in the Getting Started section of Hadley Wickham’s R Packages book set up and working before you reach that point.
0.3 RStudio
There is a high probability that you are already an RStudio user. If not, then you should take this opportunity to explore the features and capabilities of this integrated development environment (IDE) for R.
RStudio is not, per se, required. Common alternatives to RStudio include Sublime Text, Atom, R Tools for Visual Studio, and Jupyter notebooks. You can create ggplot2 extensions in any of those environments; however, there are references to RStudio features and operations throughout the book, and you are on your own when it comes to translating those references into the idioms of your own R coding environment.
0.4 Command-line Familiarity
Graphical user interfaces (GUIs) and IDEs are great, but some things are better done (and done faster) on the command line. Newer versions of RStudio come with a built-in terminal. Unless you’re already familiar and comfortable with native Windows, macOS or Linux-ish terminals, you should use the RStudio terminal pane whenever you see suggestions to type some command at the command line.
0.5 git
You will be making extensive use of the git version control system and GitHub (a web-based hosting service for version control using git). Jenny Bryan has an excellent git resource that you may want to keep handy as you work through the various examples and tutorials in this book. In fact, you’re about to use it a bit later in this introductory chapter.
0.6 GitHub
A GitHub account is not strictly necessary, but the book refers to GitHub many times, and you should consider publishing your creations there since it is free and home to a large community of R users and developers.
0.7 Organization
The examples in the book will move (quickly) from small functions in single R files to many functions spread throughout many R files (which all eventually get wrapped up into a package). It is essential that you are very comfortable navigating across files and folders (both in a GUI and at the command line) and working in a “project” mindset. That may seem obvious to some readers, but you may be surprised at just how unfamiliar this topic can be to those moving from other ecosystems into R.
RStudio has the concept of a project. Projects are nothing more than a directory with a special project-name.Rproj file in it (you can even open any .Rproj file with a text editor to see the settings that are stored there). Good projects are also created with source code control enabled to ensure you can save your work in stages, revert to previous versions of your code, and collaborate with others by syncing your local code directory up to sites like GitHub.
Getting into a project mindset up-front also helps prepare you for the transition from code encapsulated in a project directory to shipping spiffy functions in a bona fide package.
It is recommended that you, at a minimum, have:
- a projects directory where you organize your projects (those being in subdirectories)
- a packages directory where you organize packages you write (again, in subdirectories)
- a references directory where you keep copies of useful projects or packages written by others that you use for reference
Those can all be individual directories underneath your home directory or stored in some other filesystem area that’s convenient and familiar to you. Note that various examples in the book will refer to these projects, packages and references directories from time to time.
Ultimately, choose an organizational system that works best for you (but do choose one!).
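One way to set up the layout described above from the RStudio terminal is sketched below. It assumes the three directories live directly under your home directory (set BASE_DIR to put them elsewhere). Both BASE_DIR and the my-first-extension project name are hypothetical, chosen only for illustration.

```shell
# Base location for the three directories; defaults to the home directory.
# BASE_DIR is an illustrative variable name, not something the book defines.
BASE_DIR="${BASE_DIR:-$HOME}"

# -p creates missing parents and does nothing if a directory already exists.
mkdir -p "$BASE_DIR/projects" "$BASE_DIR/packages" "$BASE_DIR/references"

# Hypothetical project directory with source code control enabled from the start.
mkdir -p "$BASE_DIR/projects/my-first-extension"
git init "$BASE_DIR/projects/my-first-extension"

# Confirm the layout.
ls -d "$BASE_DIR/projects" "$BASE_DIR/packages" "$BASE_DIR/references"
```

Keeping the base location in a variable makes it easy to adopt a different organizational system without retyping each path.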
0.8 A local copy of ggplot2 package source
Yes, you read that section header correctly: you should keep a copy of the ggplot2 package source code locally.
Why?
You are going to want (truthfully, need) to examine how existing Geoms, Stats, Coords (et al.) are made as you manifest your own creations. While you could just reference the ggplot2 code on GitHub when needed, it is much easier to have the ggplot2 package up in one RStudio window (session) while working on your own project in another.
Assuming you’ve made the aforementioned references directory, use the RStudio terminal to change the current working directory to it and clone the ggplot2 repository from GitHub into it with git.
You’ll see an .Rproj file in there which you can open with RStudio. You should do that now.
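Fetching the local copy might look like the following sketch. It assumes the references directory sits under your home directory and that the source comes from the tidyverse/ggplot2 repository on GitHub. The HTTPS and SSH forms are two alternative ways to fetch the same repository, so you only need one of them. REF_DIR is an illustrative variable name.

```shell
# Location of the references directory (assumed to be under the home directory).
REF_DIR="${REF_DIR:-$HOME/references}"
mkdir -p "$REF_DIR"
cd "$REF_DIR"

# Option 1: clone over HTTPS (works without an SSH key).
git clone https://github.com/tidyverse/ggplot2.git

# Option 2: clone over SSH (requires an SSH key registered with GitHub).
# git clone git@github.com:tidyverse/ggplot2.git

# List the RStudio project file(s) at the top level of the clone.
ls ggplot2/*.Rproj
```

To pick up upstream changes later, run git pull from inside the cloned ggplot2 directory.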
Inside that package project will be an R directory which contains all the magic behind ggplot2. You’ll be referencing four files quite a bit (at least initially):
- geom-.r, which contains the foundational code for the ggplot2 object that is responsible for rendering data in plots (e.g. geom_point())
- stat-.r, which also contains foundational code for rendering data in plots but is also logically responsible for performing statistical or basic computational transformations on data that is passed in to ggplot2 (e.g. stat_identity(), which is a Stat you rarely consciously see typed out but use all the time)
- scale.r, which contains the basis for creating Scale objects, which convert from data values to perceptual properties and influence the creation of guides (legends and axes)
- coord-.r, which has the core components for building coordinate systems, which take inputs that then determine the position of points, lines or other geometric elements on the canvas (e.g. coord_cartesian(), which is a Coord you rarely consciously see typed out but use all the time)
Other ggplot2 source files will also be referenced as you progress through the chapters.
The more time you spend getting to know ggplot2 internally, the easier it will be to make more complex extensions.