Chapter 2 Introduction
2.1 What is ggplot2?
The “gg” in ggplot2 is short for “Grammar of Graphics” (Wilkinson 2005), which defines a common grammar for all data visualizations.
What is this grammar?
In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. Faceting can be used to generate the same plot for different subsets of the dataset. It is the combination of these independent components that makes up a graphic (Wickham 2016).
ggplot2 was built on this concept, providing an elegant means of combining each of these elements to create virtually any statistical graphic imaginable.
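As a rough illustration of how these components combine, here is a minimal sketch using the mpg dataset that ships with ggplot2. The choice of variables, the linear fit, and the facetting variable are assumptions made for the example, not a recommended recipe.

```r
library(ggplot2)

# Each component of the grammar is added to the plot with `+`.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +       # data and aesthetic mapping
  geom_point() +                                  # geometric object
  geom_smooth(method = "lm", formula = y ~ x) +   # statistical transformation (linear fit)
  coord_cartesian() +                             # coordinate system
  facet_wrap(~ drv)                               # same plot for each subset of the data

p
```

Removing or swapping any one line changes that component of the graphic while leaving the others untouched.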
Other plotting libraries typically rely on pre-specified functions to plot data in a specific way. This is useful in certain applications, but can become extremely limiting when trying to visualize data in unconventional ways.
It takes time to understand and appreciate the ggplot2 approach, but once you master the grammar it is as liberating as learning a new language. The three key components of every plot are:
- Data
- Aesthetics
- Geometries
Let’s go into each in a little more detail…
Input data for ggplot2 must be tidy.
Tidy data means:
- Each variable must have its own column
- Each observation must have its own row
- Each value must have its own cell
In practice that means you should:
- Put each dataset in a data frame (or tibble)
- Put each variable in a column
ggplot2 is part of an ecosystem of packages called the tidyverse, which provides common (amazing) tools for working with “tidy” data. For more information on the tidyverse or this approach, see the tidyverse website (https://www.tidyverse.org/) or the R for Data Science book (https://r4ds.had.co.nz/).
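To make the idea of tidy data concrete, here is a small sketch that reshapes a "wide" table into tidy form with pivot_longer() from the tidyr package. The data frame and its column names (site, year, count) are made up for illustration.

```r
library(tidyr)

# Hypothetical "wide" data: one row per site, one column per year.
# The year is a variable, but here it is stored in the column names.
wide <- data.frame(
  site = c("A", "B"),
  `2019` = c(10, 7),
  `2020` = c(12, 9),
  check.names = FALSE
)

# Reshape so that each variable (site, year, count) has its own column
# and each observation (one site in one year) has its own row.
tidy <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "count"
)

tidy
```

In the tidy version, year and count can each be mapped directly to a visual property of a plot, which is not possible while the years are spread across column names.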
Aesthetics are perhaps the most confusing part of the ggplot2 grammar. They refer to the mapped relationship between variables in the data and the visual properties of the plot.
Aesthetic properties of plots include position (x and y), colour, fill, shape, size, and line type.
This mapping is achieved using the mapping = aes() argument within either the ggplot() function or a geometry function (geom_*(); more on these next). This will hopefully make more sense once we start moving through some hands-on examples.
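As a first taste, the sketch below shows both places where a mapping can be supplied, again using the mpg dataset that ships with ggplot2. The variables chosen (displ, hwy, class) are just an example.

```r
library(ggplot2)

# Mapping supplied to ggplot(): inherited by every layer added afterwards.
p1 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# Mapping supplied to the geometry function: applies to that layer only.
p2 <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = class))

# Setting (rather than mapping) an aesthetic: a fixed value goes outside aes().
p3 <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(colour = "blue")

p1
```

Here p1 and p2 draw the same plot. In p3 the colour is not tied to any variable, so every point is drawn in blue and no legend is produced.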
The data and aesthetic mapping are then used to plot the data using a given geometry. Examples of geometries include points, lines, and bars.
Each geometry is called using a function with the geom_ prefix. For example, points are created with geom_point(), paths with geom_path(), etc. Each geometry has a specific set of required aesthetics. These are all listed in the function documentation (for example, use ?geom_path() to see the documentation for geom_path()).
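The sketch below draws the same data and mapping with different geometries. The small data frame is made up for illustration.

```r
library(ggplot2)

# Hypothetical data: five observations of two variables.
df <- data.frame(x = 1:5, y = c(2, 4, 3, 6, 5))

# Data and aesthetic mapping are defined once...
base <- ggplot(df, aes(x = x, y = y))

# ...and then drawn with different geometries.
p_points <- base + geom_point()                 # one point per observation
p_path   <- base + geom_path()                  # observations joined in data order
p_both   <- base + geom_point() + geom_path()   # layers can be stacked

p_both
```

Both geom_point() and geom_path() require the x and y aesthetics, which is why the same mapping can be reused for all three plots.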
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.