Boxplots in R with ggplot2
Cheatsheet
This work was developed using resources that are available under a Creative Commons Attribution 4.0 International License, made available on the SOLES Open Educational Resources repository by the School of Life and Environmental Sciences, The University of Sydney.

About
The boxplot is a visual representation of a dataset’s distribution, showing the median, quartiles, and outliers. It is useful for comparing distributions between groups and identifying outliers within a single group.

This cheatsheet assumes the following:
- You know how to install and load packages in R.
- You know how to import data into R.
- You recognise data frames and vectors.

Your data should be structured in a way that makes it easy to plot. The ideal structure is long, i.e. one where each column represents a variable and each row an observation (Figure 1). You can either reshape your data in R or move cells manually in a spreadsheet program to achieve the desired structure. For boxplots comparing more than one group of data, a categorical variable representing the group should be present in the data.

Figure 1: Long data (left), where Sex is categorical and BW is the measured, continuous response, is preferred over wide data (right), as it makes it easier to manipulate data when plotting.

1 Data
For this cheatsheet we will use part of the possums dataset used in BIOL2022 labs.

2 Import data
library(readxl) # load the readxl package
possums <- read_excel("possum_bw.xlsx") # read file, store as "possums" object
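If an imported sheet turns out to be wide (one column per group), it can be reshaped in R to the long structure described above before plotting. The following is a minimal sketch using base R's stack(); the wide_bw data frame, its column names, and its values are hypothetical stand-ins and are not taken from the possums data.

```r
# Hypothetical wide data: one column of body weights per sex
wide_bw <- data.frame(
  Female = c(2900, 3100, 2750),
  Male   = c(2600, 2850, 2700)
)

# stack() gathers all values into one column and records the
# source column name of each value in a second (factor) column
long_bw <- stack(wide_bw)
names(long_bw) <- c("BW", "Sex")  # rename to match the plotting code

str(long_bw)        # BW is numeric, Sex is a factor with one level per group
table(long_bw$Sex)  # number of observations in each group
```

The result has one row per observation and a categorical Sex column, which is the shape the plotting code in the next section expects.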
3 Plot
Below are multiple versions of a boxplot comparing the body weight, BW, of possums between two groups defined by the Sex variable. Use the code snippets and their different implementations to understand how to customise your boxplot.
- (1) The library() function loads a package. Here, we load the ggplot2 package to enable the functions required to create the plot.
- (2) The ggplot() function creates a plot canvas. The aes() function specifies the aesthetic mappings, i.e. which variables are mapped to the x and y axes.
- (3) Once the canvas is defined, the data can be added automatically using geom_*() functions. Here, geom_boxplot() adds the boxplot to the canvas, structured according to the aesthetic mappings.
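
The three steps above can be sketched as the most basic version of the plot. This is a reconstruction consistent with those notes rather than a definitive listing. It assumes the possums data frame has columns named Sex and BW; the stand-in values are hypothetical and are only used when possums has not been imported.

```r
library(ggplot2)  # (1) load the plotting functions

# Hypothetical stand-in values, used only if `possums` has not been imported
if (!exists("possums")) {
  possums <- data.frame(
    Sex = rep(c("Female", "Male"), each = 5),
    BW  = c(2900, 3100, 2750, 3300, 3000, 2600, 2850, 2700, 3050, 2500)
  )
}

p <- ggplot(possums, aes(x = Sex, y = BW)) +  # (2) canvas and aesthetic mappings
  geom_boxplot()                              # (3) add the boxplot layer
p  # printing the object draws the plot
```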

library(ggplot2)
ggplot(possums, aes(x = Sex, y = BW)) +
  geom_boxplot(fill = "slateblue") + # (1)
  xlab("Sex") + # (2)
  ylab("Body weight (g)") +
  theme_classic() # (3)

- (1) Adding a fill argument to the geom_boxplot() function changes the colour of the boxplot.
- (2) xlab() and ylab() add labels to the x and y axes, respectively.
- (3) An optional step, theme_classic() changes the plot’s appearance without needing to specify complex customisations.
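
As a variation on the labelling step, both axis labels and a title can be set in a single labs() call, and the box outline can be coloured separately from its fill with the colour argument. A minimal sketch; the title text and the "grey30" outline are illustrative choices, and the stand-in values are hypothetical, used only when possums has not been imported.

```r
library(ggplot2)

# Hypothetical stand-in values, used only if `possums` has not been imported
if (!exists("possums")) {
  possums <- data.frame(
    Sex = rep(c("Female", "Male"), each = 5),
    BW  = c(2900, 3100, 2750, 3300, 3000, 2600, 2850, 2700, 3050, 2500)
  )
}

p <- ggplot(possums, aes(x = Sex, y = BW)) +
  geom_boxplot(fill = "slateblue", colour = "grey30") +  # box fill and outline
  labs(
    x = "Sex",
    y = "Body weight (g)",
    title = "Possum body weight by sex"  # illustrative title
  ) +
  theme_classic()
p
```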

library(ggplot2)
ggplot(possums, aes(x = BW, y = Sex, fill = Sex)) + # (1)
  geom_boxplot() +
  xlab("Body weight (g)") +
  ylab("Sex") +
  theme_minimal() +
  scale_fill_manual(values = c("salmon", "slateblue")) # (2)

- (1) The Sex variable is mapped to the y-axis and the BW variable to the x-axis. The fill aesthetic is used to colour the boxplots by the Sex variable.
- (2) The scale_fill_manual() function allows you to manually set the colours of the boxplots defined by the fill aesthetic in the aes() function above. There must be one colour for each level of the Sex variable.
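
Because the colours are matched to the levels of Sex, it can help to check the levels first and name each colour after the level it belongs to. The sketch below does this; the helper names sex_levels and sex_cols are introduced here for illustration, and the stand-in values (including the level names "Female" and "Male") are hypothetical, used only when possums has not been imported.

```r
library(ggplot2)

# Hypothetical stand-in values, used only if `possums` has not been imported
if (!exists("possums")) {
  possums <- data.frame(
    Sex = rep(c("Female", "Male"), each = 5),
    BW  = c(2900, 3100, 2750, 3300, 3000, 2600, 2850, 2700, 3050, 2500)
  )
}

sex_levels <- levels(factor(possums$Sex))  # the groups, in plotting order
stopifnot(length(sex_levels) == 2)         # two colours are supplied below

# Name each colour after its level so the pairing is explicit
sex_cols <- setNames(c("salmon", "slateblue"), sex_levels)

p <- ggplot(possums, aes(x = BW, y = Sex, fill = Sex)) +
  geom_boxplot() +
  scale_fill_manual(values = sex_cols) +
  theme_minimal()
p
```

With a named vector, each colour stays attached to its level even if the order of the levels changes later.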

library(ggplot2)
ggplot(possums) +
  aes(x = Sex, y = BW) + # (1)
  geom_boxplot(width = .3, fill = "beige") + # (2)
  geom_point( # (3)
    position = position_nudge(x = -.3), # (4)
    shape = 95, size = 24, alpha = .25 # (5)
  ) +
  theme_bw()

- (1) The aes() function can also be added outside the ggplot() call. Either way, the mappings become plot defaults that are shared by every geom_*() layer.
- (2) geom_boxplot() can be customised further using the width argument to change the width of the boxplots.
- (3) geom_point() adds points to the plot.
- (4) The position_nudge() function shifts the points 0.3 units to the left of the boxplots (x = -.3).
- (5) The shape, size, and alpha arguments customise the appearance of the points. Here, shape = 95 draws each point as an underscore character, so every observation appears as a short, semi-transparent horizontal dash rather than a dot.

library(ggplot2)
plot1 <-
  ggplot(possums) +
  aes(x = Sex, y = BW) # (1)
plot1 + # (2)
  geom_boxplot() +
  geom_point(
    position = position_jitter(width = .05, seed = 0), # (3)
    size = 4, alpha = .5,
    colour = "firebrick"
  ) +
  theme_classic()

- (1) It is possible to save current work on a plot for later use by assigning it to an object, e.g. plot1.
- (2) To continue working on the plot, use the + operator on the saved object and continue adding layers.
- (3) The position_jitter() function adds a small amount of random noise to the points, which reduces overlap. The seed argument makes the noise reproducible, so the points are drawn in the same positions every time the plot is rendered.
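
When every observation is drawn on top of the boxplot, any outliers are plotted twice: once by geom_boxplot() and once by geom_point(). One way to handle this, shown here as a sketch rather than as part of the original example, is to hide the boxplot's own outlier markers with outlier.shape = NA. Setting height = 0 in position_jitter() is a further optional choice that restricts the noise to the horizontal direction, so the plotted body weights stay exact. The stand-in values are hypothetical and only used when possums has not been imported.

```r
library(ggplot2)

# Hypothetical stand-in values, used only if `possums` has not been imported
if (!exists("possums")) {
  possums <- data.frame(
    Sex = rep(c("Female", "Male"), each = 5),
    BW  = c(2900, 3100, 2750, 3300, 3000, 2600, 2850, 2700, 3050, 2500)
  )
}

p <- ggplot(possums) +
  aes(x = Sex, y = BW) +
  geom_boxplot(outlier.shape = NA) +  # hide outlier markers; raw points follow
  geom_point(
    # jitter sideways only, with a fixed seed for reproducibility
    position = position_jitter(width = .05, height = 0, seed = 0),
    size = 4, alpha = .5, colour = "firebrick"
  ) +
  theme_classic()
p
```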

4 More resources
- R colors – a good resource for choosing colours using words in R.
- Beyond bar and box plots – alternative visualisation methods in R for comparing groups.
- Boxplot – the R Graph Gallery – a gallery of boxplot examples in R.