library(readxl) # load the readxl package
possums <- read_excel("possum_bw.xlsx") # read file, store as "possums" object
Boxplots in R with ggplot2
Cheatsheet
This work was developed using resources that are available under a Creative Commons Attribution 4.0 International License, made available on the SOLES Open Educational Resources repository by the School of Life and Environmental Sciences, The University of Sydney.
About
The boxplot is a visual representation of a dataset’s distribution, showing the median, quartiles, and outliers. It is useful for comparing distributions between groups and identifying outliers within a single group.
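To make those quantities concrete, the sketch below computes them by hand in base R. The vector bw holds made-up values (it is not the possums data), and the 1.5 × IQR rule for flagging outliers is the convention that geom_boxplot() follows by default.

```r
# Hypothetical body weights (made-up values, not the possums data)
bw <- c(1200, 1350, 1400, 1450, 1500, 1550, 1600, 2600)

# First quartile, median and third quartile: the box and its centre line
q <- quantile(bw, probs = c(0.25, 0.5, 0.75))
iqr <- unname(q[3] - q[1])

# Values further than 1.5 * IQR from the box are drawn as individual outliers
lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[3]) + 1.5 * iqr
outliers <- bw[bw < lower_fence | bw > upper_fence]

# Whiskers reach the most extreme values that are still inside the fences
whiskers <- range(bw[bw >= lower_fence & bw <= upper_fence])

print(q)
print(outliers)
print(whiskers)
```

Here the single very heavy animal falls outside the upper fence, so a boxplot would draw it as a separate point rather than stretching the whisker to reach it.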
This cheatsheet assumes the following:
- You know how to install and load packages in R.
- You know how to import data into R.
- You recognise data frames and vectors.
Your data should be structured in a way that makes it easy to plot. The ideal structure is long, i.e. one where each column represents a variable and each row an observation (Figure 1). You can either reshape your data in R or move cells manually in a spreadsheet program to achieve the desired structure. For boxplots comparing more than one group of data, a categorical variable representing the group should be present in the data.

Figure 1: Long data (left), where Sex is categorical and BW is the measured, continuous response, is preferred over wide data (right), as it makes it easier to manipulate data when plotting.
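If the data arrive in wide form, reshaping them in R could look something like the sketch below. It assumes the tidyr package is installed, and both the possums_wide data frame and its column names (Female, Male) are hypothetical, made-up values used only to show the idea.

```r
library(tidyr)

# Hypothetical wide layout: one column of body weights per sex (made-up values)
possums_wide <- data.frame(
  Female = c(1400, 1550, 1600),
  Male   = c(1500, 1700, 1650)
)

# Stack the two columns into one: the old column names become the
# categorical variable Sex, and the cell values become BW
possums_long <- pivot_longer(
  possums_wide,
  cols = c(Female, Male),
  names_to = "Sex",
  values_to = "BW"
)

print(possums_long)
```

The result has one row per observation and one column per variable, which is the layout that aes(x = Sex, y = BW) expects.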
1 Data
For this cheatsheet we will use part of the possums dataset used in BIOL2022 labs.
2 Import data
3 Plot
Below are multiple versions of a boxplot comparing the body weight, BW, of possums between two groups defined by the Sex variable. Use the code snippets and their different implementations to understand how to customise your boxplot.
- (1) The library() function loads a package. Here, we load the ggplot2 package to enable the functions required to create the plot.
- (2) The ggplot() function creates a plot canvas. The aes() function specifies the aesthetic mappings, i.e. which variables are mapped to the x and y axes.
- (3) Once the canvas is defined, the data can be added automatically using geom_*() functions. Here, geom_boxplot() adds the boxplot to the canvas, structured according to the aesthetic mappings.
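The three notes above describe the most basic version of the plot. A minimal sketch of it might look like the following. The small possums data frame here is a made-up stand-in so that the example runs on its own. Its values, and the level names "F" and "M", are assumptions and do not come from the real dataset.

```r
library(ggplot2)  # (1) load the ggplot2 package

# Hypothetical stand-in for the possums data (made-up values)
possums <- data.frame(
  Sex = rep(c("F", "M"), each = 4),
  BW  = c(1400, 1550, 1600, 1450, 1500, 1700, 1650, 1800)
)

basic_plot <- ggplot(possums, aes(x = Sex, y = BW)) +  # (2) canvas and mappings
  geom_boxplot()                                        # (3) boxplot layer

basic_plot  # printing the object draws the plot
```

With the real data, the possums object created in the import step would take the place of the stand-in data frame.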
library(ggplot2)
ggplot(possums, aes(x = Sex, y = BW)) +
  geom_boxplot(fill = "slateblue") + # (1)
  xlab("Sex") + # (2)
  ylab("Body weight (g)") +
  theme_classic() # (3)
- (1) Adding a fill argument to the geom_boxplot() function changes the colour of the boxplot.
- (2) xlab() and ylab() add labels to the x and y axes, respectively.
- (3) An optional step, theme_classic() changes the plot’s appearance without needing to specify complex customisations.
library(ggplot2)
ggplot(possums, aes(x = BW, y = Sex, fill = Sex)) + # (1)
  geom_boxplot() +
  xlab("Body weight (g)") +
  ylab("Sex") +
  theme_minimal() +
  scale_fill_manual(values = c("salmon", "slateblue")) # (2)
- (1) The Sex variable is mapped to the y-axis and the BW variable to the x-axis. The fill aesthetic is used to colour the boxplots by the Sex variable.
- (2) The scale_fill_manual() function allows you to manually set the colours of the boxplots defined by the fill aesthetic in the aes() function above. There must be one colour for each level of the Sex variable.
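One way to make the pairing between colours and levels explicit is to name the colour vector. The sketch below assumes the levels of Sex are "F" and "M", which may not match the real file, and uses a made-up data frame in place of the possums data.

```r
library(ggplot2)

# Hypothetical stand-in for the possums data (made-up values)
possums <- data.frame(
  Sex = rep(c("F", "M"), each = 4),
  BW  = c(1400, 1550, 1600, 1450, 1500, 1700, 1650, 1800)
)

# Named vector: each name is a level of Sex, each value is its colour
sex_colours <- c(F = "salmon", M = "slateblue")

# Stop early if a level has no colour, or a colour has no matching level
stopifnot(setequal(names(sex_colours), unique(possums$Sex)))

coloured_plot <- ggplot(possums, aes(x = BW, y = Sex, fill = Sex)) +
  geom_boxplot() +
  scale_fill_manual(values = sex_colours)

coloured_plot
```

With a named vector the colours follow the level names rather than their order, so reordering the levels does not silently swap the colours.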
library(ggplot2)
ggplot(possums) +
  aes(x = Sex, y = BW) + # (1)
  geom_boxplot(width = .3, fill = "beige") + # (2)
  geom_point( # (3)
    position = position_nudge(x = -.3), # (4)
    shape = 95, size = 24, alpha = .25 # (5)
  ) +
  theme_bw()
- (1) The aes() function is placed outside the ggplot() function, allowing the aesthetic mappings to be used across multiple geom_*() functions.
- (2) geom_boxplot() can be customised further using the width argument to change the width of the boxplots.
- (3) geom_point() adds points to the plot.
- (4) The position_nudge() function shifts the points 0.3 units to the left of the boxplots (x = -.3).
- (5) The shape, size, and alpha arguments customise the appearance of the points. Here, shape = 95 draws each observation as a short horizontal dash instead of a dot, giving a different visual representation of the data “points”.
library(ggplot2)
plot1 <- ggplot(possums) +
  aes(x = Sex, y = BW) # (1)
plot1 + # (2)
  geom_boxplot() +
  geom_point(
    position = position_jitter(width = .05, seed = 0), # (3)
    size = 4, alpha = .5,
    colour = "firebrick"
  ) +
  theme_classic()
- (1) It is possible to save current work on a plot for later use by assigning it to an object, e.g. plot1.
- (2) To continue working on the plot, use the + operator on the saved object and continue adding layers.
- (3) The position_jitter() function adds a small amount of random noise to the points, preventing them from overlapping. The seed argument ensures the noise is consistent across multiple plots.
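The effect of seed can be checked directly. The sketch below uses a made-up data frame in place of the possums data and inspects the jittered positions with layer_data(), which returns the values ggplot2 computed for a given layer. It also sets height = 0, an addition not used in the snippet above, so that only the horizontal position is jittered and the plotted BW values stay exact.

```r
library(ggplot2)

# Hypothetical stand-in for the possums data (made-up values)
possums <- data.frame(
  Sex = rep(c("F", "M"), each = 4),
  BW  = c(1400, 1550, 1600, 1450, 1500, 1700, 1650, 1800)
)

base_plot <- ggplot(possums) +
  aes(x = Sex, y = BW)

jittered <- base_plot +
  geom_boxplot() +
  geom_point(
    # height = 0 is an assumption added here: jitter sideways only
    position = position_jitter(width = .05, height = 0, seed = 0)
  )

# Layer 2 is geom_point(); build it twice and compare the x positions
x_first  <- layer_data(jittered, 2)$x
x_second <- layer_data(jittered, 2)$x
identical(x_first, x_second)
```

Without a fixed seed, the two sets of x positions would generally differ each time the plot is built. If height is left out, position_jitter() also adds a small vertical jitter by default.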
4 More resources
- R colors – a good resource for choosing colours using words in R.
- Beyond bar and box plots – alternative visualisation methods in R for comparing groups.
- Boxplot – the R Graph Gallery – a gallery of boxplot examples in R.