Sex | BW |
---|---|
F | 2.15 |
M | 2.55 |
F | 2.95 |
F | 2.70 |
M | 2.20 |
F | 1.85 |
M | 2.55 |
M | 2.60 |
Histograms in R with ggplot2
Cheatsheet
This work was developed using resources that are available under a Creative Commons Attribution 4.0 International License, made available on the SOLES Open Educational Resources repository by the School of Life and Environmental Sciences, The University of Sydney.
1 About
A histogram is a bar plot that shows how frequently the values of a (usually continuous) variable fall into each of a series of intervals, called bins. It helps identify patterns, outliers, and the shape of the distribution (skewness, kurtosis).
This cheatsheet assumes that:
- You know how to install and load packages in R.
- You know how to import data into R.
- You recognise data frames and vectors.
The data should be in a long format (also known as tidy data), where each row is an observation and each column is a variable (Figure 1). If your data is not already structured this way, reshape it manually in a spreadsheet program or in R using the pivot_longer() function from the tidyr package.
F | M |
---|---|
2.15 | 2.55 |
2.95 | 2.20 |
2.70 | 2.55 |
1.85 | 2.60 |
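As a minimal sketch of that reshaping step, the wide table above can be converted to long format with pivot_longer(). The object names wide and long are placeholders chosen for this example, and the values are typed in by hand from the table rather than read from a file.

```r
library(tidyr)

# Wide layout: one column per sex (values copied by hand from the table above)
wide <- data.frame(
  F = c(2.15, 2.95, 2.70, 1.85),
  M = c(2.55, 2.20, 2.55, 2.60)
)

# Stack the two columns: the old column names go into Sex, the values into BW
long <- pivot_longer(wide,
                     cols = everything(),
                     names_to = "Sex",
                     values_to = "BW")

long
```

The result has one row per observation, with Sex and BW as its two columns, which is the layout ggplot2 expects.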
2 Data
For this cheatsheet we will use the entire possums dataset used in BIOL2022 labs.
3 Import data
library(readxl)
# Read the second sheet of the workbook into a data frame called possums
possums <- read_excel("possums.xlsx", sheet = 2)
4 Plot
Use the different plots below to explore histograms in R. A histogram needs only one variable, so we pick one of the several continuous variables in the dataset and map it to the x-axis inside the aes() function.
The bare minimum code to create a histogram in R takes three steps:
1. Load the ggplot2 package with library(ggplot2).
2. Create a canvas using ggplot() and specify the variable to be plotted on the x-axis with aes(x = BW).
3. Add a histogram layer using geom_histogram().
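The three steps above can be sketched as follows. So that the example runs on its own, a small data frame named possums_demo stands in for the possums data. Both the name and its eight BW values (typed in from the long-format example table) are assumptions of this sketch, not the full dataset.

```r
library(ggplot2)                           # 1. load the package

# Stand-in for the possums data (assumption: hand-typed example values)
possums_demo <- data.frame(
  BW = c(2.15, 2.55, 2.95, 2.70, 2.20, 1.85, 2.55, 2.60)
)

p1 <- ggplot(possums_demo, aes(x = BW)) +  # 2. canvas with BW on the x-axis
  geom_histogram()                         # 3. histogram layer

p1
```

With the real data, replace possums_demo with possums.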
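Because no bin width was specified, ggplot2 falls back to 30 bins and prints the following message:
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.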
Add colours and labels, adjust the bin width, and change the theme.
library(ggplot2)

ggplot(possums, aes(x = BW)) +
  geom_histogram(fill = "skyblue", color = "black",  # 1
                 binwidth = 0.3) +                   # 2
  labs(x = "Body weight (g)",                        # 3
       y = "Frequency") +
  theme_minimal()                                    # 4
1. Use the colour (or color) and fill arguments to change the bar colours: fill sets the inside of the bars and colour sets their outline.
2. Adjust the bin width with binwidth to control the detail of the histogram. A smaller value produces more, narrower bars across the range of the data.
3. Add axis labels using labs().
4. Use theme_minimal() for a standardised appearance.
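To see how binwidth controls the number of bars, the sketch below counts the bins that ggplot2 computes for two bin widths, using layer_data() to extract the data behind each plot. The possums_demo data frame is a hand-typed stand-in (an assumption of this example), not the full possums dataset.

```r
library(ggplot2)

# Stand-in for the possums data (assumption: hand-typed example values)
possums_demo <- data.frame(
  BW = c(2.15, 2.55, 2.95, 2.70, 2.20, 1.85, 2.55, 2.60)
)

# The same plot drawn with two different bin widths
wide_bins   <- ggplot(possums_demo, aes(x = BW)) + geom_histogram(binwidth = 0.3)
narrow_bins <- ggplot(possums_demo, aes(x = BW)) + geom_histogram(binwidth = 0.1)

# layer_data() returns one row per bin that ggplot2 computed
n_wide   <- nrow(layer_data(wide_bins))
n_narrow <- nrow(layer_data(narrow_bins))

c(binwidth_0.3 = n_wide, binwidth_0.1 = n_narrow)
```

The narrower bin width yields more bins over the same range of BW, which is why the histogram shows finer detail.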
Plot both a histogram and a density plot at the same time.
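One possible way to draw both layers is sketched below. It is an assumption about how such a plot could be built, not necessarily the code behind the original figure: the histogram is rescaled with after_stat(density) so that it shares a y-axis with geom_density(), and the result is stored in an object named p3. The possums_demo data frame is a hand-typed stand-in for the possums data.

```r
library(ggplot2)

# Stand-in for the possums data (assumption: hand-typed example values)
possums_demo <- data.frame(
  BW = c(2.15, 2.55, 2.95, 2.70, 2.20, 1.85, 2.55, 2.60)
)

p3 <- ggplot(possums_demo, aes(x = BW)) +
  # Put the histogram on the density scale instead of raw counts
  geom_histogram(aes(y = after_stat(density)),
                 fill = "skyblue", color = "black", binwidth = 0.3) +
  # Overlay a smoothed kernel density estimate
  geom_density(color = "red", linewidth = 1) +
  labs(x = "Body weight", y = "Density") +
  theme_minimal()

p3
```

Without the after_stat(density) mapping the bars would be drawn as counts, and the density curve would appear almost flat along the bottom of the plot.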
Compare a histogram with a normal distribution that has the same mean and standard deviation as the data.
library(ggplot2)

ggplot(possums, aes(x = AactiveTBLUP)) +
  # Plot density rather than counts so the bars share a scale with the curve
  geom_histogram(aes(y = after_stat(density)),
                 fill = "skyblue",
                 color = "black",
                 binwidth = 0.3) +
  # Normal curve using the mean and standard deviation of the data
  stat_function(fun = dnorm,
                args = list(mean = mean(possums$AactiveTBLUP),
                            sd = sd(possums$AactiveTBLUP)),
                color = "red", linewidth = 1) +
  theme_minimal()
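The histogram is mapped to after_stat(density) because dnorm() returns densities, so both layers need a scale on which the total area equals 1. The sketch below checks that property on a density-scaled histogram by summing the bar areas extracted with layer_data(). The possums_demo data frame is a hand-typed stand-in (an assumption of this example), not the possums dataset.

```r
library(ggplot2)

# Stand-in for the possums data (assumption: hand-typed example values)
possums_demo <- data.frame(
  BW = c(2.15, 2.55, 2.95, 2.70, 2.20, 1.85, 2.55, 2.60)
)

p <- ggplot(possums_demo, aes(x = BW)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.3)

# One row per bin, with its limits (xmin, xmax) and height (density)
bins <- layer_data(p)

# Area of each bar = height x width; on the density scale the areas sum to 1
total_area <- sum(bins$density * (bins$xmax - bins$xmin))
total_area
```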
5 Export
Use the ggsave() function to save the plot as an image file.
- The filename argument specifies the name of the file.
- The plot argument specifies the plot to be saved. In this case, the plot is stored in the object p3 (from Version 3).
- The width and height arguments specify the dimensions of the plot in inches.
ggsave(filename = "histogram.pdf", plot = p3, width = 7, height = 5)
The plot will be saved in the working directory, unless you specify a different path in the filename argument.
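As a sketch of saving somewhere other than the working directory, the example below builds a full path with file.path() and passes it to filename. The temporary folder from tempdir() is used only so that the example runs anywhere, and possums_demo is a hand-typed stand-in for the possums data. Both are assumptions of this example.

```r
library(ggplot2)

# Stand-in for the possums data (assumption: hand-typed example values)
possums_demo <- data.frame(
  BW = c(2.15, 2.55, 2.95, 2.70, 2.20, 1.85, 2.55, 2.60)
)

p <- ggplot(possums_demo, aes(x = BW)) +
  geom_histogram(binwidth = 0.3)

# Full path to the output file; swap tempdir() for your own folder
out_file <- file.path(tempdir(), "histogram.pdf")

ggsave(filename = out_file, plot = p, width = 7, height = 5)

# Confirm that the file was written
file.exists(out_file)
```

ggsave() chooses the file format from the extension of filename, so changing ".pdf" to another supported extension changes the type of image that is written.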