ANOVA/regression tables in R

Cheatsheet

Published

September 5, 2024

License

This work was developed using resources that are available under a Creative Commons Attribution 4.0 International License, made available on the SOLES Open Educational Resources repository by the School of Life and Environmental Sciences, The University of Sydney.

Assumed knowledge

You know how to install and load packages in R.
You know how to import data into R.
You recognise data frames and vectors.
You have performed a GLM or ANOVA, and now wish to create a table.

Data structure

The data should be in a long format (also known as tidy data), where each row is an observation and each column is a variable (Figure 1). If your data is not already structured this way, reshape it manually in a spreadsheet program or in R using the pivot_longer() function from the tidyr package.

Sex	BW
F	2.15
M	2.55
F	2.95
F	2.70
M	2.20
F	1.85
M	2.55
M	2.60

F	M
2.15	2.55
2.95	2.20
2.70	2.55
1.85	2.60

Figure 1: Data should be in long format (left) where each row is an observation and each column is a variable. This is the preferred format for most statistical software. Wide format (right) is also common, but may require additional steps to analyse or visualise in some instances.
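The reshaping step described above can be sketched in R as follows. This is a minimal example that re-enters the wide-format values from Figure 1 by hand; the object names wide and long are illustrative only.

```r
library(tidyr)

# Wide-format data from Figure 1: one column per sex
wide <- data.frame(
    F = c(2.15, 2.95, 2.70, 1.85),
    M = c(2.55, 2.20, 2.55, 2.60)
)

# Stack the two columns into a grouping variable (Sex) and a value (BW)
long <- pivot_longer(
    wide,
    cols = c("F", "M"),
    names_to = "Sex",
    values_to = "BW"
)
long
```

The result has one row per observation. The row order may differ from the long table in Figure 1, which does not affect the analysis.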

Data

For this cheatsheet we will use data from the penguins dataset from the palmerpenguins package. You may need to install this package:

install.packages("palmerpenguins")
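Once the package is installed, a quick look at the data might go as follows. This is only a suggested check, not part of the original workflow. Counting missing values is useful because lm() drops incomplete rows by default, which is why the model output later in this cheatsheet reports deleted observations.

```r
library(palmerpenguins)

# penguins becomes available once the package is loaded
dim(penguins)

# Number of missing values in each column
colSums(is.na(penguins))
```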

About

This cheatsheet shows how to quickly create “simple” HTML tables for ANOVA and regression models in R. Note that these tables are probably not publication-ready. At some point it may be easier to export the tables to software such as Word or Excel for further formatting. Other packages such as kableExtra, stargazer and huxtable can also be used to format tables in R, but are beyond the scope of this cheatsheet.

R packages used

tidyverse, car, emmeans, gt, gtsummary, sjPlot, palmerpenguins

Code

We will format the ANOVA table below.

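# Load the packages used in this section
library(tidyverse)
library(car) # provides Anova()
library(gt)
library(palmerpenguins)

# First, generate the ANOVA output
fit01 <-
    penguins |>
    lm(bill_length_mm ~ species * sex, data = _) |>
    Anova()
# Rename the predictors/terms if needed
rownames(fit01) <- c("Species", "Sex", "Species:Sex", "Residuals")
fit01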

Anova Table (Type II tests)

Response: bill_length_mm
            Sum Sq  Df  F value Pr(>F)    
Species     6975.6   2 650.4786 <2e-16 ***
Sex         1135.7   1 211.8066 <2e-16 ***
Species:Sex   24.5   2   2.2841 0.1035    
Residuals   1753.3 327                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

table1 <-
    gt(fit01, rownames_to_stub = TRUE) |> # (1)
    sub_missing(missing_text = "") |> # (2)
    # bold any p-values less than 0.05
    tab_style(
        style = list(cell_text(weight = "bold")),
        locations = cells_body(columns = `Pr(>F)`, rows = `Pr(>F)` < 0.05)
    ) |> # (3)
    cols_label(`Df` = "df", `F value` = "F", `Pr(>F)` = "p") |> # (4)
    fmt_number(columns = c(2, 4, 5), decimals = 2) |> # (5)
    sub_small_vals(threshold = 0.001) # (6)
table1

1: rownames_to_stub = TRUE uses the row names as the first column (the table stub).
2: sub_missing replaces missing values with an empty string.
3: tab_style bolds p-values less than 0.05.
4: cols_label renames the columns.
5: fmt_number rounds the numbers in columns 2, 4 and 5 (Sum Sq, F and p) to 2 decimal places.
6: sub_small_vals sets a threshold for displaying small values. In this case any value less than 0.001 is shown as “<0.001”.

	Sum Sq	df	F	p
Species	6,975.59	2	650.48	<0.001
Sex	1,135.68	1	211.81	<0.001
Species:Sex	24.49	2	2.28	0.10
Residuals	1,753.34	327
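Selecting columns by position, as in fmt_number(columns = c(2, 4, 5)), can be hard to read back later. As a hedged alternative, the sketch below selects the same columns by name. The object names anova_tbl and table_named are illustrative, and the model is refitted here only so that the example runs on its own.

```r
library(car)
library(gt)
library(palmerpenguins)

# Same model as above, refitted so this sketch is self-contained
anova_tbl <- Anova(lm(bill_length_mm ~ species * sex, data = penguins))

table_named <-
    gt(anova_tbl, rownames_to_stub = TRUE) |>
    # Select columns by name rather than by position
    fmt_number(columns = c(`Sum Sq`, `F value`, `Pr(>F)`), decimals = 2)
table_named
```

Names with spaces or symbols need backticks, as in the tab_style and cols_label calls above.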

If you want to further customise the table, you can use the gtsave() function to save the table as a Word (.docx) file and continue editing it there.
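A minimal sketch of saving a table with gtsave() is shown here. It assumes a recent version of gt with Word output support, which also relies on the rmarkdown package being installed. The small table and the file name anova_table.docx are placeholders; any gt object, such as table1, can be saved the same way.

```r
library(gt)

# A small stand-in table built from a built-in dataset
example_tbl <- gt(head(mtcars, 3), rownames_to_stub = TRUE)

# The file extension determines the output format; .docx gives a Word document
gtsave(example_tbl, filename = "anova_table.docx", path = tempdir())

# Full path of the saved file
out_file <- file.path(tempdir(), "anova_table.docx")
out_file
```

Writing to tempdir() keeps the example tidy. In practice, replace it with a folder in your project.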

R has extensive support for GLMs. For example, the sjPlot package can be used to create regression tables automatically. We will format the regression table below.

fit02 <-
    penguins |>
    lm(body_mass_g ~ bill_length_mm * flipper_length_mm, data = _) 
summary(fit02)


Call:
lm(formula = body_mass_g ~ bill_length_mm * flipper_length_mm, 
    data = penguins)

Residuals:
     Min       1Q   Median       3Q      Max 
-1040.18  -283.07   -23.94   241.93  1241.40 

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      5090.5088  2925.3007   1.740 0.082740 .  
bill_length_mm                   -229.2424    63.4334  -3.614 0.000347 ***
flipper_length_mm                  -7.3085    15.0321  -0.486 0.627145    
bill_length_mm:flipper_length_mm    1.1998     0.3224   3.721 0.000232 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 386.8 on 338 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.7694,    Adjusted R-squared:  0.7674 
F-statistic: 375.9 on 3 and 338 DF,  p-value: < 2.2e-16

table2 <-
    fit02 |>
    sjPlot::tab_model() 
table2

	body mass g
Predictors	Estimates	CI	p
(Intercept)	5090.51	-663.58 – 10844.60	0.083
bill length mm	-229.24	-354.02 – -104.47	<0.001
flipper length mm	-7.31	-36.88 – 22.26	0.627
bill length mm × flipper length mm	1.20	0.57 – 1.83	<0.001
Observations	342
R² / R² adjusted	0.769 / 0.767
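The default labels in the table above are derived from the variable names. As a sketch, the dv.labels and pred.labels arguments of tab_model() can supply more readable ones. The label text and the object names fit and labelled_tbl are illustrative choices, and the model is refitted so the example is self-contained.

```r
library(sjPlot)
library(palmerpenguins)

fit <- lm(body_mass_g ~ bill_length_mm * flipper_length_mm, data = penguins)

labelled_tbl <- tab_model(
    fit,
    # Label for the response variable (column header)
    dv.labels = "Body mass (g)",
    # One label per coefficient, in the order they appear in the model
    pred.labels = c(
        "(Intercept)",
        "Bill length (mm)",
        "Flipper length (mm)",
        "Bill length × flipper length"
    )
)
labelled_tbl
```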

The gtsummary package can also be used to create regression tables. We will format the regression table below.

library(gtsummary)

fit02 |>
    tbl_regression() |>
    add_glance_source_note(include = c(r.squared, adj.r.squared, nobs))

Characteristic	Beta	95% CI ¹	p-value
bill_length_mm	-229	-354, -104	<0.001
flipper_length_mm	-7.3	-37, 22	0.6
bill_length_mm * flipper_length_mm	1.2	0.57, 1.8	<0.001
R² = 0.769; Adjusted R² = 0.767; No. Obs. = 342
¹ CI = Confidence Interval
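The table above omits the intercept because tbl_regression() hides it by default. The sketch below shows a few optional adjustments: showing the intercept, relabelling the main effects, bolding p-values below 0.05, and converting the result to a gt table so that gt functions such as gtsave() can be applied. The label text and the object names fit and reg_tbl are illustrative.

```r
library(gtsummary)
library(palmerpenguins)

fit <- lm(body_mass_g ~ bill_length_mm * flipper_length_mm, data = penguins)

reg_tbl <-
    fit |>
    tbl_regression(
        intercept = TRUE, # the intercept row is hidden by default
        label = list(
            bill_length_mm ~ "Bill length (mm)",
            flipper_length_mm ~ "Flipper length (mm)"
        )
    ) |>
    bold_p(t = 0.05) |> # bold p-values below the threshold
    as_gt() # convert to a gt table for further formatting
reg_tbl
```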

Other resources

None yet. Stay tuned…