ANOVA/regression tables in R

Cheatsheet

Published

September 5, 2024

This work was developed using resources that are available under a Creative Commons Attribution 4.0 International License, made available on the SOLES Open Educational Resources repository by the School of Life and Environmental Sciences, The University of Sydney.

Assumed knowledge
  • You know how to install and load packages in R.
  • You know how to import data into R.
  • You recognise data frames and vectors.
  • You have performed a GLM or ANOVA, and now wish to create a table.

The data should be in a long format (also known as tidy data), where each row is an observation and each column is a variable (Figure 1). If your data is not already structured this way, reshape it manually in a spreadsheet program or in R using the pivot_longer() function from the tidyr package.

Sex  BW          F     M
F    2.15        2.15  2.55
M    2.55        2.95  2.20
F    2.95        2.70  2.55
F    2.70        1.85  2.60
M    2.20
F    1.85
M    2.55
M    2.60
Figure 1: Data should be in long format (left) where each row is an observation and each column is a variable. This is the preferred format for most statistical software. Wide format (right) is also common, but may require additional steps to analyse or visualise in some instances.
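
The reshaping step can be sketched in R as follows. This is a minimal illustration that assumes the wide-format values from Figure 1 have been typed into a data frame; the object names wide and long are hypothetical.

```r
library(tidyr)

# Wide-format data from Figure 1 (right): one column per sex
wide <- data.frame(
    F = c(2.15, 2.95, 2.70, 1.85),
    M = c(2.55, 2.20, 2.55, 2.60)
)

# Stack the two columns: column names go to "Sex", values go to "BW"
long <- pivot_longer(
    wide,
    cols = c("F", "M"),
    names_to = "Sex",
    values_to = "BW"
)
long
```

The column names are quoted in cols so that F is not mistaken for R's shorthand for FALSE.
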
Data

This cheatsheet uses the penguins dataset from the palmerpenguins package. You may need to install the package first:

install.packages("palmerpenguins")

About

This cheatsheet shows how to quickly create “simple” HTML tables for ANOVA and regression models in R. Note that these tables are probably not publication-ready. At some point it may be easier to export them to software such as Word or Excel for further formatting. Other packages such as kableExtra, stargazer and huxtable can also be used to format tables in R but are beyond the scope of this cheatsheet.

R packages used

tidyverse car emmeans gt gtsummary sjPlot palmerpenguins
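
One way to check which of these packages still need installing is sketched below. The object names pkgs and missing_pkgs are hypothetical, and the install step is left commented out so that nothing is downloaded unintentionally.

```r
# Packages listed above
pkgs <- c("tidyverse", "car", "emmeans", "gt", "gtsummary",
          "sjPlot", "palmerpenguins")

# Compare against the packages already installed on this machine
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
missing_pkgs

# Uncomment to install whatever is missing
# install.packages(missing_pkgs)
```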

Code

We will format the ANOVA table below.

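# Load the packages used in this section
library(car)            # Anova()
library(gt)             # table formatting
library(palmerpenguins) # penguins data

# First, generate the ANOVA output
fit01 <-
    penguins |>
    lm(bill_length_mm ~ species * sex, data = _) |>
    Anova()
# Rename the predictors/terms if needed
rownames(fit01) <- c("Species", "Sex", "Species:Sex", "Residuals")
fit01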
Anova Table (Type II tests)

Response: bill_length_mm
            Sum Sq  Df  F value Pr(>F)    
Species     6975.6   2 650.4786 <2e-16 ***
Sex         1135.7   1 211.8066 <2e-16 ***
Species:Sex   24.5   2   2.2841 0.1035    
Residuals   1753.3 327                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
table1 <-
    gt(fit01, rownames_to_stub = TRUE) |>   # (1)
    sub_missing(missing_text = "") |>       # (2)
    # bold any p-values less than 0.05
    tab_style(                              # (3)
        style = list(cell_text(weight = "bold")),
        locations = cells_body(columns = `Pr(>F)`, rows = `Pr(>F)` < 0.05)
    ) |>
    cols_label(`Df` = "df", `F value` = "F", `Pr(>F)` = "p") |>   # (4)
    fmt_number(columns = c(2, 4, 5), decimals = 2) |>             # (5)
    sub_small_vals(threshold = 0.001)                             # (6)
table1
(1) rownames_to_stub = TRUE uses the row names as the first column (the table stub).
(2) sub_missing() replaces missing values with an empty string.
(3) tab_style() bolds p-values less than 0.05.
(4) cols_label() renames the columns.
(5) fmt_number() rounds columns 2, 4 and 5 (Sum Sq, F and p) to 2 decimal places.
(6) sub_small_vals() sets a display threshold for small values. Here, any value less than 0.001 is shown as “<0.001”.

             Sum Sq    df   F       p
Species      6,975.59  2    650.48  <0.001
Sex          1,135.68  1    211.81  <0.001
Species:Sex  24.49     2    2.28    0.10
Residuals    1,753.34  327
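
Selecting columns by position, as in fmt_number(columns = c(2, 4, 5)), depends on the column order staying the same. A possible alternative is to select by name. The sketch below is an assumption-laden illustration, not the cheatsheet's own method: it uses the built-in mtcars data and base anova() purely to stay self-contained, and the object names aov_tab and tbl are hypothetical.

```r
library(gt)

# Illustrative model on a built-in dataset (not the penguins model above)
aov_tab <- anova(lm(mpg ~ factor(cyl) * factor(am), data = mtcars))

tbl <-
    gt(as.data.frame(aov_tab), rownames_to_stub = TRUE) |>
    sub_missing(missing_text = "") |>
    # Columns are chosen by name, so reordering them does not change the result
    fmt_number(
        columns = c(`Sum Sq`, `Mean Sq`, `F value`, `Pr(>F)`),
        decimals = 2
    ) |>
    sub_small_vals(columns = `Pr(>F)`, threshold = 0.001)
tbl
```

Names containing spaces or symbols, such as `Pr(>F)`, need backticks.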

If you want to customise the table further outside R, you can use the gtsave() function to save it as a Word (.docx) file.
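
A minimal sketch of saving a gt table to Word follows. It assumes a gt version with .docx support and that the rmarkdown package and pandoc are available. The table demo_table and the file name are hypothetical stand-ins; in this cheatsheet you would pass table1 instead.

```r
library(gt)

# Stand-in table; in the cheatsheet this would be `table1`
demo_table <- gt(head(mtcars, 3), rownames_to_stub = TRUE)

# The file extension selects the output format (.docx here)
out_file <- file.path(tempdir(), "anova_table.docx")
gtsave(demo_table, filename = out_file)

# Check that the file was written
file.exists(out_file)
```

Changing the extension to .html should produce an HTML file instead.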

R has extensive support for GLMs. For example, the sjPlot package can be used to create regression tables automatically. We will format the regression table below.

fit02 <-
    penguins |>
    lm(body_mass_g ~ bill_length_mm * flipper_length_mm, data = _) 
summary(fit02)

Call:
lm(formula = body_mass_g ~ bill_length_mm * flipper_length_mm, 
    data = penguins)

Residuals:
     Min       1Q   Median       3Q      Max 
-1040.18  -283.07   -23.94   241.93  1241.40 

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      5090.5088  2925.3007   1.740 0.082740 .  
bill_length_mm                   -229.2424    63.4334  -3.614 0.000347 ***
flipper_length_mm                  -7.3085    15.0321  -0.486 0.627145    
bill_length_mm:flipper_length_mm    1.1998     0.3224   3.721 0.000232 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 386.8 on 338 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.7694,    Adjusted R-squared:  0.7674 
F-statistic: 375.9 on 3 and 338 DF,  p-value: < 2.2e-16
table2 <-
    fit02 |>
    sjPlot::tab_model() 
table2
                                    body mass g
Predictors                          Estimates  CI                  p
(Intercept)                         5090.51    -663.58 – 10844.60  0.083
bill length mm                      -229.24    -354.02 – -104.47   <0.001
flipper length mm                   -7.31      -36.88 – 22.26      0.627
bill length mm × flipper length mm  1.20       0.57 – 1.83         <0.001
Observations                        342
R2 / R2 adjusted                    0.769 / 0.767
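
The default labels come straight from the variable names. As a hedged sketch, tab_model() also accepts pred.labels, dv.labels, show.se and file arguments, which can be used to relabel the rows, add standard errors and request an HTML file. The label text, the object names fit, out_file and tab, and the file name below are all illustrative choices.

```r
library(palmerpenguins)

# Same model as fit02 above, refitted so the example is self-contained
fit <- lm(body_mass_g ~ bill_length_mm * flipper_length_mm, data = penguins)

out_file <- file.path(tempdir(), "regression_table.html")

tab <- sjPlot::tab_model(
    fit,
    # One label per coefficient, in model order (intercept first)
    pred.labels = c("Intercept", "Bill length (mm)", "Flipper length (mm)",
                    "Bill length * Flipper length"),
    dv.labels = "Body mass (g)",
    show.se = TRUE,    # add a standard error column
    file = out_file    # ask for the table to be saved as HTML
)
tab
```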

The gtsummary package can also be used to create regression tables. We will format the regression table below.

library(gtsummary)  # tbl_regression(), add_glance_source_note()
fit02 |>
    tbl_regression() |>
    add_glance_source_note(include = c(r.squared, adj.r.squared, nobs))

Characteristic                      Beta  95% CI (1)  p-value
bill_length_mm                      -229  -354, -104  <0.001
flipper_length_mm                   -7.3  -37, 22     0.6
bill_length_mm * flipper_length_mm  1.2   0.57, 1.8   <0.001

R² = 0.769; Adjusted R² = 0.767; No. Obs. = 342
(1) CI = Confidence Interval
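
The gtsummary table can be adjusted in a similar spirit to the gt table earlier. The sketch below is one possible approach rather than the cheatsheet's own code: it relabels the predictors through the label argument, bolds p-values below 0.05 with bold_p(), and converts the result to a gt table with as_gt() so that gt functions can be applied afterwards. The label text and the object names fit and tbl are illustrative.

```r
library(gtsummary)
library(palmerpenguins)

# Same model as fit02 above, refitted so the example is self-contained
fit <- lm(body_mass_g ~ bill_length_mm * flipper_length_mm, data = penguins)

tbl <-
    tbl_regression(
        fit,
        # Replace variable names with readable labels
        label = list(
            bill_length_mm ~ "Bill length (mm)",
            flipper_length_mm ~ "Flipper length (mm)"
        )
    ) |>
    bold_p(t = 0.05) |>   # bold p-values below 0.05
    as_gt()               # convert to a gt table for further formatting
tbl
```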

Other resources

None yet. Stay tuned…