Table of Model Results using kable and kableExtra
By Daniel Chen
September 13, 2019
I’m at the R/Medicine conference (no it’s not a Reddit thing) and got to help Alison Hill with her R Markdown for Medicine workshop. One of the questions I got to tinker with was making tables used to report model results.
One technique I learned while doing my MPH was to add variables to your model in blocks. It reduces the number of tests you need to perform, and is more meaningful than saying “I ran stepwise”. So, as a researcher, you will add variables in meaningful blocks and then see if that block of variables is “significant” by looking at the F-statistic.
Here is one example of at least putting your model results into a table.
Creating the models
library(reshape2) # using this just for the tips dataset
library(broom)
library(purrr)
library(knitr)
library(kableExtra)
head(tips)
## total_bill tip sex smoker day time size
## 1 16.99 1.01 Female No Sun Dinner 2
## 2 10.34 1.66 Male No Sun Dinner 3
## 3 21.01 3.50 Male No Sun Dinner 3
## 4 23.68 3.31 Male No Sun Dinner 2
## 5 24.59 3.61 Female No Sun Dinner 4
## 6 25.29 4.71 Male No Sun Dinner 4
# create a bumch of models
m1 <- lm(tip ~ total_bill, data = tips)
m2 <- lm(tip ~ total_bill + sex, data = tips)
m3 <- lm(tip ~ total_bill + sex + smoker, data = tips)
m4 <- lm(tip ~ total_bill + smoker + size, data = tips)
Now that we have the models, we can use the broom
package to get a nice consistent dataframe of the coefficients
# get a dataframe of the model estimates
models <- list(m1, m2, m3, m4)
broom_outputs <- purrr::map(models, broom::tidy)
broom_outputs
## [[1]]
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.920 0.160 5.76 2.53e- 8
## 2 total_bill 0.105 0.00736 14.3 6.69e-34
##
## [[2]]
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.933 0.174 5.37 1.84e- 7
## 2 total_bill 0.105 0.00746 14.1 2.33e-33
## 3 sexMale -0.0266 0.138 -0.192 8.48e- 1
##
## [[3]]
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.977 0.178 5.48 1.05e- 7
## 2 total_bill 0.106 0.00748 14.2 1.73e-33
## 3 sexMale -0.0281 0.138 -0.203 8.39e- 1
## 4 smokerYes -0.149 0.135 -1.10 2.72e- 1
##
## [[4]]
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.709 0.205 3.46 6.38e- 4
## 2 total_bill 0.0939 0.00933 10.1 4.26e-20
## 3 smokerYes -0.0834 0.138 -0.605 5.46e- 1
## 4 size 0.180 0.0878 2.05 4.11e- 2
What we have now is a list of dataframes, one for each model we fit.
Just the beta estimates
If we just want the coefficients, we only need the terms
and estimates
columns and do a join
.
We can use the purrr::reduce
function to do this
# join them all together
model_outputs <- purrr::reduce(broom_outputs, dplyr::full_join, by = 'term')
model_outputs
## # A tibble: 5 x 17
## term estimate.x std.error.x statistic.x p.value.x estimate.y std.error.y
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.920 0.160 5.76 2.53e- 8 0.933 0.174
## 2 total_bill 0.105 0.00736 14.3 6.69e-34 0.105 0.00746
## 3 sexMale NA NA NA NA -0.0266 0.138
## 4 smokerYes NA NA NA NA NA NA
## 5 size NA NA NA NA NA NA
## # ... with 10 more variables: statistic.y <dbl>, p.value.y <dbl>,
## # estimate.x.x <dbl>, std.error.x.x <dbl>, statistic.x.x <dbl>,
## # p.value.x.x <dbl>, estimate.y.y <dbl>, std.error.y.y <dbl>,
## # statistic.y.y <dbl>, p.value.y.y <dbl>
This will join all the dataframes together using the term
column as the key.
Variables that do not exist in the model will be filled with an NA
.
Problem now, is that we have a bunch of columns we don’t need.
So the next step is to pull out the columns we want, and rename them accordingly.
model_table <- model_outputs %>%
dplyr::select(term, tidyselect::starts_with('estimate'))
names(model_table) <- c('Variable', 'Model 1', "Model 2", "Model 3", "Model 4")
model_table
## # A tibble: 5 x 5
## Variable `Model 1` `Model 2` `Model 3` `Model 4`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.920 0.933 0.977 0.709
## 2 total_bill 0.105 0.105 0.106 0.0939
## 3 sexMale NA -0.0266 -0.0281 NA
## 4 smokerYes NA NA -0.149 -0.0834
## 5 size NA NA NA 0.180
What if I need more variables?
Just reporting the beta coefficients is not how you want to report your findings,
typically you’d want to also display some measure of uncertainty along with the results.
Here I’m just going to add the std.error
column, but you can calculate the 95% confidence intervals using the std.error
.
Here’s the reference from the kableExtra
library I’m using to build the coefficient table:
# this is the exmaple form the kable docs I used for reference
# https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
collapse_rows_dt <- data.frame(C1 = c(rep("a", 3), rep("b", 2)),
C2 = c(rep("c", 2), rep("d", 1), rep("c", 1), rep("d", 1)),
C3 = 1:5,
C4 = sample(c(0,1), 5, replace = TRUE))
collapse_rows_dt
## C1 C2 C3 C4
## 1 a c 1 0
## 2 a c 2 1
## 3 a d 3 0
## 4 b c 4 1
## 5 b d 5 1
kable(collapse_rows_dt, align = "c") %>%
kable_styling(full_width = F) %>%
column_spec(1, bold = T) %>%
collapse_rows(columns = 1:2, valign = "top")
C1 | C2 | C3 | C4 |
---|---|---|---|
a | c | 1 | 0 |
a | c | 2 | 1 |
a | d | 3 | 0 |
b | c | 4 | 1 |
b | d | 5 | 1 |
This example was important to my solution, becuase I would need to format my coefficient table the same way as collapse_rows_dt
.
This means that:
C1
would be the variables in my modelC2
would need to alternate between theestimate
and thestd.error
- The other columns would be the actual values for each model.
Since the model outputs the estimate
and std.error
as columns,
I know some kind of reshaping/tidying step will need to be involved (melt
/gather
/ pivot_longer
).
We still start from the same data steps as the previous example
broom_outputs
## [[1]]
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.920 0.160 5.76 2.53e- 8
## 2 total_bill 0.105 0.00736 14.3 6.69e-34
##
## [[2]]
## # A tibble: 3 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.933 0.174 5.37 1.84e- 7
## 2 total_bill 0.105 0.00746 14.1 2.33e-33
## 3 sexMale -0.0266 0.138 -0.192 8.48e- 1
##
## [[3]]
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.977 0.178 5.48 1.05e- 7
## 2 total_bill 0.106 0.00748 14.2 1.73e-33
## 3 sexMale -0.0281 0.138 -0.203 8.39e- 1
## 4 smokerYes -0.149 0.135 -1.10 2.72e- 1
##
## [[4]]
## # A tibble: 4 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.709 0.205 3.46 6.38e- 4
## 2 total_bill 0.0939 0.00933 10.1 4.26e-20
## 3 smokerYes -0.0834 0.138 -0.605 5.46e- 1
## 4 size 0.180 0.0878 2.05 4.11e- 2
The trick here is to get the data in the correct format that can be used by kableExtra
broom_stats_long <- purrr::map(broom_outputs, tidyr::gather, 'key', 'value', estimate, std.error)
broom_stats_long
## [[1]]
## # A tibble: 4 x 5
## term statistic p.value key value
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept) 5.76 2.53e- 8 estimate 0.920
## 2 total_bill 14.3 6.69e-34 estimate 0.105
## 3 (Intercept) 5.76 2.53e- 8 std.error 0.160
## 4 total_bill 14.3 6.69e-34 std.error 0.00736
##
## [[2]]
## # A tibble: 6 x 5
## term statistic p.value key value
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept) 5.37 1.84e- 7 estimate 0.933
## 2 total_bill 14.1 2.33e-33 estimate 0.105
## 3 sexMale -0.192 8.48e- 1 estimate -0.0266
## 4 (Intercept) 5.37 1.84e- 7 std.error 0.174
## 5 total_bill 14.1 2.33e-33 std.error 0.00746
## 6 sexMale -0.192 8.48e- 1 std.error 0.138
##
## [[3]]
## # A tibble: 8 x 5
## term statistic p.value key value
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept) 5.48 1.05e- 7 estimate 0.977
## 2 total_bill 14.2 1.73e-33 estimate 0.106
## 3 sexMale -0.203 8.39e- 1 estimate -0.0281
## 4 smokerYes -1.10 2.72e- 1 estimate -0.149
## 5 (Intercept) 5.48 1.05e- 7 std.error 0.178
## 6 total_bill 14.2 1.73e-33 std.error 0.00748
## 7 sexMale -0.203 8.39e- 1 std.error 0.138
## 8 smokerYes -1.10 2.72e- 1 std.error 0.135
##
## [[4]]
## # A tibble: 8 x 5
## term statistic p.value key value
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 (Intercept) 3.46 6.38e- 4 estimate 0.709
## 2 total_bill 10.1 4.26e-20 estimate 0.0939
## 3 smokerYes -0.605 5.46e- 1 estimate -0.0834
## 4 size 2.05 4.11e- 2 estimate 0.180
## 5 (Intercept) 3.46 6.38e- 4 std.error 0.205
## 6 total_bill 10.1 4.26e-20 std.error 0.00933
## 7 smokerYes -0.605 5.46e- 1 std.error 0.138
## 8 size 2.05 4.11e- 2 std.error 0.0878
Now that we have the data formatted the way we need, we now select
only the columns we need.
# filter only the columns we need
broom_stats_long_filter <- purrr::map(broom_stats_long, dplyr::select, term, key, value)
broom_stats_long_filter
## [[1]]
## # A tibble: 4 x 3
## term key value
## <chr> <chr> <dbl>
## 1 (Intercept) estimate 0.920
## 2 total_bill estimate 0.105
## 3 (Intercept) std.error 0.160
## 4 total_bill std.error 0.00736
##
## [[2]]
## # A tibble: 6 x 3
## term key value
## <chr> <chr> <dbl>
## 1 (Intercept) estimate 0.933
## 2 total_bill estimate 0.105
## 3 sexMale estimate -0.0266
## 4 (Intercept) std.error 0.174
## 5 total_bill std.error 0.00746
## 6 sexMale std.error 0.138
##
## [[3]]
## # A tibble: 8 x 3
## term key value
## <chr> <chr> <dbl>
## 1 (Intercept) estimate 0.977
## 2 total_bill estimate 0.106
## 3 sexMale estimate -0.0281
## 4 smokerYes estimate -0.149
## 5 (Intercept) std.error 0.178
## 6 total_bill std.error 0.00748
## 7 sexMale std.error 0.138
## 8 smokerYes std.error 0.135
##
## [[4]]
## # A tibble: 8 x 3
## term key value
## <chr> <chr> <dbl>
## 1 (Intercept) estimate 0.709
## 2 total_bill estimate 0.0939
## 3 smokerYes estimate -0.0834
## 4 size estimate 0.180
## 5 (Intercept) std.error 0.205
## 6 total_bill std.error 0.00933
## 7 smokerYes std.error 0.138
## 8 size std.error 0.0878
Next, you need to create some kind of ID value for each model you’re creating,
that way when you’re combining everything into a single dataframe,
the estimates are traveling together with the standard error.
Here I’m labeling each model as a number starting with 1
.
tidy_tables <- purrr::map2_df(broom_stats_long_filter, 1:length(broom_stats_long_filter),
function(x, y) dplyr::mutate(x, mod = y))
tidy_tables
## # A tibble: 26 x 4
## term key value mod
## <chr> <chr> <dbl> <int>
## 1 (Intercept) estimate 0.920 1
## 2 total_bill estimate 0.105 1
## 3 (Intercept) std.error 0.160 1
## 4 total_bill std.error 0.00736 1
## 5 (Intercept) estimate 0.933 2
## 6 total_bill estimate 0.105 2
## 7 sexMale estimate -0.0266 2
## 8 (Intercept) std.error 0.174 2
## 9 total_bill std.error 0.00746 2
## 10 sexMale std.error 0.138 2
## # ... with 16 more rows
Almost there!
Lastly, we need to sort the columns by using arrange
.
Here is where the ID column, mod
get’s used to make sure model values are traveling together.
df <- tidy_tables %>%
dplyr::arrange(term, mod, key, value) %>%
tidyr::spread(mod, value, drop = FALSE)
df
## # A tibble: 10 x 6
## term key `1` `2` `3` `4`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) estimate 0.920 0.933 0.977 0.709
## 2 (Intercept) std.error 0.160 0.174 0.178 0.205
## 3 sexMale estimate NA -0.0266 -0.0281 NA
## 4 sexMale std.error NA 0.138 0.138 NA
## 5 size estimate NA NA NA 0.180
## 6 size std.error NA NA NA 0.0878
## 7 smokerYes estimate NA NA -0.149 -0.0834
## 8 smokerYes std.error NA NA 0.135 0.138
## 9 total_bill estimate 0.105 0.105 0.106 0.0939
## 10 total_bill std.error 0.00736 0.00746 0.00748 0.00933
kable(df, align='c') %>%
kable_styling(full_width = F) %>%
column_spec(1, bold = T) %>%
collapse_rows(columns = 1, valign = "top")
term | key | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
(Intercept) | estimate | 0.9202696 | 0.9332785 | 0.9770353 | 0.7090155 |
(Intercept) | std.error | 0.1597347 | 0.1737557 | 0.1781633 | 0.2048813 |
sexMale | estimate | NA | -0.0266087 | -0.0280926 | NA |
sexMale | std.error | NA | 0.1383340 | 0.1382793 | NA |
size | estimate | NA | NA | NA | 0.1803316 |
size | std.error | NA | NA | NA | 0.0878033 |
smokerYes | estimate | NA | NA | -0.1491923 | -0.0834326 |
smokerYes | std.error | NA | NA | 0.1354353 | 0.1380005 |
total_bill | estimate | 0.1050245 | 0.1052324 | 0.1059431 | 0.0938884 |
total_bill | std.error | 0.0073648 | 0.0074582 | 0.0074827 | 0.0093314 |
kableExtra
There kableExtra is great for tables like this. Here are links to more information
- Posted on:
- September 13, 2019
- Length:
- 11 minute read, 2170 words