How to format a complex table for rmarkdown PDF output

This is simple to do with the add_header_above() function from the kableExtra package. You can add as many column groupings as you want by chaining several calls. Here is what I would do:

library(knitr)
library(kableExtra)  # also provides the %>% pipe

d <- mtcars[1:5, 1:5]
kable(d, longtable = TRUE, booktabs = TRUE) %>%
   add_header_above(c(" ", "Group 1" = 2, "Group 2" = 3)) %>%
   add_header_above(c("", "Groups" = 5))
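
The header vectors above are literals. However, add_header_above() just takes a named vector whose names are the labels and whose values are the column spans, so the vector can also be computed from the data. Below is a minimal sketch. It assumes every data column is tagged with a group label in a character vector. header_spec() and groups are hypothetical names and are not part of kableExtra:

```r
# Hypothetical helper (not part of kableExtra): turn a per-column group
# label vector into the named span vector that add_header_above() expects.
header_spec <- function(groups, n_lead = 1) {
  runs <- rle(groups)                 # merge consecutive equal labels
  spec <- c(n_lead, runs$lengths)     # blank cell(s) over the row-name column
  names(spec) <- c(" ", runs$values)
  spec
}

groups <- c("Group 1", "Group 1", "Group 2", "Group 2", "Group 2")
spec <- header_spec(groups)
print(spec)
```

The result can then replace the literal, e.g. `kable(d, booktabs = TRUE) %>% add_header_above(header_spec(groups))`. Because rle() only merges neighbouring labels, the columns of each group need to be adjacent.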



Quoting this comment:

I'm looking for a way to do this programmatically from within the rmarkdown document without having to hard-code the formatting, so that it's reproducible and flexible.

The following solution uses a hard-coded "template", but the template can be filled with any data (provided it has the same 2x8 structure).

The generated table looks like this:

[Image: generated table output]

Full code below.


The final table has 9 columns (one for the row labels plus 8 data columns), so the basic LaTeX structure is:

\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
% rest of table
\end{tabular}
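
If the number of columns depends on the data, the column specification does not have to be typed out. Here is a small base-R sketch. The function name col_spec is hypothetical:

```r
# Build "|c|c|...|c|" for n centered columns separated by vertical rules.
col_spec <- function(n) {
  paste0("|", paste(rep("c", n), collapse = "|"), "|")
}

spec <- col_spec(9)   # 1 row-label column + 8 data columns
cat(sprintf("\\begin{tabular}{%s}\n", spec))
```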

However, it is convenient to fix the width of the cells. This is possible with the custom column type C (taken from https://tex.stackexchange.com/a/12712/37118 on TeX.SE), which centers its content in a column of fixed width. Combined with the compact syntax *{n}{...} for repeating column types, this gives:

\begin{tabular}{|c *{8}{|C{1cm}}|}
% rest of table
\end{tabular}

(First column centered with flexible width, then 8 centered columns, each 1cm wide).
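
The compact form can be generated the same way. This sketch assumes the column count and width are passed in as plain values. compact_spec is a hypothetical name:

```r
# Hypothetical helper: one flexible label column followed by n
# fixed-width centered columns of the custom type C.
compact_spec <- function(n, width = "1cm") {
  sprintf("|c *{%d}{|C{%s}}|", n, width)
}

spec <- compact_spec(8)
cat(sprintf("\\begin{tabular}{%s}\n", spec))
```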

Cells spanning multiple columns are created with \multicolumn. These cells should also have a fixed width so that long captions break onto two lines. Note that a cell spanning two 1cm columns must be wider than 2cm, because the spanned area also includes the inter-column padding and the vertical rule between the two columns. Some measurement showed that about 2.436cm gives good results.
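
The measured value can also be derived. This assumes the LaTeX defaults \tabcolsep = 6pt and \arrayrulewidth = 0.4pt; your class or packages may change them. A \multicolumn cell of type C{w}| spanning two C{1cm} columns has to fill both widths, the padding on both sides of the inner boundary, and the rule between them. That gives w = 2 × 1cm + 2\tabcolsep + \arrayrulewidth. A quick check in R (TeX points: 72.27pt = 1in = 2.54cm):

```r
# Assumed defaults of the standard LaTeX classes (not read from the document).
tabcolsep_pt      <- 6     # padding on each side of a cell
arrayrulewidth_pt <- 0.4   # thickness of a vertical rule
pt_to_cm <- 2.54 / 72.27   # convert TeX points to centimetres

# Two 1cm columns + the inner padding of both + the rule between them.
span_width_cm <- 2 * 1 + (2 * tabcolsep_pt + arrayrulewidth_pt) * pt_to_cm
cat(sprintf("%.3fcm\n", span_width_cm))
```

If you prefer to let LaTeX do this arithmetic, an e-TeX expression such as `\dimexpr 2cm+2\tabcolsep+\arrayrulewidth\relax` should work as the width argument and would stay correct if \tabcolsep is changed.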

Remark on the first column: although \multicolumn{1}{...}{...} looks useless at first sight, it is useful for changing the column type (including the left/right borders) of a single cell. I used it to drop the leftmost vertical line in the first two rows.

\cline{x-y} draws a horizontal line that spans only columns x to y.
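
When the header is built from R, these two commands are easiest to emit through tiny wrappers. This is a sketch. multicol() and cline() are hypothetical helper names that only format strings, and they reproduce the first header row used below:

```r
# Hypothetical helpers that emit the LaTeX fragments described above.
multicol <- function(n, spec, text) {
  sprintf("\\multicolumn{%d}{%s}{%s}", n, spec, text)
}
cline <- function(from, to) {
  sprintf("\\cline{%d-%d}", from, to)
}

# Empty first cell without a left rule, then a bold title over columns 2-9.
row1 <- paste(multicol(1, "c|", ""), "&",
              multicol(8, "c|", "\\textbf{Predicted}"),
              "\\\\", cline(2, 9))
cat(row1, "\n")
```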

Taking these pieces together gives:

\begin{tabular}{|c *{8}{|C{1cm}}|} \cline{2-9}
    \multicolumn{1}{c|}{} & \multicolumn{8}{c|}{\textbf{Predicted}} \\ \cline{2-9}
    \multicolumn{1}{c|}{} & \multicolumn{2}{c|}{\textbf{Count}} & \multicolumn{2}{C{2.436cm}|}{\textbf{Overall Percent}} & \multicolumn{2}{C{2.436cm}|}{\textbf{Row \newline Percent}} & \multicolumn{2}{C{2.436cm}|}{\textbf{Column Percent}} \\ \hline
% rest of table
\end{tabular}
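
The second header row is the part most likely to change. Here is a sketch of how it could be generated instead of typed, assuming the group labels are kept in a character vector. groups and span_width are hypothetical names. Unlike the hand-written row, this version uses the fixed-width C type for every group, including Count:

```r
# Hypothetical sketch: build the second header row from the group labels.
groups <- c("Count", "Overall Percent", "Row \\newline Percent", "Column Percent")
span_width <- "2.436cm"   # width of a two-column span (value from above)

cells <- sprintf("\\multicolumn{2}{C{%s}|}{\\textbf{%s}}", span_width, groups)
header_row <- paste(c("\\multicolumn{1}{c|}{}", cells), collapse = " & ")
header_row <- paste(header_row, "\\\\ \\hline")
cat(header_row, "\n")
```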

Regarding the data, I dropped the last line of the code that generated the sample data, which gives:

> x <- structure(c(34L, 6L, 9L, 35L), .Dim = c(2L, 2L), .Dimnames = structure(list(Actual = c("Fail", "Pass"), Predicted = c("Fail", "Pass")), .Names = c("Actual", "Predicted")), class = "table")
> x <- cbind(x, prop.table(x), prop.table(x, 1), prop.table(x,2))
> x[, -c(1,2)] <- sapply(x[,-c(1,2)], function(i) paste0(sprintf("%1.1f", i*100),"%"))
> x
     Fail Pass Fail    Pass    Fail    Pass    Fail    Pass   
Fail "34" "9"  "40.5%" "10.7%" "79.1%" "20.9%" "85.0%" "20.5%"
Pass "6"  "35" "7.1%"  "41.7%" "14.6%" "85.4%" "15.0%" "79.5%"
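
The sapply() call works, but sprintf() is already vectorized, so the percentage columns can be formatted in one step. This is an alternative sketch that should yield the same matrix as above. The `%%` in the format string emits a literal percent sign:

```r
x <- structure(c(34L, 6L, 9L, 35L), .Dim = c(2L, 2L),
               .Dimnames = list(Actual = c("Fail", "Pass"),
                                Predicted = c("Fail", "Pass")),
               class = "table")
x <- cbind(x, prop.table(x), prop.table(x, 1), prop.table(x, 2))

# Columns 3-8 hold proportions; format them as percentages with one decimal.
# Assigning strings coerces the whole matrix (counts included) to character.
x[, -(1:2)] <- sprintf("%1.1f%%", as.numeric(x[, -(1:2)]) * 100)
print(x)
```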

To set the column and row names in italics, apply

colnames(x) <- sprintf("\\emph{%s}", colnames(x)) # highlight colnames
rownames(x) <- sprintf("\\emph{%s}", rownames(x)) # highlight rownames

Then, the following xtable code can be used:

print(xtable(x),
      only.contents = TRUE, 
      comment = FALSE,
      sanitize.colnames.function = identity, 
      sanitize.rownames.function = identity, 
      hline.after = 0:2)

The argument only.contents suppresses the enclosing tabular environment. Setting sanitize.colnames.function and sanitize.rownames.function to identity means "don't sanitize". This is necessary because the column and row names contain LaTeX commands (\emph) that must not be escaped.

The output replaces the % rest of table placeholder from above.


Conceptually, the code uses xtable to generate only the table body, not the header, because it is much easier to write the header manually.

Although the whole table header is "hard-coded", the data can be changed as required.

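When the header is written inside an R string (as in the full code below), don't forget to escape every \ with a second \. In addition, the following must be added to the LaTeX preamble (here via header.tex):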

\usepackage{array}
\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}} % https://tex.stackexchange.com/a/12712/37118
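
To see the escaping rule in practice, header.tex can also be written from R. The sketch below states the C column definition twice: once as an ordinary string with doubled backslashes, and once as a raw string (available since R 4.0). It checks that both are identical. The temporary path is only for illustration; next to your .Rmd file you would write "header.tex":

```r
# The C column type definition, written twice to show the escaping rule:
# 1) ordinary R string: every LaTeX backslash is doubled
escaped <- "\\newcolumntype{C}[1]{>{\\centering\\let\\newline\\\\\\arraybackslash\\hspace{0pt}}m{#1}}"
# 2) raw string (R >= 4.0): backslashes typed exactly as in LaTeX
raw <- r"(\newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}})"
stopifnot(identical(escaped, raw))

# Write the preamble file (temporary path here, for illustration only).
hdr_path <- file.path(tempdir(), "header.tex")
writeLines(c("\\usepackage{array}", escaped), hdr_path)
cat(readLines(hdr_path), sep = "\n")
```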

I wrapped all the elements outlined above in a function PrintConfusionMatrix that can be reused with any 2x8 matrix that provides the data and the column/row names.


Full code:

---
output:
  pdf_document: 
    keep_tex: yes
    includes:
      in_header: header.tex
---


```{r, echo = FALSE}
library(xtable)

# Sample data from question
x <- structure(c(34L, 6L, 9L, 35L), .Dim = c(2L, 2L), .Dimnames = structure(list(Actual = c("Fail", "Pass"), Predicted = c("Fail", "Pass")), .Names = c("Actual", "Predicted")), class = "table")
x <- cbind(x, prop.table(x), prop.table(x, 1), prop.table(x,2))
x[, -c(1,2)] <- sapply(x[,-c(1,2)], function(i) paste0(sprintf("%1.1f", i*100),"%"))
#x <- cbind(Actual=rownames(x), x) # dropped; better not to add row names to data

PrintConfusionMatrix <- function(x, ...) {
  # x: 2x8 matrix with row and column names; ... is passed on to print.xtable()

  stopifnot(all(dim(x) == c(2, 8)))

  colnames(x) <- sprintf("\\emph{%s}", colnames(x)) # highlight colnames
  rownames(x) <- sprintf("\\emph{%s}", rownames(x)) # highlight rownames

  cat('\\begin{tabular}{|c *{8}{|C{1cm}}|} \\cline{2-9}
    \\multicolumn{1}{c|}{} & \\multicolumn{8}{c|}{\\textbf{Predicted}} \\\\ \\cline{2-9}
    \\multicolumn{1}{c|}{} & \\multicolumn{2}{c|}{\\textbf{Count}} & \\multicolumn{2}{C{2.436cm}|}{\\textbf{Overall Percent}} & \\multicolumn{2}{C{2.436cm}|}{\\textbf{Row \\newline Percent}} & \\multicolumn{2}{C{2.436cm}|}{\\textbf{Column Percent}} \\\\ \\hline
    \\textbf{Actual} ')

  print(xtable(x),
        only.contents = TRUE, 
        comment = FALSE,
        sanitize.colnames.function = identity, 
        sanitize.rownames.function = identity, 
        hline.after = 0:2,
        ...)
  cat("\\end{tabular}")
}
```

```{r, echo = FALSE, results='asis'}
PrintConfusionMatrix(x)
```
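
If the header should follow the data as well, the whole tabular can be assembled in base R. This is a sketch, not a drop-in replacement:

- confusion_tabular() is a hypothetical name.
- It swaps xtable for plain sprintf()/paste() calls, so it only escapes % in the cells. xtable's default sanitizer handles more special characters.
- It assumes one group label per pair of columns.
- It computes the two-column span width from the assumed default \tabcolsep (6pt) and \arrayrulewidth (0.4pt).

It returns the LaTeX as one string, which you would cat() in a chunk with results='asis':

```r
# Sample data from the question, percentages formatted as in the answer.
x <- structure(c(34L, 6L, 9L, 35L), .Dim = c(2L, 2L),
               .Dimnames = list(Actual = c("Fail", "Pass"),
                                Predicted = c("Fail", "Pass")),
               class = "table")
x <- cbind(x, prop.table(x), prop.table(x, 1), prop.table(x, 2))
x[, -(1:2)] <- sprintf("%1.1f%%", as.numeric(x[, -(1:2)]) * 100)

# Hypothetical base-R alternative to PrintConfusionMatrix(): the header rows
# are generated from `groups` (one label per pair of columns).
confusion_tabular <- function(x, groups, row_title = "Actual",
                              col_title = "Predicted", width_cm = 1) {
  stopifnot(is.matrix(x), ncol(x) == 2 * length(groups))
  n <- ncol(x)
  # Two columns + 2 x 6pt padding + 0.4pt rule (assumed LaTeX defaults), in cm.
  span_cm <- 2 * width_cm + 12.4 * 2.54 / 72.27
  emph  <- function(s) sprintf("\\emph{%s}", s)
  esc   <- function(s) gsub("%", "\\\\%", s)  # % starts a LaTeX comment
  blank <- "\\multicolumn{1}{c|}{}"

  begin <- sprintf("\\begin{tabular}{|c *{%d}{|C{%gcm}}|} \\cline{2-%d}",
                   n, width_cm, n + 1)
  top <- sprintf("%s & \\multicolumn{%d}{c|}{\\textbf{%s}} \\\\ \\cline{2-%d}",
                 blank, n, col_title, n + 1)
  grp <- sprintf("\\multicolumn{2}{C{%.3fcm}|}{\\textbf{%s}}", span_cm, groups)
  mid <- paste(paste(c(blank, grp), collapse = " & "), "\\\\ \\hline")
  sub <- paste(paste(c(sprintf("\\textbf{%s}", row_title), emph(colnames(x))),
                     collapse = " & "), "\\\\ \\hline")
  body <- vapply(seq_len(nrow(x)), function(i) {
    paste(paste(c(emph(rownames(x)[i]), esc(x[i, ])), collapse = " & "),
          "\\\\ \\hline")
  }, character(1))

  paste(c(begin, top, mid, sub, body, "\\end{tabular}"), collapse = "\n")
}

groups <- c("Count", "Overall Percent", "Row \\newline Percent", "Column Percent")
tex <- confusion_tabular(x, groups)
cat(tex, "\n")
```

The header.tex preamble with the C column type is still required, exactly as for PrintConfusionMatrix.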