Reproducible research - Rmarkdown

29 April, 2022

By: Murray Logan


Load the necessary libraries

library(rmarkdown) #to render rmarkdown documents
library(tidyverse) #for data wrangling and plotting

Markdown and pandoc

Both LaTeX and HTML are markup languages. They both have standardized short-hand syntax to specify how content should be styled and formatted. Markdown is another markup language with its own specific syntax, yet it is far simpler and less verbose than either LaTeX or HTML. The goal of markup languages is to provide simple styling rules and syntax so as to allow the author to concentrate on the content. To this end, the highly simplified syntax of the markdown language makes it one of the briefest and most content-rich formats. Unlike many other markup languages (such as LaTeX and HTML), carriage returns and spaces form an important part of the language structure and thus influence the formatting of the final document.

To gain an appreciation of some of the simple styling rules of a markdown document, consider the following:

---
title: Example markdown
author: D. Author
date: 16-06-2020
---

This is the title

## Section 1

A paragraph of text containing a word that is **emphasised** or ~~strikethrough~~.
Followed by an unordered list: 

- item 1
- item 2

Or perhaps an enumerated list:

1. item 1
2. item 2

### Subsection 1.1

There might be a [link]( or even a table:

| Item      | Example | Description           |
|-----------|---------|-----------------------|
| numeric   | 12.34   | floating point number |
| character | 'Site'  | words                 |
| ...       |         |                       |

Even in plain text, the general formatting is obvious. This simplicity also makes markdown an ideal language for acting as a base source from which other formats (such as PDF, HTML, Presentations, Ebooks) can be created as well as a sort of conduit language through which other formats are converted.

Pandoc is a universal document converter that converts between one markup language and another. Specifically, Pandoc can read markdown and subsets of the following formats:

  • HTML
  • LaTeX
  • Textile
  • reStructuredText
  • MediaWiki markup
  • DocBook XML

Pandoc can write the following formats:

  • plain text
  • markdown
  • LaTeX
  • PDF (when LaTeX installed)
  • Various HTML/Javascript based slide shows (Slidy, Slideous, DZSlides, S5)
  • EPUB
  • Emacs org-mode
  • Rich Text Format (RTF)
  • OpenDocument XML
  • LibreOffice (Open Document Format, ODT)
  • Microsoft Word DOCX
  • MediaWiki markup
  • FictionBook2
  • Textile
  • groff man pages
  • AsciiDoc

By way of example, the above markdown can be rendered into multiple popular formats via pandoc.

pdf (requires LaTeX)

pandoc -o example1.pdf 

html

pandoc -s -o example1.html 

word (docx)

pandoc -o example1.docx 

Many of the above markup languages feature extensive definitions for styling and formatting rules that do not have direct equivalents within other languages. For example, Cascading Style Sheets and Javascript within HTML provide advanced styling and dynamic presentation of content that cannot be easily translated into other languages. Similarly, there are many macros available for LaTeX that enhance the styling and formatting of content relevant to PDF. Consequently, not all of the more advanced features of each of the languages are supported by Pandoc for conversion.

Pandoc fully supports markdown as an input language, making markdown a popular base language to create content from which other formats can be generated. For example, contents authored in markdown can then be converted into PDF, HTML, HTML presentations, eBooks and others. There are currently numerous dialects of the markdown language. Pandoc has its own enhanced dialect of markdown which includes syntax for bibliographies and citations, footnotes, code blocks, tables, enhanced lists, tables of contents, embedded LaTeX math.

This tutorial will focus on markdown as a base source language from which PDF, HTML, presentations and eBooks are created. As a result, the tutorial will focus on Pandoc’s enhanced markdown. That said, from now on, we will not use pandoc directly - rather we will employ specific R functions that engage with pandoc as part of their overall processing.

Rather than introduce the structural elements of markdown and the intricacies of the pandoc tool in abstract terms, the main features will be described and demonstrated in an R context with Rmarkdown.
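As a minimal sketch of what driving pandoc from R can look like, the following writes a small markdown file and converts it with pandoc_convert() from the rmarkdown package. The file names and document contents are illustrative assumptions, and the conversion step is skipped when rmarkdown cannot locate a pandoc installation.

```r
library(rmarkdown)

# Write a small markdown document to a temporary file (contents are illustrative)
md_file <- file.path(tempdir(), "example1.md")
writeLines(c(
  "---",
  "title: Example markdown",
  "author: D. Author",
  "date: 16-06-2020",
  "---",
  "",
  "## Section 1",
  "",
  "A paragraph with a word that is **emphasised**.",
  "",
  "- item 1",
  "- item 2"
), md_file)

# Convert to standalone HTML, but only if a pandoc installation can be found
html_file <- file.path(tempdir(), "example1.html")
if (pandoc_available()) {
  pandoc_convert(md_file, to = "html", output = html_file,
                 options = "--standalone")
}
```

The options argument passes additional command line switches straight through to pandoc, so the call mirrors the pandoc -s -o command shown earlier.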

The metadata block

You may have noticed in the example above that at the top of the markdown there was a block of lines starting with three hyphens (---) and ending with three hyphens (---). When processed via pandoc, these lines define the document’s metadata (such as the title, author and creation date).

The metadata are a set of key value pairs in YAML format. The list of useful metadata depends on the intended output.

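The following rules apply to the YAML metadata block:

  • Each field is a key: value pair on its own line; the order of the fields does not matter

  • A field that is not required can simply be omitted

  • Multiple authors can be defined as a YAML list, with each author on a separate (indented) line starting with a - character

    ---
    title: This is the title
    author:
      - D. Author
      - D. Other
    date: 14-02-2013
    ---

Pandoc also supports an older style of title block in which each of the three lines begins with a % character. In that style, the three fields must be in the order title, author(s), date with each on a separate line, an omitted field must be left as a line just containing the % character, and multiple authors are either separated by a ; (semicolon) character or placed on separate lines (indented by a single space).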

In addition to the above metadata fields, the YAML header provides a mechanism for storing processing preferences. For example, output dependent options can be specified by indenting each of the options under the output format (the following example indicates that html documents should have a table of contents).

---
title: This is the title
author: D. Author
date: 14-02-2013
output:
  html_document:
    toc: yes
---

Note, YAML formatting is very particular. Indentation must be via spaces (not tabs).

Since most of the metadata fields are specific to output behaviours, we will illustrate other fields when describing the associated outputs.
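To check how a metadata block is actually parsed, one option is to read it back into R. The sketch below assumes the yaml_front_matter() function from the rmarkdown package and uses a hypothetical temporary file; the field values are simply those of the examples above.

```r
library(rmarkdown)

# Write a document with a YAML metadata block (indentation uses spaces, not tabs)
md_file <- tempfile(fileext = ".md")
writeLines(c(
  "---",
  "title: This is the title",
  "author:",
  "  - D. Author",
  "  - D. Other",
  "date: 14-02-2013",
  "output:",
  "  html_document:",
  "    toc: true",
  "---",
  "",
  "# Section 1"
), md_file)

# Parse the metadata block into a nested R list
meta <- yaml_front_matter(md_file)
meta$title
meta$author
meta$output$html_document$toc
```

Nested options (such as toc under html_document) appear as nested list elements, which makes indentation mistakes easy to spot.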

Text formatting

Brief changes to font styles within a block of text can be effective at emphasizing or applying different meanings to characters. Common text modifier styles are: italic, bold and strikethrough.

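| Markdown                         | Result                    |
|----------------------------------|---------------------------|
| *Italic text* or _Italic text_   | Italic text (in italics)  |
| **Bold text** or __Bold text__   | Bold text (in bold)       |
| ~~Strikethrough~~                | Strikethrough (struck out)|
| `Monospace font`                 | Monospaced font           |
| superscript^2^                   | superscript²              |
| subscript~2~                     | subscript₂                |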

If the content to be raised or lowered (for super- and sub- scripts) contains spaces, then they must be escaped by preceding the space with a \ character. For example, Effect~Oxygen\ concentration~ equates to Effect followed by the subscript "Oxygen concentration".

Note, underlined text is not defined in any dialect of markdown (including pandoc markdown) as the developers believe that the underline style is a relic of the days of typewriters when there were few alternatives for emphasizing words. Furthermore, underlining of regular words within a sentence tends to break the aesthetic spacing of lines.

Horizontal lines are indicated by a row of three or more *, - or _ characters (optionally separated by spaces) with a blank row either side.


Focused example

The rate of oxygen consumption (O~2~ per min^-1^.mg^2^) ...
Effect~Oxygen\ concentration~ 

Markdown document

Markdown (*.md)

---
title: Example markdown
author: D. Author
date: 16-06-2020
---

This is the title

A paragraph of text containing a word that is **emphasised**, ~~strikethrough~~
and `Monospace`.


The rate of oxygen consumption (O~2~ per min^-1^.mg^2^)
Effect~Oxygen\ concentration~

pdf (requires LaTeX)

pandoc -o example2.pdf  

html

pandoc -s -o example2.html  

word (docx)

pandoc -o example2.docx  

Section headings

Pandoc markdown supports two heading formats (pandoc markdown headings must be preceded by a blank line):

  • Setext-style headings. Level 1 headings are specified by underlining the heading with a row of = characters and level 2 headings are specified by underlining with a row of - characters.

    Setext-style headings only support level 1 and level 2 headings.

Focused example

Section 1
=========

### Subsubsection

# Section 2

## Subsection 

### Subsection 

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

Section 1
=========

Section 2
=========

pdf (requires LaTeX)

pandoc -o example3a.pdf 

html

pandoc -s -o example3a.html 

word (docx)

pandoc -o example3a.docx 

  • Atx-style headings. Levels 1-6 headings comprise one to six # characters followed by the heading text.

Focused example

# Section 1

### Subsubsection

# Section 2

## Subsection 

### Subsection 

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

## Subsection 

### Subsubsection

# Section 2

pdf (requires LaTeX)

pandoc -o example3b.pdf 

html

pandoc -s -o example3b.html 

word (docx)

pandoc -o example3b.docx 

Table of contents

A table of contents can be included by issuing the --toc command line switch to pandoc. For some output formats (such as HTML), a block of links to section headings is created, whilst for others (such as LaTeX), an instruction (\tableofcontents) for the external driver to create the table of contents is generated.

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

## Subsection 

### Subsubsection

# Section 2

pdf (requires LaTeX)

pandoc --toc -o example4.pdf 

html

pandoc -s --toc -o example4.html 

word (docx)

pandoc --toc -o example4.docx 
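The same switch can be requested from R. The following is a minimal sketch, assuming the rmarkdown package's html_document() format function (whose toc and toc_depth arguments correspond to the table of contents options) and a hypothetical temporary input file. Rendering is skipped when no pandoc installation is found.

```r
library(rmarkdown)

# A small document with nested headings (same structure as the example above)
md_file <- file.path(tempdir(), "example4.md")
writeLines(c(
  "---",
  "title: This is the title",
  "author: D. Author",
  "date: 14-02-2013",
  "---",
  "",
  "# Section 1",
  "",
  "## Subsection",
  "",
  "### Subsubsection",
  "",
  "# Section 2"
), md_file)

# Render to HTML with a table of contents limited to two heading levels
html_file <- NULL
if (pandoc_available()) {
  html_file <- render(md_file,
                      output_format = html_document(toc = TRUE, toc_depth = 2),
                      quiet = TRUE)
}
```

Specifying toc: yes under html_document in the YAML metadata block should have the same effect as passing toc = TRUE here.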

Block quotations

Focused example

Normal text

> This is a block quotation.  Block quotations are specified by
> preceding each line with a > character.  The quotation block
> will be indented.
>
> To have paragraphs in block quotations, separate paragraphs
> with a line containing only the block quotation mark character.

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1
> This is a block quotation.  Block quotations are specified by
> preceding each line with a > character.  The quotation block
> will be indented.
>
> To have paragraphs in block quotations, separate paragraphs
> with a line containing only the block quotation mark character.

Block quotations in pandoc markdown follow email conventions - that is, each line is preceded by a > character.

pdf (requires LaTeX)

pandoc -o example5.pdf 

html

pandoc -s -o example5.html 

word (docx)

pandoc --toc -o example5.docx 

Verbatim (code) blocks

Verbatim blocks are typically used to represent blocks of code syntax. The text within the verbatim block is rendered literally as it is typed (retaining all spaces and line breaks) and in a monospaced font (typically courier). In pandoc markdown, verbatim text blocks are specified by indenting a block of text by either four spaces or a tab character. Within verbatim text, regular pandoc markdown formatting rules (due to spaces etc) are ignored.

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

    a = rnorm(10,5,2)
    for (i in 1:10) {
        print(a[i])
    }

pdf (requires LaTeX)

pandoc -o example6.pdf 

html

pandoc -s -o example6.html 

word (docx)

pandoc --toc -o example6.docx 

Alternatively, verbatim blocks can be specified without indentation if the text block is surrounded by a row of three or more ~ characters. This format is often referred to as fenced code.

Focused example

Normal text

~~~~
a = rnorm(10,5,2)
for (i in 1:10) {
    print(a[i])
}
~~~~

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1
~~~~
a = rnorm(10,5,2)
for (i in 1:10) {
    print(a[i])
}
~~~~

pdf (requires LaTeX)

pandoc -o example7.pdf 

html

pandoc -s -o example7.html 

word (docx)

pandoc --toc -o example7.docx 


Lists

There are three basic list environments available within pandoc markdown:

  • Bullet lists - un-numbered itemized lists
  • Ordered lists - enumerated lists
  • Definition lists - descriptive lists

Bullet lists

A bullet list item begins with either a *, + or - character followed by a single space. Bullets can also be indented.

Focused example

Bullet list

* This is the first bullet item
* This is the second.  
  To indent this sentence on the next line,
    the previous line ended in two spaces and
    this sentence is indented by four spaces.
* This is the third item

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1
  * This is the first bullet item
  * This is the second.  
    To indent this sentence on the next line,
    the previous line ended in two spaces and
    this sentence is indented by four spaces.
  * This is the third item

pdf (requires LaTeX)

pandoc -o example8.pdf 

html

pandoc -s -o example8.html 

word (docx)

pandoc --toc -o example8.docx 

Ordered lists

An ordered list item begins with an enumerator followed by a space. The list enumerator can be a decimal number, a letter or a roman numeral. In addition to the enumerator, other formatting characters (such as a trailing period or enclosing parentheses) can be used to further define the format of the list numbering.

Focused example

Ordered list

1. This is the first numbered item.
2. This is the second.
1. This is the third item.  Note that the number I supplied is ignored

(i) This is a list with roman numeral enumerators
(ii) Another item

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1
  1. This is the first numbered item.
  2. This is the second.
  1. This is the third item.  Note that the number I supplied is ignored
# Section 2
  (i) This is a list with roman numeral enumerators
  (ii) Another item

pdf (requires LaTeX)

pandoc -o example9.pdf  

html

pandoc -s -o example9.html  

word (docx)

pandoc --toc -o example9.docx  

Note that only the value of the number used for the first item is considered. For subsequent list items, the values of the numbers themselves are ignored; they merely confirm that the list items have the same sort of enumerator.

Definition lists

Focused example

Definition list

Term 1
 :  This is the definition of this term

This is a phrase
 :  This is the definition of the phrase

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1
Term 1
 :  This is the definition of this term

This is a phrase
 :  This is the definition of the phrase

pdf (requires LaTeX)

pandoc -o example10.pdf

html

pandoc -s -o example10.html

word (docx)

pandoc --toc -o example10.docx 

Nesting and the four space rule

To include multiple paragraphs (or other blocked content) within a list item or nested lists, the content must be indented by four or more spaces from the main list item.

Focused example

Nested lists

1. This is the first numbered item.
2. This is the second.
    i) this is a sub-point
    ii) and another sub-point
1. This is the third item.  Note that the number I supplied is ignored

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1
1. This is the first numbered item.
2. This is the second.
    i) this is a sub-point
    ii) and another sub-point
1. This is the third item.  Note that the number I supplied is ignored

pdf (requires LaTeX)

pandoc -o example11.pdf 

html

pandoc -s -o example11.html 

word (docx)

pandoc --toc -o example11.docx

Ending lists

Normally, pandoc considers a list as complete when a blank line is followed by non-indented text (as markdown does not have starting and ending tags). However, if you wish to place indented text directly after a list, it is necessary to provide an explicit indication that the list is complete. This is done with the <!-- end of list --> marker.

Similarly, if you wish to place one list directly following on from another list, a <!-- --> marker must be used between the two lists so as to explicitly separate them.

Focused example

1. This is the first numbered item.
2. This is the second.
1. This is the third item.  Note that the number I supplied is ignored

<!-- -->

1. Another list.
2. With more points

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1
1. This is the first numbered item.
2. This is the second.
1. This is the third item.  Note that the number I supplied is ignored
<!-- -->
1. Another list.
2. With more points

pdf (requires LaTeX)

pandoc -o example12.pdf 

html

pandoc -s -o example12.html 

word (docx)

Not sure why this does not work for word…

pandoc -o example12.docx 


Tables

As markdown is a very minimalist markup language that aims to be reasonably well formatted even when read as plain text, table formatting must be defined by layout features that have meaning in plain text.

Table captions can be provided by including a paragraph that begins with either Table: or just :. Everything prior to the : will be stripped off during processing.

Simple tables

The number of columns as well as column alignment are determined by the relative positions of the table headings and dashed row underneath:

  • if the dashed line is flush with the end of the column header, yet extends to the left of the start of the header text, then the column will be right aligned
  • if the dashed line is flush with the start of the column header, yet extends to the right of the end of the header text, then the column will be left aligned
  • if the dashed line extends to the left of the start and right of the end of the header text, then the column will be center aligned
  • if the dashed line is flush with the start and end of the header text, then the column will follow the default justification (typically left justified)

The table must finish in either a blank line or a row of dashes mirroring those below the header followed by a blank row.

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

Table: A description of the table

Column A    Column B    Column C
---------  ----------  ---------
Category 1    High        100.00
Category 2    High         80.50
---------  ----------  ---------

pdf (requires LaTeX)

pandoc -o example13a.pdf 

html

pandoc -s -o example13a.html  

word (docx)

Note, simple tables do not render well in Libre Office. The DOCX thumbnail is generated by converting the DOCX to a png image using unoconv. As this is a command line tool that is part of the Libre Office family, the resulting thumbnail will not render the table correctly. The actual DOCX will nevertheless render fine within either Microsoft Word or WPS Office.

pandoc --toc -o example13a.docx  

Multiline tables

Simple tables can be extended to allow cell contents to span multiple lines. This imposes the following additional layout requirements:

  • the table must start with a row of dashes that spans the full width of the table
  • the table must end with a row of dashes that spans the full width of the table followed by a blank line
  • each table row must be separated by a blank line

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

Table: A description of the table

----------------------------------
Column A     Column B     Column C
----------  ----------  ----------
Category 1     High         100.00
               High          95.00

Category 2     High          80.50
               High          82.50
----------------------------------

pdf (requires LaTeX)

pandoc -o example13b.pdf 

html

pandoc -s -o example13b.html  

word (docx)

pandoc --toc -o example13b.docx  

Grid tables

Grid tables have a little more adornment in that they use characters to mark all the cell boundaries. However, by explicitly defining the bounds of a cell, grid tables permit more complex cell contents. A grid table for example, can contain a list or a code block etc.

Cell corners are marked by + characters and the table header and main body are separated by a row of = characters.

Markdown document

Markdown (*.md)

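---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

Table: A description of the table

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+
| Oranges       | $2.10         | - cures scurvy     |
|               |               | - tasty            |
+---------------+---------------+--------------------+

Table: Another table

+-----------+----------+-----------+
|Column A   |Column B  |   Column C|
+===========+==========+===========+
|Category 1 |100.00    | - point A |
|           |          | - point B |
+-----------+----------+-----------+
|Category 2 | 85.00    | - point C |
|           |          | - point D |
+-----------+----------+-----------+

pdf (requires LaTeX)

pandoc -o example13c.pdf 

html

pandoc -s -o example13c.html  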

word (docx)

pandoc --toc -o example13c.docx  

Although grid tables require substantially more setup, emacs users will welcome that grid tables are compatible with emacs table mode.

Pipe tables

Finally, there are also pipe tables. These are somewhat similar to grid tables in requiring a little more explicit specification of cell boundaries. However, unlike grid tables, they have a means to configure column alignment: cell alignment is specified via the use of : characters in the row that separates the header from the body (see example below). Nor is it necessary to indicate cell corners.

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

Table: A description of the table

| Default | left  | Center | Right  |
|---------|:------|:------:|-------:|
|   High  | Cat 1 | A      | 100.00 |
|   High  | Cat 2 | B      |  85.50 |
|   Low   | Cat 3 | C      |  80.00 |

pdf (requires LaTeX)

pandoc -o example13d.pdf 

html

pandoc -s -o example13d.html  

word (docx)

pandoc --toc -o example13d.docx  

Note, pipe tables do not render well in Libre Office. The DOCX thumbnail is generated by converting the DOCX to a png image using unoconv. As this is a command line tool that is part of the Libre Office family, the resulting thumbnail will not render the table correctly. The actual DOCX will nevertheless render fine within either Microsoft Word or WPS Office.
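Laying tables out by hand becomes tedious for anything beyond a few rows. As a hedged sketch of one alternative, the kable() function from the knitr package can generate pipe and simple tables from a data frame. The data frame below is an illustrative assumption loosely based on the example above.

```r
library(knitr)

# Illustrative data (hypothetical values mirroring the pipe table example)
tab <- data.frame(
  Category = c("Cat 1", "Cat 2", "Cat 3"),
  Level    = c("High", "High", "Low"),
  Value    = c(100.00, 85.50, 80.00)
)

# Pipe table: alignment is encoded with ':' characters in the separator row
pipe_tab <- kable(tab, format = "pipe", align = c("l", "c", "r"),
                  caption = "A description of the table")
print(pipe_tab)

# Simple table: columns are delimited by a row of dashes under the header
simple_tab <- kable(tab, format = "simple", align = c("l", "c", "r"))
print(simple_tab)
```

The printed output is plain markdown text, so it can be pasted into a document or, in an Rmarkdown chunk, emitted directly into the knitted markdown.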

Figures and images

Images are not displayed in plain text (obviously). However, an image link in pandoc markdown will insert the image into the various derivative document types (if appropriate). Image links are defined in a similar manner to other links, yet preceded immediately by a ! character.

![in text label](filename)
[label]: filename

Focused example

![Figure caption](AIMS_wq.jpg)

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

Include the JPEG figure

And a PNG figure

pdf (requires LaTeX)

pandoc -o example14.pdf

html

pandoc -s -o example14.html  

word (docx)

pandoc -o example14.docx  

Math and equations

Markdown leverages TeX math processing. Whilst this does technically break the rules that promote source documents that are readable in text only mode, the payoff is that math is rendered nicely in the various derivative documents (such as pdf or html). In fact, math is passed straight through to the derivative document, allowing that document (or its reader) to handle TeX math as appropriate.

Inline math is defined as anything within a pair of $ characters and for math in its own environment (paragraph), use a pair of $$ characters.

Focused example

The formula, $y=mx+c$, is displayed inline.

Some symbols and equations (such as 
$\sum{x}$ or $\frac{1}{2}$) are rescaled 
to prevent disruptions to the regular 
line spacing.
For more voluminous equations (such as 
$\sum{\frac{(\mu - \bar{x})^2}{n-1}}$), 
some line spacing disruptions are unavoidable.  
Math should then be displayed in display mode.
$$\sum{\frac{(\mu - \bar{x})^2}{n-1}}$$ 

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
---

# Section 1

The formula, $y=mx+c$, is displayed inline. 
Some symbols and equations (such as 
$\sum{x}$ or $\frac{1}{2}$) are rescaled 
to prevent disruptions to the regular 
line spacing.
For more voluminous equations (such as 
$\sum{\frac{(\mu - \bar{x})^2}{n-1}}$), 
some line spacing disruptions are unavoidable.  
Math should then be displayed in display mode.
$$\sum{\frac{(\mu - \bar{x})^2}{n-1}}$$

pdf (requires LaTeX)

pandoc -o example15.pdf  

html

pandoc --mathjax -s -o example15.html   

word (docx)

pandoc -o example15.docx   

Note, not all math is rendered correctly in Libre Office. The DOCX thumbnail is generated by converting the DOCX to a png image using unoconv. As this is a command line tool that is part of the Libre Office family, the resulting thumbnail will not render some of the equations correctly. The actual DOCX will nevertheless render fine within either Microsoft Word or WPS Office.


In addition to the above, there is a pandoc filter (pandoc-crossref) that supports cross referencing.

Citations (bibliography)

It is always important to cite the original source of an idea or finding, and an analysis document is no exception. It is often at the statistical development stage that methodological references are consulted. It is vital that any sources used are documented - particularly before they are misplaced or forgotten.

Pandoc can incorporate citations from any of the following formats: BibTeX (.bib), Copac (.copac), CSL JSON (.json), CSL YAML (.yaml), EndNote (.enl), Endnote XML (.xml), ISI (.wos), MEDLINE (.medline), MODS (.mods) and RIS (.ris).

For illustrative purposes, we will use the bibtex format. A bibtex database is a plain text file with specific tag pairs to define the different components (fields) of a reference. To illustrate, let's generate a bibtex database with two references.

@article{Bolker-2008-127,
    author = {Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, 
        J. R. and Stevens, H. H. and White, J. S.},
    journal = {Trends in Ecology and Evolution},
    number = {3},
    pages = {127-135},
    title = {Generalized linear mixed models: a practical guide for ecology and evolution},
    volume = {24},
    year = {2008}
}

    author    = "Michel Goossens and Frank Mittlebach and Alexander Samarin",
    title     = "The LaTeX Companion",
    year      = "1993",
    publisher = "Addison-Wesley",
    address   = "Reading, Massachusetts"

The bibtex format is one of the major formats provided for export by journals and databases like citeulike and crossref.

The bibliography can be referenced either via a bibliography item in the YAML metadata or using the --bibliography argument to pandoc. This points to a file containing the bibliography.

Similarly, the citation style is determined via either the csl YAML metadata item or the --csl pandoc argument and should point to a Citation Style Language file. A large selection of CSL files can be found in the Zotero Style Repository. Additional styles can be found in a repository of CSL 1.0 styles.

Incorporating citations requires citation processing (the --citeproc option in pandoc 2.11 and later, or the pandoc-citeproc filter in earlier versions), and this must be included after the pandoc-crossref filter (if that filter is used).

To include a citation, prepend an at sign (@) to the citation identifier. For example, @Bolker-2008-127. If the item is enclosed in square brackets, the associated in-text citation will be enclosed in round brackets in the output. Additional text can also be included within the square brackets. For example, [see @Bolker-2008-127].

If you are using Rstudio and have the citr package loaded, then, if you select ‘Insert citations’ from the ‘Addins’ toolbar icon, you will be able to search your bibtex bibliography and selected references will be added to your document.

Markdown document

Markdown (*.md)

---
title: This is the title
author: D. Author
date: 14-02-2013
bibliography: ../resources/references.bib
csl: ../resources/marine-pollution-bulletin.csl
---

# Introduction {#sec:intro}

@Quinn-2002-2002 described something important about ecological statistics in general.
Something important about generalized mixed models [@Bolker-2008-127].

# References

pdf (requires LaTeX)

pandoc -F pandoc-crossref --citeproc --number-sections -o example19.pdf     

html

pandoc -F pandoc-crossref --citeproc --number-sections -s -o example19.html           

word (docx)

pandoc -F pandoc-crossref --citeproc --number-sections -o example19.docx          
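For readers who prefer to stay within R, the following is a minimal sketch of the same citation workflow. It assumes the pandoc_convert() function from the rmarkdown package (whose citeproc argument switches on citation processing), hypothetical file names in a temporary directory, and the default citation style (no csl file). It omits the pandoc-crossref filter, and the conversion is only attempted when pandoc 2.11 or later is available.

```r
library(rmarkdown)

work_dir <- tempdir()
bib_file <- file.path(work_dir, "references.bib")
md_file  <- file.path(work_dir, "example19.md")

# A one-entry bibtex database (the Bolker reference from above)
writeLines(c(
  "@article{Bolker-2008-127,",
  "  author  = {Bolker, B. M. and Brooks, M. E. and Clark, C. J. and Geange, S. W. and Poulsen, J. R. and Stevens, H. H. and White, J. S.},",
  "  journal = {Trends in Ecology and Evolution},",
  "  number  = {3},",
  "  pages   = {127-135},",
  "  title   = {Generalized linear mixed models: a practical guide for ecology and evolution},",
  "  volume  = {24},",
  "  year    = {2008}",
  "}"
), bib_file)

# A markdown document that cites the entry; the bibliography path is
# relative to the document's own directory
writeLines(c(
  "---",
  "title: This is the title",
  "bibliography: references.bib",
  "---",
  "",
  "Something important about generalized mixed models [@Bolker-2008-127].",
  "",
  "# References"
), md_file)

# Convert with citation processing switched on
html_file <- file.path(work_dir, "example19.html")
if (pandoc_available("2.11")) {
  pandoc_convert(md_file, to = "html", output = html_file,
                 citeproc = TRUE, options = "--standalone")
}
```

The formatted reference list is appended to the end of the document, which is why the source finishes with an empty References heading.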



Ideally, reproducible research works best when the documentation and source code are woven together into a single document. Traditionally, document preparation involved substantial quantities of ‘cutting and pasting’ from statistical software into document authoring tools such as LaTeX, html or Microsoft Word. Of course, any minor changes in the analyses then necessitated replacing the code in the document as well as replacing any affected figures or tables. Keeping everything synchronised was a bit of a battle.

Early implementations of reproducible research in R involved embedding chunks of R code between special tags within either HTML or LaTeX documents. The file would then be parsed through specific R functions to evaluate each chunk and replace them with their tidied code and outputs in a process referred to as either weaving or knitting (depending on the function).

Over time the knitting routines (as supported by the knitr package) became more sophisticated. At the same time, knitr provided support for embedding R chunks into markdown. Here, markdown has begun to replace HTML and LaTeX as the base document because (as we illustrate above) it is both simple to use and can act as a universal language from which other formats can be generated.

Rmarkdown is essentially a markdown file with R (or many other languages) code embedded within specially marked chunks. Code chunks start with the sequence ```{ and end with ```. For example, to define a simple R code chunk, we would include:

```{r name}

```
Any code that appears in the lines between the opening and closing chunk sequences will be evaluated by R. Similarly, other languages can also be used.

Importantly, to evaluate the code chunks embedded within an Rmarkdown document, the code is passed through a new R session. This means that although you might be testing the code in an R console (or Rstudio) as you write the code, it is important that the code be completely self contained. Therefore, if the code relies on a package or external function, these must be loaded as part of the script.

To see knitting in action, we will add an R code chunk to a markdown document. When we knit this document, knitr will convert the Rmarkdown file into a markdown file by evaluating any code chunks and replacing them with formatted input and output markdown fenced contents. Thereafter, we can use pandoc as we did previously to convert this markdown into a variety of output formats.

Rmarkdown document

Rmarkdown (*.Rmd)

---
title: Example markdown
author: D. Author
date: 16-06-2020
---

# This is the title

```{r summary, eval=TRUE, results='markup'}
x <- rnorm(10)
```

pdf (requires LaTeX)

echo 'library(knitr); knit("Example1.Rmd", output="")' | R --no-save --no-restore
pandoc -o Example1.pdf  

html

echo 'library(knitr); knit("Example1.Rmd", output="")' | R --no-save --no-restore
pandoc -s -o Example1.html           

word (docx)

echo 'library(knitr); knit("Example1.Rmd", output="")' | R --no-save --no-restore
pandoc -s -o Example1.docx           
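The knitting step can also be run from within an R session rather than from the shell. The sketch below assumes hypothetical file locations in a temporary directory and adds a summary(x) line (not part of the example above) so that the chunk produces some visible output. The code fence is built with strrep() purely to avoid nesting literal backticks inside this example.

```r
library(knitr)

fence    <- strrep("`", 3)   # three backticks, the chunk delimiter
rmd_file <- file.path(tempdir(), "Example1.Rmd")
md_file  <- file.path(tempdir(), "Example1.md")

# Write the Rmarkdown source
writeLines(c(
  "---",
  "title: Example markdown",
  "author: D. Author",
  "date: 16-06-2020",
  "---",
  "",
  "# This is the title",
  "",
  paste0(fence, "{r summary, eval=TRUE, results='markup'}"),
  "x <- rnorm(10)",
  "summary(x)",   # illustrative addition so there is output to display
  fence
), rmd_file)

# Knit: evaluate the chunk and write plain markdown
knit(rmd_file, output = md_file, quiet = TRUE)

# Inspect the result: the chunk is now a fenced code block plus its output
md_lines <- readLines(md_file)
cat(md_lines, sep = "\n")
```

The resulting markdown file contains no executable chunks, so it can be handed to pandoc exactly as in the earlier examples.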

The above workflow is conveniently supported by an R package called rmarkdown whose main function is to act as a wrapper for knitting and running pandoc. As a very basic overview, the following would render an Rmarkdown document as a pdf file.

rmarkdown::render('file.Rmd', output_format='pdf_document')

The rmarkdown package comes with numerous output formats. These include:

| Output format           | rmarkdown name                     |
|-------------------------|------------------------------------|
| PDF                     | pdf_document (requires TeX)        |
| HTML                    | html_document                      |
| DOCX                    | word_document                      |
| LaTeX                   | latex_document                     |
| ODT                     | odt_document                       |
| RTF                     | rtf_document                       |
| Github                  | github_document                    |
| Context                 | context_document                   |
| Markdown                | md_document                        |
| ioslides presentation   | ioslides_presentation              |
| Slidy presentation      | slidy_presentation                 |
| Powerpoint presentation | powerpoint_presentation            |
| Beamer presentation     | beamer_presentation (requires TeX) |

Additionally, the bookdown package contains versions of many of these formats that provide support for more advanced features (such as captions etc). These will be highlighted below where appropriate.

To illustrate the basic use of the render() function, let's process the above simple example file.

Rmarkdown document

Rmarkdown (*.Rmd)

---
title: Example markdown
author: D. Author
date: 16-06-2020
---

# This is the title

```{r summary, eval=TRUE, results='markup'}
x <- rnorm(10)
```

pdf (requires LaTeX)

render('Example1.Rmd', output_format='pdf_document')

html

render('Example1.Rmd', output_format='html_document')

word (docx)

render('Example1.Rmd', output_format='word_document')
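When several formats are needed from the same source, render() also accepts a vector of format names. The sketch below assumes the rmarkdown package and pandoc are installed (it silently skips rendering otherwise); the temporary file and the `summary(x)` line are illustrative assumptions, not part of the example above.

```r
# Illustrative only: a small Rmarkdown file in a temporary directory
rmd <- file.path(tempdir(), "Example1.Rmd")
writeLines(c(
  "---",
  "title: Example markdown",
  "---",
  "",
  "# This is the title",
  "",
  "```{r summary}",
  "x <- rnorm(10)",
  "summary(x)",
  "```"
), rmd)

# Formats that do not require a LaTeX installation
formats <- c("html_document", "word_document")

outputs <- character(0)
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # One call, one output file per requested format
  outputs <- rmarkdown::render(rmd, output_format = formats, quiet = TRUE)
}
outputs
```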

Code chunks

As a minimum, it is advisable that each chunk be given a unique name (summary in the example above). Numerous additional arguments (options) can be included in the chunk header. These control the behaviour of the knitting process; the common ones are listed in the following table.

Option Description
Code evaluations
eval
  • TRUE or FALSE (whether to evaluate the code)
  • a vector of numbers (indicating which lines to evaluate)
include TRUE or FALSE (whether to include the evaluated chunk in the output)
engine the language used to evaluate the code chunk (default is ‘R’)
code code to replace the code in the chunk
child a character vector of (.Rmd) filenames to be evaluated and substituted in place of the chunk
echo
  • TRUE or FALSE (whether to output the code)
  • a vector of numbers (indicating which lines of code to output)
results
  • ‘markup’ - output in the format of the surrounding document
  • ‘asis’ - output as raw (verbatim) output
  • ‘hold’ - defer the output of individual outputs until the chunk end
  • ‘hide’ - hide the output (not warnings or errors)
warning
  • TRUE or FALSE (whether to include warnings)
  • a vector of numbers (indicating which warnings to include)
error
  • TRUE or FALSE (whether to include errors)
  • a vector of numbers (indicating which errors to include)
message
  • TRUE or FALSE (whether to include messages)
  • a vector of numbers (indicating which messages to include)
Code decoration
tidy
  • TRUE or FALSE (whether to reformat the code)
  • ‘styler’ to use styler package for reformatting
tidy.opts a list of options passed on to the tidying function. For example, tidy.opts=list(width.cutoff=60)
prompt TRUE or FALSE (whether to include a command prompt as a prefix to each line of code)
comment the comment character used as a prefix to each output line (e.g. ##)
highlight TRUE or FALSE (whether to apply syntax highlighting to the code)
size the font size for code and output (only some document types)
strip.white TRUE or FALSE (whether to remove leading spaces from code in output)
background the color of the code and output background (only some document types)
cache TRUE or FALSE (whether to cache the chunk). Don’t cache chunks that load packages
dependson a character vector of chunk names that this chunk depends on for the purpose of caching
fig.width,fig.height the width and height (in inches) of generated plots
out.width,out.height the width and height to resize the plots in the output
fig.cap a character string to use as a figure caption
fig.align ‘default’, ‘left’, ‘right’, ‘center’ - alignment of plot on page
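Rather than repeating the same options in every chunk header, defaults can be set once for the whole document, typically in an initial setup chunk. The following is a small sketch of that idea using knitr's `opts_chunk` object; the particular values chosen here are arbitrary illustrations, not recommendations from the text.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Document-wide defaults; individual chunk headers can still override them
  knitr::opts_chunk$set(
    echo = FALSE,      # do not show code
    message = FALSE,   # suppress messages
    fig.width = 5,     # plot width in inches
    fig.height = 4     # plot height in inches
  )

  # Inspect the values now in effect
  str(knitr::opts_chunk$get(c("echo", "message", "fig.width", "fig.height")))
}
```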

Knitting tables

Tables deserve and require special treatment here. There are numerous routines (and packages) to support the elegant production of tables from R. Many of these routines are tied to a particular output format, or else require you to nominate the output format when calling the function.

That said, the kable function (which is part of the knitr package) is able to be relatively agnostic since it defaults to output in markdown table format. However, markdown only supports very simple tables and thus if a table is forced to go through a markdown funnel, many attributes (such as colours, multicolumn headers, captions and labels) are lost.
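As a quick illustration of this default behaviour, the sketch below calls kable on the built-in BOD data set, first with all defaults and then with an explicitly requested LaTeX format. It assumes only that the knitr package is installed; the caption text is an arbitrary example.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # With defaults, kable emits a plain markdown-style table
  tab <- knitr::kable(BOD, caption = "Biochemical oxygen demand")
  print(tab)

  # Format-specific features (here booktabs rules) must be requested explicitly
  tab_latex <- knitr::kable(BOD, format = "latex", booktabs = TRUE)
  print(tab_latex)
}
```

Comparing the two printed results makes the trade-off visible: the first is portable across output formats, the second is only meaningful for LaTeX-derived output.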

To illustrate this, let's render a simple Rmarkdown table to PDF, HTML and DOCX formats just using the kable function with all defaults. For this example, we will use a built-in data set (BOD - biochemical oxygen demand).

title: This is the title
author: D. Author
date: 14-02-2013

```{r BODData}




Notwithstanding issues related to converting docx to png thumbnails (via unoconv), the simple tables are produced in formats that are broadly appropriate to the document type.

There are unfortunately a number of limitations to this simple approach to including tables:

  • there is not full support for referencing tables
  • table formatting must be relatively simple

Better support for referencing is provided by the bookdown package. This package has richer versions of many of the output formats defined in the rmarkdown package; these are distinguished from the rmarkdown versions by a trailing 2. For example, the bookdown version of a pdf document is called pdf_document2. The bookdown package also has additional formats that relate to online books etc.

If we only intend to output our document in a single format, then we can achieve more complex table formatting via specific packages (such as xtable for LaTeX tables), or activate output specific options to broader packages/functions (such as kable).

To illustrate the extended capabilities of the bookdown package for tables, we will again use the BOD data set. We will use a slightly different Rmarkdown for each of the pdf, html and docx output formats.

Note that in the following examples, I will intentionally set echo=FALSE so as to exclude the chunk code from the output.


pdf

For pdf, we will include more metadata (to specify the XeLaTeX engine, the article document class and Arial font for the main body). We have also included an initial R chunk in which we load a few packages (the latter two for supporting the tables). The code within both chunks is suppressed from the output.

title: The title
    latex_engine: xelatex
    toc: no
documentclass: article
mainfont: Arial

```{r packages, message=FALSE, echo=FALSE}

# Section 1

Bla bla (see Table \@ref(tab:BODData)).

(ref:tab-cap) Biochemical oxygen demand

```{r BODData, echo=FALSE, tab.pos='h'}
kable(BOD, caption="(ref:tab-cap)", format='latex', booktabs=TRUE) %>%
  kable_styling(latex_options = "HOLD_position")
```

render('Example2_1.Rmd', output_format='bookdown::pdf_document2')

html


title: The title
    toc: no

# Section 1

Bla bla (see Table \@ref(tab:BODData)).

(ref:tab-cap) Biochemical oxygen demand

```{r BODData, echo=FALSE}
knitr::kable(BOD, caption="(ref:tab-cap)", format='html', booktabs=TRUE)
```

render('Example2_2.Rmd', output_format='bookdown::html_document2')

word (docx)

Word tables are best serviced via flex tables. These are supported by the flextable package.

title: The title
    toc: no

# Section 1

Bla bla (see Table \@ref(tab:BODData)).

(ref:tab-cap) Biochemical oxygen demand

```{r BODData, echo=FALSE, tab.cap='(ref:tab-cap)'}
flextable::flextable(BOD) %>%
    flextable::fontsize(size=8, part='all')
```

render('Example2_3.Rmd', output_format='bookdown::word_document2')

If there is a need to have a single Rmarkdown source yield multiple output formats (e.g. pdf, html and docx), we can make the table code conditional on the output format (using a function in knitr called pandoc_to()).

Markdown document

Rmarkdown (*.Rmd)

title: The title
    toc: no
    toc: no
    latex_engine: xelatex
    toc: no
geometry: paperwidth=12cm,paperheight=15cm,hmargin=1cm,vmargin=1cm
documentclass: article
mainfont: Arial

```{r packages, message=FALSE, echo=FALSE}

# Section 1

Bla bla (see Table \@ref(tab:BODData)).

(ref:tab-cap) Biochemical oxygen demand

```{r BODData, echo=FALSE, tab.cap='(ref:tab-cap)'}
if (knitr:::pandoc_to('latex')) {
 kable(BOD, caption="(ref:tab-cap)", format='latex', booktabs=TRUE) %>%
    kable_styling(latex_options = "HOLD_position")
} else if (knitr:::pandoc_to('html')) {
 kable(BOD, caption="(ref:tab-cap)", format='html', booktabs=TRUE)
} else if (knitr:::pandoc_to('docx')) {
 flextable(BOD) %>%
    fontsize(size=8, part='all')
}
```

pdf

render('Example3.Rmd', output_format='bookdown::pdf_document2')

html

render('Example3.Rmd', output_format='bookdown::html_document2')

word (docx)

render('Example3.Rmd', output_format='bookdown::word_document2')

Knitting figures from code

Markdown document

Rmarkdown (*.Rmd)

title: The title
    toc: no
    toc: no
    latex_engine: xelatex
    toc: no
geometry: paperwidth=12cm,paperheight=15cm,hmargin=1cm,vmargin=1cm
documentclass: article
mainfont: Arial

```{r packages, message=FALSE, echo=FALSE}

# Section 1

Bla bla (see Figure \@ref(fig:BODfig)).

(ref:fig-cap) Biochemical oxygen demand

```{r BODfig, echo=FALSE, out.width='60%', fig.cap='(ref:fig-cap)'}
ggplot(BOD) +
 geom_point(aes(y=demand, x=Time))
```

pdf

render('Example4.Rmd', output_format='bookdown::pdf_document2')

html

render('Example4.Rmd', output_format='bookdown::html_document2')

word (docx)

render('Example4.Rmd', output_format='bookdown::word_document2')

Knitting images

Although it is possible to directly add external images to a markdown (Rmarkdown) document as illustrated above, only LaTeX derived outputs (such as pdf) will include a caption and allow referencing.

As an alternative, we can include external images using the include_graphics() function in the knitr package. Not only does this permit us to add a caption (via the chunk option fig.cap=), we can also specify the output size of the figure.
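A minimal sketch of such a call is given below. The image file is generated on the fly purely so that the example is self-contained (the file name and plot are assumptions for illustration); in a real document the path would point at an existing image, and the call would sit inside a chunk whose header carries options such as fig.cap= and out.width=.

```r
# Create a throwaway image so the example has something to include
img <- file.path(tempdir(), "example-image.png")
if (capabilities("png")) {
  png(img, width = 480, height = 360)
  plot(demand ~ Time, data = BOD)
  dev.off()
}

# Inside an Rmarkdown chunk, this call is what inserts the image;
# the caption and size come from the chunk options, not from the call itself
if (file.exists(img) && requireNamespace("knitr", quietly = TRUE)) {
  knitr::include_graphics(img)
}
```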

Similar to knitting tables above, we will illustrate adding external images via an Rmarkdown that supports multiple output formats.

Markdown document

Rmarkdown (*.Rmd)

title: The title
    toc: no
    toc: no
    latex_engine: xelatex
    toc: no
geometry: paperwidth=12cm,paperheight=15cm,hmargin=1cm,vmargin=1cm
documentclass: article
mainfont: Arial

```{r packages, message=FALSE, echo=FALSE}

# Section 1

Bla bla (see Figure \@ref(fig:AIMSwq)).

(ref:fig-cap) Biochemical oxygen demand

```{r AIMSwq, echo=FALSE, out.width='60%', fig.cap='(ref:fig-cap)'}


render('Example5.Rmd', output_format='bookdown::pdf_document2')   


render('Example5.Rmd', output_format='bookdown::html_document2')     

word (docx)

render('Example5.Rmd', output_format='bookdown::word_document2')

Changing the look

Recall that the metadata of a markdown file is defined in a special YAML block (which is normally, yet not necessarily, positioned at the top of the markdown file). In the examples above, the YAML block was used to define the title, an author, date and bibliography, and to specify some settings associated with the broad type of output (PDF, HTML, Word).

The YAML block is also used to determine the style and formatting of the output. In this section, we will explore some of the major customisations available from small changes to the YAML block.

The following table lists the main simple customizations that can be applied and which document type they can be applied to. In the table, yes/no are equivalent to true/false. Values in square brackets are alternatives to the default values. I have purposely used settings (in the Options column) that contrast the default settings so that the effects are very obvious. They do not constitute any form of recommendation.

Option Description PDF HTML Word (docx)
toc: yes Include a table of contents (default: no) ✔ ✔ ✔
toc_depth: 2 Depth of the table of contents (default: 3) ✔ ✔ ✔
number_sections: yes Number the sections (default: no) ✔ ✔ ✖
fig_width: 5 Width of figures in inches (default: 6) ✔ ✔ ✔
fig_height: 5 Height of figures in inches (default: 4.5) ✔ ✔ ✔
df_print: kable Processing of data.frames (default: default) [kable,tibble,paged] ✔ ✔ ✔
highlight: zenburn R code syntax highlighting (default: default) [tango, pygments, kate, monochrome, espresso, zenburn, haddock, textmate] ✔ ✔ ✔
code_folding: hide Hide/reveal code blocks in output (default: none) [show,hide] ✖ ✔ ✖
theme: spacelab A pre-packaged page theme (default: default) [cerulean, journal, flatly, readable, spacelab, united, cosmo, lumen, paper, sandstone, simplex, yeti] ✖ ✔ ✖
latex_engine: xelatex The LaTeX engine to use (default: pdflatex) [xelatex, lualatex] ✔ ✖ ✖

Let's now explore the effect of the above customizations on the output of a very simple Rmarkdown document. As with the previous example, the YAML header will include options for all three output formats (pdf, html and docx). If we were to render an Rmarkdown document with the YAML block below via the render() function (and without providing the output_format argument), the underlying engine will attempt to generate a PDF file (since the pdf_document is the first one defined in the YAML block). However, if we do specify a different output_format argument, the associated customizations will be applied when rendering the document. The customizations can be different for each of the document types (PDF, HTML and Word).

Markdown document

Rmarkdown (*.Rmd)

title: The title
    toc: yes
    toc_depth: 2
    number_sections: yes
    fig_width: 5
    fig_height: 5
    highlight: zenburn
    df_print: kable
    code_folding: hide
    theme: spacelab
    toc: yes
    toc_depth: 2
    number_sections: yes
    fig_width: 5
    fig_height: 5
    highlight: zenburn
    df_print: kable 
    latex_engine: xelatex
    toc: yes
    toc_depth: 2
    number_sections: yes
    fig_width: 5
    fig_height: 5
    highlight: zenburn
    df_print: kable
documentclass: article
mainfont: Arial

# Section 1
Text with embedded R code.

```{r Summary}
```{r head}

## Subsection 1                     

We can include a figure

```{r Plot, fig.cap='Gaussian density.',message=FALSE}
data.frame(x=rnorm(1000)) %>%
    ggplot(aes(x=x)) + geom_density()
```

Perhaps even a table

```{r table}
kable(summary(cars), caption='Summary of the cars data set')
```

pdf

render('Example6.Rmd', output_format='bookdown::pdf_document2')

html

render('Example6.Rmd', output_format='bookdown::html_document2')

word (docx)

render('Example6.Rmd', output_format='bookdown::word_document2')

In addition to the above settings that can be applied to different output formats and yet are specified as third-level arguments (that is under each document type), there are also settings that can be applied to multiple output document formats yet that are applied as top-level arguments. These are listed in the following table and example outputs.

Option Description
fontsize: 10pt Document font size [10pt, 11pt, 12pt]
abstract: This is the abstract… An abstract

Although Rmarkdown (along with knitr and pandoc) is the Swiss Army knife of document preparation, in that the same source document can be used to produce multiple output formats, each of the three formats serves a different purpose. As such, it is often desirable to customize the outputs to make best use of each format.

  • PDF documents provide a good way to communicate research in a controlled and universal manner. In this render workflow, PDF documents are created via LaTeX. Styling in LaTeX is controlled via macros defined in the document preamble. Hence, major customizations are brought about by dictating changes to the preamble.

  • HTML documents provide for the presentation of dynamic content. Styling in HTML documents is largely controlled via style sheets and/or scripts (particularly javascript).

  • Word documents provide a document format that is accessible to non-scientists. Styling in Word is provided by the element styles defined in the document itself - which are often imposed by a template.

Since styling in each of these formats is quite different and as some of these customizations can get extensive, we will treat each document type separately in the following sections.

PDF documents

The styling information in LaTeX is defined in the document preamble. A markdown LaTeX template is essentially a skeleton LaTeX document with an extensive collection of conditional statements that allow the preamble to be customized based on the options indicated in the markdown YAML header.

The preamble of a LaTeX document defines a document class followed by a series of macros (similar to constants and functions in other languages) that define the settings and styling of the document.

When pandoc converts a markdown document into a LaTeX document, it accesses a template which is used to generate the preamble for the LaTeX document. The template system is designed so that information specified in the YAML block of the markdown document can be used to govern certain aspects of the resulting LaTeX preamble.

The following table lists some of the customizations that can be included as top-level (not indented) YAML metadata for PDF output. Note that these can be present when rendering to other formats; they will just be ignored.

Option Description
documentclass: book LaTeX document class (default: article, minimal) [book]
classoption: a4paper Options for the documentclass [oneside,a4paper]
geometry: margin=1cm Options for the geometry package
fontsize: 11pt The size of the base font (default: 12pt)
fontfamily: mathpazo Document fonts (only available when using pdflatex)
mainfont: Arial Document fonts (only available when using xelatex)
sansfont: Arial Document fonts (only available when using xelatex)
monofont: Arial Document fonts (only available when using xelatex)
mathfont: Arial Document fonts (only available when using xelatex)
linkcolor: blue Colour of internal links
urlcolor: blue Colour of external links
citecolor: blue Colour of citation links

A list of the system fonts available to LaTeX can be obtained by issuing the following command (in a terminal):

fc-list --format="%{family[0]}\n" | sort | uniq

Alternatively, within R:
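One possible way to do this, sketched here under the assumption that the systemfonts package is installed (this is an illustrative choice and not necessarily the approach originally intended), is to query the font families registered on the system:

```r
if (requireNamespace("systemfonts", quietly = TRUE)) {
  # system_fonts() returns one row per font file; keep the unique family names
  fonts <- sort(unique(systemfonts::system_fonts()$family))
  head(fonts, 20)
}
```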

Example output
Let's try it:

Rmarkdown (*.Rmd)

title: The title
    latex_engine: xelatex
    toc: yes
    toc_depth: 2
    number_sections: yes
    fig_width: 5
    fig_height: 5
    highlight: zenburn
    df_print: kable
documentclass: article
classoption: a4paper
mainfont: Arial
mathfont: LiberationMono
monofont: DejaVu Sans Mono
abstract: This is the abstract
urlcolor: red

# Section 1
Text with embedded R code.

```{r Summary}
```{r head}
y_i = \beta_0 + \beta_1 x_i

## Subsection 1                     
We can include a figure

```{r Plot, fig.cap='Gaussian density.',message=FALSE}
data.frame(x=rnorm(1000)) %>%
    ggplot(aes(x=x)) + geom_density()
```

Perhaps even a table

```{r table}
kable(summary(cars), caption='Summary of the cars data set', booktabs=TRUE)
```

render('Example7.Rmd', output_format='bookdown::pdf_document2')

In the example, note that although we had nominated for kable output when printing dataframes, this does not by default apply things like captions and booktabs formatting. Hence notice in the above example that the first table (resulting from a call to head) has not been fully formatted and does not have a caption whereas the second table does have these features.


More wholesale style changes are supported by the use of templates. Whilst pandoc comes with a default LaTeX template, it is possible to create your own so as to provide even greater control over the output style. To illustrate, we will create a very minimal pandoc template for LaTeX (let's call it latex.template).

pandoc replaces specific tags (words enclosed in a pair of $ signs) with particular items (such as the document body or YAML options). In the following example, pandoc will replace the $documentclass$ with whatever is specified in the YAML header and $body$ with the markdown document body.
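To make the idea concrete, the following sketch writes a bare-bones template from R. The template body shown is an assumption about what a minimal template might look like (it uses only the $documentclass$ and $body$ tags described above), and it is written to a temporary directory here; in practice the file would sit alongside the Rmarkdown document. Such a stripped-down preamble defines none of the environments pandoc uses for syntax highlighting, so it is only likely to work with highlighting disabled.

```r
# A deliberately minimal pandoc template for LaTeX (illustrative sketch)
template <- c(
  "\\documentclass{$documentclass$}",  # replaced by the YAML documentclass value
  "\\begin{document}",
  "$body$",                            # replaced by the converted document body
  "\\end{document}"
)

# Written to a temporary location for demonstration purposes
template_path <- file.path(tempdir(), "latex.template")
writeLines(template, template_path)

# Show what was written
cat(readLines(template_path), sep = "\n")
```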




    template: latex.template
    latex_engine: xelatex
    highlight: null
documentclass: article

Text with embedded R code.

```{r head}

y_i = \beta_0 + \beta_1 x_i
render('Example7a.Rmd', output_format='bookdown::pdf_document2')         

In this way it is possible to construct a complete template so as to have full control over the output document style.

Packaged templates

Writing LaTeX code, and thus LaTeX templates, is not easy for those not familiar with the language. Fortunately there are numerous templates available, the most popular of which are:

The rticles package provides additional Rmarkdown pandoc templates as well as wrapper functions to help put all the template and associated files in the correct location. Furthermore, if the rticles package is installed, then RStudio will have additional templates available from the New R Markdown dialog box.

rticles templates

The same can be completed in script using the draft function from the rmarkdown package.

rmarkdown::draft(file='../resources/plos.Rmd', template='plos', package='rticles', edit=FALSE)

rmarkdown::draft(file='../resources/pnas.Rmd', template='pnas', package='rticles', edit=FALSE)

rmarkdown::draft(file='../resources/elsevier.Rmd', template='elsevier', package='rticles', edit=FALSE)





HTML documents

Unlike PDF and Word documents, HTML documents have the potential to be more interactive and dynamic. This allows the presentation of content to be more flexible. Consequently, there are numerous customizations that can be applied to HTML documents that cannot be applied to other document types.

There are a number of additional second-level YAML arguments that can be used to customize HTML documents - these are listed in the following table.

Option Description
toc_float: yes Float the table of contents in a panel on the side of the document. Other options are also available (see below)
collapse: yes Collapse the table of contents to just the first level headings
smooth_scroll: yes Animate page scrolling when items selected from table of contents
self_contained: yes Whether to generate a self contained (stand alone) HTML (other than Mathjax)

Tabbed sections

One way to present content in a compact manner without excluding material is to keep content hidden until it is revealed. This behaviour is supported via tabsets. A tabset is defined by adding a .tabset class attribute to a heading. All subheadings under this heading will then be rendered as tabs.

title: This is the title
author: D. Author
date: 14-02-2013
    toc: yes
    toc_depth: 2
    number_sections: yes
    fig_width: 5
    fig_height: 5
    df_print: kable
    highlight: zenburn
    code_folding: hide
    theme: spacelab
    toc_float: yes
    collapse: no
## Output {.tabset .tabset-fade}
### Figure
We can include a figure

```{r Plot, fig.cap='Gaussian density.',message=FALSE}
data.frame(x=rnorm(1000)) %>%
    ggplot(aes(x=x)) + geom_density()
```

### Table

```{r table}
kable(summary(cars), caption='Summary of the cars data set')
```

render('Example8a.Rmd', output_format='bookdown::html_document2')

To break a tabset, begin a new parent section. For example, in the above example (in which a tabset is defined on the ## Output subsection), we could break the tabs (to allow more content after the tabs) by starting a new blank subsection heading (e.g. ##). To ensure that this blank or ghost section is not included in the table of contents (or any heading formatting), we can include {.unlisted .unnumbered} attributes.

Style sheets

The general look and style of a HTML document is controlled via Cascading Style Sheets (CSS). A CSS comprises one or more files that define the styling of the HTML elements. Rmarkdown has a default HTML template and style sheet that are used by pandoc when converting markdown documents to HTML documents.

It is possible to provide alternate templates and CSS. For example, to nominate an additional CSS that will be applied to the default template after the default CSS file, we can use the css: second-level argument. Since this additional CSS is applied after the default CSS, any styles defined within the additional CSS will take precedence.

For a simple example, if we wanted all the Section headings to be a light blue colour and the title to be the same blue colour as the dropdown menu background, we could define the following:


h1, h2, h3 {
    color: lightblue;
}
h1.title {
    color: #446e9b;
}


title: This is the title
author: D. Author
date: 14-02-2013
    toc: yes
    toc_depth: 2
    number_sections: yes
    fig_width: 5
    fig_height: 5
    df_print: kable
    highlight: zenburn
    code_folding: hide
    theme: spacelab
    toc_float: yes
    collapse: no
    css: my-style.css

# Section 1
Text with embedded R code.

```{r Summary}
```{r head}

## Output {.tabset .tabset-fade}
### Figure
We can include a figure

```{r Plot, fig.cap='Gaussian density.',message=FALSE}
data.frame(x=rnorm(1000)) %>%
    ggplot(aes(x=x)) + geom_density()
```

### Table

```{r table}
kable(summary(cars), caption='Summary of the cars data set')
```

render('Example8b.Rmd', output_format='bookdown::html_document2')