Containerisation - Docker and Singularity

Author

Murray Logan

Published

February 17, 2025

1 Background information

In the previous tutorial, we discussed why it is important that your codebase is under version control and backed up, to help ensure that your analyses can be evaluated and replicated by you and others, both now and into the future. However, having access to the code (and data) does not always guarantee full reproducibility - reproducibility can also be affected by the exact software environment in which the code is run.

In the context of statistical analyses performed in R, for example, R as well as the various packages that you have elected to use in support of your analyses can (and do) evolve over time. Some functions get modified and some even get deprecated. Hence, over time, code that once worked perfectly (or at least adequately) can become broken.

Early solutions to this facet of reproducibility focused on virtual machines. A virtual machine (VM) builds an entire software environment on top of a software layer that mimics a physical computer, such that each VM running on a host computer is a completely separate, self-contained entity. Whilst VMs do permit great flexibility (as virtually any operating system can be installed on a VM), they are considerably slower and less efficient than physical machines. Moreover, it is typically necessary to allocate a fixed amount of computer resources (particularly CPU) to the VM in advance.

More modern solutions focus instead on containers. In contrast to VMs, containers do not mimic a physical computer; rather, they only virtualise layers on top of the host operating system. Indeed, containers share (read only) the host OS kernel and binaries/libraries, and thus containers and the applications contained therein can be very “light” and are typically almost as performant as applications run natively on the host.

Time for some container terminology:

  • Container image is a static (unchangeable) file (or collection of files) that bundles code and all its dependencies (such as the necessary system libraries, code, runtime and system tools). Essentially, the image has all the information required to reproduce a software environment on any compatible machine. However, an image is just a snapshot that serves as a template from which to build a container. In other words, a container is a running image and cannot exist without the image, whereas an image can exist without a container.

  • Container is a standard (Linux) process whose software environment is defined by the contents of a container image and that runs on top of the host’s OS.

2 Preparations

If you intend to follow along with this tutorial, you may like to:

  • create a new folder (hereafter referred to as the sandpit folder) in which to create some files. On my local machine, I have a folder (tmp) in my home folder into which I will place a folder (called docker_tests) for this very purpose. On Linux and MacOSX, this can be achieved via the following:

    mkdir -p ~/tmp/docker_tests


3 Docker

The most popular container engine in use today is Docker. Docker is easy to install on most operating systems and comes with tools to build, manage, run and distribute container images (the latter of which is supported via the DockerHub container ecosystem).
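If you have just installed Docker, a quick sanity check is to confirm the version and run Docker’s own tiny test image (hello-world), which simply prints a confirmation message and exits:

docker --version
docker run --rm hello-world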

3.1 Simple overview

  1. Create a Docker definition file.

    The Dockerfile contains a set of instructions that Docker uses to build your container image with the correct specifications. For now, you do not need to know all the bits and pieces here (though please see the official Dockerfile reference documentation for a more in-depth understanding of what the Dockerfile is capable of).

    Let’s start with a very simple Dockerfile (which should be a plain text file located in the root of a project directory). This first example will be very minimal, and much of the rest of the Docker section of this tutorial will progressively build on it to introduce more and more complexity.

    The image we will build will start with a very minimal Debian Linux base called minideb sourced from DockerHub. This provides a fully functioning Linux operating system complete with the typical terminal power tools.

    We will then extend the image by updating the package lists (the locations of repositories) before adding (installing) a small, fun terminal application (cowsay) that generates ASCII art of a cow (or other animals) along with a speech bubble.

    FROM bitnami/minideb:bookworm
    LABEL maintainer="Author"
    LABEL email="author_email@email.com"
    
    ## Install the os packages
    RUN apt-get update \
      && apt-get install -y --no-install-recommends \
        cowsay \
      && rm -rf /var/lib/apt/lists/*

    In the above Dockerfile:

    • the first three lines contain information on the base image on which to construct your docker container image (in this case the bitnami/minideb:bookworm container freely provided by the bitnami team), as well as information about yourself. minideb is a minimal Debian operating system.

    • the FROM command points to a parent image. Typically, this will point to a specific image within a registry on Docker hub. This generates the base layer of the container image.

    • the LABEL command adds metadata to an image. In this case, there are two entries to specify information about the maintainer and their contact details. Note, entries must be key-value pairs and values must be enclosed in double quotes.

    • the RUN command runs shell commands in a new layer on top of the current image and commits the result. Each RUN generates a new layer. The above example first updates the package lists and then installs an additional package (cowsay).

  2. Build the docker image

    In a terminal in the same location as the Dockerfile (i.e. your sandpit folder), enter the following:

    docker build . --tag minideb


    where:

    • . indicates the path to use for the build context (in this case the current working directory - . means the current location). Any files within this path can be copied into the image or used in the build context (for example, a Dockerfile).

    • --tag minideb provides a name (and optionally, a tag) for the image (in this case minideb). The name (and tag) can be anything, yet should be descriptive enough to help you distinguish this container image from other container images that you might construct on your system.

    This will build a series of docker container images (each layer built upon the layer before) in a local registry.

    More details

    Usage: docker build [OPTIONS] PATH | URL | -
    Common options:

    Name         Description                                    Default
    --file, -f   Path and name of Dockerfile                    Dockerfile
    --no-cache   Do not use cache when building image
    --tag, -t    Name (and optional tag) in name:tag format

    Typical files in context:

    • Dockerfile: a build recipe file
    • .dockerignore: similar to a .gitignore, this file lists files to be ignored in collating the files that form the build context.

    As an alternative to providing build instructions in the form of a local Dockerfile, build can accept a URL to a remote git repository containing a Dockerfile.
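    For example (using a hypothetical repository URL), the following would build directly from the main branch of a remote git repository that contains a Dockerfile at its root:

    docker build https://github.com/username/repo.git#main --tag myimage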

    More info:
    https://docs.docker.com/engine/reference/commandline/build/

  3. Check that the image(s) have been created

    A list of images in your registry is obtained by:

    docker image ls
    REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
    minideb      latest    637720d5a978   1 second ago   121MB

    Note, the above simply lists all named images in your local registry. To get a more complete list of all images:

    docker images -a  
    REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
    minideb      latest    637720d5a978   1 second ago   121MB

    The -a switch indicates all images (including unnamed and dangling images).

    This list appears chronologically from bottom to top. With the classic (pre-BuildKit) builder, the DockerHub image (bitnami/minideb) appears at the bottom of the list, and above it sits a succession of intermediate images corresponding to each of the layers defined in the Dockerfile (each successive image being progressively larger, as each layer incorporates the layer below), with the full container image (tagged latest) at the top. Note, recent versions of Docker build with BuildKit by default, in which case these intermediate images are not listed separately (as in the output above).
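    The layers of an image (along with the instruction that created each layer and its size) can also be inspected directly with docker history:

    docker history minideb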

    Importantly, while we have built a container image, we do not yet have any running containers. We can demonstrate this by listing all existing containers:

    docker ps -a  
    CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
    ....

    The output is empty (assuming you have not previously generated any containers), indicating that there are currently no containers, running or stopped.

  4. Test the docker image (fire up an ephemeral container)

    We will now test the image by generating and running a container from our container image. Once the container has started, the cowsay terminal application will display an ASCII cow saying “Moo” before quietly terminating. The container will then be automatically stopped and removed.

    docker run --entrypoint /usr/games/cowsay --rm minideb Moo
     _____
    < Moo >
     -----
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

    where:

    • --entrypoint /usr/games/cowsay defines the base command that will be run once the container has started. In this case, it specifies the full path to the cowsay executable file.

    • --rm indicates that the container should be removed after it has finished running.

    • minideb is the name of our docker container image

    • Moo is the string passed on to cowsay to display in the speech bubble. Feel free to experiment with other strings here.

    To further appreciate the way arguments are passed on to applications within a container, let’s alter the cowsay animal.

    docker run --entrypoint /usr/games/cowsay --rm minideb -f /usr/share/cowsay/cows/koala.cow Grunt
     _______
    < Grunt >
     -------
      \
       \
           ___  
         {~._.~}
          ( Y )
         ()~*~()   
         (_)-(_)   

    In the above example, we passed -f /usr/share/cowsay/cows/koala.cow Grunt on to cowsay. In this context, -f points to where an alternative animal definition is located.
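    To discover which other animals are available, we can override the entrypoint once more, this time to list the contents of the cows directory inside the container:

    docker run --entrypoint ls --rm minideb /usr/share/cowsay/cows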

  5. Test the docker image interactively

    Rather than firing up a container, running some command and then immediately terminating, it is possible to run a container in interactive mode. In this mode, after the container starts up, you will be placed in a terminal where you can issue any available commands you like. Once you have finished the interactive session, simply enter exit and the container will terminate.

    Try the following:

    docker run --rm -it minideb   

    Once the prompt appears try entering the following:

    • list the files and folders in the current working directory
    ls -la
    • run the cowsay application
    /usr/games/cowsay Moo
    • exit the container
    exit

    Running in interactive mode is very useful when developing/debugging code on a container.
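    Note, an image that defines an ENTRYPOINT (as our later examples will) will run that command rather than present a shell when started with -it. In such cases, the entrypoint can be overridden to obtain an interactive shell:

    docker run --rm -it --entrypoint /bin/bash minideb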

3.2 Some Dockerfile goodness

Let’s now step this up a bit and add some more information to the build recipe. Rather than alter the previous Dockerfile, we will instead make a different file (Dockerfile2) and inform docker to build with this alternative Dockerfile.

FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    cowsay \
  && rm -rf /var/lib/apt/lists/*

## Default command to run
ENTRYPOINT ["/usr/games/cowsay","-f","/usr/share/cowsay/cows/koala.cow"]

## Default extra parameters passed to the command
CMD ["Grunt"]

In the above Dockerfile:

  • ENTRYPOINT provides a default command to run within the container. This specification is in JSON array (exec) form.

  • CMD provides default extra parameters that are passed on to the command (also in JSON format). This can be overridden by passing an alternative when running the docker run command (see below).

If we now build our container image using this Dockerfile2:

docker build . --tag minideb -f Dockerfile2 

If we again review the list of images, we see that the previous image has been untagged (it now appears as <none>) and a new minideb:latest image has been created.

docker images -a
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
<none>       <none>    637720d5a978   2 seconds ago   121MB
minideb      latest    eb43f6f05a68   2 seconds ago   121MB

We can now run the container as:

docker run --rm minideb
 _______
< Grunt >
 -------
  \
   \
       ___  
     {~._.~}
      ( Y )
     ()~*~()   
     (_)-(_)   

To override the default arguments (CMD) we baked into the docker image, we can issue an alternative as a command line argument.

docker run --rm minideb Meow
 ______
< Meow >
 ------
  \
   \
       ___  
     {~._.~}
      ( Y )
     ()~*~()   
     (_)-(_)   

So far, we have used the docker container to display an ASCII art cow (or koala) in the terminal and then exit. Whilst this might have some utility as a simple example of interacting with containers, it hardly represents typical work.

In the context of reproducible research, containers are useful for providing a consistent environment in which to run code. Thus, in order to be useful, a container should have:

  1. access to (or a copy of) the code to be run within the container

  2. the ability to store the results on the host, where they can be viewed and disseminated.

To illustrate these, we will add the R Statistical and Graphical Environment to our container image and use this in two further examples.

Copy

For the first example, we will add instructions to the Dockerfile to copy a small R script (let’s call it analysis.R) into the container so that the code can be run within the container environment. Let’s create two files:

  • an R script called analysis.R

dat <- data.frame(y = rnorm(10))
dat
write.csv(dat, file = "dat.csv", row.names = FALSE)

  • a Dockerfile called Dockerfile3
FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

COPY analysis.R /root/
WORKDIR /root

## Default command to run
ENTRYPOINT ["Rscript"]

## Default command parameters
CMD ["analysis.R"]

This Dockerfile includes instructions to:

- install R (`r-base`)

- copy our R script from the current working directory to root's home
  folder in the container (`COPY analysis.R /root/`)

- set the working directory within the container to be that home
  folder (`WORKDIR /root`)

- specify that once the container has started, the `Rscript`
  command should be run (`ENTRYPOINT ["Rscript"]`)

- specify that once the container has started, the `Rscript`
  command should be run using the `analysis.R` script 
  (`CMD ["analysis.R"]`)

Great. This time when we build the container image, we will provide both a name and tag for the image (via --tag r:1). This will result in an image called r with a tag of 1.

docker build . --tag r:1 -f Dockerfile3


When we run this new image, we see that a data frame of 10 values is returned to the terminal.

docker run --rm r:1
            y
1   1.0769312
2  -1.1909726
3  -1.1649543
4   0.2700351
5   0.8343507
6   1.4796507
7  -1.0420881
8  -1.4226972
9   1.0876275
10  0.4221856

Mount points

R did indeed run the analysis.R script inside the container. However, what happened to the file containing the exported data (dat.csv)? Although this file was created inside the container, it was lost when the container terminated. Obviously, that is not very useful.

For the second example, rather than copy the R script to the container, we will instead mount a local folder to a point within the container. That way we can access select host files and folders within the container, thereby enabling us to both read the R script directly and write out any output files.

To support this, we will create another Dockerfile (Dockerfile4).

FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

## Default command parameters
CMD ["analysis.R"]

The changes from the previous Dockerfile:

  • remove the COPY statement, as we will not need to work on a copy of the R code - we can work on it directly.

  • change the container working directory to /home/Project - Note, if this path does not already exist, it will be created.

We will now build the container image with the name:tag of r:2

docker build . --tag r:2 -f Dockerfile4 


This time, when we run the container image, we will indicate a volume to mount and the path within the container to mount it to. That is, we map a host folder to a folder inside the container.

cd ~/tmp/docker_tests/
ls -lat
docker run --rm -v $(pwd):/home/Project r:2
total 28
drwxr-xr-x 2 runner docker 4096 Feb 17 04:04 .
-rw-r--r-- 1 runner docker  361 Feb 17 04:04 Dockerfile4
-rw-r--r-- 1 runner docker  369 Feb 17 04:04 Dockerfile3
-rw-r--r-- 1 runner docker   89 Feb 17 04:04 analysis.R
-rw-r--r-- 1 runner docker  403 Feb 17 04:04 Dockerfile2
-rw-r--r-- 1 runner docker  238 Feb 17 04:04 Dockerfile
drwxr-xr-x 3 runner docker 4096 Feb 17 04:04 ..
            y
1  -0.5948902
2  -1.0014230
3   0.6347289
4  -2.0479466
5   2.0936146
6   0.5159487
7  -0.3759240
8  -0.3827792
9  -0.7834866
10  0.1396498

where:

  • -v $(pwd):/home/Project mounts the current working directory on the host machine to the /home/Project folder (which the Dockerfile defined as the working directory) inside the container

If we list the contents of our local folder, we see that the output file (dat.csv) has been created on the host filesystem.

cd ~/tmp/docker_tests/
ls -la  
total 32
drwxr-xr-x 2 runner docker 4096 Feb 17 04:04 .
drwxr-xr-x 3 runner docker 4096 Feb 17 04:04 ..
-rw-r--r-- 1 runner docker  238 Feb 17 04:04 Dockerfile
-rw-r--r-- 1 runner docker  403 Feb 17 04:04 Dockerfile2
-rw-r--r-- 1 runner docker  369 Feb 17 04:04 Dockerfile3
-rw-r--r-- 1 runner docker  361 Feb 17 04:04 Dockerfile4
-rw-r--r-- 1 runner docker   89 Feb 17 04:04 analysis.R
-rw-r--r-- 1 root   root    185 Feb 17 04:04 dat.csv

Of course, it is possible to define a combination of the above two examples - one in which a copy of the codebase is packaged up into the container image (to ensure that a specific codebase is always applied), yet a mount point is also specified at run time to enable the output(s) to be retrieved by the host. A sketch of this hybrid approach is provided below.
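As a minimal sketch of this hybrid approach (assuming analysis.R is modified to write dat.csv into an output/ sub-folder, and using a hypothetical recipe file called Dockerfile.hybrid), the script is baked into the image while only the output folder is mounted at run time:

FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

## Bake a fixed copy of the codebase into the image
COPY analysis.R /home/Project/
WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

## Default command parameters
CMD ["analysis.R"]

This could then be built and run as:

docker build . --tag r:hybrid -f Dockerfile.hybrid
mkdir -p output
docker run --rm -v $(pwd)/output:/home/Project/output r:hybrid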

The .dockerignore file is similar in structure to a .gitignore file in that they both define files (or patterns) to ignore. The purpose of a .dockerignore is to indicate which files and folders should be excluded from the docker build context. Excluding certain files and directories can:

  • reduce image size by preventing unnecessary files from being copied (e.g., logs, temporary files)
  • speed up builds by avoiding sending large or irrelevant files to the Docker daemon
  • improve security by excluding sensitive files like .env or credentials.
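For example, a minimal .dockerignore for our sandpit folder might look like the following (the entries are purely illustrative):

.git
*.log
.env
dat.csv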

3.3 Building a reproducible R environment

Just as it might be important to be able to recreate the state of an operating system and software from a previous time (to maximise the potential for reproducibility), it might be equally important to ensure that the entire environment reflects this state back in time. In the case of R software, this means that all included packages should be the same versions that were available at that previous time.

Posit (the developers of RStudio) provide daily snapshots of CRAN via their Posit Public Package Manager. We can therefore nominate a date when providing package installation instructions in a Dockerfile.

On UNIX-based systems, such as Linux and MacOSX, many R packages need to be compiled from source during their installation. As such, they sometimes have additional external system dependencies. Normally, it is necessary to install these external dependencies prior to attempting to install the R packages. This is done via the apt-get install (Debian) instructions in the Dockerfile. Unfortunately, this can turn into a very iterative process of attempting to install an R package, examining the progress, looking out for any errors about missing dependencies, and going back to the Dockerfile to add the appropriate dependencies before trying again.

The R package called pak is designed to install R packages and, if necessary, handle the installation of any additional external dependencies as well. This makes pak a very useful addition in Dockerfiles.
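As an illustrative aside (assuming pak is already installed, and invoking R from the shell in the same manner as a Dockerfile RUN instruction), pak can report the system dependencies that a package would require before you attempt the installation:

R -e "pak::pkg_sysreqs(\"tidyverse\")"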

For this example, we will start from an image that already has a version of R (4.2.2) built in (rocker/r-ver:4.2.2). Although this image is substantially larger than the minimal Debian image we used earlier, it does come with all the build tools and R, each of which would otherwise require additional downloads and compilation. The net result is that the rocker/r-ver:4.2.2 image requires less overall download traffic than the individual parts.

Note

Note, this image will take substantially longer to build as it not only has to pull down a larger base, it then has to compile the entire tidyverse from source.

The changes from the previous Dockerfile:

  • switch the base image to rocker/r-ver:4.2.2
  • add numerous -dev developer package dependencies
  • install the tidyverse collection of packages from a dated snapshot
FROM rocker/r-ver:4.2.2
LABEL maintainer="Author"
LABEL email="author_email@email.com"

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  libxml2-dev \
  libcurl4-openssl-dev  \
  libssl-dev \
  zlib1g-dev \
  && rm -rf /var/lib/apt/lists/*

## Install R package versions from Posit package manager (based on a date - YYYY-MM-DD)
RUN R -e "options(repos = \
    list(CRAN = \"https://packagemanager.posit.co/cran/2024-01-10/\")); \
  install.packages(\"pak\"); \
"
RUN R -e "options(repos = \
    list(CRAN = \"https://packagemanager.posit.co/cran/2024-01-10/\")); \
  pak::pkg_install(c(\"tidyverse\")); \
"

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

CMD ["analysis.R"]

In the above Dockerfile

  • the first three lines contain information on the base image on which to construct your docker container image (in this case the rocker/r-ver:4.2.2 container freely provided by the rocker team), as well as information about yourself.

  • the FROM command points to a parent image. Typically, this will point to a specific image within a registry on Docker hub. This generates the base layer of the container image.

  • the LABEL command adds metadata to an image. In this case, there are two entries to specify information about the maintainer and their contact details. Note, entries must be key-value pairs and values must be enclosed in double quotes.

  • the RUN command runs shell commands in a new layer on top of the current image and commits the result. Each RUN generates a new layer. In the above example, there are three RUN commands.

    • the first RUN command updates the package lists and installs some necessary system dependencies
    • the second RUN installs the pak package from the Posit package manager snapshot repository (in this case dated the 10th of January 2024). This repository stores daily snapshots of all the packages on CRAN and thus allows us to obtain the set of packages in the state they existed on a nominated day.
    • the third RUN uses pak to install the tidyverse package (which is technically a large collection of packages) from the same dated snapshot.
  • the WORKDIR command sets the working directory for software within the container. In this case, we are creating a dedicated directory (/home/Project)

  • the ENTRYPOINT command defines the default command/application to run within the container if the user does not provide a command. This is in JSON format. In this case, we are indicating that by default the container should run the Rscript application. The Rscript application runs a non-interactive R session on a nominated R script file. That is, it will run the nominated R script and then terminate on completion.

  • the CMD command defines the default arguments to provide to the ENTRYPOINT command. In this case, we have indicated which R script to run Rscript on.

Note, many of the large collection of R packages targeted for install as part of the tidyverse ecosystem (or its dependencies) require compilation from source. Whilst this does help ensure that the underlying package routines are optimised for your system, the entire install process may take up to an hour. This installation can be sped up substantially by instead installing a pre-bundled version of the tidyverse packages (and dependencies) directly from the Ubuntu repositories. The associated alternative Dockerfile is provided in the following expandable section.

Alternative Dockerfile

As an alternative, we could instead install tidyverse from the Ubuntu r-cran repository. This install will be far faster, yet likely not as up-to-date, and we would have a little less control over exactly which version we were installing…

FROM rocker/r-ver:4.2.2
LABEL maintainer="Author"
LABEL email="author_email@email.com"

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  r-cran-tidyverse \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

CMD ["analysis.R"]


We will now build the container image with the name:tag of r-tidyverse:1. Note, this will take some time to complete.

docker build . --tag r-tidyverse:1 -f Dockerfile5 


If we now run docker with this image, the resulting container will automatically run a non-interactive session using the analysis.R script.

cd ~/tmp/docker_tests/  
docker run --rm -v `pwd`:/home/Project r-tidyverse:1
            y
1   0.2601990
2   0.7094164
3  -0.6011872
4  -0.9571264
5  -0.3310606
6   0.9465565
7   0.1731583
8  -0.1315534
9   0.3535345
10 -1.0534199

where:

  • -v `pwd`:/home/Project mounts the current working directory on the host machine to the /home/Project folder (which the Dockerfile defined as the working directory) inside the container

Conveniently, we can override the CMD command when we run docker run. In the current context, perhaps we would like to run the R session on a different R script. Let’s try this by creating a new R script (analysis5.R) - this time, one that makes use of the tidyverse ecosystem.

library(tidyverse)
dat <- data.frame(y = rnorm(10))
dat
dat %>% summarise(Mean = mean(y), Median = median(y))

Now we can nominate this alternative R script as the argument to the Rscript command.

cd ~/tmp/docker_tests/
docker run --rm -v `pwd`:/home/Project r-tidyverse:1 analysis5.R
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
            y
1  -0.5198849
2  -0.5561143
3   0.4415712
4  -0.2906648
5   1.8838446
6  -0.3950799
7   1.0785308
8   1.4462143
9  -0.4401129
10 -0.5984643
      Mean     Median
1 0.204984 -0.3428723

3.4 Managing Docker images

To list the most recently created images:

docker images
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB
<none>        <none>    637720d5a978   7 minutes ago   121MB
minideb       latest    eb43f6f05a68   7 minutes ago   121MB

The output will include all created images along with the base images that they are derived from (those sourced from Dockerhub for example).

The image (REPOSITORY) entries that are <none> are dangling images. That is, they are intermediate images that were previously used in the building of an image and are no longer used (because the Dockerfile layer that they were associated with is no longer present in the latest version of that Dockerfile).

We can exclude them from the output by defining a filter that excludes dangling images.

docker images -f "dangling=false"
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB
minideb       latest    eb43f6f05a68   7 minutes ago   121MB

If instead of dangling=false, you indicate dangling=true, only dangling images are displayed. This can be useful for identifying redundant images.

Another useful filter predicates on creation time relative to another image. For example, to display all images that were created since minideb:

docker images -f "since=minideb" 
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB

There is also a before version.
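For example, to display all images that were created before minideb:

docker images -f "before=minideb"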

Note, the above examples will exclude the intermediate images that are associated with build layers. If we want to see all images (including the intermediate images), use the -a switch.

docker images -a 
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB
<none>        <none>    637720d5a978   7 minutes ago   121MB
minideb       latest    eb43f6f05a68   7 minutes ago   121MB

Finally, there are several ways to remove images and reclaim disk space. To remove all dangling images, along with any stopped containers, unused networks and the build cache:

docker system prune 

Adding the -a switch extends this to all images that are not associated with at least one container:

docker system prune -a 

To remove a specific image:

docker rmi <ID> 

where <ID> is the IMAGE ID from docker images -a.

To remove all images whose entries match a pattern:

docker images -a | grep <"pattern"> | awk '{print $3}' | xargs docker rmi

where <"pattern"> is a regular expression for isolating the name of the image(s) to remove.

Similarly, to force the removal of all dangling images:

docker rmi -f $(docker images -f "dangling=true" -q)

4 Generating docker images on github

Up to now, the docker images we have created have been housed in a registry locally on the machine that we are building them on. In the spirit of reproducibility, we ideally want these images to be available to others (as well as our future selves). There are numerous options for making images available:

  • each user (or your future self) can build the image from the Dockerfile. In many cases, this will be a lengthy (yet automated) process. For this to be a viable option, each party will need to have the ability and resources to be able to build docker images from a Dockerfile.
  • the docker image could be hosted on a remote repository (such as dockerhub). This option requires that the original author of the docker image has a dockerhub account so that they can push the image that they created locally up to the remote registry.
  • build and host the docker image on github. This option requires the original author/authors to have a github account.

Building Docker images on GitHub via GitHub Actions is useful because it enables automated, consistent, and reproducible builds directly from your remote repository. As such, it is possible to trigger a fresh image build every time there is a change to the repo itself. This reduces manual effort and minimises deployment errors. With GitHub Actions, you can integrate CI/CD workflows, automatically test images, and push them to container registries (e.g., GitHub Container Registry or Docker Hub). It also enhances collaboration, as team members can track build logs, detect failures early, and enforce security best practices through automated vulnerability scanning. Finally, it also means that the one platform can be used to host both the code and the environment in which to run the code.

To have GitHub Actions build and publish a docker image, we start by creating a GitHub Actions workflow file. This file is in YAML format and should be placed in a directory called .github/workflows within your git repository.

name: Create and publish the Docker image

on:
 workflow_dispatch:
 push:
   branches: [ "main" ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-image:
    runs-on: ubuntu-latest

    if: "!contains(github.event.head_commit.message, [ci skip])"

    permissions:
      contents: read
      packages: write

    name: ${{ matrix.config.r }}

    strategy:
      fail-fast: false
      matrix:
        config:
        #- { r: devel }
        #- { r: next  }
        - { r: 4.4.1 }

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.repository_owner }}  #
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

In the above workflow file:

  • name gives a name to the workflow and is how it will be referred to within the Actions user interface

  • on: workflow_dispatch: specifies that this workflow can be run manually from the GitHub Actions UI.

  • on: push: branches: [ “main” ]: specifies that this workflow will be triggered whenever a change is pushed to the “main” branch

  • env: REGISTRY: sets the container registry to GitHub Container Registry (ghcr.io).

  • env: IMAGE_NAME: sets the repository name (in this case to the name of the github repository).

  • jobs: build-image: runs-on: defines a job (build-image) that runs on the latest version of Ubuntu

  • if: provides a way of defining exclusion rules. In this case, it prevents execution if the latest commit message contains “[ci skip]”. This is useful for avoiding unnecessary builds.

  • permissions: contents: read: allows the workflow to read the repository contents.

  • packages: write: grants permission to push Docker images to Github Container Registry.

  • strategy: fail-fast: false ensures that if one job in the build matrix fails, the remaining jobs are not automatically cancelled.

  • strategy: matrix: config: defines a build matrix for testing against multiple R versions (only 4.4.1 is active). Commented-out lines suggest that devel and next versions were once considered.

  • steps: the actions to perform:

    • uses: actions/checkout@v4 clones the repository so that the workflow can access files like the Dockerfile and any other code in the repository. It uses version 4 of this action.

    • uses: docker/login-action@v1 uses GitHub’s built-in token (GITHUB_TOKEN) to authenticate with the GitHub Container Registry. This ensures that pushing images to the GitHub Container Registry is authorised.

    • uses: docker/metadata-action@… generates metadata (tags, labels) based on the repository and commit details. Pinning to a specific commit hash (in this case 9ec57ed1…) ensures a fixed version of docker/metadata-action is used.

    • uses: docker/build-push-action@v5 builds the Docker image from the repository and pushes it to the GitHub Container Registry with the metadata-generated tags and labels.

Now any time there is a change to the main branch, the image will be rebuilt and published to Github Container Registry.

To access the built image:

  • navigate to the Code panel of the repository’s GitHub page.

  • click on the item under the heading Packages.

The displayed panel will provide information on the versions of the package available as well as instructions on how to obtain (pull) the image to a local repository from where it can be run.
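For example (substituting your own GitHub username and repository name for the hypothetical username/repo), pulling and running the published image would look something like:

docker pull ghcr.io/username/repo:main
docker run --rm ghcr.io/username/repo:main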