Containerisation - Docker and Singularity

Author

Murray Logan

Published

February 17, 2025

1 Background information

In the previous tutorial, we discussed why it is important that your codebase is under version control and backed up, to help ensure that your analyses can be evaluated and replicated by you and others, both now and into the future. However, having access to the code (and data) does not always guarantee full reproducibility - reproducibility can also be affected by the exact software environment in which the code is run.

In the context of statistical analyses performed in R, for example, R as well as the various packages that you have elected to use in support of your analyses can (and do) evolve over time. Some functions get modified and some even get deprecated. Hence, over time, code that once worked perfectly (or at least adequately) can become broken.

Early solutions to this facet of reproducibility focused on virtual machines. A virtual machine (VM) builds an entire software environment on top of a software layer that mimics a physical computer, such that each VM running on a host computer is a completely separate, self-contained entity. Whilst VMs do permit great flexibility (as virtually any operating system can be installed on a VM), they are considerably slower and less efficient than physical machines. Moreover, it is typically necessary to allocate a fixed amount of computer resources (particularly CPU) to the VM in advance.

More modern solutions focus instead on containers. In contrast to VMs, containers do not mimic a physical computer; rather, they only virtualise layers on top of the host operating system. Indeed, containers share (read only) the host OS kernel and binaries/libraries, and thus containers and the applications contained therein can be very “light” and are typically almost as performant as applications run natively on the host.

Time for some container terminology:

  • Container image is a static (unchangeable) file (or collection of files) that bundles code and all its dependencies (such as the necessary system libraries, code, runtime and system tools). Essentially, the image has all the information required to reproduce a software environment on any compatible machine. However, an image is just a snapshot that serves as a template from which to build a container. In other words, a container is a running image and cannot exist without the image, whereas an image can exist without a container.

  • Container is a standard (Linux) process whose software environment is defined by the contents of a container image and that runs on top of the host’s OS.

2 Preparations

If you intend to follow along with this tutorial, you may like to:

  • create a new folder (hereafter referred to as the sandpit folder) in which to create some files. On my local machine, I have a folder (tmp) in my home folder into which I will place a folder (called docker_tests) for this very purpose. On Linux and MacOSX, this can be achieved via the following:

    mkdir -p ~/tmp/docker_tests


3 Docker

The most popular container engine in use today is Docker. Docker is easy to install on most operating systems and comes with tools to build, manage, run and distribute container images (the latter of which is supported via the DockerHub container ecosystem).
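If you have just installed Docker, a quick sanity check is to confirm the version and run Docker’s own tiny test image (hello-world), which simply prints a confirmation message and exits:

docker --version
docker run --rm hello-world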

3.1 Simple overview

  1. Create a Docker definition file.

    The Dockerfile contains a set of instructions that Docker uses to build your container image with the correct specifications. For now, you do not need to know all the bits and pieces here (though please see the official Dockerfile reference documentation for a more in-depth understanding of what the Dockerfile is capable of).

    Let’s start with a very simple Dockerfile (which should be a plain text file located in the root of a project directory). This first example will be very minimal, and much of the rest of the Docker section of this tutorial will progressively build on it to introduce more and more complexity.

    The image we will build will start with a very minimal Debian Linux base called minideb sourced from DockerHub. This provides a fully functioning Linux operating system complete with the typical terminal power tools.

    We will then extend the image by updating the package lists (the locations of repositories) before adding (installing) a small, fun terminal application (cowsay) that generates ASCII art of a cow (or other animals) along with a speech bubble.

    FROM bitnami/minideb:bookworm
    LABEL maintainer="Author"
    LABEL email="author_email@email.com"
    
    ## Install the os packages
    RUN apt-get update \
      && apt-get install -y --no-install-recommends \
        cowsay \
      && rm -rf /var/lib/apt/lists/*

    In the above Dockerfile:

    • the first three lines contain information on the base image on which to construct your docker container image (in this case the bitnami/minideb:bookworm container freely provided by the bitnami team), as well as information about yourself. minideb is a minimal Debian operating system.

    • the FROM command points to a parent image. Typically, this will point to a specific image within a registry on Docker hub. This generates the base layer of the container image.

    • the LABEL command adds metadata to an image. In this case, there are two entries to specify information about the maintainer and their contact details. Note, entries must be key-value pairs and values must be enclosed in double quotes.

    • the RUN command runs shell commands in a new layer on top of the current image and commits the result. Each RUN generates a new layer. The above example first updates the package lists and then installs an additional package (cowsay).

  2. Build the docker image

    In a terminal in the same location as the Dockerfile (i.e. your sandpit folder), enter the following:

    docker build . --tag minideb


    where:

    • . indicates the path to use for the build context (in this case the current working directory - . means the current location). Any files within this path can be copied into the image or used in the build context (for example, a Dockerfile).

    • --tag minideb provides a name (and optionally, a tag) for the image (in this case minideb). The name (and tag) can be anything, yet should be descriptive enough to help you distinguish this container image from other container images that you might construct on your system.

    This will build a series of docker container images (each layer built upon the layer before) in a local registry.

    More details

    Usage: docker build [OPTIONS] PATH | URL | -
    Common options:

    Name         Description                                    Default
    --file, -f   Path and name of Dockerfile                    Dockerfile
    --no-cache   Do not use cache when building image
    --tag, -t    Name (and optional tag) in name:tag format

    Typical files in context:

    • Dockerfile: a build recipe file
    • .dockerignore: similar to a .gitignore, this file lists files to be ignored in collating the files that form the build context.

    As an alternative to providing build instructions in the form of a local Dockerfile, build can accept a URL to a remote git repository containing a Dockerfile.
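    For example (using a hypothetical repository URL), the following would build directly from the main branch of a remote git repository that contains a Dockerfile at its root:

    docker build https://github.com/username/repo.git#main --tag myimage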

    More info:
    https://docs.docker.com/engine/reference/commandline/build/

  3. Check that the image(s) have been created

    A list of images in your registry is obtained by:

    docker image ls
    REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
    minideb      latest    637720d5a978   1 second ago   121MB

    Note, the above simply lists all named images in your local registry. To get a more complete list of all images:

    docker images -a  
    REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
    minideb      latest    637720d5a978   1 second ago   121MB

    The -a switch indicates all images (including unnamed and dangling images).

    This list appears chronologically from bottom to top. With the classic (pre-BuildKit) builder, the DockerHub image (bitnami/minideb) appears at the bottom of the list, and above it sits a succession of intermediate images corresponding to each of the layers defined in the Dockerfile (each successive image being progressively larger, as each layer incorporates the layer below), with the full container image (tagged latest) at the top. Note, recent versions of Docker build with BuildKit by default, in which case these intermediate images are not listed separately (as in the output above).
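    The layers of an image (along with the instruction that created each layer and its size) can also be inspected directly with docker history:

    docker history minideb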

    Importantly, while we have built a container image, we do not yet have any running containers. We can demonstrate this by listing all existing containers:

    docker ps -a  
    CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
    ....

    The output is empty (assuming you have not previously generated any containers), indicating that there are currently no containers, running or stopped.

  4. Test the docker image (fire up an ephemeral container)

    We will now test the image by generating and running a container from our container image. Once the container has started, the cowsay terminal application will display an ASCII cow saying “Moo” before quietly terminating. The container will then be automatically stopped and removed.

    docker run --entrypoint /usr/games/cowsay --rm minideb Moo
     _____
    < Moo >
     -----
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

    where:

    • --entrypoint /usr/games/cowsay defines the base command that will be run once the container has started. In this case, it specifies the full path to the cowsay executable file.

    • --rm indicates that the container should be removed after it has finished running.

    • minideb is the name of our docker container image

    • Moo is the string passed on to cowsay to display in the speech bubble. Feel free to experiment with other strings here.

    To further appreciate the way arguments are passed on to applications within a container, let’s alter the cowsay animal.

    docker run --entrypoint /usr/games/cowsay --rm minideb -f /usr/share/cowsay/cows/koala.cow Grunt
     _______
    < Grunt >
     -------
      \
       \
           ___  
         {~._.~}
          ( Y )
         ()~*~()   
         (_)-(_)   

    In the above example, we passed -f /usr/share/cowsay/cows/koala.cow Grunt on to cowsay. In this context, -f points to where an alternative animal definition is located.
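    To discover which other animals are available, we can override the entrypoint once more, this time to list the contents of the cows directory inside the container:

    docker run --entrypoint ls --rm minideb /usr/share/cowsay/cows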

  5. Test the docker image interactively

    Rather than firing up a container, running some command and then immediately terminating, it is possible to run a container in interactive mode. In this mode, after the container starts up, you will be placed in a terminal where you can issue any available commands you like. Once you have finished the interactive session, simply enter exit and the container will terminate.

    Try the following:

    docker run --rm -it minideb   

    Once the prompt appears try entering the following:

    • list the files and folders in the current working directory
    ls -la
    • run the cowsay application
    /usr/games/cowsay Moo
    • exit the container
    exit

    Running in interactive mode is very useful when developing/debugging code on a container.
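    Note, an image that defines an ENTRYPOINT (as our later examples will) will run that command rather than present a shell when started with -it. In such cases, the entrypoint can be overridden to obtain an interactive shell:

    docker run --rm -it --entrypoint /bin/bash minideb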

3.2 Some Dockerfile goodness

Let’s now step this up a bit and add some more information to the build recipe. Rather than alter the previous Dockerfile, we will instead make a different file (Dockerfile2) and inform docker to build with this alternative Dockerfile.

FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    cowsay \
  && rm -rf /var/lib/apt/lists/*

## Default command to run
ENTRYPOINT ["/usr/games/cowsay","-f","/usr/share/cowsay/cows/koala.cow"]

## Default extra parameters passed to the command
CMD ["Grunt"]

In the above Dockerfile:

  • ENTRYPOINT provides a default command to run within the container. This specification is in JSON array (exec) form.

  • CMD provides default extra parameters that are passed on to the command (also in JSON format). This can be overridden by passing an alternative when running the docker run command (see below).

If we now build our container image using this Dockerfile2:

docker build . --tag minideb -f Dockerfile2 

If we again review the list of images, we see that the previous image has been untagged (it now appears as <none>) and a new minideb:latest image has been created.

docker images -a
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
<none>       <none>    637720d5a978   2 seconds ago   121MB
minideb      latest    eb43f6f05a68   2 seconds ago   121MB

We can now run the container as:

docker run --rm minideb
 _______
< Grunt >
 -------
  \
   \
       ___  
     {~._.~}
      ( Y )
     ()~*~()   
     (_)-(_)   

To override the default arguments (CMD) we baked into the docker image, we can issue an alternative as a command line argument.

docker run --rm minideb Meow
 ______
< Meow >
 ------
  \
   \
       ___  
     {~._.~}
      ( Y )
     ()~*~()   
     (_)-(_)   

So far, we have used the docker container to display an ASCII art cow (or koala) in the terminal and then exit. Whilst this might have some utility as a simple example of interacting with containers, it hardly represents typical work.

In the context of reproducible research, containers are useful for providing a consistent environment in which to run code. Thus, in order to be useful, a container should have:

  1. access to (or a copy of) the code to be run within the container

  2. the ability to store the results on the host, where they can be viewed and disseminated.

To illustrate these, we will add the R Statistical and Graphical Environment to our container image and use this in two further examples.

Copy

For the first example, we will add instructions to the Dockerfile to copy a small R script (let’s call it analysis.R) into the container so that the code can be run within the container environment. Let’s create two files:

  • an R script called analysis.R

dat <- data.frame(y = rnorm(10))
dat
write.csv(dat, file = "dat.csv", row.names = FALSE)

  • a Dockerfile called Dockerfile3
FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

COPY analysis.R /root/
WORKDIR /root

## Default command to run
ENTRYPOINT ["Rscript"]

## Default command parameters
CMD ["analysis.R"]

This Dockerfile includes instructions to:

- install R (`r-base`)

- copy our R script from the current working directory to root's home
  folder in the container (`COPY analysis.R /root/`)

- set the working directory within the container to be that home
  folder (`WORKDIR /root`)

- specify that once the container has started, the `Rscript`
  command should be run (`ENTRYPOINT ["Rscript"]`)

- specify that once the container has started, the `Rscript`
  command should be run using the `analysis.R` script 
  (`CMD ["analysis.R"]`)

Great. This time when we build the container image, we will provide both a name and tag for the image (via --tag r:1). This will result in an image called r with a tag of 1.

docker build . --tag r:1 -f Dockerfile3


When we run this new image, we see that a data frame of 10 values is returned to the terminal.

docker run --rm r:1
            y
1   1.0769312
2  -1.1909726
3  -1.1649543
4   0.2700351
5   0.8343507
6   1.4796507
7  -1.0420881
8  -1.4226972
9   1.0876275
10  0.4221856

Mount points

R did indeed run the analysis.R script inside the container. However, what happened to the file containing the exported data (dat.csv)? Although this file was created inside the container, it was lost when the container terminated. Obviously, that is not very useful.

For the second example, rather than copy the R script to the container, we will instead mount a local folder to a point within the container. That way we can access select host files and folders within the container, thereby enabling us to both read the R script directly and write out any output files.

To support this, we will create another Dockerfile (Dockerfile4).

FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

## Default command parameters
CMD ["analysis.R"]

The changes from the previous Dockerfile:

  • remove the COPY statement, as we will not need to work on a copy of the R code - we can work on it directly.

  • change the container working directory to /home/Project - Note, if this path does not already exist, it will be created.

We will now build the container image with the name:tag of r:2

docker build . --tag r:2 -f Dockerfile4 


This time, when we run the container image, we will indicate a volume to mount and the path within the container to mount it to. That is, we map a host folder to a folder inside the container.

cd ~/tmp/docker_tests/
ls -lat
docker run --rm -v $(pwd):/home/Project r:2
total 28
drwxr-xr-x 2 runner docker 4096 Feb 17 04:04 .
-rw-r--r-- 1 runner docker  361 Feb 17 04:04 Dockerfile4
-rw-r--r-- 1 runner docker  369 Feb 17 04:04 Dockerfile3
-rw-r--r-- 1 runner docker   89 Feb 17 04:04 analysis.R
-rw-r--r-- 1 runner docker  403 Feb 17 04:04 Dockerfile2
-rw-r--r-- 1 runner docker  238 Feb 17 04:04 Dockerfile
drwxr-xr-x 3 runner docker 4096 Feb 17 04:04 ..
            y
1  -0.5948902
2  -1.0014230
3   0.6347289
4  -2.0479466
5   2.0936146
6   0.5159487
7  -0.3759240
8  -0.3827792
9  -0.7834866
10  0.1396498

where:

  • -v $(pwd):/home/Project mounts the current working directory on the host machine to the /home/Project folder (which the Dockerfile defined as the working directory) inside the container

If we list the contents of our local folder, we see that the output file (dat.csv) has been created on the host filesystem.

cd ~/tmp/docker_tests/
ls -la  
total 32
drwxr-xr-x 2 runner docker 4096 Feb 17 04:04 .
drwxr-xr-x 3 runner docker 4096 Feb 17 04:04 ..
-rw-r--r-- 1 runner docker  238 Feb 17 04:04 Dockerfile
-rw-r--r-- 1 runner docker  403 Feb 17 04:04 Dockerfile2
-rw-r--r-- 1 runner docker  369 Feb 17 04:04 Dockerfile3
-rw-r--r-- 1 runner docker  361 Feb 17 04:04 Dockerfile4
-rw-r--r-- 1 runner docker   89 Feb 17 04:04 analysis.R
-rw-r--r-- 1 root   root    185 Feb 17 04:04 dat.csv

Of course, it is possible to define a combination of the above two examples - one in which a copy of the codebase is packaged up into the container image (to ensure that a specific codebase is always applied), yet a mount point is also specified at run time to enable the output(s) to be retrieved by the host. A sketch of this hybrid approach is provided below.
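As a minimal sketch of this hybrid approach (assuming analysis.R is modified to write dat.csv into an output/ sub-folder, and using a hypothetical recipe file called Dockerfile.hybrid), the script is baked into the image while only the output folder is mounted at run time:

FROM bitnami/minideb:bookworm
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

## Bake a fixed copy of the codebase into the image
COPY analysis.R /home/Project/
WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

## Default command parameters
CMD ["analysis.R"]

This could then be built and run as:

docker build . --tag r:hybrid -f Dockerfile.hybrid
mkdir -p output
docker run --rm -v $(pwd)/output:/home/Project/output r:hybrid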

The .dockerignore file is similar in structure to a .gitignore file in that they both define files (or patterns) to ignore. The purpose of a .dockerignore is to indicate which files and folders should be excluded from the docker build context. Excluding certain files and directories can:

  • reduce image size by preventing unnecessary files from being copied (e.g., logs, temporary files)
  • speed up builds by avoiding sending large or irrelevant files to the Docker daemon
  • improve security by excluding sensitive files like .env or credentials.
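For example, a minimal .dockerignore for our sandpit folder might look like the following (the entries are purely illustrative):

.git
*.log
.env
dat.csv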

3.3 Building a reproducible R environment

Just as it might be important to be able to recreate the state of an operating system and software from a previous time (to maximise the potential for reproducibility), it might be equally important to ensure that the entire environment reflects this state back in time. In the case of R software, this means that all included packages should be the same versions that were available at that previous time.

Posit (the developers of RStudio) provide daily snapshots of CRAN via their Posit Public Package Manager. We can therefore nominate a date when providing package installation instructions in a Dockerfile.

On UNIX-based systems, such as Linux and MacOSX, many R packages need to be compiled from source during their installation. As such, they sometimes have additional external system dependencies. Normally, it is necessary to install these external dependencies prior to attempting to install the R packages. This is done via the apt-get install (Debian) instructions in the Dockerfile. Unfortunately, this can turn into a very iterative process of attempting to install an R package, examining the progress, looking out for any errors about missing dependencies, and going back to the Dockerfile to add the appropriate dependencies before trying again.

The R package called pak is designed to install R packages and, if necessary, handle the installation of any additional external dependencies as well. This makes pak a very useful addition in Dockerfiles.
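As an illustrative aside (assuming pak is already installed, and invoking R from the shell in the same manner as a Dockerfile RUN instruction), pak can report the system dependencies that a package would require before you attempt the installation:

R -e "pak::pkg_sysreqs(\"tidyverse\")"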

For this example, we will start from an image that already has a version of R (4.2.2) built in (rocker/r-ver:4.2.2). Although this image is substantially larger than the minimal Debian image we used earlier, it does come with all the build tools and R, each of which would otherwise require additional downloads and compilation. The net result is that the rocker/r-ver:4.2.2 image requires less overall download traffic than the individual parts.

Note

Note, this image will take substantially longer to build as it not only has to pull down a larger base, it then has to compile the entire tidyverse from source.

The changes from the previous Dockerfile:

  • switch the base image to rocker/r-ver:4.2.2
  • add numerous -dev developer package dependencies
  • install the tidyverse collection of packages from a dated snapshot
FROM rocker/r-ver:4.2.2
LABEL maintainer="Author"
LABEL email="author_email@email.com"

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  libxml2-dev \
  libcurl4-openssl-dev  \
  libssl-dev \
  zlib1g-dev \
  && rm -rf /var/lib/apt/lists/*

## Install R package versions from Posit package manager (based on a date - YYYY-MM-DD)
RUN R -e "options(repos = \
    list(CRAN = \"https://packagemanager.posit.co/cran/2024-01-10/\")); \
  install.packages(\"pak\"); \
"
RUN R -e "options(repos = \
    list(CRAN = \"https://packagemanager.posit.co/cran/2024-01-10/\")); \
  pak::pkg_install(c(\"tidyverse\")); \
"

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

CMD ["analysis.R"]

In the above Dockerfile

  • the first three lines contain information on the base image on which to construct your docker container image (in this case the rocker/r-ver:4.2.2 container freely provided by the rocker team), as well as information about yourself.

  • the FROM command points to a parent image. Typically, this will point to a specific image within a registry on Docker hub. This generates the base layer of the container image.

  • the LABEL command adds metadata to an image. In this case, there are two entries to specify information about the maintainer and their contact details. Note, entries must be key-value pairs and values must be enclosed in double quotes.

  • the RUN command runs shell commands in a new layer on top of the current image and commits the result. Each RUN generates a new layer. In the above example, there are three RUN commands.

    • the first RUN command updates the package lists and installs some necessary system dependencies
    • the second RUN installs the pak package from the Posit package manager snapshot repository (in this case dated the 10th of January 2024). This repository stores daily snapshots of all the packages on CRAN and thus allows us to obtain the set of packages in the state they existed on a nominated day.
    • the third RUN uses pak to install the tidyverse package (which is technically a large collection of packages) from the same dated snapshot.
  • the WORKDIR command sets the working directory for software within the container. In this case, we are creating a dedicated directory (/home/Project)

  • the ENTRYPOINT command defines the default command/application to run within the container if the user does not provide a command. This is in JSON format. In this case, we are indicating that by default the container should run the Rscript application. The Rscript application runs a non-interactive R session on a nominated R script file. That is, it will run the nominated R script and then terminate on completion.

  • the CMD command defines the default arguments to provide to the ENTRYPOINT command. In this case, we have indicated which R script to run Rscript on.

Note, many of the large collection of R packages targeted for install as part of the tidyverse ecosystem (or its dependencies) require compilation from source. Whilst this does help ensure that the underlying package routines are optimised for your system, the entire install process may take up to an hour. This installation can be sped up substantially by instead installing a pre-bundled version of the tidyverse packages (and dependencies) directly from the Ubuntu repositories. The associated alternative Dockerfile is provided in the following expandable section.

Alternative Dockerfile

As an alternative, we could instead install tidyverse from the Ubuntu r-cran repository. This install will be far faster, yet likely not as up-to-date, and we would have a little less control over exactly which version we were installing…

FROM rocker/r-ver:4.2.2
LABEL maintainer="Author"
LABEL email="author_email@email.com"

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  r-cran-tidyverse \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

CMD ["analysis.R"]


We will now build the container image with the name:tag of r-tidyverse:1. Note, this will take some time to complete.

docker build . --tag r-tidyverse:1 -f Dockerfile5 


If we now run docker with this image, the resulting container will automatically run a non-interactive session using the analysis.R script.

cd ~/tmp/docker_tests/  
docker run --rm -v `pwd`:/home/Project r-tidyverse:1
            y
1   0.2601990
2   0.7094164
3  -0.6011872
4  -0.9571264
5  -0.3310606
6   0.9465565
7   0.1731583
8  -0.1315534
9   0.3535345
10 -1.0534199

where:

  • -v `pwd`:/home/Project mounts the current working directory on the host machine to the /home/Project folder (which the Dockerfile defined as the working directory) inside the container

Conveniently, we can override the CMD command when we run docker run. In the current context, perhaps we would like to run the R session on a different R script. Let’s try this by creating a new R script (analysis5.R) - this time, one that makes use of the tidyverse ecosystem.

library(tidyverse)
dat <- data.frame(y = rnorm(10))
dat
dat %>% summarise(Mean = mean(y), Median = median(y))

Now we can nominate this alternative R script as the argument to the Rscript command.

cd ~/tmp/docker_tests/
docker run --rm -v `pwd`:/home/Project r-tidyverse:1 analysis5.R
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
            y
1  -0.5198849
2  -0.5561143
3   0.4415712
4  -0.2906648
5   1.8838446
6  -0.3950799
7   1.0785308
8   1.4462143
9  -0.4401129
10 -0.5984643
      Mean     Median
1 0.204984 -0.3428723

3.4 Managing Docker images

To list the most recently created images:

docker images
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB
<none>        <none>    637720d5a978   7 minutes ago   121MB
minideb       latest    eb43f6f05a68   7 minutes ago   121MB

The output will include all created images along with the base images that they are derived from (those sourced from Dockerhub for example).

The image (REPOSITORY) entries that are <none> are dangling images. That is, they are intermediate images that were previously used in the building of an image and are no longer used (because the Dockerfile layer that they were associated with is no longer present in the latest version of that Dockerfile).

We can exclude them from the output by defining a filter that excludes dangling images.

docker images -f "dangling=false"
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB
minideb       latest    eb43f6f05a68   7 minutes ago   121MB

If instead of dangling=false, you indicate dangling=true, only dangling images are displayed. This can be useful for identifying redundant images.

Another useful filter predicates on creation time relative to another image. For example, to display all images that were created since minideb:

docker images -f "since=minideb" 
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB

There is also a before version.
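For example, to display all images that were created before minideb:

docker images -f "before=minideb"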

Note, the above examples will exclude the intermediate images that are associated with build layers. If we want to see all images (including the intermediate images), use the -a switch.

docker images -a 
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
r-tidyverse   1         f6acc9d21e01   6 seconds ago   2.08GB
r             2         97b469aa03e2   7 minutes ago   289MB
r             1         d9c42516c104   7 minutes ago   289MB
<none>        <none>    637720d5a978   7 minutes ago   121MB
minideb       latest    eb43f6f05a68   7 minutes ago   121MB

Finally, there are several ways to remove images and reclaim disk space. To remove all dangling images, along with any stopped containers, unused networks and the build cache:

docker system prune 

Adding the -a switch extends this to all images that are not associated with at least one container:

docker system prune -a 

To remove a specific image:

docker rmi <ID> 

where <ID> is the IMAGE ID from docker images -a.

To remove all images whose entries match a pattern:

docker images -a | grep <"pattern"> | awk '{print $3}' | xargs docker rmi

where <"pattern"> is a regular expression for isolating the name of the image(s) to remove.

Similarly, to force the removal of all dangling images:

docker rmi -f $(docker images -f "dangling=true" -q)

4 Generating docker images on github

Up to now, the docker images we have created have been housed in a registry locally on the machine that we are building them on. In the spirit of reproducibility, we ideally want these images to be available to others (as well as our future selves). There are numerous options for making images available:

  • each user (or your future self) can build the image from the Dockerfile. In many cases, this will be a lengthy (yet automated) process. For this to be a viable option, each party will need to have the ability and resources to be able to build docker images from a Dockerfile.
  • the docker image could be hosted on a remote repository (such as dockerhub). This option requires that the original author of the docker image has a dockerhub account so that they can push the image that they created locally up to the remote registry.
  • build and host the docker image on github. This option requires the original author/authors to have a github account.

Building Docker images on GitHub via GitHub Actions is useful because it enables automated, consistent, and reproducible builds directly from your remote repository. As such, it is possible to trigger a fresh image build every time there is a change to the repo itself. This reduces manual effort and minimises deployment errors. With GitHub Actions, you can integrate CI/CD workflows, automatically test images, and push them to container registries (e.g., GitHub Container Registry or Docker Hub). It also enhances collaboration, as team members can track build logs, detect failures early, and enforce security best practices through automated vulnerability scanning. Finally, it also means that the one platform can be used to host both the code and the environment in which to run the code.

To have GitHub Actions build and publish a docker image, we start by creating a GitHub Actions workflow file. This file is in YAML format and should be placed in a directory called .github/workflows within your git repository.

name: Create and publish the Docker image

on:
 workflow_dispatch:
 push:
   branches: [ "main" ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-image:
    runs-on: ubuntu-latest

    if: "!contains(github.event.head_commit.message, [ci skip])"

    permissions:
      contents: read
      packages: write

    name: ${{ matrix.config.r }}

    strategy:
      fail-fast: false
      matrix:
        config:
        #- { r: devel }
        #- { r: next  }
        - { r: 4.4.1 }

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.repository_owner }}  #
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

In the above workflow file:

  • name gives a name to the workflow and is how it will be referred to within the Actions user interface

  • on: workflow_dispatch: specifies that this workflow can be run manually from the GitHub Actions UI.

  • on: push: branches: [ “main” ]: specifies that this workflow will be triggered whenever a change is pushed to the “main” branch

  • env: REGISTRY: sets the container registry to GitHub Container Registry (ghcr.io).

  • env: IMAGE_NAME: sets the repository name (in this case to the name of the github repository).

  • jobs: build-image: runs-on: defines a job (build-image) that runs on the latest version of Ubuntu

  • if: provides a way of defining exclusion rules. In this case, it prevents execution if the latest commit message contains “[ci skip]”. This is useful for avoiding unnecessary builds.

  • permissions: contents: read: allows the workflow to read the repository contents.

  • packages: write: grants permission to push Docker images to Github Container Registry.

  • strategy: fail-fast: false ensures that if one job in the build matrix fails, the remaining jobs are not automatically cancelled.

  • strategy: matrix: config: defines a build matrix for testing against multiple R versions (only 4.4.1 is active). Commented-out lines suggest that devel and next versions were once considered.

  • steps: the actions to perform:

    • uses: actions/checkout@v4 clones the repository so that the workflow can access files like the Dockerfile and any other code in the repository. It uses version 4 of this action.

    • uses: docker/login-action@v1 uses GitHub’s built-in token (GITHUB_TOKEN) to authenticate with the GitHub Container Registry. This ensures that pushing images to the GitHub Container Registry is authorised.

    • uses: docker/metadata-action@… generates metadata (tags, labels) based on the repository and commit details. Pinning to a specific commit hash (in this case 9ec57ed1…) ensures a fixed version of docker/metadata-action is used.

    • uses: docker/build-push-action@v5 builds the Docker image from the repository and pushes it to the GitHub Container Registry with the metadata-generated tags and labels.

Now any time there is a change to the main branch, the image will be rebuilt and published to Github Container Registry.

To access the built image:

  • navigate to the Code panel of the repository’s GitHub page.

  • click on the item under the heading Packages.

The displayed panel will provide information on the versions of the package available as well as instructions on how to obtain (pull) the image to a local repository from where it can be run.
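For example (substituting your own GitHub username and repository name for the hypothetical username/repo), pulling and running the published image would look something like:

docker pull ghcr.io/username/repo:main
docker run --rm ghcr.io/username/repo:main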