Reproducible research - Containers

30 July, 2023

By: Murray Logan

Background information

In the previous tutorial, we discussed why it is important that your codebase is under version control and backed up, to help ensure that your analyses can be evaluated and replicated by you and others, both now and into the future. However, having access to the code (and data) does not always guarantee full reproducibility - this can also be affected by the exact software environment in which the code is run.

In the context of statistical analyses performed in R, for example, R as well as the various packages that you have elected to use in support of your analyses can (and do) evolve over time. Some functions get modified and some even get deprecated. Hence, over time, code that once worked perfectly (or at least adequately) can become broken.

Early solutions to this facet of reproducibility focused on virtual machines. Virtual machines (VMs) build entire software environments on top of a software layer that mimics a physical computer, such that each VM running on a host computer is a completely separate, self-contained entity. Whilst VMs do permit great flexibility (as virtually any operating system can be installed on a VM), they are considerably slower and less efficient than physical machines. Moreover, it is typically necessary to allocate a fixed amount of computer resources (particularly CPU) to the VM in advance.

More modern solutions focus instead on containers. In contrast to VMs, containers do not mimic a physical computer; rather, they only virtualise layers on top of the host operating system. Indeed, containers share (read only) the host OS kernel and binaries/libraries, and thus containers and the applications contained therein can be very “light” and are typically almost as performant as applications run natively on the host.

Time for some container terminology:

  • Container image is a static (unchangeable) file (or collection of files) that bundles code and all its dependencies (such as the necessary system libraries, code, runtime and system tools). Essentially, the image has all the information required to reproduce a software environment on any compatible machine. However, an image is just a snapshot that serves as a template from which to build a container. In other words, a container is a running image and cannot exist without an image, whereas an image can exist without a container.

  • Container is a standard (Linux) process whose software environment is defined by the contents of a container image and that runs on top of the host’s OS.
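
For example (a minimal sketch using the same public image that features later in this tutorial), an image is pulled once and can then be used to launch any number of independent containers:

docker pull bitnami/minideb:stretch                        ## fetch the (static) image
docker run --rm bitnami/minideb:stretch echo "container 1" ## a container: one running instance of that image
docker run --rm bitnami/minideb:stretch echo "container 2" ## another, completely independent container from the same image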

Preparations

If you intend to follow along with this tutorial, you may like to:

  • create a new folder (hereafter referred to as the sandpit folder) in which to create some files. On my local machine, I have a folder (tmp) in my home folder into which I will place a folder (called docker_tests) for this very purpose. On Linux and MacOSX that would be achieved via the following:

    mkdir -p ~/tmp/docker_tests


  • install Docker

  • install apptainer/singularity (if you intend to follow along with containers on a HPC). A quick check of both installs is sketched below.
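
Assuming both tools were installed via your platform's standard installers, a quick check that the command line tools are available:

    docker --version       ## prints the installed Docker version
    apptainer --version    ## or: singularity --version on systems that still ship singularity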

Docker

Currently, the most popular container engine in use is Docker. Docker is easy to install on most operating systems and comes with tools to build, manage, run and distribute container images (the latter of which is supported via the DockerHub container ecosystem).

Simple overview

  1. Create a Docker definition file.

    The Dockerfile contains a set of instructions that Docker uses to build your container image with the correct specifications. For now, you do not need to know all the bits and pieces here (though please see the official Dockerfile reference for a more in-depth understanding of what a Dockerfile is capable of).

    Let's start with a very simple Dockerfile (which should be a plain text file located in the root of a project directory). This first example will be very minimal, and much of the rest of the Docker section of this tutorial will then progressively build on it to introduce more and more complexity.

    The image we will build will start with a very minimal Debian Linux base called minideb, sourced from DockerHub. This provides a fully functioning Linux operating system complete with the typical terminal power tools.

    We will then extend the image by updating the package lists (the locations of the package repositories) before adding (installing) a small, fun terminal application (cowsay) that generates ASCII art of a cow (or other animals) along with a speech bubble.

    Create the following file (named Dockerfile) in your sandpit folder:

    FROM bitnami/minideb:stretch
    LABEL maintainer="Author"
    LABEL email="author_email@email.com"
    
    ## Install the os packages
    RUN apt-get update \
      && apt-get install -y --no-install-recommends \
        cowsay \
      && rm -rf /var/lib/apt/lists/*

    In the above Dockerfile:

    • the first three lines contain information on the base image on which to construct your docker container image (in this case the bitnami/minideb:stretch container freely provided by the bitnami team), as well as information about yourself. minideb is a minimal Debian operating system.

    • the FROM command points to a parent image. Typically, this will point to a specific image within a registry on Docker hub. This generates the base layer of the container image.

    • the LABEL command is used to add metadata to an image. In this case, there are two entries to specify information about the maintainer and their contact details. Note, entries must be key-value pairs and values containing spaces must be enclosed in double quotes.

    • the RUN command runs shell commands in a new layer on top of the current image and commits the result. Each RUN generates a new layer. The above example first updates the package lists and then installs an additional package (cowsay).

  2. Build the docker image

    In a terminal in the same location as the Dockerfile (i.e. your sandpit folder), enter something like the following:

    docker build . --tag minideb


    where:

    • . indicates the path to use as the build context (in this case the current working directory, since . means the current location). Any files within this path can be copied into the image or otherwise used during the build (for example, a Dockerfile).

    • --tag minideb provides a name (and optionally, a tag) for the image (in this case minideb). The name (and tag) can be anything, yet should be descriptive enough to help you distinguish this container image from other container images that you might construct on your system.

    This will build a series of docker container images (each layer built upon the layer before) in a local registry.

    More details

    Usage: docker build [OPTIONS] PATH | URL | -
    Common options:

    Name          Description                                    Default
    --file, -f    Path and name of the Dockerfile                Dockerfile
    --no-cache    Do not use the cache when building the image
    --tag, -t     Name (and optional tag) in name:tag format

    Typical files in context:

    • Dockerfile: a build recipe file
    • .dockerignore: similar to a .gitignore, this file lists files to be ignored in collating the files that form the build context.

    As an alternative to providing build instructions in the form of a local Dockerfile, build can accept a URL pointing to a remote Git repository (or tarball) that contains the build context, including a Dockerfile.
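
    For example (with hypothetical file and tag names), several of these options can be combined in a single build:

    docker build /path/to/project -f /path/to/project/Dockerfile.dev --no-cache -t myimage:dev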

    More info:
    https://docs.docker.com/engine/reference/commandline/build/

  3. Check that the image(s) have been created

    docker image ls  
    REPOSITORY        TAG       IMAGE ID       CREATED              SIZE
    minideb           latest    b2ea7ad941e4   About a minute ago   93.2MB
    bitnami/minideb   stretch   e398a222dbd6   4 months ago         53.8MB

    Note, the above simply lists all named images in your local registry. To get a more complete list of all images:

    docker images -a  
    REPOSITORY        TAG       IMAGE ID       CREATED              SIZE
    minideb           latest    b2ea7ad941e4   About a minute ago   93.2MB
    <none>            <none>    14019f112bdf   About a minute ago   53.8MB
    <none>            <none>    9f1463db0880   About a minute ago   53.8MB
    bitnami/minideb   stretch   e398a222dbd6   4 months ago         53.8MB

    The -a switch indicates all images (including unnamed, intermediate and dangling images).

    This list appears chronologically from bottom to top. Hence, the DockerHub image (bitnami/minideb) appears at the bottom of this list and above it there is a succession of intermediate images that correspond to each of the layers defined in the Dockerfile. Note, each successive image is at least as large as the one beneath it, since each layer incorporates the layers below. At the top of the list is the full container image (with the latest tag).
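
    To inspect the individual layers that make up a particular image, the docker history command can be used (the exact output will depend on your build):

    docker history minideb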

    Importantly, while we have built a container image, we do not yet have any running containers. We can demonstrate this by listing all existing containers:

    docker ps -a  
    CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

    The output is empty (assuming you have not previously generated any containers), indicating that there are currently no containers at all (running or stopped).

  4. Test the docker image (fire up an ephemeral container)

    We will now test the image by generating and running a container from our container image. Once the container has started, the cowsay terminal application will display an ASCII cow saying “Moo” before quietly terminating, after which the container will be automatically stopped and removed.

    docker run --entrypoint /usr/games/cowsay --rm minideb Moo
     _____
    < Moo >
     -----
            \   ^__^
             \  (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

    where:

    • --entrypoint /usr/games/cowsay defines the base command that will be run once the container has started. In this case, it specifies the full path to the cowsay executable file.

    • --rm indicates that the container should be removed after it has finished running.

    • minideb is the name of our docker container image

    • Moo is the string passed on to cowsay to display in the speech bubble. Feel free to experiment with other strings here.

    To further appreciate the way arguments are passed on to applications within a container, let's alter the cowsay animal.

    docker run --entrypoint /usr/games/cowsay --rm minideb -f /usr/share/cowsay/cows/koala.cow Grunt
     _______
    < Grunt >
     -------
      \
       \
           ___  
         {~._.~}
          ( Y )
         ()~*~()   
         (_)-(_)   

    In the above example, we passed -f /usr/share/cowsay/cows/koala.cow Grunt on to cowsay. In this context, -f points to where an alternative animal definition file is located (here, the koala).
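
    To see which animal definition files are available inside the container, the entrypoint can be overridden to simply list that folder (the exact contents will depend on the installed cowsay version):

    docker run --entrypoint ls --rm minideb /usr/share/cowsay/cows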

  5. Test the docker image interactively

    Rather than firing up a container, running a command and then immediately terminating, it is possible to run a container in interactive mode. In this mode, after the container starts up, you will be placed in a terminal where you can issue any available commands you like. Once you have finished the interactive session, simply enter exit and the container will terminate.

    Try the following (from the host):

    docker run --rm -it minideb   

    Once the prompt appears (you are now inside the container), try entering the following:

    • list the files and folders in the current working directory
    ls -la
    • run the cowsay application
    /usr/games/cowsay Moo
    • exit the container
    exit

    Running in interactive mode is very useful when developing/debugging code on a container.
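
    Note, if an image defines an ENTRYPOINT (as the images built later in this tutorial do), starting an interactive shell requires overriding that entrypoint, for example:

    docker run --rm -it --entrypoint /bin/bash minideb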

Some Dockerfile goodness

Incorporate default commands

Let's now step this up a bit and add some more information to the build recipe. Rather than alter the previous Dockerfile, we will instead make a different file (Dockerfile2) and instruct docker to build using this alternative Dockerfile.

FROM bitnami/minideb:stretch
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    cowsay \
  && rm -rf /var/lib/apt/lists/*

## Default command to run
ENTRYPOINT ["/usr/games/cowsay","-f","/usr/share/cowsay/cows/koala.cow"]

## Default extra parameters passed to the command
CMD ["Grunt"]

In the above Dockerfile:

  • ENTRYPOINT provides a default command to run within the container. This specification is in JSON (exec) format.

  • CMD provides default extra parameters that are passed on to the ENTRYPOINT command (also in JSON format). These can be overridden by passing alternatives when running the docker run command (see below). Together, ENTRYPOINT and CMD define the full default command, as illustrated next.
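
Putting the two together, when this container is started without any extra arguments, it effectively runs:

/usr/games/cowsay -f /usr/share/cowsay/cows/koala.cow Grunt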

We can now build our container image using this Dockerfile2:

docker build . --tag minideb -f Dockerfile2 

If we again review the list of images, we see that there are now two additional intermediate images and the latest image has been updated.

docker images -a   
REPOSITORY        TAG       IMAGE ID       CREATED          SIZE
minideb           latest    33e1c6c01fa8   37 seconds ago   93.2MB
<none>            <none>    476f3b4af612   38 seconds ago   93.2MB
<none>            <none>    b2ea7ad941e4   6 minutes ago    93.2MB
<none>            <none>    14019f112bdf   7 minutes ago    53.8MB
<none>            <none>    9f1463db0880   7 minutes ago    53.8MB
bitnami/minideb   stretch   e398a222dbd6   4 months ago     53.8MB

We can now run the container as:

docker run --rm minideb     
 _______
< Grunt >
 -------
  \
   \
       ___  
     {~._.~}
      ( Y )
     ()~*~()   
     (_)-(_)   

To override the default arguments (CMD) we baked into the docker image, we can issue an alternative as a command line argument.

docker run --rm minideb Meow     
 ______
< Meow >
 ------
  \
   \
       ___  
     {~._.~}
      ( Y )
     ()~*~()   
     (_)-(_)   

Copy and mountpoints

So far, we have used the docker container to display an ASCII art cow (or koala) in the terminal and then exit. Whilst this might have some utility as a simple example of interacting with containers, it hardly represents typical work.

In the context of reproducible research, containers are useful for providing a consistent environment in which to run code. Thus in order to be useful, a container should have:

  1. access to (or a copy of) the code within the container

  2. the ability to store the results on the host where they can be viewed and disseminated.

To illustrate these, we will add the R Statistical and Graphical Environment to our container image and use this in two further examples.

Copy

For the first example, we will add instructions to the Dockerfile to copy a small R script into the container so that the code can be run within the container environment. Let's create two files:

  • a very simple R script (analysis.R) that creates a data frame (dat) with a single variable of 10 random numbers and prints this data frame to screen as well as exporting it to a flat comma separated file (dat.csv)

    dat <- data.frame(y = rnorm(10))
    dat
    write.csv(dat, file = "dat.csv", row.names = FALSE)
  • a Dockerfile called Dockerfile3

    FROM bitnami/minideb:stretch
    LABEL maintainer="Author"
    LABEL email="author_email@email.com"
    
    ## Install the os packages
    RUN apt-get update \
      && apt-get install -y --no-install-recommends \
        r-base \
      && rm -rf /var/lib/apt/lists/*
    
    COPY analysis.R /root/
    WORKDIR /root/
    
    ## Default command to run
    ENTRYPOINT ["Rscript"]
    
    ## Default command parameters
    CMD ["analysis.R"]

This Dockerfile includes instructions to:

- install R (`r-base`)

- copy our R script from the current working directory to the home
  folder of the container's root user (`COPY analysis.R /root/`)

- set the working directory within the container to be that home
  folder (`WORKDIR /root/`)

- specify that once the container has started, the `Rscript`
  command should be run (`ENTRYPOINT ["Rscript"]`)

- specify that, by default, the `Rscript` command should be run
  on the `analysis.R` script
  (`CMD ["analysis.R"]`)

Great. This time when we build the container image, we will provide both a name and tag for the image (via --tag r:1). This will result in an image called r with a tag of 1.

docker build . --tag r:1 -f Dockerfile3


When we run this new image, we see that a data frame of 10 values is returned to the terminal.

docker run --rm r:1    
            y
1  -0.3851022
2  -0.2490779
3   0.3781064
4  -0.9018914
5  -1.0052765
6  -0.5863569
7   0.7872411
8  -2.3796989
9  -0.3764355
10 -0.3175351

Mount points

R did indeed run the analysis.R script inside the container. However, what happened to the file containing the exported data (dat.csv)? Although this file was created inside the container, it was lost when the container terminated and was removed. Obviously, that is not very useful.

For the second example, rather than copy the R script to the container, we will instead mount a local folder to a point within the container. That way we can access select host files and folders within the container, thereby enabling us to both read the R script directly and write out any output files.

To support this, we will create another Dockerfile (Dockerfile4).

The changes from the previous Dockerfile:

  • remove the COPY statement, as we no longer need to work on a copy of the R code; we can work on it directly

  • change the container working directory to /home/Project - Note, this path does not yet exist within the container and will be created.

FROM bitnami/minideb:stretch
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

## Default command parameters
CMD ["analysis.R"]

We will now build the container image with the name:tag of r:2

docker build . --tag r:2 -f Dockerfile4 


This time, when we run the container image, we will indicate a volume to mount: that is, which host folder to mount (map) and the path to mount it to within the container.

cd ~/tmp/docker_tests/
docker run --rm -v `pwd`:/home/Project r:2
             y
1   0.42805810
2  -0.30214526
3   0.33497680
4  -0.86393204
5   0.08471228
6   0.51800779
7  -0.25645220
8  -0.43606580
9  -0.42548406
10  1.17175564

where:

  • -v `pwd`:/home/Project defines the mounting of the current working directory on the host machine to the /home/Project folder (which the Dockerfile defined as the working directory) inside the container

If we list the contents of our local folder, we see that the output file (dat.csv) has been created on the host filesystem.

cd ~/tmp/docker_tests/
ls -la  
total 32
drwxr-xr-x  2 murray murray 4096 Nov 16 10:37 .
drwxr-xr-x 15 murray murray 4096 Nov 16 10:37 ..
-rw-r--r--  1 murray murray   89 Nov 16 10:37 analysis.R
-rw-r--r--  1 root   root    188 Nov 16 10:37 dat.csv
-rw-r--r--  1 murray murray  237 Nov 16 10:37 Dockerfile
-rw-r--r--  1 murray murray  402 Nov 16 10:37 Dockerfile2
-rw-r--r--  1 murray murray  368 Nov 16 10:37 Dockerfile3
-rw-r--r--  1 murray murray  360 Nov 16 10:37 Dockerfile4
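
Note that dat.csv is owned by root. This is because, by default, processes inside the container run as the root user. If this becomes a nuisance, one option (a sketch; the specifics of user mapping can vary between systems) is to run the container as your host user, so that any files written to the mounted folder are owned by you:

docker run --rm --user "$(id -u):$(id -g)" -v `pwd`:/home/Project r:2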

Of course, it is possible to define a combination of the above two examples - one in which a copy of the codebase is packaged up into the container image (to ensure that a specific codebase is always applied), yet a mountpoint is also specified at run time to enable the output(s) to be obtained by the host.
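
As a sketch of such a hybrid (a hypothetical Dockerfile, not built elsewhere in this tutorial), the script could be baked into the image at a fixed location, while the working directory remains a run-time mount point for the outputs:

FROM bitnami/minideb:stretch
LABEL maintainer="Author"
LABEL email="author_email@email.com"

## Install the os packages
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
    r-base \
  && rm -rf /var/lib/apt/lists/*

## Bake a fixed copy of the codebase into the image
COPY analysis.R /home/analysis.R

## Outputs are written to the working directory, which is mounted at run time
WORKDIR /home/Project

## Always run the baked-in script by default
ENTRYPOINT ["Rscript"]
CMD ["/home/analysis.R"]

Running the resulting image with -v `pwd`:/home/Project would then always execute the packaged copy of analysis.R, yet dat.csv would still appear in the mounted host folder.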

.dockerignore

Similar to a .gitignore, a .dockerignore file (placed alongside the Dockerfile) lists the files and folders that should be excluded from the build context. This can substantially reduce build times and prevents large or irrelevant files from being inadvertently sent to the docker daemon (and potentially copied into an image).
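
As a minimal sketch (assuming we simply wish to exclude version control metadata and generated outputs from the build context), a .dockerignore placed alongside the Dockerfile might contain:

## exclude version control metadata and generated outputs from the build context
.git
dat.csv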

Building a reproducible R environment

Just as it might be important to be able to recreate the state of an operating system and software from a previous time (to maximise the potential for reproducibility), it might be equally important to ensure that the entire environment reflects this state back in time. In the case of R software, this means that all included packages should be the same versions that were available at that previous time.

Microsoft provided daily snapshots of CRAN (MRAN). We can therefore nominate a date when providing package installation instructions in a Dockerfile. Note, MRAN has now been retired; the Posit Public Package Manager provides equivalent date-based CRAN snapshots, and the same approach applies.

For this example, we will start from an image that already has a version of R (4.2.2) built in (rocker/r-ver:4.2.2). Although this image is substantially larger than the mini debian we used earlier, it does come with all the build tools and R, each of which would otherwise require additional downloads and compilation. The net result is that the rocker/r-ver:4.2.2 image requires less overall download traffic than the individual parts.

To support this, we will create yet another Dockerfile (Dockerfile5). The changes from the previous Dockerfile:

  • switch the base image to rocker/r-ver:4.2.2
  • add numerous -dev developer package dependencies
  • install the tidyverse collection of packages from a dated snapshot
FROM rocker/r-ver:4.2.2
LABEL maintainer="Author"
LABEL email="author_email@email.com"

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  libxml2-dev \
  libcurl4-openssl-dev  \
  libssl-dev \
  zlib1g-dev \
  && rm -rf /var/lib/apt/lists/*

## Install R package versions from MRAN (based on a date - YYYY-MM-DD)
RUN R -e "options(repos = \
    list(CRAN = \"http://mran.revolutionanalytics.com/snapshot/2022-10-04/\")); \
  install.packages(\"tidyverse\"); \
"

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

CMD ["analysis.R"]

In the above Dockerfile

  • the first three lines contain information on the base image on which to construct your docker container image (in this case the rocker/r-ver:4.2.2 container freely provided by the rocker team), as well as information about yourself.

  • the FROM command points to a parent image. Typically, this will point to a specific image within a registry on Docker hub. This generates the base layer of the container image.

  • the LABEL command is used to add metadata to an image. In this case, there are two entries to specify information about the maintainer and their contact details. Note, entries must be key-value pairs and values containing spaces must be enclosed in double quotes.

  • the RUN command runs shell commands in a new layer on top of the current image and commits the result. Each RUN generates a new layer. In the above example, there are two RUN commands.

    • the first RUN command updates the package lists of the underlying Ubuntu Linux distribution and installs the necessary system (-dev) dependencies
    • the second RUN installs the R package tidyverse (which is technically a large collection of packages) from the MRAN snapshot repository (based on the 4th of October 2022). This repository stores daily snapshots of all the packages on CRAN and thus allows us to obtain the set of packages in the state they existed on a nominated day.
  • the WORKDIR command sets the working directory for software within the container. In this case, we are creating a dedicated directory (/home/Project)

  • the ENTRYPOINT command defines the default command/application to run within the container (specified in JSON format). In this case, we are indicating that by default the container should run the Rscript application. The Rscript application runs a non-interactive R session on a nominated R script file. That is, it will run the nominated R script and then terminate on completion.

  • the CMD command defines the default arguments to provide to the ENTRYPOINT command. In this case, we have indicated which R script to run Rscript on.

Note, many of the large collection of R packages targeted for installation as part of the tidyverse ecosystem (or its dependencies) require compilation from source. Whilst this does help ensure that the underlying package routines are optimised for your system, the entire install process may take up to an hour. The installation can be sped up substantially by instead installing a pre-bundled version of the tidyverse packages (and dependencies) directly from the Ubuntu repositories. The associated alternative Dockerfile is provided in the following section.

Alternative Dockerfile

As an alternative, we could instead install tidyverse from the ubuntu r-cran repository. This install will be far faster, yet likely not as up-to-date and we would have a little less control over exactly which version we were installing…

FROM rocker/r-ver:4.2.2
LABEL maintainer="Author"
LABEL email="author_email@email.com"

RUN apt-get update \
  && apt-get install -y --no-install-recommends \
  r-cran-tidyverse \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /home/Project

## Default command to run
ENTRYPOINT ["Rscript"]

CMD ["analysis.R"]


We will now build the container image with the name:tag of r-tidyverse:1. Note, this will take some time to complete.

docker build . --tag r-tidyverse:1 -f Dockerfile5 


If we now run docker with this image, the resulting container will automatically run a non-interactive session using the analysis.R script.

cd ~/tmp/docker_tests/  
docker run --rm -v `pwd`:/home/Project r-tidyverse:1
             y
1  -0.03299773
2   0.23110954
3   0.32872076
4  -0.56113837
5  -0.31564902
6  -0.52932569
7   1.26809050
8   1.23017409
9  -0.36468730
10  0.71708974

where:

  • -v `pwd`:/home/Project defines the mounting of the current working directory on the host machine to the /home/Project folder (which the Dockerfile defined as the working directory) inside the container

Conveniently, we can override the CMD arguments when we issue docker run. In the current context, perhaps we would like to run the R session on a different R script. Let's try this by creating a new R script (analysis5.R) - this time, one that makes use of the tidyverse ecosystem.

library(tidyverse)
dat <- data.frame(y = rnorm(10))
dat
dat %>% summarise(Mean = mean(y), Median = median(y))

Now we can nominate this alternative R script as the argument to the Rscript command.

cd ~/tmp/docker_tests/  
docker run --rm -v `pwd`:/home/Project r-tidyverse:1 analysis5.R
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
             y
1   2.04331755
2  -0.37664693
3   0.02304288
4   1.77733814
5   0.17401634
6   0.69420006
7   0.11843890
8   0.48042439
9   0.26512189
10 -0.99474205
       Mean    Median
1 0.4204511 0.2195691

Managing Docker images

Listing images

To list the most recently created images.

docker images
REPOSITORY        TAG        IMAGE ID       CREATED             SIZE
r                 2          10ee994b3660   About an hour ago   201MB
r                 1          38cf3f496c44   About an hour ago   201MB
r-tidyverse       1          69d98c454241   22 hours ago        1.07GB
<none>            <none>     71827ee61f12   22 hours ago        1.07GB
inshore_fish      latest     0891ca908201   3 days ago          2.59GB
<none>            <none>     5a939ddba56a   5 days ago          201MB
<none>            <none>     54dc36336fe1   5 days ago          201MB
minideb           latest     2f52de2f9477   5 days ago          93.2MB
rocker/r-ver      4.2.2      3d5974176650   12 days ago         836MB
ubuntu            kinetic    8a7f92156625   13 days ago         70.2MB
bitnami/minideb   bullseye   ee9558d6f35f   2 weeks ago         78.5MB
bitnami/minideb   stretch    e398a222dbd6   5 months ago        53.8MB

The output will include all created images along with the base images that they are derived from (those sourced from Dockerhub for example).

The image (REPOSITORY) entries that are <none> are dangling images. That is, they are images that were previously used in the building of an image and are no longer referenced (because the Dockerfile layer that they were associated with is no longer used in the latest version of that Dockerfile).

We can exclude them from the output, by defining a filter that excludes dangling images.

docker images -f "dangling=false"
REPOSITORY        TAG        IMAGE ID       CREATED        SIZE
r                 2          10ee994b3660   2 hours ago    201MB
r                 1          38cf3f496c44   2 hours ago    201MB
r-tidyverse       1          69d98c454241   22 hours ago   1.07GB
inshore_fish      latest     0891ca908201   3 days ago     2.59GB
minideb           latest     2f52de2f9477   5 days ago     93.2MB
rocker/r-ver      4.2.2      3d5974176650   12 days ago    836MB
ubuntu            kinetic    8a7f92156625   13 days ago    70.2MB
bitnami/minideb   bullseye   ee9558d6f35f   2 weeks ago    78.5MB
bitnami/minideb   stretch    e398a222dbd6   5 months ago   53.8MB

If instead of dangling=false, you indicate dangling=true, only dangling images are displayed. This can be useful for identifying redundant images.

Another useful filter predicates on creation time relative to another image. For example, to display all images that were created since the minideb image was built:

docker images -f "since=minideb" 
REPOSITORY     TAG       IMAGE ID       CREATED        SIZE
r              2         10ee994b3660   2 hours ago    201MB
r              1         38cf3f496c44   2 hours ago    201MB
r-tidyverse    1         69d98c454241   22 hours ago   1.07GB
<none>         <none>    71827ee61f12   23 hours ago   1.07GB
inshore_fish   latest    0891ca908201   3 days ago     2.59GB
<none>         <none>    5a939ddba56a   5 days ago     201MB
<none>         <none>    54dc36336fe1   5 days ago     201MB

There is also a before filter.
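
For example, to display only those images that were created before the minideb image was built:

docker images -f "before=minideb"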

Note, the above examples will exclude the intermediate images that are associated with build layers. If we want to see all images (including the intermediate images), use the -a switch.

docker images -a 
REPOSITORY        TAG        IMAGE ID       CREATED             SIZE
r                 2          10ee994b3660   About an hour ago   201MB
<none>            <none>     7a0639444520   About an hour ago   201MB
<none>            <none>     4e79a4fbf92d   2 hours ago         201MB
r                 1          38cf3f496c44   2 hours ago         201MB
r-tidyverse       1          69d98c454241   22 hours ago        1.07GB
<none>            <none>     dfe6c963499c   22 hours ago        1.07GB
<none>            <none>     cd9a422f24f1   22 hours ago        1.07GB
<none>            <none>     ce403df92dda   22 hours ago        1.07GB
<none>            <none>     42579d0c2042   22 hours ago        902MB
<none>            <none>     71827ee61f12   23 hours ago        1.07GB
<none>            <none>     afc71049442e   23 hours ago        1.07GB
<none>            <none>     723ae7277837   23 hours ago        1.07GB
<none>            <none>     53d9368a3d58   23 hours ago        902MB
<none>            <none>     0f915b4184c9   24 hours ago        836MB
<none>            <none>     916c8c40dff1   24 hours ago        836MB
inshore_fish      latest     0891ca908201   3 days ago          2.59GB
<none>            <none>     6a2e0cfe62a8   3 days ago          2.59GB
<none>            <none>     31fe35ef888a   3 days ago          2.59GB
<none>            <none>     48e85e796e21   3 days ago          2.45GB
<none>            <none>     92e9b53888c8   3 days ago          1.97GB
<none>            <none>     5a939ddba56a   5 days ago          201MB
<none>            <none>     bda1926a47a0   5 days ago          201MB
<none>            <none>     d61417495b96   5 days ago          201MB
<none>            <none>     54dc36336fe1   5 days ago          201MB
<none>            <none>     4f84efc23486   5 days ago          201MB
<none>            <none>     55bb865743a9   5 days ago          201MB
<none>            <none>     d765c3364ab4   5 days ago          53.8MB
<none>            <none>     5fb74ce18e00   5 days ago          53.8MB
minideb           latest     2f52de2f9477   5 days ago          93.2MB
<none>            <none>     0ccecc4d5813   5 days ago          93.2MB
<none>            <none>     03fd0247b217   5 days ago          93.2MB
<none>            <none>     4ae8c7d92496   5 days ago          53.8MB
<none>            <none>     ec21211b907e   5 days ago          53.8MB
rocker/r-ver      4.2.2      3d5974176650   12 days ago         836MB
ubuntu            kinetic    8a7f92156625   13 days ago         70.2MB
bitnami/minideb   bullseye   ee9558d6f35f   2 weeks ago         78.5MB
bitnami/minideb   stretch    e398a222dbd6   5 months ago        53.8MB

Removing images

Removing dangling images and other unused data

As well as dangling images, this will remove stopped containers, unused networks and the build cache (add --volumes to also remove unused volumes).

docker system prune 

Removing all unused images

docker system prune -a

This extends the prune to every image not used by at least one container (not just dangling ones).

Removing specific images

docker rmi <ID> 

where <ID> is the IMAGE ID from docker images -a

Removing images (regex)

docker images -a | grep <"pattern"> | awk '{print $3}' | xargs docker rmi

where <"pattern"> is a regular expression for isolating the name of the image(s) to remove.
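
For example (a hypothetical clean-up; review the matching images before running it), to remove all images whose repository name contains minideb:

docker images -a | grep "minideb" | awk '{print $3}' | xargs docker rmi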

Removing dangling images

docker rmi -f $(docker images -f "dangling=true" -q)

Singularity

To create a Singularity image (.sif) from a local docker image:

  1. save a local copy of the docker image (here called test) as an archive > docker save test -o test.tar
  2. build the singularity image from this archive > singularity build test.sif docker-archive://test.tar
  3. test the singularity image (binding a host folder into the container) > singularity exec -B ~/Work:/home/mlogan test.sif R
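
As a sketch (assuming the r:2 image built earlier in this tutorial were archived and converted in the same way, yielding a hypothetical r2.sif), the earlier mount point example translates to:

docker save r:2 -o r2.tar
singularity build r2.sif docker-archive://r2.tar
singularity exec -B "$(pwd)":/home/Project r2.sif Rscript analysis.R

Note, singularity exec does not apply the docker ENTRYPOINT, so the command to run (Rscript analysis.R) is given explicitly.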

scp the singularity image to the HPC and run it there (the following reflects my Emacs/ESS based workflow)

  1. scp the singularity image to the HPC > scp test.sif ':~/Work/AIMS/Consultations/Carly Randall'
  2. remote into the HPC (via an Emacs TRAMP path) > [[/scp::/export/home/l-p/mlogan/Work/AIMS/Consultations/Carly Randall]]
  3. open a second shell (within Emacs) > C-u M-x shell
  4. load the singularity module and run the image > module load singularity/singularity.module > singularity exec -B '/export/home/l-p/mlogan/Work/AIMS/Consultations/Carly Randall':/home/mlogan test.sif R
  5. within the newly created R terminal, declare an ESS remote R session > M-x ess-remote
  6. open a script on the remote
  7. associate an R process with this script > M-x ess-switch-process

To clean up

remove stopped containers (along with unused images, networks and the build cache)

docker system prune -a

remove unused volumes

docker volume prune