Latest revision as of 10:51, 11 May 2023

Introduction

To date, only a few container runtime environments integrate well with HPC environments, due to security concerns and differing assumptions in some areas.

For example, native Docker environments require elevated privileges, which is not an option on shared HPC resources. Docker's "rootless mode" is currently not supported on our HPC systems either, because it lacks necessary features such as cgroups resource controls, security profiles and overlay networks; furthermore, GPU passthrough is difficult. The required subuid (newuidmap) and subgid (newgidmap) settings may also introduce security issues.

On bwUniCluster the container runtimes Enroot and Singularity/Apptainer are supported.

Further rootless container runtime environments (Podman, …) might be supported in the future, depending on how support for e.g. network interconnects, security features and HPC file systems develops.

ENROOT

Enroot enables you to run Docker containers on HPC systems. It is developed by NVIDIA and is the recommended tool for using containers on bwUniCluster: it integrates well with GPU usage and has basically no impact on performance. Enroot is available to all users by default.

Usage

Excellent documentation is provided on NVIDIA's GitHub page. The documentation here therefore confines itself to simple examples that introduce the essential functionality.

Using Docker containers with Enroot requires three steps:

Importing an image
Creating a container
Starting a container

Optionally containers can also be exported and transferred.

Importing a container image

enroot import docker://alpine
This pulls the latest alpine image from Docker Hub (the default registry). You will obtain the file alpine.sqsh.

enroot import docker://nvcr.io#nvidia/pytorch:21.04-py3
This pulls the pytorch image version 21.04-py3 from NVIDIA's NGC registry. Please note that the NGC registry does not always contain the "latest" tag and instead requires the specification of a dedicated version. You will obtain the file nvidia+pytorch+21.04-py3.sqsh.

enroot import docker://registry.scc.kit.edu#myProject/myImage:latest
This pulls your latest image from the KIT registry. Following the same naming scheme as in the previous example, you will obtain the file myProject+myImage+latest.sqsh.

Creating a container

Create a container named "nvidia+pytorch+21.04-py3" by unpacking the .sqsh-file.

enroot create --name nvidia+pytorch+21.04-py3 nvidia+pytorch+21.04-py3.sqsh

"Creating" a container means that the squashed container image is unpacked inside $ENROOT_DATA_PATH/. By default this variable points to $HOME/.local/share/enroot/.

Starting a container

Start the container nvidia+pytorch+21.04-py3 in read-write mode (--rw) and run bash inside the container.
enroot start --rw nvidia+pytorch+21.04-py3 bash

Start the container in read-write mode (--rw) and get root access (--root) inside the container.
enroot start --root --rw nvidia+pytorch+21.04-py3 bash
You can now install software with root privileges using the package manager of the containerized Linux distribution, e.g. with
apt-get install … , apk add …, yum install …, pacman -S …

Start the container and mount (-m) a local directory to /work inside the container.
enroot start -m <localDir>:/work --rw nvidia+pytorch+21.04-py3 bash

Start the container, mount a directory and start the application jupyter lab.
enroot start -m <localDir>:/work --rw nvidia+pytorch+21.04-py3 jupyter lab
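
The three steps (import, create, start) can be chained in a small script. The following is a minimal sketch, not an official workflow: the variable names IMAGE_URI, SQSH_FILE and CONTAINER and the helper function run are illustrative choices, and the command executed inside the container is only an example. By default the script merely prints the commands it would execute.

```shell
#!/bin/sh
# Illustrative names (not part of Enroot): adjust them to your image.
IMAGE_URI="docker://nvcr.io#nvidia/pytorch:21.04-py3"
SQSH_FILE="nvidia+pytorch+21.04-py3.sqsh"
CONTAINER="nvidia+pytorch+21.04-py3"

# Print each command by default; set DRY_RUN=0 to really execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: import the image, unless the .sqsh file already exists.
[ -f "$SQSH_FILE" ] || run enroot import --output "$SQSH_FILE" "$IMAGE_URI"

# Step 2: unpack the image into a container below $ENROOT_DATA_PATH.
run enroot create --name "$CONTAINER" "$SQSH_FILE"

# Step 3: start the container and run a single command in it.
run enroot start "$CONTAINER" cat /etc/os-release
```

Run the script with DRY_RUN=0 in the environment to execute the commands instead of printing them.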

Exporting and transferring containers

If you intend to use Docker images that you built elsewhere, e.g. on your local desktop, and transfer them to the cluster, there are several ways to do so:

enroot import --output myImage.sqsh dockerd://myImage
Import an image from the locally running Docker daemon. Copy the .sqsh-file to bwUniCluster and unpack it there with enroot create.

enroot export --output myImage.sqsh myImage
Export an existing Enroot container. Copy the .sqsh-file to bwUniCluster and unpack it there with enroot create.

enroot bundle --output myImage.run myImage.sqsh
Create a self-extracting bundle from a container image. Copy the .run-file to bwUniCluster. You can run the self-extracting image via ./myImage.run even if Enroot is not installed!
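
As a rough illustration of the first variant, the sketch below prints the sequence of commands for moving a locally built Docker image to the cluster. The function transfer_plan and its arguments are hypothetical helpers, the login host is the one used in the JupyterLab example further down, and the last step assumes that the transferred .sqsh-file is unpacked with enroot create as described in the section on creating a container.

```shell
#!/bin/sh
# Print the commands for moving a locally built Docker image to the cluster.
# transfer_plan and its arguments are illustrative, not part of Enroot.
transfer_plan() {
    image="$1"     # name of the image in the local Docker daemon
    account="$2"   # your account on the cluster
    host="uc2.scc.kit.edu"
    cat <<EOF
enroot import --output ${image}.sqsh dockerd://${image}
scp ${image}.sqsh ${account}@${host}:
ssh ${account}@${host} enroot create --name ${image} ${image}.sqsh
EOF
}

# Placeholder arguments: replace them with your image name and account.
transfer_plan myImage '<yourAccount>'
```

The first command runs on the machine with the Docker daemon, the second copies the file to your home directory on the cluster, and the third unpacks it there.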

Container management

You can list all containers on the system with the enroot list command; the --fancy parameter shows additional information.

Unpacked images can be removed with the enroot remove command.

SLURM Integration

Enroot allows you to run containerized applications non-interactively, including MPI and multi-node parallelism. The necessary Slurm integration is realized via the Pyxis plugin (https://github.com/NVIDIA/pyxis).

Create Container via enroot

enroot import docker://ubuntu
enroot create -n pyxis_ubuntu ubuntu.sqsh

The container name must carry the prefix pyxis_, otherwise the Pyxis plugin will not find the container.

Start via Slurm

Start an existing container:

salloc -p dev_single -t 00:10:00 --container-name=ubuntu --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch

Download and start a container directly via Pyxis:

salloc -p dev_single -t 00:10:00 --container-image=ubuntu --container-name=ubuntu --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch

In this case an Enroot container is created under ~/.local/share/enroot/.

Note: --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch is required for the plugin to work. The container name has to start with pyxis_ for the plugin to find it; with the second method this prefix is added automatically. When specifying the container name in your Slurm job, the pyxis_ prefix has to be omitted.

All options provided by Pyxis are listed by srun --help under "Options provided by plugins:".

Notable Options:

--container-mount-home Mounts the home directory into the container.
--container-writable Makes the container filesystem writable (otherwise only the mounted home is writable).
--container-remap-root Become root in the container. Allows installation of software, e.g. via apt (Ubuntu).
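
For non-interactive use, the same Pyxis options can be passed to srun inside a batch script. The sketch below writes such a script to a file; it assumes that the container pyxis_ubuntu from the previous section exists. The file name enroot_job.sh, the resource requests and the command run inside the container are arbitrary choices for illustration.

```shell
#!/bin/sh
# Write a minimal batch script that runs one command inside the container
# "pyxis_ubuntu". The file name enroot_job.sh is an arbitrary choice.
cat > enroot_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=dev_single
#SBATCH --time=00:10:00
#SBATCH --ntasks=1

# The pyxis_ prefix is omitted in --container-name.
srun --container-name=ubuntu \
     --container-mounts=/etc/slurm/task_prolog:/etc/slurm/task_prolog,/scratch:/scratch \
     cat /etc/os-release
EOF

echo "Job script written; submit it with: sbatch enroot_job.sh"
```

Submitting the generated file with sbatch should print the contents of /etc/os-release from inside the container into the job's output file.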

FAQ

How can I run JupyterLab in a container and connect to it?
- Start an interactive session with or without GPUs. Note the ID of the compute node the session is running on, and start a container with a running JupyterLab, e.g.:
  salloc -p gpu_4 --time=01:00:00 --gres=gpu:1
  enroot start -m <localDir>:/work --rw nvidia+pytorch+21.04-py3 jupyter lab
- Open a terminal on your desktop and create an SSH tunnel to the running JupyterLab instance on the compute node. Insert the ID of the node on which the interactive session is running:
  ssh -L8888:<computeNodeID>:8888 <yourAccount>@uc2.scc.kit.edu
- Open a web browser and open the URL localhost:8888.
- Enter the token, which is visible in the output of the first terminal.
  Copy the string following token= and paste it into the input field in the browser.
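
The tunnel command can be assembled with a small helper once the node name is known. This is only a sketch: the function tunnel_cmd is a hypothetical helper, and the node name and account in the call are made-up placeholders.

```shell
#!/bin/sh
# tunnel_cmd NODE ACCOUNT [PORT]
# Prints the ssh command that forwards the JupyterLab port to your desktop.
# The function name and its arguments are illustrative.
tunnel_cmd() {
    node="$1"
    account="$2"
    port="${3:-8888}"   # 8888 is the port used in the example above
    printf 'ssh -L%s:%s:%s %s@uc2.scc.kit.edu\n' "$port" "$node" "$port" "$account"
}

# Hypothetical node name and account; use the node your session runs on.
tunnel_cmd node123 ab1234
# prints: ssh -L8888:node123:8888 ab1234@uc2.scc.kit.edu
```

Inside the interactive session, the hostname command prints the name of the compute node to insert here.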

Are GPUs accessible from within a running container?
Yes.
Unlike Docker, Enroot does not need additional command line options such as --runtime=nvidia or --privileged to enable GPU passthrough.

Is there something like enroot-compose?
Not to our knowledge.
Enroot is mainly intended for HPC workloads, not for operating multi-container applications. However, starting and running these applications separately is possible.

Can I use workspaces to store containers?
Yes.
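You can define the location of configuration files and storage with environment variables. To store containers in a workspace, set the ENROOT_DATA_PATH variable to a directory inside that workspace. Please refer to NVIDIA's documentation on runtime configuration (https://github.com/NVIDIA/enroot/blob/master/doc/configuration.md#runtime-configuration).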
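
A minimal sketch of such a setting is shown below. WS_DIR stands in for the path of your workspace; its default value is only a placeholder so that the snippet runs as-is, and the subdirectory enroot/data is an arbitrary choice.

```shell
#!/bin/sh
# WS_DIR stands in for the path of your workspace (placeholder default);
# replace it with the real workspace directory.
WS_DIR="${WS_DIR:-${TMPDIR:-/tmp}/my_workspace}"

# Store unpacked containers in the workspace instead of $HOME/.local/share/enroot/.
export ENROOT_DATA_PATH="$WS_DIR/enroot/data"
mkdir -p "$ENROOT_DATA_PATH"

echo "Containers will be unpacked below: $ENROOT_DATA_PATH"
```

The variable has to be set in every shell (or job script) before enroot create and enroot start are called, otherwise Enroot falls back to the default location.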

Additional resources

Source code: https://github.com/NVIDIA/enroot

Documentation: https://github.com/NVIDIA/enroot/blob/master/doc

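Additional information:

FOSDEM 2020 talk: https://archive.fosdem.org/2020/schedule/event/containers_hpc_unprivileged/

Slurm User Group Meeting 2019 talk: https://slurm.schedmd.com/SLUG19/NVIDIA_Containers.pdf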

Singularity/Apptainer

Usage

Excellent documentation is provided on the Documentation & Examples page by Sylabs (https://sylabs.io/docs/). The documentation here therefore confines itself to simple examples that introduce the essential functionality.

Using Singularity/Apptainer usually involves two steps:

Building a container image using singularity build

Running a container image using singularity run or singularity exec

Building an image

singularity build ubuntu.sif library://ubuntu
This pulls the latest Ubuntu image from Singularity's Container Library and locally creates a container image file called ubuntu.sif.

singularity build alpine.sif docker://alpine
This pulls the latest alpine image from Docker Hub and locally creates a container image file called alpine.sif.

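singularity build pytorch-21.04-py3.sif docker://nvcr.io/nvidia/pytorch:21.04-py3
This pulls the pytorch image version 21.04-py3 from NVIDIA's NGC registry and locally creates a container image file called pytorch-21.04-py3.sif. Note that, unlike Enroot, Singularity separates the registry from the image name with a slash, not with #.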

Running an image

singularity shell ubuntu.sif
Start a shell in the Ubuntu container.

singularity run alpine.sif
Start the container alpine.sif and run the default runscript provided by the image.

singularity exec alpine.sif /bin/ls
Start the container alpine.sif and run the /bin/ls command.
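
Building and running can be combined in a short script. The sketch below makes two assumptions that are not taken from this page: that either the apptainer or the singularity command may be installed (Apptainer installations provide the former), and that rebuilding an existing image file should be skipped. The names IMAGE_FILE, IMAGE_URI and find_runtime are illustrative.

```shell
#!/bin/sh
# Illustrative names: adjust the image file and source to your needs.
IMAGE_FILE="alpine.sif"
IMAGE_URI="docker://alpine"

# Print the first available runtime command; fail if none is installed.
find_runtime() {
    for cmd in apptainer singularity; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

if RUNTIME=$(find_runtime); then
    # Build the image only once, then run a single command in it.
    [ -f "$IMAGE_FILE" ] || "$RUNTIME" build "$IMAGE_FILE" "$IMAGE_URI"
    "$RUNTIME" exec "$IMAGE_FILE" cat /etc/os-release
else
    echo "Neither apptainer nor singularity found in PATH."
fi
```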

Container management

You can use the singularity search command to search for images on Singularity's Container Library.


BwUniCluster2.0/Containers: Difference between revisions
