NEMO2/Containers/Enroot
Latest revision as of 14:50, 11 May 2026
Enroot is a container runtime developed by NVIDIA that runs OCI/Docker containers without root privileges. On NEMO it is the recommended container solution and is integrated with Slurm via the Pyxis SPANK plugin.
How it works
Enroot converts a Docker image into a SquashFS file (.sqsh), unpacks it on demand into an overlay filesystem, and runs your workload inside that environment. The Pyxis plugin adds --container-* options to srun/salloc/sbatch so containers become first-class Slurm jobs.
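As a quick illustration of the "first-class Slurm jobs" idea, a container can be pulled and a command run inside it with a single `srun` call (a sketch; the `cpu` partition name is taken from the examples later on this page):

```shell
# Pyxis pulls ubuntu:24.04, creates a temporary container, and runs
# one command inside it; without --container-name the container is
# discarded after the job ends
srun -p cpu --container-image=ubuntu:24.04 cat /etc/os-release
```

The sections below break this down into the individual import/create/start steps.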
Image sources
All major OCI registries work out of the box. Note that Enroot uses # (not /) to separate registry host from image path.
| Registry | URI syntax | Example |
|---|---|---|
| Docker Hub | docker://IMAGE[:TAG] | docker://ubuntu:24.04 |
| NVIDIA NGC | docker://nvcr.io#ORG/IMAGE[:TAG] | docker://nvcr.io#nvidia/pytorch:24.01-py3 |
| quay.io | docker://quay.io#ORG/IMAGE[:TAG] | docker://quay.io#rockylinux/rockylinux:9 |
| GitHub Container Registry | docker://ghcr.io#ORG/IMAGE[:TAG] | docker://ghcr.io#containerd/alpine:latest |
Image storage
Unpacked container images are stored in ~/.local/share/enroot/. Images are SquashFS files and can be several GB. To avoid filling your home quota, store images in a workspace and symlink the default path:
# create a workspace (100 days)
ws_allocate enroot 100
# replace the default enroot directory with a symlink to the workspace
mkdir -p ~/.local/share
ln -s $(ws_find enroot) ~/.local/share/enroot
Enroot (and Pyxis) will now transparently use the workspace path for all image storage.
Default mounts
The following paths are automatically mounted into every container:
| Host path | Notes |
|---|---|
| /home | all home directories, read-write |
| /work | all workspace filesystems, read-write |
The paths inside the container are identical to the paths on the host system, so scripts referencing $HOME or workspace paths work without modification. You do not need to pass --container-mount-home or --container-mounts=/work manually.
Note that ws_* tools (e.g. ws_find, ws_list) are not available inside the container. Determine your workspace path on the login node before submitting the job and pass it as an environment variable or hard-code it in your script.
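One way to hand the workspace path into the job is via the environment (a sketch; the workspace name `myWs` and the script name `job.sh` are made up):

```shell
# on the login node, before submission — ws_find is not available
# inside the container
WS_DIR=$(ws_find myWs)

# pass the resolved path into the job environment; the job script can
# then refer to $WS_DIR inside the container
sbatch --export=ALL,WS_DIR="$WS_DIR" --container-name=ubuntu job.sh
```

`--export` is a standard `sbatch` option; hard-coding the path in the script works just as well.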
Interactive usage
Import an image
enroot import downloads an OCI image from a registry and converts it into a SquashFS archive (.sqsh). SquashFS is a compressed, read-only filesystem — the sqsh file is the portable, immutable snapshot of the container image.
# from Docker Hub
enroot import docker://ubuntu:24.04
# from quay.io
enroot import docker://quay.io#rockylinux/rockylinux:9
# from NVIDIA NGC
enroot import docker://nvcr.io#nvidia/pytorch:24.01-py3
# from GitHub Container Registry
enroot import docker://ghcr.io#containerd/alpine:latest
This creates a .sqsh file in the current directory (e.g. ubuntu+24.04.sqsh).
Create a container
Before you can run the image, you must unpack it into a container root filesystem:
enroot create --name pyxis_ubuntu ubuntu+24.04.sqsh
This extracts the sqsh archive into ~/.local/share/enroot/pyxis_ubuntu/ (or the symlinked workspace path). The container name must be prefixed with pyxis_ so that Pyxis (the Slurm plugin) can find it later — when passing --container-name to Slurm, omit the prefix.
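The naming rule can be seen side by side (a sketch; the `.sqsh` filename assumes the NGC import from above, where Enroot replaces `/` and `:` with `+`):

```shell
# the enroot-side name carries the pyxis_ prefix ...
enroot create --name pyxis_pytorch nvidia+pytorch+24.01-py3.sqsh

# ... but Slurm/Pyxis is given the name without it
srun -p l40s --gres=gpu:1 --container-name=pytorch nvidia-smi
```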
Can I delete the .sqsh file afterwards? Yes. Once enroot create has unpacked the image, the sqsh file is no longer needed for running the container. Keep it as a backup to recreate the container later, or delete it to free space:
rm ubuntu+24.04.sqsh
Start a container
# interactive shell, read-write
enroot start --rw pyxis_ubuntu bash
# get root inside the container (to install packages)
enroot start --root --rw pyxis_ubuntu bash
# mount an extra directory
enroot start --rw -m /tmp/mydata:/data pyxis_ubuntu bash
List and remove containers
enroot list --fancy
enroot remove pyxis_ubuntu
Usage via Slurm / Pyxis
Interactive allocation
# use an already-created container
salloc -p cpu --container-name=ubuntu
# pull, create and start in one step (container is created under ~/.local/share/enroot/)
salloc -p cpu --container-image=ubuntu:24.04 --container-name=ubuntu
salloc -p cpu --container-image="quay.io#rockylinux/rockylinux:9" --container-name=rocky
salloc -p cpu --container-image="ghcr.io#containerd/alpine:latest" --container-name=alpine
salloc -p l40s --gres=gpu:1 --container-image="nvcr.io#nvidia/pytorch:24.01-py3" --container-name=pytorch
# start with a specific working directory inside the container
# $(ws_find enroot) is evaluated on the login node before the job starts
salloc -p cpu --container-name=ubuntu --container-workdir=$(ws_find enroot)
Batch job
CPU job:
#!/bin/bash
#SBATCH -p cpu
#SBATCH --container-name=ubuntu
python3 /work/classic/myWs/train.py
GPU job:
#!/bin/bash
#SBATCH -p l40s
#SBATCH --gres=gpu:1
#SBATCH --container-name=pytorch
python3 /work/classic/myWs/train.py
Useful Pyxis options
All options are listed in srun --help under Options provided by plugins.
| Option | Effect |
|---|---|
| --container-name=NAME | Use existing enroot container (omit the pyxis_ prefix) |
| --container-image=IMAGE | Pull image from registry and create container on the fly |
| --container-mounts=SRC:DST[,…] | Bind-mount additional paths |
| --container-writable | Make the container overlay writable |
| --container-remap-root | Become root inside the container (no real root on host) |
| --container-workdir=PATH | Set the working directory inside the container |
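Several of these options combine naturally in one call (a sketch; the mount paths are hypothetical):

```shell
# bind-mount a host directory into the container and start there;
# --container-name=ubuntu reuses the container created earlier
srun -p cpu --container-name=ubuntu \
     --container-mounts=/tmp/mydata:/data \
     --container-workdir=/data \
     ls -l
```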
GPU access
GPU passthrough is automatic — no extra flags are needed. Enroot/Pyxis detect the allocated GPUs via Slurm's GRES mechanism and make them available inside the container.
salloc -p l40s --gres=gpu:1 --container-name=pytorch
# nvidia-smi works inside the container out of the box
Tips
- Install extra packages interactively with enroot start --root --rw before submitting batch jobs.
- Use --container-writable in batch jobs only if your script modifies the container filesystem; otherwise the default overlay is discarded after the job anyway.
- Store large datasets in workspaces (/work), not in $HOME, to avoid filling your home quota with data files.
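The first tip can look like this in practice (a sketch; it assumes a Debian/Ubuntu-based image where `apt-get` is available):

```shell
# one-off, on the login node: install an extra package into the named
# container; changes persist because the overlay is started read-write
enroot start --root --rw pyxis_ubuntu \
    bash -c 'apt-get update && apt-get install -y curl'

# subsequent Slurm jobs using --container-name=ubuntu will see curl
```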