NEMO2/Containers/Enroot

From bwHPC Wiki
Revision as of 19:05, 7 May 2026 by M Janczyk
Enroot is a container runtime developed by NVIDIA that runs OCI/Docker containers without root privileges. On NEMO it is the recommended container solution and is integrated with Slurm via the Pyxis SPANK plugin.

How it works

Enroot converts a Docker image into a SquashFS file (.sqsh), unpacks it on demand into an overlay filesystem, and runs your workload inside that environment. The Pyxis plugin adds --container-* options to srun/salloc/sbatch so containers become first-class Slurm jobs.
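In practice this means an ordinary Slurm command only needs one extra flag to run inside a container. A minimal sketch (the image tag is an example; any registry image works):

```shell
# without Pyxis: the command runs directly on the compute node
srun -p cpu cat /etc/os-release

# with Pyxis: the same command runs inside an Ubuntu 24.04 container;
# the image is pulled and converted to SquashFS automatically
srun -p cpu --container-image=ubuntu:24.04 cat /etc/os-release
```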

Default mounts

The following paths are automatically mounted into every container when launched via Slurm/Pyxis:

Host path          Notes
/home              all home directories (read-write)
/work              all workspace filesystems (read-write)
/etc/slurm         Slurm configuration (read-only)
/usr/lib64/slurm   Slurm plug-in libraries (read-only)

You do not need to pass --container-mount-home or --container-mounts=/work manually — they are already there. The paths inside the container are identical to the paths on the host system, so scripts and config files referencing $HOME or workspace paths work without modification.

Note that ws_* tools (e.g. ws_find, ws_list) are not available inside the container. Determine your workspace path on the login node before submitting the job and pass it as an environment variable or hard-code it in your script.
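One way to do this, sketched below (the workspace name myws, the variable name WS_DIR, and job.sh are examples): resolve the path on the login node and export it into the job's environment.

```shell
# on the login node, where ws_find is available
WS_DIR=$(ws_find myws)

# pass it into the batch job; inside the container it is visible as $WS_DIR
sbatch --export=ALL,WS_DIR="$WS_DIR" job.sh

# inside job.sh the script can then use, e.g.:
#   python3 "$WS_DIR/train.py"
```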

Container image storage

Unpacked container images are stored in:

~/.local/share/enroot/

Images are SquashFS files and can be several GB. To avoid filling your home quota, store images in a workspace and symlink the default path:

# create a workspace (100 days)
ws_allocate enroot 100

# replace the default enroot directory with a symlink to the workspace
mkdir -p ~/.local/share
ln -s "$(ws_find enroot)" ~/.local/share/enroot

Enroot (and Pyxis) will now transparently use the workspace path for all image storage.
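If ~/.local/share/enroot already exists and contains images, move its contents into the workspace before creating the symlink. A sketch, assuming the enroot workspace from the example above:

```shell
# move existing images into the workspace, then replace the
# old directory with a symlink to it
WS=$(ws_find enroot)
mv ~/.local/share/enroot/* "$WS"/ 2>/dev/null
rmdir ~/.local/share/enroot
ln -s "$WS" ~/.local/share/enroot
```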

Usage without Slurm (interactive)

Import an image

# from Docker Hub
enroot import docker://ubuntu:24.04

# from quay.io
enroot import docker://quay.io#rockylinux/rockylinux:9

# from NVIDIA NGC
enroot import docker://nvcr.io#nvidia/pytorch:24.01-py3

This creates a .sqsh file in the current directory.

Create a container

The container name must be prefixed with pyxis_ for Slurm/Pyxis to find it later (omit the prefix when passing --container-name to Slurm).

enroot create --name pyxis_ubuntu ubuntu+24.04.sqsh

Start a container

# interactive shell, read-write
enroot start --rw pyxis_ubuntu bash

# get root inside the container (to install packages)
enroot start --root --rw pyxis_ubuntu bash

# mount an extra directory (e.g. from outside /home and /work)
enroot start --rw -m /tmp/mydata:/data pyxis_ubuntu bash

List and remove containers

enroot list --fancy
enroot remove pyxis_ubuntu

Usage via Slurm / Pyxis

Interactive allocation

# use an already-created container
salloc -p cpu --container-name=ubuntu

# pull, create and start in one step (container is created under ~/.local/share/enroot/)
salloc -p cpu --container-image=ubuntu:24.04 --container-name=ubuntu
salloc -p cpu --container-image="quay.io#rockylinux/rockylinux:9" --container-name=rocky
salloc -p l40s --gres=gpu:1 --container-image="nvcr.io#nvidia/pytorch:24.01-py3" --container-name=pytorch

# start with a specific working directory inside the container
# $(ws_find enroot) is evaluated on the login node before the job starts
salloc -p cpu --container-name=ubuntu --container-workdir=$(ws_find enroot)

Batch job

#!/bin/bash
#SBATCH -p cpu
#SBATCH --container-name=ubuntu

python3 /work/classic/myWs/train.py
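A GPU batch job follows the same pattern. A sketch using the pytorch container created in the interactive example above (the script path is an example):

```shell
#!/bin/bash
#SBATCH -p l40s
#SBATCH --gres=gpu:1
#SBATCH --container-name=pytorch

# runs inside the container; /home and /work are mounted by default
python3 /work/classic/myWs/train.py
```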

Useful Pyxis options

All options are listed in srun --help under Options provided by plugins.

Option                           Effect
--container-name=NAME            use an existing enroot container (omit the pyxis_ prefix)
--container-image=IMAGE          pull an image from a registry and create the container on the fly
--container-mount-home           mount $HOME into the container (already in the defaults, but the explicit flag also works)
--container-mounts=SRC:DST[,…]   bind-mount additional paths
--container-writable             make the container overlay writable
--container-remap-root           become root inside the container (no real root on the host)
--container-workdir=PATH         set the working directory inside the container
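Several of these options can be combined in one call. For example, a writable interactive shell with an extra bind mount (the paths are examples):

```shell
srun -p cpu --pty \
     --container-name=ubuntu \
     --container-writable \
     --container-mounts=/tmp/mydata:/data \
     --container-workdir=/data \
     bash
```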

GPU access

GPU passthrough is automatic — no extra flags are needed. Enroot/Pyxis detect the allocated GPUs via Slurm's GRES mechanism and make them available inside the container.

salloc -p l40s --gres=gpu:1 --container-name=pytorch
# nvidia-smi works inside the container out of the box

Tips

  • Install extra packages interactively with enroot start --root --rw before submitting batch jobs.
  • Use --container-writable in batch jobs only if your script modifies the container filesystem; otherwise the default overlay is discarded after the job anyway.
  • Store large datasets in workspaces (/work), not in $HOME, to avoid filling your home quota with data files.