NEMO/Software/Singularity Containers

Introduction

Singularity is open-source software for container virtualization. Because not every software configuration can be provided as a module on the clusters, containers offer a way to run pre-built scientific software in a closed and reproducible space, independent of the host environment. A Singularity container contains its own operating system, the intended software and all required dependencies, except for kernel components (e.g. drivers). This also means that you can use software that isn't available for RHEL/CentOS but is offered for other Linux distributions. Singularity containers are easy to move between systems and, unlike Docker, do not require root rights for execution.

For example, a user may build a software package from a library or from source on their own computer, move it to the server (for example with scp) and execute it. This works as long as Singularity is installed on both systems, without having to deal with the environment on the cluster.

The container generally works as its own closed-off environment. While you can access data and files stored on the host, you generally do not have access to software or modules running there. This means that you usually have to provide software that could otherwise be found in a module inside the container.

The following tutorial gives a brief introduction to creating and running a container on a cluster.

Building containers can be tricky. If you need help, please don't hesitate to contact us.

Containers

Requirements to build a Container

Singularity requires a Linux system. If no Linux computer is available, a virtual machine running Linux can be used. Singularity does not work on the Windows Subsystem for Linux (WSL).

First, install Singularity 3 from source by following the instructions on the official page. Singularity 3 also requires the installation of the programming language Go.

Singularity has to be installed on all systems trying to use the container, and has to be loaded with an appropriate module first on the cluster.

The host processor and the container also need to be binary compatible. This concerns both the general CPU architecture for which the container was originally built and any code that is compiled during the build process. Compiled code shouldn't be optimized for a newer microarchitecture than the one the cluster uses, as it may fail when run on the older system.
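
One way to check binary compatibility is to compare the architecture and the CPU feature flags of the build machine with those of the cluster nodes. The following is a minimal sketch, assuming a Linux system that exposes /proc/cpuinfo; the helper name cpu_has_flag and the flag avx2 are only illustrative.

```shell
# Report whether a CPU feature flag is listed in a cpuinfo file.
# cpu_has_flag is a hypothetical helper. The second argument defaults
# to /proc/cpuinfo; passing a file allows checking a saved copy of
# /proc/cpuinfo from another machine (e.g. a cluster node).
cpu_has_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"
    # -w matches whole words only, so "avx" does not match "avx2".
    grep -qw -- "$flag" "$cpuinfo" 2>/dev/null
}

# General architecture, e.g. x86_64 or aarch64. It has to be the same
# on the build machine and on the cluster.
echo "Architecture: $(uname -m)"

# Example: check for AVX2 before enabling AVX2 optimizations.
if cpu_has_flag avx2; then
    echo "avx2: available"
else
    echo "avx2: not available"
fi
```

Running the same check on the build machine and on a cluster node shows whether an optimization that is enabled at build time is also supported where the container will run.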

Building a Container

The command to build a new container is singularity build. A container that should remain writable after building can be created with singularity build --sandbox. A specific home directory can be defined by using singularity build --home /your/home/path/.

Most manual building operations require root rights, while importing existing containers can often be done without them. If a container has to be created from scratch, it can be built from a definition file or manually in a writable shell.

Singularity 3 supports two container formats: Singularity Image Files (.sif), which are the default, and sandbox directories.

If you are pulling a pre-made container directly on the cluster, use a workspace instead of your home-directory. Aside from being preferable in general, the configuration of the cluster might not allow a container to be built in your home to begin with.

Building an existing container

An existing container from Docker (docker://) or from a container library (library:// or shub://) can simply be imported as is. To get a writable sandbox container, add the flag --sandbox:

$ singularity build <containername> library://path/to/file
$ singularity build --sandbox <containername> library://path/to/file


Building a new container with a definition file

A definition file, called a recipe in older guides, provides Singularity with a script to build the container from. Building from the same definition file repeats the same steps every time, so using definition files is encouraged for reproducibility where possible.

The file has to contain a header, specifying the operating system and its source. If one wants to use CentOS, for example:

Bootstrap: library
From: centos


After this, various sections with specifications can follow in any order. The most important are %post, %files and %environment.

%post constitutes the main part of the file: a set of shell commands that are executed in order to build the container. For example, the process might start with an update and the installation of standard tools required later on:

%post
	yum -y update
	yum -y install wget tar
	cd /
	mkdir example

Keep in mind that the container contains almost no pre-installed packages, depending on the base image that is used.

For package-managers and installations, make sure that prompts are automatically answered with "yes" (-y in the example), otherwise the automatic building will be aborted.


%files can be used to copy files from the host system into the container at the start of the build process. The syntax is simply filepath targetpath.

%files
	your/file/path.f /dir


%environment can be used to set environment-variables such as paths, for example:

%environment
	export PKG_CONFIG_PATH=/usr/bin

Entries in the %environment section of the definition file are written to /.singularity.d/env/90-environment.sh, which is sourced at runtime. In %post, commands that should run when the container starts can be appended to that file or to /.singularity.d/env/91-environment.sh. This can be useful if no ~/.bashrc or similar is available. Note that variables set through %environment are not available during the build process, only in the completed container.
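
Inside %post, Singularity 3 provides the variable SINGULARITY_ENVIRONMENT, which points to the 91-environment.sh file mentioned above. The sketch below shows lines as they could appear in %post. So that it can also be tried outside a build, it falls back to a temporary file when the variable is unset; that fallback, the variable EXAMPLE_HOME and the path /opt/example are made up for illustration.

```shell
# During a build, Singularity sets SINGULARITY_ENVIRONMENT for %post.
# Outside a build the variable is unset, so fall back to a temporary
# file to make the effect visible (assumption for illustration only).
: "${SINGULARITY_ENVIRONMENT:=$(mktemp)}"

# These two lines are what would go into %post. The single quotes keep
# $PATH unexpanded, so it is evaluated when the container starts and
# not while the container is being built.
echo 'export EXAMPLE_HOME=/opt/example' >> "$SINGULARITY_ENVIRONMENT"
echo 'export PATH=/opt/example/bin:$PATH' >> "$SINGULARITY_ENVIRONMENT"

# Show what will be sourced at container start.
cat "$SINGULARITY_ENVIRONMENT"
```

This is a way to set variables whose values are only known after software has been installed in %post, which a static %environment section cannot express.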

Some other sections of the definition file are %runscript, which can contain a script that is executed by the run command, and %labels and %help, which provide the user of the container with metadata and a help text, respectively. More options can be found on the Singularity homepage.

The container is built from the definition file with

$ sudo singularity build <containername> definition-file.def


The container is built in a temporary directory. If this needs to be changed from the default, --tmpdir your/tmp/dir can be used to specify a temporary directory.
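
Putting the sections described above together, a complete definition file could look like the following. This is a minimal sketch, not a tested recipe: the file name example.def, the label values and the runscript message are placeholders, and the %files section is left out so that the build does not depend on a file on the host.

```shell
# Write a small definition file. The quoted 'EOF' prevents the shell
# from expanding anything inside the heredoc.
cat > example.def <<'EOF'
Bootstrap: library
From: centos

%post
    # Runs inside the container at build time.
    yum -y update
    yum -y install wget tar
    mkdir -p /example

%environment
    # Available at runtime only, not during %post.
    export LC_ALL=C

%runscript
    # Executed by "singularity run".
    echo "Hello from the example container"

%labels
    Author your-name
    Version 0.1

%help
    Example container with wget and tar installed.
EOF

# Build it (requires root rights and an installed Singularity):
#   sudo singularity build --tmpdir /your/tmp/dir example.sif example.def
echo "Wrote example.def"
```

Keeping the definition file under version control next to the project makes it possible to rebuild the container later with the same steps.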


Building a new container in the Shell

If no definition file is available, a container can be built by manually executing, in a shell, the same commands that would go into the definition file. Effectively, this approach is the same as building an existing container, just with the intention of writing into it immediately. To do so, a container with the desired OS has to be built as a sandbox and opened in a writable shell:

$ sudo singularity build --sandbox <containername> library://centos
$ sudo singularity shell --writable <containername>


This shell already behaves like its own OS. Note that a base container with only an OS contains almost no pre-installed packages, and that some paths (e.g. ~/) may still reference the host system. In general, cd / takes you to the top-level directory within the container.

$ yum -y update
$ yum -y install wget tar
$ cd /
$ mkdir example
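
Once the sandbox contains everything that is needed, it can be converted into a read-only .sif image by passing the sandbox directory to singularity build as the source. The following is a minimal sketch; the function name sandbox_to_sif and the DRY_RUN switch are only illustrative, and with DRY_RUN=1 the command is printed instead of executed.

```shell
# Convert a sandbox directory into a read-only SIF image.
# $1: sandbox directory, $2: optional image name (default: <sandbox>.sif)
# sudo is used because the sandbox above was built with root rights.
sandbox_to_sif() {
    sandbox="$1"
    # Strip a trailing slash before appending the .sif suffix.
    image="${2:-${sandbox%/}.sif}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo singularity build $image $sandbox"
    else
        sudo singularity build "$image" "$sandbox"
    fi
}
```

For example, DRY_RUN=1 sandbox_to_sif mycontainer/ prints the build command for mycontainer.sif without running it. The resulting image is a single file, which is easier to copy to the cluster than a directory tree.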


Using a container

To use a container on a cluster, an available Singularity-module has to be loaded. For example:

$ module load your/singularity/version


Keep in mind that other modules you may have loaded (that aren't needed for running the container itself) will not be available inside the container.

There are three main ways to work with a container: the already familiar shell, as well as exec and run. By default, all Singularity containers are read-only. To write into a sandbox, --writable has to be specified in the command.

Again, the container can be opened as a shell:

$ singularity shell <containername>


On the cluster, the easiest way to run one-line commands (including commands to run other files) is passing them to the container through exec:

$ singularity exec --writable <containername> yum -y update
$ singularity exec <containername> script.sh


Exec is generally the easiest way to execute scripts. Without further specification, it runs the command in the current directory. This makes it a good option for working in workspaces, which Singularity might otherwise struggle with if run from $HOME.

The third option, run, executes the %runscript that was provided in the definition file during the build process:

$ singularity run <containername>

This is useful, for example, to start an installed program.


Containers and Batch Jobs

Batch jobs that use Singularity containers are generally written the same way as all other batch jobs; the job script simply contains a singularity exec command. For example:

#!/bin/bash
#MSUB -l nodes=1:gpu
#MSUB -l walltime=3:00:00
#MSUB -l mem=5000mb
#MSUB -N Singularity

module load your/singularity/version
cd your/workspace
singularity exec --nv <containername> script1.sh
singularity exec --nv <containername> python script2.py
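
In a batch job nobody is watching the output, so it can help to fail early with a clear message if the image is missing or the Singularity module has not been loaded. The following is a minimal sketch of such a check; the function name run_in_container and the return codes are illustrative choices, not part of Singularity.

```shell
# Run a command in a container, failing early with a clear message.
# $1: path to the container, remaining arguments: command to execute.
run_in_container() {
    image="$1"
    shift
    if [ ! -e "$image" ]; then
        echo "Container not found: $image" >&2
        return 1
    fi
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity not found, load the module first" >&2
        return 2
    fi
    singularity exec "$image" "$@"
}
```

Inside the job script, a call such as run_in_container mycontainer.sif python script2.py would then replace the plain singularity exec line; mycontainer.sif is a placeholder name.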


Using GPUs

Containers can run on NVIDIA GPUs as well. First, make sure that all necessary libraries (for example CUDA) are installed in the container itself and all appropriate drivers are available on the host system. To load the requirements from the host system, both exec and shell need the flag --nv after the command in order to interact with the GPUs.

$ singularity exec --nv <containername> script.sh

Using the flag is advisable, but it may be omitted if the correct GPU and driver APIs are available in the container.
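
If the same job script should run on nodes with and without GPUs, the flag can be chosen at runtime. The following is a minimal sketch, assuming that a working nvidia-smi on the host indicates a usable NVIDIA driver; the function name gpu_flag is hypothetical.

```shell
# Print "--nv" if the host has a working nvidia-smi, otherwise nothing.
gpu_flag() {
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "--nv"
    fi
}

# Usage inside a job script (left unquoted on purpose, so that an
# empty result adds no argument to the command line):
#   singularity exec $(gpu_flag) <containername> script.sh
echo "GPU flag: '$(gpu_flag)'"
```

On a node without an NVIDIA driver the function prints nothing, and the container is started without GPU support.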


Concluding notes

  • Keep track of your inputs when using a (writable) shell.
  • Modules loaded outside the container don't work on the inside.
  • Use singularity exec for Batch-Jobs.
  • Use singularity exec --nv for (nvidia-)GPUs.