User:M Janczyk/Software/Singularity Containers: Difference between revisions
| H Winkhardt (talk | contribs) m (Typo) | H Winkhardt (talk | contribs)   (Added info on workspaces) | ||
Revision as of 16:39, 8 August 2019
Introduction
Singularity is open-source software for container virtualization. Because not every software configuration can be provided as a module on the clusters, containers offer a way to run pre-built scientific software in a closed, reproducible space that is independent of the host environment. A Singularity container contains its own operating system, the intended software and all required dependencies. This also means that you can use software that isn't available for RHEL/CentOS but is offered for other Linux systems. Singularity containers are easy to move between systems and, unlike Docker, do not require root rights to run.
For example, a user may build a software package from a library or from source on their own computer, move it to the server (for example with scp) and execute it. This works as long as Singularity is installed on both systems, without having to deal with the environment on the cluster. 
The container generally works as its own closed-off environment. While you can access data and files stored on the host, you generally do not have access to software or modules running there. This means that you usually have to provide software that could otherwise be found in a module inside the container.
The following tutorial gives a brief introduction to creating and running a container on a cluster.
Building containers can be tricky. If you need help, please don't hesitate to contact us. We can help you.
Containers
Requirements to build a Container
Singularity requires a Linux-system. If no Linux computer is available, a virtual machine with a Linux-OS can be used. Singularity does not work on the Windows Subsystem for Linux (WSL). 
First, install Singularity 3 from source by following the instructions on the official page. Singularity 3 also requires the installation of the programming language Go. 
Singularity has to be installed on all systems trying to use the container, and has to be loaded with an appropriate module first on the cluster. 
Building a Container
The command to build a new container is singularity build. A container that should remain writable after building can be created with singularity build --sandbox. A specific home directory can be defined by using singularity build --home /your/home/path/.  
Most building operations require root rights. If a container has to be built from scratch, this can be done manually in a writable shell or through a definition file.
Singularity 3 supports two container formats: Singularity Image Files (.sif), the default option, and directories (sandboxes).
If you are building a container directly on the cluster, use a workspace instead of your home-directory. Aside from being preferable in general, the configuration of the cluster might not allow a container to be built in your home to begin with.
Building an existing container
If an existing container from Docker (docker://) or a container library (library:// or shub://) should be built using Singularity, it can simply be imported as is. A writable sandbox container can be built by flagging --sandbox: 
$ sudo singularity build <containername> library://path/to/file
$ sudo singularity build --sandbox <containername> library://path/to/file
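The three source prefixes mentioned above can be told apart with a small helper. The following is only a sketch: the function name build_cmd is hypothetical, it prints the build command instead of running it, and it deliberately rejects every other kind of source even though Singularity accepts more than these three.

```shell
#!/bin/bash
# Print the build command for a remote source, or fail for anything else.
# build_cmd is a hypothetical helper for this sketch, not part of Singularity.
build_cmd() {
    local target="$1" source="$2"
    case "$source" in
        docker://*|library://*|shub://*)
            echo "sudo singularity build $target $source"
            ;;
        *)
            echo "not a docker://, library:// or shub:// source: $source" >&2
            return 1
            ;;
    esac
}

# Example: show the command for importing an image from Docker Hub.
build_cmd ubuntu.sif docker://ubuntu:18.04
```

Printing the command first makes it easy to check the target name and source before running a build that needs root rights.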
Building a new container with a definition file
A definition file, or recipe in older guides, provides the program with a script to build the container from. The same definition-file always reliably produces the same container, so it is encouraged to use definition-files for reproducibility if possible. 
The file has to contain a header, specifying the operating system and its source. If one wants to use CentOS, for example: 
Bootstrap: library
From: centos
After this, various sections with specifications can follow in any order. The most important are %post, %files and %environment. 
%post constitutes the main part of the file, with a set of bash-instructions used to build the container in order. For example, the process might start with an update and the installation of standard tools required later on: 
%post
    yum -y update
    yum -y install wget tar
    cd /
    mkdir example
Keep in mind that the container contains almost no pre-installed packages. 
%files can be used to copy files from the host system into the container at the start of the build process. The syntax is simply filepath targetpath.
%files
    your/file/path.f /dir
%environment can be used to set environment-variables such as paths, for example: 
%environment
    export PKG_CONFIG_PATH=/usr/bin
Inputs into %environment from the definition-file are written to /.singularity.d/env/90-environment.sh, which is sourced at runtime. In %post, commands for starting the container can be written to that file or to /.singularity.d/env/91-environment.sh. This can be useful if no '~/.bashrc' or similar is available. Note that information provided through %environment will not be available for the build-process, only for the completed container.  
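The interplay of the two files can be illustrated outside of any container. The sketch below is a simulation only: a temporary directory stands in for /.singularity.d/env, the variable EXAMPLE_MODE is made up, and it assumes that the scripts in that directory are sourced in name order, so that 91-environment.sh is read after 90-environment.sh.

```shell
#!/bin/bash
# Simulation only: a temporary directory stands in for /.singularity.d/env.
env_dir="$(mktemp -d)"

# What a %environment section would contribute (hypothetical variable).
echo 'export EXAMPLE_MODE=from-environment-section' > "$env_dir/90-environment.sh"
# What a line written during %post would contribute.
echo 'export EXAMPLE_MODE=from-post-section' > "$env_dir/91-environment.sh"

# Source the scripts in name order (assumed to mirror the container start).
for script in "$env_dir"/*.sh; do
    . "$script"
done

echo "EXAMPLE_MODE=$EXAMPLE_MODE"
rm -rf "$env_dir"
```

Under that assumption, a value written to 91-environment.sh during %post takes precedence over the same variable set in %environment.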
Other sections for the definition file include %runscript, which contains a script that is executed by the run command, and %labels and %help, which provide the user of the container with metadata and a help text, respectively. More options can be found on the Singularity homepage.
The file is invoked with 
$ sudo singularity build <containername> definition-file.def
The container is built in a temporary directory. If this needs to be changed from the default, --tmpdir your/tmp/dir can be used to specify a temporary directory.
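Put together, the header and sections described in this chapter could look like the file written by the following script. This is a minimal sketch: the file name example.def, the variable EXAMPLE_DIR, the label value and the help text are placeholders chosen for illustration, and %files is left out because it depends on a file that exists on your host.

```shell
#!/bin/bash
# Write an illustrative definition file; all names in it are placeholders.
def_file="example.def"

cat > "$def_file" <<'EOF'
Bootstrap: library
From: centos

%post
    # Runs inside the container during the build
    yum -y update
    yum -y install wget tar
    mkdir -p /example

%environment
    # Available in the finished container only, not during %post
    export EXAMPLE_DIR=/example

%runscript
    # Executed by "singularity run"
    echo "Contents of $EXAMPLE_DIR:"
    ls "$EXAMPLE_DIR"

%labels
    Author your-name

%help
    Example container with wget and tar installed.
EOF

echo "Wrote $def_file; build it with: sudo singularity build example.sif $def_file"
```

The script only writes the file and prints the build command, so it can be run on a machine without root rights and the resulting file can be inspected before building.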
Building a new container in the Shell
If no definition-file is available, a container can be built by manually executing in a shell the same commands that would otherwise be put in the definition file. Effectively, this approach is the same as building an existing container, just with the intention to write into it immediately. To do so, a container with the desired OS has to be built as a sandbox and opened as a writable shell:
$ sudo singularity build --sandbox <containername> library://centos
$ sudo singularity shell --writable <containername>
This shell already behaves like its own OS. Note that if you took a base-container with only an OS, it contains almost no pre-installed packages, and that some paths (e.g. ~/) may still reference the host-system. In general, cd / takes you to the root directory of the container.
$ yum -y update
$ yum -y install wget tar
$ cd /
$ mkdir example
Using a container
To use a container on a cluster, an available Singularity-module has to be loaded. For example: 
$ module load your/singularity/version
Keep in mind that other modules you may have loaded (that aren't needed for running the container itself) will not be available inside the container. 
There are three main ways to work with a container: the shell command already shown, as well as exec and run. By default, all Singularity containers are read-only. To make them writable, --writable has to be specified in the command.
Again, the container can be opened as a shell: 
$ singularity shell <containername>
On the cluster, the easiest way to run one-line commands (including commands to run other files) is passing them to the container through exec: 
$ singularity exec --writable <containername> yum -y update
$ singularity exec <containername> script.sh
Exec is generally the easiest way to execute scripts. Without further specification, it runs the command in the current directory. This makes it a good option for working in workspaces, which Singularity might otherwise struggle to access when started from $HOME.
The third option, run, executes the %runscript that was provided in the definition file during the build-process:
$ singularity run <containername>
This is useful, for example, to start an installed program.
Containers and Batch Jobs
Batch Jobs utilizing Singularity containers are generally built the same way as all other batch jobs, where the job script contains a singularity exec command. For example: 
#!/bin/bash
#MSUB -l nodes=1:gpu
#MSUB -l walltime=3:00:00
#MSUB -l mem=5000mb
#MSUB -N Singularity
module load your/singularity/version
cd your/workspace
singularity exec --nv <containername> script1.sh
singularity exec --nv <containername> python script2.py
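A job that only discovers a missing container or script after waiting in the queue wastes time. As a possible safeguard, a job script could check its inputs before the first singularity exec line. The helper below is a sketch and not part of Singularity or the batch system; the name require_paths is hypothetical.

```shell
#!/bin/bash
# Return non-zero and name the first missing path, so a job can stop early.
# require_paths is a hypothetical helper for this sketch.
require_paths() {
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path" >&2
            return 1
        fi
    done
}

# Possible usage inside the job script, before the exec lines
# (the file names are the placeholders from the example above):
#   require_paths <containername> script1.sh script2.py || exit 1
```

Because the check runs after cd your/workspace, relative paths are resolved against the workspace, just as they are for the exec commands.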
Using GPUs
Containers can run on GPUs as well. First, make sure that all necessary libraries (for example CUDA) are installed inside the container and that all appropriate drivers are available on the host system. When using the container, both exec and shell need an additional, mandatory flag to be able to interact with GPUs: --nv, placed after the command.
$ singularity exec --nv <containername> script.sh
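If the same script should run on nodes with and without GPUs, the flag can be added conditionally. The sketch below assumes that the presence of the nvidia-smi command indicates a usable NVIDIA driver on the host; that heuristic and the function name nv_flag are assumptions made for illustration. The script only prints the resulting command.

```shell
#!/bin/bash
# Print "--nv" if the probe command exists on this host, otherwise nothing.
# The probe defaults to nvidia-smi; treating its presence as a sign of an
# NVIDIA driver is an assumption of this sketch.
nv_flag() {
    local probe="${1:-nvidia-smi}"
    if command -v "$probe" >/dev/null 2>&1; then
        echo "--nv"
    fi
}

# Print the command that would be run; <containername> is a placeholder.
echo "singularity exec $(nv_flag) <containername> script.sh"
```

On a GPU node that satisfies the assumption, the printed command contains --nv. On other nodes the flag is omitted and the container runs on the CPU only.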
Concluding notes
- Keep track of your inputs when using a (writable) shell.
- Modules loaded outside the container don't work on the inside.
- Use singularity exec for Batch-Jobs.
- Use singularity exec --nv for GPUs.