This article contains information on the queues of the batch job system and on interactive jobs.

Job Submission

sbatch Command

sbatch -p queue

Compute resources such as wall time, nodes and memory are limited and must fit into a queue. Since requested compute resources are NOT always automatically mapped to the correct queue class, you must add the correct queue class to your sbatch command. Specifying a queue is obligatory on both clusters, ForHLR I and ForHLR II.
Details are:

ForHLR I sbatch -p queue
queue	node	default resources	minimum resources	maximum resources
develop	thin	time=10, ntasks=1, mem=64000mb, mem-per-cpu=3200mb	nodes=1	time=60, nodes=4, ntasks-per-node=20. 4 nodes are reserved for this queue. Only for development, i.e. debugging or performance optimization ...
singlenode	thin	time=10, ntasks=1, mem=64000mb, mem-per-cpu=3200mb	nodes=1	time=3-00:00:00, nodes=1, ntasks-per-node=20
multinode	thin	time=10, ntasks=1, mem=64000mb, mem-per-cpu=3200mb	nodes=2	time=3-00:00:00, nodes=128, ntasks-per-node=20
fat	fat	time=10, ntasks=1, mem=512000mb, mem-per-cpu=16000mb	nodes=1	time=4-00:00:00, nodes=1, ntasks-per-node=32

ForHLR II sbatch -p queue
queue	node	default resources	minimum resources	maximum resources
develop	thin	time=10, ntasks=1, mem=63720mb, mem-per-cpu=3186mb	nodes=1	time=60, nodes=4, ntasks-per-node=20. 4 nodes are reserved for this queue. Only for development, i.e. debugging or performance optimization ...
develop-visu	fat	time=10, ntasks=1, mem=1029963mb, mem-per-cpu=21457mb	nodes=1	time=120, nodes=1, ntasks-per-node=48. 1 node is reserved for this queue. Only for development, i.e. debugging or performance optimization ...
normal	thin	time=10, ntasks=1, mem=63720mb, mem-per-cpu=3186mb	nodes=1	time=3-00:00:00, nodes=256, ntasks-per-node=20
long	thin	time=3-00:00:01, ntasks=1, mem=63720mb, mem-per-cpu=3186mb	nodes=1	time=7-00:00:00, nodes=64, ntasks-per-node=20
xnodes	thin	time=10, ntasks=257, mem=63720mb, mem-per-cpu=3186mb	nodes=257	time=3-00:00:00, nodes=512, ntasks-per-node=20
visu	visu	time=10, ntasks=1, mem=1029963mb, mem-per-cpu=21457mb	nodes=1	time=3-00:00:00, nodes=4, ntasks-per-node=48

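The default resources of a queue class define the time, number of tasks and memory that a job receives if they are not given explicitly with the sbatch command. A time value without a unit, such as time=10, is given in minutes. The options --time, --ntasks, --nodes, --mem and --mem-per-cpu are described in the sbatch documentation.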
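The ForHLR I table above can be read as a simple decision rule for thin-node jobs. The following is a minimal sketch of that rule, not a site-provided tool: the function name suggest_queue_forhlr1 is hypothetical, the limits are copied from the table, and the develop and fat queues are left out because they are chosen deliberately by the user.

```shell
#!/bin/sh
# Hypothetical helper: suggest a ForHLR I queue for a thin-node job.
# $1: number of nodes, $2: wall time in minutes.
# Limits are taken from the ForHLR I table; develop and fat are not suggested.
suggest_queue_forhlr1() {
    max_minutes=$((3 * 24 * 60))      # 3-00:00:00 expressed in minutes

    if [ "$2" -gt "$max_minutes" ]; then
        echo "none"                   # longer than both queues allow
    elif [ "$1" -eq 1 ]; then
        echo "singlenode"
    elif [ "$1" -ge 2 ] && [ "$1" -le 128 ]; then
        echo "multinode"
    else
        echo "none"                   # outside the node range of both queues
    fi
}

suggest_queue_forhlr1 1 120    # prints: singlenode
suggest_queue_forhlr1 5 60     # prints: multinode
```

The result can then be passed to sbatch with the -p option.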

Queue class examples

To run your batch job in the develop queue, please use:

$ sbatch --partition=develop
     or 
$ sbatch -p develop
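The queue can also be set inside the job script, because sbatch reads options from #SBATCH lines at the top of the script. The following is a minimal sketch; the file name job.sh and the resource values are assumptions chosen to fit the develop queue limits listed above.

```shell
#!/bin/bash
# Queue class, equivalent to "sbatch -p develop" on the command line.
#SBATCH --partition=develop
# One thin node with 20 tasks; values are illustrative only.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
# Time limit in minutes; the develop queue allows at most 60.
#SBATCH --time=30

# SLURM_JOB_PARTITION is set by Slurm inside a job. The fallback text only
# appears when this script is run outside the batch system.
partition="${SLURM_JOB_PARTITION:-no partition (not running under Slurm)}"
echo "Running in: $partition"
```

Saved as job.sh, the script would be submitted with "sbatch job.sh". Options given on the command line take precedence over the #SBATCH lines.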

Interactive Jobs

On ForHLR you are only allowed to run short jobs (<< 1 hour) with low memory requirements (<< 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs that request more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For a serial application that runs on a compute node, requires 5000 MByte of memory and is limited to an interactive run of 2 hours, execute the following command:

$ salloc -p singlenode -n 1 -t 120 --mem=5000   # on ForHLR I
$ salloc -p normal -n 1 -t 120 --mem=5000       # on ForHLR II

This gives you one core on a compute node in the partition "singlenode" or "normal", respectively. After executing this command, DO NOT CLOSE your current terminal session, but wait until the queueing system Slurm has granted you the requested resources on the compute system. You will then be logged in automatically on the granted core. To run a serial program on the granted core, you only have to type the name of the executable.

$ ./<my_serial_program>

Please be aware that your serial job must finish in less than 2 hours in this example; otherwise the system will kill it during runtime.
You can now also start a graphical X11 terminal that connects you to the dedicated resource, which is available for 2 hours. Start it with the command:

$ xterm

Note that once the walltime limit has been reached, the resources, i.e. the compute node, will automatically be revoked.
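Time limits appear in this article in several notations, for example 120 (minutes), 01:00:00 and 3-00:00:00. To compare a requested limit with a queue maximum, it can help to convert them to one unit. The sketch below assumes the time formats documented for the Slurm --time option (M, M:S, H:M:S, D-H, D-H:M, D-H:M:S); the function name slurm_time_to_minutes is hypothetical, and seconds are rounded up to a full minute.

```shell
#!/bin/sh
# Hypothetical helper: convert a Slurm time limit to whole minutes.
# Any non-zero seconds field is rounded up to one extra minute.
slurm_time_to_minutes() {
    echo "$1" | awk -F- '{
        has_days = (NF == 2)              # "D-..." form contains a dash
        days = has_days ? $1 : 0
        rest = has_days ? $2 : $1
        n = split(rest, f, ":")
        h = 0; m = 0; s = 0
        if (has_days) {                   # D-H, D-H:M or D-H:M:S
            h = f[1]
            if (n > 1) m = f[2]
            if (n > 2) s = f[3]
        } else if (n == 1) {              # M
            m = f[1]
        } else if (n == 2) {              # M:S
            m = f[1]; s = f[2]
        } else {                          # H:M:S
            h = f[1]; m = f[2]; s = f[3]
        }
        print days * 1440 + h * 60 + m + (s + 0 > 0 ? 1 : 0)
    }'
}

slurm_time_to_minutes 120           # prints: 120
slurm_time_to_minutes 01:00:00      # prints: 60
slurm_time_to_minutes 3-00:00:00    # prints: 4320
```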
An interactive parallel application running on one or more compute nodes (here, e.g., 5 nodes) with 20 cores each usually requires a certain amount of memory per node (e.g. 50 GByte) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated with the following command:

$ salloc -p multinode -N 5 --ntasks-per-node=20 -t 01:00:00 --mem=50gb    # on ForHLR I
$ salloc -p normal -N 5 --ntasks-per-node=20 -t 01:00:00  --mem=50gb      # on ForHLR II

Now you can run parallel jobs on 100 cores with up to 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node. To run an MPI program, simply type mpirun <program_name>; the program will then run on all 100 cores. A very simple example for starting a parallel job is:

$ mpirun <my_mpi_program>
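The figure of 100 cores follows from the allocation: 5 nodes times 20 tasks per node. Since --mem is a per-node request in Slurm, the 5 nodes together provide 250 GByte. The sketch below repeats this arithmetic; it reads the Slurm variables SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE when they are set and otherwise falls back to the values of this example. The variable mem_per_node_gb is a plain assumption copied from the salloc command above.

```shell
#!/bin/sh
# Use the values of the current allocation if Slurm provides them,
# otherwise fall back to the numbers of the example (5 nodes, 20 tasks each).
nodes="${SLURM_JOB_NUM_NODES:-5}"
tasks_per_node="${SLURM_NTASKS_PER_NODE:-20}"
mem_per_node_gb=50                 # assumption: matches --mem=50gb above

total_tasks=$((nodes * tasks_per_node))
total_mem_gb=$((nodes * mem_per_node_gb))

echo "$total_tasks tasks, $total_mem_gb GByte in total"
```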

You can also start the debugger ddt by the commands:

$ module add devel/ddt
$ ddt <my_mpi_program>

The above commands will execute the parallel program <my_mpi_program> on all available cores. You can also start parallel programs on a subset of cores; an example for this can be:

$ mpirun -n 50 <my_mpi_program>

If you are using Intel MPI, you must start <my_mpi_program> with the command mpiexec.hydra (instead of mpirun).
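The choice between the two launchers can be wrapped in a small helper so that the same call works for both MPI installations. This is only a sketch: the function names mpi_launcher and mpi_command and the flavor keyword "intel" are hypothetical, and the helper prints the command line instead of running it so that it can be inspected first.

```shell
#!/bin/sh
# Hypothetical helper: pick the launcher name for a given MPI flavor.
# $1: "intel" for Intel MPI, anything else falls back to mpirun.
mpi_launcher() {
    case "$1" in
        intel) echo "mpiexec.hydra" ;;
        *)     echo "mpirun" ;;
    esac
}

# Build (but do not execute) the command line for a run on a subset of cores.
# $1: MPI flavor, $2: number of processes, $3: program to start.
mpi_command() {
    echo "$(mpi_launcher "$1") -n $2 $3"
}

mpi_command intel 50 ./my_mpi_program    # prints: mpiexec.hydra -n 50 ./my_mpi_program
mpi_command other 50 ./my_mpi_program    # prints: mpirun -n 50 ./my_mpi_program
```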
