BwUniCluster2.0/Batch Queues (latest revision as of 18:03, 19 February 2024)
__TOC__

This article contains information on the queues of the [[Batch_Jobs|batch job system]] and on interactive jobs.

= Job Submission =

== sbatch Command ==

=== sbatch -p ''queue'' ===

Compute resources such as (wall-)time, nodes and memory are restricted and must fit into '''queues'''. Since requested compute resources are NOT always automatically mapped to the correct queue class, '''you must add the correct queue class to your sbatch command'''. <font color=red>The specification of a queue is obligatory on BwUniCluster 2.0.</font>
<br>
Details are:

{| width=750px class="wikitable"
! colspan="5" | bwUniCluster 2.0 <br> sbatch -p ''queue''
|- style="text-align:left;"
! queue !! node !! default resources !! minimum resources !! maximum resources
|- style="text-align:left"
| dev_single
| thin
| time=10, mem-per-cpu=1125mb
|
| time=30, nodes=1, mem=180000mb, ntasks-per-node=40, (threads-per-core=2)<br>6 nodes are reserved for this queue. <br> Only for development, i.e. debugging or performance optimization ...
|- style="text-align:left;"
| single
| thin
| time=30, mem-per-cpu=1125mb
|
| time=72:00:00, nodes=1, mem=180000mb, ntasks-per-node=40, (threads-per-core=2)
|- style="text-align:left;"
| dev_multiple
| hpc
| time=10, mem-per-cpu=1125mb
| nodes=2
| time=30, nodes=4, mem=90000mb, ntasks-per-node=40, (threads-per-core=2)<br>8 nodes are reserved for this queue.<br> Only for development, i.e. debugging or performance optimization ...
|- style="text-align:left;"
| multiple
| hpc
| time=30, mem-per-cpu=1125mb
| nodes=2
| time=72:00:00, mem=90000mb, nodes=80, ntasks-per-node=40, (threads-per-core=2)
|- style="text-align:left;"
| dev_multiple_il
| IceLake
| time=10, mem-per-cpu=1950mb
| nodes=2
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)<br>8 nodes are reserved for this queue <br> Only for development, i.e. debugging or performance optimization ...
|- style="text-align:left;"
| multiple_il
| IceLake
| time=10, mem-per-cpu=1950mb
| nodes=2
| time=72:00:00, nodes=80, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)
|- style="text-align:left;"
| dev_gpu_4_a100
| IceLake + A100
| time=10, mem-per-gpu=127500mb, cpus-per-gpu=16
|
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2)
|- style="text-align:left;"
| gpu_4_a100
| IceLake + A100
| time=10, mem-per-gpu=127500mb, cpus-per-gpu=16
|
| time=48:00:00, nodes=9, mem=510000mb, ntasks-per-node=64, (threads-per-core=2)
|- style="text-align:left;"
| gpu_4_h100
| IceLake + H100
| time=10, mem-per-gpu=127500mb, cpus-per-gpu=16
|
| time=48:00:00, nodes=5, mem=510000mb, ntasks-per-node=64, (threads-per-core=2)
|- style="vertical-align:top; text-align:left"
| fat
| fat
| time=10, mem-per-cpu=18750mb
| mem=180001mb
| time=72:00:00, nodes=1, mem=3000000mb, ntasks-per-node=80, (threads-per-core=2)
|- style="vertical-align:top; text-align:left"
| dev_gpu_4
| gpu4
| time=10, mem-per-gpu=94000mb, cpus-per-gpu=10
|
| time=30, nodes=1, mem=376000, ntasks-per-node=40, (threads-per-core=2)<br>1 node is reserved for this queue <br> Only for development, i.e. debugging or performance optimization ...
|- style="text-align:left;"
| gpu_4
| gpu4
| time=10, mem-per-gpu=94000mb, cpus-per-gpu=10
|
| time=48:00:00, mem=376000, nodes=14, ntasks-per-node=40, (threads-per-core=2)
|- style="vertical-align:top; text-align:left"
| gpu_8
| gpu8
| time=10, mem-per-cpu=94000mb, cpus-per-gpu=10
|
| time=48:00:00, mem=752000, nodes=10, ntasks-per-node=40, (threads-per-core=2)
|-
|}

Default resources of a queue class define time, #tasks and memory if not explicitly given with the sbatch command. The resource list options ''--time'', ''--ntasks'', ''--nodes'', ''--mem'' and ''--mem-per-cpu'' are described [[BwUniCluster_2.0_Slurm_common_Features|here]].
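Queue selection also works from inside a job script. As a sketch (the file name myjob.sh and the program ./my_program are placeholders, not part of this article), a script for the dev_single queue could look like this; the resource values merely have to stay within the dev_single limits from the table above:

```shell
# Write a minimal job script; #SBATCH lines are comments to the shell
# and are read by sbatch at submission time.
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=dev_single     # obligatory queue specification
#SBATCH --ntasks=1
#SBATCH --time=10                  # minutes, within the 30-minute queue maximum
#SBATCH --mem-per-cpu=1125mb
./my_program
EOF

# On the cluster the script would be submitted with: sbatch myjob.sh
directives=$(grep -c '^#SBATCH' myjob.sh)
echo "$directives directives set"
```

Note that sbatch command-line options override the corresponding #SBATCH lines, so the same script can be reused with a different queue via sbatch -p.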
<br>

==== Queue class examples ====

To run your batch job on one of the thin nodes, please use:
<pre>
$ sbatch --partition=dev_multiple
or
$ sbatch -p dev_multiple
</pre>
<br>
== Interactive Jobs ==

On bwUniCluster 2.0 you are only allowed to run short jobs (<< 1 hour) with little memory requirements (<< 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs requesting more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For a serial application on a compute node requiring 5000 MByte of memory, with the interactive run limited to 2 hours, the following command has to be executed:
<pre>
$ salloc -p single -n 1 -t 120 --mem=5000
</pre>
Then you will get one core on a compute node within the partition "single". After executing this command, '''DO NOT CLOSE''' your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core! To run a serial program on the granted core, simply type the name of the executable:
<pre>
$ ./<my_serial_program>
</pre>
Please be aware that your serial job must finish within 2 hours in this example, otherwise it will be killed by the system at runtime.
<br>
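Inside the allocation you can verify what Slurm granted you. A small sketch, assuming only Slurm's standard job environment variables (they are unset outside a job, hence the fallback text):

```shell
# SLURM_JOB_ID and SLURM_JOB_NODELIST are exported by Slurm inside an
# allocation; outside of one the fallback text is shown instead.
job_info="job id: ${SLURM_JOB_ID:-<not inside a job>}"
node_info="nodes: ${SLURM_JOB_NODELIST:-<not inside a job>}"
echo "$job_info"
echo "$node_info"
```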
You can also start a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours, with the command:
<pre>
$ xterm
</pre>
Note that once the walltime limit has been reached, the resources - i.e. the compute node - will automatically be revoked.
<br>
An interactive parallel application running on one or many compute nodes (e.g. here 5 nodes with 40 cores each) usually requires an amount of memory in GByte (e.g. 50 GByte) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated by the following command:
<pre>
$ salloc -p multiple -N 5 --ntasks-per-node=40 -t 01:00:00 --mem=50gb
</pre>
Now you can run parallel jobs on 200 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.
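The figure of 200 cores is simply the product of the salloc parameters above, which can be checked with plain shell arithmetic:

```shell
# 5 nodes times 40 tasks per node gives the total core count.
nodes=5
ntasks_per_node=40
total_cores=$((nodes * ntasks_per_node))
echo "$total_cores"    # prints 200
```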
If you want to have access to another node, you have to open a new terminal, connect it to bwUniCluster 2.0 as well, and type the following commands to connect to the running interactive job and then to a specific node:
<pre>
$ srun --jobid=XXXXXXXX --pty /bin/bash
$ srun --nodelist=uc2nXXX --pty /bin/bash
</pre>
The jobid and the nodelist can be shown with the command:
<pre>
$ squeue
</pre>
If you want to run MPI programs, you can do so by simply typing mpirun <program_name>; your program will then run on 200 cores. A very simple example for starting a parallel job:
<pre>
$ mpirun <my_mpi_program>
</pre>
You can also start the debugger ddt by the commands:
<pre>
$ module add devel/ddt
$ ddt <my_mpi_program>
</pre>
The above commands will execute the parallel program <my_mpi_program> on all available cores. You can also start parallel programs on a subset of cores; an example for this can be:
<pre>
$ mpirun -n 50 <my_mpi_program>
</pre>
If you are using Intel MPI, you must start <my_mpi_program> with the command mpiexec.hydra (instead of mpirun).
<br>
<br>
= Accounting =

By executing the command
<pre>
$ sacctmgr [-p] list assoc
</pre>
the granted CPU minutes for all projects you are involved in can be seen; the value is stored in the column GrpTRESMins. Additionally, the column GrpSubmit shows how many jobs can be submitted concurrently by all users of a project, and GrpJobs how many jobs can be submitted concurrently by a single user of a project. The command sacctmgr is explained in detail at https://slurm.schedmd.com/sacctmgr.html or via manpage (man sacctmgr).
By executing the command
<pre>
# Accumulation of all jobs since 12/12/2018 (end of maintenance) on ForHLR I
$ sreport [-p] --tres cpu cluster AccountUtilizationByUser Start=2018-12-12 [-t hours]

# Accumulation of all jobs since 01/18/2019 (end of maintenance) on ForHLR II
$ sreport [-p] --tres cpu cluster AccountUtilizationByUser Start=2019-01-18 [-t hours]
</pre>
the used CPU minutes since 12/12/2018 on ForHLR I and 01/18/2019 on ForHLR II, respectively, can be seen for all projects you are involved in; the value is stored in the column Used. If you do not insert the option END, the accumulation of CPU time ends yesterday at 23:59. You can set an arbitrary end with the option END=MM/DD[/YY]-HH:MM[:SS]. If you omit both START and END, you will get the accumulated CPU times of yesterday. The option ''-t hours'' reports the used CPU time in hours. The command sreport is explained in detail at https://slurm.schedmd.com/sreport.html or via manpage (man sreport).
If the used CPU time of a project is higher than the granted CPU time, your submitted jobs will not run; they will remain pending in the queue until the granted CPU time has been increased. If you execute the command ''squeue'', you will see the reason "AssocGrpCPUMinutesLimit" for the pending job.

If your account is enclosed in more than one project group, you must set the option -A <project_group> of the ''sbatch'' command to charge the CPU time to the project <project_group>. Otherwise the mapping of CPU time to a project is random.
<br>
<br>
----
[[Category:bwUniCluster|Batch Jobs - bwUniCluster2.0 features]]