BwUniCluster3.0/Slurm
Slurm HPC Workload Manager
Specification
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
Any kind of calculation on the compute nodes of bwUniCluster 3.0 requires the user to define the work as a single command or a sequence of commands, together with the required run time, number of CPU cores, and main memory, and to submit all of this, i.e., the batch job, to a resource and workload managing software. bwUniCluster 3.0 runs the workload manager Slurm; therefore, every job submission is performed with Slurm commands. Slurm queues and runs user jobs based on fair sharing policies.
Slurm Commands (excerpt)
Important Slurm commands for non-administrators working on bwUniCluster 3.0.
| Slurm commands | Brief explanation |
|---|---|
| sbatch | Submits a job and puts it into the queue |
| salloc | Requests resources for an interactive job |
| scontrol show job | Displays detailed job state information |
| squeue | Displays information about active, eligible, blocked, and/or recently completed jobs |
| squeue --start | Returns the expected start time of a submitted job |
| sinfo_t_idle | Shows what resources are available for immediate use |
| scancel | Cancels a job |

* [Slurm command/option summary (2 pages)](https://slurm.schedmd.com/pdfs/summary.pdf)
* [Slurm Commands](https://slurm.schedmd.com/man_index.html)
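A typical monitoring sequence with the commands above, assuming a previously submitted job with the illustrative ID 123456, might look like this:

```
$ squeue                      # list your pending and running jobs
$ squeue --start -j 123456    # expected start time of job 123456
$ sinfo_t_idle                # resources that are free for immediate use
$ scancel 123456              # cancel job 123456
```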
Job Submission: sbatch
Batch jobs are submitted with the command sbatch. The main purpose of sbatch is to specify the resources needed to run the job; sbatch then places the batch job in the queue. When the job actually starts depends on the availability of the requested resources and on the fair share value.
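As a minimal sketch (the script name and resource values are illustrative), a job script and its submission could look like this:

```
#!/bin/bash
#SBATCH --time=00:10:00    # wall clock limit of 10 minutes
#SBATCH --ntasks=1         # a single task

echo "Running on $(hostname)"
```

```
$ sbatch myjob.sh
Submitted batch job 123456
```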
sbatch Command Parameters
The syntax and use of sbatch can be displayed via:
$ man sbatch
sbatch options can be used from the command line or in your job script.
sbatch Options

| Command line | Script | Purpose |
|---|---|---|
| -t, --time=time | #SBATCH --time=time | Wall clock time limit. |
| -N, --nodes=count | #SBATCH --nodes=count | Number of nodes to be used. |
| -n, --ntasks=count | #SBATCH --ntasks=count | Number of tasks to be launched. |
| --ntasks-per-node=count | #SBATCH --ntasks-per-node=count | Maximum count of tasks per node (<= 28 and <= 40, respectively). (Replaces the MOAB option ppn.) |
| -c, --cpus-per-task=count | #SBATCH --cpus-per-task=count | Number of CPUs required per (MPI) task. |
| --mem=value_in_MB | #SBATCH --mem=value_in_MB | Memory in megabytes per node. (The default value is 128000 and 96000 MB, respectively, so you should normally omit this option.) |
| --mem-per-cpu=value_in_MB | #SBATCH --mem-per-cpu=value_in_MB | Minimum memory required per allocated CPU. (Replaces the MOAB option pmem; you should normally omit this option.) |
| --mail-type=type | #SBATCH --mail-type=type | Notify the user by email when certain event types occur. Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL. |
| --mail-user=mail-address | #SBATCH --mail-user=mail-address | The specified mail address receives email notifications of state changes as defined by --mail-type. |
| --output=name | #SBATCH --output=name | File in which job output is stored. |
| --error=name | #SBATCH --error=name | File in which job error messages are stored. |
| -J, --job-name=name | #SBATCH --job-name=name | Job name. |
| --export=[ALL,]env-variables | #SBATCH --export=[ALL,]env-variables | Identifies which environment variables from the submission environment are propagated to the launched application. The default is ALL. If you want to add a variable to the submission environment, the argument ALL must be included as well. |
| -A, --account=group-name | #SBATCH --account=group-name | Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The command "scontrol show job" shows the project group the job is accounted on after "Account=". |
| -p, --partition=queue-name | #SBATCH --partition=queue-name | Request a specific queue for the resource allocation. |
| --reservation=reservation-name | #SBATCH --reservation=reservation-name | Use a specific reservation for the resource allocation. |
| -C, --constraint=LSDF | #SBATCH --constraint=LSDF | Job constraint: LSDF file systems. |
| -C BEEOND (BEEOND_4MDS, BEEOND_MAXMDS) or --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS) | #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS) | Job constraint: BeeOND file system. |
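Putting several of these options together, the following is a sketch of a job script for an MPI program; the partition name, mail address, and program name are placeholders that must be adapted to your environment:

```
#!/bin/bash
#SBATCH --job-name=mpi_test           # job name shown by squeue
#SBATCH --partition=cpu               # placeholder; use a partition that exists on your cluster
#SBATCH --time=02:00:00               # wall clock limit of 2 hours
#SBATCH --nodes=2                     # number of nodes
#SBATCH --ntasks-per-node=40          # tasks per node (must not exceed the node's core count)
#SBATCH --mail-type=END,FAIL          # send mail when the job ends or fails
#SBATCH --mail-user=user@example.com  # placeholder address
#SBATCH --output=mpi_test_%j.out      # %j is replaced by the job ID
#SBATCH --error=mpi_test_%j.err

# ./my_mpi_app is a placeholder for your MPI executable.
mpirun ./my_mpi_app
```

Options given on the command line override the corresponding #SBATCH directives in the script; for example, sbatch --time=04:00:00 jobscript.sh requests four hours of wall clock time regardless of the script's --time setting.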