BwUniCluster3.0/Slurm
Revision as of 14:34, 5 December 2024
Slurm HPC Workload Manager
Specification
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
Any calculation on the compute nodes of bwUniCluster 3.0 requires the user to define it as a single command or a sequence of commands, together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e., the batch job, to a resource and workload management system. bwUniCluster 3.0 uses the workload manager Slurm, so every job submission is done with Slurm commands. Slurm queues and runs user jobs according to fair-share policies.
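A minimal sketch of such a batch job, written as a bash script in which the run time, number of CPU cores and main memory are requested through `#SBATCH` directives. The job name, resource values, output file name and the program `./my_program` are hypothetical placeholders, and bwUniCluster 3.0 may additionally require a partition (`--partition=...`), which is deliberately not filled in here.

```shell
#!/bin/bash
# Hypothetical example values; adjust to your own job.
#SBATCH --job-name=example_job
#SBATCH --time=00:10:00              # maximum run time (HH:MM:SS)
#SBATCH --ntasks=1                   # one task (process)
#SBATCH --cpus-per-task=4            # four CPU cores for that task
#SBATCH --mem=4G                     # main memory per node
#SBATCH --output=example_job_%j.out  # %j is replaced by the job ID
# A partition may also be required on bwUniCluster 3.0 (--partition=...); not filled in here.

# Let OpenMP programs use the cores Slurm allocated (falls back to 1 outside Slurm).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

echo "Job ${SLURM_JOB_ID:-none} on ${HOSTNAME} using ${OMP_NUM_THREADS} thread(s)"

# Start your actual program here; ./my_program is a hypothetical placeholder.
# srun ./my_program
```

Such a script would be submitted with `sbatch` followed by the script's file name. `sbatch` reads the `#SBATCH` lines as options, while bash itself treats them as ordinary comments.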
Slurm Commands (excerpt)
Important Slurm commands for non-administrators working on bwUniCluster 3.0.
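| Slurm command | Brief explanation |
|---|---|
| sbatch | Submits a job and puts it into the queue ([sbatch](https://slurm.schedmd.com/sbatch.html)) |
| scontrol show job | Displays detailed job state information ([scontrol](https://slurm.schedmd.com/scontrol.html)) |
| squeue | Displays information about active, eligible, blocked, and/or recently completed jobs ([squeue](https://slurm.schedmd.com/squeue.html)) |
| squeue --start | Returns the expected start time of pending jobs ([squeue](https://slurm.schedmd.com/squeue.html)) |
| sinfo_t_idle | Shows what resources are available for immediate use ([sinfo](https://slurm.schedmd.com/sinfo.html)) |
| scancel | Cancels a job ([scancel](https://slurm.schedmd.com/scancel.html)) |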
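The commands in the table are typically used one after another over a job's lifetime. The following is a sketch of that sequence, assuming a hypothetical job script `example_job.sh`. It uses `sbatch --parsable`, which prints only the job ID (followed by `;` and the cluster name when one is reported), and it only calls Slurm if the commands are actually available.

```shell
#!/bin/bash
# Sketch of a submit / inspect / cancel sequence; example_job.sh is a hypothetical job script.

# sbatch --parsable prints "<jobid>" or "<jobid>;<cluster>"; keep only the job ID.
job_id_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1; then
    out=$(sbatch --parsable example_job.sh) || exit 1   # submit and stop if submission fails
    jobid=$(job_id_from_parsable "$out")
    squeue -u "$USER"              # list all of your own jobs
    squeue --start -j "$jobid"     # expected start time while the job is pending
    scontrol show job "$jobid"     # detailed job state information
    scancel "$jobid"               # cancel the job again
else
    echo "Slurm commands not found; run this on a cluster login node." >&2
fi
```

Extracting the job ID once and reusing it keeps the follow-up `squeue`, `scontrol` and `scancel` calls limited to the job that was just submitted.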