BwForCluster JUSTUS 2 Slurm HOWTO
The bwForCluster JUSTUS 2 is a state-wide high-performance compute resource dedicated to Computational Chemistry and Quantum Sciences in Baden-Württemberg, Germany.
Slurm Howto
Preface
This is a collection of howtos and convenient commands that I initially wrote for internal use at Ulm only. Scripts and commands have been tested within our Slurm test environment at JUSTUS (running Slurm 19.05 at the moment).
You may find this collection useful, but use it at your own risk. Things may behave differently with different Slurm versions and configurations.
GENERAL
How to find Slurm FAQ?
https://slurm.schedmd.com/faq.html
How to find a Slurm cheat sheet?
https://slurm.schedmd.com/pdfs/summary.pdf
How to get more information?
(Almost) every Slurm command has a man page. Use it.
Online versions: https://slurm.schedmd.com/man_index.html
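For example, to read the man pages of the most frequently used commands directly on the cluster:
$ man sbatch       # job submission options
$ man squeue       # job queue listing options
$ man sacctmgr     # accounting database management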
JOB SUBMISSION
How to submit an interactive job?
Use srun command, e.g.:
$ srun --nodes=1 --ntasks-per-node=8 --pty bash
How to enable X11 forwarding for an interactive job?
Use --x11 flag, e.g.
$ srun --nodes=1 --ntasks-per-node=8 --pty --x11 bash     # run shell with X11 forwarding enabled
$ srun --nodes=1 --ntasks-per-node=8 --pty --x11 xterm    # directly launch terminal window on node
Note:
- For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:
local> ssh -X <username>@justus2.uni-ulm.de
How to submit a batch job?
Use sbatch command:
$ sbatch <job-script>
How to convert Moab batch job scripts to Slurm?
Replace Moab/Torque job specification flags and environment variables in your job scripts with their corresponding Slurm counterparts.
Commonly used Moab job specification flags and their Slurm equivalents
Option | Moab (msub) | Slurm (sbatch) |
---|---|---|
Script directive | #MSUB | #SBATCH |
Job name | -N <name> | --job-name=<name> (-J <name>) |
Account | -A <account> | --account=<account> (-A <account>) |
Queue | -q <queue> | --partition=<partition> (-p <partition>) |
Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss> (-t <hh:mm:ss>) |
Node count | -l nodes=<count> | --nodes=<count> (-N <count>) |
Core count | -l procs=<count> | --ntasks=<count> (-n <count>) |
Process count per node | -l ppn=<count> | --ntasks-per-node=<count> |
Core count per process | --- | --cpus-per-task=<count> |
Memory limit per node | -l mem=<limit> | --mem=<limit> |
Memory limit per process | -l pmem=<limit> | --mem-per-cpu=<limit> |
Job array | -t <array indices> | --array=<indices> (-a <indices>) |
Node exclusive job | -l naccesspolicy=singlejob | --exclusive |
Initial working directory | -d <directory> (default: $HOME) | --chdir=<directory> (-D <directory>) (default: submission directory) |
Standard output file | -o <file path> | --output=<file> (-o <file>) |
Standard error file | -e <file path> | --error=<file> (-e <file>) |
Combine stdout/stderr to stdout | -j oe | --output=<combined stdout/stderr file> |
Mail notification events | -m <event> | --mail-type=<events> (valid types include: NONE, BEGIN, END, FAIL, ALL) |
Export environment to job | -V | --export=ALL (default) |
Don't export environment to job | (default) | --export=NONE |
Export environment variables to job | -v <var[=value][,var2=value2[, ...]]> | --export=<var[=value][,var2=value2[,...]]> |
Notes:
- Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.
- By default Moab does not export any environment variables to the job's runtime environment. With Slurm most of the login environment variables are exported to your job's runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).
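As an illustration of the table above, here is a minimal Moab job header and a possible Slurm translation (job name, resource values and program name are placeholders):
# Moab (msub):
#MSUB -N my_job
#MSUB -l nodes=1
#MSUB -l ppn=8
#MSUB -l walltime=02:00:00
#MSUB -l mem=16gb
./my_program

# Slurm (sbatch):
#SBATCH --job-name=my_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --time=02:00:00
#SBATCH --mem=16gb
./my_program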
Commonly used Moab/Torque script environment variables and their Slurm equivalents
Information | Moab | Torque | Slurm |
---|---|---|---|
Job name | $MOAB_JOBNAME | $PBS_JOBNAME | $SLURM_JOB_NAME |
Job ID | $MOAB_JOBID | $PBS_JOBID | $SLURM_JOB_ID |
Submit directory | $MOAB_SUBMITDIR | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR |
Number of nodes allocated | $MOAB_NODECOUNT | $PBS_NUM_NODES | $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) |
Node list | $MOAB_NODELIST | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST |
Number of processes | $MOAB_PROCCOUNT | $PBS_TASKNUM | $SLURM_NTASKS |
Requested tasks per node | - | $PBS_NUM_PPN | $SLURM_NTASKS_PER_NODE |
Requested CPUs per task | --- | --- | $SLURM_CPUS_PER_TASK |
Job array index | $MOAB_JOBARRAYINDEX | $PBS_ARRAY_INDEX | $SLURM_ARRAY_TASK_ID |
Job array range | $MOAB_JOBARRAYRANGE | - | $SLURM_ARRAY_TASK_COUNT |
Queue name | $MOAB_CLASS | $PBS_QUEUE | $SLURM_JOB_PARTITION |
QOS name | $MOAB_QOS | --- | $SLURM_JOB_QOS |
Number of processes per node | --- | $PBS_NUM_PPN | $SLURM_TASKS_PER_NODE |
Job user | $MOAB_USER | $PBS_O_LOGNAME | $SLURM_JOB_USER |
Hostname | $MOAB_MACHINE | $PBS_O_HOST | $SLURMD_NODENAME |
Note:
- See sbatch man page for a complete list of flags and environment variables.
How to view information about submitted jobs?
Use squeue command, e.g.:
$ squeue                  # all users (admins only)
$ squeue -u <username>    # jobs of specific user
$ squeue -t PENDING       # pending jobs only
Note: The output format of squeue (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.
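For example, a custom listing for one user (the chosen format fields are just an illustration):
$ squeue -u <username> -o "%.10i %.9P %.20j %.8T %.10M %.6D %R"
This prints job id, partition, job name, state, elapsed time, node count and reason/nodelist.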
How to cancel jobs?
Use scancel command, e.g.
$ scancel <jobid>            # cancel specific job
$ scancel <jobid>_<index>    # cancel indexed job in a job array
$ scancel -u <username>      # cancel all jobs of specific user
$ scancel -t PENDING         # cancel pending jobs
How to submit a serial batch job?
Sample job script template for serial job:
#!/bin/bash
# Allocate one node
#SBATCH --nodes=1
# Number of program instances to be executed
#SBATCH --tasks-per-node=1
# 8 GB memory required per node
#SBATCH --mem=8G
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=serial_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=serial_job-%j.out
# File name for error output
#SBATCH --error=serial_job-%j.err

# Load software modules as needed, e.g.
# module load foo/bar

# Run serial program
./my_serial_program
Sample code for serial program: hello_serial.c (https://projects.uni-konstanz.de/attachments/download/16815/hello_serial.c)
Notes:
- --nodes=1 and --tasks-per-node=1 may be replaced by --ntasks=1.
- If not specified, stdout and stderr are both written to slurm-%j.out.
How to emulate Moab output file names?
Use the following directives:
#SBATCH --output="%x.o%j"
#SBATCH --error="%x.e%j"
How to pass command line arguments to the job script?
Run
sbatch <job-script> arg1 arg2 ...
Inside the job script the arguments can be accessed as $1, $2, ...
E.g.:
[...]
infile="$1"
outfile="$2"
./my_serial_program < "$infile" > "$outfile" 2>&1
[...]
Notes:
- Do not use $1, $2, ... in "#SBATCH" lines. These parameters can be used only within the regular shell script.
How to request local scratch (SSD/NVMe) at job submission?
Use '--gres=scratch:nnn' option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job.
Example: '--gres=scratch:100' will allocate 100 GB scratch space on a locally attached NVMe device.
Notes:
- Do not add any unit (such as --gres=scratch:100G). This would be treated as requesting an amount of 10^9 * 100 GB of scratch space.
- Multinode jobs get nnn GB of local scratch space on every node of the job.
- Environment variable $SCRATCH will point to
- /scratch/<user>.<jobid> when local scratch has been requested
- /tmp/<user>.<jobid> when no local scratch has been requested
- Environment variable $TMPDIR always points to /tmp/<user>.<jobid>
- For backward compatibility environment variable $RAMDISK always points to /tmp/<user>.<jobid>
- Scratch space allocation in /scratch will be enforced by quota limits
- Data written to $TMPDIR will always count against allocated memory.
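A minimal sketch of a job script fragment that stages data through the node-local scratch space (file and program names are placeholders):
#SBATCH --gres=scratch:100

# Copy input data to node-local scratch, compute there, copy results back
cp input.dat "$SCRATCH"/
cd "$SCRATCH"
/path/to/my_program input.dat > output.dat
cp output.dat "$SLURM_SUBMIT_DIR"/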
How to submit a multithreaded batch job?
Sample job script template for a job running one multithreaded program instance:
#!/bin/bash
# Allocate one node
#SBATCH --nodes=1
# Number of program instances to be executed
#SBATCH --tasks-per-node=1
# Number of cores per program instance
#SBATCH --cpus-per-task=8
# 8 GB memory required per node
#SBATCH --mem=8G
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=multithreaded_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=multithreaded_job-%j.out
# File name for error output
#SBATCH --error=multithreaded_job-%j.err

# Load software modules as needed, e.g.
# module load foo/bar

export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}
export MKL_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}

# Run multithreaded program
./my_multithreaded_program
Sample code for multithreaded program: hello_openmp.c (https://projects.uni-konstanz.de/attachments/download/16814/hello_openmp.c)
Notes:
- In our configuration each physical core is considered a "CPU".
- Required memory can also be specified per allocated CPU with the '--mem-per-cpu' option.
- The '--mem' and '--mem-per-cpu' options are mutually exclusive.
- In terms of core allocation, '--tasks-per-node=1' or '--ntasks=1' together with '--cpus-per-task=8' is almost equivalent to '--tasks-per-node=8' or '--ntasks=8' without '--cpus-per-task=8'. However, there are subtle differences when multiple tasks are spawned within one job by means of the srun command, as sketched below.
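A brief sketch of that difference (program names are placeholders):
# With --ntasks=1 --cpus-per-task=8, srun starts a single task that may use 8 CPUs:
srun ./my_multithreaded_program

# With --ntasks=8 (and no --cpus-per-task), the same srun line would start
# 8 tasks, i.e. 8 independent instances of the program:
# srun ./my_program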
How to submit an array job?
Use '-a' (or '--array') option, e.g.
sbatch -a 1-16%8 ...
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.
Sample job script template for an array job:
#!/bin/bash
# Number of cores per individual array task
#SBATCH --ntasks=1
#SBATCH --array=1-16%8
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --job-name=array_job
#SBATCH --output=array_job-%A_%a.out
#SBATCH --error=array_job-%A_%a.err

# Load software modules as needed, e.g.
# module load foo/bar

# Print the task id.
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

# Add lines here to run your computations, e.g.
# ./my_program <input.$SLURM_ARRAY_TASK_ID
Notes:
- Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.
- Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the first job ID of the array and is the same for all tasks.
- More information: https://slurm.schedmd.com/job_array.html
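A common pattern is to use the array task id to select an input file, e.g. by reading the n-th line of a (hypothetical) file list:
# filelist.txt contains one input file name per line
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" filelist.txt)
./my_program < "$INPUT"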
How to submit an MPI batch job?
Suggested reading: https://slurm.schedmd.com/mpi_guide.html
Sample job script template for an MPI job:
#!/bin/bash
# Allocate two nodes
#SBATCH --nodes=2
# Number of program instances to be executed
#SBATCH --tasks-per-node=8
# Allocate 32 GB memory per node
#SBATCH --mem=32gb
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=mpi_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=mpi_job-%j.out
# File name for error output
#SBATCH --error=mpi_job-%j.err

# Add lines here to run your computations, e.g.
#
# Option 1: Launch MPI tasks by using mpirun
#
# for OpenMPI and GNU compiler:
#
# module load compiler/gnu
# module load mpi/openmpi
# mpirun ./my_mpi_program
#
# for Intel MPI and Intel compiler:
#
# module load compiler/intel
# module load mpi/impi
# mpirun ./my_mpi_program
#
# Option 2: Launch MPI tasks by using srun
#
# for OpenMPI and GNU compiler:
#
# module load compiler/gnu
# module load mpi/openmpi
# srun ./my_mpi_program
#
# for Intel MPI and Intel compiler:
#
# module load compiler/intel
# module load mpi/impi
# srun ./my_mpi_program
Sample code for MPI program: hello_mpi.c (https://projects.uni-konstanz.de/attachments/download/16813/hello_mpi.c)
Notes
- SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently for different MPI implementations and versions. There are reports of "strange behavior" of mpirun, especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.
How to submit a hybrid MPI/OpenMP job?
Sample job script template for a hybrid job:
#!/bin/bash
# Number of nodes to allocate
#SBATCH --nodes=4
# Number of MPI instances (ranks) to be executed per node
#SBATCH --tasks-per-node=2
# Number of threads per MPI instance
#SBATCH --cpus-per-task=4
# Allocate 8 GB memory per node
#SBATCH --mem=8gb
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=hybrid_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=hybrid_job-%j.out
# File name for error output
#SBATCH --error=hybrid_job-%j.err

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}

module load compiler/intel
module load mpi/impi

srun ./my_hybrid_program
# or:
# mpirun ./my_hybrid_program
Sample code for hybrid program: hello_hybrid.c (https://projects.uni-konstanz.de/attachments/download/16812/hello_hybrid.c)
Notes:
- $SLURM_CPUS_PER_TASK is only set if the '--cpus-per-task' option is specified.
How to request specific node(s) at job submission?
Use '-w' (or '--nodelist') option, e.g.:
$ sbatch -w <node1>,<node2> ...
Also see '-F' (or '--nodefile') option.
How to exclude specific nodes from job?
Use '-x' (or '--exclude') option, e.g.:
$ sbatch -x <node1>,<node2> ...
How to get exclusive jobs?
Use --exclusive option on job submission. This makes sure that there will be no other jobs running on your nodes. Very useful for benchmarking!
Note:
- The --exclusive option does not mean that you automatically get full access to all the resources the node provides; you still have to request them explicitly.
How to avoid sharing nodes with other users?
Use the '--exclusive=user' option on job submission. This still allows multiple jobs of the same user to run on the nodes.
Note:
- Depending on configuration, exclusive=user may (and probably will) be the default node access policy on JUSTUS 2.
How to show job script of a running job?
Use scontrol command:
$ scontrol write batch_script <job_id> <file>
$ scontrol write batch_script <job_id> -
- If the file name is omitted, the default file name will be slurm-<job_id>.sh.
- If the file name is - (i.e. a dash), the job script will be written to stdout.
How to submit batch job without job script?
Use '--wrap' option.
Example:
$ sbatch --nodes=2 --ntasks-per-node=16 --wrap "sleep 600"
Note: May be useful for testing purposes.
JOB MONITORING
How to get estimated start time of a job?
$ squeue --start
Note: Estimated start times are dynamic and can change at any moment. Exact start times of individual jobs are usually unpredictable.
How to check priority of jobs?
Use squeue with format options "%Q" and/or "%p", e.g.:
$ squeue -o "%8i %8u %15a %.10r %.10L %.5D %.10Q"
Use sprio command to display the priority components (age/fairshare/...) for each job:
$ sprio
Use "sshare command for listing the shares of associations, e.g. accounts.
$ sshare
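For more detail, e.g.:
$ sshare -a -l    # -a: include all users, -l: long listing including usage data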
How to monitor resource usage of running job(s)?
Use "sstat command.
'sstat -e' command shows a a list of fields that can be specified with the '--format' option.
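For example (the selected fields are just an illustration):
$ sstat -j <jobid> --format=JobID,AveCPU,MaxRSS,MaxVMSize,MaxDiskWrite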
Notes:
- Users can also ssh into compute nodes that they have one or more running jobs on. Once logged in, they can use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ...
- Users can also attach an interactive shell under an already allocated job by running the following command:
srun --jobid <job> --pty /bin/bash
Once logged in, they can again use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ... For a single node job the user does not even need to know the node that the job is running on. For a multinode job, the user can still use '-w <node>' option to specify a specific node.
How to show accounting data of completed job(s)?
Use sacct command.
'sacct -e' command shows a list of fields that can be specified with the '--format' option.
How to retrieve job history and accounting?
For a specific job:
$ sacct -j <jobid> --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
For a specific user:
$ sacct -u <user> --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
Note: Default time window is the current day.
Starting from a specific date:
$ sacct -u <user> -S 2020-01-15 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
Within a time window:
$ sacct -u <user> -S 2020-01-15 -E 2020-01-31 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
How to get efficiency information of completed job(s)?
Use 'seff <jobid>' command for some brief information.
How to get a parsable list of hostnames from $SLURM_JOB_NODELIST?
$ scontrol show hostnames $SLURM_JOB_NODELIST
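A small sketch of how this can be used inside a job script, e.g. to loop over all allocated nodes:
for host in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    echo "allocated node: $host"
done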
ADMINISTRATION
How to stop Slurm from scheduling jobs?
You can stop Slurm from scheduling jobs on a per partition basis by setting that partition's state to DOWN. Set its state UP to resume scheduling. For example:
$ scontrol update PartitionName=foo State=DOWN
$ scontrol update PartitionName=foo State=UP
How to reboot (all) nodes as soon as they become idle?
$ scontrol reboot ASAP nextstate=RESUME <node1>,<node2>    # specific nodes
$ scontrol reboot ASAP nextstate=RESUME ALL                # all nodes
How to check current node status?
$ scontrol show node <node>
How to instruct all Slurm daemons to re-read the configuration file?
$ scontrol reconfigure
How to prevent (hold) jobs from being scheduled for execution?
$ scontrol hold <job_id>
How to unhold job?
$ scontrol release <job_id>
How to suspend a running job?
$ scontrol suspend <job_id>
How to resume a suspended job?
$ scontrol resume <job_id>
How to requeue (cancel and resubmit) a particular job?
$ scontrol requeue <job_id>
How to prevent a user from submitting new jobs?
Use the following sacctmgr command:
$ sacctmgr update user <username> set maxsubmitjobs=0
Notes:
- Job submission is then rejected with the following message:
$ sbatch job.slurm
sbatch: error: AssocMaxSubmitJobLimit
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
- Use the following command to release the limit:
$ sacctmgr update user <username> set maxsubmitjobs=-1
How to drain node(s)?
$ scontrol update NodeName=<node1>,<node2> State=DRAIN Reason="Some Reason"
Notes:
- Reason is mandatory.
- Do not just set state DOWN to drain nodes. This will kill any active jobs that may be running on those nodes.
How to resume node state?
$ scontrol update NodeName=<node1>,<node2> State=RESUME
How to create a reservation on nodes?
$ scontrol create reservation user=root starttime=now duration=UNLIMITED flags=maint,ignore_jobs nodes=ALL
$ scontrol create reservation user=root starttime=2020-12-24T17:00 duration=12:00:00 flags=maint,ignore_jobs nodes=<node1>,<node2>
$ scontrol show reservation
See: https://slurm.schedmd.com/reservations.html
How to use a reservation?
$ sbatch --reservation=foo_6 ... script.slurm
How to delete a reservation?
$ scontrol delete ReservationName=foo_6
How to get node oriented information similar to 'mdiag -n'?
$ sinfo -N -l
Fields can be individually customized. See sinfo man page. For example:
$ sinfo -N --format="%8N %12P %.4C %.8O %.6m %.6e %.8T %.20E"
NODELIST PARTITION    CPUS CPU_LOAD MEMORY FREE_M    STATE               REASON
n0001    standard*    0/16     0.01 128000 120445     idle                 none
n0002    standard*    0/16     0.01 128000 120438     idle                 none
n0003    standard*    0/0/      N/A 128000    N/A    down*       Not responding
How to get node oriented information similar to 'pbsnodes'?
$ scontrol show nodes       # One paragraph per node
$ scontrol -o show nodes    # One line per node
How to get job information similar to 'checkjob'?
$ scontrol show job 1234     # For job id 1234
$ scontrol show jobs         # For all jobs
$ scontrol -o show jobs      # One line per job
How to modify a running job?
Use
$ scontrol update JobId=<jobid> ...
E.g.:
$ scontrol update JobId=42 TimeLimit=28-0
This will modify the time limit of the job to 28 days.
How to update multiple jobs of a user with a single scontrol command?
This is not possible directly. But you can, for example, use squeue to build such a script, taking advantage of its filtering and formatting options.
$ squeue -tpd -h -o "scontrol update jobid=%i priority=1000" >my.script
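The generated script can then be reviewed and executed, e.g.:
$ cat my.script    # review the generated scontrol commands
$ sh my.script     # apply them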
You can also pass a list or ranges of job IDs to a single scontrol command, for example:
$ scontrol update JobID=123 qos=reallylargeqos
$ scontrol update JobID=123,456,789 qos=reallylargeqos
$ scontrol update JobID=[123-400],[500-600] qos=reallylargeqos
Another option is to use the JobName, if all the jobs have the same name.
$ scontrol update JobName="foobar" UserID=johndoe qos=reallylargeqos
However, Slurm does not allow the UserID filter alone.
How to create a new account?
Add account at top level in association tree:
$ sacctmgr add account <accountname> Cluster=justus Description="Account description" Organization="none"
Add account as child of some parent account in association tree:
$ sacctmgr add account <accountname> parent=<parent_accountname>
How to move account to another parent?
$ sacctmgr modify account name=<accountname> set parent=<new_parent_accountname>
How to delete an account?
$ sacctmgr delete account name=<accountname>
How to add a new user?
$ sacctmgr add user <username> DefaultAccount=<accountname>
How to add/remove users from an account?
$ sacctmgr add user <username> account=<accountname>              # Add user to account
$ sacctmgr add user <username> account=<accountname2>             # Add user to a second account
$ sacctmgr remove user <username> where account=<accountname>     # Remove user from this account
How to show account information?
$ sacctmgr show assoc
$ sacctmgr show assoc tree
How to implement user resource throttling policies?
Quoting from https://bugs.schedmd.com/show_bug.cgi?id=3600#c4
With Slurm, the associations are meant to establish base limits on the defined partitions, accounts and users. Because limits propagate down through the association tree, you only need to define limits at a high level and those limits will be applied to all partitions, accounts and users that are below it (parent to child). You can also override those high level (parent) limits by explicitly setting different limits at any lower level (on the child). So using the association tree is the best way to get some base limits applied that you want for most cases. QOS's are meant to override any of those base limits for exceptional cases. Like Maui, you can use QOS's to set a different priority. Again, the QOS would be overriding the base priority that could be set in the associations.
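A minimal sketch of this idea (account/user names and limit values are placeholders; GrpTRES is just one example of an association limit):
$ sacctmgr modify account name=<accountname> set GrpTRES=cpu=512                          # base limit, propagates to child associations
$ sacctmgr modify user where name=<username> account=<accountname> set GrpTRES=cpu=64     # override for one user association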
How to set a resource limit for an individual user?
Example:
$ sacctmgr modify user <username> set maxjobs=1               # Limit maximum number of running jobs for user
$ sacctmgr list assoc user=<username> format=user,maxjobs     # Show that limit
$ sacctmgr modify user <username> set maxjobs=-1              # Remove that limit
Note: Also see https://slurm.schedmd.com/resource_limits.html
How to create/modify/delete QOSes?
Suggested reading: https://slurm.schedmd.com/qos.html
Examples:
$ sacctmgr show qos                                         # Show existing QOSes
$ sacctmgr add qos verylong                                 # Create new QOS verylong
$ sacctmgr modify qos verylong set MaxWall=28-00:00:00      # Set maximum walltime limit
$ sacctmgr modify qos verylong set MaxTRESPerUser=cpu=4     # Set maximum number of CPUs a user can allocate at a given time
$ sacctmgr modify qos verylong set flags=denyonlimit        # Prevent submission if job requests exceed any limits of QOS
$ sacctmgr modify user <username> set qos+=verylong         # Add a QOS to a user account
$ sacctmgr modify user <username> set qos-=verylong         # Remove a QOS from a user account
$ sacctmgr delete qos verylong                              # Delete that QOS
How to show a history of database transactions?
$ sacctmgr list transactions
Note: Useful to get timestamps for when a user/account/qos has been created/modified/removed etc.
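For example, to narrow the listing down (the actor name is a placeholder; see the sacctmgr man page for available conditions and format fields):
$ sacctmgr list transactions Actor=<username> format=Time,Action,Actor,Where,Info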