Difference between revisions of "JUSTUS2/Slurm"

From bwHPC Wiki

Revision as of 08:29, 8 July 2020

The bwForCluster JUSTUS 2 is a state-wide high-performance compute resource dedicated to Computational Chemistry and Quantum Sciences in Baden-Württemberg, Germany.

The JUSTUS 2 cluster uses Slurm for scheduling compute jobs.

In order to get started with Slurm at JUSTUS 2, please visit our Slurm HOWTO for JUSTUS 2.

1 Partitions

Job allocations at JUSTUS 2 are routed automatically to the most suitable compute node(s) that can provide the resources requested for the job (e.g. number of cores, memory, local scratch space). This prevents fragmentation of the cluster and ensures the most efficient use of the available compute resources. There is no need to request a specific partition in your batch job scripts, i.e. users must not specify "-p, --partition=<partition_name>" on job submission. This is particularly important if you adapt job scripts from other cluster systems (e.g. bwUniCluster 2.0) to JUSTUS 2.
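
When adapting job scripts from another cluster, a small check for leftover partition requests can help. The following is a minimal sketch, not an official JUSTUS 2 tool: the function name find_partition_directives and the partition name some_partition are hypothetical, and the check only looks at #SBATCH lines inside the script, not at options passed on the sbatch command line.

```python
def find_partition_directives(script_text):
    """Return (line_number, line) pairs for #SBATCH lines that request a partition."""
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("#SBATCH"):
            continue
        # Inspect every option token that follows "#SBATCH" on this line.
        for token in stripped.split()[1:]:
            is_long = token == "--partition" or token.startswith("--partition=")
            is_short = token.startswith("-p")  # covers "-p name" and "-pname"
            if is_long or is_short:
                hits.append((lineno, stripped))
                break
    return hits


# Hypothetical job script adapted from another cluster.
example_script = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=some_partition
#SBATCH --time=01:00:00
echo "hello"
"""

for lineno, line in find_partition_directives(example_script):
    print(f"line {lineno}: remove partition request: {line}")
```

Any line reported by this check should be removed from the script before submitting it at JUSTUS 2.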

2 Job Priorities

Job priorities at JUSTUS 2 depend on multiple factors:

  • Age: The amount of time a job has been waiting in the queue while eligible to be scheduled.
  • Fairshare: The difference between the portion of the computing resource allocated to an association and the amount of resources that has actually been consumed.

Note:

Fairshare does not introduce a fixed allotment under which a user's ability to run new jobs would be cut off as soon as a fixed target utilization is reached. Instead, the fairshare factor ensures that jobs from users who were under-served in the past are given higher priority than jobs from users who were over-served in the past. This keeps individual groups from monopolizing the resources over the long term, which would be unfair to groups that have not used their fair share for quite some time.
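
The interplay of the two factors can be sketched as a weighted sum, which is the general form used by Slurm's multifactor priority scheme. The weights, the maximum age, and the fairshare values below are hypothetical and are not the values configured at JUSTUS 2. The sketch also takes the fairshare factor as a given number between 0 and 1 rather than deriving it from usage records.

```python
def age_factor(minutes_waiting, max_age_minutes):
    """Normalize waiting time to the range 0.0..1.0 (saturates at max_age_minutes)."""
    return min(minutes_waiting / max_age_minutes, 1.0)


def job_priority(age, fairshare, weight_age, weight_fairshare):
    """Weighted sum of the two factors; both factors are expected in 0.0..1.0."""
    return round(weight_age * age + weight_fairshare * fairshare)


# Hypothetical settings, chosen only for illustration.
WEIGHT_AGE = 1000
WEIGHT_FAIRSHARE = 10000
MAX_AGE_MINUTES = 7 * 24 * 60

one_hour = age_factor(60, MAX_AGE_MINUTES)
five_days = age_factor(5 * 24 * 60, MAX_AGE_MINUTES)

# Same waiting time, different usage history (fairshare 0.9 vs. 0.1).
under_served = job_priority(one_hour, 0.9, WEIGHT_AGE, WEIGHT_FAIRSHARE)
over_served = job_priority(one_hour, 0.1, WEIGHT_AGE, WEIGHT_FAIRSHARE)
# The over-served user's job is not cut off; it keeps gaining priority with age.
over_served_later = job_priority(five_days, 0.1, WEIGHT_AGE, WEIGHT_FAIRSHARE)

print(under_served, over_served, over_served_later)
```

In this sketch the job of the over-served user still receives a positive priority that grows while it waits, so it is ranked lower but never blocked outright.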

Slurm features backfilling, meaning that the scheduler will start lower priority jobs if doing so does not delay the expected start time of any higher priority job. Since the expected start time of pending jobs depends on the expected completion time of running jobs, reasonably accurate time limits are valuable for backfill scheduling to work well. Watch this video (https://youtu.be/OKhWwem1XZg?t=161) for a nice illustration of how backfilling works.
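
The effect of time limits on backfilling can be illustrated with a toy planner. This is a simplified sketch and not Slurm's actual backfill algorithm: it only counts nodes, assumes every job runs for exactly its time limit, and assumes no job requests more nodes than the cluster has. All names (Job, fits, plan_backfill) and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    time_limit: int  # requested wall time in minutes


def fits(total_nodes, reservations, start, end, nodes):
    """True if `nodes` are free during the whole interval [start, end)."""
    # Free capacity can only drop where another reservation begins.
    check_points = {start} | {s for s, _, _ in reservations if start < s < end}
    for t in check_points:
        used = sum(n for s, e, n in reservations if s <= t < e)
        if total_nodes - used < nodes:
            return False
    return True


def plan_backfill(total_nodes, running, pending):
    """Plan start times (minutes from now) for pending jobs.

    running: list of (nodes, minutes_remaining) for jobs already running.
    pending: list of Job, highest priority first.
    """
    reservations = [(0, remaining, nodes) for nodes, remaining in running]
    starts = {}
    for job in pending:
        # The earliest feasible start is either now or the end of a reservation.
        candidates = sorted({0} | {end for _, end, _ in reservations})
        for t in candidates:
            if fits(total_nodes, reservations, t, t + job.time_limit, job.nodes):
                # Reserving the slot keeps later (lower priority) jobs from delaying it.
                reservations.append((t, t + job.time_limit, job.nodes))
                starts[job.name] = t
                break
    return starts


# 4-node cluster, 3 nodes busy for another 60 minutes.
running = [(3, 60)]
pending = [Job("A", 4, 120), Job("B", 1, 30), Job("C", 1, 90)]
print(plan_backfill(4, running, pending))
```

In this scenario job A has to wait for the running job to finish. Job B fits into the idle node before A's planned start and is backfilled immediately, whereas job C's 90-minute limit would overlap A's reservation, so C is planned after A. If C had requested only 30 minutes, the planner would place it right after B, which is why realistic time limits help jobs start earlier.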