BwForCluster MLS&WISO Production Batch Jobs
October 2020: This page is outdated. bwForCluster MLS&WISO Production migrated to Slurm.
Contents
1 Job Submission
1.1 msub -l resource_list
1.2 Queues and Limits
1.3 Hyperthreading
2 Job Monitoring
2.1 showq -r
2.2 SSH Login To The Node
3 Job Feedback
4 Interactive Jobs
5 Show Free Resources: showbf
This page describes the details of the queuing system specific to the bwForCluster MLS&WISO (Production). A general description of options that should work on all bwHPC clusters can be found on the Batch Jobs page.
Jobs are run in job-exclusive mode. That means only one job runs on a node at any time. The whole node is reserved for a single job regardless of how many cores are requested with the ppn flag, and the user is responsible for using all cores as efficiently as possible. The job-exclusive policy provides maximum privacy and prevents accidental competition for resources among jobs on a single node.
How can you use a node efficiently?
- Parallelization of one task: Increasing the number of requested cores will not speed up your code automatically. It may be necessary to change your code. See the batch job examples for some hints. Please contact us if you need support.
- Multiple tasks in one job: It is possible to submit many sequential tasks in a single job. On the cluster you can find some examples in the directory: /opt/bwhpc/common/system/moab-examples
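A minimal sketch of such a job script, filling one standard node with independent serial tasks (the task scripts task_01.sh ... task_16.sh are hypothetical placeholders and the Moab-style #MSUB directives are an assumption; the examples in the directory above are the authoritative reference):

#!/bin/bash
#MSUB -l nodes=1:ppn=16
#MSUB -l walltime=02:00:00

# Start 16 serial tasks in the background, one per core, and wait for
# all of them to finish before the job ends.
for i in $(seq -w 1 16); do
    ./task_${i}.sh > task_${i}.log 2>&1 &
done
wait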
For job monitoring, users have SSH access to the nodes on which their jobs are running.
1.1 msub -l resource_list
The different node types are requested via node features in the resource_list. Standard nodes are the default. Queues are selected automatically according to the requested number of nodes, node feature, and walltime.
To get the entire available physical memory of a node type, it is sufficient to request all cores of that node type. If you request fewer cores, make sure to request sufficient memory. About 5 GB of memory is reserved for the running operating system and cannot be requested.
Explicit memory requests are made via mem (total memory for the job) or pmem (memory per requested core).
Examples:
Resource Request | Resource Allocation |
---|---|
-l nodes=2:ppn=16 | 2 standard nodes. All 16 cores per node are explicitly allocated. |
-l nodes=2:ppn=16:best | 2 best nodes. All 16 cores per node are explicitly allocated. |
-l nodes=1:ppn=32:fat-ivy | 1 fat-ivy node (Ivy Bridge CPU). All 32 cores are explicitly allocated. |
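For explicit memory requests, a hedged example (the memory values and the script name jobscript.sh are placeholders; choose values below the node's physical memory minus the roughly 5 GB reserved for the operating system):
$ msub -l nodes=1:ppn=8,pmem=4gb,walltime=01:00:00 jobscript.sh    # 4 GB per requested core
$ msub -l nodes=1:ppn=16,mem=48gb,walltime=01:00:00 jobscript.sh   # 48 GB total for the job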
1.2 Queues and Limits
For the Haswell nodes:
Queue | Minimum nodes= [Default] | Minimum walltime= | Maximum nodes= [Default] | Maximum walltime=
---|---|---|---|---
quick | 1:ppn=16[:standard] | 00:10:00 | 1:ppn=16[:standard] | 00:30:00 |
single | 1:ppn=16[:standard] | 00:30:01 | 1:ppn=16[:standard] | 5:00:00:00 |
multi | 2:ppn=16[:standard] | 00:10:00 | 128:ppn=16[:standard] | 2:00:00:00 |
quick | 1:ppn=16:best | 00:10:00 | 1:ppn=16:best | 00:30:00 |
single | 1:ppn=16:best | 00:30:01 | 1:ppn=16:best | 5:00:00:00 |
multi | 2:ppn=16:best | 00:10:00 | 128:ppn=16:best | 2:00:00:00 |
gpu | 1:ppn=16:gpu | 00:10:00 | 18:ppn=16:gpu | 2:00:00:00 |
fat | 1:ppn=40:fat | 00:10:00 | 1:ppn=40:fat | 2:00:00:00 |
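As a hypothetical example of the automatic queue selection described above, a request such as
$ msub -l nodes=4:ppn=16,walltime=1:00:00:00 jobscript.sh
would be routed to the multi queue, since it asks for more than one standard node within the allowed walltime (jobscript.sh is a placeholder).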
For the Ivy-Bridge nodes:
Queue | Minimum nodes= [Default] | Minimum walltime= | Maximum nodes= [Default] | Maximum walltime=
---|---|---|---|---
fat-ivy | 1:ppn=32:fat-ivy | 00:10:00 | 1:ppn=32:fat-ivy | 2:00:00:00 |
For the Skylake nodes:
Queue | Minimum nodes= | Minimum walltime= | Maximum nodes= | Maximum walltime=
---|---|---|---|---
best-sky | 1:ppn=32:best-sky | 00:10:00 | 16:ppn=32:best-sky | 2:00:00:00 |
gpu-sky | 1:ppn=1:gpus=1:gpu-sky | 00:10:00 | 1:ppn=32:gpus=4:gpu-sky | 2:00:00:00 |
Hints for the gpu-sky nodes:
- The gpu-sky nodes are operated in shared mode.
- If you need a certain GPU type, you must specify the corresponding hostname. For example: -l nodes=h09c0101:ppn=4:gpus=2:gpu-sky
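A hedged example of a complete submission for a shared gpu-sky node (the core and GPU counts, the walltime, and the script name gpu_job.sh are placeholders):
$ msub -l nodes=1:ppn=8:gpus=2:gpu-sky,walltime=12:00:00 gpu_job.sh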
1.3 Hyperthreading
The Intel CPUs in all nodes provide hyperthreading. To use hyperthreading on a node, add the following msub option:
-l naccesspolicy=singlejob
For example:
-l nodes=1 -l walltime=0:00:10:00 -l naccesspolicy=singlejob
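A minimal job script sketch that uses the logical cores with an OpenMP program (my_openmp_app is a placeholder; the thread count of 32 assumes a 16-core standard node with two hardware threads per core, and the #MSUB directives are an assumption):

#!/bin/bash
#MSUB -l nodes=1
#MSUB -l walltime=0:00:10:00
#MSUB -l naccesspolicy=singlejob

# With the singlejob access policy, the logical (hyperthreaded) cores can be used.
export OMP_NUM_THREADS=32   # 16 physical cores x 2 hardware threads (assumed)
./my_openmp_app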
2 Job Monitoring
2.1 showq -r
To monitor your job while it is running, you can use 'showq -r'. The column 'EFFIC' shows the current CPU efficiency of your job.
JOBID    S  PAR  EFFIC  XFACTOR  Q  USERNAME  GROUP  MHOST     PROCS   REMAINING            STARTTIME
164074   R  tor  58.25      1.1  -  xx_user   ma_ma  m04s0201     16  1:08:36:22  Mon Feb 13 20:41:13
164077   R  tor  58.03      1.0  -  xx_user   ma_ma  m13s0501     16  4:22:11:23  Tue Feb 14 10:16:14
2.2 SSH Login To The Node
If you would like to see what happens on the compute node(s), you can log in to a node where your job is running. Get the node name using:
$ checkjob <jobid>
...
Allocated Nodes:
[m13s0501:8]     # the node name is m13s0501
...
Using the ssh command will bring you to the compute node. For example to node m13s0501:
[login2 ~]$ ssh m13s0501
[m13s0501 ~]$ top
Commands like 'top' show you the busiest processes on the node. To exit 'top', type 'q'.
3 Job Feedback
After job completion you can find feedback on resource usage and job efficiency at the end of the regular output file <jobname>.o<jobid>.
The feedback section provides the following information:
- Job parameters (job name and state, host list, time, requested and used resources)
- Job analysis - possible problems and solutions, advice on how to increase the efficiency
- Error messages associated with the job (if present)
Aim for a high node efficiency. A value of 100 % can be reached with well-parallelized code or by starting enough serial tasks inside a job.
4 Interactive Jobs
Interactive jobs must NOT run on the login nodes; however, resources for interactive jobs can be requested with msub. The following example starts an interactive session on a compute node for 2 hours:
$ msub -I -V -l nodes=1 -l walltime=2:00:00
The option "-I" means "interactive job" and the option "-V" exports all environment variables to the compute node of the interactive session. After execution of this command wait until the queueing system has granted you the requested resources. Once granted you will be automatically logged on the allocated compute node. The full node with all cores will be available for you.
If you use applications or tools which provide a GUI, enable X-forwarding for your interactive session with:
$ msub -I -V -X -l nodes=1,walltime=2:00:00
Once the walltime limit has been reached you will be automatically logged out from the compute node.
5 Show Free Resources: showbf
The showbf command shows how many nodes are available for immediate use on the system. On bwForCluster MLS&WISO (Production) the most important option is the -f flag, which shows the availability of resources with a certain node feature.
Examples:
$ showbf -f best
Partition     Tasks  Nodes      Duration   StartOffset       StartDate
---------     -----  -----  ------------  ------------  --------------
ALL             144      9       4:27:18      00:00:00  10:18:11_01/06
ALL              16      1      INFINITY      00:00:00  10:18:11_01/06
The output tells you that 9 nodes are immediately available (StartOffset = 00:00:00) for a walltime of up to 4 hours and 27 minutes (Duration = 4:27:18). One node is available for the maximum walltime (Duration = INFINITY).
$ showbf -f standard
resources not available
This output means there are no standard nodes available for immediate use.
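If the node feature you need is reported as available, a job that fits into the displayed Duration window can start immediately, for instance (a hypothetical request matching the first example above; jobscript.sh is a placeholder):
$ msub -l nodes=9:ppn=16:best,walltime=4:00:00 jobscript.sh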