
BwForCluster MLS&WISO Production Batch Jobs


1 Job Submission

This page describes the details of the queuing system specific to the bwForCluster MLS&WISO (Production). A general description of options that work on all bwHPC clusters can be found on the Batch Jobs page.

Jobs are run in job-exclusive mode. That means only one job from a single user can run on one node at a time. The whole node is reserved for a single job regardless of how many cores are requested with the ppn flag, and the user is responsible for using all cores as efficiently as possible. The job-exclusive policy provides maximum privacy and prevents accidental competition for resources among jobs on a single node.

How can you use a node efficiently?

  • Parallelization of one task: Increasing the number of requested cores will not speed up your code automatically. It may be necessary to change your code. See the batch job examples for some hints. Please contact us if you need support.
  • Multiple tasks in one job: It is possible to run many sequential tasks in a single job, as sketched below. On the cluster you can find some examples in the directory: /opt/bwhpc/common/system/moab-examples
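
The following job script is a minimal sketch of this approach, assuming a standard node with 16 cores; the #MSUB directives, the placeholder program ./my_serial_task, and the output file names are illustrative only. The examples in the directory above are authoritative for this cluster.

 #!/bin/bash
 #MSUB -l nodes=1:ppn=16
 #MSUB -l walltime=02:00:00
 #MSUB -N multi-serial-example

 # Start one serial task per requested core in the background and
 # wait until all of them have finished before the job ends.
 for i in $(seq 1 16); do
     ./my_serial_task "$i" > "task_${i}.out" 2>&1 &
 done
 wait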

For job monitoring, users have SSH access to the nodes on which their jobs are running.


1.1 msub -l resource_list

The different node types are requested via node features in the resource_list. Standard nodes are the default. Queues are selected automatically according to the requested number of nodes, node feature, and walltime.

To get the entire available physical memory of a node type, it is sufficient to request all cores of that node type. If you request fewer cores, make sure to request sufficient memory. About 5 GB of memory are reserved for the operating system and cannot be requested.

Explicit memory requests are done via mem to request total memory or pmem to request memory per requested core.

Examples:

Resource Request Resource Allocation
-l nodes=2:ppn=16 2 standard nodes. All 16 cores per node are explicitly allocated.
-l nodes=2:ppn=16:best 2 best nodes. All 16 cores per node are explicitly allocated.
-l nodes=1:ppn=32:fat-ivy 1 fat-ivy node (Ivy Bridge CPU). All 32 cores are explicitly allocated.
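
A node feature, a walltime, and an explicit memory request can be combined in a single msub call. The following sketches are illustrative only; job.sh stands for your own job script and the memory values have to be adapted to the chosen node type and your application:

$ msub -l nodes=1:ppn=16:best,walltime=24:00:00,pmem=4gb job.sh
$ msub -l nodes=1:ppn=16:best,walltime=24:00:00,mem=60gb job.sh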

1.2 Queues and Limits

For the Haswell and Ivy-Bridge nodes:

Queue Minimum [Default] Maximum [Default]
nodes= walltime= nodes= walltime=
quick 1:ppn=16[:standard] 00:10:00 1:ppn=16[:standard] 00:30:00
single 1:ppn=16[:standard] 00:30:01 1:ppn=16[:standard] 5:00:00:00
multi 2:ppn=16[:standard] 00:10:00 128:ppn=16[:standard] 2:00:00:00
quick 1:ppn=16:best 00:10:00 1:ppn=16:best 00:30:00
single 1:ppn=16:best 00:30:01 1:ppn=16:best 5:00:00:00
multi 2:ppn=16:best 00:10:00 128:ppn=16:best 2:00:00:00
gpu 1:ppn=16:gpu 00:10:00 18:ppn=16:gpu 2:00:00:00
mic 1:ppn=16:mic 00:10:00 1:ppn=16:mic 2:00:00:00
fat 1:ppn=40:fat 00:10:00 1:ppn=40:fat 2:00:00:00
fat-ivy 1:ppn=32:fat-ivy 00:10:00 1:ppn=32:fat-ivy 2:00:00:00
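
Since queues are assigned automatically, you normally do not specify a queue name yourself. As an illustrative example (job.sh stands for your own job script), the following request would be routed to the multi queue because it asks for more than one standard node within the multi walltime limits:

$ msub -l nodes=4:ppn=16,walltime=12:00:00 job.sh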

For the Skylake nodes:

Queue Minimum Maximum
nodes= walltime= nodes= walltime=
best-sky 1:ppn=32:best-sky 00:10:00 16:ppn=32:best-sky 2:00:00:00
gpu-sky 1:ppn=1:gpus=1:gpu-sky 00:10:00 1:ppn=32:gpus=4:gpu-sky 2:00:00:00

Hints for the gpu-sky nodes:

  • The gpu-sky nodes are operated in shared mode.
  • If you need a certain GPU type, you must specify the corresponding hostname. For example: -l nodes=h09c0101:ppn=4:gpus=2:gpu-sky
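
As a sketch, a complete submission for a shared gpu-sky job could look like the following; the script name gpu_job.sh and the requested cores, GPUs, and walltime are illustrative only:

$ msub -l nodes=1:ppn=8:gpus=2:gpu-sky,walltime=12:00:00 gpu_job.sh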

1.3 Hyperthreading

The Intel CPUs in all nodes provide hyperthreading. To use hyperthreading on a node, you have to add the following msub option:

-l naccesspolicy=singlejob

For example:

-l nodes=1 -l walltime=0:00:10:00  -l naccesspolicy=singlejob
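
Inside such a job you can check how many logical cores are visible, for example with the standard Linux command nproc. The output below is an illustrative example for a 16-core node with hyperthreading enabled:

 $ nproc
 32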


2 Job Monitoring

2.1 showq -r

To monitor your jobs during runtime you can use 'showq -r'. The column 'EFFIC' shows the current CPU efficiency of your job.

                            *******
 JOBID               S  PAR *EFFIC* XFACTOR  Q  USERNAME    GROUP            MHOST PROCS   REMAINING            STARTTIME
                            *******  
 164074              R  tor  58.25      1.1  -  xx_user    ma_ma         m04s0201    16  1:08:36:22  Mon Feb 13 20:41:13
 164077              R  tor  58.03      1.0  -  xx_user    ma_ma         m13s0501    16  4:22:11:23  Tue Feb 14 10:16:14
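
If many jobs are listed, the output can usually be restricted to your own jobs with the user filter, assuming it is available in the installed Moab version:

 $ showq -r -u $USER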

2.2 SSH Login To The Node

If you would like to see what happens on the compute node(s), you can log in to a node where your job is running. Get the node name using:

 $ checkjob <jobid>
 ...
 Allocated Nodes:
 [m13s0501:8]  # the node name is m13s0501
 ...

The ssh command brings you to the compute node, for example to node m13s0501:

 [login2 ~]$ ssh m13s0501
 [m13s0501~]$ top

Commands like 'top' show you the busiest processes on the node. To exit 'top' type 'q'.
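
To narrow the display down to your own processes, standard options of top and ps can be used, for example:

 [m13s0501~]$ top -u $USER
 [m13s0501~]$ ps -u $USER -o pid,pcpu,pmem,comm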

3 Job Feedback

After job completion you can find feedback on resource usage and job efficiency at the end of the regular output file <jobname>.o<jobid>.

The feedback section provides the following information:

  • Job parameters (job name and state, host list, time, requested and used resources)
  • Job analysis - possible problems and solutions, advice on how to increase the efficiency
  • Error messages associated with the job (if present)

Aim for a high node efficiency. A value of 100 % can be reached with well-parallelized code or by starting enough serial tasks inside a job.

4 Interactive Jobs

Interactive jobs must NOT run on the login nodes; instead, resources for interactive jobs can be requested using msub. The following example starts an interactive session on a compute node for 2 hours:

$ msub -I -V -l nodes=1 -l walltime=2:00:00

The option "-I" means "interactive job" and the option "-V" exports all environment variables to the compute node of the interactive session. After execution of this command wait until the queueing system has granted you the requested resources. Once granted you will be automatically logged on the allocated compute node. The full node with all cores will be available for you.

If you use applications or tools which provide a GUI, enable X-forwarding for your interactive session with:

$ msub -I -V -X -l nodes=1,walltime=2:00:00

Once the walltime limit has been reached you will be automatically logged out from the compute node.

5 Show Free Resources: showbf

The showbf command can be used to find out how many nodes are available for immediate use on the system. On bwForCluster MLS&WISO (Production) the most important option is the -f flag, which shows the availability of resources with a certain node feature.

Examples:

$ showbf -f best
Partition      Tasks  Nodes      Duration   StartOffset       StartDate
---------     ------  -----  ------------  ------------  --------------
ALL              144      9       4:27:18      00:00:00  10:18:11_01/06
ALL               16      1      INFINITY      00:00:00  10:18:11_01/06

The output tells you that 9 nodes are immediately available (StartOffset = 00:00:00) for a walltime of up to 4 hours and 27 minutes (Duration = 4:27:18). One node is available for the maximum walltime (Duration = INFINITY).

$ showbf -f standard
resources not available

This output means there are no standard nodes available for immediate use.