Batch Jobs - bwUniCluster Features
This article contains information on features of the [[Batch_Jobs|batch job system]] only applicable on bwUniCluster.
= Job Submission =

== msub Command ==

The bwUniCluster supports the following additional msub option(s):
{| style="width:100%; vertical-align:top; background:#f5fffa;border:1px solid #000000;padding:1px"
! colspan="3" style="background-color:#999999;padding:3px"| bwUniCluster additional msub Options
|-
! style="width:15%;height=20px; text-align:left;padding:3px"|Command line
! style="width:20%;height=20px; text-align:left;padding:3px"|Script
! style="width:65%;height=20px; text-align:left;padding:3px"|Purpose
|- style="vertical-align:top;"
| style="height=20px; text-align:left;padding:3px" | -I
|
| style="height=20px; text-align:left;padding:3px" | Declares the job is to be run interactively.
|}
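For a quick interactive test, the -I option is combined with a resource request on the command line. The values below are only placeholders; a complete walkthrough is given in the [[#Interactive Jobs|Interactive Jobs]] section below.
<pre>
$ msub -I -l nodes=1:ppn=1 -l walltime=0:00:30:00
</pre>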
<br>
<br>
=== msub -l ''resource_list'' ===

No deviations from or additions to the general [[Batch_Jobs|batch job]] settings.

=== msub -q ''queues'' ===

Compute resources such as walltime, nodes and memory are restricted and must fit into '''queues'''. Since requested compute resources are NOT always automatically mapped to the correct queue class, you must add the correct queue class to your msub command. The details are:
{| border="1" style="width:100%; vertical-align:top; background:#f5fffa;border:1px solid #000000;padding:1px"
! colspan="5" style="background-color:#999999;padding:3px"| msub -q ''queue''
|- style="width:10%;height=20px; text-align:left;"
! style="width:10%;padding:3px"| ''queue''
! style="width:5%;padding:3px"| ''node''
! style="width:15%;padding:3px"| ''default resources''
! style="padding:3px"| ''minimum resources''
! style="padding:3px"| ''maximum resources''
|- style="vertical-align:top; height=20px; text-align:left"
| style="padding:3px"| develop
| style="padding:3px"| thin
| style="width:15%;padding:3px"| ''walltime''=00:10:00, ''procs''=1, ''mem''=4000mb
| style="padding:3px"| ''nodes''=1
| style="padding:3px"| ''walltime''=00:30:00, ''nodes''=1:''ppn''=16
|- style="vertical-align:top; height=20px; text-align:left"
| style="width:10%;padding:3px"| singlenode
| style="padding:3px"| thin
| style="padding:3px"| ''walltime''=00:30:01, ''procs''=1, ''mem''=4000mb
| style="padding:3px"| ''walltime''=00:30:01, ''nodes''=1
| style="padding:3px"| ''walltime''=3:00:00:00, ''nodes''=1:''ppn''=16
|- style="vertical-align:top; height=20px; text-align:left"
| style="width:10%;padding:3px"| multinode
| style="padding:3px"| thin
| style="padding:3px"| ''walltime''=00:10:00, ''procs''=1, ''mem''=4000mb
| style="padding:3px"| ''nodes''=2
| style="padding:3px"| ''walltime''=2:00:00:00, ''nodes''=16:''ppn''=16
|- style="vertical-align:top; height=20px; text-align:left"
| style="width:10%;padding:3px"| verylong
| style="padding:3px"| thin
| style="padding:3px"| ''walltime''=3:00:00:01, ''procs''=1, ''mem''=4000mb
| style="padding:3px"| ''walltime''=3:00:00:01, ''nodes''=1
| style="padding:3px"| ''walltime''=6:00:00:00, ''nodes''=1:''ppn''=16
|- style="vertical-align:top; height=20px; text-align:left"
| style="width:10%;padding:3px"| fat
| style="padding:3px"| fat
| style="padding:3px"| ''walltime''=00:10:00, ''procs''=1, ''mem''=32000mb
| style="padding:3px"| ''nodes''=1
| style="padding:3px"| ''walltime''=3:00:00:00, ''nodes''=1:''ppn''=32
|}

The default resources of a queue class define the walltime, number of processes and memory that apply if they are not explicitly given with the msub command. The resource list acronyms ''walltime'', ''procs'', ''nodes'' and ''ppn'' are described [[Batch_Jobs#msub_-l_resource_list|here]].
==== Queue class examples ====

* To run your batch job for longer than 3 days, please use <span style="background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080">$ msub -q verylong</span>.
* To run your batch job on one of the [[BwUniCluster_File_System#Components_of_bwUniCluster|fat nodes]], please use <span style="background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080">$ msub -q fat</span>.
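The queue class is combined with the usual resource requests in a single msub call. As a sketch, a hypothetical script ''my_mpi_job.sh'' (the script name and the resource values are placeholders) requesting 4 thin nodes with 16 processes each for one day would be submitted to the multinode queue as follows:
<pre>
$ msub -q multinode -l nodes=4:ppn=16 -l walltime=1:00:00:00 my_mpi_job.sh
</pre>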
<br>
<br>
= Environment Variables for Batch Jobs =

The bwUniCluster expands the [[Batch_Jobs#Environment Variables for Batch Jobs|common set of MOAB environment variables]] by the following variable(s):

{| style="width:100%; vertical-align:top; background:#f5fffa;border:1px solid #000000;padding:1px"
! colspan="2" style="background-color:#999999;padding:3px"| bwUniCluster specific MOAB variables
|- style="width:25%;height=20px; text-align:left;padding:3px"
! style="width:20%;height=20px; text-align:left;padding:3px"| Environment variable
! style="height=20px; text-align:left;padding:3px"| Description
|-
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_SUBMITDIR
| style="height=20px; text-align:left;padding:3px"| Directory of job submission
|}
<br>

Since the workload manager MOAB on [[bwUniCluster]] uses the resource manager SLURM, the following SLURM environment variables are added to your environment once your job has started:

{| style="width:100%; vertical-align:top; background:#f5fffa;border:1px solid #000000;padding:1px"
! colspan="2" style="background-color:#999999;padding:3px"| SLURM variables
|- style="width:25%;height=20px; text-align:left;padding:3px"
! style="width:20%;height=20px; text-align:left;padding:3px"| Environment variable
! style="height=20px; text-align:left;padding:3px"| Description
|-
| style="width:20%;height=20px; text-align:left;padding:3px" | SLURM_JOB_CPUS_PER_NODE
| style="height=20px; text-align:left;padding:3px"| Number of processes per node dedicated to the job
|-
| style="width:20%;height=20px; text-align:left;padding:3px" | SLURM_JOB_NODELIST
| style="height=20px; text-align:left;padding:3px"| List of nodes dedicated to the job
|-
| style="width:20%;height=20px; text-align:left;padding:3px" | SLURM_JOB_NUM_NODES
| style="height=20px; text-align:left;padding:3px"| Number of nodes dedicated to the job
|-
| style="width:20%;height=20px; text-align:left;padding:3px" | SLURM_MEM_PER_NODE
| style="height=20px; text-align:left;padding:3px"| Memory per node dedicated to the job
|-
| style="width:20%;height=20px; text-align:left;padding:3px" | SLURM_NPROCS
| style="height=20px; text-align:left;padding:3px"| Total number of processes dedicated to the job
|}
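These variables can be referenced inside a job script, for example to change into the directory the job was submitted from and to start as many MPI processes as have been allocated. This is only a sketch: ''my_application'' is a placeholder and the actual MPI launcher depends on the modules you have loaded.
<pre>
# change to the directory from which the job was submitted
cd $MOAB_SUBMITDIR
echo "Running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"
# start one MPI process per allocated process slot
mpirun -n $SLURM_NPROCS ./my_application
</pre>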
<br>
<br>
== Node Monitoring ==

By default nodes are not used exclusively unless they are requested with ''-l naccesspolicy=singlejob'' as described [[Batch_Jobs#msub_-l_resource_list|here]]. <br>
If a job runs exclusively on a node, you may log in to that node via ssh. To get the nodes of your job, read the environment variable SLURM_JOB_NODELIST, e.g.
 echo $SLURM_JOB_NODELIST > nodelist
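SLURM_JOB_NODELIST may contain a compressed host list rather than one host name per line. If the SLURM command scontrol is available in your environment (an assumption, since jobs are submitted via MOAB), it can expand the list into individual host names, e.g. to log in to the first node of an exclusively used job:
<pre>
# inside the job script: expand the node list into one host name per line
scontrol show hostnames $SLURM_JOB_NODELIST > nodelist
# later, e.g. from a login node: log in to the first node of the running job
ssh $(head -n 1 nodelist)
</pre>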
= Interactive Jobs =

Interactive jobs on bwUniCluster [[BwUniCluster_User_Access#Allowed_activities_on_login_nodes|must '''NOT''' run on the login nodes]]; however, resources for interactive jobs can be requested using msub. For example, for a serial application with a graphical frontend that requires 5000 MByte of memory, with the interactive run limited to 2 hours, execute the following:
<pre>
$ msub -I -V -l nodes=1:ppn=1 -l walltime=0:02:00:00 -l mem=5000mb
</pre>
The option -V ensures that all environment variables of your current session are exported to the compute node of the interactive session.

After executing this command, '''DO NOT CLOSE''' your current terminal session but wait until the queueing system MOAB has granted you the requested resources on the compute system. Once granted, you will be logged on to the dedicated resource automatically. You now have an interactive session with 1 core and 5000 MByte of memory on the compute system for 2 hours. Simply execute your application:
<pre>
$ cd to_path
$ ./application
</pre>
Note that once the walltime limit has been reached, you will be automatically logged out of the compute system.
<br>
<br>
----
[[Category:bwUniCluster|Batch Jobs - bwUniCluster features]] |