Batch Jobs Moab

From bwHPC Wiki
Any kind of calculation on the compute nodes of a [[HPC_infrastructure_of_Baden_Wuerttemberg|bwHPC cluster of tier 2 or 3]] requires the user to define the calculation as a single command or a sequence of commands, specify the required run time, number of CPU cores and main memory, and submit all of this, i.e. the '''batch job''', to a resource and workload managing software. All bwHPC clusters of tier 2 and 3 have the workload manager MOAB installed. Therefore any job submission by the user is done via commands of the MOAB software. MOAB queues and runs user jobs based on fair-sharing policies.
 
 
This page only describes options and commands that can be used on all bwHPC clusters. Options specific to a single cluster are described in the following separate articles:
 
 
* [[Batch Jobs - bwUniCluster Features]]
 
* [[Batch Jobs - bwForCluster Chemistry Features]]
 
* [[Batch Jobs - ForHLR Features]]
 
 
 
 
 
Overview of the most important MOAB commands:
 
{| style="width:100%; vertical-align:top; background:#f5fffa;border:2px solid #000000;"
 
! MOAB commands !! Brief explanation
 
|-
 
| msub || submits a job and queues it in an input queue
 
|-
 
| checkjob || displays detailed job state information
 
|-
 
| showq || displays information about active, eligible, blocked, and/or recently completed jobs
 
|-
 
| showbf || shows what resources are available for immediate use
 
|-
 
| showstart || returns start time of submitted job or requested resources
 
|-
 
| canceljob || cancels a job
 
|}
 
 
 
 
= Job Submission =
 
 
Batch jobs are submitted with the command '''msub'''. The main purpose of the '''msub''' command is to specify the resources that are needed to run the job. '''msub''' will then queue the batch job. However, when the batch job starts depends on the availability of the requested resources and on your fair-share value.
 
<!--into the input queue. The jobs are organized into different job classes. For each job class there are specific limits for the available resources (number of nodes, number of CPUs, maximum CPU time, maximum memory etc.). -->
 
<br>
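For orientation, a minimal job script could look as follows (a sketch; the resource values and the name ''hello'' are illustrative placeholders, not cluster recommendations). Since #MSUB lines are ordinary comments to the shell, such a script can also be executed directly for testing:

```shell
#!/bin/bash
#MSUB -l nodes=1:ppn=1       # one process on one node
#MSUB -l walltime=00:10:00   # 10 minutes of wall-clock time
#MSUB -l pmem=1000mb         # 1000 MB of memory for the process
#MSUB -N hello               # job name shown by showq/checkjob

# The actual work of the job:
msg="Running on host $(hostname)"
echo "$msg"
```

Saved as, e.g., ''hello_job.sh'' (a hypothetical name), it would be submitted with ''msub hello_job.sh''.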
 
 
== msub Command ==
 
 
The syntax and use of '''msub''' can be displayed via:
 
<pre>
 
$ man msub
 
</pre>
 
 
'''msub''' options can be used from the command line or in your job script.
 
 
 
{| style="width:100%; vertical-align:top; background:#f5fffa;border:1px solid #000000;padding:1px"
 
! colspan="3" style="background-color:#999999;padding:3px"| msub Options
 
|-
 
! style="width:15%;height=20px; text-align:left;padding:3px"|Command line
 
! style="width:20%;height=20px; text-align:left;padding:3px"|Script
 
! style="width:65%;height=20px; text-align:left;padding:3px"|Purpose
 
|- style="vertical-align:top;"
 
| style="height=20px; text-align:left;padding:3px" | -l ''resources''
 
| style="height=20px; text-align:left;padding:3px" | #MSUB -l ''resources''
 
| style="height=20px; text-align:left;padding:3px" | Defines the resources that are required by the job. See the description below for this important flag.
 
|- style="vertical-align:top;"
 
| style="height=20px; text-align:left;padding:3px" | -N ''name''
 
| style="height=20px; text-align:left;padding:3px" | #MSUB -N ''name''
 
| style="height=20px; text-align:left;padding:3px" | Gives a user specified name to the job.
 
|-
 
|- style="vertical-align:top;"
 
| style="height=20px; text-align:left;padding:3px" | -o ''filename''
 
| style="height=20px; text-align:left;padding:3px" | #MSUB -o ''filename''
 
| style="height=20px; text-align:left;padding:3px" | Defines the filename to be used for the standard output stream of the batch job. By default the file is placed in your job submit directory. To place it elsewhere, prepend the relative or absolute path of the destination to ''filename''.
 
|- style="vertical-align:top;"
 
| style="height=20px; text-align:left;padding:3px" | -q ''queue''
 
| style="height=20px; text-align:left;padding:3px" | #MSUB -q ''queue''
 
| style="height=20px; text-align:left;padding:3px" | Defines the queue class
 
|-
 
|- style="vertical-align:top;"
 
| style="height=20px; text-align:left;padding:3px" | -v ''variable=arg''
 
| style="height=20px; text-align:left;padding:3px" | #MSUB -v ''variable=arg''
 
| style="height=20px; text-align:left;padding:3px" | Expands the list of environment variables that are exported to the job
 
|-
 
|- style="vertical-align:top;"
 
| style="height=20px; text-align:left;padding:3px" | -S ''Shell''
 
| style="height=20px; text-align:left;padding:3px" | #MSUB -S ''Shell''
 
| style="height=20px; text-align:left;padding:3px" | Declares the shell (state path+name, e.g. /bin/bash) that interprets the job script
 
|-
 
<!--
 
| -V
 
| #MSUB -V
 
| Declares that all environment variables in the msub environment are exported to the batch job.
 
|-
 
-->
 
|}
 
<br>
 
For cluster specific msub options, read:
 
* [[Batch_Jobs_-_bwUniCluster_Features#msub Command|bwUniCluster msub options]]
 
 
 
=== msub -l ''resource_list'' ===
 
The '''-l''' option is one of the most important msub options. It is used to specify a number of resource requirements for your job. Multiple resource strings are separated by commas.
 
 
<!--{| style="border-style: solid; border-width: 1px; padding=5px;" border="1"-->
 
{| style="width:100%; vertical-align:top; background:#f5fffa;border:1px solid #000000;padding:1px"
 
! colspan="3" style="background-color:#999999;padding:3px"| msub -l ''resource_list''
 
|- style="width:20%;height=20px; text-align:left;padding:3px"
 
! style="width:20%;height=20px; text-align:left;padding:3px"| resource
 
! style="height=20px; text-align:left;padding:3px"| Purpose
 
|-
 
<!-- temporarily removed
 
| style="width:20%;height=20px; text-align:left;padding:3px" | -l procs=8
 
| style="height=20px; text-align:left;padding:3px"| Number of processes, distribution over nodes will be done by MOAB
 
|- -->
 
| style="width:20%;height=20px; text-align:left;padding:3px" | -l nodes=2:ppn=16
 
| style="height=20px; text-align:left;padding:3px"| Number of nodes and number of processes per node
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | -l walltime=600 <br> -l walltime=01:30:00
 
| style="height=20px; text-align:left;padding:3px"| Wall-clock time. Default units are seconds. <br> HH:MM:SS format is also accepted.
 
<!--
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | -l feature=tree <br> -l feature=blocking <br> -l feature=fat
 
| style="height=20px; text-align:left;padding:3px" | For jobs that span over several nodes <br> For sequential jobs <br> For jobs that require up to 1 TB memory-->
 
|- style="vertical-align:top;"
 
| style="width:20%;height=40px; text-align:left;padding:3px;" | -l pmem=1000mb
 
| <div style="padding:3px;">Maximum amount of physical memory used by any single process of the job. <br>Allowed units are kb, mb, gb. Be aware that '''processes''' are either ''MPI tasks'' if running MPI parallel jobs or ''threads'' if running multithreaded jobs.</div>
 
|- style="vertical-align:top;"
 
| style="width:20%;height=40px; text-align:left;padding:3px;" | -l mem=1000mb
 
| <div style="padding:3px;">Maximum amount of physical memory used by the job.<br>Allowed units are kb, mb, gb. Be aware that this memory value is the accumulated memory for all ''MPI tasks'' or all ''threads'' of the job.</div>
 
|-
 
|- style="vertical-align:top;"
 
| style="width:20%;height=40px; text-align:left;padding:3px;" | -l advres=''res_name''
 
| <div style="padding:3px;">Specifies the reservation "res_name" required to run the job.</div>
 
|-
 
|- style="vertical-align:top;"
 
| style="width:20%;height=40px; text-align:left;padding:3px;" | -l naccesspolicy=''policy''
 
| <div style="padding:3px;">Specifies how node resources should be accessed, e.g. ''-l naccesspolicy=singlejob'' reserves all requested nodes for the job exclusively. Attention, if you request ''nodes=1:ppn=4'' together with ''singlejob'' you will be charged for the maximum cores of the node.</div>
 
|}
 
<br>
 
Note that the compute nodes do not have swap space, thus <span style="color:red;font-size:105%;">DO NOT specify '-l vmem' or '-l pvmem'</span> or your jobs will not start.
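Before submitting, the interplay of these units can be checked with plain shell arithmetic (a local sketch; the numbers are made-up examples, not cluster limits):

```shell
#!/bin/bash
# Example request: msub -l nodes=2:ppn=16,pmem=1000mb,walltime=01:30:00 job.sh
nodes=2
ppn=16
pmem_mb=1000

# pmem is per process, so the accumulated memory of the whole job
# (the quantity that -l mem describes) is nodes * ppn * pmem:
total_mb=$((nodes * ppn * pmem_mb))
echo "total job memory: ${total_mb} mb"     # 32000 mb

# walltime accepts plain seconds or HH:MM:SS; 01:30:00 corresponds to:
walltime_s=$((1 * 3600 + 30 * 60 + 0))
echo "walltime in seconds: ${walltime_s}"   # 5400
```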
 
<br>
 
<br>
 
 
=== msub -q ''queues'' ===
 
Queue classes define maximum resources such as walltime, nodes and processes per node, and the partition of the compute system. Note that the queue settings of the bwHPC clusters are '''not identical''', but differ due to their different prerequisites, such as HPC performance, scalability and throughput levels. Details can be found here:
 
* [[Batch_Jobs_-_bwUniCluster_Features#msub_-q_queues|bwUniCluster queue settings]]
 
* [[Batch_Jobs_-_ForHLR_Features#msub_-q_queues|ForHLR queue settings]]
 
<br>
 
<br>
 
 
= Environment Variables for Batch Jobs =
 
Once an eligible compute job starts on the compute system, MOAB adds the following variables to the job's environment:
 
{| style="width:100%; vertical-align:top; background:#f5fffa;border:1px solid #000000;padding:1px"
 
! colspan="3" style="background-color:#999999;padding:3px"| MOAB variables
 
|- style="width:25%;height=20px; text-align:left;padding:3px"
 
! style="width:20%;height=20px; text-align:left;padding:3px"| Environment variables
 
! style="height=20px; text-align:left;padding:3px"| Description
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_CLASS
 
| style="height=20px; text-align:left;padding:3px"| Class name
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_GROUP
 
| style="height=20px; text-align:left;padding:3px"| Group name
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_JOBID
 
| style="height=20px; text-align:left;padding:3px"| Job ID
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_JOBNAME
 
| style="height=20px; text-align:left;padding:3px"| Job name
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_NODECOUNT
 
| style="height=20px; text-align:left;padding:3px"| Number of nodes allocated to job
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_PARTITION
 
| style="height=20px; text-align:left;padding:3px"| Partition name the job is running in
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_PROCCOUNT
 
| style="height=20px; text-align:left;padding:3px"| Number of processors allocated to job
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_SUBMITDIR
 
| style="height=20px; text-align:left;padding:3px"| Directory of job submission
 
|-
 
| style="width:20%;height=20px; text-align:left;padding:3px" | MOAB_USER
 
| style="height=20px; text-align:left;padding:3px"| User name
 
|}
 
<br>
 
MOAB environment variables can be used to generalize your job scripts; compare the [[Batch_Jobs#msub_Examples|msub examples]].
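As a sketch of such a generalization, a job script might derive its output file name from the job ID and change into the submit directory. The fallback values after '':-'' are assumptions for local testing only; inside a real job MOAB provides these variables:

```shell
#!/bin/bash
#MSUB -N generic_job

# MOAB sets these inside a job; the fallbacks after ':-' only
# make the script testable outside the batch system:
jobid=${MOAB_JOBID:-interactive}
submitdir=${MOAB_SUBMITDIR:-$PWD}

cd "$submitdir"
outfile="result.${jobid}.log"
echo "job ${jobid} writes ${outfile} in ${submitdir}"
```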
 
<br>
 
<br>
 
 
== bwHPC Cluster specific environment variables ==
 
 
Note that the bwHPC clusters may use different resource managers and additional environment variables. Details can be found here:
 
* [[Batch_Jobs_-_bwUniCluster_Features|bwUniCluster batch job variables]]
 
* [[Batch_Jobs_-_ForHLR_Features|ForHLR batch job variables]]
 
<br>
 
<br>
 
 
= Interactive Jobs =
 
The policy on interactive batch jobs can be found here:
 
* [[Batch_Jobs_-_bwUniCluster_Features|bwUniCluster interactive jobs]]
 
<br>
 
<br>
 
 
= msub Examples =
 
== Serial Programs ==
 
To submit a serial job that runs the script '''job.sh''' and that requires 5000 MB of main memory and 3 hours of wall clock time
 
 
a) execute:
 
<pre>
 
$ msub -N test -l nodes=1:ppn=1,walltime=3:00:00,pmem=5000mb job.sh
 
</pre>
 
or
 
 
b) add after the initial line of your script '''job.sh''' the lines:
 
{| style="width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 2px;"
 
| style="width:280px; white-space:nowrap; color:#000;" |
 
<source lang="bash">
 
#MSUB -l nodes=1:ppn=1
 
#MSUB -l walltime=3:00:00
 
#MSUB -l pmem=5000mb
 
#MSUB -N test
 
</source>
 
|}
 
and execute the modified script without any msub command line options:
 
<pre>
 
$ msub job.sh
 
</pre>
 
 
Note that msub command line options overrule script options.
 
<br>
 
 
 
== Handling job script options and arguments ==
 
Job script options and arguments such as the following:
 
<pre>
 
$ ./job.sh -n 10
 
</pre>
 
cannot be passed via the msub command, since they would be interpreted as command line options of msub itself.
 
 
 
'''Solution A:'''
 
 
Submit a wrapper script, e.g. wrapper.sh:
 
<pre>
 
$ msub wrapper.sh
 
</pre>
 
which simply contains all options and arguments of job.sh. The script wrapper.sh would at least contain the following lines:
 
{{bwFrameA|
 
<source lang="bash">
 
#!/bin/bash
 
./job.sh -n 10
 
</source>
 
}}
 
 
 
 
'''Solution B:'''
 
 
Add after the header of your '''BASH''' script job.sh the following lines:
 
{| style="width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 2px;"
 
| style="width:280px; white-space:nowrap; color:#000;" |
 
<source lang="bash">
 
## check if $SCRIPT_FLAGS is "set"
if [ -n "${SCRIPT_FLAGS}" ] ; then
   ## but if positional parameters are already present
   ## we are going to ignore $SCRIPT_FLAGS
   if [ -z "${*}" ] ; then
      set -- ${SCRIPT_FLAGS}
   fi
fi
 
</source>
 
|}
 
 
These lines modify your BASH script to read options and arguments from the environment variable $SCRIPT_FLAGS. Now submit your script job.sh as follows:
 
<pre>
 
$ msub -v SCRIPT_FLAGS='-n 10' job.sh
 
</pre>
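The mechanism can be tried out locally without MOAB; the ''getopts'' loop below is a hypothetical stand-in for whatever option handling job.sh actually performs:

```shell
#!/bin/bash
# Pick up options from $SCRIPT_FLAGS when no positional parameters were given:
if [ -n "${SCRIPT_FLAGS}" ] ; then
   if [ -z "${*}" ] ; then
      set -- ${SCRIPT_FLAGS}
   fi
fi

# Hypothetical option handling of job.sh:
n=1
while getopts "n:" opt ; do
   case $opt in
      n) n=$OPTARG ;;
   esac
done
echo "running with n=${n}"
```

Run locally as ''SCRIPT_FLAGS='-n 10' bash job.sh'', it would report n=10, just as after ''msub -v SCRIPT_FLAGS='-n 10' job.sh''.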
 
 
<!--For advanced users: [[generalized version of solution B]] if job script arguments contain whitespaces.
 
<br>-->
 
 
== Multithreaded Programs ==
 
Multithreaded programs operate faster than serial programs on CPUs with multiple cores. Moreover, multiple threads of one process share resources such as memory.
 
 
For multithreaded programs based on '''Open''' '''M'''ulti-'''P'''rocessing (OpenMP) the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).
 
 
To submit a batch job called ''test'' that runs a fourfold threaded program ''omp_program'' which requires 6000 MByte of total physical memory and a total wall clock time of 3 hours:
 
 
<!-- 2014-01-29, at the moment submission of executables does not work, SLURM has to be instructed to generate a wrapper
 
a) execute:
 
<pre>
 
$ msub -v OMP_NUM_THREADS=4 -N test -l nodes=1:ppn=4,walltime=3:00:00,mem=6000mb omp_program
 
</pre>
 
 
or
 
-->
 
<!--b)-->
 
* generate the script '''job_omp.sh''' containing the following lines:
 
{| style="width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 5px;"
 
| style="width:280px; white-space:nowrap; color:#000;" |
 
<source lang="bash">
 
#!/bin/bash
 
#MSUB -l nodes=1:ppn=4
 
#MSUB -l walltime=3:00:00
 
#MSUB -l mem=6000mb
 
#MSUB -N test
 
 
module load <placeholder>
 
export OMP_NUM_THREADS=${MOAB_PROCCOUNT}
 
./omp_program
 
</source>
 
|}
 
and, if necessary, replace <placeholder> with the modulefile required to enable the OpenMP environment, then execute the script '''job_omp.sh''' without any msub command line options:
 
<pre>
 
$ msub job_omp.sh
 
</pre>
 
<br>
 
Note that msub command line options overrule script options, e.g.,
 
<pre>
 
$ msub -l mem=2000mb job_omp.sh
 
</pre>
 
overwrites the script setting of 6000 MByte with 2000 MByte.
 
<br>
 
<br>
 
 
== MPI parallel Programs ==
 
MPI parallel programs run faster than serial programs on multi CPU and multi core systems. N-fold spawned processes of the MPI program, i.e., '''MPI tasks''', run simultaneously and communicate via the Message Passing Interface (MPI) paradigm. MPI tasks do not share memory but can be spawned over different nodes.
 
 
Multiple MPI tasks cannot be launched by the MPI parallel program itself, but via '''mpirun''', e.g. 4 MPI tasks of ''my_par_program'':
 
<pre>
 
$ mpirun -n 4 my_par_program
 
</pre>
 
<br>
 
However, this command can '''not''' be directly included in your '''msub''' command when submitting a batch job to the compute cluster, [[BwUniCluster_Batch_Jobs#Handling_job_script_options_and_arguments|see above]].
 
 
Generate a wrapper script ''job_ompi.sh'' for '''OpenMPI''' containing the following lines:
 
{| style="width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 2px;"
 
| style="width:280px; white-space:nowrap; color:#000;" |
 
<source lang="bash">
 
#!/bin/bash
 
module load mpi/openmpi/<placeholder_for_version>
 
mpirun -bind-to-core -bycore -report-bindings my_par_program
 
</source>
 
|}
 
'''Attention:''' Do '''NOT''' add the mpirun option ''-n <number_of_processes>'' or any other option defining processes or nodes, since MOAB instructs mpirun about the number of processes and the node hostnames. '''ALWAYS''' use the MPI options '''''-bind-to-core''''' and '''''-bycore|-bysocket|-bynode'''''.
 
 
Generate a wrapper script for '''Intel MPI''', ''job_impi.sh'' containing the following lines:
 
{| style="width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 2px;"
 
| style="width:280px; white-space:nowrap; color:#000;" |
 
<source lang="bash">
 
#!/bin/bash
 
module load mpi/impi/<placeholder_for_version>
 
mpiexec.hydra -bootstrap slurm my_par_program
 
</source>
 
|}
 
'''Attention:''' Do '''NOT''' add the option ''-n <number_of_processes>'' or any other option defining processes or nodes, since MOAB instructs the MPI starter about the number of processes and the node hostnames.

Moreover, replace <placeholder_for_version> with the desired version of '''Intel MPI''' to enable the MPI environment.
 
<br>
 
 
Considering 4 OpenMPI tasks on a single node, each requiring 1000 MByte, and running for 1 hour, execute:
 
<pre>
 
$ msub -l nodes=1:ppn=4,pmem=1000mb,walltime=01:00:00 job_ompi.sh
 
</pre>
 
<br>
 
 
Launching and running 32 Intel MPI tasks on 4 nodes, each requiring 1000 MByte, and running for 5 hours, execute:
 
<pre>
 
$ msub -l nodes=4:ppn=16,pmem=1000mb,walltime=05:00:00 job_impi.sh
 
</pre>
 
<br>
 
 
== Multithreaded + MPI parallel Programs ==
 
Multithreaded + MPI parallel programs operate faster than serial programs on systems with multiple CPUs and multiple cores. All threads of one process share resources such as memory. On the contrary, MPI tasks do not share memory but can be spawned over different nodes.
 
 
Multiple MPI tasks must be launched via '''mpirun''' (OpenMPI) or '''mpiexec.hydra''' (Intel MPI), respectively. For multithreaded programs based on '''Open''' '''M'''ulti-'''P'''rocessing (OpenMP) the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).
 
 
'''For OpenMPI''', a job script ''job_ompi_omp.sh'' that runs a fourfold threaded MPI program ''ompi_omp_program'' requiring 7000 MByte of physical memory per process/thread (with 4 threads per MPI task this amounts to 4*7000 MByte = 28000 MByte per MPI task) and a total wall clock time of 3 hours looks like:
 
 
<!--b)-->
 
{| style="width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 5px;"
 
| style="width:280px; white-space:nowrap; color:#000;" |
 
<source lang="bash">
 
#!/bin/bash
 
#MSUB -l nodes=2:ppn=16
 
#MSUB -l walltime=03:00:00
 
#MSUB -l pmem=7000mb
 
#MSUB -v MPI_MODULE=mpi/ompi
 
#MSUB -v OMP_NUM_THREADS=4
 
#MSUB -v MPIRUN_OPTIONS="-bind-to-core -bynode -cpus-per-proc 4 -report-bindings"
 
#MSUB -v EXECUTABLE=./ompi_omp_program
 
#MSUB -N test_ompi_omp
 
 
module load ${MPI_MODULE}
 
TASK_COUNT=$((${MOAB_PROCCOUNT}/${OMP_NUM_THREADS}))
 
echo "${EXECUTABLE} running on ${MOAB_PROCCOUNT} cores with ${TASK_COUNT} MPI-tasks and ${OMP_NUM_THREADS} threads"
 
startexe="mpirun -n ${TASK_COUNT} ${MPIRUN_OPTIONS} ${EXECUTABLE}"
 
echo $startexe
 
exec $startexe
 
</source>
 
|}
 
 
Execute the script '''job_ompi_omp.sh''' without any msub command line options:
 
<pre>
 
$ msub job_ompi_omp.sh
 
</pre>
 
<br>
 
With the mpirun option ''-bind-to-core'', MPI tasks and OpenMP threads are bound to physical cores. The option ''-bynode'' '''must be set''' so that (neighboring) MPI tasks are attached to different nodes, and the value of the option ''-cpus-per-proc <value>'' must be set to ${OMP_NUM_THREADS}. The option ''-report-bindings'' shows the bindings between MPI tasks and physical cores.
 
 
The option -bysocket does not work here. The mpirun options '''-bind-to-core''', '''-bynode''' and '''-cpus-per-proc''' should always be used when running a multithreaded MPI program; otherwise your multithreaded MPI program will run on only one node.
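The TASK_COUNT arithmetic used in the script above can be verified locally; MOAB_PROCCOUNT is set by hand here because the variable only exists inside a running job:

```shell
#!/bin/bash
# Inside a job MOAB sets MOAB_PROCCOUNT; emulate nodes=2:ppn=16 here:
MOAB_PROCCOUNT=32
OMP_NUM_THREADS=4

# One MPI task per group of OMP_NUM_THREADS cores:
TASK_COUNT=$((MOAB_PROCCOUNT / OMP_NUM_THREADS))
echo "${TASK_COUNT} MPI tasks x ${OMP_NUM_THREADS} threads = ${MOAB_PROCCOUNT} cores"   # 8 x 4 = 32
```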
 
 
'''Intel MPI should currently not be used for multithreaded MPI programs!'''
 
'''For Intel MPI''', a job script ''job_impi_omp.sh'' that runs an eightfold threaded Intel MPI program ''impi_omp_program'' with 8 tasks, requiring 64000 MByte of total physical memory and a total wall clock time of 6 hours, looks like:
 
 
<!--b)-->
 
{| style="width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 5px;"
 
| style="width:280px; white-space:nowrap; color:#000;" |
 
<source lang="bash">
 
#!/bin/bash
 
#MSUB -l nodes=4:ppn=16
 
#MSUB -l walltime=06:00:00
 
#MSUB -l pmem=64000mb
 
#MSUB -v MPI_MODULE=mpi/impi
 
#MSUB -v OMP_NUM_THREADS=8
 
#MSUB -v MPIRUN_OPTIONS="-print-rank-map -env I_MPI_PIN_DOMAIN socket"
 
#MSUB -v EXE=./impi_omp_program
 
#MSUB -N test_impi_omp
 
 
module load ${MPI_MODULE}
 
TASK_COUNT=$((${MOAB_PROCCOUNT}/${OMP_NUM_THREADS}))
 
echo "${EXE} running on ${MOAB_PROCCOUNT} cores with ${TASK_COUNT} MPI-tasks and ${OMP_NUM_THREADS} threads"
 
startexe="mpiexec.hydra -bootstrap slurm ${MPIRUN_OPTIONS} -n ${TASK_COUNT} ${EXE}"
 
echo $startexe
 
exec $startexe
 
</source>
 
|}
 
<br>
 
Execute the script '''job_impi_omp.sh''' without any msub command line options:
 
<pre>
 
$ msub job_impi_omp.sh
 
</pre>
 
<br>
 
The mpirun option ''-print-rank-map'' shows the bindings between MPI tasks and nodes (not very useful). The binding between MPI tasks and physical cores, which is always switched on, can be controlled with the environment variable I_MPI_PIN_DOMAIN. Choosing ''socket'' as its value means that (neighboring) MPI tasks run on different sockets. Other values like ''node'' and ''cache'' are possible: ''node'' means that (neighboring) MPI tasks run on different nodes, and ''cache'' means that they run on cores that do not share a common L3 cache.
 
<br>
 
<br>
 
 
== Chain jobs ==
 
 
A job chain is a sequence of jobs where each job automatically starts its successor. Chain job handling differs between the bwHPC clusters. See the cluster-specific pages:
 
 
* [[Batch Jobs - bwUniCluster Features]]
 
* [[Batch Jobs - bwForCluster Chemistry Features]]
 
<!--* [[Batch Jobs - ForHLR Features]]-->
 
 
= Status of batch system/jobs =
 
== Start time of job or resources - showstart ==
 
The following command can be used by any user to display the estimated start time of a job, based on historical usage, the earliest available reservable resources, and the priority-based backlog. To show the estimated start time of job <job_ID> enter:
 
<pre>
 
$ showstart -e all <job_ID>
 
</pre>
 
<br>
 
Furthermore, the start time for a resource request, e.g. 16 processes for 12 hours, can be displayed via:
 
<pre>
 
$ showstart -e all 16@12:00:00
 
</pre>
 
<br>
 
For further options of ''showstart'' read the manpage of ''showstart'':
 
<pre>
 
$ man showstart
 
</pre>
 
<br>
 
 
== List of your submitted jobs - showq ==
 
The following command displays information about your active, eligible, blocked, and/or recently completed jobs:
 
<pre>
 
$ showq
 
</pre>
 
The summary of your active jobs shows how many of your jobs are running, how many processors are in use by your jobs, and how many nodes are in use by '''all''' active jobs.
 
<br>
 
<br>
 
For further options of ''showq'' read the manpage of ''showq'':
 
<pre>
 
$ man showq
 
</pre>
 
<br>
 
 
== Shows free resources - showbf ==
 
The following command displays which resources are available for immediate use, either for the whole partition or for one of the queues "singlenode", "multinode" and "fat":
 
<pre>
 
$ showbf
 
$ showbf -c singlenode
 
$ showbf -c multinode
 
$ showbf -c fat
 
</pre>
 
<br>
 
For further options of ''showbf'' read the manpage of ''showbf'':
 
<pre>
 
$ man showbf
 
</pre>
 
<br>
 
 
== Detailed job information - checkjob ==
 
''checkjob <jobID>'' displays detailed job state information and diagnostic output for the (finished) job with ''<jobID>'':
 
<pre>
 
$ checkjob <jobID>
 
</pre>
 
<br>
 
The returned output for finished job ID uc1.000000 reads:
 
{{bwFrameA|
 
<source lang="bash">
 
job uc1.000000
 
 
AName: test.sh
 
State: Completed
 
Completion Code: 0 Time: Thu Jul 31 16:03:32
 
Creds: user:XXXX group:YYY account:ZZZ class:develop
 
WallTime: 00:01:06 of 00:10:00
 
SubmitTime: Thu Jul 31 16:02:18
 
(Time Queued Total: 00:00:08 Eligible: 00:03:41)
 
 
TemplateSets: DEFAULT
 
NodeMatchPolicy: EXACTNODE
 
Total Requested Tasks: 1
 
 
Req[0] TaskCount: 1 Partition: uc1
 
Memory >= 4000M Disk >= 0 Swap >= 0
 
Dedicated Resources Per Task: PROCS: 1 MEM: 4000M
 
NodeSet=ONEOF:FEATURE:[NONE]
 
 
Allocated Nodes:
 
[uc1n459:1]
 
 
 
SystemID: uc1
 
SystemJID: uc1.000000
 
 
IWD: /pfs/data1/home/ZZZ/YYY/XXX/bwUniCluster
 
SubmitDir: /pfs/data1/home/ZZZ/YYY/XXX/bwUniCluster
 
Executable: /opt/moab/spool/moab.job.jCLed6
 
 
StartCount: 1
 
Execution Partition: uc1
 
Flags: GLOBALQUEUE
 
StartPriority: 5321
 
</source>
 
}}
 
 
For further options of ''checkjob'' read the manpage of ''checkjob'':
 
<pre>
 
$ man checkjob
 
</pre>
 
<br>
 
 
= Job management =
 
== Canceling own jobs ==
 
''canceljob <jobID>'' cancels your own job with ''<jobID>''.
 
<pre>
 
$ canceljob <jobID>
 
</pre>
 
<br>
 
Note that only your own jobs can be cancelled. The command:
 
<pre>
 
$ mjobctl -c <jobID>
 
</pre>
 
has the same effect as ''canceljob <jobID>''.
 
<br>
 
<br>
 
 
 
----
 
[[Category:bwUniCluster|Batch Jobs - General Features]][[Category:ForHLR Phase I|Batch Jobs]]
 

Latest revision as of 10:04, 15 August 2023