Batch Jobs - bwForCluster Chemistry Features

From bwHPC Wiki
Revision as of 10:14, 4 August 2016

This article describes features of the batch job system that are applicable only on the bwForCluster for computational and theoretical Chemistry "Justus".

1 Job submission on bwForCluster for Chemistry

This page describes the details of the queuing system specific to the bwForCluster Chemistry.

A general description of options that should work on all bwHPC clusters can be found on the general batch jobs page.
Currently the number of physical cores is 16 on all nodes. Do not request more than 16 processes per node with the ppn flag.
If the requested ppn count exceeds this limit, the job will remain in the idle state, i.e. it will not start running.

Jobs of a user run node-exclusive. That means several jobs from one user can run concurrently on one node, but no job from any other user can run on the same node at the same time.

Users have ssh-access to the nodes on which their jobs run.

1.1 Disk Space and Resources

Disk space is only available on some of the nodes. It has to be requested in the Moab options or the job will run on a diskless node.

ATTENTION

  • Disk space content is erased when the job finishes.
  • scratch - disk space allocated per process (ppn); it must be specified in gigabytes (GB)

    $ msub -l gres=scratch:8 myjobscript.sh

  • "gres" is a Moab term for "generic resources";
  • "scratch" - name of the resource for disk space
  • "8" - size of disk space in gigabytes (GB)

Scratch and available resources:

Nodes count   ppn   MAX Disk Space (scratch)    RAM-Disk Space   Virtual Memory (RAM)
202           16    no scratch, only RAM-disk   up to 64GB       up to 125GB (~128GB**)
204           16    960GB (~1TB*)               no RAM-disk      125GB (~128GB**)
22            16    1920GB (~2TB*)              no RAM-disk      251GB (~256GB**)
16            16    1920GB (~2TB*)              no RAM-disk      503GB (~512GB**)


ATTENTION

  • scratch: configured MAX disk space resources (scratch) are 960 and 1920 (GB).
  • virtual memory, RAM: configured MAX virtual memory (RAM) values are 125GB, 251GB and 503GB.

"RAM-disk" means, that part of virtual memory (RAM) can be used for some temporary jobs files. Size of RAM-disk grows up automatically up to 50% of the RAM size. The rest RAM can be used as a traditional virtual memory.
The disk (or RAM-disk) can be accessed via the variable $TMPDIR. It points to a file:

  • for nodes with disk space

    /scratch/<username>_job_<jobid>

  • for diskless nodes

    /ramdisk/<username>_job_<jobid>

    where <username> and <jobid> are the values for the current user and job.


"Scratch" - is the disk space per process (ppn). When a job needs 100GB of disk space and uses 4 processes, users have to describe "scratch" as 100/4=25 (GB).
This example requests 100GB (4x25GB) of disk space:

$ msub -l nodes=1:ppn=4,gres=scratch:25 <jobscript>
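The division above can be sketched as a tiny shell helper (the 100GB total and 4 processes are just the example's values):

```shell
#!/bin/bash
# Per-process scratch: total disk space the job needs, divided by ppn.
TOTAL_GB=100   # total disk space in GB (example value)
PPN=4          # processes per node (example value)
SCRATCH_PER_PROC=$(( TOTAL_GB / PPN ))
# Print the resulting msub command line
echo "msub -l nodes=1:ppn=${PPN},gres=scratch:${SCRATCH_PER_PROC} myjobscript.sh"
```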

1.2 Default Values

The default parameters of each job are:

  • walltime=48:00:00 - MAX run-time of job
  • nodes=1:ppn=1 - one node with one process

1.3 Queues

There is no need to explicitly specify a queue. Jobs will automatically be assigned to a queue depending on the resources they request.
Compute resources such as walltime and nodes are restricted and must fit into the allowed resources of at least one of the queues for the job to start. The available queues are:

Queue name   Walltime MIN   Walltime MAX   MAX nodes (total per user)   MAX run/idle jobs (total per user)
quick        00:00:01       00:05:00       2                            1/1
short        00:05:01       48:00:00       64
normal       48:00:01       168:00:00      32
long         168:00:01      336:00:00      4
verylong*    336:00:01      672:00:00      2
* To use the "verylong" queue, please contact the administrators.

Example of how to submit a job to the 'normal' queue:

$ msub -l walltime=72:00:00 <jobscript>

The job runs for three days and hence will start in the "normal" queue (because of the walltime of 72 hours).
By default, a job starts in the queue "short".

1.4 Other job limitations and features

  • MAX 32 nodes per job
  • Only 1 user per node - each node runs jobs from ONLY one user at a time
  • ssh access to the compute nodes where the job is running. When the job is finished/cancelled, the connection is closed automatically
  • The job's output files can be checked in real time (e.g. for a default job: STDIN.o<JOB_ID>, STDIN.e<JOB_ID>)
  • The job will be cancelled automatically when it cannot start because the requested resources do not exist in the cluster

1.5 Job Feedback

As of 2016-04-07, in addition to the regular output file, a file containing feedback on resource usage and job efficiency is created when the job finishes:

  • Name of the file: <job-output-file>_feedback
  • Location: same directory as the job output file

e.g. by default the output file is "<jobname>.o<jobid>" and the feedback file is "<jobname>.o<jobid>_feedback".

Information presented in the feedback file:

  • Main job parameters (job name and state, time, requested and used resources, host list)
  • Job analysis - possible problems and solutions, advice on how to increase efficiency
  • Error messages associated with the job (if present)
  • Link to a web page with graphical information of resource usage during the job (only for jobs with runtime >5 minutes)

2 Environment Variables for Batch Jobs

The bwForCluster for computational and theoretical Chemistry has the following variables in addition to the generally available ones:

Specific Moab environment variables:

Environment variable   Description
MOAB_NODELIST          List of nodes separated by ampersands (&), e.g.: node1&node2
MOAB_TASKMAP           Node list with processes per node, separated by ampersands, e.g.: node1:16&node2:16
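As a sketch of how a job script might consume these variables, the ampersand-separated list can be split into a bash array (the node names below are sample values, not hosts from the cluster):

```shell
#!/bin/bash
# Split a Moab node list on the '&' delimiter into a bash array.
MOAB_NODELIST="node1&node2&node3"   # sample value; Moab sets this inside real jobs
IFS='&' read -r -a NODES <<< "$MOAB_NODELIST"
echo "Number of nodes: ${#NODES[@]}"
echo "First node: ${NODES[0]}"
```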

3 Interactive jobs

By starting an interactive session, a user automatically gets access to compute nodes and can start applications right there.
To submit an interactive job with default parameters, execute the following:

$ msub -I -V

where "-I" means "interactive job" and "-V" - export of all environment variables to the compute node of the interactive session.
When it is necessary to use applications or tools which provide a GUI, enable X-forwarding. For it execute this command:

$ msub -I -V -X

It is possible to configure the job's parameters, e.g. to get access to 1 compute node with 1 virtual processor for 2 hours, execute the following:

$ msub -I -V -X -l nodes=1:ppn=1,walltime=02:00:00

ATTENTION
After executing this command, DO NOT CLOSE your current terminal session; wait until the queueing system Moab has granted you the requested resources on the compute system. Once granted, you will automatically be logged in to the dedicated resource. You now have an interactive session with 1 node (with 1 virtual processor) on the compute system for 2 hours and can execute your application, e.g.:

$ cd path_to_application

$ ./users_application

Once the walltime limit has been reached you will be automatically logged out from the compute node.

4 Chain jobs

It is possible to submit a chain of jobs, i.e. each job runs after the previous job has completed. You can choose between several conditions for when the next job in the chain may run. Here is an example script:

#!/bin/bash
##################################################
#
# Script to submit a chain of jobs with dependencies
#
##################################################

# count of jobs to submit (e.g. "5")
MAX_JOBS_COUNT=5

# define your jobscript (e.g. "~/chain_job")
JOB_SCRIPT=~/chain_job

# type of dependency
DEPENDENCY="afterok"
# possible dependencies for this script:
#
# after        after:<job>[:<job>]...        Job may start at any time after the specified jobs have started execution.
# afterany     afterany:<job>[:<job>]...     Job may start at any time after all specified jobs have completed, regardless of completion status.
# afterok      afterok:<job>[:<job>]...      Job may start at any time after all specified jobs have successfully completed.
# afternotok   afternotok:<job>[:<job>]...   Job may start at any time after all specified jobs have completed unsuccessfully.
#
# list of all dependencies:
# http://docs.adaptivecomputing.com/suite/8-0/enterprise/help.htm#topics/moabWorkloadManager/topics/jobAdministration/jobdependencies.html

count=1
echo "msub $JOB_SCRIPT"
JOBID=$(msub $JOB_SCRIPT 2>&1 | grep -v -e '^$')
echo "$JOBID"
while [ $count -le $MAX_JOBS_COUNT ]; do
    echo "msub -W depend=$DEPENDENCY:$JOBID $JOB_SCRIPT"
    JOBID=$(msub -W depend=$DEPENDENCY:$JOBID $JOB_SCRIPT 2>&1 | grep -v -e '^$')
    echo "$JOBID"
    let count=$count+1
done

where the user can choose the dependency type for when the next job may run (the script can be modified to make a job depend on more than one job):

  • after - job may start at any time after specified jobs have started execution
  • afterany - job may start at any time after all specified jobs have completed regardless of completion status
  • afterok - job may start at any time after all specified jobs have successfully completed
  • afternotok - job may start at any time after all specified jobs have completed unsuccessfully


5 Job arrays

A user may have to run the same script many times, each time with different data (e.g. modelling of some process with different initial values). Moab has a feature called "job arrays" to help with tasks of that type. To submit a job array, you can use the following syntax:

msub -t [<jobname>]<indexlist>[%<limit>] jobarray.sh

It is possible to pass additional options to "msub" to describe the parameters of each job in the array (e.g. each sub-job has a walltime of 30 minutes and uses 2 nodes with 1 virtual processor):

msub -l walltime=00:30:00,nodes=2:ppn=1 -t [<jobname>]<indexlist>[%<limit>] jobarray.sh

The parameter <indexlist> specifies the number and order of the submitted sub-jobs. For example, a user wants to submit 10 jobs using 2 msub commands: one to submit the five odd-numbered jobs (job1) and one to submit the five even-numbered jobs (job2). The commands would be:

msub -t job1.[1-10:2] jobarray.sh


msub -t job2.[2-10:2] jobarray.sh

To specify that only a certain number of sub-jobs in the array can run at a time, use the percent sign (%) delimiter (e.g. %2):

msub -t job.[1-10]%2 jobarray.sh

Each sub-job has 2 specific environment variables:

  • MOAB_JOBARRAYINDEX - index of the job in the array (e.g. for the five odd-numbered jobs: 1, 3, 5, 7, 9; for the five even-numbered jobs: 2, 4, 6, 8, 10)
  • MOAB_JOBARRAYRANGE - number of jobs in the array (e.g. for all the jobs above: 10)


The user can use these variables inside the job-array script, e.g. to select different input/output files for each sub-job. Here is an example script "jobarray.sh", with instructions in the comments at the end on how to test it:

#!/bin/bash
##################################################
#
# Simple job-array script
# Read some data from input-file and write it to output-file
#
##################################################

#MSUB -l walltime=00:01:00   # walltime
#MSUB -N "array"             # name of sub-job

cd ${MOAB_SUBMITDIR}

# Input file
INFILE=job.${MOAB_JOBARRAYINDEX}.in

# Output file
OUTFILE=job.${MOAB_JOBARRAYINDEX}.out

echo "Count of jobs in array: ${MOAB_JOBARRAYRANGE}">${OUTFILE}
echo "Index of this subjob: ${MOAB_JOBARRAYINDEX}" >>${OUTFILE}

# Read input and append to output file
cat $INFILE >>$OUTFILE

##################################################
#
# Check how it works:
#
# 1. Create different input-files (e.g. 4)
# 
#     $ for i in `seq 4`; do echo $i >job.$i.in ; done
#
# 2. Submit a job-array (e.g. with 4 jobs)
#
#     $ msub -t array[1-4] jobarray.sh 
#
# After submitting, the user sees only one JOBID number. As output files the user will find:
#
# * 4 files job.[1-4].out
# * 4 traditional job output files - array.o<JOBID>-[1-4]
# * 4 traditional job error files - array.e<JOBID>-[1-4]
#
##################################################

After submitting the job, the user sees only one JOBID number.
You can get information about the whole job array by typing:

checkjob <JOBID>

It is possible to get full information about each sub-job:

checkjob <JOBID>[<index>]

e.g. to get information about sub-job 5 of job 1234, type "checkjob 1234[5]".
Each sub-job has its own output files:

  • sub-job output files - <jobname>.o<JOBID>-<index>
  • sub-job error files - <jobname>.e<JOBID>-<index>
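As a sketch of how the per-sub-job output files might be collected into a single file, the loop below first fabricates stand-in files (the job ID 1234 and the file contents are made up for the demo; after a real run the array.o<JOBID>-<index> files already exist):

```shell
#!/bin/bash
# Demo: fabricate sample sub-job output files, then concatenate them in index order.
JOBID=1234   # hypothetical job ID
for i in 1 2 3 4; do
    echo "result of sub-job $i" > "array.o${JOBID}-${i}"   # stand-in for real job output
done
# Collect all sub-job outputs into a single file
cat "array.o${JOBID}"-{1,2,3,4} > array_all.out
wc -l < array_all.out   # one line per sub-job
```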