Batch Jobs - bwForCluster Chemistry Features

This article contains information on features of the batch job system that are only applicable on the bwForCluster for computational and theoretical Chemistry "Justus".

1 Job submission on bwForCluster for Chemistry

This page describes the details of the queuing system specific to the bwForCluster Chemistry. A general description of options that should work on all bwHPC clusters can be found on the Batch Jobs page.

Jobs of a user run node-exclusive. That means several jobs from one user can run concurrently on one node, but no job from any other user can run on the same node at the same time. Users have ssh-access to the nodes on which their jobs run.

1.1 Disk Space

Disk space is only available on some of the nodes. It has to be requested in the Moab options or the job will run on a diskless node.

ATTENTION: the disk space content will be erased when the job is finished.

  • scratch - disk space allocated per process (ppn); must be set in gigabytes (GB)


$ msub -l gres=scratch:8 myjobscript.sh


  • "gres" is a Moab term for "generic resources";
  • "scratch" - name of the resource for disk space
  • "8" - size of disk space in gigabytes (GB)


Scratch and available resources:

Node count   ppn   MAX Disk Space (scratch)    RAM-Disk Space   Memory (RAM)
224          16    no scratch, only RAM-disk   up to 64GB       up to 128GB
204          16    1TB (960GB)                 no RAM-disk      128GB
16           16    2TB (1920GB)                no RAM-disk      512GB


"RAM-disk" means, that part of virtual memory (RAM) can be used for some temporary jobs files. Size of RAM-disk grows up automatically up to 50% of the RAM size. The rest RAM can be used as a traditional virtual memory.


The disk (or RAM-disk) can be accessed via the variable $SCRATCH. It points to the following path:

  • for nodes with disk space


/scratch/<username>_job_<jobid>


  • for diskless nodes


/tmp/<username>_job_<jobid>


where <username> and <jobid> are the user name and job ID of the current job.


"Scratch" - is the disk space per process (ppn). When a job needs 100GB of disk space and uses 4 processes, users have to describe "scratch" as 100/4=25 (GB):

This example requests 100GB (4x25GB) of disk space:


$ msub -l nodes=1:ppn=4,gres=scratch:25 <jobscript>
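
A complete job script could then stage its data through the scratch space. The following is a minimal sketch (input.dat, output.dat and users_application are placeholders, and it assumes $SCRATCH points to a job-specific directory as described above):

#!/bin/bash
#MSUB -l nodes=1:ppn=4
#MSUB -l gres=scratch:25          # 4 x 25GB = 100GB of scratch in total
#MSUB -l walltime=02:00:00

# stage input data into the fast local scratch space
cp ${MOAB_SUBMITDIR}/input.dat ${SCRATCH}/
cd ${SCRATCH}

./users_application input.dat     # placeholder application call

# copy results back before the scratch content is erased at job end
cp ${SCRATCH}/output.dat ${MOAB_SUBMITDIR}/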

1.2 Default Values

The default parameters of each job are:

  • walltime=48:00:00 - MAX run-time of job
  • nodes=1:ppn=1 - one node with one process
  • mem=4000mb - MAX real memory/RAM (in MB) used by any single job task
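
These defaults apply whenever an option is not set explicitly. For example, a hypothetical job requesting 24 hours of walltime, 8 processes on one node and 16000 MB of memory would override all three:

$ msub -l walltime=24:00:00,nodes=1:ppn=8,mem=16000mb <jobscript>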

1.3 Queues

There is no need to explicitly specify a queue. Jobs will automatically be assigned to a queue depending on the resources they request.

Compute resources such as walltime and nodes are restricted and must fit into the allowed resources of at least one of the queues for the job to start. The available queues are:

Queue name   Walltime MIN   Walltime MAX   MAX nodes          MAX run/idle jobs
                                           (total per user)   (total per user)
quick        00:00:01       00:05:00       2                  1/1
short        00:05:01       48:00:00       64
normal       48:00:01       168:00:00      16
long         168:00:01      336:00:00      4


Examples:


$ msub -l walltime=72:00:00 <jobscript>


The job runs for three days and hence will start in the "normal" queue.

By default, a job starts in the queue "short".
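
According to the table above, a job requesting e.g. 200 hours of walltime falls between the limits of the "long" queue and will be assigned there:

$ msub -l walltime=200:00:00 <jobscript>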

1.4 Other job limitations and features

  • Memory limits - when a job uses more memory than requested (-l mem=<memory>), it will be cancelled automatically
  • MAX 32 nodes per job
  • Only 1 user per node - each node runs jobs from only one user at a time
  • ssh access to the compute nodes on which a job is running; the connection is closed automatically when the job finishes or is cancelled
  • The job's output files can be checked in real time (for a default job: STDIN.o<JOB_ID>, STDIN.e<JOB_ID>), as shown below
  • A job will be cancelled automatically when it cannot be started because the requested resources do not exist in the cluster
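
For example, the output of a running default job can be followed with the standard tail command (substitute the actual job ID):

$ tail -f STDIN.o<JOB_ID>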


2 Environment Variables for Batch Jobs

The bwForCluster for computational and theoretical Chemistry provides the following variables in addition to the common set of Moab environment variables:

Specific Moab environment variables

Environment variable   Description
MOAB_NODELIST          List of nodes separated by ampersands (&), e.g.: node1&node2
MOAB_TASKMAP           Node list with processes per node, separated by ampersands, e.g.: node1:16&node2:16
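
As a minimal sketch, such a variable can be expanded inside a job script, e.g. into a one-host-per-line machine file (the file name hosts.txt is a placeholder):

# expand MOAB_NODELIST (node1&node2&...) into one host per line
echo "${MOAB_NODELIST}" | tr '&' '\n' > hosts.txt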


3 Interactive jobs

By starting an interactive session, a user automatically gets access to compute nodes and can start their own applications right there.

To submit an interactive job with default parameters, execute the following:


$ msub -I


When it is necessary to use applications or tools which provide a GUI, enable X forwarding by executing this command:


$ msub -I -X


It is possible to configure the job's parameters. E.g. to get access to 2 compute nodes with 1 virtual processor each for 2 hours, execute the following:


$ msub -I -X -l nodes=2:ppn=1,walltime=02:00:00


ATTENTION: After executing this command, DO NOT CLOSE your current terminal session but wait until the queueing system Moab has granted you the requested resources on the compute system. Once granted, you will be logged in automatically on the dedicated resource. You now have an interactive session with 2 nodes (each with 1 virtual processor) on the compute system for 2 hours and can execute your application, e.g.:


$ cd path_to_application


$ ./users_application


Once the walltime limit has been reached you will be automatically logged out from the compute node.


4 Chain jobs

It is possible to submit a chain of jobs, i.e. each job runs after the previous job has completed. You can choose between several conditions that determine when the next job in the chain may run. Here is an example script:

#!/bin/bash
##################################################
#
# Script to submit a chain of jobs with dependencies
#
##################################################

# total count of jobs to submit (e.g. "5")
MAX_JOBS_COUNT=5

# define your jobscript (e.g. "~/chain_job")
JOB_SCRIPT=~/chain_job

# type of dependency
DEPENDENCY="afterok"
# possible dependencies for this script:
#
# after        after:<job>[:<job>]...        Job may start at any time after specified jobs have started execution.
# afterany     afterany:<job>[:<job>]...     Job may start at any time after all specified jobs have completed regardless of completion status.
# afterok      afterok:<job>[:<job>]...      Job may start at any time after all specified jobs have successfully completed.
# afternotok   afternotok:<job>[:<job>]...   Job may start at any time after all specified jobs have completed unsuccessfully.
#
# list of all dependencies:
# http://docs.adaptivecomputing.com/suite/8-0/enterprise/help.htm#topics/moabWorkloadManager/topics/jobAdministration/jobdependencies.html

count=1
echo "msub $JOB_SCRIPT"
# submit the first job and capture its job ID
JOBID=$(msub $JOB_SCRIPT 2>&1 | grep -v -e '^$')
echo "$JOBID"
# submit the remaining jobs, each depending on its predecessor
# (-lt, because the first job has already been submitted above)
while [ $count -lt $MAX_JOBS_COUNT ]; do
    echo "msub -W depend=$DEPENDENCY:$JOBID $JOB_SCRIPT"
    JOBID=$(msub -W depend=$DEPENDENCY:$JOBID $JOB_SCRIPT 2>&1 | grep -v -e '^$')
    echo "$JOBID"
    let count=$count+1
done


The DEPENDENCY variable selects the condition under which the next job in the chain may run (the script can also be modified to make a job depend on more than one job; see the example after this list):

  • after - job may start at any time after the specified jobs have started execution
  • afterany - job may start at any time after all specified jobs have completed, regardless of completion status
  • afterok - job may start at any time after all specified jobs have completed successfully
  • afternotok - job may start at any time after all specified jobs have completed unsuccessfully
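
As the syntax afterok:<job>[:<job>]... in the script comments shows, a job can also depend on several jobs at once. E.g. with the hypothetical job IDs 12345 and 12346, the following job starts only after both have completed successfully:

$ msub -W depend=afterok:12345:12346 <jobscript>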


5 Job arrays

Sometimes users have to run the same script with different data (e.g. modelling some process with different initial values). Job arrays make this easier. To submit a job array, use the following syntax:


msub -t [<jobname>]<indexlist>[%<limit>] jobarray.sh


It is possible to use other msub options to describe the parameters of each job in the array (e.g. each sub-job has a walltime of 30 minutes and uses 2 nodes with 1 virtual processor):


msub -l walltime=00:30:00,nodes=2:ppn=1 -t [<jobname>]<indexlist>[%<limit>] jobarray.sh


The parameter <indexlist> defines the count and the indices of the submitted sub-jobs. For example, a user wants to submit 10 jobs using 2 msub commands, one to submit the five odd-numbered jobs (job1) and one to submit the five even-numbered jobs (job2). The commands are:


msub -t job1.[1-10:2] jobarray.sh


msub -t job2.[2-10:2] jobarray.sh


To specify that only a certain number of sub-jobs in the array can run at a time, use the percent sign (%) delimiter (e.g. %2):


msub -t job.[1-10]%2 jobarray.sh


Each sub-job has 2 specific environment variables:

  • MOAB_JOBARRAYINDEX - index of the job in the array (e.g. 1, 3, 5, 7, 9 for the five odd-numbered jobs; 2, 4, 6, 8, 10 for the five even-numbered jobs)
  • MOAB_JOBARRAYRANGE - count of jobs in the array (e.g. 10 for all jobs above)


Users can use these variables in their own job-array scripts, e.g. to select different input/output files for each sub-job. Here is an example script "jobarray.sh"; the comments at its bottom explain how to test it:

#!/bin/bash
##################################################
#
# Simple job-array script
# Read some data from input-file and write it to output-file
#
##################################################

#MSUB -l walltime=00:01:00    # walltime
#MSUB -N "array"              # name of sub-job

cd ${MOAB_SUBMITDIR}

# Input file
INFILE=job.${MOAB_JOBARRAYINDEX}.in

# Output file
OUTFILE=job.${MOAB_JOBARRAYINDEX}.out

echo "Count of jobs in array: ${MOAB_JOBARRAYRANGE}">${OUTFILE}
echo "Index of this subjob: ${MOAB_JOBARRAYINDEX}" >>${OUTFILE}

# Read input and append to output file
cat $INFILE >>$OUTFILE


##################################################
#
# Check how it works:
#
# 1. Create different input-files (e.g. 4)
# 
#     $ for i in `seq 4`; do echo $i >job.$i.in ; done
#
# 2. Submit a job-array (e.g. with 4 jobs)
#
#     $ msub -t array[1-4] jobarray.sh 
#
# After submitting, the user sees only one JOBID number. The following output files are produced:
#
# * 4 files job.[1-4].out
# * 4 traditional job output files - array.o<JOBID>-[1-4]
# * 4 traditional job error files - array.e<JOBID>-[1-4]
#
##################################################


After submitting, the user sees only one JOBID number. Get information about the whole job array by typing:


checkjob <JOBID>


It is possible to get full information about each sub-job:


checkjob <JOBID>[<index>]


e.g. to get information about sub-job 5 of job 1234, type "checkjob 1234[5]"

Each sub-job has its own output files:

  • sub-job output files - <jobname>.o<JOBID>-<index>
  • sub-job error files - <jobname>.e<JOBID>-<index>