BinAC2/Getting Started
Last edited by F Bartusch
Latest revision as of 11:14, 13 February 2025
Purpose and Goals
The Getting Started guide is designed for users who are new to HPC systems in general and to BinAC 2 specifically. After reading this guide, you should have a basic understanding of how to use BinAC 2 for your research.
Please note that this guide does not cover basic Linux command-line skills. If you're unfamiliar with commands such as listing directory contents or using a text editor, we recommend first exploring the Linux module on the training platform.
This guide also doesn't cover every feature of the system but aims to provide a broad overview. For more detailed information about specific features, please refer to the dedicated Wiki pages on topics like the batch system, storage, and more.
Some terms in this guide may be unfamiliar. You can look them up in the HPC Glossary.
General Workflow of Running a Calculation
On an HPC Cluster, you do not simply log in and run your software. Instead, you write a Batch Script that contains all the commands needed to run and process your job, then submit it to a waiting queue to be executed on one of several hundred Compute Nodes.
Get Access to the Cluster
Follow the registration process for the bwForCluster. → How to Register for a bwForCluster
Login to the Cluster
Set up your service password and 2FA token, then log in to BinAC 2. → Login BinAC
Using the Linux command line
It is expected that you have at least basic Linux and command-line knowledge before using bwForCluster BinAC 2. There are numerous resources available online for learning fundamental concepts and commands. Here are two:
- bwHPC Linux Training course → Linux course on training.bwhpc.de
- HPC Wiki (external site) → Introduction to the Linux command line
Also see: .bashrc Do's and Don'ts
File System Basics
BinAC 2 offers several file systems for your data, each serving different needs. They are explained briefly below; more detailed documentation is available here.
Home File System
Home directories are intended for the permanent storage of frequently used files, such as source code, configuration files, executable programs and conda environments. The home file system is backed up daily and has a quota. If you reach that quota, you may experience issues when working with BinAC 2.
Here are some useful command-line and bash tips for accessing the home file system.
# For changing to your home directory, simply run:
cd
# To access files in your home directory within your job script, you can use one of these:
~/myFile # or
$HOME/myFile
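Reaching the home quota can cause problems, so it helps to know what is using the space. This page does not name BinAC 2's own quota tool. The sketch below therefore uses only standard find, du and sort to list the largest entries directly below your home directory, including hidden ones. The helper name home_usage is hypothetical.

```shell
#!/bin/bash
# home_usage (hypothetical helper): print the N largest entries directly
# below $HOME, hidden ones included, sorted by size with the largest last.
home_usage() {
  local n="${1:-10}"
  # -mindepth 1 -maxdepth 1 selects every entry directly below $HOME;
  # du -sh reports the total disk usage of each one in human-readable units.
  find "$HOME" -mindepth 1 -maxdepth 1 -exec du -sh -- {} + 2>/dev/null \
    | sort -h | tail -n "$n"
}
```

For example, home_usage 5 prints the five largest entries, with the largest one on the last line.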
Project File System
BinAC 2 has a project file system intended for data that:
- is shared between members of a compute project
- is not actively used for computations in near future
The project file system is available at /pfs/10/project/:
$ ll /pfs/10/project/
total 333
drwxrwx---. 3 root bw16f003 33280 Dec 19 17:16 bw16f003
drwxrwx---. 3 root bw16g002 25600 Dec 17 15:23 bw16g002
[..]
As the file permissions show, the project directories are only accessible to users belonging to the specific compute project.
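Access follows group membership, so you can list the project directories you are able to enter. This sketch assumes, as the listing above suggests, that every directory below /pfs/10/project/ is named after its compute project's Unix group. The helper name my_project_dirs is hypothetical.

```shell
#!/bin/bash
# my_project_dirs (hypothetical helper): print every directory below ROOT
# whose name matches one of your Unix groups.
# Assumption: project directory name == compute project group name.
my_project_dirs() {
  local root="${1:-/pfs/10/project}"
  local group
  # id -Gn prints the names of all groups you belong to.
  for group in $(id -Gn); do
    if [ -d "$root/$group" ]; then
      echo "$root/$group"
    fi
  done
}
```

Calling my_project_dirs without arguments checks /pfs/10/project.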
Work File System
BinAC 2 has a work file system on SSDs intended for data that is actively used and produced by compute jobs.
Each user creates workspaces on their own via the workspace tools.
The work file system is available at /pfs/10/work:
$ ll /pfs/10/work/
total 1822
drwxr-xr-x. 3 root root 33280 Feb 12 14:56 db
drwx------. 5 tu_iioba01 tu_tu 25600 Jan 8 14:42 tu_iioba01-alphafold3
[..]
As you can see from the file permissions, the resulting workspace can only be accessed by you, not by other group members or other users.
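This page does not spell out the workspace commands. The sketch below assumes the usual hpc-workspace tools: ws_find NAME prints the path of an existing workspace, and ws_allocate NAME DAYS creates one and prints its path. With those, a job script could look up or create its workspace as follows. The helper name, the workspace name and the default lifetime of 30 days are placeholders. The maximum lifetime allowed on BinAC 2 is not stated here, so please check the workspace documentation.

```shell
#!/bin/bash
# get_workspace (hypothetical helper): print the path of workspace NAME,
# allocating it for DAYS days (placeholder default: 30) if it does not exist.
get_workspace() {
  local name="$1" days="${2:-30}"
  local path
  # Assumption: ws_find prints the path of an existing workspace.
  if path=$(ws_find "$name" 2>/dev/null) && [ -n "$path" ]; then
    echo "$path"
  else
    # Assumption: ws_allocate prints the new workspace path on stdout.
    ws_allocate "$name" "$days"
  fi
}
```

In a job script you could then write WORKSPACE=$(get_workspace my_analysis) followed by cd "$WORKSPACE". The tools also include ws_list, which shows your workspaces, and ws_release, which frees one you no longer need.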
Scratch
Each compute node provides local storage, which is much faster than accessing the project and work file systems.
When you execute a job, a dedicated temporary directory will be assigned to it on the compute node. This is often referred to as the scratch directory.
Programs frequently generate temporary data that is only needed during execution. If the program you are using offers an option for setting a temporary directory, please configure it to use the scratch directory.
You can use the environment variable $TMPDIR, which will point to your job's scratch directory.
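A common pattern is to copy input into the scratch directory, run there, and copy the results you want to keep back before the job ends, since the scratch directory is temporary. The sketch below is one way to do that. The helper name run_in_scratch and all file names are hypothetical. It falls back to /tmp when $TMPDIR is unset, so you can also try it outside a job.

```shell
#!/bin/bash
# run_in_scratch (hypothetical helper): copy INPUT into a fresh directory
# below the scratch space, run COMMAND there (it sees INPUT under its base
# name), then copy OUTPUT back to the directory the helper was called from.
# Usage: run_in_scratch INPUT OUTPUT COMMAND [ARGS...]
run_in_scratch() {
  local input="$1" output="$2"
  shift 2
  local orig_dir="$PWD" scratch
  # Inside a job, $TMPDIR is the node-local scratch directory; /tmp is only a
  # fallback for trying this outside a job.
  scratch=$(mktemp -d "${TMPDIR:-/tmp}/run.XXXXXX") || return 1
  cp -- "$input" "$scratch/" || return 1
  # Run in a subshell so the caller's working directory stays unchanged.
  # On failure, the scratch copy is left in place for inspection.
  ( cd "$scratch" && "$@" ) || return 1
  cp -- "$scratch/$output" "$orig_dir/" || return 1
  rm -rf -- "$scratch"
}
```

For example, run_in_scratch input.txt sorted.txt sh -c 'sort -n -T . input.txt > sorted.txt' sorts a file in scratch. GNU sort's -T option is one case of a program setting for temporary files. Here it points at the current directory, which is the scratch directory.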
Batch System Basics
On HPC clusters like BinAC 2, you don't run analyses directly on the login node. Instead, you write a script and submit it as a job to the batch system. BinAC 2 uses SLURM as its batch system. The system then schedules the job to run on one of the available compute nodes, where the actual computation takes place.
The cluster consists of compute nodes with different hardware features.
These hardware features are only available when submitting the jobs to the correct partitions.
The getting started guide only provides very basic SLURM information.
Please read the extensive SLURM documentation.
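As an illustration of selecting a partition, here is a job script header that asks for one GPU. The partition name gpu is a placeholder; look up the real partition names on the SLURM Partitions page. sinfo --summarize on the login node also lists them. --gres=gpu:1 is standard SLURM syntax for one GPU, assuming BinAC 2 exposes its GPUs as generic resources. The body only prints what SLURM assigned and falls back to "none", so the script also runs with plain bash.

```shell
#!/bin/bash
# Placeholder partition name; use a real one from the SLURM Partitions page.
#SBATCH --partition=gpu
# One GPU as a generic resource (assumes GPUs are configured as gres).
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem=5000m
#SBATCH --job-name=gpu-test

# SLURM usually sets these variables inside a job; the fallbacks let the
# script also run outside SLURM.
partition="${SLURM_JOB_PARTITION:-none}"
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Partition: ${partition}"
echo "Visible GPUs: ${gpus}"
```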
Simple Script Job
You will have to write job scripts in order to conduct your computations on BinAC 2. Use your favourite text editor to create a simple job script called 'myjob.sh'.
Please note that there are differences between Windows and Linux line endings.
Make sure that your editor uses Linux line endings when you are using Windows.
You can check your line endings with
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem=5000m
#SBATCH --job-name=simple
echo "Scratch directory: $TMPDIR"
echo "Date:"
date
echo "My job is running on node:"
hostname
uname -a
sleep 240
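Windows line endings in a job script are a common source of confusing errors. The sketch below uses grep and GNU sed to detect and remove them. The function names are hypothetical. The file command also reports "with CRLF line terminators" for such files.

```shell
#!/bin/bash
# has_crlf FILE (hypothetical helper): succeed if FILE contains a carriage
# return, i.e. Windows (CRLF) line endings.
has_crlf() {
  grep -q -- $'\r' "$1"
}

# to_unix_endings FILE (hypothetical helper): remove the carriage return at
# the end of every line, editing FILE in place (GNU sed -i).
to_unix_endings() {
  sed -i 's/\r$//' -- "$1"
}
```

For example, has_crlf myjob.sh && to_unix_endings myjob.sh fixes the script only if needed.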
Basic SLURM commands
Submit the job script you wrote with sbatch:
$ sbatch myjob.sh
Submitted batch job 75441
Take a note of your jobID. The scheduler will reserve one core and 5000 MB of memory for 10 minutes on a compute node for your job.
The job should be scheduled within seconds if BinAC 2 is not fully busy.
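Instead of copying the job ID by hand, you can let sbatch print only the ID with its --parsable option and store it in a variable. Options given on the sbatch command line take precedence over the matching #SBATCH lines in the script, which is handy for a one-off change. The helper name submit is hypothetical.

```shell
#!/bin/bash
# submit (hypothetical helper): submit a job script with sbatch and print
# only the job ID. Extra sbatch options may precede the script name.
submit() {
  local out
  # --parsable makes sbatch print "jobid" or "jobid;cluster" instead of
  # "Submitted batch job <jobid>".
  out=$(sbatch --parsable "$@") || return 1
  # Strip an optional ";cluster" suffix.
  echo "${out%%;*}"
}
```

For example, jobid=$(submit --time=20:00 myjob.sh) submits the script with a 20-minute limit. squeue --job "$jobid" then shows just that job.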
The output will be stored in a file called slurm-<JobID>.out in the directory from which you submitted the job:
$ cat slurm-75441.out
Scratch directory: /scratch/75441
Date:
Thu Feb 13 09:56:41 AM CET 2025
My job is running on node:
node1-083
Linux node1-083 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15 12:04:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
There are tons of options, details and caveats for SLURM job scripts. Most of them are explained in the SLURM documentation. If you encounter any problems, just send a mail to hpcmaster@uni-tuebingen.de.
You can get an overview of your queued and running jobs with squeue:
[tu_iioba01@login01 ~]$ squeue --user=$USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
75441 compute simple tu_iioba R 0:03 1 node1-083
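The ST column shows each job's state: R means running and PD means pending. To count your jobs in a given state, you can combine squeue's --states and --noheader options. The helper name count_my_jobs is hypothetical.

```shell
#!/bin/bash
# count_my_jobs (hypothetical helper): print how many of your jobs are in
# the given SLURM state, e.g. RUNNING or PENDING (default).
count_my_jobs() {
  local state="${1:-PENDING}"
  # --noheader drops the column header, so each output line is one job.
  squeue --user="$USER" --states="$state" --noheader | wc -l
}
```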
Let's assume you pulled a Homer and want to stop/kill/remove a running job. Pass its job ID to scancel:
scancel <JobID>
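scancel also accepts filters instead of a job ID, for example --user, --name and --state. The sketch below cancels only your jobs that are still waiting in the queue, optionally restricted to one job name. The helper name cancel_my_pending is hypothetical.

```shell
#!/bin/bash
# cancel_my_pending (hypothetical helper): cancel all of your pending jobs,
# or only those with job name NAME if one is given.
cancel_my_pending() {
  local args=(--user="$USER" --state=PENDING)
  if [ -n "${1:-}" ]; then
    args+=(--name="$1")
  fi
  scancel "${args[@]}"
}
```

Plain scancel --user="$USER" cancels all of your jobs, running and pending, so use it with care.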
Software
There are several ways software can be made available on BinAC 2. If you need software that is not installed on BinAC 2, open a ticket and we can find a way to provide it on the cluster.
Environment Modules
Environment modules are the 'classic' way of providing software on clusters. Each module provides a specific software version and can be loaded on demand. The module system then adjusts PATH and other environment variables so that the software can be used.
# Show available modules
$ module avail
# Load a module
$ module load bio/samtools/1.21
# Show the module's help
$ module help bio/samtools/1.21
A more detailed description of environment modules can be found on this wiki page.
Sometimes software packages have so many dependencies, or users need such a specific combination of tools, that environment modules cannot be used in a meaningful way. In those cases, Conda environments or Apptainer (formerly Singularity) containers, described below, can be used instead.
Conda Environments
Conda environments are a convenient way to create custom software environments on the cluster, since most scientific software is now available as conda packages. BinAC 2 already provides Conda via Miniforge. You can find general documentation on using Conda on this wiki page.
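Here is a minimal sketch of using Conda in batch jobs. It assumes the conda command is already available in your shell; how BinAC 2 exposes its Miniforge installation is covered on the Conda page and is not assumed here. In non-interactive job shells, conda activate often fails unless the shell hook has been loaded first, which eval "$(conda shell.bash hook)" does. Conda also solves dependencies at install time and may pick different versions later. An explicit package list therefore helps you recreate an environment exactly. All function and environment names are hypothetical.

```shell
#!/bin/bash
# activate_env NAME (hypothetical helper): make `conda activate` usable in a
# non-interactive job shell, then activate environment NAME.
activate_env() {
  eval "$(conda shell.bash hook)" || return 1
  conda activate "$1"
}

# save_env NAME FILE (hypothetical helper): write the exact package list of
# environment NAME to FILE.
save_env() {
  conda list --name "$1" --explicit > "$2"
}

# restore_env NAME FILE (hypothetical helper): create environment NAME with
# exactly the packages listed in FILE.
restore_env() {
  conda create --name "$1" --file "$2" --yes
}
```

In a job script, activate_env my_env would precede your actual commands. save_env my_env spec-file.txt records the packages for later use with restore_env. Explicit package lists are platform-specific.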
Apptainer (formerly Singularity)
Sometimes software is also available in a software container format. Apptainer (formerly called Singularity) is installed on all BinAC 2 nodes. You can pull Apptainer containers or Docker images from registries onto BinAC 2 and use them. You can also build new Apptainer containers on your own machine and copy them to BinAC 2.
Please note that Apptainer containers should be stored in the project file system.
We configured Apptainer such that containers stored in your home directory do not work.
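Containers have to live in the project file system, so the sketch below pulls a Docker image into a SIF file there, but only once. The project directory is a placeholder, and ubuntu:22.04 is just an example public image. The helper name pull_image is hypothetical. apptainer pull OUTPUT docker://IMAGE and apptainer exec IMAGE COMMAND are standard Apptainer commands.

```shell
#!/bin/bash
# Placeholder: replace YOUR_PROJECT with your compute project's directory.
CONTAINER_DIR="${CONTAINER_DIR:-/pfs/10/project/YOUR_PROJECT/containers}"

# pull_image IMAGE SIF (hypothetical helper): convert Docker image IMAGE into
# $CONTAINER_DIR/SIF unless that file already exists, then print its path.
pull_image() {
  local image="$1" sif="$CONTAINER_DIR/$2"
  mkdir -p -- "$CONTAINER_DIR" || return 1
  if [ ! -f "$sif" ]; then
    # Send Apptainer's messages to stderr so stdout carries only the path.
    apptainer pull "$sif" "docker://$image" >&2 || return 1
  fi
  echo "$sif"
}
```

For example, sif=$(pull_image ubuntu:22.04 ubuntu_22.04.sif) pulls the image, and apptainer exec "$sif" cat /etc/os-release then runs a command inside the container, for instance in a job script.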