BinAC2/Getting Started: Difference between revisions
| H Winkhardt (talk | contribs) m (typo) | F Bartusch (talk | contribs)  No edit summary | ||
Latest revision as of 09:19, 17 September 2025
Purpose and Goals
The Getting Started guide is designed for users who are new to HPC systems in general and to BinAC 2 specifically. After reading this guide, you should have a basic understanding of how to use BinAC 2 for your research.
Please note that this guide does not cover basic Linux command-line skills. If you're unfamiliar with commands such as listing directory contents or using a text editor, we recommend first exploring the Linux module on the bwHPC training platform.
This guide also doesn't cover every feature of the system but aims to provide a broad overview. For more detailed information about specific features, please refer to the dedicated Wiki pages on topics like the batch system, storage, and more.
Some terms in this guide may be unfamiliar. You can look them up in the HPC Glossary.
General Workflow of Running a Calculation
On an HPC Cluster, you do not simply log in and run your software. Instead, you write a Batch Script that contains all the commands needed to run and process your job, then submit it to a waiting queue to be executed on one of several hundred Compute Nodes.
Get Access to the Cluster
Follow the registration process for the bwForCluster. → How to Register for a bwForCluster
Login to the Cluster
Set up your service password and 2FA token, then log in to BinAC 2. → Login BinAC
Using the Linux command line
It is expected that you have at least basic Linux and command-line knowledge before using bwForCluster BinAC 2. There are numerous resources available online for learning fundamental concepts and commands. Here are two:
- bwHPC Linux Training course → Linux course on training.bwhpc.de
- HPC Wiki (external site) → Introduction to the Linux command line
Also see: .bashrc Do's and Don'ts
File System Basics
BinAC 2 offers several file systems for your data, each serving different needs. They are explained briefly below. For more detail, see the dedicated file system documentation.
Home File System
Home directories are intended for the permanent storage of frequently used files, such as source code, configuration files, executable programs, conda environments, etc. The home file system is backed up daily and has a quota. If that quota is reached, you may experience issues when working with BinAC 2.
Here are some useful command-line and bash tips for accessing the home file system.
# For changing to your home directory, simply run:
cd
# To access files in your home directory within your job script, you can use one of these:
~/myFile   # or
$HOME/myFile
Project File System
BinAC 2 has a project file system intended for data that:
- is shared between members of a compute project
- is not actively used for computations in the near future
The data is stored on HDDs. The primary focus of project is pure capacity, not speed.
Every project gets a dedicated directory located at:
/pfs/10/project/<project_id>/
You can check which project you are a member of:
$ id $USER | grep -o 'bw[^)]*'
bw16f003
In this case, your project directory would be:
/pfs/10/project/bw16f003/
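The lookup above can be combined into a small helper that prints the project directory directly. This is only a sketch: `project_dir_from_id` is a hypothetical name, not a tool installed on BinAC 2, and it assumes that your project group is the first group whose name starts with `bw`.

```shell
# Sketch: derive the project directory from the output of `id`.
# project_dir_from_id is a hypothetical helper, not a BinAC 2 tool.
project_dir_from_id() {
    # Take the first group name starting with "bw", as in the grep above
    project_id=$(printf '%s\n' "$1" | grep -o 'bw[^)]*' | head -n 1)
    # Fail if no such group was found
    [ -n "$project_id" ] || return 1
    printf '/pfs/10/project/%s/\n' "$project_id"
}

# Apply it to the current user; report when no project group exists
project_dir_from_id "$(id)" || echo "No bw... project group found" >&2
```

If you belong to more than one compute project, only the first match is printed; drop the `head -n 1` filter and loop over the matches instead.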
Check our data organization guide for methods to organize data inside the project directory.
Work File System
BinAC 2 has a work file system on SSDs intended for data that is actively used and produced by compute jobs.
Each user creates their own workspaces via the workspace tools.
The work file system is available at /pfs/10/work
$ ll /pfs/10/work/
total 1822
drwxr-xr-x.  3 root        root          33280 Feb 12 14:56 db
drwx------.  5 tu_iioba01  tu_tu         25600 Jan  8 14:42 tu_iioba01-alphafold3
[..]
As you can see from the file permissions, the resulting workspace can only be accessed by you, not by other group members or other users.
Scratch
Each compute node provides local storage, which is much faster than accessing project and work file systems.
When you execute a job, a dedicated temporary directory will be assigned to it on the compute node. This is often referred to as the scratch directory.
Programs frequently generate temporary data only needed during execution. If the program you are using offers an option for setting a temporary directory,
please configure it to use the scratch directory.
You can use the environment variable $TMPDIR, which will point to your job's scratch directory.
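As a minimal sketch of this advice, the example below points `sort` at the scratch directory using its `-T` option, which selects where temporary files are written. The fallback to `/tmp` is an assumption made only so that the snippet also runs outside of a job, where `$TMPDIR` may not be set.

```shell
# Inside a job, $TMPDIR points to the job's scratch directory.
# /tmp is only a fallback so the sketch also runs outside of a job.
scratch="${TMPDIR:-/tmp}"

# sort accepts -T to choose the directory for its temporary files
sorted=$(printf '3\n1\n2\n' | sort -n -T "$scratch")
echo "$sorted"
```

Other programs use different options or environment variables for the same purpose, so check the documentation of the tool you are running.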
Batch System Basics
On HPC clusters like BinAC 2, you don't run analyses directly on the login node. Instead, you write a script and submit it as a job to the batch system. BinAC 2 uses SLURM as its batch system. The system then schedules the job to run on one of the available compute nodes, where the actual computation takes place.
The cluster consists of compute nodes with different hardware features.
These hardware features are only available when you submit your jobs to the correct partitions.
This Getting Started guide provides only very basic SLURM information.
Please read the extensive SLURM documentation.
Simple Script Job
You will have to write job scripts in order to run your computations on BinAC 2. Use your favourite text editor to create a simple job script called 'myjob.sh'.
Please note that Windows and Linux use different line endings.
If you are working on Windows, make sure that your editor uses Linux line endings.
You can check your line endings with
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem=5000m
#SBATCH --job-name=simple
echo "Scratch directory: $TMPDIR"
echo "Date:"
date
echo "My job is running on node:"
hostname
uname -a
sleep 240
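One possible way to detect and remove Windows line endings before submitting the script is sketched below. It is not necessarily the method this guide originally recommended; `has_crlf` and `to_unix` are hypothetical helper names, and the output file name `myjob.unix.sh` is chosen only for illustration.

```shell
# has_crlf and to_unix are hypothetical helper names for this sketch.

# Succeeds if the file contains at least one carriage return,
# which indicates Windows (CRLF) line endings
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Write a copy of the file ($1) with all carriage returns removed to $2
to_unix() {
    tr -d '\r' < "$1" > "$2"
}

# Convert the job script only if it exists and has Windows line endings
if [ -f myjob.sh ] && has_crlf myjob.sh; then
    to_unix myjob.sh myjob.unix.sh
    echo "Converted copy written to myjob.unix.sh"
fi
```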
Basic SLURM commands
Submit the job script you wrote with sbatch.
$ sbatch myjob.sh
Submitted batch job 75441
Take note of your job ID. The scheduler will reserve one core and 5000 MB of memory for 10 minutes on a compute node for your job, as requested in the job script.
The job should be scheduled within seconds if BinAC 2 is not fully busy.
The output will be stored in a file called slurm-<JobID>.out.
$ cat slurm-75441.out 
Scratch directory: /scratch/75441
Date:
Thu Feb 13 09:56:41 AM CET 2025
My job is running on node:
node1-083
Linux node1-083 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15 12:04:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
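If you submit jobs from your own scripts, it can be handy to capture the job ID and derive the output file name from it. The following is a sketch only: `job_id_from_sbatch` is a hypothetical helper, and it assumes the submission message has exactly the form shown above.

```shell
# job_id_from_sbatch is a hypothetical helper name.
# It extracts the ID from a line like "Submitted batch job 75441".
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# In practice the argument would be "$(sbatch myjob.sh)"; the sample
# line from this guide is used here so the sketch runs anywhere.
jobid=$(job_id_from_sbatch "Submitted batch job 75441")
outfile="slurm-${jobid}.out"
echo "$outfile"
```

As an alternative to parsing the message, `sbatch` offers a `--parsable` option that prints the job ID without the surrounding text.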
There are tons of options, details and caveats for SLURM job scripts. Most of them are explained in the SLURM documentation. If you encounter any problems, just send a mail to hpcmaster@uni-tuebingen.de.
You can get an overview of your queued and running jobs with squeue:
[tu_iioba01@login01 ~]$ squeue --user=$USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             75441   compute   simple tu_iioba  R       0:03      1 node1-083
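With many jobs in the queue, a per-state summary is often easier to read than the full list. The sketch below counts jobs by the ST column of the default `squeue` layout shown above. `count_job_states` is a hypothetical helper, the second job line (75442) is invented for illustration, and the column-based parsing assumes that job names contain no spaces.

```shell
# count_job_states is a hypothetical helper: it reads squeue output in the
# default layout on stdin and prints "<state> <count>" per job state.
count_job_states() {
    # Skip the header line, count column 5 (ST), sort for stable output
    awk 'NR > 1 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Sample input; the second job line is made up for this example.
# In practice you would run: squeue --user=$USER | count_job_states
squeue_sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
75441 compute simple tu_iioba R 0:03 1 node1-083
75442 compute simple tu_iioba PD 0:00 1 (Priority)'

summary=$(printf '%s\n' "$squeue_sample" | count_job_states)
echo "$summary"
```

Here R stands for running and PD for pending jobs.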
Let's assume you pulled a Homer and want to stop/kill/remove a running job.
scancel <JobID>
Software
Software can be provided on BinAC 2 in several ways. If you need software that is not installed on BinAC 2, open a ticket and we can find a way to provide the software on the cluster.
Environment Modules
Environment modules are the 'classic' way of providing software on clusters. A module corresponds to a specific software version and can be loaded. The module system then manipulates the PATH and other environment variables so that the software can be used.
# Show available modules
$ module avail
# Load a module
$ module load bio/samtools/1.21
# Show the module's help
$ module help bio/samtools/1.21
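To illustrate what "manipulating the PATH" means, the conceptual sketch below prepends a directory to a PATH-style value, which is roughly the effect of loading a module. It does not show how the module system is actually implemented; `prepend_path` is a hypothetical helper and the install prefix `/opt/example/samtools/1.21` is made up.

```shell
# prepend_path is a hypothetical helper; the install prefix below is made up.
# $1: directory to add, $2: existing PATH-style value
prepend_path() {
    printf '%s:%s\n' "$1" "$2"
}

old_path="/usr/bin:/bin"
new_path=$(prepend_path "/opt/example/samtools/1.21/bin" "$old_path")
echo "$new_path"
```

Because the shell searches the directories in PATH from left to right, executables in the prepended directory are found before any system-wide version.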
A more detailed description of module environments can be found on this wiki page.
Sometimes a software package has so many dependencies, or a user needs such a specific combination of tools, that environment modules are not a practical solution. In that case, other solutions like Conda environments or Apptainer containers (see below) can be used.
Conda Environments
Conda environments are a convenient way to create custom environments on the cluster, as the majority of scientific software is by now available as conda packages. BinAC 2 already provides Conda via Miniforge. You can find general documentation for using Conda on this wiki page.
Apptainer (formerly Singularity)
Sometimes software is also available in a software container format. Apptainer (formerly called Singularity) is installed on all BinAC 2 nodes. You can pull Apptainer containers or Docker images from registries onto BinAC 2 and use them. You can also build new Apptainer containers on your own machine and copy them to BinAC 2.
Please note that Apptainer containers should be stored in the project file system.
We configured Apptainer such that containers stored in your home directory do not work.
