BinAC2/Getting Started
Purpose and Goals
The Getting Started guide is designed for users who are new to HPC systems in general and to BinAC 2 specifically. After reading this guide, you should have a basic understanding of how to use BinAC 2 for your research.
Please note that this guide does not cover basic Linux command-line skills. If you're unfamiliar with commands such as listing directory contents or using a text editor, we recommend first exploring the Linux module on the [training platform].
This guide also doesn't cover every feature of the system but aims to provide a broad overview. For more detailed information about specific features, please refer to the dedicated Wiki pages on topics like the batch system, storage, and more.
Some terms in this guide may be unfamiliar. You can look them up in the HPC Glossary.
General Workflow of Running a Calculation
On an HPC Cluster, you do not simply log in and run your software. Instead, you write a Batch Script that contains all the commands needed to run and process your job, then submit it to a waiting queue to be executed on one of several hundred Compute Nodes.
Get Access to the Cluster
Follow the registration process for the bwForCluster. → How to Register for a bwForCluster
Login to the Cluster
Set up your service password and 2FA token, then log in to BinAC 2. → Login BinAC
Using the Linux command line
It is expected that you have at least basic Linux and command-line knowledge before using bwForCluster BinAC 2. There are numerous resources available online for learning fundamental concepts and commands. Here are two:
- bwHPC Linux Training course → Linux course on training.bwhpc.de
- HPC Wiki (external site) → Introduction to the Linux command line
Also see: .bashrc Do's and Don'ts
File System Basics
BinAC 2 offers several file systems for your data, each serving different needs. They are explained here in a short and simple form. For more detailed documentation, please refer to the dedicated storage Wiki page.
Home File System
Home directories are intended for the permanent storage of frequently used files, such as source code, configuration files, executable programs, conda environments, etc. The home file system is backed up daily and has a quota. If that quota is reached, you may experience issues when working with BinAC 2.
Here are some useful command-line and bash tips for accessing the home file system.
# For changing to your home directory, simply run:
cd
# To access files in your home directory within your job script, you can use one of these:
~/myFile # or
$HOME/myFile
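The home quota covers everything stored in your home directory, so it is worth checking your usage from time to time. A minimal check, assuming no dedicated quota command (du simply sums up the used space and may take a while if you have many files):
# Show the total size of your home directory
$ du -sh $HOME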
Project File System
BinAC 2 has a project file system intended for data that:
- is shared between members of a compute project
- is not actively used for computations in the near future
The project file system is available at /pfs/10/project/:
$ ll /pfs/10/project/
total 333
drwxrwx---. 3 root bw16f003 33280 Dec 19 17:16 bw16f003
drwxrwx---. 3 root bw16g002 25600 Dec 17 15:23 bw16g002
[..]
As shown by the file permissions, the project directories are only accessible to users belonging to the respective compute project.
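To share results with your project members, copy them into your project directory, either interactively or at the end of a job script. A short sketch, using bw16f003 from the listing above as an example (replace it with your own compute project's directory):
# Copy a results archive into the shared project directory
$ cp results.tar.gz /pfs/10/project/bw16f003/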
Work File System
BinAC 2 has a work file system on SSDs intended for data that is actively used and produced by compute jobs.
Each user creates workspaces on their own via the workspace tools.
The work file system is available at /pfs/10/work:
$ ll /pfs/10/work/
total 1822
drwxr-xr-x. 3 root root 33280 Feb 12 14:56 db
drwx------. 5 tu_iioba01 tu_tu 25600 Jan 8 14:42 tu_iioba01-alphafold3
[..]
As you can see from the file permissions, the resulting workspace can only be accessed by you, not by other group members or other users.
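The workspace tools follow the usual bwHPC conventions. A minimal sketch, assuming the standard ws_allocate, ws_list and ws_release commands are available on BinAC 2 (check ws_allocate -h for the exact options and the maximum duration):
# Allocate a workspace named "alphafold3" for 30 days
$ ws_allocate alphafold3 30
# List your workspaces and their expiration dates
$ ws_list
# Release a workspace you no longer need
$ ws_release alphafold3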
Scratch
Each compute node provides local storage, which is much faster than accessing the project and work file systems. When you execute a job, a dedicated temporary directory will be assigned to it on the compute node. This is often referred to as the scratch directory.
Programs frequently generate temporary data only needed during execution. If the program you are using offers an option for setting a temporary directory, please configure it to use the scratch directory. You can use the environment variable $TMPDIR, which will point to your job's scratch directory.
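A typical pattern is to copy input data to the scratch directory, run the computation there, and copy the results back before the job ends. A minimal sketch for a job script; the workspace path and program name are placeholders:
# Copy input data to the fast local scratch directory
cp /pfs/10/work/<user>-<workspace>/input.dat $TMPDIR/
cd $TMPDIR
# Run the program; keep its temporary files on scratch as well
myprogram --input input.dat --output result.dat
# Copy results back to the work file system; scratch is cleaned up after the job
cp result.dat /pfs/10/work/<user>-<workspace>/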
Batch System Basics
On HPC clusters like BinAC 2, you don't run analyses directly on the login node. Instead, you write a script and submit it as a job to the batch system. BinAC 2 uses SLURM as its batch system. The system then schedules the job to run on one of the available compute nodes, where the actual computation takes place.
The cluster consists of compute nodes with different hardware features. These hardware features are only available when you submit your jobs to the correct partitions.
This Getting Started guide only provides very basic SLURM information. Please read the extensive SLURM documentation.
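You can list the available partitions with the standard SLURM command sinfo and request one in your job script. The partition name below is only a placeholder; see the SLURM documentation page for the actual BinAC 2 partitions:
# Show partitions, their time limits and node states
$ sinfo
# Request a specific partition in a job script (placeholder name)
#SBATCH --partition=<partition_name>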
Simple Script Job
You will have to write job scripts in order to conduct your computations on BinAC 2. Use your favourite text editor to create a simple job script called 'myjob.sh'.
Please note that there are differences between Windows and Linux line endings. Make sure that your editor uses Linux line endings when you are working on Windows.
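You can check the line endings on the login node; a minimal sketch, assuming the standard file utility and, for conversions, dos2unix are installed:
# Windows files are reported with "CRLF line terminators"
$ file myjob.sh
# Convert Windows line endings to Linux line endings if necessary
$ dos2unix myjob.sh
The job script itself could look like this: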
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem=5000m
#SBATCH --job-name=simple
echo "Scratch directory: $TMPDIR"
echo "Date:"
date
echo "My job is running on node:"
hostname
uname -a
sleep 240
Basic SLURM commands
Submit the job script you wrote with sbatch.
$ sbatch myjob.sh
Submitted batch job 75441
Take a note of your JobID. The scheduler will reserve one core and 5000 MB of memory for 10 minutes on a compute node for your job.
The job should be scheduled within seconds if BinAC 2 is not fully busy.
The output will be stored in a file called slurm-<JobID>.out
$ cat slurm-75441.out
Scratch directory: /scratch/75441
Date:
Thu Feb 13 09:56:41 AM CET 2025
My job is running on node:
node1-083
Linux node1-083 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15 12:04:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
There are tons of options, details, and caveats for SLURM job scripts. Most of them are explained in the SLURM documentation. If you encounter any problems, just send an email to hpcmaster@uni-tuebingen.de.
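For orientation, here is a sketch of some commonly used #SBATCH options; the partition name, GPU request and mail address are placeholders and must be adapted to BinAC 2 (see the SLURM documentation page):
#SBATCH --partition=<partition_name>   # partition matching your hardware needs
#SBATCH --cpus-per-task=4              # CPU cores for a multi-threaded program
#SBATCH --gres=gpu:1                   # request one GPU (GPU partitions only)
#SBATCH --mail-type=END,FAIL           # e-mail notification when the job ends or fails
#SBATCH --mail-user=<your@address>     # address for notifications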
You can get an overview of your queued and running jobs with squeue
[tu_iioba01@login01 ~]$ squeue --user=$USER
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
75441 compute simple tu_iioba R 0:03 1 node1-083
Let's assume you pulled a Homer, i.e. made a mistake, and want to stop/kill/remove a running job. You can cancel it with scancel:
scancel <JobID>
Software
There are several mechanisms by which software can be installed on BinAC 2. If you need software that is not installed on BinAC 2, you can open a ticket and we will find a way to provide the software on the cluster.
Environment Modules
Environment modules are the 'classic' way of providing software on clusters. A module provides a specific software version and can be loaded on demand. The module system then manipulates PATH and other environment variables so that the software can be used.
# Show available modules
$ module avail
# Load a module
$ module load bio/bowtie2/2.4.1
# Show the module's help
$ module help bio/bowtie2/2.4.1
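If your job relies on a module, it is good practice to load it explicitly inside the job script rather than depending on the environment at submission time. A short sketch using the bowtie2 module shown above (the exact version may differ on BinAC 2):
# Inside a job script: load the module before calling the program
module load bio/bowtie2/2.4.1
bowtie2 --version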
A more detailed description of environment modules can be found on this wiki page.
Sometimes software packages have so many dependencies, or you need a specific combination of tools, that environment modules cannot be used in a meaningful way. In such cases, other solutions like conda environments or Singularity containers (see below) can be used.
Conda Environments
Conda environments are a convenient way to create custom software environments on the cluster, as most scientific software is now available as conda packages. First, you have to install Miniconda in your home directory.
# Download installer
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ sh Miniconda3-latest-Linux-x86_64.sh
$ source ~/.bashrc
Then you can create your first environment and install software into it:
# Create an environment
$ conda create --name my_first_conda_environment
# Activate this environment
$ conda activate my_first_conda_environment
# Install software into this environment
$ conda install scipy=1.5.2
You will need to add these lines to your job scripts so that the environment is available on the compute nodes:
source $HOME/miniconda3/etc/profile.d/conda.sh
conda activate <env_name>
When installing software, conda resolves dependencies on the fly. However, it is not guaranteed that conda will pick the exact same package versions in the future. For the sake of reproducibility, you can export a file containing all conda packages together with their versions:
# Export packages installed in the active environment
$ conda list --explicit > spec-file.txt
# Create a new environment with the exact same conda packages
$ conda create --name myenv --file spec-file.txt
Singularity Containers
Sometimes software is also available in a software container format. Singularity is installed on all BinAC 2 nodes. You can pull Singularity or Docker containers from registries onto BinAC 2 and use them. You can also build new Singularity containers on your own machine and copy them to BinAC 2.
Please note that Singularity containers should be stored in the work file system. We have configured Singularity such that containers stored in your home directory do not work.
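A minimal sketch of pulling and running a container from a registry, assuming a workspace under /pfs/10/work (the image and workspace names are placeholders):
# Change to a workspace, since containers must not be stored in your home directory
$ cd /pfs/10/work/<user>-<workspace>
# Pull a Docker image from a registry and convert it to a Singularity image
$ singularity pull docker://ubuntu:22.04
# Run a command inside the container
$ singularity exec ubuntu_22.04.sif cat /etc/os-release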