<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=S+Behnle</id>
	<title>bwHPC Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=S+Behnle"/>
	<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/e/Special:Contributions/S_Behnle"/>
	<updated>2026-04-11T15:29:09Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.17</generator>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Getting_Started&amp;diff=15711</id>
		<title>BinAC2/Getting Started</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Getting_Started&amp;diff=15711"/>
		<updated>2026-02-06T17:10:55Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Purpose and Goals  ==&lt;br /&gt;
&lt;br /&gt;
The Getting Started guide is designed for users who are new to HPC systems in general and to BinAC 2 specifically. After reading this guide, you should have a basic understanding of how to use BinAC 2 for your research.&lt;br /&gt;
&lt;br /&gt;
Please note that this guide does not cover basic Linux command-line skills. If you&#039;re unfamiliar with commands such as listing directory contents or using a text editor, we recommend first exploring the Linux module on the [https://training.bwhpc.de bwHPC training platform].&lt;br /&gt;
&lt;br /&gt;
This guide also doesn&#039;t cover every feature of the system but aims to provide a broad overview. For more detailed information about specific features, please refer to the dedicated Wiki pages on topics like the batch system, storage, and more.&lt;br /&gt;
&lt;br /&gt;
Some terms in this guide may be unfamiliar. You can look them up in the [[HPC_Glossary|HPC Glossary]].&lt;br /&gt;
&lt;br /&gt;
== General Workflow of Running a Calculation ==&lt;br /&gt;
&lt;br /&gt;
On an &#039;&#039;&#039;HPC Cluster&#039;&#039;&#039;, you do not simply log in and run your software. Instead, you write a &#039;&#039;&#039;Batch Script&#039;&#039;&#039; that contains all the commands needed to run and process your job, then submit it to a waiting queue to be executed on one of several hundred &#039;&#039;&#039;Compute Nodes&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
== Get Access to the Cluster ==&lt;br /&gt;
&lt;br /&gt;
Follow the registration process for the bwForCluster. &amp;amp;rarr; [[Registration/bwForCluster|How to Register for a bwForCluster]]&lt;br /&gt;
&lt;br /&gt;
== Login to the Cluster ==&lt;br /&gt;
&lt;br /&gt;
Set up your service password and 2FA token, then log in to BinAC 2. &amp;amp;rarr; [[BinAC2/Login|Login BinAC]]&lt;br /&gt;
&lt;br /&gt;
== Using the Linux command line ==&lt;br /&gt;
&lt;br /&gt;
It is expected that you have at least basic Linux and command-line knowledge before using bwForCluster BinAC 2.&lt;br /&gt;
There are numerous resources available online for learning fundamental concepts and commands.&lt;br /&gt;
Here are two:&lt;br /&gt;
&lt;br /&gt;
* bwHPC Linux Training course &amp;amp;rarr; [https://training.bwhpc.de/ Linux course on training.bwhpc.de]&lt;br /&gt;
* HPC Wiki (external site) &amp;amp;rarr;  [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC/The_Command_Line Introduction to the Linux command line]&lt;br /&gt;
&lt;br /&gt;
Also see: [[.bashrc Do&#039;s and Don&#039;ts]]&lt;br /&gt;
&lt;br /&gt;
= File System Basics =&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers several file systems for your data, each serving different needs.&lt;br /&gt;
These are explained here in a short and simple form. For more detailed documentation, see the [https://wiki.bwhpc.de/e/BwForCluster_BinAC_Hardware_and_Architecture#Storage_Architecture storage architecture page].&lt;br /&gt;
&lt;br /&gt;
== Home File System ==&lt;br /&gt;
&lt;br /&gt;
Home directories are intended for the permanent storage of frequently used files, such as source code, configuration files, executable programs, and conda environments.&lt;br /&gt;
The home file system is backed up daily and has a quota.&lt;br /&gt;
If that quota is reached, you may experience issues when working with BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Here are some useful command-line and bash tips for accessing the home file system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# For changing to your home directory, simply run:&lt;br /&gt;
cd&lt;br /&gt;
&lt;br /&gt;
# To access files in your home directory within your job script, you can use one of these:&lt;br /&gt;
~/myFile   # or&lt;br /&gt;
$HOME/myFile&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Project File System ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 has a &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; file system intended for data that:&lt;br /&gt;
* is shared between members of a compute project&lt;br /&gt;
* is not actively used for computations in the near future&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which project you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
== Work File System ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 has a &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; file system on SSDs intended for data that is actively used and produced by compute jobs.&lt;br /&gt;
Each user creates workspaces on their own via the [[BinAC2/Hardware_and_Architecture#Work | workspace tools]].&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; file system is available at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ ll /pfs/10/work/&lt;br /&gt;
total 1822&lt;br /&gt;
drwxr-xr-x.  3 root        root          33280 Feb 12 14:56 db&lt;br /&gt;
drwx------.  5 tu_iioba01  tu_tu         25600 Jan  8 14:42 tu_iioba01-alphafold3&lt;br /&gt;
[..]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see from the file permissions, the resulting workspace can only be accessed by you, not by other group members or other users.&lt;br /&gt;
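&lt;br /&gt;
As a sketch, working with workspaces typically looks like this (command names as provided by the common HPC workspace tools; check the [[BinAC2/Hardware_and_Architecture#Work | workspace tools]] page and &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt; on BinAC 2 for the exact options):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Create a workspace named &#039;myproject&#039; with a lifetime of 30 days&lt;br /&gt;
$ ws_allocate myproject 30&lt;br /&gt;
&lt;br /&gt;
# List your existing workspaces and their remaining lifetime&lt;br /&gt;
$ ws_list&lt;br /&gt;
&lt;br /&gt;
# Release a workspace once it is no longer needed&lt;br /&gt;
$ ws_release myproject&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;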
&lt;br /&gt;
== Scratch ==&lt;br /&gt;
&lt;br /&gt;
Each compute node provides local storage, which is much faster to access than the &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; file systems.&amp;lt;/br&amp;gt;&lt;br /&gt;
When you execute a job, a dedicated temporary directory will be assigned to it on the compute node. This is often referred to as the &amp;lt;code&amp;gt;scratch&amp;lt;/code&amp;gt; directory.&amp;lt;/br&amp;gt;&lt;br /&gt;
Programs frequently generate temporary data only needed during execution. If the program you are using offers an option for setting a temporary directory,&amp;lt;/br&amp;gt;&lt;br /&gt;
please configure it to use the &amp;lt;code&amp;gt;scratch&amp;lt;/code&amp;gt; directory.&amp;lt;/br&amp;gt;&lt;br /&gt;
You can use the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which will point to your job&#039;s &amp;lt;code&amp;gt;scratch&amp;lt;/code&amp;gt; directory.&lt;br /&gt;
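&lt;br /&gt;
For example, a job script could direct a program&#039;s temporary data to the scratch directory like this (&amp;lt;code&amp;gt;myprogram&amp;lt;/code&amp;gt; and its &amp;lt;code&amp;gt;--tmpdir&amp;lt;/code&amp;gt; flag are placeholders; the actual option name depends on your program):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# $TMPDIR points to the job&#039;s scratch directory on the compute node&lt;br /&gt;
echo &amp;quot;Using scratch directory: $TMPDIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Hypothetical example: tell your program to write temporary data to scratch&lt;br /&gt;
myprogram --tmpdir=$TMPDIR input.dat&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;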
&lt;br /&gt;
= Batch System Basics =&lt;br /&gt;
&lt;br /&gt;
On HPC clusters like BinAC 2, you don&#039;t run analyses directly on the login node.&lt;br /&gt;
Instead, you write a script and submit it as a job to the batch system.&lt;br /&gt;
BinAC 2 uses SLURM as its batch system.&lt;br /&gt;
The system then schedules the job to run on one of the available compute nodes, where the actual computation takes place.&lt;br /&gt;
&lt;br /&gt;
The cluster consists of compute nodes with different [[BinAC2/Hardware_and_Architecture#Compute_Nodes | hardware features]].&amp;lt;/br&amp;gt;&lt;br /&gt;
These hardware features are only available when submitting the jobs to the correct [[BinAC2/SLURM_Partitions | partitions]].&lt;br /&gt;
&lt;br /&gt;
The getting started guide only provides very basic SLURM information.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please read the extensive [[BinAC2/Slurm | SLURM documentation]].&lt;br /&gt;
&lt;br /&gt;
== Simple Script Job ==&lt;br /&gt;
&lt;br /&gt;
You will have to write job scripts in order to conduct your computations on BinAC 2.&lt;br /&gt;
Use your favourite text editor to create a simple job script called &#039;myjob.sh&#039;.&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Please note that there are differences between Windows and Linux line endings.&lt;br /&gt;
Make sure that your editor uses Linux line endings when you are using Windows.&lt;br /&gt;
You can check your line endings with &amp;lt;code&amp;gt;vim -b &amp;lt;your script&amp;gt;&amp;lt;/code&amp;gt;. Windows line endings will be displayed as &amp;lt;code&amp;gt;^M&amp;lt;/code&amp;gt;.&lt;br /&gt;
|}&lt;br /&gt;
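&lt;br /&gt;
If your script already contains Windows line endings, you can usually convert it in place (assuming the &amp;lt;code&amp;gt;dos2unix&amp;lt;/code&amp;gt; tool is available on the login node):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Convert Windows (CRLF) line endings to Linux (LF) line endings in place&lt;br /&gt;
$ dos2unix myjob.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;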
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --job-name=simple&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Scratch directory: $TMPDIR&amp;quot;&lt;br /&gt;
echo &amp;quot;Date:&amp;quot;&lt;br /&gt;
date&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;My job is running on node:&amp;quot;&lt;br /&gt;
hostname&lt;br /&gt;
uname -a&lt;br /&gt;
&lt;br /&gt;
sleep 240&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Basic SLURM commands == &lt;br /&gt;
&lt;br /&gt;
Submit the job script you wrote with &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ sbatch myjob.sh&lt;br /&gt;
Submitted batch job 75441&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Take note of your &amp;lt;code&amp;gt;jobID&amp;lt;/code&amp;gt;. The scheduler will reserve one core and 5000 MB of memory for 10 minutes on a compute node for your job.&amp;lt;/br&amp;gt;&lt;br /&gt;
The job should be scheduled within seconds if BinAC 2 is not fully busy.&lt;br /&gt;
The output will be stored in a file called &amp;lt;code&amp;gt;slurm-&amp;lt;JobID&amp;gt;.out&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
$ cat slurm-75441.out &lt;br /&gt;
Scratch directory: /scratch/75441&lt;br /&gt;
Date:&lt;br /&gt;
Thu Feb 13 09:56:41 AM CET 2025&lt;br /&gt;
My job is running on node:&lt;br /&gt;
node1-083&lt;br /&gt;
Linux node1-083 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15 12:04:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
There are tons of options, details and caveats for SLURM job scripts.&lt;br /&gt;
Most of them are explained in the [[BinAC2/Slurm | SLURM documentation]].&lt;br /&gt;
If you encounter any problems, just send a mail to hpcmaster@uni-tuebingen.de.&lt;br /&gt;
&lt;br /&gt;
You can get an overview of your queued and running jobs with &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[tu_iioba01@login01 ~]$ squeue --user=$USER&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
             75441   compute   simple tu_iioba  R       0:03      1 node1-083&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Let&#039;s assume you pulled a Homer and want to stop/kill/remove a running job.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
scancel &amp;lt;JobID&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== Best Practices == &lt;br /&gt;
&lt;br /&gt;
The scheduler will reserve computational resources (nodes, cores, GPUs, memory) for a specified period for you. By following some best practices, you can avoid common problems beforehand.&lt;br /&gt;
&lt;br /&gt;
=== Specify memory for your job ===&lt;br /&gt;
&lt;br /&gt;
Often we get tickets with question like &amp;quot;Why did the system kill my job?&amp;quot;.&lt;br /&gt;
Most often the user did not specify the required memory resources for the job. Then the following happens:&lt;br /&gt;
&lt;br /&gt;
The job is started on a compute node, where it shares the resources other jobs. Let us assume that the other jobs on this node occupy already 100 gigabyte of memory. Now your job tries to allocate 40 gigabyte of memory. As the compute node has only 128 gigabyte, your job crashes because it cannot allocate that much memory.&lt;br /&gt;
&lt;br /&gt;
You can make your life easier by specifying the required memory in your job script with:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#PBS -l mem=xxgb&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Then you have the guarantee that your job can allocate xx gigabyte of memory.&lt;br /&gt;
&lt;br /&gt;
If you do not know how much memory your job will need, look into the documentation of the tools you use or ask us.&lt;br /&gt;
We also started [https://wiki.bwhpc.de/e/Memory_Usage a wiki page] on which we will document some guidelines and pitfalls for specific tools.&lt;br /&gt;
&lt;br /&gt;
=== Use the reserved resources ===&lt;br /&gt;
&lt;br /&gt;
Reserved resources (nodes, cores, gpus, memory) are not available to other users and their jobs.&lt;br /&gt;
You have the responsibility that your programs utilize the reserved resources.&lt;br /&gt;
&lt;br /&gt;
An extreme example: You request a whole node (node=1:ppn=28), but your job uses just one core. The other 27 cores are idling. This is bad practice, so take care that the used programs really use the requested resources.&lt;br /&gt;
&lt;br /&gt;
Another example are tools that do not benefit from a increasing number of cores.&lt;br /&gt;
Please check the documentation of your tools and also check the feedback files that report the CPU efficiency of your job.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
CPU efficiency, 0-100%                      | 25.00&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
This job for example used only 25% of the available CPU resources.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Software =&lt;br /&gt;
&lt;br /&gt;
There are several mechanisms by which software can be installed on BinAC 2.&lt;br /&gt;
If you need software that is not installed on BinAC 2, open a ticket and we can find a way to provide the software on the cluster.&lt;br /&gt;
&lt;br /&gt;
== Environment Modules ==&lt;br /&gt;
&lt;br /&gt;
Environment modules are the &#039;classic&#039; way of providing software on clusters.&lt;br /&gt;
A module provides a specific software version and can be loaded on demand.&lt;br /&gt;
The module system then manipulates PATH and other environment variables so that the software can be used.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Show available modules&lt;br /&gt;
$ module avail&lt;br /&gt;
&lt;br /&gt;
# Load a module&lt;br /&gt;
$ module load bio/samtools/1.21&lt;br /&gt;
&lt;br /&gt;
# Show the module&#039;s help&lt;br /&gt;
$ module help bio/samtools/1.21&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A more detailed description of environment modules can be found [https://wiki.bwhpc.de/e/Environment_Modules on this wiki page].&lt;br /&gt;
&lt;br /&gt;
Sometimes a software package has so many dependencies, or you need a particular combination of tools, that environment modules cannot be used in a meaningful way.&lt;br /&gt;
Then other solutions like Conda environments or Singularity containers (see below) can be used.&lt;br /&gt;
&lt;br /&gt;
== Conda Environments ==&lt;br /&gt;
&lt;br /&gt;
Conda environments are a convenient way to create custom software environments on the cluster, as most scientific software is nowadays available as conda packages.&lt;br /&gt;
BinAC 2 already provides Conda via Miniforge.&lt;br /&gt;
You can find general documentation for using Conda [[Development/Conda | on this wiki page]].&lt;br /&gt;
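&lt;br /&gt;
As a sketch, creating and using a Conda environment typically looks like this (the environment and package names are just examples):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Create a new environment containing a specific package (example names)&lt;br /&gt;
$ conda create --name myenv samtools&lt;br /&gt;
&lt;br /&gt;
# Activate the environment; its tools are now on your PATH&lt;br /&gt;
$ conda activate myenv&lt;br /&gt;
&lt;br /&gt;
# Deactivate the environment when done&lt;br /&gt;
$ conda deactivate&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;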
&lt;br /&gt;
== Apptainer (formerly Singularity) ==&lt;br /&gt;
&lt;br /&gt;
Sometimes software is also available in a software container format.&lt;br /&gt;
Apptainer (formerly called Singularity) is installed on all BinAC 2 nodes. You can pull Apptainer containers or Docker images from registries onto BinAC 2 and use them.&lt;br /&gt;
You can also build new Apptainer containers on your own machine and copy them to BinAC 2.&lt;br /&gt;
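&lt;br /&gt;
For example, pulling a Docker image from a registry and running a command inside it typically looks like this (the image name is just an example):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Pull a Docker image and convert it to an Apptainer image file (example image)&lt;br /&gt;
$ apptainer pull docker://ubuntu:22.04&lt;br /&gt;
&lt;br /&gt;
# Run a command inside the resulting container&lt;br /&gt;
$ apptainer exec ubuntu_22.04.sif cat /etc/os-release&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;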
&lt;br /&gt;
Please note that Apptainer containers should be stored in the &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; file system.&lt;br /&gt;
We configured Apptainer such that containers stored in your home directory do not work.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15670</id>
		<title>BinAC2/Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15670"/>
		<updated>2026-01-07T08:47:56Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: /* GPU jobs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= General information about Slurm =&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwForCluster BinAC 2 requires the user to define the calculation as a sequence of commands or a single command, together with the required run time, number of CPU cores, and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload managing software. BinAC 2 uses the workload manager Slurm, so jobs are submitted and controlled via Slurm commands. Slurm queues and runs user jobs based on fair-sharing policies.&lt;br /&gt;
&lt;br /&gt;
= External Slurm documentation =&lt;br /&gt;
&lt;br /&gt;
You can find the official Slurm documentation and some other material here:&lt;br /&gt;
&lt;br /&gt;
* Slurm documentation: https://slurm.schedmd.com/documentation.html&lt;br /&gt;
* Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
* Slurm tutorials: https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
= SLURM terminology = &lt;br /&gt;
&lt;br /&gt;
SLURM knows and mirrors the division of the cluster into &#039;&#039;&#039;nodes&#039;&#039;&#039; with several &#039;&#039;&#039;cores&#039;&#039;&#039;. When queuing &#039;&#039;&#039;jobs&#039;&#039;&#039;, there are several ways of requesting resources and it is important to know which term means what in SLURM. Here are some basic SLURM terms:&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Job&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Job&lt;br /&gt;
: A job is a self-contained computation that may encompass multiple tasks and is given specific resources like individual CPUs/GPUs, a specific amount of RAM or entire nodes. These resources are said to have been allocated for the job.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Task&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Task&lt;br /&gt;
: A task is a single run of a single process. By default, one task is run per node and one CPU is assigned per task.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Partition&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Partition    &lt;br /&gt;
: A partition (usually called queue outside SLURM) is a waiting line in which jobs are put by users.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Socket&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Socket    &lt;br /&gt;
: Receptacle on the motherboard for one physically packaged processor (each of which can contain one or more cores).&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Core&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Core    &lt;br /&gt;
: A complete private set of registers, execution units, and retirement queues needed to execute programs.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Thread&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Thread    &lt;br /&gt;
: One or more hardware contexts within a single core. Each thread has the attributes of one core and is managed &amp;amp; scheduled as a single logical processor by the OS.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;CPU&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;CPU&lt;br /&gt;
: A &#039;&#039;&#039;CPU&#039;&#039;&#039; in Slurm means a &#039;&#039;&#039;single core&#039;&#039;&#039;. This is different from the more common terminology, where a CPU (a microprocessor chip) consists of multiple cores. Slurm uses the term &#039;&#039;&#039;sockets&#039;&#039;&#039; when talking about CPU chips. Depending upon system configuration, a CPU can be either a &#039;&#039;&#039;core&#039;&#039;&#039; or a &#039;&#039;&#039;thread&#039;&#039;&#039;. On BinAC 2, &#039;&#039;&#039;hyperthreading is activated on every machine&#039;&#039;&#039;. This means that the operating system and Slurm see each physical core as two logical cores.&lt;br /&gt;
&lt;br /&gt;
= Slurm Commands =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/salloc.html salloc] || Requests resources for an interactive job&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sstat.html sstat] || Displays status information about a running job&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
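&lt;br /&gt;
For example, you can inspect a queued or running job in detail with &amp;lt;code&amp;gt;scontrol&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Show detailed state information for one job (replace &amp;lt;JobID&amp;gt; with your job&#039;s ID)&lt;br /&gt;
$ scontrol show job &amp;lt;JobID&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;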
&lt;br /&gt;
== Interactive Jobs ==&lt;br /&gt;
&lt;br /&gt;
You can run interactive jobs for testing and developing your job scripts.&lt;br /&gt;
Several nodes are reserved for interactive work, so your jobs should start right away.&lt;br /&gt;
You can only submit one job to this partition at a time. A job can run for up to 10 hours (about one workday).&lt;br /&gt;
&lt;br /&gt;
This example command gives you 16 cores and 128 GB of memory for four hours on one of the reserved nodes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also use srun to request the same resources:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
srun --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb --pty bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Job Submission : sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted by using the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, when the batch job starts depends on the availability of the requested resources and the fair-sharing value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== sbatch Command Parameters ===&lt;br /&gt;
The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used from the command line or in your job script. The following table shows the syntax and provides examples for each option.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! Command line&lt;br /&gt;
! Job Script&lt;br /&gt;
! Purpose&lt;br /&gt;
! Example&lt;br /&gt;
! Default value&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t &#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;  or  &amp;lt;code&amp;gt;--time=&#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t 2:30:00&amp;lt;/code&amp;gt; Limits run time to 2h 30 min.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;-t 2-12&amp;lt;/code&amp;gt; Limits run time to 2 days and 12 hours.&lt;br /&gt;
| Depends on Slurm partition.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N &#039;&#039;count&#039;&#039;  or  --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
| &amp;lt;code&amp;gt;-N 1&amp;lt;/code&amp;gt; Run job on one node.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;-N 2&amp;lt;/code&amp;gt; Run job on two nodes (have to use MPI!)&lt;br /&gt;
| &lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n &#039;&#039;count&#039;&#039;  or  --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
| &amp;lt;code&amp;gt;-n 2&amp;lt;/code&amp;gt; launch two tasks in the job.&lt;br /&gt;
| One task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&amp;lt;br&amp;gt;(Replaces the option &amp;lt;code&amp;gt;ppn&amp;lt;/code&amp;gt; of MOAB.)&lt;br /&gt;
| &amp;lt;code&amp;gt;--ntasks-per-node=2&amp;lt;/code&amp;gt; Run 2 tasks per node&lt;br /&gt;
| 1 task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c &#039;&#039;count&#039;&#039; or --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
| &amp;lt;code&amp;gt;-c 2&amp;lt;/code&amp;gt; Request two CPUs per (MPI-)task.&lt;br /&gt;
| 1 CPU per (MPI-)task&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=&amp;lt;size&amp;gt;[units]&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory per node, in megabytes by default.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;[units]&amp;lt;/code&amp;gt; can be one of &amp;lt;code&amp;gt;[K&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;M&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;G&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;T]&amp;lt;/code&amp;gt;.&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=10g&amp;lt;/code&amp;gt; Request 10GB RAM per node &amp;lt;/br&amp;gt; &amp;lt;code&amp;gt;--mem=0&amp;lt;/code&amp;gt; Request all memory on node&lt;br /&gt;
| Depends on Slurm configuration.&amp;lt;/br&amp;gt;It is better to specify &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; in every case.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum memory required per allocated CPU.&amp;lt;br&amp;gt;(Replaces the option pmem of MOAB. You should usually &amp;lt;br&amp;gt; omit this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state&amp;lt;br&amp;gt;changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J &#039;&#039;name&#039;&#039; or --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission &amp;lt;br&amp;gt; environment are propagated to the launched application. Default &amp;lt;br&amp;gt; is ALL. If adding an environment variable to the submission&amp;lt;br&amp;gt; environment is intended, the argument ALL must be added.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A &#039;&#039;group-name&#039;&#039; or --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may &amp;lt;br&amp;gt; need this option if your account is assigned to more &amp;lt;br&amp;gt; than one group. The command &amp;quot;scontrol show job&amp;quot; shows the project &amp;lt;br&amp;gt; group the job is accounted on behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p &#039;&#039;queue-name&#039;&#039; or --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C &#039;&#039;LSDF&#039;&#039; or --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF Filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
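As a sketch, a jobscript header combining several of the options above could look like this (the job name and group name are hypothetical placeholders):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --job-name=my-job          # -J: job name (hypothetical name)
#SBATCH --partition=compute        # -p: requested queue
#SBATCH --output=my-job_%j.out     # file for job output (%j expands to the job ID)
#SBATCH --error=my-job_%j.err      # -e: file for job error messages
#SBATCH --account=my-group         # -A: only needed if you belong to several groups
```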
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== sbatch --partition  &#039;&#039;queues&#039;&#039; ====&lt;br /&gt;
Queue classes define the maximum resources, such as walltime, number of nodes, and processes per node, for each queue of the compute system. Details can be found here:&lt;br /&gt;
* [[BinAC2/SLURM_Partitions|BinAC 2 partitions]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== sbatch Examples ===&lt;br /&gt;
&lt;br /&gt;
If you are coming from Moab/Torque on BinAC 1, or if you are new to HPC/Slurm, the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; options may be confusing. The following examples provide an orientation for running typical workloads on BinAC 2.&lt;br /&gt;
&lt;br /&gt;
You can find every file mentioned on this Wiki page on BinAC 2 at: &amp;lt;code&amp;gt;/pfs/10/project/examples&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Serial Programs ====&lt;br /&gt;
For serial programs that use only one process, you can omit most of the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; parameters, as the default values are sufficient.&lt;br /&gt;
&lt;br /&gt;
Suppose you want to submit a serial job that runs the script &amp;lt;code&amp;gt;serial_job.sh&amp;lt;/code&amp;gt; and requires 5000 MB of main memory and 10 minutes of wall clock time. Slurm will allocate one &#039;&#039;&#039;physical&#039;&#039;&#039; core to your job.&lt;br /&gt;
&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute -t 10:00 --mem=5000m  serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) add after the initial line of your script &#039;&#039;&#039;serial_job.sh&#039;&#039;&#039; the lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --job-name=simple-serial-job&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
and execute the modified script with the command line option &#039;&#039;--partition=compute&#039;&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command-line options override script options.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded programs run their processes on multiple threads and share resources such as memory.&amp;lt;br&amp;gt;&lt;br /&gt;
You may use a program that includes a built-in option for multithreading (e.g., options like &amp;lt;code&amp;gt;--threads&amp;lt;/code&amp;gt;).&amp;lt;br&amp;gt;&lt;br /&gt;
For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) the number of threads is defined by the environment variable &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;. By default, this variable is set to 1 (&amp;lt;code&amp;gt;OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Important:&#039;&#039;&#039; Hyperthreading is activated on bwForCluster BinAC 2. Hyperthreading can be beneficial for some applications and codes, but it can also degrade performance in other cases. We therefore recommend running a small test job with and without hyperthreading to determine the best choice.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;a) Program with built-in multithreading option&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This example uses the common bioinformatics software &amp;lt;code&amp;gt;samtools&amp;lt;/code&amp;gt; to demonstrate built-in multithreading.&lt;br /&gt;
&lt;br /&gt;
The module &amp;lt;code&amp;gt;bio/samtools/1.21&amp;lt;/code&amp;gt; provides an example jobscript that requests 4 CPUs and runs &amp;lt;code&amp;gt;samtools sort&amp;lt;/code&amp;gt; with 4 threads.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --time=19:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
[...]&lt;br /&gt;
samtools sort -@ 4 sample.bam -o sample.sorted.bam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can use the example jobscript with this command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch /opt/bwhpc/common/bio/samtools/1.21/bwhpc-examples/binac2-samtools-1.21-bwhpc-examples.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;b) OpenMP&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will run an example OpenMP Hello-World program. The jobscript looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --time=1:00&lt;br /&gt;
#SBATCH --mem=5000m   &lt;br /&gt;
#SBATCH -J OpenMP-Hello-World&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_JOB_CPUS_PER_NODE}/2))&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Executable running on ${SLURM_JOB_CPUS_PER_NODE} cores with ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run parallel Hello World&lt;br /&gt;
/pfs/10/project/examples/openmp_hello_world&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit the job to the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition and get the output (in the stdout-file)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --partition=compute /pfs/10/project/examples/openmp_hello_world.sh&lt;br /&gt;
&lt;br /&gt;
Executable running on 4 cores with 2 threads&lt;br /&gt;
Hello from process: 0&lt;br /&gt;
Hello from process: 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI-jobs on batch nodes, generate a wrapper script &amp;lt;code&amp;gt;mpi_hello_world.sh&amp;lt;/code&amp;gt; for &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
#SBATCH --cpus-per-task=2&lt;br /&gt;
#SBATCH --mem-per-cpu=2000&lt;br /&gt;
#SBATCH --time=05:00&lt;br /&gt;
&lt;br /&gt;
# Load the MPI implementation of your choice&lt;br /&gt;
module load mpi/openmpi/4.1-gnu-14.2&lt;br /&gt;
&lt;br /&gt;
# Run your MPI program&lt;br /&gt;
mpirun --bind-to core --map-by core --report-bindings mpi_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add the mpirun option &amp;lt;code&amp;gt;-n &amp;lt;number_of_processes&amp;gt;&amp;lt;/code&amp;gt; or any other option defining the number of processes or nodes, since Slurm instructs mpirun about the number of processes and the node hostnames.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;ALWAYS&#039;&#039;&#039; use the MPI options &amp;lt;code&amp;gt;--bind-to core&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--map-by core|socket|node&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please type &amp;lt;code&amp;gt;man mpirun&amp;lt;/code&amp;gt; for an explanation of the different arguments of the mpirun option &amp;lt;code&amp;gt;--map-by&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The above jobscript runs four OpenMPI tasks, distributed across two nodes. Because of hyperthreading you have to set &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, so that each MPI task gets one physical core. If you omit &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, MPI will fail.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Not all compute nodes are connected via Infiniband. Tell Slurm you want Infiniband via &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt; when submitting or add &amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt; to your jobscript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --constraint=ib /pfs/10/project/examples/mpi_hello_world.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will run a simple Hello World program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
Hello world from processor node2-031, rank 3 out of 4 processors&lt;br /&gt;
Hello world from processor node2-031, rank 2 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 1 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 0 out of 4 processors&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded + MPI parallel Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded + MPI parallel programs operate faster than serial programs on multiple CPUs with multiple cores. All threads of one process share resources such as memory. MPI tasks, in contrast, do not share memory but can be spawned across different nodes. &#039;&#039;&#039;Because hyperthreading is enabled on BinAC 2, the option --cpus-per-task (-c) must be set to 2*n if you want to use n threads.&#039;&#039;&#039;&lt;br /&gt;
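The rule above can be sketched with plain shell arithmetic. The value of SLURM_CPUS_PER_TASK below is a hypothetical stand-in for what Slurm would set for a job submitted with --cpus-per-task=8:&lt;br /&gt;

```shell
# Hypothetical value: what Slurm would set for a job with --cpus-per-task=8
SLURM_CPUS_PER_TASK=8

# With hyperthreading, two logical CPUs correspond to one physical core,
# so run half as many threads as allocated CPUs.
export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK / 2))

echo "${OMP_NUM_THREADS}"   # prints 4
```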
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
===== OpenMPI with Multithreading =====&lt;br /&gt;
Multiple MPI tasks using &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; must be launched with the MPI parallel program &#039;&#039;&#039;mpirun&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) the number of threads is defined by the environment variable OMP_NUM_THREADS. By default, this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;For OpenMPI&#039;&#039;&#039;, a jobscript &#039;&#039;job_ompi_omp.sh&#039;&#039; that runs the MPI program &#039;&#039;ompi_omp_program&#039;&#039; with 4 tasks and 28 threads per task, requiring 3000 MB of physical memory per thread (with 28 threads per MPI task this amounts to 28*3000 MB = 84000 MB per MPI task) and a total wall clock time of 3 hours, looks like this:&lt;br /&gt;
&amp;lt;!--b)--&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=4&lt;br /&gt;
#SBATCH --cpus-per-task=56&lt;br /&gt;
#SBATCH --time=03:00:00&lt;br /&gt;
#SBATCH --mem=83gb    # 84000 MB = 84000/1024 GB ≈ 82 GB, rounded up to 83gb&lt;br /&gt;
#SBATCH --export=ALL,MPI_MODULE=mpi/openmpi/4.1-gnu-14.2,EXECUTABLE=./ompi_omp_program&lt;br /&gt;
#SBATCH --output=&amp;quot;parprog_hybrid_%j.out&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
# Use when a defined module environment related to OpenMPI is wished&lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_CPUS_PER_TASK}/2))&lt;br /&gt;
export MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by socket:PE=${OMP_NUM_THREADS} -report-bindings&amp;quot;&lt;br /&gt;
export NUM_CORES=$((${SLURM_NTASKS}*${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXECUTABLE} running on ${NUM_CORES} cores with ${SLURM_NTASKS} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpirun -n ${SLURM_NTASKS} ${MPIRUN_OPTIONS} ${EXECUTABLE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_ompi_omp.sh&#039;&#039;&#039; by command sbatch:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute ./job_ompi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* With the mpirun option &#039;&#039;--bind-to core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;--map-by node:PE=&amp;lt;value&amp;gt;&#039;&#039;, neighboring MPI tasks will be attached to different nodes and each MPI task is bound to &amp;lt;value&amp;gt; cores of a node. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* The option &#039;&#039;-report-bindings&#039;&#039; shows the bindings between MPI tasks and physical cores.&lt;br /&gt;
* The mpirun-options &#039;&#039;&#039;--bind-to core&#039;&#039;&#039;, &#039;&#039;&#039;--map-by socket|...|node:PE=&amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== GPU jobs ====&lt;br /&gt;
&lt;br /&gt;
The nodes in the &amp;lt;code&amp;gt;gpu&amp;lt;/code&amp;gt; queue have 2 or 4 NVIDIA A30/A100/H200 GPUs. Just submitting a job to these queues is not enough to also allocate one or more GPUs; you have to do so using the &amp;quot;--gres=gpu&amp;quot; parameter. You have to specify how many GPUs your job needs, e.g. &amp;quot;--gres=gpu:a30:2&amp;quot; will request two NVIDIA A30 GPUs.&lt;br /&gt;
&lt;br /&gt;
The GPU nodes are shared between multiple jobs if the jobs don&#039;t request all the GPUs in a node and there are enough resources to run more than one job. The individual GPUs are always bound to a single job and will not be shared between different jobs.&lt;br /&gt;
&lt;br /&gt;
a) add after the initial line of your script job.sh the line including the&lt;br /&gt;
information about the GPU usage:&amp;lt;br&amp;gt;   #SBATCH --gres=gpu:a30:2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=40&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=4000&lt;br /&gt;
#SBATCH --gres=gpu:a30:2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or b) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p &amp;lt;queue&amp;gt; -n 40 -t 02:00:00 --mem 4000 --gres=gpu:a30:2 job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
If you start an interactive session on one of the GPU nodes, you can use the &amp;quot;nvidia-smi&amp;quot; command to list the GPUs allocated to your job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ nvidia-smi&lt;br /&gt;
Sun Mar 29 15:20:05 2020       &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |&lt;br /&gt;
|-------------------------------+----------------------+----------------------+&lt;br /&gt;
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |&lt;br /&gt;
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |&lt;br /&gt;
|===============================+======================+======================|&lt;br /&gt;
|   0  Tesla V100-SXM2...  Off  | 00000000:3A:00.0 Off |                    0 |&lt;br /&gt;
| N/A   29C    P0    39W / 300W |      9MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |&lt;br /&gt;
| N/A   30C    P0    41W / 300W |      8MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
                                                                               &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| Processes:                                                       GPU Memory |&lt;br /&gt;
|  GPU       PID   Type   Process name                             Usage      |&lt;br /&gt;
|=============================================================================|&lt;br /&gt;
|    0     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
|    1     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Upon successful GPU resource allocation, Slurm will set the environment variable &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt; appropriately. &amp;lt;b&amp;gt;Do not change this variable!&amp;lt;/b&amp;gt;&lt;br /&gt;
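As a sketch of how a jobscript can read (but not modify) this variable, the device list below is a hypothetical value as Slurm might set it for a job with --gres=gpu:a30:2:&lt;br /&gt;

```shell
# Hypothetical value: Slurm might set this for a job with --gres=gpu:a30:2
CUDA_VISIBLE_DEVICES=0,1

# Count the GPUs visible to this job by counting comma-separated fields
n_gpus=$(echo "${CUDA_VISIBLE_DEVICES}" | awk -F, '{print NF}')

echo "allocated GPUs: ${n_gpus}"   # prints: allocated GPUs: 2
```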
&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
When using OpenMPI, the underlying communication infrastructure (UCX and Open MPI&#039;s BTL) is CUDA-aware.&lt;br /&gt;
However, there may be warnings, e.g. when running:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load compiler/gnu/10.3 mpi/openmpi devel/cuda&lt;br /&gt;
$ mpirun -np 2 ./mpi_cuda_app&lt;br /&gt;
--------------------------------------&lt;br /&gt;
WARNING: There are more than one active ports on host &#039;uc2n520&#039;, but the&lt;br /&gt;
default subnet GID prefix was detected on more than one of these&lt;br /&gt;
ports.  If these ports are connected to different physical IB&lt;br /&gt;
networks, this configuration will fail in Open MPI.  This version of&lt;br /&gt;
Open MPI requires that every physically separate IB subnet that is&lt;br /&gt;
used between connected MPI processes must have different subnet ID&lt;br /&gt;
values.&lt;br /&gt;
&lt;br /&gt;
Please see this FAQ entry for more details:&lt;br /&gt;
&lt;br /&gt;
  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid&lt;br /&gt;
&lt;br /&gt;
NOTE: You can turn off this warning by setting the MCA parameter&lt;br /&gt;
      btl_openib_warn_default_gid_prefix to 0.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please run OpenMPI&#039;s mpirun using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl_openib_warn_default_gid_prefix 0 -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or disable the (older) communication layer, the Byte Transfer Layer (short BTL), altogether:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl ^openib -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Please note that CUDA, as of v12.8, is only officially supported with GCC versions up to GCC-11.)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Start time of job or resources : squeue --start ==&lt;br /&gt;
Any user can use this command to display the estimated start time of a job, based on historical usage, the earliest available reservable resources, and the priority-based backlog. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by &#039;&#039;&#039;any user&#039;&#039;&#039;. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== List of your submitted jobs : squeue ==&lt;br /&gt;
Displays information about YOUR active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by any user.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Flags ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Flag !! Description&lt;br /&gt;
|-&lt;br /&gt;
| -l, --long&lt;br /&gt;
| Report more of the available information for the selected jobs or job steps, subject to any constraints specified.&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples ===&lt;br /&gt;
&#039;&#039;squeue&#039;&#039; example on BinAC 2 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18088744    single CPV.sbat   ab1234 PD       0:00      1 (Priority)&lt;br /&gt;
          18098414  multiple CPV.sbat   ab1234 PD       0:00      2 (Priority) &lt;br /&gt;
          18090089  multiple CPV.sbat   ab1234  R       2:27      2 uc2n[127-128]&lt;br /&gt;
$ squeue -l&lt;br /&gt;
            JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) &lt;br /&gt;
         18088654    single CPV.sbat   ab1234 COMPLETI       4:29   2:00:00      1 uc2n374&lt;br /&gt;
         18088785    single CPV.sbat   ab1234  PENDING       0:00   2:00:00      1 (Priority)&lt;br /&gt;
         18098414  multiple CPV.sbat   ab1234  PENDING       0:00   2:00:00      2 (Priority)&lt;br /&gt;
         18088683    single CPV.sbat   ab1234  RUNNING       0:14   2:00:00      1 uc2n413  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The output of &#039;&#039;squeue&#039;&#039; shows how many of your jobs are running or pending and how many nodes are in use by your jobs.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Shows free resources : sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be used by any user or administrator. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Example ===&lt;br /&gt;
* The following command displays which resources are available for immediate use in each partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle&lt;br /&gt;
Partition dev_multiple  :      8 nodes idle&lt;br /&gt;
Partition multiple      :    332 nodes idle&lt;br /&gt;
Partition dev_single    :      4 nodes idle&lt;br /&gt;
Partition single        :     76 nodes idle&lt;br /&gt;
Partition long          :     80 nodes idle&lt;br /&gt;
Partition fat           :      5 nodes idle&lt;br /&gt;
Partition dev_special   :    342 nodes idle&lt;br /&gt;
Partition special       :    342 nodes idle&lt;br /&gt;
Partition dev_multiple_e:      7 nodes idle&lt;br /&gt;
Partition multiple_e    :    335 nodes idle&lt;br /&gt;
Partition gpu_4         :     12 nodes idle&lt;br /&gt;
Partition gpu_8         :      6 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* In the above example, jobs in all partitions can be run immediately.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Detailed job information : scontrol show job ==&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified one. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
* End users can use scontrol show job to view the status of their &#039;&#039;&#039;own jobs&#039;&#039;&#039; only. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Arguments ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Option !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;width:12%;&amp;quot; &lt;br /&gt;
| -d&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Detailed mode&lt;br /&gt;
| Example: Display the state with jobid 18089884 in detailed mode. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt;scontrol -d show job 18089884&amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scontrol show job Example ===&lt;br /&gt;
Here is an example from BinAC 2.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue    # show my own jobs (here the userid is replaced!)&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18089884  multiple CPV.sbat   bq0742  R      33:44      2 uc2n[165-166]&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my pending job with jobid 18089884&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 18089884&lt;br /&gt;
&lt;br /&gt;
JobId=18089884 JobName=CPV.sbatch&lt;br /&gt;
   UserId=bq0742(8946) GroupId=scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=3 Nice=0 Account=kit QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:35:06 TimeLimit=02:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2020-03-16T14:14:54 EligibleTime=2020-03-16T14:14:54&lt;br /&gt;
   AccrueTime=2020-03-16T14:14:54&lt;br /&gt;
   StartTime=2020-03-16T15:12:51 EndTime=2020-03-16T17:12:51 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-16T15:12:51&lt;br /&gt;
   Partition=multiple AllocNode:Sid=uc2n995:5064&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc2n[165-166]&lt;br /&gt;
   BatchHost=uc2n165&lt;br /&gt;
   NumNodes=2 NumCPUs=160 NumTasks=80 CPUs/Task=1 ReqB:S:C:T=0:0:*:1&lt;br /&gt;
   TRES=cpu=160,mem=96320M,node=2,billing=160&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=40:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=40 MinMemoryCPU=1204M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/CPV.sbatch&lt;br /&gt;
   WorkDir=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin&lt;br /&gt;
   StdErr=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   Power=&lt;br /&gt;
   MailUser=(null) MailType=NONE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
You can use standard Linux pipe commands to filter the very detailed scontrol show job output.&lt;br /&gt;
* What state is the job in?&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol show job 18089884 | grep -i State&lt;br /&gt;
   JobState=COMPLETED Reason=None Dependency=(null)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Cancel Slurm Jobs ==&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel).   &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
scancel is used to signal or cancel jobs, job arrays or job steps. The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Flag !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -i, --interactive&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Interactive mode.&lt;br /&gt;
| Cancel the job 987654 interactively. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -i 987654 &amp;lt;/pre&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| -t, --state&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Restrict the scancel operation to jobs in a certain state. &amp;lt;br&amp;gt; &amp;quot;job_state_name&amp;quot; may have a value of either &amp;quot;PENDING&amp;quot;, &amp;quot;RUNNING&amp;quot; or &amp;quot;SUSPENDED&amp;quot;.&lt;br /&gt;
| Cancel all jobs in state &amp;quot;PENDING&amp;quot;. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -t &amp;quot;PENDING&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Resource Managers =&lt;br /&gt;
=== Batch Job (Slurm) Variables ===&lt;br /&gt;
The following environment variables of Slurm are added to your environment once your job has started&lt;br /&gt;
&amp;lt;small&amp;gt;(only an excerpt of the most important ones)&amp;lt;/small&amp;gt;.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Environment !! Brief explanation&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_CPUS_PER_NODE &lt;br /&gt;
| Number of processes per node dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NODELIST &lt;br /&gt;
| List of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NUM_NODES &lt;br /&gt;
| Number of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_MEM_PER_NODE &lt;br /&gt;
| Memory per node dedicated to the job &lt;br /&gt;
|- &lt;br /&gt;
| SLURM_NPROCS&lt;br /&gt;
| Total number of processes dedicated to the job &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CLUSTER_NAME&lt;br /&gt;
| Name of the cluster executing the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CPUS_PER_TASK &lt;br /&gt;
| Number of CPUs requested per task&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ACCOUNT&lt;br /&gt;
| Account name &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ID&lt;br /&gt;
| Job ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_NAME&lt;br /&gt;
| Job Name&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_PARTITION&lt;br /&gt;
| Partition/queue running the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_UID&lt;br /&gt;
| User ID of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_SUBMIT_DIR&lt;br /&gt;
| Job submit folder.  The directory from which sbatch was invoked. &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_USER&lt;br /&gt;
| User name of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_RESTART_COUNT&lt;br /&gt;
| Number of times job has restarted&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_PROCID&lt;br /&gt;
| Task ID (MPI rank)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_NTASKS&lt;br /&gt;
| The total number of tasks available for the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_ID&lt;br /&gt;
| Job step ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_NUM_TASKS&lt;br /&gt;
| Task count (number of MPI ranks)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_CONSTRAINT&lt;br /&gt;
| Job constraints&lt;br /&gt;
|}&lt;br /&gt;
See also:&lt;br /&gt;
* [https://slurm.schedmd.com/sbatch.html#lbAI Slurm input and output environment variables]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Exit Codes ===&lt;br /&gt;
A job&#039;s exit code (also known as exit status, return code and completion code) is captured by SLURM and saved as part of the job record. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Any non-zero exit code will be assumed to be a job failure and will result in a Job State of FAILED with a reason of &amp;quot;NonZeroExitCode&amp;quot;.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The exit code is an 8 bit unsigned number ranging between 0 and 255. While it is possible for a job to return a negative exit code, SLURM will display it as an unsigned value in the 0 - 255 range.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
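The wrap-around into the 0 - 255 range can be checked directly in any Bash shell (a generic sketch, not specific to BinAC 2):&lt;br /&gt;

```shell
# Exit codes are truncated to 8 bits, i.e. reduced modulo 256:
bash -c 'exit 300'
echo $?    # prints 44 (300 mod 256)
```

Similarly, a process terminated by signal ''n'' is reported by the shell with exit code 128+''n'' (e.g. 137 for SIGKILL).&lt;br /&gt;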
==== Displaying Exit Codes and Signals ====&lt;br /&gt;
SLURM displays a job&#039;s exit code in the output of the &#039;&#039;&#039;scontrol show job&#039;&#039;&#039; command and in the sview utility.&lt;br /&gt;
&amp;lt;br&amp;gt; &lt;br /&gt;
When a signal was responsible for a job or step&#039;s termination, the signal number is displayed after the exit code, separated by a colon (:).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Submitting Termination Signal ====&lt;br /&gt;
Here is an example of how to &#039;save&#039; the exit status of a command in a typical job script.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
mpirun  -np &amp;lt;#cores&amp;gt;  &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; ... (options)  2&amp;gt;&amp;amp;1&lt;br /&gt;
exit_code=$?&lt;br /&gt;
[ &amp;quot;$exit_code&amp;quot; -eq 0 ] &amp;amp;&amp;amp; echo &amp;quot;all clean...&amp;quot; || \&lt;br /&gt;
   echo &amp;quot;Executable &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; finished with exit code ${exit_code}&amp;quot;&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Do not prefix mpirun with &#039;&#039;&#039;time&#039;&#039;&#039;! The exit code would then be the one returned by the first program (time), not by your executable.&lt;br /&gt;
* You do not need an &#039;&#039;&#039;exit $exit_code&#039;&#039;&#039; in the scripts.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[#top|Back to top]]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15669</id>
		<title>BinAC2/Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15669"/>
		<updated>2026-01-07T08:46:11Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: /* GPU jobs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= General information about Slurm =&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwForCluster BinAC 2 requires the user to define the calculation as a single command or a sequence of commands, together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload management software. On BinAC 2 this software is Slurm, so all job submissions are performed with Slurm commands. Slurm queues and runs user jobs based on fair-share policies.&lt;br /&gt;
&lt;br /&gt;
= External Slurm documentation =&lt;br /&gt;
&lt;br /&gt;
You can find the official Slurm documentation and some other material here:&lt;br /&gt;
&lt;br /&gt;
* Slurm documentation: https://slurm.schedmd.com/documentation.html&lt;br /&gt;
* Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
* Slurm tutorials: https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
= SLURM terminology = &lt;br /&gt;
&lt;br /&gt;
SLURM knows and mirrors the division of the cluster into &#039;&#039;&#039;nodes&#039;&#039;&#039; with several &#039;&#039;&#039;cores&#039;&#039;&#039;. When queuing &#039;&#039;&#039;jobs&#039;&#039;&#039;, there are several ways of requesting resources and it is important to know which term means what in SLURM. Here are some basic SLURM terms:&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Job&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Job&lt;br /&gt;
: A job is a self-contained computation that may encompass multiple tasks and is given specific resources like individual CPUs/GPUs, a specific amount of RAM or entire nodes. These resources are said to have been allocated for the job.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Task&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Task&lt;br /&gt;
: A task is a single run of a single process. By default, one task is run per node and one CPU is assigned per task.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Partition&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Partition    &lt;br /&gt;
: A partition (usually called queue outside SLURM) is a waiting line in which jobs are put by users.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Socket&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Socket    &lt;br /&gt;
: Receptacle on the motherboard for one physically packaged processor (each of which can contain one or more cores).&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Core&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Core    &lt;br /&gt;
: A complete private set of registers, execution units, and retirement queues needed to execute programs.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Thread&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Thread    &lt;br /&gt;
: One or more hardware contexts within a single core. Each thread has the attributes of one core and is managed &amp;amp; scheduled as a single logical processor by the OS.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;CPU&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;CPU&lt;br /&gt;
: A &#039;&#039;&#039;CPU&#039;&#039;&#039; in Slurm means a &#039;&#039;&#039;single core&#039;&#039;&#039;. This is different from the more common terminology, where a CPU (a microprocessor chip) consists of multiple cores. Slurm uses the term &#039;&#039;&#039;sockets&#039;&#039;&#039; when talking about CPU chips. Depending upon system configuration, a CPU can be either a &#039;&#039;&#039;core&#039;&#039;&#039; or a &#039;&#039;&#039;thread&#039;&#039;&#039;. On &#039;&#039;&#039;BinAC 2 Hyperthreading is activated on every machine&#039;&#039;&#039;. This means that the operating system and Slurm sees each physical core as two logical cores.&lt;br /&gt;
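As a quick sketch of this counting (the node size is a made-up example, not a specific BinAC 2 node type): with hyperthreading, Slurm counts twice as many &#039;&#039;&#039;CPUs&#039;&#039;&#039; (logical cores) as there are physical cores.&lt;br /&gt;

```shell
# Hypothetical node with 32 physical cores: with hyperthreading enabled,
# Slurm sees twice as many logical CPUs ("CPUs" in Slurm terminology).
physical_cores=32
logical_cpus=$((2 * physical_cores))
echo "$logical_cpus"    # prints 64
```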
&lt;br /&gt;
= Slurm Commands =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/salloc.html salloc] || Requests resources for an interactive job&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sstat.html sstat] || Displays status information about a running job&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Interactive Jobs ==&lt;br /&gt;
&lt;br /&gt;
You can run interactive jobs for testing and developing your job scripts.&lt;br /&gt;
Several nodes are reserved for interactive work, so your jobs should start right away.&lt;br /&gt;
You can only submit one job to this partition at a time. A job can run for up to 10 hours (about one workday).&lt;br /&gt;
&lt;br /&gt;
This example command gives you 16 cores and 128 GB of memory for four hours on one of the reserved nodes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also use srun to request the same resources:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
srun --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb --pty bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Job Submission : sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. When the job starts, however, depends on the availability of the requested resources and on the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== sbatch Command Parameters ===&lt;br /&gt;
The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used from the command line or in your job script. The following table shows the syntax and provides examples for each option.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! Command line&lt;br /&gt;
! Job Script&lt;br /&gt;
! Purpose&lt;br /&gt;
! Example&lt;br /&gt;
! Default value&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t &#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;  or  &amp;lt;code&amp;gt;--time=&#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t 2:30:00&amp;lt;/code&amp;gt; Limits run time to 2h 30 min.&amp;lt;br&amp;gt;&amp;lt;code&amp;gt;-t 2-12&amp;lt;/code&amp;gt; Limits run time to 2 days and 12 hours.&lt;br /&gt;
| Depends on Slurm partition.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N &#039;&#039;count&#039;&#039;  or  --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
| &amp;lt;code&amp;gt;-N 1&amp;lt;/code&amp;gt; Run job on one node.&amp;lt;br&amp;gt;&amp;lt;code&amp;gt;-N 2&amp;lt;/code&amp;gt; Run job on two nodes (requires MPI!)&lt;br /&gt;
| &lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n &#039;&#039;count&#039;&#039;  or  --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
| &amp;lt;code&amp;gt;-n 2&amp;lt;/code&amp;gt; launch two tasks in the job.&lt;br /&gt;
| One task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&amp;lt;br&amp;gt;(Replaces the option &amp;lt;code&amp;gt;ppn&amp;lt;/code&amp;gt; of MOAB.)&lt;br /&gt;
| &amp;lt;code&amp;gt;--ntasks-per-node=2&amp;lt;/code&amp;gt; Run 2 tasks per node&lt;br /&gt;
| 1 task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c &#039;&#039;count&#039;&#039; or --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
| &amp;lt;code&amp;gt;-c 2&amp;lt;/code&amp;gt; Request two CPUs per (MPI-)task.&lt;br /&gt;
| 1 CPU per (MPI-)task&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=&amp;lt;size&amp;gt;[units]&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --mem=&amp;lt;size&amp;gt;[units] &lt;br /&gt;
| Memory per node.&amp;lt;br&amp;gt;&amp;lt;code&amp;gt;[units]&amp;lt;/code&amp;gt; can be one of &amp;lt;code&amp;gt;[K&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;M&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;G&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;T]&amp;lt;/code&amp;gt;; without a unit the value is taken as megabytes.&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=10g&amp;lt;/code&amp;gt; Request 10 GB RAM per node &amp;lt;br&amp;gt; &amp;lt;code&amp;gt;--mem=0&amp;lt;/code&amp;gt; Request all memory on the node&lt;br /&gt;
| Depends on the Slurm configuration.&amp;lt;br&amp;gt;It is better to specify &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; in every case.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum memory required per allocated CPU.&amp;lt;br&amp;gt;(Replaces the option pmem of MOAB. You should usually &amp;lt;br&amp;gt; omit this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state&amp;lt;br&amp;gt;changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J &#039;&#039;name&#039;&#039; or --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission &amp;lt;br&amp;gt; environment are propagated to the launched application. Default &amp;lt;br&amp;gt; is ALL. If adding an environment variable to the submission&amp;lt;br&amp;gt; environment is intended, the argument ALL must be added.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A &#039;&#039;group-name&#039;&#039; or --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge resources used by this job to the specified group. You may &amp;lt;br&amp;gt; need this option if your account is assigned to more &amp;lt;br&amp;gt; than one group. With the command &amp;quot;scontrol show job&amp;quot; the project &amp;lt;br&amp;gt; group the job is accounted on is shown behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p &#039;&#039;queue-name&#039;&#039; or --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C &#039;&#039;LSDF&#039;&#039; or --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF Filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== sbatch --partition  &#039;&#039;queues&#039;&#039; ====&lt;br /&gt;
Queue classes define the maximum resources per queue of the compute system, such as walltime, nodes and processes per node. Details can be found here:&lt;br /&gt;
* [[BinAC2/SLURM_Partitions|BinAC 2 partitions]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== sbatch Examples ===&lt;br /&gt;
&lt;br /&gt;
If you are coming from Moab/Torque on BinAC 1 or you are new to HPC/Slurm, the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; options may confuse you. The following examples give an orientation on how to run typical workloads on BinAC 2.&lt;br /&gt;
&lt;br /&gt;
You can find every file mentioned on this Wiki page on BinAC 2 at: &amp;lt;code&amp;gt;/pfs/10/project/examples&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Serial Programs ====&lt;br /&gt;
When you use serial programs that use only one process, you can omit most of the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; parameters, as the default values are sufficient.&lt;br /&gt;
&lt;br /&gt;
The following submits a serial job that runs the script &amp;lt;code&amp;gt;serial_job.sh&amp;lt;/code&amp;gt; and requires 5000 MB of main memory and 10 minutes of wall clock time. Slurm will allocate one &#039;&#039;&#039;physical&#039;&#039;&#039; core to your job.&lt;br /&gt;
&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute -t 10:00 --mem=5000m  serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) add after the initial line of your script &#039;&#039;&#039;serial_job.sh&#039;&#039;&#039; the lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --job-name=simple-serial-job&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
and execute the modified script with the command line option &#039;&#039;--partition=compute&#039;&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=compute serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that sbatch command line options overrule script options.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded programs run their processes on multiple threads and share resources such as memory.&amp;lt;br&amp;gt;&lt;br /&gt;
You may use a program that includes a built-in option for multithreading (e.g., options like &amp;lt;code&amp;gt;--threads&amp;lt;/code&amp;gt;).&amp;lt;br&amp;gt;&lt;br /&gt;
For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) a number of threads is defined by the environment variable &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;. By default, this variable is set to 1 (&amp;lt;code&amp;gt;OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Important:&#039;&#039;&#039; Hyperthreading is activated on bwForCluster BinAC 2. Hyperthreading can be beneficial for some applications and codes, but it can also degrade performance in other cases. We therefore recommend running a small test job with and without hyperthreading to determine the best choice.&lt;br /&gt;
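A minimal sketch for deriving the OpenMP thread count from the Slurm allocation (one thread per physical core, i.e. half of the allocated logical CPUs; the fallback value 8 is only there so the snippet also runs outside a job, where Slurm has not set &amp;lt;code&amp;gt;SLURM_CPUS_PER_TASK&amp;lt;/code&amp;gt;):&lt;br /&gt;

```shell
# Use half of the allocated logical CPUs as OpenMP threads,
# i.e. one thread per physical core (hyperthreading is on).
# SLURM_CPUS_PER_TASK is set by Slurm inside a job; the fallback
# value 8 is only for running this snippet outside a job.
SLURM_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-8}
export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK / 2))
echo "$OMP_NUM_THREADS"
```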
&lt;br /&gt;
&#039;&#039;&#039;a) Program with built-in multithreading option&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This example uses the common bioinformatics software &amp;lt;code&amp;gt;samtools&amp;lt;/code&amp;gt; to demonstrate built-in multithreading.&lt;br /&gt;
&lt;br /&gt;
The module &amp;lt;code&amp;gt;bio/samtools/1.21&amp;lt;/code&amp;gt; provides an example jobscript that requests 4 CPUs and runs &amp;lt;code&amp;gt;samtools sort&amp;lt;/code&amp;gt; with 4 threads.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --time=19:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
[...]&lt;br /&gt;
samtools sort -@ 4 sample.bam -o sample.sorted.bam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can use the example jobscript with this command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch /opt/bwhpc/common/bio/samtools/1.21/bwhpc-examples/binac2-samtools-1.21-bwhpc-examples.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;b) OpenMP&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will run an example OpenMP Hello World program. The job script looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --time=1:00&lt;br /&gt;
#SBATCH --mem=5000m   &lt;br /&gt;
#SBATCH -J OpenMP-Hello-World&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_JOB_CPUS_PER_NODE}/2))&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Executable running on ${SLURM_JOB_CPUS_PER_NODE} cores with ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run parallel Hello World&lt;br /&gt;
/pfs/10/project/examples/openmp_hello_world&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit the job to the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition and get the output (in the stdout-file)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --partition=compute /pfs/10/project/examples/openmp_hello_world.sh&lt;br /&gt;
&lt;br /&gt;
Executable running on 4 cores with 2 threads&lt;br /&gt;
Hello from process: 0&lt;br /&gt;
Hello from process: 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI-jobs on batch nodes, generate a wrapper script &amp;lt;code&amp;gt;mpi_hello_world.sh&amp;lt;/code&amp;gt; for &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
#SBATCH --cpus-per-task=2&lt;br /&gt;
#SBATCH --mem-per-cpu=2000&lt;br /&gt;
#SBATCH --time=05:00&lt;br /&gt;
&lt;br /&gt;
# Load the MPI implementation of your choice&lt;br /&gt;
module load mpi/openmpi/4.1-gnu-14.2&lt;br /&gt;
&lt;br /&gt;
# Run your MPI program&lt;br /&gt;
mpirun --bind-to core --map-by core --report-bindings mpi_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add the mpirun option &amp;lt;code&amp;gt;-n &amp;lt;number_of_processes&amp;gt;&amp;lt;/code&amp;gt; or any other option defining processes or nodes, since Slurm instructs mpirun about the number of processes and the node hostnames.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Always&#039;&#039;&#039; use the mpirun options &amp;lt;code&amp;gt;--bind-to core&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--map-by core|socket|node&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please type &amp;lt;code&amp;gt;man mpirun&amp;lt;/code&amp;gt; for an explanation of the different arguments of the mpirun option &amp;lt;code&amp;gt;--map-by&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The above job script runs four OpenMPI tasks, distributed across two nodes. Because of hyperthreading you have to set &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, so that each MPI task gets one physical core. If you omit &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, MPI will fail.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Not all compute nodes are connected via Infiniband. Tell Slurm you want Infiniband via &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt; when submitting or add &amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt; to your jobscript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --constraint=ib /pfs/10/project/examples/mpi_hello_world.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will run a simple Hello World program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
Hello world from processor node2-031, rank 3 out of 4 processors&lt;br /&gt;
Hello world from processor node2-031, rank 2 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 1 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 0 out of 4 processors&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded + MPI parallel Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded + MPI parallel programs run faster than serial programs on multi-core CPUs. All threads of one process share resources such as memory. In contrast, MPI tasks do not share memory but can be spread over different nodes. &#039;&#039;&#039;Because hyperthreading is enabled on BinAC 2, the option --cpus-per-task (-c) must be set to 2*n if you want to use n threads.&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
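The 2*n rule above can be sketched as simple shell arithmetic (the thread count is illustrative and matches the 28-thread hybrid example that follows):&lt;br /&gt;

```shell
# For n OpenMP threads per MPI task on a hyperthreaded node,
# request 2*n logical CPUs per task.
n_threads=28
cpus_per_task=$((2 * n_threads))
echo "$cpus_per_task"    # prints 56
```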
===== OpenMPI with Multithreading =====&lt;br /&gt;
Multiple MPI tasks using &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; must be launched by the MPI parallel program &#039;&#039;&#039;mpirun&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP), the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;For OpenMPI&#039;&#039;&#039;, a job script &#039;&#039;job_ompi_omp.sh&#039;&#039; that runs the MPI program &#039;&#039;ompi_omp_program&#039;&#039; with 4 tasks and 28 threads per task, requiring 3000 MByte of physical memory per thread (with 28 threads per MPI task you get 28*3000 MByte = 84000 MByte per MPI task) and a total wall clock time of 3 hours, looks like:&lt;br /&gt;
&amp;lt;!--b)--&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=4&lt;br /&gt;
#SBATCH --cpus-per-task=56&lt;br /&gt;
#SBATCH --time=03:00:00&lt;br /&gt;
#SBATCH --mem=83gb    # 84000 MB = 84000/1024 GB = 82.03 GB&lt;br /&gt;
#SBATCH --export=ALL,MPI_MODULE=mpi/openmpi/3.1,EXECUTABLE=./ompi_omp_program&lt;br /&gt;
#SBATCH --output=&amp;quot;parprog_hybrid_%j.out&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
# Use when a defined module environment related to OpenMPI is wished&lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_CPUS_PER_TASK}/2))&lt;br /&gt;
export MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by socket:PE=${OMP_NUM_THREADS} -report-bindings&amp;quot;&lt;br /&gt;
export NUM_CORES=$((${SLURM_NTASKS}*${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXECUTABLE} running on ${NUM_CORES} cores with ${SLURM_NTASKS} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpirun -n ${SLURM_NTASKS} ${MPIRUN_OPTIONS} ${EXECUTABLE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_ompi_omp.sh&#039;&#039;&#039; by command sbatch:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute ./job_ompi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* With the mpirun option &#039;&#039;--bind-to core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;--map-by node:PE=&amp;lt;value&amp;gt;&#039;&#039; neighboring MPI tasks will be attached to different nodes and each MPI task is bound to &amp;lt;value&amp;gt; cores. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* The option &#039;&#039;-report-bindings&#039;&#039; shows the bindings between MPI tasks and physical cores.&lt;br /&gt;
* The mpirun-options &#039;&#039;&#039;--bind-to core&#039;&#039;&#039;, &#039;&#039;&#039;--map-by socket|...|node:PE=&amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program.&lt;br /&gt;
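The memory request in the script above can be sanity-checked with shell arithmetic:&lt;br /&gt;

```shell
# 28 threads x 3000 MB per thread = 84000 MB per MPI task;
# integer division gives the rough GB figure behind --mem=83gb.
threads=28
mem_per_thread_mb=3000
total_mb=$((threads * mem_per_thread_mb))
echo "$total_mb MB, about $((total_mb / 1024)) GB"    # prints "84000 MB, about 82 GB"
```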
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== GPU jobs ====&lt;br /&gt;
&lt;br /&gt;
The nodes in the &amp;lt;code&amp;gt;gpu&amp;lt;/code&amp;gt; queue have 2 or 4 NVIDIA A30/A100/H200 GPUs. Just submitting a job to these queues is not enough to also allocate one or more GPUs; you have to do so with the &amp;quot;--gres=gpu&amp;quot; parameter. You have to specify how many GPUs your job needs, e.g. &amp;quot;--gres=gpu:a30:2&amp;quot; will request two NVIDIA A30 GPUs.&lt;br /&gt;
&lt;br /&gt;
The GPU nodes are shared between multiple jobs if the jobs don&#039;t request all the GPUs in a node and there are enough resources to run more than one job. The individual GPUs are always bound to a single job and will not be shared between different jobs.&lt;br /&gt;
&lt;br /&gt;
a) add after the initial line of your script job.sh the line with the&lt;br /&gt;
GPU request:&amp;lt;br&amp;gt;   #SBATCH --gres=gpu:a30:2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=40&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=4000&lt;br /&gt;
#SBATCH --gres=gpu:a30:2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or b) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p &amp;lt;queue&amp;gt; -n 40 -t 02:00:00 --mem 4000 --gres=gpu:a30:2 job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
If you start an interactive session on one of the GPU nodes, you can use the &amp;quot;nvidia-smi&amp;quot; command to list the GPUs allocated to your job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ nvidia-smi&lt;br /&gt;
Sun Mar 29 15:20:05 2020       &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |&lt;br /&gt;
|-------------------------------+----------------------+----------------------+&lt;br /&gt;
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |&lt;br /&gt;
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |&lt;br /&gt;
|===============================+======================+======================|&lt;br /&gt;
|   0  Tesla V100-SXM2...  Off  | 00000000:3A:00.0 Off |                    0 |&lt;br /&gt;
| N/A   29C    P0    39W / 300W |      9MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |&lt;br /&gt;
| N/A   30C    P0    41W / 300W |      8MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
                                                                               &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| Processes:                                                       GPU Memory |&lt;br /&gt;
|  GPU       PID   Type   Process name                             Usage      |&lt;br /&gt;
|=============================================================================|&lt;br /&gt;
|    0     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
|    1     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Upon successful GPU resource allocation, SLURM will set the environment variable &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt; appropriately. &amp;lt;b&amp;gt;Do not change this variable!&amp;lt;/b&amp;gt;&lt;br /&gt;
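Your job script may read (but, as noted, never modify) CUDA_VISIBLE_DEVICES, for example to check how many GPUs were allocated. A minimal sketch; the variable is set by hand here only for illustration, since Slurm sets it inside a real job:&lt;br /&gt;

```shell
# Set by Slurm in a real job; hard-coded here only for illustration.
CUDA_VISIBLE_DEVICES="0,1"

# Count the comma-separated device IDs to learn how many GPUs the job got.
num_gpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "Job sees ${num_gpus} GPU(s): ${CUDA_VISIBLE_DEVICES}"
```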
&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
When using Open MPI, the underlying communication infrastructure (UCX and Open MPI&#039;s BTL) is CUDA-aware.&lt;br /&gt;
However, you may see warnings, e.g. when running:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load compiler/gnu/10.3 mpi/openmpi devel/cuda&lt;br /&gt;
$ mpirun -np 2 ./mpi_cuda_app&lt;br /&gt;
--------------------------------------&lt;br /&gt;
WARNING: There are more than one active ports on host &#039;uc2n520&#039;, but the&lt;br /&gt;
default subnet GID prefix was detected on more than one of these&lt;br /&gt;
ports.  If these ports are connected to different physical IB&lt;br /&gt;
networks, this configuration will fail in Open MPI.  This version of&lt;br /&gt;
Open MPI requires that every physically separate IB subnet that is&lt;br /&gt;
used between connected MPI processes must have different subnet ID&lt;br /&gt;
values.&lt;br /&gt;
&lt;br /&gt;
Please see this FAQ entry for more details:&lt;br /&gt;
&lt;br /&gt;
  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid&lt;br /&gt;
&lt;br /&gt;
NOTE: You can turn off this warning by setting the MCA parameter&lt;br /&gt;
      btl_openib_warn_default_gid_prefix to 0.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please run Open MPI&#039;s mpirun using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl_openib_warn_default_gid_prefix 0 -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or by disabling the (older) byte transfer layer (BTL) communication layer altogether:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl ^openib -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Please note that CUDA, as of v12.8, is only officially supported with GCC versions up to 11.)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Start time of job or resources : squeue --start ==&lt;br /&gt;
Any user can run this command to display the estimated start time of a job, based on historical usage, the earliest available reservable resources, and the priority-based backlog. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by &#039;&#039;&#039;any user&#039;&#039;&#039;. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== List of your submitted jobs : squeue ==&lt;br /&gt;
Displays information about your own active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by any user.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Flags ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Flag !! Description&lt;br /&gt;
|-&lt;br /&gt;
| -l, --long&lt;br /&gt;
| Report more of the available information for the selected jobs or job steps, subject to any constraints specified.&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples ===&lt;br /&gt;
&#039;&#039;squeue&#039;&#039; example on BinAC 2 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18088744    single CPV.sbat   ab1234 PD       0:00      1 (Priority)&lt;br /&gt;
          18098414  multiple CPV.sbat   ab1234 PD       0:00      2 (Priority) &lt;br /&gt;
          18090089  multiple CPV.sbat   ab1234  R       2:27      2 uc2n[127-128]&lt;br /&gt;
$ squeue -l&lt;br /&gt;
            JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) &lt;br /&gt;
         18088654    single CPV.sbat   ab1234 COMPLETI       4:29   2:00:00      1 uc2n374&lt;br /&gt;
         18088785    single CPV.sbat   ab1234  PENDING       0:00   2:00:00      1 (Priority)&lt;br /&gt;
         18098414  multiple CPV.sbat   ab1234  PENDING       0:00   2:00:00      2 (Priority)&lt;br /&gt;
         18088683    single CPV.sbat   ab1234  RUNNING       0:14   2:00:00      1 uc2n413  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The output of &#039;&#039;squeue&#039;&#039; shows how many of your jobs are running or pending and how many nodes are in use by your jobs.&lt;br /&gt;
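Since the squeue output is plain text, standard tools can summarize it. A sketch that counts running and pending jobs from captured output; the sample lines are hard-coded for illustration, on the cluster you would pipe &#039;&#039;squeue -h&#039;&#039; (header suppressed) directly:&lt;br /&gt;

```shell
# Sample 'squeue -h' output, hard-coded here for illustration.
sample='18088744 single CPV.sbat ab1234 PD 0:00 1 (Priority)
18090089 multiple CPV.sbat ab1234 R 2:27 2 uc2n[127-128]'

# Column 5 is the job state: R = running, PD = pending.
running=$(echo "$sample" | awk '$5 == "R"' | wc -l)
pending=$(echo "$sample" | awk '$5 == "PD"' | wc -l)
echo "running=${running} pending=${pending}"
```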
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Shows free resources : sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information when determining the available backfill window. The plain sinfo command can only be used by the administrator.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be used by any user or administrator. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Example ===&lt;br /&gt;
* The following command displays which resources are available for immediate use in each partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle&lt;br /&gt;
Partition dev_multiple  :      8 nodes idle&lt;br /&gt;
Partition multiple      :    332 nodes idle&lt;br /&gt;
Partition dev_single    :      4 nodes idle&lt;br /&gt;
Partition single        :     76 nodes idle&lt;br /&gt;
Partition long          :     80 nodes idle&lt;br /&gt;
Partition fat           :      5 nodes idle&lt;br /&gt;
Partition dev_special   :    342 nodes idle&lt;br /&gt;
Partition special       :    342 nodes idle&lt;br /&gt;
Partition dev_multiple_e:      7 nodes idle&lt;br /&gt;
Partition multiple_e    :    335 nodes idle&lt;br /&gt;
Partition gpu_4         :     12 nodes idle&lt;br /&gt;
Partition gpu_8         :      6 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* In the above example, jobs in all partitions can start immediately.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Detailed job information : scontrol show job ==&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a single specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
* End users can use scontrol show job to view the status of their &#039;&#039;&#039;own jobs&#039;&#039;&#039; only. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Arguments ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Option !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -d&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Detailed mode&lt;br /&gt;
| Example: Display the state with jobid 18089884 in detailed mode. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt;scontrol -d show job 18089884&amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scontrol show job Example ===&lt;br /&gt;
Here is an example from BinAC 2.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue    # show my own jobs (here the userid is replaced!)&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18089884  multiple CPV.sbat   bq0742  R      33:44      2 uc2n[165-166]&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my job with jobid 18089884&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 18089884&lt;br /&gt;
&lt;br /&gt;
JobId=18089884 JobName=CPV.sbatch&lt;br /&gt;
   UserId=bq0742(8946) GroupId=scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=3 Nice=0 Account=kit QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:35:06 TimeLimit=02:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2020-03-16T14:14:54 EligibleTime=2020-03-16T14:14:54&lt;br /&gt;
   AccrueTime=2020-03-16T14:14:54&lt;br /&gt;
   StartTime=2020-03-16T15:12:51 EndTime=2020-03-16T17:12:51 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-16T15:12:51&lt;br /&gt;
   Partition=multiple AllocNode:Sid=uc2n995:5064&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc2n[165-166]&lt;br /&gt;
   BatchHost=uc2n165&lt;br /&gt;
   NumNodes=2 NumCPUs=160 NumTasks=80 CPUs/Task=1 ReqB:S:C:T=0:0:*:1&lt;br /&gt;
   TRES=cpu=160,mem=96320M,node=2,billing=160&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=40:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=40 MinMemoryCPU=1204M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/CPV.sbatch&lt;br /&gt;
   WorkDir=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin&lt;br /&gt;
   StdErr=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   Power=&lt;br /&gt;
   MailUser=(null) MailType=NONE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
You can use standard Linux pipes to filter the very detailed scontrol show job output.&lt;br /&gt;
* Which state is the job in?&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol show job 18089884 | grep -i State&lt;br /&gt;
   JobState=COMPLETED Reason=None Dependency=(null)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
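Because scontrol prints Key=Value pairs, a single field can also be extracted rather than just matched. A sketch using one captured line; on the cluster you would pipe the scontrol output directly:&lt;br /&gt;

```shell
# One captured line of scontrol output, hard-coded here for illustration.
line='JobState=COMPLETED Reason=None Dependency=(null)'

# Split the Key=Value pairs and print only the JobState value.
state=$(echo "$line" | tr ' ' '\n' | awk -F= '$1 == "JobState" {print $2}')
echo "state=${state}"
```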
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Cancel Slurm Jobs ==&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel).   &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
scancel is used to signal or cancel jobs, job arrays or job steps. The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Flag !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -i, --interactive&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Interactive mode.&lt;br /&gt;
| Cancel the job 987654 interactively. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -i 987654 &amp;lt;/pre&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| -t, --state&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Restrict the scancel operation to jobs in a certain state. &amp;lt;br&amp;gt; &amp;quot;job_state_name&amp;quot; may have a value of either &amp;quot;PENDING&amp;quot;, &amp;quot;RUNNING&amp;quot; or &amp;quot;SUSPENDED&amp;quot;.&lt;br /&gt;
| Cancel all jobs in state &amp;quot;PENDING&amp;quot;. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -t &amp;quot;PENDING&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Resource Managers =&lt;br /&gt;
=== Batch Job (Slurm) Variables ===&lt;br /&gt;
The following environment variables of Slurm are added to your environment once your job has started&lt;br /&gt;
&amp;lt;small&amp;gt;(only an excerpt of the most important ones)&amp;lt;/small&amp;gt;.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Environment !! Brief explanation&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_CPUS_PER_NODE &lt;br /&gt;
| Number of processes per node dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NODELIST &lt;br /&gt;
| List of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NUM_NODES &lt;br /&gt;
| Number of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_MEM_PER_NODE &lt;br /&gt;
| Memory per node dedicated to the job &lt;br /&gt;
|- &lt;br /&gt;
| SLURM_NPROCS&lt;br /&gt;
| Total number of processes dedicated to the job &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CLUSTER_NAME&lt;br /&gt;
| Name of the cluster executing the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CPUS_PER_TASK &lt;br /&gt;
| Number of CPUs requested per task&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ACCOUNT&lt;br /&gt;
| Account name &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ID&lt;br /&gt;
| Job ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_NAME&lt;br /&gt;
| Job Name&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_PARTITION&lt;br /&gt;
| Partition/queue running the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_UID&lt;br /&gt;
| User ID of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_SUBMIT_DIR&lt;br /&gt;
| Job submit folder.  The directory from which sbatch was invoked. &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_USER&lt;br /&gt;
| User name of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_RESTART_COUNT&lt;br /&gt;
| Number of times job has restarted&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_PROCID&lt;br /&gt;
| Task ID (MPI rank)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_NTASKS&lt;br /&gt;
| The total number of tasks available for the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_ID&lt;br /&gt;
| Job step ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_NUM_TASKS&lt;br /&gt;
| Task count (number of MPI ranks)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_CONSTRAINT&lt;br /&gt;
| Job constraints&lt;br /&gt;
|}&lt;br /&gt;
See also:&lt;br /&gt;
* [https://slurm.schedmd.com/sbatch.html#lbAI Slurm input and output environment variables]&lt;br /&gt;
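It can be useful to log some of these variables at the start of a job script, so old output files can be matched to their jobs. A minimal sketch; the fallback values after &#039;&#039;:-&#039;&#039; are dummies for illustration, since outside a job the variables are unset:&lt;br /&gt;

```shell
# Outside a Slurm job these variables are unset; the values after :- are
# dummies for illustration only. Inside a job, Slurm provides them.
SLURM_JOB_ID="${SLURM_JOB_ID:-12345}"
SLURM_JOB_NUM_NODES="${SLURM_JOB_NUM_NODES:-2}"
SLURM_SUBMIT_DIR="${SLURM_SUBMIT_DIR:-$PWD}"

# Log the allocation once at job start.
echo "Job ${SLURM_JOB_ID} on ${SLURM_JOB_NUM_NODES} node(s), submitted from ${SLURM_SUBMIT_DIR}"
```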
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Exit Codes ===&lt;br /&gt;
A job&#039;s exit code (also known as exit status, return code and completion code) is captured by SLURM and saved as part of the job record. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Any non-zero exit code will be assumed to be a job failure and will result in a Job State of FAILED with a reason of &amp;quot;NonZeroExitCode&amp;quot;.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The exit code is an 8 bit unsigned number ranging between 0 and 255. While it is possible for a job to return a negative exit code, SLURM will display it as an unsigned value in the 0 - 255 range.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
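The 8-bit truncation described above is easy to observe in a plain shell, independent of SLURM:&lt;br /&gt;

```shell
# Exit codes are reported as 8-bit unsigned values (0-255).
bash -c 'exit 3';   small=$?
bash -c 'exit 300'; wrapped=$?
echo "small=${small} wrapped=${wrapped}"   # wrapped is 44, i.e. 300 mod 256
```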
==== Displaying Exit Codes and Signals ====&lt;br /&gt;
SLURM displays a job&#039;s exit code in the output of &#039;&#039;&#039;scontrol show job&#039;&#039;&#039; and in the sview utility.&lt;br /&gt;
&amp;lt;br&amp;gt; &lt;br /&gt;
When a signal was responsible for a job&#039;s or step&#039;s termination, the signal number is displayed after the exit code, separated by a colon (:).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Submitting Termination Signal ====&lt;br /&gt;
Here is an example of how to capture the exit code of your application in a typical job script.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
mpirun  -np &amp;lt;#cores&amp;gt;  &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; ... (options)  2&amp;gt;&amp;amp;1&lt;br /&gt;
exit_code=$?&lt;br /&gt;
[ &amp;quot;$exit_code&amp;quot; -eq 0 ] &amp;amp;&amp;amp; echo &amp;quot;all clean...&amp;quot; || \&lt;br /&gt;
   echo &amp;quot;Executable &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; finished with exit code ${exit_code}&amp;quot;&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Do not use &#039;&#039;&#039;time&#039;&#039;&#039; mpirun! The exit code would then be the one returned by the first program (time), not by your application.&lt;br /&gt;
* You do not need an &#039;&#039;&#039;exit $exit_code&#039;&#039;&#039; in the scripts.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[#top|Back to top]]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15668</id>
		<title>BinAC2/Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15668"/>
		<updated>2026-01-07T08:45:05Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= General information about Slurm =&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwForCluster BinAC 2 requires the user to define the calculation as a single command or a sequence of commands, together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload managing software. On BinAC 2 this software is Slurm, so any job submission is done via Slurm commands. Slurm queues and runs user jobs based on fair-share policies.&lt;br /&gt;
&lt;br /&gt;
= External Slurm documentation =&lt;br /&gt;
&lt;br /&gt;
You can find the official Slurm documentation and some other material here:&lt;br /&gt;
&lt;br /&gt;
* Slurm documentation: https://slurm.schedmd.com/documentation.html&lt;br /&gt;
* Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
* Slurm tutorials: https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
= SLURM terminology = &lt;br /&gt;
&lt;br /&gt;
SLURM knows and mirrors the division of the cluster into &#039;&#039;&#039;nodes&#039;&#039;&#039; with several &#039;&#039;&#039;cores&#039;&#039;&#039;. When queuing &#039;&#039;&#039;jobs&#039;&#039;&#039;, there are several ways of requesting resources and it is important to know which term means what in SLURM. Here are some basic SLURM terms:&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Job&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Job&lt;br /&gt;
: A job is a self-contained computation that may encompass multiple tasks and is given specific resources like individual CPUs/GPUs, a specific amount of RAM or entire nodes. These resources are said to have been allocated for the job.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Task&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Task&lt;br /&gt;
: A task is a single run of a single process. By default, one task is run per node and one CPU is assigned per task.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Partition&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Partition    &lt;br /&gt;
: A partition (usually called queue outside SLURM) is a waiting line in which jobs are put by users.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Socket&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Socket    &lt;br /&gt;
: Receptacle on the motherboard for one physically packaged processor (each of which can contain one or more cores).&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Core&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Core    &lt;br /&gt;
: A complete private set of registers, execution units, and retirement queues needed to execute programs.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Thread&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Thread    &lt;br /&gt;
: One or more hardware contexts within a single core. Each thread has the attributes of one core and is managed &amp;amp; scheduled as a single logical processor by the OS.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;CPU&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;CPU&lt;br /&gt;
: A &#039;&#039;&#039;CPU&#039;&#039;&#039; in Slurm means a &#039;&#039;&#039;single core&#039;&#039;&#039;. This is different from the more common terminology, where a CPU (a microprocessor chip) consists of multiple cores. Slurm uses the term &#039;&#039;&#039;sockets&#039;&#039;&#039; when talking about CPU chips. Depending upon system configuration, a CPU can be either a &#039;&#039;&#039;core&#039;&#039;&#039; or a &#039;&#039;&#039;thread&#039;&#039;&#039;. On &#039;&#039;&#039;BinAC 2 Hyperthreading is activated on every machine&#039;&#039;&#039;. This means that the operating system and Slurm see each physical core as two logical cores.&lt;br /&gt;
&lt;br /&gt;
= Slurm Commands =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/salloc.html salloc] || Requests resources for an interactive job&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sstat.html sstat] || Displays status information about a running job&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Interactive Jobs ==&lt;br /&gt;
&lt;br /&gt;
You can run interactive jobs for testing and developing your job scripts.&lt;br /&gt;
Several nodes are reserved for interactive work, so your jobs should start right away.&lt;br /&gt;
You can only submit one job to this partition at a time. A job can run for up to 10 hours (about one workday).&lt;br /&gt;
&lt;br /&gt;
This example command gives you 16 cores and 128 GB of memory for four hours on one of the reserved nodes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also use srun to request the same resources:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
srun --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb --pty bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Job Submission : sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted using the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of &#039;&#039;&#039;sbatch&#039;&#039;&#039; is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, when the batch job starts depends on the availability of the requested resources and your fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== sbatch Command Parameters ===&lt;br /&gt;
The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used from the command line or in your job script. The following table shows the syntax and provides examples for each option.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! Command line&lt;br /&gt;
! Job Script&lt;br /&gt;
! Purpose&lt;br /&gt;
! Example&lt;br /&gt;
! Default value&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t &#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;  or  &amp;lt;code&amp;gt;--time=&#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t 2:30:00&amp;lt;/code&amp;gt; Limits run time to 2h 30 min.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;-t 2-12&amp;lt;/code&amp;gt; Limits run time to 2 days and 12 hours.&lt;br /&gt;
| Depends on Slurm partition.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N &#039;&#039;count&#039;&#039;  or  --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
| &amp;lt;code&amp;gt;-N 1&amp;lt;/code&amp;gt; Run job on one node.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;-N 2&amp;lt;/code&amp;gt; Run job on two nodes (have to use MPI!)&lt;br /&gt;
| &lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n &#039;&#039;count&#039;&#039;  or  --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
| &amp;lt;code&amp;gt;-n 2&amp;lt;/code&amp;gt; launch two tasks in the job.&lt;br /&gt;
| One task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&amp;lt;br&amp;gt;(Replaces the option &amp;lt;code&amp;gt;ppn&amp;lt;/code&amp;gt; of MOAB.)&lt;br /&gt;
| &amp;lt;code&amp;gt;--ntasks-per-node=2&amp;lt;/code&amp;gt; Run 2 tasks per node&lt;br /&gt;
| 1 task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c &#039;&#039;count&#039;&#039; or --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
| &amp;lt;code&amp;gt;-c 2&amp;lt;/code&amp;gt; Request two CPUs per (MPI-)task.&lt;br /&gt;
| 1 CPU per (MPI-)task&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=&amp;lt;size&amp;gt;[units]&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory per node, in megabytes by default.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;[units]&amp;lt;/code&amp;gt; can be one of &amp;lt;code&amp;gt;[K&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;M&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;G&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;T]&amp;lt;/code&amp;gt;.&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=10g&amp;lt;/code&amp;gt; Request 10GB RAM per node &amp;lt;/br&amp;gt; &amp;lt;code&amp;gt;--mem=0&amp;lt;/code&amp;gt; Request all memory on node&lt;br /&gt;
| Depends on Slurm configuration.&amp;lt;/br&amp;gt;It is better to specify &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; in every case.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU.&amp;lt;br&amp;gt;(Replaces the option pmem of MOAB. You should omit &amp;lt;br&amp;gt; the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state&amp;lt;br&amp;gt;changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J &#039;&#039;name&#039;&#039; or --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission &amp;lt;br&amp;gt; environment are propagated to the launched application. Default &amp;lt;br&amp;gt; is ALL. If adding an environment variable to the submission&amp;lt;br&amp;gt; environment is intended, the argument ALL must be added.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A &#039;&#039;group-name&#039;&#039; or --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge resources used by this job to the specified group. You may &amp;lt;br&amp;gt; need this option if your account is assigned to more &amp;lt;br&amp;gt; than one group. With the command &amp;quot;scontrol show job&amp;quot; the project &amp;lt;br&amp;gt; group the job is accounted on can be seen behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p &#039;&#039;queue-name&#039;&#039; or --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C &#039;&#039;LSDF&#039;&#039; or --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF Filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== sbatch --partition  &#039;&#039;queues&#039;&#039; ====&lt;br /&gt;
Queue classes define the maximum resources per queue, such as walltime, nodes, and processes per node. Details can be found here:&lt;br /&gt;
* [[BinAC2/SLURM_Partitions|BinAC 2 partitions]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== sbatch Examples ===&lt;br /&gt;
&lt;br /&gt;
If you are coming from Moab/Torque on BinAC 1, or if you are new to HPC/Slurm, the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; options may be confusing. The following examples show how to run typical workloads on BinAC 2.&lt;br /&gt;
&lt;br /&gt;
You can find every file mentioned on this Wiki page on BinAC 2 at: &amp;lt;code&amp;gt;/pfs/10/project/examples&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Serial Programs ====&lt;br /&gt;
When you use serial programs that run only a single process, you can omit most of the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; parameters, as the default values are sufficient.&lt;br /&gt;
&lt;br /&gt;
Suppose you want to submit a serial job that runs the script &amp;lt;code&amp;gt;serial_job.sh&amp;lt;/code&amp;gt; and requires 5000 MB of main memory and 10 minutes of wall clock time. Slurm will allocate one &#039;&#039;&#039;physical&#039;&#039;&#039; core to your job.&lt;br /&gt;
&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute -t 10:00 --mem=5000m  serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) add after the initial line of your script &#039;&#039;&#039;serial_job.sh&#039;&#039;&#039; the lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --job-name=simple-serial-job&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
and execute the modified script with the command line option &#039;&#039;--partition=compute&#039;&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=compute serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command-line options override the corresponding script options.&lt;br /&gt;
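The &amp;lt;code&amp;gt;#SBATCH&amp;lt;/code&amp;gt; lines are ordinary shell comments, so a jobscript is still a valid shell script: sbatch reads the directives, while the shell simply skips them. A minimal sketch (the filename and script content are only illustrations):&lt;br /&gt;

```shell
# Write a hypothetical jobscript; the #SBATCH lines are plain comments to bash.
printf '%s\n' '#!/bin/bash' '#SBATCH --time=10:00' '#SBATCH --mem=5000m' \
  '#SBATCH --job-name=simple-serial-job' 'echo "job body runs here"' \
  > serial_job_demo.sh

# sbatch would parse the directives; running the script with bash ignores them.
directives=$(grep -c '^#SBATCH' serial_job_demo.sh)
body_output=$(bash serial_job_demo.sh)
echo "$directives directives, output: $body_output"
```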
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded programs run multiple threads within one process; the threads share resources such as memory.&amp;lt;br&amp;gt;&lt;br /&gt;
You may use a program that includes a built-in option for multithreading (e.g., options like &amp;lt;code&amp;gt;--threads&amp;lt;/code&amp;gt;).&amp;lt;br&amp;gt;&lt;br /&gt;
For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) a number of threads is defined by the environment variable &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;. By default, this variable is set to 1 (&amp;lt;code&amp;gt;OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;). &lt;br /&gt;
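Because hyperthreading (see below) presents each physical core as two logical CPUs, jobscripts on BinAC 2 typically derive the thread count from the Slurm allocation. A small sketch; outside a running job the Slurm variable does not exist, so it is set by hand here:&lt;br /&gt;

```shell
# Set by hand for illustration; inside a job, Slurm exports this automatically.
SLURM_CPUS_PER_TASK=4

# Two logical CPUs per physical core: use half as many OpenMP threads.
export OMP_NUM_THREADS=$((SLURM_CPUS_PER_TASK / 2))
echo "$OMP_NUM_THREADS"   # → 2
```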
&lt;br /&gt;
&#039;&#039;&#039;Important:&#039;&#039;&#039; Hyperthreading is activated on bwForCluster BinAC 2. Hyperthreading can be beneficial for some applications and codes, but it can also degrade performance in other cases. We therefore recommend running a small test job with and without hyperthreading to determine the best choice.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;a) Program with built-in multithreading option&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This example uses the common bioinformatics tool &amp;lt;code&amp;gt;samtools&amp;lt;/code&amp;gt; to demonstrate built-in multithreading.&lt;br /&gt;
&lt;br /&gt;
The module &amp;lt;code&amp;gt;bio/samtools/1.21&amp;lt;/code&amp;gt; provides an example jobscript that requests 4 CPUs and runs &amp;lt;code&amp;gt;samtools sort&amp;lt;/code&amp;gt; with 4 threads.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --time=19:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
[...]&lt;br /&gt;
samtools sort -@ 4 sample.bam -o sample.sorted.bam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can use the example jobscript with this command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch /opt/bwhpc/common/bio/samtools/1.21/bwhpc-examples/binac2-samtools-1.21-bwhpc-examples.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;b) OpenMP&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will run an example OpenMP Hello World program. The jobscript looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --time=1:00&lt;br /&gt;
#SBATCH --mem=5000m   &lt;br /&gt;
#SBATCH -J OpenMP-Hello-World&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_JOB_CPUS_PER_NODE}/2))&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Executable running on ${SLURM_JOB_CPUS_PER_NODE} cores with ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run parallel Hello World&lt;br /&gt;
/pfs/10/project/examples/openmp_hello_world&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit the job to the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition and get the output (in the stdout-file)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --partition=compute /pfs/10/project/examples/openmp_hello_world.sh&lt;br /&gt;
&lt;br /&gt;
Executable running on 4 cores with 2 threads&lt;br /&gt;
Hello from process: 0&lt;br /&gt;
Hello from process: 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI-jobs on batch nodes, generate a wrapper script &amp;lt;code&amp;gt;mpi_hello_world.sh&amp;lt;/code&amp;gt; for &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
#SBATCH --cpus-per-task=2&lt;br /&gt;
#SBATCH --mem-per-cpu=2000&lt;br /&gt;
#SBATCH --time=05:00&lt;br /&gt;
&lt;br /&gt;
# Load the MPI implementation of your choice&lt;br /&gt;
module load mpi/openmpi/4.1-gnu-14.2&lt;br /&gt;
&lt;br /&gt;
# Run your MPI program&lt;br /&gt;
mpirun --bind-to core --map-by core --report-bindings mpi_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add the mpirun option &amp;lt;code&amp;gt;-n &amp;lt;number_of_processes&amp;gt;&amp;lt;/code&amp;gt; or any other option defining processes or nodes, since Slurm instructs mpirun about the number of processes and the node hostnames.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;ALWAYS&#039;&#039;&#039; use the mpirun options &amp;lt;code&amp;gt;--bind-to core&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--map-by core|socket|node&amp;lt;/code&amp;gt;.&lt;br /&gt;
Type &amp;lt;code&amp;gt;man mpirun&amp;lt;/code&amp;gt; for an explanation of the different arguments of the &amp;lt;code&amp;gt;--map-by&amp;lt;/code&amp;gt; option.&lt;br /&gt;
&lt;br /&gt;
The above jobscript runs four OpenMPI tasks distributed across two nodes. Because of hyperthreading you have to set &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, so that each MPI task gets one physical core. If you omit &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, MPI will fail.&lt;br /&gt;
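The resulting job geometry can be checked with simple shell arithmetic (the values are copied from the #SBATCH lines above):&lt;br /&gt;

```shell
# Values from the OpenMPI jobscript above
nodes=2
ntasks_per_node=2
cpus_per_task=2            # 2 logical CPUs = 1 physical core with hyperthreading

total_tasks=$((nodes * ntasks_per_node))
cores_per_task=$((cpus_per_task / 2))
echo "$total_tasks MPI tasks, $cores_per_task physical core(s) each"
```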
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Not all compute nodes are connected via Infiniband. Tell Slurm you want Infiniband via &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt; when submitting or add &amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt; to your jobscript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --constraint=ib /pfs/10/project/examples/mpi_hello_world.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will run a simple Hello World program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
Hello world from processor node2-031, rank 3 out of 4 processors&lt;br /&gt;
Hello world from processor node2-031, rank 2 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 1 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 0 out of 4 processors&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded + MPI parallel Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded + MPI parallel programs run faster than serial programs on multiple CPUs with multiple cores. All threads of one process share resources such as memory. In contrast, MPI tasks do not share memory but can be spawned across different nodes. &#039;&#039;&#039;Because hyperthreading is enabled on BinAC 2, the option --cpus-per-task (-c) must be set to 2*n if you want to use n threads.&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
===== OpenMPI with Multithreading =====&lt;br /&gt;
Multiple MPI tasks using &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; must be launched with &#039;&#039;&#039;mpirun&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP), the number of threads is defined by the environment variable OMP_NUM_THREADS. By default, this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;For OpenMPI&#039;&#039;&#039;, a jobscript &#039;&#039;job_ompi_omp.sh&#039;&#039; that runs the 28-fold threaded MPI program &#039;&#039;ompi_omp_program&#039;&#039; with 4 MPI tasks, requires 3000 MB of physical memory per thread (with 28 threads per MPI task this yields 28 * 3000 MB = 84000 MB per MPI task), and has a total wall clock time of 3 hours looks like this:&lt;br /&gt;
&amp;lt;!--b)--&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=4&lt;br /&gt;
#SBATCH --cpus-per-task=56&lt;br /&gt;
#SBATCH --time=03:00:00&lt;br /&gt;
#SBATCH --mem=83gb    # 84000 MB / 1024 = 82.03 GB, rounded up to 83 GB&lt;br /&gt;
#SBATCH --export=ALL,MPI_MODULE=mpi/openmpi/4.1-gnu-14.2,EXECUTABLE=./ompi_omp_program&lt;br /&gt;
#SBATCH --output=&amp;quot;parprog_hybrid_%j.out&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
# Load the OpenMPI module defined via --export above&lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_CPUS_PER_TASK}/2))&lt;br /&gt;
export MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by socket:PE=${OMP_NUM_THREADS} -report-bindings&amp;quot;&lt;br /&gt;
export NUM_CORES=$((${SLURM_NTASKS}*${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXECUTABLE} running on ${NUM_CORES} cores with ${SLURM_NTASKS} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpirun -n ${SLURM_NTASKS} ${MPIRUN_OPTIONS} ${EXECUTABLE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_ompi_omp.sh&#039;&#039;&#039; with the sbatch command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute ./job_ompi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* With the mpirun option &#039;&#039;--bind-to core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;--map-by node:PE=&amp;lt;value&amp;gt;&#039;&#039;, neighboring MPI tasks are attached to different nodes and each MPI task is bound to &amp;lt;value&amp;gt; cores of a node. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* The option &#039;&#039;-report-bindings&#039;&#039; shows the bindings between MPI tasks and physical cores.&lt;br /&gt;
* The mpirun-options &#039;&#039;&#039;--bind-to core&#039;&#039;&#039;, &#039;&#039;&#039;--map-by socket|...|node:PE=&amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program.&lt;br /&gt;
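The figures used in this hybrid example can be cross-checked with a few lines of shell arithmetic:&lt;br /&gt;

```shell
# Figures from the job_ompi_omp.sh example above
ntasks=4
cpus_per_task=56           # logical CPUs per MPI task
mem_per_thread_mb=3000

omp_num_threads=$((cpus_per_task / 2))          # hyperthreading: 28 threads
mem_per_task_mb=$((omp_num_threads * mem_per_thread_mb))
total_cores=$((ntasks * omp_num_threads))
echo "$omp_num_threads threads/task, $mem_per_task_mb MB/task, $total_cores cores"
```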
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== GPU jobs ====&lt;br /&gt;
&lt;br /&gt;
The nodes in the &amp;lt;code&amp;gt;gpu&amp;lt;/code&amp;gt; queue have 2 or 4 NVIDIA A30/A100/H200 GPUs. Just submitting a job to these queues is not enough to allocate one or more GPUs; you also have to request them with the &amp;quot;--gres=gpu&amp;quot; parameter. You have to specify which GPU type and how many GPUs your job needs, e.g. &amp;quot;--gres=gpu:a30:2&amp;quot; will request two NVIDIA A30 GPUs.&lt;br /&gt;
&lt;br /&gt;
The GPU nodes are shared between multiple jobs if the jobs don&#039;t request all the GPUs in a node and there are enough resources to run more than one job. The individual GPUs are always bound to a single job and will not be shared between different jobs.&lt;br /&gt;
&lt;br /&gt;
a) add after the initial line of your script job.sh a line with the&lt;br /&gt;
information about the GPU usage: &amp;lt;code&amp;gt;#SBATCH --gres=gpu:a30:2&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=40&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=4000&lt;br /&gt;
#SBATCH --gres=gpu:a30:2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or b) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p &amp;lt;queue&amp;gt; -n 40 -t 02:00:00 --mem 4000 --gres=gpu:a30:2 job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
If you start an interactive session on one of the GPU nodes, you can use the &amp;quot;nvidia-smi&amp;quot; command to list the GPUs allocated to your job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ nvidia-smi&lt;br /&gt;
Sun Mar 29 15:20:05 2020       &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |&lt;br /&gt;
|-------------------------------+----------------------+----------------------+&lt;br /&gt;
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |&lt;br /&gt;
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |&lt;br /&gt;
|===============================+======================+======================|&lt;br /&gt;
|   0  Tesla V100-SXM2...  Off  | 00000000:3A:00.0 Off |                    0 |&lt;br /&gt;
| N/A   29C    P0    39W / 300W |      9MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |&lt;br /&gt;
| N/A   30C    P0    41W / 300W |      8MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
                                                                               &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| Processes:                                                       GPU Memory |&lt;br /&gt;
|  GPU       PID   Type   Process name                             Usage      |&lt;br /&gt;
|=============================================================================|&lt;br /&gt;
|    0     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
|    1     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Upon successful GPU resource allocation, Slurm will set the environment variable &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt; appropriately. &amp;lt;b&amp;gt;Do not change this variable!&amp;lt;/b&amp;gt;&lt;br /&gt;
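For example, a jobscript can count the GPUs it was given by parsing this variable. The value below is set by hand for illustration; inside a GPU job, Slurm sets it for you:&lt;br /&gt;

```shell
# Hypothetical value, as Slurm would set it for a job with two GPUs
CUDA_VISIBLE_DEVICES="0,1"

# Replace commas with spaces and let word splitting build an array
gpus=(${CUDA_VISIBLE_DEVICES//,/ })
num_gpus=${#gpus[@]}
echo "job sees $num_gpus GPU(s)"   # → job sees 2 GPU(s)
```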
&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
In case of using Open MPI, the underlying communication infrastructure (UCX and Open MPI&#039;s BTL) is CUDA-aware.&lt;br /&gt;
However, there may be warnings, e.g. when running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load compiler/gnu/10.3 mpi/openmpi devel/cuda&lt;br /&gt;
$ mpirun -np 2 ./mpi_cuda_app&lt;br /&gt;
--------------------------------------&lt;br /&gt;
WARNING: There are more than one active ports on host &#039;uc2n520&#039;, but the&lt;br /&gt;
default subnet GID prefix was detected on more than one of these&lt;br /&gt;
ports.  If these ports are connected to different physical IB&lt;br /&gt;
networks, this configuration will fail in Open MPI.  This version of&lt;br /&gt;
Open MPI requires that every physically separate IB subnet that is&lt;br /&gt;
used between connected MPI processes must have different subnet ID&lt;br /&gt;
values.&lt;br /&gt;
&lt;br /&gt;
Please see this FAQ entry for more details:&lt;br /&gt;
&lt;br /&gt;
  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid&lt;br /&gt;
&lt;br /&gt;
NOTE: You can turn off this warning by setting the MCA parameter&lt;br /&gt;
      btl_openib_warn_default_gid_prefix to 0.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please run Open MPI&#039;s mpirun using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl_openib_warn_default_gid_prefix 0 -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or disable the (older) communication layer, the Byte Transfer Layer (BTL), altogether:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl ^openib -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Please note that CUDA as of v12.8 is only officially supported with GCC versions up to 11.)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Start time of job or resources : squeue --start ==&lt;br /&gt;
Any user can run this command to display the estimated start time of a job, based on historical usage, the earliest available reservable resources, and the priority-based backlog. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by &#039;&#039;&#039;any user&#039;&#039;&#039;. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== List of your submitted jobs : squeue ==&lt;br /&gt;
Displays information about &#039;&#039;&#039;your&#039;&#039;&#039; active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by any user.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Flags ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Flag !! Description&lt;br /&gt;
|-&lt;br /&gt;
| -l, --long&lt;br /&gt;
| Report more of the available information for the selected jobs or job steps, subject to any constraints specified.&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples ===&lt;br /&gt;
&#039;&#039;squeue&#039;&#039; example on BinAC 2 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18088744    single CPV.sbat   ab1234 PD       0:00      1 (Priority)&lt;br /&gt;
          18098414  multiple CPV.sbat   ab1234 PD       0:00      2 (Priority) &lt;br /&gt;
          18090089  multiple CPV.sbat   ab1234  R       2:27      2 uc2n[127-128]&lt;br /&gt;
$ squeue -l&lt;br /&gt;
            JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) &lt;br /&gt;
         18088654    single CPV.sbat   ab1234 COMPLETI       4:29   2:00:00      1 uc2n374&lt;br /&gt;
         18088785    single CPV.sbat   ab1234  PENDING       0:00   2:00:00      1 (Priority)&lt;br /&gt;
         18098414  multiple CPV.sbat   ab1234  PENDING       0:00   2:00:00      2 (Priority)&lt;br /&gt;
         18088683    single CPV.sbat   ab1234  RUNNING       0:14   2:00:00      1 uc2n413  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The output of &#039;&#039;squeue&#039;&#039; shows how many jobs of yours are running or pending and how many nodes are in use by your jobs.&lt;br /&gt;
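The fixed column layout makes squeue output easy to post-process with standard tools. The sketch below counts running and pending jobs in a captured sample (on the cluster you would pipe &amp;lt;code&amp;gt;squeue -h&amp;lt;/code&amp;gt; directly):&lt;br /&gt;

```shell
# Captured sample of `squeue -h` output (JOBID PARTITION NAME USER ST ...)
sample='18088744 single   CPV.sbat ab1234 PD 0:00 1 (Priority)
18098414 multiple CPV.sbat ab1234 PD 0:00 2 (Priority)
18090089 multiple CPV.sbat ab1234 R  2:27 2 uc2n[127-128]'

# Column 5 is the job state: R = running, PD = pending
running=$(echo "$sample" | awk '$5 == "R"'  | wc -l)
pending=$(echo "$sample" | awk '$5 == "PD"' | wc -l)
echo "running=$running pending=$pending"
```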
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Shows free resources : sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be used by any user or administrator. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Example ===&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle&lt;br /&gt;
Partition dev_multiple  :      8 nodes idle&lt;br /&gt;
Partition multiple      :    332 nodes idle&lt;br /&gt;
Partition dev_single    :      4 nodes idle&lt;br /&gt;
Partition single        :     76 nodes idle&lt;br /&gt;
Partition long          :     80 nodes idle&lt;br /&gt;
Partition fat           :      5 nodes idle&lt;br /&gt;
Partition dev_special   :    342 nodes idle&lt;br /&gt;
Partition special       :    342 nodes idle&lt;br /&gt;
Partition dev_multiple_e:      7 nodes idle&lt;br /&gt;
Partition multiple_e    :    335 nodes idle&lt;br /&gt;
Partition gpu_4         :     12 nodes idle&lt;br /&gt;
Partition gpu_8         :      6 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* In the above example, jobs in all partitions can start immediately.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Detailed job information : scontrol show job ==&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
* End users can use scontrol show job to view the status of their &#039;&#039;&#039;own jobs&#039;&#039;&#039; only. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Arguments ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Option !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;width:12%;&amp;quot; &lt;br /&gt;
| -d&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Detailed mode&lt;br /&gt;
| Example: Display the state with jobid 18089884 in detailed mode. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt;scontrol -d show job 18089884&amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scontrol show job Example ===&lt;br /&gt;
Here is an example from BinAC 2.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue    # show my own jobs (here the userid is replaced!)&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18089884  multiple CPV.sbat   bq0742  R      33:44      2 uc2n[165-166]&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my pending job with jobid 18089884&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 18089884&lt;br /&gt;
&lt;br /&gt;
JobId=18089884 JobName=CPV.sbatch&lt;br /&gt;
   UserId=bq0742(8946) GroupId=scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=3 Nice=0 Account=kit QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:35:06 TimeLimit=02:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2020-03-16T14:14:54 EligibleTime=2020-03-16T14:14:54&lt;br /&gt;
   AccrueTime=2020-03-16T14:14:54&lt;br /&gt;
   StartTime=2020-03-16T15:12:51 EndTime=2020-03-16T17:12:51 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-16T15:12:51&lt;br /&gt;
   Partition=multiple AllocNode:Sid=uc2n995:5064&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc2n[165-166]&lt;br /&gt;
   BatchHost=uc2n165&lt;br /&gt;
   NumNodes=2 NumCPUs=160 NumTasks=80 CPUs/Task=1 ReqB:S:C:T=0:0:*:1&lt;br /&gt;
   TRES=cpu=160,mem=96320M,node=2,billing=160&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=40:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=40 MinMemoryCPU=1204M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/CPV.sbatch&lt;br /&gt;
   WorkDir=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin&lt;br /&gt;
   StdErr=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   Power=&lt;br /&gt;
   MailUser=(null) MailType=NONE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
You can use standard Linux pipe commands to filter the very detailed scontrol show job output.&lt;br /&gt;
* In which state is the job?&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol show job 18089884 | grep -i State&lt;br /&gt;
   JobState=COMPLETED Reason=None Dependency=(null)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
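Since the output consists of whitespace-separated Key=Value pairs, any field can be extracted the same way. The sketch below works on one captured line; on the cluster you would pipe the scontrol output directly:&lt;br /&gt;

```shell
# One captured line of `scontrol show job` output
line='   JobState=RUNNING Reason=None Dependency=(null)'

# Put each Key=Value pair on its own line, then pick the value of JobState
state=$(echo "$line" | tr ' ' '\n' | grep '^JobState=' | cut -d= -f2)
echo "$state"   # → RUNNING
```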
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Cancel Slurm Jobs ==&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel).   &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
scancel is used to signal or cancel jobs, job arrays or job steps. The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Flag !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -i, --interactive&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Interactive mode.&lt;br /&gt;
| Cancel the job 987654 interactively. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -i 987654 &amp;lt;/pre&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| -t, --state&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Restrict the scancel operation to jobs in a certain state. &amp;lt;br&amp;gt; &amp;quot;job_state_name&amp;quot; may have a value of either &amp;quot;PENDING&amp;quot;, &amp;quot;RUNNING&amp;quot; or &amp;quot;SUSPENDED&amp;quot;.&lt;br /&gt;
| Cancel all jobs in state &amp;quot;PENDING&amp;quot;. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -t &amp;quot;PENDING&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Resource Managers =&lt;br /&gt;
=== Batch Job (Slurm) Variables ===&lt;br /&gt;
The following environment variables of Slurm are added to your environment once your job has started&lt;br /&gt;
&amp;lt;small&amp;gt;(only an excerpt of the most important ones)&amp;lt;/small&amp;gt;.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Environment !! Brief explanation&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_CPUS_PER_NODE &lt;br /&gt;
| Number of processes per node dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NODELIST &lt;br /&gt;
| List of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NUM_NODES &lt;br /&gt;
| Number of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_MEM_PER_NODE &lt;br /&gt;
| Memory per node dedicated to the job &lt;br /&gt;
|- &lt;br /&gt;
| SLURM_NPROCS&lt;br /&gt;
| Total number of processes dedicated to the job &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CLUSTER_NAME&lt;br /&gt;
| Name of the cluster executing the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CPUS_PER_TASK &lt;br /&gt;
| Number of CPUs requested per task&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ACCOUNT&lt;br /&gt;
| Account name &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ID&lt;br /&gt;
| Job ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_NAME&lt;br /&gt;
| Job Name&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_PARTITION&lt;br /&gt;
| Partition/queue running the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_UID&lt;br /&gt;
| User ID of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_SUBMIT_DIR&lt;br /&gt;
| Job submit folder.  The directory from which sbatch was invoked. &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_USER&lt;br /&gt;
| User name of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_RESTART_COUNT&lt;br /&gt;
| Number of times job has restarted&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_PROCID&lt;br /&gt;
| Task ID (MPI rank)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_NTASKS&lt;br /&gt;
| The total number of tasks available for the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_ID&lt;br /&gt;
| Job step ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_NUM_TASKS&lt;br /&gt;
| Task count (number of MPI ranks)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_CONSTRAINT&lt;br /&gt;
| Job constraints&lt;br /&gt;
|}&lt;br /&gt;
See also:&lt;br /&gt;
* [https://slurm.schedmd.com/sbatch.html#lbAI Slurm input and output environment variables]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Exit Codes ===&lt;br /&gt;
A job&#039;s exit code (also known as exit status, return code and completion code) is captured by SLURM and saved as part of the job record. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Any non-zero exit code will be assumed to be a job failure and will result in a Job State of FAILED with a reason of &amp;quot;NonZeroExitCode&amp;quot;.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The exit code is an 8 bit unsigned number ranging between 0 and 255. While it is possible for a job to return a negative exit code, SLURM will display it as an unsigned value in the 0 - 255 range.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Displaying Exit Codes and Signals ====&lt;br /&gt;
SLURM displays a job&#039;s exit code in the output of &#039;&#039;&#039;scontrol show job&#039;&#039;&#039; and in the sview utility.&lt;br /&gt;
&amp;lt;br&amp;gt; &lt;br /&gt;
When a signal was responsible for a job&#039;s or step&#039;s termination, the signal number is displayed after the exit code, separated by a colon (:).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
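The mapping between signals and exit codes follows the usual shell convention: a process killed by signal N terminates with status 128+N. A minimal, Slurm-independent sketch:&lt;br /&gt;

```shell
# A process terminated by signal N exits with status 128+N.
# Here a child shell sends SIGTERM (signal 15) to itself.
exit_code=0
sh -c 'kill -TERM $$' || exit_code=$?
echo "$exit_code"   # SIGTERM = 15, so this prints 143
```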
==== Capturing the Termination Status ====&lt;br /&gt;
Here is an example of how to capture and report a Slurm job&#039;s termination status in a typical job script. Note that &amp;lt;code&amp;gt;exit_code=$?&amp;lt;/code&amp;gt; must follow directly after the command whose status you want to capture.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
mpirun  -np &amp;lt;#cores&amp;gt;  &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; ... (options)  2&amp;gt;&amp;amp;1&lt;br /&gt;
exit_code=$?&lt;br /&gt;
[ &amp;quot;$exit_code&amp;quot; -eq 0 ] &amp;amp;&amp;amp; echo &amp;quot;all clean...&amp;quot; || \&lt;br /&gt;
   echo &amp;quot;Executable &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; finished with exit code ${exit_code}&amp;quot;&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Do not prefix mpirun with &#039;&#039;&#039;time&#039;&#039;&#039;! The captured exit code would then be the one returned by the first program in the chain (time), not by your executable.&lt;br /&gt;
* You do not need an explicit &#039;&#039;&#039;exit $exit_code&#039;&#039;&#039; at the end of the script.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[#top|Back to top]]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15667</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15667"/>
		<updated>2025-12-19T18:09:21Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: More on Lustre (3)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
[[File:Binac2 schema.png|600px|thumb|center|Overview on the BinAC 2 hardware architecture.]]&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers standard compute nodes, high-memory (SMP) nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gbit/s Ethernet (100GbE).&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR100 InfiniBand. In order to get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.  &lt;br /&gt;
The nodes, however, are equipped with VPI network cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of transport layers for transferring data and messages between processes. At startup, it tests all means of communication that were configured at compile time and tries to determine the fastest path between all processes. &lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100GbE Ethernet this fails, as no RDMA protocol is configured there. OpenMPI will fall back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;&amp;lt;/br&amp;gt;  &lt;br /&gt;
For single-node jobs, and for jobs on the regular compute nodes or the A30 and A100 GPU nodes: add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the Infiniband partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on Infiniband nodes as this will be the best choice between nodes!&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access, but offers snapshots of your files and a backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is immune to disk failures, but not to catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files that are in continued use, like source code, configuration files, executable programs, etc.; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O intense multinode-jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
The data in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is stored on HDDs. Its primary focus is capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the project(s) you are member of via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1 we will enforce work space lifetime, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
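The stage-in / compute / stage-out pattern for &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt; can be sketched as follows. Outside a real job, &amp;lt;code&amp;gt;mktemp&amp;lt;/code&amp;gt; stands in for the Slurm-provided scratch directory and for a workspace, so the snippet is self-contained:&lt;br /&gt;

```shell
# Sketch of the stage-in / compute / stage-out pattern for $TMPDIR.
# In a real job, Slurm sets TMPDIR to /scratch/[jobID]; here mktemp and a
# temporary "workspace" directory stand in so the sketch is self-contained.
TMPDIR="${TMPDIR:-$(mktemp -d)}"
workspace=$(mktemp -d)                      # stand-in for a real workspace
printf 'input data\n' > "$workspace/input.dat"

cp "$workspace/input.dat" "$TMPDIR/"        # stage input onto the fast local disk
cat "$TMPDIR/input.dat" | tr 'a-z' 'A-Z' > "$TMPDIR/output.dat"   # compute locally
cp "$TMPDIR/output.dat" "$workspace/"       # stage results back before the job ends

cat "$workspace/output.dat"
```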
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The logical volume presented to the user is formed from multiple physical or logical drives. Because data is distributed over more than one volume/hard drive, single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects) which are then stored on the object storage servers. The metadata servers store the entire directory structure along with the information which part of a file is stored on which object storage server, when it was last changed, etc. Think of the entries on the metadata server as pointers to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can in turn hold one or more so-called metadata targets (MDT) or object storage targets (OST), which can, e.g., simply be multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence easily be scaled by adding more servers.&lt;br /&gt;
&lt;br /&gt;
==== Useful Lustre Commands ====&lt;br /&gt;
Commands specific to the Lustre file system are divided into user commands (&amp;lt;code&amp;gt;lfs ...&amp;lt;/code&amp;gt;) and administrative commands (&amp;lt;code&amp;gt;lctl ...&amp;lt;/code&amp;gt;). On BinAC 2, users may only execute user commands, and not even all of those.&lt;br /&gt;
* &amp;lt;code&amp;gt;lfs help &amp;lt;command&amp;gt;&amp;lt;/code&amp;gt;: Print the built-in help for a command; alternative: &amp;lt;code&amp;gt;man lfs &amp;lt;command&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;lfs find&amp;lt;/code&amp;gt;: Drop-in replacement for the &amp;lt;code&amp;gt;find&amp;lt;/code&amp;gt; command; much faster on Lustre file systems as it talks directly to the metadata server&lt;br /&gt;
* &amp;lt;code&amp;gt;lfs --list-commands&amp;lt;/code&amp;gt;: Print a list of available commands&lt;br /&gt;
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them between the fast and the slow pool of the file system. Within the same file system, &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; only modifies the file metadata stored on the MDS, i.e. the path of the file in the directory tree; the stripes of the file on the OSS remain exactly where they were. The only result is the confusing situation that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to WORK OSTs. The pointers to the file objects only change if you either create a copy of the file at a different path (with &amp;lt;code&amp;gt;cp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt;, e.g.) or if you explicitly instruct Lustre to move the actual file objects to another storage location, e.g. another pool of the same file system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
* Copy the data - which will create new files -, then delete the old files. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; cp -ar /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Alternative to copy: use &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; to copy data between the workspace and the project directories. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; rsync -av /pfs/10/work/tu_abcde01-my-precious-ws/simulation/output /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* If there are many subfolders with similar size, you can use &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; to copy them in parallel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; find . -maxdepth 1 -mindepth 1 -type d -print | xargs -P4 -I{} rsync -aHAXW --inplace --update {} /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
will launch four parallel &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; processes at a time, each will copy one of the subdirectories.&lt;br /&gt;
* First move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or the wrapper &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; to actually migrate the file stripes. This is also a possible resolution if you already &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;ed data from &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; or vice versa.&lt;br /&gt;
** &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; is the raw lustre command. It can only operate on one file at a time, but offers access to all options.&lt;br /&gt;
** &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; is a versatile wrapper script that can work on single files or recursively on entire directories. If available, it will try to use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt;, otherwise it will fall back to &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; (see &amp;lt;code&amp;gt;lfs_migrate --help&amp;lt;/code&amp;gt; for all options.)&amp;lt;/br&amp;gt;&lt;br /&gt;
Example with &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; mv /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; cd /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; lfs find . -type f --pool work -0 | xargs -0 lfs migrate --pool project # find all files whose file objects are on the work pool and migrate the objects to the project pool&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Example with &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; mv /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; cd /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; lfs_migrate --yes -q -p project * # migrate all file objects in the current directory to the project pool, be quiet (-q) and do not ask for confirmation (--yes)&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Both migration commands can also be combined with options to restripe the files during migration, i.e. you can also change the number of OSTs a file is striped over, the size of a single stripe, etc.&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Attention!&amp;lt;/b&amp;gt; Both &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; change the path of the file(s), you must also &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; them! If used without &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, the files will still belong to the workspace although their file object stripes are now on the &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; pool and a subsequent &amp;lt;code&amp;gt;rm&amp;lt;/code&amp;gt; in the workspace will wipe them. &lt;br /&gt;
&lt;br /&gt;
All of the above procedures may take a considerable amount of time depending on the amount of data, so it might be advisable to execute them in a terminal multiplexer like &amp;lt;code&amp;gt;screen&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;tmux&amp;lt;/code&amp;gt; or wrap them into small SLURM jobs with &amp;lt;code&amp;gt;sbatch --wrap=&amp;quot;&amp;lt;command&amp;gt;&amp;quot;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Question&amp;lt;/b&amp;gt;:&amp;lt;/br&amp;gt; I have completely lost track. How do I find out where my files are located?&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Answer&amp;lt;/b&amp;gt;:&amp;lt;/br&amp;gt;&lt;br /&gt;
* Use &amp;lt;code&amp;gt;lfs find&amp;lt;/code&amp;gt; to find files on a specific pool. Example: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; lfs find . --pool project # recursively find all files in the current directory whose file objects are on the &amp;quot;project&amp;quot; pool&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Use &amp;lt;code&amp;gt;lfs getstripe&amp;lt;/code&amp;gt; to query the striping pattern and the pool (also works recursively if called with a directory). Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; lfs getstripe parameter.h &lt;br /&gt;
parameter.h&lt;br /&gt;
lmm_stripe_count:  1&lt;br /&gt;
lmm_stripe_size:   1048576&lt;br /&gt;
lmm_pattern:       raid0&lt;br /&gt;
lmm_layout_gen:    1&lt;br /&gt;
lmm_stripe_offset: 44&lt;br /&gt;
lmm_pool:          project&lt;br /&gt;
        obdidx           objid           objid           group&lt;br /&gt;
            44         7991938       0x79f282      0xd80000400&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
shows that the file is striped over OST 44 (&amp;lt;code&amp;gt;obdidx&amp;lt;/code&amp;gt;), which belongs to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; (&amp;lt;code&amp;gt;lmm_pool&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Why paths and storage pools should match:&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
There are four different possible scenarios with two subdirectories and two pools:&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. This will &amp;quot;leak&amp;quot; storage from the fast pool, making it unavailable for workspaces.&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. Access will be slow, and if (volatile) workspaces are purged, data residing on &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; will (voluntarily or involuntarily) be deleted.&lt;br /&gt;
The latter two situations may arise from &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;ing data between workspaces and project folders.&lt;br /&gt;
&lt;br /&gt;
==== More on data striping and how to influence it ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! The default striping patterns on BinAC2 are set for good reasons and should not be changed lightly!&amp;lt;/br&amp;gt; Doing so incorrectly will in the best case only hurt your own performance.&amp;lt;/br&amp;gt; In the worst case, it will also hurt all other users and endanger the stability of the cluster.&amp;lt;/br&amp;gt; Please talk to the admins first if you think that you need a non-default pattern.&amp;lt;/b&amp;gt;&lt;br /&gt;
* Reading striping patterns with &amp;lt;code&amp;gt;lfs getstripe&amp;lt;/code&amp;gt;&lt;br /&gt;
* Setting striping patterns with &amp;lt;code&amp;gt;lfs setstripe&amp;lt;/code&amp;gt; for new files and directories&lt;br /&gt;
* Restriping files with &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt;&lt;br /&gt;
* Progressive File Layout&lt;br /&gt;
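As an illustration only (the file and directory names here are placeholders, and you should consult the admins before deviating from the defaults), reading and setting a striping pattern could look like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; lfs getstripe -c -S largefile.dat   # query stripe count (-c) and stripe size (-S) of an existing file&lt;br /&gt;
$&amp;gt; lfs setstripe -c 4 -S 4M stripedir/ # new files created in stripedir/ will be striped over 4 OSTs with 4 MiB stripes&lt;br /&gt;
$&amp;gt; lfs migrate -c 4 largefile.dat      # restripe an existing file over 4 OSTs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Striping set on a directory only affects files created afterwards; existing files must be restriped with &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt;.&lt;br /&gt;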
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT Capacity: 31TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs can not be influenced by users.&lt;br /&gt;
* Default OST striping: Stripes have size 1 MiB. Files are striped over one OST if possible, i.e. all stripes of a file are on the same OST. New files are created on the most empty OST.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace.&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15666</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15666"/>
		<updated>2025-12-19T17:57:00Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: More on Lustre (2)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
[[File:Binac2 schema.png|600px|thumb|center|Overview on the BinAC 2 hardware architecture.]]&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR-100 InfiniBand (100 Gbit/s). In order to get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
BinAC2 has two (almost) separate networks, a 100 GbE Ethernet network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.  &lt;br /&gt;
The nodes, however, are equipped with VPI network cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of layers for transferring data and messages between processes. When it starts up, it tests all means of communication that were configured during compilation and tries to figure out the fastest path between all processes. &lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) communication channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100 GbE Ethernet this fails, as no RDMA protocol is configured there. OpenMPI then falls back to TCP transport, but not without complaining.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;&amp;lt;/br&amp;gt;  &lt;br /&gt;
For single-node jobs or on regular compute nodes, A30 and A100 GPU nodes: Add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the Infiniband partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on Infiniband nodes as this will be the best choice between nodes!&#039;&#039;&lt;br /&gt;
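A minimal single-node job script applying this workaround could look as follows (the program name and task count are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=64&lt;br /&gt;
# disable the OFI/openib transports on Ethernet-only nodes to silence the warning&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
mpirun ./my_mpi_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;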
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work is a parallel file system (PFS) which offers fast and parallel file access and a bigger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible for all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
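A common pattern (paths and the program name are examples only) is to stage input data into &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, compute there, and copy results back before the job ends:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
cp /pfs/10/work/$USER-myws/input.dat $TMPDIR/   # stage input onto the fast local SSD&lt;br /&gt;
cd $TMPDIR&lt;br /&gt;
./my_program input.dat &amp;gt; output.dat            # do the I/O-heavy work on local storage&lt;br /&gt;
cp output.dat /pfs/10/work/$USER-myws/          # save results; $TMPDIR is wiped when the job ends&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;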
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. immune against disk failures but not immune against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files in continual use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O intense multinode-jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the project(s) you are member of via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
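Put together (the workspace name here is an example), a typical workspace session could look like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; ws_allocate mywork 30&lt;br /&gt;
$&amp;gt; WS_DIR=$(ws_find mywork)   # absolute path of the workspace directory&lt;br /&gt;
$&amp;gt; cd $WS_DIR                 # run your computations here&lt;br /&gt;
$&amp;gt; ws_release --delete-data mywork   # when done, release the workspace and its data&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;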
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The entire logical volume presented to the user is formed by multiple physical or local drives. Data is distributed over more than one physical or logical volume/hard drive, so single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects) which are then stored on the object storage servers. The information which part of the file is stored where on which object storage server, when it was changed last etc. and the entire directory structure is stored on the metadata servers. Think of the entries on the metadata server as being pointers pointing to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can again hold one or more so-called object storage targets (OST) or metadata targets (MDT) which can e.g. be simply multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence be easily scaled by adding more servers.&lt;br /&gt;
&lt;br /&gt;
==== Useful Lustre Commands ====&lt;br /&gt;
Commands specific to the Lustre file system are divided into user commands (&amp;lt;code&amp;gt;lfs ...&amp;lt;/code&amp;gt;) and administrative commands (&amp;lt;code&amp;gt;lctl ...&amp;lt;/code&amp;gt;). On BinAC2, users may only execute the user commands, and not even all of those.&lt;br /&gt;
* &amp;lt;code&amp;gt;lfs help &amp;lt;command&amp;gt;&amp;lt;/code&amp;gt;: Print built-in help for command; Alternative: &amp;lt;code&amp;gt;man lfs &amp;lt;command&amp;gt;&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;lfs find&amp;lt;/code&amp;gt;: Drop-in replacement for the &amp;lt;code&amp;gt;find&amp;lt;/code&amp;gt; command, much faster on Lustre file systems as it directly talks to the metadata server&lt;br /&gt;
* &amp;lt;code&amp;gt;lfs --list-commands&amp;lt;/code&amp;gt;: Print a list of available commands&lt;br /&gt;
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them between the fast and the slow pool of the file system. Within the same file system, &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; only modifies the file metadata stored on the MDS, i.e. the path of the file in the directory tree, and makes the file available under a different name. The stripes of the file on the OSS remain exactly where they were; the pointers to the file objects stay identical. The only result is the confusing situation that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to WORK OSTs. The file objects only move if you either create a copy of the file at a different path (with &amp;lt;code&amp;gt;cp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt;, e.g.) or if you explicitly instruct Lustre to migrate the file objects to another storage location, e.g. another pool of the same file system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
* Copy the data - which will create new files -, then delete the old files. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; cp -ar /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Alternative to copy: use &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; to copy data between the workspace and the project directories. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; rsync -av /pfs/10/work/tu_abcde01-my-precious-ws/simulation/output /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* If there are many subfolders with similar size, you can use &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; to copy them in parallel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; find . -maxdepth 1 -mindepth 1 -type d -print | xargs -P4 -I{} rsync -aHAXW --inplace --update {} /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
will launch four parallel &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; processes at a time, each will copy one of the subdirectories.&lt;br /&gt;
* First move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or the wrapper &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; to actually migrate the file stripes. This is also a possible resolution if you already &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;ed data from &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt; to &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; or vice versa.&lt;br /&gt;
** &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; is the raw lustre command. It can only operate on one file at a time, but offers access to all options.&lt;br /&gt;
** &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; is a versatile wrapper script that can work on single files or recursively on entire directories. If available, it will try to use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt;, otherwise it will fall back to &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; (see &amp;lt;code&amp;gt;lfs_migrate --help&amp;lt;/code&amp;gt; for all options.)&amp;lt;/br&amp;gt;&lt;br /&gt;
Example with &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; mv /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; cd /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; lfs find . -type f --pool work -0 | xargs -0 lfs migrate --pool project # find all files whose file objects are on the work pool and migrate the objects to the project pool&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Example with &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; mv /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; cd /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; lfs_migrate --yes -q -p project * # migrate all file objects in the current directory to the project pool, be quiet (-q) and do not ask for confirmation (--yes)&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Both migration commands can also be combined with options to restripe the files during migration, i.e. you can also change the number of OSTs a file is striped over, the size of a single stripe, etc.&amp;lt;/br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;Attention!&amp;lt;/b&amp;gt; Both &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; change the path of the file(s), you must also &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; them! If used without &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, the files will still belong to the workspace although their file object stripes are now on the &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; pool and a subsequent &amp;lt;code&amp;gt;rm&amp;lt;/code&amp;gt; in the workspace will wipe them. &lt;br /&gt;
&lt;br /&gt;
All of the above procedures may take a considerable amount of time depending on the amount of data, so it might be advisable to execute them in a terminal multiplexer like &amp;lt;code&amp;gt;screen&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;tmux&amp;lt;/code&amp;gt; or wrap them into small SLURM jobs with &amp;lt;code&amp;gt;sbatch --wrap=&amp;quot;&amp;lt;command&amp;gt;&amp;quot;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Why paths and storage pools should match:&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
There are four different possible scenarios with two subdirectories and two pools:&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. This will &amp;quot;leak&amp;quot; storage from the fast pool, making it unavailable for workspaces.&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. Access will be slow, and when (volatile) workspaces are purged, the data residing on &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; will be deleted along with them.&lt;br /&gt;
The latter two situations may arise from &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;ing data between workspaces and project folders.&lt;br /&gt;
&lt;br /&gt;
==== More on data striping and how to influence it ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! The default striping patterns on BinAC2 are set for good reasons and should not be changed lightly!&amp;lt;/br&amp;gt; At best, a wrong pattern will only hurt your own performance.&amp;lt;/br&amp;gt; At worst, it will also hurt all other users and endanger the stability of the cluster.&amp;lt;/br&amp;gt; Please talk to the admins first if you think that you need a non-default pattern.&amp;lt;/b&amp;gt;&lt;br /&gt;
* Reading striping patterns with &amp;lt;code&amp;gt;lfs getstripe&amp;lt;/code&amp;gt;&lt;br /&gt;
* Setting striping patterns with &amp;lt;code&amp;gt;lfs setstripe&amp;lt;/code&amp;gt; for new files and directories&lt;br /&gt;
* Restriping files with &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt;&lt;br /&gt;
* Progressive File Layout&lt;br /&gt;
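For orientation, a few illustrative commands (the file and directory names are placeholders; reading a layout with &amp;lt;code&amp;gt;lfs getstripe&amp;lt;/code&amp;gt; is always safe, while setting one should only be done after consulting the admins):&lt;br /&gt;

```shell
# Show the full layout (stripe count, stripe size, pool, OST indices) of a file:
lfs getstripe myfile.dat

# Show only the stripe count:
lfs getstripe -c myfile.dat

# Only after consulting the admins: let new files created in mydir/
# be striped over 4 OSTs with 1 MiB stripes (existing files keep their layout):
lfs setstripe -c 4 -S 1M mydir/
```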
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT Capacity: 31TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs can not be influenced by users.&lt;br /&gt;
* Default OST striping: Stripes have size 1 MiB. Files are striped over one OST if possible, i.e. all stripes of a file are on the same OST. New files are created on the least-full OST.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace.&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15665</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15665"/>
		<updated>2025-12-19T16:23:28Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
[[File:Binac2 schema.png|600px|thumb|center|Overview on the BinAC 2 hardware architecture.]]&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gbit/s Ethernet (100GbE).&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 84 standard compute nodes are attached via HDR-100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
BinAC2 has two (almost) separate networks, a 100GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The nodes, however, are equipped with VPI network cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of transport layers for moving data and messages between processes. At startup, it tests all means of communication that were configured during compilation and tries to figure out the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics layer.&lt;br /&gt;
On nodes with 100Gb Ethernet this fails, as no RDMA protocol is configured there. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;&amp;lt;/br&amp;gt;  &lt;br /&gt;
For single-node jobs or on regular compute nodes, A30 and A100 GPU nodes: Add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the Infiniband partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on Infiniband nodes as this will be the best choice between nodes!&#039;&#039;&lt;br /&gt;
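For illustration, a minimal single-node job script with this workaround in place might look as follows (the resource values and the program name are placeholders):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --time=01:00:00

# Disable the OFI/openib transports to silence the OpenFabrics warning
# on Ethernet-only nodes:
export OMPI_MCA_btl="^ofi,openib"
export OMPI_MCA_mtl="^ofi"

mpirun ./my_mpi_program   # placeholder for your MPI application
```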
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work is a parallel file system (PFS) which offers fast and parallel file access and a bigger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible for all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files in continuous use, such as source code, configuration files, executable programs etc.; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O intense multinode-jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check the project(s) you are member of via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
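Put together, a typical workspace life cycle might look like this (the workspace name and the project path are placeholders):&lt;br /&gt;

```shell
# Allocate a workspace for 30 days and change into it:
ws_allocate mywork 30
cd "$(ws_find mywork)"

# ... run your computations here ...

# Afterwards, copy results you want to keep to your project directory,
# then release the workspace:
rsync -av results/ /pfs/10/project/bw10a001/tu_abcde01/results/
ws_release --delete-data mywork
```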
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
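The usual pattern is to stage input data into &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, compute on the fast local disk, and copy only the results back at the end of the job. A minimal sketch (the paths and the &amp;quot;computation&amp;quot; are placeholders; on the compute nodes, Slurm sets &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt; for you):&lt;br /&gt;

```shell
#!/bin/bash
set -e

WORKDIR=${WORKDIR:-$PWD}          # where input lives and results go (placeholder)
SCRATCH=${TMPDIR:-/tmp}/job.$$    # per-job scratch directory on the local disk
mkdir -p "$SCRATCH"

echo "input data" > "$WORKDIR/input.dat"   # placeholder input file

# Stage in: copy input from the parallel file system to local scratch
cp "$WORKDIR/input.dat" "$SCRATCH/"

# Placeholder "computation" running entirely on the fast local disk
cat "$SCRATCH/input.dat" | tr a-z A-Z > "$SCRATCH/output.dat"

# Stage out: copy only the results back, then clean up
cp "$SCRATCH/output.dat" "$WORKDIR/"
rm -rf "$SCRATCH"
```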
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The entire logical volume as presented to the user is formed by multiple physical or local drives. Because data is distributed over more than one physical or logical volume/hard drive, single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects) which are then stored on the object storage servers. The metadata servers store the entire directory structure along with the information about which part of a file is stored where on which object storage server, when it was last changed, etc. Think of the entries on the metadata server as pointers to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can again hold one or more so-called object storage targets (OST) or metadata targets (MDT) which can e.g. be simply multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence be easily scaled by adding more servers.&lt;br /&gt;
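As a toy illustration of how striping distributes data (plain shell arithmetic, not a Lustre command; the numbers are made up, and this only matters for files explicitly striped over several OSTs): with a stripe size of 1 MiB and a stripe count of 4, the stripes are laid out round-robin over the chosen OSTs.&lt;br /&gt;

```shell
stripe_size=$((1024 * 1024))   # 1 MiB stripe size
stripe_count=4                 # file striped over 4 OSTs

offset=$((5 * 1024 * 1024 + 42))        # an arbitrary byte offset in the file
stripe=$((offset / stripe_size))        # which stripe contains this byte
ost_index=$((stripe % stripe_count))    # which OST in the file's OST list holds it

echo "offset $offset lies in stripe $stripe on OST list index $ost_index"
```

Here, byte offset 5 MiB + 42 falls into stripe 5, which resides on the second OST of the file's list (index 1).&lt;br /&gt;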
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT Capacity: 31TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs can not be influenced by users.&lt;br /&gt;
* Default OST striping: Stripes have size 1 MiB. Files are striped over one OST if possible, i.e. all stripes of a file are on the same OST. New files are created on the least-full OST.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace which has advantages and disadvantages.&lt;br /&gt;
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them. Within the same file system, &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; only modifies the file metadata stored on the MDS, i.e. the path of the file in the directory tree; the stripes of the file on the OSS remain exactly where they were. The result is the confusing situation that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to file objects on WORK OSTs. The pointers to the file objects only change if you either create a copy of the file at a different path (with &amp;lt;code&amp;gt;cp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt;, e.g.) or explicitly instruct Lustre to move the actual file objects to another storage location, e.g. another pool of the same file system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
* Copy the data - which will create new files, then delete the old files. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; cp -ar /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Alternative to copy: use &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; to copy data between the workspace and the project directories. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; rsync -av /pfs/10/work/tu_abcde01-my-precious-ws/simulation/output /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* If there are many subfolders of similar size, you can use &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; to copy them in parallel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; find . -maxdepth 1 -mindepth 1 -type d -print | xargs -P4 -I{} rsync -aHAXW --inplace --update {} /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will launch up to four parallel &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; processes at a time; each copies one of the subdirectories.&lt;br /&gt;
* Move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or the &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; script to actually migrate the stripes to the target pool.&lt;br /&gt;
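A minimal sketch of the last variant, assuming the same example paths as above; &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; accepts the same layout options as &amp;lt;code&amp;gt;lfs setstripe&amp;lt;/code&amp;gt;, including &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; for the target pool. Please verify the available options on the system before running this on large directory trees:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Step 1: rename - this moves only the metadata, the stripes stay on the work pool&lt;br /&gt;
$&amp;gt; mv /pfs/10/work/tu_abcde01-my-precious-ws/output /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/&lt;br /&gt;
# Step 2: migrate the file objects to the project pool&lt;br /&gt;
$&amp;gt; lfs find /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/output -type f -print0 | xargs -0 -n1 lfs migrate -p project&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;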
&lt;br /&gt;
&amp;lt;b&amp;gt;Why paths and storage pools should match:&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
There are four different possible scenarios with two subdirectories and two pools:&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. This will &amp;quot;leak&amp;quot; storage from the fast pool, making it unavailable for workspaces.&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. Access will be slow, and when (volatile) workspaces are purged, the data residing on &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; will be deleted along with them.&lt;br /&gt;
The latter two situations may arise from &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;ing data between workspaces and project folders.&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15664</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15664"/>
		<updated>2025-12-19T15:06:33Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
[[File:Binac2 schema.png|600px|thumb|center|Overview on the BinAC 2 hardware architecture.]]&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, SMP (high-mem) nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 or 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR-100 InfiniBand (100 Gbit/s). In order to get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100 GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The network cards installed in the nodes, however, are VPI cards which can be configured to work in either mode ([https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications specifications]).&lt;br /&gt;
OpenMPI can use a number of transport layers for transferring data and messages between processes. At startup, it tests all means of communication that were configured at compile time and tries to figure out the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it first tries to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100 GbE, this fails as no RDMA protocol is configured. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;&amp;lt;/br&amp;gt;  &lt;br /&gt;
For single-node jobs, or on regular compute nodes and the A30 and A100 GPU nodes: add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the InfiniBand partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on InfiniBand nodes, as it is the best choice between nodes!&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems, one for the users&#039; home directories ($HOME) and one serving as project/work storage.&lt;br /&gt;
The home directory is limited in space and parallel access, but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast and parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid state disk (SSD), available via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files in ongoing use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O intense multinode-jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Data in the project directory is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which project you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a workspace you need to supply a name for your workspace area and a lifetime in days.&lt;br /&gt;
For more information read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of workspace &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
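A typical pattern, sketched here with placeholder paths and a hypothetical program name (&amp;lt;code&amp;gt;my_simulation&amp;lt;/code&amp;gt;), is to stage input data into &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, run the I/O-heavy computation there, and copy the results back to a workspace before the job ends:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
# stage input from the workspace to the fast node-local scratch&lt;br /&gt;
cp -r /pfs/10/work/tu_abcde01-my-precious-ws/input &amp;quot;$TMPDIR&amp;quot;/&lt;br /&gt;
cd &amp;quot;$TMPDIR&amp;quot;&lt;br /&gt;
# run the computation on local scratch (my_simulation is a placeholder)&lt;br /&gt;
./my_simulation --in input --out results&lt;br /&gt;
# copy results back; $TMPDIR is removed when the job ends&lt;br /&gt;
cp -r results /pfs/10/work/tu_abcde01-my-precious-ws/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;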
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The entire logical volume as presented to the user is formed by multiple physical or local drives. Since data is distributed over more than one physical or logical volume/hard drive, single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects) which are then stored on the object storage servers. The information about which part of a file is stored on which object storage server, when it was last changed, etc., together with the entire directory structure, is stored on the metadata servers. Think of the entries on the metadata server as pointers to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can in turn hold one or more so-called metadata targets (MDTs) or object storage targets (OSTs), which can, e.g., simply be multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence be easily scaled by adding more servers.&lt;br /&gt;
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT Capacity: 31TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs cannot be influenced by users.&lt;br /&gt;
* Default OST striping: stripes have a size of 1 MiB. Files are placed on a single OST if possible, i.e. all stripes of a file are on the same OST. New files are created on the OST with the most free space.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace which has advantages and disadvantages.&lt;br /&gt;
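To check which pool the objects of a file actually reside on, you can inspect its layout; the exact output format may differ between Lustre versions:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# print the pool referenced by the file layout (path is illustrative)&lt;br /&gt;
$&amp;gt; lfs getstripe --pool /pfs/10/work/tu_abcde01-my-precious-ws/output.dat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;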
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them. Only the file metadata stored on the MDS, i.e. the path to the file in the directory tree, is modified; the stripes of the file on the OSS remain exactly where they were. The result is that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to WORK OSTs. When using &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; within the same file system, Lustre only renames the files and makes them available from a different path; the pointers to the file objects on the OSS stay identical. This only changes if you either create a copy of the file at a different path (e.g. with &amp;lt;code&amp;gt;cp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt;) or explicitly instruct Lustre to move the actual file objects to another storage location, e.g. another pool of the same file system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
* Copy the data, which creates new files, then delete the old files. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; cp -ar /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Alternative to copy: use &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; to copy data between the workspace and the project directories. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; rsync -av /pfs/10/work/tu_abcde01-my-precious-ws/simulation/output /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* If there are many subfolders of similar size, you can use &amp;lt;code&amp;gt;xargs&amp;lt;/code&amp;gt; to copy them in parallel:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; find . -maxdepth 1 -mindepth 1 -type d -print | xargs -P4 -I{} rsync -aHAXW --inplace --update {} /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This will launch up to four parallel &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; processes at a time; each copies one of the subdirectories.&lt;br /&gt;
* Move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or the &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; script to actually migrate the stripes to the target pool.&lt;br /&gt;
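A minimal sketch of the last variant, assuming the same example paths as above; &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; accepts the same layout options as &amp;lt;code&amp;gt;lfs setstripe&amp;lt;/code&amp;gt;, including &amp;lt;code&amp;gt;-p&amp;lt;/code&amp;gt; for the target pool. Please verify the available options on the system before running this on large directory trees:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Step 1: rename - this moves only the metadata, the stripes stay on the work pool&lt;br /&gt;
$&amp;gt; mv /pfs/10/work/tu_abcde01-my-precious-ws/output /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/&lt;br /&gt;
# Step 2: migrate the file objects to the project pool&lt;br /&gt;
$&amp;gt; lfs find /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/output -type f -print0 | xargs -0 -n1 lfs migrate -p project&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;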
&lt;br /&gt;
&amp;lt;b&amp;gt;Why paths and storage pools should match:&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
There are four different possible scenarios with two subdirectories and two pools:&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. This will &amp;quot;leak&amp;quot; storage from the fast pool, making it unavailable for workspaces.&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. Access will be slow, and when (volatile) workspaces are purged, the data residing on &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; will be deleted along with them.&lt;br /&gt;
The latter two situations may arise from &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;ing data between workspaces and project folders.&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15663</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15663"/>
		<updated>2025-12-19T14:16:01Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
[[File:Binac2 schema.png|600px|thumb|center|Overview on the BinAC 2 hardware architecture.]]&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, SMP (high-mem) nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 84 standard compute nodes are connected via HDR-100 InfiniBand. To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100 GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The nodes, however, are equipped with VPI network cards that can be configured to operate in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of layers for transferring data and messages between processes. At startup, it tests all means of communication that were configured at compile time and tries to determine the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it first tries to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100 GbE, this fails because no RDMA protocol is configured there. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;&amp;lt;/br&amp;gt;  &lt;br /&gt;
For single-node jobs, or for jobs on the regular compute nodes and the A30 and A100 GPU nodes, add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the InfiniBand partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on InfiniBand nodes, as it is the best choice between nodes!&#039;&#039;&lt;br /&gt;
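&lt;br /&gt;
A minimal job script applying this workaround could look as follows (the resource requests and the program name are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=64&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
# Disable the OFI/openib transport layers on Ethernet-only nodes&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&lt;br /&gt;
mpirun ./my_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;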
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work storage.&lt;br /&gt;
The home directory is limited in capacity and parallel access, but offers snapshots and backups of your files.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files in ongoing use, like source codes, configuration files, executable programs etc.; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O intense multinode-jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Data under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is stored on HDDs. The primary focus of this storage is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which compute projects you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
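&lt;br /&gt;
A typical workflow combining these commands could look like this (the workspace name is an example):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ws_allocate myrun 30      # allocate a workspace for 30 days&lt;br /&gt;
$ cd $(ws_find myrun)       # change into the workspace&lt;br /&gt;
$ ws_list -a                # check name and remaining lifetime&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;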
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1 MiB), or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
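&lt;br /&gt;
A common staging pattern in a job script could look like this (workspace name, directories and program are illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Stage input data from a workspace to the fast node-local scratch&lt;br /&gt;
cp -r $(ws_find myrun)/input $TMPDIR/&lt;br /&gt;
cd $TMPDIR&lt;br /&gt;
&lt;br /&gt;
# Run the computation on the local scratch&lt;br /&gt;
./my_program input/ output/&lt;br /&gt;
&lt;br /&gt;
# Copy the results back before the job ends - $TMPDIR is removed afterwards&lt;br /&gt;
cp -r output $(ws_find myrun)/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;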
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The logical volume presented to the user is formed by multiple physical or local drives. Because data is distributed over more than one physical or logical volume/hard drive, single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects) which are then stored on the object storage servers. The information which part of the file is stored where on which object storage server, when it was changed last etc. and the entire directory structure is stored on the metadata servers. Think of the entries on the metadata server as being pointers pointing to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can again hold one or more so-called object storage targets (OST) or metadata targets (MDT) which can e.g. be simply multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence be easily scaled by adding more servers.&lt;br /&gt;
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT capacity: 31 TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs can not be influenced by users.&lt;br /&gt;
* Default OST striping: stripes have a size of 1 MiB. Files are striped over one OST if possible, i.e. all stripes of a file reside on the same OST. New files are created on the emptiest OST.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace which has advantages and disadvantages.&lt;br /&gt;
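&lt;br /&gt;
You can inspect where the objects of a file actually reside with the standard Lustre client tools, for example (the file path is an example):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ lfs getstripe /pfs/10/work/myws/data.bin   # show stripe layout and the OSTs holding the objects&lt;br /&gt;
$ lfs df -h /pfs/10                          # show capacity and usage per MDT/OST&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;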
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them. Only the file metadata, i.e. the path of the file in the directory tree (data stored on the MDS), is modified. The stripes of the file on the OSS remain exactly where they were: Lustre merely renames the file and makes it available under a different path, while the pointers to the file objects on the OSS stay identical. The result is the confusing situation that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to WORK OSTs. The file objects only move if you either create a copy of the file at a different path (with &amp;lt;code&amp;gt;cp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt;, for example) or explicitly instruct Lustre to migrate the file objects to another storage location, e.g. another pool of the same file system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
* Copy the data - which will create new files, then delete the old files. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; cp -ar /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Alternative to copy: use &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt; to copy data between the workspace and the project directories. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; rsync -av /pfs/10/work/tu_abcde01-my-precious-ws/simulation/output /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/simulation25/&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; to actually migrate the stripes.&lt;br /&gt;
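&lt;br /&gt;
For example (paths are placeholders; &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; operates on individual files, so for whole directory trees the &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; script is more convenient):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mv /pfs/10/work/myws/results /pfs/10/project/bw16f003/results     # moves only the metadata&lt;br /&gt;
$ lfs migrate -p project /pfs/10/project/bw16f003/results/data.bin  # moves the file objects to the slow pool&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;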
&lt;br /&gt;
&amp;lt;b&amp;gt;Why paths and storage pools should match:&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
There are four different possible scenarios with two subdirectories and two pools:&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. This will &amp;quot;leak&amp;quot; storage from the fast pool, making it unavailable for workspaces.&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. Access will be slow, and when (volatile) workspaces are purged, the data residing on &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; will be deleted along with them, whether intended or not.&lt;br /&gt;
The latter two situations may arise from moving data between workspaces and project folders with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15660</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15660"/>
		<updated>2025-12-18T18:45:55Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: More on Lustre&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
[[File:Binac2 schema.png|600px|thumb|center|Overview of the BinAC 2 hardware architecture.]]&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 84 standard compute nodes are connected via HDR-100 InfiniBand. To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&amp;lt;/br&amp;gt;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The nodes, however, use VPI network cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of layers for transferring data and messages between processes. At startup it tests all means of communication that were configured at compile time and then tries to figure out the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it first tries to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics Interfaces (OFI) layer.&lt;br /&gt;
On nodes with 100 Gb Ethernet this fails, as no RDMA protocol is configured there. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
For single-node jobs, or for jobs on the regular compute nodes and the A30 and A100 GPU nodes, add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the InfiniBand nodes (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on InfiniBand nodes, as it is the best choice for inter-node transport!&#039;&#039;&lt;br /&gt;
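For example, a minimal job script applying the workaround might look like this (node counts and the application name are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=mpi-job
#SBATCH --nodes=1
#SBATCH --ntasks=64

# Disable the OFI and openib transports so OpenMPI falls back to TCP
# on Ethernet-only nodes without the OpenFabrics warning
# ("^" means "everything except the listed components").
export OMPI_MCA_btl="^ofi,openib"
export OMPI_MCA_mtl="^ofi"

echo "btl=$OMPI_MCA_btl mtl=$OMPI_MCA_mtl"
# mpirun ./my_application   # placeholder for your MPI program
```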
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one holds the users&#039; home directories ($HOME), the other serves as project/work storage.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is immune against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files in ongoing use, such as source code, configuration files and executable programs; the content of home directories is backed up regularly.&lt;br /&gt;
Because backup space is limited, we enforce a quota of 40 GB on home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Data under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is stored on HDDs. The primary focus of this storage pool is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which compute project you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space, you will need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
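A common pattern is to stage input into &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt; at job start and copy results back at the end. A minimal sketch (the workspace path and the computation itself are placeholders; on the cluster you would obtain the path via `ws_find`):

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

# On BinAC 2, Slurm sets $TMPDIR to /scratch/<jobID>; the fallbacks below
# only make this sketch runnable outside the cluster.
TMPDIR=${TMPDIR:-/tmp/scratch-demo}
WORKSPACE=${WORKSPACE:-/tmp/ws-demo}   # on the cluster: WORKSPACE=$(ws_find mywork)
mkdir -p "$TMPDIR" "$WORKSPACE/input"

# 1. Stage input from the parallel file system to the fast node-local scratch.
cp -r "$WORKSPACE/input" "$TMPDIR/"

# 2. Run the computation on the local copy (stand-in for a real program).
( cd "$TMPDIR" && echo "computation finished" > results.out )

# 3. Copy results back before the job ends; $TMPDIR is purged afterwards.
cp "$TMPDIR/results.out" "$WORKSPACE/"
```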
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben (storage project), the export to BinAC 2 must first be enabled by the SDS@hd team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The entire logical volume as presented to the user is formed by multiple physical or logical drives. Because data is distributed over more than one volume/hard drive, single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects) which are then stored on the object storage servers. The information about which part of a file is stored on which object storage server, when it was last changed, etc., as well as the entire directory structure, is stored on the metadata servers. Think of the entries on the metadata server as pointers to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can again hold one or more so-called object storage targets (OST) or metadata targets (MDT) which can e.g. be simply multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence be easily scaled by adding more servers.&lt;br /&gt;
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT capacity: 31 TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs cannot be influenced by users.&lt;br /&gt;
* Default OST striping: Stripes have size 1 MiB. Files are striped over one OST if possible, i.e. all stripes of a file are on the same OST. New files are created on the most empty OST.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace which has advantages and disadvantages.&lt;br /&gt;
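The placement described above can be inspected with the standard Lustre client tools; a short sketch (the workspace path is a hypothetical example):

```shell
# Show pool, stripe count, stripe size and the OST indices holding a file.
lfs getstripe /pfs/10/work/tu_abcde01-my-ws/bigfile.dat

# Show the fill level of the OSTs backing the work directory; new files
# are created on the emptiest OST.
lfs df -h /pfs/10/work
```

These commands only read layout information and are safe to run at any time on the login nodes.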
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them. Within the same file system, &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; only modifies the file metadata stored on the MDS, i.e. the path of the file in the directory tree; the stripes of the file on the OSS remain exactly where they were. The result is the confusing situation that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to file objects on WORK OSTs. The pointers to the file objects only change if you either create a copy of the file at a different path (with &amp;lt;code&amp;gt;cp&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;rsync&amp;lt;/code&amp;gt;, e.g.) or explicitly instruct Lustre to move the actual file objects to another storage location, e.g. another pool of the same file system.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
* Copy the data - which will create new files, then delete the old files. Example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$&amp;gt; cp -ar /pfs/10/work/tu_abcde01-my-precious-ws/* /pfs/10/project/bw10a001/tu_abcde01/my-precious-research/.&lt;br /&gt;
$&amp;gt; rm -rf /pfs/10/work/tu_abcde01-my-precious-ws/*&lt;br /&gt;
$&amp;gt; ws_release --delete-data my-precious-ws&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; to actually migrate the stripes.&lt;br /&gt;
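As a sketch, the second approach could look like this (paths are placeholders; check `lfs help migrate` on the system, since the supported options depend on the installed Lustre version):

```shell
# 1. Rename: this moves only the metadata entry; the file objects
#    stay on the "work" pool.
mv /pfs/10/work/tu_abcde01-my-ws/data /pfs/10/project/bw10a001/data

# 2. Migrate the file objects to the matching pool. `lfs migrate`
#    accepts the same layout options as `lfs setstripe`, e.g. `-p <pool>`.
lfs migrate -p project /pfs/10/project/bw10a001/data/*

# For whole directory trees, the lfs_migrate wrapper script walks the tree:
# lfs_migrate -p project /pfs/10/project/bw10a001/data
```

Do not migrate files that are open in running jobs; the migration rewrites the file objects.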
&lt;br /&gt;
&amp;lt;b&amp;gt;Why paths and storage pools should match:&amp;lt;/b&amp;gt;&amp;lt;/br&amp;gt;&lt;br /&gt;
There are four different possible scenarios with two subdirectories and two pools:&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;good.&amp;lt;/b&amp;gt;&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. This will &amp;quot;leak&amp;quot; storage from the fast pool, making it unavailable for workspaces.&lt;br /&gt;
* File path in &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;, file objects on pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;: &amp;lt;b&amp;gt;bad&amp;lt;/b&amp;gt;. Access will be slow, and when (volatile) workspaces are purged, the data residing on &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; will be deleted along with them.&lt;br /&gt;
The latter two situations may arise from &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;ing data between workspaces and project folders.&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15659</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15659"/>
		<updated>2025-12-18T17:02:08Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Added a BinAC2 schematic&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
[[File:Binac2 schema.png|600px|thumb|center|Overview on the BinAC 2 hardware architecture.]]&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;br /&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 84 standard compute nodes are connected via HDR 100 InfiniBand. To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The nodes, however, use VPI network cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of layers for transferring data and messages between processes. At startup it tests all means of communication that were configured at compile time and then tries to figure out the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it first tries to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics Interfaces (OFI) layer.&lt;br /&gt;
On nodes with 100 Gb Ethernet this fails, as no RDMA protocol is configured there. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;&lt;br /&gt;
For single-node jobs, or for jobs on the regular compute nodes and the A30 and A100 GPU nodes, add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the InfiniBand nodes (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on InfiniBand nodes, as it is the best choice for inter-node transport!&#039;&#039;&lt;br /&gt;
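For example, a minimal job script applying the workaround might look like this (node counts and the application name are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=mpi-job
#SBATCH --nodes=1
#SBATCH --ntasks=64

# Disable the OFI and openib transports so OpenMPI falls back to TCP
# on Ethernet-only nodes without the OpenFabrics warning
# ("^" means "everything except the listed components").
export OMPI_MCA_btl="^ofi,openib"
export OMPI_MCA_mtl="^ofi"

echo "btl=$OMPI_MCA_btl mtl=$OMPI_MCA_mtl"
# mpirun ./my_application   # placeholder for your MPI program
```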
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one holds the users&#039; home directories ($HOME), the other serves as project/work storage.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;br /&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;br /&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;br /&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities, such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive], to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are used continually, such as source code, configuration files and executable programs; the content of home directories is backed up regularly.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and workspaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which project you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
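&lt;br /&gt;
A typical workflow combines these commands. The sketch below is only an illustration; the workspace name &amp;quot;myrun&amp;quot; is a placeholder:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# Allocate a workspace for 30 days; ws_allocate prints its path on stdout&lt;br /&gt;
WS_DIR=$(ws_allocate myrun 30)&lt;br /&gt;
&lt;br /&gt;
# In a later job script, look the path up again&lt;br /&gt;
WS_DIR=$(ws_find myrun)&lt;br /&gt;
cd &amp;quot;$WS_DIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# ... run computations ...&lt;br /&gt;
&lt;br /&gt;
# When the data is no longer needed on the fast storage, release the workspace&lt;br /&gt;
ws_release --delete-data myrun&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;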
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
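&lt;br /&gt;
A minimal job script sketch for staging data through the local scratch; the workspace name and file names are placeholders:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
# Stage input data from the workspace to the node-local scratch&lt;br /&gt;
WS_DIR=$(ws_find myrun)&lt;br /&gt;
cp &amp;quot;$WS_DIR/input.dat&amp;quot; &amp;quot;$TMPDIR/&amp;quot;&lt;br /&gt;
cd &amp;quot;$TMPDIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# ... run your I/O-heavy program on the local copy ...&lt;br /&gt;
&lt;br /&gt;
# Copy results back before the job ends; $TMPDIR is removed afterwards&lt;br /&gt;
cp results.dat &amp;quot;$WS_DIR/&amp;quot;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;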
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The entire logical volume as presented to the user is formed by multiple physical or local drives. Because data is distributed over more than one physical or logical volume/hard drive, single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects) which are then stored on the object storage servers. The information which part of the file is stored where on which object storage server, when it was changed last etc. and the entire directory structure is stored on the metadata servers. Think of the entries on the metadata server as being pointers pointing to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can again hold one or more so-called object storage targets (OST) or metadata targets (MDT) which can e.g. be simply multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence be easily scaled by adding more servers.&lt;br /&gt;
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT Capacity: 31TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs can not be influenced by users.&lt;br /&gt;
* Default OST striping: Stripes have size 1 MiB. Files are striped over one OST if possible, i.e. all stripes of a file are on the same OST. New files are created on the OST with the most free space.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace which has advantages and disadvantages.&lt;br /&gt;
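&lt;br /&gt;
Striping can be inspected, and adjusted for newly created files, with the standard Lustre tools; the paths below are placeholders:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# Show on which OSTs the stripes of a file are stored&lt;br /&gt;
lfs getstripe /pfs/10/work/&amp;lt;workspace&amp;gt;/data.bin&lt;br /&gt;
&lt;br /&gt;
# Stripe new files in this directory over 4 OSTs (existing files are unaffected)&lt;br /&gt;
lfs setstripe -c 4 /pfs/10/work/&amp;lt;workspace&amp;gt;/bigfiles/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;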
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them. Instead, only the file metadata, i.e. the path to the file in the directory tree (data stored on the MDS), will be modified. The stripes of the file on the OSS, however, will remain exactly where they were. The only result will be the confusing situation that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to WORK OSTs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&lt;br /&gt;
* Copy the data - which will create new files, then delete the old files&lt;br /&gt;
* Move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; to actually migrate the stripes&lt;br /&gt;
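&lt;br /&gt;
For example, the second approach could look like this; the file name and directory names are placeholders, and &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt; is the OST pool named above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# Move the metadata entry from WORK to PROJECT&lt;br /&gt;
mv /pfs/10/work/&amp;lt;workspace&amp;gt;/result.tar /pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&lt;br /&gt;
# Physically migrate the stripes to the slow pool&lt;br /&gt;
lfs migrate -p project /pfs/10/project/&amp;lt;project_id&amp;gt;/result.tar&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;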
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=File:Binac2_schema.png&amp;diff=15658</id>
		<title>File:Binac2 schema.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=File:Binac2_schema.png&amp;diff=15658"/>
		<updated>2025-12-18T16:56:23Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: A schematic drawing showing the nodes, partitions and connectivity of the BinAC2 cluster&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
A schematic drawing showing the nodes, partitions and connectivity of the BinAC2 cluster&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15654</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15654"/>
		<updated>2025-12-16T18:44:21Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Lustre stuff&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Milan 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;br /&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR-100 InfiniBand (100 Gbit/s). In order to get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&lt;br /&gt;
BinAC2 has two (almost) separate networks, a 100 GbE network and an InfiniBand network, both connecting a subset of the nodes. Both networks require different cables and switches.  &lt;br /&gt;
The nodes, however, are equipped with VPI network cards, which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of layers for transferring data and messages between processes. When it ramps up, it will test all means of communication that were configured during compilation and then tries to figure out the fastest path between all processes. &lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) communication channel using the OpenFabrics Interfaces (OFI) layer.&lt;br /&gt;
On nodes with 100Gb ethernet, this fails as there is no RDMA protocol configured. OpenMPI will fall back to TCP transport but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;  &lt;br /&gt;
For single-node jobs or on regular compute nodes, A30 and A100 GPU nodes: Add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the Infiniband partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on Infiniband nodes as this will be the best choice between nodes!&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid state disk, available via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File System Type&lt;br /&gt;
| NFS&lt;br /&gt;
| Lustre&lt;br /&gt;
| Lustre&lt;br /&gt;
| XFS&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;br /&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;br /&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;br /&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities, such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive], to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are used continually, such as source code, configuration files and executable programs; the content of home directories is backed up regularly.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and workspaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which project you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
# id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
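&lt;br /&gt;
A typical workspace session might look like this (the workspace name &amp;quot;myrun&amp;quot; is illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ ws_allocate myrun 30&lt;br /&gt;
$ cd $(ws_find myrun)&lt;br /&gt;
# ... run computations, move results to /pfs/10/project ...&lt;br /&gt;
$ ws_release --delete-data myrun&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;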
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1 MiB), or I/O patterns where files are opened and closed in rapid succession. The XFS file system on the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
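&lt;br /&gt;
This staging pattern can be sketched in a Slurm job script (paths, resource requests, and the program name are illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
# Stage input data from the workspace to the fast local scratch&lt;br /&gt;
cp &amp;quot;$(ws_find myrun)/input.dat&amp;quot; &amp;quot;$TMPDIR/&amp;quot;&lt;br /&gt;
cd &amp;quot;$TMPDIR&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Compute against local scratch instead of the Lustre file system&lt;br /&gt;
./my_program input.dat output.dat&lt;br /&gt;
&lt;br /&gt;
# Copy results back before the job ends; $TMPDIR is removed afterwards&lt;br /&gt;
cp output.dat &amp;quot;$(ws_find myrun)/&amp;quot;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;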
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;br /&gt;
&lt;br /&gt;
=== More Details on the Lustre File System ===&lt;br /&gt;
[https://www.lustre.org/ Lustre] is a distributed parallel file system.&lt;br /&gt;
* The entire logical volume as presented to the user is formed by multiple physical or local drives. Data is distributed over more than one physical or logical volume/hard drive, single files can be larger than the capacity of a single hard drive.&lt;br /&gt;
* The file system can be mounted from all nodes (&amp;quot;clients&amp;quot;) in parallel at the same time for reading and writing. &amp;lt;i&amp;gt;This also means that technically you can write to the same file from two different compute nodes! Usually, this will create an unpredictable mess! Never ever do this unless you know &amp;lt;b&amp;gt;exactly&amp;lt;/b&amp;gt; what you are doing!&amp;lt;/i&amp;gt;&lt;br /&gt;
* On a single server or client, the bandwidth of multiple network interfaces can be aggregated to increase the throughput (&amp;quot;multi-rail&amp;quot;).&lt;br /&gt;
&lt;br /&gt;
Lustre works by chopping files into many small parts (&amp;quot;stripes&amp;quot;, file objects), which are then stored on the object storage servers. The information about which part of a file is stored on which object storage server, when it was last changed, etc., together with the entire directory structure, is stored on the metadata servers. Think of the entries on the metadata server as pointers to the actual file objects on the object storage servers.&lt;br /&gt;
A Lustre file system can consist of many metadata servers (MDS) and object storage servers (OSS).&lt;br /&gt;
Each MDS or OSS can again hold one or more so-called object storage targets (OST) or metadata targets (MDT) which can e.g. be simply multiple hard drives.&lt;br /&gt;
The capacity of a Lustre file system can hence be easily scaled by adding more servers.&lt;br /&gt;
&lt;br /&gt;
==== Architecture of BinAC2&#039;s Lustre File System ====&lt;br /&gt;
&amp;lt;b&amp;gt;Metadata Servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 2 metadata servers&lt;br /&gt;
* 1 MDT per server&lt;br /&gt;
* MDT Capacity: 31TB, hardware RAID6 on NVMe drives (flash memory/SSD)&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Object storage servers:&amp;lt;/b&amp;gt;&lt;br /&gt;
* 8 object storage servers&lt;br /&gt;
* 2 fast OSTs per server&lt;br /&gt;
** 70 TB per OST, software RAID (raid-z2, 10+2 redundancy)&lt;br /&gt;
** NVMe drives, directly attached to the PCIe bus&lt;br /&gt;
* 8 slow OSTs per server&lt;br /&gt;
** 143 TB per OST, hardware RAID (RAID6, 8+2 redundancy)&lt;br /&gt;
** externally attached via SAS&lt;br /&gt;
* Networking: 2x 100 GbE, 2x HDR-100 InfiniBand&lt;br /&gt;
&lt;br /&gt;
* All fast OSTs are assigned to the pool &amp;lt;code&amp;gt;work&amp;lt;/code&amp;gt;&lt;br /&gt;
* All slow OSTs are assigned to the pool &amp;lt;code&amp;gt;project&amp;lt;/code&amp;gt;&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; are by default stored on the fast pool&lt;br /&gt;
* All files that are created under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; are by default stored on the slow pool&lt;br /&gt;
* Metadata is distributed over both MDTs. All subdirectories of a directory (workspace or project folder) are typically on the same MDT. Directory striping/placement on MDTs can not be influenced by users.&lt;br /&gt;
* Default OST striping: Stripes have a size of 1 MiB. Files are striped over one OST if possible, i.e. all stripes of a file are on the same OST. New files are created on the least-full OST.&lt;br /&gt;
Internally, the slow and the fast pool belong to the same Lustre file system and namespace which has advantages and disadvantages.&lt;br /&gt;
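&lt;br /&gt;
You can inspect on which pool and OSTs the stripes of a file actually reside with &amp;lt;code&amp;gt;lfs getstripe&amp;lt;/code&amp;gt; (the path is illustrative):&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ lfs getstripe /pfs/10/work/ws/myrun/result.dat&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;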
&lt;br /&gt;
==== Moving data between WORK and PROJECT ====&lt;br /&gt;
&amp;lt;b&amp;gt;!! IMPORTANT !!&amp;lt;/b&amp;gt; Calling &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt; on files will &amp;lt;i&amp;gt;not&amp;lt;/i&amp;gt; physically move them. Only the file metadata, i.e. the path to the file in the directory tree (stored on the MDS), will be modified. The stripes of the file on the OSS, however, will remain exactly where they were. The only result is the confusing situation that you now have metadata entries under &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that still point to WORK OSTs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Proper ways of moving data between the pools&amp;lt;/b&amp;gt;&lt;br /&gt;
* Copy the data (which creates new files on the target pool), then delete the old files&lt;br /&gt;
* Move the metadata with &amp;lt;code&amp;gt;mv&amp;lt;/code&amp;gt;, then use &amp;lt;code&amp;gt;lfs migrate&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;lfs_migrate&amp;lt;/code&amp;gt; to actually migrate the stripes&lt;br /&gt;
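&lt;br /&gt;
The second approach might look like this (the paths are illustrative; &amp;lt;code&amp;gt;-p project&amp;lt;/code&amp;gt; assigns the stripes to the slow pool described above):&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ mv /pfs/10/work/ws/myrun/result.dat /pfs/10/project/bw16f003/&lt;br /&gt;
$ lfs migrate -p project /pfs/10/project/bw16f003/result.dat&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;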
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
More reading:&lt;br /&gt;
* [https://doc.lustre.org/lustre_manual.xhtml The Lustre 2.X Manual] ([http://doc.lustre.org/lustre_manual.pdf PDF])&lt;br /&gt;
* [https://wiki.lustre.org/Main_Page The Lustre Wiki]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15653</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15653"/>
		<updated>2025-12-16T17:01:16Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR-100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
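&lt;br /&gt;
A minimal multi-node job script requesting the InfiniBand nodes might look like this (node and task counts and the program name are illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=64&lt;br /&gt;
#SBATCH --constraint=ib&lt;br /&gt;
&lt;br /&gt;
srun ./my_mpi_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;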
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&lt;br /&gt;
BinAC2 has two (almost) separate networks, a 100GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.  &lt;br /&gt;
The nodes&#039; network cards, however, are VPI cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of transport layers for transferring data and messages between processes. At startup, it tests all means of communication that were configured during compilation and tries to figure out the fastest path between all processes. &lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100Gb Ethernet, this fails as no RDMA protocol is configured there. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;  &lt;br /&gt;
For single-node jobs, or for jobs on regular compute nodes and the A30 and A100 GPU nodes: add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the Infiniband partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on Infiniband nodes as this will be the best choice between nodes!&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 comprises two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access, but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and the frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with several times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are in ongoing use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which project you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
Please store only data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you no longer need it on the fast storage.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first or use the &amp;lt;code&amp;gt;--delete-data&amp;lt;/code&amp;gt; option.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1 MiB), or I/O patterns where files are opened and closed in rapid succession. The XFS file system on the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15652</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15652"/>
		<updated>2025-12-16T16:57:45Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR-100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The nodes, however, are equipped with VPI network cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of transport layers for transferring data and messages between processes. At startup it probes all means of communication that were configured during compilation and then tries to determine the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100GbE, this fails because no RDMA protocol is configured there. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;  &lt;br /&gt;
For single-node jobs, and on the regular compute nodes and the A30 and A100 GPU nodes, add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the InfiniBand nodes (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on InfiniBand nodes, as it is the best choice between nodes there!&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work is a parallel file system (PFS) which offers fast and parallel file access and a bigger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible for all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid-state disk (SSD), available via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;br /&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade performance.&amp;lt;br /&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;br /&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are in continual use, like source code, configuration files, executable programs etc.; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Project data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which projects you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
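Taken together, the commands in the table form a simple workflow: allocate once, resolve the path, compute, release. A minimal sketch, assuming the workspace tools above are in your PATH; the workspace name &amp;quot;run42&amp;quot; is a placeholder:&lt;br /&gt;

```shell
# Allocate a workspace named "run42" for 30 days, resolve its absolute
# path, and use it as the working directory of a job.
# "run42" is a placeholder name, not a real workspace.
ws_allocate run42 30
WORKDIR=$(ws_find run42)
cd "$WORKDIR"

# ... run computations that write their output into $WORKDIR ...

# Once the data is no longer needed on the fast storage: move results
# to /pfs/10/project, remove the directory contents, then release:
# rm -r "$WORKDIR"/*
# ws_release run42
```

Note that extending with ws_extend restarts the lifetime clock from now, up to the five-extension limit stated in the table above.&lt;br /&gt;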
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1 MiB), or I/O patterns where files are opened and closed in rapid succession. The XFS file system on the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
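The stage-in / compute / stage-out pattern described above can be sketched as a small job-script fragment. This is only a sketch: on the cluster, the batch system sets $TMPDIR itself; the fallback directory, the demo input, the stand-in command, and the result path below are placeholders so the fragment can run anywhere:&lt;br /&gt;

```shell
#!/bin/bash
# Sketch of the stage-in / compute / stage-out pattern for $TMPDIR.
# On BinAC 2, the batch system sets $TMPDIR to /scratch/<jobID>;
# the fallback below exists only for running the sketch elsewhere.
TMPDIR="${TMPDIR:-$(mktemp -d)}"
RESULTDIR="${RESULTDIR:-$PWD/ws_results}"   # placeholder, e.g. a workspace path

echo "demo data" > input.dat                # stand-in for real input data
cp input.dat "$TMPDIR/"                     # stage in: input onto local scratch
cd "$TMPDIR" || exit 1

tr "a-z" "A-Z" < input.dat > results.out    # stand-in for the real computation

mkdir -p "$RESULTDIR"                       # stage out: $TMPDIR is deleted
cp results.out "$RESULTDIR/"                # when the job ends
```

Staging results out before the job ends is essential, since $TMPDIR and its contents are removed once the job finishes.&lt;br /&gt;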
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben (storage project), the export to BinAC 2 must first be enabled by the SDS@hd team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15651</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15651"/>
		<updated>2025-12-16T16:54:46Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.6&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE.&amp;lt;br /&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 84 of the standard compute nodes are connected via HDR 100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The nodes, however, are equipped with VPI network cards which can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of transport layers for transferring data and messages between processes. At startup it probes all means of communication that were configured during compilation and then tries to determine the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100GbE, this fails because no RDMA protocol is configured there. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;  &lt;br /&gt;
For single-node jobs, and on the regular compute nodes and the A30 and A100 GPU nodes, add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the InfiniBand nodes (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on InfiniBand nodes, as it is the best choice between nodes there!&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work is a parallel file system (PFS) which offers fast and parallel file access and a bigger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible for all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid-state disk (SSD), available via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;br /&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade performance.&amp;lt;br /&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;br /&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are in continual use, like source code, configuration files, executable programs etc.; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Project data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which projects you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
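Taken together, the commands in the table form a simple workflow: allocate once, resolve the path, compute, release. A minimal sketch, assuming the workspace tools above are in your PATH; the workspace name &amp;quot;run42&amp;quot; is a placeholder:&lt;br /&gt;

```shell
# Allocate a workspace named "run42" for 30 days, resolve its absolute
# path, and use it as the working directory of a job.
# "run42" is a placeholder name, not a real workspace.
ws_allocate run42 30
WORKDIR=$(ws_find run42)
cd "$WORKDIR"

# ... run computations that write their output into $WORKDIR ...

# Once the data is no longer needed on the fast storage: move results
# to /pfs/10/project, remove the directory contents, then release:
# rm -r "$WORKDIR"/*
# ws_release run42
```

Note that extending with ws_extend restarts the lifetime clock from now, up to the five-extension limit stated in the table above.&lt;br /&gt;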
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
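&lt;br /&gt;
A minimal job-script sketch for staging data through the local scratch (workspace name, file names, and program are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Stage input from the workspace to the fast local scratch&lt;br /&gt;
cp &amp;quot;$(ws_find mywork)/input.dat&amp;quot; &amp;quot;$TMPDIR/&amp;quot;&lt;br /&gt;
# Run the computation with all I/O on the local disk&lt;br /&gt;
cd &amp;quot;$TMPDIR&amp;quot;&lt;br /&gt;
./my_program input.dat &amp;gt; output.dat&lt;br /&gt;
# Copy results back before the job ends; $TMPDIR is removed afterwards&lt;br /&gt;
cp output.dat &amp;quot;$(ws_find mywork)/&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;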
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15642</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15642"/>
		<updated>2025-12-12T10:17:19Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: /* Network */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR 100 InfiniBand (100 Gb/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Question:&#039;&#039;&#039;&lt;br /&gt;
OpenMPI throws the following warning:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
No OpenFabrics connection schemes reported that they were able to be&lt;br /&gt;
used on a specific port.  As such, the openib BTL (OpenFabrics&lt;br /&gt;
support) will be disabled for this port.&lt;br /&gt;
  Local host:           node1-083&lt;br /&gt;
  Local device:         mlx5_0&lt;br /&gt;
  Local port:           1&lt;br /&gt;
  CPCs attempted:       rdmacm, udcm&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
[node1-083:2137377] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port&lt;br /&gt;
[node1-083:2137377] Set MCA parameter &amp;quot;orte_base_help_aggregate&amp;quot; to 0 to see all help / error messages&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
What should I do?&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039;&lt;br /&gt;
BinAC 2 has two (almost) separate networks, a 100 GbE network and an InfiniBand network, each connecting a subset of the nodes. The two networks require different cables and switches.&lt;br /&gt;
The nodes, however, are equipped with VPI network cards that can be configured to work in either mode (https://docs.nvidia.com/networking/display/connectx6vpi/specifications#src-2487215234_Specifications-MCX653105A-ECATSpecifications).&lt;br /&gt;
OpenMPI can use a number of layers for transferring data and messages between processes. At startup, it tests all means of communication that were configured during compilation and tries to determine the fastest path between all processes.&lt;br /&gt;
If OpenMPI encounters such a VPI card, it will first try to establish a Remote Direct Memory Access (RDMA) channel using the OpenFabrics (OFI) layer.&lt;br /&gt;
On nodes with 100 Gb Ethernet this fails, as no RDMA protocol is configured. OpenMPI falls back to TCP transport, but not without complaints.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Workaround:&#039;&#039;&#039;  &lt;br /&gt;
For single-node jobs, or for jobs on the regular compute nodes and the A30 and A100 GPU nodes: add the lines&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to your job script to disable the OFI transport layer. If you need high-bandwidth, low-latency transport between all processes on all nodes, switch to the Infiniband partition (&amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt;). &#039;&#039;Do not turn off the OFI layer on Infiniband nodes as this will be the best choice between nodes!&#039;&#039;&lt;br /&gt;
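&lt;br /&gt;
A minimal sketch of a single-node OpenMPI job script on an Ethernet node, applying the workaround above (module setup and binary name are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=64&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
# Disable the OFI/openib transports on Ethernet-only nodes&lt;br /&gt;
export OMPI_MCA_btl=&amp;quot;^ofi,openib&amp;quot;&lt;br /&gt;
export OMPI_MCA_mtl=&amp;quot;^ofi&amp;quot;&lt;br /&gt;
mpirun ./my_mpi_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;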
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid state disk, available via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files in active use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
Every project gets a dedicated directory located at:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/&amp;lt;project_id&amp;gt;/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can check which project you are a member of:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ id $USER | grep -o &#039;bw[^)]*&#039;&lt;br /&gt;
bw16f003&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In this case, your project directory would be:&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
/pfs/10/project/bw16f003/&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Check our [[BinAC2/Project_Data_Organization | data organization guide ]] for methods to organize data inside the project directory.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
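&lt;br /&gt;
A typical workspace lifecycle, sketched with the commands from the table above (the name &amp;quot;mywork&amp;quot; is an example):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ws_allocate mywork 30&lt;br /&gt;
$ cd $(ws_find mywork)&lt;br /&gt;
# ... run computations, extend the lifetime if still needed ...&lt;br /&gt;
$ ws_extend mywork 30&lt;br /&gt;
# ... move results to /pfs/10/project, clean up, then release ...&lt;br /&gt;
$ ws_release mywork&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;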
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/SLURM_Partitions&amp;diff=15300</id>
		<title>BinAC2/SLURM Partitions</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/SLURM_Partitions&amp;diff=15300"/>
		<updated>2025-09-18T07:42:40Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Added BinAC2 long partition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Partitions ==&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides four partitions for job submission.&lt;br /&gt;
Within a partition, job allocations are routed automatically to the most suitable compute node(s) for the requested resources (e.g. number of nodes and cores, memory, number and type of GPUs).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;gpu&amp;lt;/code&amp;gt; partition will only run 8 jobs per user at the same time. A user can only use 4 A100 and 8 A30 GPUs at the same time.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;interactive&amp;lt;/code&amp;gt; partition will only run 1 job per user at the same time.&lt;br /&gt;
This partition is dedicated to testing and to using tools via a graphical user interface.&lt;br /&gt;
The four nodes &amp;lt;code&amp;gt;node1-00[1-4]&amp;lt;/code&amp;gt; are exclusively reserved for this partition.&lt;br /&gt;
You can run a VNC server in this partition. Please use &amp;lt;code&amp;gt;#SBATCH --gres=display:1&amp;lt;/code&amp;gt; in your jobscript or &amp;lt;code&amp;gt;--gres=display:1&amp;lt;/code&amp;gt; on the command line if you need a display. This ensures that your job starts on a node with &amp;quot;free&amp;quot; displays, because each of the four nodes provides only 20 virtual displays.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;long&amp;lt;/code&amp;gt; partition is meant for long-running, parallel jobs. Please pack your jobs as dense as possible. If possible, do regular checkpointing in case the job fails after several days. Due to the small number of GPU nodes at BinAC2, we cannot offer a &amp;lt;code&amp;gt;long&amp;lt;/code&amp;gt; partition with GPU nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
All partitions are operated in shared mode, that is, jobs from different users can be executed on the same node. However, one can get exclusive access to compute nodes by using the &amp;quot;--exclusive&amp;quot; option.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Partition&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Node Access Policy&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Node Types&lt;br /&gt;
! style=&amp;quot;width:20%&amp;quot;| Default&lt;br /&gt;
! style=&amp;quot;width:20%&amp;quot;| Limits&lt;br /&gt;
|-&lt;br /&gt;
| compute (default)&lt;br /&gt;
| shared&lt;br /&gt;
| cpu&lt;br /&gt;
| ntasks=1, time=00:10:00, mem-per-cpu=1gb&lt;br /&gt;
| nodes=2, time=14-00:00:00&lt;br /&gt;
|-&lt;br /&gt;
| gpu&lt;br /&gt;
| shared&lt;br /&gt;
| gpu &lt;br /&gt;
| ntasks=1, time=00:10:00, mem-per-cpu=1gb&lt;br /&gt;
| time=14-00:00:00&amp;lt;/br&amp;gt;MaxJobsPerUser: 8&amp;lt;/br&amp;gt;MaxTRESPerUser:&amp;lt;/br&amp;gt;&amp;lt;pre&amp;gt;gres/gpu:a100=4,&lt;br /&gt;
gres/gpu:a30=8,&lt;br /&gt;
gres/gpu:h200=4&amp;lt;/pre&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| interactive&lt;br /&gt;
| shared&lt;br /&gt;
| cpu &lt;br /&gt;
| ntasks=1, time=00:10:00, mem-per-cpu=1gb&lt;br /&gt;
| time=10:00:00&amp;lt;/br&amp;gt;MaxJobsPerUser: 1&lt;br /&gt;
|-&lt;br /&gt;
| long&lt;br /&gt;
| shared&lt;br /&gt;
| cpu (InfiniBand nodes only) &lt;br /&gt;
| time=1-00:00:00, feature=ib&lt;br /&gt;
| time=30-00:00:00&amp;lt;/br&amp;gt;MaxNodes=10&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Parallel Jobs ===&lt;br /&gt;
&lt;br /&gt;
In order to submit parallel jobs to the InfiniBand part of the cluster, i.e., for fast inter-node communication, please select the appropriate nodes via the &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt; option in your job script. For less demanding parallel jobs, you may try the &amp;lt;code&amp;gt;--constraint=eth&amp;lt;/code&amp;gt; option, which utilizes 100Gb/s Ethernet instead of the low-latency 100Gb/s InfiniBand.&lt;br /&gt;
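&lt;br /&gt;
A minimal sketch of a multi-node MPI job script targeting the InfiniBand nodes (task counts, walltime, and binary name are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=64&lt;br /&gt;
#SBATCH --time=12:00:00&lt;br /&gt;
#SBATCH --constraint=ib&lt;br /&gt;
mpirun ./my_mpi_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;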
&lt;br /&gt;
=== GPU Jobs ===&lt;br /&gt;
&lt;br /&gt;
BinAC 2 provides different GPU models for computations. Please select the appropriate GPU type and number of GPUs with the &amp;lt;code&amp;gt;--gres=gpu:&amp;lt;type&amp;gt;:N&amp;lt;/code&amp;gt; option in your job script.&lt;br /&gt;
&lt;br /&gt;
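A minimal sketch of a GPU job script requesting a single A100 (walltime and binary name are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=gpu&lt;br /&gt;
#SBATCH --gres=gpu:a100:1&lt;br /&gt;
#SBATCH --time=04:00:00&lt;br /&gt;
./my_gpu_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;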
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:20%&amp;quot;| GPU&lt;br /&gt;
! style=&amp;quot;width:20%&amp;quot;| GPU Memory&lt;br /&gt;
! style=&amp;quot;width:20%&amp;quot;| # GPUs per Node [N]&lt;br /&gt;
! style=&amp;quot;width:20%&amp;quot;| Submit Option&lt;br /&gt;
|-&lt;br /&gt;
| Nvidia A30&lt;br /&gt;
| 24GB&lt;br /&gt;
| 2&lt;br /&gt;
| &amp;lt;code&amp;gt;--gres=gpu:a30:N&amp;lt;/code&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Nvidia A100&lt;br /&gt;
| 80GB&lt;br /&gt;
| 4&lt;br /&gt;
| &amp;lt;code&amp;gt;--gres=gpu:a100:N&amp;lt;/code&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Nvidia H200&lt;br /&gt;
| 141GB&lt;br /&gt;
| 4&lt;br /&gt;
| &amp;lt;code&amp;gt;--gres=gpu:h200:N&amp;lt;/code&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15290</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15290"/>
		<updated>2025-09-16T07:53:24Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 168 / 12 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80 / 2.95&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;/br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR 100 InfiniBand (100 Gb/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid-state disk (SSD), available via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and the frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with several times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files that are in continuous use, such as source code, configuration files and executable programs; the content of home directories is backed up regularly.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003) and is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Data on the fast storage pool at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs.&lt;br /&gt;
The primary focus is speed, not capacity.&lt;br /&gt;
&lt;br /&gt;
In contrast to BinAC 1, we will enforce workspace lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user should create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
&lt;br /&gt;
You can find more info on workspace tools on our general page:&lt;br /&gt;
&lt;br /&gt;
:: &amp;amp;rarr; &#039;&#039;&#039;[[Workspace]]s&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
To create a work space you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
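&lt;br /&gt;
Put together, a typical workspace lifecycle might look like the following sketch. The workspace name &amp;quot;mywork&amp;quot; is an example; the fallback branch is only there so the sketch also runs on machines without the workspace tools.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch of a typical workspace lifecycle, using the commands from the table
# above. The workspace name "mywork" is an example; the fallback branch only
# exists so the sketch also runs on machines without the workspace tools.
WS_NAME=mywork
if command -v ws_allocate >/dev/null 2>/dev/null; then
    ws_allocate "$WS_NAME" 30           # allocate for 30 days
    WS_DIR=$(ws_find "$WS_NAME")        # absolute path of the workspace
else
    WS_DIR=$(mktemp -d)                 # stand-in directory off-cluster
fi
echo "working in: $WS_DIR"
# ... run computations that read and write files in $WS_DIR ...
# Move results you want to keep to /pfs/10/project, then release the space:
# ws_release "$WS_NAME"
```
On BinAC 2 itself, &amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt; then shows the allocated workspace and its remaining lifetime.&lt;br /&gt;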
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M), or I/O patterns where files are opened and closed in rapid succession. The XFS file system on the local scratch drives is better suited to typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
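&lt;br /&gt;
A minimal staging pattern for the local scratch space might look like this sketch. Inside a batch job, $TMPDIR is set by the batch system; the mktemp fallback and the file names are only for illustration, and &amp;lt;code&amp;gt;rev&amp;lt;/code&amp;gt; stands in for your real computation.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: stage input to node-local scratch, compute there, copy results back.
# $TMPDIR is provided inside a batch job; mktemp is a stand-in for illustration.
SCRATCH=${TMPDIR:-$(mktemp -d)}
SUBMIT_DIR=${SLURM_SUBMIT_DIR:-$PWD}

echo "example input" > "$SUBMIT_DIR/input.dat"    # placeholder input file
cp "$SUBMIT_DIR/input.dat" "$SCRATCH/"

cd "$SCRATCH"
rev input.dat > output.dat        # stand-in for the real computation

cp output.dat "$SUBMIT_DIR/"      # copy results back before the job ends
cd "$SUBMIT_DIR"
```
The copy-back step matters: $TMPDIR is removed when the job ends, so results left there are lost.&lt;br /&gt;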
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15273</id>
		<title>BinAC2/Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15273"/>
		<updated>2025-09-08T07:42:25Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: updated GPU section&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= General information about Slurm =&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwForCluster BinAC 2 requires the user to define the calculation as a single command or a sequence of commands, together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload managing software. BinAC 2 runs the workload manager Slurm, so every job submission is performed with Slurm commands. Slurm queues and runs user jobs based on fair-share policies.&lt;br /&gt;
&lt;br /&gt;
= External Slurm documentation =&lt;br /&gt;
&lt;br /&gt;
You can find the official Slurm documentation and some other material here:&lt;br /&gt;
&lt;br /&gt;
* Slurm documentation: https://slurm.schedmd.com/documentation.html&lt;br /&gt;
* Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
* Slurm tutorials: https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
= SLURM terminology = &lt;br /&gt;
&lt;br /&gt;
SLURM knows and mirrors the division of the cluster into &#039;&#039;&#039;nodes&#039;&#039;&#039; with several &#039;&#039;&#039;cores&#039;&#039;&#039;. When queuing &#039;&#039;&#039;jobs&#039;&#039;&#039;, there are several ways of requesting resources and it is important to know which term means what in SLURM. Here are some basic SLURM terms:&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Job&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Job&lt;br /&gt;
: A job is a self-contained computation that may encompass multiple tasks and is given specific resources like individual CPUs/GPUs, a specific amount of RAM or entire nodes. These resources are said to have been allocated for the job.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Task&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Task&lt;br /&gt;
: A task is a single run of a single process. By default, one task is run per node and one CPU is assigned per task.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Partition&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Partition    &lt;br /&gt;
: A partition (usually called queue outside SLURM) is a waiting line in which jobs are put by users.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Socket&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Socket    &lt;br /&gt;
: Receptacle on the motherboard for one physically packaged processor (each of which can contain one or more cores).&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Core&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Core    &lt;br /&gt;
: A complete private set of registers, execution units, and retirement queues needed to execute programs.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Thread&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Thread    &lt;br /&gt;
: One or more hardware contexts within a single core. Each thread has the attributes of one core and is managed &amp;amp; scheduled as a single logical processor by the OS.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;CPU&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;CPU&lt;br /&gt;
: A &#039;&#039;&#039;CPU&#039;&#039;&#039; in Slurm means a &#039;&#039;&#039;single core&#039;&#039;&#039;. This is different from the more common terminology, where a CPU (a microprocessor chip) consists of multiple cores. Slurm uses the term &#039;&#039;&#039;sockets&#039;&#039;&#039; when talking about CPU chips. Depending upon system configuration, a CPU can be either a &#039;&#039;&#039;core&#039;&#039;&#039; or a &#039;&#039;&#039;thread&#039;&#039;&#039;. On BinAC 2, &#039;&#039;&#039;hyperthreading is activated on every machine&#039;&#039;&#039;. This means that the operating system and Slurm see each physical core as two logical cores.&lt;br /&gt;
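&lt;br /&gt;
To illustrate the consequence: with hyperthreading, an allocation of N Slurm CPUs corresponds to N/2 physical cores. The sketch below does the arithmetic with an assumed allocation size; inside a real job, SLURM_CPUS_PER_TASK is set by Slurm.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: converting Slurm's logical-CPU count to physical cores on BinAC 2.
# SLURM_CPUS_PER_TASK is set by Slurm inside a job; 8 is an assumed example.
LOGICAL_CPUS=${SLURM_CPUS_PER_TASK:-8}
THREADS_PER_CORE=2               # hyperthreading: 2 logical CPUs per core
PHYSICAL_CORES=$(( LOGICAL_CPUS / THREADS_PER_CORE ))
echo "$LOGICAL_CPUS logical CPUs correspond to $PHYSICAL_CORES physical cores"
```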
&lt;br /&gt;
= Slurm Commands =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/salloc.html salloc] || Requests resources for an interactive job&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sstat.html sstat] || Displays status information about a running job&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
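&lt;br /&gt;
As a quick sketch of how these commands combine in practice (the job ID 12345 is a placeholder; the guard makes the sketch a no-op on machines without Slurm):&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Sketch: everyday monitoring calls around a job (job ID 12345 is a placeholder).
JOBID=12345
if command -v squeue >/dev/null 2>/dev/null; then
    squeue -u "$USER"            # your queued and running jobs
    scontrol show job "$JOBID"   # detailed state of one job
    sstat -j "$JOBID"            # live resource usage of a running job
    scancel "$JOBID"             # cancel the job
else
    echo "Slurm commands are only available on the cluster"
fi
```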
&lt;br /&gt;
== Interactive Jobs ==&lt;br /&gt;
&lt;br /&gt;
You can run interactive jobs for testing and developing your job scripts.&lt;br /&gt;
Several nodes are reserved for interactive work, so your jobs should start right away.&lt;br /&gt;
You can only submit one job to this partition at a time. A job can run for up to 10 hours (about one workday).&lt;br /&gt;
&lt;br /&gt;
This example command gives you 16 cores and 128 GB of memory for four hours on one of the reserved nodes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
salloc --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also use srun to request the same resources:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
srun --partition=interactive --time=4:00:00 --cpus-per-task=16 --mem=128gb --pty bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Job Submission : sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted by using the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, the start of the batch job depends on the availability of the requested resources and the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== sbatch Command Parameters ===&lt;br /&gt;
The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used from the command line or in your job script. The following table shows the syntax and provides examples for each option.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! Command line&lt;br /&gt;
! Job Script&lt;br /&gt;
! Purpose&lt;br /&gt;
! Example&lt;br /&gt;
! Default value&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t &#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;  or  &amp;lt;code&amp;gt;--time=&#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t 2:30:00&amp;lt;/code&amp;gt; Limits run time to 2h 30 min.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;-t 2-12&amp;lt;/code&amp;gt; Limits run time to 2 days and 12 hours.&lt;br /&gt;
| Depends on Slurm partition.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N &#039;&#039;count&#039;&#039;  or  --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
| &amp;lt;code&amp;gt;-N 1&amp;lt;/code&amp;gt; Run job on one node.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;-N 2&amp;lt;/code&amp;gt; Run job on two nodes (have to use MPI!)&lt;br /&gt;
| &lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n &#039;&#039;count&#039;&#039;  or  --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
| &amp;lt;code&amp;gt;-n 2&amp;lt;/code&amp;gt; launch two tasks in the job.&lt;br /&gt;
| One task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&amp;lt;br&amp;gt;(Replaces the option &amp;lt;code&amp;gt;ppn&amp;lt;/code&amp;gt; of MOAB.)&lt;br /&gt;
| &amp;lt;code&amp;gt;--ntasks-per-node=2&amp;lt;/code&amp;gt; Run 2 tasks per node&lt;br /&gt;
| 1 task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c &#039;&#039;count&#039;&#039; or --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
| &amp;lt;code&amp;gt;-c 2&amp;lt;/code&amp;gt; Request two CPUs per (MPI-)task.&lt;br /&gt;
| 1 CPU per (MPI-)task&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=&amp;lt;size&amp;gt;[units]&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory per node. Without a unit suffix the value is interpreted as megabytes.&amp;lt;/br&amp;gt;&amp;lt;code&amp;gt;[units]&amp;lt;/code&amp;gt; can be one of &amp;lt;code&amp;gt;[K&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;M&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;G&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;T]&amp;lt;/code&amp;gt;.&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=10g&amp;lt;/code&amp;gt; Request 10GB RAM per node &amp;lt;/br&amp;gt; &amp;lt;code&amp;gt;--mem=0&amp;lt;/code&amp;gt; Request all memory on node&lt;br /&gt;
| Depends on Slurm configuration.&amp;lt;/br&amp;gt;It is better to specify &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; in every case.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum memory required per allocated CPU.&amp;lt;br&amp;gt;(Replaces the option pmem of MOAB. You should omit &amp;lt;br&amp;gt; setting this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state&amp;lt;br&amp;gt;changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J &#039;&#039;name&#039;&#039; or --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission &amp;lt;br&amp;gt; environment are propagated to the launched application. Default &amp;lt;br&amp;gt; is ALL. If adding an environment variable to the submission&amp;lt;br&amp;gt; environment is intended, the argument ALL must be added.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A &#039;&#039;group-name&#039;&#039; or --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Change resources used by this job to specified group. You may &amp;lt;br&amp;gt; need this option if your account is assigned to more &amp;lt;br&amp;gt; than one group. By command &amp;quot;scontrol show job&amp;quot; the project &amp;lt;br&amp;gt; group the job is accounted on can be seen behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p &#039;&#039;queue-name&#039;&#039; or --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C &#039;&#039;LSDF&#039;&#039; or --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF Filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== sbatch --partition  &#039;&#039;queues&#039;&#039; ====&lt;br /&gt;
Queue classes define the maximum resources per queue of the compute system, such as walltime, nodes and processes per node. Details can be found here:&lt;br /&gt;
* [[BinAC2/SLURM_Partitions|BinAC 2 partitions]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== sbatch Examples ===&lt;br /&gt;
&lt;br /&gt;
If you are coming from Moab/Torque on BinAC 1, or if you are new to HPC/Slurm, the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; options may confuse you. The following examples give an orientation on how to run typical workloads on BinAC 2.&lt;br /&gt;
&lt;br /&gt;
You can find every file mentioned on this Wiki page on BinAC 2 at: &amp;lt;code&amp;gt;/pfs/10/project/examples&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Serial Programs ====&lt;br /&gt;
When you run serial programs that use only one process, you can omit most of the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; parameters, as the default values are sufficient.&lt;br /&gt;
&lt;br /&gt;
Suppose you want to submit a serial job that runs the script &amp;lt;code&amp;gt;serial_job.sh&amp;lt;/code&amp;gt; and requires 5000 MB of main memory and 10 minutes of wall clock time. Slurm will allocate one &#039;&#039;&#039;physical&#039;&#039;&#039; core to your job.&lt;br /&gt;
&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute -t 10:00 --mem=5000m  serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) add after the initial line of your script &#039;&#039;&#039;serial_job.sh&#039;&#039;&#039; the lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --job-name=simple-serial-job&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
and execute the modified script with the command line option &#039;&#039;--partition=compute&#039;&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=compute serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that sbatch command line options overrule script options.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded programs run multiple threads within one process and share resources such as memory.&amp;lt;br&amp;gt;&lt;br /&gt;
You may use a program that includes a built-in option for multithreading (e.g., options like &amp;lt;code&amp;gt;--threads&amp;lt;/code&amp;gt;).&amp;lt;br&amp;gt;&lt;br /&gt;
For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) the number of threads is defined by the environment variable &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;. By default, this variable is set to 1 (&amp;lt;code&amp;gt;OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Important:&#039;&#039;&#039; Hyperthreading is activated on bwForCluster BinAC 2. Hyperthreading can be beneficial for some applications and codes, but it can also degrade performance in other cases. We therefore recommend running a small test job with and without hyperthreading to determine the best choice.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;a) Program with built-in multithreading option&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This example uses the common bioinformatics software &amp;lt;code&amp;gt;samtools&amp;lt;/code&amp;gt; to demonstrate built-in multithreading.&lt;br /&gt;
&lt;br /&gt;
The module &amp;lt;code&amp;gt;bio/samtools/1.21&amp;lt;/code&amp;gt; provides an example jobscript that requests 4 CPUs and runs &amp;lt;code&amp;gt;samtools sort&amp;lt;/code&amp;gt; with 4 threads.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --time=19:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
[...]&lt;br /&gt;
samtools sort -@ 4 sample.bam -o sample.sorted.bam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can use the example jobscript with this command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch /opt/bwhpc/common/bio/samtools/1.21/bwhpc-examples/binac2-samtools-1.21-bwhpc-examples.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;b) OpenMP&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will run an example OpenMP Hello World program. The jobscript looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --time=1:00&lt;br /&gt;
#SBATCH --mem=5000m   &lt;br /&gt;
#SBATCH -J OpenMP-Hello-World&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$(( SLURM_JOB_CPUS_PER_NODE / 2 ))  # one thread per physical core&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Executable running on ${SLURM_JOB_CPUS_PER_NODE} cores with ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run parallel Hello World&lt;br /&gt;
/pfs/10/project/examples/openmp_hello_world&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit the job to the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition and get the output (in the stdout-file)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --partition=compute /pfs/10/project/examples/openmp_hello_world.sh&lt;br /&gt;
&lt;br /&gt;
Executable  running on 4 cores with 4 threads&lt;br /&gt;
Hello from process: 0&lt;br /&gt;
Hello from process: 2&lt;br /&gt;
Hello from process: 1&lt;br /&gt;
Hello from process: 3&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI-jobs on batch nodes, generate a wrapper script &amp;lt;code&amp;gt;mpi_hello_world.sh&amp;lt;/code&amp;gt; for &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
#SBATCH --cpus-per-task=2&lt;br /&gt;
#SBATCH --mem-per-cpu=2000&lt;br /&gt;
#SBATCH --time=05:00&lt;br /&gt;
&lt;br /&gt;
# Load the MPI implementation of your choice&lt;br /&gt;
module load mpi/openmpi/4.1-gnu-14.2&lt;br /&gt;
&lt;br /&gt;
# Run your MPI program&lt;br /&gt;
mpirun --bind-to core --map-by core --report-bindings mpi_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add the mpirun option &amp;lt;code&amp;gt;-n &amp;lt;number_of_processes&amp;gt;&amp;lt;/code&amp;gt; or any other option defining the number of processes or nodes, since Slurm instructs mpirun about the number of processes and the node hostnames.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Always&#039;&#039;&#039; use the mpirun options &amp;lt;code&amp;gt;--bind-to core&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--map-by core|socket|node&amp;lt;/code&amp;gt;.&lt;br /&gt;
Type &amp;lt;code&amp;gt;man mpirun&amp;lt;/code&amp;gt; for an explanation of the different values of the &amp;lt;code&amp;gt;--map-by&amp;lt;/code&amp;gt; option.&lt;br /&gt;
&lt;br /&gt;
The above jobscript runs four OpenMPI tasks, distributed across two nodes. Because of hyperthreading you have to set &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, so that each MPI task gets one physical core. If you omit &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, MPI will fail.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Not all compute nodes are connected via Infiniband. Tell Slurm you want Infiniband via &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt; when submitting or add &amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt; to your jobscript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --constraint=ib /pfs/10/project/examples/mpi_hello_world.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will run a simple Hello World program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
Hello world from processor node2-031, rank 3 out of 4 processors&lt;br /&gt;
Hello world from processor node2-031, rank 2 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 1 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 0 out of 4 processors&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded + MPI parallel Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded + MPI parallel programs run faster than serial programs on multiple CPUs with multiple cores. All threads of one process share resources such as memory. In contrast, MPI tasks do not share memory but can be spawned over different nodes. &#039;&#039;&#039;Because hyperthreading is enabled on BinAC 2, the option --cpus-per-task (-c) must be set to 2*n if you want to use n threads.&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
===== OpenMPI with Multithreading =====&lt;br /&gt;
Multiple MPI tasks using &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; must be launched with &#039;&#039;&#039;mpirun&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP), the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;For OpenMPI&#039;&#039;&#039;, a jobscript &#039;&#039;job_ompi_omp.sh&#039;&#039; that runs an MPI program &#039;&#039;ompi_omp_program&#039;&#039; with 4 tasks, 28 threads per task, and 3000 MB of physical memory per thread (28 threads per MPI task give 28 * 3000 MB = 84000 MB per MPI task), with a total wall clock time of 3 hours, looks like this:&lt;br /&gt;
&amp;lt;!--b)--&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=4&lt;br /&gt;
#SBATCH --cpus-per-task=56&lt;br /&gt;
#SBATCH --time=03:00:00&lt;br /&gt;
#SBATCH --mem=83gb    # 84000 MB / 1024 = 82.03 GB, rounded up to 83 GB&lt;br /&gt;
#SBATCH --export=ALL,MPI_MODULE=mpi/openmpi/3.1,EXECUTABLE=./ompi_omp_program&lt;br /&gt;
#SBATCH --output=&amp;quot;parprog_hybrid_%j.out&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
# Use when a defined module environment related to OpenMPI is wished&lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_CPUS_PER_TASK}/2))&lt;br /&gt;
export MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by socket:PE=${OMP_NUM_THREADS} -report-bindings&amp;quot;&lt;br /&gt;
export NUM_CORES=$((${SLURM_NTASKS}*${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXECUTABLE} running on ${NUM_CORES} cores with ${SLURM_NTASKS} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpirun -n ${SLURM_NTASKS} ${MPIRUN_OPTIONS} ${EXECUTABLE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_ompi_omp.sh&#039;&#039;&#039; with the sbatch command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute ./job_ompi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* With the mpirun option &#039;&#039;--bind-to core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;--map-by node:PE=&amp;lt;value&amp;gt;&#039;&#039;, neighboring MPI tasks are attached to different nodes and each MPI task is bound to the first &amp;lt;value&amp;gt; cores of a node. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* The option &#039;&#039;-report-bindings&#039;&#039; shows the bindings between MPI tasks and physical cores.&lt;br /&gt;
* The mpirun-options &#039;&#039;&#039;--bind-to core&#039;&#039;&#039;, &#039;&#039;&#039;--map-by socket|...|node:PE=&amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program.&lt;br /&gt;
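The recommended binding options can be assembled into a single variable before calling mpirun, as the jobscript above does. A small shell sketch with a simulated thread count (in a real job it would come from SLURM_CPUS_PER_TASK/2):&lt;br /&gt;

```shell
# Simulated thread count; in a job this comes from $((SLURM_CPUS_PER_TASK/2)).
OMP_NUM_THREADS=28

# Assemble the binding options recommended above into one variable:
MPIRUN_OPTIONS="--bind-to core --map-by socket:PE=${OMP_NUM_THREADS} -report-bindings"
echo "$MPIRUN_OPTIONS"
```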
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== GPU jobs ====&lt;br /&gt;
&lt;br /&gt;
The nodes in the &amp;lt;code&amp;gt;gpu&amp;lt;/code&amp;gt; queue have 2 or 8 NVIDIA A30/A100/H200 GPUs. Just submitting a job to these queues is not enough to allocate one or more GPUs; you also have to request them with the &amp;quot;--gres=gpu&amp;quot; parameter. You have to specify how many GPUs your job needs, e.g. &amp;quot;--gres=gpu:a30:2&amp;quot; will request two NVIDIA A30 GPUs.&lt;br /&gt;
&lt;br /&gt;
The GPU nodes are shared between multiple jobs if the jobs don&#039;t request all the GPUs in a node and there are enough resources to run more than one job. The individual GPUs are always bound to a single job and are never shared between different jobs.&lt;br /&gt;
&lt;br /&gt;
a) Add a line with the GPU request after the shebang line of your script job.sh:&amp;lt;br&amp;gt;   #SBATCH --gres=gpu:a30:2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=40&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=4000&lt;br /&gt;
#SBATCH --gres=gpu:a30:2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or b) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p &amp;lt;queue&amp;gt; -n 40 -t 02:00:00 --mem 4000 --gres=gpu:a30:2 job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
If you start an interactive session on one of the GPU nodes, you can use the &amp;quot;nvidia-smi&amp;quot; command to list the GPUs allocated to your job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ nvidia-smi&lt;br /&gt;
Sun Mar 29 15:20:05 2020       &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |&lt;br /&gt;
|-------------------------------+----------------------+----------------------+&lt;br /&gt;
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |&lt;br /&gt;
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |&lt;br /&gt;
|===============================+======================+======================|&lt;br /&gt;
|   0  Tesla V100-SXM2...  Off  | 00000000:3A:00.0 Off |                    0 |&lt;br /&gt;
| N/A   29C    P0    39W / 300W |      9MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |&lt;br /&gt;
| N/A   30C    P0    41W / 300W |      8MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
                                                                               &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| Processes:                                                       GPU Memory |&lt;br /&gt;
|  GPU       PID   Type   Process name                             Usage      |&lt;br /&gt;
|=============================================================================|&lt;br /&gt;
|    0     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
|    1     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Upon successful GPU resource allocation, Slurm sets the environment variable &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt; appropriately. &amp;lt;b&amp;gt;Do not change this variable!&amp;lt;/b&amp;gt;&lt;br /&gt;
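Inside a job, the number of allocated GPUs can be derived from this variable, e.g. to configure an application. A minimal sketch with a simulated value (in a real job Slurm sets the variable for you):&lt;br /&gt;

```shell
# Simulated: Slurm would set this after allocating two GPUs to the job.
CUDA_VISIBLE_DEVICES=0,1

# Count the comma-separated device IDs to get the number of GPUs:
ngpu=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)
echo "$ngpu"
```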
&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
In case of using Open MPI, the underlying communication infrastructure (UCX and Open MPI&#039;s BTL) is CUDA-aware.&lt;br /&gt;
However, there may be warnings, e.g. when running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load compiler/gnu/10.3 mpi/openmpi devel/cuda&lt;br /&gt;
$ mpirun -np 2 ./mpi_cuda_app&lt;br /&gt;
--------------------------------------&lt;br /&gt;
WARNING: There are more than one active ports on host &#039;uc2n520&#039;, but the&lt;br /&gt;
default subnet GID prefix was detected on more than one of these&lt;br /&gt;
ports.  If these ports are connected to different physical IB&lt;br /&gt;
networks, this configuration will fail in Open MPI.  This version of&lt;br /&gt;
Open MPI requires that every physically separate IB subnet that is&lt;br /&gt;
used between connected MPI processes must have different subnet ID&lt;br /&gt;
values.&lt;br /&gt;
&lt;br /&gt;
Please see this FAQ entry for more details:&lt;br /&gt;
&lt;br /&gt;
  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid&lt;br /&gt;
&lt;br /&gt;
NOTE: You can turn off this warning by setting the MCA parameter&lt;br /&gt;
      btl_openib_warn_default_gid_prefix to 0.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please run Open MPI&#039;s mpirun using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl_openib_warn_default_gid_prefix 0 -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or by disabling the (older) communication layer, the Byte Transfer Layer (BTL), altogether:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl ^openib -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Please note that CUDA, as of v12.8, is only officially supported with GCC versions up to 11.)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Start time of job or resources : squeue --start ==&lt;br /&gt;
Any user can run this command to display the estimated start time of a job, based on historical usage, the earliest available reservable resources, and the priority-based backlog. The command squeue is explained in detail at https://slurm.schedmd.com/squeue.html or via its manpage (man squeue). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by &#039;&#039;&#039;any user&#039;&#039;&#039;. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== List of your submitted jobs : squeue ==&lt;br /&gt;
Displays information about &#039;&#039;&#039;your own&#039;&#039;&#039; active, pending, and recently completed jobs. The command squeue is explained in detail at https://slurm.schedmd.com/squeue.html or via its manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by any user.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Flags ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Flag !! Description&lt;br /&gt;
|-&lt;br /&gt;
| -l, --long&lt;br /&gt;
| Report more of the available information for the selected jobs or job steps, subject to any constraints specified.&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples ===&lt;br /&gt;
&#039;&#039;squeue&#039;&#039; example on BinAC 2 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18088744    single CPV.sbat   ab1234 PD       0:00      1 (Priority)&lt;br /&gt;
          18098414  multiple CPV.sbat   ab1234 PD       0:00      2 (Priority) &lt;br /&gt;
          18090089  multiple CPV.sbat   ab1234  R       2:27      2 uc2n[127-128]&lt;br /&gt;
$ squeue -l&lt;br /&gt;
            JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) &lt;br /&gt;
         18088654    single CPV.sbat   ab1234 COMPLETI       4:29   2:00:00      1 uc2n374&lt;br /&gt;
         18088785    single CPV.sbat   ab1234  PENDING       0:00   2:00:00      1 (Priority)&lt;br /&gt;
         18098414  multiple CPV.sbat   ab1234  PENDING       0:00   2:00:00      2 (Priority)&lt;br /&gt;
         18088683    single CPV.sbat   ab1234  RUNNING       0:14   2:00:00      1 uc2n413  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The output of &#039;&#039;squeue&#039;&#039; shows how many jobs of yours are running or pending and how many nodes are in use by your jobs.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Shows free resources : sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates downtime, reservations, and node state information when determining the available backfill window.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
A special script (sinfo_t_idle) is provided to find out how many processors are available for immediate use on the system. Users can use this information to submit jobs that fit the idle resources and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be used by any user or administrator. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Example ===&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle&lt;br /&gt;
Partition dev_multiple  :      8 nodes idle&lt;br /&gt;
Partition multiple      :    332 nodes idle&lt;br /&gt;
Partition dev_single    :      4 nodes idle&lt;br /&gt;
Partition single        :     76 nodes idle&lt;br /&gt;
Partition long          :     80 nodes idle&lt;br /&gt;
Partition fat           :      5 nodes idle&lt;br /&gt;
Partition dev_special   :    342 nodes idle&lt;br /&gt;
Partition special       :    342 nodes idle&lt;br /&gt;
Partition dev_multiple_e:      7 nodes idle&lt;br /&gt;
Partition multiple_e    :    335 nodes idle&lt;br /&gt;
Partition gpu_4         :     12 nodes idle&lt;br /&gt;
Partition gpu_8         :      6 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* In the above example, jobs in all partitions can start immediately.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Detailed job information : scontrol show job ==&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail at https://slurm.schedmd.com/scontrol.html or via its manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
* End users can use scontrol show job to view the status of their &#039;&#039;&#039;own jobs&#039;&#039;&#039; only. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Arguments ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Option !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;width:12%;&amp;quot; &lt;br /&gt;
| -d&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Detailed mode&lt;br /&gt;
| Example: Display the state with jobid 18089884 in detailed mode. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt;scontrol -d show job 18089884&amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scontrol show job Example ===&lt;br /&gt;
Here is an example from BinAC 2.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue    # show my own jobs (here the userid is replaced!)&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18089884  multiple CPV.sbat   bq0742  R      33:44      2 uc2n[165-166]&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my running job with jobid 18089884&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 18089884&lt;br /&gt;
&lt;br /&gt;
JobId=18089884 JobName=CPV.sbatch&lt;br /&gt;
   UserId=bq0742(8946) GroupId=scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=3 Nice=0 Account=kit QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:35:06 TimeLimit=02:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2020-03-16T14:14:54 EligibleTime=2020-03-16T14:14:54&lt;br /&gt;
   AccrueTime=2020-03-16T14:14:54&lt;br /&gt;
   StartTime=2020-03-16T15:12:51 EndTime=2020-03-16T17:12:51 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-16T15:12:51&lt;br /&gt;
   Partition=multiple AllocNode:Sid=uc2n995:5064&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc2n[165-166]&lt;br /&gt;
   BatchHost=uc2n165&lt;br /&gt;
   NumNodes=2 NumCPUs=160 NumTasks=80 CPUs/Task=1 ReqB:S:C:T=0:0:*:1&lt;br /&gt;
   TRES=cpu=160,mem=96320M,node=2,billing=160&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=40:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=40 MinMemoryCPU=1204M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/CPV.sbatch&lt;br /&gt;
   WorkDir=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin&lt;br /&gt;
   StdErr=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   Power=&lt;br /&gt;
   MailUser=(null) MailType=NONE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
You can use standard Linux pipe commands to filter the very detailed scontrol show job output.&lt;br /&gt;
* What state is the job in?&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol show job 18089884 | grep -i State&lt;br /&gt;
   JobState=COMPLETED Reason=None Dependency=(null)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Cancel Slurm Jobs ==&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel).   &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
scancel is used to signal or cancel jobs, job arrays or job steps. The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Flag !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -i, --interactive&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Interactive mode.&lt;br /&gt;
| Cancel the job 987654 interactively. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -i 987654 &amp;lt;/pre&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| -t, --state&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Restrict the scancel operation to jobs in a certain state. &amp;lt;br&amp;gt; &amp;quot;job_state_name&amp;quot; may have a value of either &amp;quot;PENDING&amp;quot;, &amp;quot;RUNNING&amp;quot; or &amp;quot;SUSPENDED&amp;quot;.&lt;br /&gt;
| Cancel all jobs in state &amp;quot;PENDING&amp;quot;. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -t &amp;quot;PENDING&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Resource Managers =&lt;br /&gt;
=== Batch Job (Slurm) Variables ===&lt;br /&gt;
The following environment variables of Slurm are added to your environment once your job has started&lt;br /&gt;
&amp;lt;small&amp;gt;(only an excerpt of the most important ones)&amp;lt;/small&amp;gt;.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Environment !! Brief explanation&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_CPUS_PER_NODE &lt;br /&gt;
| Number of processes per node dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NODELIST &lt;br /&gt;
| List of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NUM_NODES &lt;br /&gt;
| Number of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_MEM_PER_NODE &lt;br /&gt;
| Memory per node dedicated to the job &lt;br /&gt;
|- &lt;br /&gt;
| SLURM_NPROCS&lt;br /&gt;
| Total number of processes dedicated to the job &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CLUSTER_NAME&lt;br /&gt;
| Name of the cluster executing the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CPUS_PER_TASK &lt;br /&gt;
| Number of CPUs requested per task&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ACCOUNT&lt;br /&gt;
| Account name &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ID&lt;br /&gt;
| Job ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_NAME&lt;br /&gt;
| Job Name&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_PARTITION&lt;br /&gt;
| Partition/queue running the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_UID&lt;br /&gt;
| User ID of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_SUBMIT_DIR&lt;br /&gt;
| Job submit folder.  The directory from which sbatch was invoked. &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_USER&lt;br /&gt;
| User name of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_RESTART_COUNT&lt;br /&gt;
| Number of times job has restarted&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_PROCID&lt;br /&gt;
| Task ID (MPI rank)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_NTASKS&lt;br /&gt;
| The total number of tasks available for the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_ID&lt;br /&gt;
| Job step ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_NUM_TASKS&lt;br /&gt;
| Task count (number of MPI ranks)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_CONSTRAINT&lt;br /&gt;
| Job constraints&lt;br /&gt;
|}&lt;br /&gt;
See also:&lt;br /&gt;
* [https://slurm.schedmd.com/sbatch.html#lbAI Slurm input and output environment variables]&lt;br /&gt;
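A jobscript can read these variables directly. The sketch below uses the real Slurm variable names but simulates running outside a job, where they are unset, by providing a fallback text of our own:&lt;br /&gt;

```shell
# Simulate "outside a job": inside a job, Slurm exports these variables.
unset SLURM_JOB_ID SLURM_JOB_PARTITION SLURM_NTASKS

# Read each variable with a visible fallback when it is unset:
jobid="${SLURM_JOB_ID:-unset}"
partition="${SLURM_JOB_PARTITION:-unset}"
ntasks="${SLURM_NTASKS:-unset}"
echo "Job ID: $jobid, partition: $partition, tasks: $ntasks"
```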
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Exit Codes ===&lt;br /&gt;
A job&#039;s exit code (also known as exit status, return code and completion code) is captured by SLURM and saved as part of the job record. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Any non-zero exit code will be assumed to be a job failure and will result in a Job State of FAILED with a reason of &amp;quot;NonZeroExitCode&amp;quot;.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The exit code is an 8-bit unsigned number ranging between 0 and 255. While it is possible for a job to return a negative exit code, Slurm will display it as an unsigned value in the 0 - 255 range.&lt;br /&gt;
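The truncation to 8 bits can be observed directly in any shell; the sketch below is independent of Slurm but shows the same wrap-around that is applied to job exit codes:&lt;br /&gt;

```shell
# Exit codes are truncated to 8 bits (value mod 256):
bash -c 'exit 300'
code300=$?                     # 300 mod 256 = 44
bash -c 'exit -1'
codeneg=$?                     # -1 is displayed as 255
echo "$code300 $codeneg"       # prints "44 255"
```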
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Displaying Exit Codes and Signals ====&lt;br /&gt;
SLURM displays a job&#039;s exit code in the output of the &#039;&#039;&#039;scontrol show job&#039;&#039;&#039; and the sview utility.&lt;br /&gt;
&amp;lt;br&amp;gt; &lt;br /&gt;
When a signal was responsible for a job or step&#039;s termination, the signal number will be displayed after the exit code, delineated by a colon(:).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Submitting Termination Signal ====&lt;br /&gt;
Here is an example of how to capture a job&#039;s exit status in a typical jobscript.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
mpirun  -np &amp;lt;#cores&amp;gt;  &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; ... (options)  2&amp;gt;&amp;amp;1&lt;br /&gt;
exit_code=$?   # capture immediately after the command of interest&lt;br /&gt;
[ &amp;quot;$exit_code&amp;quot; -eq 0 ] &amp;amp;&amp;amp; echo &amp;quot;all clean...&amp;quot; || \&lt;br /&gt;
   echo &amp;quot;Executable &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; finished with exit code ${exit_code}&amp;quot;&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Do not prefix mpirun with &#039;&#039;&#039;time&#039;&#039;&#039;! The captured exit code would be the one returned by the first program (time), not by your executable.&lt;br /&gt;
* You do not need an &#039;&#039;&#039;exit $exit_code&#039;&#039;&#039; in the scripts.&lt;br /&gt;
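Note that &amp;lt;code&amp;gt;$?&amp;lt;/code&amp;gt; always refers to the most recently finished command, so it must be captured immediately after the command whose status you want. A minimal illustration:&lt;br /&gt;

```shell
false                # a command that fails with exit code 1
exit_code=$?         # capture immediately; any later command would overwrite $?
echo "$exit_code"    # prints 1
```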
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[#top|Back to top]]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15188</id>
		<title>BinAC2/Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Slurm&amp;diff=15188"/>
		<updated>2025-08-11T10:41:27Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Updated SLURM memory request&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= General information about Slurm =&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwForCluster BinAC 2 requires the user to define the calculation as a single command or a sequence of commands, together with the required run time, number of CPU cores, and main memory, and to submit all of this, i.e., the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload managing software. BinAC 2 uses the workload manager Slurm, so any job submission is executed via Slurm commands. Slurm queues and runs user jobs based on fair-sharing policies.&lt;br /&gt;
&lt;br /&gt;
= External Slurm documentation =&lt;br /&gt;
&lt;br /&gt;
You can find the official Slurm documentation and some other material here:&lt;br /&gt;
&lt;br /&gt;
* Slurm documentation: https://slurm.schedmd.com/documentation.html&lt;br /&gt;
* Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
* Slurm tutorials: https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
= SLURM terminology = &lt;br /&gt;
&lt;br /&gt;
SLURM knows and mirrors the division of the cluster into &#039;&#039;&#039;nodes&#039;&#039;&#039; with several &#039;&#039;&#039;cores&#039;&#039;&#039;. When queuing &#039;&#039;&#039;jobs&#039;&#039;&#039;, there are several ways of requesting resources and it is important to know which term means what in SLURM. Here are some basic SLURM terms:&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Job&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Job&lt;br /&gt;
: A job is a self-contained computation that may encompass multiple tasks and is given specific resources like individual CPUs/GPUs, a specific amount of RAM or entire nodes. These resources are said to have been allocated for the job.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Task&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Task&lt;br /&gt;
: A task is a single run of a single process. By default, one task is run per node and one CPU is assigned per task.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Partition&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Partition    &lt;br /&gt;
: A partition (usually called queue outside SLURM) is a waiting line in which jobs are put by users.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Socket&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Socket    &lt;br /&gt;
: Receptacle on the motherboard for one physically packaged processor (each of which can contain one or more cores).&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Core&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Core    &lt;br /&gt;
: A complete private set of registers, execution units, and retirement queues needed to execute programs.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;Thread&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;Thread    &lt;br /&gt;
: One or more hardware contexts within a single core. Each thread has the attributes of one core and is managed &amp;amp; scheduled as a single logical processor by the OS.&lt;br /&gt;
&lt;br /&gt;
;&amp;lt;span id=&amp;quot;CPU&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;CPU&lt;br /&gt;
: A &#039;&#039;&#039;CPU&#039;&#039;&#039; in Slurm means a &#039;&#039;&#039;single core&#039;&#039;&#039;. This is different from the more common terminology, where a CPU (a microprocessor chip) consists of multiple cores. Slurm uses the term &#039;&#039;&#039;sockets&#039;&#039;&#039; when talking about CPU chips. Depending upon system configuration, a CPU can be either a &#039;&#039;&#039;core&#039;&#039;&#039; or a &#039;&#039;&#039;thread&#039;&#039;&#039;. On &#039;&#039;&#039;BinAC 2, hyperthreading is activated on every machine&#039;&#039;&#039;. This means that the operating system and Slurm see each physical core as two logical cores.&lt;br /&gt;
&lt;br /&gt;
= Slurm Commands =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/salloc.html salloc] || Requests resources for an interactive job&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sstat.html sstat] || Displays status information about a running job&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Job Submission : sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted by using the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. When the job actually starts, however, depends on the availability of the requested resources and on the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== sbatch Command Parameters ===&lt;br /&gt;
The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used from the command line or in your job script. The following table shows the syntax and provides examples for each option.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;5&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! Command line&lt;br /&gt;
! Job Script&lt;br /&gt;
! Purpose&lt;br /&gt;
! Example&lt;br /&gt;
! Default value&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t &#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;  or  &amp;lt;code&amp;gt;--time=&#039;&#039;time&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;-t 2:30:00&amp;lt;/code&amp;gt; Limits run time to 2 h 30 min.&amp;lt;br&amp;gt;&amp;lt;code&amp;gt;-t 2-12&amp;lt;/code&amp;gt; Limits run time to 2 days and 12 hours.&lt;br /&gt;
| Depends on Slurm partition.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N &#039;&#039;count&#039;&#039;  or  --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
| &amp;lt;code&amp;gt;-N 1&amp;lt;/code&amp;gt; Run job on one node.&amp;lt;br&amp;gt;&amp;lt;code&amp;gt;-N 2&amp;lt;/code&amp;gt; Run job on two nodes (requires MPI!)&lt;br /&gt;
| &lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n &#039;&#039;count&#039;&#039;  or  --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
| &amp;lt;code&amp;gt;-n 2&amp;lt;/code&amp;gt; launch two tasks in the job.&lt;br /&gt;
| One task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&amp;lt;br&amp;gt;(Replaces the option &amp;lt;code&amp;gt;ppn&amp;lt;/code&amp;gt; of MOAB.)&lt;br /&gt;
| &amp;lt;code&amp;gt;--ntasks-per-node=2&amp;lt;/code&amp;gt; Run 2 tasks per node&lt;br /&gt;
| 1 task per node&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c &#039;&#039;count&#039;&#039; or --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
| &amp;lt;code&amp;gt;-c 2&amp;lt;/code&amp;gt; Request two CPUs per (MPI-)task.&lt;br /&gt;
| 1 CPU per (MPI-)task&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=&amp;lt;size&amp;gt;[units]&amp;lt;/code&amp;gt;&lt;br /&gt;
| #SBATCH --mem=&amp;lt;size&amp;gt;[units] &lt;br /&gt;
| Memory per node, in megabytes by default.&amp;lt;br&amp;gt;&amp;lt;code&amp;gt;[units]&amp;lt;/code&amp;gt; can be one of &amp;lt;code&amp;gt;[K&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;M&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;G&amp;lt;nowiki&amp;gt;|&amp;lt;/nowiki&amp;gt;T]&amp;lt;/code&amp;gt;.&lt;br /&gt;
| &amp;lt;code&amp;gt;--mem=10g&amp;lt;/code&amp;gt; Request 10 GB RAM per node&amp;lt;br&amp;gt;&amp;lt;code&amp;gt;--mem=0&amp;lt;/code&amp;gt; Request all memory on the node&lt;br /&gt;
| Depends on Slurm configuration.&amp;lt;br&amp;gt;It is better to specify &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; in every case.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum memory required per allocated CPU.&amp;lt;br&amp;gt;(Replaces the option pmem of MOAB. You should normally &amp;lt;br&amp;gt; omit this option and use &amp;lt;code&amp;gt;--mem&amp;lt;/code&amp;gt; instead.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state&amp;lt;br&amp;gt;changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J &#039;&#039;name&#039;&#039; or --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Specifies which environment variables from the submission &amp;lt;br&amp;gt; environment are propagated to the launched application. The default &amp;lt;br&amp;gt; is ALL. If you want to add a variable on top of the submission&amp;lt;br&amp;gt; environment, the argument ALL must be included as well.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A &#039;&#039;group-name&#039;&#039; or --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge resources used by this job to the specified group. You may &amp;lt;br&amp;gt; need this option if your account is assigned to more &amp;lt;br&amp;gt; than one group. With the command &amp;quot;scontrol show job&amp;quot; the project &amp;lt;br&amp;gt; group the job is accounted to is shown after &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p &#039;&#039;queue-name&#039;&#039; or --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C &#039;&#039;LSDF&#039;&#039; or --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF Filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
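As a rough sketch, a job script combining the most common options from the table above could look like this (partition, resources, and names are placeholders, not recommendations):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=compute&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --mem=2g&lt;br /&gt;
#SBATCH --job-name=my-first-job&lt;br /&gt;
#SBATCH --output=my-first-job_%j.out&lt;br /&gt;
&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;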
&lt;br /&gt;
==== sbatch --partition  &#039;&#039;queues&#039;&#039; ====&lt;br /&gt;
Queue classes define the maximum resources per queue of the compute system, such as walltime, nodes, and processes per node. Details can be found here:&lt;br /&gt;
* [[BinAC2/SLURM_Partitions|BinAC 2 partitions]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== sbatch Examples ===&lt;br /&gt;
&lt;br /&gt;
If you are coming from Moab/Torque on BinAC 1, or you are new to HPC/Slurm, the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; options may confuse you. The following examples provide an orientation on how to run typical workloads on BinAC 2.&lt;br /&gt;
&lt;br /&gt;
You can find every file mentioned on this Wiki page on BinAC 2 at: &amp;lt;code&amp;gt;/pfs/10/project/examples&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Serial Programs ====&lt;br /&gt;
When you use serial programs that use only one process, you can omit most of the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; parameters, as the default values are sufficient.&lt;br /&gt;
&lt;br /&gt;
To submit a serial job that runs the script &amp;lt;code&amp;gt;serial_job.sh&amp;lt;/code&amp;gt; and requires 5000 MB of main memory and 10 minutes of wall clock time, use one of the following two options. In both cases Slurm will allocate one &#039;&#039;&#039;physical&#039;&#039;&#039; core to your job.&lt;br /&gt;
&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute -t 10:00 --mem=5000m  serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) add after the initial line of your script &#039;&#039;&#039;serial_job.sh&#039;&#039;&#039; the lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --job-name=simple-serial-job&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
and execute the modified script with the command line option &#039;&#039;--partition=compute&#039;&#039;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that sbatch command line options override script options.&lt;br /&gt;
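For example, if &#039;&#039;serial_job.sh&#039;&#039; contains &amp;lt;code&amp;gt;#SBATCH --time=10:00&amp;lt;/code&amp;gt;, the following submission runs the job with a 20-minute limit, because the command line value wins:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute --time=20:00 serial_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;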
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded programs run multiple threads within one process; the threads share resources such as memory.&amp;lt;br&amp;gt;&lt;br /&gt;
You may use a program that includes a built-in option for multithreading (e.g., options like &amp;lt;code&amp;gt;--threads&amp;lt;/code&amp;gt;).&amp;lt;br&amp;gt;&lt;br /&gt;
For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) the number of threads is defined by the environment variable &amp;lt;code&amp;gt;OMP_NUM_THREADS&amp;lt;/code&amp;gt;. By default, this variable is set to 1 (&amp;lt;code&amp;gt;OMP_NUM_THREADS=1&amp;lt;/code&amp;gt;). &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Important:&#039;&#039;&#039; Hyperthreading is activated on bwForCluster BinAC 2. Hyperthreading can be beneficial for some applications and codes, but it can also degrade performance in other cases. We therefore recommend running a small test job with and without hyperthreading to determine the best choice.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;a) Program with built-in multithreading option&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
This example uses the common bioinformatics software &amp;lt;code&amp;gt;samtools&amp;lt;/code&amp;gt; to demonstrate built-in multithreading.&lt;br /&gt;
&lt;br /&gt;
The module &amp;lt;code&amp;gt;bio/samtools/1.21&amp;lt;/code&amp;gt; provides an example jobscript that requests 4 CPUs and runs &amp;lt;code&amp;gt;samtools sort&amp;lt;/code&amp;gt; with 4 threads.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --time=19:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --mem=5000m&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
[...]&lt;br /&gt;
samtools sort -@ 4 sample.bam -o sample.sorted.bam&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can use the example jobscript with this command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch /opt/bwhpc/common/bio/samtools/1.21/bwhpc-examples/binac2-samtools-1.21-bwhpc-examples.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;b) OpenMP&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
We will run an example OpenMP Hello World program. The jobscript looks like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --time=1:00&lt;br /&gt;
#SBATCH --mem=5000m   &lt;br /&gt;
#SBATCH -J OpenMP-Hello-World&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_JOB_CPUS_PER_NODE}/2))&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Executable running on ${SLURM_JOB_CPUS_PER_NODE} cores with ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run parallel Hello World&lt;br /&gt;
/pfs/10/project/examples/openmp_hello_world&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Submit the job to the &amp;lt;code&amp;gt;compute&amp;lt;/code&amp;gt; partition and get the output (in the stdout-file)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --partition=compute /pfs/10/project/examples/openmp_hello_world.sh&lt;br /&gt;
&lt;br /&gt;
Executable running on 4 cores with 2 threads&lt;br /&gt;
Hello from process: 0&lt;br /&gt;
Hello from process: 1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== OpenMPI ====&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI-jobs on batch nodes, generate a wrapper script &amp;lt;code&amp;gt;mpi_hello_world.sh&amp;lt;/code&amp;gt; for &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition compute&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
#SBATCH --cpus-per-task=2&lt;br /&gt;
#SBATCH --mem-per-cpu=2000&lt;br /&gt;
#SBATCH --time=05:00&lt;br /&gt;
&lt;br /&gt;
# Load the MPI implementation of your choice&lt;br /&gt;
module load mpi/openmpi/4.1-gnu-14.2&lt;br /&gt;
&lt;br /&gt;
# Run your MPI program&lt;br /&gt;
mpirun --bind-to core --map-by core --report-bindings mpi_hello_world&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add mpirun options &amp;lt;code&amp;gt;-n &amp;lt;number_of_processes&amp;gt;&amp;lt;/code&amp;gt; or any other option defining processes or nodes, since Slurm instructs mpirun about number of processes and node hostnames.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Always&#039;&#039;&#039; use the MPI options &amp;lt;code&amp;gt;--bind-to core&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;--map-by core|socket|node&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please type &amp;lt;code&amp;gt;man mpirun&amp;lt;/code&amp;gt; for an explanation of the different arguments of the mpirun option &amp;lt;code&amp;gt;--map-by&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The above jobscript runs four OpenMPI tasks, distributed across two nodes. Because of hyperthreading you have to set &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;; this way each MPI task gets one physical core. If you omit &amp;lt;code&amp;gt;--cpus-per-task=2&amp;lt;/code&amp;gt;, MPI will fail.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Not all compute nodes are connected via Infiniband. Tell Slurm you want Infiniband via &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt; when submitting or add &amp;lt;code&amp;gt;#SBATCH --constraint=ib&amp;lt;/code&amp;gt; to your jobscript.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --constraint=ib /pfs/10/project/examples/mpi_hello_world.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will run a simple Hello World program:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
Hello world from processor node2-031, rank 3 out of 4 processors&lt;br /&gt;
Hello world from processor node2-031, rank 2 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 1 out of 4 processors&lt;br /&gt;
Hello world from processor node2-030, rank 0 out of 4 processors&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded + MPI parallel Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded + MPI parallel programs operate faster than serial programs on multiple CPUs with multiple cores. All threads of one process share resources such as memory. MPI tasks, in contrast, do not share memory but can be spawned across different nodes. &#039;&#039;&#039;Because hyperthreading is switched on on BinAC 2, the option --cpus-per-task (-c) must be set to 2*n if you want to use n threads.&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
===== OpenMPI with Multithreading =====&lt;br /&gt;
Multiple MPI tasks using &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; must be launched by the MPI parallel program &#039;&#039;&#039;mpirun&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;For OpenMPI&#039;&#039;&#039;, a job script &#039;&#039;job_ompi_omp.sh&#039;&#039; that runs an MPI program &#039;&#039;ompi_omp_program&#039;&#039; with 4 MPI tasks and 28 threads per task, requiring 3000 MByte of physical memory per thread (using 28 threads per MPI task you will get 28*3000 MByte = 84000 MByte per MPI task), and a total wall clock time of 3 hours, looks like:&lt;br /&gt;
&amp;lt;!--b)--&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=4&lt;br /&gt;
#SBATCH --cpus-per-task=56&lt;br /&gt;
#SBATCH --time=03:00:00&lt;br /&gt;
#SBATCH --mem=83gb    # 84000 MB = 84000/1024 GB = 82.1 GB&lt;br /&gt;
#SBATCH --export=ALL,MPI_MODULE=mpi/openmpi/3.1,EXECUTABLE=./ompi_omp_program&lt;br /&gt;
#SBATCH --output=&amp;quot;parprog_hybrid_%j.out&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
# Use when a defined module environment related to OpenMPI is wished&lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_CPUS_PER_TASK}/2))&lt;br /&gt;
export MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by socket:PE=${OMP_NUM_THREADS} -report-bindings&amp;quot;&lt;br /&gt;
export NUM_CORES=$((${SLURM_NTASKS}*${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXECUTABLE} running on ${NUM_CORES} cores with ${SLURM_NTASKS} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpirun -n ${SLURM_NTASKS} ${MPIRUN_OPTIONS} ${EXECUTABLE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_ompi_omp.sh&#039;&#039;&#039; by command sbatch:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p compute ./job_ompi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* With the mpirun option &#039;&#039;--bind-to core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;--map-by node:PE=&amp;lt;value&amp;gt;&#039;&#039; neighboring MPI tasks are attached to different nodes, and each MPI task is bound to &amp;lt;value&amp;gt; consecutive cores of a node. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* The option &#039;&#039;-report-bindings&#039;&#039; shows the bindings between MPI tasks and physical cores.&lt;br /&gt;
* The mpirun-options &#039;&#039;&#039;--bind-to core&#039;&#039;&#039;, &#039;&#039;&#039;--map-by socket|...|node:PE=&amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== GPU jobs ====&lt;br /&gt;
&lt;br /&gt;
The nodes in the gpu_4 and gpu_8 queues have 4 or 8 NVIDIA Tesla V100 GPUs. Just submitting a job to these queues is not enough to also allocate one or more GPUs; you have to do so using the &amp;quot;--gres=gpu&amp;quot; parameter. You have to specify how many GPUs your job needs, e.g. &amp;quot;--gres=gpu:2&amp;quot; will request two GPUs.&lt;br /&gt;
&lt;br /&gt;
The GPU nodes are shared between multiple jobs if the jobs don&#039;t request all the GPUs in a node and there are enough resources to run more than one job. The individual GPUs are always bound to a single job and will not be shared between different jobs.&lt;br /&gt;
&lt;br /&gt;
a) add after the initial line of your script job.sh the line with the&lt;br /&gt;
information about the GPU usage: &amp;lt;code&amp;gt;#SBATCH --gres=gpu:2&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=40&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=4000&lt;br /&gt;
#SBATCH --gres=gpu:2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or b) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p &amp;lt;queue&amp;gt; -n 40 -t 02:00:00 --mem 4000 --gres=gpu:2 job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
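Inside a job, Slurm exposes the allocated GPUs to your application via the environment variable &amp;lt;code&amp;gt;CUDA_VISIBLE_DEVICES&amp;lt;/code&amp;gt;. A quick sanity check could look like this (a sketch, assuming a job submitted with &amp;lt;code&amp;gt;--gres=gpu:2&amp;lt;/code&amp;gt;):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ echo $CUDA_VISIBLE_DEVICES&lt;br /&gt;
0,1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;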
&amp;lt;br/&amp;gt;&lt;br /&gt;
If you start an interactive session on one of the GPU nodes, you can use the &amp;quot;nvidia-smi&amp;quot; command to list the GPUs allocated to your job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ nvidia-smi&lt;br /&gt;
Sun Mar 29 15:20:05 2020       &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |&lt;br /&gt;
|-------------------------------+----------------------+----------------------+&lt;br /&gt;
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |&lt;br /&gt;
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |&lt;br /&gt;
|===============================+======================+======================|&lt;br /&gt;
|   0  Tesla V100-SXM2...  Off  | 00000000:3A:00.0 Off |                    0 |&lt;br /&gt;
| N/A   29C    P0    39W / 300W |      9MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |&lt;br /&gt;
| N/A   30C    P0    41W / 300W |      8MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
                                                                               &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| Processes:                                                       GPU Memory |&lt;br /&gt;
|  GPU       PID   Type   Process name                             Usage      |&lt;br /&gt;
|=============================================================================|&lt;br /&gt;
|    0     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
|    1     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
When using Open MPI, the underlying communication infrastructure (UCX and Open MPI&#039;s BTL) is CUDA-aware.&lt;br /&gt;
However, there may be warnings, e.g. when running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load compiler/gnu/10.3 mpi/openmpi devel/cuda&lt;br /&gt;
$ mpirun -np 2 ./mpi_cuda_app&lt;br /&gt;
--------------------------------------&lt;br /&gt;
WARNING: There are more than one active ports on host &#039;uc2n520&#039;, but the&lt;br /&gt;
default subnet GID prefix was detected on more than one of these&lt;br /&gt;
ports.  If these ports are connected to different physical IB&lt;br /&gt;
networks, this configuration will fail in Open MPI.  This version of&lt;br /&gt;
Open MPI requires that every physically separate IB subnet that is&lt;br /&gt;
used between connected MPI processes must have different subnet ID&lt;br /&gt;
values.&lt;br /&gt;
&lt;br /&gt;
Please see this FAQ entry for more details:&lt;br /&gt;
&lt;br /&gt;
  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid&lt;br /&gt;
&lt;br /&gt;
NOTE: You can turn off this warning by setting the MCA parameter&lt;br /&gt;
      btl_openib_warn_default_gid_prefix to 0.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please run Open MPI&#039;s mpirun using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl_openib_warn_default_gid_prefix 0 -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or disable the (older) communication layer Byte Transfer Layer (BTL for short) altogether:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl ^openib -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Please note that CUDA as of v11.4 is only available with GCC versions up to 10.)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Start time of job or resources : squeue --start ==&lt;br /&gt;
This command can be used by any user to display the estimated start time of a job, based on historical usage, the earliest available reservable resources, and the priority-based backlog. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue). &lt;br /&gt;
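For example, to display the estimated start time of a single pending job (the jobid is a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue --start -j 18088744&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;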
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by &#039;&#039;&#039;any user&#039;&#039;&#039;. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== List of your submitted jobs : squeue ==&lt;br /&gt;
Displays information about &#039;&#039;&#039;your own&#039;&#039;&#039; active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by any user.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Flags ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Flag !! Description&lt;br /&gt;
|-&lt;br /&gt;
| -l, --long&lt;br /&gt;
| Report more of the available information for the selected jobs or job steps, subject to any constraints specified.&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples ===&lt;br /&gt;
&#039;&#039;squeue&#039;&#039; example on BinAC 2 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18088744    single CPV.sbat   ab1234 PD       0:00      1 (Priority)&lt;br /&gt;
          18098414  multiple CPV.sbat   ab1234 PD       0:00      2 (Priority) &lt;br /&gt;
          18090089  multiple CPV.sbat   ab1234  R       2:27      2 uc2n[127-128]&lt;br /&gt;
$ squeue -l&lt;br /&gt;
            JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) &lt;br /&gt;
         18088654    single CPV.sbat   ab1234 COMPLETI       4:29   2:00:00      1 uc2n374&lt;br /&gt;
         18088785    single CPV.sbat   ab1234  PENDING       0:00   2:00:00      1 (Priority)&lt;br /&gt;
         18098414  multiple CPV.sbat   ab1234  PENDING       0:00   2:00:00      2 (Priority)&lt;br /&gt;
         18088683    single CPV.sbat   ab1234  RUNNING       0:14   2:00:00      1 uc2n413  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The output of &#039;&#039;squeue&#039;&#039; shows how many jobs of yours are running or pending and how many nodes are in use by your jobs.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Shows free resources : sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be used by any user or administrator. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Example ===&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle&lt;br /&gt;
Partition dev_multiple  :      8 nodes idle&lt;br /&gt;
Partition multiple      :    332 nodes idle&lt;br /&gt;
Partition dev_single    :      4 nodes idle&lt;br /&gt;
Partition single        :     76 nodes idle&lt;br /&gt;
Partition long          :     80 nodes idle&lt;br /&gt;
Partition fat           :      5 nodes idle&lt;br /&gt;
Partition dev_special   :    342 nodes idle&lt;br /&gt;
Partition special       :    342 nodes idle&lt;br /&gt;
Partition dev_multiple_e:      7 nodes idle&lt;br /&gt;
Partition multiple_e    :    335 nodes idle&lt;br /&gt;
Partition gpu_4         :     12 nodes idle&lt;br /&gt;
Partition gpu_8         :      6 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For the above example jobs in all partitions can be run immediately.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Detailed job information : scontrol show job ==&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
* End users can use scontrol show job to view the status of their &#039;&#039;&#039;own jobs&#039;&#039;&#039; only. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Arguments ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Option !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;width:12%;&amp;quot; &lt;br /&gt;
| -d&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Detailed mode&lt;br /&gt;
| Example: Display the state with jobid 18089884 in detailed mode. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt;scontrol -d show job 18089884&amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scontrol show job Example ===&lt;br /&gt;
Here is an example from BinAC 2.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue    # show my own jobs (here the userid is replaced!)&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18089884  multiple CPV.sbat   bq0742  R      33:44      2 uc2n[165-166]&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my job with jobid 18089884&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 18089884&lt;br /&gt;
&lt;br /&gt;
JobId=18089884 JobName=CPV.sbatch&lt;br /&gt;
   UserId=bq0742(8946) GroupId=scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=3 Nice=0 Account=kit QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:35:06 TimeLimit=02:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2020-03-16T14:14:54 EligibleTime=2020-03-16T14:14:54&lt;br /&gt;
   AccrueTime=2020-03-16T14:14:54&lt;br /&gt;
   StartTime=2020-03-16T15:12:51 EndTime=2020-03-16T17:12:51 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-16T15:12:51&lt;br /&gt;
   Partition=multiple AllocNode:Sid=uc2n995:5064&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc2n[165-166]&lt;br /&gt;
   BatchHost=uc2n165&lt;br /&gt;
   NumNodes=2 NumCPUs=160 NumTasks=80 CPUs/Task=1 ReqB:S:C:T=0:0:*:1&lt;br /&gt;
   TRES=cpu=160,mem=96320M,node=2,billing=160&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=40:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=40 MinMemoryCPU=1204M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/CPV.sbatch&lt;br /&gt;
   WorkDir=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin&lt;br /&gt;
   StdErr=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   Power=&lt;br /&gt;
   MailUser=(null) MailType=NONE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
You can use standard Linux pipes to filter the very detailed scontrol show job output.&lt;br /&gt;
* Which state is the job in?&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol show job 18089884 | grep -i State&lt;br /&gt;
   JobState=COMPLETED Reason=None Dependency=(null)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Cancel Slurm Jobs ==&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel).   &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
scancel is used to signal or cancel jobs, job arrays or job steps. The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Flag !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -i, --interactive&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Interactive mode.&lt;br /&gt;
| Cancel the job 987654 interactively. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -i 987654 &amp;lt;/pre&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| -t, --state&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Restrict the scancel operation to jobs in a certain state. &amp;lt;br&amp;gt; &amp;quot;job_state_name&amp;quot; may have a value of either &amp;quot;PENDING&amp;quot;, &amp;quot;RUNNING&amp;quot; or &amp;quot;SUSPENDED&amp;quot;.&lt;br /&gt;
| Cancel all jobs in state &amp;quot;PENDING&amp;quot;. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -t &amp;quot;PENDING&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Resource Managers =&lt;br /&gt;
=== Batch Job (Slurm) Variables ===&lt;br /&gt;
The following Slurm environment variables are added to your environment once your job has started&lt;br /&gt;
&amp;lt;small&amp;gt;(only an excerpt of the most important ones)&amp;lt;/small&amp;gt;.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Environment !! Brief explanation&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_CPUS_PER_NODE &lt;br /&gt;
| Number of processes per node dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NODELIST &lt;br /&gt;
| List of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NUM_NODES &lt;br /&gt;
| Number of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_MEM_PER_NODE &lt;br /&gt;
| Memory per node dedicated to the job &lt;br /&gt;
|- &lt;br /&gt;
| SLURM_NPROCS&lt;br /&gt;
| Total number of processes dedicated to the job &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CLUSTER_NAME&lt;br /&gt;
| Name of the cluster executing the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CPUS_PER_TASK &lt;br /&gt;
| Number of CPUs requested per task&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ACCOUNT&lt;br /&gt;
| Account name &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ID&lt;br /&gt;
| Job ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_NAME&lt;br /&gt;
| Job Name&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_PARTITION&lt;br /&gt;
| Partition/queue running the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_UID&lt;br /&gt;
| User ID of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_SUBMIT_DIR&lt;br /&gt;
| Job submit folder.  The directory from which sbatch was invoked. &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_USER&lt;br /&gt;
| User name of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_RESTART_COUNT&lt;br /&gt;
| Number of times job has restarted&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_PROCID&lt;br /&gt;
| Task ID (MPI rank)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_NTASKS&lt;br /&gt;
| The total number of tasks available for the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_ID&lt;br /&gt;
| Job step ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_NUM_TASKS&lt;br /&gt;
| Task count (number of MPI ranks)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_CONSTRAINT&lt;br /&gt;
| Job constraints&lt;br /&gt;
|}&lt;br /&gt;
See also:&lt;br /&gt;
* [https://slurm.schedmd.com/sbatch.html#lbAI Slurm input and output environment variables]&lt;br /&gt;
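As an illustration of how these variables can be used, the following minimal sketch (job name and resource numbers are placeholders, not recommendations) logs the allocation before the actual work starts:

```shell
#!/bin/bash
#SBATCH --job-name=envdemo        # placeholder job name
#SBATCH --ntasks=4                # placeholder resource request
#SBATCH --time=00:10:00

# The SLURM_* variables are only set while the job runs under Slurm;
# outside a job they are unset, so default to "<none>" for safe printing.
print_allocation() {
    echo "Job ${SLURM_JOB_ID:-<none>} (${SLURM_JOB_NAME:-<none>}) in partition ${SLURM_JOB_PARTITION:-<none>}"
    echo "Nodes: ${SLURM_JOB_NODELIST:-<none>}, tasks: ${SLURM_NTASKS:-1}, CPUs/task: ${SLURM_CPUS_PER_TASK:-1}"
}

print_allocation
cd "${SLURM_SUBMIT_DIR:-$PWD}"    # return to the directory sbatch was invoked from
```

Submitted with sbatch, the script prints the same values for the running job that scontrol show job reports.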
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Exit Codes ===&lt;br /&gt;
A job&#039;s exit code (also known as exit status, return code and completion code) is captured by SLURM and saved as part of the job record. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Any non-zero exit code will be assumed to be a job failure and will result in a Job State of FAILED with a reason of &amp;quot;NonZeroExitCode&amp;quot;.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The exit code is an 8 bit unsigned number ranging between 0 and 255. While it is possible for a job to return a negative exit code, SLURM will display it as an unsigned value in the 0 - 255 range.&lt;br /&gt;
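The 8-bit truncation can be reproduced in any shell; it also explains the common convention that a process terminated by signal N is reported with status 128+N:

```shell
# Exit statuses are reported modulo 256: 300 mod 256 = 44.
bash -c 'exit 300'
echo "status after 'exit 300': $?"    # 44

# A process terminated by a signal is reported as 128 + signal number;
# SIGKILL is signal 9, so the status is 137.
bash -c 'kill -KILL $$'
echo "status after SIGKILL: $?"       # 137
```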
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Displaying Exit Codes and Signals ====&lt;br /&gt;
SLURM displays a job&#039;s exit code in the output of &#039;&#039;&#039;scontrol show job&#039;&#039;&#039; and in the sview utility.&lt;br /&gt;
&amp;lt;br&amp;gt; &lt;br /&gt;
When a signal was responsible for a job&#039;s or step&#039;s termination, the signal number is displayed after the exit code, delineated by a colon (:).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Submitting Termination Signal ====&lt;br /&gt;
Here is an example of how to capture and report the exit status in a typical job script.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
mpirun  -np &amp;lt;#cores&amp;gt;  &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; ... (options)  2&amp;gt;&amp;amp;1&lt;br /&gt;
exit_code=$?&lt;br /&gt;
[ &amp;quot;$exit_code&amp;quot; -eq 0 ] &amp;amp;&amp;amp; echo &amp;quot;all clean...&amp;quot; || \&lt;br /&gt;
   echo &amp;quot;Executable &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; finished with exit code ${exit_code}&amp;quot;&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Do not prefix mpirun with &#039;&#039;&#039;time&#039;&#039;&#039;! The exit code would then be the one returned by the first program (time).&lt;br /&gt;
* You do not need an &#039;&#039;&#039;exit $exit_code&#039;&#039;&#039; at the end of the script.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[#top|Back to top]]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Login&amp;diff=15179</id>
		<title>BinAC2/Login</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Login&amp;diff=15179"/>
		<updated>2025-07-30T06:59:40Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Access to bwForCluster BinAC 2 is only possible from IP addresses within the [https://www.belwue.de BelWü] network which connects universities and other scientific institutions in Baden-Württemberg.&lt;br /&gt;
If your computer is in your University network (e.g. at your office), you should be able to connect to bwForCluster BinAC 2 without restrictions.&lt;br /&gt;
If you are outside the BelWü network (e.g. at home), a VPN (virtual private network) connection to your University network must be established first. Please consult the VPN documentation of your University.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prerequisites for successful login:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You need to have&lt;br /&gt;
* completed the 3-step [[registration/bwForCluster|bwForCluster registration]] procedure.&lt;br /&gt;
* [[Registration/Password|set a service password]] for bwForCluster BinAC 2.&lt;br /&gt;
* Setup the [[BinAC2/Login#TOTP_Second_Factor|two factor authentication (2FA)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Login to bwForCluster BinAC 2 =&lt;br /&gt;
&lt;br /&gt;
Login to bwForCluster BinAC 2 is only possible with a Secure Shell (SSH) client for which you must know your [[BinAC2/Login#Username|username]] on the cluster and the [[BinAC2/Login#Hostname|hostname]] of the BinAC 2 login node.&lt;br /&gt;
&lt;br /&gt;
For more general information on SSH clients, visit the [[Registration/Login/Client|SSH clients Guide]].&lt;br /&gt;
&lt;br /&gt;
== TOTP Second Factor ==&lt;br /&gt;
&lt;br /&gt;
At the moment no second factor is needed. We are currently implementing a new TOTP procedure.&lt;br /&gt;
&lt;br /&gt;
== Username ==&lt;br /&gt;
&lt;br /&gt;
Your &amp;lt;code&amp;gt;&amp;lt;username&amp;gt;&amp;lt;/code&amp;gt; on BinAC 2 consists of a prefix and your local username.&lt;br /&gt;
For prefixes please refer to the [[Registration/Login/Username|Username Guide]].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Example&amp;lt;/b&amp;gt;: If your local username at your University is &amp;lt;code&amp;gt;ab123&amp;lt;/code&amp;gt; and you are a user from Tübingen University, your username on the cluster is: &amp;lt;code&amp;gt;tu_ab123&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Hostnames ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 has a single login hostname that serves as a load balancer: DNS round-robin scheduling distributes the incoming connections across the three actual login nodes. If you log in multiple times, different sessions may run on different login nodes, and programs started in one session might not be visible in another session. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| login.binac2.uni-tuebingen.de || one of the three login nodes &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
You can choose a specific login node by using specific ports on the load balancer. Please only do this if there is a real reason for that (e.g. connecting to a running tmux/screen session).&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Port !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| 2221 || login01&lt;br /&gt;
|-&lt;br /&gt;
| 2222 || login02&lt;br /&gt;
|-&lt;br /&gt;
| 2223 || login03&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
Usage: &amp;lt;code&amp;gt;ssh -p &amp;lt;port&amp;gt; [other options] &amp;lt;username&amp;gt;@login.binac2.uni-tuebingen.de&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Login with SSH command (Linux, Mac, Windows) ==&lt;br /&gt;
&lt;br /&gt;
Most Unix and Unix-like operating systems like Linux or MacOS come with a built-in SSH client provided by the OpenSSH project.&lt;br /&gt;
Windows 10 and later also come with a built-in OpenSSH client. &lt;br /&gt;
&lt;br /&gt;
For login use one of the following ssh commands:&lt;br /&gt;
&lt;br /&gt;
 ssh &amp;lt;username&amp;gt;@login.binac2.uni-tuebingen.de&lt;br /&gt;
&lt;br /&gt;
To run graphical applications on the cluster, you need to enable X11 forwarding with the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
&lt;br /&gt;
 ssh -X &amp;lt;username&amp;gt;@login.binac2.uni-tuebingen.de&lt;br /&gt;
&lt;br /&gt;
For login to a specific login node (here: login03):&lt;br /&gt;
&lt;br /&gt;
 ssh -p 2223 &amp;lt;username&amp;gt;@login.binac2.uni-tuebingen.de&lt;br /&gt;
&lt;br /&gt;
== Login with graphical SSH client (Windows) ==&lt;br /&gt;
&lt;br /&gt;
For Windows we suggest using MobaXterm for login and file transfer.&lt;br /&gt;
 &lt;br /&gt;
Start MobaXterm and fill in the following fields:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Remote name              : login.binac2.uni-tuebingen.de&lt;br /&gt;
Specify user name        : &amp;lt;username&amp;gt;&lt;br /&gt;
Port                     : 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After that, click &#039;OK&#039;. A terminal will open where you can enter your credentials.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
&lt;br /&gt;
== Login Example ==&lt;br /&gt;
&lt;br /&gt;
To login to bwForCluster BinAC, proceed as follows:&lt;br /&gt;
# Login with SSH command or MobaXterm as shown above.&lt;br /&gt;
# The system will ask for a one-time password &amp;lt;code&amp;gt;One-time password (OATH) for &amp;lt;username&amp;gt;&amp;lt;/code&amp;gt;. Please enter your OTP and confirm it with Enter/Return. The OTP is not displayed when typing. If you do not have a second factor yet, please create one (see [[BinAC/Login#TOTP_Second_Factor]]).&lt;br /&gt;
# The system will ask you for your service password &amp;lt;code&amp;gt;Password:&amp;lt;/code&amp;gt;. Please enter it and confirm it with Enter/Return. The password is not displayed when typing. If you do not have a service password yet or have forgotten it, please create one (see [[Registration/Password]]).&lt;br /&gt;
# You will be greeted by the cluster, followed by a shell.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh tu_ab123@login01.binac.uni-tuebingen.de&lt;br /&gt;
One-time password (OATH) for tu_ab123:&lt;br /&gt;
Password: &lt;br /&gt;
&lt;br /&gt;
Last login: ...&lt;br /&gt;
&lt;br /&gt;
          bwFOR Cluster BinAC, Bioinformatics and Astrophysics &lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
&lt;br /&gt;
please submit jobs solely with the &#039;qsub&#039; command. Available queues are&lt;br /&gt;
 &lt;br /&gt;
    tiny  - 20min     - fast queue for testing (four GPU cores available) &lt;br /&gt;
    short - 48hrs     - serial/parallel jobs&lt;br /&gt;
    long  - 7days     - serial/parallel jobs&lt;br /&gt;
    gpu   - 30days    - GPU-only jobs  &lt;br /&gt;
    smp   - 7days     - large SMP jobs ( memory &amp;gt; 128GB/node ) &lt;br /&gt;
&lt;br /&gt;
The COMPUTE and GPU nodes provide 28 cores and 128GB of RAM each. In &lt;br /&gt;
addition, every GPU node is equipped with 2 Nvidia K80 accelerator cards, &lt;br /&gt;
totalling in 4 GPUs per node. The SMP machines provide 40 cores per node &lt;br /&gt;
and 1 TB of RAM.&lt;br /&gt;
&lt;br /&gt;
A local SCRATCH directory (/scratch) is available on each node. A fast, &lt;br /&gt;
parallel WORK file system is mounted on /beegfs/work. Please also use the &lt;br /&gt;
workspace tools.&lt;br /&gt;
&lt;br /&gt;
Register to our BinAC mailing list via &lt;br /&gt;
https://listserv.uni-tuebingen.de/mailman/listinfo/binac_announce&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
   Please do not keep data on WORK for a prolonged time. If rarely needed and&lt;br /&gt;
   while working on a project, please compress files to an archive. &lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[tu_ab123@login01 ~]$ &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Allowed Activities on Login Nodes =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
To guarantee usability for all users you must not run your compute jobs on the login nodes.&lt;br /&gt;
Compute jobs must be submitted as batch jobs.&lt;br /&gt;
Any compute job running on the login nodes will be terminated without notice.&lt;br /&gt;
Long-running compilation or long-running pre- or post-processing tasks must also be submitted as batch jobs.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The login nodes are the access points to the compute system, your &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; directory and your workspaces.&lt;br /&gt;
These nodes are shared with all users. Hence, your activities on the login nodes are primarily limited to setting up your batch jobs.&lt;br /&gt;
Your activities may also be:&lt;br /&gt;
* quick compilation of program code or&lt;br /&gt;
* quick pre- and post-processing of results from batch jobs.&lt;br /&gt;
&lt;br /&gt;
We advise to use interactive batch jobs for compute and memory intensive compilation and pre- and post-processing tasks.&lt;br /&gt;
&lt;br /&gt;
= Related Information =&lt;br /&gt;
&lt;br /&gt;
* If you want to reset your service password, consult the [[Registration/Password|Password Guide]].&lt;br /&gt;
* If you want to register a new token for the two factor authentication (2FA), consult [[BinAC/Login#TOTP_Second_Factor|this section]].&lt;br /&gt;
* If you want to de-register, consult the [[Registration/Deregistration|De-registration Guide]].&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Migrate_Moab_to_Slurm_jobs&amp;diff=15043</id>
		<title>BinAC2/Migrate Moab to Slurm jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Migrate_Moab_to_Slurm_jobs&amp;diff=15043"/>
		<updated>2025-07-03T07:18:57Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== BinAC 1 Queues -&amp;gt; BinAC 2 Slurm partitions ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 uses the Slurm scheduler instead of Moab/Torque.&lt;br /&gt;
Please refer to the [[BinAC2/SLURM_Partitions|BinAC 2 Slurm partitions page]] for an overview of available Slurm partitions.&lt;br /&gt;
&lt;br /&gt;
== Moab/Torque flags -&amp;gt; Slurm flags ==&lt;br /&gt;
&lt;br /&gt;
Replace the Moab/Torque job specification flags in your job scripts with their corresponding Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab job specification flags and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Option !! Moab (msub) !! Slurm (sbatch)&lt;br /&gt;
|-&lt;br /&gt;
| Script directive                            || #PBS                                  || #SBATCH&lt;br /&gt;
|-&lt;br /&gt;
| Job name                                    || -N &amp;lt;name&amp;gt;                              || --job-name=&amp;lt;name&amp;gt;  (-J &amp;lt;name&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Account                                     || -A &amp;lt;account&amp;gt;                           || --account=&amp;lt;account&amp;gt; (-A &amp;lt;account&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Queue                                       || -q &amp;lt;queue&amp;gt;                             || --partition=&amp;lt;partition&amp;gt; (-p &amp;lt;partition&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Wall time limit                             || -l walltime=&amp;lt;hh:mm:ss&amp;gt;                 || --time=&amp;lt;hh:mm:ss&amp;gt; (-t &amp;lt;hh:mm:ss&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node count                                  || -l nodes=&amp;lt;count&amp;gt;                       || --nodes=&amp;lt;count&amp;gt; (-N &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Core count                                  || -l procs=&amp;lt;count&amp;gt;                       || --ntasks=&amp;lt;count&amp;gt; (-n &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Process count per node                      || -l ppn=&amp;lt;count&amp;gt;                         || --ntasks-per-node=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Core count per process                      ||                                        || --cpus-per-task=&amp;lt;count&amp;gt; (-c &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per node                       || -l mem=&amp;lt;limit&amp;gt;                         || --mem=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per process                    || -l pmem=&amp;lt;limit&amp;gt;                        || --mem-per-cpu=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Job array                                   || -t &amp;lt;array indices&amp;gt;                     || --array=&amp;lt;indices&amp;gt; (-a &amp;lt;indices&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node exclusive job                          || -l naccesspolicy=singlejob             || --exclusive&lt;br /&gt;
|-&lt;br /&gt;
| Initial working directory                   || -d &amp;lt;directory&amp;gt; (default: $HOME)        || --chdir=&amp;lt;directory&amp;gt; (-D &amp;lt;directory&amp;gt;) (default: submission directory)&lt;br /&gt;
|-&lt;br /&gt;
| Standard output file                        || -o &amp;lt;file path&amp;gt;                         || --output=&amp;lt;file&amp;gt; (-o &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Standard error file                         || -e &amp;lt;file path&amp;gt;                         || --error=&amp;lt;file&amp;gt;  (-e &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Combine stdout/stderr to stdout             || -j oe                                  || --output=&amp;lt;combined stdout/stderr file&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Mail notification events                    || -m &amp;lt;event&amp;gt;                             || --mail-type=&amp;lt;events&amp;gt; (valid types include: NONE, BEGIN, END, FAIL, ALL)&lt;br /&gt;
|-&lt;br /&gt;
| Export environment to job                   || -V                                     || --export=ALL (default)&lt;br /&gt;
|-&lt;br /&gt;
| Don&#039;t export environment to job             || (default)                              || --export=NONE&lt;br /&gt;
|-&lt;br /&gt;
| Export environment variables to job         || -v &amp;lt;var[=value][,var2=value2[, ...]]&amp;gt;  || --export=&amp;lt;var[=value][,var2=value2[,...]]&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
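As a worked example of the table above, a Moab header such as #PBS -N myjob, #PBS -q short, #PBS -l nodes=2:ppn=28, #PBS -l walltime=24:00:00 translates to the following Slurm header (the partition name is a placeholder; pick the matching BinAC 2 partition):

```shell
#!/bin/bash
#SBATCH --job-name=myjob           # was: #PBS -N myjob
#SBATCH --partition=compute        # was: #PBS -q short (placeholder partition name)
#SBATCH --nodes=2                  # was: #PBS -l nodes=2:ppn=28
#SBATCH --ntasks-per-node=28
#SBATCH --time=24:00:00            # was: #PBS -l walltime=24:00:00

# To the shell, #SBATCH lines are ordinary comments; Slurm reads them
# at submission time, so the script itself runs unchanged anywhere.
header_ok=yes
```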
&lt;br /&gt;
== Moab/Torque environment variables ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
By default Moab does not export any environment variables to the job&#039;s runtime environment.&lt;br /&gt;
With Slurm most of the login environment variables are exported to your job&#039;s runtime environment.&lt;br /&gt;
This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).&lt;br /&gt;
See [https://slurm.schedmd.com/sbatch.html sbatch] man page for a complete list of flags and environment variables.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Replace the Moab/Torque environment variables in your job scripts with their corresponding Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Information                 !! Moab                !! Torque               !! Slurm                                     &lt;br /&gt;
|-&lt;br /&gt;
| Job name                     || $MOAB_JOBNAME        || $PBS_JOBNAME        || $SLURM_JOB_NAME                           &lt;br /&gt;
|-&lt;br /&gt;
| Job ID                       || $MOAB_JOBID          || $PBS_JOBID          || $SLURM_JOB_ID                             &lt;br /&gt;
|-&lt;br /&gt;
| Submit directory             || $MOAB_SUBMITDIR      || $PBS_O_WORKDIR      || $SLURM_SUBMIT_DIR                         &lt;br /&gt;
|-&lt;br /&gt;
| Number of nodes allocated    || $MOAB_NODECOUNT      || $PBS_NUM_NODES      || $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) &lt;br /&gt;
|-&lt;br /&gt;
| Node list                    || $MOAB_NODELIST       || cat $PBS_NODEFILE   || $SLURM_JOB_NODELIST                       &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes          || $MOAB_PROCCOUNT      || $PBS_TASKNUM        || $SLURM_NTASKS                             &lt;br /&gt;
|-&lt;br /&gt;
| Requested tasks per node     || ---                    || $PBS_NUM_PPN        || $SLURM_NTASKS_PER_NODE                    &lt;br /&gt;
|-&lt;br /&gt;
| Requested CPUs per task      || ---                  || ---                 || $SLURM_CPUS_PER_TASK                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array index              || $MOAB_JOBARRAYINDEX  || $PBS_ARRAY_INDEX    || $SLURM_ARRAY_TASK_ID                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array range              || $MOAB_JOBARRAYRANGE  || ---                 || $SLURM_ARRAY_TASK_COUNT                   &lt;br /&gt;
|-&lt;br /&gt;
| Queue name                   || $MOAB_CLASS          || $PBS_QUEUE          || $SLURM_JOB_PARTITION                      &lt;br /&gt;
|-&lt;br /&gt;
| QOS name                     || $MOAB_QOS            || ---                 || $SLURM_JOB_QOS                            &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes per node || ---                   || $PBS_NUM_PPN        || $SLURM_TASKS_PER_NODE                     &lt;br /&gt;
|-&lt;br /&gt;
| Job user                     || $MOAB_USER           || $PBS_O_LOGNAME      || $SLURM_JOB_USER                           &lt;br /&gt;
|-&lt;br /&gt;
| Hostname                     || $MOAB_MACHINE        || $PBS_O_HOST         || $SLURMD_NODENAME                          &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15038</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15038"/>
		<updated>2025-07-01T10:23:10Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers standard compute nodes, high-mem (SMP) nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GiB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR100 InfiniBand (100 Gbit/s). To place your jobs on the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
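&lt;br /&gt;
For example, a multi-node MPI job can be placed on the InfiniBand-connected nodes by adding the constraint to its batch script (the resource values and program name below are placeholders; adapt them to your job):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=64&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --constraint=ib    # request the HDR100 InfiniBand nodes&lt;br /&gt;
&lt;br /&gt;
srun ./my_mpi_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;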
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in capacity and parallel access, but offers snapshots of your files and a backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) that offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. The storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid-state disk (SSD), available via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200), per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and the frequent file changes on these file systems, no backup can be provided.&amp;lt;br /&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with several times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade performance.&amp;lt;br /&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents such as cyber attacks or a fire in the server room.&amp;lt;br /&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files that are used continuously, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because backup space is limited, we enforce a quota of 40 GB on home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003), and it is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
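&lt;br /&gt;
A typical session might look like this (the workspace name and lifetime are examples):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ws_allocate mywork 30&lt;br /&gt;
$ WORKDIR=$(ws_find mywork)&lt;br /&gt;
$ cd $WORKDIR&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;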
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP and GPU nodes in particular are equipped with large, fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, for I/O sizes smaller than the Lustre and ZFS block size (1 MiB), and for I/O patterns where files are opened and closed in rapid succession. The XFS file system on the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
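&lt;br /&gt;
A common pattern is to stage input data into the local scratch directory at the beginning of a job, compute there, and copy the results back to a workspace before the job ends (the paths, file names, and program name are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# inside your batch script&lt;br /&gt;
cp /pfs/10/work/my_workspace/input.dat $TMPDIR/&lt;br /&gt;
cd $TMPDIR&lt;br /&gt;
./my_program input.dat&lt;br /&gt;
cp results.dat /pfs/10/work/my_workspace/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;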
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15037</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15037"/>
		<updated>2025-07-01T10:15:35Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR100 InfiniBand (100 Gbit/s). To place your jobs on the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in capacity and parallel access, but offers snapshots of your files and a backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) that offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. The storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local solid-state disk (SSD), available via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 + SMP) / ≈ 42 GB/s (GPU-H200), per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and the frequent file changes on these file systems, no backup can be provided.&amp;lt;br /&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with several times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade performance.&amp;lt;br /&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. it is protected against disk failures, but not against catastrophic incidents such as cyber attacks or a fire in the server room.&amp;lt;br /&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files that are used continuously, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because backup space is limited, we enforce a quota of 40 GB on home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003), and it is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP and GPU nodes in particular are equipped with large, fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, for I/O sizes smaller than the Lustre and ZFS block size (1 MiB), and for I/O patterns where files are opened and closed in rapid succession. The XFS file system on the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15036</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15036"/>
		<updated>2025-07-01T10:14:12Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gbit/s Ethernet (100GbE).&amp;lt;br /&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
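As a minimal sketch of such a submission (the node count and the MPI program are hypothetical placeholders; only the --constraint=ib flag is taken from this page), a job script could look like:

```shell
# Sketch of a job script requesting the InfiniBand-connected standard nodes.
# Node count and program name are hypothetical placeholders.
cat > ib_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2                # multi-node MPI jobs benefit from InfiniBand
#SBATCH --constraint=ib          # restrict scheduling to the 84 HDR InfiniBand nodes
srun ./my_mpi_program            # hypothetical MPI executable
EOF
grep -- '--constraint=ib' ib_job.sh   # confirm the constraint is in place
```

The script would then be submitted with &amp;lt;code&amp;gt;sbatch ib_job.sh&amp;lt;/code&amp;gt;.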
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU A30) / ≈ 26 GB/s (GPU A100 + SMP) / ≈ 42 GB/s (GPU H200) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (quotas may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. immune against disk failures but not immune against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files in ongoing use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because backup space is limited, we enforce a quota of 40 GB on home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O intense multinode-jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003), and the directory is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools.&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
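The commands above compose into a simple workflow. A hedged sketch, saved here as a script for illustration (the input file and analysis step are hypothetical; ws_allocate prints the absolute work space path on stdout):

```shell
# Sketch of a typical work space workflow; big_input.dat and ./run_analysis
# are hypothetical placeholders. ws_allocate prints the work space path.
cat > use_workspace.sh <<'EOF'
#!/bin/bash
WS_DIR=$(ws_allocate mywork 30)      # allocate "mywork" for 30 days
cp big_input.dat "$WS_DIR"/          # stage data onto the SSD-backed work area
cd "$WS_DIR" && ./run_analysis       # hypothetical compute step
cd && rm -rf "$WS_DIR"/*             # remove the directory content first ...
ws_release mywork                    # ... then release the work space
EOF
chmod +x use_workspace.sh
```

Remember to move results worth keeping to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; before releasing the work space.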
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
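A minimal staging pattern following this advice looks as below. On BinAC 2 the scheduler sets &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt; for you; the mktemp fallback, the toy input, and the stand-in computation are only there to make the sketch self-contained:

```shell
# Stage data onto node-local scratch, compute there, copy results back.
# On BinAC 2, $TMPDIR points to /scratch/<jobID>; the mktemp fallback and
# the toy "computation" only make this sketch runnable anywhere.
WORKDIR=$PWD
echo "input data" > input.dat                  # stand-in for real input
SCRATCH=${TMPDIR:-$(mktemp -d)}
cp input.dat "$SCRATCH"/                       # stage onto the fast local disk
( cd "$SCRATCH" && tr 'a-z' 'A-Z' < input.dat > result.dat )  # stand-in compute
cp "$SCRATCH"/result.dat "$WORKDIR"/           # save results before the job ends
```

Copying results back before the job finishes is essential, since &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt; is removed at the end of the batch job.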
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15022</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15022"/>
		<updated>2025-06-27T12:18:53Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 // 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gbit/s Ethernet (100GbE).&amp;lt;br /&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU A30) / ≈ 26 GB/s (GPU A100 + SMP) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (quotas may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes (nightly)&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
| &#039;&#039;&#039;no&#039;&#039;&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;color:red; background-color:#ffffcc;&amp;quot; cellpadding=&amp;quot;10&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
Please note that due to the large capacity of &#039;&#039;&#039;work&#039;&#039;&#039; and &#039;&#039;&#039;project&#039;&#039;&#039; and due to frequent file changes on these file systems, no backup can be provided.&amp;lt;/br&amp;gt;&lt;br /&gt;
Backing up these file systems would require a redundant storage facility with multiple times the capacity of &#039;&#039;&#039;project&#039;&#039;&#039;. Furthermore, regular backups would significantly degrade the performance.&amp;lt;/br&amp;gt;&lt;br /&gt;
Data is stored redundantly, i.e. immune against disk failures but not immune against catastrophic incidents like cyber attacks or a fire in the server room.&amp;lt;/br&amp;gt;&lt;br /&gt;
Please consider using one of the remote storage facilities such as [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files in ongoing use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because backup space is limited, we enforce a quota of 40 GB on home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O intense multinode-jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003), and the directory is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools.&lt;br /&gt;
To create a work space, you need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M) or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer a lower latency and a higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15021</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15021"/>
		<updated>2025-06-27T10:10:31Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 and 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gigabit Ethernet (100GbE).&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR 100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
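&lt;br /&gt;
For example, a minimal Slurm job script that requests the InfiniBand nodes could look like this (job name, resource values, and program name are placeholders, not recommendations):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=ib-example     # placeholder name&lt;br /&gt;
#SBATCH --nodes=2                 # multi-node jobs benefit most from InfiniBand&lt;br /&gt;
#SBATCH --ntasks-per-node=64&lt;br /&gt;
#SBATCH --time=00:30:00&lt;br /&gt;
#SBATCH --constraint=ib           # restrict the job to the HDR InfiniBand nodes&lt;br /&gt;
&lt;br /&gt;
srun ./my_mpi_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;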
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access, but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) that offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed (read)&lt;br /&gt;
| ≈ 1 GB/s, shared by all nodes&lt;br /&gt;
| max. 12 GB/s&lt;br /&gt;
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping&lt;br /&gt;
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU A30) / ≈ 26 GB/s (GPU A100 + SMP) per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are used repeatedly, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003), and it is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1 we will enforce work space lifetime, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g.: &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
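&lt;br /&gt;
A typical workflow with these tools might look as follows (the workspace name &amp;quot;mywork&amp;quot; and the file name are just examples):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ws_allocate mywork 30            # allocate a workspace for 30 days&lt;br /&gt;
$ WORKDIR=$(ws_find mywork)        # store its absolute path in a shell variable&lt;br /&gt;
$ cp input.dat &amp;quot;$WORKDIR&amp;quot;/         # stage data you actively compute on&lt;br /&gt;
$ ws_list -a                       # check names, paths, and remaining lifetimes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;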
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
The Lustre file system (&amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PROJECT&amp;lt;/code&amp;gt;) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1 MB), or I/O patterns where files are opened and closed in rapid succession. The XFS file system on the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than &amp;lt;code&amp;gt;WORK&amp;lt;/code&amp;gt;.&lt;br /&gt;
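&lt;br /&gt;
As a sketch, a batch job can stage its input to the local scratch disk, run there, and copy results back before the job ends (paths and program name are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
&lt;br /&gt;
cp &amp;quot;$HOME&amp;quot;/input.dat &amp;quot;$TMPDIR&amp;quot;/       # stage input to the fast local scratch&lt;br /&gt;
cd &amp;quot;$TMPDIR&amp;quot;&lt;br /&gt;
./my_program input.dat &amp;gt; output.dat    # keep I/O on the local disk&lt;br /&gt;
cp output.dat &amp;quot;$HOME&amp;quot;/                 # copy results back; $TMPDIR is removed when the job ends&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;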
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15020</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15020"/>
		<updated>2025-06-27T09:24:35Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP (high-mem) nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 and 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gigabit Ethernet (100GbE).&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR 100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access, but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) that offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are used repeatedly, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003), and it is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1 we will enforce work space lifetime, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g.: &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
The SMP nodes and the GPU nodes in particular are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd-Team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15019</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=15019"/>
		<updated>2025-06-27T09:06:37Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.5&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 16 SMP (high-mem) nodes&lt;br /&gt;
* 32 GPU nodes (2xA30)&lt;br /&gt;
* 8 GPU nodes (4xA100)&lt;br /&gt;
* 4 GPU nodes (4xH200)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (H200)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
| 4&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC Turin 9555]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85 / 2.95&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
| 3.20&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96 and 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 128 / 256&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
| 1536&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 28000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| 100GbE&lt;br /&gt;
| HDR 200 IB + 100GbE&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL  (141 GB ECC HBM3e, NVLink)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gigabit Ethernet (100GbE).&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 of the standard compute nodes are connected via HDR 100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 provides two separate storage systems: one for the user&#039;s home directory $HOME and one serving as project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access, but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) that offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| none yet (may be introduced in the future)&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at the end of the batch job&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files in continual use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because backup space is limited, we enforce a quota of 40 GB on home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003) and the directory is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to store only data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you no longer need it on the fast storage.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
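&lt;br /&gt;
A typical workspace lifecycle could, for example, look like this (the workspace name, project directory, and file names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ws_allocate mywork 30                     # allocate a workspace for 30 days&lt;br /&gt;
$ WORKDIR=$(ws_find mywork)                 # get its absolute path&lt;br /&gt;
$ cd $WORKDIR                               # run your computations here&lt;br /&gt;
$ mv results/ /pfs/10/project/&amp;lt;project&amp;gt;/    # keep results on the project storage&lt;br /&gt;
$ rm -r $WORKDIR/*                          # remove directory content first&lt;br /&gt;
$ ws_release mywork                         # then release the workspace&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;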
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;br /&gt;
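&lt;br /&gt;
Inside a job script, a common pattern is to copy input data to the local scratch, compute there, and copy the results back to the parallel file system before the job ends (the program, file names, and project directory are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cp input.dat $TMPDIR/&lt;br /&gt;
cd $TMPDIR&lt;br /&gt;
./my_program input.dat &amp;gt; results.dat&lt;br /&gt;
cp results.dat /pfs/10/project/&amp;lt;project&amp;gt;/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;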
&lt;br /&gt;
&lt;br /&gt;
=== SDS@hd ===&lt;br /&gt;
&lt;br /&gt;
SDS@hd is mounted via NFS on login and compute nodes at &amp;lt;syntaxhighlight inline&amp;gt;/mnt/sds-hd&amp;lt;/syntaxhighlight&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
To access your Speichervorhaben (storage project), the export to BinAC 2 must first be enabled by the SDS@hd team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.&lt;br /&gt;
&lt;br /&gt;
Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS documentation].&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight&amp;gt;&lt;br /&gt;
$ kinit $USER&lt;br /&gt;
Password for &amp;lt;user&amp;gt;@BWSERVICES.UNI-HEIDELBERG.DE: &lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Acknowledgement&amp;diff=14876</id>
		<title>BinAC2/Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Acknowledgement&amp;diff=14876"/>
		<updated>2025-05-20T10:48:44Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Added Acknowledgement Page for BinAC2&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;When preparing a publication describing work that involved the usage of a bwForCluster, e.g. BinAC2, please ensure that you reference the bwHPC initiative, the bwHPC-C5 project and – if appropriate – also the bwHPC facility itself. The following sample text is suggested as a starting point.&lt;br /&gt;
 &lt;br /&gt;
 Acknowledgement:&lt;br /&gt;
 The authors acknowledge support by the High Performance and Cloud Computing Group at the Zentrum für Datenverarbeitung of the University of Tübingen, the state of Baden-Württemberg through bwHPC&lt;br /&gt;
 and the German Research Foundation (DFG) through grant no. INST 37/1159-1 FUGG.&lt;br /&gt;
&lt;br /&gt;
In addition, we kindly ask you to notify us of any reports, conference papers, journal articles, theses, posters, or talks that contain results obtained on any bwHPC resource by sending an email to &lt;br /&gt;
[mailto:publications@bwhpc.de  publications@bwhpc.de] stating:&lt;br /&gt;
* cluster facility (e.g. bwForCluster BinAC2)&lt;br /&gt;
* RV acronym (e.g. bw16A000)&lt;br /&gt;
* author(s)&lt;br /&gt;
* title &#039;&#039;or&#039;&#039; booktitle&lt;br /&gt;
* journal, volume, pages &#039;&#039;or&#039;&#039; editors, address, publisher &lt;br /&gt;
* year.&lt;br /&gt;
&lt;br /&gt;
Such recognition is highly important for acquiring funding for the next generation of hardware, support services, data storage and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
 https://www.bwhpc.de/user_publications.html&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Login&amp;diff=14794</id>
		<title>BinAC/Login</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Login&amp;diff=14794"/>
		<updated>2025-05-05T15:45:43Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Access to bwForCluster BinAC is only possible from IP addresses within the [https://www.belwue.de BelWü] network which connects universities and other scientific institutions in Baden-Württemberg.&lt;br /&gt;
If your computer is in your University network (e.g. at your office), you should be able to connect to bwForCluster BinAC without restrictions.&lt;br /&gt;
If you are outside the BelWü network (e.g. at home), a VPN (virtual private network) connection to your University network must be established first. Please consult the VPN documentation of your University.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prerequisites for successful login:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You need to have&lt;br /&gt;
* completed the 3-step [[registration/bwForCluster|bwForCluster registration]] procedure.&lt;br /&gt;
* [[Registration/Password|set a service password]] for bwForCluster BinAC.&lt;br /&gt;
* Setup the [[BinAC/Login#TOTP_Second_Factor|two factor authentication (2FA)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Login to bwForCluster BinAC =&lt;br /&gt;
&lt;br /&gt;
== TOTP Second Factor ==&lt;br /&gt;
Install a TOTP (time-based one-time password) app, such as Google Authenticator, Microsoft Authenticator, andOTP, Aegis, FreeOTP, or Yubico Authenticator, on your mobile device. These apps work very similarly and allow you to scan a QR code containing a secret key used for TOTP password generation.&lt;br /&gt;
&lt;br /&gt;
Connect to the QR code server and enter your password.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh &amp;lt;UserID&amp;gt;@c2fa.binac.uni-tuebingen.de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the &amp;lt;code&amp;gt;&amp;lt;UserID&amp;gt;&amp;lt;/code&amp;gt; has to follow the [[Registration/Login/Username|username convention]] (also see below). The QR code displayed on screen must be scanned with your authenticator app. Note that the QR code is only displayed once. If you need a new QR code, please contact us (hpcmaster@uni-tuebingen.de) to reset the procedure.&lt;br /&gt;
Your authenticator app now displays a 6-digit number which changes every 30 seconds, representing the second factor required for the login to BinAC.&lt;br /&gt;
&lt;br /&gt;
If you have a new smartphone and cannot migrate your OTP data from the old to the new device, you will have to [[BinAC/Support|open a ticket]]. We will then reset your OTP and you can create a new QR code.&lt;br /&gt;
&lt;br /&gt;
Login to bwForCluster BinAC is only possible with a Secure Shell (SSH) client for which you must know your username on the cluster and the hostname of the login nodes.&lt;br /&gt;
For more general information on SSH clients, visit the [[Registration/Login/Client|SSH clients Guide]].&lt;br /&gt;
&lt;br /&gt;
== Username ==&lt;br /&gt;
&lt;br /&gt;
Your username on bwForCluster BinAC consists of a prefix and your local username.&lt;br /&gt;
For prefixes please refer to the [[Registration/Login/Username|Username Guide]].&lt;br /&gt;
&lt;br /&gt;
Example: If your local username at your University is &amp;lt;code&amp;gt;ab123&amp;lt;/code&amp;gt; and you are a user from Tübingen University, your username on the cluster is: &amp;lt;code&amp;gt;tu_ab123&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Hostnames ==&lt;br /&gt;
&lt;br /&gt;
bwForCluster BinAC has three login nodes. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| login01.binac.uni-tuebingen.de || login node 01&lt;br /&gt;
|-&lt;br /&gt;
| login02.binac.uni-tuebingen.de || login node 02&lt;br /&gt;
|-&lt;br /&gt;
| login03.binac.uni-tuebingen.de || login node 03&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login with SSH command (Linux, Mac, Windows) ==&lt;br /&gt;
&lt;br /&gt;
Most Unix and Unix-like operating systems like Linux or MacOS come with a built-in SSH client provided by the OpenSSH project.&lt;br /&gt;
More recent versions of Windows 10 and Windows 11 using the [https://docs.microsoft.com/en-us/windows/wsl/install Windows Subsystem for Linux] (WSL) also come with a built-in OpenSSH client. &lt;br /&gt;
&lt;br /&gt;
For login, use the following ssh command:&lt;br /&gt;
&lt;br /&gt;
 ssh &amp;lt;username&amp;gt;@login01.binac.uni-tuebingen.de&lt;br /&gt;
&lt;br /&gt;
To run graphical applications on the cluster, you need to enable X11 forwarding with the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
&lt;br /&gt;
 ssh -X &amp;lt;username&amp;gt;@login01.binac.uni-tuebingen.de&lt;br /&gt;
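&lt;br /&gt;
If you log in frequently, you can optionally define a host alias in your &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; (the alias &amp;lt;code&amp;gt;binac&amp;lt;/code&amp;gt; is just an example):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host binac&lt;br /&gt;
    HostName login01.binac.uni-tuebingen.de&lt;br /&gt;
    User &amp;lt;username&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Afterwards, &amp;lt;code&amp;gt;ssh binac&amp;lt;/code&amp;gt; is sufficient to connect.&lt;br /&gt;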
&lt;br /&gt;
== Login with graphical SSH client (Windows) ==&lt;br /&gt;
&lt;br /&gt;
For Windows we suggest using MobaXterm for login and file transfer.&lt;br /&gt;
 &lt;br /&gt;
Start MobaXterm and fill in the following fields:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Remote name              : login01.binac.uni-tuebingen.de&lt;br /&gt;
Specify user name        : &amp;lt;username&amp;gt;&lt;br /&gt;
Port                     : 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After that, click on &#039;OK&#039;. A terminal will then open where you can enter your credentials.&lt;br /&gt;
&lt;br /&gt;
== Login Example ==&lt;br /&gt;
&lt;br /&gt;
To login to bwForCluster BinAC, proceed as follows:&lt;br /&gt;
# Login with the SSH command or MobaXterm as shown above.&lt;br /&gt;
# The system will ask for a one-time password &amp;lt;code&amp;gt;One-time password (OATH) for &amp;lt;username&amp;gt;&amp;lt;/code&amp;gt;. Please enter your OTP and confirm it with Enter/Return. The OTP is not displayed when typing. If you do not have a second factor yet, please create one (see [[BinAC/Login#TOTP_Second_Factor]]).&lt;br /&gt;
# The system will ask you for your service password &amp;lt;code&amp;gt;Password:&amp;lt;/code&amp;gt;. Please enter it and confirm it with Enter/Return. The password is not displayed when typing. If you do not have a service password yet or have forgotten it, please create one (see [[Registration/Password]]).&lt;br /&gt;
# You will be greeted by the cluster, followed by a shell.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh tu_ab123@login01.binac.uni-tuebingen.de&lt;br /&gt;
One-time password (OATH) for tu_ab123:&lt;br /&gt;
Password: &lt;br /&gt;
&lt;br /&gt;
Last login: ...&lt;br /&gt;
&lt;br /&gt;
          bwFOR Cluster BinAC, Bioinformatics and Astrophysics &lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
&lt;br /&gt;
please submit jobs solely with the &#039;qsub&#039; command. Available queues are&lt;br /&gt;
 &lt;br /&gt;
    tiny  - 20min     - fast queue for testing (four GPU cores available) &lt;br /&gt;
    short - 48hrs     - serial/parallel jobs&lt;br /&gt;
    long  - 7days     - serial/parallel jobs&lt;br /&gt;
    gpu   - 30days    - GPU-only jobs  &lt;br /&gt;
    smp   - 7days     - large SMP jobs ( memory &amp;gt; 128GB/node ) &lt;br /&gt;
&lt;br /&gt;
The COMPUTE and GPU nodes provide 28 cores and 128GB of RAM each. In &lt;br /&gt;
addition, every GPU node is equipped with 2 Nvidia K80 accelerator cards, &lt;br /&gt;
totalling in 4 GPUs per node. The SMP machines provide 40 cores per node &lt;br /&gt;
and 1 TB of RAM.&lt;br /&gt;
&lt;br /&gt;
A local SCRATCH directory (/scratch) is available on each node. A fast, &lt;br /&gt;
parallel WORK file system is mounted on /beegfs/work. Please also use the &lt;br /&gt;
workspace tools.&lt;br /&gt;
&lt;br /&gt;
Register to our BinAC mailing list via &lt;br /&gt;
https://listserv.uni-tuebingen.de/mailman/listinfo/binac_announce&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
   Please do not keep data on WORK for a prolonged time. If rarely needed and&lt;br /&gt;
   while working on a project, please compress files to an archive. &lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[tu_ab123@login01 ~]$ &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Allowed Activities on Login Nodes =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
To guarantee usability for all users you must not run your compute jobs on the login nodes.&lt;br /&gt;
Compute jobs must be submitted as batch jobs.&lt;br /&gt;
Any compute job running on the login nodes will be terminated without notice.&lt;br /&gt;
Long-running compilation or long-running pre- or post-processing tasks must also be submitted as batch jobs.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The login nodes are the access points to the compute system, your &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; directory and your workspaces.&lt;br /&gt;
These nodes are shared with all users. Hence, your activities on the login nodes are primarily limited to setting up your batch jobs.&lt;br /&gt;
Your activities may also be:&lt;br /&gt;
* quick compilation of program code or&lt;br /&gt;
* quick pre- and post-processing of results from batch jobs.&lt;br /&gt;
&lt;br /&gt;
We advise using interactive batch jobs for compute- and memory-intensive compilation and pre- and post-processing tasks.&lt;br /&gt;
&lt;br /&gt;
= Related Information =&lt;br /&gt;
&lt;br /&gt;
* If you want to reset your service password, consult the [[Registration/Password|Password Guide]].&lt;br /&gt;
* If you want to register a new token for the two factor authentication (2FA), consult [[BinAC/Login#TOTP_Second_Factor|this section]].&lt;br /&gt;
* If you want to de-register, consult the [[Registration/Deregistration|De-registration Guide]].&lt;br /&gt;
* Configuring your shell: [[.bashrc Do&#039;s and Don&#039;ts]]&lt;br /&gt;
&amp;lt;!--* If you need an SSH key for your workflow, read [[Registration/SSH|Registering SSH Keys with your Cluster]].--&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Login&amp;diff=14793</id>
		<title>BinAC/Login</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Login&amp;diff=14793"/>
		<updated>2025-05-05T15:44:29Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Clarified the username to use&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Access to bwForCluster BinAC is only possible from IP addresses within the [https://www.belwue.de BelWü] network which connects universities and other scientific institutions in Baden-Württemberg.&lt;br /&gt;
If your computer is in your University network (e.g. at your office), you should be able to connect to bwForCluster BinAC without restrictions.&lt;br /&gt;
If you are outside the BelWü network (e.g. at home), a VPN (virtual private network) connection to your University network must be established first. Please consult the VPN documentation of your University.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prerequisites for successful login:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You need to have&lt;br /&gt;
* completed the 3-step [[registration/bwForCluster|bwForCluster registration]] procedure.&lt;br /&gt;
* [[Registration/Password|set a service password]] for bwForCluster BinAC.&lt;br /&gt;
* Setup the [[BinAC/Login#TOTP_Second_Factor|two factor authentication (2FA)]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Login to bwForCluster BinAC =&lt;br /&gt;
&lt;br /&gt;
== TOTP Second Factor ==&lt;br /&gt;
Install a TOTP (time-based one-time password) app, such as Google Authenticator, Microsoft Authenticator, andOTP, Aegis, FreeOTP, or Yubico Authenticator, on your mobile device. These apps work very similarly and allow you to scan a QR code containing a secret key used for TOTP password generation.&lt;br /&gt;
&lt;br /&gt;
Connect to the QR code server and enter your password.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ssh &amp;lt;UserID&amp;gt;@c2fa.binac.uni-tuebingen.de&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where the &amp;lt;code&amp;gt;UserID&amp;lt;/code&amp;gt; has to follow the [[Registration/Login/Username|username convention]] (also see below). The QR code displayed on screen must be scanned with your authenticator app. Note that the QR code is only displayed once. If you need a new QR code, please contact us (hpcmaster@uni-tuebingen.de) to reset the procedure.&lt;br /&gt;
Your authenticator app now displays a 6-digit number which changes every 30 seconds, representing the second factor required for the login to BinAC.&lt;br /&gt;
&lt;br /&gt;
If you have a new smartphone and cannot migrate your OTP data from the old to the new device, you will have to [[BinAC/Support|open a ticket]]. We will then reset your OTP and you can create a new QR code.&lt;br /&gt;
&lt;br /&gt;
Login to bwForCluster BinAC is only possible with a Secure Shell (SSH) client for which you must know your username on the cluster and the hostname of the login nodes.&lt;br /&gt;
For more general information on SSH clients, visit the [[Registration/Login/Client|SSH clients Guide]].&lt;br /&gt;
&lt;br /&gt;
== Username ==&lt;br /&gt;
&lt;br /&gt;
Your username on bwForCluster BinAC consists of a prefix and your local username.&lt;br /&gt;
For prefixes please refer to the [[Registration/Login/Username|Username Guide]].&lt;br /&gt;
&lt;br /&gt;
Example: If your local username at your University is &amp;lt;code&amp;gt;ab123&amp;lt;/code&amp;gt; and you are a user from Tübingen University, your username on the cluster is: &amp;lt;code&amp;gt;tu_ab123&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Hostnames ==&lt;br /&gt;
&lt;br /&gt;
bwForCluster BinAC has three login nodes. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| login01.binac.uni-tuebingen.de || login node 01&lt;br /&gt;
|-&lt;br /&gt;
| login02.binac.uni-tuebingen.de || login node 02&lt;br /&gt;
|-&lt;br /&gt;
| login03.binac.uni-tuebingen.de || login node 03&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login with SSH command (Linux, Mac, Windows) ==&lt;br /&gt;
&lt;br /&gt;
Most Unix and Unix-like operating systems like Linux or MacOS come with a built-in SSH client provided by the OpenSSH project.&lt;br /&gt;
More recent versions of Windows 10 and Windows 11 using the [https://docs.microsoft.com/en-us/windows/wsl/install Windows Subsystem for Linux] (WSL) also come with a built-in OpenSSH client. &lt;br /&gt;
&lt;br /&gt;
For login, use the following ssh command:&lt;br /&gt;
&lt;br /&gt;
 ssh &amp;lt;username&amp;gt;@login01.binac.uni-tuebingen.de&lt;br /&gt;
&lt;br /&gt;
To run graphical applications on the cluster, you need to enable X11 forwarding with the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; flag:&lt;br /&gt;
&lt;br /&gt;
 ssh -X &amp;lt;username&amp;gt;@login01.binac.uni-tuebingen.de&lt;br /&gt;
&lt;br /&gt;
== Login with graphical SSH client (Windows) ==&lt;br /&gt;
&lt;br /&gt;
For Windows we suggest using MobaXterm for login and file transfer.&lt;br /&gt;
 &lt;br /&gt;
Start MobaXterm and fill in the following fields:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Remote name              : login01.binac.uni-tuebingen.de&lt;br /&gt;
Specify user name        : &amp;lt;username&amp;gt;&lt;br /&gt;
Port                     : 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After that, click on &#039;OK&#039;. A terminal will then open where you can enter your credentials.&lt;br /&gt;
&lt;br /&gt;
== Login Example ==&lt;br /&gt;
&lt;br /&gt;
To login to bwForCluster BinAC, proceed as follows:&lt;br /&gt;
# Login with the SSH command or MobaXterm as shown above.&lt;br /&gt;
# The system will ask for a one-time password &amp;lt;code&amp;gt;One-time password (OATH) for &amp;lt;username&amp;gt;&amp;lt;/code&amp;gt;. Please enter your OTP and confirm it with Enter/Return. The OTP is not displayed when typing. If you do not have a second factor yet, please create one (see [[BinAC/Login#TOTP_Second_Factor]]).&lt;br /&gt;
# The system will ask you for your service password &amp;lt;code&amp;gt;Password:&amp;lt;/code&amp;gt;. Please enter it and confirm it with Enter/Return. The password is not displayed when typing. If you do not have a service password yet or have forgotten it, please create one (see [[Registration/Password]]).&lt;br /&gt;
# You will be greeted by the cluster, followed by a shell.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh tu_ab123@login01.binac.uni-tuebingen.de&lt;br /&gt;
One-time password (OATH) for tu_ab123:&lt;br /&gt;
Password: &lt;br /&gt;
&lt;br /&gt;
Last login: ...&lt;br /&gt;
&lt;br /&gt;
          bwFOR Cluster BinAC, Bioinformatics and Astrophysics &lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
&lt;br /&gt;
please submit jobs solely with the &#039;qsub&#039; command. Available queues are&lt;br /&gt;
 &lt;br /&gt;
    tiny  - 20min     - fast queue for testing (four GPU cores available) &lt;br /&gt;
    short - 48hrs     - serial/parallel jobs&lt;br /&gt;
    long  - 7days     - serial/parallel jobs&lt;br /&gt;
    gpu   - 30days    - GPU-only jobs  &lt;br /&gt;
    smp   - 7days     - large SMP jobs ( memory &amp;gt; 128GB/node ) &lt;br /&gt;
&lt;br /&gt;
The COMPUTE and GPU nodes provide 28 cores and 128GB of RAM each. In &lt;br /&gt;
addition, every GPU node is equipped with 2 Nvidia K80 accelerator cards, &lt;br /&gt;
totalling in 4 GPUs per node. The SMP machines provide 40 cores per node &lt;br /&gt;
and 1 TB of RAM.&lt;br /&gt;
&lt;br /&gt;
A local SCRATCH directory (/scratch) is available on each node. A fast, &lt;br /&gt;
parallel WORK file system is mounted on /beegfs/work. Please also use the &lt;br /&gt;
workspace tools.&lt;br /&gt;
&lt;br /&gt;
Register to our BinAC mailing list via &lt;br /&gt;
https://listserv.uni-tuebingen.de/mailman/listinfo/binac_announce&lt;br /&gt;
&lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
   Please do not keep data on WORK for a prolonged time. If rarely needed and&lt;br /&gt;
   while working on a project, please compress files to an archive. &lt;br /&gt;
------------------------------------------------------------------------------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[tu_ab123@login01 ~]$ &lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Allowed Activities on Login Nodes =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
To guarantee usability for all users you must not run your compute jobs on the login nodes.&lt;br /&gt;
Compute jobs must be submitted as batch jobs.&lt;br /&gt;
Any compute job running on the login nodes will be terminated without notice.&lt;br /&gt;
Long-running compilation or long-running pre- or post-processing tasks must also be submitted as batch jobs.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The login nodes are the access points to the compute system, your &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; directory and your workspaces.&lt;br /&gt;
These nodes are shared with all users. Hence, your activities on the login nodes are primarily limited to setting up your batch jobs.&lt;br /&gt;
Your activities may also be:&lt;br /&gt;
* quick compilation of program code or&lt;br /&gt;
* quick pre- and post-processing of results from batch jobs.&lt;br /&gt;
&lt;br /&gt;
We advise using interactive batch jobs for compute- and memory-intensive compilation and pre- and post-processing tasks.&lt;br /&gt;
&lt;br /&gt;
= Related Information =&lt;br /&gt;
&lt;br /&gt;
* If you want to reset your service password, consult the [[Registration/Password|Password Guide]].&lt;br /&gt;
* If you want to register a new token for the two factor authentication (2FA), consult [[BinAC/Login#TOTP_Second_Factor|this section]].&lt;br /&gt;
* If you want to de-register, consult the [[Registration/Deregistration|De-registration Guide]].&lt;br /&gt;
* Configuring your shell: [[.bashrc Do&#039;s and Don&#039;ts]]&lt;br /&gt;
&amp;lt;!--* If you need an SSH key for your workflow, read [[Registration/SSH|Registering SSH Keys with your Cluster]].--&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=SDS@hd/Access&amp;diff=14556</id>
		<title>SDS@hd/Access</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=SDS@hd/Access&amp;diff=14556"/>
		<updated>2025-03-31T13:15:09Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Changed the availability of SDS@HD on BinAC. It was and is currently only mounted on login03, not all login nodes.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page provides an overview on how to access data served by SDS@hd. To get an introduction to data transfer in general, see [[Data_Transfer|data transfer]].&lt;br /&gt;
&lt;br /&gt;
== Prerequisites ==&lt;br /&gt;
&lt;br /&gt;
* You need to be [[SDS@hd/Registration|registered]].&lt;br /&gt;
* You need to be in the BelWü network. This means you have to use the VPN service of your home organization if you want to access SDS@hd from outside the bwHPC clusters (e.g. via eduroam or from your personal notebook).&lt;br /&gt;
&lt;br /&gt;
== Needed Information, independent of the chosen tool ==&lt;br /&gt;
&lt;br /&gt;
* [[Registration/Login/Username| Username]]: Same as for the bwHPC Clusters&lt;br /&gt;
* Password: The Service Password that you set at bwServices in the [[SDS@hd/Registration|registration step]].&lt;br /&gt;
* Hostname: The hostname depends on the chosen network protocol:&lt;br /&gt;
** For [[Data_Transfer/SSHFS|SSHFS]] and [[Data_Transfer/SFTP|SFTP]]: &#039;&#039;lsdf02-sshfs.urz.uni-heidelberg.de&#039;&#039;&lt;br /&gt;
** For [[SDS@hd/Access/SMB|SMB]] and [[SDS@hd/Access/NFS|NFS]]: &#039;&#039;lsdf02.urz.uni-heidelberg.de&#039;&#039;&lt;br /&gt;
** For [[Data_Transfer/WebDAV|WebDAV]] the URL is: &#039;&#039;https://lsdf02-webdav.urz.uni-heidelberg.de&#039;&#039;&lt;br /&gt;
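As a hypothetical example of using this information, an SFTP session could be opened as follows (USERNAME is a placeholder for your own bwHPC username; you will be prompted for the service password):

```shell
# Hypothetical example: open an SFTP session to SDS@hd.
# USERNAME is a placeholder; substitute your own bwHPC username.
sftp USERNAME@lsdf02-sshfs.urz.uni-heidelberg.de
```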
&lt;br /&gt;
== Recommended Setup ==&lt;br /&gt;
The following graphic shows the recommended ways of accessing SDS@hd from Windows/Mac/Linux. The table provides an overview of the most important access options and links to the related pages.&amp;lt;br /&amp;gt;&lt;br /&gt;
If you have various use cases, it is recommended to use [[Data_Transfer/Rclone|Rclone]]: you can copy, sync and mount with it. Thanks to its multithreading capability, Rclone is a good fit for transferring large amounts of data.&amp;lt;br /&amp;gt;&lt;br /&gt;
For an overview of all connection possibilities, please have a look at [[Data_Transfer/All_Data_Transfer_Routes|all data transfer routes]].&lt;br /&gt;
&lt;br /&gt;
[[File:Data_transfer_diagram_simple.jpg|center|500px]]&lt;br /&gt;
&amp;lt;p style=&amp;quot;text-align: center; font-size: small; margin-top: 10px&amp;quot;&amp;gt;Figure 1: SDS@hd main transfer routes&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; &lt;br /&gt;
|- style=&amp;quot;font-weight:bold; text-align:center; vertical-align:middle;&amp;quot;&lt;br /&gt;
! &lt;br /&gt;
! Use Case&lt;br /&gt;
! Windows&lt;br /&gt;
! Mac&lt;br /&gt;
! Linux&lt;br /&gt;
! Possible Bandwidth&lt;br /&gt;
! Firewall Ports&lt;br /&gt;
|-&lt;br /&gt;
| [[Data_Transfer/Rclone|Rclone]] + &amp;lt;protocol&amp;gt;&lt;br /&gt;
| copy, sync and mount, multithreading&lt;br /&gt;
| ✓&lt;br /&gt;
| ✓&lt;br /&gt;
| ✓&lt;br /&gt;
| depends on used protocol&lt;br /&gt;
| depends on used protocol&lt;br /&gt;
|-&lt;br /&gt;
| [[SDS@hd/Access/SMB|SMB]]&lt;br /&gt;
| mount as network drive in file explorer or usage via Rclone&lt;br /&gt;
| [[SDS@hd/Access/SMB#Windows|✓]]&lt;br /&gt;
| [[SDS@hd/Access/SMB#Mac|✓]]&lt;br /&gt;
| [[SDS@hd/Access/SMB#Linux|✓]]&lt;br /&gt;
| up to 40 Gbit/sec&lt;br /&gt;
| 139 (netbios), 135 (rpc), 445 (smb)&lt;br /&gt;
|-&lt;br /&gt;
| [[Data_Transfer/WebDAV|WebDAV]]&lt;br /&gt;
| go-to solution for restricted networks&lt;br /&gt;
| ✓&lt;br /&gt;
| ✓&lt;br /&gt;
| ✓&lt;br /&gt;
| up to 100 Gbit/sec&lt;br /&gt;
| 80 (http), 443 (https)&lt;br /&gt;
|- style=&amp;quot;vertical-align:middle;&amp;quot;&lt;br /&gt;
| [[Data_Transfer/Graphical_Clients#MobaXterm|MobaXterm]]&lt;br /&gt;
| Graphical User Interface (GUI)&lt;br /&gt;
| [[Data_Transfer/Graphical_Clients#MobaXterm|✓]]&lt;br /&gt;
| ☓&lt;br /&gt;
| ☓&lt;br /&gt;
| see sftp&lt;br /&gt;
| see sftp&lt;br /&gt;
|- style=&amp;quot;vertical-align:middle;&amp;quot;&lt;br /&gt;
| [[SDS@hd/Access/NFS|NFS]]&lt;br /&gt;
| mount for multi-user environments&lt;br /&gt;
| ☓&lt;br /&gt;
| ☓&lt;br /&gt;
| [[SDS@hd/Access/NFS|✓]]&lt;br /&gt;
| up to 40 Gbit/sec&lt;br /&gt;
| -&lt;br /&gt;
|- style=&amp;quot;vertical-align:middle;&amp;quot;&lt;br /&gt;
| [[Data_Transfer/SSHFS|SSHFS]]&lt;br /&gt;
| mount, needs stable internet connection&lt;br /&gt;
| ☓&lt;br /&gt;
| [[Data_Transfer/SSHFS#MacOS_&amp;amp;_Linux|✓]]&lt;br /&gt;
| [[Data_Transfer/SSHFS#MacOS_&amp;amp;_Linux|✓]]&lt;br /&gt;
| see sftp&lt;br /&gt;
| see sftp&lt;br /&gt;
|- style=&amp;quot;vertical-align:middle;&amp;quot;&lt;br /&gt;
| [[Data_Transfer/SFTP|SFTP]]&lt;br /&gt;
| interactive shell, better usability when used together with Rclone&lt;br /&gt;
| [[Data_Transfer/SFTP#Windows|✓]]&lt;br /&gt;
| [[Data_Transfer/SFTP#MacOS_&amp;amp;_Linux|✓]]&lt;br /&gt;
| [[Data_Transfer/SFTP#MacOS_&amp;amp;_Linux|✓]]&lt;br /&gt;
| up to 40 Gbit/sec&lt;br /&gt;
| 22 (ssh)&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;p style=&amp;quot;text-align: center; font-size: small; margin-top: 10px&amp;quot;&amp;gt;Table 1: SDS@hd transfer routes&amp;lt;/p&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Access from a bwHPC Cluster ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;bwForCluster Helix&#039;&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
You can directly access your storage space under &#039;&#039;/mnt/sds-hd/&#039;&#039; on all login and compute nodes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;bwForCluster BinAC&#039;&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
You can directly access your storage space on the login node &#039;&#039;login03&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Other&#039;&#039;&#039;&amp;lt;br /&amp;gt;&lt;br /&gt;
You can mount your SDS@hd SV on the cluster yourself by using [[Data_Transfer/Rclone#Usage_Rclone_Mount | Rclone mount]]. As transfer protocol you can use WebDAV or sftp. For a full overview please have a look at [[Data_Transfer/All_Data_Transfer_Routes | All Data Transfer Routes]].&lt;br /&gt;
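As a rough sketch of such a self-made mount (the remote name &quot;sdshd&quot; and the mount point are arbitrary choices, USERNAME is a placeholder, and the exact flags may differ between Rclone versions), an Rclone WebDAV mount could look like this:

```shell
# Sketch only: remote name "sdshd" and mount point are arbitrary choices.
# USERNAME is a placeholder for your bwHPC username.
rclone config create sdshd webdav url https://lsdf02-webdav.urz.uni-heidelberg.de vendor other user USERNAME
mkdir -p "$HOME/sds"
rclone mount sdshd: "$HOME/sds" --daemon
# Unmount later with: fusermount -u "$HOME/sds"
```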
&lt;br /&gt;
=== Access via Webbrowser (read-only) ===&lt;br /&gt;
&lt;br /&gt;
Visit [https://lsdf02-webdav.urz.uni-heidelberg.de/ lsdf02-webdav.urz.uni-heidelberg.de] and log in with your SDS@hd username and service password. There you can get an overview of the data in your &amp;amp;quot;Speichervorhaben&amp;amp;quot; (storage project) and download single files. To do more, like moving data, uploading new files, or downloading complete folders, a suitable client is needed as described above.&lt;br /&gt;
&lt;br /&gt;
== Best Practices ==&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;Managing access rights with ACLs&#039;&#039;&#039; &amp;lt;br /&amp;gt; -&amp;gt; Please set ACLs either via the [https://www.urz.uni-heidelberg.de/de/service-katalog/desktop-und-arbeitsplatz/windows-terminalserver Windows terminal server] or via bwForCluster Helix. ACL changes won&#039;t work when used locally on a mounted directory.&lt;br /&gt;
* &#039;&#039;&#039;Multiuser environment&#039;&#039;&#039; &amp;lt;br /&amp;gt; -&amp;gt; Use [[SDS@hd/Access/NFS|NFS]]&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=14537</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=14537"/>
		<updated>2025-03-31T07:41:57Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Domänen aktualisiert&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.4&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and two types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 14 high-mem (SMP) nodes&lt;br /&gt;
* 32 GPU nodes (A30)&lt;br /&gt;
* 8 GPU nodes (A100)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7443 AMD EPYC Milan 7443]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR IB (80 nodes) / 100GbE&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 Gbit/s Ethernet.&amp;lt;br /&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand; 84 standard compute nodes are connected via HDR InfiniBand. In order to get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
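For instance, a batch job could be placed on the InfiniBand-connected nodes like this (job.sh and the node/task counts are placeholders, not recommendations):

```shell
# Hypothetical Slurm submission requesting InfiniBand-connected nodes.
# job.sh and the node/task counts are placeholders.
sbatch --constraint=ib --nodes=2 --ntasks-per-node=64 job.sh
```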
&lt;br /&gt;
= Storage =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems: one for the users&#039; home directories ($HOME) and one serving as project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible to the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage (SSD) on the node-local solid state disk via the $TMPDIR environment variable. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for the permanent storage of files that are in continuous use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs on nodes must not write temporary data to $HOME.&lt;br /&gt;
Instead they should use the local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
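A minimal job-script sketch following this note could look as follows (Slurm syntax, as used on BinAC 2; the file names and the program call are placeholders):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
# Sketch only: stage input to the fast node-local scratch, compute there,
# then copy results back. input.dat and my_program are placeholders.
cp "$HOME/input.dat" "$TMPDIR/"
cd "$TMPDIR"
./my_program input.dat result.out
cp result.out "$SLURM_SUBMIT_DIR/"
```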
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003) and is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create workspaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
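A typical pattern combining these tools might look like this (a sketch assuming that ws_allocate prints the new work space path on standard output, as the HLRS workspace tools do; the name &quot;mywork&quot; is arbitrary):

```shell
# Hypothetical usage pattern for the workspace tools.
WS_PATH=$(ws_allocate mywork 30)   # allocate for 30 days; prints the path
cd "$WS_PATH"
# ... run computations here ...
WS_PATH=$(ws_find mywork)          # look the path up again later
```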
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Moab&amp;diff=14352</id>
		<title>BinAC/Moab</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Moab&amp;diff=14352"/>
		<updated>2025-03-14T14:08:25Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: typo&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
As of February 1st 2025, Moab® is no longer licensed.&lt;br /&gt;
As a consequence, the tools previously provided via &amp;lt;code&amp;gt;module load system/moab/9.1.3&amp;lt;/code&amp;gt; (like &amp;lt;code&amp;gt;checkjob&amp;lt;/code&amp;gt;) are no longer available.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=  Torque scheduler = &lt;br /&gt;
&lt;br /&gt;
Any calculation on the bwForCluster BinAC compute nodes requires the user to define it as a single command or a sequence of commands, together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e., the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload managing software. Every job submission is therefore executed through commands of the Torque scheduler. Torque queues and runs user jobs based on fair-sharing policies.&lt;br /&gt;
&lt;br /&gt;
== Torque Commands ==&lt;br /&gt;
&lt;br /&gt;
Some of the most frequently used Torque commands for non-administrators working on the bwForCluster BinAC:&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Torque commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Job Submission : qsub|qsub]] || Submits a job and queues it in an input queue ([http://docs.adaptivecomputing.com/mwm/6-1-9/Content/commands/qsub.html qsub documentation]) &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Job Submission : qsub ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;qsub&#039;&#039;&#039;. The main purpose of &#039;&#039;&#039;qsub&#039;&#039;&#039; is to specify the resources needed to run the job. &#039;&#039;&#039;qsub&#039;&#039;&#039; will then queue the batch job. However, when a batch job starts depends on the availability of the requested resources and the fair-sharing value.&lt;br /&gt;
&lt;br /&gt;
=== qsub Command Parameters ===&lt;br /&gt;
The syntax and use of &#039;&#039;&#039;qsub&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man qsub&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;qsub&#039;&#039;&#039; options can be used from the command line or in your job script.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | qsub Options&lt;br /&gt;
|-&lt;br /&gt;
! Command line&lt;br /&gt;
! Script&lt;br /&gt;
! Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -l &#039;&#039;resources&#039;&#039;&lt;br /&gt;
| #PBS -l &#039;&#039;resources&#039;&#039;&lt;br /&gt;
| Defines the resources that are required by the job.&amp;lt;br&amp;gt;&lt;br /&gt;
See the description below for this important flag.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N &#039;&#039;name&#039;&#039;&lt;br /&gt;
| #PBS -N &#039;&#039;name&#039;&#039;&lt;br /&gt;
| Gives a user specified name to the job.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -o &#039;&#039;filename&#039;&#039;&lt;br /&gt;
| #PBS -o &#039;&#039;filename&#039;&#039;&lt;br /&gt;
| Defines the file name to be used for the standard output stream of the&amp;lt;br&amp;gt;&lt;br /&gt;
batch job. By default this file is placed in your job submission&amp;lt;br&amp;gt;&lt;br /&gt;
directory. To place it in a different location, prefix the&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;file name&#039;&#039; with the relative or absolute path of the destination.&amp;lt;br&amp;gt;&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -q &#039;&#039;queue&#039;&#039;&lt;br /&gt;
| #PBS -q &#039;&#039;queue&#039;&#039;&lt;br /&gt;
| Defines the queue class&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -v &#039;&#039;variable=arg&#039;&#039;&lt;br /&gt;
| #PBS -v &#039;&#039;variable=arg&#039;&#039;&lt;br /&gt;
| Expands the list of environment variables that are exported to the job&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -S &#039;&#039;Shell&#039;&#039;&lt;br /&gt;
| #PBS -S &#039;&#039;Shell&#039;&#039;&lt;br /&gt;
| Declares the shell (state path+name, e.g. /bin/bash) that interprets&amp;lt;br&amp;gt;&lt;br /&gt;
the job script.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -m &#039;&#039;bea&#039;&#039;&lt;br /&gt;
| #PBS -m &#039;&#039;bea&#039;&#039;&lt;br /&gt;
| Send email when job begins (b), ends (e) or aborts (a).&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -M &#039;&#039;name@uni.de&#039;&#039;&lt;br /&gt;
| #PBS -M &#039;&#039;name@uni.de&#039;&#039;&lt;br /&gt;
| Send email to the specified email address &amp;quot;name@uni.de&amp;quot;.&amp;lt;br&amp;gt; &lt;br /&gt;
Be careful what you wish for, this may generate a lot of emails!&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== qsub -l &#039;&#039;resource_list&#039;&#039; ====&lt;br /&gt;
The &#039;&#039;&#039;-l&#039;&#039;&#039; option is one of the most important qsub options. It is used to specify a number of resource requirements for your job. Multiple resource strings are separated by commas.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | qsub -l &#039;&#039;resource_list&#039;&#039;&lt;br /&gt;
|-&lt;br /&gt;
! resource&lt;br /&gt;
! Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -l nodes=2:ppn=16&lt;br /&gt;
| Number of nodes and number of processes per node&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -l walltime=600 &amp;lt;br&amp;gt; -l walltime=01:30:00&lt;br /&gt;
| Wall-clock time. Default units are seconds.&amp;lt;br&amp;gt;&lt;br /&gt;
HH:MM:SS format is also accepted.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -l  pmem=1000mb&lt;br /&gt;
| Maximum amount of physical memory used by any single process of the job.&amp;lt;br&amp;gt;&lt;br /&gt;
Allowed units are kb, mb, gb. Be aware that &#039;&#039;&#039;processes&#039;&#039;&#039; are either &#039;&#039;MPI tasks&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
or &#039;&#039;threads&#039;&#039;, so pmem requests the memory per &#039;&#039;MPI task&#039;&#039; or per &#039;&#039;thread&#039;&#039; of the job.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -l  advres=&#039;&#039;res_name&#039;&#039;&lt;br /&gt;
| Specifies the reservation &amp;quot;res_name&amp;quot; required to run the job.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== qsub -q &#039;&#039;queues&#039;&#039; ====&lt;br /&gt;
Queue classes define maximum resources such as walltime, nodes and processes per node and partition of the compute system. Note that queue settings of the bwHPC cluster are not &#039;&#039;&#039;identical&#039;&#039;&#039;, but differ due to their different prerequisites, such as HPC performance, scalability and throughput levels. Details can be found here:&lt;br /&gt;
* [[BinAC/Queues|bwForCluster BinAC queue settings]]&lt;br /&gt;
&lt;br /&gt;
With the change from MOAB to Torque, you may have to adapt your jobscripts in order to use certain queues:&lt;br /&gt;
&lt;br /&gt;
* gpu: add &amp;lt;code&amp;gt;:gpu&amp;lt;/code&amp;gt; to the node/proc resource request, e.g.: &amp;lt;code&amp;gt;-l nodes=x:ppn=n:&#039;&#039;&#039;gpu&#039;&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
* short: add &amp;lt;code&amp;gt;:short&amp;lt;/code&amp;gt; to the node/proc resource request, e.g.: &amp;lt;code&amp;gt;-l nodes=x:ppn=n:&#039;&#039;&#039;short&#039;&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
* long: add &amp;lt;code&amp;gt;:long&amp;lt;/code&amp;gt; to the node/proc resource request, e.g.: &amp;lt;code&amp;gt;-l nodes=x:ppn=n:&#039;&#039;&#039;long&#039;&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
* smp: add &amp;lt;code&amp;gt;:smp&amp;lt;/code&amp;gt; to the node/proc resource request, e.g.: &amp;lt;code&amp;gt;-l nodes=x:ppn=n:&#039;&#039;&#039;smp&#039;&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
* inter: add &amp;lt;code&amp;gt;:inter&amp;lt;/code&amp;gt; to the node/proc resource request, e.g.: &amp;lt;code&amp;gt;-l nodes=x:ppn=n:&#039;&#039;&#039;inter&#039;&#039;&#039;&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== qsub Examples ===&lt;br /&gt;
&lt;br /&gt;
==== Serial Programs ====&lt;br /&gt;
&lt;br /&gt;
To submit a serial job that runs the script &#039;&#039;&#039;job.sh&#039;&#039;&#039; and that requires 5000 MB of main memory and 3 hours of wall clock time:&lt;br /&gt;
&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -q short -N test -l nodes=1:ppn=1,walltime=3:00:00,mem=5000mb   job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) add after the initial line of your script &#039;&#039;&#039;job.sh&#039;&#039;&#039; the lines (here with a high memory request):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#PBS -l nodes=1:ppn=1&lt;br /&gt;
#PBS -l walltime=3:00:00&lt;br /&gt;
#PBS -l mem=200gb&lt;br /&gt;
#PBS -N test&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
and execute the modified script with the command line option &#039;&#039;-q smp&#039;&#039;, as the standard compute nodes only have 128 GB of memory.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -q smp job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note, that qsub command line options overrule script options.&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded programs operate faster than serial programs on CPUs with multiple cores.&amp;lt;br&amp;gt;&lt;br /&gt;
Moreover, multiple threads of one process share resources such as memory.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP), the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To submit a batch job called &#039;&#039;OpenMP_Test&#039;&#039; that runs a fourfold threaded program &#039;&#039;omp_executable&#039;&#039; which requires 6000 MByte of total physical memory and total wall clock time of 3 hours:&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;!-- 2014-01-29, at the moment submission of executables does not work, SLURM has to be instructed to generate a wrapper&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -v OMP_NUM_THREADS=4 -N test -l nodes=1:ppn=4,walltime=3:00:00,mem=6000mb  job_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
* generate the script &#039;&#039;&#039;job_omp.sh&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=1:ppn=4&lt;br /&gt;
#PBS -l walltime=3:00:00&lt;br /&gt;
#PBS -l mem=6000mb&lt;br /&gt;
#PBS -v EXECUTABLE=./omp_executable&lt;br /&gt;
#PBS -v MODULE=&amp;lt;placeholder&amp;gt;&lt;br /&gt;
#PBS -N OpenMP_Test&lt;br /&gt;
&lt;br /&gt;
#Usually you should set&lt;br /&gt;
export KMP_AFFINITY=compact,1,0&lt;br /&gt;
#export KMP_AFFINITY=verbose,compact,1,0 prints messages concerning the supported affinity&lt;br /&gt;
#KMP_AFFINITY Description: https://software.intel.com/en-us/node/524790#KMP_AFFINITY_ENVIRONMENT_VARIABLE&lt;br /&gt;
&lt;br /&gt;
module load ${MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=${PBS_NUM_PPN}&lt;br /&gt;
echo &amp;quot;Executable ${EXECUTABLE} running on ${PBS_NUM_PPN} cores with ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=${EXECUTABLE}&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
When using the Intel compiler, the environment variable KMP_AFFINITY switches on binding of threads to specific cores. If necessary, replace &amp;lt;placeholder&amp;gt; with the module file required to enable the OpenMP environment, then execute the script &#039;&#039;&#039;job_omp.sh&#039;&#039;&#039;, adding the queue class &#039;&#039;short&#039;&#039; as a qsub option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -q short job_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note, that qsub command line options overrule script options, e.g.,&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -l mem=2000mb -q short job_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
overwrites the script setting of 6000 MByte with 2000 MByte.&lt;br /&gt;
&lt;br /&gt;
==== MPI Parallel Programs ====&lt;br /&gt;
&lt;br /&gt;
MPI parallel programs run faster than serial programs on multi CPU and multi core systems. N-fold spawned processes of the MPI program, i.e., &#039;&#039;&#039;MPI tasks&#039;&#039;&#039;,  run simultaneously and communicate via the Message Passing Interface (MPI) paradigm. MPI tasks do not share memory but can be spawned over different nodes.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Multiple MPI tasks cannot be launched by the MPI parallel program itself, but must be started via &#039;&#039;&#039;mpirun&#039;&#039;&#039;, e.g. 4 MPI tasks of &#039;&#039;my_par_program&#039;&#039;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 4 my_par_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Generate a script &#039;&#039;job_ompi.sh&#039;&#039; for &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
module load mpi/openmpi/&amp;lt;placeholder_for_version&amp;gt;&lt;br /&gt;
# Use when loading OpenMPI in version 1.8.x&lt;br /&gt;
mpirun --bind-to core --map-by core -report-bindings my_par_program&lt;br /&gt;
# Use when loading OpenMPI in an old version 1.6.x&lt;br /&gt;
mpirun -bind-to-core -bycore -report-bindings my_par_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add the mpirun option &#039;&#039;-n &amp;lt;number_of_processes&amp;gt;&#039;&#039; or any other option defining processes or nodes, since Torque instructs mpirun about the number of processes and the node hostnames. &#039;&#039;&#039;ALWAYS&#039;&#039;&#039; use the MPI options &#039;&#039;&#039;&#039;&#039;--bind-to core&#039;&#039;&#039;&#039;&#039; and &#039;&#039;&#039;&#039;&#039;--map-by core|socket|node&#039;&#039;&#039;&#039;&#039; (OpenMPI version 1.8.x). Type &#039;&#039;mpirun --help&#039;&#039; for an explanation of the different values of the mpirun option &#039;&#039;--map-by&#039;&#039;.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Considering 4 OpenMPI tasks on a single node, each requiring 1000 MByte, and running for 1 hour, execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -q short -l nodes=1:ppn=4,pmem=1000mb,walltime=01:00:00 job_ompi.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded + MPI parallel Programs ====&lt;br /&gt;
&lt;br /&gt;
Multithreaded + MPI parallel programs operate faster than serial programs on systems with multiple CPUs and multiple cores per CPU. All threads of one process share resources such as memory. In contrast, MPI tasks do not share memory but can be spawned over different nodes.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Multiple MPI tasks using &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; must be launched via &#039;&#039;&#039;mpirun&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;O&#039;&#039;&#039;pen &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP), the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;For OpenMPI&#039;&#039;&#039;, a job script &#039;&#039;job_ompi_omp.sh&#039;&#039; that submits a batch job running an MPI program &#039;&#039;ompi_omp_program&#039;&#039; with 4 tasks and 5 threads per task, requiring 6000 MByte of physical memory per process/thread (with 5 threads per MPI task this yields 5*6000 MByte = 30000 MByte per MPI task) and a total wall clock time of 3 hours, looks like this:&lt;br /&gt;
&amp;lt;!--b)--&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#PBS -l nodes=2:ppn=10&lt;br /&gt;
#PBS -l walltime=03:00:00&lt;br /&gt;
#PBS -l pmem=6000mb&lt;br /&gt;
#PBS -v MPI_MODULE=mpi/ompi&lt;br /&gt;
#PBS -v OMP_NUM_THREADS=5&lt;br /&gt;
#PBS -v MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by socket:PE=5 -report-bindings&amp;quot;&lt;br /&gt;
#PBS -v EXECUTABLE=./ompi_omp_program&lt;br /&gt;
#PBS -N test_ompi_omp&lt;br /&gt;
&lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
TASK_COUNT=$((${PBS_NUM_PPN}/${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXECUTABLE} running on ${PBS_NUM_PPN} cores with ${TASK_COUNT} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpirun -n ${TASK_COUNT} ${MPIRUN_OPTIONS} ${EXECUTABLE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_ompi_omp.sh&#039;&#039;&#039; adding the queue class &#039;&#039;multinode&#039;&#039; to your qsub command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -q multinode job_ompi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* With the mpirun option &#039;&#039;--bind-to core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;--map-by socket:PE=&amp;lt;value&amp;gt;&#039;&#039; (neighboring) MPI tasks will be attached to different sockets and each MPI task is bound to the number of CPUs specified in &amp;lt;value&amp;gt;. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* Old OpenMPI version 1.6.x: With the mpirun option &#039;&#039;-bind-to-core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;-bysocket&#039;&#039; (neighboring) MPI tasks will be attached to different sockets and the option &#039;&#039;-cpus-per-proc &amp;lt;value&amp;gt;&#039;&#039; binds each MPI task to the number of CPUs specified in &amp;lt;value&amp;gt;. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* The option &#039;&#039;-report-bindings&#039;&#039; shows the bindings between MPI tasks and physical cores.&lt;br /&gt;
* The mpirun options &#039;&#039;&#039;--bind-to core&#039;&#039;&#039; and &#039;&#039;&#039;--map-by socket|...|node:PE=&amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program. (OpenMPI version 1.6.x: the mpirun options &#039;&#039;&#039;-bind-to-core&#039;&#039;&#039;, &#039;&#039;&#039;-bysocket|-bynode&#039;&#039;&#039; and &#039;&#039;&#039;-cpus-per-proc &amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program.)&lt;br /&gt;
&lt;br /&gt;
=== Handling job script options and arguments ===&lt;br /&gt;
Job script options and arguments, as in:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./job.sh -n 10&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
cannot be passed when using the qsub command, since qsub would interpret them as its own command-line options instead of passing them to &#039;&#039;job.sh&#039;&#039; &amp;lt;small&amp;gt;(as $1 = -n, $2 = 10)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solution A:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Submit a wrapper script, e.g. wrapper.sh:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -q singlenode wrapper.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
which simply contains all options and arguments of job.sh. The script wrapper.sh would at least contain the following lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
./job.sh -n 10&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Solution B:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Add after the header of your &#039;&#039;&#039;BASH&#039;&#039;&#039; script job.sh the following lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
## check if $SCRIPT_FLAGS is &amp;quot;set&amp;quot;&lt;br /&gt;
if [ -n &amp;quot;${SCRIPT_FLAGS}&amp;quot; ] ; then&lt;br /&gt;
   ## but if positional parameters are already present&lt;br /&gt;
   ## we are going to ignore $SCRIPT_FLAGS&lt;br /&gt;
   if [ -z &amp;quot;${*}&amp;quot;  ] ; then&lt;br /&gt;
      set -- ${SCRIPT_FLAGS}&lt;br /&gt;
   fi&lt;br /&gt;
fi&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
These lines modify your BASH script to read options and arguments from the environment variable $SCRIPT_FLAGS. Now submit your script job.sh as follows:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ qsub -q singlenode -v SCRIPT_FLAGS=&#039;-n 10&#039; job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Environment Variables ===&lt;br /&gt;
Once an eligible compute job starts on the compute system, PBS (our resource manager) adds the following variables to the job&#039;s environment:&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;2&amp;quot; | PBS variables&lt;br /&gt;
|-&lt;br /&gt;
! Environment variables&lt;br /&gt;
! Description&lt;br /&gt;
|-&lt;br /&gt;
| PBS_JOBID&lt;br /&gt;
| Job ID&lt;br /&gt;
|-&lt;br /&gt;
| PBS_JOBNAME&lt;br /&gt;
| Job name&lt;br /&gt;
|-&lt;br /&gt;
| PBS_NUM_NODES&lt;br /&gt;
| Number of nodes allocated to job&lt;br /&gt;
|-&lt;br /&gt;
| PBS_QUEUE&lt;br /&gt;
| Partition name the job is running in&lt;br /&gt;
|-&lt;br /&gt;
| PBS_NP&lt;br /&gt;
| Number of processors allocated to job&lt;br /&gt;
|-&lt;br /&gt;
| PBS_O_WORKDIR&lt;br /&gt;
| Directory of job submission&lt;br /&gt;
|-&lt;br /&gt;
| PBS_O_LOGNAME&lt;br /&gt;
| User name&lt;br /&gt;
|}&lt;br /&gt;
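As an illustration, a job script can use these variables, e.g. to change into the submission directory and document where it runs. This is a minimal sketch; the variables are only set by the resource manager inside a running job, and the &#039;&#039;unset&#039;&#039; fallback values shown are purely illustrative:&lt;br /&gt;

```shell
#!/bin/bash
# Minimal sketch: use PBS-provided environment variables in a job script.
# The :-unset fallbacks are only illustrative for running this outside a job.
echo "Job ${PBS_JOBID:-unset} (${PBS_JOBNAME:-unset})"
echo "Queue ${PBS_QUEUE:-unset}, ${PBS_NUM_NODES:-unset} node(s), ${PBS_NP:-unset} processor(s)"
# Jobs start in the home directory, so change into the submission directory
cd "${PBS_O_WORKDIR:-.}"
```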
&lt;br /&gt;
=== Interpreting PBS exit codes ===&lt;br /&gt;
* The PBS Server logs and accounting logs record the ‘exit status’ of jobs.&lt;br /&gt;
* Zero or positive exit status is the status of the top-level shell.&lt;br /&gt;
* Certain negative exit statuses are used internally and will never be reported to the user.&lt;br /&gt;
* Exit status values greater than 128 (on some systems greater than 256, see wait(2) or waitpid(2) for more information) indicate that the job was killed by a signal.&lt;br /&gt;
* To interpret (or ‘decode’) the signal contained in the exit status value, subtract the base value from the exit status.&amp;lt;br&amp;gt;For example, an exit status of 143 indicates that the job was killed via SIGTERM (143 - 128 = 15, and signal 15 is SIGTERM).&lt;br /&gt;
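The subtraction above can be scripted directly; a minimal sketch (the status value 143 is just an example):&lt;br /&gt;

```shell
# Decode the signal number from a job exit status greater than 128
exit_status=143    # example value, e.g. taken from the accounting logs
if [ "$exit_status" -gt 128 ]; then
    signal=$((exit_status - 128))
    echo "Job was killed by signal $signal ($(kill -l $signal))"
    # -> Job was killed by signal 15 (TERM)
else
    echo "Job exited normally with status $exit_status"
fi
```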
==== Job termination ====&lt;br /&gt;
* The exit code from a batch job is a standard Unix termination signal.&lt;br /&gt;
* Typically, exit code 0 means successful completion.&lt;br /&gt;
* Codes 1-127 are generated by the job calling exit() with a non-zero value to indicate an error.&lt;br /&gt;
* Exit codes 129-255 represent jobs terminated by Unix signals.&lt;br /&gt;
* Each signal has a corresponding value which is indicated in the job exit code.&lt;br /&gt;
==== Job termination signals ====&lt;br /&gt;
&lt;br /&gt;
Specific job exit codes are also supplied by the underlying resource manager of the cluster&#039;s  batch system. More detailed information can be found in the corresponding documentation:&lt;br /&gt;
&lt;br /&gt;
* [http://docs.adaptivecomputing.com/torque/6-1-2/adminGuide/torque.htm#topics/torque/2-jobs/jobExitStatus.htm TORQUE exit codes]&lt;br /&gt;
&lt;br /&gt;
==== Submitting Termination Signal ====&lt;br /&gt;
Here is an example of how to &#039;save&#039; a job&#039;s termination status in a typical bwHPC submit script.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
echo &amp;quot;### Calling YOUR_PROGRAM command ...&amp;quot;&lt;br /&gt;
mpirun -np &#039;NUMBER_OF_CORES&#039; $YOUR_PROGRAM_BIN_DIR/runproc ... (options)  2&amp;gt;&amp;amp;1&lt;br /&gt;
exit_code=$?  # capture the exit code immediately after the program call&lt;br /&gt;
[ &amp;quot;$exit_code&amp;quot; -eq 0 ] &amp;amp;&amp;amp; echo &amp;quot;all clean...&amp;quot; || \&lt;br /&gt;
   echo &amp;quot;Executable ${YOUR_PROGRAM_BIN_DIR}/runproc finished with exit code ${exit_code}&amp;quot;&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Do not use &#039;&#039;time&#039;&#039; mpirun! The exit code would then be the one returned by the first program (time) and not the exit code of your job.&lt;br /&gt;
* You do not need an &#039;&#039;&#039;exit $exit_code&#039;&#039;&#039; in your scripts.&lt;br /&gt;
&lt;br /&gt;
== List your jobs and show job details : qstat ==&lt;br /&gt;
Displays information about active, eligible, blocked, and/or recently completed jobs. When used without flags, this command displays all jobs in active, idle, and non-queued states.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Show all your jobs: &amp;lt;code&amp;gt;qstat -u $USER&amp;lt;/code&amp;gt;&lt;br /&gt;
* Show details about a specific job:  &amp;lt;code&amp;gt;qstat -f JOBID &amp;lt;/code&amp;gt;&lt;br /&gt;
* For further options of &#039;&#039;qstat&#039;&#039; read the manpage of &#039;&#039;qstat&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Canceling own jobs : qdel ==&lt;br /&gt;
The qdel &amp;lt;JobId&amp;gt; command is used to selectively cancel the specified job(s) (active, idle, or non-queued) from the queue. &lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;font color=red&amp;gt;Note that only &#039;&#039;&#039;own jobs&#039;&#039;&#039; can be cancelled.&amp;lt;/font&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
This command can be run by any administrator and &#039;&#039;&#039;by the owner of the job&#039;&#039;&#039;.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Flag !! Name !! Format !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -h&lt;br /&gt;
| HELP&lt;br /&gt;
|&lt;br /&gt;
| n./a.&lt;br /&gt;
| Display usage information&lt;br /&gt;
| &amp;lt;pre&amp;gt;$ qdel -h&amp;lt;/pre&amp;gt;&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
|&lt;br /&gt;
| JOB ID&lt;br /&gt;
| &amp;lt;STRING&amp;gt;&lt;br /&gt;
| (none)&lt;br /&gt;
| a jobid, a job expression, or the keyword &#039;ALL&#039;&lt;br /&gt;
| see: [[#Example Use of qdel|example use of qdel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Example Use of qdel ===&lt;br /&gt;
Example use of qdel, run on BinAC:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[...calc_repo-0]$ qsub bwhpc-fasta-example.pbs&lt;br /&gt;
8374356              # this is the JobId&lt;br /&gt;
$&lt;br /&gt;
$ qstat -f 8374356&lt;br /&gt;
Job Id: 8374356&lt;br /&gt;
    Job_Name = bwhpc-fasta-example.pbs&lt;br /&gt;
    Job_Owner = tu_iioba01@login03&lt;br /&gt;
    resources_used.cput = 00:00:02&lt;br /&gt;
    resources_used.energy_used = 0&lt;br /&gt;
    resources_used.mem = 1580kb&lt;br /&gt;
    resources_used.vmem = 56404kb&lt;br /&gt;
    resources_used.walltime = 00:00:03&lt;br /&gt;
    job_state = R&lt;br /&gt;
[...]&lt;br /&gt;
&lt;br /&gt;
$ # now cancel the job&lt;br /&gt;
$ qdel 8374356&lt;br /&gt;
Terminated&lt;br /&gt;
&lt;br /&gt;
$ qstat -f 8374356 | grep job_state&lt;br /&gt;
    job_state = C&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Software/Jupyterlab&amp;diff=13789</id>
		<title>BinAC/Software/Jupyterlab</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC/Software/Jupyterlab&amp;diff=13789"/>
		<updated>2025-02-03T11:27:41Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Update, necessary for MOAB -&amp;gt; Torque transition&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Softwarepage|devel/jupyterlab}}&lt;br /&gt;
&lt;br /&gt;
{| width=700px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Description   !! Content&lt;br /&gt;
|-&lt;br /&gt;
| module load&lt;br /&gt;
| devel/jupyterlab&lt;br /&gt;
|-&lt;br /&gt;
| License&lt;br /&gt;
| [https://github.com/jupyterlab/jupyterlab/blob/main/LICENSE JupyterLab License]&lt;br /&gt;
|-&lt;br /&gt;
| Links&lt;br /&gt;
| [https://jupyter.org/ Homepage]&lt;br /&gt;
|-&lt;br /&gt;
| Graphical Interface&lt;br /&gt;
|  Yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Description = &lt;br /&gt;
&lt;br /&gt;
JupyterLab is a web-based interactive development environment for notebooks, code, and data.&lt;br /&gt;
&lt;br /&gt;
Currently BinAC provides the following [https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html#jupyter-minimal-notebook JupyterLab Docker images] via Apptainer:&lt;br /&gt;
&lt;br /&gt;
* minimal-notebook&lt;br /&gt;
* r-notebook&lt;br /&gt;
&lt;br /&gt;
= Usage =&lt;br /&gt;
&lt;br /&gt;
This guide is valid for &amp;lt;code&amp;gt;minimal-notebook&amp;lt;/code&amp;gt;. You can also follow it for &amp;lt;code&amp;gt;r-notebook&amp;lt;/code&amp;gt;, but you have to use &amp;lt;code&amp;gt;r-notebook.pbs.template&amp;lt;/code&amp;gt; as the template for your job script.&lt;br /&gt;
&lt;br /&gt;
== Start JupyterLab ==&lt;br /&gt;
&lt;br /&gt;
The module provides a job script for starting a JupyterLab instance on the BinAC &amp;lt;code&amp;gt;inter&amp;lt;/code&amp;gt; queue.&lt;br /&gt;
Load the module and copy the job script into your workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load devel/jupyterlab/7.2.1&lt;br /&gt;
cp $JUPYTERLAB_EXA_DIR/jupyterlab.pbs.template jupyterlab.pbs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can adjust the following settings in the job script according to your needs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#PBS -l nodes=1:ppn=1:inter          # adjust the number of cpu cores (ppn)&lt;br /&gt;
#PBS -l mem=2gb&lt;br /&gt;
#PBS -l walltime=6:00:00&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please note the restrictions of the inter queue:&lt;br /&gt;
* max. walltime: 12 hours&lt;br /&gt;
* max. nodes: 1&lt;br /&gt;
* max. cores: 28&lt;br /&gt;
* max jobs per user: 1&lt;br /&gt;
It is important that you use the &amp;lt;code&amp;gt;inter&amp;lt;/code&amp;gt; tag for resource allocation. Otherwise your job will be scheduled on nodes to which no SSH tunneling is possible.&lt;br /&gt;
&lt;br /&gt;
Then submit the job.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
jobid=$(qsub jupyterlab.pbs)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
  &lt;br /&gt;
== Create SSH tunnel ==&lt;br /&gt;
&lt;br /&gt;
The compute node on which JupyterLab is running is not reachable directly from your workstation.&lt;br /&gt;
Hence you have to create an SSH tunnel from your workstation to the compute node through a BinAC login node.&lt;br /&gt;
&lt;br /&gt;
The job&#039;s standard output file (&amp;lt;code&amp;gt;JupyterLab.o&amp;lt;jobid&amp;gt;&amp;lt;/code&amp;gt;) contains the SSH command for this tunnel.&lt;br /&gt;
Please note that details like IP, port number, and access URL will vary.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cat JupyterLab.o${jobid}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:Binac_jupyterlab_connection_details.png | 800px | center | JupyterLab connection info]]&lt;br /&gt;
&lt;br /&gt;
=== Linux Users ===&lt;br /&gt;
&lt;br /&gt;
Copy the &amp;lt;code&amp;gt;ssh -N -L ... &amp;lt;/code&amp;gt; command and execute it in a shell on your workstation.&lt;br /&gt;
After successful authentication, the SSH tunnel is ready to use.&lt;br /&gt;
The ssh command does not produce any output.&lt;br /&gt;
If there is no error message, everything should be fine:&lt;br /&gt;
&lt;br /&gt;
[[File:Binac_jupyterlab_ssh_tunnel_linux.png | 800px | center | Creation of SSH tunnel on Linux]]&lt;br /&gt;
&lt;br /&gt;
=== Windows Users ===&lt;br /&gt;
&lt;br /&gt;
If you are using Windows you will need to create the SSH tunnel in the SSH client of your choice (e.g. MobaXTerm, PuTTY, etc.).&lt;br /&gt;
&lt;br /&gt;
==== MobaXTerm ====&lt;br /&gt;
&lt;br /&gt;
Select &amp;lt;code&amp;gt;Tunneling&amp;lt;/code&amp;gt; in the top ribbon. Then press &amp;lt;code&amp;gt;New SSH tunnel&amp;lt;/code&amp;gt;.&lt;br /&gt;
Then configure the SSH tunnel with the correct values taken from the SSH tunnel information above.&lt;br /&gt;
For the example in this tutorial it looks as follows:&lt;br /&gt;
&lt;br /&gt;
[[File:Binac_jupyterlab_mobaxterm.png | 800px | center ]]&lt;br /&gt;
&lt;br /&gt;
== Access JupyterLab ==&lt;br /&gt;
&lt;br /&gt;
JupyterLab is now running on a BinAC compute node, and you have created an SSH tunnel from your workstation to that compute node.&lt;br /&gt;
Open a browser and copy the URL with the access token into the address field:&lt;br /&gt;
&lt;br /&gt;
[[File:Binac_jupyterlab_browser_url.png | 800px | center ]]&lt;br /&gt;
&lt;br /&gt;
Your browser should now display the JupyterLab user interface:&lt;br /&gt;
&lt;br /&gt;
[[File:Binac_jupyterlab_browser_lab.png | 800px | center ]]&lt;br /&gt;
&lt;br /&gt;
== Access &amp;lt;code&amp;gt;/beegfs/work/&amp;lt;/code&amp;gt; in file browser ==&lt;br /&gt;
&lt;br /&gt;
JupyterLab&#039;s root directory will be your home directory. As your home directory is backed up daily, you may want to store your notebooks there.&lt;br /&gt;
&lt;br /&gt;
In order to access data in your workspace (e.g. somewhere under &amp;lt;code&amp;gt;/beegfs/work&amp;lt;/code&amp;gt;) via the file browser, you will need to create a symbolic link from your home directory to your workspace:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
ln -s /beegfs/work/&amp;lt;path to your project data&amp;gt; $HOME/&amp;lt;link name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Through that link in your home directory you can browse your research data in JupyterLab&#039;s file explorer.&lt;br /&gt;
&lt;br /&gt;
Here is an example of how to link to a directory:&lt;br /&gt;
&lt;br /&gt;
[[ File:Binac_jupyterlab_link.png | 800px | center ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Shut Down JupyterLab ==&lt;br /&gt;
&lt;br /&gt;
You can shut down JupyterLab via &amp;lt;code&amp;gt;File -&amp;gt; Shut Down&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please note that this will also terminate your compute job on BinAC!&lt;br /&gt;
&lt;br /&gt;
[[ File:Binac_jupyterlab_browser_shutdown.png | 800px | center ]]&lt;br /&gt;
&lt;br /&gt;
= Managing Kernels =&lt;br /&gt;
&lt;br /&gt;
The kernels are stored in your home directory on BinAC: &amp;lt;code&amp;gt;$HOME/.local/share/jupyter/kernels/&amp;lt;/code&amp;gt;.&lt;br /&gt;
You can install new kernels from within the JupyterLab browser window, but you will have to install Miniconda beforehand.&lt;br /&gt;
With Miniconda available, open a new terminal window.&lt;br /&gt;
&lt;br /&gt;
== Add a new Kernel ==&lt;br /&gt;
&lt;br /&gt;
=== Python ===&lt;br /&gt;
&lt;br /&gt;
There is only a Python 3 kernel installed when you first start JupyterLab.&lt;br /&gt;
Because there are nearly endless combinations of Python versions and packages we encourage you to install the software yourself via Conda.&lt;br /&gt;
&lt;br /&gt;
This is an example of how to create new kernels for JupyterLab. It&#039;s so simple that three commands suffice:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
conda create --name kernel_env python=3.8 pandas numpy matplotlib ipykernel             # 1&lt;br /&gt;
conda activate kernel_env                                                               # 2&lt;br /&gt;
python -m ipykernel install --user --name pandas --display-name=&amp;quot;Python 3.8 (pandas)&amp;quot;   # 3&lt;br /&gt;
&lt;br /&gt;
# Installed kernelspec pandas in /home/tu/tu_tu/tu_iioba01/.local/share/jupyter/kernels/pandas&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first command creates a new Conda environment called &amp;lt;code&amp;gt;kernel_env&amp;lt;/code&amp;gt; and installs a specific Python version plus a few Python packages. It&#039;s important that you also install &amp;lt;code&amp;gt;ipykernel&amp;lt;/code&amp;gt;. We need &amp;lt;code&amp;gt;ipykernel&amp;lt;/code&amp;gt; later to create the JupyterLab kernel.&lt;br /&gt;
&lt;br /&gt;
The second command activates the &amp;lt;code&amp;gt;kernel_env&amp;lt;/code&amp;gt; Conda environment.&lt;br /&gt;
The third command creates the new JupyterLab kernel.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh $HOME/.local/share/jupyter/kernels/&lt;br /&gt;
total 0&lt;br /&gt;
drwxr-xr-x 2 tu_iioba01 tu_tu 109 Jul 26 10:38 pandas&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[ File:Binac_jupyterlab_new_kernel.png | 800px | center ]]&lt;br /&gt;
&lt;br /&gt;
=== R ===&lt;br /&gt;
&lt;br /&gt;
The instructions for new R-Kernels are a bit different.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
conda config --add channels r&lt;br /&gt;
conda create --name r_kernel_env r-base=4.4.1 r-irkernel&lt;br /&gt;
conda activate r_kernel_env &lt;br /&gt;
R&lt;br /&gt;
# In the R-Session:&lt;br /&gt;
install.packages(...)&lt;br /&gt;
IRkernel::installspec(name = &#039;ir44&#039;, displayname = &#039;R 4.4.1&#039;)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The first command adds the &amp;lt;code&amp;gt;r&amp;lt;/code&amp;gt; channel; the second creates a new Conda environment called &amp;lt;code&amp;gt;r_kernel_env&amp;lt;/code&amp;gt; and installs a specific R version. It&#039;s important that you also install &amp;lt;code&amp;gt;r-irkernel&amp;lt;/code&amp;gt;. We need &amp;lt;code&amp;gt;r-irkernel&amp;lt;/code&amp;gt; later to create the JupyterLab kernel.&lt;br /&gt;
The third command activates the &amp;lt;code&amp;gt;r_kernel_env&amp;lt;/code&amp;gt; Conda environment, and &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt; opens an R session. In this session you can install whatever R package you need in your kernel.&lt;br /&gt;
Last, create the new kernel with the &amp;lt;code&amp;gt;installspec&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&lt;br /&gt;
== Remove a Kernel ==&lt;br /&gt;
&lt;br /&gt;
In order to remove a kernel from JupyterLab, simply remove the corresponding directory in &amp;lt;code&amp;gt;$HOME/.local/share/jupyter/kernels/&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Remove the JupyterLab kernel installed in the previous example&lt;br /&gt;
rm -rf $HOME/.local/share/jupyter/kernels/pandas&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also remove the corresponding Conda environment if you don&#039;t need it any more:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
conda env remove --name kernel_env&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=13705</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=13705"/>
		<updated>2025-01-27T09:31:25Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Astrophysics, and Geosciences.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.4&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and two types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 14 high-mem (SMP) nodes&lt;br /&gt;
* 32 GPU nodes (A30)&lt;br /&gt;
* 8 GPU nodes (A100)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7443 AMD EPYC Milan 7443]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR IB (80 nodes) / 100GbE&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&amp;lt;br&amp;gt;&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 84 standard compute nodes are connected via HDR InfiniBand. In order to get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Storage =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on its node-local solid state disk (SSD) via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are in continuous use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
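The note above can be turned into a minimal Slurm job script. This is a sketch: the demo input file and the &amp;lt;code&amp;gt;wc&amp;lt;/code&amp;gt; call stand in for a real I/O-heavy workload, and the fallback assignments only matter when testing outside a batch job (in a real job, Slurm sets $TMPDIR and $SLURM_SUBMIT_DIR itself):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Fallbacks for running outside a batch job; Slurm sets these in a real job.
: "${TMPDIR:=$(mktemp -d)}"
: "${SLURM_SUBMIT_DIR:=$PWD}"

# For illustration only: create a small demo input file.
printf 'a\nb\nc\n' > "$SLURM_SUBMIT_DIR/input.dat"

# Stage the input onto the fast node-local scratch space and work there.
cp "$SLURM_SUBMIT_DIR/input.dat" "$TMPDIR/"
cd "$TMPDIR"

# Placeholder for the actual I/O-heavy computation.
wc -l < input.dat > result.txt

# Copy results back to permanent storage before the job ends;
# $TMPDIR is removed when the job finishes.
cp result.txt "$SLURM_SUBMIT_DIR/"
```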
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003) and is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create work spaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
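A typical work space workflow chains these commands. The following is an illustrative sketch (the work space name, the project group bw16f003, and the data paths are examples; the &amp;lt;code&amp;gt;ws_*&amp;lt;/code&amp;gt; tools are available on the cluster only):&lt;br /&gt;

```
# Allocate a work space for 30 days and look up its absolute path
ws_allocate mywork 30
WSDIR=$(ws_find mywork)

# Stage input data from the project directory onto the fast SSD storage
cp -r /pfs/10/project/bw16f003/mydata "$WSDIR"/

# ... run your computations against $WSDIR ...

# Afterwards: move results back to the project space, empty the
# directory, and release the work space
cp -r "$WSDIR"/results /pfs/10/project/bw16f003/
rm -rf "$WSDIR"/mydata "$WSDIR"/results
ws_release mywork
```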
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=13704</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=13704"/>
		<updated>2025-01-27T09:29:20Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Astrophysics, and Geosciences.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.4&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and two types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 14 high-mem (SMP) nodes&lt;br /&gt;
* 32 GPU nodes (A30)&lt;br /&gt;
* 8 GPU nodes (A100)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7443 AMD EPYC Milan 7443]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Base Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Physical Cores / Hyperthreads&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 48 / 96&lt;br /&gt;
| 64 / 128&lt;br /&gt;
| 64 / 128&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR IB (80 nodes) / 100GbE&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 80 of the standard compute nodes are additionally connected via HDR InfiniBand, while the remaining nodes use 100 GbE only. In order to get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
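A minimal job script header for requesting the InfiniBand nodes might look like this (a sketch; the resource values and the application name are placeholders):&lt;br /&gt;

```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --constraint=ib    # place the job on the HDR InfiniBand nodes
#SBATCH --time=01:00:00

srun ./my_mpi_app          # hypothetical multi-node MPI application
```

Multi-node MPI jobs profit most from the low-latency interconnect.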
&lt;br /&gt;
= Storage =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local NVMe SSD, available via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are in continuous use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003) and is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create work spaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=13703</id>
		<title>BinAC2/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BinAC2/Hardware_and_Architecture&amp;diff=13703"/>
		<updated>2025-01-27T09:25:11Z</updated>

		<summary type="html">&lt;p&gt;S Behnle: Fixed BinAC2 scratch disk sized&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Hardware and Architecture =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Astrophysics, and Geosciences.&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: Rocky Linux 9.4&lt;br /&gt;
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)&lt;br /&gt;
* (Scientific) Libraries and Software: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Compute Nodes ==&lt;br /&gt;
&lt;br /&gt;
BinAC 2 offers compute nodes, high-mem nodes, and two types of GPU nodes.&lt;br /&gt;
* 180 compute nodes&lt;br /&gt;
* 14 high-mem (SMP) nodes&lt;br /&gt;
* 32 GPU nodes (A30)&lt;br /&gt;
* 8 GPU nodes (A100)&lt;br /&gt;
* plus several special purpose nodes for login, interactive jobs, etc.&lt;br /&gt;
&lt;br /&gt;
Compute node specification:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| High-Mem&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A30)&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU (A100)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 180 &lt;br /&gt;
| 14&lt;br /&gt;
| 32&lt;br /&gt;
| 8&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processors&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7443 AMD EPYC Milan 7443]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
| 2 x [https://www.amd.com/de/products/cpu/amd-epyc-7543 AMD EPYC Milan 7543]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Processor Frequency (GHz)&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.85&lt;br /&gt;
| 2.80&lt;br /&gt;
| 2.80&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Number of Cores&lt;br /&gt;
| 64&lt;br /&gt;
| 48&lt;br /&gt;
| 64&lt;br /&gt;
| 64&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Working Memory (GB)&lt;br /&gt;
| 512&lt;br /&gt;
| 2048&lt;br /&gt;
| 512&lt;br /&gt;
| 512&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local Disk (GB)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
| 450 (NVMe-SSD)&lt;br /&gt;
| 14000 (NVMe-SSD)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
| HDR IB (80 nodes) / 100GbE&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
| HDR&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Coprocessors&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]&lt;br /&gt;
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Network =&lt;br /&gt;
&lt;br /&gt;
The compute nodes and the parallel file system are connected via 100 GbE Ethernet.&lt;br /&gt;
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 80 of the standard compute nodes are additionally connected via HDR InfiniBand, while the remaining nodes use 100 GbE only. In order to get your jobs onto the InfiniBand nodes, submit your job with &amp;lt;code&amp;gt;--constraint=ib&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
= Storage =&lt;br /&gt;
&lt;br /&gt;
The bwForCluster BinAC 2 consists of two separate storage systems, one for the user&#039;s home directory $HOME and one serving as a project/work space.&lt;br /&gt;
The home directory is limited in space and parallel access but offers snapshots of your files and backup.&lt;br /&gt;
&lt;br /&gt;
The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a larger capacity than the home directory. It is mounted at &amp;lt;code&amp;gt;/pfs/10&amp;lt;/code&amp;gt; on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; that is accessible to all members of the compute project.&lt;br /&gt;
Each user can create workspaces under &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; using the workspace tools. These directories are only accessible for the user who created the workspace.&lt;br /&gt;
&lt;br /&gt;
Additionally, each compute node provides high-speed temporary storage on a node-local NVMe SSD, available via the $TMPDIR environment variable.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| project&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| work&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global &lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| permanent&lt;br /&gt;
| work space lifetime (max. 30 days, max. 5 extensions)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Capacity&lt;br /&gt;
| -&lt;br /&gt;
| 8.1 PB&lt;br /&gt;
| 1000 TB&lt;br /&gt;
| 512 GB per node; 1920 GB on high-mem nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Speed&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
| ...&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 40 GB per user&lt;br /&gt;
| not yet, maybe in the future&lt;br /&gt;
| none&lt;br /&gt;
| none&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : all nodes access the same file system&lt;br /&gt;
  local              : each node has its own file system&lt;br /&gt;
  permanent          : files are stored permanently&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
=== Home ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are in continuous use, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NOTE:&#039;&#039;&#039;&lt;br /&gt;
Compute jobs must not write temporary data to $HOME.&lt;br /&gt;
Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases&lt;br /&gt;
and work spaces for less I/O-intensive multi-node jobs.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the &#039;&#039;&#039;diskusage&#039;&#039;&#039; command: &lt;br /&gt;
 $ diskusage&lt;br /&gt;
 &lt;br /&gt;
 User           	   Used (GB)	  Quota (GB)	Used (%)&lt;br /&gt;
 ------------------------------------------------------------------------&lt;br /&gt;
 &amp;lt;username&amp;gt;                4.38               100.00             4.38&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
=== Project ===&lt;br /&gt;
&lt;br /&gt;
Each compute project has its own project directory at &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ls -lh /pfs/10/project/&lt;br /&gt;
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
As you can see, the directory is owned by a group representing your compute project (here bw16f003) and is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.&lt;br /&gt;
&lt;br /&gt;
The data is stored on HDDs. The primary focus of &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; is pure capacity, not speed.&lt;br /&gt;
&lt;br /&gt;
=== Work ===&lt;br /&gt;
&lt;br /&gt;
The data at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; is stored on SSDs. The primary focus is speed, not capacity.&lt;br /&gt;
In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited.&lt;br /&gt;
We ask you to only store data you actively use for computations on &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt;.&lt;br /&gt;
Please move data to &amp;lt;code&amp;gt;/pfs/10/project&amp;lt;/code&amp;gt; when you don&#039;t need it on the fast storage any more.&lt;br /&gt;
&lt;br /&gt;
Each user can create work spaces at &amp;lt;code&amp;gt;/pfs/10/work&amp;lt;/code&amp;gt; through the workspace tools.&lt;br /&gt;
To create a work space, you&#039;ll need to supply a name for your work space area and a lifetime in days.&lt;br /&gt;
For more information, read the corresponding help, e.g. &amp;lt;code&amp;gt;ws_allocate -h&amp;lt;/code&amp;gt;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;mywork&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_allocate myotherwork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Allocate a work space named &amp;quot;myotherwork&amp;quot; with maximum lifetime.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_list -a&amp;lt;/code&amp;gt;&lt;br /&gt;
|List all your work spaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_find mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Get absolute path of work space &amp;quot;mywork&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_extend mywork 30&amp;lt;/code&amp;gt;&lt;br /&gt;
|Extend the lifetime of work space &amp;quot;mywork&amp;quot; by 30 days from now.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;code&amp;gt;ws_release mywork&amp;lt;/code&amp;gt;&lt;br /&gt;
|Manually erase your work space &amp;quot;mywork&amp;quot;. Please remove directory content first.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Scratch ===&lt;br /&gt;
&lt;br /&gt;
Please use the fast local scratch space for storing temporary data during your jobs.&lt;br /&gt;
&lt;br /&gt;
For each job a scratch directory will be created on the compute nodes. It is available via the environment variable &amp;lt;code&amp;gt;$TMPDIR&amp;lt;/code&amp;gt;, which points to &amp;lt;code&amp;gt;/scratch/&amp;lt;jobID&amp;gt;&amp;lt;/code&amp;gt;.&lt;/div&gt;</summary>
		<author><name>S Behnle</name></author>
	</entry>
</feed>