bwHPC Wiki - User contributions [en]

DACHS/Jupyter

2026-04-20T10:04:20Z

R Keller: /* Software Stacks */

= Short description of Jupyter =
Jupyter (an acronym for ''Ju''lia, ''Py''thon and ''R'') is a web application, allowing interactive programming and visualization in a browser. Jupyter uses so-called Jupyter-Notebooks to load and store the program, input data and it's output (including visualization) in a JSON-based file, allowing exchange between different implementations (like Visual Studio Code plus a Jupyter Extension) and specifically allowing incrementally editing using a version-control system like git.

We provide [https://jupyter.org/hub JupyterHub] at [https://dachs-jupyter.hs-esslingen.de https://dachs-jupyter.hs-esslingen.de/] as described below.

= Access requirements =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Access to Jupyter is '''limited to IP addresses from the BelWü network'''.
All partners of DACHS are connected to BelWü, so if you are on your campus network (e.g. in your office or on the Campus WiFi) you should be able to connect without restrictions. Otherwise, You will see a hint about having to connect using VPN to Your home institution (and make sure, that all packets are routed through your home institution's VPN, and not in SPLIT tunneling mode, see [[DACHS/Login|Login]] page.
|}

== Prerequisites ==
# Register as described on the [[DACHS/Registration|Registration]]
# Then [[DACHS/Login|login to DACHS]] via SSH '''at least once''': This ensures that your home directory is setup properly on DACHS
# Make sure you're '''connected to the VPN of your university''' (see above)

= Login Process =
After having logged into
https://dachs-jupyter.hs-esslingen.de
You will need to specify the resources, as described in the next section.

= Logout Process =

Your Jupyter Notebook ist started as any other Slurm job.
If you just close your web browser, the Slurm job will continue to run in the background until it hits the time limit.
Therefore, you should explicitly stop your server or log out from JupyterHub so that
* your Slurm job is stopped, and your allocated resources do not count towards your personal and organizational usage quota.
* resources are returned so that they can be used by others.

To do this, got to '''File > Hub Control Panel''' and click '''Stop My Server'''.
Alternatively, you can also log out via '''File > Log Out'''. When you log out, your Jupyter Notebook is also stored.

= Selection of Compute Resources =

After you logged in via bwIDM, you can select the resources for your Jupyter Notebook.

Preselected are the following values, but you can adjust them as needed:
* 1 CPU core
* 16 GB of RAM
* No GPU
* Job runtime of 30 minutes
* Load the <code>jupyter/ai</code> module (a virtual environment containing libraries typically needed for machine learning)
* Use the default JupyterHub reservation. From Monday to Friday, from 08:00 to 20:00 o'clock there are four nodes reserved for interactive JupyterHub use. You can still try to start a Juptyer Notebook outside of this time window, however, this job requests is treated as any other job, meaning if the cluster has no available resources the job cannot be started.

When you adjusted your resources, press start and the Notebook is started for you.

[[Image:20260324-dachs-jupyter-resource-selection.png|500px]]

 
If you want to use a custom reservation, for example a reservation you previously requested, then un-check the box after '''Use default reservation''' and enter your custom reservation name and the token that you received from us.
You can also try to start the Jupyter Notebook without a custom reservation (and without the default reservation)
This job is treated as any other by Slurm, meaning it is scheduled if there are available resources.

[[Image:20260324-dachs-jupyter-custom-reservation.png|500px]]

= Software Stacks =

We provide two Python environments:

* <code>jupyter/minimal</code>: This Python environment contains just the basic packages to run the Jupyter notebook. You can use this Python environment as base for your own environment, or you can load an environment that you completely build yourself.
* <code>jupyter/ai</code> ('''default'''): This Python environment is pre-selected in the resource selection dialog. Additional to the minimal environment, it contains <code>plotly, scikit-learn, seaborn, tensorflow, torchvision</code>.

You can also build and use your own environment in a Jupyter notebook.
This is a two step process:
# First you create the Python environment with your required dependencies.
# Then you make this environment available as kernel for the Jupyter notebook that you can select.
Take a look at the [[BwUniCluster3.0/Jupyter#Usage_of_virtual_environment_in_JupyterLab|bwUnicluster documentation]].

= R statistical Software =
In order to start using the statistical software R in Your Jupyter Notebooks do the following steps
# In the JupyterHub Tab "Software Modules" load the <code>math/R</code> module
# In the Launcher Tab, You should see the R logo.
# Otherwise start a new Terminal, and use the following command:
USERNAME@login:~> R # hit enter to start R
>>> install.packages('IRkernel')
>>> IRkernel::installspec()

= Request a Reservation for Your Lecture =

Requesting a unique reservation for JupyterHub has the advantage that the requested resources will be available to you for sure at the time you need them.
Furthermore, it's easy:

1. Send an [mailto:dachs-admin@hs-esslingen?Subject=DACHS%20JupyterHub%20Reservation%20Request email] with the following information:
* Start time and duration
* How many nodes do you request? Generally, we assign nodes from the [[ DACHS/Hardware | <code>gpu1</code> partition ]] to JupyterHub jobs.
* Do you request a one-time or periodic reservation?
* The usernames that should have access to the reservation. This is at least your username. (For example <code>es_username</code>). If you want the students of your lecture to have access as well, export a list of their usernames and send these as well. We need to know the usernames to assign the correct entitlement to access the resource.

2. Wait for a reply. You'll get a ''reservation name'' and a ''reservation token'' that must be entered in the [[#Selection_of_Compute_Resources|resource selection]] dialog of JupyterHub.

DACHS/Jupyter

2025-11-25T16:06:18Z

R Keller:

DACHS/Jupyter

2025-11-25T09:59:12Z

R Keller:

DACHS

2025-11-24T08:34:21Z

R Keller:

{| style="width: 100%; border-spacing: 5px;"
| style="text-align:center; color:#000;vertical-align:middle;font-size:75%;" |
[[File:DACHS_Logo.png|center|border|250px||]]
|-
|
|}

The Datenanalyse Cluster der Hochschulen (DACHS) supports data scientists, machine learning experts and in general engineers for research and education, specifically teaching of the participating Universities of Applied Science.

You can always reach us via Email at [mailto:dachs-admin@hs-esslingen.de dachs-admin@hs-esslingen.de].

{| style=" background:#eeeefe; width:100%;"
| style="padding:8px; background:#dedefe; font-size:120%; font-weight:bold; text-align:left" | Training & Support
|-
|
* [[DACHS/Getting_Started|Getting Started]]

* [https://training.bwhpc.de E-Learning Courses]
* [[DACHS/Support|Contact and Support]]
* Send [[:Category:Feedback|Feedback]] about Wiki pages
|}

{| style="background:#deffee; width:100%;"
| style="padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left" | User Documentation
|-
|
* [[DACHS/Registration|Registration]]
* [[DACHS/Login|Login]]
* [[DACHS/Hardware_and_Architecture|Hardware and Architecture]]
* Usage of [[DACHS/Software|Software]] on DACHS
* Running Jobs
** [[BwUniCluster2.0/Slurm|Batch System]] (page of bwUniCluster2.0)
** [[DACHS/Queues|Queues]]
** [[DACHS/Jupyter|Interactive Computing with Jupyter]]

|}
{| style=" background:#e6e9eb; width:100%;"
| style="padding:8px; background:#d1dadf; font-size:120%; font-weight:bold; text-align:left" | Cluster Funding
|-
|
* Please [[DACHS/Acknowledgement|acknowledge]] the cluster in your publications.
|}

DACHS/Queues

2025-08-12T09:53:41Z

R Keller:

__TOC__

== Partitions ==
DACHS offers three partitions in Slurm, which map directly to the node types: nodes with one NVIDIA L40S GPU, a node with 4 AMD MI300A APUs and the node with 8 NVIDIA H100 GPUs.

== sinfo_t_idle ==
To see the available nodes, DACHS offers the tool ''sinfo_t_info'', which any user may call.

== sbatch -p ''partition'' ==
Batch jobs specify compute requirements, which must fit the resources as in maximum (wall-)time, memory and GPU resources.
If You require a GPU, You must specify this with your request.
These are restricted and must fit the available '''partitions'''.
Since requested compute resources are NOT always automatically mapped to the correct queue class, '''you must add the correct queue class to your sbatch command '''.
As with bwUniCluster, the specification of a partition is required.
 
Details are:

{| width=750px class="wikitable"
! colspan="5" | DACHS sbatch -p ''partition''
|- style="text-align:left;"
! partition !! node !! default resources !! maximum resources
|- style="text-align:left"
| gpu1
| gpu1[01-45]
| time=30, mem-per-node=5000mb
| time=72:00:00, nodes=16, mem-per-node=300000mb, res=gpu:1
|- style="text-align:left;"
| gpu4
| gpu401
| time=30, mem-per-cpu=5000mb
| time=72:00:00, nodes=1, mem=500000mb, ntasks-per-node=96
|- style="vertical-align:top; text-align:left"
| gpu8
| gpu801
| time=30, mem-per-cpu=5000mb, cpus-per-gpu=8
| time=48:00:00, mem=752000mb, ntasks-per-node=96
|-
|}

Default resources of a queue class defines time, #tasks and memory if not explicitly given with sbatch command. Resource list acronyms <code>--time</code>, <code>--ntasks</code>, <code>--nodes</code>, <code>--mem</code> and <code>--mem-per-cpu</code>.

A typical Slurm batch script (called for brevity <code>python_run.slurm</code>) for 1-node requiring one NVIDIA L40S GPU:
#!/bin/bash
#SBATCH --partition=gpu1
#SBATCH --ntasks-per-gpu=48
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --mail-type=all
#SBATCH --mail-user=my_email@hs-esslingen.de
module load devel/cuda/12.4
cd $TMPDIR
python3 -m venv my_environment
. my_environment/bin/activate
python3 -m pip install -r $HOME/my_requirements.txt
rsync -avz $HOME/my_data_dir/ .
time python3 $HOME/python_script.py

Submitting <code>sbatch python_run.slurm</code> will allocate one compute node and allocate the one available GPU for 1 hour. Furthermore, this will load the CUDA module version 12.4. It will then change to the '''fast''' scratch directory specified in the environment variable <code>TMPDIR</code>.
You '''have''' to allocate the GPU, otherwise You may not use it.
It will then follow Python's best practices and create a new Virtual Environment in that directory, then installing the dependencies of the projects detailed in <code>my_requirements.txt</code>
It then copies the data directory in <code>my_data_dir</code> to this directory using <code>rsync</code>.
Finally, it executes your main python script, using the time command to figure out, how much time actually was used.
Alternatively you may time all the commands to get an estimate for Your next batch job.

Here, Slurm will email to the specified address upon start and completion of the job with a summary.

The '''better''' your approximation, the better the Slurm scheduler may allocate resources to all users.

== Interactive usage ==
To '''get a good estimation''' of runtime, You may first want to try the resource ''interactively'':
srun --partition=gpu1 --ntasks-per-gpu=48 --gres=gpu:1 --pty /bin/bash

Then You may execute the steps in <code>python_run.slurm</code> script interactively, noting differences and amend your Slurm batch script.
''Please note'' the <code>--pty</code> which forwards the standard output and takes standard input to allow working with the Shell.

== Multiple nodes ==
Of course You may allocate multiple GPUs across nodes running:
sbatch --nodes 4 ./python_run.slurm
Please be aware, that TMPDIR is still local. For the time being run from Your $HOME or better yet from an allocated [[Workspace]].

== Nodes with multiple GPUs ==
The partitions <code>gpu4</code> and <code>gpu8</code> feature multiple GPUs.
The <code>gpu4</code> partition contains the node <code>gpu401</code> featuring 4x AMD MI300A APUs each with 128GB of fast HMB3e memory shared between the 24 cores and the GPU.
You may use AMD's ROCm employing HIP, OpenACC or OpenCL to parallelize for the GPU. Please refer to the documentation on this node.

The <code>gpu8</code> partition contains the node <code>gpu801</code> featuring 8x NVIDIA H100 offering 80GB of VRAM each, interconnected using SXM5.
Please refer to the documentation on this node.

DACHS/Acknowledgement

2025-07-02T13:56:36Z

R Keller: Citation of Cluster in publications

====== Acknowledgements ======
Remember to acknowledge our resources in your publications!

Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.
In publications, please **cite the usage** of the DACHS Cluster using a text like:

"We thank the DACHS data analysis cluster, hosted at Hochschule Esslingen and co-funded by the MWK within the DFG's "Großgeräte der Länder" program, for providing the computational resources necessary for this research."

DACHS/Hardware

2025-05-20T07:57:12Z

R Keller: /* Components of DACHS */

= Architecture of DACHS =
The Datenanalyse Cluster der Hochschulen (DACHS) is a parallel computer with distributed memory connected over Infiniband and Ethernet. The compute nodes contain at least dual AMD processors, at least 384GB of local memory, 2 TB local NVMe-based disc storage and accelerators as shown in the table below. With BeeGFS a fast and scalable filesystem is provided via Infiniband to all login and compute nodes

The Operating System is Rocky-Linux 9.4 (which is based on RHEL).
The setup is kept in-line (with regard to Software, Setup and general usage) and thus mostly equivalent to bwHPC and bwUniCluster in particular.

= Components of DACHS =

{| class="wikitable"
|-
! style="width:9%"|
! style="width:13%"| Compute nodes "L40S"
! style="width:13%"| Compute nodes "H100"
! style="width:13%"| Compute nodes "AMD_APU"
! style="width:13%"| Login
|-
!scope="column"| Availability in Queue
| <code>gpu1</code>
| <code>gpu8</code>
| <code>gpu4</code>
|
|-
!scope="column"| Number of nodes
| 45
| 1
| 1
| 2
|-
!scope="column"| Processors
| AMD EPYC 9254
| AMD EPYC 9454
| AMD MI300A
| AMD EPYC 9254
|-
!scope="column"| Number of sockets
| 2
| 2
| 4
| 2
|-
!scope="column"| Processor frequency (GHz)
| 2.9 Ghz
| 2.75 Ghz
| 2.1 Ghz
| 2.9 Ghz
|-
!scope="column"| Total number of cores
| 48
| 96
| 96
| 48
|-
!scope="column"| Main memory
| 384 GB
| 1536 GB
| 512 GB
| 384 GB
|-
!scope="column"| Local SSD
| 1,92 TB NVMe
| 1,92 TB NVMe
| 1,92 TB NVMe
| 1,92 TB NVMe
|-
!scope="column"| Accelerators
| 1x NVIDIA L40S
| 8x NVIDIA H100
| 4x AMD MI300A
| -
|-
!scope="column"| Accelerator memory
| 48 GB
| 8x 80 GB
| 4x 128 GB
| -
|-
!scope="column"| Interconnect
| IB HDR100
| IB HDR100
| IB HDR100
| IB HDR100
|}
Table 1: Properties of the nodes

== Storage Architecture ==
The system features a 700 TB large BeeGFS filesystem available on login and compute nodes.
Please note: there is a hard file size quota per partner organization and a soft quota per user on Your HOME.
Users will be notified by E-Mail if the quota is to be reached.

Please '''do make usage''' of [[Workspace | Work Space mechanism]] for larger files.

DACHS/Hardware

2025-05-20T07:53:36Z

R Keller: /* Components of DACHS */

= Architecture of DACHS =
The Datenanalyse Cluster der Hochschulen (DACHS) is a parallel computer with distributed memory connected over Infiniband and Ethernet. The compute nodes contain at least dual AMD processors, at least 384GB of local memory, 2 TB local NVMe-based disc storage and accelerators as shown in the table below. With BeeGFS a fast and scalable filesystem is provided via Infiniband to all login and compute nodes

The Operating System is Rocky-Linux 9.4 (which is based on RHEL).
The setup is kept in-line (with regard to Software, Setup and general usage) and thus mostly equivalent to bwHPC and bwUniCluster in particular.

= Components of DACHS =

{| class="wikitable"
|-
! style="width:9%"|
! style="width:13%"| Compute nodes "L40S"
! style="width:13%"| Compute nodes "H100"
! style="width:13%"| Compute nodes "AMD_APU"
! style="width:13%"| Login
|-
!scope="column"| Availability in Queue
| gpu1
| gpu8
| gpu401
|
|-
!scope="column"| Number of nodes
| 45
| 1
| 1
| 2
|-
!scope="column"| Processors
| AMD EPYC 9254
| AMD EPYC 9454
| AMD MI300A
| AMD EPYC 9254
|-
!scope="column"| Number of sockets
| 2
| 2
| 4
| 2
|-
!scope="column"| Processor frequency (GHz)
| 2.9 Ghz
| 2.75 Ghz
| 2.1 Ghz
| 2.9 Ghz
|-
!scope="column"| Total number of cores
| 48
| 96
| 96
| 48
|-
!scope="column"| Main memory
| 384 GB
| 1536 GB
| 512 GB
| 384 GB
|-
!scope="column"| Local SSD
| 1,92 TB NVMe
| 1,92 TB NVMe
| 1,92 TB NVMe
| 1,92 TB NVMe
|-
!scope="column"| Accelerators
| 1x NVIDIA L40S
| 8x NVIDIA H100
| 4x AMD MI300A
| -
|-
!scope="column"| Accelerator memory
| 48 GB
| 8x 80 GB
| 4x 128 GB
| -
|-
!scope="column"| Interconnect
| IB HDR100
| IB HDR100
| IB HDR100
| IB HDR100
|}
Table 1: Properties of the nodes

== Storage Architecture ==
The system features a 700 TB large BeeGFS filesystem available on login and compute nodes.
Please note: there is a hard file size quota per partner organization and a soft quota per user on Your HOME.
Users will be notified by E-Mail if the quota is to be reached.

Please '''do make usage''' of [[Workspace | Work Space mechanism]] for larger files.

Registration/bwUniCluster/Entitlement

2025-05-19T04:58:15Z

R Keller: Update web-link to HS Esslingen

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The bwUniCluster entitlement (see [https://www.bwidm.de/attribute.php#Berechtigung eduPersonEntitlement]) issued by a university assures the operator of the clusters, that its university member's compute activities comply with the German Foreign Trade Act (Außenwirtschaftsgesetz - AWG) and German Foreign Trade Regulations (Außenwirtschaftsverordnung - AWV). ''Please check'' the regulations at the Federal Office of Economics and Export Control (BAFA) under [https://www.bafa.de/DE/Aussenwirtschaft/Ausfuhrkontrolle/Allgemeine_Einfuehrung/allgemeine_einfuehrung_node.html BAFA Aussenwirtschaft Ausfuhrkontrolle]
|}

= Step A: bwUniCluster Entitlement =

To register for the bwUniCluster 3.0 you need the '''bwUniCluster Entitlement''' issued by your university.

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The entitlement is called '''bwUniCluster''' (and not bwUniCluster 3.0) and each university assigns the entitlement '''only''' for its own members.
|}

If you are not sure if you already have an entitlement, please check it first with the [[Registration/bwUniCluster/Entitlement#Check_your_Entitlements|'''Check your Entitlements''']] guide below.
If you need the entitlement, please follow the link for your institution or contact your local service desk if no information is provided:
* [https://www.hs-esslingen.de/informatik-und-informationstechnik/forschung-labore/forschung/laufende-projekte/bwhpc-s5 Hochschule Esslingen]
* [[BwCluster_User_Access_Uni_Freiburg|Universität Freiburg]]
* [https://heiservices.uni-heidelberg.de/entitlement Universität Heidelberg] (access only within Uni Heidelberg network)
* [https://kim.uni-hohenheim.de/bwhpc-account Universität Hohenheim]
* [https://www.scc.kit.edu/downloads/ISM/SD-HPC-Formulare/Accessform_bwUniCluster3_v1_DE_EN_2025.pdf Karlsruhe Institute of Technology (KIT)]
* [https://www.kim.uni-konstanz.de/en/services/research-and-teaching/high-performance-computing/access-to-bwunicluster Universität Konstanz]
* [[BWUniCluster_User_Access_Members_Uni_Mannheim|Universität Mannheim]]
* [https://www.hlrs.de/apply-for-computing-time/bw-uni-cluster Universität Stuttgart]
* [https://uni-tuebingen.de/de/155157 Universität Tübingen]
* [[BWUniCluster_User_Access_Members_Uni_Ulm|Universität Ulm]]
* [[Registration/HAW|HAW BW e.V.]] and Duale Hochschule Baden-Württemberg: Please contact your local service desk / compute center in case of question contact [mailto:hpc-at-haw@hs-esslingen.de mailto:hpc-at-haw@hs-esslingen.de]

== Check your Entitlements ==

To make sure you do not already have the entitlement, please log in to '''https://login.bwidm.de/user/index.xhtml'''.
To see the list of your entitlements, first select the '''Shibboleth''' tab at the top.
If the list below <code><nowiki>urn:oid:1.3.6.1.4.1.5923.1.1.1.7</nowiki></code> contains
<pre>http://bwidm.de/entitlement/bwUniCluster</pre>
you already have the entitlement and can skip step A.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
<code><nowiki>http://bwidm.de/entitlement/bwUniCluster</nowiki></code> is an attribute and not a link!
See [https://www.bwidm.de/dienste.php bwUniCluster und bwForCluster] for more information about needed attributes for this service.
|}
[[File:BwIDM-idp.png|center|600px|thumb|Verify Entitlement.]]

----

[[Registration/bwUniCluster/Service | Go to step B]]

DACHS/Queues

2025-05-14T08:46:29Z

R Keller: /* sbatch -p partition */

DACHS/Queues

2025-05-14T08:45:55Z

R Keller: /* Multiple nodes */

__TOC__

== Partitions ==
DACHS offers three partitions in Slurm, which map directly to the node types: nodes with one NVIDIA L40S GPU, a node with 4 AMD MI300A APUs and the node with 8 NVIDIA H100 GPUs.

== sinfo_t_idle ==
To see the available nodes, DACHS offers the tool ''sinfo_t_info'', which any user may call.

== sbatch -p ''partition'' ==
Batch jobs specify compute requirements, which must fit the resources as in maximum (wall-)time, memory and GPU resources.
If You require a GPU, You must specify this with your request.
These are restricted and must fit the available '''partitions'''.
Since requested compute resources are NOT always automatically mapped to the correct queue class, '''you must add the correct queue class to your sbatch command '''.
As with BwUniCluster 2.0, the specification of a partition is required.
 
Details are:

{| width=750px class="wikitable"
! colspan="5" | DACHS sbatch -p ''partition''
|- style="text-align:left;"
! partition !! node !! default resources !! maximum resources
|- style="text-align:left"
| gpu1
| gpu1[01-45]
| time=30, mem-per-node=5000mb
| time=72:00:00, nodes=16, mem-per-node=300000mb, res=gpu:1
|- style="text-align:left;"
| gpu4
| gpu401
| time=30, mem-per-cpu=5000mb
| time=72:00:00, nodes=1, mem=500000mb, ntasks-per-node=96
|- style="vertical-align:top; text-align:left"
| gpu8
| gpu801
| time=30, mem-per-cpu=5000mb, cpus-per-gpu=8
| time=48:00:00, mem=752000mb, ntasks-per-node=96
|-
|}

Default resources of a queue class defines time, #tasks and memory if not explicitly given with sbatch command. Resource list acronyms <code>--time</code>, <code>--ntasks</code>, <code>--nodes</code>, <code>--mem</code> and <code>--mem-per-cpu</code>.

A typical Slurm batch script (called for brevity <code>python_run.slurm</code>) for 1-node requiring one NVIDIA L40S GPU:
#!/bin/bash
#SBATCH --partition=gpu1
#SBATCH --ntasks-per-gpu=48
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --mail-type=all
#SBATCH --mail-user=my_email@hs-esslingen.de
module load devel/cuda/12.4
cd $TMPDIR
python3 -m venv my_environment
. my_environment/bin/activate
python3 -m pip install -r $HOME/my_requirements.txt
rsync -avz $HOME/my_data_dir/ .
time python3 $HOME/python_script.py

Submitting <code>sbatch python_run.slurm</code> will allocate one compute node and allocate the one available GPU for 1 hour. Furthermore, this will load the CUDA module version 12.4. It will then change to the '''fast''' scratch directory specified in the environment variable <code>TMPDIR</code>.
You '''have''' to allocate the GPU, otherwise You may not use it.
It will then follow Python's best practices and create a new Virtual Environment in that directory, then installing the dependencies of the projects detailed in <code>my_requirements.txt</code>
It then copies the data directory in <code>my_data_dir</code> to this directory using <code>rsync</code>.
Finally, it executes your main python script, using the time command to figure out, how much time actually was used.
Alternatively you may time all the commands to get an estimate for Your next batch job.

Here, Slurm will email to the specified address upon start and completion of the job with a summary.

The '''better''' your approximation, the better the Slurm scheduler may allocate resources to all users.

== Interactive usage ==
To '''get a good estimation''' of runtime, You may first want to try the resource ''interactively'':
srun --partition=gpu1 --ntasks-per-gpu=48 --gres=gpu:1 --pty /bin/bash

Then You may execute the steps in <code>python_run.slurm</code> script interactively, noting differences and amend your Slurm batch script.
''Please note'' the <code>--pty</code> which forwards the standard output and takes standard input to allow working with the Shell.

== Multiple nodes ==
Of course You may allocate multiple GPUs across nodes running:
sbatch --nodes 4 ./python_run.slurm
Please be aware, that TMPDIR is still local. For the time being run from Your $HOME or better yet from an allocated [[Workspace]].

== Nodes with multiple GPUs ==
The partitions <code>gpu4</code> and <code>gpu8</code> feature multiple GPUs.
The <code>gpu4</code> partition contains the node <code>gpu401</code> featuring 4x AMD MI300A APUs each with 128GB of fast HMB3e memory shared between the 24 cores and the GPU.
You may use AMD's ROCm employing HIP, OpenACC or OpenCL to parallelize for the GPU. Please refer to the documentation on this node.

The <code>gpu8</code> partition contains the node <code>gpu801</code> featuring 8x NVIDIA H100 offering 80GB of VRAM each, interconnected using SXM5.
Please refer to the documentation on this node.

DACHS/Queues

2025-05-14T08:44:51Z

R Keller: /* Nodes with multiple GPUs */

__TOC__

== Partitions ==
DACHS offers three partitions in Slurm, which map directly to the node types: nodes with one NVIDIA L40S GPU, a node with 4 AMD MI300A APUs and the node with 8 NVIDIA H100 GPUs.

== sinfo_t_idle ==
To see the available nodes, DACHS offers the tool ''sinfo_t_info'', which any user may call.

== sbatch -p ''partition'' ==
Batch jobs specify compute requirements, which must fit the resources as in maximum (wall-)time, memory and GPU resources.
If You require a GPU, You must specify this with your request.
These are restricted and must fit the available '''partitions'''.
Since requested compute resources are NOT always automatically mapped to the correct queue class, '''you must add the correct queue class to your sbatch command '''.
As with BwUniCluster 2.0, the specification of a partition is required.
 
Details are:

{| width=750px class="wikitable"
! colspan="5" | DACHS sbatch -p ''partition''
|- style="text-align:left;"
! partition !! node !! default resources !! maximum resources
|- style="text-align:left"
| gpu1
| gpu1[01-45]
| time=30, mem-per-node=5000mb
| time=72:00:00, nodes=16, mem-per-node=300000mb, res=gpu:1
|- style="text-align:left;"
| gpu4
| gpu401
| time=30, mem-per-cpu=5000mb
| time=72:00:00, nodes=1, mem=500000mb, ntasks-per-node=96
|- style="vertical-align:top; text-align:left"
| gpu8
| gpu801
| time=30, mem-per-cpu=5000mb, cpus-per-gpu=8
| time=48:00:00, mem=752000mb, ntasks-per-node=96
|-
|}

Default resources of a queue class defines time, #tasks and memory if not explicitly given with sbatch command. Resource list acronyms <code>--time</code>, <code>--ntasks</code>, <code>--nodes</code>, <code>--mem</code> and <code>--mem-per-cpu</code>.

A typical Slurm batch script (called for brevity <code>python_run.slurm</code>) for 1-node requiring one NVIDIA L40S GPU:
#!/bin/bash
#SBATCH --partition=gpu1
#SBATCH --ntasks-per-gpu=48
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --mail-type=all
#SBATCH --mail-user=my_email@hs-esslingen.de
module load devel/cuda/12.4
cd $TMPDIR
python3 -m venv my_environment
. my_environment/bin/activate
python3 -m pip install -r $HOME/my_requirements.txt
rsync -avz $HOME/my_data_dir/ .
time python3 $HOME/python_script.py

Submitting <code>sbatch python_run.slurm</code> will allocate one compute node and allocate the one available GPU for 1 hour. Furthermore, this will load the CUDA module version 12.4. It will then change to the '''fast''' scratch directory specified in the environment variable <code>TMPDIR</code>.
You '''have''' to allocate the GPU, otherwise You may not use it.
It will then follow Python's best practices and create a new Virtual Environment in that directory, then installing the dependencies of the projects detailed in <code>my_requirements.txt</code>
It then copies the data directory in <code>my_data_dir</code> to this directory using <code>rsync</code>.
Finally, it executes your main python script, using the time command to figure out, how much time actually was used.
Alternatively you may time all the commands to get an estimate for Your next batch job.

Here, Slurm will email to the specified address upon start and completion of the job with a summary.

The '''better''' your approximation, the better the Slurm scheduler may allocate resources to all users.

== Interactive usage ==
To '''get a good estimation''' of runtime, You may first want to try the resource ''interactively'':
srun --partition=gpu1 --ntasks-per-gpu=48 --gres=gpu:1 --pty /bin/bash

Then You may execute the steps in <code>python_run.slurm</code> script interactively, noting differences and amend your Slurm batch script.
''Please note'' the <code>--pty</code> which forwards the standard output and takes standard input to allow working with the Shell.

== Multiple nodes ==
Of course You may allocate multiple GPUs across nodes running:
sbatch --nodes 4 ./python_run.slurm
Please be aware, that TMPDIR is still local. For the time being run from Your $HOME.

== Nodes with multiple GPUs ==
The partitions <code>gpu4</code> and <code>gpu8</code> feature multiple GPUs.
The <code>gpu4</code> partition contains the node <code>gpu401</code> featuring 4x AMD MI300A APUs each with 128GB of fast HMB3e memory shared between the 24 cores and the GPU.
You may use AMD's ROCm employing HIP, OpenACC or OpenCL to parallelize for the GPU. Please refer to the documentation on this node.

The <code>gpu8</code> partition contains the node <code>gpu801</code> featuring 8x NVIDIA H100 offering 80GB of VRAM each, interconnected using SXM5.
Please refer to the documentation on this node.

DACHS/Queues

2025-03-18T13:25:20Z

R Keller: /* Interactive usage */

__TOC__

== Partitions ==
DACHS offers three partitions in Slurm, which map directly to the node types: nodes with one NVIDIA L40S GPU, a node with 4 AMD MI300A APUs and the node with 8 NVIDIA H100 GPUs.

== sinfo_t_idle ==
To see the available nodes, DACHS offers the tool ''sinfo_t_info'', which any user may call.

== sbatch -p ''partition'' ==
Batch jobs specify compute requirements, which must fit the resources as in maximum (wall-)time, memory and GPU resources.
If You require a GPU, You must specify this with your request.
These are restricted and must fit the available '''partitions'''.
Since requested compute resources are NOT always automatically mapped to the correct queue class, '''you must add the correct queue class to your sbatch command '''.
As with BwUniCluster 2.0, the specification of a partition is required.
 
Details are:

{| width=750px class="wikitable"
! colspan="5" | DACHS sbatch -p ''partition''
|- style="text-align:left;"
! partition !! node !! default resources !! maximum resources
|- style="text-align:left"
| gpu1
| gpu1[01-45]
| time=30, mem-per-node=5000mb
| time=72:00:00, nodes=16, mem-per-node=300000mb, res=gpu:1
|- style="text-align:left;"
| gpu4
| gpu401
| time=30, mem-per-cpu=5000mb
| time=72:00:00, nodes=1, mem=500000mb, ntasks-per-node=96
|- style="vertical-align:top; text-align:left"
| gpu8
| gpu801
| time=30, mem-per-cpu=5000mb, cpus-per-gpu=8
| time=48:00:00, mem=752000mb, ntasks-per-node=96
|-
|}

Default resources of a queue class defines time, #tasks and memory if not explicitly given with sbatch command. Resource list acronyms <code>--time</code>, <code>--ntasks</code>, <code>--nodes</code>, <code>--mem</code> and <code>--mem-per-cpu</code>.

A typical Slurm batch script (called for brevity <code>python_run.slurm</code>) for 1-node requiring one NVIDIA L40S GPU:
#!/bin/bash
#SBATCH --partition=gpu1
#SBATCH --ntasks-per-gpu=48
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --mail-type=all
#SBATCH --mail-user=my_email@hs-esslingen.de
module load devel/cuda/12.4
cd $TMPDIR
python3 -m venv my_environment
. my_environment/bin/activate
python3 -m pip install -r $HOME/my_requirements.txt
rsync -avz $HOME/my_data_dir/ .
time python3 $HOME/python_script.py

Submitting <code>sbatch python_run.slurm</code> will allocate one compute node and allocate the one available GPU for 1 hour. Furthermore, this will load the CUDA module version 12.4. It will then change to the '''fast''' scratch directory specified in the environment variable <code>TMPDIR</code>.
You '''have''' to allocate the GPU, otherwise You may not use it.
It will then follow Python's best practices and create a new Virtual Environment in that directory, then installing the dependencies of the projects detailed in <code>my_requirements.txt</code>
It then copies the data directory in <code>my_data_dir</code> to this directory using <code>rsync</code>.
Finally, it executes your main python script, using the time command to figure out, how much time actually was used.
Alternatively you may time all the commands to get an estimate for Your next batch job.

Here, Slurm will email to the specified address upon start and completion of the job with a summary.

The '''better''' your approximation, the better the Slurm scheduler may allocate resources to all users.

== Interactive usage ==
To '''get a good estimation''' of runtime, You may first want to try the resource ''interactively'':
srun --partition=gpu1 --ntasks-per-gpu=48 --gres=gpu:1 --pty /bin/bash

Then You may execute the steps in <code>python_run.slurm</code> script interactively, noting differences and amend your Slurm batch script.
''Please note'' the <code>--pty</code> which forwards the standard output and takes standard input to allow working with the Shell.

== Multiple nodes ==
Of course You may allocate multiple GPUs across nodes running:
sbatch --nodes 4 ./python_run.slurm
Please be aware, that TMPDIR is still local. For the time being run from Your $HOME.

== Nodes with multiple GPUs ==
The partitions <code>gpu4</code> and <code>gpu8</code> feature multiple GPUs.
The <code>gpu4</code> partition contains the node <code>gpu401</code> featuring 4 AMD MI300A APUs with 128GB of memory each using ROCm.
Please refer to the documentation on this node.

The <code>gpu8</code> partition contains the node <code>gpu401</code> featuring 4 AMD MI300A APUs with 128GB of memory each using ROCm.
Please refer to the documentation on this node.

DACHS/Login

2025-02-26T22:09:46Z

R Keller:

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Access to DACHS is '''limited to IP addresses from the BelWü network'''.
All home institutions of our current users are connected to BelWü, so if you are on your campus network (e.g. in your office or on the Campus WiFi) you should be able to connect to DACHS without restrictions.
If you are outside one of the BelWü networks (e.g. at home), a VPN connection to the home institution or a connection to an SSH jump host at the home institution must be established first.
|}

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
VPN configured with [https://de.wikipedia.org/wiki/Split_Tunneling SPLIT tunneling mode] will '''not''' work, as any traffic not destined to your organizations IP range will not pass through your organizations VPN and hence will again be blocked by the Firewall.
|}

The login nodes of the DACHS cluster are the access points to the compute system, your <code>$HOME</code> directory and your workspaces.
All users must log in through these nodes to submit jobs to the cluster.

'''Prerequisites for successful login:'''

You need to have
* completed the 3-step [[registration]] procedure.
* [[Registration/Password|set a service password]] for DACHS.
* [[Registration/2FA|set up a second factor]] for the time-based one-time password (TOTP).

= Login to the DACHS =

Login to the DACHS is only possible with a Secure Shell (SSH) client for which you must know your username on the cluster and the hostname of the login nodes.
For more general information on SSH clients, visit the [[Registration/Login/Client|SSH clients Guide]].

== Username ==

If you want to use DACHS, you need to add a prefix to your local username: <code>prefix_username</code>.

{| class="wikitable"
! University !! Prefix
|-
| HS Aalen || aa
|-
| HS Albstadt-Sigmaringen || as
|-
| HS Esslingen || es
|-
| HS Heilbronn || hn
|-
| HS Karlsruhe || hk
|-
| HTWG Konstanz || ht
|-
| HS Mannheim || mn
|-
| HS Offenburg || of
|-
| HS Reutlingen || hr
|-
| HfT-Stuttgart || hs
|-
| THU-Ulm || hu
|-
|}

For a full list of user names, you can check the [https://www.bwidm.de/hochschulen.php bwIDM Hochschulen page].

'''Example:''' 
If your local username for the University is <code>vwxyz1234</code> and you are a user from the University of Esslingen this would combine to: <code>es_vwxyz1234</code>.

== Hostnames ==

The system has two login nodes.
The selection of the login node is done automatically.
If you are logging in multiple times, different sessions might run on different login nodes.

Login to DACHS:

{| class="wikitable"
! Hostname !! Node type
|-
| '''dachs-login.hs-esslingen.de''' || login to one of the two login nodes
|-
|}

In general, you should use automatic selection to allow us to balance the load over the two login nodes.
If you need to connect to specific login node, you can use the following hostnames:

{| class="wikitable"
! Hostname !! Node type
|-
| '''dachs-login1.hs-esslingen.de''' || DACHS first login node
|-
| '''dachs-login2.hs-esslingen.de''' || DACHS second login node
|-
|}

If you explicitly connect to dachs-login1 but end up on dachs-login2 (or the other way around), then of of them might not be available at the time and you're automatically redirected to the other one.

== Host Keys ==

When you log in, you may receive the message <code>The authenticity of host '<host address>' can't be established.</code> along with the host key fingerprint. This is intended so you can verify the authenticity of the host you are connecting to. Before you continue you should verify, if this fingerprint matches one of the following:

{| class="wikitable"
! Algorithm !! Fingerprint (SHA256)
|-
| '''RSA''' || SHA256:kdvbATXbd/ggG33G7VEw+O+FpJPcZU6XDeFyWXvBkhc
|-
| '''ECDSA''' || SHA256:4Y/LvkPL9g9DZ8JrmTxXsMTIWyM/u/mEEmSB7S/2yyA
|-
| '''ED25519''' || SHA256:X9eRJepYD3da3BM1pgiWxnvRc/Pt5eBLUr18tDUsZjU
|-
|}

== Login with SSH command (Linux, Mac, Windows) ==

Most Unix and Unix-like operating systems like Linux, Mac OS and *BSD come with a built-in SSH client provided by the OpenSSH project.
More recent versions of Windows 10 and Windows 11 using the [https://docs.microsoft.com/en-us/windows/wsl/install Windows Subsystem for Linux] (WSL) also come with a built-in OpenSSH client.

For login use one of the following ssh commands:

ssh <username>@dachs-login.hs-esslingen.de
ssh -l <username> dachs-login.hs-esslingen.de

To run graphical applications, you can use the <code>-X</code> or <code>-Y</code> flag to <code>ssh</code>:

ssh -Y -l <username> dachs-login.hs-esslingen.de

For better performance, we recommend using [[VNC]].

== Login with graphical SSH client (Windows) ==

For Windows we suggest using MobaXterm for login and file transfer.

Start ''MobaXterm'', fill in the following fields:
<pre>
Remote name : dachs-login.hs-esslingen.de
Specify user name : <username>
Port : 22
</pre>

After that click on 'ok'. Then a terminal will be opened and there you can enter your credentials.

'''Note:''' When using File transfer with MobaXterm version 23.6 the following configuration change has to be made:
In the settings in the tab "SSH", change the option "SSH engine" from "<new>" to "<legacy>". Then restart MobaXterm



== Login Example ==

To log in to DACHS, you must provide your [[Registration/Password|service password]].
Proceed as follows:
# Use SSH for a login node.
# The system will ask for a one-time password <code>Your OTP:</code>. Please enter your OTP and confirm it with Enter/Return. If you do not have a second factor yet, please create one (see [[Registration/2FA]]).
# The system will ask you for your service password <code>Password:</code>. Please enter it and confirm it with Enter/Return. If you do not have a service password yet or have forgotten it, please create one (see [[Registration/Password]]).
# You will be greeted by the DACHS cluster, followed by a shell.

<pre>
~ $ ssh -l es_vwxyz1234 dachs-login.hs-esslingen.de
(es_vwxyz1234@dachs-login.hs-esslingen.de) Your OTP: 123456
(es_vwxyz1234@dachs-login.hs-esslingen.de) Password:
********************************************************************************
Last login: Thu Jul 7 18:09:43 2022 from dachs-login.hs-esslingen.de
********************************************************************************
[es_vwxyz1234@login1 ~]$
</pre>

== Troubleshooting ==

If your OTP code doesn't get accepted multiple times in a row, you should first check that the second factor works and is active in the [https://login.bwidm.de/user/twofa.xhtml bwIDM My Tokens section].



= Allowed Activities on Login Nodes =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
To guarantee usability for all the users of clusters you must not run your compute jobs on the login nodes.
Compute jobs must be submitted to the queuing system.
Any compute job running on the login nodes will be terminated without any notice.
Any long-running compilation or any long-running pre- or post-processing of batch jobs must also be submitted to the queuing system.
|}

The login nodes of the DACHS cluster are the access point to the compute system, your <code>$HOME</code> directory and your workspaces.
These nodes are shared with all the users therefore, your activities on the login nodes are limited to primarily set up your batch jobs.
Your activities may also be:
* '''short''' compilation of your program code and
* '''short''' pre- and post-processing of your batch jobs.



We advise users to use [[DACHS/Queues|interactive jobs]] for compute and memory intensive tasks like compiling.

= Related Information =

* If you want to reset your service password, consult the [[Registration/Password|Password Guide]].
* If you want to register a new token for the two factor authentication (2FA), consult the [[Registration/2FA|2FA Guide]].
* If you want to de-register, consult the [[Registration/Deregistration|De-registration Guide]].
* If you need an SSH key for your workflow, read [[Registration/SSH|Registering SSH Keys with your Cluster]].
* Configuring your shell: [[.bashrc Do's and Don'ts]]

Development/ollama

2025-02-21T15:37:21Z

R Keller: /* Best Practice */

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU -
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load cs/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

{|style="background:#FEF4AB; width:100%;"
|style="padding:5px; background:#FEF4AB; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#FEF4AB; text-align:left"|
Please note: this module started off in the Category <code>devel</code>, but has been moved to the correct category computer science, or short <code>cs</code>.
|}

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ ~/.ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the node's name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load cs/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load cs/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

{|style="background:#dedefe; width:100%;"
|style="padding:5px; background:#dedefe; text-align:left"|
[[Image:Info.svg|center]]
|style="padding:5px; background:#dedefe; text-align:left"|
On GPUs with 48GB VRAM like NVIDIA L40S, you may want to use the 70b model of Deepseek, i.e. <code>ollama pull deepseek-r1:70b</code> and amend the below commands accordingly.
|}

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
source ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

== Best Practice ==
Running interactively is generally '''not''' a good idea, especially not with very large models. Better submit Your job with mail notification, here file <code>ollama.slurm</code>:
#!/bin/bash
#SBATCH --partition=gpu_h100 # bwUniCluster3, for DACHS: gpu8
#SBATCH --gres=gpu:h100:4 # bwUniCluster3, for DACHS: gpu:h100:8
#SBATCH --ntasks-per-node=96 # considering bwUniCluster3 AMD EPYC9454, same on DACHS
#SBATCH --mem=500G # considering bwUniCluster3 768GB, enough on DACHS
#SBATCH --time=2:00:00 # Please be courteous to other users
#SBATCH --mail-type=BEGIN # Email when the job starts
#SBATCH --mail-user=my@mail.de # Your email address

module load cs/ollama # Load the the
export OLLAMA_HOST=0.0.0.0 # Serve on global interface
export OLLAMA_KEEP_ALIVE=-1 # Do not unload model (default is 5 minutes)
ollama serve

After starting the SSH for portforwarding or on the login-node after setting <code>export OLLAMA_HOST=</code> to the allocated node (see output of <code>squeue</code>):
ollama run deepseek-r1:671b
>>> /? # Shows the help
>>> /? shortcuts # Shows the keyboard shortcuts
>>> /show # Show information regarding model, prompt
>>> What is log(e)? # Returns explanation of logarithm under the assumption of base 10 and the natural logarithm including LaTeX Math notation.

Development/ollama

2025-02-21T15:35:54Z

R Keller: /* Best Practice */

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU -
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load cs/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

{|style="background:#FEF4AB; width:100%;"
|style="padding:5px; background:#FEF4AB; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#FEF4AB; text-align:left"|
Please note: this module started off in the Category <code>devel</code>, but has been moved to the correct category computer science, or short <code>cs</code>.
|}

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ ~/.ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the node's name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load cs/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load cs/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

{|style="background:#dedefe; width:100%;"
|style="padding:5px; background:#dedefe; text-align:left"|
[[Image:Info.svg|center]]
|style="padding:5px; background:#dedefe; text-align:left"|
On GPUs with 48GB VRAM like NVIDIA L40S, you may want to use the 70b model of Deepseek, i.e. <code>ollama pull deepseek-r1:70b</code> and amend the below commands accordingly.
|}

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
source ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

== Best Practice ==
Running interactively is generally '''not''' a good idea, especially not with very large models. Better submit Your job with mail notification, here file <code>ollama.slurm</code>:
#!/bin/bash
#SBATCH --partition=gpu_h100 # bwUniCluster3, for DACHS: gpu8
#SBATCH --gres=gpu:h100:4 # bwUniCluster3, for DACHS: gpu:h100:8
#SBATCH --ntasks-per-node=96 # considering bwUniCluster3 AMD EPYC9454, same on DACHS
#SBATCH --mem=500G # considering bwUniCluster3 768GB, enough on DACHS
#SBATCH --time=2:00:00 # Please be courteous to other users
module load cs/ollama # Load the the
export OLLAMA_HOST=0.0.0.0 # Serve on global interface
export OLLAMA_KEEP_ALIVE=-1 # Do not unload model (default is 5 minutes)
ollama serve

After starting the SSH for portforwarding or on the login-node after setting <code>export OLLAMA_HOST=</code> to the allocated node (see output of <code>squeue</code>):
ollama run deepseek-r1:671b
>>> /? # Shows the help
>>> /? shortcuts # Shows the keyboard shortcuts
>>> /show # Show information regarding model, prompt
>>> What is log(e)? # Returns explanation of logarithm under the assumption of base 10 and the natural logarithm including LaTeX Math notation.

Development/ollama

2025-02-21T15:35:16Z

R Keller: /* Local programming */

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU -
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load cs/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

{|style="background:#FEF4AB; width:100%;"
|style="padding:5px; background:#FEF4AB; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#FEF4AB; text-align:left"|
Please note: this module started off in the Category <code>devel</code>, but has been moved to the correct category computer science, or short <code>cs</code>.
|}

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ ~/.ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the node's name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load cs/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load cs/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

{|style="background:#dedefe; width:100%;"
|style="padding:5px; background:#dedefe; text-align:left"|
[[Image:Info.svg|center]]
|style="padding:5px; background:#dedefe; text-align:left"|
On GPUs with 48GB VRAM like NVIDIA L40S, you may want to use the 70b model of Deepseek, i.e. <code>ollama pull deepseek-r1:70b</code> and amend the below commands accordingly.
|}

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
source ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

== Best Practice ==
Running interactively is generally **not** a good idea, especially not with very large models. Better submit Your job with mail notification, here file <code>ollama.slurm</code>:
#!/bin/bash
#SBATCH --partition=gpu_h100 # bwUniCluster3, for DACHS: gpu8
#SBATCH --gres=gpu:h100:4 # bwUniCluster3, for DACHS: gpu:h100:8
#SBATCH --ntasks-per-node=96 # considering bwUniCluster3 AMD EPYC9454, same on DACHS
#SBATCH --mem=500G # considering bwUniCluster3 768GB, enough on DACHS
#SBATCH --time=2:00:00 # Please be courteous to other users
module load cs/ollama # Load the the
export OLLAMA_HOST=0.0.0.0 # Serve on global interface
export OLLAMA_KEEP_ALIVE=-1 # Do not unload model (default is 5 minutes)
ollama serve

After starting the SSH for portforwarding or on the login-node after setting <code>export OLLAMA_HOST=</code> to the allocated node (see output of <code>squeue</code>):
ollama run deepseek-r1:671b
>>> /? # Shows the help
>>> /? shortcuts # Shows the keyboard shortcuts
>>> /show # Show information regarding model, prompt
>>> What is log(e)? # Returns explanation of logarithm under the assumption of base 10 and the natural logarithm including LaTeX Math notation.

Development/ollama

2025-02-20T13:07:21Z

R Keller: /* Local programming */

Development/ollama

2025-02-20T13:06:13Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU -
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load cs/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

{|style="background:#FEF4AB; width:100%;"
|style="padding:5px; background:#FEF4AB; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#FEF4AB; text-align:left"|
Please note: this module started off in the Category <code>devel</code>, but has been moved to the correct category computer science, or short <code>cs</code>.
|}

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ ~/.ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the node's name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load cs/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load cs/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

{|style="background:#dedefe; width:100%;"
|style="padding:5px; background:#dedefe; text-align:left"|
[[Image:Info.svg|center]]
|style="padding:5px; background:#dedefe; text-align:left"|
On GPUs with 48GB VRAM like NVIDIA L40S, you may want to use the 70b model of Deepseek, i.e. <code>ollama pull deepseek-r1:70b</code> and amend the below commands accordingly.
|}

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

Development/ollama

2025-02-20T13:02:09Z

R Keller: /* Accessing from login nodes */

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU -
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ ~/.ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the node's name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load cs/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

{|style="background:#dedefe; width:100%;"
|style="padding:5px; background:#dedefe; text-align:left"|
[[Image:Info.svg|center]]
|style="padding:5px; background:#dedefe; text-align:left"|
On GPUs with 48GB VRAM like NVIDIA L40S, you may want to use the 70b model of Deepseek, i.e. <code>ollama pull deepseek-r1:70b</code> and amend the below commands accordingly.
|}

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

File:Info.svg

2025-02-20T12:57:37Z

R Keller: Info-Icon: an icon from the OOjs UI MediaWiki lib.

== Summary ==
Info-Icon: an icon from the OOjs UI MediaWiki lib.

Development/ollama

2025-02-13T08:48:42Z

R Keller: /* Preparation */

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ ~/.ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the node's name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

Development/ollama

2025-02-11T20:06:53Z

R Keller: /* Preparation */

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the node's name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

Development/ollama

2025-02-11T19:55:47Z

R Keller: /* Preparation */

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s `ws_find ollama_models`/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

Development/ollama

2025-02-11T19:09:58Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:

[[File:ollama_gpus_computing.png|850x329px]]

'''Enjoy!'''

File:Ollama gpus computing.png

2025-02-11T19:09:29Z

R Keller:

Development/ollama

2025-02-11T19:09:20Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

You should now see DeepSeek's response regarding Rayleigh Scattering.

On the compute node, You will see the computation:
[[File:ollama_gpus_computing.png|850x329px]]

Development/ollama

2025-02-11T19:05:33Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and pull a LLM (please check [https://ollama.com/search] for available models):
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ollama
export OLLAMA_HOST=localhost

and call <code>python</code> to run the following code:
import ollama
response = ollama.chat(model='deepseek-r1', messages=[ { 'role': 'user', 'content': 'why is the sky blue?'},])
print(response)

File:Ollama gpus.png

2025-02-11T18:59:23Z

R Keller: R Keller uploaded a new version of File:Ollama gpus.png

Development/ollama

2025-02-11T18:59:10Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:

[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and install a LLM:
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ipykernel gradio llama-index llama-index-llms-ollama llama-index-embeddings-ollama rich ollama

and run the file
from llama_index.llms.ollama import Ollama
llm = Ollama(model="deepseek-r1", request_timeout=120.0)

File:Ollama gpus.png

2025-02-11T18:56:29Z

R Keller:

Development/ollama

2025-02-11T18:56:18Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:
[[File:ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and install a LLM:
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ipykernel gradio llama-index llama-index-llms-ollama llama-index-embeddings-ollama rich ollama

and run the file
from llama_index.llms.ollama import Ollama
llm = Ollama(model="deepseek-r1", request_timeout=120.0)

Development/ollama

2025-02-11T18:55:38Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside, aka the login nodes.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models
and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/ .ollama

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node and make sure using <code>OLLAMA_HOST</code> that it serves to the external IP address:
module load devel/ollama
export OLLAMA_HOST=0.0.0.0:11434
ollama serve

You should be able to see the usage of the accelerator:
[[ollama_gpus.png|850x329px]]

== Accessing from login nodes ==
From another terminal You may log into the Cluster's login node a second time and install a LLM:
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU, you may develop on your local system:
python -m venv ollama_test
. ollama_test/bin/activate
python -m pip install ipykernel gradio llama-index llama-index-llms-ollama llama-index-embeddings-ollama rich ollama

and run the file
from llama_index.llms.ollama import Ollama
llm = Ollama(model="deepseek-r1", request_timeout=120.0)

Development/ollama

2025-02-11T18:00:49Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside.

== Preparation ==

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (multi-gigabyte) models,
a sub-directory and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
mkdir -p /pfs/work7/workspace/scratch/es_rakeller-ollama_models/.ollama/
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/.ollama .

Now we may allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Now You may load the Ollama module and start the server on the compute node:
module load devel/ollama
ollama serve

From another terminal You may log into the Cluster's login node a second time and install a LLM:
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

On the previous terminal on the compute node, You should see the model being downloaded and installed into the workspace.
Of course developing on the login nodes is not viable, therefore You may want to forward the ports.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png|645x325px]]

== Local programming ==
Now that You made sure You have access to the compute nodes GPU:

File:Firefox ollama.png

2025-02-11T17:42:29Z

R Keller:

Development/ollama

2025-02-11T17:42:13Z

R Keller:

Using LLMs even for inferencing requires large computational resources - currently at best a powerful GPU --
as provided by the bwHPC clusters.
This page explains to how to make usage of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows
performant access to a variety of accelerators, e.g. from CPUs using AVX-512 to APUs like the AMD MI-300A,
as well as GPUs like multiple NVIDIA H100.

Installing the inference server Ollama by default assumes you have root permission to install the server globally for all users
into the directory <code>/usr/local/bin</code>. Of course, this is '''not''' sensible.
Therefore the clusters provide the [[Environment_Modules|Environment Modules]] including binaries and libraries for CPU (if available AVX-512), AMD ROCm (if available) and NVIDIA CUDA using:
module load devel/ollama

More information is available in [https://github.com/ollama/ollama/tree/main/docs Ollamas Github documentation] page.

The inference server Ollama opens the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101,
which is not visible to any outside computer like Your laptop.
Therefore we need a way to forward this port on an IP visible to the outside.

== Port forwarding ==

The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code> which get to resolved to one of the multiple login nodes.
Using the Secure shell <code>ssh</code> one may forward a port from the login node to the compute node.

First, we need to allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
At first You may start with interactively checking out the method in one terminal:
srun --time=00:30:00 --gres=gpu:1 --pty /bin/bash
Please note that on bwUniCluster, You need to provide a partition, here containing a GPU, e.g. for this 30 minute run, we may select <code>--partition=dev_gpu_4</code>, on DACHS <code>--partition=gpu1</code>.

Your Shell's prompt will list the nodes name, e.g. on bwUniCluster node <code>uc2n520</code>:
[USERNAME@uc2n520 ~]$

Prior to starting and pulling models, it is a '''good idea''' to allocate a proper [[Workspace]] for the (large) models, a sub-directory and create a soft-link into this directory for Ollama:
ws_allocate ollama_models 60
mkdir -p /pfs/work7/workspace/scratch/es_rakeller-ollama_models/.ollama/
ln -s /pfs/work7/workspace/scratch/es_rakeller-ollama_models/.ollama .

Now You may load the Ollama module and start the server on the compute node:
module load devel/ollama
ollama serve &

From another terminal You may log into the Cluster's login node a second time and install a LLM:
module load devel/ollama
export OLLAMA_HOST=uc2n520
ollama pull deepseek-r1

Of course, You may want to '''locally on Your laptop'''.
Open another terminal and start the Secure shell using the port forwarding:
ssh -L 11434:uc2n520:11434 USERNAME@bwunicluster.scc.kit.edu
Your OTP: 123456
Password:

You may check using whether this worked using Your local browser on Your Laptop:
[[File:firefox_ollama.png]]

Development/ollama

2025-02-11T13:22:38Z

R Keller:

Development/ollama

2025-02-11T13:08:36Z

R Keller:

Development/ollama

2025-02-11T13:08:14Z

R Keller:

Development/ollama

2025-02-11T13:07:49Z

R Keller:

Development/ollama

2025-02-11T12:50:49Z

R Keller: Introduction to Ollama

Development/General compiler usage

2025-01-23T10:54:31Z

R Keller:

{| width=600px class="wikitable"
|-
! Description !! Content
|-
| module load
| compiler/gnu or compiler/intel or compiler/llvm and others...
|-
| License
| [[Development/Intel_Compiler|Intel]]: Commercial | [[Development/GCC|GNU]]: GPL | LLVM: Apache 2 | PGI/NVIDIA: Commercial
|}

= Description =

Basically, compilers translate human-readable source code (e.g. C++ interpreted as adhering to ISO/IEC 14882:2014, encoded in UTF-8 text) into binary byte code (e.g. x86-64 with Linux ABI in ELF-format).
Compilers are complex software and have become very powerful in the last decades, to '''guide''' you as a programmer writing better, more portable, more performant programs. Use the compiler as a tool -- and best use multiple compilers on the same source code for best results.
The basic operations and hints can be performed with the same or similar commands on all available compilers. For advanced usage such as optimization and profiling you should consult the best practice guide of the compiler you intend to use ([[Development/GCC|GCC]], [[Development/Intel_Compiler|Intel Suite]]).

More information about the MPI versions of the GNU and Intel Compilers is available here:
* [[Development/Parallel_Programming|Best Practices Guide for Parallel Programming]].

= Loading compilers as modules =

Modules and loading of modules is described in general in [[Environment_Modules|here for traditional Environment Modules]] and about the Lmod implementation of Environment modules in [[Software_Modules_Lmod|here for Lmod]].

Modules need to be mentioned, since on any system there's a pre-installed set of compilers (for C, C++ and usually Fortran) -- the so-called system compilers -- provided by the Linux distribution. The system compiler however may lack certain options for optimization, are typically optimizing for older and more architectures (think SSE vs. AVX2 and AVX-512/AVX10.2). Also they may lack useful warnings or other features. On RedHat Enterprise Linux 9.4 this is GNU compiler v11.4.1.
Be advised to check out the newer compilers available as modules.

Since Fortran (and very old C++) requires compiling and linking libraries with the very same compiler, many libraries, first-and-foremost the MPI libraries need to be provided for specific versions of a compiler.
On [[BwUniCluster_2.0]], these provided libraries will only be visible to <kbd>module avail</kbd>, once a compiler is loaded.
Hence, check out loading
<pre>
$ module avail compiler/intel
...
$ module load compiler/intel/2023.1.0
...
$ module avail
</pre>
to see the available MPI modules.

All vendors whether it's Intel, GNU, LLVM or the Nvidia toolkit (see module group toolkit) have compilers for different programming languages which will be available
only after loading the module.

== Linux Default Compiler ==

The default Compiler installed on all compute nodes is the GNU Compiler Collection (GCC) or in short GNU compiler.
* Don't get distracted with the available compiler modules.
* Only the modules are loading the complete environments needed.
Example
<pre>
$ module purge # unload all modules
$ module list # check which modules are loaded, we expect none
No Modulefiles Currently Loaded.
$ gcc --version # see version of default Linux GNU compiler
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
[...]
$ module load compiler/gnu # load default GNU compiler module
$ module list # check which modules are loaded, we expect the default GNU compiler
Currently Loaded Modulefiles:
1) compiler/gnu/13.3
$ gcc --version # now, check the current (loaded) version of the GNU C compiler
gcc (GCC) 13.3.0
[...]
</pre>

= Synoptical Tables =

== Compilers (no MPI) ==

{| width=600px class="wikitable"
|-
! Compiler Suite
! Language
! Command
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel Composer (pre-OneAPI)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icc
|-
| C++
| icpc
|-
| Fortran
| ifort
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel OneAPI (llvm-based)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icx
|-
| C++
| icpx
|-
| Fortran
| ifx
|-
| style="vertical-align:top;" rowspan="3" | <big>GCC</big> [[Development/GCC|• Best Practice Guides on GNU Compiler Software]]
| C
| gcc
|-
| C++
| g++
|-
| Fortran
| gfortran
|-
| style="vertical-align:top;" rowspan="3" | <big>LLVM</big>
| C
| clang
|-
| C++
| clang++
|-
| Fortran 77/90
| flang
|-
| style="vertical-align:top;" rowspan="3" | <big>PGI/NVIDIA</big>
| C
| pgcc
|-
| C++
| pgCC
|-
| Fortran 77/90
| pgf77 or pgf90
|}

== MPI compiler and Underlying Compilers ==

MPI implementations such as MPIch, Intel-MPI (derived from MPIch) or Open MPI provide compiler wrappers, easing the usage of MPI by providing the Include-Directory <kbd>-I</kbd> and required libraries as well as the implementation's library directory flag <kbd>-L</kbd> for linking.
The following table lists available MPI compiler commands and the underlying compilers, compiler families, languages, and application binary interfaces (ABIs) that they support.

{| width=600px class="wikitable"
|-
! MPI Compiler Command !! Default Compiler !! Supported Language(s) !! Supported ABI's
|-
| colspan=4 style="background-color:#DCDCDC;" | Generic Compilers
|-
| mpicc || gcc, cc || C || 32/64 bit
|-
| mpicxx || g++ || C/C++ || 32/64 bit
|-
| mpifc || gfortran || Fortran77/Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/GCC|GNU Compiler]] Versions 3 and higher
|-
| mpigcc || gcc || C || 32/64 bit
|-
| mpigxx || g++ || C/C++ || 32/64 bit
|-
| mpif77 || g77 || Fortran 77 || 32/64 bit
|-
| mpif90 || gfortran || Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/Intel_Compiler|Intel Fortran, C++ Compilers]] Versions 13.1 through 14.0 and Higher
|-
| mpiicc || icc || C || 32/64 bit
|-
| mpiicpc || icpc || C++ || 32/64 bit
|-
|impiifort || ifort || Fortran77/Fortran 95 || 32/64 bit
|-
|}

= How to use =

The following compiler commands work for all the compilers in the list above even though
the examples will be for '''icc''' only.

== Commands ==

The typical introduction is a "Hello World" program. The following C source code shows best practices:
<source lang="c">
#include <stdio.h> // for printf
#include <stdlib.h> // for EXIT_SUCCESS and EXIT_FAILURE
int main (int argc, char * argv[]) { // std. definition of a program taking arguments
printf("Hello World\n"); // Unix Output is line-buffered, end line with New-line.
return EXIT_SUCCESS; // End program by returning 0 (No Error)
}</source>
After loading the Intel Compiler module using <kbd>module load compiler/intel/</kbd>, the source may be compiled and linked with the single command
<pre>$ icc hello.c -o hello</pre>
to produce an executable named <kbd>hello</kbd>. This may be executed using the command <kbd>./hello</kbd>.

This process can be divided into two steps:
<pre>
$ icc -c hello.c # Compile .c File into object file (ending in .o)
$ icc hello.o -o hello # Link the object file with the system libc library)
</pre>
When using libraries you must sometimes specify the directories where the
* include files (option <kbd>-I</kbd>) and where the
* library files (option <kbd>-L</kbd>) are located.
In addition you have to tell the compiler which
* library you want to link to (option <kbd>-l</kbd>).
For example after <kbd>module load numlib/fftw</kbd> you can compile code for fftw using
<pre>
$ icc -c hello.c -I$FFTW_INC_DIR
$ icc hello.o -o hello -L$FFTW_LIB_DIR -lfftw3
</pre>
When the program crashes or doesn't produce the expected output the compiler can
help you by printing all warning messages <kbd>-Wall</kbd> and adding flags for debugging <kbd>-g</kbd>:
<pre>$ icc -Wall -g hello.c -o hello</pre>

== Debugger ==

If the problem can't be solved this way you can inspect what exactly your program
does using a debugger, e.g. [[Development/GDB|GDB]].

To use the debugger properly with your program you have to compile it with debug information (option <kbd>-g</kbd>):

Example
<pre>$ icc -g hello.c -o hello</pre>
Although the compiler option <kbd>-Wall</kbd> (and possibly others) should always be set, the <kbd>-g</kbd> option should only be passed for
debugging purposes to find bugs.
It may slow down execution and enlarges the binary due to debugging symbols.

== Optimization ==

The usual and common way to compile your source is to apply compiler optimization.

Since there are many optimization options, as a start for now the optimization level -O2 is recommended:
<pre>$ icc -O2 hello.c -o hello</pre>
Beware: The optimization-flag used is a capital-O (like Otto) and not a 0 (Zero)!

All compilers offer a multitude of optimization options,
one may check the complete list of options with short explanation on [[Development/GCC|GCC]], [[LLVM|LLVM]] and
[[Development/Intel_Compiler|Intel Suite]] using option '''-v''' '''--help''':
<pre>
$ icc -v --help | less
$ gcc -v --help | less
$ clang -v --help | less
</pre>

Please note, that the optimization level <kbd>-O2</kbd> produces code for a general instruction set.
If you want to set the instruction set available, and take advantage of AVX2 or AVX512f/AVX10.2, you have to
either add the machine-dependent <kbd>-mavx512f</kbd> or set the specific architecture of your
target processor.
For [[BwUniCluster_2.0]] this depends on whether you run your application on any node, then you would select
the older Broadwell CPU, or whether You target the newer HPC nodes (which feature Xeon Gold 6230, aka "Cascade Lake"
architecture).
<pre>
$ gcc -O2 -o hello hello.c # General optimization for any architecture
$ gcc -O2 -march=broadwell -o hello hello.c # Will work on any compute node on bwUniCluster 2.0
$ gcc -O2 -march=cascadelake -o hello hello.c # This may not run on Broadwell nodes
</pre>

While adding <kbd>-march=broadwell</kbd> adds the compiler options such as <kbd>-mavx -mavx2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3</kbd>,
adding <kbd>-march=cascadelake</kbd> will further this by <kbd>-mavx512bw -mavx512cd -mavx512dq -mavx512f -mavx512vl -mavx512vnni -mfma</kbd>,
where <kbd>-mfma</kbd> is the setting for allowing fused-multiply-add.
These options may provide considerable speed-up to your code as is.
'''Please note''' however, that Cascade Lake may throttle the processor's clock speed, when executing AVX-512 instructions, possibly running slower than
(older) AVX2 code paths would have.

You should then pay attention to vectorization attained by the compiler -- and concentrate on the time-consuming loops,
where the compiler was not able to vectorize.
Further vectorization as described in the Best Practice Guides may help.
This information is available with the Intel compiler using <kbd>-qopt-report=5</kbd> producing a lot of output in <kbd>hello.optrpt</kbd>,
while GCC offers this information using <kbd>-fopt-info-all</kbd>

For GCC the options in use are best visible by calling <kbd>gcc -O2 -fverbose-asm -S -o hello.S hello.c</kbd>.
The option <kbd>-fverbose-asm</kbd> stores all the options in the assembler file <kbd>hello.S</kbd>.

== Warnings and Error detection ==
All compilers have improved tremendously with regards to analyzing and detecting suspicious code: do make '''use''' of such warnings and hints.
The amount of false positives has reduced and it will make your code more accessible, less error-prone and more portable.

The typical warning flags are <kbd>-Wall</kbd> to turn on ''all'' warnings.
However, there's multiple other worthwhile warnings, which are not covered (since they might increase false positives, or since they are not yet considered so prominent).
E.g. <kbd>-Wextra</kbd> turns on several other warnings, which will in the above example show that neither <kbd>argc</kbd> nor <kbd>argv</kbd> have been used inside of <kbd>main</kbd>.

For LLVM's <kbd>clang</kbd> the flag <kbd>-Weverything</kbd> turns on all available warnings, albeit leading to many warnings (even false positives) on larger projects.
However, the fix-it hints are very helpful as well.

All the compilers offer the flag <kbd>-Werror</kbd> which turns any warning (allowing completion of compilation) into hard errors.
 
 

[[File:static_code_analysis.png|right|border|513px|Copyright: HS Esslingen)]]
Another powerful feature available in GNU- and LLVM-based compilers is '''static code analysis''', otherwise only available in Commercial tools, like [https://www.synopsys.com/software-integrity/security-testing/static-analysis-sast.html Coverity].
Static code analysis evaluates '''each''' and '''every''' code path, making assumptions on input values and branches taken, detecting corner cases which might lead to real errors -- without having to actually execute this code path.

For GCC this is turned on using <kbd>-fanalyzer</kbd> which will detect e.g. cases of memory usage after a <kbd>free()</kbd> of said memory and many others. [https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#Static-Analyzer-Options GCC's documentation] on Static Analysis provides further details.

For LLVM recompile your project using <kbd>scan-build</kbd>, e.g.:
<pre>
$ scan-build make
</pre>

This produces warnings on <kbd>stdout</kbd>, but more importantly scan reports in directory <kbd>/scratch/scan-build-XXX</kbd>, where XXX is date and time of the build.
For example the output of Open MPI includes real issues of missed memory releases in error code paths -- as shown in the following picture.

Development/General compiler usage

2025-01-23T10:51:35Z

R Keller:

{| width=600px class="wikitable"
|-
! Description !! Content
|-
| module load
| compiler/gnu or compiler/intel or compiler/llvm and others...
|-
| License
| [[Development/Intel_Compiler|Intel]]: Commercial | [[Development/GCC|GNU]]: GPL | LLVM: Apache 2 | PGI/NVIDIA: Commercial
|}

= Description =

Basically, compilers translate human-readable source code (e.g. C++ interpreted as adhering to ISO/IEC 14882:2014, encoded in UTF-8 text) into binary byte code (e.g. x86-64 with Linux ABI in ELF-format).
Compilers are complex software and have become very powerful in the last decades, to '''guide''' you as a programmer writing better, more portable, more performant programs. Use the compiler as a tool -- and best use multiple compilers on the same source code for best results.
The basic operations and hints can be performed with the same or similar commands on all available compilers. For advanced usage such as optimization and profiling you should consult the best practice guide of the compiler you intend to use ([[Development/GCC|GCC]], [[Development/Intel_Compiler|Intel Suite]]).

More information about the MPI versions of the GNU and Intel Compilers is available here:
* [[Development/Parallel_Programming|Best Practices Guide for Parallel Programming]].

= Loading compilers as modules =

Modules and loading of modules is described in general in [[Environment_Modules|here for traditional Environment Modules]] and about the Lmod implementation of Environment modules in [[Software_Modules_Lmod|here for Lmod]].

Modules need to be mentioned, since on any system there's a pre-installed set of compilers (for C, C++ and usually Fortran) -- the so-called system compilers -- provided by the Linux distribution. The system compiler however may lack certain options for optimization, are typically optimizing for older and more architectures (think SSE vs. AVX2 and AVX-512/AVX10.2). Also they may lack useful warnings or other features. On RedHat Enterprise Linux 9.4 this is GNU compiler v11.4.1.
Be advised to check out the newer compilers available as modules.

Since Fortran (and very old C++) requires compiling and linking libraries with the very same compiler, many libraries, first-and-foremost the MPI libraries need to be provided for specific versions of a compiler.
On [[BwUniCluster_2.0]], these provided libraries will only be visible to <kbd>module avail</kbd>, once a compiler is loaded.
Hence, check out loading
<pre>
$ module avail compiler/intel
...
$ module load compiler/intel/2023.1.0
...
$ module avail
</pre>
to see the available MPI modules.

All vendors whether it's Intel, GNU, LLVM or the Nvidia toolkit (see module group toolkit) have compilers for different programming languages which will be available
only after loading the module.

== Linux Default Compiler ==

The default Compiler installed on all compute nodes is the GNU Compiler Collection (GCC) or in short GNU compiler.
* Don't get distracted with the available compiler modules.
* Only the modules are loading the complete environments needed.
Example
<pre>
$ module purge # unload all modules
$ module list # check which modules are loaded, we expect none
No Modulefiles Currently Loaded.
$ gcc --version # see version of default Linux GNU compiler
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
[...]
$ module load compiler/gnu # load default GNU compiler module
$ module list # check which modules are loaded, we expect the default GNU compiler
Currently Loaded Modulefiles:
1) compiler/gnu/13.3
$ gcc --version # now, check the current (loaded) version of the GNU C compiler
gcc (GCC) 13.3.0
[...]
</pre>

= Synoptical Tables =

== Compilers (no MPI) ==

{| width=600px class="wikitable"
|-
! Compiler Suite
! Language
! Command
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel Composer (pre-OneAPI)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icc
|-
| C++
| icpc
|-
| Fortran
| ifort
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel OneAPI (llvm-based)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icx
|-
| C++
| icpx
|-
| Fortran
| ifx
|-
| style="vertical-align:top;" rowspan="3" | <big>GCC</big> [[Development/GCC|• Best Practice Guides on GNU Compiler Software]]
| C
| gcc
|-
| C++
| g++
|-
| Fortran
| gfortran
|-
| style="vertical-align:top;" rowspan="3" | <big>LLVM</big>
| C
| clang
|-
| C++
| clang++
|-
| Fortran 77/90
| flang
|-
| style="vertical-align:top;" rowspan="3" | <big>PGI/NVIDIA</big>
| C
| pgcc
|-
| C++
| pgCC
|-
| Fortran 77/90
| pgf77 or pgf90
|}

== MPI compiler and Underlying Compilers ==

MPI implementations such as MPIch, Intel-MPI (derived from MPIch) or Open MPI provide compiler wrappers, easing the usage of MPI by providing the Include-Directory <kbd>-I</kbd> and required libraries as well as the implementation's library directory flag <kbd>-L</kbd> for linking.
The following table lists available MPI compiler commands and the underlying compilers, compiler families, languages, and application binary interfaces (ABIs) that they support.

{| width=600px class="wikitable"
|-
! MPI Compiler Command !! Default Compiler !! Supported Language(s) !! Supported ABI's
|-
| colspan=4 style="background-color:#DCDCDC;" | Generic Compilers
|-
| mpicc || gcc, cc || C || 32/64 bit
|-
| mpicxx || g++ || C/C++ || 32/64 bit
|-
| mpifc || gfortran || Fortran77/Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/GCC|GNU Compiler]] Versions 3 and higher
|-
| mpigcc || gcc || C || 32/64 bit
|-
| mpigxx || g++ || C/C++ || 32/64 bit
|-
| mpif77 || g77 || Fortran 77 || 32/64 bit
|-
| mpif90 || gfortran || Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/Intel_Compiler|Intel Fortran, C++ Compilers]] Versions 13.1 through 14.0 and Higher
|-
| mpiicc || icc || C || 32/64 bit
|-
| mpiicpc || icpc || C++ || 32/64 bit
|-
|impiifort || ifort || Fortran77/Fortran 95 || 32/64 bit
|-
|}

= How to use =

The following compiler commands work for all the compilers in the list above even though
the examples will be for '''icc''' only.

== Commands ==

The typical introduction is a "Hello World" program. The following C source code shows best practices:
<source lang="c">
#include <stdio.h> // for printf
#include <stdlib.h> // for EXIT_SUCCESS and EXIT_FAILURE
int main (int argc, char * argv[]) { // std. definition of a program taking arguments
printf("Hello World\n"); // Unix Output is line-buffered, end line with New-line.
return EXIT_SUCCESS; // End program by returning 0 (No Error)
}</source>
After loading the Intel Compiler module ''module load compiler/intel/'', the source may be compiled and linked with the single command
<pre>$ icc hello.c -o hello</pre>
to produce an executable named ''hello''. This may be executed using the command ''./hello''.

This process can be divided into two steps:
<pre>
$ icc -c hello.c # Compile .c File into object file (ending in .o)
$ icc hello.o -o hello # Link the object file with the system libc library)
</pre>
When using libraries you must sometimes specify the directories where the
* include files (option <kbd>-I</kbd>) and where the
* library files (option <kbd>-L</kbd>) are located.
In addition you have to tell the compiler which
* library you want to link to (option <kbd>-l</kbd>).
For example after <kbd>module load numlib/fftw</kbd> you can compile code for fftw using
<pre>
$ icc -c hello.c -I$FFTW_INC_DIR
$ icc hello.o -o hello -L$FFTW_LIB_DIR -lfftw3
</pre>
When the program crashes or doesn't produce the expected output the compiler can
help you by printing all warning messages <kbd>-Wall</kbd> and adding flags for debugging <kbd>-g</kbd>:
<pre>$ icc -Wall -g hello.c -o hello</pre>

== Debugger ==

If the problem can't be solved this way you can inspect what exactly your program
does using a debugger, e.g. [[Development/GDB|GDB]].

To use the debugger properly with your program you have to compile it with debug information (option <kbd>-g</kbd>):

Example
<pre>$ icc -g hello.c -o hello</pre>
Although the compiler option <kbd>-Wall</kbd> (and possibly others) should always be set, the <kbd>-g</kbd> option should only be passed for
debugging purposes to find bugs.
It may slow down execution and enlarges the binary due to debugging symbols.

== Optimization ==

The usual and common way to compile your source is to apply compiler optimization.

Since there are many optimization options, as a start for now the optimization level -O2 is recommended:
<pre>$ icc -O2 hello.c -o hello</pre>
Beware: The optimization-flag used is a capital-O (like Otto) and not a 0 (Zero)!

All compilers offer a multitude of optimization options,
one may check the complete list of options with short explanation on [[Development/GCC|GCC]], [[LLVM|LLVM]] and
[[Development/Intel_Compiler|Intel Suite]] using option '''-v''' '''--help''':
<pre>
$ icc -v --help | less
$ gcc -v --help | less
$ clang -v --help | less
</pre>

Please note, that the optimization level <kbd>-O2</kbd> produces code for a general instruction set.
If you want to set the instruction set available, and take advantage of AVX2 or AVX512f/AVX10.2, you have to
either add the machine-dependent <kbd>-mavx512f</kbd> or set the specific architecture of your
target processor.
For [[BwUniCluster_2.0]] this depends on whether you run your application on any node, then you would select
the older Broadwell CPU, or whether You target the newer HPC nodes (which feature Xeon Gold 6230, aka "Cascade Lake"
architecture).
<pre>
$ gcc -O2 -o hello hello.c # General optimization for any architecture
$ gcc -O2 -march=broadwell -o hello hello.c # Will work on any compute node on bwUniCluster 2.0
$ gcc -O2 -march=cascadelake -o hello hello.c # This may not run on Broadwell nodes
</pre>

While adding <kbd>-march=broadwell</kbd> adds the compiler options such as <kbd>-mavx -mavx2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3</kbd>,
adding <kbd>-march=cascadelake</kbd> will further this by <kbd>-mavx512bw -mavx512cd -mavx512dq -mavx512f -mavx512vl -mavx512vnni -mfma</kbd>,
where <kbd>-mfma</kbd> is the setting for allowing fused-multiply-add.
These options may provide considerable speed-up to your code as is.
'''Please note''' however, that Cascade Lake may throttle the processor's clock speed, when executing AVX-512 instructions, possibly running slower than
(older) AVX2 code paths would have.

You should then pay attention to vectorization attained by the compiler -- and concentrate on the time-consuming loops,
where the compiler was not able to vectorize.
Further vectorization as described in the Best Practice Guides may help.
This information is available with the Intel compiler using <kbd>-qopt-report=5</kbd> producing a lot of output in <kbd>hello.optrpt</kbd>,
while GCC offers this information using <kbd>-fopt-info-all</kbd>

For GCC the options in use are best visible by calling <kbd>gcc -O2 -fverbose-asm -S -o hello.S hello.c</kbd>.
The option <kbd>-fverbose-asm</kbd> stores all the options in the assembler file <kbd>hello.S</kbd>.

== Warnings and Error detection ==
All compilers have improved tremendously with regards to analyzing and detecting suspicious code: do make '''use''' of such warnings and hints.
The amount of false positives has reduced and it will make your code more accessible, less error-prone and more portable.

The typical warning flags are <kbd>-Wall</kbd> to turn on ''all'' warnings.
However, there's multiple other worthwhile warnings, which are not covered (since they might increase false positives, or since they are not yet considered so prominent).
E.g. <kbd>-Wextra</kbd> turns on several other warnings, which will in the above example show that neither <kbd>argc</kbd> nor <kbd>argv</kbd> have been used inside of <kbd>main</kbd>.

For LLVM's <kbd>clang</kbd> the flag <kbd>-Weverything</kbd> turns on all available warnings, albeit leading to many warnings (even false positives) on larger projects.
However, the fix-it hints are very helpful as well.

All the compilers offer the flag <kbd>-Werror</kbd> which turns any warning (allowing completion of compilation) into hard errors.
 
 

[[File:static_code_analysis.png|right|border|513px|Copyright: HS Esslingen)]]
Another powerful feature available in GNU- and LLVM-based compilers is ''static code analysis''', otherwise only available in Commercial tools, like [https://www.synopsys.com/software-integrity/security-testing/static-analysis-sast.html Coverity].
Static code analysis evaluates '''each''' and '''every''' code path, making assumptions on input values and branches taken, detecting corner cases which might lead to real errors -- without having to actually execute this code path.

For GCC this is turned on using <kbd>-fanalyzer</kbd> which will detect e.g. cases of memory usage after a <kbd>free()</kbd> of said memory and many others. [https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#Static-Analyzer-Options GCC's documentation] on Static Analysis provides further details.

For LLVM recompile your project using <kbd>scan-build</kbd>, e.g.:
<pre>
$ scan-build make
</pre>

This produces warnings on <kbd>stdout</kbd>, but more importantly scan reports in directory <kbd>/scratch/scan-build-XXX</kbd>, where XXX is date and time of the build.
For example the output of Open MPI includes real issues of missed memory releases in error code paths -- as shown in the following picture.

Development/General compiler usage

2025-01-23T10:20:18Z

R Keller: Update the versions to match the current (bwUniCluster 2.0) expectations

{| width=600px class="wikitable"
|-
! Description !! Content
|-
| module load
| compiler/gnu or compiler/intel or compiler/llvm and others...
|-
| License
| [[Development/Intel_Compiler|Intel]]: Commercial | [[Development/GCC|GNU]]: GPL | LLVM: Apache 2 | PGI/NVIDIA: Commercial
|}

= Description =

Basically, compilers translate human-readable source code (e.g. C++ interpreted as adhering to ISO/IEC 14882:2014, encoded in UTF-8 text) into binary byte code (e.g. x86-64 with Linux ABI in ELF-format).
Compilers are complex software and have become very powerful in the last decades, to '''guide''' you as a programmer writing better, more portable, more performant programs. Use the compiler as a tool -- and best use multiple compilers on the same source code for best results.
The basic operations and hints can be performed with the same or similar commands on all available compilers. For advanced usage such as optimization and profiling you should consult the best practice guide of the compiler you intend to use ([[Development/GCC|GCC]], [[Development/Intel_Compiler|Intel Suite]]).

More information about the MPI versions of the GNU and Intel Compilers is available here:
* [[Development/Parallel_Programming|Best Practices Guide for Parallel Programming]].

= Loading compilers as modules =

Modules and loading of modules is described in general in [[Environment_Modules|here for traditional Environment Modules]] and about the Lmod implementation of Environment modules in [[Software_Modules_Lmod|here for Lmod]].

Modules need to be mentioned, since on any system there's a pre-installed set of compilers (for C, C++ and usually Fortran) -- the so-called system compilers -- provided by the Linux distribution. The system compiler however may lack certain options for optimization, are typically optimizing for older and more architectures (think SSE vs. AVX2 and AVX-512/AVX10.2). Also they may lack useful warnings or other features. On RedHat Enterprise Linux 9.4 this is GNU compiler v11.4.1.
Be advised to check out the newer compilers available as modules.

Since Fortran (and very old C++) requires compiling and linking libraries with the very same compiler, many libraries, first-and-foremost the MPI libraries need to be provided for specific versions of a compiler.
On [[BwUniCluster_2.0]], these provided libraries will only be visible to <kbd>module avail</kbd>, once a compiler is loaded.
Hence, check out loading
<pre>
$ module avail compiler/intel
...
$ module load compiler/intel/2023.1.0
...
$ module avail
</pre>
to see the available MPI modules.

All vendors whether it's Intel, GNU, LLVM or the Nvidia toolkit (see toolkit) have compilers for different programming languages which will be available
only after loading the module.

== Linux Default Compiler ==

The default Compiler installed on all compute nodes is the GNU Compiler Collection (GCC) or in short GNU compiler.
* Don't get distracted with the available compiler modules.
* Only the modules are loading the complete environments needed.
Example
<pre>
$ module purge # unload all modules
$ module list # check which modules are loaded, we expect none
No Modulefiles Currently Loaded.
$ gcc --version # see version of default Linux GNU compiler
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
[...]
$ module load compiler/gnu # load default GNU compiler module
$ module list # check which modules are loaded, we expect the default GNU compiler
Currently Loaded Modulefiles:
1) compiler/gnu/13.3
$ gcc --version # now, check the current (loaded) version of the GNU C compiler
gcc (GCC) 13.3.0
[...]
</pre>

= Synoptical Tables =

== Compilers (no MPI) ==

{| width=600px class="wikitable"
|-
! Compiler Suite
! Language
! Command
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel Composer (pre-OneAPI)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icc
|-
| C++
| icpc
|-
| Fortran
| ifort
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel OneAPI (llvm-based)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icx
|-
| C++
| icpx
|-
| Fortran
| ifx
|-
| style="vertical-align:top;" rowspan="3" | <big>GCC</big> [[Development/GCC|• Best Practice Guides on GNU Compiler Software]]
| C
| gcc
|-
| C++
| g++
|-
| Fortran
| gfortran
|-
| style="vertical-align:top;" rowspan="3" | <big>LLVM</big>
| C
| clang
|-
| C++
| clang++
|-
| Fortran 77/90
| flang
|-
| style="vertical-align:top;" rowspan="3" | <big>PGI/NVIDIA</big>
| C
| pgcc
|-
| C++
| pgCC
|-
| Fortran 77/90
| pgf77 or pgf90
|}

== MPI compiler and Underlying Compilers ==

MPI implementations such as MPIch, Intel-MPI (derived from MPIch) or Open MPI provide compiler wrappers, easing the usage of MPI by providing the Include-Directory <kbd>-I</kbd> and required libraries as well as the MPI implementations library directorie <kbd>-L</kbd> for linking.
The following table lists available MPI compiler commands and the underlying compilers, compiler families, languages, and application binary interfaces (ABIs) that they support.

{| width=600px class="wikitable"
|-
! MPI Compiler Command !! Default Compiler !! Supported Language(s) !! Supported ABI's
|-
| colspan=4 style="background-color:#DCDCDC;" | Generic Compilers
|-
| mpicc || gcc, cc || C || 32/64 bit
|-
| mpicxx || g++ || C/C++ || 32/64 bit
|-
| mpifc || gfortran || Fortran77/Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/GCC|GNU Compiler]] Versions 3 and higher
|-
| mpigcc || gcc || C || 32/64 bit
|-
| mpigxx || g++ || C/C++ || 32/64 bit
|-
| mpif77 || g77 || Fortran 77 || 32/64 bit
|-
| mpif90 || gfortran || Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/Intel_Compiler|Intel Fortran, C++ Compilers]] Versions 13.1 through 14.0 and Higher
|-
| mpiicc || icc || C || 32/64 bit
|-
| mpiicpc || icpc || C++ || 32/64 bit
|-
|impiifort || ifort || Fortran77/Fortran 95 || 32/64 bit
|-
|}

= How to use =

The following compiler commands work for all the compilers in the list above even though
the examples will be for '''icc''' only.

== Commands ==

The typical introduction is a "Hello World" program. The following C source code shows best practices:
<source lang="c">
#include <stdio.h> // for printf
#include <stdlib.h> // for EXIT_SUCCESS and EXIT_FAILURE
int main (int argc, char * argv[]) { // std. definition of a program taking arguments
printf("Hello World\n"); // Unix Output is line-buffered, end line with New-line.
return EXIT_SUCCESS; // End program by returning 0 (No Error)
}</source>
It may be compiled and linked with the single command
<pre>$ icc hello.c -o hello</pre>
to produce an executable named ''hello''.

This process can be divided into two steps:
<pre>
$ icc -c hello.c
$ icc hello.o -o hello
</pre>
When using libraries you must sometimes specify where the
* include files are (option <kbd>-I</kbd>) and where the
* library files are (option <kbd>-L</kbd>).
In addition you have to tell the compiler which
* library you want to use (option <kbd>-l</kbd>).
For example after loading the module numlib/fftw you can compile code for fftw using
<pre>
$ icc -c hello.c -I$FFTW_INC_DIR
$ icc hello.o -o hello -L$FFTW_LIB_DIR -lfftw3
</pre>
When the program crashes or doesn't produce the expected output the compiler can
help you by printing all warning messages <kbd>-Wall</kbd> and adding flags for debugging <kbd>-g</kbd>:
<pre>$ icc -Wall -g hello.c -o hello</pre>

== Debugger ==

If the problem can't be solved this way you can inspect what exactly your program
does using a debugger, e.g. [[Development/GDB|GDB]].

To use the debugger properly with your program you have to compile it with debug information (option <kbd>-g</kbd>):

Example
<pre>$ icc -g hello.c -o hello</pre>
Although the compiler option <kbd>-Wall</kbd> (and possibly others) should always be set, the <kbd>-g</kbd> option should only be passed for
debugging purposes to find bugs.
It may slow down execution and enlarges the binary due to debugging symbols.

== Optimization ==

The usual and common way to compile your source is to apply compiler optimization.

Since there are many optimization options, as a start for now the optimization level -O2 is recommended:
<pre>$ icc -O2 hello.c -o hello</pre>
Beware: The optimization-flag used is a capital-O (like Otto) and not a 0 (Zero)!

All compilers offer a multitude of optimization options,
one may check the complete list of options with short explanation on [[Development/GCC|GCC]], [[LLVM|LLVM]] and
[[Development/Intel_Compiler|Intel Suite]] using option '''-v''' '''--help''':
<pre>
$ icc -v --help | less
$ gcc -v --help | less
$ clang -v --help | less
</pre>

Please note, that the optimization level <kbd>-O2</kbd> produces code for a general instruction set.
If you want to set the instruction set available, and take advantage of AVX2 or AVX512f, you have to
either add the machine-dependent <kbd>-mavx512f</kbd> or set the specific architecture of your
target processor.
For [[BwUniCluster_2.0]] this depends on whether you run your application on any node, then you would select
the older Broadwell CPU, or whether You target the newer HPC nodes (which feature Xeon Gold 6230, aka "Cascade Lake"
architecture).
<pre>
$ gcc -O2 -o hello hello.c # General optimization for any architecture
$ gcc -O2 -march=broadwell -o hello hello.c # Will work on any compute node on bwUniCluster 2.0
$ gcc -O2 -march=cascadelake -o hello hello.c # This may not run on Broadwell nodes
</pre>

While adding <kbd>-march=broadwell</kbd> adds the compiler options such as <kbd>-mavx -mavx2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3</kbd>,
adding <kbd>-march=cascadelake</kbd> will further this by <kbd>-mavx512bw -mavx512cd -mavx512dq -mavx512f -mavx512vl -mavx512vnni -mfma</kbd>,
where <kbd>-mfma</kbd> is the setting for allowing fused-multiply-add.
These options may provide considerable speed-up to your code as is.
'''Please note''' however, that Cascade Lake may throttle the processor's clock speed, when executing AVX-512 instructions, possibly running slower than
(older) AVX2 code paths would have.

You should then pay attention to vectorization attained by the compiler -- and concentrate on the time-consuming loops,
where the compiler was not able to vectorize.
Further vectorization as described in the Best Practice Guides may help.
This information is available with the Intel compiler using <kbd>-qopt-report=5</kbd> producing a lot of output in <kbd>hello.optrpt</kbd>,
while GCC offers this information using <kbd>-fopt-info-all</kbd>

For GCC the options in use are best visible by calling <kbd>gcc -O2 -fverbose-asm -S -o hello.S hello.c</kbd>.
The option <kbd>-fverbose-asm</kbd> stores all the options in the assembler file <kbd>hello.S</kbd>.

== Warnings and Error detection ==
All compilers have improved tremendously with regards to analyzing and detecting suspicious code: do make '''use''' of such warnings and hints.
The amount of false positives has reduced and it will make your code more accessible, less error-prone and more portable.

The typical warning flags are <kbd>-Wall</kbd> to turn on ''all'' warnings.
However, there's multiple other worthwhile warnings, which are not covered (since they might increase false positives, or since they are not yet considered so prominent).
E.g. <kbd>-Wextra</kbd> turns on several other warnings, which will in the above example show that neither <kbd>argc</kbd> nor <kbd>argv</kbd> have been used inside of <kbd>main</kbd>.

For LLVM's <kbd>clang</kbd> the flag <kbd>-Weverything</kbd> turns on all available warnings, albeit leading to many warnings on larger projects.
However, the fix-it hints are very helpful as well.

All the compilers offer the flag <kbd>-Werror</kbd> which turns any warning (allowing completion of compilation) into hard errors.
 
 

[[File:static_code_analysis.png|right|border|513px|Copyright: HS Esslingen)]]
Another powerful feature available in GNU- and LLVM-compilers is ''static code analysis''', otherwise only available in Commercial tools, like [https://www.synopsys.com/software-integrity/security-testing/static-analysis-sast.html Coverity].
Static code analysis evaluates '''each''' and '''every''' code path, making assumptions on input values and branches taken, detecting corner cases which might lead to real errors -- without having to actually execute this code path.

For GCC this is turned on using <kbd>-fanalyzer</kbd> which will detect e.g. cases of memory usage after a <kbd>free()</kbd> of said memory and many others. [https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#Static-Analyzer-Options GCC's documentation] on Static Analysis provides further details.

For LLVM recompile your project using <kbd>scan-build</kbd>, e.g.:
<pre>
$ scan-build make
</pre>

This produces warnings on <kbd>stdout</kbd>, but more importantly scan reports in directory <kbd>/scratch/scan-build-XXX</kbd>, where XXX is date and time of the build.
For example the output of Open MPI includes real issues of missed memory releases in error code paths:

Development/General compiler usage

2025-01-23T09:28:21Z

R Keller: Order of links switched, since the latter is much more thorough

{| width=600px class="wikitable"
|-
! Description !! Content
|-
| module load
| compiler/gnu or compiler/intel or compiler/llvm and others...
|-
| License
| [[Development/Intel_Compiler|Intel]]: Commercial | [[Development/GCC|GNU]]: GPL | LLVM: Apache 2 | PGI/NVIDIA: Commercial
|}

= Description =

Basically, compilers translate human-readable source code (e.g. C++ interpreted as adhering to ISO/IEC 14882:2014, encoded in UTF-8 text) into binary byte code (e.g. x86-64 with Linux ABI in ELF-format).
Compilers are complex software and have become very powerful in the last decades, to '''guide''' you as a programmer writing better, more portable, more performant programs. Use the compiler as a tool -- and best use multiple compilers on the same source code for best results.
The basic operations and hints can be performed with the same or similar commands on all available compilers. For advanced usage such as optimization and profiling you should consult the best practice guide of the compiler you intend to use ([[Development/GCC|GCC]], [[Development/Intel_Compiler|Intel Suite]]).

More information about the MPI versions of the GNU and Intel Compilers is available here:
* [[Development/Parallel_Programming|Best Practices Guide for Parallel Programming]].

= Loading compilers as modules =

Modules and loading of modules is described in general in [[Environment_Modules|here for traditional Environment Modules]] and about the Lmod implementation of Environment modules in [[Software_Modules_Lmod|here for Lmod]].

However, modules need to be mentioned, since on any system there's a pre-installed set of compilers (for C, C++ and usually Fortran), which are provided by the Linux distribution -- the so-called system compilers. Which however may lack certain options for optimization, for warnings or other features. On RedHat Enterprise Linux this is GNU compiler v8.3.1.
Be advised to check out the newer compilers available as modules.

Since Fortran (and very old C++) requires compiling and linking libraries with the very same compiler, many libraries, first-and-foremost the MPI libraries need to be provided for specific versions of a compiler.
On [[BwUniCluster_2.0]], these provided libraries will only be visible to <kbd>module avail</kbd>, once a compiler is loaded.
Hence, check out loading
<pre>
$ module avail compiler/intel
...
$ module load compiler/intel/2021.4.0
...
$ module avail
</pre>
to see the available MPI modules.

All Intel, GCC and PGI have compilers for different programming languages which will be available
after the module is loaded.

== Linux Default Compiler ==

The default Compiler installed on all compute nodes is the GNU Compiler Collection (GCC) or in short GNU compiler.
* Don't get distracted with the available compiler modules.
* Only the modules are loading the complete environments needed.
Example
<pre>
$ module purge # unload all modules
$ module list # control
No Modulefiles Currently Loaded.
$ gcc --version # see version of default Linux GNU compiler
gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)
[...]
$ module load compiler/gnu # load default GNU compiler module
$ module list # control
Currently Loaded Modulefiles:
1) compiler/gnu/10.2(default)
$ gcc --version # now, check the current (loaded) module
gcc (GCC) 10.2.0
[...]
</pre>

= Synoptical Tables =

== Compilers (no MPI) ==

{| width=600px class="wikitable"
|-
! Compiler Suite
! Language
! Command
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel Composer (pre-OneAPI)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icc
|-
| C++
| icpc
|-
| Fortran
| ifort
|-
| style="vertical-align:top;" rowspan="3" | <big>Intel OneAPI (llvm-based)</big> [[Development/Intel_Compiler|• Best Practice Guides on Intel Compiler Software]]
| C
| icx
|-
| C++
| icpx
|-
| Fortran
| ifx
|-
| style="vertical-align:top;" rowspan="3" | <big>GCC</big> [[Development/GCC|• Best Practice Guides on GNU Compiler Software]]
| C
| gcc
|-
| C++
| g++
|-
| Fortran
| gfortran
|-
| style="vertical-align:top;" rowspan="3" | <big>LLVM</big>
| C
| clang
|-
| C++
| clang++
|-
| Fortran 77/90
| flang
|-
| style="vertical-align:top;" rowspan="3" | <big>PGI/NVIDIA</big>
| C
| pgcc
|-
| C++
| pgCC
|-
| Fortran 77/90
| pgf77 or pgf90
|}

== MPI compiler and Underlying Compilers ==

MPI implementations such as MPIch, Intel-MPI (derived from MPIch) or Open MPI provide compiler wrappers, easing the usage of MPI by providing the Include-Directory <kbd>-I</kbd> and required libraries as well as the MPI implementations library directorie <kbd>-L</kbd> for linking.
The following table lists available MPI compiler commands and the underlying compilers, compiler families, languages, and application binary interfaces (ABIs) that they support.

{| width=600px class="wikitable"
|-
! MPI Compiler Command !! Default Compiler !! Supported Language(s) !! Supported ABI's
|-
| colspan=4 style="background-color:#DCDCDC;" | Generic Compilers
|-
| mpicc || gcc, cc || C || 32/64 bit
|-
| mpicxx || g++ || C/C++ || 32/64 bit
|-
| mpifc || gfortran || Fortran77/Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/GCC|GNU Compiler]] Versions 3 and higher
|-
| mpigcc || gcc || C || 32/64 bit
|-
| mpigxx || g++ || C/C++ || 32/64 bit
|-
| mpif77 || g77 || Fortran 77 || 32/64 bit
|-
| mpif90 || gfortran || Fortran 95 || 32/64 bit
|-
| colspan=4 style="background-color:#DCDCDC;" | [[Development/Intel_Compiler|Intel Fortran, C++ Compilers]] Versions 13.1 through 14.0 and Higher
|-
| mpiicc || icc || C || 32/64 bit
|-
| mpiicpc || icpc || C++ || 32/64 bit
|-
|impiifort || ifort || Fortran77/Fortran 95 || 32/64 bit
|-
|}

= How to use =

The following compiler commands work for all the compilers in the list above even though
the examples will be for '''icc''' only.

== Commands ==

The typical introduction is a "Hello World" program. The following C source code shows best practices:
<source lang="c">
#include <stdio.h> // for printf
#include <stdlib.h> // for EXIT_SUCCESS and EXIT_FAILURE
int main (int argc, char * argv[]) { // std. definition of a program taking arguments
printf("Hello World\n"); // Unix Output is line-buffered, end line with New-line.
return EXIT_SUCCESS; // End program by returning 0 (No Error)
}</source>
It may be compiled and linked with the single command
<pre>$ icc hello.c -o hello</pre>
to produce an executable named ''hello''.

This process can be divided into two steps:
<pre>
$ icc -c hello.c
$ icc hello.o -o hello
</pre>
When using libraries you must sometimes specify where the
* include files are (option <kbd>-I</kbd>) and where the
* library files are (option <kbd>-L</kbd>).
In addition you have to tell the compiler which
* library you want to use (option <kbd>-l</kbd>).
For example after loading the module numlib/fftw you can compile code for fftw using
<pre>
$ icc -c hello.c -I$FFTW_INC_DIR
$ icc hello.o -o hello -L$FFTW_LIB_DIR -lfftw3
</pre>
When the program crashes or doesn't produce the expected output the compiler can
help you by printing all warning messages <kbd>-Wall</kbd> and adding flags for debugging <kbd>-g</kbd>:
<pre>$ icc -Wall -g hello.c -o hello</pre>

== Debugger ==

If the problem can't be solved this way you can inspect what exactly your program
does using a debugger, e.g. [[Development/GDB|GDB]].

To use the debugger properly with your program you have to compile it with debug information (option <kbd>-g</kbd>):

Example
<pre>$ icc -g hello.c -o hello</pre>
Although the compiler option <kbd>-Wall</kbd> (and possibly others) should always be set, the <kbd>-g</kbd> option should only be passed for
debugging purposes to find bugs.
It may slow down execution and enlarges the binary due to debugging symbols.

== Optimization ==

The usual and common way to compile your source is to apply compiler optimization.

Since there are many optimization options, as a start for now the optimization level -O2 is recommended:
<pre>$ icc -O2 hello.c -o hello</pre>
Beware: The optimization-flag used is a capital-O (like Otto) and not a 0 (Zero)!

All compilers offer a multitude of optimization options,
one may check the complete list of options with short explanation on [[Development/GCC|GCC]], [[LLVM|LLVM]] and
[[Development/Intel_Compiler|Intel Suite]] using option '''-v''' '''--help''':
<pre>
$ icc -v --help | less
$ gcc -v --help | less
$ clang -v --help | less
</pre>

Please note, that the optimization level <kbd>-O2</kbd> produces code for a general instruction set.
If you want to set the instruction set available, and take advantage of AVX2 or AVX512f, you have to
either add the machine-dependent <kbd>-mavx512f</kbd> or set the specific architecture of your
target processor.
For [[BwUniCluster_2.0]] this depends on whether you run your application on any node, then you would select
the older Broadwell CPU, or whether You target the newer HPC nodes (which feature Xeon Gold 6230, aka "Cascade Lake"
architecture).
<pre>
$ gcc -O2 -o hello hello.c # General optimization for any architecture
$ gcc -O2 -march=broadwell -o hello hello.c # Will work on any compute node on bwUniCluster 2.0
$ gcc -O2 -march=cascadelake -o hello hello.c # This may not run on Broadwell nodes
</pre>

While adding <kbd>-march=broadwell</kbd> adds the compiler options such as <kbd>-mavx -mavx2 -msse3 -msse4 -msse4.1 -msse4.2 -mssse3</kbd>,
adding <kbd>-march=cascadelake</kbd> will further this by <kbd>-mavx512bw -mavx512cd -mavx512dq -mavx512f -mavx512vl -mavx512vnni -mfma</kbd>,
where <kbd>-mfma</kbd> is the setting for allowing fused-multiply-add.
These options may provide considerable speed-up to your code as is.
'''Please note''' however, that Cascade Lake may throttle the processor's clock speed, when executing AVX-512 instructions, possibly running slower than
(older) AVX2 code paths would have.

You should then pay attention to vectorization attained by the compiler -- and concentrate on the time-consuming loops,
where the compiler was not able to vectorize.
Further vectorization as described in the Best Practice Guides may help.
This information is available with the Intel compiler using <kbd>-qopt-report=5</kbd> producing a lot of output in <kbd>hello.optrpt</kbd>,
while GCC offers this information using <kbd>-fopt-info-all</kbd>

For GCC the options in use are best visible by calling <kbd>gcc -O2 -fverbose-asm -S -o hello.S hello.c</kbd>.
The option <kbd>-fverbose-asm</kbd> stores all the options in the assembler file <kbd>hello.S</kbd>.

== Warnings and Error detection ==
All compilers have improved tremendously with regards to analyzing and detecting suspicious code: do make '''use''' of such warnings and hints.
The amount of false positives has reduced and it will make your code more accessible, less error-prone and more portable.

The typical warning flags are <kbd>-Wall</kbd> to turn on ''all'' warnings.
However, there's multiple other worthwhile warnings, which are not covered (since they might increase false positives, or since they are not yet considered so prominent).
E.g. <kbd>-Wextra</kbd> turns on several other warnings, which will in the above example show that neither <kbd>argc</kbd> nor <kbd>argv</kbd> have been used inside of <kbd>main</kbd>.

For LLVM's <kbd>clang</kbd> the flag <kbd>-Weverything</kbd> turns on all available warnings, albeit leading to many warnings on larger projects.
However, the fix-it hints are very helpful as well.

All the compilers offer the flag <kbd>-Werror</kbd> which turns any warning (allowing completion of compilation) into hard errors.
 
 

[[File:static_code_analysis.png|right|border|513px|Copyright: HS Esslingen)]]
Another powerful feature available in GNU- and LLVM-compilers is ''static code analysis''', otherwise only available in Commercial tools, like [https://www.synopsys.com/software-integrity/security-testing/static-analysis-sast.html Coverity].
Static code analysis evaluates '''each''' and '''every''' code path, making assumptions on input values and branches taken, detecting corner cases which might lead to real errors -- without having to actually execute this code path.

For GCC this is turned on using <kbd>-fanalyzer</kbd> which will detect e.g. cases of memory usage after a <kbd>free()</kbd> of said memory and many others. [https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#Static-Analyzer-Options GCC's documentation] on Static Analysis provides further details.

For LLVM recompile your project using <kbd>scan-build</kbd>, e.g.:
<pre>
$ scan-build make
</pre>

This produces warnings on <kbd>stdout</kbd>, but more importantly scan reports in directory <kbd>/scratch/scan-build-XXX</kbd>, where XXX is date and time of the build.
For example the output of Open MPI includes real issues of missed memory releases in error code paths:

Registration/bwUniCluster/Entitlement

2025-01-20T11:52:05Z

R Keller: /* Step A: bwUniCluster Entitlement */

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The bwUniCluster entitlement (see [https://www.bwidm.de/attribute.php#Berechtigung eduPersonEntitlement]) issued by a university assures the operator of the Clusters, that its university member's compute activities comply with the German Foreign Trade Act (Außenwirtschaftsgesetz - AWG) and German Foreign Trade Regulations (Außenwirtschaftsverordnung - AWV). ''Please check'' the regulations at the Federal Office of Economics and Export Control (BAFA) under [https://www.bafa.de/DE/Aussenwirtschaft/Ausfuhrkontrolle/Allgemeine_Einfuehrung/allgemeine_einfuehrung_node.html BAFA Aussenwirtschaft Ausfuhrkontrolle]
|}

= Step A: bwUniCluster Entitlement =

To register for the bwUniCluster 2.0 you need the '''bwUniCluster Entitlement''' issued by your university.

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The entitlement is called '''bwUniCluster''' (and not bwUniCluster 2.0) and each university assigns the entitlement '''only''' for its own members.
|}

If you are not sure if you already have an entitlement, please check it first with the [[Registration/bwUniCluster/Entitlement#Check_your_Entitlements|'''Check your Entitlements''']] guide below.
If you need the entitlement, please follow the link for your institution or contact your local service desk if no information is provided:
* [https://www.hs-esslingen.de/informatik-und-informationstechnik/forschung-labore/projekte/forschungsprojekte/high-performance-computing/ Hochschule Esslingen]
* [[BwCluster_User_Access_Uni_Freiburg|Universität Freiburg]]
* [https://heiservices.uni-heidelberg.de/entitlement Universität Heidelberg] (access only within Uni Heidelberg network)
* [https://kim.uni-hohenheim.de/bwhpc-account Universität Hohenheim]
* [https://www.scc.kit.edu/downloads/ISM/Accessform_bwUniCluster_DE_EN_new.pdf Karlsruhe Institute of Technology (KIT)]
* [https://www.kim.uni-konstanz.de/en/services/research-and-teaching/high-performance-computing/access-to-bwunicluster Universität Konstanz]
* [[BWUniCluster_User_Access_Members_Uni_Mannheim|Universität Mannheim]]
* [https://www.hlrs.de/apply-for-computing-time/bw-uni-cluster Universität Stuttgart]
* [https://uni-tuebingen.de/de/155157 Universität Tübingen]
* [[BWUniCluster_User_Access_Members_Uni_Ulm|Universität Ulm]]
* [[Registration/HAW|HAW BW e.V.]] and Duale Hochschule Baden-Württemberg: Please contact your local service desk / compute center in case of question contact [mailto:hpc-at-haw@hs-esslingen.de mailto:hpc-at-haw@hs-esslingen.de]

== Check your Entitlements ==

To make sure you do not already have the entitlement, please log in to '''https://login.bwidm.de/user/index.xhtml'''.
To see the list of your entitlements, first select the '''Shibboleth''' tab at the top.
If the list below <code><nowiki>urn:oid:1.3.6.1.4.1.5923.1.1.1.7</nowiki></code> contains
<pre>http://bwidm.de/entitlement/bwUniCluster</pre>
you already have the entitlement and can skip step A.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
<code><nowiki>http://bwidm.de/entitlement/bwUniCluster</nowiki></code> is an attribute and not a link!
See [https://www.bwidm.de/dienste.php bwUniCluster und bwForCluster] for more information about needed attributes for this service.
|}
[[File:BwIDM-idp.png|center|600px|thumb|Verify Entitlement.]]

----

[[Registration/bwUniCluster/Service | Go to step B]]

Registration/2FA

2025-01-20T11:07:20Z

R Keller: /* How 2FA works on the bwHPC Clusters */

= Generate a Second Factor (2FA) =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
You or your group must take care of the hardware for the second factor yourself. We do not provide hardware keys or mobile devices.
|}

To improve security a '''2-factor authentication mechanism (2FA)''' is being enforced for logins to bwUniCluster/bwForClusters. In addition to the service password a second value, the '''second factor''', has to be entered on every login.

If you only have a mobile device, you can use software-based solutions as a second factor. If you don't want to use a smartphone app, we recommend using a hardware token such as Yubikey.

* If you have any questions about 2FA, please read the [[Registration/2FA/FAQ|FAQs]], and if your question remains unanswered, please submit a support ticket.

* The Pros and Cons of the various solutions can be found in this [[Registration/2FA/ProCon|wiki]].

= How 2FA works on the bwHPC Clusters =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
It is very important that the device that generates the One-Time Passwords and the device which is used to log into the bwUniCluster/bwForClusters are not the same.
Otherwise an attacker who gains access to your system can steal both the service password and the secret key of the Software Token application, which allows them to generate One-Time Passwords and log into the HPC system without your knowledge.
|}

[[File:2fa token code.jpg|right|200px|thumb|Hardware Token for TOTP]]
On the bwUniCluster/bwForClusters we use either six-digit, auto-generated, time-dependent '''one-time passwords''' (TOTP) or Yubico OTP.

'''TOTPs''' are generated by a piece of software which is part of a special hardware device (a '''hardware token''') or of a normal application running on a common device (a '''software token''').

The Token has to be synchronized with a central server before it can be used for authentication and then generates an endless stream of six-digit values (TOTPs) which can only be used once and are only valid during a very short interval of time. This makes it much harder for potential attackers to access the HPC system, even if they know the regular service password.

Typically a new TOTP value is generated every 30 seconds. When the current TOTP value has once been used successfully for a login, it is depleted and one has to wait up to 30 seconds for the next TOTP value. If you don't want to use a smartphone, we recommend using a hardware token, such as Yubikey or another TOTP-compatible device.
We do not recommend the use of TOTP generators for PCs. If the second factor is generated on the same computer on which the login takes place, it is no longer a second factor.

[[File:Otpapp.png|right|150px|thumb|Source: https://getaegis.app]]

The most common solution is to use a mobile device (e.g. your smartphone or tablet) as a Software Token by installing one of the following apps:
* 2FAS for [https://play.google.com/store/apps/details?id=com.twofasapp Android] or [https://apps.apple.com/us/app/2fa-authenticator-2fas/id1217793794 iOS] ([https://2fas.com/ Web Page] and [https://github.com/twofas GitHub], ''Apple and Google Cloud can be used for backups depending on the operating system.'')
* Open Source FreeOTP ([https://github.com/freeotp GitHub]) on [https://f-droid.org/en/packages/org.fedorahosted.freeotp/ F-Droid], [https://play.google.com/store/search?q=freeotp Android] or [https://apps.apple.com/de/app/freeotp-authenticator/id872559395 iOS] with a possibility of local backup files.
* Google Authenticator for [https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2 Android] or [https://apps.apple.com/de/app/google-authenticator/id388497605 iOS] (''Google Cloud can be used for backups, but these backups are not encrypted and can therefore be read by Google!'')
* Microsoft Authenticator for [https://play.google.com/store/apps/details?id=com.azure.authenticator Android] or [https://apps.apple.com/de/app/microsoft-authenticator/id983156458 iOS] ([https://www.microsoft.com/de-de/security/mobile-authenticator-app Web Page])
* LastPass Authenticator for [https://play.google.com/store/apps/details?id=com.lastpass.authenticator Android], [https://apps.apple.com/us/app/lastpass-authenticator/id1079110004 iOS] or [https://lastpass.com/auth/ Windows]
* Aegis Authenticator for [https://play.google.com/store/apps/details?id=com.beemdevelopment.aegis Android (Google Play)] or [https://f-droid.org/en/packages/com.beemdevelopment.aegis/ Android (F-Droid)] ([https://getaegis.app/ Web Page])
* OTP Auth for [https://apps.apple.com/app/otp-auth/id659877384 iOS]
* (''Authy for [https://play.google.com/store/apps/details?id=com.authy.authy Android], [https://apps.apple.com/us/app/authy/id494168017 iOS], [https://authy.com/download/ Mac, Windows or Linux], requires account'')
(''These are only suggestions. You can use any application compatible with the [https://tools.ietf.org/html/rfc6238 TOTP] standard.'')

[https://www.yubico.com/resources/glossary/yubico-otp/ '''Yubico OTP'''] is also supported if you want to use your Yubikey without depending on having a six-digit code displayed.

= Token Management =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
* Create at least two separate tokens: '''FIRST''' set up a software/hardware TOTP token. '''THEN''' create and print a "backup TAN list". Never create the "backup TAN list" first.
* If you lose access to all your tokens, you will not be able to create new tokens and support will have to reset your tokens manually.
* The "backup TAN list" should always be created and printed in a '''second step'''. The printout should be kept in a separate place for emergencies.
* Please clean up your second factors as soon as you have created new tokens. Tokens that can no longer be used (e.g. because not initialized, smartphone/Yubikey lost, etc.) or an old backup TAN list where you have already used all TANs or there is no printout should be deactivated and deleted.
* Returning users who have already activated one or more tokens must first verify their token before they can create new tokens, see section [[Registration/2FA#Returning_Users|Returning Users]].
* '''Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.''' These tools prevent the registration website from generating new security tokens. When the problems remains (you can not generate the QR code or can not confirm it by clicking CHECK), please try once more with an entirely new unmodified web browser profile.
|}

'''bwUniCluster/bwForCluster Tokens''' are generally managed via the '''Index -> My Tokens''' menu entry on the registration pages for the clusters. Here you can register, activate, deactivate and delete tokens.

To activate the second factor, '''please perform the following steps:'''

1. '''Select the registration server of the cluster''' for which you want to create a second factor and login to it: → [https://login.bwidm.de/user/twofa.xhtml Registration server for '''bwUniCluster 2.0''', '''bwForCluster JUSTUS 2''' and '''bwForCluster NEMO'''] (2FA tokens are valid for all three clusters; KIT members can reuse their existing hardware and software tokens) → [https://bwservices.uni-heidelberg.de/user/twofa.xhtml Registration server for '''bwForCluster Helix''']
[[File:BwIDM-twofa.png|center|600px|thumb|My Tokens]]

2. '''Register a new "[[Registration/2FA#Registering_a_new_Software_Token_using_a_Mobile_APP|Smartphone Token]]"''' or if you own a [https://www.yubico.com/ Yubikey]''' register a new "[[Registration/2FA#Registering_a_new_Yubikey_OTP_Token|Yubikey Token]]"''' or '''"[[Registration/2FA#Registering_a_new_Yubikey_OATH_TOTP_Token|Yubikey OATH TOTP Token]]"''' ([[Registration/2FA#Pros_and_Cons_of_the_different_Solutions|pros ans cons]]).

3. '''Register a new "[[Registration/2FA#Backup_TAN_List|TAN List]]" (backup TAN list)'''.

4. Repeat step 2. for additional tokens.

== Registering a new Software Token using a Mobile APP ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.
|}

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. Registering a new Token starts with a click '''NEW SMARTPHONE TOKEN'''.
[[File:BwIDM-token.png|center|600px|thumb|Create a new Token]]

3. A new window opens. Click '''Start''' to generate a new '''QR code'''.
This may take a while.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The QR code contains a key which has to remain secret.
Only use the QR code to link your software token app with bwIDM/bwServices in the next step.
Do not save the QR code, print it out or share it with someone else.
|}
[[File:BwIDM-qr.png|center|600px|thumb|QR Code for Mobile App]]

4. Start the software token app on your separate device and scan the QR code.
The exact process is a little bit different in every app, but is usually started by pressing on a button with a plus (+) sign or an icon of a QR code.

5. Once the QR code has been loaded into your Software Token app there should be a new entry called '''bwIDM''' (bwUniCluster, JUSTUS 2 and NEMO) or '''bwServices''' (Helix).
Generate an One-Time-Password by pressing on this entry or selecting the appropriate button/menu item.
You will receive a six-digit code.
Enter this code into the field labeled "Current code:" in your bwIDM browser window to prove that the connection has worked and then click '''CHECK'''.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
If you do not confirm the token by entering the six-digit code in the "Current code:" field, the token will '''NOT''' be initialized!
|}

6. If everything worked as expected, you will be returned to the '''My Tokens''' screen and there will be a new entry for your software token.
[[File:BwIDM-app.png|center|400px|thumb|Success]]

7. Repeat the process to register additional tokens.
Please register at least the "Backup TAN list" in addition to the hardware/software token you plan to use regularly.

== Registering a new Yubikey OTP Token ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.
|}

[https://developers.yubico.com/OTP/OTPs_Explained.html Yubikey OTP] is even easier and you don't need a device that displays the six-digit code or extra software.
New Yubikeys are already configured to provide Yubikey OTP in slot 1.
If you need to configure your Yubikey, read this [[Registration/2FA/Yubikey|documentation]].

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. If you want to use [https://www.yubico.com/resources/glossary/yubico-otp/ Yubico OTP], you can click '''NEW YUBIKEY TOKEN''' instead.
[[File:BwIDM-token.png|center|600px|thumb|Generate Yubikey OTP]]

3. Yubikey OTP is configured to slot 1 on new Yubikeys, so you only need to click in the text box and then touch the metal part of your Yubikey.
Please refer to this [[Registration/2FA/Yubikey|documentation]] on how to configure your Yubikey.
[[File:BwIDM-yubikey.png|center|400px|thumb|Yubikey OTP]]

4. If everything worked as expected, you will be returned to the '''My Tokens''' screen and there will be a new entry for your Yubikey.
[[File:BwIDM-yubikey2.png|center|400px|thumb|Success]]

5. Repeat the process to register additional tokens.
Please register at least the "Backup TAN list" in addition to the hardware/software token you plan to use regularly.

== Registering a new Yubikey OATH TOTP Token ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.
|}

[https://developers.yubico.com/OATH/ Yubikey OATH TOTP] generates the TANs on your Yubikey and therefore you can use different computers and phones to generate these codes.
Please download and install [https://developers.yubico.com/OATH/YubiKey_OATH_software.html Yubico Authenticator] for desktop (or Android/iOS) first.
Insert your Yubikey in your computer.
"Yubikey OTP" (not "Yubikey OATH TOTP") is even easier and you don't need a device that displays the six-digit code or extra software (go to step [[Registration/2FA#Yubikey_OTP|Yubikey OTP]]).

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. Registering a new Token starts with a click '''NEW SMARTPHONE TOKEN'''.
[[File:BwIDM-token.png|center|600px|thumb|Create a new Token]]

3. A new window opens. Click '''Start''' to generate a new '''QR code'''.
This may take a while.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The QR code contains a key which has to remain secret.
Only use the QR code to link your software token app with bwIDM/bwServices in the next step.
Do not save the QR code, print it out or share it with someone else.
|}

4. Start the Yubico Authenticator on your OS, click the three vertical dots in the upper right corner and select '''Scan QR code'''.
[[File:BwIDM-yubi1.png|center|600px|thumb|QR Code and Yubico Authenticator on Linux]]

5. Yubico Authenticator automatically translates the QR code to a new entry called '''bwIDM''' or '''bwServices''' (Helix).
Click '''Add account'''.
[[File:BwIDM-yubi2.png|center|600px|thumb|Create new TOTP on Yubico Authenticator]]

6. You will receive a six-digit code.
Enter this code into the field labeled "Current code:" in your bwIDM browser window to prove that the connection has worked and then click '''CHECK'''.
[[File:BwIDM-yubi3.png|center|600px|thumb|Verify TOTP]]

7. If everything worked as expected, you will be returned to the '''My Tokens''' screen and there will be a new entry for your software token.
[[File:BwIDM-app.png|center|400px|thumb|Success]]

8. Repeat the process to register additional tokens.
Please register at least the "Backup TAN list" in addition to the hardware/software token you plan to use regularly.

== Backup TAN List ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Passwords from the "Backup TAN list" should only be used if no other token is left.
Please do not use the Backup TANs for regular cluster login, because you have only a limited number of TANs.
Each TAN can only be used once.
Please disable all privacy tools, ad blockers and further add-ons when registering a new Backup TAN list.
|}

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. Please create at least one "Backup TAN list" by clicking '''CREATE NEW TAN LIST'''.
[[File:BwIDM-token.png|center|600px|thumb|Generate Backup TAN list]]

3. Click '''START'''. You will be redirected to the '''My Tokens''' screen and there will be a new entry for your backup TANs.
[[File:BwIDM-tan.png|center|400px|thumb|Success]]

4. Click '''SHOW TANS''', print the codes and keep then in a separate place for emergencies.
[[File:JUSTUS-2-2FA-backup-TAN-list.png|center|800px|thumb|Print Backup TAN List]]

5. Repeat the process to register additional tokens.

== Deactivating a Token ==

Click '''Disable''' next to the Token entry on the '''My Tokens''' screen.

== Deleting a Token ==

After a Token has been disabled a new button labeled '''Delete''' will appear. Click on it to delete the token.

= Returning Users =

Returning users who have already activated one or more tokens must first verify their token before they can create new tokens or deactivate/delete old ones.
If you no longer have valid tokens, you will not be able to create or manage tokens.
In this case, read the section [[Registration/2FA#Lost_Token|Lost Token]].
[[File:BwIDM-totp.png|center|400px|thumb|Returning users must first verify their token.]]

= Lost Token =

If you change your phone, please migrate your tokens first or register your new mobile app under "My Tokens".

'''If you no longer have valid tokens (mobile app, hardware token, Yubikey or backup TAN, i.e. lost or broken smartphone), you can not access the section "My Tokens" anymore.
In this case you will need to contact the [https://bw-support.scc.kit.edu/ ticket system].'''
Open a ticket, include your user name, the name of the bwHPC cluster and ask for a reset of your 2FA tokens.
Please note that this process may take some time and also means additional work for the operators.

Registration/2FA

2025-01-20T11:04:26Z

R Keller: /* How 2FA works on the bwHPC Clusters */ Add FreeOTP

= Generate a Second Factor (2FA) =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
You or your group must take care of the hardware for the second factor yourself. We do not provide hardware keys or mobile devices.
|}

To improve security a '''2-factor authentication mechanism (2FA)''' is being enforced for logins to bwUniCluster/bwForClusters. In addition to the service password a second value, the '''second factor''', has to be entered on every login.

If you only have a mobile device, you can use software-based solutions as a second factor. If you don't want to use a smartphone app, we recommend using a hardware token such as Yubikey.

* If you have any questions about 2FA, please read the [[Registration/2FA/FAQ|FAQs]], and if your question remains unanswered, please submit a support ticket.

* The Pros and Cons of the various solutions can be found in this [[Registration/2FA/ProCon|wiki]].

= How 2FA works on the bwHPC Clusters =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
It is very important that the device that generates the One-Time Passwords and the device which is used to log into the bwUniCluster/bwForClusters are not the same.
Otherwise an attacker who gains access to your system can steal both the service password and the secret key of the Software Token application, which allows them to generate One-Time Passwords and log into the HPC system without your knowledge.
|}

[[File:2fa token code.jpg|right|200px|thumb|Hardware Token for TOTP]]
On the bwUniCluster/bwForClusters we use either six-digit, auto-generated, time-dependent '''one-time passwords''' (TOTP) or Yubico OTP.

'''TOTPs''' are generated by a piece of software which is part of a special hardware device (a '''hardware token''') or of a normal application running on a common device (a '''software token''').

The Token has to be synchronized with a central server before it can be used for authentication and then generates an endless stream of six-digit values (TOTPs) which can only be used once and are only valid during a very short interval of time. This makes it much harder for potential attackers to access the HPC system, even if they know the regular service password.

Typically a new TOTP value is generated every 30 seconds. When the current TOTP value has once been used successfully for a login, it is depleted and one has to wait up to 30 seconds for the next TOTP value. If you don't want to use a smartphone, we recommend using a hardware token, such as Yubikey or another TOTP-compatible device.
We do not recommend the use of TOTP generators for PCs. If the second factor is generated on the same computer on which the login takes place, it is no longer a second factor.

[[File:Otpapp.png|right|150px|thumb|Source: https://getaegis.app]]

The most common solution is to use a mobile device (e.g. your smartphone or tablet) as a Software Token by installing one of the following apps:
* 2FAS for [https://play.google.com/store/apps/details?id=com.twofasapp Android] or [https://apps.apple.com/us/app/2fa-authenticator-2fas/id1217793794 iOS] ([https://2fas.com/ Web Page] and [https://github.com/twofas GitHub], ''Apple and Google Cloud can be used for backups depending on the operating system.'')
* Open Source FreeOTP on [https://f-droid.org/en/packages/org.fedorahosted.freeotp/ F-Droid], [https://play.google.com/store/search?q=freeotp Android] or [https://apps.apple.com/de/app/freeotp-authenticator/id872559395 iOS] with a possibility of local backup files.
* Google Authenticator for [https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2 Android] or [https://apps.apple.com/de/app/google-authenticator/id388497605 iOS] (''Google Cloud can be used for backups, but these backups are not encrypted and can therefore be read by Google!'')
* Microsoft Authenticator for [https://play.google.com/store/apps/details?id=com.azure.authenticator Android] or [https://apps.apple.com/de/app/microsoft-authenticator/id983156458 iOS] ([https://www.microsoft.com/de-de/security/mobile-authenticator-app Web Page])
* LastPass Authenticator for [https://play.google.com/store/apps/details?id=com.lastpass.authenticator Android], [https://apps.apple.com/us/app/lastpass-authenticator/id1079110004 iOS] or [https://lastpass.com/auth/ Windows]
* Aegis Authenticator for [https://play.google.com/store/apps/details?id=com.beemdevelopment.aegis Android (Google Play)] or [https://f-droid.org/en/packages/com.beemdevelopment.aegis/ Android (F-Droid)] ([https://getaegis.app/ Web Page])
* OTP Auth for [https://apps.apple.com/app/otp-auth/id659877384 iOS]
* (''Authy for [https://play.google.com/store/apps/details?id=com.authy.authy Android], [https://apps.apple.com/us/app/authy/id494168017 iOS], [https://authy.com/download/ Mac, Windows or Linux], requires account'')
(''These are only suggestions. You can use any application compatible with the [https://tools.ietf.org/html/rfc6238 TOTP] standard.'')

[https://www.yubico.com/resources/glossary/yubico-otp/ '''Yubico OTP'''] is also supported if you want to use your Yubikey without depending on having a six-digit code displayed.

= Token Management =

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
* Create at least two separate tokens: '''FIRST''' set up a software/hardware TOTP token. '''THEN''' create and print a "backup TAN list". Never create the "backup TAN list" first.
* If you lose access to all your tokens, you will not be able to create new tokens and support will have to reset your tokens manually.
* The "backup TAN list" should always be created and printed in a '''second step'''. The printout should be kept in a separate place for emergencies.
* Please clean up your second factors as soon as you have created new tokens. Tokens that can no longer be used (e.g. because not initialized, smartphone/Yubikey lost, etc.) or an old backup TAN list where you have already used all TANs or there is no printout should be deactivated and deleted.
* Returning users who have already activated one or more tokens must first verify their token before they can create new tokens, see section [[Registration/2FA#Returning_Users|Returning Users]].
* '''Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.''' These tools prevent the registration website from generating new security tokens. When the problems remains (you can not generate the QR code or can not confirm it by clicking CHECK), please try once more with an entirely new unmodified web browser profile.
|}

'''bwUniCluster/bwForCluster Tokens''' are generally managed via the '''Index -> My Tokens''' menu entry on the registration pages for the clusters. Here you can register, activate, deactivate and delete tokens.

To activate the second factor, '''please perform the following steps:'''

1. '''Select the registration server of the cluster''' for which you want to create a second factor and login to it: → [https://login.bwidm.de/user/twofa.xhtml Registration server for '''bwUniCluster 2.0''', '''bwForCluster JUSTUS 2''' and '''bwForCluster NEMO'''] (2FA tokens are valid for all three clusters; KIT members can reuse their existing hardware and software tokens) → [https://bwservices.uni-heidelberg.de/user/twofa.xhtml Registration server for '''bwForCluster Helix''']
[[File:BwIDM-twofa.png|center|600px|thumb|My Tokens]]

2. '''Register a new "[[Registration/2FA#Registering_a_new_Software_Token_using_a_Mobile_APP|Smartphone Token]]"''' or if you own a [https://www.yubico.com/ Yubikey]''' register a new "[[Registration/2FA#Registering_a_new_Yubikey_OTP_Token|Yubikey Token]]"''' or '''"[[Registration/2FA#Registering_a_new_Yubikey_OATH_TOTP_Token|Yubikey OATH TOTP Token]]"''' ([[Registration/2FA#Pros_and_Cons_of_the_different_Solutions|pros ans cons]]).

3. '''Register a new "[[Registration/2FA#Backup_TAN_List|TAN List]]" (backup TAN list)'''.

4. Repeat step 2. for additional tokens.

== Registering a new Software Token using a Mobile APP ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.
|}

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. Registering a new Token starts with a click '''NEW SMARTPHONE TOKEN'''.
[[File:BwIDM-token.png|center|600px|thumb|Create a new Token]]

3. A new window opens. Click '''Start''' to generate a new '''QR code'''.
This may take a while.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The QR code contains a key which has to remain secret.
Only use the QR code to link your software token app with bwIDM/bwServices in the next step.
Do not save the QR code, print it out or share it with someone else.
|}
[[File:BwIDM-qr.png|center|600px|thumb|QR Code for Mobile App]]

4. Start the software token app on your separate device and scan the QR code.
The exact process is a little bit different in every app, but is usually started by pressing on a button with a plus (+) sign or an icon of a QR code.

5. Once the QR code has been loaded into your Software Token app there should be a new entry called '''bwIDM''' (bwUniCluster, JUSTUS 2 and NEMO) or '''bwServices''' (Helix).
Generate an One-Time-Password by pressing on this entry or selecting the appropriate button/menu item.
You will receive a six-digit code.
Enter this code into the field labeled "Current code:" in your bwIDM browser window to prove that the connection has worked and then click '''CHECK'''.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
If you do not confirm the token by entering the six-digit code in the "Current code:" field, the token will '''NOT''' be initialized!
|}

6. If everything worked as expected, you will be returned to the '''My Tokens''' screen and there will be a new entry for your software token.
[[File:BwIDM-app.png|center|400px|thumb|Success]]

7. Repeat the process to register additional tokens.
Please register at least the "Backup TAN list" in addition to the hardware/software token you plan to use regularly.

== Registering a new Yubikey OTP Token ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.
|}

[https://developers.yubico.com/OTP/OTPs_Explained.html Yubikey OTP] is even easier and you don't need a device that displays the six-digit code or extra software.
New Yubikeys are already configured to provide Yubikey OTP in slot 1.
If you need to configure your Yubikey, read this [[Registration/2FA/Yubikey|documentation]].

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. If you want to use [https://www.yubico.com/resources/glossary/yubico-otp/ Yubico OTP], you can click '''NEW YUBIKEY TOKEN''' instead.
[[File:BwIDM-token.png|center|600px|thumb|Generate Yubikey OTP]]

3. Yubikey OTP is configured to slot 1 on new Yubikeys, so you only need to click in the text box and then touch the metal part of your Yubikey.
Please refer to this [[Registration/2FA/Yubikey|documentation]] on how to configure your Yubikey.
[[File:BwIDM-yubikey.png|center|400px|thumb|Yubikey OTP]]

4. If everything worked as expected, you will be returned to the '''My Tokens''' screen and there will be a new entry for your Yubikey.
[[File:BwIDM-yubikey2.png|center|400px|thumb|Success]]

5. Repeat the process to register additional tokens.
Please register at least the "Backup TAN list" in addition to the hardware/software token you plan to use regularly.

== Registering a new Yubikey OATH TOTP Token ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Please disable all privacy tools, ad blockers and further add-ons when registering new tokens.
|}

[https://developers.yubico.com/OATH/ Yubikey OATH TOTP] generates the TANs on your Yubikey and therefore you can use different computers and phones to generate these codes.
Please download and install [https://developers.yubico.com/OATH/YubiKey_OATH_software.html Yubico Authenticator] for desktop (or Android/iOS) first.
Insert your Yubikey in your computer.
"Yubikey OTP" (not "Yubikey OATH TOTP") is even easier and you don't need a device that displays the six-digit code or extra software (go to step [[Registration/2FA#Yubikey_OTP|Yubikey OTP]]).

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. Registering a new Token starts with a click '''NEW SMARTPHONE TOKEN'''.
[[File:BwIDM-token.png|center|600px|thumb|Create a new Token]]

3. A new window opens. Click '''Start''' to generate a new '''QR code'''.
This may take a while.
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
The QR code contains a key which has to remain secret.
Only use the QR code to link your software token app with bwIDM/bwServices in the next step.
Do not save the QR code, print it out or share it with someone else.
|}

4. Start the Yubico Authenticator on your OS, click the three vertical dots in the upper right corner and select '''Scan QR code'''.
[[File:BwIDM-yubi1.png|center|600px|thumb|QR Code and Yubico Authenticator on Linux]]

5. Yubico Authenticator automatically translates the QR code to a new entry called '''bwIDM''' or '''bwServices''' (Helix).
Click '''Add account'''.
[[File:BwIDM-yubi2.png|center|600px|thumb|Create new TOTP on Yubico Authenticator]]

6. You will receive a six-digit code.
Enter this code into the field labeled "Current code:" in your bwIDM browser window to prove that the connection has worked and then click '''CHECK'''.
[[File:BwIDM-yubi3.png|center|600px|thumb|Verify TOTP]]

7. If everything worked as expected, you will be returned to the '''My Tokens''' screen and there will be a new entry for your software token.
[[File:BwIDM-app.png|center|400px|thumb|Success]]

8. Repeat the process to register additional tokens.
Please register at least the "Backup TAN list" in addition to the hardware/software token you plan to use regularly.

== Backup TAN List ==

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Passwords from the "Backup TAN list" should only be used if no other token is left.
Please do not use the Backup TANs for regular cluster login, because you have only a limited number of TANs.
Each TAN can only be used once.
Please disable all privacy tools, ad blockers and further add-ons when registering a new Backup TAN list.
|}

1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.

2. Please create at least one "Backup TAN list" by clicking '''CREATE NEW TAN LIST'''.
[[File:BwIDM-token.png|center|600px|thumb|Generate Backup TAN list]]

3. Click '''START'''. You will be redirected to the '''My Tokens''' screen and there will be a new entry for your backup TANs.
[[File:BwIDM-tan.png|center|400px|thumb|Success]]

4. Click '''SHOW TANS''', print the codes and keep then in a separate place for emergencies.
[[File:JUSTUS-2-2FA-backup-TAN-list.png|center|800px|thumb|Print Backup TAN List]]

5. Repeat the process to register additional tokens.

== Deactivating a Token ==

Click '''Disable''' next to the Token entry on the '''My Tokens''' screen.

== Deleting a Token ==

After a Token has been disabled a new button labeled '''Delete''' will appear. Click on it to delete the token.

= Returning Users =

Returning users who have already activated one or more tokens must first verify their token before they can create new tokens or deactivate/delete old ones.
If you no longer have valid tokens, you will not be able to create or manage tokens.
In this case, read the section [[Registration/2FA#Lost_Token|Lost Token]].
[[File:BwIDM-totp.png|center|400px|thumb|Returning users must first verify their token.]]

= Lost Token =

If you change your phone, please migrate your tokens first or register your new mobile app under "My Tokens".

'''If you no longer have valid tokens (mobile app, hardware token, Yubikey or backup TAN, i.e. lost or broken smartphone), you can not access the section "My Tokens" anymore.
In this case you will need to contact the [https://bw-support.scc.kit.edu/ ticket system].'''
Open a ticket, include your user name, the name of the bwHPC cluster and ask for a reset of your 2FA tokens.
Please note that this process may take some time and also means additional work for the operators.

Registration/2FA/FAQ

2025-01-20T10:58:01Z

R Keller: Mention Auto-login / auto-reconnect for editors with multiple failures.

= Second Factor (2FA) FAQ =

== How does 2FA work? ==

2FA uses two out of multiple factors for authentication. Factors are:

* Something you know (password or PIN)
* Something you own (mobile phone or security device)
* Something you are (biometric features)

The principle idea is that even if an attacking party manages to get hold of one factor, it still has to acquire a second factor for a completely successful attack. Such a completely successful attack results in a theft of your identify, possibly leading to malicious acts committed on your behalf and to your disadvantage.

== Why is 2FA necessary? ==

2FA is the current state-of-the-art method to prevent or mitigate cyber attacks. It is far superior to using a single shared secret (i.e. "password"), even if very strong passwords are used. 2FA mitigates fishing attempts, person-in-the-middle attacks and even cases where local computers (i.e. notebooks or workstations) have been stolen or compromised.

2FA is constantly replacing password-only authentication schemes in a networked world to improve cyber security and prevent identity theft.

== Why is 2FA so resilient? ==

When computers become comprised, passwords and private keys can easily be recorded, copied and used for future attacks at any time from any place. Without a second factor, an attacking party cannot proceed autonomously without further involvement of the victim. As soon as a second factor is required, i.e. something you own or something you are, an attacking party lacks an element which it cannot provide or simulate. Furthermore, these second factors are securely contained in uncompromisable areas on the respective devices (e.g. phones or hardware security keys). For authentication, a challenge is issued that only the owner of the device can answer. The secret never leaves the secure area during this challenge.

== I am using a password manager in conjunction with passwords that are individual per service, long and complex and thus impossible to guess. Isn't this good enough? ==

This is very good practice and in fact recommended for all systems that do not support 2FA yet. However, consider this: Your passwords need to be interpreted by the remote systems and are therefore available to the remote systems in clear text. If a remote system has been compromised, your password and thus your identity are compromised as well. And if an attacker has compromised your host, they can read or intercept the password, but not the second factor.

== I am using SSH private keys and secure them with a passphrase. Isn't this good enough? ==

This is very good and in fact recommended for all systems reachable via SSH that do not support 2FA yet. However, consider this: Computers are vulnerable by remote attacks (a thoughtless wrong click can be sufficient) and local attacks (somebody manipulates your machine when left unattended).

In case you become the victim of a successful attack, your machine will be compromised and every action will be recorded and monitored without you knowing, possibly for a very long time.

2FA offers at least some mitigation in this case.

== What happens if my local computer is compromised? ==

2FA does not prevent this from happening, but it offers mitigation. The services you use that are protected with 2FA are only partially compromised:

* An attacking party cannot use the compromised secret ("password") that it acquired from monitoring your local computer to gain remote access to the 2FA secured service. To initiate a remote session from the attacking parties' computers to the 2FA secured service, the second factor is required, which the attacking party does not have and cannot simulate, since it requires physical ownership of the respective security device.
* An attacking party cannot initiate a session from your compromised local computer to the remote computer without your active participation. The attacking party would have to present a second factor which requires a non-automatic action originating from the respective physical security device.
* ATTENTION: An attacking party can still monitor and possibly hijack connections from your compromised local computer to remote systems at the time these connections are actively initiated by you. However, exploitation is much more difficult than copying a simple shared password and needs to be adapted to the service which is attacked. For the attacking party, this bears at least a heightened risk of being discovered.
* Your passwords, passphrases and secret SSH keys are likely to have been compromised and therefore have to be exchanged after the attack becomes known to you. Your other factors have never left the external devices you keep them on and don't need to be changed. You have only answered cryptographic challenges with them, and these challenges and their corresponding answers cannot be predetermined by an attacker.

== What happens if I lose my security hardware token? ==

An attacking party would need to combine a found or stolen second factor (e.g. phone or security key) with a software attack, i.e. infecting your local computer, to get hold of the secret you know (i.e. PIN or password) to access services secured by 2FA. Nevertheless, you should remove old or lost second factors and replace them with new ones.

== What happens if I lose my phone? ==

Modern phones are protected by biometric measures. The biometric secrets are kept in a secure enclave on the device and cannot be extracted once registered. Neither an attacking party nor you nor the manufacturer can extract the secret data after the initial transfer. It is only ever used inside the secure area on the device to answer cryptographic challenges. In case you don't trust the manufacturers to keep your biometric data safe, you can use a PIN as an alternative. However, you still have to accept a certain level of trust towards the phone manufacturer. This is unavoidable unless you have the means to provide the complete chain of security yourself.

To avoid losing your access, some apps allow secure backups of your secondary factors, but to be safe you should register more than one second factor including backup TANs.

== Whats happens if I lose all secondary factors? ==

There are recovery procedures which vary by service. A common method are recovery codes (backup TANs), which you can print out and deposit in a safe location. Recovery codes work like master keys, so they must never be kept on the same device which is used to access the 2FA protected service.
If all else fails, most services will grant you access again after a thorough identity verification.

== Where should I be careful using 2FA? ==
Please be aware, that multiple wrong OTP authentications in-a-row may deactivate '''all''' OTP tokens (including your TAN Backup-List) -- you may need to re-activate all tokens by opening a Ticket on the bwHPC Support portal.
Therefore be careful using editors like EMACS or Visual Studio Code, which may auto-reconnect after your computer went to sleep or switched networks.

DACHS/Hardware

2025-01-14T14:01:22Z

R Keller: /* Storage Architecture */

= Architecture of DACHS =
The Datenanalyse Cluster der Hochschulen (DACHS) is a parallel computer with distributed memory connected over Infiniband and Ethernet. The compute nodes contain at least dual AMD processors, at least 384GB of local memory, 2 TB local NVMe-based disc storage and accelerators as shown in the table below. With BeeGFS a fast and scalable filesystem is provided via Infiniband to all login and compute nodes

The Operating System is Rocky-Linux 9.4 (which is based on RHEL).
The setup is kept in-line (with regard to Software, Setup and general usage) and thus mostly equivalent to bwHPC and bwUniCluster in particular.

= Components of DACHS =

{| class="wikitable"
|-
! style="width:9%"|
! style="width:13%"| Compute nodes "L40S"
! style="width:13%"| Compute nodes "H100"
! style="width:13%"| Compute nodes "AMD_APU"
! style="width:13%"| Login
|-
!scope="column"| Number of nodes
| 45
| 1
| 1
| 2
|-
!scope="column"| Processors
| AMD EPYC 9254
| AMD EPYC 9454
| AMD MI300A
| AMD EPYC 9254
|-
!scope="column"| Number of sockets
| 2
| 2
| 4
| 2
|-
!scope="column"| Processor frequency (GHz)
| 2.9 Ghz
| 2.75 Ghz
| 2.1 Ghz
| 2.9 Ghz
|-
!scope="column"| Total number of cores
| 48
| 96
| 96
| 48
|-
!scope="column"| Main memory
| 384 GB
| 1536 GB
| 512 GB
| 384 GB
|-
!scope="column"| Local SSD
| 1,92 TB NVMe
| 1,92 TB NVMe
| 1,92 TB NVMe
| 1,92 TB NVMe
|-
!scope="column"| Accelerators
| 1x NVIDIA L40S
| 8x NVIDIA H100
| 4x AMD MI300A
| -
|-
!scope="column"| Accelerator memory
| 48 GB
| 8x 80 GB
| 4x 128 GB
| -
|-
!scope="column"| Interconnect
| IB HDR100
| IB HDR100
| IB HDR100
| IB HDR100
|}
Table 1: Properties of the nodes

== Storage Architecture ==
The system features a 700 TB large BeeGFS filesystem available on login and compute nodes.
Please note: there is a hard file size quota per partner organization and a soft quota per user on Your HOME.
Users will be notified by E-Mail if the quota is to be reached.

Please '''do make usage''' of [[Workspace | Work Space mechanism]] for larger files.