<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=S+Braun</id>
	<title>bwHPC Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=S+Braun"/>
	<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/e/Special:Contributions/S_Braun"/>
	<updated>2026-04-23T10:53:32Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.17</generator>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Registration/2FA&amp;diff=15989</id>
		<title>Registration/2FA</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Registration/2FA&amp;diff=15989"/>
		<updated>2026-04-21T13:28:23Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Token Management */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Generate a Second Factor (2FA) =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
You or your group must take care of the hardware for the second factor yourself. We do not provide hardware keys or mobile devices.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
To improve security, a &#039;&#039;&#039;2-factor authentication mechanism (2FA)&#039;&#039;&#039; is enforced for logins to bwUniCluster/bwForClusters. In addition to the service password, a second value, the &#039;&#039;&#039;second factor&#039;&#039;&#039;, has to be entered on every login.&lt;br /&gt;
&lt;br /&gt;
If you have a mobile device, you can use a software-based solution as a second factor. If you don&#039;t want to use a smartphone app, we recommend a hardware token such as a Yubikey.&lt;br /&gt;
&lt;br /&gt;
* If you have any questions about 2FA, please read the [[Registration/2FA/FAQ|FAQs]], and if your question remains unanswered, please submit a support ticket.&lt;br /&gt;
&lt;br /&gt;
* The pros and cons of the various solutions can be found on this [[Registration/2FA/ProCon|wiki page]].&lt;br /&gt;
&lt;br /&gt;
= How 2FA works on the bwHPC Clusters =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
It is very important that the device that generates the One-Time Passwords and the device used to log into the bwUniCluster/bwForClusters are not the same.&lt;br /&gt;
Otherwise an attacker who gains access to your system can steal both the service password and the secret key of the software token application, allowing them to generate One-Time Passwords and log into the HPC system without your knowledge.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
[[File:2fa token code.jpg|right|200px|thumb|Hardware Token for TOTP]]&lt;br /&gt;
On the bwUniCluster/bwForClusters we use either six-digit, auto-generated, time-dependent &#039;&#039;&#039;one-time passwords&#039;&#039;&#039; (TOTP) or Yubico OTP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;TOTPs&#039;&#039;&#039; are generated either by a dedicated hardware device (a &#039;&#039;&#039;hardware token&#039;&#039;&#039;) or by an ordinary application running on a common device (a &#039;&#039;&#039;software token&#039;&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
The token has to be synchronized with a central server before it can be used for authentication. It then generates an endless stream of six-digit values (TOTPs), each of which can be used only once and is valid only for a very short interval of time. This makes it much harder for potential attackers to access the HPC system, even if they know the regular service password.&lt;br /&gt;
&lt;br /&gt;
Typically a new TOTP value is generated every 30 seconds. Once the current TOTP value has been used successfully for a login, it is depleted and you have to wait up to 30 seconds for the next value. If you don&#039;t want to use a smartphone, we recommend a hardware token, such as a Yubikey or another TOTP-compatible device.&lt;br /&gt;
We do not recommend TOTP generators running on PCs: if the second factor is generated on the same computer on which the login takes place, it is no longer a second factor.&lt;br /&gt;
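For the curious, the 30-second TOTP mechanism described above can be sketched in a few lines of Python. This is a minimal illustration of RFC 6238 with HMAC-SHA1 (the variant most authenticator apps use); the secret shown is the RFC test key, not a real token secret:

```python
import hmac
import hashlib
import struct

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP value using HMAC-SHA1."""
    counter = unix_time // step                       # number of 30-second intervals since the epoch
    msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter (RFC 4226)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation: low nibble picks an offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: ASCII secret "12345678901234567890" at T = 59 s
print(totp(b"12345678901234567890", 59))  # → 287082
```

Because the code depends only on the shared secret and the current time interval, the server and your token app compute the same six digits independently, which is why the token must be time-synchronized once at registration.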
&lt;br /&gt;
[[File:Otpapp.png|right|150px|thumb|Source: https://getaegis.app]]&lt;br /&gt;
&lt;br /&gt;
The most common solution is to use a mobile device (e.g. your smartphone or tablet) as a Software Token by installing one of the following apps:&lt;br /&gt;
* 2FAS for [https://play.google.com/store/apps/details?id=com.twofasapp Android] or [https://apps.apple.com/us/app/2fa-authenticator-2fas/id1217793794 iOS] ([https://2fas.com/ Web Page] and [https://github.com/twofas GitHub], &#039;&#039;Apple and Google Cloud can be used for backups depending on the operating system.&#039;&#039;)&lt;br /&gt;
* Open Source FreeOTP ([https://github.com/freeotp GitHub]) on [https://f-droid.org/en/packages/org.fedorahosted.freeotp/ F-Droid], [https://play.google.com/store/search?q=freeotp Android] or [https://apps.apple.com/de/app/freeotp-authenticator/id872559395 iOS], with support for local backup files.&lt;br /&gt;
* Google Authenticator for [https://play.google.com/store/apps/details?id=com.google.android.apps.authenticator2 Android] or [https://apps.apple.com/de/app/google-authenticator/id388497605 iOS] (&#039;&#039;Google Cloud can be used for backups, but these backups are not encrypted and can therefore be read by Google!&#039;&#039;)&lt;br /&gt;
* Microsoft Authenticator for [https://play.google.com/store/apps/details?id=com.azure.authenticator Android] or [https://apps.apple.com/de/app/microsoft-authenticator/id983156458 iOS] ([https://www.microsoft.com/de-de/security/mobile-authenticator-app Web Page])&lt;br /&gt;
* LastPass Authenticator for [https://play.google.com/store/apps/details?id=com.lastpass.authenticator Android], [https://apps.apple.com/us/app/lastpass-authenticator/id1079110004 iOS] or [https://lastpass.com/auth/ Windows]&lt;br /&gt;
* Aegis Authenticator for [https://play.google.com/store/apps/details?id=com.beemdevelopment.aegis Android (Google Play)] or [https://f-droid.org/en/packages/com.beemdevelopment.aegis/ Android (F-Droid)] ([https://getaegis.app/ Web Page])&lt;br /&gt;
* OTP Auth for [https://apps.apple.com/app/otp-auth/id659877384 iOS]&lt;br /&gt;
* (&#039;&#039;Authy for [https://play.google.com/store/apps/details?id=com.authy.authy Android], [https://apps.apple.com/us/app/authy/id494168017 iOS], [https://authy.com/download/ Mac, Windows or Linux], requires account&#039;&#039;)&lt;br /&gt;
(&#039;&#039;These are only suggestions. You can use any application compatible with the [https://tools.ietf.org/html/rfc6238 TOTP] standard.&#039;&#039;)&lt;br /&gt;
&lt;br /&gt;
[https://www.yubico.com/resources/glossary/yubico-otp/ &#039;&#039;&#039;Yubico OTP&#039;&#039;&#039;] is also supported if you want to use your Yubikey without depending on having a six-digit code displayed.&lt;br /&gt;
&lt;br /&gt;
= Token Management =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
* Create at least two separate tokens: &#039;&#039;&#039;FIRST&#039;&#039;&#039; set up a software/hardware TOTP token. &#039;&#039;&#039;THEN&#039;&#039;&#039; create and print a &amp;quot;backup TAN list&amp;quot;. Never create the &amp;quot;backup TAN list&amp;quot; first.&lt;br /&gt;
* If you lose access to all your tokens, you will not be able to create new tokens and support will have to reset your tokens manually.&lt;br /&gt;
* The &amp;quot;backup TAN list&amp;quot; should always be created and printed in a &#039;&#039;&#039;second step&#039;&#039;&#039;. The printout should be kept in a separate place for emergencies.&lt;br /&gt;
* Please clean up your second factors as soon as you have created new tokens. Tokens that can no longer be used (e.g. because they were never initialized, or the smartphone/Yubikey was lost) and old backup TAN lists whose TANs have all been used or for which no printout exists should be deactivated and deleted.&lt;br /&gt;
* Returning users who have already activated one or more tokens must first verify their token before they can create new tokens, see section [[Registration/2FA#Returning_Users|Returning Users]].&lt;br /&gt;
* &#039;&#039;&#039;Please disable all privacy tools, ad blockers and other add-ons when registering new tokens.&#039;&#039;&#039; These tools can prevent the registration website from generating new security tokens. If the problem persists (you cannot generate the QR code or cannot confirm it by clicking CHECK), please try again with an entirely new, unmodified web browser profile.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;bwUniCluster/bwForCluster Tokens&#039;&#039;&#039; are generally managed via the &#039;&#039;&#039;Index -&amp;gt; My Tokens&#039;&#039;&#039; menu entry on the registration pages for the clusters. Here you can register, activate, deactivate and delete tokens.&lt;br /&gt;
&lt;br /&gt;
To activate the second factor, &#039;&#039;&#039;please perform the following steps:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Select the registration server of the cluster&#039;&#039;&#039; for which you want to create a second factor and login to it:&amp;lt;br/&amp;gt; &amp;amp;rarr; [https://login.bwidm.de/user/twofa.xhtml Registration server for &#039;&#039;&#039;bwUniCluster 3.0&#039;&#039;&#039;, &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; and &#039;&#039;&#039;bwForCluster NEMO 2&#039;&#039;&#039;] (2FA tokens are valid for all three clusters; KIT members can reuse their existing hardware and software tokens)&amp;lt;br/&amp;gt; &amp;amp;rarr; [https://bwservices.uni-heidelberg.de/user/twofa.xhtml Registration server for &#039;&#039;&#039;bwForCluster Helix&#039;&#039;&#039;]&lt;br /&gt;
[[File:BwIDM-twofa.png|center|600px|thumb|My Tokens]]&lt;br /&gt;
&lt;br /&gt;
2. &#039;&#039;&#039;Register a new &amp;quot;[[Registration/2FA#Registering_a_new_Software_Token_using_a_Mobile_APP|Smartphone Token]]&amp;quot;&#039;&#039;&#039; or if you own a [https://www.yubico.com/ Yubikey]&#039;&#039;&#039; register a new &amp;quot;[[Registration/2FA#Registering_a_new_Yubikey_OTP_Token|Yubikey Token]]&amp;quot;&#039;&#039;&#039; or &#039;&#039;&#039;&amp;quot;[[Registration/2FA#Registering_a_new_Yubikey_OATH_TOTP_Token|Yubikey OATH TOTP Token]]&amp;quot;&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
3. &#039;&#039;&#039;Register a new &amp;quot;[[Registration/2FA#Backup_TAN_List|TAN List]]&amp;quot; (backup TAN list)&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
4. Repeat step 2 for additional tokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Registering a new Software Token using a Mobile APP ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Please disable all privacy tools, ad blockers and other add-ons when registering new tokens.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.&lt;br /&gt;
&lt;br /&gt;
2. Registering a new token starts with a click on &#039;&#039;&#039;NEW SMARTPHONE TOKEN&#039;&#039;&#039;.&lt;br /&gt;
[[File:BwIDM-token.png|center|600px|thumb|Create a new Token]]&lt;br /&gt;
&lt;br /&gt;
3. A new window opens. Click &#039;&#039;&#039;Start&#039;&#039;&#039; to generate a new &#039;&#039;&#039;QR code&#039;&#039;&#039;.&lt;br /&gt;
This may take a while.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
The QR code contains a key which has to remain secret.&lt;br /&gt;
Only use the QR code to link your software token app with bwIDM/bwServices in the next step.&lt;br /&gt;
Do not save the QR code, print it out or share it with someone else.&lt;br /&gt;
|}&lt;br /&gt;
[[File:BwIDM-qr.png|center|600px|thumb|QR Code for Mobile App]]&lt;br /&gt;
&lt;br /&gt;
4. Start the software token app on your separate device and scan the QR code.&lt;br /&gt;
The exact process is a little bit different in every app, but is usually started by pressing on a button with a plus (+) sign or an icon of a QR code.&lt;br /&gt;
&lt;br /&gt;
5. Once the QR code has been loaded into your Software Token app there should be a new entry called &#039;&#039;&#039;bwIDM&#039;&#039;&#039; (bwUniCluster, JUSTUS 2 and NEMO2) or &#039;&#039;&#039;bwServices&#039;&#039;&#039; (Helix).&lt;br /&gt;
Generate a One-Time Password by tapping this entry or selecting the appropriate button/menu item.&lt;br /&gt;
You will receive a six-digit code.&lt;br /&gt;
Enter this code into the field labeled &amp;quot;Current code:&amp;quot; in your bwIDM browser window to prove that the connection has worked and then click &#039;&#039;&#039;CHECK&#039;&#039;&#039;.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
If you do not confirm the token by entering the six-digit code in the &amp;quot;Current code:&amp;quot; field, the token will &#039;&#039;&#039;NOT&#039;&#039;&#039; be initialized!&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
6. If everything worked as expected, you will be returned to the &#039;&#039;&#039;My Tokens&#039;&#039;&#039; screen and there will be a new entry for your software token.&lt;br /&gt;
[[File:BwIDM-app.png|center|400px|thumb|Success]]&lt;br /&gt;
&lt;br /&gt;
7. Repeat the process to register additional tokens.&lt;br /&gt;
Please register at least the &amp;quot;Backup TAN list&amp;quot; in addition to the hardware/software token you plan to use regularly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Registering a new Yubikey OTP Token ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Please disable all privacy tools, ad blockers and other add-ons when registering new tokens.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
[https://developers.yubico.com/OTP/OTPs_Explained.html Yubikey OTP] is even easier: you don&#039;t need a device that displays a six-digit code, nor any extra software.&lt;br /&gt;
New Yubikeys are already configured to provide Yubikey OTP in slot 1.&lt;br /&gt;
If you need to configure your Yubikey, read this [[Registration/2FA/Yubikey|documentation]].&lt;br /&gt;
&lt;br /&gt;
1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.&lt;br /&gt;
&lt;br /&gt;
2. If you want to use [https://www.yubico.com/resources/glossary/yubico-otp/ Yubico OTP], you can click &#039;&#039;&#039;NEW YUBIKEY TOKEN&#039;&#039;&#039; instead.&lt;br /&gt;
[[File:BwIDM-token.png|center|600px|thumb|Generate Yubikey OTP]]&lt;br /&gt;
&lt;br /&gt;
3. Yubikey OTP is configured in slot 1 on new Yubikeys, so you only need to click in the text box and then touch the metal part of your Yubikey.&lt;br /&gt;
Please refer to this [[Registration/2FA/Yubikey|documentation]] on how to configure your Yubikey.&lt;br /&gt;
[[File:BwIDM-yubikey.png|center|400px|thumb|Yubikey OTP]]&lt;br /&gt;
&lt;br /&gt;
4. If everything worked as expected, you will be returned to the &#039;&#039;&#039;My Tokens&#039;&#039;&#039; screen and there will be a new entry for your Yubikey.&lt;br /&gt;
[[File:BwIDM-yubikey2.png|center|400px|thumb|Success]]&lt;br /&gt;
&lt;br /&gt;
5. Repeat the process to register additional tokens.&lt;br /&gt;
Please register at least the &amp;quot;Backup TAN list&amp;quot; in addition to the hardware/software token you plan to use regularly.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Registering a new Yubikey OATH TOTP Token ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Please disable all privacy tools, ad blockers and other add-ons when registering new tokens.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
[https://developers.yubico.com/OATH/ Yubikey OATH TOTP] generates the one-time passwords on your Yubikey, so you can use different computers and phones to display these codes.&lt;br /&gt;
Please download and install [https://developers.yubico.com/OATH/YubiKey_OATH_software.html Yubico Authenticator] for desktop (or Android/iOS) first.&lt;br /&gt;
Insert your Yubikey in your computer.&lt;br /&gt;
&amp;quot;Yubikey OTP&amp;quot; (not &amp;quot;Yubikey OATH TOTP&amp;quot;) is even easier: you don&#039;t need a device that displays the six-digit code, nor any extra software (see section [[Registration/2FA#Yubikey_OTP|Yubikey OTP]]).&lt;br /&gt;
&lt;br /&gt;
1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.&lt;br /&gt;
&lt;br /&gt;
2. Registering a new token starts with a click on &#039;&#039;&#039;NEW SMARTPHONE TOKEN&#039;&#039;&#039;.&lt;br /&gt;
[[File:BwIDM-token.png|center|600px|thumb|Create a new Token]]&lt;br /&gt;
&lt;br /&gt;
3. A new window opens. Click &#039;&#039;&#039;Start&#039;&#039;&#039; to generate a new &#039;&#039;&#039;QR code&#039;&#039;&#039;.&lt;br /&gt;
This may take a while.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
The QR code contains a key which has to remain secret.&lt;br /&gt;
Only use the QR code to link your software token app with bwIDM/bwServices in the next step.&lt;br /&gt;
Do not save the QR code, print it out or share it with someone else.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
4. Start Yubico Authenticator on your computer, click the three vertical dots in the upper right corner and select &#039;&#039;&#039;Scan QR code&#039;&#039;&#039;.&lt;br /&gt;
[[File:BwIDM-yubi1.png|center|600px|thumb|QR Code and Yubico Authenticator on Linux]]&lt;br /&gt;
&lt;br /&gt;
5. Yubico Authenticator automatically translates the QR code to a new entry called &#039;&#039;&#039;bwIDM&#039;&#039;&#039; or &#039;&#039;&#039;bwServices&#039;&#039;&#039; (Helix).&lt;br /&gt;
Click &#039;&#039;&#039;Add account&#039;&#039;&#039;.&lt;br /&gt;
[[File:BwIDM-yubi2.png|center|600px|thumb|Create new TOTP on Yubico Authenticator]]&lt;br /&gt;
&lt;br /&gt;
6. You will receive a six-digit code.&lt;br /&gt;
Enter this code into the field labeled &amp;quot;Current code:&amp;quot; in your bwIDM browser window to prove that the connection has worked and then click &#039;&#039;&#039;CHECK&#039;&#039;&#039;.&lt;br /&gt;
[[File:BwIDM-yubi3.png|center|600px|thumb|Verify TOTP]]&lt;br /&gt;
&lt;br /&gt;
7. If everything worked as expected, you will be returned to the &#039;&#039;&#039;My Tokens&#039;&#039;&#039; screen and there will be a new entry for your software token.&lt;br /&gt;
[[File:BwIDM-app.png|center|400px|thumb|Success]]&lt;br /&gt;
&lt;br /&gt;
8. Repeat the process to register additional tokens.&lt;br /&gt;
Please register at least the &amp;quot;Backup TAN list&amp;quot; in addition to the hardware/software token you plan to use regularly.&lt;br /&gt;
&lt;br /&gt;
== Backup TAN List ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Passwords from the &amp;quot;Backup TAN list&amp;quot; should only be used if no other token is left.&lt;br /&gt;
Please do not use the Backup TANs for regular cluster login, because you have only a limited number of TANs.&lt;br /&gt;
Each TAN can only be used once.&lt;br /&gt;
Please disable all privacy tools, ad blockers and other add-ons when registering a new Backup TAN list.&lt;br /&gt;
|}&lt;br /&gt;
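Conceptually, a backup TAN list is just a set of single-use random codes. The following Python sketch is purely an illustration of that idea (it is not the bwIDM server's actual algorithm; always use the TAN list generated by the registration server):

```python
import secrets

def make_tan_list(count: int = 10, digits: int = 6) -> list[str]:
    """Generate `count` random numeric TANs.
    Illustration only: real backup TANs come from the bwIDM registration server."""
    return [f"{secrets.randbelow(10 ** digits):0{digits}d}" for _ in range(count)]

# Print the list once, keep the hard copy offline; each TAN is single-use.
for tan in make_tan_list():
    print(tan)
```

Unlike TOTPs, these codes are not time-based, which is why the list is finite and why each TAN is invalidated after a single login.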
&lt;br /&gt;
1. Select the [[Registration/2FA#Token_Management|registration server]] of the cluster for which you want to create a second factor and login to it.&lt;br /&gt;
&lt;br /&gt;
2. Please create at least one &amp;quot;Backup TAN list&amp;quot; by clicking &#039;&#039;&#039;CREATE NEW TAN LIST&#039;&#039;&#039;.&lt;br /&gt;
[[File:BwIDM-token.png|center|600px|thumb|Generate Backup TAN list]]&lt;br /&gt;
&lt;br /&gt;
3. Click &#039;&#039;&#039;START&#039;&#039;&#039;. You will be redirected to the &#039;&#039;&#039;My Tokens&#039;&#039;&#039; screen and there will be a new entry for your backup TANs.&lt;br /&gt;
[[File:BwIDM-tan.png|center|400px|thumb|Success]]&lt;br /&gt;
&lt;br /&gt;
4. Click &#039;&#039;&#039;SHOW TANS&#039;&#039;&#039;, print the codes and keep the printout in a separate place for emergencies.&lt;br /&gt;
[[File:JUSTUS-2-2FA-backup-TAN-list.png|center|800px|thumb|Print Backup TAN List]]&lt;br /&gt;
&lt;br /&gt;
5. Repeat the process to register additional tokens.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Deactivating a Token ==&lt;br /&gt;
&lt;br /&gt;
Click &#039;&#039;&#039;Disable&#039;&#039;&#039; next to the Token entry on the &#039;&#039;&#039;My Tokens&#039;&#039;&#039; screen.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Deleting a Token ==&lt;br /&gt;
&lt;br /&gt;
After a token has been disabled, a new button labeled &#039;&#039;&#039;Delete&#039;&#039;&#039; will appear. Click it to delete the token.&lt;br /&gt;
&lt;br /&gt;
= Returning Users =&lt;br /&gt;
&lt;br /&gt;
Returning users who have already activated one or more tokens must first verify their token before they can create new tokens or deactivate/delete old ones.&lt;br /&gt;
If you no longer have valid tokens, you will not be able to create or manage tokens. &lt;br /&gt;
In this case, read the section [[Registration/2FA#Lost_Token|Lost Token]].&lt;br /&gt;
[[File:BwIDM-totp.png|center|400px|thumb|Returning users must first verify their token.]]&lt;br /&gt;
&lt;br /&gt;
= Lost Token =&lt;br /&gt;
&lt;br /&gt;
If you change your phone, please migrate your tokens first or register your new mobile app under &amp;quot;My Tokens&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;If you no longer have any valid tokens (mobile app, hardware token, Yubikey or backup TAN list, e.g. because of a lost or broken smartphone), you can no longer access the &amp;quot;My Tokens&amp;quot; section.&lt;br /&gt;
In this case you will need to contact the [https://www.bwhpc.de/supportportal ticket system].&#039;&#039;&#039;&lt;br /&gt;
Open a ticket, include your user name and the name of the bwHPC cluster, and ask for a reset of your 2FA tokens.&lt;br /&gt;
Please note that this process may take some time and also means additional work for the operators.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Hardware_and_Architecture&amp;diff=15988</id>
		<title>BwUniCluster3.0/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Hardware_and_Architecture&amp;diff=15988"/>
		<updated>2026-04-21T10:59:26Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Compute nodes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Architecture of bwUniCluster 3.0 =&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0&#039;&#039;&#039; is a parallel computer with distributed memory. &lt;br /&gt;
It consists of the bwUniCluster 3.0 components procured in 2024 and also includes the additional compute nodes which were procured as an extension to the bwUniCluster 2.0 in 2022.&lt;br /&gt;
 &lt;br /&gt;
Each node of the system consists of two Intel Xeon or AMD EPYC processors, local memory, local storage, network adapters and optional accelerators (NVIDIA A100 and H100, AMD Instinct MI300A). All nodes are connected via a fast InfiniBand interconnect.&lt;br /&gt;
&lt;br /&gt;
The parallel file system (Lustre) is connected to the InfiniBand switch of the compute cluster. This provides a fast and scalable parallel file &lt;br /&gt;
system.&lt;br /&gt;
&lt;br /&gt;
The operating system on each node is Red Hat Enterprise Linux (RHEL) 9.4.&lt;br /&gt;
&lt;br /&gt;
The individual nodes of the system act in different roles. From an end user&#039;s point of view, the relevant groups of nodes are login nodes and compute nodes. File server nodes and administrative server nodes are not accessible to users.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Login Nodes&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
The login nodes are the only nodes directly accessible by end users. These nodes are used for interactive login, file management, program development, and interactive pre- and post-processing.&lt;br /&gt;
There are two nodes dedicated to this service, but both can be reached from a single address: &amp;lt;code&amp;gt;uc3.scc.kit.edu&amp;lt;/code&amp;gt;. A DNS round-robin alias distributes login sessions to the login nodes.&lt;br /&gt;
To prevent login nodes from being used for activities that are not permitted there and that affect the user experience of other users, &#039;&#039;&#039;long-running and/or compute-intensive tasks are periodically terminated without any prior warning&#039;&#039;&#039;. Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute Nodes&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
The majority of nodes are compute nodes which are managed by a batch system. Users submit their jobs to the SLURM batch system and a job is executed when the required resources become available (depending on its fair-share priority).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;File Systems&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
bwUniCluster 3.0 comprises two parallel file systems based on Lustre.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:uc3.png|center|800px]]&lt;br /&gt;
&lt;br /&gt;
= Compute Resources =&lt;br /&gt;
&lt;br /&gt;
== Login nodes ==&lt;br /&gt;
&lt;br /&gt;
After a successful [[BwUniCluster3.0/Login|login]], users find themselves on one of the so-called login nodes. Technically, these largely correspond to a standard CPU node, i.e. users have two AMD EPYC 9454 processors with a total of 96 cores at their disposal. Login nodes are the bridgehead for accessing computing resources.&lt;br /&gt;
Data and software are organized here, computing jobs are initiated and managed, and computing resources allocated for interactive use can also be accessed from here.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Any compute intensive job running on the login nodes will be terminated without any notice.&#039;&#039;&#039;&amp;lt;br/&amp;gt;&lt;br /&gt;
Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Compute nodes ==&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, tasks are executed automatically via a batch script, or the nodes can be used interactively. Please refer to [[BwUniCluster3.0/Running_Jobs|Running Jobs]] on how to request resources.&amp;lt;br&amp;gt;&lt;br /&gt;
The following compute node types are available:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;CPU nodes&amp;lt;/b&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;Standard&#039;&#039;&#039;: Two AMD EPYC 9454 processors per node with a total of 96 physical CPU cores or 192 logical cores (Hyper-Threading) per node. These nodes were procured in 2024.&lt;br /&gt;
* &#039;&#039;&#039;Ice Lake&#039;&#039;&#039;: Two Intel Xeon Platinum 8358 processors per node with a total of 64 physical CPU cores or 128 logical cores (Hyper-Threading) per node. These nodes were procured in 2022 as an extension to bwUniCluster 2.0.&lt;br /&gt;
* &#039;&#039;&#039;High Memory&#039;&#039;&#039;: Similar to the standard nodes, but with six times the memory.&lt;br /&gt;
&amp;lt;b&amp;gt;GPU nodes&amp;lt;/b&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;NVIDIA GPU x4&#039;&#039;&#039;: Similar to the standard nodes, but with larger memory and four NVIDIA H100 GPUs.&lt;br /&gt;
* &#039;&#039;&#039;AMD GPU x4&#039;&#039;&#039;: AMD&#039;s accelerated processing unit (APU) MI300A with 4 CPU sockets and 4 compute units which share the same high-bandwidth memory (HBM).&lt;br /&gt;
* &#039;&#039;&#039;Ice Lake NVIDIA GPU x4&#039;&#039;&#039;: Similar to the Ice Lake nodes, but with larger memory and four NVIDIA A100 or H100 GPUs.&lt;br /&gt;
* &#039;&#039;&#039;Cascade Lake NVIDIA GPU x4&#039;&#039;&#039;: Nodes with four NVIDIA A100 GPUs.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;Cascade Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Login nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Availability in [[BwUniCluster3.0/Running_Jobs#Queues_on_bwUniCluster_3.0| queues]]&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Number of nodes&lt;br /&gt;
| 272&lt;br /&gt;
| 80&lt;br /&gt;
| 5&lt;br /&gt;
| 12&lt;br /&gt;
| 1&lt;br /&gt;
| 15&lt;br /&gt;
| 19&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Processors&lt;br /&gt;
| Intel Xeon Platinum 8358&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD Zen 4&lt;br /&gt;
| Intel Xeon Platinum 8358&lt;br /&gt;
| Intel Xeon Gold 6248R&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Number of sockets&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 4&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Total number of cores&lt;br /&gt;
| 64&lt;br /&gt;
| 96&lt;br /&gt;
| 96&lt;br /&gt;
| 96&lt;br /&gt;
| 96 (4x 24)&lt;br /&gt;
| 64&lt;br /&gt;
| 48&lt;br /&gt;
| 96&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Main memory&lt;br /&gt;
| 256 GiB&lt;br /&gt;
| 384 GiB&lt;br /&gt;
| 2304 GiB&lt;br /&gt;
| 768 GiB&lt;br /&gt;
| 4x 128 GiB HBM3&lt;br /&gt;
| 512 GiB&lt;br /&gt;
| 384 GiB&lt;br /&gt;
| 384 GiB&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Local SSD&lt;br /&gt;
| 1.8 TB NVMe&lt;br /&gt;
| 3.84 TB NVMe&lt;br /&gt;
| 15.36 TB NVMe&lt;br /&gt;
| 15.36 TB NVMe&lt;br /&gt;
| 7.68 TB NVMe&lt;br /&gt;
| 6.4 TB NVMe&lt;br /&gt;
| 1.92 TB SATA SSD&lt;br /&gt;
| 7.68 TB SATA SSD&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Accelerators&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 4x NVIDIA H100 &lt;br /&gt;
| 4x AMD Instinct MI300A&lt;br /&gt;
| 4x NVIDIA A100 / H100 &lt;br /&gt;
| 4x NVIDIA A100&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Accelerator memory&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 94 GiB&lt;br /&gt;
| 4x 128 GiB HBM3 (APU, shared with CPU)&lt;br /&gt;
| 80 GiB / 94 GiB&lt;br /&gt;
| 40 GiB&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Interconnect&lt;br /&gt;
| IB HDR200 &lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 4x NDR200&lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 2x HDR200 &lt;br /&gt;
| IB 4x EDR&lt;br /&gt;
| IB 1x NDR200&lt;br /&gt;
|}&lt;br /&gt;
Table 1: Hardware overview and properties&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 the following file systems are available:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;$HOME&#039;&#039;&#039;&amp;lt;br&amp;gt;The HOME directory is created automatically after account activation, and the environment variable $HOME holds its name. HOME is the directory where users find themselves after login.&lt;br /&gt;
* &#039;&#039;&#039;Workspaces&#039;&#039;&#039;&amp;lt;br&amp;gt;Users can create so-called workspaces for non-permanent data with temporary lifetime.&lt;br /&gt;
* &#039;&#039;&#039;Workspaces on flash storage&#039;&#039;&#039;&amp;lt;br&amp;gt;A further workspace file system based on flash-only storage is available for special requirements and certain users.&lt;br /&gt;
* &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039;&amp;lt;br&amp;gt;The directory $TMPDIR is only available and visible on the local node during the runtime of a compute job. It is located on fast SSD storage devices.&lt;br /&gt;
* &#039;&#039;&#039;BeeOND&#039;&#039;&#039; (BeeGFS On-Demand)&amp;lt;br&amp;gt;On request a parallel on-demand file system (BeeOND) is created which uses the SSDs of the nodes which were allocated to the batch job.&lt;br /&gt;
* &#039;&#039;&#039;LSDF Online Storage&#039;&#039;&#039;&amp;lt;br&amp;gt;On request the external LSDF Online Storage is mounted on the nodes which were allocated to the batch job. On the login nodes, LSDF is automatically mounted.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Which file system to use?&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You should separate your data and store it on the appropriate file system.&lt;br /&gt;
Permanently needed data, such as software or important results, should be stored in $HOME, but capacity restrictions (quotas) apply.&lt;br /&gt;
If you accidentally delete data in $HOME, there is a chance that we can restore it from backup.&lt;br /&gt;
Permanent data which is not needed for months or which exceeds the capacity restrictions should be moved to the LSDF Online Storage or to the archive and deleted from the file systems. Temporary data which is only needed on a single node and which does not exceed the disk space shown in Table 1 above should be stored&lt;br /&gt;
below $TMPDIR. Data which is read many times on a single node, e.g. during AI training,&lt;br /&gt;
should be copied to $TMPDIR and read from there. Temporary data which is used by many nodes&lt;br /&gt;
of your batch job and which is only needed during job runtime should be stored on the&lt;br /&gt;
parallel on-demand file system BeeOND. Temporary data which can be recomputed, or which is the&lt;br /&gt;
result of one job and input for another job, should be stored in workspaces. The lifetime&lt;br /&gt;
of data in workspaces is limited and depends on the lifetime of the workspace, which can be&lt;br /&gt;
several months.&lt;br /&gt;
&lt;br /&gt;
For further details please check: [[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details|File System Details]]&lt;br /&gt;
&lt;br /&gt;
== $HOME ==&lt;br /&gt;
&lt;br /&gt;
The $HOME directories of bwUniCluster 3.0 users are located on the parallel file system Lustre.&lt;br /&gt;
You have access to your $HOME directory from all nodes of UC3. A regular backup of these directories &lt;br /&gt;
to tape archive is done automatically. The directory $HOME is used to hold those files that are&lt;br /&gt;
permanently used like source codes, configuration files, executable programs etc.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#$HOME|Detailed information on $HOME]]&lt;br /&gt;
&lt;br /&gt;
== Workspaces ==&lt;br /&gt;
&lt;br /&gt;
On UC3 workspaces should be used to store large non-permanent data sets, e.g. restart files or output&lt;br /&gt;
data that has to be post-processed. The file system used for workspaces is also the parallel file system Lustre, which is especially designed for parallel access and high throughput to large&lt;br /&gt;
files. It provides data transfer rates of up to 40 GB/s for both writing and reading when data access is parallel. &lt;br /&gt;
&lt;br /&gt;
On UC3 there is a default user quota limit of 40 TiB and 20 million inodes (files and directories) per user.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#Workspaces|Detailed information on Workspaces]]&lt;br /&gt;
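Workspaces are managed with the HPC workspace command-line tools; the following is a minimal sketch using the command names common on bwHPC clusters (exact options and limits may differ, see the detailed documentation):&lt;br /&gt;

```shell
# Create a workspace named "simdata" with a lifetime of 30 days
ws_allocate simdata 30

# List your workspaces and their remaining lifetimes
ws_list

# Print the path of an existing workspace (useful in job scripts)
ws_find simdata

# Extend the lifetime before the workspace expires
ws_extend simdata 30

# Release the workspace once the data is no longer needed
ws_release simdata
```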
&lt;br /&gt;
== Workspaces on flash storage ==&lt;br /&gt;
&lt;br /&gt;
Another workspace file system based on flash-only storage is available for special requirements and certain users.&lt;br /&gt;
If possible, this file system should be used from the Ice Lake nodes of bwUniCluster 3.0 (queue &#039;&#039;cpu_il&#039;&#039;). &lt;br /&gt;
It provides high IOPS rates and better performance for small files. The quota limits are lower than on the &lt;br /&gt;
normal workspace file system.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#Workspaces_on_flash_storage|Detailed information on Workspaces on flash storage]]&lt;br /&gt;
&lt;br /&gt;
== $TMPDIR ==&lt;br /&gt;
&lt;br /&gt;
The environment variable $TMPDIR contains the name of a directory which is located on the local SSD of each node. &lt;br /&gt;
This directory should be used for temporary files being accessed from the local node. It should &lt;br /&gt;
also be used if you read the same data many times from a single node, e.g. if you are doing AI training. &lt;br /&gt;
Because the local SSD storage devices are extremely fast, performance with small files is much better than on the parallel file systems. &lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#$TMPDIR|Detailed information on $TMPDIR]]&lt;br /&gt;
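A common pattern is to stage data to $TMPDIR at job start and copy results back before the job ends; a sketch of a job script (program name, paths and resource requests are placeholders):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Stage input data onto the fast local SSD of the allocated node
cp -r "$HOME/input_data" "$TMPDIR/"

# Run the computation reading from and writing to the local SSD
cd "$TMPDIR"
./my_program input_data -o results

# Copy results back before the job ends; $TMPDIR is purged afterwards
cp -r "$TMPDIR/results" "$HOME/"
```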
&lt;br /&gt;
== BeeOND (BeeGFS On-Demand) ==&lt;br /&gt;
&lt;br /&gt;
Users can request a private BeeOND (on-demand BeeGFS) parallel file system for each job. The file system is created during job startup and purged when the job completes.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#BeeOND_(BeeGFS_On-Demand)|Detailed information on BeeOND]]&lt;br /&gt;
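BeeOND is typically requested through a batch-job option; the following sketch assumes a Slurm constraint named BEEOND and a job-specific mount point, both of which should be verified against the detailed documentation:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --constraint=BEEOND   # request a private BeeOND file system (constraint name assumed)

# The on-demand file system is mounted under a job-specific path,
# e.g. /mnt/odfs/$SLURM_JOB_ID (assumed); it is purged when the job completes
BEEOND_DIR=/mnt/odfs/$SLURM_JOB_ID

# Stage shared input once, then let all nodes of the job read from BeeOND
cp -r "$HOME/shared_input" "$BEEOND_DIR/"
srun ./my_parallel_program "$BEEOND_DIR/shared_input"
```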
&lt;br /&gt;
== LSDF Online Storage ==&lt;br /&gt;
&lt;br /&gt;
The LSDF Online Storage allows dedicated users to store scientific measurement data and simulation results. BwUniCluster 3.0 has an extremely fast network connection to the LSDF Online Storage. This file system provides external access via different protocols and is only available for certain users.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#LSDF_Online_Storage|Detailed information on LSDF Online Storage]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Hardware_and_Architecture&amp;diff=15987</id>
		<title>BwUniCluster3.0/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Hardware_and_Architecture&amp;diff=15987"/>
		<updated>2026-04-21T10:58:43Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Compute nodes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Architecture of bwUniCluster 3.0 =&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0&#039;&#039;&#039; is a parallel computer with distributed memory. &lt;br /&gt;
It consists of the bwUniCluster 3.0 components procured in 2024 and also includes the additional compute nodes which were procured as an extension to the bwUniCluster 2.0 in 2022.&lt;br /&gt;
 &lt;br /&gt;
Each node of the system consists of two Intel Xeon or AMD EPYC processors, local memory, local storage, network adapters and optional accelerators (NVIDIA A100 and H100, AMD Instinct MI300A). All nodes are connected via a fast InfiniBand interconnect.&lt;br /&gt;
&lt;br /&gt;
The parallel file system (Lustre) is connected to the InfiniBand switch of the compute cluster. This provides a fast and scalable parallel file &lt;br /&gt;
system.&lt;br /&gt;
&lt;br /&gt;
The operating system on each node is Red Hat Enterprise Linux (RHEL) 9.4.&lt;br /&gt;
&lt;br /&gt;
The individual nodes of the system act in different roles. From an end user&#039;s point of view, the relevant groups of nodes are login nodes and compute nodes. File server nodes and administrative server nodes are not accessible to users.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Login Nodes&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
The login nodes are the only nodes directly accessible by end users. These nodes are used for interactive login, file management, program development, and interactive pre- and post-processing.&lt;br /&gt;
There are two nodes dedicated to this service, but both can be reached through a single address: &amp;lt;code&amp;gt;uc3.scc.kit.edu&amp;lt;/code&amp;gt;. A DNS round-robin alias distributes login sessions across the login nodes.&lt;br /&gt;
To prevent login nodes from being used for activities that are not permitted there and that affect the user experience of other users, &#039;&#039;&#039;long-running and/or compute-intensive tasks are periodically terminated without any prior warning&#039;&#039;&#039;. Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
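Logging in goes through this single alias; a minimal example (the username is a placeholder in the usual format):&lt;br /&gt;

```shell
# Log in via the DNS round-robin alias; one of the login nodes is selected automatically
ssh xy_ab1234@uc3.scc.kit.edu
```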
&lt;br /&gt;
&#039;&#039;&#039;Compute Nodes&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
The majority of nodes are compute nodes which are managed by a batch system. Users submit their jobs to the SLURM batch system and a job is executed when the required resources become available (depending on its fair-share priority).&lt;br /&gt;
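Submitting work to the compute nodes follows the usual Slurm pattern; a minimal sketch (partition name, resources and script name are examples):&lt;br /&gt;

```shell
# Submit a job script to a queue (Slurm partition), e.g. the standard CPU queue
sbatch --partition=cpu --ntasks=4 --time=01:00:00 job.sh

# Check the state of your own jobs
squeue --me
```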
&lt;br /&gt;
&#039;&#039;&#039;File Systems&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
bwUniCluster 3.0 comprises two parallel file systems based on Lustre.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:uc3.png|center|800px|Architecture of bwUniCluster 3.0]]&lt;br /&gt;
&lt;br /&gt;
= Compute Resources =&lt;br /&gt;
&lt;br /&gt;
== Login nodes ==&lt;br /&gt;
&lt;br /&gt;
After a successful [[BwUniCluster3.0/Login|login]], users find themselves on one of the so-called login nodes. Technically, these largely correspond to a standard CPU node, i.e. users have two AMD EPYC 9454 processors with a total of 96 cores at their disposal. Login nodes are the bridgehead for accessing computing resources.&lt;br /&gt;
Data and software are organized here, computing jobs are initiated and managed, and computing resources allocated for interactive use can also be accessed from here.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Any compute-intensive job running on the login nodes will be terminated without notice.&#039;&#039;&#039;&amp;lt;br/&amp;gt;&lt;br /&gt;
Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Compute nodes ==&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, automated tasks are executed via a batch script or they can be accessed interactively. Please refer to [[BwUniCluster3.0/Running_Jobs|Running Jobs]] on how to request resources.&amp;lt;br&amp;gt;&lt;br /&gt;
The following compute node types are available:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;CPU nodes&amp;lt;/b&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;Standard&#039;&#039;&#039;: Two AMD EPYC 9454 processors per node with a total of 96 physical CPU cores or 192 logical cores (Hyper-Threading) per node. The nodes have been procured in 2024.&lt;br /&gt;
* &#039;&#039;&#039;Ice Lake&#039;&#039;&#039;: Two Intel Xeon Platinum 8358 processors per node with a total of 64 physical CPU cores or 128 logical cores (Hyper-Threading) per node. The nodes have been procured in 2022 as an extension to bwUniCluster 2.0.&lt;br /&gt;
* &#039;&#039;&#039;High Memory&#039;&#039;&#039;: Similar to the standard nodes, but with six times the memory.&lt;br /&gt;
&amp;lt;b&amp;gt;GPU nodes&amp;lt;/b&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;NVIDIA GPU x4&#039;&#039;&#039;: Similar to the standard nodes, but with larger memory and four NVIDIA H100 GPUs.&lt;br /&gt;
* &#039;&#039;&#039;AMD GPU x4&#039;&#039;&#039;: Four of AMD&#039;s MI300A accelerated processing units (APUs) per node; each APU combines CPU cores and GPU compute units which share the same high-bandwidth memory (HBM).&lt;br /&gt;
* &#039;&#039;&#039;Ice Lake NVIDIA GPU x4&#039;&#039;&#039;: Similar to the Ice Lake nodes, but with larger memory and four NVIDIA A100 or H100 GPUs.&lt;br /&gt;
* &#039;&#039;&#039;Cascade Lake NVIDIA GPU x4&#039;&#039;&#039;: Nodes with four NVIDIA A100 GPUs.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;Cascade Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Login nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Availability in [[BwUniCluster3.0/Running_Jobs#Queues_on_bwUniCluster_3.0| queues]]&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Number of nodes&lt;br /&gt;
| 272&lt;br /&gt;
| 80&lt;br /&gt;
| 4&lt;br /&gt;
| 12&lt;br /&gt;
| 1&lt;br /&gt;
| 15&lt;br /&gt;
| 19&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Processors&lt;br /&gt;
| Intel Xeon Platinum 8358&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD Zen 4&lt;br /&gt;
| Intel Xeon Platinum 8358&lt;br /&gt;
| Intel Xeon Gold 6248R&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Number of sockets&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 4&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Total number of cores&lt;br /&gt;
| 64&lt;br /&gt;
| 96&lt;br /&gt;
| 96&lt;br /&gt;
| 96&lt;br /&gt;
| 96 (4x 24)&lt;br /&gt;
| 64&lt;br /&gt;
| 48&lt;br /&gt;
| 96&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Main memory&lt;br /&gt;
| 256 GiB&lt;br /&gt;
| 384 GiB&lt;br /&gt;
| 2304 GiB&lt;br /&gt;
| 768 GiB&lt;br /&gt;
| 4x 128 GiB HBM3&lt;br /&gt;
| 512 GiB&lt;br /&gt;
| 384 GiB&lt;br /&gt;
| 384 GiB&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Local SSD&lt;br /&gt;
| 1.8 TB NVMe&lt;br /&gt;
| 3.84 TB NVMe&lt;br /&gt;
| 15.36 TB NVMe&lt;br /&gt;
| 15.36 TB NVMe&lt;br /&gt;
| 7.68 TB NVMe&lt;br /&gt;
| 6.4 TB NVMe&lt;br /&gt;
| 1.92 TB SATA SSD&lt;br /&gt;
| 7.68 TB SATA SSD&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Accelerators&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 4x NVIDIA H100 &lt;br /&gt;
| 4x AMD Instinct MI300A&lt;br /&gt;
| 4x NVIDIA A100 / H100 &lt;br /&gt;
| 4x NVIDIA A100&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Accelerator memory&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 94 GiB&lt;br /&gt;
| 4x 128 GiB HBM3 (APU, shared with CPU)&lt;br /&gt;
| 80 GiB / 94 GiB&lt;br /&gt;
| 40 GiB&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Interconnect&lt;br /&gt;
| IB HDR200 &lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 4x NDR200&lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 2x HDR200 &lt;br /&gt;
| IB 4x EDR&lt;br /&gt;
| IB 1x NDR200&lt;br /&gt;
|}&lt;br /&gt;
Table 1: Hardware overview and properties&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 the following file systems are available:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;$HOME&#039;&#039;&#039;&amp;lt;br&amp;gt;The HOME directory is created automatically after account activation, and the environment variable $HOME holds its name. HOME is the directory where users find themselves after login.&lt;br /&gt;
* &#039;&#039;&#039;Workspaces&#039;&#039;&#039;&amp;lt;br&amp;gt;Users can create so-called workspaces for non-permanent data with temporary lifetime.&lt;br /&gt;
* &#039;&#039;&#039;Workspaces on flash storage&#039;&#039;&#039;&amp;lt;br&amp;gt;A further workspace file system based on flash-only storage is available for special requirements and certain users.&lt;br /&gt;
* &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039;&amp;lt;br&amp;gt;The directory $TMPDIR is only available and visible on the local node during the runtime of a compute job. It is located on fast SSD storage devices.&lt;br /&gt;
* &#039;&#039;&#039;BeeOND&#039;&#039;&#039; (BeeGFS On-Demand)&amp;lt;br&amp;gt;On request a parallel on-demand file system (BeeOND) is created which uses the SSDs of the nodes which were allocated to the batch job.&lt;br /&gt;
* &#039;&#039;&#039;LSDF Online Storage&#039;&#039;&#039;&amp;lt;br&amp;gt;On request the external LSDF Online Storage is mounted on the nodes which were allocated to the batch job. On the login nodes, LSDF is automatically mounted.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Which file system to use?&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You should separate your data and store it on the appropriate file system.&lt;br /&gt;
Permanently needed data, such as software or important results, should be stored in $HOME, but capacity restrictions (quotas) apply.&lt;br /&gt;
If you accidentally delete data in $HOME, there is a chance that we can restore it from backup.&lt;br /&gt;
Permanent data which is not needed for months or which exceeds the capacity restrictions should be moved to the LSDF Online Storage or to the archive and deleted from the file systems. Temporary data which is only needed on a single node and which does not exceed the disk space shown in Table 1 above should be stored&lt;br /&gt;
below $TMPDIR. Data which is read many times on a single node, e.g. during AI training,&lt;br /&gt;
should be copied to $TMPDIR and read from there. Temporary data which is used by many nodes&lt;br /&gt;
of your batch job and which is only needed during job runtime should be stored on the&lt;br /&gt;
parallel on-demand file system BeeOND. Temporary data which can be recomputed, or which is the&lt;br /&gt;
result of one job and input for another job, should be stored in workspaces. The lifetime&lt;br /&gt;
of data in workspaces is limited and depends on the lifetime of the workspace, which can be&lt;br /&gt;
several months.&lt;br /&gt;
&lt;br /&gt;
For further details please check: [[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details|File System Details]]&lt;br /&gt;
&lt;br /&gt;
== $HOME ==&lt;br /&gt;
&lt;br /&gt;
The $HOME directories of bwUniCluster 3.0 users are located on the parallel file system Lustre.&lt;br /&gt;
You have access to your $HOME directory from all nodes of UC3. A regular backup of these directories &lt;br /&gt;
to tape archive is done automatically. The directory $HOME is used to hold those files that are&lt;br /&gt;
permanently used like source codes, configuration files, executable programs etc.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#$HOME|Detailed information on $HOME]]&lt;br /&gt;
&lt;br /&gt;
== Workspaces ==&lt;br /&gt;
&lt;br /&gt;
On UC3 workspaces should be used to store large non-permanent data sets, e.g. restart files or output&lt;br /&gt;
data that has to be post-processed. The file system used for workspaces is also the parallel file system Lustre, which is especially designed for parallel access and high throughput to large&lt;br /&gt;
files. It provides data transfer rates of up to 40 GB/s for both writing and reading when data access is parallel. &lt;br /&gt;
&lt;br /&gt;
On UC3 there is a default user quota limit of 40 TiB and 20 million inodes (files and directories) per user.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#Workspaces|Detailed information on Workspaces]]&lt;br /&gt;
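Workspaces are managed with the HPC workspace command-line tools; the following is a minimal sketch using the command names common on bwHPC clusters (exact options and limits may differ, see the detailed documentation):&lt;br /&gt;

```shell
# Create a workspace named "simdata" with a lifetime of 30 days
ws_allocate simdata 30

# List your workspaces and their remaining lifetimes
ws_list

# Print the path of an existing workspace (useful in job scripts)
ws_find simdata

# Extend the lifetime before the workspace expires
ws_extend simdata 30

# Release the workspace once the data is no longer needed
ws_release simdata
```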
&lt;br /&gt;
== Workspaces on flash storage ==&lt;br /&gt;
&lt;br /&gt;
Another workspace file system based on flash-only storage is available for special requirements and certain users.&lt;br /&gt;
If possible, this file system should be used from the Ice Lake nodes of bwUniCluster 3.0 (queue &#039;&#039;cpu_il&#039;&#039;). &lt;br /&gt;
It provides high IOPS rates and better performance for small files. The quota limits are lower than on the &lt;br /&gt;
normal workspace file system.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#Workspaces_on_flash_storage|Detailed information on Workspaces on flash storage]]&lt;br /&gt;
&lt;br /&gt;
== $TMPDIR ==&lt;br /&gt;
&lt;br /&gt;
The environment variable $TMPDIR contains the name of a directory which is located on the local SSD of each node. &lt;br /&gt;
This directory should be used for temporary files being accessed from the local node. It should &lt;br /&gt;
also be used if you read the same data many times from a single node, e.g. if you are doing AI training. &lt;br /&gt;
Because the local SSD storage devices are extremely fast, performance with small files is much better than on the parallel file systems. &lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#$TMPDIR|Detailed information on $TMPDIR]]&lt;br /&gt;
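A common pattern is to stage data to $TMPDIR at job start and copy results back before the job ends; a sketch of a job script (program name, paths and resource requests are placeholders):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Stage input data onto the fast local SSD of the allocated node
cp -r "$HOME/input_data" "$TMPDIR/"

# Run the computation reading from and writing to the local SSD
cd "$TMPDIR"
./my_program input_data -o results

# Copy results back before the job ends; $TMPDIR is purged afterwards
cp -r "$TMPDIR/results" "$HOME/"
```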
&lt;br /&gt;
== BeeOND (BeeGFS On-Demand) ==&lt;br /&gt;
&lt;br /&gt;
Users can request a private BeeOND (on-demand BeeGFS) parallel file system for each job. The file system is created during job startup and purged when the job completes.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#BeeOND_(BeeGFS_On-Demand)|Detailed information on BeeOND]]&lt;br /&gt;
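BeeOND is typically requested through a batch-job option; the following sketch assumes a Slurm constraint named BEEOND and a job-specific mount point, both of which should be verified against the detailed documentation:&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --constraint=BEEOND   # request a private BeeOND file system (constraint name assumed)

# The on-demand file system is mounted under a job-specific path,
# e.g. /mnt/odfs/$SLURM_JOB_ID (assumed); it is purged when the job completes
BEEOND_DIR=/mnt/odfs/$SLURM_JOB_ID

# Stage shared input once, then let all nodes of the job read from BeeOND
cp -r "$HOME/shared_input" "$BEEOND_DIR/"
srun ./my_parallel_program "$BEEOND_DIR/shared_input"
```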
&lt;br /&gt;
== LSDF Online Storage ==&lt;br /&gt;
&lt;br /&gt;
The LSDF Online Storage allows dedicated users to store scientific measurement data and simulation results. BwUniCluster 3.0 has an extremely fast network connection to the LSDF Online Storage. This file system provides external access via different protocols and is only available for certain users.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#LSDF_Online_Storage|Detailed information on LSDF Online Storage]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15986</id>
		<title>Development/VS Code</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15986"/>
		<updated>2026-04-21T09:32:28Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Overview ==&lt;br /&gt;
&lt;br /&gt;
[[File:vscode.png|thumb|Visual Studio Code, Source: https://code.visualstudio.com/|450px]]&lt;br /&gt;
&lt;br /&gt;
[https://github.com/Microsoft/vscode Visual Studio Code] (VS Code) is an open-source code editor from Microsoft. According to a [https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment Stack Overflow survey], it has become one of the most popular IDEs. The functionality of VS Code can easily be extended by installing extensions, which add almost arbitrary &#039;&#039;&#039;language support&#039;&#039;&#039;, &#039;&#039;&#039;debugging&#039;&#039;&#039; or &#039;&#039;&#039;remote development&#039;&#039;&#039; capabilities. You can install VS Code locally and use it for remote development.&lt;br /&gt;
&lt;br /&gt;
== Visual Studio Code  ==&lt;br /&gt;
Visual Studio Code (VS Code) is a lightweight, extensible code editor from Microsoft that supports many programming languages and features such as debugging and integrated Git. It offers a rich extension marketplace for adding language support, themes, and tools tailored to your workflow. VS Code runs on Windows, macOS, and Linux and is popular for its speed, customizability, and strong community ecosystem.&lt;br /&gt;
&lt;br /&gt;
=== Using AI agents ===&lt;br /&gt;
When deploying AI agents on the bwHPC clusters, users must exercise extreme caution and maintain full oversight of the agent&#039;s activities. You are fully responsible for all actions initiated by an agent, including any security breaches or system disruptions it may cause. It is mandatory to strictly monitor resource usage on login nodes, as these are shared resources intended only for lightweight tasks. Any agent found consuming excessive CPU or memory on a login node will be terminated immediately to ensure stability for other users. To comply with usage policies, all AI-driven workloads which generate heavy load must be submitted to the Slurm batch queues rather than running directly on the login nodes.&lt;br /&gt;
&lt;br /&gt;
=== Extension: Remote-SSH ===&lt;br /&gt;
&lt;br /&gt;
To remotely develop and debug code at HPC facilities, you can use the [https://code.visualstudio.com/docs/remote/ssh &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; extension]. The extension connects your locally installed VS Code to remote servers. In contrast to using graphical IDEs within a remote desktop session (RDP, VNC), there are no drawbacks such as laggy reactions to your input or blurry font rendering.&lt;br /&gt;
&lt;br /&gt;
==== Installation and Configuration ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-extensions-button.png|vscode-extensions-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
To install the Remote - SSH extension, click the Extensions button in the left side bar and enter “remote ssh” in the search field. Choose &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; from the list that appears and click &#039;&#039;&#039;Install&#039;&#039;&#039;.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
To configure remote connections, open the Remote Explorer. On Linux systems, the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; is evaluated automatically, and the targets within this file already appear in the left side bar.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-add.png|vscode-remoteexplorer-add.png|350px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If no remote SSH targets are defined within this file, you can easily add one by clicking on the + symbol. Make sure that “SSH Targets” is active in the drop-down menu of the Remote Explorer. Enter the connection details &amp;lt;code&amp;gt;&amp;amp;lt;user&amp;amp;gt;@&amp;amp;lt;server&amp;amp;gt;&amp;lt;/code&amp;gt;. You will be asked whether the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; should be modified or whether another config file should be used or created.&lt;br /&gt;
&lt;br /&gt;
A minimal entry within &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; that makes a remote target appear there could look like this:&lt;br /&gt;
&lt;br /&gt;
 $ cat ~/.ssh/config&lt;br /&gt;
 Host uc3.scc.kit.edu&lt;br /&gt;
   HostName uc3.scc.kit.edu&lt;br /&gt;
   User xy_ab1234&lt;br /&gt;
&lt;br /&gt;
==== Connect to Login Nodes ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to connect to a remote SSH target, open the Remote-Explorer. Right-click a target and connect in the current or a new window. TOTP and password can be entered in the corresponding input fields that open.&lt;br /&gt;
&lt;br /&gt;
You are now logged in on the remote server. As usual, you can open a project directory with the standard key binding Ctrl+k Ctrl+o. You can now edit and debug code.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention&#039;&#039;&#039;: Please remember that you are running and debugging the code on a login node. Do not perform resource-intensive tasks. Furthermore, no GPU resources are available to you.&lt;br /&gt;
&lt;br /&gt;
Extensions that are installed locally are only usable on your local machine and are not automatically installed on the remote side. However, as soon as you open the Extensions view during a remote session, VS Code offers to install your local extensions remotely.&lt;br /&gt;
&lt;br /&gt;
==== Disconnect from Login Nodes ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-indicator.png|images/vscode-remoteexplorer-indicator.png|200px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to end your remote session, click the green box in the lower left corner. In the input box that opens, select the “Close Remote Connection” option. If you simply close your VS Code window, some server-side components of VS Code will continue to run remotely.&lt;br /&gt;
&lt;br /&gt;
=== Access to Compute Nodes ===&lt;br /&gt;
&lt;br /&gt;
The workflow described above does not allow debugging on compute nodes that have been requested via an interactive Slurm job, for example. Debugging GPU codes is therefore also not possible, since this kind of resource is only accessible within Slurm jobs.&lt;br /&gt;
We strongly discourage using the Code Tunnel application, as it violates our access policies. In this scenario, an application running on the compute node connects to a Microsoft or GitHub server. The locally running VS Code then connects to the compute nodes via these external servers, thereby bypassing the login nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Access the compute nodes via VS Code and the Remote-SSH plugin is only possible, if you start a temporarily running SSH service on the compute node which listens to an unprivileged port. By tunneling this port to your local computer, you can connect VS code to it.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Code-Server ==&lt;br /&gt;
&lt;br /&gt;
The application [https://github.com/cdr/code-server code-server] allows you to run the server part of VS Code on any machine; it can then be accessed from a web browser on your local computer. This enables, for example, development and debugging on compute nodes.&lt;br /&gt;
Code-server runs a web server listening on an unprivileged port. In order to connect your web browser to the remotely running code-server, you have to forward this port via an SSH tunnel.&lt;br /&gt;
&lt;br /&gt;
[[File:code-server.png|thumb|code-server.png|VS Code in web browser: code-server, Source: https://github.com/cdr/code-server&amp;quot;&amp;gt;https://github.com/cdr/code-server|400px]]&lt;br /&gt;
&lt;br /&gt;
=== Install Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server is pre-installed on bwUniCluster and accessible via an Lmod module:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load devel/code-server&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On clusters with no code-server module, the application can easily be installed with the description available on the official [https://github.com/coder/code-server GitHub page].&lt;br /&gt;
&lt;br /&gt;
=== Start Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server can be run on either login nodes or compute nodes. In the example shown, an interactive job is started on a GPU partition to run code-server there.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ salloc -p accelerated --gres=gpu:4 --time=30:00 # Start interactive job with 4 GPUs&lt;br /&gt;
$ module load devel/code-server                   # Load code-server module&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
When code-server is started, it opens a web server listening on a certain port. The user has to &#039;&#039;&#039;specify the port&#039;&#039;&#039;. It can be chosen freely in the unprivileged range (above 1024). If a port is already assigned, e.g. because several users choose the same port, another port must be chosen.&lt;br /&gt;
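The port selection described above can be sketched as a short shell snippet. This is only an illustration, not part of the code-server module; the check with &amp;lt;code&amp;gt;ss&amp;lt;/code&amp;gt; assumes the iproute2 tools are installed, which is typical but not guaranteed on every cluster.

```shell
# Pick a random port in the unprivileged range (1024-11022)
PORT=$(( ( RANDOM % 9999 ) + 1024 ))

# If the port is already in use, try another one
# (assumes the ss tool is available; skip this loop otherwise)
while ss -tln 2>/dev/null | grep -q ":${PORT} "; do
    PORT=$(( ( RANDOM % 9999 ) + 1024 ))
done

echo "Using port ${PORT}"
```

The chosen value can then be passed to code-server via &amp;lt;code&amp;gt;--bind-addr 0.0.0.0:${PORT}&amp;lt;/code&amp;gt;.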
&lt;br /&gt;
By starting code-server, you are running a web server that can be accessed by anyone logged in to the cluster. To prevent other people from gaining access to your account and data, this web server is &#039;&#039;&#039;password protected&#039;&#039;&#039;. If no variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; is defined, the password in the default config file &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; is used. If you want to define your own password, you can either change it in the config file or export the variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ PASSWORD=&amp;lt;mySecret&amp;gt; \&lt;br /&gt;
    code-server \&lt;br /&gt;
      --bind-addr 0.0.0.0:8081 \&lt;br /&gt;
      --auth password  # Start code-server on port 8081&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#FFCCCC; width:100%;&amp;quot;&lt;br /&gt;
| &#039;&#039;&#039;Security implications&#039;&#039;&#039;&lt;br /&gt;
Please note that by starting &amp;lt;code&amp;gt;code-server&amp;lt;/code&amp;gt; you are running a web server that can be accessed by everyone logged in on the cluster.&amp;lt;br&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;If password protection is disabled, anybody can access your account and your data.&#039;&#039;&#039;&lt;br /&gt;
* Choose a &#039;&#039;&#039;secure password&#039;&#039;&#039;!&lt;br /&gt;
* Do &#039;&#039;&#039;NOT&#039;&#039;&#039; use &amp;lt;code&amp;gt;code-server --link&amp;lt;/code&amp;gt;!&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Connect to code-server ===&lt;br /&gt;
[[File:code-server-hk.png|thumb|Code-server running on GPU node.|400px]]&lt;br /&gt;
&lt;br /&gt;
As soon as code-server is running, it can be accessed in the web browser. In order to establish the connection, a SSH tunnel from your local computer to the remote server has to be created via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ ssh -L 8081:&amp;lt;computeNodeID&amp;gt;:8081 &amp;lt;userID&amp;gt;@uc3.scc.kit.edu&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
You need to enter the &amp;lt;code&amp;gt;computeNodeID&amp;lt;/code&amp;gt; of the node on which the interactive Slurm job is running. If you have started code-server on a login node, just enter &amp;lt;code&amp;gt;localhost&amp;lt;/code&amp;gt;. Now you can open http://127.0.0.1:8081 in your web browser. You may have to allow your browser to open an insecure (non-HTTPS) site. The login page looks as follows:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:code-server-login.png|Code-server login page.|300px]]&lt;br /&gt;
&lt;br /&gt;
Enter the password from &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; or from the &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; variable. After clicking the “Submit” button, the familiar VS Code interface will open in your browser.&lt;br /&gt;
&lt;br /&gt;
=== End code-server session ===&lt;br /&gt;
&lt;br /&gt;
If you want to temporarily log out from your code-server session, open the “Application Menu” in the left side bar and click “Log out”. To &#039;&#039;&#039;terminate&#039;&#039;&#039; the code-server session, cancel it in the interactive Slurm job by pressing Ctrl+C.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Connect to Remote Jupyter Kernel ==&lt;br /&gt;
To work with your Python scripts and notebooks within VS Code while using the resources of a compute node, you can create a batch job that launches JupyterLab and connect to it via VS Code. To do so, please follow the instructions below. Any parts of the scripts that might need adjustment are marked with the keyword &amp;quot;@param&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=== Simple Use Case ===&lt;br /&gt;
The most basic steps are to set a password for JupyterLab, start a job which runs JupyterLab, get the connection details from the output log and connect to it locally. The following instructions explain these steps and provide an additional script that replaces the manual step of looking into the output file.&lt;br /&gt;
&lt;br /&gt;
# Load a python module and set a password on the cluster for JupyterLab:&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    module load devel/miniforge&lt;br /&gt;
    jupyter notebook --generate-config&lt;br /&gt;
    jupyter notebook password&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Define a batch script to start a JupyterLab Job. Please adjust the first part according to your needs and your specific cluster.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;~/jupyterlab.slurm&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition=cpu-single&lt;br /&gt;
#SBATCH --job-name=jupyterlab&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task 1&lt;br /&gt;
#SBATCH --mail-user=my_email_address  # @param: replace with your email address&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# @param: change this to your preferred python or conda module&lt;br /&gt;
module load devel/miniforge&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Pick a random port in the unprivileged range (1024-11022)&lt;br /&gt;
PORT=$(( ( RANDOM % 9999 ) + 1024 ))&lt;br /&gt;
&lt;br /&gt;
# Determine the node name and print the connection details before starting&lt;br /&gt;
# JupyterLab, because &amp;quot;jupyter lab&amp;quot; blocks until the job ends&lt;br /&gt;
HOSTID=$(hostname -s)&lt;br /&gt;
echo &amp;quot;Connect&amp;quot;&lt;br /&gt;
echo &amp;quot;ssh -N -L ${PORT}:${HOSTID}:${PORT} ${USER}@${hostAddress}&amp;quot;&lt;br /&gt;
echo &amp;quot;Job ${SLURM_JOB_ID} running on host ${HOSTID}.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
jupyter lab --no-browser --ip=0.0.0.0 --port=${PORT}&lt;br /&gt;
&lt;br /&gt;
returned_code=$?&lt;br /&gt;
echo &amp;quot;&amp;gt; Script completed with exit code ${returned_code}&amp;quot;&lt;br /&gt;
exit ${returned_code}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Run a wrapper script to execute the batch script and extract needed information from the slurm output file. You could save it together with other utility scripts in a &amp;quot;bin&amp;quot; directory in your home folder.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;./bin/run_jupyterlab_simple.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# Define parameters&lt;br /&gt;
jobscript=~/jupyterlab.slurm&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Outfile name&lt;br /&gt;
slurm_out=slurm-${job_id}.out&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file to appear&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 2&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the JupyterLab URL is written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output. The pattern assumes a node name with a length of 6 characters and a port with a length of 3, 4 or 5 numbers.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can use the URL&amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;to:&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode (&#039;Existing Jupyter Server...&#039;, enter URL, enter password, confirm &#039;127.0.0.1&#039;, choose kernel) or &amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab in your browser with the URL&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
# rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Follow the instructions on the commandline to connect to the Jupyter kernel from your local machine or the Helix login node. More detailed instructions can be found below. &lt;br /&gt;
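The extraction step inside the wrapper script relies on bash regular-expression matching. As a standalone sketch with a made-up example URL (the node name &amp;quot;cn0001&amp;quot; and port 8888 are invented values, not real cluster details):

```shell
# Sketch of the wrapper's URL parsing; the URL below is a made-up example
url="http://cn0001:8888/lab?token=abc"
# Pattern assumes a 6-character node name and a 3-5 digit port, as in the wrapper
url_pattern="http://([a-z0-9]{6}):([0-9]{3,5})/lab"
if [[ $url =~ $url_pattern ]]; then
    hostID=${BASH_REMATCH[1]}
    port=${BASH_REMATCH[2]}
fi
echo "host=${hostID} port=${port}"   # prints: host=cn0001 port=8888
```

If your cluster uses node names of a different length, the &amp;quot;{6}&amp;quot; quantifier in the pattern must be adjusted accordingly.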
&lt;br /&gt;
==== Connect to a running job ====&lt;br /&gt;
&lt;br /&gt;
The job runs on a specific compute node and port. With this information, you can create an SSH connection to it. But first, you need to decide how you want to work with your Python code. The options are: &lt;br /&gt;
&lt;br /&gt;
# The code is placed locally on your computer. &lt;br /&gt;
# The code is placed on the cluster and you&#039;ve mounted the folder locally. (= The files on the cluster are accessible from within your local VS Code)&lt;br /&gt;
# The code is placed on the cluster and you work on the cluster via a remote connection in VS Code. &lt;br /&gt;
&lt;br /&gt;
Depending on the use case, you need to execute the ssh command in a different place: &lt;br /&gt;
&lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer and connect to the cluster.&lt;br /&gt;
&lt;br /&gt;
Then open a terminal and execute the ssh command, which is given in the commandline output of the wrapper script. If the terminal isn&#039;t already open, go to menu item &amp;quot;Terminal&amp;quot; at the top of the window and choose &amp;quot;New Terminal&amp;quot; (or &amp;quot;new -&amp;gt; command prompt&amp;quot; on Windows). &lt;br /&gt;
It is normal that the command doesn&#039;t end after you&#039;ve put in your credentials. Leave the terminal open and go on with the next step. &lt;br /&gt;
&lt;br /&gt;
To use the Jupyter kernel that is running on the compute node, you need to connect to this kernel. This is similar to connecting any other kernel: &lt;br /&gt;
&lt;br /&gt;
# Open your code file.&lt;br /&gt;
# Click &amp;quot;Select Kernel&amp;quot; in the upper right corner. &lt;br /&gt;
# Choose &amp;quot;Existing Jupyter Server...&amp;quot;.&lt;br /&gt;
# Enter the URL that was given by the wrapper script. &lt;br /&gt;
# Enter your JupyterLab password that you set in the first step of these instructions.&lt;br /&gt;
# Confirm the prefilled value &amp;quot;127.0.0.1&amp;quot; by pressing Enter.&lt;br /&gt;
# Choose one of the virtual environments that you&#039;ve created on the cluster. You should see all python environments. To see the conda environments as well, you need to [[Helix/bwVisu/JupyterLab#Python_version | register them as ipykernel]] first. &lt;br /&gt;
&lt;br /&gt;
=== Complex Use Case ===&lt;br /&gt;
If you have different use cases for JupyterLab, you can use a more flexible wrapper script, for example: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;./bin/run_jupyterlab.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Starts a jupyter kernel on a node and provides information on how to connect to it locally.&lt;br /&gt;
# If you have only one use case and therefore need only one combination of slurm settings for your jupyter jobs, then you can use the simpler script.&lt;br /&gt;
# This script supports explorative analyses by allowing to overwrite parameters via commandline.&lt;br /&gt;
# Different job configurations can be defined in advance and then used with a given short name (cpu, gpu,...).&lt;br /&gt;
&lt;br /&gt;
programname=$0&lt;br /&gt;
function help {&lt;br /&gt;
    # Print usage information&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Starts a jupyterlab kernel&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;usage example: $programname --param_set cpu&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --param_set string   name of the parameter set&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (examples: cpu, gpu)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --jobscript string   optional, path of batch script&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: ~/jupyterlab.slurm)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --slurm_out string   optional, name of slurm output file&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: slurm-&amp;lt;job_id&amp;gt;.out)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# These parameters are set later in the script. Providing them on the command line overwrites the values set in the script.&lt;br /&gt;
jobscript=None&lt;br /&gt;
slurm_out=None&lt;br /&gt;
&lt;br /&gt;
# Process parameters&lt;br /&gt;
while [ $# -gt 0 ]; do&lt;br /&gt;
    if [[ $1 == &amp;quot;--help&amp;quot; ]]; then&lt;br /&gt;
        help&lt;br /&gt;
        exit 0&lt;br /&gt;
    # when given -p as parameter, use its value for the variable param_set&lt;br /&gt;
    elif [[ $1 == &amp;quot;-p&amp;quot; ]]; then&lt;br /&gt;
        param_set=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    elif [[ $1 == &amp;quot;--&amp;quot;* ]]; then&lt;br /&gt;
        v=&amp;quot;${1/--/}&amp;quot;&lt;br /&gt;
        declare &amp;quot;$v&amp;quot;=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    fi&lt;br /&gt;
    shift&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
function define_param_set(){&lt;br /&gt;
    # Define the available parameter sets for sbatch&lt;br /&gt;
    # Define different sets&lt;br /&gt;
    cpu=(--partition=cpu-single --mem=2gb)&lt;br /&gt;
    gpu=(--partition=gpu-single --mem=3gb --gres=gpu:1)&lt;br /&gt;
&lt;br /&gt;
    param_set=${1}&lt;br /&gt;
    # Indirect expansion: ${!param_set} looks up the array whose name is stored in param_set&lt;br /&gt;
    param_set=$param_set[@]&lt;br /&gt;
    param_set=(&amp;quot;${!param_set}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    # Add params that are the same for all sets&lt;br /&gt;
    param_set+=(--ntasks=1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# @param: jobscript, name of the slurm batch script to execute&lt;br /&gt;
if  [ &amp;quot;$jobscript&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    jobscript=~/jupyterlab.slurm&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Translate given param_set value to actual set of parameters &lt;br /&gt;
define_param_set $param_set&lt;br /&gt;
echo &amp;quot;param_set: ${param_set[*]}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch &amp;quot;${param_set[@]}&amp;quot; &amp;quot;$jobscript&amp;quot; | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# @param: slurm_out, the filename for the slurm output file&lt;br /&gt;
if  [ &amp;quot;$slurm_out&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    slurm_out=slurm-${job_id}.out&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file to appear&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the JupyterLab URL is written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Afterwards, you can either&amp;quot;&lt;br /&gt;
echo &amp;quot;- use the kernel in VSCode or &amp;quot;&lt;br /&gt;
echo &amp;quot;- open JupyterLab with this URL: &amp;quot;&lt;br /&gt;
echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
#rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15976</id>
		<title>Development/VS Code</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15976"/>
		<updated>2026-04-21T07:31:33Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Overview ==&lt;br /&gt;
&lt;br /&gt;
[[File:vscode.png|thumb|Visual Studio Code, Source: https://code.visualstudio.com/|450px]]&lt;br /&gt;
&lt;br /&gt;
[https://github.com/Microsoft/vscode Visual Studio Code] (VS Code) is an open-source code editor from Microsoft. It has become one of the most popular IDEs according to a [https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment Stack Overflow survey]. The functionality of VS Code can easily be extended by installing extensions, which provide almost arbitrary &#039;&#039;&#039;language support&#039;&#039;&#039;, &#039;&#039;&#039;debugging&#039;&#039;&#039; or &#039;&#039;&#039;remote development&#039;&#039;&#039; features. You can install VS Code locally and use it for remote development.&lt;br /&gt;
&lt;br /&gt;
== Visual Studio Code  ==&lt;br /&gt;
Visual Studio Code (VS Code) is a lightweight, extensible code editor from Microsoft that supports many programming languages and features such as debugging and integrated Git. It offers a rich extension marketplace to add language support, themes, and tools tailored to your workflow. VS Code runs on Windows, macOS, and Linux and is popular for its speed, customizability, and strong community ecosystem.&lt;br /&gt;
&lt;br /&gt;
=== Using AI agents ===&lt;br /&gt;
When deploying AI agents on the bwHPC clusters, users must exercise extreme caution and maintain full oversight of the agent&#039;s activities. You are fully responsible for all actions initiated by an agent, including any security breaches or system disruptions it may cause. It is mandatory to strictly monitor resource usage on login nodes, as these are shared resources intended only for lightweight tasks. Any agent found consuming excessive CPU or memory on a login node will be terminated immediately to ensure stability for other users. To comply with usage policies, all AI-driven workloads which generate heavy load must be submitted to the Slurm batch queues rather than running directly on the login nodes.&lt;br /&gt;
&lt;br /&gt;
=== Extension: Remote-SSH ===&lt;br /&gt;
&lt;br /&gt;
In order to remotely develop and debug code at HPC facilities, you can use the [https://code.visualstudio.com/docs/remote/ssh &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; extension]. The extension connects your locally installed VS Code to the remote servers. In contrast to running a graphical IDE within a remote desktop session (RDP, VNC), there are no drawbacks such as laggy input response or blurry font rendering.&lt;br /&gt;
&lt;br /&gt;
==== Installation and Configuration ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-extensions-button.png|vscode-extensions-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to install the Remote - SSH extension, just click the Extensions button in the left side bar and enter “remote ssh” in the search field. Choose &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; from the list that appears and click &#039;&#039;&#039;Install&#039;&#039;&#039;.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to configure remote connections, open the Remote Explorer. On Linux systems, the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; is evaluated automatically, and the targets defined in it appear in the left side bar.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-add.png|vscode-remoteexplorer-add.png|350px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If no remote SSH targets are defined in this file, you can easily add one by clicking the + symbol. Make sure that “SSH Targets” is selected in the drop-down menu of the Remote Explorer. Enter the connection details &amp;lt;code&amp;gt;&amp;amp;lt;user&amp;amp;gt;@&amp;amp;lt;server&amp;amp;gt;&amp;lt;/code&amp;gt;. You will be asked whether the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; should be modified or whether another config file should be used or created.&lt;br /&gt;
&lt;br /&gt;
A minimal entry in &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; that makes a remote target appear there could look like this:&lt;br /&gt;
&lt;br /&gt;
 $ cat ~/.ssh/config&lt;br /&gt;
 Host uc3.scc.kit.edu&lt;br /&gt;
   HostName uc3.scc.kit.edu&lt;br /&gt;
   User xy_ab1234&lt;br /&gt;
&lt;br /&gt;
==== Connect to Login Nodes ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to connect to a remote SSH target, open the Remote-Explorer. Right-click a target and connect in the current or a new window. TOTP and password can be entered in the corresponding input fields that open.&lt;br /&gt;
&lt;br /&gt;
You are now logged in on the remote server. As usual, you can open a project directory with the standard key binding Ctrl+k Ctrl+o. You can now edit and debug code.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention&#039;&#039;&#039;: Please remember that you are running and debugging the code on a login node. Do not perform resource-intensive tasks. Furthermore, no GPU resources are available to you.&lt;br /&gt;
&lt;br /&gt;
Extensions that are installed locally are only usable on your local machine and are not automatically installed on the remote side. However, as soon as you open the Extensions view during a remote session, VS Code offers to install your local extensions remotely.&lt;br /&gt;
&lt;br /&gt;
==== Disconnect from Login Nodes ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-indicator.png|images/vscode-remoteexplorer-indicator.png|200px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to end your remote session, click the green box in the lower left corner. In the input box that opens, select the “Close Remote Connection” option. If you simply close your VS Code window, some server-side components of VS Code will continue to run remotely.&lt;br /&gt;
&lt;br /&gt;
=== Access to Compute Nodes ===&lt;br /&gt;
&lt;br /&gt;
The workflow described above does not allow debugging on compute nodes that have been requested via an interactive Slurm job, for example. Debugging GPU codes is therefore also not possible, since this kind of resource is only accessible within Slurm jobs.&lt;br /&gt;
We strongly discourage using the Code Tunnel application, as it violates our access policies. In this scenario, an application running on the compute node connects to a Microsoft or GitHub server. The locally running VS Code then connects to the compute nodes via these external servers, thereby bypassing the login nodes.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Access the compute nodes via VS Code and the Remote-SSH plugin is only possible, if you start a temporarily running SSH service on the compute node which listens to an unprivileged port. By tunneling this port to your local computer, you can connect VS code to it.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Code-Server ==&lt;br /&gt;
&lt;br /&gt;
The application [https://github.com/cdr/code-server code-server] allows you to run the server part of VS Code on any machine; it can then be accessed from a web browser on your local computer. This enables, for example, development and debugging on compute nodes.&lt;br /&gt;
Code-server runs a web server listening on an unprivileged port. In order to connect your web browser to the remotely running code-server, you have to forward this port via an SSH tunnel.&lt;br /&gt;
&lt;br /&gt;
[[File:code-server.png|thumb|code-server.png|VS Code in web browser: code-server, Source: https://github.com/cdr/code-server&amp;quot;&amp;gt;https://github.com/cdr/code-server|400px]]&lt;br /&gt;
&lt;br /&gt;
=== Install Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server is pre-installed on bwUniCluster and accessible via an Lmod module:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load devel/code-server&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On clusters with no code-server module, the application can easily be installed with the description available on the official [https://github.com/coder/code-server GitHub page].&lt;br /&gt;
&lt;br /&gt;
=== Start Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server can be run on either login nodes or compute nodes. In the example shown, an interactive job is started on a GPU partition to run code-server there.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ salloc -p accelerated --gres=gpu:4 --time=30:00 # Start interactive job with 4 GPUs&lt;br /&gt;
$ module load devel/code-server                   # Load code-server module&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
When code-server is started, it opens a web server listening on a certain port. The user has to &#039;&#039;&#039;specify the port&#039;&#039;&#039;. It can be chosen freely in the unprivileged range (above 1024). If a port is already assigned, e.g. because several users choose the same port, another port must be chosen.&lt;br /&gt;
&lt;br /&gt;
By starting code-server, you are running a web server that can be accessed by anyone logged in to the cluster. To prevent other people from gaining access to your account and data, this web server is &#039;&#039;&#039;password protected&#039;&#039;&#039;. If no variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; is defined, the password in the default config file &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; is used. If you want to define your own password, you can either change it in the config file or export the variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ PASSWORD=&amp;lt;mySecret&amp;gt; \&lt;br /&gt;
    code-server \&lt;br /&gt;
      --bind-addr 0.0.0.0:8081 \&lt;br /&gt;
      --auth password  # Start code-server on port 8081&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
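For reference, the default config file generated by code-server on first start looks roughly like this (keys as documented by code-server; the generated password will differ):&lt;br /&gt;

```yaml
# ~/.config/code-server/config.yaml (sketch of the generated defaults)
bind-addr: 127.0.0.1:8080
auth: password
password: some-randomly-generated-string  # change this line to set your own password
cert: false
```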
&lt;br /&gt;
{| style=&amp;quot;background:#FFCCCC; width:100%;&amp;quot;&lt;br /&gt;
| &#039;&#039;&#039;Security implications&#039;&#039;&#039;&lt;br /&gt;
Please note that by starting &amp;lt;code&amp;gt;code-server&amp;lt;/code&amp;gt; you are running a web server that can be accessed by everyone logged in on the cluster.&amp;lt;br&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;If password protection is disabled, anybody can access your account and your data.&#039;&#039;&#039;&lt;br /&gt;
* Choose a &#039;&#039;&#039;secure password&#039;&#039;&#039;!&lt;br /&gt;
* Do &#039;&#039;&#039;NOT&#039;&#039;&#039; use &amp;lt;code&amp;gt;code-server --link&amp;lt;/code&amp;gt;!&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Connect to code-server ===&lt;br /&gt;
[[File:code-server-hk.png|thumb|Code-server running on GPU node.|400px]]&lt;br /&gt;
&lt;br /&gt;
As soon as code-server is running, it can be accessed in the web browser. In order to establish the connection, a SSH tunnel from your local computer to the remote server has to be created via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ ssh -L 8081:&amp;lt;computeNodeID&amp;gt;:8081 &amp;lt;userID&amp;gt;@uc3.scc.kit.edu&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
You need to enter the &amp;lt;code&amp;gt;computeNodeID&amp;lt;/code&amp;gt; of the node on which the interactive Slurm job is running. If you have started code-server on a login node, just enter &amp;lt;code&amp;gt;localhost&amp;lt;/code&amp;gt;. Now you can open http://127.0.0.1:8081 in your web browser. You may have to allow your browser to open an insecure (non-https) site. The login page looks as follows:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:code-server-login.png|Code-server login page.|300px]]&lt;br /&gt;
&lt;br /&gt;
Enter the password from &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; or from the &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; variable. After clicking the “Submit” button, the familiar VS Code interface will open in your browser.&lt;br /&gt;
&lt;br /&gt;
=== End code-server session ===&lt;br /&gt;
&lt;br /&gt;
If you want to temporarily log out from your code-server session, open the “Application Menu” in the left side bar and click on “Log out”. To &#039;&#039;&#039;terminate&#039;&#039;&#039; the code-server session, cancel the interactive Slurm job by pressing Ctrl+C.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== Connect to Remote Jupyter Kernel ==&lt;br /&gt;
To work with your Python scripts and notebooks within VS Code while using the resources of a compute node, you can create a batch job that launches JupyterLab and connect to it via VS Code. To do so, please follow the instructions below. Any parts of the scripts that might need adjustment are marked with the keyword &amp;quot;@param&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=== Simple Use Case ===&lt;br /&gt;
The most basic steps are to set a password for JupyterLab, start a job which runs JupyterLab, get the connection details from the output log and connect to it locally. The following instructions explain these steps and provide an additional script that replaces the manual step of looking into the output file.&lt;br /&gt;
&lt;br /&gt;
# Load a python module and set a password on the cluster for JupyterLab:&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    module load devel/miniforge&lt;br /&gt;
    jupyter notebook --generate-config&lt;br /&gt;
    jupyter notebook password&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Define a batch script to start a JupyterLab Job. Please adjust the first part according to your needs and your specific cluster.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;~/jupyterlab.slurm&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition=cpu-single&lt;br /&gt;
#SBATCH --job-name=jupyterlab&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task 1&lt;br /&gt;
#SBATCH --mail-user=my_email_address  # @param: replace my_email_address with your email address&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# @param: change this to your preferred python or conda module&lt;br /&gt;
module load devel/miniforge&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
PORT=$(( ( RANDOM % 9999 ) + 1024 ))&lt;br /&gt;
# Print the tunnel command before launching JupyterLab, which blocks until the job ends&lt;br /&gt;
HOSTID=${SLURMD_NODENAME}&lt;br /&gt;
echo &amp;quot;Connect&amp;quot;&lt;br /&gt;
echo &amp;quot;ssh -N -L ${PORT}:${HOSTID}:${PORT} ${USER}@${hostAddress}&amp;quot;&lt;br /&gt;
echo &amp;quot;Job ${SLURM_JOB_ID} running on host ${HOSTID}.&amp;quot;&lt;br /&gt;
jupyter lab --no-browser --ip=0.0.0.0 --port=${PORT}&lt;br /&gt;
&lt;br /&gt;
returned_code=$?&lt;br /&gt;
echo &amp;quot;&amp;gt; Script completed with exit code ${returned_code}&amp;quot;&lt;br /&gt;
exit ${returned_code}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Run a wrapper script to execute the batch script and extract needed information from the slurm output file. You could save it together with other utility scripts in a &amp;quot;bin&amp;quot; directory in your home folder.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;./bin/run_jupyterlab_simple.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# Define parameters&lt;br /&gt;
jobscript=~/jupyterlab.slurm&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Outfile name&lt;br /&gt;
slurm_out=slurm-${job_id}.out&lt;br /&gt;
&lt;br /&gt;
# Wait for output file&lt;br /&gt;
while [ ! -f $slurm_out ]; do   &lt;br /&gt;
    sleep 2; &lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until url is written in output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output. The pattern assumes a node name with a length of 6 characters and a port with a length of 3, 4 or 5 numbers.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can use the URL&amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;to:&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode (&#039;Existing Jupyter Server...&#039;, enter URL, enter password, confirm &#039;127.0.0.1&#039;, choose kernel) or &amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab in your browser with the URL&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
# rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Follow the instructions on the commandline to connect to the Jupyter kernel from your local machine or the Helix login node. More detailed instructions can be found below. &lt;br /&gt;
&lt;br /&gt;
==== Connect to a running job ====&lt;br /&gt;
&lt;br /&gt;
The job runs on a specific compute node and port. With this information, you can create an SSH connection to it. But first you need to decide how you want to work with your Python code. The options are: &lt;br /&gt;
&lt;br /&gt;
# The code is placed locally on your computer. &lt;br /&gt;
# The code is placed on the cluster and you&#039;ve mounted the folder locally. (= The files on the cluster are accessible from within your local VS Code)&lt;br /&gt;
# The code is placed on the cluster and you work on the cluster via a remote connection in VS Code. &lt;br /&gt;
&lt;br /&gt;
Depending on the use case, you need to execute the ssh command in a different place: &lt;br /&gt;
&lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer and connect to the cluster.&lt;br /&gt;
&lt;br /&gt;
Then open a terminal and execute the ssh command, which is given in the commandline output of the wrapper script. If the terminal isn&#039;t already open, go to menu item &amp;quot;Terminal&amp;quot; at the top of the window and choose &amp;quot;New Terminal&amp;quot; (or &amp;quot;new -&amp;gt; command prompt&amp;quot; on Windows). &lt;br /&gt;
It is normal that the command doesn&#039;t end after you&#039;ve put in your credentials. Leave the terminal open and go on with the next step. &lt;br /&gt;
&lt;br /&gt;
To use the Jupyter kernel that is running on the cluster node, you need to connect to this kernel. This is similar to connecting any other kernel: &lt;br /&gt;
&lt;br /&gt;
# Open your code file.&lt;br /&gt;
# Click &amp;quot;Select Kernel&amp;quot; in the upper right corner. &lt;br /&gt;
# Choose &amp;quot;Existing Jupyter Server...&amp;quot;.&lt;br /&gt;
# Enter the URL that was given by the wrapper script. &lt;br /&gt;
# Enter your JupyterLab password that you set in the first step of these instructions.&lt;br /&gt;
# Confirm the prefilled value &amp;quot;127.0.0.1&amp;quot; by pressing Enter.&lt;br /&gt;
# Choose one of the virtual environments that you&#039;ve created on the cluster. You should see all python environments. To see the conda environments as well, you need to [[Helix/bwVisu/JupyterLab#Python_version | register them as ipykernel]] first. &lt;br /&gt;
&lt;br /&gt;
=== Complex Use Case ===&lt;br /&gt;
If you have different use cases for JupyterLab, you could use a more flexible wrapper script, for example: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;./bin/run_jupyterlab.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Starts a jupyter kernel on a node and provides information on how to connect to it locally.&lt;br /&gt;
# If you have only one use case and therefore need only one combination of slurm settings for your jupyter jobs, then you can use the simpler script.&lt;br /&gt;
# This script supports explorative analyses by allowing to overwrite parameters via commandline.&lt;br /&gt;
# Different job configurations can be defined in advance and then used with a given short name (cpu, gpu,...).&lt;br /&gt;
&lt;br /&gt;
programname=$0&lt;br /&gt;
function help {&lt;br /&gt;
    # Print the help text&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Starts a jupyterlab kernel&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;usage example: $programname --param_set cpu&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --param_set string   name of the parameter set&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (examples: cpu, gpu)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --jobscript string   optional, path of batch script&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: ~/jupyterlab.slurm)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --slurm_out string   optional, name of slurm output file&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: slurm-JOBID.out)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# These parameters are set later in the script. Providing them via the command line overwrites the values set in the script.&lt;br /&gt;
jobscript=None&lt;br /&gt;
slurm_out=None&lt;br /&gt;
&lt;br /&gt;
# Process parameters&lt;br /&gt;
while [ $# -gt 0 ]; do&lt;br /&gt;
    if [[ $1 == &amp;quot;--help&amp;quot; ]]; then&lt;br /&gt;
        help&lt;br /&gt;
        exit 0&lt;br /&gt;
    # when given -p as parameter, use its value for the variable param_set&lt;br /&gt;
    elif [[ $1 == &amp;quot;-p&amp;quot; ]]; then&lt;br /&gt;
        param_set=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    elif [[ $1 == &amp;quot;--&amp;quot;* ]]; then&lt;br /&gt;
        v=&amp;quot;${1/--/}&amp;quot;&lt;br /&gt;
        declare &amp;quot;$v&amp;quot;=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    fi&lt;br /&gt;
    shift&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
function define_param_set(){&lt;br /&gt;
    # Define parameter sets for sbatch&lt;br /&gt;
    # Define different sets&lt;br /&gt;
    cpu=(--partition=cpu-single --mem=2gb)&lt;br /&gt;
    gpu=(--partition=gpu-single --mem=3gb --gres=gpu:1)&lt;br /&gt;
&lt;br /&gt;
    param_set=${1}&lt;br /&gt;
    param_set=$param_set[@] &lt;br /&gt;
    param_set=(&amp;quot;${!param_set}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    # Add params that are the same for all sets&lt;br /&gt;
    param_set+=(--ntasks=1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# @param: jobscript, name of the slurm batch script to execute&lt;br /&gt;
if  [ &amp;quot;$jobscript&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    jobscript=~/jupyterlab.slurm&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Translate given param_set value to actual set of parameters &lt;br /&gt;
define_param_set $param_set&lt;br /&gt;
echo &amp;quot;param_set: ${param_set[*]}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch ${param_set[@]} $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# @param: slurm_out, the filename for the slurm output file&lt;br /&gt;
if  [ &amp;quot;$slurm_out&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    slurm_out=slurm-${job_id}.out&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# Wait for output file&lt;br /&gt;
while [ ! -f $slurm_out ]; do   &lt;br /&gt;
    sleep 1; &lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until url is written in output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Afterwards, you can either&amp;quot;&lt;br /&gt;
echo &amp;quot;- use the kernel in VSCode or &amp;quot;&lt;br /&gt;
echo &amp;quot;- open JupyterLab with this URL: &amp;quot;&lt;br /&gt;
echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
#rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15975</id>
		<title>Development/VS Code</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15975"/>
		<updated>2026-04-21T07:12:07Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Overview ==&lt;br /&gt;
&lt;br /&gt;
[[File:vscode.png|thumb|Visual Studio Code, Source: https://code.visualstudio.com/|450px]]&lt;br /&gt;
&lt;br /&gt;
[https://github.com/Microsoft/vscode Visual Studio Code] (VS Code) is a source-code editor from Microsoft whose core is developed as open source. It has become one of the most popular IDEs according to a [https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment Stack Overflow survey]. The functionality of VS Code can easily be extended by installing extensions. These extensions allow for almost arbitrary &#039;&#039;&#039;language support&#039;&#039;&#039;, &#039;&#039;&#039;debugging&#039;&#039;&#039; or &#039;&#039;&#039;remote development&#039;&#039;&#039;. You can install VS Code locally and use it for remote development.&lt;br /&gt;
&lt;br /&gt;
== Visual Studio Code  ==&lt;br /&gt;
Visual Studio Code (VS Code) is a lightweight, extensible code editor from Microsoft that supports many programming languages and features like IntelliSense, debugging, and integrated Git. It offers a rich extension marketplace to add language support, themes, and tools tailored to your workflow. VS Code runs on Windows, macOS, and Linux and is popular for its speed, customizability, and strong community ecosystem.&lt;br /&gt;
&lt;br /&gt;
=== Using AI agents ===&lt;br /&gt;
When deploying AI agents on the bwHPC clusters, users must exercise extreme caution and maintain full oversight of the agent&#039;s activities. You are fully responsible for all actions initiated by an agent, including any security breaches or system disruptions it may cause. Strictly monitor resource usage on login nodes, as these are shared resources intended only for lightweight tasks. Any agent found consuming excessive CPU or memory on a login node will be terminated immediately to ensure stability for other users. To comply with usage policies, all AI-driven workloads that generate heavy load must be submitted to the Slurm batch queues rather than run directly on the login nodes.&lt;br /&gt;
&lt;br /&gt;
=== Extension: Remote-SSH ===&lt;br /&gt;
&lt;br /&gt;
In order to remotely develop and debug code at HPC facilities, you can use the [https://code.visualstudio.com/docs/remote/ssh &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; extension]. The extension connects your locally installed VS Code to the remote servers. In contrast to using graphical IDEs within a remote desktop session (RDP, VNC), there are no drawbacks such as laggy reactions to your input or blurred rendering of fonts.&lt;br /&gt;
&lt;br /&gt;
==== Installation and Configuration ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-extensions-button.png|vscode-extensions-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to install the Remote - SSH extension, click on the Extensions button in the left side bar and enter “remote ssh” in the search field. Choose &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; from the list that appears and click on &#039;&#039;&#039;Install&#039;&#039;&#039;.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to configure remote connections, open the Remote-Explorer extension. On Linux systems, the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; is automatically evaluated. The targets within this file already appear in the left side bar.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-add.png|vscode-remoteexplorer-add.png|350px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If there are no remote SSH targets defined within this file, you can easily add one by clicking on the + symbol. Make sure that “SSH Targets” is selected in the drop-down menu of the Remote-Explorer. Enter the connection details &amp;lt;code&amp;gt;&amp;amp;lt;user&amp;amp;gt;@&amp;amp;lt;server&amp;amp;gt;&amp;lt;/code&amp;gt;. You will be asked whether the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; should be modified or whether another config file should be used or created.&lt;br /&gt;
&lt;br /&gt;
A minimal entry within &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; that makes a remote target appear there could look like this:&lt;br /&gt;
&lt;br /&gt;
 $ cat ~/.ssh/config&lt;br /&gt;
 Host uc3.scc.kit.edu&lt;br /&gt;
   HostName uc3.scc.kit.edu&lt;br /&gt;
   User xy_ab1234&lt;br /&gt;
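Standard OpenSSH options, for example a dedicated key file or keep-alive settings, can be added to the same entry (the values below are examples):&lt;br /&gt;

```text
Host uc3.scc.kit.edu
   HostName uc3.scc.kit.edu
   User xy_ab1234
   IdentityFile ~/.ssh/id_ed25519
   ServerAliveInterval 60
   ServerAliveCountMax 5
```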
&lt;br /&gt;
==== Connect to Login Nodes ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to connect to a remote SSH target, open the Remote-Explorer. Right-click a target and connect in the current or a new window. TOTP and password can be entered in the corresponding input fields that open.&lt;br /&gt;
&lt;br /&gt;
You are now logged in on the remote server. As usual, you can open a project directory with the standard key binding Ctrl+K Ctrl+O. You can now edit and debug code.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention&#039;&#039;&#039;: Please remember that you are running and debugging the code on a login node. Do not perform resource-intensive tasks. Furthermore, no GPU resources are available to you.&lt;br /&gt;
&lt;br /&gt;
Extensions, which are installed locally, are only usable on your local machine and are not automatically installed remotely. However, as soon as you open the Extensions-Explorer during a remote session, VS Code proposes to install the locally installed extensions remotely.&lt;br /&gt;
&lt;br /&gt;
==== Disconnect from Login Nodes ====&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-indicator.png|vscode-remoteexplorer-indicator.png|200px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to end your remote session, click the green box in the lower left corner. In the input box that opens, select the “Close Remote Connection” option. If you simply close your VS Code window, some server-side components of VS Code will continue to run remotely.&lt;br /&gt;
&lt;br /&gt;
=== Access to Compute Nodes ===&lt;br /&gt;
&lt;br /&gt;
The workflow described above does not allow debugging on compute nodes that have been requested via an interactive Slurm job, for example. The security settings prevent the login node from being used as a proxy jump host, so there is no direct way to connect your locally installed VS Code to the compute nodes. Debugging GPU code is therefore also not possible, since this kind of resource is only accessible within Slurm jobs. Please have a look at the overview table in the first chapter to see which solution to follow.&lt;br /&gt;
&lt;br /&gt;
== Code-Server ==&lt;br /&gt;
&lt;br /&gt;
The application [https://github.com/cdr/code-server code-server] allows you to run the server part of VS Code on any machine; it can then be accessed in the web browser. This enables, for example, development and debugging on compute nodes.&lt;br /&gt;
Code-server runs a web server on an unprivileged port. In order to connect your web browser to the remotely running code-server, you have to forward this port via an SSH tunnel.&lt;br /&gt;
&lt;br /&gt;
[[File:code-server.png|thumb|VS Code in web browser: code-server, Source: https://github.com/cdr/code-server|400px]]&lt;br /&gt;
&lt;br /&gt;
=== Install Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server is pre-installed on bwUniCluster and accessible via an Lmod module:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load devel/code-server&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On clusters without a code-server module, the application can easily be installed following the instructions on the official [https://github.com/coder/code-server GitHub page].&lt;br /&gt;
&lt;br /&gt;
=== Start Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server can be run on either login nodes or compute nodes. In the example shown, an interactive job is started on a GPU partition to run code-server there.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ salloc -p accelerated --gres=gpu:4 --time=30:00 # Start interactive job with 4 GPUs&lt;br /&gt;
$ module load devel/code-server                   # Load code-server module&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
When code-server is started, it opens a web server listening on a certain port. The user has to &#039;&#039;&#039;specify the port&#039;&#039;&#039;. It can be chosen freely in the unprivileged range (above 1024). If a port is already assigned, e.g. because several users choose the same port, another port must be chosen.&lt;br /&gt;
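A free port can be picked at random before starting code-server; the following is a sketch, assuming the iproute2 tool &amp;lt;code&amp;gt;ss&amp;lt;/code&amp;gt; is available on the node:&lt;br /&gt;

```shell
# Pick a random port in the unprivileged range (1024 and above).
PORT=$(( ( RANDOM % 64511 ) + 1024 ))
# Sketch: check whether something already listens on that port (assumes "ss").
if ss -Htln | grep -q ":${PORT} "; then
  echo "port ${PORT} is already in use, pick another one"
else
  echo "candidate port: ${PORT}"
fi
```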
&lt;br /&gt;
By starting code-server, you are running a web server that can be accessed by anyone logged in to the cluster. To prevent other people from gaining access to your account and data, this web server is &#039;&#039;&#039;password protected&#039;&#039;&#039;. If no variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; is defined, the password in the default config file &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; is used. If you want to define your own password, you can either change it in the config file or export the variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ PASSWORD=&amp;lt;mySecret&amp;gt; \&lt;br /&gt;
    code-server \&lt;br /&gt;
      --bind-addr 0.0.0.0:8081 \&lt;br /&gt;
      --auth password  # Start code-server on port 8081&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
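For reference, the default config file generated by code-server on first start looks roughly like this (keys as documented by code-server; the generated password will differ):&lt;br /&gt;

```yaml
# ~/.config/code-server/config.yaml (sketch of the generated defaults)
bind-addr: 127.0.0.1:8080
auth: password
password: some-randomly-generated-string  # change this line to set your own password
cert: false
```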
&lt;br /&gt;
{| style=&amp;quot;background:#FFCCCC; width:100%;&amp;quot;&lt;br /&gt;
| &#039;&#039;&#039;Security implications&#039;&#039;&#039;&lt;br /&gt;
Please note that by starting &amp;lt;code&amp;gt;code-server&amp;lt;/code&amp;gt; you are running a web server that can be accessed by everyone logged in on the cluster.&amp;lt;br&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;If password protection is disabled, anybody can access your account and your data.&#039;&#039;&#039;&lt;br /&gt;
* Choose a &#039;&#039;&#039;secure password&#039;&#039;&#039;!&lt;br /&gt;
* Do &#039;&#039;&#039;NOT&#039;&#039;&#039; use &amp;lt;code&amp;gt;code-server --link&amp;lt;/code&amp;gt;!&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Connect to code-server ===&lt;br /&gt;
[[File:code-server-hk.png|thumb|Code-server running on GPU node.|400px]]&lt;br /&gt;
&lt;br /&gt;
As soon as code-server is running, it can be accessed in the web browser. In order to establish the connection, a SSH tunnel from your local computer to the remote server has to be created via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ ssh -L 8081:&amp;lt;computeNodeID&amp;gt;:8081 &amp;lt;userID&amp;gt;@uc3.scc.kit.edu&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
You need to enter the &amp;lt;code&amp;gt;computeNodeID&amp;lt;/code&amp;gt; of the node on which the interactive Slurm job is running. If you have started code-server on a login node, just enter &amp;lt;code&amp;gt;localhost&amp;lt;/code&amp;gt;. Now you can open http://127.0.0.1:8081 in your web browser. You may have to allow your browser to open an insecure (non-https) site. The login page looks as follows:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:code-server-login.png|Code-server login page.|300px]]&lt;br /&gt;
&lt;br /&gt;
Enter the password from &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; or from the &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; variable. After clicking the “Submit” button, the familiar VS Code interface will open in your browser.&lt;br /&gt;
&lt;br /&gt;
=== End code-server session ===&lt;br /&gt;
&lt;br /&gt;
If you want to temporarily log out from your code-server session, open the “Application Menu” in the left side bar and click on “Log out”. To &#039;&#039;&#039;terminate&#039;&#039;&#039; the code-server session, cancel the interactive Slurm job by pressing Ctrl+C.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== Connect to Remote Jupyter Kernel ==&lt;br /&gt;
To work with your Python scripts and notebooks within VS Code while using the resources of a compute node, you can create a batch job that launches JupyterLab and connect to it via VS Code. To do so, please follow the instructions below. Any parts of the scripts that might need adjustment are marked with the keyword &amp;quot;@param&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=== Simple Use Case ===&lt;br /&gt;
The most basic steps are to set a password for JupyterLab, start a job which runs JupyterLab, get the connection details from the output log and connect to it locally. The following instructions explain these steps and provide an additional script that replaces the manual step of looking into the output file.&lt;br /&gt;
&lt;br /&gt;
# Load a python module and set a password on the cluster for JupyterLab:&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    module load devel/miniforge&lt;br /&gt;
    jupyter notebook --generate-config&lt;br /&gt;
    jupyter notebook password&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Define a batch script to start a JupyterLab Job. Please adjust the first part according to your needs and your specific cluster.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;~/jupyterlab.slurm&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition=cpu-single&lt;br /&gt;
#SBATCH --job-name=jupyterlab&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task 1&lt;br /&gt;
#SBATCH --mail-user=my_email_address  # @param: replace my_email_address with your email address&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# @param: change this to your preferred python or conda module&lt;br /&gt;
module load devel/miniforge&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
PORT=$(( ( RANDOM % 9999 ) + 1024 ))&lt;br /&gt;
# Print the tunnel command before launching JupyterLab, which blocks until the job ends&lt;br /&gt;
HOSTID=${SLURMD_NODENAME}&lt;br /&gt;
echo &amp;quot;Connect&amp;quot;&lt;br /&gt;
echo &amp;quot;ssh -N -L ${PORT}:${HOSTID}:${PORT} ${USER}@${hostAddress}&amp;quot;&lt;br /&gt;
echo &amp;quot;Job ${SLURM_JOB_ID} running on host ${HOSTID}.&amp;quot;&lt;br /&gt;
jupyter lab --no-browser --ip=0.0.0.0 --port=${PORT}&lt;br /&gt;
&lt;br /&gt;
returned_code=$?&lt;br /&gt;
echo &amp;quot;&amp;gt; Script completed with exit code ${returned_code}&amp;quot;&lt;br /&gt;
exit ${returned_code}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Run a wrapper script to execute the batch script and extract needed information from the slurm output file. You could save it together with other utility scripts in a &amp;quot;bin&amp;quot; directory in your home folder.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;./bin/run_jupyterlab_simple.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# Define parameters&lt;br /&gt;
jobscript=~/jupyterlab.slurm&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Outfile name&lt;br /&gt;
slurm_out=slurm-${job_id}.out&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 2&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the URL has been written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output. The pattern assumes a node name with a length of 6 characters and a port with a length of 3, 4 or 5 numbers.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can use the URL&amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;to:&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode (&#039;Existing Jupyter Server...&#039;, enter URL, enter password, confirm &#039;127.0.0.1&#039;, choose kernel) or &amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab in your browser with the URL&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
# rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Follow the instructions on the commandline to connect to the Jupyter kernel from your local machine or the Helix login node. More detailed instructions can be found below. &lt;br /&gt;
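The regular expression used in the wrapper can also be tried out on its own. The following sketch feeds it a sample line of the kind JupyterLab writes to the output file (the node name node01 and the port 8888 are made-up values) and prints the extracted host and port:&lt;br /&gt;

```shell
# Standalone demo of the URL extraction used in the wrapper script above.
# "node01" and "8888" are made-up sample values.
url='http://node01:8888/lab?token=abc'
url_pattern='http://([a-z0-9]{6}):([0-9]{3,5})/lab'
if [[ $url =~ $url_pattern ]]; then
    hostID=${BASH_REMATCH[1]}    # first capture group: node name
    port=${BASH_REMATCH[2]}      # second capture group: port
    echo "host=$hostID port=$port"   # prints: host=node01 port=8888
fi
```

If the pattern does not match your cluster&#039;s node names (e.g. names longer than 6 characters), adjust the {6} quantifier accordingly.&lt;br /&gt;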
&lt;br /&gt;
==== Connect to a running job ====&lt;br /&gt;
&lt;br /&gt;
The job runs on a specific compute node and port. With this information, you can create an SSH connection to it. But first you need to decide how you want to work with your Python code. The options are: &lt;br /&gt;
&lt;br /&gt;
# The code is placed locally on your computer. &lt;br /&gt;
# The code is placed on the cluster and you&#039;ve mounted the folder locally (i.e. the files on the cluster are accessible from within your local VS Code).&lt;br /&gt;
# The code is placed on the cluster and you work on the cluster via a remote connection in VS Code. &lt;br /&gt;
&lt;br /&gt;
Depending on the use case, you need to execute the ssh command in a different place: &lt;br /&gt;
&lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer and connect to the cluster.&lt;br /&gt;
&lt;br /&gt;
Then open a terminal and execute the ssh command given in the commandline output of the wrapper script. If no terminal is open yet, go to the menu item &amp;quot;Terminal&amp;quot; at the top of the window and choose &amp;quot;New Terminal&amp;quot; (or &amp;quot;New&amp;quot; -&amp;gt; &amp;quot;Command Prompt&amp;quot; on Windows). &lt;br /&gt;
It is normal that the command doesn&#039;t terminate after you&#039;ve entered your credentials. Leave the terminal open and continue with the next step. &lt;br /&gt;
&lt;br /&gt;
To use the Jupyter kernel that is running on the cluster node, you need to connect to this kernel. This works like connecting any other kernel: &lt;br /&gt;
&lt;br /&gt;
# Open your code file.&lt;br /&gt;
# Click &amp;quot;Select Kernel&amp;quot; in the upper right corner. &lt;br /&gt;
# Choose &amp;quot;Existing Jupyter Server...&amp;quot;.&lt;br /&gt;
# Enter the URL that was given by the wrapper script. &lt;br /&gt;
# Enter your JupyterLab password that you set in the first step of these instructions.&lt;br /&gt;
# Confirm the prefilled value &amp;quot;127.0.0.1&amp;quot; by pressing Enter.&lt;br /&gt;
# Choose one of the virtual environments that you&#039;ve created on the cluster. You should see all python environments. To see the conda environments as well, you need to [[Helix/bwVisu/JupyterLab#Python_version | register them as ipykernel]] first. &lt;br /&gt;
&lt;br /&gt;
=== Complex Use Case ===&lt;br /&gt;
If you have different use cases for JupyterLab, you can use a more flexible wrapper script, for example: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;./bin/run_jupyterlab.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Starts a jupyter kernel on a node and provides information on how to connect to it locally.&lt;br /&gt;
# If you have only one use case and therefore need only one combination of slurm settings for your jupyter jobs, then you can use the simpler script.&lt;br /&gt;
# This script supports explorative analyses by allowing to overwrite parameters via commandline.&lt;br /&gt;
# Different job configurations can be defined in advance and then used with a given short name (cpu, gpu,...).&lt;br /&gt;
&lt;br /&gt;
programname=$0&lt;br /&gt;
function help {&lt;br /&gt;
    # Print usage information&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Starts a jupyterlab kernel&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;usage example: $programname --param_set cpu&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --param_set string   name of the parameter set&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (examples: cpu, gpu)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --jobscript string   optional, path of batch script&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: ~/jupyterlab.slurm)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --slurm_out string   optional, name of slurm output file&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: slurm-&amp;lt;job_id&amp;gt;.out)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# These parameters are set later in the script. Providing them via the commandline overwrites the values set in the script.&lt;br /&gt;
jobscript=None&lt;br /&gt;
slurm_out=None&lt;br /&gt;
&lt;br /&gt;
# Process parameters&lt;br /&gt;
while [ $# -gt 0 ]; do&lt;br /&gt;
    if [[ $1 == &amp;quot;--help&amp;quot; ]]; then&lt;br /&gt;
        help&lt;br /&gt;
        exit 0&lt;br /&gt;
    # when given -p as parameter, use its value for the variable param_set&lt;br /&gt;
    elif [[ $1 == &amp;quot;-p&amp;quot; ]]; then&lt;br /&gt;
        param_set=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    elif [[ $1 == &amp;quot;--&amp;quot;* ]]; then&lt;br /&gt;
        v=&amp;quot;${1/--/}&amp;quot;&lt;br /&gt;
        declare &amp;quot;$v&amp;quot;=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    fi&lt;br /&gt;
    shift&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
function define_param_set(){&lt;br /&gt;
    # Define parameter sets for sbatch&lt;br /&gt;
    # Define different sets&lt;br /&gt;
    cpu=(--partition=cpu-single --mem=2gb)&lt;br /&gt;
    gpu=(--partition=gpu-single --mem=3gb --gres=gpu:1)&lt;br /&gt;
&lt;br /&gt;
    # Resolve the set name (e.g. &amp;quot;cpu&amp;quot;) to the array of the same name via indirect expansion&lt;br /&gt;
    param_set=${1}&lt;br /&gt;
    param_set=$param_set[@]&lt;br /&gt;
    param_set=(&amp;quot;${!param_set}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    # Add params that are the same for all sets&lt;br /&gt;
    param_set+=(--ntasks=1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# @param: jobscript, name of the slurm batch script to execute&lt;br /&gt;
if  [ &amp;quot;$jobscript&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    jobscript=~/jupyterlab.slurm&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Translate given param_set value to actual set of parameters &lt;br /&gt;
define_param_set $param_set&lt;br /&gt;
echo &amp;quot;param_set: ${param_set[*]}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch &amp;quot;${param_set[@]}&amp;quot; $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# @param: slurm_out, the filename for the slurm output file&lt;br /&gt;
if  [ &amp;quot;$slurm_out&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    slurm_out=slurm-${job_id}.out&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the URL has been written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Afterwards, you can either&amp;quot;&lt;br /&gt;
echo &amp;quot;- use the kernel in VSCode or &amp;quot;&lt;br /&gt;
echo &amp;quot;- open JupyterLab with this URL: &amp;quot;&lt;br /&gt;
echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
#rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
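The generic double-dash option parsing with declare can be tested in isolation. This sketch (the argument values my.slurm and cpu are made up) shows how a pair like --jobscript my.slurm ends up in the shell variable jobscript:&lt;br /&gt;

```shell
# Standalone demo of the argument parsing used in the wrapper above.
# "my.slurm" and "cpu" are made-up sample values.
set -- --jobscript my.slurm -p cpu
while [ $# -gt 0 ]; do
    if [[ $1 == "-p" ]]; then
        param_set="$2"          # short option for param_set
        shift
    elif [[ $1 == "--"* ]]; then
        v="${1/--/}"            # strip the leading "--"
        declare "$v"="$2"       # create a variable named after the option
        shift
    fi
    shift
done
echo "jobscript=$jobscript param_set=$param_set"
```

Because declare creates the variable dynamically, any --name value pair works without being listed explicitly in the parser.&lt;br /&gt;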
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15974</id>
		<title>Development/VS Code</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15974"/>
		<updated>2026-04-21T07:04:43Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Overview ==&lt;br /&gt;
&lt;br /&gt;
[[File:vscode.png|thumb|Visual Studio Code, Source: https://code.visualstudio.com/|450px]]&lt;br /&gt;
&lt;br /&gt;
[https://github.com/Microsoft/vscode Visual Studio Code] (VS Code) is an open-source source-code editor from Microsoft. It has become one of the most popular IDEs according to a [https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment Stack Overflow survey]. The functionality of VS Code can easily be extended by installing extensions. These extensions allow for almost arbitrary &#039;&#039;&#039;language support&#039;&#039;&#039;, &#039;&#039;&#039;debugging&#039;&#039;&#039; or &#039;&#039;&#039;remote development&#039;&#039;&#039;. You can install VS Code locally and use it for remote development.&lt;br /&gt;
&lt;br /&gt;
== VS Code extension: Remote - SSH ==&lt;br /&gt;
&lt;br /&gt;
In order to remotely develop and debug code at HPC facilities, you can use the [https://code.visualstudio.com/docs/remote/ssh &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; extension]. The extension allows you to connect your locally installed VS Code to remote servers. In contrast to using graphical IDEs within a remote desktop session (RDP, VNC), there are no negative effects such as laggy reactions to your input or blurred fonts.&lt;br /&gt;
&lt;br /&gt;
=== Installation and Configuration ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-extensions-button.png|vscode-extensions-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to install the Remote - SSH extension, just click on the Extensions button in the left side bar and enter “remote ssh” in the search field. Choose &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; from the results list and click on &#039;&#039;&#039;Install&#039;&#039;&#039;.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to configure remote connections, open the Remote-Explorer extension. On Linux systems, the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; is automatically evaluated. The targets within this file already appear in the left side bar.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-add.png|vscode-remoteexplorer-add.png|350px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If there are no remote ssh targets defined within this file, you can easily add one by clicking on the + symbol. Make sure that “SSH Targets” is active in the drop-down menu of the Remote-Explorer. Enter the connection details &amp;lt;code&amp;gt;&amp;amp;lt;user&amp;amp;gt;@&amp;amp;lt;server&amp;amp;gt;&amp;lt;/code&amp;gt;. You will be asked whether the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; should be modified or whether another config file should be used or created.&lt;br /&gt;
&lt;br /&gt;
A minimal entry within &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; that makes a remote target appear in that list could look like this:&lt;br /&gt;
&lt;br /&gt;
 $ cat ~/.ssh/config&lt;br /&gt;
 Host uc3.scc.kit.edu&lt;br /&gt;
   HostName uc3.scc.kit.edu&lt;br /&gt;
   User xy_ab1234&lt;br /&gt;
&lt;br /&gt;
=== Connect to Login Nodes ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to connect to a remote SSH target, open the Remote-Explorer. Right-click a target and connect in the current or a new window. TOTP and password can be entered in the corresponding input fields that open.&lt;br /&gt;
&lt;br /&gt;
You are now logged in on the remote server. As usual, you can open a project directory with the standard key binding Ctrl+k Ctrl+o. You can now edit and debug code.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention&#039;&#039;&#039;: Please remember that you are running and debugging the code on a login node. Do not perform resource-intensive tasks. Furthermore, no GPU resources are available to you.&lt;br /&gt;
&lt;br /&gt;
Extensions that are installed locally are only usable on your local machine and are not automatically installed remotely. However, as soon as you open the Extensions-Explorer during a remote session, VS Code proposes to install the locally installed extensions remotely.&lt;br /&gt;
&lt;br /&gt;
=== Disconnect from Login Nodes ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-indicator.png|images/vscode-remoteexplorer-indicator.png|200px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to end your remote session, click the green box in the lower left corner. In the input box that opens, select the “Close Remote Connection” option. If you simply close your VS Code window, some server-side components of VS Code will continue to run remotely.&lt;br /&gt;
&lt;br /&gt;
=== Access to Compute Nodes ===&lt;br /&gt;
&lt;br /&gt;
The workflow described above does not allow debugging on compute nodes that have been requested via an interactive Slurm job, for example. The security settings prevent the login node from being used as a proxy jump host. So there is no direct way to connect your locally installed VS Code to the compute nodes. Debugging GPU codes is therefore also not possible, since this kind of resource is only accessible within Slurm jobs. Please have a look at the overview table in the first chapter to see which solution to follow.&lt;br /&gt;
&lt;br /&gt;
== Code-Server ==&lt;br /&gt;
&lt;br /&gt;
The application [https://github.com/cdr/code-server code-server] allows running the server part of VS Code on any machine; it can then be accessed in the web browser. This enables, for example, development and debugging on compute nodes.&lt;br /&gt;
Code-server runs a web server listening on an unprivileged port. In order to connect your web browser to the remotely running code-server, you have to forward this port via an SSH tunnel.&lt;br /&gt;
&lt;br /&gt;
[[File:code-server.png|thumb|VS Code in the web browser: code-server. Source: https://github.com/cdr/code-server|400px]]&lt;br /&gt;
&lt;br /&gt;
=== Install Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server is pre-installed on bwUniCluster and accessible via an Lmod module:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load devel/code-server&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
On clusters with no code-server module, the application can easily be installed with the description available on the official [https://github.com/coder/code-server GitHub page].&lt;br /&gt;
&lt;br /&gt;
=== Start Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server can be run on either login nodes or compute nodes. In the example shown, an interactive job is started on a GPU partition to run code-server there.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ salloc -p accelerated --gres=gpu:4 --time=30:00 # Start interactive job with 4 GPUs&lt;br /&gt;
$ module load devel/code-server                   # Load code-server module&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
When code-server is started, it opens a web server listening on a certain port. The user has to &#039;&#039;&#039;specify the port&#039;&#039;&#039;. It can be chosen freely in the unprivileged range (above 1024). If the port is already in use, e.g. because several users chose the same port, another port must be chosen.&lt;br /&gt;
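A port in that range can, for instance, be picked at random, in the same way the JupyterLab batch script elsewhere on this wiki does; collisions with other users are still possible, so the value may simply need to be regenerated:&lt;br /&gt;

```shell
# Pick a random port between 1024 and 11022 (within the unprivileged range).
# If code-server reports the port as already in use, just run this again.
PORT=$(( ( RANDOM % 9999 ) + 1024 ))
echo "chosen port: $PORT"
```

RANDOM % 9999 limits the result to ports up to 11022; any free unprivileged port up to 65535 would work just as well.&lt;br /&gt;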
&lt;br /&gt;
By starting code-server, you are running a web server that can be accessed by anyone logged in to the cluster. To prevent other people from gaining access to your account and data, this web server is &#039;&#039;&#039;password protected&#039;&#039;&#039;. If no variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; is defined, the password in the default config file &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; is used. If you want to define your own password, you can either change it in the config file or export the variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ PASSWORD=&amp;lt;mySecret&amp;gt; \&lt;br /&gt;
    code-server \&lt;br /&gt;
      --bind-addr 0.0.0.0:8081 \&lt;br /&gt;
      --auth password  # Start code-server on port 8081&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#FFCCCC; width:100%;&amp;quot;&lt;br /&gt;
| &#039;&#039;&#039;Security implications&#039;&#039;&#039;&lt;br /&gt;
Please note that by starting &amp;lt;code&amp;gt;code-server&amp;lt;/code&amp;gt; you are running a web server that can be accessed by everyone logged in on the cluster.&amp;lt;br&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;If password protection is disabled, anybody can access your account and your data.&#039;&#039;&#039;&lt;br /&gt;
* Choose a &#039;&#039;&#039;secure password&#039;&#039;&#039;!&lt;br /&gt;
* Do &#039;&#039;&#039;NOT&#039;&#039;&#039; use &amp;lt;code&amp;gt;code-server --link&amp;lt;/code&amp;gt;!&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Connect to code-server ===&lt;br /&gt;
[[File:code-server-hk.png|thumb|Code-server running on GPU node.|400px]]&lt;br /&gt;
&lt;br /&gt;
As soon as code-server is running, it can be accessed in the web browser. In order to establish the connection, an SSH tunnel from your local computer to the remote server has to be created via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ ssh -L 8081:&amp;lt;computeNodeID&amp;gt;:8081 &amp;lt;userID&amp;gt;@uc3.scc.kit.edu&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
You need to enter the &amp;lt;code&amp;gt;computeNodeID&amp;lt;/code&amp;gt; of the node on which the interactive Slurm job is running. If you have started code-server on a login node, just enter &amp;lt;code&amp;gt;localhost&amp;lt;/code&amp;gt;. Now you can open http://127.0.0.1:8081 in your web browser. You may have to allow your browser to open an insecure (non-https) site. The login site looks as follows:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:code-server-login.png|Code-server login page.|300px]]&lt;br /&gt;
&lt;br /&gt;
Enter the password from &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; or from the &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; variable. After clicking the “Submit” button, the familiar VS Code interface will open in your browser.&lt;br /&gt;
&lt;br /&gt;
=== End code-server session ===&lt;br /&gt;
&lt;br /&gt;
If you want to temporarily log out from your code-server session, you can open the “Application Menu” in the left side bar and click on “Log out”. To &#039;&#039;&#039;terminate&#039;&#039;&#039; the code-server session, cancel it in the interactive Slurm job by pressing Ctrl+C.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
== Connect to Remote Jupyter Kernel ==&lt;br /&gt;
To work with your Python scripts and notebooks within VS Code while using the resources of a compute node, you can create a batch job that launches JupyterLab and connect to it via VS Code. To do so, please follow the instructions below. Any parts of the scripts that might need adjustments are marked with the keyword &amp;quot;@param&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=== Simple Use Case ===&lt;br /&gt;
The most basic steps are to set a password for JupyterLab, start a job which runs JupyterLab, get the connection details from the output log and connect to it locally. The following instructions explain these steps and provide an additional script that replaces the manual step of looking into the output file.&lt;br /&gt;
&lt;br /&gt;
# Load a python module and set a password on the cluster for JupyterLab:&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    module load devel/miniforge&lt;br /&gt;
    jupyter notebook --generate-config&lt;br /&gt;
    jupyter notebook password&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Define a batch script to start a JupyterLab Job. Please adjust the first part according to your needs and your specific cluster.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;~/jupyterlab.slurm&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition=cpu-single&lt;br /&gt;
#SBATCH --job-name=jupyterlab&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task 1&lt;br /&gt;
#SBATCH --mail-user=&amp;lt;yourEmailAddress&amp;gt; # @param: enter your email address here&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# @param: change this to your preferred python or conda module&lt;br /&gt;
module load devel/miniforge&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Pick a random port in the unprivileged range&lt;br /&gt;
PORT=$(( ( RANDOM % 9999 ) + 1024 ))&lt;br /&gt;
# Name of the compute node this job is running on&lt;br /&gt;
HOSTID=$(hostname)&lt;br /&gt;
echo &amp;quot;Connect&amp;quot;&lt;br /&gt;
echo &amp;quot;ssh -N -L ${PORT}:${HOSTID}:${PORT} ${USER}@$hostAddress&amp;quot;&lt;br /&gt;
echo &amp;quot;Job ${SLURM_JOB_ID} running on host ${HOSTID}.&amp;quot;&lt;br /&gt;
# jupyter lab blocks until the job ends, so it is started after the connection details have been printed&lt;br /&gt;
jupyter lab --no-browser --ip=0.0.0.0 --port=${PORT}&lt;br /&gt;
&lt;br /&gt;
returned_code=$?&lt;br /&gt;
echo &amp;quot;&amp;gt; Script completed with exit code ${returned_code}&amp;quot;&lt;br /&gt;
exit ${returned_code}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Run a wrapper script to execute the batch script and extract needed information from the slurm output file. You could save it together with other utility scripts in a &amp;quot;bin&amp;quot; directory in your home folder.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;./bin/run_jupyterlab_simple.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# Define parameters&lt;br /&gt;
jobscript=~/jupyterlab.slurm&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Outfile name&lt;br /&gt;
slurm_out=slurm-${job_id}.out&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 2&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the URL has been written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output. The pattern assumes a node name with a length of 6 characters and a port with a length of 3, 4 or 5 numbers.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can use the URL&amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;to:&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode (&#039;Existing Jupyter Server...&#039;, enter URL, enter password, confirm &#039;127.0.0.1&#039;, choose kernel) or &amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab in your browser with the URL&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
# rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Follow the instructions on the commandline to connect to the Jupyter kernel from your local machine or the Helix login node. More detailed instructions can be found below. &lt;br /&gt;
&lt;br /&gt;
==== Connect to a running job ====&lt;br /&gt;
&lt;br /&gt;
The job runs on a specific compute node and port. With this information, you can create an SSH connection to it. But first you need to decide how you want to work with your Python code. The options are: &lt;br /&gt;
&lt;br /&gt;
# The code is placed locally on your computer. &lt;br /&gt;
# The code is placed on the cluster and you&#039;ve mounted the folder locally (i.e. the files on the cluster are accessible from within your local VS Code).&lt;br /&gt;
# The code is placed on the cluster and you work on the cluster via a remote connection in VS Code. &lt;br /&gt;
&lt;br /&gt;
Depending on the use case, you need to execute the ssh command in a different place: &lt;br /&gt;
&lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer and connect to the cluster.&lt;br /&gt;
&lt;br /&gt;
Then open a terminal and execute the ssh command given in the commandline output of the wrapper script. If no terminal is open yet, go to the menu item &amp;quot;Terminal&amp;quot; at the top of the window and choose &amp;quot;New Terminal&amp;quot; (or &amp;quot;New&amp;quot; -&amp;gt; &amp;quot;Command Prompt&amp;quot; on Windows). &lt;br /&gt;
It is normal that the command doesn&#039;t terminate after you&#039;ve entered your credentials. Leave the terminal open and continue with the next step. &lt;br /&gt;
&lt;br /&gt;
To use the Jupyter kernel that is running on the cluster node, you need to connect to this kernel. This works like connecting any other kernel: &lt;br /&gt;
&lt;br /&gt;
# Open your code file.&lt;br /&gt;
# Click &amp;quot;Select Kernel&amp;quot; in the upper right corner. &lt;br /&gt;
# Choose &amp;quot;Existing Jupyter Server...&amp;quot;.&lt;br /&gt;
# Enter the URL that was given by the wrapper script. &lt;br /&gt;
# Enter your JupyterLab password that you set in the first step of these instructions.&lt;br /&gt;
# Confirm the prefilled value &amp;quot;127.0.0.1&amp;quot; by pressing Enter.&lt;br /&gt;
# Choose one of the virtual environments that you&#039;ve created on the cluster. You should see all python environments. To see the conda environments as well, you need to [[Helix/bwVisu/JupyterLab#Python_version | register them as ipykernel]] first. &lt;br /&gt;
&lt;br /&gt;
=== Complex Use Case ===&lt;br /&gt;
If you have different use cases for JupyterLab, you can use a more flexible wrapper script, for example: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;./bin/run_jupyterlab.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Starts a jupyter kernel on a node and provides information on how to connect to it locally.&lt;br /&gt;
# If you have only one use case and therefore need only one combination of slurm settings for your jupyter jobs, then you can use the simpler script.&lt;br /&gt;
# This script supports explorative analyses by allowing to overwrite parameters via commandline.&lt;br /&gt;
# Different job configurations can be defined in advance and then used with a given short name (cpu, gpu,...).&lt;br /&gt;
&lt;br /&gt;
programname=$0&lt;br /&gt;
function help {&lt;br /&gt;
    # Print help text&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Starts a jupyterlab kernel&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;usage example: $programname --param_set cpu&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --param_set string   name of the parameter set&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (examples: cpu, gpu)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --jobscript string   optional, path of batch script&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: ~/jupyterlab.slurm)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --slurm_out string   optional, name of slurm output file&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: slurm-&amp;lt;job_id&amp;gt;.out)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# These parameters get their default values later in the script. Providing them on the command line overrides those defaults.&lt;br /&gt;
jobscript=None&lt;br /&gt;
slurm_out=None&lt;br /&gt;
&lt;br /&gt;
# Process parameters&lt;br /&gt;
while [ $# -gt 0 ]; do&lt;br /&gt;
    if [[ $1 == &amp;quot;--help&amp;quot; ]]; then&lt;br /&gt;
        help&lt;br /&gt;
        exit 0&lt;br /&gt;
    # when given -p as parameter, use its value for the variable param_set&lt;br /&gt;
    elif [[ $1 == &amp;quot;-p&amp;quot; ]]; then&lt;br /&gt;
        param_set=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    elif [[ $1 == &amp;quot;--&amp;quot;* ]]; then&lt;br /&gt;
        v=&amp;quot;${1/--/}&amp;quot;&lt;br /&gt;
        declare &amp;quot;$v&amp;quot;=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    fi&lt;br /&gt;
    shift&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
function define_param_set(){&lt;br /&gt;
    # Build the sbatch parameter array for the given set name&lt;br /&gt;
    # Define different sets&lt;br /&gt;
    cpu=(--partition=cpu-single --mem=2gb)&lt;br /&gt;
    gpu=(--partition=gpu-single --mem=3gb --gres=gpu:1)&lt;br /&gt;
&lt;br /&gt;
    param_set=${1}&lt;br /&gt;
    param_set=$param_set[@] &lt;br /&gt;
    param_set=(&amp;quot;${!param_set}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    # Add params that are the same for all sets&lt;br /&gt;
    param_set+=(--ntasks=1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# @param: jobscript, name of the slurm batch script to execute&lt;br /&gt;
if  [ &amp;quot;$jobscript&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    jobscript=~/jupyterlab.slurm&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Translate given param_set value to actual set of parameters &lt;br /&gt;
define_param_set $param_set&lt;br /&gt;
echo &amp;quot;param_set: ${param_set[*]}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch ${param_set[@]} $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# @param: slurm_out, the filename for the slurm output file&lt;br /&gt;
if  [ &amp;quot;$slurm_out&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    slurm_out=slurm-${job_id}.out&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file to appear&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the URL is written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then&lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can either&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode or&amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab with this URL: &amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
#rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
--&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15973</id>
		<title>Development/VS Code</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15973"/>
		<updated>2026-04-21T06:48:13Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Overview ==&lt;br /&gt;
&lt;br /&gt;
[[File:vscode.png|thumb|Visual Studio Code, Source: https://code.visualstudio.com/|450px]]&lt;br /&gt;
&lt;br /&gt;
[https://github.com/Microsoft/vscode Visual Studio Code] (VS Code) is an open-source code editor from Microsoft. It has become one of the most popular IDEs according to a [https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment Stack Overflow survey]. The functionality of VS Code can easily be extended by installing extensions, which provide almost arbitrary &#039;&#039;&#039;language support&#039;&#039;&#039;, &#039;&#039;&#039;debugging&#039;&#039;&#039; and &#039;&#039;&#039;remote development&#039;&#039;&#039; capabilities. You can install VS Code locally and use it for remote development.&lt;br /&gt;
&lt;br /&gt;
== VS Code extension: Remote - SSH ==&lt;br /&gt;
&lt;br /&gt;
In order to remotely develop and debug code at HPC facilities, you can use the [https://code.visualstudio.com/docs/remote/ssh &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; extension]. The extension connects your locally installed VS Code to remote servers. In contrast to running a graphical IDE inside a remote desktop session (RDP, VNC), there are no drawbacks such as laggy reactions to your input or blurred font rendering.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Installation and Configuration ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-extensions-button.png|vscode-extensions-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to install the Remote - SSH extension, just click on the Extensions button in the left side bar and enter “remote ssh” in the search field. Choose &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; from the results list and click on &#039;&#039;&#039;Install&#039;&#039;&#039;.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to configure remote connections, open the Remote Explorer. On Linux systems, the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; is evaluated automatically, and the targets defined in it appear in the left side bar.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-add.png|vscode-remoteexplorer-add.png|350px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If there are no remote ssh targets defined within this file, you can easily add one by clicking on the + symbol. Make sure that “SSH Targets” is active in the drop-down menu of the Remote Explorer. Enter the connection details &amp;lt;code&amp;gt;&amp;amp;lt;user&amp;amp;gt;@&amp;amp;lt;server&amp;amp;gt;&amp;lt;/code&amp;gt;. You will be asked whether the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; should be modified or whether another config file should be used or created.&lt;br /&gt;
&lt;br /&gt;
A minimal entry within &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; that makes a remote target appear there could look like this:&lt;br /&gt;
&lt;br /&gt;
 $ cat ~/.ssh/config&lt;br /&gt;
 Host uc3.scc.kit.edu&lt;br /&gt;
   HostName uc3.scc.kit.edu&lt;br /&gt;
   User xy_ab1234&lt;br /&gt;
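A slightly extended entry (a sketch; host and user are the same placeholders as above, and the keep-alive values are common choices rather than site requirements) can prevent idle connections from being dropped:

```
Host uc3.scc.kit.edu
  HostName uc3.scc.kit.edu
  User xy_ab1234
  ServerAliveInterval 60
  ServerAliveCountMax 3
```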
&lt;br /&gt;
=== Connect to Login Nodes ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to connect to a remote SSH target, open the Remote-Explorer. Right-click a target and connect in the current or a new window. TOTP and password can be entered in the corresponding input fields that open.&lt;br /&gt;
&lt;br /&gt;
You are now logged in on the remote server. As usual, you can open a project directory with the standard key binding Ctrl+k Ctrl+o. You can now edit and debug code.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention&#039;&#039;&#039;: Please remember that you are running and debugging the code on a login node. Do not perform resource-intensive tasks. Furthermore, no GPU resources are available to you.&lt;br /&gt;
&lt;br /&gt;
Extensions installed locally are only usable on your local machine and are not automatically installed remotely. However, as soon as you open the Extensions view during a remote session, VS Code offers to install your local extensions on the remote host.&lt;br /&gt;
&lt;br /&gt;
=== Disconnect from Login Nodes ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-indicator.png|images/vscode-remoteexplorer-indicator.png|200px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to end your remote session, click the green box in the lower left corner. In the input box that opens, select the “Close Remote Connection” option. If you simply close your VS Code window, some server-side components of VS Code will continue to run remotely.&lt;br /&gt;
&lt;br /&gt;
=== Access to Compute Nodes ===&lt;br /&gt;
&lt;br /&gt;
The workflow described above does not allow debugging on compute nodes that have been requested via an interactive Slurm job, for example. The security settings prevent the login node from being used as a proxy jump host. So there is no direct way to connect your locally installed VS Code to the compute nodes. Debugging GPU code is therefore also not possible, since this kind of resource is only accessible within Slurm jobs. Please have a look at the overview table below to see which solution to follow.&lt;br /&gt;
&lt;br /&gt;
== Code-Server ==&lt;br /&gt;
&lt;br /&gt;
The application [https://github.com/cdr/code-server code-server] allows you to run the server part of VS Code on any machine and access it in a web browser. This enables, for example, development and debugging on compute nodes.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:code-server.png|thumb|VS Code in web browser: code-server, Source: https://github.com/cdr/code-server|400px]]&lt;br /&gt;
&lt;br /&gt;
=== Install Code-Server ===&lt;br /&gt;
&lt;br /&gt;
From the following table you can see which instructions you need to follow to develop on a bwHPC cluster with VS Code.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; &lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;|Cluster&lt;br /&gt;
! Description&lt;br /&gt;
! Commands&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| bwUniCluster&lt;br /&gt;
| Setup with [[Development/VS_Code#code-server | Code Server]]&lt;br /&gt;
| &amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load devel/code-server&amp;lt;/source&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Other&lt;br /&gt;
| Setup with [[Development/VS_Code#Connect_to_Remote_Jupyter_Kernel | Jupyter kernel]] or [[Development/VS_Code#Install_Code-Server | install Code-Server]]&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If no code-server module is provided, you can install it yourself. &lt;br /&gt;
# Download the latest release archive for your system from GitHub and unpack it.&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    # Look up the version that you want to install: https://github.com/coder/code-server/releases&lt;br /&gt;
    VERSION=4.101.2&lt;br /&gt;
    mkdir -p ~/.local/lib ~/.local/bin&lt;br /&gt;
    curl -fL https://github.com/coder/code-server/releases/download/v$VERSION/code-server-$VERSION-linux-amd64.tar.gz \&lt;br /&gt;
    | tar -C ~/.local/lib -xz&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# You can run code-server by executing the binary via its full path, or create a symlink in &amp;lt;code&amp;gt;~/.local/bin&amp;lt;/code&amp;gt;, add that directory to your &amp;lt;code&amp;gt;$PATH&amp;lt;/code&amp;gt; and run it with &amp;quot;code-server&amp;quot;: &lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    mv ~/.local/lib/code-server-$VERSION-linux-amd64 ~/.local/lib/code-server-$VERSION&lt;br /&gt;
    ln -s ~/.local/lib/code-server-$VERSION/bin/code-server ~/.local/bin/code-server&lt;br /&gt;
    # Add the following line in your ~/.bashrc&lt;br /&gt;
    export PATH=&amp;quot;$HOME/.local/bin:$PATH&amp;quot;  # note: ~ would not be expanded inside quotes&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Start Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server can be run on either login nodes or compute nodes. In the example shown, an interactive job is started on a GPU partition to run code-server there.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ salloc -p accelerated --gres=gpu:4 --time=30:00 # Start interactive job with 4 GPUs&lt;br /&gt;
$ module load devel/code-server                   # Load code-server module&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
When code-server is started, it opens a web server listening on a certain port. You have to &#039;&#039;&#039;specify the port&#039;&#039;&#039; yourself; it can be chosen freely in the unprivileged range (above 1024). If a port is already in use, e.g. because another user chose the same one, a different port must be selected.&lt;br /&gt;
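Picking a free port can also be automated. The following is a minimal sketch (it assumes the ss tool from iproute2 for the listening check and simply skips the check if ss is missing):

```shell
#!/bin/bash
# Sketch: pick a random unprivileged port, retrying while it is already in use.
pick_port() {
    local port
    while :; do
        port=$(( (RANDOM % 9999) + 1025 ))   # 1025..11023, all unprivileged
        # Accept the port if nothing is listening on it.
        if ! command -v ss 1>/dev/null 2>/dev/null || ! ss -tln | grep -q ":${port} "; then
            echo "${port}"
            return
        fi
    done
}
PORT=$(pick_port)
echo "Using port ${PORT}"
```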
&lt;br /&gt;
By starting code-server, you are running a web server that can be accessed by anyone logged in to the cluster. To prevent other people from gaining access to your account and data, this web server is &#039;&#039;&#039;password protected&#039;&#039;&#039;. If no variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; is defined, the password in the default config file &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; is used. If you want to define your own password, you can either change it in the config file or export the variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ PASSWORD=&amp;lt;mySecret&amp;gt; \&lt;br /&gt;
    code-server \&lt;br /&gt;
      --bind-addr 0.0.0.0:8081 \&lt;br /&gt;
      --auth password  # Start code-server on port 8081&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#FFCCCC; width:100%;&amp;quot;&lt;br /&gt;
| &#039;&#039;&#039;Security implications&#039;&#039;&#039;&lt;br /&gt;
Please note that by starting &amp;lt;code&amp;gt;code-server&amp;lt;/code&amp;gt; you are running a web server that can be accessed by everyone logged in on the cluster.&amp;lt;br&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;If password protection is disabled, anybody can access your account and your data.&#039;&#039;&#039;&lt;br /&gt;
* Choose a &#039;&#039;&#039;secure password&#039;&#039;&#039;!&lt;br /&gt;
* Do &#039;&#039;&#039;NOT&#039;&#039;&#039; use &amp;lt;code&amp;gt;code-server --link&amp;lt;/code&amp;gt;!&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Connect to code-server ===&lt;br /&gt;
[[File:code-server-hk.png|thumb|Code-server running on GPU node.|400px]]&lt;br /&gt;
&lt;br /&gt;
As soon as code-server is running, it can be accessed in the web browser. In order to establish the connection, a SSH tunnel from your local computer to the remote server has to be created via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ ssh -L 8081:&amp;lt;computeNodeID&amp;gt;:8081 &amp;lt;userID&amp;gt;@uc3.scc.kit.edu&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
You need to enter the &amp;lt;code&amp;gt;computeNodeID&amp;lt;/code&amp;gt; of the node on which the interactive Slurm job is running. If you have started code-server on a login node, just enter &amp;lt;code&amp;gt;localhost&amp;lt;/code&amp;gt;. Now you can open http://127.0.0.1:8081 in your web browser. You may have to allow your browser to open an insecure (non-https) site. The login page looks as follows:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:code-server-login.png|Code-server login page.|300px]]&lt;br /&gt;
&lt;br /&gt;
Enter the password from &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; or from the &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; variable. After clicking the “Submit” button, the familiar VS Code interface will open in your browser.&lt;br /&gt;
&lt;br /&gt;
=== End code-server session ===&lt;br /&gt;
&lt;br /&gt;
If you want to temporarily log out from your code-server session, you can open the “Application Menu” in the left side bar and click on “Log out”. To &#039;&#039;&#039;terminate&#039;&#039;&#039; the code-server session, cancel it in the interactive Slurm job by pressing Ctrl+C.&lt;br /&gt;
&lt;br /&gt;
== Connect to Remote Jupyter Kernel ==&lt;br /&gt;
To work with your Python scripts and notebooks within VS Code while using the resources of a compute node, you can create a batch job that launches JupyterLab and connect to it via VS Code. To do so, please follow the instructions below. Any parts of the scripts that might need adjustments are marked with the keyword &amp;quot;@param&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=== Simple Use Case ===&lt;br /&gt;
The most basic steps are to set a password for JupyterLab, start a job which runs JupyterLab, get the connection details from the output log and connect to it locally. The following instructions explain these steps and provide an additional script that replaces the manual step of looking into the output file.&lt;br /&gt;
&lt;br /&gt;
# Load a python module and set a password on the cluster for JupyterLab:&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    module load devel/miniforge&lt;br /&gt;
    jupyter notebook --generate-config&lt;br /&gt;
    jupyter notebook password&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Define a batch script to start a JupyterLab Job. Please adjust the first part according to your needs and your specific cluster.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;~/jupyterlab.slurm&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition=cpu-single&lt;br /&gt;
#SBATCH --job-name=jupyterlab&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task 1&lt;br /&gt;
#SBATCH --mail-user=my_email_address   # @param: replace my_email_address with your email address&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# @param: change this to your preferred python or conda module&lt;br /&gt;
module load devel/miniforge&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
PORT=$(( ( RANDOM % 9999 ) + 1024 ))&lt;br /&gt;
HOSTID=$(hostname -s)&lt;br /&gt;
echo &amp;quot;Connect&amp;quot;&lt;br /&gt;
echo &amp;quot;ssh -N -L ${PORT}:${HOSTID}:${PORT} ${USER}@$hostAddress&amp;quot;&lt;br /&gt;
echo &amp;quot;Job ${SLURM_JOB_ID} running on host ${HOSTID}.&amp;quot;&lt;br /&gt;
# Print the connection details before starting JupyterLab, because the next command blocks until the job ends.&lt;br /&gt;
jupyter lab --no-browser --ip=0.0.0.0 --port=${PORT}&lt;br /&gt;
&lt;br /&gt;
returned_code=$?&lt;br /&gt;
echo &amp;quot;&amp;gt; Script completed with exit code ${returned_code}&amp;quot;&lt;br /&gt;
exit ${returned_code}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Run a wrapper script to execute the batch script and extract needed information from the slurm output file. You could save it together with other utility scripts in a &amp;quot;bin&amp;quot; directory in your home folder.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;./bin/run_jupyterlab_simple.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# Define parameters&lt;br /&gt;
jobscript=~/jupyterlab.slurm&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Outfile name&lt;br /&gt;
slurm_out=slurm-${job_id}.out&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file to appear&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 2&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the URL is written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output. The pattern assumes a node name with a length of 6 characters and a port with a length of 3, 4 or 5 numbers.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can use the URL&amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;to:&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode (&#039;Existing Jupyter Server...&#039;, enter URL, enter password, confirm &#039;127.0.0.1&#039;, choose kernel) or &amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab in your browser with the URL&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
# rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Follow the instructions on the commandline to connect to the Jupyter kernel from your local machine or the Helix login node. More detailed instructions can be found below. &lt;br /&gt;
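The wrapper extracts the node name and port from the first URL in the Slurm log with a bash regex. Here is a standalone sketch of that step (the log line is a made-up example that matches the assumed pattern of a 6-character node name):

```shell
#!/bin/bash
# Sketch: parse host and port from a JupyterLab URL, as the wrapper does.
url="http://node01:8888/lab?token=abc"                # made-up example log line
url_pattern="http://([a-z0-9]{6}):([0-9]{3,5})/lab"   # same pattern as the wrapper
if [[ $url =~ $url_pattern ]]; then
    hostID=${BASH_REMATCH[1]}
    port=${BASH_REMATCH[2]}
fi
echo "$hostID $port"
```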
&lt;br /&gt;
==== Connect to a running job ====&lt;br /&gt;
&lt;br /&gt;
The job runs on a specific compute node and port. With this information, you can create an ssh connection to it. But first, you need to decide how you want to work with your Python code. The options are: &lt;br /&gt;
&lt;br /&gt;
# The code is placed locally on your computer. &lt;br /&gt;
# The code is placed on the cluster and you&#039;ve mounted the folder locally (i.e. the files on the cluster are accessible from within your local VS Code).&lt;br /&gt;
# The code is placed on the cluster and you work on the cluster via a remote connection in VS Code. &lt;br /&gt;
&lt;br /&gt;
Depending on the use case, you need to execute the ssh command in a different place: &lt;br /&gt;
&lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer and connect to the cluster.&lt;br /&gt;
&lt;br /&gt;
Then open a terminal and execute the ssh command given in the commandline output of the wrapper script. If no terminal is open yet, go to the &amp;quot;Terminal&amp;quot; menu at the top of the window and choose &amp;quot;New Terminal&amp;quot; (or &amp;quot;New -&amp;gt; Command Prompt&amp;quot; on Windows). &lt;br /&gt;
It is normal that the command does not return after you have entered your credentials. Leave the terminal open and continue with the next step. &lt;br /&gt;
&lt;br /&gt;
To use the Jupyter kernel that is running on the cluster node, you need to connect to this kernel. This works like connecting to any other kernel: &lt;br /&gt;
&lt;br /&gt;
# Open your code file.&lt;br /&gt;
# Click &amp;quot;Select Kernel&amp;quot; in the upper right corner. &lt;br /&gt;
# Choose &amp;quot;Existing Jupyter Server...&amp;quot;.&lt;br /&gt;
# Enter the URL that was given by the wrapper script. &lt;br /&gt;
# Enter your JupyterLab password that you set in the first step of these instructions.&lt;br /&gt;
# Confirm the prefilled value &amp;quot;127.0.0.1&amp;quot; by pressing Enter.&lt;br /&gt;
# Choose one of the virtual environments that you&#039;ve created on the cluster. You should see all Python environments. To see conda environments as well, you need to [[Helix/bwVisu/JupyterLab#Python_version | register them as ipykernel]] first. &lt;br /&gt;
&lt;br /&gt;
=== Complex Use Case ===&lt;br /&gt;
If you have different use cases for JupyterLab, you can use a more flexible wrapper script, for example: &lt;br /&gt;
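The script below maps a short set name (cpu, gpu, ...) to an array of sbatch flags via bash indirect expansion. Since that construct is easy to misread, here is a standalone sketch of just that trick (the flag values are taken from the script; the name gpu is hard-coded purely for illustration):

```shell
#!/bin/bash
# Standalone sketch of the indirect array expansion used in define_param_set:
# a short name selects one of several predefined sbatch flag arrays.
cpu=(--partition=cpu-single --mem=2gb)
gpu=(--partition=gpu-single --mem=3gb --gres=gpu:1)

name=gpu                 # would normally come from --param_set
ref="${name}[@]"         # build the indirect reference string "gpu[@]"
params=("${!ref}")       # expand it into a fresh array
params+=(--ntasks=1)     # flags shared by all sets
echo "${params[*]}"
```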
&lt;br /&gt;
&amp;lt;pre&amp;gt;./bin/run_jupyterlab.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Starts a jupyter kernel on a node and provides information on how to connect to it locally.&lt;br /&gt;
# If you have only one use case and therefore need only one combination of slurm settings for your jupyter jobs, then you can use the simpler script.&lt;br /&gt;
# This script supports exploratory analyses by allowing parameters to be overridden via the command line.&lt;br /&gt;
# Different job configurations can be defined in advance and then used with a given short name (cpu, gpu,...).&lt;br /&gt;
&lt;br /&gt;
programname=$0&lt;br /&gt;
function help {&lt;br /&gt;
    # Print help text&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Starts a jupyterlab kernel&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;usage example: $programname --param_set cpu&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --param_set string   name of the parameter set&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (examples: cpu, gpu)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --jobscript string   optional, path of batch script&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: ~/jupyterlab.slurm)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --slurm_out string   optional, name of slurm output file&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: slurm-&amp;lt;job_id&amp;gt;.out)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# These parameters get their default values later in the script. Providing them on the command line overrides those defaults.&lt;br /&gt;
jobscript=None&lt;br /&gt;
slurm_out=None&lt;br /&gt;
&lt;br /&gt;
# Process parameters&lt;br /&gt;
while [ $# -gt 0 ]; do&lt;br /&gt;
    if [[ $1 == &amp;quot;--help&amp;quot; ]]; then&lt;br /&gt;
        help&lt;br /&gt;
        exit 0&lt;br /&gt;
    # when given -p as parameter, use its value for the variable param_set&lt;br /&gt;
    elif [[ $1 == &amp;quot;-p&amp;quot; ]]; then&lt;br /&gt;
        param_set=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    elif [[ $1 == &amp;quot;--&amp;quot;* ]]; then&lt;br /&gt;
        v=&amp;quot;${1/--/}&amp;quot;&lt;br /&gt;
        declare &amp;quot;$v&amp;quot;=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    fi&lt;br /&gt;
    shift&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
function define_param_set(){&lt;br /&gt;
    # Build the sbatch parameter array for the given set name&lt;br /&gt;
    # Define different sets&lt;br /&gt;
    cpu=(--partition=cpu-single --mem=2gb)&lt;br /&gt;
    gpu=(--partition=gpu-single --mem=3gb --gres=gpu:1)&lt;br /&gt;
&lt;br /&gt;
    param_set=${1}&lt;br /&gt;
    param_set=$param_set[@] &lt;br /&gt;
    param_set=(&amp;quot;${!param_set}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    # Add params that are the same for all sets&lt;br /&gt;
    param_set+=(--ntasks=1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# @param: jobscript, name of the slurm batch script to execute&lt;br /&gt;
if  [ &amp;quot;$jobscript&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    jobscript=~/jupyterlab.slurm&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Translate given param_set value to actual set of parameters &lt;br /&gt;
define_param_set $param_set&lt;br /&gt;
echo &amp;quot;param_set: ${param_set[*]}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch ${param_set[@]} $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# @param: slurm_out, the filename for the slurm output file&lt;br /&gt;
if  [ &amp;quot;$slurm_out&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    slurm_out=slurm-${job_id}.out&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# Wait for the output file to appear&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until the URL is written to the output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from output.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then&lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can either&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode or&amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab with this URL: &amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
#rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15972</id>
		<title>Development/VS Code</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Development/VS_Code&amp;diff=15972"/>
		<updated>2026-04-21T06:38:30Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Overview ==&lt;br /&gt;
&lt;br /&gt;
[[File:vscode.png|thumb|Visual Studio Code, Source: https://code.visualstudio.com/|450px]]&lt;br /&gt;
&lt;br /&gt;
[https://github.com/Microsoft/vscode Visual Studio Code] (VS Code) is an open-source code editor from Microsoft. According to a [https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment Stack Overflow survey], it is one of the most popular IDEs. Its functionality can easily be extended by installing extensions, which add almost arbitrary &#039;&#039;&#039;language support&#039;&#039;&#039;, &#039;&#039;&#039;debugging&#039;&#039;&#039; or &#039;&#039;&#039;remote development&#039;&#039;&#039; capabilities. You can install VS Code locally and use it for remote development.&lt;br /&gt;
&lt;br /&gt;
== Remote - SSH ==&lt;br /&gt;
&lt;br /&gt;
In order to develop and debug code remotely at HPC facilities, you can use the [https://code.visualstudio.com/docs/remote/ssh &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; extension]. It connects your locally installed VS Code to the remote servers. In contrast to using graphical IDEs within a remote desktop session (RDP, VNC), there are no drawbacks such as laggy reactions to your input or blurred rendering of fonts.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Installation and Configuration ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-extensions-button.png|vscode-extensions-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to install the Remote - SSH extension, click on the Extensions button in the left side bar and enter “remote ssh” in the search field. Choose &#039;&#039;&#039;Remote - SSH&#039;&#039;&#039; from the resulting list and click on &#039;&#039;&#039;Install&#039;&#039;&#039;.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to configure remote connections, open the Remote-Explorer extension. On Linux systems, the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; is evaluated automatically, and the targets defined in it appear in the left side bar.&lt;br /&gt;
&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
[[File:vscode-remoteexplorer-add.png|vscode-remoteexplorer-add.png|350px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If no remote SSH targets are defined within this file, you can easily add one by clicking on the + symbol. Make sure that “SSH Targets” is selected in the drop-down menu of the Remote-Explorer. Enter the connection details &amp;lt;code&amp;gt;&amp;amp;lt;user&amp;amp;gt;@&amp;amp;lt;server&amp;amp;gt;&amp;lt;/code&amp;gt;. You will be asked whether the file &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; should be modified or whether another config file should be used or created.&lt;br /&gt;
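&lt;br /&gt;
Such an entry in &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; could, for example, look as follows (the host alias and the username are placeholders that must be adapted):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Example entry in ~/.ssh/config; adapt Host, HostName and User&lt;br /&gt;
Host uc3&lt;br /&gt;
    HostName uc3.scc.kit.edu&lt;br /&gt;
    User &amp;lt;prefix&amp;gt;_&amp;lt;username&amp;gt;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;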
&lt;br /&gt;
=== Connect to Login Nodes ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-button.png|vscode-remoteexplorer-button.png|30px]]&amp;lt;br&amp;gt;&lt;br /&gt;
In order to connect to a remote SSH target, open the Remote-Explorer. Right-click a target and connect in the current or a new window. TOTP and password can be entered in the corresponding input fields that open.&lt;br /&gt;
&lt;br /&gt;
You are now logged in on the remote server. As usual, you can open a project directory with the standard key binding Ctrl+k Ctrl+o. You can now edit and debug code.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Attention&#039;&#039;&#039;: Please remember that you are running and debugging the code on a login node. Do not perform resource-intensive tasks. Furthermore, no GPU resources are available to you.&lt;br /&gt;
&lt;br /&gt;
Extensions that are installed locally are only usable on your local machine and are not automatically installed remotely. However, as soon as you open the Extensions-Explorer during a remote session, VS Code offers to install your locally installed extensions remotely.&lt;br /&gt;
&lt;br /&gt;
=== Disconnect from Login Nodes ===&lt;br /&gt;
&lt;br /&gt;
[[File:vscode-remoteexplorer-indicator.png|images/vscode-remoteexplorer-indicator.png|200px]]&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to end your remote session, click the green box in the lower left corner. In the input box that opens, select the “Close Remote Connection” option. If you simply close your VS Code window, some server-side components of VS Code will continue to run remotely.&lt;br /&gt;
&lt;br /&gt;
=== Access to Compute Nodes ===&lt;br /&gt;
&lt;br /&gt;
The workflow described above does not allow debugging on compute nodes that have been requested via an interactive Slurm job, for example. The security settings prevent the login node from being used as a proxy jump host, so there is no direct way to connect your locally installed VS Code to the compute nodes. Debugging GPU code is therefore also not possible, since this kind of resource is only accessible within Slurm jobs. Please have a look at the overview table in the first chapter to see which solution to follow.&lt;br /&gt;
&lt;br /&gt;
== Code-Server ==&lt;br /&gt;
&lt;br /&gt;
The application [https://github.com/cdr/code-server code-server] allows you to run the server part of VS Code on any machine; it can then be accessed in the web browser. This enables, for example, development and debugging on compute nodes.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[File:code-server.png|thumb|code-server.png|VS Code in web browser: code-server, Source: https://github.com/cdr/code-server|400px]]&lt;br /&gt;
&lt;br /&gt;
=== Install Code-Server ===&lt;br /&gt;
&lt;br /&gt;
From the following table you can see which instructions you need to follow to develop on a bwHPC cluster with VS Code.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; &lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;|Cluster&lt;br /&gt;
! Description&lt;br /&gt;
! Commands&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| bwUniCluster&lt;br /&gt;
| Setup with [[Development/VS_Code#code-server | Code Server]]&lt;br /&gt;
| &amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;module load devel/code-server&amp;lt;/source&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Other&lt;br /&gt;
| Setup with [[Development/VS_Code#Connect_to_Remote_Jupyter_Kernel | Jupyter kernel]] or [[Development/VS_Code#Install_Code-Server | install Code-Server]]&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If no code-server module is provided, you can install it yourself. &lt;br /&gt;
# Download the latest release archive for your system from GitHub and unpack it.&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    # Look up the version that you want to install: https://github.com/coder/code-server/releases&lt;br /&gt;
    VERSION=4.101.2&lt;br /&gt;
    mkdir -p ~/.local/lib ~/.local/bin&lt;br /&gt;
    curl -fL https://github.com/coder/code-server/releases/download/v$VERSION/code-server-$VERSION-linux-amd64.tar.gz \&lt;br /&gt;
    | tar -C ~/.local/lib -xz&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# You can run code-server by executing the binary via its full path, or add &amp;quot;~/.local/bin&amp;quot; to your $PATH and run it with &amp;quot;code-server&amp;quot; &lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    mv ~/.local/lib/code-server-$VERSION-linux-amd64 ~/.local/lib/code-server-$VERSION&lt;br /&gt;
    ln -s ~/.local/lib/code-server-$VERSION/bin/code-server ~/.local/bin/code-server&lt;br /&gt;
    # Add the following line in your ~/.bashrc&lt;br /&gt;
    export PATH=&amp;quot;$HOME/.local/bin:$PATH&amp;quot;  # use $HOME, as ~ is not expanded inside quotes&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Start Code-Server ===&lt;br /&gt;
&lt;br /&gt;
Code-server can be run on either login nodes or compute nodes. In the example shown, an interactive job is started on a GPU partition to run code-server there.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ salloc -p accelerated --gres=gpu:4 --time=30:00 # Start interactive job with 4 GPUs&lt;br /&gt;
$ module load devel/code-server                   # Load code-server module&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
When code-server is started, it opens a web server listening on a certain port, which the user has to &#039;&#039;&#039;specify&#039;&#039;&#039;. The port can be chosen freely in the unprivileged range (above 1024). If a port is already in use, e.g. because several users chose the same port, another port must be chosen.&lt;br /&gt;
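&lt;br /&gt;
One way to check whether a port is already in use on the current node is the &amp;lt;code&amp;gt;ss&amp;lt;/code&amp;gt; tool (a sketch; the port number 8081 is an example):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# List listening TCP sockets and filter for the desired port; no output means the port is free&lt;br /&gt;
ss -tln | grep &#039;:8081 &#039;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;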
&lt;br /&gt;
By starting code-server, you are running a web server that can be accessed by anyone logged in to the cluster. To prevent other people from gaining access to your account and data, this web server is &#039;&#039;&#039;password protected&#039;&#039;&#039;. If no variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; is defined, the password in the default config file &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; is used. If you want to define your own password, you can either change it in the config file or export the variable &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt;.&lt;br /&gt;
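&lt;br /&gt;
A config file with a custom password could, for example, look as follows (the values shown are placeholders, not defaults):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;yaml&amp;quot;&amp;gt;&lt;br /&gt;
# ~/.config/code-server/config.yaml (example values)&lt;br /&gt;
bind-addr: 0.0.0.0:8081&lt;br /&gt;
auth: password&lt;br /&gt;
password: mySecret&lt;br /&gt;
cert: false&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;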
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ PASSWORD=&amp;lt;mySecret&amp;gt; \&lt;br /&gt;
    code-server \&lt;br /&gt;
      --bind-addr 0.0.0.0:8081 \&lt;br /&gt;
      --auth password  # Start code-server on port 8081&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#FFCCCC; width:100%;&amp;quot;&lt;br /&gt;
| &#039;&#039;&#039;Security implications&#039;&#039;&#039;&lt;br /&gt;
Please note that by starting &amp;lt;code&amp;gt;code-server&amp;lt;/code&amp;gt; you are running a web server that can be accessed by everyone logged in on the cluster.&amp;lt;br&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;If password protection is disabled, anybody can access your account and your data.&#039;&#039;&#039;&lt;br /&gt;
* Choose a &#039;&#039;&#039;secure password&#039;&#039;&#039;!&lt;br /&gt;
* Do &#039;&#039;&#039;NOT&#039;&#039;&#039; use &amp;lt;code&amp;gt;code-server --link&amp;lt;/code&amp;gt;!&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Connect to code-server ===&lt;br /&gt;
[[File:code-server-hk.png|thumb|Code-server running on GPU node.|400px]]&lt;br /&gt;
&lt;br /&gt;
As soon as code-server is running, it can be accessed in the web browser. In order to establish the connection, a SSH tunnel from your local computer to the remote server has to be created via:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;console&amp;quot;&amp;gt;$ ssh -L 8081:&amp;lt;computeNodeID&amp;gt;:8081 &amp;lt;userID&amp;gt;@uc3.scc.kit.edu&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
You need to enter the &amp;lt;code&amp;gt;computeNodeID&amp;lt;/code&amp;gt; of the node on which the interactive Slurm job is running. If you have started code server on a login node, just enter &amp;lt;code&amp;gt;localhost&amp;lt;/code&amp;gt;. Now you can open http://127.0.0.1:8081 in your web browser. Possibly, you have to allow your browser to open an insecure (non-https) site. The login site looks as follows:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:code-server-login.png|Code-server login page.|300px]]&lt;br /&gt;
&lt;br /&gt;
Enter the password from &amp;lt;code&amp;gt;~/.config/code-server/config.yaml&amp;lt;/code&amp;gt; or from the &amp;lt;code&amp;gt;PASSWORD&amp;lt;/code&amp;gt; variable. After clicking the “Submit” button, the familiar VS Code interface will open in your browser.&lt;br /&gt;
&lt;br /&gt;
=== End code-server session ===&lt;br /&gt;
&lt;br /&gt;
If you want to temporarily log out from your code-server session, you can open the “Application Menu” in the left side bar and click on “Log out”. To &#039;&#039;&#039;terminate&#039;&#039;&#039; the code-server session, you have to cancel it in the interactive Slurm job by pressing Ctrl+C.&lt;br /&gt;
&lt;br /&gt;
== Connect to Remote Jupyter Kernel ==&lt;br /&gt;
To work with your Python scripts and notebooks within VS Code while using the resources of a compute node, you can create a batch job that launches JupyterLab and connect to it via VS Code. To do so, please follow the instructions below. Any parts of the scripts that might need adjustment are marked with the keyword &amp;quot;@param&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
=== Simple Use Case ===&lt;br /&gt;
The most basic steps are to set a password for JupyterLab, start a job which runs JupyterLab, get the connection details from the output log and connect to it locally. The following instructions explain these steps and provide an additional script that replaces the manual step of looking into the output file.&lt;br /&gt;
&lt;br /&gt;
# Load a python module and set a password on the cluster for JupyterLab:&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
    module load devel/miniforge&lt;br /&gt;
    jupyter notebook --generate-config&lt;br /&gt;
    jupyter notebook password&lt;br /&gt;
  &amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Define a batch script to start a JupyterLab Job. Please adjust the first part according to your needs and your specific cluster.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;~/jupyterlab.slurm&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
#SBATCH --partition=cpu-single&lt;br /&gt;
#SBATCH --job-name=jupyterlab&lt;br /&gt;
#SBATCH --time=00:10:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task 1&lt;br /&gt;
#SBATCH --mail-user=my_email_address # @param: replace my_email_address with your email address&lt;br /&gt;
#SBATCH --mail-type=ALL&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# @param: change this to your preferred python or conda module&lt;br /&gt;
module load devel/miniforge&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Pick a random port in the unprivileged range (1024-11022)&lt;br /&gt;
PORT=$(( ( RANDOM % 9999 ) + 1024 ))&lt;br /&gt;
HOSTID=$(hostname -s)&lt;br /&gt;
echo &amp;quot;Connect&amp;quot;&lt;br /&gt;
echo &amp;quot;ssh -N -L ${PORT}:${HOSTID}:${PORT} ${USER}@$hostAddress&amp;quot;&lt;br /&gt;
echo &amp;quot;Job ${SLURM_JOB_ID} running on host ${HOSTID}.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# jupyter lab blocks until the job ends, therefore the connection details are printed first&lt;br /&gt;
jupyter lab --no-browser --ip=0.0.0.0 --port=${PORT}&lt;br /&gt;
&lt;br /&gt;
returned_code=$?&lt;br /&gt;
echo &amp;quot;&amp;gt; Script completed with exit code ${returned_code}&amp;quot;&lt;br /&gt;
exit ${returned_code}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Run a wrapper script to execute the batch script and extract needed information from the slurm output file. You could save it together with other utility scripts in a &amp;quot;bin&amp;quot; directory in your home folder.&lt;br /&gt;
#: &amp;lt;pre&amp;gt;./bin/run_jupyterlab_simple.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
#: &amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
&lt;br /&gt;
# Define parameters&lt;br /&gt;
jobscript=~/jupyterlab.slurm&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Outfile name&lt;br /&gt;
slurm_out=slurm-${job_id}.out&lt;br /&gt;
&lt;br /&gt;
# Wait for output file&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 2&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until url is written in output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from the output. The pattern assumes a node name with a length of 6 characters and a port with 3 to 5 digits.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then &lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect with the JupyterLab kernel, please enter the following into your local commandline: &amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;; &lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would mean ending the local connection to the kernel.&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can use the URL&amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab &amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;to:&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VSCode (&#039;Existing Jupyter Server...&#039;, enter URL, enter password, confirm &#039;127.0.0.1&#039;, choose kernel) or &amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab in your browser with the URL&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The needed information couldn&#039;t be found in the slurm output. Please contact your support unit if you need help with fixing this problem.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
# rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
# Follow the instructions on the commandline to connect to the Jupyter kernel from your local machine or the Helix login node. More detailed instructions can be found below. &lt;br /&gt;
&lt;br /&gt;
==== Connect to a running job ====&lt;br /&gt;
&lt;br /&gt;
The job runs on a specific compute node and port. With this information, you can create an ssh connection to it. But first, you need to decide how you want to work with your Python code. The options are: &lt;br /&gt;
&lt;br /&gt;
# The code is placed locally on your computer. &lt;br /&gt;
# The code is placed on the cluster and you&#039;ve mounted the folder locally. (= The files on the cluster are accessible from within your local VS Code)&lt;br /&gt;
# The code is placed on the cluster and you work on the cluster via a remote connection in VS Code. &lt;br /&gt;
&lt;br /&gt;
Depending on the use case, you need to execute the ssh command in a different place: &lt;br /&gt;
&lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer. &lt;br /&gt;
# Open VS Code on your computer and connect to the cluster.&lt;br /&gt;
&lt;br /&gt;
Then open a terminal and execute the ssh command given in the command-line output of the wrapper script. If no terminal is open yet, go to the menu item &amp;quot;Terminal&amp;quot; at the top of the window and choose &amp;quot;New Terminal&amp;quot; (or &amp;quot;New -&amp;gt; Command Prompt&amp;quot; on Windows). &lt;br /&gt;
It is normal that the command doesn&#039;t end after you&#039;ve entered your credentials. Leave the terminal open and continue with the next step. &lt;br /&gt;
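&lt;br /&gt;
The printed ssh command has the following shape (the node name &amp;quot;node01&amp;quot; and port &amp;quot;8888&amp;quot; are hypothetical example values):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Forward local port 8888 to port 8888 on compute node node01 (example values)&lt;br /&gt;
ssh -N -L 8888:node01:8888 &amp;lt;username&amp;gt;@helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;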
&lt;br /&gt;
To use the Jupyter kernel that is running on the cluster node, you need to connect to this kernel. This is similar to connecting any other kernel: &lt;br /&gt;
&lt;br /&gt;
# Open your code file.&lt;br /&gt;
# Click &amp;quot;Select Kernel&amp;quot; in the upper right corner. &lt;br /&gt;
# Choose &amp;quot;Existing Jupyter Server...&amp;quot;.&lt;br /&gt;
# Enter the URL that was given by the wrapper script. &lt;br /&gt;
# Enter your JupyterLab password that you set in the first step of these instructions.&lt;br /&gt;
# Confirm the prefilled value &amp;quot;127.0.0.1&amp;quot; by pressing Enter.&lt;br /&gt;
# Choose one of the virtual environments that you&#039;ve created on the cluster. You should see all python environments. To see the conda environments as well, you need to [[Helix/bwVisu/JupyterLab#Python_version | register them as ipykernel]] first. &lt;br /&gt;
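&lt;br /&gt;
Registering a conda environment as an ipykernel can be sketched as follows (&amp;quot;myenv&amp;quot; is a hypothetical environment name):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Activate the environment and register it as a Jupyter kernel (myenv is an example name)&lt;br /&gt;
conda activate myenv&lt;br /&gt;
pip install ipykernel  # only needed if ipykernel is not yet installed&lt;br /&gt;
python -m ipykernel install --user --name myenv --display-name &amp;quot;Python (myenv)&amp;quot;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;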
&lt;br /&gt;
=== Complex Use Case ===&lt;br /&gt;
If you have different use cases for JupyterLab, you could use a more flexible wrapper script, for example: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;./bin/run_jupyterlab.sh&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Starts a jupyter kernel on a node and provides information on how to connect to it locally.&lt;br /&gt;
# If you have only one use case and therefore need only one combination of slurm settings for your jupyter jobs, then you can use the simpler script.&lt;br /&gt;
# This script supports explorative analyses by allowing parameters to be overwritten via the command line.&lt;br /&gt;
# Different job configurations can be defined in advance and then used with a given short name (cpu, gpu,...).&lt;br /&gt;
&lt;br /&gt;
programname=$0&lt;br /&gt;
function help {&lt;br /&gt;
    # Print usage information&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;Starts a jupyterlab kernel&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;usage example: $programname --param_set cpu&amp;quot;&lt;br /&gt;
    echo &amp;quot;&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --param_set string   name of the parameter set&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (examples: cpu, gpu)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --jobscript string   optional, path of batch script&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: ~/jupyterlab.slurm)&amp;quot;&lt;br /&gt;
    echo &amp;quot;  --slurm_out string   optional, name of slurm output file&amp;quot;&lt;br /&gt;
    echo &amp;quot;                          (default: slurm-&amp;lt;jobid&amp;gt;.out)&amp;quot;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# These parameters are set later in the script. Providing them via the command line overwrites the values set in the script.&lt;br /&gt;
jobscript=None&lt;br /&gt;
slurm_out=None&lt;br /&gt;
&lt;br /&gt;
# Process parameters&lt;br /&gt;
while [ $# -gt 0 ]; do&lt;br /&gt;
    if [[ $1 == &amp;quot;--help&amp;quot; ]]; then&lt;br /&gt;
        help&lt;br /&gt;
        exit 0&lt;br /&gt;
    # when given -p as parameter, use its value for the variable param_set&lt;br /&gt;
    elif [[ $1 == &amp;quot;-p&amp;quot; ]]; then&lt;br /&gt;
        param_set=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    elif [[ $1 == &amp;quot;--&amp;quot;* ]]; then&lt;br /&gt;
        v=&amp;quot;${1/--/}&amp;quot;&lt;br /&gt;
        declare &amp;quot;$v&amp;quot;=&amp;quot;$2&amp;quot;&lt;br /&gt;
        shift&lt;br /&gt;
    fi&lt;br /&gt;
    shift&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
function define_param_set(){&lt;br /&gt;
    # Define parameter sets for sbatch&lt;br /&gt;
    # Define different sets&lt;br /&gt;
    cpu=(--partition=cpu-single --mem=2gb)&lt;br /&gt;
    gpu=(--partition=gpu-single --mem=3gb --gres=gpu:1)&lt;br /&gt;
&lt;br /&gt;
    param_set=${1}&lt;br /&gt;
    param_set=$param_set[@] &lt;br /&gt;
    param_set=(&amp;quot;${!param_set}&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
    # Add params that are the same for all sets&lt;br /&gt;
    param_set+=(--ntasks=1)&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# @param: jobscript, name of the slurm batch script to execute&lt;br /&gt;
if  [ &amp;quot;$jobscript&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    jobscript=~/jupyterlab.slurm&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# @param: cluster address for ssh connection&lt;br /&gt;
hostAddress=helix.bwservices.uni-heidelberg.de&lt;br /&gt;
&lt;br /&gt;
# Translate given param_set value to actual set of parameters &lt;br /&gt;
define_param_set $param_set&lt;br /&gt;
echo &amp;quot;param_set: ${param_set[*]}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run job&lt;br /&gt;
job_id=$(sbatch &amp;quot;${param_set[@]}&amp;quot; $jobscript | awk &#039;{print $4}&#039;)&lt;br /&gt;
echo &amp;quot;jobid: $job_id&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# @param: slurm_out, the filename for the slurm output file&lt;br /&gt;
if  [ &amp;quot;$slurm_out&amp;quot; = &amp;quot;None&amp;quot; ]; then&lt;br /&gt;
    slurm_out=slurm-${job_id}.out&lt;br /&gt;
fi&lt;br /&gt;
&lt;br /&gt;
# Wait for output file&lt;br /&gt;
while [ ! -f &amp;quot;$slurm_out&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Wait until url is written in output file&lt;br /&gt;
while [ -z &amp;quot;${url}&amp;quot; ]; do&lt;br /&gt;
    sleep 1&lt;br /&gt;
    url=$(grep -o &#039;http[^ ]*&#039; &amp;quot;$slurm_out&amp;quot; | head -n 1)&lt;br /&gt;
done&lt;br /&gt;
&lt;br /&gt;
# Extract hostID and port from the output. The pattern assumes a node name of 6 characters and a port of 3 to 5 digits.&lt;br /&gt;
url_pattern=&amp;quot;http://([a-z0-9]{6}):([0-9]{3,5})/lab&amp;quot;&lt;br /&gt;
if [[ $url =~ $url_pattern ]]; then&lt;br /&gt;
    hostID=${BASH_REMATCH[1]}&lt;br /&gt;
    port=${BASH_REMATCH[2]}&lt;br /&gt;
    echo &amp;quot;To connect to the JupyterLab kernel, please enter the following into your local command line:&amp;quot;&lt;br /&gt;
    echo &amp;quot;ssh -N -L $port:$hostID:$port ${USER}@$hostAddress&amp;quot;&lt;br /&gt;
    echo &amp;quot;Afterwards, you can either&amp;quot;&lt;br /&gt;
    echo &amp;quot;- use the kernel in VS Code or&amp;quot;&lt;br /&gt;
    echo &amp;quot;- open JupyterLab with this URL:&amp;quot;&lt;br /&gt;
    echo &amp;quot;  http://127.0.0.1:${port}/lab&amp;quot;&lt;br /&gt;
    echo &amp;quot;Note: It is normal that the ssh command doesn&#039;t end after providing the credentials. Ending the command would end the local connection to the kernel.&amp;quot;&lt;br /&gt;
else&lt;br /&gt;
    echo &amp;quot;The connection details couldn&#039;t be found in the slurm output.&amp;quot;&lt;br /&gt;
fi&lt;br /&gt;
#rm $slurm_out&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Login&amp;diff=15854</id>
		<title>BwUniCluster3.0/Login</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Login&amp;diff=15854"/>
		<updated>2026-03-20T05:55:00Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Login with SSH command (Linux, Mac, Windows) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Access to bwUniCluster 3.0 is &#039;&#039;&#039;limited to IP addresses from the BelWü network&#039;&#039;&#039;.&lt;br /&gt;
All home institutions of our current users are connected to BelWü, so if you are on your campus network (e.g. in your office or on the Campus WiFi) you should be able to connect to bwUniCluster 3.0 without restrictions.&lt;br /&gt;
If you are outside one of the BelWü networks (e.g. at home), a VPN connection to the home institution or a connection to an SSH jump host at the home institution must be established first.&lt;br /&gt;
|}&lt;br /&gt;
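&lt;br /&gt;
If your home institution provides an SSH jump host, the connection can, for example, be established as follows (the jump host name is a placeholder that must be replaced by the one of your institution):&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Connect via an institutional jump host (jumphost.example.edu is a placeholder)&lt;br /&gt;
ssh -J &amp;lt;user&amp;gt;@jumphost.example.edu &amp;lt;prefix&amp;gt;_&amp;lt;username&amp;gt;@uc3.scc.kit.edu&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;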
&lt;br /&gt;
The login nodes of the bwHPC clusters are the access point to the compute system, your &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; directory and your workspaces.&lt;br /&gt;
All users must log in through these nodes to submit jobs to the cluster.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prerequisites for successful login:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You need to have&lt;br /&gt;
# Completed the 3-step [[registration|&#039;&#039;&#039;registration&#039;&#039;&#039;]] procedure.&lt;br /&gt;
# Set a [[Registration/Password|&#039;&#039;&#039;service password&#039;&#039;&#039;]] for bwUniCluster 3.0.&lt;br /&gt;
# Set up a [[Registration/2FA|&#039;&#039;&#039;second factor&#039;&#039;&#039;]] for the time-based one-time password (TOTP).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Login to the bwUniCluster =&lt;br /&gt;
&lt;br /&gt;
Login to the bwUniCluster 3.0 is only possible with a Secure Shell (SSH) client for which you must know your username on the cluster and the hostname of the login nodes.&lt;br /&gt;
For more general information on SSH clients, visit the [[BwUniCluster3.0/Login/Client|SSH Clients Guide]].&lt;br /&gt;
&lt;br /&gt;
== Username ==&lt;br /&gt;
&lt;br /&gt;
If you want to use the bwUniCluster 3.0 you need to add a prefix to your local username.&lt;br /&gt;
&lt;br /&gt;
For prefixes please refer to the [[Registration/Login/Username#Prefix_for_Universities|prefix table]].&lt;br /&gt;
&lt;br /&gt;
Examples:&amp;lt;br/&amp;gt;&lt;br /&gt;
* If you are a user from the University of Freiburg and your local username is &amp;lt;code&amp;gt;ab123&amp;lt;/code&amp;gt;, this combines to: &amp;lt;code&amp;gt;fr_ab123&amp;lt;/code&amp;gt;.&lt;br /&gt;
* If your KIT username is &amp;lt;code&amp;gt;ab1234&amp;lt;/code&amp;gt; and you are a user from KIT this would combine to: &amp;lt;code&amp;gt;ka_ab1234&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Hostnames ==&lt;br /&gt;
&lt;br /&gt;
The system has two login nodes.&lt;br /&gt;
The selection of the login node is done automatically.&lt;br /&gt;
If you are logging in multiple times, different sessions might run on different login nodes.&lt;br /&gt;
&lt;br /&gt;
Login to bwUniCluster 3.0:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Node type&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;uc3.scc.kit.edu&#039;&#039;&#039;          || login to one of the two login nodes&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;bwunicluster.scc.kit.edu&#039;&#039;&#039; || login to one of the two login nodes&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
With the launch of bwUniCluster 3.0, &#039;&#039;&#039;bwunicluster.scc.kit.edu&#039;&#039;&#039; no longer points to &#039;&#039;&#039;uc2.scc.kit.edu&#039;&#039;&#039; but to &#039;&#039;&#039;uc3.scc.kit.edu&#039;&#039;&#039;. In order to remove the warnings from your SSH client, you can delete the old host key as follows: &amp;lt;code&amp;gt;ssh-keygen -R bwunicluster.scc.kit.edu&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
For the sake of simplicity, &#039;&#039;&#039;we recommend using uc3.scc.kit.edu as the server address&#039;&#039;&#039;: &amp;lt;code&amp;gt;ssh prefix_&amp;lt;username&amp;gt;@uc3.scc.kit.edu&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Until 06.07.2025, login to bwUniCluster 2.0 remains possible analogously via &amp;lt;code&amp;gt;ssh &amp;lt;username&amp;gt;@uc2.scc.kit.edu&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
In general, you should use automatic selection to allow us to balance the load over the two login nodes.&lt;br /&gt;
If you need to connect to specific login nodes, you can use the following hostnames:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Node type&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;uc3-login1.scc.kit.edu&#039;&#039;&#039; || bwUniCluster 3.0 first login node&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;uc3-login2.scc.kit.edu&#039;&#039;&#039; || bwUniCluster 3.0 second login node&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Host Keys ==&lt;br /&gt;
&lt;br /&gt;
When you log in, you may receive the message &amp;lt;code&amp;gt;The authenticity of host &#039;&amp;lt;host address&amp;gt;&#039; can&#039;t be established.&amp;lt;/code&amp;gt; along with the host key fingerprint. This is intended so you can verify the authenticity of the host you are connecting to. Before you continue, you should verify that this fingerprint matches one of the following:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Algorithm !! Fingerprint (SHA256)&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;RSA&#039;&#039;&#039; || SHA256:RaE0/tqQMMBmJuDCIo3WZ38YJsz0godVyt6aUOk/E0M&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;ECDSA&#039;&#039;&#039; || SHA256:LjBYL/x86ZAlL0JdlXrCmPYXvS3DaSiMuvycojBMdwQ&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;ED25519&#039;&#039;&#039; || SHA256:5mZYEpKigwK5ibBMHRrh3WIkOtCqomJW6H7OMbPk3ec&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
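&lt;br /&gt;
You can also display the fingerprints of a host before logging in, for example with the OpenSSH tools &amp;lt;code&amp;gt;ssh-keyscan&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ssh-keygen&amp;lt;/code&amp;gt; (assuming both are installed on your client):&lt;br /&gt;
&lt;br /&gt;
 ssh-keyscan uc3.scc.kit.edu 2&amp;gt;/dev/null | ssh-keygen -lf -&lt;br /&gt;
&lt;br /&gt;
Only continue with the login if the printed SHA256 values match the table above.&lt;br /&gt;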
&lt;br /&gt;
== Login with SSH command (Linux, Mac, Windows) ==&lt;br /&gt;
&lt;br /&gt;
Linux, macOS, other Unix-like operating systems, and Microsoft Windows come with a built-in SSH client, most likely provided by the OpenSSH project.&lt;br /&gt;
&lt;br /&gt;
For login, use one of the following ssh commands:&lt;br /&gt;
&lt;br /&gt;
 ssh -l &amp;lt;username&amp;gt; uc3.scc.kit.edu&lt;br /&gt;
 ssh &amp;lt;username&amp;gt;@bwunicluster.scc.kit.edu&lt;br /&gt;
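&lt;br /&gt;
To avoid typing the full hostname and user name on every login, you can add an entry to your client&#039;s &amp;lt;code&amp;gt;~/.ssh/config&amp;lt;/code&amp;gt; (the alias &amp;lt;code&amp;gt;uc3&amp;lt;/code&amp;gt; is only an example):&lt;br /&gt;
&lt;br /&gt;
 Host uc3&lt;br /&gt;
     HostName uc3.scc.kit.edu&lt;br /&gt;
     User &amp;lt;username&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Afterwards, &amp;lt;code&amp;gt;ssh uc3&amp;lt;/code&amp;gt; is sufficient to log in.&lt;br /&gt;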
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
To run graphical applications, you can use the &amp;lt;code&amp;gt;-X&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;-Y&amp;lt;/code&amp;gt; flag to &amp;lt;code&amp;gt;ssh&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
 ssh -Y -l &amp;lt;username&amp;gt; bwunicluster.scc.kit.edu&lt;br /&gt;
&lt;br /&gt;
For better performance, we recommend using [[VNC]].&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Login with graphical SSH client (Windows) ==&lt;br /&gt;
&lt;br /&gt;
For Windows we suggest using  [[Data_Transfer/Graphical_Clients#MobaXterm|MobaXterm]] for login and file transfer.&lt;br /&gt;
 &lt;br /&gt;
Start &#039;&#039;MobaXterm&#039;&#039; and fill in the following fields:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Remote name              : uc3.scc.kit.edu    # or bwunicluster.scc.kit.edu&lt;br /&gt;
Specify user name        : &amp;lt;username&amp;gt;&lt;br /&gt;
Port                     : 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After that, click on &#039;OK&#039;. A terminal will open in which you can enter your credentials.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When using file transfer with MobaXterm version 23.6, the following configuration change is required:&lt;br /&gt;
In the settings, in the &amp;quot;SSH&amp;quot; tab, change the &amp;quot;SSH engine&amp;quot; option from &amp;quot;&amp;lt;new&amp;gt;&amp;quot; to &amp;quot;&amp;lt;legacy&amp;gt;&amp;quot;, then restart MobaXterm.&lt;br /&gt;
&lt;br /&gt;
== Login with Jupyterhub ==&lt;br /&gt;
&lt;br /&gt;
Login takes place at:&lt;br /&gt;
* bwUniCluster 3.0: [https://uc3-jupyter.scc.kit.edu uc3-jupyter.scc.kit.edu]&lt;br /&gt;
* SDIL: [https://sdil-jupyter.scc.kit.edu sdil-jupyter.scc.kit.edu]&lt;br /&gt;
&lt;br /&gt;
More information can be found [[BwUniCluster3.0/Jupyter#Login_process|here]].&lt;br /&gt;
&lt;br /&gt;
== Login Example ==&lt;br /&gt;
&lt;br /&gt;
To log in to bwUniCluster 3.0, you must provide your [[Registration/Password|service password]].&lt;br /&gt;
Proceed as follows:&lt;br /&gt;
# Connect to a login node via SSH.&lt;br /&gt;
# The system will ask for a one-time password &amp;lt;code&amp;gt;Your OTP:&amp;lt;/code&amp;gt;. Please enter your OTP and confirm it with Enter/Return. If you do not have a second factor yet, please create one (see [[Registration/2FA]]).&lt;br /&gt;
# The system will ask you for your service password &amp;lt;code&amp;gt;Password:&amp;lt;/code&amp;gt;. Please enter it and confirm it with Enter/Return. If you do not have a service password yet or have forgotten it, please create one (see [[Registration/Password]]).&lt;br /&gt;
# You will be greeted by the cluster, followed by a shell.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[user@client ~]$ ssh ka_ab1234@uc3.scc.kit.edu&lt;br /&gt;
(ka_ab1234@uc3.scc.kit.edu) Your OTP: cccccctlljdbrjdleujigivvfnkjbucudugjjlutfbrk&lt;br /&gt;
(ka_ab1234@uc3.scc.kit.edu) Password: &lt;br /&gt;
********************************************************************************&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                   Karlsruher Institut für Technologie (KIT)                  *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                       Scientific Computing Center (SCC)                      *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                            _    _    _____   ____                            *&lt;br /&gt;
*                           | |  | |  / ____| |___ \                           *&lt;br /&gt;
*                           | |  | | | |        __) |                          *&lt;br /&gt;
*                           | |  | | | |       |__ &amp;lt;                           *&lt;br /&gt;
*                           | |__| | | |____   ___) |                          *&lt;br /&gt;
*                            \____/   \_____| |____/                           *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                  (KITE 2.0, RHEL 9.4, Lustre 2.14.0_ddn154)                  *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
********************************************************************************&lt;br /&gt;
Last login: Wed Feb 26 11:08:20 2025 from 2a00:1398:4:181c:2be1:437b:1c36:1337&lt;br /&gt;
&lt;br /&gt;
[ka_ab1234@uc3n990 ~]$&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Troubleshooting ==&lt;br /&gt;
&lt;br /&gt;
See [[BwUniCluster3.0/FAQ#Login|bwUniCluster FAQ]].&lt;br /&gt;
&lt;br /&gt;
= Allowed Activities on Login Nodes =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
To guarantee usability for all users of the clusters, you must not run your compute jobs on the login nodes.&lt;br /&gt;
Compute jobs must be submitted to the queuing system.&amp;lt;br/&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Any compute job running on the login nodes will be terminated without notice.&#039;&#039;&#039;&amp;lt;br/&amp;gt;&lt;br /&gt;
Any long-running compilation or long-running pre- or post-processing of batch jobs must also be submitted to the queuing system.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
The login nodes of the bwHPC clusters are the access point to the compute system, your &amp;lt;code&amp;gt;$HOME&amp;lt;/code&amp;gt; directory, and your workspaces.&lt;br /&gt;
These nodes are shared among all users; therefore, your activities on the login nodes are primarily limited to setting up your batch jobs.&lt;br /&gt;
Permitted activities also include:&lt;br /&gt;
* &#039;&#039;&#039;short&#039;&#039;&#039; compilation of your program code and&lt;br /&gt;
* &#039;&#039;&#039;lightweight&#039;&#039;&#039; pre- and post-processing of your batch jobs.&lt;br /&gt;
&lt;br /&gt;
We advise users to use [[BwUniCluster3.0/Batch_Queues#Interactive_Jobs|interactive jobs]] for compute- and memory-intensive tasks such as compiling.&lt;br /&gt;
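&lt;br /&gt;
For example, a longer compilation can be run in an interactive job instead of on a login node. The requested resources below are only placeholders; valid queue names and limits are listed on the [[BwUniCluster3.0/Batch_Queues|batch queues]] page:&lt;br /&gt;
&lt;br /&gt;
 salloc --partition=&amp;lt;queue&amp;gt; --ntasks=1 --cpus-per-task=8 --time=00:30:00&lt;br /&gt;
 make -j 8&lt;br /&gt;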
&lt;br /&gt;
= Related Information =&lt;br /&gt;
&lt;br /&gt;
* If you want to reset your service password, consult the [[Registration/Password|Password Guide]].&lt;br /&gt;
* If you want to register a new token for the two factor authentication (2FA), consult the [[Registration/2FA|2FA Guide]].&lt;br /&gt;
* If you want to de-register, consult the [[Registration/Deregistration|De-registration Guide]].&lt;br /&gt;
* If you need an SSH key for your workflow, read [[Registration/SSH|Registering SSH Keys with your Cluster]].&lt;br /&gt;
* Configuring your shell: [[.bashrc Do&#039;s and Don&#039;ts]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15844</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15844"/>
		<updated>2026-03-18T16:58:35Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | SOLVED: Service Incident Notice: bwUniCluster 3.0 Login Not Possible&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
The login issue with bwUniCluster 3.0, which had been occurring since Friday, March 13, 2026, at 10:00 p.m., has been resolved.&lt;br /&gt;
&lt;br /&gt;
The cause was a software error in the parallel file system, which has since been successfully corrected.&lt;br /&gt;
A patch developed for us by the manufacturer has been applied. However, we would like to point out that we cannot currently completely rule out the possibility that the problem may recur under certain circumstances.&lt;br /&gt;
&lt;br /&gt;
You can now log in as usual. Please check the results of your calculations and resubmit any jobs that were interrupted. &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
In order to move data that is still needed, user software, and user specific settings from the old HOME directory to the new HOME directory, or to new workspaces, instructions are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15832</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15832"/>
		<updated>2026-03-17T08:08:32Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Service Incident Notice: bwUniCluster 3.0 Login Not Possible&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
We are currently experiencing an issue on bwUniCluster 3.0 that prevents users from logging in. The disruption is caused by a software error in the filesystem.&lt;br /&gt;
Our team is working intensively to resolve the problem, in close collaboration with the system’s manufacturer. At this time, we are unable to provide an exact estimate for when the issue will be fully resolved.&lt;br /&gt;
&lt;br /&gt;
We do not expect a long‑term outage; therefore, any workspaces that may have expired during the disruption should be easily restorable using ws_restore.&lt;br /&gt;
&amp;lt;!-- Please see the [[BwUniCluster3.0/Maintenance|maintenance]] page for more information about planned upgrades and other changes --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We will inform you via the mailing list as soon as there are any updates.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
In order to move data that is still needed, user software, and user specific settings from the old HOME directory to the new HOME directory, or to new workspaces, instructions are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15831</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15831"/>
		<updated>2026-03-17T08:08:22Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Service Incident Notice: bwUniCluster 3.0 Login Not Possible&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
We are currently experiencing an issue on bwUniCluster 3.0 that prevents users from logging in. The disruption is caused by a software error in the filesystem.&lt;br /&gt;
Our team is working intensively to resolve the problem, in close collaboration with the system’s manufacturer. At this time, we are unable to provide an exact estimate for when the issue will be fully resolved.&lt;br /&gt;
&lt;br /&gt;
We do not expect a long‑term outage; therefore, any workspaces that may have expired during the disruption should be easily restorable using ws_restore.&lt;br /&gt;
&amp;lt;!-- Please see the [[BwUniCluster3.0/Maintenance|maintenance]] page for more information about planned upgrades and other changes --&amp;gt;&lt;br /&gt;
We will inform you via the mailing list as soon as there are any updates.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
In order to move data that is still needed, user software, and user specific settings from the old HOME directory to the new HOME directory, or to new workspaces, instructions are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15830</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15830"/>
		<updated>2026-03-17T08:08:07Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out the full section if there is no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Service Incident Notice: bwUniCluster 3.0 Login Not Possible&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
We are currently experiencing an issue on bwUniCluster 3.0 that prevents users from logging in. The disruption is caused by a software error in the filesystem.&lt;br /&gt;
Our team is working intensively to resolve the problem, in close collaboration with the system’s manufacturer. At this time, we are unable to provide an exact estimate for when the issue will be fully resolved.&lt;br /&gt;
&lt;br /&gt;
We do not expect a long‑term outage; therefore, any workspaces that may have expired during the disruption should be easily restorable using ws_restore.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Please see the [[BwUniCluster3.0/Maintenance|maintenance]] page for more information about planned upgrades and other changes --&amp;gt;&lt;br /&gt;
We will inform you via the mailing list as soon as there are any updates.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out the full section if there is no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes; both node types are based on classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. You only need to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software, and policy can be found here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (until July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Instructions for moving still-needed data, user software, and user-specific settings from the old HOME directory to the new HOME directory, or to new workspaces, are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15829</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15829"/>
		<updated>2026-03-17T08:07:24Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out the full section if there is no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Service Incident Notice: bwUniCluster 3.0 Login Not Possible&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
We are currently experiencing an issue on bwUniCluster 3.0 that prevents users from logging in. The disruption is caused by a software error in the filesystem.&lt;br /&gt;
Our team is working intensively to resolve the problem, in close collaboration with the system’s manufacturer. At this time, we are unable to provide an exact estimate for when the issue will be fully resolved.&lt;br /&gt;
&lt;br /&gt;
We do not expect a long‑term outage; therefore, any workspaces that may have expired during the disruption should be easily restorable using ws_restore.&lt;br /&gt;
&lt;br /&gt;
We will keep you updated as soon as new information becomes available.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Please see the [[BwUniCluster3.0/Maintenance|maintenance]] page for more information about planned upgrades and other changes --&amp;gt;&lt;br /&gt;
We will inform you via the mailing list as soon as there are any updates.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out the full section if there is no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes; both node types are based on classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. You only need to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software, and policy can be found here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (until July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Instructions for moving still-needed data, user software, and user-specific settings from the old HOME directory to the new HOME directory, or to new workspaces, are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15735</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15735"/>
		<updated>2026-02-18T14:38:35Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out the full section if there is no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Maintenance&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Due to extensive work on the electrical installation, the HPC system bwUniCluster 3.0 and all other HPC services will be unavailable from&lt;br /&gt;
&lt;br /&gt;
09.02.2026 at 06:00 AM until 18.02.2026&lt;br /&gt;
&lt;br /&gt;
Please see the [[BwUniCluster3.0/Maintenance|maintenance]] page for more information about planned upgrades and other changes&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out the full section if there is no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes; both node types are based on classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. You only need to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software, and policy can be found here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (until July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Instructions for moving still-needed data, user software, and user-specific settings from the old HOME directory to the new HOME directory, or to new workspaces, are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Maintenance&amp;diff=15717</id>
		<title>BwUniCluster3.0/Maintenance</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Maintenance&amp;diff=15717"/>
		<updated>2026-02-09T08:52:54Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Maintenance records of bwUniCluster 3.0 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Maintenance records of bwUniCluster 3.0 ===&lt;br /&gt;
&#039;&#039;&#039;2026&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Minor updates to drivers and the kernel.&lt;br /&gt;
&lt;br /&gt;
=== Maintenance records of retired bwUniCluster 2.0 ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2024&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2024-05]] from 21.05.2024 to 24.05.2024&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2023&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2023-03]] from 20.03.2023 to 24.03.2023&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2022&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2022-11]] from 07.11.2022 to 10.11.2022&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2022-03]] from 28.03.2022 to 31.03.2022&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2021&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2021-10]] from 11.10.2021 to 15.10.2021&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2020&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2020-10]] from 06.10.2020 to 13.10.2020&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Maintenance records of retired bwUniCluster 1.0 ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2019&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2019-02]] from 02.02.2019 to 08.02.2019&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2017&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2017-05]] from 02.05.2017 to 02.05.2017&lt;br /&gt;
* [[BwUniCluster/Maintenance/2017-03]] from 20.03.2017 to 21.03.2017&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2016&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2016-10]] from 17.10.2016 to 21.10.2016&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Maintenance&amp;diff=15716</id>
		<title>BwUniCluster3.0/Maintenance</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Maintenance&amp;diff=15716"/>
		<updated>2026-02-09T08:52:45Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Maintenance records of bwUniCluster 3.0 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Maintenance records of bwUniCluster 3.0 ===&lt;br /&gt;
&#039;&#039;&#039;2026&#039;&#039;&#039;&lt;br /&gt;
Minor updates to drivers and the kernel.&lt;br /&gt;
&lt;br /&gt;
=== Maintenance records of retired bwUniCluster 2.0 ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2024&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2024-05]] from 21.05.2024 to 24.05.2024&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2023&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2023-03]] from 20.03.2023 to 24.03.2023&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2022&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2022-11]] from 07.11.2022 to 10.11.2022&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2022-03]] from 28.03.2022 to 31.03.2022&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2021&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2021-10]] from 11.10.2021 to 15.10.2021&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2020&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2020-10]] from 06.10.2020 to 13.10.2020&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Maintenance records of retired bwUniCluster 1.0 ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2019&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2019-02]] from 02.02.2019 to 08.02.2019&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2017&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2017-05]] from 02.05.2017 to 02.05.2017&lt;br /&gt;
* [[BwUniCluster/Maintenance/2017-03]] from 20.03.2017 to 21.03.2017&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2016&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2016-10]] from 17.10.2016 to 21.10.2016&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Maintenance&amp;diff=15715</id>
		<title>BwUniCluster3.0/Maintenance</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Maintenance&amp;diff=15715"/>
		<updated>2026-02-09T08:52:09Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=== Maintenance records of bwUniCluster 3.0 ===&lt;br /&gt;
&#039;&#039;&#039;2026&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Maintenance records of retired bwUniCluster 2.0 ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2024&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2024-05]] from 21.05.2024 to 24.05.2024&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2023&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2023-03]] from 20.03.2023 to 24.03.2023&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2022&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2022-11]] from 07.11.2022 to 10.11.2022&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2022-03]] from 28.03.2022 to 31.03.2022&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2021&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2021-10]] from 11.10.2021 to 15.10.2021&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2020&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster2.0/Maintenance/2020-10]] from 06.10.2020 to 13.10.2020&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Maintenance records of retired bwUniCluster 1.0 ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2019&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2019-02]] from 02.02.2019 to 08.02.2019&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2017&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2017-05]] from 02.05.2017 to 02.05.2017&lt;br /&gt;
* [[BwUniCluster/Maintenance/2017-03]] from 20.03.2017 to 21.03.2017&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;2016&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* [[BwUniCluster/Maintenance/2016-10]] from 17.10.2016 to 21.10.2016&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15714</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15714"/>
		<updated>2026-02-09T08:51:27Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there is no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Maintenance&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Due to extensive work on the electrical installation, the HPC system bwUniCluster 3.0 and all other HPC services will be unavailable from&lt;br /&gt;
&lt;br /&gt;
09.02.2026 at 06:00 AM until 18.02.2026&lt;br /&gt;
&lt;br /&gt;
Please see the [[BwUniCluster3.0/Maintenance|maintenance]] page for more information about planned upgrades and other changes.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there is no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
In order to move data that is still needed, user software, and user specific settings from the old HOME directory to the new HOME directory, or to new workspaces, instructions are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15713</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15713"/>
		<updated>2026-02-09T08:50:29Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there is no upcoming maintenance&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Maintenance&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Due to extensive work on the electrical installation, the HPC system bwUniCluster 3.0 and all other HPC services will be unavailable from&lt;br /&gt;
&lt;br /&gt;
09.02.2026 at 06:00 AM until 18.02.2026&lt;br /&gt;
&lt;br /&gt;
Please see the [[BwUniCluster2.0/Maintenance/2024-05|maintenance]] page for more information about planned upgrades and other changes.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there is no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
In order to move data that is still needed, user software, and user specific settings from the old HOME directory to the new HOME directory, or to new workspaces, instructions are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15712</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15712"/>
		<updated>2026-02-09T08:46:28Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there is no upcoming maintenance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Next maintenance&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Due to regular maintenance work, the HPC system bwUniCluster 2.0 will not be available from &lt;br /&gt;
&lt;br /&gt;
21.05.2024 at 08:30 AM until 24.05.2024 at 3:00 PM&lt;br /&gt;
&lt;br /&gt;
Please see the [[BwUniCluster2.0/Maintenance/2024-05|maintenance]] page for more information about planned upgrades and other changes.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there is no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
In order to move data that is still needed, user software, and user specific settings from the old HOME directory to the new HOME directory, or to new workspaces, instructions are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15710</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15710"/>
		<updated>2026-02-03T14:36:30Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Interactive Computing with Jupyter */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, either a batch script is executed automatically or the allocated nodes can be used interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
For the general procedure, see [[Running_Calculations | Running Calculations]].&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload management software Slurm. Any job submission therefore has to be performed with Slurm commands. Slurm queues and runs user jobs based on fair sharing policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm, a cluster management and job scheduling system.&lt;br /&gt;
Slurm has three key functions:&lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any calculation on the compute nodes of bwUniCluster 3.0 requires the user to define it as a sequence of commands together with the required run time, number of CPU cores and amount of main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload management software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerators&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs that are used for developing, compiling and testing code and workflows. Development queues are intended to give users immediate access to compute resources without having to wait. They are the place to run resource-intensive tests right away without affecting other users, as would be the case on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. The request for compute resources on the bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This Bash script is queued and executed as soon as the compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command line prompt is returned on a compute node and users can freely dispose of the resources now available to them.&lt;br /&gt;
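The two submission modes described above can be sketched as follows. This is a minimal, hypothetical example: the queue name, time limit and resource values are illustrative only and must be adapted to the queues and limits listed below.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu      # the queue (mandatory)&lt;br /&gt;
#SBATCH --time=00:10:00      # the requested wall time (mandatory)&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
# Commands executed on the allocated compute node:&lt;br /&gt;
echo &amp;quot;Running on $(hostname)&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Such a script would be submitted with &amp;lt;code&amp;gt;sbatch jobscript.sh&amp;lt;/code&amp;gt;; an equivalent interactive allocation would be requested with &amp;lt;code&amp;gt;salloc --partition=cpu --time=00:10:00 --ntasks=1&amp;lt;/code&amp;gt;.&lt;br /&gt;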
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse these queues for regular, short-running jobs or chain jobs! Only one job may run at a time, and at most 3 jobs may be queued.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
The computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;, which takes into account the individual investment shares of the respective universities and the resources already used by their members. Furthermore, the following throttling policy is active: the &#039;&#039;&#039;maximum number of physical cores&#039;&#039;&#039; in use at any given time is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and to maximize the number of users who can access computing time at the same time.&lt;br /&gt;
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9 (A100) / nodes=5 (H100), mem=510000mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red;&amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a maximum runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
These queues are intended only for development tasks, i.e. debugging or performance optimization.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue define the number of tasks and the amount of memory if these are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
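Since the queue and the time must always be specified, a minimal batch submission could look like the following sketch (the queue name, time limit in minutes and script name are illustrative only):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=cpu --time=10 my_script.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
All other resources then fall back to the queue defaults listed above.&lt;br /&gt;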
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates downtime, reservations, and node state information in determining the available backfill window. On bwUniCluster 3.0, the plain sinfo command can only be used by the administrator.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays which resources are available for immediate use in each partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of &#039;&#039;&#039;sbatch&#039;&#039;&#039; is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, when a batch job starts depends on the availability of the requested resources and the fair share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used on the command line or in your job script. Different defaults for some of these options are set depending on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --gres=gpu:&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --gres=gpu:&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of GPUs required per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in megabytes per node. (You should omit setting this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum memory required per allocated CPU, in megabytes. (You should omit setting this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --exclusive&lt;br /&gt;
| #SBATCH --exclusive &lt;br /&gt;
| The job allocates all CPUs and GPUs on the nodes; it will not share the nodes with other running jobs.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The project group a job is accounted on is shown behind &amp;quot;Account=&amp;quot; in the output of &amp;quot;scontrol show job&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND filesystem.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
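A job script typically combines several of these options in its header. The following sketch is illustrative only; the partition, resource values and program name are placeholders:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --mem-per-cpu=2000mb&lt;br /&gt;
#SBATCH --job-name=example_job&lt;br /&gt;
#SBATCH --output=example_job-%j.out&lt;br /&gt;
&lt;br /&gt;
# commands executed on the allocated compute node&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Such a script would then be submitted with &amp;lt;code&amp;gt;sbatch example_job.sh&amp;lt;/code&amp;gt;.&lt;br /&gt;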
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with low memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs requesting more than 8 GByte of memory, you must allocate resources for a so-called interactive job with the command salloc on a login node. For example, to run a serial application on a compute node that requires 5000 MByte of memory, with the interactive run limited to 2 hours, execute the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources. You will be logged in automatically on the allocated node. To run a serial program on the granted core, simply type the name of the executable:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that in this example your serial job must finish within 2 hours, otherwise it will be killed by the system during runtime.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can also start a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours, with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources, i.e. the compute node, will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application can run on one or several compute nodes (e.g. 5 nodes with 96 cores each) and usually requires an amount of memory in GByte (e.g. 50 GByte) and a maximum run time (e.g. 1 hour). For example, 5 such nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want access to another node, open a new terminal, connect it to bwUniCluster 3.0 as well, and run the following commands to&lt;br /&gt;
connect first to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the jobid and the nodelist can be shown.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, you can do so by simply typing mpirun &amp;lt;program_name&amp;gt;; your program will then run on all 480 cores. A very simple example of starting a parallel job is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger ddt with the commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of cores, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about your own active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via the manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If resources are not immediately available, add &amp;lt;code&amp;gt;--start&amp;lt;/code&amp;gt; to show a job&#039;s expected start time:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;squeue --start&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via the manpage (man scontrol).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here is an example from bwUniCluster 3.0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my running job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Each request to the Slurm workload manager generates load. &amp;lt;p style=&amp;quot;color:red;&amp;quot;&amp;gt;&amp;lt;b&amp;gt;Therefore, do not use &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; with a simple &amp;lt;code&amp;gt;watch&amp;lt;/code&amp;gt;.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; The smallest allowed polling interval is &amp;lt;b&amp;gt;30 seconds&amp;lt;/b&amp;gt;.&amp;lt;br&amp;gt;&lt;br /&gt;
Any violation of this rule will result in the task being terminated without notice.&lt;br /&gt;
&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel). The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
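For example, a single job can be canceled by its job ID, or all of your pending jobs can be canceled at once via the state filter (the job ID below is illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel 1262&lt;br /&gt;
$ scancel -t PENDING&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;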
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| Do not run squeue or other Slurm commands in loops or via &amp;quot;watch&amp;quot;, so as not to saturate the Slurm daemon with RPC requests.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15709</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15709"/>
		<updated>2026-02-03T14:35:52Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Monitor and manage jobs */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, automated tasks are executed via a batch script or they can be accessed interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
For the general procedure, see [[Running_Calculations | Running Calculations]].&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload manager Slurm. Any job submission therefore has to be performed with Slurm commands. Slurm queues and runs user jobs based on fair share policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm, a cluster management and job scheduling system. Slurm has three key functions:&lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwUniCluster 3.0 requires the user to define the calculation as a sequence of commands, together with the required run time, number of CPU cores and amount of main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload managing software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerator&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs that are used for developing, compiling and testing code and workflows. The intention behind development queues is to give users immediate access to compute resources without long waiting times. They are the place for short bursts of heavy computation that would otherwise affect other users on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. A request for compute resources on bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or interactively as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This script is queued and executed as soon as the requested compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command line prompt is opened on a compute node and the user can work interactively with the allocated resources.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse this queue for regular, short-running jobs or chain jobs! Only one running job at a time is allowed, and the maximum number of queued jobs is limited to 3.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
Computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;, which takes into account the investment share of the respective university and the resources already used by its members. In addition, the following throttling policy is active: the &#039;&#039;&#039;maximum number of physical cores&#039;&#039;&#039; used at any given time by running jobs is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and to maximize the number of users who can access computing time at the same time.&lt;br /&gt;
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9 (A100) / nodes=5 (H100), mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red;&amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a maximum runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
Only for development, i.e. debugging or performance optimization.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue define the number of tasks and the amount of memory if they are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo displays partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information when determining the available backfill window. On bwUniCluster 3.0 the plain sinfo command is reserved for administrators.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC provides a special script (sinfo_t_idle) that shows how many nodes are idle and available for immediate use on the system. Users can use this information to submit jobs that fit these free resources and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. Its main purpose is to specify the resources that are needed to run the job; &#039;&#039;&#039;sbatch&#039;&#039;&#039; then queues the batch job. When the job actually starts depends on the availability of the requested resources and on the fair share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used on the command line or in your job script. Different defaults for some of these options are set per queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --gres=gpu:&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --gres=gpu:&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of GPUs required per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --exclusive&lt;br /&gt;
| #SBATCH --exclusive &lt;br /&gt;
| The job allocates all CPUs and GPUs on the nodes. It will not share the nodes with other running jobs.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The project group a job is accounted on is shown behind &amp;quot;Account=&amp;quot; in the output of &amp;quot;scontrol show job&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND filesystem.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
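As an illustration, several of the options above can be combined in one job script. The following sketch assumes a single-node, multi-threaded program; the names &amp;lt;code&amp;gt;myjob&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;myprog&amp;lt;/code&amp;gt; are placeholders:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=16&lt;br /&gt;
#SBATCH --job-name=myjob&lt;br /&gt;
#SBATCH --output=myjob-%j.out&lt;br /&gt;
&lt;br /&gt;
# run the (placeholder) program with 16 threads&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
./myprog&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Such a script would be submitted with &amp;lt;code&amp;gt;sbatch myjob.sh&amp;lt;/code&amp;gt;.&lt;br /&gt;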
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with small memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs that need more than 8 GByte of memory, you must allocate resources for a so-called interactive job with the command salloc on a login node. For a serial application that requires 5000 MByte of memory on a compute node, with the interactive run limited to 2 hours, the following command has to be executed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then you will get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core. To run a serial program on the granted core, simply type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that in this example your serial job must run for less than 2 hours, otherwise it will be killed by the system during runtime. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can also start a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours. Start it with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources - i.e. the compute node - will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application running on one or on many compute nodes (e.g. 5 nodes with 96 cores each) usually requires an amount of memory in GByte (e.g. 50 GByte) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want to have access to another node, you must open a new terminal, connect it to bwUniCluster 3.0 as well, and type the following commands to&lt;br /&gt;
connect first to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the jobid and the nodelist can be shown.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, you can do so by simply typing mpirun &amp;lt;program_name&amp;gt;. Your program will then run on all 480 cores. A very simple example of starting a parallel job is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger ddt with the commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands will execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of cores; an example for this can be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about your own active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via the manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
If resources are not immediately available, add &amp;lt;code&amp;gt;--start&amp;lt;/code&amp;gt; to show the expected start time of pending jobs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;sh&amp;quot;&amp;gt;squeue --start&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for one specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via the manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here is an example from bwUniCluster 3.0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Each request to the Slurm workload manager generates load. &amp;lt;p style=&amp;quot;color:red;&amp;quot;&amp;gt;&amp;lt;b&amp;gt;Therefore, do not use &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; in a simple &amp;lt;code&amp;gt;watch&amp;lt;/code&amp;gt; loop.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; The smallest allowed polling interval is &amp;lt;b&amp;gt;30 seconds&amp;lt;/b&amp;gt;.&amp;lt;br&amp;gt;&lt;br /&gt;
Any violation of this rule will result in the task being terminated without notice.&lt;br /&gt;
&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command cancels jobs. It is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via the manpage (man scancel). The syntax is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
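&lt;br /&gt;
For example, assuming a job with job ID 1262 as shown by squeue:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel 1262                  # cancel the job with job ID 1262&lt;br /&gt;
$ scancel -t PENDING -u $USER   # cancel all of your pending jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;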
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| Do not run squeue or other Slurm commands in loops or via &amp;quot;watch&amp;quot;, so as not to saturate the Slurm daemon with RPC requests.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15672</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15672"/>
		<updated>2026-01-07T13:52:09Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Batch Jobs: sbatch */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, tasks are either executed automatically via a batch script or the resources can be used interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
For the general procedure, see [[Running_Calculations | Running Calculations]].&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 runs the workload management software Slurm. Any job submission by the user is therefore performed with commands of the Slurm software. Slurm queues and runs user jobs based on fair sharing policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm.&lt;br /&gt;
Slurm is a cluster management and job scheduling system with three key functions. &lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwUniCluster 3.0 requires the user to define the calculation as a sequence of commands together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload managing software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerator&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs that are used for developing, compiling and testing code and workflows. The intention behind development queues is to give users immediate access to compute resources without long waiting times. This is the place for short bursts of heavy computation that would otherwise affect other users, as would be the case on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. The request for compute resources on the bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
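For example, a minimal submission specifying only the mandatory queue and wall time could look like this (&amp;lt;code&amp;gt;my_job.sh&amp;lt;/code&amp;gt; is a placeholder for your own batch script):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=cpu --time=00:10:00 my_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;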
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This script is queued and executed as soon as the requested compute resources are available and allocated. Batch jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the compute resources are available and allocated, a command line prompt is returned on a compute node and the user can freely use the allocated resources.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse these queues for regular, short-running jobs or chain jobs! Only one job may run at a time, and the maximum queue length is 3.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
Computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;, which takes into account the investment share of the respective university and the resources already used by its members. In addition, the following throttling policy is active: the &#039;&#039;&#039;maximum number of physical cores&#039;&#039;&#039; used at any given time by running jobs is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and maximize the number of users who can access computing time at the same time.&lt;br /&gt;
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9(A100)/nodes=5(H100) , mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red; &amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a short runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
These queues are only for development tasks, i.e. debugging or performance optimization.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&amp;lt;br/&amp;gt;&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue define the number of tasks and the amount of memory if these are not given explicitly with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo displays partition and node information for a system running Slurm. It incorporates downtime, reservations, and node state information when determining the available backfill window. The sinfo command itself can only be used by administrators.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC therefore provides a special script (sinfo_t_idle) to find out how many nodes are available for immediate use on the system. Users can use this information to submit jobs that fit the idle resources and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
The most important Slurm commands for non-administrator users of bwUniCluster 3.0:&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of &#039;&#039;&#039;sbatch&#039;&#039;&#039; is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, when the batch job starts depends on the availability of the requested resources and on the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used on the command line or in your job script. Different defaults for some of these options are set depending on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --gres=gpu:&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --gres=gpu:&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of GPUs required per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --exclusive&lt;br /&gt;
| #SBATCH --exclusive &lt;br /&gt;
| The job allocates all CPUs and GPUs on its nodes and does not share the nodes with other running jobs.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The output of &amp;quot;scontrol show job&amp;quot; shows the project group the job is accounted on behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND filesystem.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
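Several of these options are typically combined in the header of a batch script. A minimal sketch (queue, resource values and file names are illustrative and must be adapted to your job):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --mem-per-cpu=2000mb&lt;br /&gt;
#SBATCH --job-name=my_job&lt;br /&gt;
#SBATCH --output=my_job_%j.out&lt;br /&gt;
&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The script is then submitted with &amp;lt;code&amp;gt;sbatch my_job.sh&amp;lt;/code&amp;gt;; options given on the command line override the values set in the script.&lt;br /&gt;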
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with low memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs requesting more than 8 GByte of memory, you must allocate resources for a so-called interactive job with the command salloc on a login node. For a serial application that requires 5000 MByte of memory on a compute node, with the interactive run limited to 2 hours, the following command has to be executed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core. To run a serial program on it, simply type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that in this example your serial job must finish within 2 hours, otherwise it will be killed by the system during runtime. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can now also open a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours, with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources, i.e. the compute node, will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application running on one or more compute nodes (e.g. here 5 nodes with 96 cores each) usually requires a certain amount of memory (e.g. 50 GByte per node) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores with 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want access to another node, open a new terminal, connect it to bwUniCluster 3.0 as well, and type the following commands to&lt;br /&gt;
connect to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you can display the jobid and the nodelist.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, simply type mpirun &amp;lt;program_name&amp;gt;; your program will then run on all 480 cores. A very simple example of starting a parallel job is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger ddt by the commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of the cores, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
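For example (&amp;lt;my_mpi_program&amp;gt; is again a placeholder for your own executable):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpiexec.hydra -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;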
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about your own active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via the manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified one. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via the manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here is an example from bwUniCluster 3.0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my running job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Each request to the Slurm workload manager generates load. &amp;lt;p style=&amp;quot;color:red;&amp;quot;&amp;gt;&amp;lt;b&amp;gt;Therefore, do not use &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; with a simple &amp;lt;code&amp;gt;watch&amp;lt;/code&amp;gt;.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; The smallest allowed polling interval is &amp;lt;b&amp;gt;30 seconds&amp;lt;/b&amp;gt;.&amp;lt;br&amp;gt;&lt;br /&gt;
Any violation of this rule will result in the task being terminated without notice.&lt;br /&gt;
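If you need to monitor your jobs repeatedly, choose an interval of at least 30 seconds, for example:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ watch -n 60 squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;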
&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command is used to cancel jobs. It is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via the manpage (man scancel). The syntax is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
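For example, to cancel the job with jobid 1262, or all of your jobs in the PENDING state (the jobid is illustrative):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel 1262&lt;br /&gt;
$ scancel -t PENDING&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;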
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| Do not run squeue or other Slurm commands in loops or via &amp;quot;watch&amp;quot;, so as not to saturate the Slurm daemon with RPC requests.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Policies&amp;diff=15560</id>
		<title>BwUniCluster3.0/Policies</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Policies&amp;diff=15560"/>
		<updated>2025-12-02T08:49:50Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Policies =&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;File system quotas&#039;&#039;&#039;&lt;br /&gt;
** HOME: &#039;&#039;&#039;500GB&#039;&#039;&#039;, &#039;&#039;&#039;5 million files (inodes)&#039;&#039;&#039;&lt;br /&gt;
** Workspace: &#039;&#039;&#039;40TB&#039;&#039;&#039;, &#039;&#039;&#039;20 million files (inodes)&#039;&#039;&#039;&lt;br /&gt;
** Throttling Policies: The &#039;&#039;&#039;maximum number of cores&#039;&#039;&#039; in use at any given time is 1920 per user (aggregated over all running jobs).&lt;br /&gt;
* &#039;&#039;&#039;Username and HOME directory for KIT users&#039;&#039;&#039;&lt;br /&gt;
** Like everyone else, KIT users&#039; usernames now have the two-character prefix of their home location: &#039;&#039;&#039;&amp;lt;code&amp;gt;ka_&amp;lt;/code&amp;gt;&#039;&#039;&#039;&lt;br /&gt;
** The HOME directory for user &#039;&#039;ab1234&#039;&#039; would be: &#039;&#039;&#039;&amp;lt;code&amp;gt;/home/ka/ka_OE/ka_ab1234&amp;lt;/code&amp;gt;&#039;&#039;&#039; (OE: organizational unit)&lt;br /&gt;
** Login with SSH: &#039;&#039;&#039;&amp;lt;code&amp;gt;ssh ka_ab1234@uc3.scc.kit.edu&amp;lt;/code&amp;gt;&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Access for KIT students&#039;&#039;&#039;&lt;br /&gt;
** KIT students can be granted access with their regular u-student account in the context of a lecture (cf. https://www.scc.kit.edu/servicedesk/formulare.php &amp;amp;rarr; Application Form for Students accounts on bwUniCluster).&lt;br /&gt;
** The account is only enabled &#039;&#039;&#039;during the lecture period&#039;&#039;&#039;. After the end of the semester, the accounts are deprovisioned and the user data is deleted.&lt;br /&gt;
** A guest and partner account (GuP) is required for all other projects of KIT students on bwUniCluster 3.0.&lt;br /&gt;
* &#039;&#039;&#039;Allowed Activities on Login Nodes&#039;&#039;&#039;&lt;br /&gt;
** To guarantee usability for all users of the clusters, you must not run compute jobs on the login nodes.&lt;br /&gt;
** Compute-intensive jobs must be submitted to the queuing system.&amp;lt;br/&amp;gt;&lt;br /&gt;
** &#039;&#039;&#039;Any compute job running on the login nodes will be terminated without any notice.&#039;&#039;&#039;&lt;br /&gt;
** Any long-running compilation or any long-running pre- or post-processing of batch jobs must also be submitted to the queuing system.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Policies&amp;diff=15554</id>
		<title>BwUniCluster3.0/Policies</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Policies&amp;diff=15554"/>
		<updated>2025-12-02T08:40:12Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Policies =&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;File system quotas&#039;&#039;&#039;&lt;br /&gt;
** HOME: &#039;&#039;&#039;500GB&#039;&#039;&#039;, &#039;&#039;&#039;5 million files (inodes)&#039;&#039;&#039;&lt;br /&gt;
** Workspace: &#039;&#039;&#039;40TB&#039;&#039;&#039;, &#039;&#039;&#039;20 million files (inodes)&#039;&#039;&#039;&lt;br /&gt;
** Throttling Policies: The &#039;&#039;&#039;maximum number of cores&#039;&#039;&#039; in use at any given time is 1920 per user (aggregated over all running jobs).&lt;br /&gt;
* &#039;&#039;&#039;Username and HOME directory for KIT users&#039;&#039;&#039;&lt;br /&gt;
** Like everyone else, KIT users&#039; usernames now have the two-character prefix of their home location: &#039;&#039;&#039;&amp;lt;code&amp;gt;ka_&amp;lt;/code&amp;gt;&#039;&#039;&#039;&lt;br /&gt;
** The HOME directory for user &#039;&#039;ab1234&#039;&#039; would be: &#039;&#039;&#039;&amp;lt;code&amp;gt;/home/ka/ka_OE/ka_ab1234&amp;lt;/code&amp;gt;&#039;&#039;&#039; (OE: organizational unit)&lt;br /&gt;
** Login with SSH: &#039;&#039;&#039;&amp;lt;code&amp;gt;ssh ka_ab1234@uc3.scc.kit.edu&amp;lt;/code&amp;gt;&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Access for KIT students&#039;&#039;&#039;&lt;br /&gt;
** KIT students can be granted access with their regular u-student account in the context of a lecture (cf. https://www.scc.kit.edu/servicedesk/formulare.php &amp;amp;rarr; Application Form for Students accounts on bwUniCluster).&lt;br /&gt;
** The account is only enabled &#039;&#039;&#039;during the lecture period&#039;&#039;&#039;. After the end of the semester, the accounts are deprovisioned and the user data is deleted.&lt;br /&gt;
** A guest and partner account (GuP) is required for all other projects of KIT students on bwUniCluster 3.0.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15552</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15552"/>
		<updated>2025-12-02T08:38:19Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at the Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no upcoming maintenance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Next maintenance&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Due to regular maintenance work, the HPC system bwUniCluster 2.0 will not be available from &lt;br /&gt;
&lt;br /&gt;
21.05.2024 at 08:30 until 24.05.2024 at 15:00&lt;br /&gt;
&lt;br /&gt;
Please see the [[BwUniCluster2.0/Maintenance/2024-05|maintenance]] page for more information about planned upgrades and other changes.&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Instructions for moving data that is still needed, user software, and user-specific settings from the old HOME directory to the new HOME directory or to new workspaces are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Policies&amp;diff=15551</id>
		<title>BwUniCluster3.0/Policies</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Policies&amp;diff=15551"/>
		<updated>2025-12-02T08:37:26Z</updated>

		<summary type="html">&lt;p&gt;S Braun: Created page with &amp;quot;= Policies =&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Policies =&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15549</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15549"/>
		<updated>2025-12-02T08:36:27Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at the Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no upcoming maintenance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Next maintenance&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Due to regular maintenance work, the HPC system bwUniCluster 2.0 will not be available from &lt;br /&gt;
&lt;br /&gt;
21.05.2024 at 08:30 until 24.05.2024 at 15:00&lt;br /&gt;
&lt;br /&gt;
Please see the [[BwUniCluster2.0/Maintenance/2024-05|maintenance]] page for more information about planned upgrades and other changes.&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Instructions for moving data that is still needed, user software, and user-specific settings from the old HOME directory to the new HOME directory or to new workspaces are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]], [[BwUniCluster3.0/Policies|Policies]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15547</id>
		<title>BwUniCluster3.0</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0&amp;diff=15547"/>
		<updated>2025-12-02T08:35:01Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## Picture of bwUniCluster - right side  ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## About bwUniCluster                    ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0+KIT-GFA-HPC 3&#039;&#039;&#039; is the joint high-performance computer system of Baden-Württemberg&#039;s Universities and Universities of Applied Sciences for &#039;&#039;&#039;general purpose and teaching&#039;&#039;&#039; and is located at the Scientific Computing Center (SCC) at the Karlsruhe Institute of Technology (KIT). The bwUniCluster 3.0 complements the four bwForClusters and their dedicated scientific areas.&lt;br /&gt;
[[File:DSCF6485_rectangled_perspective.jpg|center|600px|frameless|alt=bwUniCluster3.0 |upright=1| bwUniCluster 3.0 ]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Maintenance Section     ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no upcoming maintenance&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Next maintenance&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Due to regular maintenance work, the HPC system bwUniCluster 2.0 will not be available from &lt;br /&gt;
&lt;br /&gt;
21.05.2024 at 08:30 until 24.05.2024 at 15:00&lt;br /&gt;
&lt;br /&gt;
Please see the [[BwUniCluster2.0/Maintenance/2024-05|maintenance]] page for more information about planned upgrades and other changes.&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: News section            ##&lt;br /&gt;
###########################################&lt;br /&gt;
## Comment out full section if there no news&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Transition bwUniCluster 2.0 &amp;amp;rarr; bwUniCluster 3.0&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
&lt;br /&gt;
The HPC cluster bwUniCluster 3.0 is the successor of bwUniCluster 2.0. It features accelerated and CPU-only nodes, with the host system of both node types consisting of classic x86 processor architectures.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To ensure that you can use the new system successfully and set up your working environment with ease, the following points should be noted.&lt;br /&gt;
&lt;br /&gt;
== Registration ==&lt;br /&gt;
All users who already have an entitlement on bwUniCluster 2.0 are authorized to access bwUniCluster 3.0. The user only needs to &#039;&#039;&#039;register for the new service&#039;&#039;&#039; at https://bwidm.scc.kit.edu .&lt;br /&gt;
&lt;br /&gt;
== Changes ==&lt;br /&gt;
&lt;br /&gt;
Hardware, software and the operating system have been updated and adapted to the latest standards. We would like to draw your attention in particular to the changes in policy, which must also be taken into account.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Changes to hardware, software and policy can be looked up here: [[BwUniCluster3.0/Data_Migration_Guide#Summary_of_changes|Summary of Changes]]&lt;br /&gt;
&lt;br /&gt;
== Migration ==&lt;br /&gt;
bwUniCluster 3.0 features a completely new file system. &#039;&#039;&#039;There is no automatic migration of user data!&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The file systems of the old system and the login nodes will remain in operation for a period of &#039;&#039;&#039;3 months&#039;&#039;&#039; after the new system goes live (till July 6, 2025).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Instructions for moving data that is still needed, user software, and user-specific settings from the old HOME directory to the new HOME directory or to new workspaces are provided here: [[BwUniCluster3.0/Data_Migration_Guide#Migration_of_Data|Data Migration Guide]]&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Training/Support section##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[BwUniCluster3.0/Getting_Started|Getting Started]]&lt;br /&gt;
* [https://training.bwhpc.de E-Learning Courses]&lt;br /&gt;
* [[BwUniCluster3.0/Support|Support]]&lt;br /&gt;
* [[BwUniCluster3.0/FAQ|FAQ]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: User Documentation      ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Access: [[Registration/bwUniCluster|Registration]], [[Registration/Deregistration|Deregistration]]&lt;br /&gt;
* [[BwUniCluster3.0/Login|Login]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Client|SSH Clients]]&lt;br /&gt;
** [[BwUniCluster3.0/Login/Data_Transfer|Data Transfer]]&lt;br /&gt;
* [[BwUniCluster3.0/Hardware_and_Architecture|Hardware and Architecture]]&lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#Compute_resources|Compute Resources]] &lt;br /&gt;
** [[BwUniCluster3.0/Hardware_and_Architecture#File_Systems|File Systems]] &lt;br /&gt;
* [[BwUniCluster3.0/Software|Cluster Specific Software]]&lt;br /&gt;
** [[BwUniCluster3.0/Containers|Using Containers]]&lt;br /&gt;
* [[BwUniCluster3.0/Running_Jobs|Running Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Batch_Jobs:_sbatch|Running Batch Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Running_Jobs#Interactive_Jobs:_salloc|Running Interactive Jobs]]&lt;br /&gt;
** [[BwUniCluster3.0/Jupyter|Interactive Computing with Jupyter]]&lt;br /&gt;
* [[BwUniCluster3.0/Maintenance|Operational Changes]]&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;!-- &lt;br /&gt;
###########################################&lt;br /&gt;
## bwUniCluster: Acknowledgement         ##&lt;br /&gt;
###########################################&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[BwUniCluster3.0/Acknowledgement|acknowledge]] bwUniCluster 3.0 in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Registration/SSH&amp;diff=15384</id>
		<title>Registration/SSH</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Registration/SSH&amp;diff=15384"/>
		<updated>2025-11-07T13:21:13Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Minimum requirements for SSH Keys */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
This process is only necessary for the bwUniCluster and the bwForClusters Helix and NEMO2.&lt;br /&gt;
On the other clusters, SSH keys can still be copied to the &amp;lt;code&amp;gt;authorized_keys&amp;lt;/code&amp;gt; file.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Registering SSH Keys with your Cluster =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Interactive SSH Keys are not valid indefinitely, but only for a few hours after the last 2-factor login.&lt;br /&gt;
They have to be &amp;quot;unlocked&amp;quot; by entering the OTP and service password.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;SSH Keys&#039;&#039;&#039; are a mechanism for logging into a computer system without having to enter a password. Instead of authenticating yourself with something you know (a password), you prove your identity by showing the server something you have (a cryptographic key).&lt;br /&gt;
&lt;br /&gt;
The usual process is the following:&lt;br /&gt;
&lt;br /&gt;
* The user generates a pair of SSH Keys, a private key and a public key, on their local system. The private key never leaves the local system.&lt;br /&gt;
&lt;br /&gt;
* The user then logs into the remote system using the remote system password and adds the public key to the file &amp;lt;code&amp;gt;~/.ssh/authorized_keys&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* All following logins will no longer require the entry of the remote system password because the local system can prove to the remote system that it has a private key matching the public key on file.&lt;br /&gt;
&lt;br /&gt;
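The first step above, generating a key pair on the local system, typically looks like this (a sketch; the file name and comment are examples):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -C &amp;quot;user@laptop&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
When prompted, choose a strong passphrase for the private key.&lt;br /&gt;
&lt;br /&gt;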
While SSH Keys have many advantages, the concept also has &#039;&#039;&#039;a number of issues&#039;&#039;&#039; which make it hard to handle them securely:&lt;br /&gt;
&lt;br /&gt;
* The private key on the local system is supposed to be protected by a strong passphrase. There is no possibility for the server to check if this is the case. Many users do not use a strong passphrase or do not use any passphrase at all. If such a private key is stolen, an attacker can immediately use it to access the remote system.&lt;br /&gt;
&lt;br /&gt;
* There is no concept of validity. Users are not forced to regularly generate new SSH Key pairs and replace the old ones. Often the same key pair is used for many years and the users have no overview of how many systems they have stored their SSH Keys on.&lt;br /&gt;
&lt;br /&gt;
* SSH Keys can be restricted so they can only be used to execute specific commands on the server, or to log in from specified IP addresses. Most users do not do this.&lt;br /&gt;
&lt;br /&gt;
To fix these issues &#039;&#039;&#039;it is no longer possible to self-manage your SSH Keys by adding them to the ~/.ssh/authorized_keys file&#039;&#039;&#039; on bwUniCluster/bwForCluster.&lt;br /&gt;
SSH Keys have to be managed through bwIDM/bwServices instead.&lt;br /&gt;
Existing authorized_keys files are ignored.&lt;br /&gt;
&lt;br /&gt;
== Minimum requirements for SSH Keys ==&lt;br /&gt;
&lt;br /&gt;
Algorithms and key sizes:&lt;br /&gt;
&lt;br /&gt;
* 2048 bits or more for RSA&lt;br /&gt;
* 521 bits for ECDSA&lt;br /&gt;
* 256 bits (default) for ED25519&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Please set a strong passphrase for your private keys.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
ECDSA-SK and ED25519-SK keys (for use with U2F/FIDO Hardware Tokens like Yubikeys) can currently only be used on NEMO2 and bwUniCluster 3.0.&lt;br /&gt;
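&lt;br /&gt;
A key pair meeting these requirements can be generated on your local system with &amp;lt;code&amp;gt;ssh-keygen&amp;lt;/code&amp;gt;. For example, for ED25519:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh-keygen -t ed25519 -f ~/.ssh/&amp;lt;filename&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or, for a 4096-bit RSA key:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/&amp;lt;filename&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
You will be prompted for a passphrase; please choose a strong one. The public key is written to &amp;lt;code&amp;gt;~/.ssh/&amp;lt;filename&amp;gt;.pub&amp;lt;/code&amp;gt;.&lt;br /&gt;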
&lt;br /&gt;
= Adding a new SSH Key =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
* Newly added keys are valid for 180 days. After that, they are revoked and placed on a &amp;quot;revocation list&amp;quot; so that they cannot be reused.&lt;br /&gt;
* Copy only the contents of your public ssh key file to bwIDM/bwServices. The file ends with &amp;lt;code&amp;gt;.pub&amp;lt;/code&amp;gt; ( e.g. &amp;lt;code&amp;gt;~/.ssh/&amp;lt;filename&amp;gt;.pub&amp;lt;/code&amp;gt;).&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;SSH keys&#039;&#039;&#039; are generally managed via the &#039;&#039;&#039;My SSH Pubkeys&#039;&#039;&#039; menu entry on the registration pages for the clusters.&lt;br /&gt;
Here you can add and revoke SSH keys. To add an SSH key, please follow these steps:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Select the cluster&#039;&#039;&#039; for which you want to add the SSH key:&amp;lt;/br&amp;gt; &amp;amp;rarr; [https://login.bwidm.de/user/ssh-keys.xhtml &#039;&#039;&#039;bwUniCluster 3.0&#039;&#039;&#039;]&amp;lt;/br&amp;gt; &amp;amp;rarr; [https://bwservices.uni-heidelberg.de/user/ssh-keys.xhtml &#039;&#039;&#039;bwForCluster Helix&#039;&#039;&#039;]&amp;lt;/br&amp;gt; &amp;amp;rarr; [https://login.bwidm.de/user/ssh-keys.xhtml &#039;&#039;&#039;bwForCluster NEMO 2&#039;&#039;&#039;]&lt;br /&gt;
[[File:BwIDM-twofa.png|center|600px|thumb|My SSH Pubkeys.]]&lt;br /&gt;
&lt;br /&gt;
2. Click the &#039;&#039;&#039;Add SSH Key&#039;&#039;&#039; or &#039;&#039;&#039;SSH Key Hochladen&#039;&#039;&#039; button.&lt;br /&gt;
[[File:Bwunicluster 2.0 access ssh keys empty.png|center|400px|thumb|Add new SSH key.]]&lt;br /&gt;
&lt;br /&gt;
3. A new window will appear.&lt;br /&gt;
Enter a name for the key and paste your SSH public key (file &amp;lt;code&amp;gt;~/.ssh/&amp;lt;filename&amp;gt;.pub&amp;lt;/code&amp;gt;) into the box labelled &amp;quot;SSH Key:&amp;quot;.&lt;br /&gt;
Click on the button labelled &#039;&#039;&#039;Add&#039;&#039;&#039; or &#039;&#039;&#039;Hinzufügen&#039;&#039;&#039;.&lt;br /&gt;
[[File:Ssh-key.png|center|600px|thumb|Add new SSH key.]]&lt;br /&gt;
&lt;br /&gt;
4. If everything worked fine, your new key will show up in the user interface:&lt;br /&gt;
[[File:Ssh-success.png|center|800px|thumb|New SSH key added.]]&lt;br /&gt;
&lt;br /&gt;
Once you have added SSH keys to the system, you can bind them to one or more services, either for interactive logins (&#039;&#039;&#039;Interactive key&#039;&#039;&#039;) or for automatic logins (&#039;&#039;&#039;Command key&#039;&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Registering an Interactive Key ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Interactive SSH Keys are not valid all the time, but only for a few hours after the last 2-factor login.&lt;br /&gt;
They have to be &amp;quot;unlocked&amp;quot; by entering the OTP and service password.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Interactive Keys&#039;&#039;&#039; can be used to log into a system for interactive use.&lt;br /&gt;
Perform the following steps to register an interactive key:&lt;br /&gt;
&lt;br /&gt;
1. [[Registration/SSH#Adding_a_new_SSH_Key|&#039;&#039;&#039;Add a new interactive SSH key&#039;&#039;&#039;]] if you have not already done so.&lt;br /&gt;
&lt;br /&gt;
2. Select &#039;&#039;&#039;Registered services/Registrierte Dienste&#039;&#039;&#039; from the top menu and click &#039;&#039;&#039;Set SSH Key/SSH Key setzen&#039;&#039;&#039; for the cluster for which you want to use the SSH key.&lt;br /&gt;
[[File:BwIDM-registered.png|center|600px|thumb|Select Cluster for which you want to use the SSH key.]]&lt;br /&gt;
&lt;br /&gt;
3. The upper block displays the SSH keys currently registered for the service.&lt;br /&gt;
The bottom block displays all the public SSH keys associated with your account.&lt;br /&gt;
Find the SSH key you want to use and click &#039;&#039;&#039;Add/Hinzufügen&#039;&#039;&#039;.&lt;br /&gt;
[[File:Ssh-service-int.png|center|800px|thumb|Add SSH key to service.]]&lt;br /&gt;
&lt;br /&gt;
4. A new window appears.&lt;br /&gt;
Select &#039;&#039;&#039;Interactive&#039;&#039;&#039; as the usage type, enter an optional comment and click &#039;&#039;&#039;Add/Hinzufügen&#039;&#039;&#039;.&lt;br /&gt;
[[File:Ssh-int.png|center|600px|thumb|Add interactive SSH key to service.]]&lt;br /&gt;
&lt;br /&gt;
5. Your SSH key is now registered for interactive use with this service.&lt;br /&gt;
[[File:Ssh-service.png|center|800px|thumb|SSH key is now registered for interactive use.]]&lt;br /&gt;
&lt;br /&gt;
=== SSH Interactive Key valid after successful Login ===&lt;br /&gt;
&lt;br /&gt;
Interactive SSH Keys are not valid all the time, but only for a few hours after the last 2-factor login.&lt;br /&gt;
They have to be &amp;quot;unlocked&amp;quot; by entering the OTP and service password.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:50%&amp;quot;| Cluster&lt;br /&gt;
! style=&amp;quot;width:50%&amp;quot;| Interactive SSH Key Validity&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| bwUniCluster 3.0&lt;br /&gt;
| 8h&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| bwForCluster Helix&lt;br /&gt;
| 12h&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| bwForCluster NEMO 2&lt;br /&gt;
| 12h&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
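&lt;br /&gt;
In practice this means: log in once with your service password and OTP, for example via&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh &amp;lt;username&amp;gt;@&amp;lt;login-node&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
(replace the placeholders &amp;lt;username&amp;gt; and &amp;lt;login-node&amp;gt; with your account name and the login node of your cluster). Within the validity window shown above, subsequent logins with the registered interactive key succeed without entering password and OTP again.&lt;br /&gt;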
&lt;br /&gt;
== Registering a Command Key ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
SSH command keys are always valid and do not need to be unlocked with a 2-factor login.&lt;br /&gt;
This makes these keys extremely valuable to a potential attacker and poses a security risk.&lt;br /&gt;
Therefore, additional restrictions apply to these keys:&lt;br /&gt;
* They must be limited to a single command to be executed.&lt;br /&gt;
* They must be limited to a single IP address (e.g., the workflow server) or a small number of IP addresses (e.g., the institution&#039;s subnet).&lt;br /&gt;
* They must be reviewed and approved by a cluster administrator before they can be used.&lt;br /&gt;
* Validity is reduced to one month.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Command Keys&#039;&#039;&#039; can be used for automatic workflows.&lt;br /&gt;
If you want to use rsync, please read the [[Registration/SSH/rrsync|rrsync wiki]].&lt;br /&gt;
&lt;br /&gt;
Perform the following steps to register a &amp;quot;Command key&amp;quot; (in this example we use rrsync):&lt;br /&gt;
&lt;br /&gt;
1. [[Registration/SSH#Adding_a_new_SSH_Key|&#039;&#039;&#039;Add a new &amp;quot;SSH key&amp;quot;&#039;&#039;&#039;]] if you have not already done so.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Select &#039;&#039;&#039;Registered services/Registrierte Dienste&#039;&#039;&#039; from the top menu and click &#039;&#039;&#039;Set SSH Key/SSH Key setzen&#039;&#039;&#039; for the cluster for which you want to use the SSH key.&lt;br /&gt;
[[File:BwIDM-registered.png|center|600px|thumb|Select Cluster for which you want to use the SSH key.]]&lt;br /&gt;
&lt;br /&gt;
3. The upper block displays the SSH keys currently registered for the service.&lt;br /&gt;
The bottom block displays all the public SSH keys associated with your account.&lt;br /&gt;
Find the SSH key you want to use and click &#039;&#039;&#039;Add/Hinzufügen&#039;&#039;&#039;.&lt;br /&gt;
[[File:Ssh-service-com.png|center|800px|thumb|Add SSH key to service.]]&lt;br /&gt;
&lt;br /&gt;
4. A new window appears.&lt;br /&gt;
Select &#039;&#039;&#039;Command&#039;&#039;&#039; as the usage type.&lt;br /&gt;
Type the full command with the full path, including all parameters, in the &#039;&#039;&#039;Command&#039;&#039;&#039; text box.&lt;br /&gt;
Specify a network address, list, or range in the &#039;&#039;&#039;From&#039;&#039;&#039; text field (see [https://man.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man8/sshd.8#from=_pattern-list_ man 8 sshd] for more info).&lt;br /&gt;
Please also provide a comment to speed up the approval process.&lt;br /&gt;
Click &#039;&#039;&#039;Add/Hinzufügen&#039;&#039;&#039;.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! | Example&lt;br /&gt;
|-&lt;br /&gt;
| If you want to register a command key to be able to transfer data automatically, please use a string like the following in the &#039;&#039;&#039;Command&#039;&#039;&#039; text field (please verify the path on the cluster first):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
/usr[/local]/bin/rrsync -ro /home/aa/aa_bb/aa_abc1/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
[[File:Ssh-com.png|center|600px|thumb|Add command SSH key to service.]]&lt;br /&gt;
&lt;br /&gt;
5. After the key has been added, it will be marked as &#039;&#039;&#039;Pending&#039;&#039;&#039;:&lt;br /&gt;
You will receive an e-mail as soon as the key has been approved and can be used.&lt;br /&gt;
[[File:Ssh-service.png|center|800px|thumb|SSH key is pending approval.]]&lt;br /&gt;
&lt;br /&gt;
== Revoke/Delete SSH Key ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Revoked keys are locked and can no longer be used.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;SSH keys&#039;&#039;&#039; are generally managed via the &#039;&#039;&#039;My SSH Pubkeys&#039;&#039;&#039; menu entry on the registration pages for the clusters.&lt;br /&gt;
Here you can add and revoke SSH keys. To revoke/delete an SSH key, please follow these steps:&lt;br /&gt;
&lt;br /&gt;
1. &#039;&#039;&#039;Select the cluster&#039;&#039;&#039; for which you want to delete the SSH key:&amp;lt;/br&amp;gt; &amp;amp;rarr; [https://login.bwidm.de/user/ssh-keys.xhtml &#039;&#039;&#039;bwUniCluster 3.0&#039;&#039;&#039;]&amp;lt;/br&amp;gt; &amp;amp;rarr; [https://bwservices.uni-heidelberg.de/user/ssh-keys.xhtml &#039;&#039;&#039;bwForCluster Helix&#039;&#039;&#039;]&amp;lt;/br&amp;gt; &amp;amp;rarr; [https://login.bwidm.de/user/ssh-keys.xhtml &#039;&#039;&#039;bwForCluster NEMO 2&#039;&#039;&#039;]&lt;br /&gt;
[[File:BwIDM-twofa.png|center|600px|thumb|My SSH Pubkeys.]]&lt;br /&gt;
&lt;br /&gt;
2. Click &#039;&#039;&#039;REVOKE/ZURÜCKZIEHEN&#039;&#039;&#039; next to the SSH key you want to revoke.&lt;br /&gt;
[[File:Ssh-success.png|center|800px|thumb|Revoke SSH key.]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15364</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15364"/>
		<updated>2025-10-24T08:07:44Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, tasks are either executed automatically via a batch script or the nodes can be used interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
For the general procedure, see [[Running_Calculations | Running Calculations]].&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload manager Slurm. All user job submissions must therefore be made with Slurm commands. Slurm queues and runs user jobs based on fair-share policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm, a cluster management and job scheduling system. Slurm has three key functions: &lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any calculation on the compute nodes of bwUniCluster 3.0 requires the user to define it as a sequence of commands, specify the required run time, number of CPU cores and main memory, and submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload management software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerator&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs that are used for developing, compiling and testing code and workflows. The intention behind development queues is to give users immediate access to compute resources without long waiting times. This is the place for short, heavy test computations that would otherwise affect other users if run on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. The request for compute resources on the bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This script is queued and executed as soon as the requested compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command line prompt is returned on a compute node and the user can freely use the allocated resources.&lt;br /&gt;
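&lt;br /&gt;
For example, a short interactive session could be requested as follows (queue name and resources are illustrative; adjust them to your needs):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --partition=dev_cpu --time=00:30:00 --ntasks=4&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;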
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse this queue for regular, short-running jobs or chain jobs! Only one job may run at a time, and at most 3 jobs may be queued.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
The computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;. The individual investment shares of the respective university and the resources already used by its members are taken into account. Furthermore, the following throttling policy is also active: the &#039;&#039;&#039;maximum number of physical cores&#039;&#039;&#039; in use at any given time by running jobs is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This number corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and to maximize the number of users who can access computing time at the same time.&lt;br /&gt;
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9(A100)/nodes=5(H100) , mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red; &amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a short runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
Only for development, i.e. debugging or performance optimization ...&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&amp;lt;br/&amp;gt;&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue define the number of tasks and the amount of memory if these are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window. The sinfo command can only be used by the administrator.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use in each partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
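&lt;br /&gt;
Typical everyday use of these commands looks like this (the job ID 123456 is only an example):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -u $USER              # list your own jobs&lt;br /&gt;
$ squeue -u $USER --start      # show estimated start times&lt;br /&gt;
$ scontrol show job 123456     # detailed information on one job&lt;br /&gt;
$ scancel 123456               # cancel this job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;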
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted by using the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, the start of a batch job depends on the availability of the requested resources and the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used from the command line or in your job script. Different defaults for some of these options are set based on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
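&lt;br /&gt;
A minimal job script combining the most common options might look like this (the program name is illustrative; remember that queue and time must always be specified, the queue via the Slurm &amp;lt;code&amp;gt;--partition&amp;lt;/code&amp;gt; option):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --job-name=example&lt;br /&gt;
#SBATCH --output=example_%j.out&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Submit it with &amp;lt;code&amp;gt;sbatch jobscript.sh&amp;lt;/code&amp;gt;.&lt;br /&gt;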
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --exclusive&lt;br /&gt;
| #SBATCH --exclusive &lt;br /&gt;
| The job allocates all CPUs and GPUs on its nodes and does not share them with other running jobs.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The command &amp;quot;scontrol show job&amp;quot; shows the project group the job is accounted on behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Constrain the job to nodes with access to the LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Request an on-demand BeeOND filesystem for the job.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
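The options in the table above are typically combined in a single job script. Below is a minimal sketch of such a script; the partition name, resource values and file names are illustrative assumptions, not site defaults, and the payload is a plain `echo` so the script also runs outside Slurm (`#SBATCH` lines are ordinary shell comments).

```shell
#!/bin/bash
# Minimal batch-script sketch (illustrative values, not site defaults).
#SBATCH --partition=cpu           # queue for the resource allocation
#SBATCH --time=00:20:00           # wall clock time limit
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks-per-node=4       # tasks per node
#SBATCH --job-name=demo           # job name
#SBATCH --output=demo-%j.out      # stdout file (%j expands to the job id)

# Slurm reads the #SBATCH comments above; the shell itself ignores them.
host=$(hostname)
echo "demo job running on ${host}"
```

Submit the script with `sbatch demo.sh`.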
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with small memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs requesting more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For a serial application on a compute node that requires 5000 MByte of memory, with the interactive run limited to 2 hours, the following command has to be executed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core! To run a serial program on the granted core, simply type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that your serial job must finish within 2 hours in this example; otherwise it will be killed by the system. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can now also start a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours. Start it with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources, i.e. the compute node, are automatically revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application running on one or several compute nodes (e.g. 5 nodes with 96 cores each) usually requires an amount of memory per node (e.g. 50 GByte) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want to have access to another node, open a new terminal, connect it to bwUniCluster 3.0 as well, and type the following commands to&lt;br /&gt;
connect to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the jobid and the nodelist of your jobs are displayed.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, you can do so by simply typing mpirun &amp;lt;program_name&amp;gt;. Your program will then run on all 480 cores. A very simple example for starting a parallel job is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger DDT with the commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of cores, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
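The choice between mpirun and mpiexec.hydra can be scripted. The snippet below is a hedged sketch, not cluster-provided tooling: it assumes that mpiexec.hydra is only found in PATH when an Intel MPI module is loaded, and falls back to mpirun otherwise.

```shell
#!/bin/bash
# Pick the MPI launcher: mpiexec.hydra for Intel MPI, mpirun otherwise.
# Assumption: mpiexec.hydra is in PATH only when an Intel MPI module is loaded.
if command -v mpiexec.hydra >/dev/null 2>&1; then
    launcher=mpiexec.hydra
else
    launcher=mpirun
fi
echo "launcher: ${launcher}"
# The actual start would then be: "$launcher" ./my_mpi_program
```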
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about your own active, pending and/or recently completed jobs. The command squeue is explained in detail at https://slurm.schedmd.com/squeue.html or via the manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a single specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail at https://slurm.schedmd.com/scontrol.html or via the manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here is an example from bwUniCluster 3.0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my running job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Each request to the Slurm workload manager generates load. &amp;lt;p style=&amp;quot;color:red;&amp;quot;&amp;gt;&amp;lt;b&amp;gt;Therefore, do not use &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; with a simple &amp;lt;code&amp;gt;watch&amp;lt;/code&amp;gt;.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; The smallest allowed time interval is &amp;lt;b&amp;gt;30 seconds&amp;lt;/b&amp;gt;.&amp;lt;br&amp;gt;&lt;br /&gt;
Any violation of this rule will result in the task being terminated without notice.&lt;br /&gt;
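Instead of `watch squeue`, the job state can be polled in a loop that honours the 30-second minimum interval. This is a minimal sketch under stated assumptions: it is meant to run on a login node where squeue is available, and the job id is passed as the first argument.

```shell
#!/bin/bash
# Poll a job at the minimum allowed interval of 30 seconds (never faster).
# Sketch only: assumes squeue is available; pass the job id as $1.
jobid=${1:-}
interval=30
while [ -n "$jobid" ] && squeue -h -j "$jobid" 2>/dev/null | grep -q .; do
    sleep "$interval"
done
echo "job ${jobid:-<none>} is no longer in the queue"
```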
&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail at https://slurm.schedmd.com/scancel.html or via the manpage (man scancel). The syntax is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| Do not run squeue and other Slurm commands in loops or with &amp;quot;watch&amp;quot;, so as not to saturate the Slurm daemon with RPC requests.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15363</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15363"/>
		<updated>2025-10-24T08:06:40Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Detailed job information : scontrol show job */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, tasks are either executed automatically via a batch script or the resources can be accessed interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
General procedure: Hint to [[Running_Calculations | Running Calculations]]&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload management software Slurm. Therefore, any job submission by the user has to be performed with commands of the Slurm software. Slurm queues and runs user jobs based on fair sharing policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm.&lt;br /&gt;
Slurm is a cluster management and job scheduling system with three key functions. &lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwUniCluster 3.0 requires the user to define the calculation as a sequence of commands together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload management software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerator&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs that are used for developing, compiling and testing code and workflows. The intention behind development queues is to provide users with immediate access to compute resources without having to wait. This is the place for short, immediate heavy computations that would otherwise affect other users on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. The request for compute resources on the bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This Bash script is queued and executed as soon as the compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command line prompt is returned on a compute node and the user can freely use the allocated resources.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse this queue for regular, short-running jobs or chain jobs! Only one running job at a time is allowed, and at most 3 jobs may be queued.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
The computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;. The individual investment shares of the respective university and the resources already used by its members are taken into account. Furthermore, the following throttling policy is also active: The &#039;&#039;&#039;maximum amount of physical cores&#039;&#039;&#039; used at any given time from jobs running is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This number corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and maximize the number of users who can access computing time at the same time.&lt;br /&gt;
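The stated core cap translates into whole nodes as follows; a quick arithmetic check of the numbers quoted above, assuming 64 physical cores per Ice Lake node and 96 per standard node as given elsewhere on this page:

```shell
# Sanity check of the per-user throttling policy quoted above:
# 1920 cores = 30 Ice Lake nodes (64 cores each) = 20 standard nodes (96 cores each).
cap=1920
ice_lake_nodes=$(( cap / 64 ))
standard_nodes=$(( cap / 96 ))
echo "cap ${cap} cores = ${ice_lake_nodes} Ice Lake nodes = ${standard_nodes} standard nodes"
```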
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9 (A100) / nodes=5 (H100), mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red; &amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a short runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
These queues are only for development tasks, i.e. debugging or performance optimization.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&amp;lt;br/&amp;gt;&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue define the number of tasks and the memory if these are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window. The sinfo command can only be used by the administrator.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC provides a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. Users can use this information to submit jobs that fit the idle resources and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, the start of a batch job depends on the availability of the requested resources and on the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used from the command line or in your job script. Different defaults for some of these options are set based on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
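A small illustration of the precedence between the two ways of passing options. This is a sketch under an assumption worth verifying with `man sbatch` on the cluster: a value given on the sbatch command line overrides the corresponding #SBATCH line inside the script; `demo.sh` is a hypothetical script.

```shell
# Sketch of option precedence: a value passed on the sbatch command line
# overrides the corresponding #SBATCH line inside the job script.
# (demo.sh is a hypothetical script containing "#SBATCH --time=00:10:00".)
script_time="00:10:00"   # value written in the script
cli_time="00:30:00"      # value given as: sbatch --time=00:30:00 demo.sh
effective_time=$cli_time # the command-line value wins
echo "effective wall clock limit: ${effective_time}"
```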
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --exclusive&lt;br /&gt;
| #SBATCH --exclusive &lt;br /&gt;
| The job allocates all CPUs and GPUs on the nodes and will not share the nodes with other running jobs.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The project group a job is accounted on is shown after &amp;quot;Account=&amp;quot; in the output of &amp;quot;scontrol show job&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND filesystem.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
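The options above are usually collected at the top of the job script. A minimal sketch of such a script (the partition, resource values, job name and payload are placeholders, not recommendations):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --partition=cpu          # queue name; see the queue tables for valid choices
#SBATCH --ntasks=1               # a single task
#SBATCH --time=00:20:00          # wall clock limit of 20 minutes
#SBATCH --job-name=my_job        # hypothetical job name
#SBATCH --output=my_job_%j.out   # %j is replaced by the job id
#SBATCH --mail-type=END,FAIL     # mail on job end or failure

# Stand-in payload; replace with the real program call.
echo "Job ${SLURM_JOB_ID:-unset} running with ${SLURM_NTASKS:-1} task(s)"
```

Such a script would be submitted with sbatch my_job.sh; options given on the command line override the corresponding #SBATCH lines in the script.&lt;br /&gt;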
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with low memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs that request more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For a serial application running on a compute node that requires 5000 MByte of memory, with the interactive run limited to 2 hours, the following command has to be executed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core! To run a serial program on the granted core you only have to type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that in this example your serial job must finish within 2 hours, otherwise it will be killed by the system during runtime. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can now also start a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours. Start it with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources, i.e. the compute node, will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application running on one or several compute nodes (e.g. here 5 nodes with 96 cores each) usually requires an amount of memory per node in GByte (e.g. 50 GByte) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want to have access to another node, you have to open a new terminal, connect it to bwUniCluster 3.0 as well, and type the following commands to&lt;br /&gt;
connect to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the job ID and the node list can be displayed.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, you can do so by simply typing mpirun &amp;lt;program_name&amp;gt;. Your program will then run on all 480 cores. A very simple example of starting a parallel job is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger ddt with the commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands will execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of cores; an example for this can be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about YOUR active, pending and/or recently completed jobs; only your own jobs are shown. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified one. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here is an example from bwUniCluster 3.0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Each request to the Slurm workload manager generates a load. &amp;lt;p style=&amp;quot;color:red;&amp;quot;&amp;gt; Therefore, do not use &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; with a simple &amp;lt;code&amp;gt;watch&amp;lt;/code&amp;gt;.&amp;lt;/p&amp;gt; The smallest allowed time interval is &amp;lt;b&amp;gt;30 seconds&amp;lt;/b&amp;gt;.&amp;lt;br&amp;gt;&lt;br /&gt;
Any violation of this rule will result in the task being terminated without notice.&lt;br /&gt;
&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel). The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| Do not run squeue and other Slurm commands in loops or with &amp;quot;watch&amp;quot;, so as not to saturate the Slurm daemon with RPC requests.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15362</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15362"/>
		<updated>2025-10-24T08:05:14Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Detailed job information : scontrol show job */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, tasks are either executed automatically via a batch script or the resources can be used interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
General procedure: Hint to [[Running_Calculations | Running Calculations]]&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload manager Slurm. Therefore, any job submission by the user has to be performed with commands of the Slurm software. Slurm queues and runs user jobs based on fair-share policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm.&lt;br /&gt;
Slurm is a cluster management and job scheduling system. It has three key functions: &lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwUniCluster 3.0 requires the user to define the calculation as a sequence of commands, together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload managing software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerator&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs that are used for developing, compiling and testing code and workflows. The intention behind development queues is to give users immediate access to compute resources without having to wait. They are the place for short, heavy test computations that would otherwise affect other users on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. The request for compute resources on the bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This Bash script is queued and executed as soon as the compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command line prompt is returned on a compute node and the user can freely use the allocated resources.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse these queues for regular, short-running jobs or chain jobs! Only one job may run at a time, and the maximum queue length is limited to 3.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
The computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;. The individual investment shares of the respective university and the resources already used by its members are taken into account. Furthermore, the following throttling policy is also active: The &#039;&#039;&#039;maximum amount of physical cores&#039;&#039;&#039; used at any given time from jobs running is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This number corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and maximize the number of users who can access computing time at the same time.&lt;br /&gt;
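The node counts in this throttling rule follow from the core counts per node given above (an Ice Lake node has 2 x 32 = 64 cores, a standard node 2 x 48 = 96); a quick shell-arithmetic check:&lt;br /&gt;

```shell
# maximum physical cores per user, aggregated over all running jobs
max_cores=1920

echo "Ice Lake nodes:  $((max_cores / 64))"   # prints 30
echo "Standard nodes:  $((max_cores / 96))"   # prints 20
```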
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9(A100)/nodes=5(H100) , mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red; &amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a short runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
Only for development, i.e. debugging or performance optimization ...&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&amp;lt;br/&amp;gt;&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Default resources of a queue class define the number of tasks and the memory if these are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
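As an illustration of how these defaults combine, consider a hypothetical request of 4 tasks on the cpu queue without an explicit --mem: the default mem-per-cpu of 2000mb (Table 1) then implies the following memory limit:&lt;br /&gt;

```shell
mem_per_cpu=2000   # MB, default of the cpu queue (Table 1)
ntasks=4           # hypothetical request: sbatch --ntasks=4 without --mem
echo "implied memory limit: $((ntasks * mem_per_cpu)) MB"   # prints 8000 MB
```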
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information on a system running Slurm. It incorporates downtime, reservations and node state information when determining the available backfill window. The sinfo command can only be used by administrators.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, the start of a batch job depends on the availability of the requested resources and the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used on the command line or in your job script. Different defaults for some of these options are set based on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --exclusive&lt;br /&gt;
| #SBATCH --exclusive &lt;br /&gt;
| The job allocates all CPUs and GPUs on the nodes. It will not share the nodes with other running jobs.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. With the command &amp;quot;scontrol show job&amp;quot; the project group the job is accounted on is shown after &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND filesystem.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
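&lt;br /&gt;
For illustration, a minimal batch script combining some of the options above could look as follows (queue name, resource values and the program name are only placeholders and must be adapted):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=96&lt;br /&gt;
#SBATCH --job-name=my_job&lt;br /&gt;
#SBATCH --output=my_job_%j.out&lt;br /&gt;
./&amp;lt;my_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The script is then submitted with:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch my_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;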
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with small memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs requesting more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For a serial application running on a compute node that requires 5000 MByte of memory, with the interactive run limited to 2 hours, the following command has to be executed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core. To run a serial program on the granted core you only have to type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that in this example your serial job must finish within 2 hours; otherwise it will be killed during runtime by the system. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can now also start a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours. Start it with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources, i.e. the compute node, will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application can run on one or several compute nodes (e.g. here 5 nodes with 96 cores each) and usually requires a certain amount of memory per node (e.g. 50 GByte) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated by the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want to have access to another node you have to open a new terminal, connect it also to bwUniCluster 3.0 and type the following commands to&lt;br /&gt;
connect to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the jobid and the nodelist can be shown.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, you can do so by simply typing mpirun &amp;lt;program_name&amp;gt;; your program will then run on all 480 cores. A very simple example for starting a parallel job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger DDT with the commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands will execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of cores; an example for this can be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra instead of mpirun.&lt;br /&gt;
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
squeue displays information about your own active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via the manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via the manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Here is an example from bwUniCluster 3.0:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my pending job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Each request to the Slurm workload manager generates load on the scheduler. Therefore, do not run &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; with a simple &amp;lt;code&amp;gt;watch&amp;lt;/code&amp;gt;. The smallest allowed time interval is &amp;lt;b&amp;gt;30 seconds&amp;lt;/b&amp;gt;.&amp;lt;br&amp;gt;&lt;br /&gt;
Any violation of this rule will result in the task being terminated without notice.&lt;br /&gt;
&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel). The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
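&lt;br /&gt;
For example, a single job can be canceled by its job ID, or all of your pending jobs can be canceled at once by their state (the job ID 1262 is only a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel 1262&lt;br /&gt;
$ scancel -t PENDING -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;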
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| Do not run squeue or other Slurm commands in loops or via &amp;quot;watch&amp;quot;, so as not to saturate the Slurm daemon with RPC requests.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15302</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15302"/>
		<updated>2025-09-23T15:08:16Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Dos and Don&amp;#039;ts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, tasks are either executed automatically via a batch script or the resources can be used interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
For the general procedure, see [[Running_Calculations | Running Calculations]].&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload management software Slurm. Therefore any job submission by the user has to be performed via commands of the Slurm software. Slurm queues and runs user jobs based on fair-share policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm.&lt;br /&gt;
Slurm is a cluster management and job scheduling system. Slurm has three key functions. &lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwUniCluster 3.0 requires the user to define the calculation as a sequence of commands together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload management software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerator&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs that are used for developing, compiling and testing code and workflows. The intention behind development queues is to provide users with immediate access to compute resources without having to wait. This is the place for immediate, compute-intensive tests that would otherwise affect other users if run on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. The request for compute resources on the bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
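&lt;br /&gt;
For example, a submission that satisfies these minimum requirements could look like this (queue name, time limit and script name are only placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p cpu -t 01:00:00 my_job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;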
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This Bash script is queued and executed as soon as the compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command line prompt is returned on a compute node and the user can freely use the allocated resources.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse these queues for regular, short-running jobs or chain jobs! Only one running job at a time is allowed, and the maximum queue length is limited to 3.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
The computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;. The individual investment shares of the respective university and the resources already used by its members are taken into account. Furthermore, the following throttling policy is also active: The &#039;&#039;&#039;maximum amount of physical cores&#039;&#039;&#039; used at any given time from jobs running is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This number corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and maximize the number of users who can access computing time at the same time.&lt;br /&gt;
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9(A100)/nodes=5(H100) , mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red; &amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a short maximum runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
Only for development, i.e. debugging or performance optimization ...&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&amp;lt;br/&amp;gt;&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue class define the number of tasks and the memory if these are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window. The sinfo command can only be used by the administrator.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. Users can use this information to submit jobs that fit into these idle resources and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. Its main purpose is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job; when the job actually starts depends on the availability of the requested resources and on your fair-share priority.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used on the command line or in your job script. Different defaults for some of these options are set depending on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in megabytes per node. (You should normally omit this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum memory required per allocated CPU. (You should normally omit this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --exclusive&lt;br /&gt;
| #SBATCH --exclusive &lt;br /&gt;
| The job allocates all CPUs and GPUs on the nodes. It will not share the nodes with other running jobs.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The command &amp;quot;scontrol show job&amp;quot; shows the project group the job is accounted on behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Request the LSDF file systems for the job.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Request a BeeOND on-demand file system for the job.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
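The options above are typically combined in a job script. The following is a minimal sketch; partition, resource values and the program name are placeholders you must adapt to your own job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=96&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --job-name=myjob&lt;br /&gt;
#SBATCH --output=myjob-%j.out&lt;br /&gt;
&lt;br /&gt;
# load required modules here, then start the program&lt;br /&gt;
mpirun ./my_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Submit the script with &amp;quot;sbatch myjob.sh&amp;quot;; on success, sbatch prints the job ID.&lt;br /&gt;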
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with small memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs that request more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For example, for a serial application on a compute node that requires 5000 MByte of memory, with the interactive run limited to 2 hours, the following command has to be executed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core! To run a serial program on the granted core, simply type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that in this example your serial job must finish within 2 hours; otherwise it will be killed by the system during runtime. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can also start a graphical X11 terminal connected to the dedicated resource, which in this example is available for 2 hours. Start it with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources - i.e. the compute node - will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application running on one or several compute nodes (e.g. 5 nodes with 96 cores each) usually also requires an amount of memory in GByte (e.g. 50 GByte) and a maximum runtime (e.g. 1 hour). For example, 5 such nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want to access another node, open a new terminal, connect it to bwUniCluster 3.0 as well, and type the following commands to connect&lt;br /&gt;
first to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
you can display the job ID and the node list.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, simply type mpirun &amp;lt;program_name&amp;gt;. Your program will then run on 480 cores. A very simple example of starting a parallel job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger ddt with the following commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of the cores, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about YOUR active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via its manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
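For pending jobs, squeue --start shows the scheduler&#039;s estimated start time. A sketch of such a call (the job ID and times below are illustrative, not real output):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue --start&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST          START_TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD 2025-04-04T11:30:00      1 (Resources)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;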
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified one. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via its manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Here is an example from bwUniCluster 3.0.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command cancels jobs. It is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via its manpage (man scancel). The syntax is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
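For example, you can cancel a single job, ask for confirmation first, or cancel all of your pending jobs at once (the job ID below is a placeholder):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel 1267                   # cancel job 1267&lt;br /&gt;
$ scancel -i 1267                # ask for confirmation before canceling&lt;br /&gt;
$ scancel -t PENDING -u $USER    # cancel all of your pending jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;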
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| Do not run squeue or other Slurm commands in loops or via &amp;quot;watch&amp;quot;, so as not to saturate the Slurm daemon with RPC requests.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Hardware_and_Architecture&amp;diff=15265</id>
		<title>BwUniCluster3.0/Hardware and Architecture</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Hardware_and_Architecture&amp;diff=15265"/>
		<updated>2025-09-02T09:45:21Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Compute nodes */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Architecture of bwUniCluster 3.0 =&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;bwUniCluster 3.0&#039;&#039;&#039; is a parallel computer with distributed memory. &lt;br /&gt;
It consists of the bwUniCluster 3.0 components procured in 2024 and also includes the additional compute nodes which were procured as an extension to the bwUniCluster 2.0 in 2022.&lt;br /&gt;
 &lt;br /&gt;
Each node of the system consists of two Intel Xeon or AMD EPYC processors, local memory, local storage, network adapters and optional accelerators (NVIDIA A100 and H100, AMD Instinct MI300A). All nodes are connected via a fast InfiniBand interconnect.&lt;br /&gt;
&lt;br /&gt;
The parallel file system (Lustre) is connected to the InfiniBand switch of the compute cluster. This provides a fast and scalable parallel file &lt;br /&gt;
system.&lt;br /&gt;
&lt;br /&gt;
The operating system on each node is Red Hat Enterprise Linux (RHEL) 9.4.&lt;br /&gt;
&lt;br /&gt;
The individual nodes of the system act in different roles. From an end user&#039;s point of view, the different groups of nodes are login nodes and compute nodes. File server nodes and administrative server nodes are not accessible to users.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Login Nodes&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
The login nodes are the only nodes directly accessible by end users. These nodes are used for interactive login, file management, program development, and interactive pre- and post-processing.&lt;br /&gt;
There are two nodes dedicated to this service, and both can be reached via a single address: &amp;lt;code&amp;gt;uc3.scc.kit.edu&amp;lt;/code&amp;gt;. A DNS round-robin alias distributes login sessions to the login nodes.&lt;br /&gt;
To prevent login nodes from being used for activities that are not permitted there and that affect the user experience of other users, &#039;&#039;&#039;long-running and/or compute-intensive tasks are periodically terminated without any prior warning&#039;&#039;&#039;. Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Compute Nodes&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
The majority of nodes are compute nodes which are managed by a batch system. Users submit their jobs to the SLURM batch system and a job is executed when the required resources become available (depending on its fair-share priority).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;File Systems&#039;&#039;&#039;&amp;lt;br&amp;gt;&lt;br /&gt;
bwUniCluster 3.0 comprises two parallel file systems based on Lustre.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:uc3.png|Optionen|center|Überschrift|800px]]&lt;br /&gt;
&lt;br /&gt;
= Compute Resources =&lt;br /&gt;
&lt;br /&gt;
== Login nodes ==&lt;br /&gt;
&lt;br /&gt;
After a successful [[BwUniCluster3.0/Login|login]], users find themselves on one of the so-called login nodes. Technically, these largely correspond to a standard CPU node, i.e. users have two AMD EPYC 9454 processors with a total of 96 cores at their disposal. The login nodes are the bridgehead for accessing computing resources.&lt;br /&gt;
Data and software are organized here, computing jobs are initiated and managed, and computing resources allocated for interactive use can also be accessed from here.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffa500; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Any compute intensive job running on the login nodes will be terminated without any notice.&#039;&#039;&#039;&amp;lt;br/&amp;gt;&lt;br /&gt;
Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Compute nodes ==&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, automated tasks are executed via a batch script or they can be accessed interactively. Please refer to [[BwUniCluster3.0/Running_Jobs|Running Jobs]] on how to request resources.&amp;lt;br&amp;gt;&lt;br /&gt;
The following compute node types are available:&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;b&amp;gt;CPU nodes&amp;lt;/b&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;Standard&#039;&#039;&#039;: Two AMD EPYC 9454 processors per node with a total of 96 physical CPU cores or 192 logical cores (Hyper-Threading) per node. These nodes were procured in 2024.&lt;br /&gt;
* &#039;&#039;&#039;Ice Lake&#039;&#039;&#039;: Two Intel Xeon Platinum 8358 processors per node with a total of 64 physical CPU cores or 128 logical cores (Hyper-Threading) per node. These nodes were procured in 2022 as an extension to bwUniCluster 2.0.&lt;br /&gt;
* &#039;&#039;&#039;High Memory&#039;&#039;&#039;: Similar to the standard nodes, but with six times larger memory.&lt;br /&gt;
&amp;lt;b&amp;gt;GPU nodes&amp;lt;/b&amp;gt;&lt;br /&gt;
* &#039;&#039;&#039;NVIDIA GPU x4&#039;&#039;&#039;: Similar to the standard nodes, but with larger memory and four NVIDIA H100 GPUs.&lt;br /&gt;
* &#039;&#039;&#039;AMD GPU x4&#039;&#039;&#039;: AMD&#039;s accelerated processing unit (APU) MI300A with 4 CPU sockets and 4 compute units which share the same high-bandwidth memory (HBM).&lt;br /&gt;
* &#039;&#039;&#039;Ice Lake NVIDIA GPU x4&#039;&#039;&#039;: Similar to the Ice Lake nodes, but with larger memory and four NVIDIA A100 or H100 GPUs.&lt;br /&gt;
* &#039;&#039;&#039;Cascade Lake NVIDIA GPU x4&#039;&#039;&#039;: Nodes with four NVIDIA A100 GPUs.&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| GPU nodes&amp;lt;br/&amp;gt;Cascade Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Login nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Availability in [[BwUniCluster3.0/Running_Jobs#Queues_on_bwUniCluster_3.0| queues]]&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt; / &amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Number of nodes&lt;br /&gt;
| 272&lt;br /&gt;
| 70&lt;br /&gt;
| 4&lt;br /&gt;
| 12&lt;br /&gt;
| 1&lt;br /&gt;
| 15&lt;br /&gt;
| 19&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Processors&lt;br /&gt;
| Intel Xeon Platinum 8358&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
| AMD Zen 4&lt;br /&gt;
| Intel Xeon Platinum 8358&lt;br /&gt;
| Intel Xeon Gold 6248R&lt;br /&gt;
| AMD EPYC 9454&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Number of sockets&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 4&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Total number of cores&lt;br /&gt;
| 64&lt;br /&gt;
| 96&lt;br /&gt;
| 96&lt;br /&gt;
| 96&lt;br /&gt;
| 96 (4x 24)&lt;br /&gt;
| 64&lt;br /&gt;
| 48&lt;br /&gt;
| 96&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Main memory&lt;br /&gt;
| 256 GB&lt;br /&gt;
| 384 GB&lt;br /&gt;
| 2.3 TB&lt;br /&gt;
| 768 GB&lt;br /&gt;
| 4x 128 GB HBM3&lt;br /&gt;
| 512 GB&lt;br /&gt;
| 384 GB&lt;br /&gt;
| 384 GB&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Local SSD&lt;br /&gt;
| 1.8 TB NVMe&lt;br /&gt;
| 3.84 TB NVMe&lt;br /&gt;
| 15.36 TB NVMe&lt;br /&gt;
| 15.36 TB NVMe&lt;br /&gt;
| 7.68 TB NVMe&lt;br /&gt;
| 6.4 TB NVMe&lt;br /&gt;
| 1.92 TB SATA SSD&lt;br /&gt;
| 7.68 TB SATA SSD&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Accelerators&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 4x NVIDIA H100 &lt;br /&gt;
| 4x AMD Instinct MI300A&lt;br /&gt;
| 4x NVIDIA A100 / H100 &lt;br /&gt;
| 4x NVIDIA A100&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Accelerator memory&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| -&lt;br /&gt;
| 94 GB&lt;br /&gt;
| APU&lt;br /&gt;
| 80 GB / 94 GB&lt;br /&gt;
| 40 GB&lt;br /&gt;
| -&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Interconnect&lt;br /&gt;
| IB HDR200 &lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 4x NDR200&lt;br /&gt;
| IB 2x NDR200&lt;br /&gt;
| IB 2x HDR200 &lt;br /&gt;
| IB 4x EDR&lt;br /&gt;
| IB 1x NDR200&lt;br /&gt;
|}&lt;br /&gt;
Table 1: Hardware overview and properties&lt;br /&gt;
&lt;br /&gt;
= File Systems =&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 the following file systems are available:&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;&#039;$HOME&#039;&#039;&#039;&amp;lt;br&amp;gt;The HOME directory is created automatically after account activation, and the environment variable $HOME holds its name. HOME is the place where users find themselves after login.&lt;br /&gt;
* &#039;&#039;&#039;Workspaces&#039;&#039;&#039;&amp;lt;br&amp;gt;Users can create so-called workspaces for non-permanent data with temporary lifetime.&lt;br /&gt;
* &#039;&#039;&#039;Workspaces on flash storage&#039;&#039;&#039;&amp;lt;br&amp;gt;A further workspace file system based on flash-only storage is available for special requirements and certain users.&lt;br /&gt;
* &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039;&amp;lt;br&amp;gt;The directory $TMPDIR is only available and visible on the local node during the runtime of a compute job. It is located on fast SSD storage devices.&lt;br /&gt;
* &#039;&#039;&#039;BeeOND&#039;&#039;&#039; (BeeGFS On-Demand)&amp;lt;br&amp;gt;On request a parallel on-demand file system (BeeOND) is created which uses the SSDs of the nodes which were allocated to the batch job.&lt;br /&gt;
* &#039;&#039;&#039;LSDF Online Storage&#039;&#039;&#039;&amp;lt;br&amp;gt;On request the external LSDF Online Storage is mounted on the nodes which were allocated to the batch job. On the login nodes, LSDF is automatically mounted.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Which file system to use?&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You should separate your data and store it on the appropriate file system.&lt;br /&gt;
Permanently needed data like software or important results should be stored in $HOME but capacity restrictions (quotas) apply.&lt;br /&gt;
In case you accidentally deleted data on $HOME there is a chance that we can restore it from backup.&lt;br /&gt;
Permanent data which is not needed for months or exceeds the capacity restrictions should be sent to the LSDF Online Storage or to the archive and deleted from the file systems. Temporary data which is only needed on a single node and which does not exceed the disk space shown in Table 1 above should be stored&lt;br /&gt;
below $TMPDIR. Data which is read many times on a single node, e.g. if you are doing AI training, &lt;br /&gt;
should be copied to $TMPDIR and read from there. Temporary data which is used from many nodes &lt;br /&gt;
of your batch job and which is only needed during job runtime should be stored on a &lt;br /&gt;
parallel on-demand file system BeeOND. Temporary data which can be recomputed or which is the &lt;br /&gt;
result of one job and input for another job should be stored in workspaces. The lifetime &lt;br /&gt;
of data in workspaces is limited and depends on the lifetime of the workspace which can be &lt;br /&gt;
several months.&lt;br /&gt;
&lt;br /&gt;
For further details please check: [[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details|File System Details]]&lt;br /&gt;
&lt;br /&gt;
== $HOME ==&lt;br /&gt;
&lt;br /&gt;
The $HOME directories of bwUniCluster 3.0 users are located on the parallel file system Lustre.&lt;br /&gt;
You have access to your $HOME directory from all nodes of UC3. A regular backup of these directories &lt;br /&gt;
to tape archive is done automatically. The directory $HOME is used to hold those files that are&lt;br /&gt;
permanently used like source codes, configuration files, executable programs etc.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#$HOME|Detailed information on $HOME]]&lt;br /&gt;
&lt;br /&gt;
== Workspaces ==&lt;br /&gt;
&lt;br /&gt;
On UC3 workspaces should be used to store large non-permanent data sets, e.g. restart files or output&lt;br /&gt;
data that has to be post-processed. The file system used for workspaces is also the parallel file system Lustre. This file system is especially designed for parallel access and for a high throughput to large&lt;br /&gt;
files. It is able to provide high data transfer rates of up to 40 GB/s write and read performance when data access is parallel. &lt;br /&gt;
&lt;br /&gt;
On UC3 there is a default user quota limit of 40 TiB and 20 million inodes (files and directories) per user.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#Workspaces|Detailed information on Workspaces]]&lt;br /&gt;
&lt;br /&gt;
== Workspaces on flash storage ==&lt;br /&gt;
&lt;br /&gt;
Another workspace file system based on flash-only storage is available for special requirements and certain users.&lt;br /&gt;
If possible, this file system should be used from the Ice Lake nodes of bwUniCluster 3.0 (queue &#039;&#039;cpu_il&#039;&#039;). &lt;br /&gt;
It provides high IOPS rates and better performance for small files. The quota limits are lower than on the &lt;br /&gt;
normal workspace file system.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#Workspaces_on_flash_storage|Detailed information on Workspaces on flash storage]]&lt;br /&gt;
&lt;br /&gt;
== $TMPDIR ==&lt;br /&gt;
&lt;br /&gt;
The environment variable $TMPDIR contains the name of a directory which is located on the local SSD of each node. &lt;br /&gt;
This directory should be used for temporary files being accessed from the local node. It should &lt;br /&gt;
also be used if you read the same data many times from a single node, e.g. if you are doing AI training. &lt;br /&gt;
Because of the extremely fast local SSD storage devices, performance with small files is much better than on the parallel file systems. &lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#$TMPDIR|Detailed information on $TMPDIR]]&lt;br /&gt;
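The copy-in, compute, copy-out pattern described above can be sketched in plain shell. The input file and the tr step below are stand-ins for real data and programs, and outside a batch job $TMPDIR may be unset, so the sketch falls back to /tmp:

```shell
# Sketch of the $TMPDIR copy-in / compute / copy-out pattern.
# "input.dat" and the tr step stand in for real input data and programs.
SCRATCH="${TMPDIR:-/tmp}/tmpdir_demo_$$"   # fall back to /tmp outside batch jobs
mkdir -p "$SCRATCH"
echo "sample data" > "$SCRATCH/input.dat"        # stage input onto fast local storage
cat "$SCRATCH/input.dat" | tr 'a-z' 'A-Z' > "$SCRATCH/result.out"  # compute locally
RESULT=$(cat "$SCRATCH/result.out")              # results to copy back before job end
echo "$RESULT"
```

In a real job, the last step would copy the result files back to $HOME or a workspace before the job ends, since $TMPDIR is purged afterwards.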
&lt;br /&gt;
== BeeOND (BeeGFS On-Demand) ==&lt;br /&gt;
&lt;br /&gt;
Users have the possibility to request a private BeeOND (on-demand BeeGFS) parallel filesystem for each job. The file system is created during job startup and purged when your job completes.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#BeeOND_(BeeGFS_On-Demand)|Detailed information on BeeOND]]&lt;br /&gt;
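A hypothetical job-script sketch of the BeeOND workflow follows. The request mechanism (here an assumed Slurm constraint name) and the mount path are cluster-specific placeholders; check the linked BeeOND details page for the actual values.

```shell
#!/bin/bash
# Hypothetical sketch: using a per-job BeeOND file system.
# The constraint name and mount path below are ASSUMPTIONS for illustration;
# the real values are cluster-specific (see the BeeOND details page).
#SBATCH --nodes=4
#SBATCH --time=02:00:00
# #SBATCH --constraint=BEEOND            # assumed flag name; verify for your cluster

BEEOND_DIR="/mnt/odfs/$SLURM_JOB_ID"     # assumed per-job mount path
# cp -r "$HOME/input" "$BEEOND_DIR/"     # stage data into the per-job file system
# ... run the parallel job against $BEEOND_DIR ...
# cp -r "$BEEOND_DIR/output" "$HOME/"    # save results: BeeOND is purged at job end
```

The crucial point is the last step: because the file system is purged when the job completes, results must be copied out before the job script exits.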
&lt;br /&gt;
== LSDF Online Storage ==&lt;br /&gt;
&lt;br /&gt;
The LSDF Online Storage allows dedicated users to store scientific measurement data and simulation results. BwUniCluster 3.0 has an extremely fast network connection to the LSDF Online Storage. This file system provides external access via different protocols and is only available for certain users.&lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster3.0/Hardware_and_Architecture/Filesystem_Details#LSDF_Online_Storage|Detailed information on LSDF Online Storage]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Talk:Development/Python&amp;diff=15241</id>
		<title>Talk:Development/Python</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Talk:Development/Python&amp;diff=15241"/>
		<updated>2025-08-21T11:29:08Z</updated>

		<summary type="html">&lt;p&gt;S Braun: Created page with &amp;quot;Samuel: Wir sollten uv mit in die Tools Liste aufnehmen.&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Samuel: Wir sollten uv mit in die Tools Liste aufnehmen.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Workspace&amp;diff=15209</id>
		<title>Workspace</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Workspace&amp;diff=15209"/>
		<updated>2025-08-15T11:07:02Z</updated>

		<summary type="html">&lt;p&gt;S Braun: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Workspace tools&#039;&#039;&#039; provide temporary scratch space, so-called &#039;&#039;&#039;workspaces&#039;&#039;&#039;, for your calculations on a central file storage. They are meant to keep data for a limited time – but usually longer than the time of a single job run. &lt;br /&gt;
&lt;br /&gt;
== No Backup ==&lt;br /&gt;
&lt;br /&gt;
Workspaces are not meant for permanent storage, hence data in workspaces is not backed up and may be lost in case of problems on the storage system. Please copy/move important results to $HOME or some disks outside the cluster.&lt;br /&gt;
&lt;br /&gt;
== Create workspace ==&lt;br /&gt;
To create a workspace you need to state the &#039;&#039;name&#039;&#039; of your workspace and its &#039;&#039;lifetime&#039;&#039; in days. A maximum value for &#039;&#039;lifetime&#039;&#039; and a maximum number of renewals are defined on each cluster. Execution of:&lt;br /&gt;
&lt;br /&gt;
   $ ws_allocate mySpace 30&lt;br /&gt;
&lt;br /&gt;
e.g. returns:&lt;br /&gt;
 &lt;br /&gt;
   Workspace created. Duration is 720 hours. &lt;br /&gt;
   Further extensions available: 3&lt;br /&gt;
   /work/workspace/scratch/username-mySpace-0&lt;br /&gt;
&lt;br /&gt;
For more information read the program&#039;s help, i.e. &#039;&#039;$ ws_allocate -h&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== List all your workspaces ==&lt;br /&gt;
To list all your workspaces, execute:&lt;br /&gt;
&lt;br /&gt;
   $ ws_list&lt;br /&gt;
&lt;br /&gt;
which will return:&lt;br /&gt;
* Workspace ID&lt;br /&gt;
* Workspace location&lt;br /&gt;
* available extensions&lt;br /&gt;
* creation date and remaining time&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Find workspace location ==&lt;br /&gt;
&lt;br /&gt;
Workspace location/path can be prompted for any workspace &#039;&#039;ID&#039;&#039; using &#039;&#039;&#039;ws_find&#039;&#039;&#039;, in case of workspace &#039;&#039;mySpace&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
   $ ws_find mySpace&lt;br /&gt;
&lt;br /&gt;
returns the one-liner:&lt;br /&gt;
&lt;br /&gt;
   /work/workspace/scratch/username-mySpace-0&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== Extend lifetime of your workspace ==&lt;br /&gt;
&lt;br /&gt;
Any workspace&#039;s lifetime can only be extended a cluster-specific number of times. There are several commands to extend a workspace&#039;s lifetime:&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_extend mySpace 40&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;mySpace&#039;&#039; by &#039;&#039;40&#039;&#039; days from now,&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_extend mySpace&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;mySpace&#039;&#039; by the number of days used previously,&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_allocate -x mySpace 40&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;mySpace&#039;&#039; by &#039;&#039;40&#039;&#039; days from now.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Setting Permissions for Sharing Files ==&lt;br /&gt;
The examples will assume you want to change the directory in $DIR. If you want to share a workspace, DIR could be set with &amp;lt;code&amp;gt;DIR=$(ws_find my_workspace)&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Regular Unix Permissions ===&lt;br /&gt;
&lt;br /&gt;
Making workspaces world readable/writable using standard unix access rights with &amp;lt;tt&amp;gt;chmod&amp;lt;/tt&amp;gt; is only feasible if you are in a research group and you and your co-workers share a common  (&amp;quot;bwXXXXX&amp;quot;) unix group. It is strongly discouraged to make files readable or even writable to everyone or to large common groups. &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:45%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:55%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;chgrp -R bw16e001 &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
&amp;lt;tt&amp;gt;chmod -R g+rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Set group ownership and grant read access to group for files in workspace via unix rights to the group &amp;quot;bw16e001&amp;quot; (has to be re-done if files are added)&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;chgrp -R bw16e001 &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt; &lt;br /&gt;
&amp;lt;tt&amp;gt;chmod -R g+rswX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Set group ownership and grant read/write access to group for files in workspace via unix rights (has to be re-done if files are added). Group will be inherited by new files, but rights for the group will have to be re-set with chmod for every new file&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Options used:&lt;br /&gt;
* -R: recursive&lt;br /&gt;
* g+rwx&lt;br /&gt;
** g: group&lt;br /&gt;
** + add permissions (- to remove)&lt;br /&gt;
** rwx: read, write, execute&lt;br /&gt;
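The group read-access recipe from the table above can be tried out on any throwaway directory. In this sketch, your current primary group (from &#039;&#039;id -gn&#039;&#039;) stands in for a shared project group like &amp;quot;bw16e001&amp;quot;:

```shell
# Demonstrate the group read-access recipe on a throwaway directory.
# "$(id -gn)" (your current primary group) stands in for a shared group
# such as bw16e001 from the table above.
DIR="${TMPDIR:-/tmp}/ws_perm_demo_$$"
mkdir -p "$DIR/sub"
touch "$DIR/sub/data.txt"
chgrp -R "$(id -gn)" "$DIR"     # hand the whole tree to the shared group
chmod -R g+rX "$DIR"            # group gets read on files, enter on directories
PERMS=$(stat -c '%A' "$DIR/sub/data.txt")
echo "$PERMS"                   # group read bit (5th character) is now set
```

Note that capital X in g+rX grants execute only on directories and on files that already have an execute bit, which is why it is preferred over x for recursive changes.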
&lt;br /&gt;
=== &amp;quot;ACL&amp;quot;s: Access Control Lists ===&lt;br /&gt;
ACLs  allow a much more detailed distribution of permissions but are a bit more complicated and not visible in detail via &amp;quot;ls&amp;quot;. They have the additional advantage that you can set a &amp;quot;default&amp;quot; ACL for a directory, (with a &amp;lt;tt&amp;gt;-d&amp;lt;/tt&amp;gt; flag or a &amp;lt;tt&amp;gt;d:&amp;lt;/tt&amp;gt; prefix) which will cause all newly created files to inherit the ACLs from the directory. Regular unix permissions only have limited support (only group ownership, not access rights) for this via the suid bit.&lt;br /&gt;
&lt;br /&gt;
Best practices with respect to ACL usage:&lt;br /&gt;
# Take into account that ACLs take precedence over standard unix access rights&lt;br /&gt;
# The owner of a workspace is responsible for its content and management&lt;br /&gt;
&lt;br /&gt;
Please note that &amp;lt;tt&amp;gt;ls&amp;lt;/tt&amp;gt; (list directory contents) indicates ACLs on directories and files only when run in long format (&amp;lt;tt&amp;gt;ls -l&amp;lt;/tt&amp;gt;), as a &amp;quot;+&amp;quot; sign after the standard unix access rights. &lt;br /&gt;
&lt;br /&gt;
Examples with regard to &amp;quot;my_workspace&amp;quot;:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:45%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:55%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;getfacl &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|List access rights on $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm u:fr_xy1:rX,d:u:fr_xy1:rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant user &amp;quot;fr_xy1&amp;quot; read-only access to $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm u:fr_me0000:rwX,d:u:fr_me0000:rwX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
&amp;lt;tt&amp;gt;setfacl -Rm u:fr_xy1:rwX,d:u:fr_xy1:rwX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant your own user &amp;quot;fr_me0000&amp;quot; and &amp;quot;fr_xy1&amp;quot; inheritable read and write access to $DIR, so you can also read/write files put into the workspace by a coworker&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm g:bw16e001:rX,d:g:bw16e001:rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant group (Rechenvorhaben) &amp;quot;bw16e001&amp;quot; read-only access to $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rb &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Remove all ACL rights. Standard Unix access rights apply again.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Options used:&lt;br /&gt;
* -R: recursive&lt;br /&gt;
* -m: modify&lt;br /&gt;
* u:username:rwX: &amp;quot;u:&amp;quot; means the following name is a user; rwX: read, write, eXecute (capital X sets execute only where an execute bit is already set)&lt;br /&gt;
&lt;br /&gt;
== Delete a Workspace ==&lt;br /&gt;
&lt;br /&gt;
   $ ws_release mySpace # Manually erase your workspace mySpace&lt;br /&gt;
&lt;br /&gt;
Note: workspaces are kept for some time after release. To immediately delete and free space e.g. for quota reasons, delete the files with rm before release.&lt;br /&gt;
&lt;br /&gt;
Newer versions of workspace tools have a --delete-data flag that immediately deletes data. Note that deleted data from workspaces is permanently lost.&lt;br /&gt;
&lt;br /&gt;
== Restore an Expired Workspace ==&lt;br /&gt;
&lt;br /&gt;
For a certain (system-specific) grace time following workspace expiration, a workspace can be restored by performing the following steps:&lt;br /&gt;
&lt;br /&gt;
(1) Display restorable workspaces.&lt;br /&gt;
 ws_restore -l&lt;br /&gt;
&lt;br /&gt;
(2) Create a new workspace as the target for the restore:&lt;br /&gt;
 ws_allocate restored 60&lt;br /&gt;
&lt;br /&gt;
(3) Restore:&lt;br /&gt;
 ws_restore &amp;lt;full_name_of_expired_workspace&amp;gt; restored&lt;br /&gt;
&lt;br /&gt;
The expired workspace has to be specified using the &#039;&#039;&#039;full name&#039;&#039;&#039;, including username prefix and timestamp suffix (otherwise, it cannot be uniquely identified).&lt;br /&gt;
The target workspace, on the other hand, must be given with just its short name as listed by &amp;lt;code&amp;gt;ws_list&amp;lt;/code&amp;gt;, without the username prefix.&lt;br /&gt;
&lt;br /&gt;
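The full-name versus short-name distinction can be sketched with plain shell parameter expansion. The sample name below is hypothetical; ws_restore -l prints the actual full names:

```shell
# Sketch: anatomy of a full workspace name as printed by ws_restore -l.
# The sample name is hypothetical; real names come from ws_restore -l.
FULL="username-mySpace-1755252422"   # full name: username prefix + timestamp suffix
SHORT="${FULL#*-}"                   # strip the username prefix
SHORT="${SHORT%-*}"                  # strip the timestamp suffix
echo "$SHORT"                        # the short name, as shown by ws_list
```

So the expired source workspace is given as FULL, while the freshly allocated target is given by its short name only.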
If the workspace is not visible/restorable, it has been &#039;&#039;&#039;permanently deleted&#039;&#039;&#039; and cannot be restored, not even by us. Please always remember that workspaces are intended solely for temporary work data, and there is no backup of data in the workspaces.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Workspace&amp;diff=15208</id>
		<title>Workspace</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Workspace&amp;diff=15208"/>
		<updated>2025-08-15T11:06:29Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Create workspace */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Workspace tools&#039;&#039;&#039; provide temporary scratch space, so-called &#039;&#039;&#039;workspaces&#039;&#039;&#039;, for your calculations on a central file storage. They are meant to keep data for a limited time – but usually longer than the time of a single job run. &lt;br /&gt;
&lt;br /&gt;
== No Backup ==&lt;br /&gt;
&lt;br /&gt;
Workspaces are not meant for permanent storage, hence data in workspaces is not backed up and may be lost in case of problems on the storage system. Please copy/move important results to $HOME or some disks outside the cluster.&lt;br /&gt;
&lt;br /&gt;
== Create workspace ==&lt;br /&gt;
To create a workspace you need to state the &#039;&#039;name&#039;&#039; of your workspace and its &#039;&#039;lifetime&#039;&#039; in days. A maximum value for &#039;&#039;lifetime&#039;&#039; and a maximum number of renewals are defined on each cluster. Execution of:&lt;br /&gt;
&lt;br /&gt;
   $ ws_allocate mySpace 30&lt;br /&gt;
&lt;br /&gt;
e.g. returns:&lt;br /&gt;
 &lt;br /&gt;
   Workspace created. Duration is 720 hours. &lt;br /&gt;
   Further extensions available: 3&lt;br /&gt;
   /work/workspace/scratch/username-mySpace-0&lt;br /&gt;
&lt;br /&gt;
For more information read the program&#039;s help, i.e. &#039;&#039;$ ws_allocate -h&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== List all your workspaces ==&lt;br /&gt;
To list all your workspaces, execute:&lt;br /&gt;
&lt;br /&gt;
   $ ws_list&lt;br /&gt;
&lt;br /&gt;
which will return:&lt;br /&gt;
* Workspace ID&lt;br /&gt;
* Workspace location&lt;br /&gt;
* available extensions&lt;br /&gt;
* creation date and remaining time&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Find workspace location ==&lt;br /&gt;
&lt;br /&gt;
Workspace location/path can be prompted for any workspace &#039;&#039;ID&#039;&#039; using &#039;&#039;&#039;ws_find&#039;&#039;&#039;, in case of workspace &#039;&#039;blah&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
   $ ws_find blah&lt;br /&gt;
&lt;br /&gt;
returns the one-liner:&lt;br /&gt;
&lt;br /&gt;
   /work/workspace/scratch/username-blah-0&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== Extend lifetime of your workspace ==&lt;br /&gt;
&lt;br /&gt;
Any workspace&#039;s lifetime can only be extended a cluster-specific number of times. There are several commands to extend a workspace&#039;s lifetime:&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_extend blah 40&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;blah&#039;&#039; by &#039;&#039;40&#039;&#039; days from now,&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_extend blah&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;blah&#039;&#039; by the number of days used previously,&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_allocate -x blah 40&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;blah&#039;&#039; by &#039;&#039;40&#039;&#039; days from now.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Setting Permissions for Sharing Files ==&lt;br /&gt;
The examples will assume you want to change the directory in $DIR. If you want to share a workspace, DIR could be set with &amp;lt;code&amp;gt;DIR=$(ws_find my_workspace)&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Regular Unix Permissions ===&lt;br /&gt;
&lt;br /&gt;
Making workspaces world readable/writable using standard unix access rights with &amp;lt;tt&amp;gt;chmod&amp;lt;/tt&amp;gt; is only feasible if you are in a research group and you and your co-workers share a common  (&amp;quot;bwXXXXX&amp;quot;) unix group. It is strongly discouraged to make files readable or even writable to everyone or to large common groups. &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:45%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:55%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;chgrp -R bw16e001 &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
&amp;lt;tt&amp;gt;chmod -R g+rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Set group ownership and grant read access to group for files in workspace via unix rights to the group &amp;quot;bw16e001&amp;quot; (has to be re-done if files are added)&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;chgrp -R bw16e001 &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt; &lt;br /&gt;
&amp;lt;tt&amp;gt;chmod -R g+rswX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Set group ownership and grant read/write access to group for files in workspace via unix rights (has to be re-done if files are added). Group will be inherited by new files, but rights for the group will have to be re-set with chmod for every new file&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Options used:&lt;br /&gt;
* -R: recursive&lt;br /&gt;
* g+rwx&lt;br /&gt;
** g: group&lt;br /&gt;
** + add permissions (- to remove)&lt;br /&gt;
** rwx: read, write, execute&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;ACL&amp;quot;s: Access Control Lists ===&lt;br /&gt;
ACLs  allow a much more detailed distribution of permissions but are a bit more complicated and not visible in detail via &amp;quot;ls&amp;quot;. They have the additional advantage that you can set a &amp;quot;default&amp;quot; ACL for a directory, (with a &amp;lt;tt&amp;gt;-d&amp;lt;/tt&amp;gt; flag or a &amp;lt;tt&amp;gt;d:&amp;lt;/tt&amp;gt; prefix) which will cause all newly created files to inherit the ACLs from the directory. Regular unix permissions only have limited support (only group ownership, not access rights) for this via the suid bit.&lt;br /&gt;
&lt;br /&gt;
Best practices with respect to ACL usage:&lt;br /&gt;
# Take into account that ACLs take precedence over standard unix access rights&lt;br /&gt;
# The owner of a workspace is responsible for its content and management&lt;br /&gt;
&lt;br /&gt;
Please note that &amp;lt;tt&amp;gt;ls&amp;lt;/tt&amp;gt; (list directory contents) indicates ACLs on directories and files only when run in long format (&amp;lt;tt&amp;gt;ls -l&amp;lt;/tt&amp;gt;), as a &amp;quot;+&amp;quot; sign after the standard unix access rights. &lt;br /&gt;
&lt;br /&gt;
Examples with regard to &amp;quot;my_workspace&amp;quot;:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:45%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:55%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;getfacl &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|List access rights on $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm u:fr_xy1:rX,d:u:fr_xy1:rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant user &amp;quot;fr_xy1&amp;quot; read-only access to $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm u:fr_me0000:rwX,d:u:fr_me0000:rwX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
&amp;lt;tt&amp;gt;setfacl -Rm u:fr_xy1:rwX,d:u:fr_xy1:rwX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant your own user &amp;quot;fr_me0000&amp;quot; and &amp;quot;fr_xy1&amp;quot; inheritable read and write access to $DIR, so you can also read/write files put into the workspace by a coworker&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm g:bw16e001:rX,d:g:bw16e001:rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant group (Rechenvorhaben) &amp;quot;bw16e001&amp;quot; read-only access to $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rb &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Remove all ACL rights. Standard Unix access rights apply again.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Options used:&lt;br /&gt;
* -R: recursive&lt;br /&gt;
* -m: modify&lt;br /&gt;
* u:username:rwX: &amp;quot;u:&amp;quot; means the following name is a user; rwX: read, write, eXecute (capital X sets execute only where an execute bit is already set)&lt;br /&gt;
&lt;br /&gt;
== Delete a Workspace ==&lt;br /&gt;
&lt;br /&gt;
   $ ws_release blah # Manually erase your workspace blah&lt;br /&gt;
&lt;br /&gt;
Note: workspaces are kept for some time after release. To immediately delete and free space e.g. for quota reasons, delete the files with rm before release.&lt;br /&gt;
&lt;br /&gt;
Newer versions of workspace tools have a --delete-data flag that immediately deletes data. Note that deleted data from workspaces is permanently lost.&lt;br /&gt;
&lt;br /&gt;
== Restore an Expired Workspace ==&lt;br /&gt;
&lt;br /&gt;
For a certain (system-specific) grace time following workspace expiration, a workspace can be restored by performing the following steps:&lt;br /&gt;
&lt;br /&gt;
(1) Display restorable workspaces.&lt;br /&gt;
 ws_restore -l&lt;br /&gt;
&lt;br /&gt;
(2) Create a new workspace as the target for the restore:&lt;br /&gt;
 ws_allocate restored 60&lt;br /&gt;
&lt;br /&gt;
(3) Restore:&lt;br /&gt;
 ws_restore &amp;lt;full_name_of_expired_workspace&amp;gt; restored&lt;br /&gt;
&lt;br /&gt;
The expired workspace has to be specified using the &#039;&#039;&#039;full name&#039;&#039;&#039;, including username prefix and timestamp suffix (otherwise, it cannot be uniquely identified).&lt;br /&gt;
The target workspace, on the other hand, must be given with just its short name as listed by &amp;lt;code&amp;gt;ws_list&amp;lt;/code&amp;gt;, without the username prefix.&lt;br /&gt;
&lt;br /&gt;
If the workspace is not visible/restorable, it has been &#039;&#039;&#039;permanently deleted&#039;&#039;&#039; and cannot be restored, not even by us. Please always remember that workspaces are intended solely for temporary work data, and there is no backup of data in the workspaces.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/Start_vnc_desktop&amp;diff=15207</id>
		<title>BwUniCluster2.0/Software/Start vnc desktop</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/Start_vnc_desktop&amp;diff=15207"/>
		<updated>2025-08-14T11:14:15Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/Start vnc desktop to BwUniCluster3.0/Software/Start vnc desktop&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[BwUniCluster3.0/Software/Start vnc desktop]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/Start_vnc_desktop&amp;diff=15206</id>
		<title>BwUniCluster3.0/Software/Start vnc desktop</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/Start_vnc_desktop&amp;diff=15206"/>
		<updated>2025-08-14T11:14:15Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/Start vnc desktop to BwUniCluster3.0/Software/Start vnc desktop&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;The Linux 3D graphics stack is based on &#039;&#039;X11&#039;&#039; and &#039;&#039;OpenGL&#039;&#039;. This has some&lt;br /&gt;
drawbacks in conjunction with remote visualization:&lt;br /&gt;
&lt;br /&gt;
* Rendering takes place on the client, not the cluster&lt;br /&gt;
* Whole 3D model must be transferred via network to the client&lt;br /&gt;
* Some OpenGL extensions are not supported when using indirect / client side rendering instead of direct / hardware based rendering&lt;br /&gt;
* Many round trips in the X11 protocol negatively influence interactivity&lt;br /&gt;
* X11 is not available on non-Linux platforms&lt;br /&gt;
* Compatibility problems between client and cluster can occur&lt;br /&gt;
&lt;br /&gt;
To avoid these drawbacks,  &amp;lt;code&amp;gt;start_vnc_desktop&amp;lt;/code&amp;gt; is provided.&lt;br /&gt;
It combines the three open source  products [http://www.turbovnc.org/ TurboVNC], [http://www.virtualgl.org/ VirtualGL] and [http://openswr.org/ OpenSWR].&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;Virtual Network Computing (VNC)&#039;&#039; is a graphical desktop sharing system.&lt;br /&gt;
VNC is platform-independent - there are clients and servers for many&lt;br /&gt;
GUI-based operating systems. The VNC server is the program on the&lt;br /&gt;
machine that shares its screen. The VNC client (or viewer) is the&lt;br /&gt;
program that watches, controls, and interacts with the server. For more&lt;br /&gt;
details see: [https://en.wikipedia.org/wiki/VNC Wikipedia]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;VirtualGL&#039;&#039; redirects the 3D rendering commands from Linux OpenGL&lt;br /&gt;
applications to 3D accelerator hardware in the cluster. For more details&lt;br /&gt;
see: [https://en.wikipedia.org/wiki/VirtualGL Wikipedia]&lt;br /&gt;
&lt;br /&gt;
When no 3D accelerator hardware is available &#039;&#039;OpenSWR&#039;&#039;, a high&lt;br /&gt;
performance, highly scalable software rasterizer for OpenGL can carry&lt;br /&gt;
out the rendering task. For more details see:  [http://openswr.org OpenSWR]&lt;br /&gt;
&lt;br /&gt;
This script takes a two-step approach to start a VNC server in the&lt;br /&gt;
cluster environment:&lt;br /&gt;
&lt;br /&gt;
In the first step the batch system is used to allocate resources where a&lt;br /&gt;
VNC server can be started.&lt;br /&gt;
&lt;br /&gt;
In the second step the VNC server is launched on the resources granted&lt;br /&gt;
by the batch system. When the VNC server is successfully started, all&lt;br /&gt;
required login credentials and connection parameters will be reported.&lt;br /&gt;
To connect to this VNC server a VNC client installation on the local&lt;br /&gt;
desktop is required. &lt;br /&gt;
&lt;br /&gt;
= Script usage =&lt;br /&gt;
&lt;br /&gt;
* After login the script can simply be called from the command line:&amp;lt;pre&amp;gt;start_vnc_desktop&amp;lt;/pre&amp;gt;&lt;br /&gt;
* To get help on the available options use:&amp;lt;pre&amp;gt;start_vnc_desktop --help&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Hardware rendering is currently only available on FH2 and bwUniCluster; it can be requested with:&amp;lt;pre&amp;gt;start_vnc_desktop --hw-rendering&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Software rendering is available on all clusters; it can be requested with: &amp;lt;pre&amp;gt;start_vnc_desktop --sw-rendering&amp;lt;/pre&amp;gt;&lt;br /&gt;
* There is only a limited number of nodes with hardware rendering support, software rendering runs on all nodes.&lt;br /&gt;
* For large 3D data sets the software renderer may be faster.&lt;br /&gt;
* If neither &amp;lt;code&amp;gt;--hw-rendering&amp;lt;/code&amp;gt; nor &amp;lt;code&amp;gt;--sw-rendering&amp;lt;/code&amp;gt; is selected no 3D rendering support is available.&lt;br /&gt;
&lt;br /&gt;
= VNC client =&lt;br /&gt;
&lt;br /&gt;
In general every VNC client can be used to connect to the VNC server.&lt;br /&gt;
However, for best performance and compatibility the use of the&lt;br /&gt;
[http://www.turbovnc.org/ TurboVNC] client is recommended.&lt;br /&gt;
Below you find the necessary steps for different client operating systems.&lt;br /&gt;
&lt;br /&gt;
; Debian, Ubuntu:&lt;br /&gt;
* Download: [https://sourceforge.net/projects/turbovnc/files Download Site] -&amp;gt; latest version -&amp;gt; turbovnc_&amp;lt;VERSION&amp;gt;_amd64.deb&lt;br /&gt;
* Install: &amp;lt;pre&amp;gt; sudo apt-get install ./turbovnc_&amp;lt;VERSION&amp;gt;_amd64.deb&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Execute: &amp;lt;pre&amp;gt;/opt/TurboVNC/bin/vncviewer&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Red Hat Enterprise Linux, Fedora:&lt;br /&gt;
* Download: [https://sourceforge.net/projects/turbovnc/files Download Site]  -&amp;gt; latest version -&amp;gt; turbovnc-&amp;lt;VERSION&amp;gt;.x86_64.rpm&lt;br /&gt;
* Install: &amp;lt;pre&amp;gt;sudo yum install ./turbovnc-&amp;lt;VERSION&amp;gt;.x86_64.rpm&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Execute: &amp;lt;pre&amp;gt;/opt/TurboVNC/bin/vncviewer&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; SUSE Linux Enterprise, openSUSE:&lt;br /&gt;
* Download [https://sourceforge.net/projects/turbovnc/files Download Site]  -&amp;gt; latest version -&amp;gt; turbovnc-&amp;lt;VERSION&amp;gt;.x86_64.rpm&lt;br /&gt;
* Install: &amp;lt;pre&amp;gt;sudo zypper install ./turbovnc-&amp;lt;VERSION&amp;gt;.x86_64.rpm&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Execute: &amp;lt;pre&amp;gt;/opt/TurboVNC/bin/vncviewer&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; ArchLinux:&lt;br /&gt;
* Download: Can be installed from the AUR&lt;br /&gt;
* Install: &amp;lt;pre&amp;gt;pacaur -S turbovnc&amp;lt;/pre&amp;gt;&lt;br /&gt;
* Execute: &amp;lt;pre&amp;gt;vncviewer&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
; Windows:&lt;br /&gt;
* Download: [https://sourceforge.net/projects/turbovnc/files Download Site] -&amp;gt; latest version -&amp;gt; TurboVNC64-&amp;lt;VERSION&amp;gt;.exe for 64-bit, TurboVNC-&amp;lt;VERSION&amp;gt;.exe for 32-bit&lt;br /&gt;
* Install: Double click on TurboVNC64-&amp;lt;VERSION&amp;gt;.exe / TurboVNC-&amp;lt;VERSION&amp;gt;.exe. Install in the default directory (or choose a different one, if preferred)&lt;br /&gt;
* Execute:  Java TurboVNCviewer (vncviewer-javaw.bat in installation directory)&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/Python_Dask&amp;diff=15205</id>
		<title>BwUniCluster2.0/Software/Python Dask</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/Python_Dask&amp;diff=15205"/>
		<updated>2025-08-14T11:13:43Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/Python Dask to BwUniCluster3.0/Software/Python Dask&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[BwUniCluster3.0/Software/Python Dask]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/Python_Dask&amp;diff=15204</id>
		<title>BwUniCluster3.0/Software/Python Dask</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/Python_Dask&amp;diff=15204"/>
		<updated>2025-08-14T11:13:43Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/Python Dask to BwUniCluster3.0/Software/Python Dask&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!--{| style=&amp;quot;border-style: solid; border-width: 1px&amp;quot;&lt;br /&gt;
! Navigation: [[BwHPC_Best_Practices_Repository|bwHPC BPR]] / [[BwUniCluster_User_Guide|bwUniCluster]] &lt;br /&gt;
|}--&amp;gt;&lt;br /&gt;
This guide explains how to use Python Dask and dask-jobqueue on bwUniCluster2.0.&lt;br /&gt;
&lt;br /&gt;
== Installation and Usage ==&lt;br /&gt;
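Dask and dask-jobqueue are typically installed into a personal Python environment on the cluster. The following commands are only a sketch; the module name &#039;devel/python&#039; and the environment path are assumptions and may differ on your system:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load devel/python      # module name is an assumption, check with &#039;module avail&#039;&lt;br /&gt;
$ python3 -m venv $HOME/dask-env&lt;br /&gt;
$ source $HOME/dask-env/bin/activate&lt;br /&gt;
$ pip install dask distributed dask-jobqueue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;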
Please have a look at our [https://github.com/hpcraink/workshop-parallel-jupyter Workshop] on how to use Dask on bwUniCluster2.0 (2_Grundlagen: Environment erstellen and 6_Dask). This is currently only available in German.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/OpenFoam&amp;diff=15203</id>
		<title>BwUniCluster2.0/Software/OpenFoam</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/OpenFoam&amp;diff=15203"/>
		<updated>2025-08-14T11:13:06Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/OpenFoam to BwUniCluster3.0/Software/OpenFoam&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[BwUniCluster3.0/Software/OpenFoam]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/OpenFoam&amp;diff=15202</id>
		<title>BwUniCluster3.0/Software/OpenFoam</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/OpenFoam&amp;diff=15202"/>
		<updated>2025-08-14T11:13:06Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/OpenFoam to BwUniCluster3.0/Software/OpenFoam&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Softwarepage|cae/openfoam}}&lt;br /&gt;
&lt;br /&gt;
{| width=600px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Description !! Content&lt;br /&gt;
|-&lt;br /&gt;
| module load&lt;br /&gt;
| cae/openfoam&lt;br /&gt;
|-&lt;br /&gt;
| License&lt;br /&gt;
| [https://www.openfoam.org/licence.php GNU General Public Licence]&lt;br /&gt;
|-&lt;br /&gt;
| Citing&lt;br /&gt;
| n/a&lt;br /&gt;
|-&lt;br /&gt;
| Links&lt;br /&gt;
| [https://www.openfoam.org/ Homepage] &amp;amp;#124; [https://www.openfoam.org/docs/ Documentation]&lt;br /&gt;
|-&lt;br /&gt;
| Graphical Interface&lt;br /&gt;
| No&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
= Description =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;OpenFOAM&#039;&#039;&#039; (Open-source Field Operation And Manipulation) is a free, open-source CFD software package with an extensive range of features to solve anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics and electromagnetics.&lt;br /&gt;
&lt;br /&gt;
= Adding OpenFOAM to Your Environment =&lt;br /&gt;
&lt;br /&gt;
After loading the desired module, activate the OpenFOAM applications by typing&lt;br /&gt;
&amp;lt;pre&amp;gt;$ source $FOAM_INIT&amp;lt;/pre&amp;gt;&lt;br /&gt;
or simply&lt;br /&gt;
&amp;lt;pre&amp;gt;$ foamInit&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Parallel run with OpenFOAM  =&lt;br /&gt;
For better performance when running OpenFOAM jobs in parallel on bwUniCluster, it is recommended to keep the decomposed data in local folders on each node.  &lt;br /&gt;
&lt;br /&gt;
For this purpose you may use the *HPC scripts, which copy your data to the node-specific folders after running decomposePar, and copy it back to the case folder before running reconstructPar.&lt;br /&gt;
&lt;br /&gt;
Don&#039;t forget to allocate enough wall time for the decomposition and reconstruction of your cases, as the data is processed directly on the nodes and may be lost if the job is cancelled before it is copied back into the case folder.&lt;br /&gt;
&lt;br /&gt;
The following commands will do that for you: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ decomposeParHPC&lt;br /&gt;
$ reconstructParHPC&lt;br /&gt;
$ reconstructParMeshHPC&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
instead of:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ decomposePar&lt;br /&gt;
$ reconstructPar&lt;br /&gt;
$ reconstructParMesh&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For example, if you want to run&amp;lt;span style=&amp;quot;background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;snappyHexMesh&amp;lt;/span&amp;gt;in parallel, you may use the following commands:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ decomposeParHPC&lt;br /&gt;
$ mpirun --bind-to core --map-by core -report-bindings snappyHexMesh -overwrite -parallel&lt;br /&gt;
$ reconstructParMeshHPC -constant&amp;lt;/pre&amp;gt;&lt;br /&gt;
instead of:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ decomposePar&lt;br /&gt;
$ mpirun --bind-to core --map-by core -report-bindings snappyHexMesh -overwrite -parallel&lt;br /&gt;
$ reconstructParMesh -constant&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For jobs running on multiple nodes, OpenFOAM needs passwordless communication between the nodes in order to copy data into the local folders.&lt;br /&gt;
&lt;br /&gt;
Running ssh-keygen once will allow your nodes to communicate freely with each other via ssh. &lt;br /&gt;
&lt;br /&gt;
Do this once (if you haven&#039;t already done it in the past):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh-keygen&lt;br /&gt;
$ cat $HOME/.ssh/id_rsa.pub &amp;gt;&amp;gt; $HOME/.ssh/authorized_keys&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Building an OpenFOAM batch file for parallel processing =&lt;br /&gt;
== General information == &lt;br /&gt;
Before running OpenFOAM jobs in parallel, it is necessary to decompose the geometry domain into a number of segments equal to the number of processors (or threads) you intend to use. &lt;br /&gt;
&lt;br /&gt;
That means, for example, if you want to run a case on 8 processors, you first have to decompose the mesh into 8 segments. Then you start the solver in &#039;&#039;parallel&#039;&#039;, letting &#039;&#039;OpenFOAM&#039;&#039; run the calculations concurrently on these segments, with each processor responsible for one segment of the mesh and exchanging data with the other processors in between. &lt;br /&gt;
&lt;br /&gt;
There is, of course, a mechanism that properly connects the calculations, so you don&#039;t lose data or generate wrong results. &lt;br /&gt;
&lt;br /&gt;
The decomposition and segment-building process is handled by the&amp;lt;span style=&amp;quot;background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;decomposePar&amp;lt;/span&amp;gt;utility. &lt;br /&gt;
&lt;br /&gt;
The number of subdomains into which the geometry will be decomposed is specified in &amp;quot;&#039;&#039;system/decomposeParDict&#039;&#039;&amp;quot;, as well as the decomposition method to use. &lt;br /&gt;
&lt;br /&gt;
The automatic decomposition method is &amp;quot;&#039;&#039;scotch&#039;&#039;&amp;quot;. It partitions the mesh, collecting as many cells as possible per processor and trying to avoid empty segments or segments with too few cells. If you want your mesh to be divided in a different way, for example by specifying the number of segments in the x, y or z direction, you can use the &amp;quot;simple&amp;quot; or &amp;quot;hierarchical&amp;quot; methods. &lt;br /&gt;
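For illustration, the relevant entries of a &amp;quot;&#039;&#039;system/decomposeParDict&#039;&#039;&amp;quot; for 8 subdomains might look like this (sketch only; the usual FoamFile header is omitted):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
numberOfSubdomains 8;       // must match the number of MPI tasks&lt;br /&gt;
method             scotch;  // or: simple, hierarchical&lt;br /&gt;
&lt;br /&gt;
// for the &amp;quot;simple&amp;quot; method, additionally specify the cuts per direction, e.g.:&lt;br /&gt;
// simpleCoeffs { n (2 2 2); delta 0.001; }&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;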
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Wrapper script generation == &lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; The &amp;lt;span style=&amp;quot;background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;openfoam&amp;lt;/span&amp;gt; module automatically loads the &amp;lt;span style=&amp;quot;background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;openmpi&amp;lt;/span&amp;gt; module required for parallel runs. Do &#039;&#039;&#039;NOT&#039;&#039;&#039; load another MPI version, as it may conflict with the loaded &amp;lt;span style=&amp;quot;background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;openfoam&amp;lt;/span&amp;gt; version. &lt;br /&gt;
&lt;br /&gt;
A batch script called &#039;&#039;job_openfoam.sh&#039;&#039; that runs the &#039;&#039;icoFoam&#039;&#039; solver with OpenFOAM version 8 on 80 processors on the &#039;&#039;multiple&#039;&#039; partition with a total wall clock time of 4 hours looks like this: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--b)--&amp;gt; &lt;br /&gt;
{| style=&amp;quot;width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 5px;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width:280px; white-space:nowrap; color:#000;&amp;quot; |&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate nodes&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
# Number of tasks per node&lt;br /&gt;
#SBATCH --ntasks-per-node=40&lt;br /&gt;
# Queue class https://wiki.bwhpc.de/e/BwUniCluster_2.0_Batch_Queues&lt;br /&gt;
#SBATCH --partition=multiple&lt;br /&gt;
# Maximum job run time&lt;br /&gt;
#SBATCH --time=4:00:00&lt;br /&gt;
# Give the job a reasonable name&lt;br /&gt;
#SBATCH --job-name=openfoam&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=logs-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=logs-%j.err&lt;br /&gt;
&lt;br /&gt;
# User defined variables&lt;br /&gt;
FOAM_VERSION=&amp;quot;8&amp;quot;&lt;br /&gt;
EXECUTABLE=&amp;quot;icoFoam&amp;quot;&lt;br /&gt;
MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by core --report-bindings&amp;quot;&lt;br /&gt;
&lt;br /&gt;
module load cae/openfoam/${FOAM_VERSION}&lt;br /&gt;
foamInit&lt;br /&gt;
&lt;br /&gt;
# remove decomposePar if you already decomposed your case beforehand &lt;br /&gt;
decomposeParHPC &amp;amp;&amp;amp;&lt;br /&gt;
&lt;br /&gt;
# starting the solver in parallel. Name of the solver is given in the &amp;quot;EXECUTABLE&amp;quot; variable&lt;br /&gt;
mpirun ${MPIRUN_OPTIONS} ${EXECUTABLE} -parallel &amp;amp;&amp;amp;&lt;br /&gt;
&lt;br /&gt;
reconstructParHPC&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; The script above will run a parallel OpenFOAM job with the pre-installed OpenMPI. If you are using an OpenFOAM version which comes with pre-installed Intel MPI (such as&amp;lt;span style=&amp;quot;background:#edeae2;margin:10px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;cae/openfoam/v1712-impi&amp;lt;/span&amp;gt;) you will have to modify the batch script to take full advantage of Intel MPI for parallel calculations. For details see:  &lt;br /&gt;
* [[Batch_Jobs_-_bwUniCluster_Features|Batch Jobs Features]]&lt;br /&gt;
&lt;br /&gt;
= Using I/O and reducing the amount of data and files =&lt;br /&gt;
In OpenFOAM, you can control which variables or fields are written at specific times. For example, for post-processing purposes, you might need only a subset of variables. In order to control which files will be written, there is a function object called &amp;quot;writeObjects&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
An example controlDict file may look like this: At the top of the file (entry &amp;quot;writeControl&amp;quot;) you specify that ALL fields (variables) required for restarting are saved every 12 wall-clock hours. Then, additionally, at the bottom of the controlDict in the &amp;quot;functions&amp;quot; block, you can add a function object of type &amp;quot;writeObjects&amp;quot;. With this function object, you can control the output of specific fields independent of the entry at the top of the file: &lt;br /&gt;
&amp;lt;!--b)--&amp;gt; &lt;br /&gt;
{| style=&amp;quot;width: 100%; border:1px solid #d0cfcc; background:#f2f7ff;border-spacing: 5px;&amp;quot;&lt;br /&gt;
| style=&amp;quot;width:280px; white-space:nowrap; color:#000;&amp;quot; |&lt;br /&gt;
&amp;lt;source lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
/*--------------------------------*- C++ -*----------------------------------*\&lt;br /&gt;
| =========                 |                                                 |&lt;br /&gt;
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |&lt;br /&gt;
|  \\    /   O peration     | Version:  4.1.x                                 |&lt;br /&gt;
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |&lt;br /&gt;
|    \\/     M anipulation  |                                                 |&lt;br /&gt;
\*---------------------------------------------------------------------------*/&lt;br /&gt;
FoamFile&lt;br /&gt;
{&lt;br /&gt;
    version     2.0;&lt;br /&gt;
    format      ascii;&lt;br /&gt;
    class       dictionary;&lt;br /&gt;
    location    &amp;quot;system&amp;quot;;&lt;br /&gt;
    object      controlDict;&lt;br /&gt;
}&lt;br /&gt;
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //&lt;br /&gt;
&lt;br /&gt;
startFrom       latestTime;&lt;br /&gt;
startTime       0;&lt;br /&gt;
stopAt  	endTime;&lt;br /&gt;
endTime         1e2;&lt;br /&gt;
deltaT          1e-5;&lt;br /&gt;
&lt;br /&gt;
writeControl    clockTime;&lt;br /&gt;
writeInterval   43200; // write ALL fields necessary to restart your simulation &lt;br /&gt;
                       // every 43200 wall-clock seconds = 12 hours of real time&lt;br /&gt;
&lt;br /&gt;
purgeWrite      0;&lt;br /&gt;
writeFormat     binary;&lt;br /&gt;
writePrecision  10;&lt;br /&gt;
writeCompression off;&lt;br /&gt;
timeFormat      general;&lt;br /&gt;
timePrecision   10;&lt;br /&gt;
runTimeModifiable false;&lt;br /&gt;
&lt;br /&gt;
functions&lt;br /&gt;
{&lt;br /&gt;
    writeFields // name of the function object&lt;br /&gt;
    {&lt;br /&gt;
        type writeObjects;&lt;br /&gt;
        libs ( &amp;quot;libutilityFunctionObjects.so&amp;quot; );&lt;br /&gt;
&lt;br /&gt;
        objects&lt;br /&gt;
        (&lt;br /&gt;
	    T U rho // list of fields/variables to be written&lt;br /&gt;
        );&lt;br /&gt;
&lt;br /&gt;
        // E.g. write every 1e-5 seconds of simulation time only the specified fields&lt;br /&gt;
        writeControl runTime;&lt;br /&gt;
        writeInterval 1e-5; // write every 1e-5 seconds&lt;br /&gt;
    }&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also define multiple function objects in order to write different subsets of fields at different times. In addition, you can use wildcards in the list of fields. For example, in order to write out all fields starting with &amp;quot;RR_&amp;quot;, you can add&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;quot;RR_.*&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
to the list of objects. You can get a list of valid field names by writing &amp;quot;banana&amp;quot; in the field list; during the solver run, all valid field names are then printed.&lt;br /&gt;
The output time can be changed, too. Instead of writing at specific times in the simulation, you can also write after a certain number of time steps or depending on the wall clock time:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;// write every 100th simulation time step&lt;br /&gt;
writeControl timeStep;&lt;br /&gt;
writeInterval 100;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;// every 3600 seconds of real wall clock time&lt;br /&gt;
writeControl clockTime;&lt;br /&gt;
writeInterval 3600; &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you use OpenFOAM before version 4.0 or 1606, the type of function object is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
type writeRegisteredObject; // (instead of type writeObjects) &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
If you use OpenFOAM before version 3.0, you have to load the library with&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
functionObjectLibs (&amp;quot;libIOFunctionObjects.so&amp;quot;); // (instead of libs ( &amp;quot;libutilityFunctionObjects.so&amp;quot; )) &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
and exchange the entry &amp;quot;writeControl&amp;quot; with &amp;quot;outputControl&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
= OpenFOAM and ParaView on bwUniCluster=&lt;br /&gt;
ParaView is not directly linked to the OpenFOAM installation on the cluster. Therefore, to visualize OpenFOAM cases with ParaView, they have to be opened manually from within the corresponding ParaView module.  &lt;br /&gt;
&lt;br /&gt;
1. Load the ParaView module. For example: &lt;br /&gt;
&amp;lt;pre&amp;gt;$ module load cae/paraview/5.9&amp;lt;/pre&amp;gt;&lt;br /&gt;
2. Create a dummy &#039;*.openfoam&#039; file in the OpenFOAM case folder:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ cd &amp;lt;case_folder_path&amp;gt;&lt;br /&gt;
$ touch &amp;lt;case_name&amp;gt;.openfoam&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;NOTICE:&#039;&#039;&#039; the name of the dummy file should be the same as the name of the OpenFOAM case folder, with &#039;.openfoam&#039; extension.&lt;br /&gt;
&lt;br /&gt;
3. Open ParaView:&lt;br /&gt;
Running ParaView on the bwUniCluster requires a VNC session.&lt;br /&gt;
On the cluster run: &lt;br /&gt;
&amp;lt;pre&amp;gt;$ start_vnc_desktop --hw-rendering &amp;lt;/pre&amp;gt;&lt;br /&gt;
Start your VNC client on your desktop PC.&lt;br /&gt;
&#039;&#039;&#039;NOTICE:&#039;&#039;&#039; Information on remote visualization on the KIT HPC systems is available at: https://wiki.bwhpc.de/e/BwUniCluster2.0/Software/Start_vnc_desktop&lt;br /&gt;
&lt;br /&gt;
4. In Paraview go to &#039;File&#039; -&amp;gt; &#039;Open&#039;, or press Ctrl+O. Choose to show &#039;All files (*)&#039;, and open your &amp;lt;case_name&amp;gt;.openfoam file. In the pop-up window select OpenFOAM, and press &#039;Ok&#039;.&lt;br /&gt;
&lt;br /&gt;
5. That&#039;s it! Enjoy ParaView and OpenFOAM.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/Matlab&amp;diff=15201</id>
		<title>BwUniCluster2.0/Software/Matlab</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Software/Matlab&amp;diff=15201"/>
		<updated>2025-08-14T11:11:33Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/Matlab to BwUniCluster3.0/Software/Matlab&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[BwUniCluster3.0/Software/Matlab]]&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/Matlab&amp;diff=15200</id>
		<title>BwUniCluster3.0/Software/Matlab</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Software/Matlab&amp;diff=15200"/>
		<updated>2025-08-14T11:11:33Z</updated>

		<summary type="html">&lt;p&gt;S Braun: S Braun moved page BwUniCluster2.0/Software/Matlab to BwUniCluster3.0/Software/Matlab&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Softwarepage|math/matlab}}&lt;br /&gt;
&lt;br /&gt;
{| width=600px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Description !! Content&lt;br /&gt;
|-&lt;br /&gt;
| module load&lt;br /&gt;
| math/matlab&lt;br /&gt;
|-&lt;br /&gt;
| License&lt;br /&gt;
| [https://de.mathworks.com/pricing-licensing/index.html?intendeduse=edu&amp;amp;prodcode=ML Academic License/Commercial]&lt;br /&gt;
|-&lt;br /&gt;
| Citing&lt;br /&gt;
| n/a&lt;br /&gt;
|-&lt;br /&gt;
| Links&lt;br /&gt;
| [https://de.mathworks.com/products/matlab/ MATLAB Homepage] &amp;amp;#124; [https://de.mathworks.com/index.html?s_tid=gn_logo MathWorks Homepage] &amp;amp;#124; [https://de.mathworks.com/support/?s_tid=gn_supp Support and more]&lt;br /&gt;
|-&lt;br /&gt;
| Graphical Interface&lt;br /&gt;
| No&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Description =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;MATLAB&#039;&#039;&#039; (MATrix LABoratory) is a high-level programming language and interactive computing environment for numerical calculation and data visualization.&lt;br /&gt;
&lt;br /&gt;
= Loading MATLAB =&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
It is not advisable to invoke an interactive MATLAB session on a login node of the cluster. Such sessions will be terminated automatically.&lt;br /&gt;
The recommended way to run a long-duration interactive MATLAB session is to submit an interactive job and start MATLAB from within the dedicated compute node assigned to you by the queueing system (consult the specific cluster users guide on how to submit interactive jobs).&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
An interactive MATLAB session with graphical user interface (GUI) can be started with the command (requires X11 forwarding enabled for your ssh login):&lt;br /&gt;
&amp;lt;pre&amp;gt;$ matlab&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Since graphics rendering can be very slow on remote connections, the preferable way is to run the MATLAB command line interface without GUI:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ matlab -nodisplay&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The following command will execute a MATLAB script or function named &amp;quot;example&amp;quot; &#039;&#039;&#039;on a single thread&#039;&#039;&#039;:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ matlab -nodisplay -singleCompThread -r example &amp;gt; result.out 2&amp;gt;&amp;amp;1&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output of this session will be redirected to the file result.out. The option &amp;lt;span style=&amp;quot;background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;-r&amp;lt;/span&amp;gt; executes the MATLAB statement non-interactively. The option &amp;lt;span style=&amp;quot;background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;-singleCompThread&amp;lt;/span&amp;gt; limits MATLAB to a single computational thread. Most of the time, running MATLAB in single-threaded mode will meet your needs. But if you have mathematically intensive computations that benefit from the built-in multithreading provided by MATLAB&#039;s BLAS and FFT implementation, then you can experiment with running in multi-threaded mode by omitting this option (see section 4.1 - Implicit Threading).&lt;br /&gt;
&lt;br /&gt;
As with all processes that require more than a few minutes to run, non-trivial MATLAB jobs must be submitted to the cluster queuing system. Example batch scripts are available in the directory pointed to by the environment variable &amp;lt;span style=&amp;quot;background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;$MATLAB_EXA_DIR&amp;lt;/span&amp;gt;.&lt;br /&gt;
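For example, a minimal batch script for a single-threaded MATLAB run might look like the following sketch (the script name &#039;&#039;example&#039;&#039; and the resource values are placeholders to be adjusted):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=4gb&lt;br /&gt;
&lt;br /&gt;
module load math/matlab&lt;br /&gt;
matlab -nodisplay -singleCompThread -r example &amp;gt; result.out 2&amp;gt;&amp;amp;1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;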
&lt;br /&gt;
= Parallel Computing Using MATLAB =&lt;br /&gt;
&lt;br /&gt;
Parallelization of MATLAB jobs is realized via the built-in multithreading provided by MATLAB&#039;s BLAS and FFT implementation and the parallel computing functionality of MATLAB&#039;s Parallel Computing Toolbox (PCT). The MATLAB Parallel/Distributed Computing Server is not available on the bwHPC-Clusters.&lt;br /&gt;
&lt;br /&gt;
== Implicit Threading ==&lt;br /&gt;
&lt;br /&gt;
A large number of built-in MATLAB functions may utilize multiple cores automatically without any code modifications required. This is referred to as implicit multithreading and must be strictly distinguished from explicit parallelism provided by the Parallel Computing Toolbox (PCT) which requires specific commands in your code in order to create threads.&lt;br /&gt;
&lt;br /&gt;
Implicit threading particularly takes place for linear algebra operations (such as the solution to a linear system A\b or matrix products A*B) and FFT operations. Many other high-level MATLAB functions do also benefit from multithreading capabilities of their underlying routines. However, the user can still enforce single-threaded mode by adding the command line option &amp;lt;span style=&amp;quot;background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;-singleCompThread&amp;lt;/span&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Whenever implicit threading takes place, MATLAB detects the total number of cores on a machine and by default makes use of all of them. This has important implications for MATLAB jobs in HPC environments with a shared-node job scheduling policy (i.e. with multiple users sharing one compute node). Due to this behaviour, a MATLAB job may occupy more compute resources than assigned by the queueing system of the cluster, thereby taking these resources away from all other users with jobs running on the same node, including your own jobs.&lt;br /&gt;
&lt;br /&gt;
Therefore, when running in multi-threaded mode, the user must always intervene so that MATLAB does not allocate all cores of the machine (unless the queueing system granted them). The number of threads must be controlled from within the code by means of the &amp;lt;span style=&amp;quot;background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;maxNumCompThreads(N)&amp;lt;/span&amp;gt; function (which is slated for deprecation) or, alternatively, with the &amp;lt;span style=&amp;quot;background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;feature(&#039;numThreads&#039;, N)&amp;lt;/span&amp;gt; function (which is currently undocumented).&lt;br /&gt;
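As a sketch, the thread count can be tied to the cores granted by Slurm from within your script (this assumes your job requested cores via --cpus-per-task, so that SLURM_CPUS_PER_TASK is set):&lt;br /&gt;
&lt;br /&gt;
{{bwFrameA|&lt;br /&gt;
&amp;lt;source lang=&amp;quot;Matlab&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
% limit the number of computational threads to the cores&lt;br /&gt;
% granted by Slurm (SLURM_CPUS_PER_TASK must be set)&lt;br /&gt;
N = str2num(getenv(&#039;SLURM_CPUS_PER_TASK&#039;));&lt;br /&gt;
maxNumCompThreads(N);&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;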
&lt;br /&gt;
== Using the Parallel Computing Toolbox (PCT) ==&lt;br /&gt;
&lt;br /&gt;
By using the PCT one can make explicit use of several cores on multicore processors to parallelize MATLAB applications without MPI programming. Under MATLAB version 8.4 and earlier, this toolbox provides 12 workers (MATLAB computational engines) to execute applications locally on a single multicore node. Under MATLAB version 8.5 and later, the number of workers available is equal to the number of cores on a single node (up to a maximum of 512).&lt;br /&gt;
&lt;br /&gt;
If multiple PCT jobs are running at the same time, they all write temporary MATLAB job information to the same location. This race condition can cause one or more of the parallel MATLAB jobs to fail to use the parallel functionality of the toolbox.&lt;br /&gt;
&lt;br /&gt;
To solve this issue, each MATLAB job should explicitly set a unique location where these files are created. This can be accomplished by the following snippet of code added to your MATLAB script.&lt;br /&gt;
&lt;br /&gt;
{{bwFrameA|&lt;br /&gt;
&amp;lt;source lang=&amp;quot;Matlab&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
% create a local cluster object&lt;br /&gt;
pc = parcluster(&#039;local&#039;)&lt;br /&gt;
&lt;br /&gt;
% get the number of dedicated cores from environment&lt;br /&gt;
pc.NumWorkers = str2num(getenv(&#039;SLURM_NPROCS&#039;))&lt;br /&gt;
&lt;br /&gt;
% explicitly set the JobStorageLocation to the tmp directory that is unique to each cluster job (and is on local, fast scratch)&lt;br /&gt;
parpool_tmpdir = [getenv(&#039;TMP&#039;),&#039;/.matlab/local_cluster_jobs/slurm_jobID_&#039;,getenv(&#039;SLURM_JOB_ID&#039;)]&lt;br /&gt;
mkdir(parpool_tmpdir)&lt;br /&gt;
pc.JobStorageLocation = parpool_tmpdir&lt;br /&gt;
&lt;br /&gt;
% start the parallel pool&lt;br /&gt;
parpool(pc)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Note: The code snippet also sets the correct number of parallel workers in MATLAB according to the total number of processes dedicated to the job given by the environment variable &amp;lt;span style=&amp;quot;background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080&amp;quot;&amp;gt;$SLURM_NPROCS&amp;lt;/span&amp;gt; in the job submission file.&lt;br /&gt;
&lt;br /&gt;
= General Performance Tips for MATLAB =&lt;br /&gt;
&lt;br /&gt;
MATLAB data structures (arrays or matrices) are dynamic in size, i.e. MATLAB will automatically resize the structure on demand. Although this seems to be convenient, MATLAB continually needs to allocate a new chunk of memory and copy over the data to the new block of memory as the array or matrix grows in a loop. This may take a significant amount of extra time during execution of the program.&lt;br /&gt;
&lt;br /&gt;
Code performance can often be drastically improved by preallocating memory for the final expected size of the array or matrix before actually starting the processing loop. In order to preallocate an array of strings, you can use MATLAB&#039;s built-in cell function. In order to preallocate an array or matrix of numbers, you can use MATLAB&#039;s built-in zeros function.&lt;br /&gt;
&lt;br /&gt;
The performance benefit of preallocation is illustrated with the following example code.&lt;br /&gt;
&lt;br /&gt;
{{bwFrameA|&lt;br /&gt;
&amp;lt;source lang=&amp;quot;Matlab&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
% prealloc.m&lt;br /&gt;
&lt;br /&gt;
clear all;&lt;br /&gt;
&lt;br /&gt;
num=10000000;&lt;br /&gt;
&lt;br /&gt;
disp(&#039;Without preallocation:&#039;)&lt;br /&gt;
tic&lt;br /&gt;
for i=1:num&lt;br /&gt;
    a(i)=i;&lt;br /&gt;
end&lt;br /&gt;
toc&lt;br /&gt;
&lt;br /&gt;
disp(&#039;With preallocation:&#039;)&lt;br /&gt;
tic&lt;br /&gt;
b=zeros(1,num);&lt;br /&gt;
for i=1:num&lt;br /&gt;
    b(i)=i;&lt;br /&gt;
end&lt;br /&gt;
toc&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
On a compute node, the result may look like this:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Without preallocation:&lt;br /&gt;
Elapsed time is 2.879446 seconds.&lt;br /&gt;
With preallocation:&lt;br /&gt;
Elapsed time is 0.097557 seconds.&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that the code runs almost 30 times faster with preallocation.&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15177</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15177"/>
		<updated>2025-07-25T04:40:24Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Short Queues */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 have to be performed on the compute nodes. Compute nodes are only available by requesting the corresponding resources via the queuing system. As soon as the requested resources are available, tasks are either executed automatically via a batch script or the resources can be used interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
General procedure: see [[Running_Calculations | Running Calculations]]&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload management software Slurm. Therefore, any job submission by the user must be performed via Slurm commands. Slurm queues and runs user jobs based on fair-sharing policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm.&lt;br /&gt;
Slurm is a cluster management and job scheduling system. Slurm has three key functions. &lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any kind of calculation on the compute nodes of bwUniCluster 3.0 requires the user to define the calculation as a sequence of commands together with the required run time, number of CPU cores and amount of main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload management software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerator&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs used for developing, compiling and testing code and workflows. The intention behind development queues is to give users immediate access to compute resources without having to wait. They are the place for short bursts of heavy compute that would disturb other users if run on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit within the limits imposed by the queues. A request for compute resources on bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This Bash script is queued and executed as soon as the compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command-line prompt is returned on a compute node and the user is free to use the allocated resources.&lt;br /&gt;
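A minimal sketch of such a Bash script for &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; (the queue name, resource values and payload are illustrative assumptions, not site defaults):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --partition=cpu       # queue name (illustrative assumption)
#SBATCH --ntasks=1            # one task
#SBATCH --time=00:10:00       # wall clock time limit (always required)
#SBATCH --mem-per-cpu=2000mb  # memory per core (illustrative assumption)

# Payload: any sequence of commands; a placeholder here.
# SLURM_NTASKS is set by Slurm inside a job; outside a job it is
# unset, so it defaults to 1 below.
MSG="job payload ran with ${SLURM_NTASKS:-1} task(s)"
echo "$MSG"
```

Saved as e.g. jobscript.sh, the script would be enqueued with &amp;lt;code&amp;gt;sbatch jobscript.sh&amp;lt;/code&amp;gt;; the #SBATCH lines are ordinary comments to Bash and are interpreted only by Slurm.&lt;br /&gt;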
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse these queues for regular, short-running jobs or chain jobs! Only one job may run at a time, and the maximum queue length is limited to 3.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
The computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;: the individual investment shares of the respective universities and the resources already used by their members are taken into account. Furthermore, the following throttling policy is active: the &#039;&#039;&#039;maximum number of physical cores&#039;&#039;&#039; in use at any given time is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and to maximize the number of users who can access computing time at the same time.&lt;br /&gt;
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=48:00:00, nodes=9, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
&amp;lt;p style=&amp;quot;color:red; &amp;quot;&amp;gt;&amp;lt;b&amp;gt;Queues with a short runtime of 30 minutes.&amp;lt;/b&amp;gt;&amp;lt;/p&amp;gt; &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=94000mb&amp;lt;br/&amp;gt;cpus-per-gpu=12&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=12, mem=376000mb, ntasks-per-node=48, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
These queues are intended only for development tasks, i.e. debugging or performance optimization.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&amp;lt;br/&amp;gt;&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| gres=gpu:1&lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue define the number of tasks and the amount of memory if these are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window. The sinfo command can only be used by the administrator.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC provides a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. Users can use this information to submit jobs that fit the idle resources and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle &lt;br /&gt;
Partition dev_cpu                 :      1 nodes idle&lt;br /&gt;
Partition cpu                     :      1 nodes idle&lt;br /&gt;
Partition highmem                 :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :      0 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      0 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      7 nodes idle&lt;br /&gt;
Partition cpu_il                  :      2 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      1 nodes idle&lt;br /&gt;
Partition gpu_a100_short          :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted using the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job; &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, when a batch job starts depends on the availability of the requested resources and on the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be given on the command line or in your job script. Different defaults for some of these options are set depending on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge the resources used by this job to the specified group. You may need this option if your account is assigned to more than one group. The project group a job is accounted on is shown behind &amp;quot;Account=&amp;quot; in the output of &amp;quot;scontrol show job&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND filesystem.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
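The options above can be combined in the header of a job script. A sketch (queue name, resource values and file name are illustrative assumptions):&lt;br /&gt;

```shell
#!/bin/bash
#SBATCH --partition=cpu              # queue name (illustrative assumption)
#SBATCH --nodes=1                    # one node
#SBATCH --ntasks-per-node=4          # four tasks on that node
#SBATCH --time=01:00:00              # 1 hour wall clock limit
#SBATCH --job-name=example_job       # name shown by squeue
#SBATCH --output=example_job-%j.out  # %j expands to the job id

# SLURM_NTASKS_PER_NODE is set by Slurm inside a job; outside a job
# it is unset, so default to the requested 4 here.
NTASKS=${SLURM_NTASKS_PER_NODE:-4}
echo "requested $NTASKS tasks per node"
```

Command-line options take precedence over the corresponding #SBATCH lines, so the same script can be reused with modified resources, e.g. &amp;lt;code&amp;gt;sbatch --time=02:00:00 jobscript.sh&amp;lt;/code&amp;gt;.&lt;br /&gt;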
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with low memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs requesting more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For a serial application on a compute node that requires 5000 MByte of memory, with the interactive run limited to 2 hours, execute the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core! To run a serial program on the granted core, simply type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that your serial job must run less than 2 hours in this example, else the job will be killed during runtime by the system. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can also start a graphical X11 terminal connecting you to the dedicated resource, which is available for 2 hours. Start it with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources, i.e. the compute node, will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application may run on one compute node or on many compute nodes (e.g. 5 nodes with 96 cores each) and usually requires a certain amount of memory (e.g. 50 GByte) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores with 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want to have access to another node you have to open a new terminal, connect it also to bwUniCluster 3.0 and type the following commands to&lt;br /&gt;
connect to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the jobid and the nodelist can be shown.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, you can do so by simply typing mpirun &amp;lt;program_name&amp;gt;. Your program will then run on all 480 cores. A very simple example of starting a parallel job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger ddt by the commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands will execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of cores; an example for this can be:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about YOUR active, pending and/or recently completed jobs. The command squeue is explained in detail at https://slurm.schedmd.com/squeue.html or via the manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for one specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail at https://slurm.schedmd.com/scontrol.html or via the manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Here is an example from bwUniCluster 3.0.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my running job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command is used to cancel jobs. It is explained in detail at https://slurm.schedmd.com/scancel.html or via the manpage (man scancel). The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15162</id>
		<title>BwUniCluster3.0/Running Jobs</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster3.0/Running_Jobs&amp;diff=15162"/>
		<updated>2025-07-21T15:18:20Z</updated>

		<summary type="html">&lt;p&gt;S Braun: /* Queues on bwUniCluster 3.0 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Purpose and function of a queuing system =&lt;br /&gt;
&lt;br /&gt;
All compute activities on bwUniCluster 3.0 must be performed on the compute nodes, which become available only by requesting the corresponding resources via the queuing system. As soon as the requested resources are allocated, either a batch script is executed automatically or the nodes can be used interactively.&amp;lt;br&amp;gt;&lt;br /&gt;
For the general procedure, see [[Running_Calculations | Running Calculations]].&lt;br /&gt;
&lt;br /&gt;
== Job submission process ==&lt;br /&gt;
&lt;br /&gt;
bwUniCluster 3.0 uses the workload management software Slurm. Any job submission therefore has to be performed via Slurm commands. Slurm queues and runs user jobs based on fair-share policies.&lt;br /&gt;
&lt;br /&gt;
== Slurm ==&lt;br /&gt;
&lt;br /&gt;
The HPC workload manager on bwUniCluster 3.0 is Slurm.&lt;br /&gt;
Slurm is a cluster management and job scheduling system with three key functions: &lt;br /&gt;
* It allocates access to resources (compute cores on nodes) to users for some duration of time so they can perform work. &lt;br /&gt;
* It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. &lt;br /&gt;
* It arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&lt;br /&gt;
Any calculation on the compute nodes of bwUniCluster 3.0 requires the user to define it as a sequence of commands, together with the required run time, number of CPU cores and amount of main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to the resource and workload management software.&lt;br /&gt;
&lt;br /&gt;
== Terms and definitions ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Partitions &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Slurm manages job queues for different &#039;&#039;&#039;partitions&#039;&#039;&#039;. Partitions are used to group similar node types (e.g. nodes with and without accelerators) and to enforce different access policies and resource limits.&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different partitions:&lt;br /&gt;
&lt;br /&gt;
* CPU-only nodes&lt;br /&gt;
** 2-socket nodes, consisting of 2 Intel Ice Lake processors with 32 cores each or 2 AMD processors with 48 cores each&lt;br /&gt;
** 2-socket nodes with very high RAM capacity, consisting of 2 AMD processors with 48 cores each&lt;br /&gt;
* GPU-accelerated nodes&lt;br /&gt;
** 2-socket nodes with 4x NVIDIA A100 or 4x NVIDIA H100 GPUs&lt;br /&gt;
** 4-socket node with 4x AMD Instinct accelerators&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Queues &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Job &#039;&#039;&#039;queues&#039;&#039;&#039; are used to manage jobs that request access to shared but limited computing resources of a certain kind (partition).&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 there are different main types of queues:&lt;br /&gt;
* Regular queues&lt;br /&gt;
** cpu: Jobs that request CPU-only nodes.&lt;br /&gt;
** gpu: Jobs that request GPU-accelerated nodes.&lt;br /&gt;
* Development queues (dev)&lt;br /&gt;
** Short, usually interactive jobs used for developing, compiling and testing code and workflows. The intention behind development queues is to give users immediate access to compute resources without long waiting times. This is the place for short but heavy computations that would otherwise disturb other users on the login nodes.&lt;br /&gt;
&lt;br /&gt;
Requested compute resources such as (wall-)time, number of nodes and amount of memory are restricted and must fit into the boundaries imposed by the queues. The request for compute resources on the bwUniCluster 3.0 &amp;lt;font color=red&amp;gt;requires at least the specification of the &#039;&#039;&#039;queue&#039;&#039;&#039; and the &#039;&#039;&#039;time&#039;&#039;&#039;&amp;lt;/font&amp;gt;.&lt;br /&gt;
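&lt;br /&gt;
For example, a minimal submission that specifies only the mandatory queue and time (the script name &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt; is a placeholder) could look like this:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p cpu -t 01:00:00 job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;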
&lt;br /&gt;
&#039;&#039;&#039; Jobs &#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Jobs can be run non-interactively as &#039;&#039;&#039;batch jobs&#039;&#039;&#039; or as &#039;&#039;&#039;interactive jobs&#039;&#039;&#039;.&amp;lt;br&amp;gt;&lt;br /&gt;
Submitting a batch job means that all steps of a compute project are defined in a Bash script. This script is queued and executed as soon as the requested compute resources are available and allocated. Jobs are enqueued with the &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt; command.&lt;br /&gt;
For interactive jobs, the resources are requested with the &amp;lt;code&amp;gt;salloc&amp;lt;/code&amp;gt; command. As soon as the computing resources are available and allocated, a command line prompt is opened on a compute node and the user can work freely with the allocated resources.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&#039;&#039;&#039;Please remember:&#039;&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Heavy computations are not allowed on the login nodes&#039;&#039;&#039;.&amp;lt;br&amp;gt;Use a development or a regular job queue instead! Please refer to [[BwUniCluster3.0/Login#Allowed_Activities_on_Login_Nodes|Allowed Activities on Login Nodes]].&lt;br /&gt;
* &#039;&#039;&#039;Development queues&#039;&#039;&#039; are meant for &#039;&#039;&#039;development tasks&#039;&#039;&#039;.&amp;lt;br&amp;gt;Do not misuse this queue for regular, short-running jobs or chain jobs! Only one job may run at a time, and at most 3 jobs may be queued.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Queues on bwUniCluster 3.0 = &lt;br /&gt;
== Policy ==&lt;br /&gt;
&lt;br /&gt;
The computing time is provided in accordance with the &#039;&#039;&#039;fair share policy&#039;&#039;&#039;. The individual investment shares of the respective university and the resources already used by its members are taken into account. Furthermore, the following throttling policy is active: the &#039;&#039;&#039;maximum number of physical cores&#039;&#039;&#039; used at any given time by running jobs is &#039;&#039;&#039;1920 per user&#039;&#039;&#039; (aggregated over all running jobs). This corresponds to 30 nodes on the Ice Lake partition or 20 nodes on the standard partition. The aim is to minimize waiting times and to maximize the number of users who can access computing time at the same time.&lt;br /&gt;
&lt;br /&gt;
== Regular Queues ==&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node-Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=30, mem=249600mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=20, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;highmem&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;High Memory&lt;br /&gt;
| mem-per-cpu=12090mb&lt;br /&gt;
| mem=380001mb&lt;br /&gt;
| time=72:00:00, nodes=4, mem=2300000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=12, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_mi300&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU node&amp;lt;br/&amp;gt;AMD GPU x4&lt;br /&gt;
| mem-per-gpu=128200mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| &lt;br /&gt;
| time=72:00:00, nodes=1, mem=510000mb, ntasks-per-node=40, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_il&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;gpu_h100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| &lt;br /&gt;
| time=48:00:00, nodes=9, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 1: Regular Queues&lt;br /&gt;
&lt;br /&gt;
== Short Queues ==&lt;br /&gt;
Queues with a short maximum runtime of 30 minutes.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;gpu_a100_short&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;Ice Lake&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|}&lt;br /&gt;
Table 2: Short Queues&lt;br /&gt;
&lt;br /&gt;
== Development Queues ==&lt;br /&gt;
Only for development, i.e. debugging or performance optimization ...&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:5%&amp;quot;| Queue&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Node Type&lt;br /&gt;
! style=&amp;quot;width:23%&amp;quot;| Default Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Minimal Resources&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Maximum Resources&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Ice Lake&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=8, mem=249600mb, ntasks-per-node=64, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_cpu&amp;lt;/code&amp;gt;&lt;br /&gt;
| CPU nodes&amp;lt;br/&amp;gt;Standard&lt;br /&gt;
| mem-per-cpu=2000mb&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=380000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_h100&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&lt;br /&gt;
| mem-per-gpu=193300mb&amp;lt;br/&amp;gt;cpus-per-gpu=24&lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=760000mb, ntasks-per-node=96, (threads-per-core=2)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;dev_gpu_a100_il&amp;lt;/code&amp;gt;&lt;br /&gt;
| GPU nodes&amp;lt;br/&amp;gt;NVIDIA GPU x4&amp;lt;br/&amp;gt;&lt;br /&gt;
| mem-per-gpu=127500mb&amp;lt;br/&amp;gt;cpus-per-gpu=16 &lt;br /&gt;
| &lt;br /&gt;
| time=30, nodes=1, mem=510000mb, ntasks-per-node=64, (threads-per-core=2) &lt;br /&gt;
|}&lt;br /&gt;
Table 3: Development Queues&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The default resources of a queue define the number of tasks and the memory if these are not explicitly given with the sbatch command. The resource options &#039;&#039;--time&#039;&#039;, &#039;&#039;--ntasks&#039;&#039;, &#039;&#039;--nodes&#039;&#039;, &#039;&#039;--mem&#039;&#039; and &#039;&#039;--mem-per-cpu&#039;&#039; are described [[BwUniCluster3.0/Running_Jobs/Slurm|here]].&lt;br /&gt;
&lt;br /&gt;
== Check available resources: sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information on a system running Slurm. It incorporates down time, reservations, and node state information when determining the available backfill window.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The following command displays what resources are available for immediate use for the whole partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle&lt;br /&gt;
Partition dev_cpu                 :      2 nodes idle&lt;br /&gt;
Partition cpu                     :     68 nodes idle&lt;br /&gt;
Partition highmem                 :      4 nodes idle&lt;br /&gt;
Partition dev_gpu_h100            :      0 nodes idle&lt;br /&gt;
Partition gpu_h100                :     11 nodes idle&lt;br /&gt;
Partition gpu_mi300               :      1 nodes idle&lt;br /&gt;
Partition dev_cpu_il              :      0 nodes idle&lt;br /&gt;
Partition cpu_il                  :      0 nodes idle&lt;br /&gt;
Partition dev_gpu_a100_il         :      0 nodes idle&lt;br /&gt;
Partition gpu_a100_il             :      0 nodes idle&lt;br /&gt;
Partition gpu_h100_il             :      0 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Running Jobs =&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Important Slurm commands for non-administrators working on bwUniCluster 3.0.&lt;br /&gt;
{| width=850px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Batch Jobs: sbatch|sbatch]] || Submits a job and puts it into the queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Interactive Jobs: salloc|salloc]] || Requests resources for an interactive Job [[https://slurm.schedmd.com/salloc.html salloc]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Monitor and manage jobs |scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue --start]] || Returns start time of submitted job [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Check available resources: sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Batch Jobs: sbatch ==&lt;br /&gt;
&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of &#039;&#039;&#039;sbatch&#039;&#039;&#039; is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, when the batch job starts depends on the availability of the requested resources and on the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used on the command line or in your job script. Different defaults for some of these options are set depending on the queue and can be found [[BwUniCluster3.0/Slurm | here]].&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;width:8%&amp;quot;| Command line&lt;br /&gt;
! style=&amp;quot;width:9%&amp;quot;| Script&lt;br /&gt;
! style=&amp;quot;width:13%&amp;quot;| Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t, --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N, --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n, --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count of tasks per node.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c, --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU. (You should omit the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J, --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission environment are propagated to the launched application. Default is ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A, --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Change resources used by this job to specified group. You may need this option if your account is assigned to more than one group. By command &amp;quot;scontrol show job&amp;quot; the project group the job is accounted on can be seen behind &amp;quot;Account=&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p, --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C, --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND filesystem.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
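A minimal job script combining several of the options above might look as follows (all resource values are illustrative only):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition=cpu&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --mem-per-cpu=2000mb&lt;br /&gt;
#SBATCH --job-name=my_job&lt;br /&gt;
#SBATCH --output=my_job.out&lt;br /&gt;
&lt;br /&gt;
./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Such a script would be submitted with &amp;lt;code&amp;gt;sbatch job.sh&amp;lt;/code&amp;gt;, where &amp;lt;code&amp;gt;job.sh&amp;lt;/code&amp;gt; is the name of the script file.&lt;br /&gt;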
&lt;br /&gt;
== Interactive Jobs: salloc ==&lt;br /&gt;
&lt;br /&gt;
On bwUniCluster 3.0 you are only allowed to run short jobs (&amp;lt;&amp;lt; 1 hour) with low memory requirements (&amp;lt;&amp;lt; 8 GByte) on the login nodes. If you want to run longer jobs and/or jobs that request more than 8 GByte of memory, you must allocate resources for so-called interactive jobs with the command salloc on a login node. For a serial application running on a compute node that requires 5000 MByte of memory, with the interactive run limited to 2 hours, the following command has to be executed:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -n 1 -t 120 --mem=5000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You will then get one core on a compute node within the partition &amp;quot;cpu&amp;quot;. After executing this command, &#039;&#039;&#039;DO NOT CLOSE&#039;&#039;&#039; your current terminal session; wait until the queueing system Slurm has granted you the requested resources on the compute system. You will be logged in automatically on the granted core. To run a serial program on this core, simply type the name of the executable.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ./&amp;lt;my_serial_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please be aware that your serial job must run less than 2 hours in this example, else the job will be killed during runtime by the system. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can now also start a graphical X11 terminal connected to the dedicated resource, which is available for 2 hours. Start it with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ xterm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that once the walltime limit has been reached, the resources, i.e. the compute node, will automatically be revoked.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
An interactive parallel application running on one or several compute nodes (here e.g. 5 nodes with 96 cores each) usually requires an amount of memory in GByte (e.g. 50 GByte per node) and a maximum time (e.g. 1 hour). For example, 5 nodes can be allocated with the following command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc -p cpu -N 5 --ntasks-per-node=96 -t 01:00:00  --mem=50gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Now you can run parallel jobs on 480 cores requiring 50 GByte of memory per node. Please be aware that you will be logged in on core 0 of the first node.&lt;br /&gt;
If you want to have access to another node, open a new terminal, connect it to bwUniCluster 3.0 as well, and type the following commands to&lt;br /&gt;
connect to the running interactive job and then to a specific node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ srun --jobid=XXXXXXXX --pty /bin/bash&lt;br /&gt;
$ srun --nodelist=uc3nXXX --pty /bin/bash&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
With the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
the jobid and the nodelist can be shown.&lt;br /&gt;
&lt;br /&gt;
If you want to run MPI programs, you can do so simply by typing mpirun &amp;lt;program_name&amp;gt;. Your program will then run on all 480 allocated cores. A very simple example of starting a parallel job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also start the debugger ddt with the following commands:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module add devel/ddt&lt;br /&gt;
$ ddt &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The above commands execute the parallel program &amp;lt;my_mpi_program&amp;gt; on all available cores. You can also start parallel programs on a subset of the cores, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 50 &amp;lt;my_mpi_program&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you are using Intel MPI, you must start &amp;lt;my_mpi_program&amp;gt; with the command mpiexec.hydra (instead of mpirun).&lt;br /&gt;
&lt;br /&gt;
== Interactive Computing with Jupyter ==&lt;br /&gt;
&lt;br /&gt;
== Monitor and manage jobs ==&lt;br /&gt;
&lt;br /&gt;
=== List of your submitted jobs : squeue ===&lt;br /&gt;
Displays information about your own active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via the manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &#039;&#039;squeue&#039;&#039; example on bwUniCluster 3.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  R       8:15      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123 PD       0:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  R       2:41      1 uc3n084&lt;br /&gt;
$ squeue -l&lt;br /&gt;
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)&lt;br /&gt;
              1262       cpu     wrap ka_ab123  RUNNING       8:55     20:00      1 uc3n002&lt;br /&gt;
              1267 dev_gpu_h     wrap ka_ab123  PENDING       0:00     20:00      1 (Resources)&lt;br /&gt;
              1265   highmem     wrap ka_ab123  RUNNING       3:21     20:00      1 uc3n084&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
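The estimated start time of pending jobs can be queried with the &amp;lt;code&amp;gt;--start&amp;lt;/code&amp;gt; option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue --start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;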
&lt;br /&gt;
=== Detailed job information : scontrol show job ===&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all of your jobs or for a specified job. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via the manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Here is an example from bwUniCluster 3.0.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue&lt;br /&gt;
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
1262       cpu     wrap ka_zs040  R       1:12      1 uc3n002&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my running job with jobid 1262&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 1262&lt;br /&gt;
&lt;br /&gt;
JobId=1262 JobName=wrap&lt;br /&gt;
   UserId=ka_zs0402(241992) GroupId=ka_scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=4246 Nice=0 Account=ka QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:00:37 TimeLimit=00:20:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2025-04-04T10:01:30 EligibleTime=2025-04-04T10:01:30&lt;br /&gt;
   AccrueTime=2025-04-04T10:01:30&lt;br /&gt;
   StartTime=2025-04-04T10:01:31 EndTime=2025-04-04T10:21:31 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2025-04-04T10:01:31 Scheduler=Main&lt;br /&gt;
   Partition=cpu AllocNode:Sid=uc3n999:2819841&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc3n002&lt;br /&gt;
   BatchHost=uc3n002&lt;br /&gt;
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*&lt;br /&gt;
   ReqTRES=cpu=1,mem=2000M,node=1,billing=1&lt;br /&gt;
   AllocTRES=cpu=2,mem=4000M,node=1,billing=2&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=(null)&lt;br /&gt;
   WorkDir=/pfs/data6/home/ka/ka_scc/ka_zs0402&lt;br /&gt;
   StdErr=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data6/home/ka/ka_scc/ka_zs0402/slurm-1262.out&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel). The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Slurm Options =&lt;br /&gt;
[[BwUniCluster3.0/Running_Jobs/Slurm | Detailed Slurm usage]]&lt;br /&gt;
&lt;br /&gt;
= Best Practices =&lt;br /&gt;
&lt;br /&gt;
== Step-by-Step example==&lt;br /&gt;
&lt;br /&gt;
== Dos and Don&#039;ts ==&lt;/div&gt;</summary>
		<author><name>S Braun</name></author>
	</entry>
</feed>