<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J+Salk</id>
	<title>bwHPC Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J+Salk"/>
	<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/e/Special:Contributions/J_Salk"/>
	<updated>2026-05-11T23:42:14Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.17</generator>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15521</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15521"/>
		<updated>2025-11-26T09:06:18Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* None&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php/crs/629 Introduction to JUSTUS2]&lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Jobscripts: Running Your Calculations|Jobscripts: Running Your Calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15442</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15442"/>
		<updated>2025-11-19T20:13:45Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* &#039;&#039;&#039;2025-11-19:&#039;&#039;&#039; Login nodes will be rebooted on &#039;&#039;&#039;Thursday, 20.11.2025, between 18:00 and 19:00&#039;&#039;&#039;, to apply security updates. Batch jobs are &#039;&#039;&#039;not affected&#039;&#039;&#039;.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php/crs/629 Introduction to JUSTUS2]&lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Jobscripts: Running Your Calculations|Jobscripts: Running Your Calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15404</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15404"/>
		<updated>2025-11-13T11:43:36Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* 2025-11-12: There is currently an issue with the identity provider service at the University of Tübingen. As a result, users from the University of Tübingen may have been unable to log in to JUSTUS 2. The team at the University of Tübingen is aware of the problem and is actively working to resolve it. For now, we have implemented a workaround on our side to mitigate this issue for users from Tübingen.&lt;br /&gt;
* 2025-11-13: The issue with the identity provider service at the University of Tübingen has been fixed.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php/crs/629 Introduction to JUSTUS2]&lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Jobscripts: Running Your Calculations|Jobscripts: Running Your Calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15400</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15400"/>
		<updated>2025-11-12T18:14:01Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* 2025-11-12: There is currently an issue with the identity provider service at the University of Tübingen. As a result, users from the University of Tübingen may have been unable to log in to JUSTUS 2. The team at the University of Tübingen is aware of the problem and is actively working to resolve it. For now, we have implemented a workaround on our side to mitigate this issue for users from Tübingen.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php/crs/629 Introduction to JUSTUS2]&lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Jobscripts: Running Your Calculations|Jobscripts: Running Your Calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15399</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=15399"/>
		<updated>2025-11-12T17:42:34Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* 2025-11-12: There is currently an issue with the identity provider service at the University of Tübingen. As a result, users from the University of Tübingen may be unable to log in to JUSTUS 2. The team at the University of Tübingen is aware of the problem and is actively working to resolve it. &lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php/crs/629 Introduction to JUSTUS2]&lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Jobscripts: Running Your Calculations|Jobscripts: Running Your Calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=15306</id>
		<title>BwForCluster JUSTUS 2 Slurm HOWTO</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=15306"/>
		<updated>2025-09-30T17:35:40Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
This is a collection of howtos and convenient Slurm commands for JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
Some commands behave slightly differently depending on whether they are executed &lt;br /&gt;
by a system administrator or by a regular user, as Slurm prevents regular users from accessing critical system information and viewing job and usage information of other users.&lt;br /&gt;
&lt;br /&gt;
= GENERAL INFORMATION =&lt;br /&gt;
&lt;br /&gt;
== How to find a general quick start user guide? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/quickstart.html&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm FAQ? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/faq.html&lt;br /&gt;
&lt;br /&gt;
== How to find a Slurm cheat sheet? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm tutorials? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
== How to get more information on Slurm? ==&lt;br /&gt;
&lt;br /&gt;
(Almost) every Slurm command has a man page. Use it.&lt;br /&gt;
&lt;br /&gt;
Online versions: https://slurm.schedmd.com/man_index.html&lt;br /&gt;
&lt;br /&gt;
== How to find hardware specific details about JUSTUS 2? ==&lt;br /&gt;
&lt;br /&gt;
See our Wiki page: [[Hardware and Architecture (bwForCluster JUSTUS 2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
= JOB SUBMISSION =&lt;br /&gt;
&lt;br /&gt;
== How to submit a serial batch job? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html sbatch] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample job script template for serial job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=serial_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=serial_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=serial_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Run serial program&lt;br /&gt;
./my_serial_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for serial program: [[Media:Hello_serial.c | Hello_serial.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* --nodes=1 and --ntasks-per-node=1 may be replaced by --ntasks=1.&lt;br /&gt;
* If not specified, stdout and stderr are both written to slurm-%j.out.&lt;br /&gt;
&lt;br /&gt;
== How to find working sample scripts for my program? ==&lt;br /&gt;
&lt;br /&gt;
Most software modules for applications provide working sample batch scripts.&lt;br /&gt;
Check with the [[Software_Modules_Lmod#Module_specific_help | module help]] command, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module help chem/vasp     # display module help for VASP&lt;br /&gt;
$ module help math/matlab   # display module help for Matlab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to harden job scripts against common errors? ==&lt;br /&gt;
&lt;br /&gt;
The bash shell provides several options that help uncover hidden bugs and make job scripts safer.&lt;br /&gt;
To activate these safeguards, insert the following lines in your scripts (after all #SBATCH directives):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
set -o errexit   # (or set -e) causes the batch script to exit immediately when a command fails&lt;br /&gt;
set -o pipefail  # causes the batch script to exit immediately also when a failing command is embedded in a pipeline&lt;br /&gt;
set -o nounset   # (or set -u) causes the script to treat unset variables as an error and exit immediately&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/salloc.html salloc] command, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ salloc --nodes=1 --ntasks-per-node=8&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
In Slurm versions prior to 20.11, the use of [https://slurm.schedmd.com/srun.html srun] was the recommended way of launching interactive jobs, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ srun --nodes=1 --ntasks-per-node=8 --pty bash &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although this still works with current Slurm versions, it is considered &#039;&#039;&#039;deprecated&#039;&#039;&#039; as it may cause issues when launching additional job steps from within the interactive job environment. Use the [https://slurm.schedmd.com/salloc.html salloc] command instead.&lt;br /&gt;
&lt;br /&gt;
== How to enable X11 forwarding for an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--x11&#039; flag, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --nodes=1 --ntasks-per-node=8 --x11     # run shell with X11 forwarding enabled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
* For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:&lt;br /&gt;
 &amp;lt;pre&amp;gt;local&amp;gt; ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to convert Moab batch job scripts to Slurm? ==&lt;br /&gt;
&lt;br /&gt;
Replace the Moab/Torque job specification flags and environment variables in your job&lt;br /&gt;
scripts with their Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab job specification flags and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Option !! Moab (msub) !! Slurm (sbatch)&lt;br /&gt;
|-&lt;br /&gt;
| Script directive                            || #MSUB                                  || #SBATCH&lt;br /&gt;
|-&lt;br /&gt;
| Job name                                    || -N &amp;lt;name&amp;gt;                              || --job-name=&amp;lt;name&amp;gt;  (-J &amp;lt;name&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Account                                     || -A &amp;lt;account&amp;gt;                           || --account=&amp;lt;account&amp;gt; (-A &amp;lt;account&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Queue                                       || -q &amp;lt;queue&amp;gt;                             || --partition=&amp;lt;partition&amp;gt; (-p &amp;lt;partition&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Wall time limit                             || -l walltime=&amp;lt;hh:mm:ss&amp;gt;                 || --time=&amp;lt;hh:mm:ss&amp;gt; (-t &amp;lt;hh:mm:ss&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node count                                  || -l nodes=&amp;lt;count&amp;gt;                       || --nodes=&amp;lt;count&amp;gt; (-N &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Core count                                  || -l procs=&amp;lt;count&amp;gt;                       || --ntasks=&amp;lt;count&amp;gt; (-n &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Process count per node                      || -l ppn=&amp;lt;count&amp;gt;                         || --ntasks-per-node=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Core count per process                      ||                                        || --cpus-per-task=&amp;lt;count&amp;gt; (-c &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per node                       || -l mem=&amp;lt;limit&amp;gt;                         || --mem=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per process                    || -l pmem=&amp;lt;limit&amp;gt;                        || --mem-per-cpu=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Job array                                   || -t &amp;lt;array indices&amp;gt;                     || --array=&amp;lt;indices&amp;gt; (-a &amp;lt;indices&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node exclusive job                          || -l naccesspolicy=singlejob             || --exclusive&lt;br /&gt;
|-&lt;br /&gt;
| Initial working directory                   || -d &amp;lt;directory&amp;gt; (default: $HOME)        || --chdir=&amp;lt;directory&amp;gt; (-D &amp;lt;directory&amp;gt;) (default: submission directory)&lt;br /&gt;
|-&lt;br /&gt;
| Standard output file                        || -o &amp;lt;file path&amp;gt;                         || --output=&amp;lt;file&amp;gt; (-o &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Standard error file                         || -e &amp;lt;file path&amp;gt;                         || --error=&amp;lt;file&amp;gt;  (-e &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Combine stdout/stderr to stdout             || -j oe                                  || --output=&amp;lt;combined stdout/stderr file&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Mail notification events                    || -m &amp;lt;event&amp;gt;                             || --mail-type=&amp;lt;events&amp;gt; (valid types include: NONE, BEGIN, END, FAIL, ALL)&lt;br /&gt;
|-&lt;br /&gt;
| Export environment to job                   || -V                                     || --export=ALL (default)&lt;br /&gt;
|-&lt;br /&gt;
| Don&#039;t export environment to job             || (default)                              || --export=NONE&lt;br /&gt;
|-&lt;br /&gt;
| Export environment variables to job         || -v &amp;lt;var[=value][,var2=value2[, ...]]&amp;gt;  || --export=&amp;lt;var[=value][,var2=value2[,...]]&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.&lt;br /&gt;
* By default Moab does not export any environment variables to the job&#039;s runtime environment. With Slurm most of the login environment variables are exported to your job&#039;s runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab/Torque script environment variables and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Information                 !! Moab                !! Torque               !! Slurm                                     &lt;br /&gt;
|-&lt;br /&gt;
| Job name                     || $MOAB_JOBNAME        || $PBS_JOBNAME        || $SLURM_JOB_NAME                           &lt;br /&gt;
|-&lt;br /&gt;
| Job ID                       || $MOAB_JOBID          || $PBS_JOBID          || $SLURM_JOB_ID                             &lt;br /&gt;
|-&lt;br /&gt;
| Submit directory             || $MOAB_SUBMITDIR      || $PBS_O_WORKDIR      || $SLURM_SUBMIT_DIR                         &lt;br /&gt;
|-&lt;br /&gt;
| Number of nodes allocated    || $MOAB_NODECOUNT      || $PBS_NUM_NODES      || $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) &lt;br /&gt;
|-&lt;br /&gt;
| Node list                    || $MOAB_NODELIST       || cat $PBS_NODEFILE   || $SLURM_JOB_NODELIST                       &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes          || $MOAB_PROCCOUNT      || $PBS_TASKNUM        || $SLURM_NTASKS                             &lt;br /&gt;
|-&lt;br /&gt;
| Requested tasks per node     || ---                    || $PBS_NUM_PPN        || $SLURM_NTASKS_PER_NODE                    &lt;br /&gt;
|-&lt;br /&gt;
| Requested CPUs per task      || ---                  || ---                 || $SLURM_CPUS_PER_TASK                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array index              || $MOAB_JOBARRAYINDEX  || $PBS_ARRAY_INDEX    || $SLURM_ARRAY_TASK_ID                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array range              || $MOAB_JOBARRAYRANGE  || ---                 || $SLURM_ARRAY_TASK_COUNT                   &lt;br /&gt;
|-&lt;br /&gt;
| Queue name                   || $MOAB_CLASS          || $PBS_QUEUE          || $SLURM_JOB_PARTITION                      &lt;br /&gt;
|-&lt;br /&gt;
| QOS name                     || $MOAB_QOS            || ---                 || $SLURM_JOB_QOS                            &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes per node || ---                  || $PBS_NUM_PPN        || $SLURM_TASKS_PER_NODE                     &lt;br /&gt;
|-&lt;br /&gt;
| Job user                     || $MOAB_USER           || $PBS_O_LOGNAME      || $SLURM_JOB_USER                           &lt;br /&gt;
|-&lt;br /&gt;
| Hostname                     || $MOAB_MACHINE        || $PBS_O_HOST         || $SLURMD_NODENAME                          &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See [https://slurm.schedmd.com/sbatch.html sbatch] man page for a complete list of flags and environment variables.&lt;br /&gt;
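&lt;br /&gt;
For illustration, a minimal sketch of a Moab job header and its Slurm counterpart, using only flags from the table above (the job name and resource values are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Moab (msub)                 # Slurm (sbatch)&lt;br /&gt;
#MSUB -N myjob                #SBATCH --job-name=myjob&lt;br /&gt;
#MSUB -l nodes=1              #SBATCH --nodes=1&lt;br /&gt;
#MSUB -l ppn=4                #SBATCH --ntasks-per-node=4&lt;br /&gt;
#MSUB -l walltime=01:00:00    #SBATCH --time=01:00:00&lt;br /&gt;
#MSUB -l mem=8gb              #SBATCH --mem=8gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;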
&lt;br /&gt;
== How to emulate Moab output file names? ==&lt;br /&gt;
&lt;br /&gt;
Use the following directives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --output=&amp;quot;%x.o%j&amp;quot;&lt;br /&gt;
#SBATCH --error=&amp;quot;%x.e%j&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to pass command line arguments to the job script? ==&lt;br /&gt;
&lt;br /&gt;
Run &amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; arg1 arg2 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inside the job script the arguments can be accessed as $1, $2, ...&lt;br /&gt;
&lt;br /&gt;
E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
infile=&amp;quot;$1&amp;quot;&lt;br /&gt;
outfile=&amp;quot;$2&amp;quot;&lt;br /&gt;
./my_serial_program &amp;lt; &amp;quot;$infile&amp;quot; &amp;gt; &amp;quot;$outfile&amp;quot; 2&amp;gt;&amp;amp;1&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; use $1, $2, ... in &amp;quot;#SBATCH&amp;quot; lines. These parameters can be used only within the regular shell script.&lt;br /&gt;
&lt;br /&gt;
== How to request local scratch (SSD/NVMe) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--gres=scratch:nnn&#039; option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=scratch:100&#039; will allocate 100 GB of scratch space on a locally attached NVMe device.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; add any unit (such as --gres=scratch:100G). This would be treated as a request for 10^9 * 100 GB of scratch space.&lt;br /&gt;
&lt;br /&gt;
* Multinode jobs get nnn GB of local scratch space on every node of the job.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$SCRATCH&#039;&#039;&#039; will point to &lt;br /&gt;
** /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when local scratch has been requested. This will be on locally attached SSD/NVMe devices.&lt;br /&gt;
** /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when no local scratch has been requested. This will be in memory and, thus, be limited in size.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039; always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;. This will always be in memory and, thus, limited in size.&lt;br /&gt;
&lt;br /&gt;
* For backward compatibility environment variable $RAMDISK always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Scratch space allocation in /scratch will be enforced by quota limits&lt;br /&gt;
&lt;br /&gt;
* Data written to $TMPDIR will always count against allocated memory.&lt;br /&gt;
&lt;br /&gt;
* Data written to local scratch space will automatically be removed at the end of the job.&lt;br /&gt;
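&lt;br /&gt;
Putting this together, a minimal sketch of a job script that stages data through local scratch (my_program and the file names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Request 50 GB local scratch space (no unit suffix, see note above)&lt;br /&gt;
#SBATCH --gres=scratch:50&lt;br /&gt;
&lt;br /&gt;
# $SCRATCH points to the local scratch directory created for this job&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input.dat .&lt;br /&gt;
&amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/my_program &amp;lt; input.dat &amp;gt; output.dat 2&amp;gt;&amp;amp;1&lt;br /&gt;
&lt;br /&gt;
# Save results before the job ends; local scratch is removed automatically&lt;br /&gt;
cp -v output.dat &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;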
&lt;br /&gt;
== How to request GPGPU nodes at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--gres=gpu:&amp;lt;count&amp;gt;&#039; option to allocate 1 or 2 GPUs per node for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=gpu:1&#039; will allocate one GPU per node for this job.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* GPGPU nodes are equipped with two Nvidia V100S cards.&lt;br /&gt;
&lt;br /&gt;
* Environment variables $CUDA_VISIBLE_DEVICES, $SLURM_JOB_GPUS and $GPU_DEVICE_ORDINAL will denote card(s) allocated for the job.&lt;br /&gt;
&lt;br /&gt;
* CUDA Toolkit is available as software module devel/cuda.&lt;br /&gt;
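&lt;br /&gt;
A minimal sketch of a single-GPU job script (my_gpu_program is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Allocate one GPU per node&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
&lt;br /&gt;
# Load the CUDA Toolkit module&lt;br /&gt;
module load devel/cuda&lt;br /&gt;
&lt;br /&gt;
# $CUDA_VISIBLE_DEVICES denotes the card(s) allocated for this job&lt;br /&gt;
echo &amp;quot;Allocated GPU(s): $CUDA_VISIBLE_DEVICES&amp;quot;&lt;br /&gt;
./my_gpu_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;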
&lt;br /&gt;
== How to clean-up or save files before a job times out? ==&lt;br /&gt;
&lt;br /&gt;
You may want to clean up the work directory or save intermediate result files in case a job times out.&lt;br /&gt;
&lt;br /&gt;
The following sample script may serve as a blueprint for implementing a pre-termination function to perform clean-up or file recovery actions. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 2 GB memory required per node&lt;br /&gt;
#SBATCH --mem=2G&lt;br /&gt;
# Request 10 GB local scratch space&lt;br /&gt;
#SBATCH --gres=scratch:10&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
# Send the USR1 signal 120 seconds before end of time limit&lt;br /&gt;
#SBATCH --signal=B:USR1@120&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=signal_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=signal_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=signal_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Define the signal handler function&lt;br /&gt;
# Note: This is not executed here, but rather when the associated &lt;br /&gt;
# signal is received by the shell.&lt;br /&gt;
finalize_job()&lt;br /&gt;
{&lt;br /&gt;
    # Do whatever cleanup you want here. In this example we copy&lt;br /&gt;
    # output file(s) back to $SLURM_SUBMIT_DIR, but you may implement &lt;br /&gt;
    # your own job finalization code here.&lt;br /&gt;
    echo &amp;quot;function finalize_job called at `date`&amp;quot;&lt;br /&gt;
    cd $SCRATCH&lt;br /&gt;
    mkdir -vp &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results&lt;br /&gt;
    tar czvf &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results/${SLURM_JOB_ID}.tgz output*.txt&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Call finalize_job function as soon as we receive USR1 signal&lt;br /&gt;
trap &#039;finalize_job&#039; USR1&lt;br /&gt;
&lt;br /&gt;
# Copy input files for this job to the scratch directory (if needed).&lt;br /&gt;
# Note: Environment variable $SCRATCH always points to a scratch directory &lt;br /&gt;
# automatically created for this job. Environment variable $SLURM_SUBMIT_DIR &lt;br /&gt;
# points to the path where this script was submitted from.&lt;br /&gt;
# Example:&lt;br /&gt;
# cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input*.txt &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Change working directory to local scratch directory&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# This is where the actual work is done. In this case we just create &lt;br /&gt;
# a sample output file for 900 (=15*60) seconds, but since we asked &lt;br /&gt;
# Slurm for 600 seconds only it will not be able to finish within this &lt;br /&gt;
# wall time.&lt;br /&gt;
# Note: It is important to run this task in the background &lt;br /&gt;
# by placing the &amp;amp; symbol at the end. Otherwise the signal handler &lt;br /&gt;
# would not be executed until that process has finished, which is not &lt;br /&gt;
# what we want.&lt;br /&gt;
(for i in `seq 15`; do echo &amp;quot;Hello World at `date +%H:%M:%S`.&amp;quot;; sleep 60; done) &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Note: The command above is just for illustration. Normally you would just run&lt;br /&gt;
# my_program &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Tell the shell to wait for background task(s) to finish. &lt;br /&gt;
# Note: This is important because otherwise the parent shell &lt;br /&gt;
# (this script) would proceed (and terminate) without waiting for &lt;br /&gt;
# background task(s) to finish.&lt;br /&gt;
wait&lt;br /&gt;
&lt;br /&gt;
# If we get here, the job did not time out but finished in time.&lt;br /&gt;
&lt;br /&gt;
# Release user defined signal handler for USR1&lt;br /&gt;
trap - USR1&lt;br /&gt;
&lt;br /&gt;
# Do regular cleanup and save files. In this example we simply call &lt;br /&gt;
# the same function that we defined as a signal handler above, but you &lt;br /&gt;
# may implement your own code here. &lt;br /&gt;
finalize_job&lt;br /&gt;
&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The number of seconds specified in the --signal option must be large enough to cover the runtime of the pre-termination function and must not exceed 65535 seconds.&lt;br /&gt;
&lt;br /&gt;
* Due to the resolution of event handling by Slurm, the signal may be sent a little earlier than specified.&lt;br /&gt;
&lt;br /&gt;
== How to submit a multithreaded batch job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a job running one multithreaded program instance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Number of cores per program instance&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=multithreaded_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=multithreaded_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=multithreaded_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
# Run multithreaded program&lt;br /&gt;
./my_multithreaded_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for multithreaded program: [[Media:Hello_openmp.c | Hello_openmp.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* In our configuration each physical core is considered a &amp;quot;CPU&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* On JUSTUS 2 it is recommended to specify a number of cores per task (&#039;--cpus-per-task&#039;) that is either an integer divisor of 24 or (at most) 48.&lt;br /&gt;
&lt;br /&gt;
* Required memory can also be specified per allocated CPU with the &#039;--mem-per-cpu&#039; option.&lt;br /&gt;
&lt;br /&gt;
* The &#039;--mem&#039; and &#039;--mem-per-cpu&#039; options are mutually exclusive.&lt;br /&gt;
&lt;br /&gt;
==  How to submit an array job? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html#OPT_array -a] (or [https://slurm.schedmd.com/sbatch.html#OPT_array --array]) option, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -a 1-16%8 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an array job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of cores per individual array task&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --array=1-16%8&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=array_job&lt;br /&gt;
#SBATCH --output=array_job-%A_%a.out&lt;br /&gt;
#SBATCH --error=array_job-%A_%a.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Print the task id.&lt;br /&gt;
echo &amp;quot;My SLURM_ARRAY_TASK_ID: &amp;quot; $SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
# ./my_program &amp;lt;input.$SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.&lt;br /&gt;
&lt;br /&gt;
* Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the job ID of the array as a whole and is the same for all tasks.&lt;br /&gt;
&lt;br /&gt;
* The remaining options in the sample job script are the same as the options used in other, non-array jobs. In the example above, we are requesting that each array task be allocated 1 CPU (--ntasks=1) and 4 GB of memory (--mem=4G) for up to one hour (--time=01:00:00).&lt;br /&gt;
&lt;br /&gt;
* More information: https://slurm.schedmd.com/job_array.html&lt;br /&gt;
&lt;br /&gt;
== How to delay the start of a job? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html#OPT_begin -b] (or [https://slurm.schedmd.com/sbatch.html#OPT_begin --begin]) option to defer the allocation of the job until the specified time.&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --begin=20:00 ...               # job can start after 8 p.m. &lt;br /&gt;
sbatch --begin=now+1hour ...           # job can start 1 hour after submission&lt;br /&gt;
sbatch --begin=teatime ...             # job can start at teatime (4 p.m.)&lt;br /&gt;
sbatch --begin=2023-12-24T20:00:00 ... # job can start after specified date/time&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit dependency (chain) jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html#OPT_dependency -d] (or [https://slurm.schedmd.com/sbatch.html#OPT_dependency --dependency]) option, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterany:123456 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified job 123456 has terminated.&lt;br /&gt;
&lt;br /&gt;
Slurm supports a number of different dependency types, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-d after:123456      # job can begin execution after the specified job has begun execution&lt;br /&gt;
-d afterany:123456   # job can begin execution after the specified job has finished&lt;br /&gt;
-d afternotok:123456 # job can begin execution after the specified job has failed (exit code not equal zero)&lt;br /&gt;
-d afterok:123456    # job can begin execution after the specified job has successfully finished (exit code zero)&lt;br /&gt;
-d singleton         # job can begin execution after any previously submitted jobs with the same job name and user have finished&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Multiple jobs can be specified by separating their job ids with colon characters (:), e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ sbatch -d afterany:123456:123457 ... &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified jobs 123456 and 123457 have both finished.&lt;br /&gt;
&lt;br /&gt;
== How to deal with invalid job dependencies? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html#OPT_kill-on-invalid-dep --kill-on-invalid-dep=yes] option to automatically terminate jobs that can never run due to invalid dependencies. By default the job stays pending with reason &#039;DependencyNeverSatisfied&#039; to allow review and appropriate action by the user.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; A job dependency may also become invalid if a job has been submitted with &#039;-d afterok:&amp;lt;jobid&amp;gt;&#039; but the specified dependency job has failed, e.g. because it timed out (i.e. exceeded its wall time limit).&lt;br /&gt;
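&lt;br /&gt;
Example (the job id is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterok:123456 --kill-on-invalid-dep=yes ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If job 123456 fails, the submitted job is terminated automatically instead of staying pending forever.&lt;br /&gt;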
&lt;br /&gt;
== How to submit an MPI batch job? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/mpi_guide.html&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an MPI job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate two nodes&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
# Allocate 32 GB memory per node&lt;br /&gt;
#SBATCH --mem=32gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=mpi_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=mpi_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
#&lt;br /&gt;
# Option 1: Launch MPI tasks by using mpirun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/intel&lt;br /&gt;
# module load mpi/impi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# Option 2: Launch MPI tasks by using srun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# srun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_mpi_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for MPI program: [[Media:Hello_mpi.c | Hello_mpi.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently for different MPI implementations and versions. There are reports of &amp;quot;strange behavior&amp;quot; of mpirun, especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.&lt;br /&gt;
* Do not run batch jobs that launch a large number (hundreds or thousands) of short-running (a few minutes or less) MPI programs, e.g. from a shell loop. Every single MPI invocation generates its own job step and sends remote procedure calls to the Slurm controller. This can degrade performance for both Slurm and the application, especially if many such jobs happen to run at the same time. Jobs of that kind can even get stuck without showing any further activity until hitting the wall time limit. For high-throughput computing (e.g. processing a large number of files, with every task running independently of the others and only for a short time), consider a more appropriate parallelization paradigm that invokes independent serial (non-MPI) processes in parallel. This approach is sometimes referred to as a &amp;quot;[https://en.wikipedia.org/wiki/Embarrassingly_parallel pleasingly parallel]&amp;quot; workload. GNU Parallel is a shell tool that facilitates executing serial tasks in parallel. On JUSTUS 2 this tool is available as the software module &amp;quot;system/parallel&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== How to submit a hybrid MPI/OpenMP job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a hybrid job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of nodes to allocate&lt;br /&gt;
#SBATCH --nodes=4&lt;br /&gt;
# Number of MPI instances (ranks) to be executed per node&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
# Number of threads per MPI instance&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
# Allocate 8 GB memory per node&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=hybrid_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=hybrid_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=hybrid_job-%j.err&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
# or:&lt;br /&gt;
# mpirun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for hybrid program: [[Media:Hello_hybrid.c | Hello_hybrid.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* $SLURM_CPUS_PER_TASK is only set if the &#039;--cpus-per-task&#039; option is specified.&lt;br /&gt;
&lt;br /&gt;
== How to request specific node(s) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html#OPT_nodelist -w] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodelist --nodelist]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -w &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also see the [https://slurm.schedmd.com/sbatch.html#OPT_nodefile -F] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodefile --nodefile]) option.&lt;br /&gt;
&lt;br /&gt;
== How to exclude specific nodes from job? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html#OPT_exclude -x] (or [https://slurm.schedmd.com/sbatch.html#OPT_exclude --exclude]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -x &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get exclusive jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--exclusive&#039; option on job submission. This makes sure that no other jobs will be running on your nodes. Very useful for benchmarking!&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The --exclusive option does &#039;&#039;&#039;not&#039;&#039;&#039; mean that you automatically get full access to all resources the node provides; you still have to request them explicitly.&lt;br /&gt;
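&lt;br /&gt;
For example, a benchmark job that requests exclusive node access and still explicitly requests the cores and memory it needs:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --exclusive --nodes=1 --ntasks-per-node=48 --mem=32gb ...&amp;lt;/pre&amp;gt;&lt;br /&gt;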
&lt;br /&gt;
== How to avoid sharing nodes with other users? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--exclusive=user&#039; option on job submission. This still allows multiple jobs of the same user on the nodes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Depending on configuration, exclusive=user may (and probably will) be the default node access policy on JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
== How to submit a batch job without a job script? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_wrap --wrap] option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --nodes=2 --ntasks-per-node=16 --wrap &amp;quot;sleep 600&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; May be useful for testing purposes.&lt;br /&gt;
&lt;br /&gt;
= JOB MONITORING AND CONTROL =&lt;br /&gt;
&lt;br /&gt;
== How to prevent Slurm performance degradation? ==&lt;br /&gt;
&lt;br /&gt;
Almost every invocation of a Slurm client command (e.g. squeue, sacct, sprio or sshare) sends a remote procedure call (RPC) to the Slurm control daemon and/or database. &lt;br /&gt;
If enough remote procedure calls come in at once, this can result in a degradation of performance of the Slurm services for all users, possibly resulting in a denial of service. &lt;br /&gt;
&lt;br /&gt;
Therefore, &#039;&#039;&#039;do not run Slurm client commands that send remote procedure calls from loops in shell scripts or other programs&#039;&#039;&#039; (such as &#039;watch squeue&#039;). Always ensure to limit calls to squeue, sstat, sacct etc. to the minimum necessary for the information you are trying to gather. &lt;br /&gt;
&lt;br /&gt;
Slurm collects RPC counts and timing statistics by message type and user for diagnostic purposes.&lt;br /&gt;
&lt;br /&gt;
== How to view information about submitted jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] command, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue                  # all jobs owned by user (all jobs owned by all users for admins)&lt;br /&gt;
$ squeue --me             # all jobs owned by user (same as squeue for regular users)&lt;br /&gt;
$ squeue -u &amp;lt;username&amp;gt;    # jobs of specific user&lt;br /&gt;
$ squeue -t PENDING       # pending jobs only&lt;br /&gt;
$ squeue -t RUNNING       # running jobs only&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
* The output format of [https://slurm.schedmd.com/squeue.html squeue] (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.&lt;br /&gt;
&lt;br /&gt;
* Every invocation of squeue sends a remote procedure call to the Slurm controller. &#039;&#039;&#039;Do not run squeue or other Slurm client commands from loops in shell scripts or other programs&#039;&#039;&#039; as this can result in a degradation of performance. Ensure that programs limit calls to squeue to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
== How to cancel jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scancel.html scancel] command, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;         # cancel specific job&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;_&amp;lt;index&amp;gt; # cancel indexed job in a job array&lt;br /&gt;
$ scancel -u &amp;lt;username&amp;gt;   # cancel all jobs of specific user&lt;br /&gt;
$ scancel -t PENDING      # cancel pending jobs&lt;br /&gt;
$ scancel -t RUNNING      # cancel running jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to show job script of a running job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scontrol.html scontrol] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; &amp;lt;file&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If the file name is omitted, the default file name will be slurm-&amp;lt;job_id&amp;gt;.sh&lt;br /&gt;
* If the file name is - (i.e. a dash), the job script will be written to stdout.&lt;br /&gt;
&lt;br /&gt;
== How to get estimated start time of a job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ squeue --start&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Estimated start times are dynamic and can change at any moment. Exact start times of individual jobs are usually unpredictable.&lt;br /&gt;
* Slurm will report N/A for the start time estimate if nodes are not currently being reserved by the scheduler for the job to run on.&lt;br /&gt;
&lt;br /&gt;
== How to show remaining walltime of running jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format option &amp;quot;%L&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ squeue -t r -o &amp;quot;%u %i %L&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check priority of jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format options &amp;quot;%Q&amp;quot; and/or &amp;quot;%p&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -o &amp;quot;%8i %8u %15a %.10r %.10L %.5D %.10Q&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sprio.html sprio] command to display the priority components (age/fairshare/...) for each job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sprio&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sshare.html sshare] command for listing the shares of associations, e.g. accounts:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sshare&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent (hold) jobs from being scheduled for execution? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol hold &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to unhold job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol release &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to suspend a running job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol suspend &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume a suspended job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol resume &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to requeue (cancel and resubmit) a particular job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol requeue &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to monitor resource usage of running job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sstat.html sstat] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sstat -e&#039; command shows a list of fields that can be specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sstat --format=JobId,AveCPU,AveRSS,MaxRSS -j &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will show average CPU time, average and maximum memory consumption of all tasks in the running job.&lt;br /&gt;
Ideally, average CPU time equals the number of cores allocated for the job multiplied by the current run time of the job. &lt;br /&gt;
The maximum memory consumption gives an estimate of the peak amount of memory actually needed so far. This can be compared with the amount of memory requested for the job. Over-requesting memory can result in significant waste of compute resources.       &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Users can also ssh into compute nodes that they have one or more running jobs on. Once logged in, they can use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ...&lt;br /&gt;
&lt;br /&gt;
* Users can also attach an interactive shell under an already allocated job by running the following command: &amp;lt;pre&amp;gt;srun --jobid &amp;lt;job&amp;gt; --overlap --pty /bin/bash&amp;lt;/pre&amp;gt; Once logged in, they can again use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ... For a single node job the user does not even need to know which node the job is running on. For a multinode job, the user can use the &#039;-w &amp;lt;node&amp;gt;&#039; option to pick a specific node.&lt;br /&gt;
&lt;br /&gt;
== How to get detailed job information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show job 1234  # For job id 1234&lt;br /&gt;
$ scontrol show jobs      # For all jobs&lt;br /&gt;
$ scontrol -o show jobs   # One line per job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to modify a pending/running job? ==&lt;br /&gt;
&lt;br /&gt;
Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol update JobId=&amp;lt;jobid&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
E.g.: &amp;lt;pre&amp;gt;$ scontrol update JobId=42 TimeLimit=7-0&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will modify the time limit of the job to 7 days.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Update requests for &#039;&#039;&#039;running&#039;&#039;&#039; jobs are mostly restricted to Slurm administrators. In particular, only an administrator can increase the TimeLimit of a job.&lt;br /&gt;
&lt;br /&gt;
== How to show accounting data of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sacct -e&#039; command shows a list of fields that can be&lt;br /&gt;
specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
== How to retrieve job history and accounting? ==&lt;br /&gt;
&lt;br /&gt;
For a specific job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a specific user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The default time window is the current day.&lt;br /&gt;
&lt;br /&gt;
Starting from a specific date:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within a time window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 -E 2020-01-31 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
You can also set the environment variable $SACCT_FORMAT to specify the default format. To get a general idea of how efficiently a job utilized its resources, the following format can be used:  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export SACCT_FORMAT=&amp;quot;JobID,JobName,Elapsed,NCPUs,TotalCPU,CPUTime,ReqMem,MaxRSS,MaxDiskRead,MaxDiskWrite,State,ExitCode&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out how efficiently the CPUs were used, divide TotalCPU by CPUTime. To find out how efficiently memory was used, divide MaxRSS by ReqMem. But be aware that the sacct memory usage measurement doesn&#039;t catch very rapid memory spikes. If your job got killed for running out of memory, it &#039;&#039;&#039;did run out of memory&#039;&#039;&#039; even if sacct reports a lower memory usage than would trigger an out-of-memory kill. A job that reads or writes excessively to disk might be bogged down significantly by I/O operations.&lt;br /&gt;
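&lt;br /&gt;
For example, a job that allocated 48 cores for 10 hours has a CPUTime of 480 core-hours; if sacct reports a TotalCPU of about 240 hours, the job used only about 50% of the CPU resources it blocked.&lt;br /&gt;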
&lt;br /&gt;
== How to get efficiency information of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;pre&amp;gt;$ seff &amp;lt;jobid&amp;gt; &amp;lt;/pre&amp;gt; command for some brief information.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; It is good practice to have a look at the efficiency of your job(s) on completion &#039;&#039;&#039;and we expect you to do so&#039;&#039;&#039;. This way you can improve your job specifications in the future.&lt;br /&gt;
&lt;br /&gt;
== How to get complete field values from sstat and sacct commands? ==&lt;br /&gt;
&lt;br /&gt;
When using the [https://slurm.schedmd.com/sacct.html#OPT_format --format] option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.&lt;br /&gt;
&lt;br /&gt;
E.g. &#039;--format=User%30&#039; will print 30 characters for the user name (right justified).  A %-30 will print 30 characters left justified.&lt;br /&gt;
&lt;br /&gt;
sstat and sacct also provide the &#039;--parsable&#039; and &#039;--parsable2&#039; options to always print full field values, delimited with a pipe &#039;|&#039; character by default.&lt;br /&gt;
The delimiting character can be specified by using the &#039;--delimiter&#039; option, e.g. &#039;--delimiter=&amp;quot;,&amp;quot;&#039; for comma separated values.&lt;br /&gt;
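&lt;br /&gt;
For example, to get comma-separated full field values for a completed job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --parsable2 --delimiter=&amp;quot;,&amp;quot; --format=JobID,State,Elapsed,MaxRSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;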
&lt;br /&gt;
== How to retrieve job records for all jobs running/pending at a certain point in time? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] with [https://slurm.schedmd.com/sacct.html#OPT_state -s &amp;lt;state&amp;gt;] and [https://slurm.schedmd.com/sacct.html#OPT_starttime -S &amp;lt;start time&amp;gt;] options, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$  sacct -n -a -X -S 2021-04-01T00:00:00 -s R -o JobID,User%15,Account%10,NCPUS,NNodes,NodeList%1500&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When specifying the state &amp;quot;-s &amp;lt;state&amp;gt;&amp;quot; &#039;&#039;&#039;and&#039;&#039;&#039; the start time &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;, the default &lt;br /&gt;
time window will be set to end time &amp;quot;-E&amp;quot; equal to start time. Thus, you will get a snapshot of all running/pending &lt;br /&gt;
jobs at the instance given by &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== How to get a parsable list of hostnames from $SLURM_JOB_NODELIST? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show hostnames $SLURM_JOB_NODELIST&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= ADMINISTRATION =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Most commands in this section are restricted to system administrators.&lt;br /&gt;
&lt;br /&gt;
== How to stop Slurm from scheduling jobs? ==&lt;br /&gt;
&lt;br /&gt;
You can stop Slurm from scheduling jobs on a per partition basis by&lt;br /&gt;
setting that partition&#039;s state to DOWN. Set its state UP to resume&lt;br /&gt;
scheduling. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update PartitionName=foo State=DOWN&lt;br /&gt;
$ scontrol update PartitionName=foo State=UP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to print actual hardware configuration of a node? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ slurmd -C   # print hardware configuration plus uptime&lt;br /&gt;
$ slurmd -G   # print generic resource configuration&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to reboot (all) nodes as soon as they become idle? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # specific nodes&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME ALL              # all nodes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to cancel pending reboot of nodes? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol cancel_reboot &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check current node status? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show node &amp;lt;node&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to instruct all Slurm daemons to re-read the configuration file ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reconfigure&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent a user from submitting new jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the following [https://slurm.schedmd.com/sacctmgr.html sacctmgr] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
*Job submission is then rejected with the following message:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch job.slurm&lt;br /&gt;
sbatch: error: AssocMaxSubmitJobLimit&lt;br /&gt;
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user&#039;s size and/or time limits)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Use the following command to release the limit:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=-1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to drain node(s)? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=DRAIN Reason=&amp;quot;Some Reason&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Reason is mandatory.&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; just set state DOWN to drain nodes. This will kill any active jobs that may be running on those nodes.&lt;br /&gt;
&lt;br /&gt;
== How to list reason for nodes being drained or down? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume node state? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=RESUME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create a reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/reservations.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now duration=UNLIMITED flags=maint,ignore_jobs nodes=ALL&lt;br /&gt;
$ scontrol create reservation user=root starttime=2020-12-24T17:00 duration=12:00:00 flags=maint,ignore_jobs nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
$ scontrol show reservation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Add &amp;quot;FLEX&amp;quot; flag to allow jobs that qualify for the reservation to start before the reservation begins (and continue after it starts). &lt;br /&gt;
Add &amp;quot;MAGNETIC&amp;quot; flag to attract jobs that qualify for the reservation to run in that reservation without having requested it at submit time.&lt;br /&gt;
&lt;br /&gt;
== How to create a floating reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Use the flag &amp;quot;TIME_FLOAT&amp;quot; and a start time that is relative to the current time (use the keyword &amp;quot;now&amp;quot;).&lt;br /&gt;
In the example below, the nodes are prevented from starting any jobs exceeding a walltime of 2 days.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now+2days duration=UNLIMITED flags=maint,ignore_jobs,time_float nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Floating reservations are not intended for running jobs, but to prevent long running jobs from being initiated on specific nodes. Attempts by users to make use of a floating reservation will be rejected. When ready to perform the maintenance, place the nodes in DRAIN state and delete the reservation.&lt;br /&gt;
&lt;br /&gt;
== How to use a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --reservation=foo_6 ... script.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol delete ReservationName=foo_6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;mdiag -n&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fields can be individually customized. See [https://slurm.schedmd.com/sinfo.html sinfo] man page. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N --format=&amp;quot;%8N %12P %.4C %.8O %.6m %.6e %.8T %.20E&amp;quot;&lt;br /&gt;
&lt;br /&gt;
NODELIST PARTITION    CPUS CPU_LOAD MEMORY FREE_M    STATE               REASON&lt;br /&gt;
n0001    standard*    0/16     0.01 128000 120445     idle                 none&lt;br /&gt;
n0002    standard*    0/16     0.01 128000 120438     idle                 none&lt;br /&gt;
n0003    standard*    0/0/      N/A 128000    N/A    down*       Not responding&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;pbsnodes&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show nodes                     # One paragraph per node (all nodes)&lt;br /&gt;
$ scontrol show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;     # One paragraph per node (specified nodes) &lt;br /&gt;
$ scontrol -o show nodes                  # One line per node (all nodes)&lt;br /&gt;
$ scontrol -o show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # One line per node (specified nodes)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to update multiple jobs of a user with a single scontrol command? ==&lt;br /&gt;
&lt;br /&gt;
Not possible. But you can, for example, use squeue to build a script, taking&lt;br /&gt;
advantage of its filtering and formatting options.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -tpd -h -o &amp;quot;scontrol update jobid=%i priority=1000&amp;quot; &amp;gt;my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
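&lt;br /&gt;
Review the generated script and then execute it, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ bash my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;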
&lt;br /&gt;
You can also specify several job IDs at once in the JobID option, either as a comma-separated list or as ranges, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobID=123 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=123,456,789 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=[123-400],[500-600] qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option is to use the JobName, if all the jobs have the same name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobName=&amp;quot;foobar&amp;quot; UserID=johndoe qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, Slurm does not allow the UserID filter alone.&lt;br /&gt;
&lt;br /&gt;
== How to create a new account? ==&lt;br /&gt;
&lt;br /&gt;
Add account at top level in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; Cluster=justus Description=&amp;quot;Account description&amp;quot; Organization=&amp;quot;none&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Add account as child of some parent account in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; parent=&amp;lt;parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to move account to another parent? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set parent=&amp;lt;new_parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr delete account name=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add a new user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; DefaultAccount=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add/remove users from an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname&amp;gt;                  # Add user to account&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname2&amp;gt;                 # Add user to a second account&lt;br /&gt;
$ sacctmgr remove user &amp;lt;username&amp;gt; where account=&amp;lt;accountname&amp;gt;         # Remove user from this account&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to change default account of a user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;  &lt;br /&gt;
$ sacctmgr modify user where user=&amp;lt;username&amp;gt; set DefaultAccount=&amp;lt;default_account&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The user must already be associated with the account you want to set as default.&lt;br /&gt;
&lt;br /&gt;
== How to show account information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show assoc&lt;br /&gt;
$ sacctmgr show assoc tree&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to implement user resource throttling policies? ==&lt;br /&gt;
&lt;br /&gt;
Quoting from https://bugs.schedmd.com/show_bug.cgi?id=3600#c4&lt;br /&gt;
&lt;br /&gt;
 With Slurm, the associations are meant to establish base limits on the&lt;br /&gt;
 defined partitions, accounts and users. Because limits propagate down&lt;br /&gt;
 through the association tree, you only need to define limits at a high&lt;br /&gt;
 level and those limits will be applied to all partitions, accounts and&lt;br /&gt;
 users that are below it (parent to child). You can also override those&lt;br /&gt;
 high level (parent) limits by explicitly setting different limits at&lt;br /&gt;
 any lower level (on the child). So using the association tree is the&lt;br /&gt;
 best way to get some base limits applied that you want for most cases. &lt;br /&gt;
 QOS&#039;s are meant to override any of those base limits for exceptional&lt;br /&gt;
 cases. Like Maui, you can use QOS&#039;s to set a different priority.&lt;br /&gt;
 Again, the QOS would be overriding the base priority that could be set&lt;br /&gt;
 in the associations.&lt;br /&gt;
&lt;br /&gt;
== How to set a resource limit for an individual user? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/resource_limits.html&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=1            # Limit maximum number of running jobs for user&lt;br /&gt;
$ sacctmgr list assoc user=&amp;lt;username&amp;gt; format=user,maxjobs  # Show that limit&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=-1           # Remove that limit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to retrieve historical resource usage for a specific user or account? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sreport.html sreport] command.&lt;br /&gt;
&lt;br /&gt;
Examples: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours user=&amp;lt;username&amp;gt;    # Report cluster utilization of given user broken down by accounts&lt;br /&gt;
$ sreport cluster AccountUtilizationByUser Start=2021-01-01 End=2021-12-31 -t Hours account=&amp;lt;account&amp;gt;  # Report cluster utilization of given account broken down by users    &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* By default CPU resources will be reported. Use &#039;-T&#039; option for other trackable resources, e.g. &#039;-T cpu,mem,gres/gpu,gres/scratch&#039;.&lt;br /&gt;
* On JUSTUS 2 registered compute projects (&amp;quot;Rechenvorhaben&amp;quot;) are uniquely mapped to Slurm accounts of the same name. Thus, &#039;AccountUtilizationByUser&#039; can also be used to report the aggregated cluster utilization of compute projects.&lt;br /&gt;
* Can be executed by regular users as well in which case Slurm will only report their own usage records (but along with the total usage of the associated account in the case of &#039;AccountUtilizationByUser&#039;).&lt;br /&gt;
&lt;br /&gt;
== How to fix/reset a user&#039;s RawUsage value? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; where Account=&amp;lt;account&amp;gt; set RawUsage=&amp;lt;number&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create/modify/delete QOSes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/qos.html&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show qos                                      # Show existing QOSes&lt;br /&gt;
$ sacctmgr add qos verylong                              # Create new QOS verylong&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxWall=28-00:00:00   # Set maximum walltime limit&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxTRESPerUser=cpu=4  # Set maximum number of CPUs a user can allocate at a given time&lt;br /&gt;
$ sacctmgr modify qos verylong set flags=denyonlimit     # Prevent submission if job requests exceed any limits of QOS&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos+=verylong      # Add a QOS to a user account&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos-=verylong      # Remove a QOS from a user account&lt;br /&gt;
$ sacctmgr delete qos verylong                           # Delete that QOS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to find (and fix) runaway jobs? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr show runaway&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Runaway jobs are orphaned jobs that don&#039;t exist in the Slurm controller but have a start time and no end time in the Slurm database. Runaway jobs mess with accounting and affect new jobs of users who have too many runaway jobs. &lt;br /&gt;
* If there are jobs in this state this command will also provide an option to fix them. This will set the end time for each job to the latest out of the start, eligible, or submit times, and set the state to completed.&lt;br /&gt;
&lt;br /&gt;
== How to show a history of database transactions? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr list transactions&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Useful to get timestamps for when a user/account/qos has been created/modified/removed etc.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Data_Transfer/Graphical_Clients&amp;diff=15224</id>
		<title>Data Transfer/Graphical Clients</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Data_Transfer/Graphical_Clients&amp;diff=15224"/>
		<updated>2025-08-19T15:55:38Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* Usage */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Graphical Clients are an alternative to command line tools. They provide a graphical user interface and usually offer access to multiple network protocols. They can be used for data transfer as well as access via ssh.&lt;br /&gt;
&lt;br /&gt;
== MobaXterm ==&lt;br /&gt;
&lt;br /&gt;
[https://mobaxterm.mobatek.net/ MobaXterm] is a graphical user interface for Windows. It is the recommended client for Windows as it provides a full-featured solution for Windows users to comfortably interact with remote systems. It enables the usage of basic Linux/Unix commands on Windows computers, for example ssh. Furthermore, it provides a graphical SFTP browser upon login to a remote system.&lt;br /&gt;
&lt;br /&gt;
=== Usage ===&lt;br /&gt;
&lt;br /&gt;
After MobaXterm is downloaded and installed, you can connect by providing the hostname and your credentials. To do so, navigate to the &amp;amp;quot;Session&amp;amp;quot; option located in the top-left corner.&lt;br /&gt;
&lt;br /&gt;
[[File:mobaXterm_connect.png|center|center|x600px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When using file transfer with MobaXterm version &amp;amp;gt;=23.3, the following configuration change has to be made: In the settings in the tab &amp;amp;quot;SSH&amp;amp;quot;, change the option &amp;amp;quot;SSH engine&amp;amp;quot; from &amp;amp;quot;&amp;lt;new&amp;gt;&amp;amp;quot; to &amp;amp;quot;&amp;lt;legacy&amp;gt;&amp;amp;quot;. Then restart MobaXterm.&lt;br /&gt;
&lt;br /&gt;
To get the file browser, you might need to specify the SSH-browser type &amp;amp;quot;SCP&amp;amp;quot;:&amp;lt;br /&amp;gt;&lt;br /&gt;
[[File:mobaXterm_connect_advanced.png|center|x400px]]&lt;br /&gt;
&lt;br /&gt;
Files can be transferred between the local system and the cluster by navigating to the respective folders in the split file view and then either dragging files and folders between the views or by clicking on a file/folder with the right mouse button and then selecting &amp;amp;quot;Upload&amp;amp;quot; or &amp;amp;quot;Download&amp;amp;quot; from the menu. When copying directories and files from Windows to a Linux system, always check the access rights!&amp;lt;br /&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Make sure that your login scripts on the cluster, such as ~/.bashrc, do not print any extra messages unconditionally on the terminal as that will interfere with the underlying SCP protocol and cause the file transfer to fail.   &lt;br /&gt;
&lt;br /&gt;
[[File:mobaXterm_dragAndDrop.png|center|x500px]]&lt;br /&gt;
&lt;br /&gt;
Alternatively, you can connect via sftp.&lt;br /&gt;
&lt;br /&gt;
[[File:mobaXterm_sftp.png|center|x400px]]&lt;br /&gt;
&lt;br /&gt;
== FileZilla ==&lt;br /&gt;
&lt;br /&gt;
FileZilla is a graphical user interface for data transfer, available for all major operating systems. It uses the SFTP network protocol.&lt;br /&gt;
&lt;br /&gt;
=== Usage ===&lt;br /&gt;
&lt;br /&gt;
Start FileZilla and select &amp;amp;quot;File -&amp;amp;gt; Site Manager...&amp;amp;quot; from the main menu.&lt;br /&gt;
&lt;br /&gt;
[[File:filezilla_siteManager.png|center|x600px]]&lt;br /&gt;
&lt;br /&gt;
Select &amp;amp;quot;New site&amp;amp;quot; and set up a new connection with the following settings:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;Protocol: SFTP - SSH File Transfer Protocol  &lt;br /&gt;
Host: &amp;amp;lt;hostname&amp;amp;gt;  &lt;br /&gt;
Logon Type: Interactive  &lt;br /&gt;
User: &amp;amp;lt;username&amp;amp;gt;  &amp;lt;/pre&amp;gt;&lt;br /&gt;
See [https://wiki.bwhpc.de/e/Registration/Login/Username username] for the correct form of your username.&lt;br /&gt;
[[File:filezilla_newSite.png|center|x400px]]&lt;br /&gt;
&lt;br /&gt;
After providing the service password, you are connected. The left side of the interface displays your local machine, while the right side shows the server you have connected to.&lt;br /&gt;
&lt;br /&gt;
[[File:filezilla_connected.png|center|x500px]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; By default Filezilla will close the connection after 20 seconds of inactivity. In order to increase or disable this timeout, select &amp;amp;quot;Edit -&amp;amp;gt; Settings ... -&amp;amp;gt; Connections&amp;amp;quot; and increase &amp;amp;quot;Timeout in seconds&amp;amp;quot; to a reasonable value or set to 0 to disable connection timeout.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== WinSCP ==&lt;br /&gt;
&lt;br /&gt;
[https://winscp.net/eng/download.php WinSCP] is a graphical user interface for Windows. It can be used to transfer files between your PC and a cluster. It implements the &#039;&#039;scp&#039;&#039; command known from Linux and Mac. It supports SFTP and WebDAV as well.&lt;br /&gt;
&lt;br /&gt;
=== Usage ===&lt;br /&gt;
&lt;br /&gt;
After WinSCP is downloaded and installed, you can connect by providing the hostname and your credentials. To do so, click on &amp;amp;quot;New Tab&amp;amp;quot; to create a &amp;amp;quot;New Site&amp;amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[File:winSCP_newSite.png|center|x500px|x20px]]&lt;br /&gt;
&lt;br /&gt;
[[File:winSCP_connect.png|center|x400px]]&lt;br /&gt;
&lt;br /&gt;
You can transfer the files via drag &amp;amp;amp; drop.&amp;lt;br /&amp;gt;&lt;br /&gt;
[[File:winSCP_dragAndDrop.png|center|x400px]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Workspace&amp;diff=14944</id>
		<title>Workspace</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Workspace&amp;diff=14944"/>
		<updated>2025-06-10T04:11:26Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Workspace tools&#039;&#039;&#039; provide temporary scratch space, so-called &#039;&#039;&#039;workspaces&#039;&#039;&#039;, for your calculations on a central file storage. They are meant to keep data for a limited time – but usually longer than the time of a single job run. &lt;br /&gt;
&lt;br /&gt;
== No Backup ==&lt;br /&gt;
&lt;br /&gt;
Workspaces are not meant for permanent storage, hence data in workspaces is not backed up and may be lost in case of problems on the storage system. Please copy/move important results to $HOME or to storage outside the cluster.&lt;br /&gt;
&lt;br /&gt;
== Create workspace ==&lt;br /&gt;
To create a workspace you need to state the &#039;&#039;name&#039;&#039; of your workspace and its &#039;&#039;lifetime&#039;&#039; in days. A maximum value for the &#039;&#039;lifetime&#039;&#039; and a maximum number of renewals are defined on each cluster. Execution of:&lt;br /&gt;
&lt;br /&gt;
   $ ws_allocate blah 30&lt;br /&gt;
&lt;br /&gt;
e.g. returns:&lt;br /&gt;
 &lt;br /&gt;
   Workspace created. Duration is 720 hours. &lt;br /&gt;
   Further extensions available: 3&lt;br /&gt;
   /work/workspace/scratch/username-blah-0&lt;br /&gt;
&lt;br /&gt;
For more information read the program&#039;s help, i.e. &#039;&#039;$ ws_allocate -h&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== List all your workspaces ==&lt;br /&gt;
To list all your workspaces, execute:&lt;br /&gt;
&lt;br /&gt;
   $ ws_list&lt;br /&gt;
&lt;br /&gt;
which will return:&lt;br /&gt;
* Workspace ID&lt;br /&gt;
* Workspace location&lt;br /&gt;
* available extensions&lt;br /&gt;
* creation date and remaining time&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Find workspace location ==&lt;br /&gt;
&lt;br /&gt;
The workspace location/path can be queried for any workspace &#039;&#039;ID&#039;&#039; using &#039;&#039;&#039;ws_find&#039;&#039;&#039;, in the case of workspace &#039;&#039;blah&#039;&#039;:&lt;br /&gt;
&lt;br /&gt;
   $ ws_find blah&lt;br /&gt;
&lt;br /&gt;
returns the one-liner:&lt;br /&gt;
&lt;br /&gt;
   /work/workspace/scratch/username-blah-0&lt;br /&gt;
&lt;br /&gt;
 &lt;br /&gt;
&lt;br /&gt;
== Extend lifetime of your workspace ==&lt;br /&gt;
&lt;br /&gt;
Any workspace&#039;s lifetime can only be extended a cluster-specific number of times. There are several commands to extend a workspace&#039;s lifetime:&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_extend blah 40&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;blah&#039;&#039; by &#039;&#039;40&#039;&#039; days from now,&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_extend blah&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;blah&#039;&#039; by the number of days used previously,&lt;br /&gt;
#&amp;lt;pre&amp;gt;$ ws_allocate -x blah 40&amp;lt;/pre&amp;gt; which extends workspace ID &#039;&#039;blah&#039;&#039; by &#039;&#039;40&#039;&#039; days from now.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Setting Permissions for Sharing Files ==&lt;br /&gt;
The examples below assume that the directory whose permissions you want to change is stored in $DIR. If you want to share a workspace, DIR could be set with &amp;lt;code&amp;gt;DIR=$(ws_find my_workspace)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Regular Unix Permissions ===&lt;br /&gt;
&lt;br /&gt;
Making workspaces world readable/writable using standard unix access rights with &amp;lt;tt&amp;gt;chmod&amp;lt;/tt&amp;gt; is only feasible if you are in a research group and you and your co-workers share a common  (&amp;quot;bwXXXXX&amp;quot;) unix group. It is strongly discouraged to make files readable or even writable to everyone or to large common groups. &lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:45%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:55%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;chgrp -R bw16e001 &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
&amp;lt;tt&amp;gt;chmod -R g+rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Set group ownership and grant read access to group for files in workspace via unix rights to the group &amp;quot;bw16e001&amp;quot; (has to be re-done if files are added)&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;chgrp -R bw16e001 &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt; &lt;br /&gt;
&amp;lt;tt&amp;gt;chmod -R g+rswX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Set group ownership and grant read/write access to group for files in workspace via unix rights (has to be re-done if files are added). Group will be inherited by new files, but rights for the group will have to be re-set with chmod for every new file&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Options used:&lt;br /&gt;
* -R: recursive&lt;br /&gt;
* g+rwx&lt;br /&gt;
** g: group&lt;br /&gt;
** + add permissions (- to remove)&lt;br /&gt;
** rwx: read, write, execute&lt;br /&gt;
&lt;br /&gt;
=== &amp;quot;ACL&amp;quot;s: Access Control Lists ===&lt;br /&gt;
ACLs allow a much more fine-grained distribution of permissions but are a bit more complicated and not visible in detail via &amp;quot;ls&amp;quot;. They have the additional advantage that you can set a &amp;quot;default&amp;quot; ACL for a directory (with a &amp;lt;tt&amp;gt;-d&amp;lt;/tt&amp;gt; flag or a &amp;lt;tt&amp;gt;d:&amp;lt;/tt&amp;gt; prefix), which will cause all newly created files to inherit the ACLs from the directory. Regular unix permissions only have limited support for this (only group ownership, not access rights) via the setgid bit.&lt;br /&gt;
&lt;br /&gt;
Best practices with respect to ACL usage:&lt;br /&gt;
# Take into account that ACLs take precedence over standard unix access rights&lt;br /&gt;
# The owner of a workspace is responsible for its content and management&lt;br /&gt;
&lt;br /&gt;
Please note that &amp;lt;tt&amp;gt;ls&amp;lt;/tt&amp;gt; (list directory contents) indicates ACLs on directories and files only when run as &amp;lt;tt&amp;gt;ls -l&amp;lt;/tt&amp;gt; (long format), as a &amp;quot;plus&amp;quot; sign after the standard unix access rights. &lt;br /&gt;
&lt;br /&gt;
Examples with regard to &amp;quot;my_workspace&amp;quot;:&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:45%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:55%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;getfacl &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|List access rights on $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm u:fr_xy1:rX,d:u:fr_xy1:rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant user &amp;quot;fr_xy1&amp;quot; read-only access to $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm u:fr_me0000:rwX,d:u:fr_me0000:rwX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
&amp;lt;tt&amp;gt;setfacl -Rm u:fr_xy1:rwX,d:u:fr_xy1:rwX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant your own user &amp;quot;fr_me0000&amp;quot; and &amp;quot;fr_xy1&amp;quot; inheritable read and write access to $DIR, so you can also read/write files put into the workspace by a coworker&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rm g:bw16e001:rX,d:g:bw16e001:rX &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Grant group (Rechenvorhaben) &amp;quot;bw16e001&amp;quot; read-only access to $DIR&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;setfacl -Rb &amp;quot;$DIR&amp;quot;&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Remove all ACL rights. Standard Unix access rights apply again.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Options used:&lt;br /&gt;
* -R: recursive&lt;br /&gt;
* -m: modify&lt;br /&gt;
* u:username:rwX: u means the following name is a user; rwX means read, write, eXecute (capital X sets execute only for directories and files that already have execute permission)&lt;br /&gt;
&lt;br /&gt;
== Delete a Workspace ==&lt;br /&gt;
&lt;br /&gt;
   $ ws_release blah # Manually erase your workspace blah&lt;br /&gt;
&lt;br /&gt;
Note: workspaces are kept for some time after release. To immediately delete the data and free space, e.g. for quota reasons, delete the files with rm before releasing the workspace.&lt;br /&gt;
&lt;br /&gt;
Newer versions of the workspace tools have a --delete-data flag that immediately deletes the data, as shown below. Note that data deleted from workspaces is permanently lost.&lt;br /&gt;
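&lt;br /&gt;
With these newer tools, releasing a workspace and deleting its data in one step might look like this (a sketch; check &#039;&#039;$ ws_release -h&#039;&#039; for the exact syntax on your system):&lt;br /&gt;
&lt;br /&gt;
   $ ws_release --delete-data blah&lt;br /&gt;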
&lt;br /&gt;
== Restore an Expired Workspace ==&lt;br /&gt;
&lt;br /&gt;
For a certain (system-specific) grace time following workspace expiration, a workspace can be restored by performing the following steps:&lt;br /&gt;
&lt;br /&gt;
(1) Display restorable workspaces.&lt;br /&gt;
 ws_restore -l&lt;br /&gt;
&lt;br /&gt;
(2) Create a new workspace as the target for the restore:&lt;br /&gt;
 ws_allocate restored 60&lt;br /&gt;
&lt;br /&gt;
(3) Restore:&lt;br /&gt;
 ws_restore &amp;lt;full_name_of_expired_workspace&amp;gt; restored&lt;br /&gt;
&lt;br /&gt;
The expired workspace has to be specified using the &#039;&#039;&#039;full name&#039;&#039;&#039;, including username prefix and timestamp suffix (otherwise, it cannot be uniquely identified).&lt;br /&gt;
The target workspace, on the other hand, must be given with just its short name as listed by &amp;lt;code&amp;gt;ws_list&amp;lt;/code&amp;gt;, without the username prefix.&lt;br /&gt;
&lt;br /&gt;
If the workspace is no longer visible/restorable, it has been &#039;&#039;&#039;permanently deleted&#039;&#039;&#039; and cannot be restored, not even by us. Please always remember that workspaces are intended solely for temporary work data, and there is no backup of data in the workspaces.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=14346</id>
		<title>BwForCluster JUSTUS 2 Slurm HOWTO</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=14346"/>
		<updated>2025-03-13T08:02:31Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* How to convert Moab batch job scripts to Slurm? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
This is a collection of howtos and convenient Slurm commands for JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
Some commands behave slightly differently depending on whether they are executed &lt;br /&gt;
by a system administrator or by a regular user, as Slurm prevents regular users from accessing critical system information and viewing job and usage information of other users.  &lt;br /&gt;
&lt;br /&gt;
= GENERAL INFORMATION =&lt;br /&gt;
&lt;br /&gt;
== How to find a general quick start user guide? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/quickstart.html&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm FAQ? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/faq.html&lt;br /&gt;
&lt;br /&gt;
== How to find a Slurm cheat sheet? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm tutorials? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
== How to get more information on Slurm? ==&lt;br /&gt;
&lt;br /&gt;
(Almost) every Slurm command has a man page. Use it.&lt;br /&gt;
&lt;br /&gt;
Online versions: https://slurm.schedmd.com/man_index.html&lt;br /&gt;
&lt;br /&gt;
== How to find hardware specific details about JUSTUS 2? ==&lt;br /&gt;
&lt;br /&gt;
See our Wiki page: [[Hardware and Architecture (bwForCluster JUSTUS 2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
= JOB SUBMISSION =&lt;br /&gt;
&lt;br /&gt;
== How to submit a serial batch job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html sbatch]  command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample job script template for serial job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=serial_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=serial_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=serial_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Run serial program&lt;br /&gt;
./my_serial_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for serial program: [[Media:Hello_serial.c | Hello_serial.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* --nodes=1 and --ntasks-per-node=1 may be replaced by --ntasks=1.&lt;br /&gt;
* If not specified, stdout and stderr are both written to slurm-%j.out.&lt;br /&gt;
&lt;br /&gt;
== How to find working sample scripts for my program? ==&lt;br /&gt;
&lt;br /&gt;
Most software modules for applications provide working sample batch scripts.&lt;br /&gt;
Check with [[Software_Modules_Lmod#Module_specific_help | module help]] command, e.g. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module help chem/vasp     # display module help for VASP&lt;br /&gt;
$ module help math/matlab   # display module help for Matlab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to harden job scripts against common errors? ==&lt;br /&gt;
&lt;br /&gt;
The bash shell provides several options that support users in disclosing hidden bugs and writing safer job scripts.&lt;br /&gt;
In order to activate these safeguard settings users can insert the following lines in their scripts (after all #SBATCH directives):    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
set -o errexit   # (or set -e) causes the batch script to exit immediately when a command fails&lt;br /&gt;
set -o pipefail  # causes the batch script to exit immediately also when the failing command is embedded in a pipeline&lt;br /&gt;
set -o nounset   # (or set -u) causes the script to treat unset variables as an error and exit immediately&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/salloc.html salloc] command, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ salloc --nodes=1 --ntasks-per-node=8&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
In Slurm versions prior to 20.11, the use of [https://slurm.schedmd.com/srun.html srun] was the recommended way for launching interactive jobs, e.g.:   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ srun --nodes=1 --ntasks-per-node=8 --pty bash &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although this still works, it is considered &#039;&#039;&#039;deprecated&#039;&#039;&#039; for current Slurm versions, as it may cause issues when launching additional job steps from within the interactive job environment. Use the [https://slurm.schedmd.com/salloc.html salloc] command instead.&lt;br /&gt;
&lt;br /&gt;
== How to enable X11 forwarding for an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--x11&#039; flag, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --nodes=1 --ntasks-per-node=8 --x11     # run shell with X11 forwarding enabled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
* For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:&lt;br /&gt;
 &amp;lt;pre&amp;gt;local&amp;gt; ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to convert Moab batch job scripts to Slurm? ==&lt;br /&gt;
&lt;br /&gt;
Replace Moab/Torque job specification flags and environment variables in your job&lt;br /&gt;
scripts by their corresponding Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab job specification flags and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Option !! Moab (msub) !! Slurm (sbatch)&lt;br /&gt;
|-&lt;br /&gt;
| Script directive                            || #MSUB                                  || #SBATCH&lt;br /&gt;
|-&lt;br /&gt;
| Job name                                    || -N &amp;lt;name&amp;gt;                              || --job-name=&amp;lt;name&amp;gt;  (-J &amp;lt;name&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Account                                     || -A &amp;lt;account&amp;gt;                           || --account=&amp;lt;account&amp;gt; (-A &amp;lt;account&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Queue                                       || -q &amp;lt;queue&amp;gt;                             || --partition=&amp;lt;partition&amp;gt; (-p &amp;lt;partition&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Wall time limit                             || -l walltime=&amp;lt;hh:mm:ss&amp;gt;                 || --time=&amp;lt;hh:mm:ss&amp;gt; (-t &amp;lt;hh:mm:ss&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node count                                  || -l nodes=&amp;lt;count&amp;gt;                       || --nodes=&amp;lt;count&amp;gt; (-N &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Core count                                  || -l procs=&amp;lt;count&amp;gt;                       || --ntasks=&amp;lt;count&amp;gt; (-n &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Process count per node                      || -l ppn=&amp;lt;count&amp;gt;                         || --ntasks-per-node=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Core count per process                      ||                                        || --cpus-per-task=&amp;lt;count&amp;gt; (-c &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per node                       || -l mem=&amp;lt;limit&amp;gt;                         || --mem=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per process                    || -l pmem=&amp;lt;limit&amp;gt;                        || --mem-per-cpu=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Job array                                   || -t &amp;lt;array indices&amp;gt;                     || --array=&amp;lt;indices&amp;gt; (-a &amp;lt;indices&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node exclusive job                          || -l naccesspolicy=singlejob             || --exclusive&lt;br /&gt;
|-&lt;br /&gt;
| Initial working directory                   || -d &amp;lt;directory&amp;gt; (default: $HOME)        || --chdir=&amp;lt;directory&amp;gt; (-D &amp;lt;directory&amp;gt;) (default: submission directory)&lt;br /&gt;
|-&lt;br /&gt;
| Standard output file                        || -o &amp;lt;file path&amp;gt;                         || --output=&amp;lt;file&amp;gt; (-o &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Standard error file                         || -e &amp;lt;file path&amp;gt;                         || --error=&amp;lt;file&amp;gt;  (-e &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Combine stdout/stderr to stdout             || -j oe                                  || --output=&amp;lt;combined stdout/stderr file&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Mail notification events                    || -m &amp;lt;event&amp;gt;                             || --mail-type=&amp;lt;events&amp;gt; (valid types include: NONE, BEGIN, END, FAIL, ALL)&lt;br /&gt;
|-&lt;br /&gt;
| Export environment to job                   || -V                                     || --export=ALL (default)&lt;br /&gt;
|-&lt;br /&gt;
| Don&#039;t export environment to job             || (default)                              || --export=NONE&lt;br /&gt;
|-&lt;br /&gt;
| Export environment variables to job         || -v &amp;lt;var[=value][,var2=value2[, ...]]&amp;gt;  || --export=&amp;lt;var[=value][,var2=value2[,...]]&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.&lt;br /&gt;
* By default, Moab does not export any environment variables to the job&#039;s runtime environment. With Slurm, most of the login environment variables are exported to your job&#039;s runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also the $HOSTNAME variable).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab/Torque script environment variables and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Information                 !! Moab                !! Torque               !! Slurm                                     &lt;br /&gt;
|-&lt;br /&gt;
| Job name                     || $MOAB_JOBNAME        || $PBS_JOBNAME        || $SLURM_JOB_NAME                           &lt;br /&gt;
|-&lt;br /&gt;
| Job ID                       || $MOAB_JOBID          || $PBS_JOBID          || $SLURM_JOB_ID                             &lt;br /&gt;
|-&lt;br /&gt;
| Submit directory             || $MOAB_SUBMITDIR      || $PBS_O_WORKDIR      || $SLURM_SUBMIT_DIR                         &lt;br /&gt;
|-&lt;br /&gt;
| Number of nodes allocated    || $MOAB_NODECOUNT      || $PBS_NUM_NODES      || $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) &lt;br /&gt;
|-&lt;br /&gt;
| Node list                    || $MOAB_NODELIST       || cat $PBS_NODEFILE   || $SLURM_JOB_NODELIST                       &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes          || $MOAB_PROCCOUNT      || $PBS_TASKNUM        || $SLURM_NTASKS                             &lt;br /&gt;
|-&lt;br /&gt;
| Requested tasks per node     || ---                    || $PBS_NUM_PPN        || $SLURM_NTASKS_PER_NODE                    &lt;br /&gt;
|-&lt;br /&gt;
| Requested CPUs per task      || ---                  || ---                 || $SLURM_CPUS_PER_TASK                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array index              || $MOAB_JOBARRAYINDEX  || $PBS_ARRAY_INDEX    || $SLURM_ARRAY_TASK_ID                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array range              || $MOAB_JOBARRAYRANGE  || ---                 || $SLURM_ARRAY_TASK_COUNT                   &lt;br /&gt;
|-&lt;br /&gt;
| Queue name                   || $MOAB_CLASS          || $PBS_QUEUE          || $SLURM_JOB_PARTITION                      &lt;br /&gt;
|-&lt;br /&gt;
| QOS name                     || $MOAB_QOS            || ---                 || $SLURM_JOB_QOS                            &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes per node || ---                  || $PBS_NUM_PPN        || $SLURM_TASKS_PER_NODE                     &lt;br /&gt;
|-&lt;br /&gt;
| Job user                     || $MOAB_USER           || $PBS_O_LOGNAME      || $SLURM_JOB_USER                           &lt;br /&gt;
|-&lt;br /&gt;
| Hostname                     || $MOAB_MACHINE        || $PBS_O_HOST         || $SLURMD_NODENAME                          &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See [https://slurm.schedmd.com/sbatch.html sbatch] man page for a complete list of flags and environment variables.&lt;br /&gt;
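&lt;br /&gt;
For illustration, a minimal conversion sketch based on the tables above (job name and resource values are hypothetical):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Moab (before):&lt;br /&gt;
#MSUB -N myjob&lt;br /&gt;
#MSUB -l nodes=1:ppn=8&lt;br /&gt;
#MSUB -l walltime=01:00:00&lt;br /&gt;
#MSUB -l mem=4gb&lt;br /&gt;
&lt;br /&gt;
# Slurm (after):&lt;br /&gt;
#SBATCH --job-name=myjob&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=8&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --mem=4gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;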
&lt;br /&gt;
== How to emulate Moab output file names? ==&lt;br /&gt;
&lt;br /&gt;
Use the following directives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --output=&amp;quot;%x.o%j&amp;quot;&lt;br /&gt;
#SBATCH --error=&amp;quot;%x.e%j&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to pass command line arguments to the job script? ==&lt;br /&gt;
&lt;br /&gt;
Run &amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; arg1 arg2 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inside the job script the arguments can be accessed as $1, $2, ...&lt;br /&gt;
&lt;br /&gt;
E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
infile=&amp;quot;$1&amp;quot;&lt;br /&gt;
outfile=&amp;quot;$2&amp;quot;&lt;br /&gt;
./my_serial_program &amp;lt; &amp;quot;$infile&amp;quot; &amp;gt; &amp;quot;$outfile&amp;quot; 2&amp;gt;&amp;amp;1&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; use $1, $2, ... in &amp;quot;#SBATCH&amp;quot; lines. These parameters can be used only within the regular shell script.&lt;br /&gt;
&lt;br /&gt;
== How to request local scratch (SSD/NVMe) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=scratch:nnn&#039; option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job. A minimal job script sketch follows the notes below.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=scratch:100&#039; will allocate 100 GB scratch space on a locally attached NVMe device.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; add any unit (such as --gres=scratch:100G). This would be treated as requesting an amount of 10^9 * 100GB of scratch space.&lt;br /&gt;
&lt;br /&gt;
* Multinode jobs get nnn GB of local scratch space on every node of the job.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$SCRATCH&#039;&#039;&#039; will point to &lt;br /&gt;
** /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when local scratch has been requested. This will be on locally attached SSD/NVMe devices.&lt;br /&gt;
** /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when no local scratch has been requested. This will be in memory and, thus, be limited in size.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039; always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;. This will always be in memory and, thus, limited in size.&lt;br /&gt;
&lt;br /&gt;
* For backward compatibility environment variable $RAMDISK always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Scratch space allocation in /scratch will be enforced by quota limits&lt;br /&gt;
&lt;br /&gt;
* Data written to $TMPDIR will always count against allocated memory.&lt;br /&gt;
&lt;br /&gt;
* Data written to local scratch space will automatically be removed at the end of the job.&lt;br /&gt;
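&lt;br /&gt;
A minimal job script sketch using local scratch (program name and sizes are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Request 50 GB of node-local scratch space (no unit!)&lt;br /&gt;
#SBATCH --gres=scratch:50&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
&lt;br /&gt;
# $SCRATCH points to /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; on the locally attached SSD/NVMe device&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run your program here (placeholder); remember that data in $SCRATCH&lt;br /&gt;
# is removed automatically at the end of the job.&lt;br /&gt;
# ./my_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;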
&lt;br /&gt;
== How to request GPGPU nodes at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=gpu:&amp;lt;count&amp;gt;&#039; option to allocate 1 or 2 GPUs per node for the entire job. A minimal job script sketch follows the notes below.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=gpu:1&#039; will allocate one GPU per node for this job.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* GPGPU nodes are equipped with two Nvidia V100S cards &lt;br /&gt;
&lt;br /&gt;
* Environment variables $CUDA_VISIBLE_DEVICES, $SLURM_JOB_GPUS and $GPU_DEVICE_ORDINAL will denote card(s) allocated for the job.&lt;br /&gt;
&lt;br /&gt;
* CUDA Toolkit is available as software module devel/cuda.&lt;br /&gt;
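&lt;br /&gt;
A minimal job script sketch for a single-GPU job (program name is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Allocate one GPU per node&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
&lt;br /&gt;
# Load the CUDA Toolkit&lt;br /&gt;
module load devel/cuda&lt;br /&gt;
&lt;br /&gt;
# $CUDA_VISIBLE_DEVICES denotes the card(s) allocated for this job&lt;br /&gt;
echo &amp;quot;Allocated GPU(s): $CUDA_VISIBLE_DEVICES&amp;quot;&lt;br /&gt;
# ./my_gpu_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;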
&lt;br /&gt;
== How to clean-up or save files before a job times out? ==&lt;br /&gt;
&lt;br /&gt;
Possibly you would like to clean up the work directory or save intermediate result files in case a job times out.&lt;br /&gt;
&lt;br /&gt;
The following sample script may serve as a blueprint for implementing a pre-termination function to perform clean-up or file recovery actions. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 2 GB memory required per node&lt;br /&gt;
#SBATCH --mem=2G&lt;br /&gt;
# Request 10 GB local scratch space&lt;br /&gt;
#SBATCH --gres=scratch:10&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
# Send the USR1 signal 120 seconds before end of time limit&lt;br /&gt;
#SBATCH --signal=B:USR1@120&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=signal_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=signal_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=signal_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Define the signal handler function&lt;br /&gt;
# Note: This is not executed here, but rather when the associated &lt;br /&gt;
# signal is received by the shell.&lt;br /&gt;
finalize_job()&lt;br /&gt;
{&lt;br /&gt;
    # Do whatever cleanup you want here. In this example we copy&lt;br /&gt;
    # output file(s) back to $SLURM_SUBMIT_DIR, but you may implement &lt;br /&gt;
    # your own job finalization code here.&lt;br /&gt;
    echo &amp;quot;function finalize_job called at `date`&amp;quot;&lt;br /&gt;
    cd $SCRATCH&lt;br /&gt;
    mkdir -vp &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results&lt;br /&gt;
    tar czvf &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results/${SLURM_JOB_ID}.tgz output*.txt&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Call finalize_job function as soon as we receive USR1 signal&lt;br /&gt;
trap &#039;finalize_job&#039; USR1&lt;br /&gt;
&lt;br /&gt;
# Copy input files for this job to the scratch directory (if needed).&lt;br /&gt;
# Note: Environment variable $SCRATCH always points to a scratch directory &lt;br /&gt;
# automatically created for this job. Environment variable $SLURM_SUBMIT_DIR &lt;br /&gt;
# points to the path where this script was submitted from.&lt;br /&gt;
# Example:&lt;br /&gt;
# cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input*.txt &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Change working directory to local scratch directory&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# This is where the actual work is done. In this case we just create &lt;br /&gt;
# a sample output file for 900 (=15*60) seconds, but since we asked &lt;br /&gt;
# Slurm for 600 seconds only, it will not be able to finish within this &lt;br /&gt;
# wall time.&lt;br /&gt;
# Note: It is important to run this task in the background &lt;br /&gt;
# by placing the &amp;amp; symbol at the end. Otherwise the signal handler &lt;br /&gt;
# would not be executed until that process has finished, which is not &lt;br /&gt;
# what we want.&lt;br /&gt;
(for i in `seq 15`; do echo &amp;quot;Hello World at `date +%H:%M:%S`.&amp;quot;; sleep 60; done) &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Note: The command above is just for illustration. Normally you would just run&lt;br /&gt;
# my_program &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Tell the shell to wait for background task(s) to finish. &lt;br /&gt;
# Note: This is important because otherwise the parent shell &lt;br /&gt;
# (this script) would proceed (and terminate) without waiting for &lt;br /&gt;
# background task(s) to finish.&lt;br /&gt;
wait&lt;br /&gt;
&lt;br /&gt;
# If we get here, the job did not time out but finished in time.&lt;br /&gt;
&lt;br /&gt;
# Release user defined signal handler for USR1&lt;br /&gt;
trap - USR1&lt;br /&gt;
&lt;br /&gt;
# Do regular cleanup and save files. In this example we simply call &lt;br /&gt;
# the same function that we defined as a signal handler above, but you &lt;br /&gt;
# may implement your own code here. &lt;br /&gt;
finalize_job&lt;br /&gt;
&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The number of seconds specified in the --signal option must be large enough for the pre-termination function to complete and must not exceed 65535 seconds.&lt;br /&gt;
&lt;br /&gt;
* Due to the resolution of event handling by Slurm, the signal may be sent a little earlier than specified.&lt;br /&gt;
&lt;br /&gt;
== How to submit a multithreaded batch job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a job running one multithreaded program instance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Number of cores per program instance&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=multithreaded_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=multithreaded_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=multithreaded_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
&lt;br /&gt;
# Run multithreaded program&lt;br /&gt;
./my_multithreaded_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for multithreaded program: [[Media:Hello_openmp.c | Hello_openmp.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* In our configuration each physical core is considered a &amp;quot;CPU&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* On JUSTUS 2 it is recommended to specify a number of cores per task (&#039;--cpus-per-task&#039;) that is either an integer divisor of 24 or (at most) 48.&lt;br /&gt;
&lt;br /&gt;
* Required memory can also be specified per allocated CPU with the &#039;--mem-per-cpu&#039; option. &lt;br /&gt;
&lt;br /&gt;
* The &#039;--mem&#039; and &#039;--mem-per-cpu&#039; options are mutually exclusive.&lt;br /&gt;
&lt;br /&gt;
==  How to submit an array job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_array -a] (or [https://slurm.schedmd.com/sbatch.html#OPT_array --array]) option, e.g. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -a 1-16%8 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an array job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of cores per individual array task&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --array=1-16%8&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=array_job&lt;br /&gt;
#SBATCH --output=array_job-%A_%a.out&lt;br /&gt;
#SBATCH --error=array_job-%A_%a.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Print the task id.&lt;br /&gt;
echo &amp;quot;My SLURM_ARRAY_TASK_ID: &amp;quot; $SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
# ./my_program &amp;lt;input.$SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.&lt;br /&gt;
&lt;br /&gt;
* Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the job id of the first array task for all tasks.&lt;br /&gt;
&lt;br /&gt;
* The remaining options in the sample job script are the same as the options used in other, non-array jobs. In the example above, we are requesting that each array task be allocated 1 CPU (--ntasks=1) and 4 GB of memory (--mem=4G) for up to one hour (--time=01:00:00).&lt;br /&gt;
&lt;br /&gt;
* More information: https://slurm.schedmd.com/job_array.html&lt;br /&gt;
&lt;br /&gt;
== How to delay the start of a job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_begin -b] (or [https://slurm.schedmd.com/sbatch.html#OPT_begin --begin]) option in order to defer the allocation of the job until the specified time.&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --begin=20:00 ...               # job can start after 8 p.m. &lt;br /&gt;
sbatch --begin=now+1hour ...           # job can start 1 hour after submission&lt;br /&gt;
sbatch --begin=teatime ...             # job can start at teatime (4 p.m.)&lt;br /&gt;
sbatch --begin=2023-12-24T20:00:00 ... # job can start after specified date/time&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit dependency (chain) jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_dependency -d] (or [https://slurm.schedmd.com/sbatch.html#OPT_dependency --dependency]) option, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterany:123456 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified job 123456 has terminated.&lt;br /&gt;
&lt;br /&gt;
Slurm supports a number of different dependency types, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-d after:123456      # job can begin execution after the specified job has begun execution&lt;br /&gt;
-d afterany:123456   # job can begin execution after the specified job has finished&lt;br /&gt;
-d afternotok:123456 # job can begin execution after the specified job has failed (exit code not equal zero)&lt;br /&gt;
-d afterok:123456    # job can begin execution after the specified job has successfully finished (exit code zero)&lt;br /&gt;
-d singleton         # job can begin execution after any previously launched jobs with the same job name and user have finished&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Multiple jobs can be specified by separating their job ids by colon characters (:), e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ sbatch -d afterany:123456:123457 ... &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified jobs 123456 and 123457 have both finished.&lt;br /&gt;
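&lt;br /&gt;
A small sketch for building a job chain from the shell, using the &#039;--parsable&#039; option of sbatch to capture each job id (script names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ jobid1=$(sbatch --parsable step1.slurm)&lt;br /&gt;
$ jobid2=$(sbatch --parsable -d afterok:$jobid1 step2.slurm)&lt;br /&gt;
$ sbatch -d afterok:$jobid2 step3.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;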
&lt;br /&gt;
== How to deal with invalid job dependencies? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_kill-on-invalid-dep --kill-on-invalid-dep=yes] option in order to automatically terminate jobs which can never run due to invalid dependencies. By default the job stays pending with reason &#039;DependencyNeverSatisfied&#039; to allow review and appropriate action by the user.  &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; A job dependency may also become invalid if a job has been submitted with &#039;-d afterok:&amp;lt;jobid&amp;gt;&#039; but the specified dependency job has failed, e.g. because it timed out (i.e. exceeded its wall time limit).&lt;br /&gt;
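&lt;br /&gt;
For example, to submit a dependent job that is terminated automatically if its dependency fails (job id and script name are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --kill-on-invalid-dep=yes -d afterok:123456 job.slurm&amp;lt;/pre&amp;gt;&lt;br /&gt;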
&lt;br /&gt;
== How to submit an MPI batch job? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/mpi_guide.html&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an MPI job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate two nodes&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
# Allocate 32 GB memory per node&lt;br /&gt;
#SBATCH --mem=32gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=mpi_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=mpi_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
#&lt;br /&gt;
# Option 1: Launch MPI tasks by using mpirun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/intel&lt;br /&gt;
# module load mpi/impi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# Option 2: Launch MPI tasks by using srun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# srun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun  ./my_mpi_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for MPI program: [[Media:Hello_mpi.c | Hello_mpi.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently for different MPI implementations and versions. There are reports of &amp;quot;strange behavior&amp;quot; of mpirun, especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.&lt;br /&gt;
* Do not run batch jobs that launch a large number (hundreds or thousands) of short-running (a few minutes or less) MPI programs, e.g. from a shell loop. Every single MPI invocation generates its own job step and sends remote procedure calls to the Slurm controller server. This can result in degraded performance for both Slurm and the application, especially if many such jobs happen to run at the same time. Jobs of that kind can even get stuck without showing any further activity until hitting the wall time limit. For high throughput computing (e.g. processing a large number of files, with each task running independently of the others and only for a short time), consider a more appropriate parallelization paradigm that invokes independent serial (non-MPI) processes in parallel at the same time. This approach is sometimes referred to as a &amp;quot;[https://en.wikipedia.org/wiki/Embarrassingly_parallel pleasingly parallel]&amp;quot; workload. GNU Parallel is a shell tool that facilitates executing serial tasks in parallel. On JUSTUS 2 this tool is available as the software module &amp;quot;system/parallel&amp;quot; (see the sketch below).&lt;br /&gt;
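&lt;br /&gt;
A minimal sketch of such a pleasingly parallel job using GNU Parallel (input file pattern and program name are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
&lt;br /&gt;
module load system/parallel&lt;br /&gt;
&lt;br /&gt;
# Run one independent serial task per allocated core;&lt;br /&gt;
# {} is replaced by one input file name per task&lt;br /&gt;
parallel -j $SLURM_NTASKS_PER_NODE ./my_serial_program {} ::: input*.dat&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;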
&lt;br /&gt;
== How to submit a hybrid MPI/OpenMP job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a hybrid job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of nodes to allocate&lt;br /&gt;
#SBATCH --nodes=4&lt;br /&gt;
# Number of MPI instances (ranks) to be executed per node&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
# Number of threads per MPI instance&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
# Allocate 8 GB memory per node&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=hybrid_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=hybrid_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=hybrid_job-%j.err&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
# or:&lt;br /&gt;
# mpirun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for hybrid program: [[Media:Hello_hybrid.c | Hello_hybrid.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* $SLURM_CPUS_PER_TASK is only set if the &#039;--cpus-per-task&#039; option is specified.&lt;br /&gt;
&lt;br /&gt;
== How to request specific node(s) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_nodelist -w] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodelist --nodelist]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -w &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also see [https://slurm.schedmd.com/sbatch.html#OPT_nodefile -F] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodefile --nodefile]) option.&lt;br /&gt;
&lt;br /&gt;
== How to exclude specific nodes from job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_exclude -x] (or [https://slurm.schedmd.com/sbatch.html#OPT_exclude --exclude]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -x &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get exclusive jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive&#039; option on job submission. This makes sure that there will be no other jobs running on your nodes. Very useful for benchmarking!&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* --exclusive option does &#039;&#039;&#039;not&#039;&#039;&#039; mean that you automatically get full access to all the resources which the node might provide without explicitly requesting them.&lt;br /&gt;
&lt;br /&gt;
== How to avoid sharing nodes with other users? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive=user&#039; option on job submission. This will still allow multiple jobs of one and the same user on the nodes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Depending on configuration, exclusive=user may (and probably will) be the default node access policy on JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
==  How to submit batch job without job script? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_wrap --wrap] option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --nodes=2 --ntasks-per-node=16 --wrap &amp;quot;sleep 600&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; May be useful for testing purposes.&lt;br /&gt;
&lt;br /&gt;
= JOB MONITORING AND CONTROL =&lt;br /&gt;
&lt;br /&gt;
== How to prevent Slurm performance degradation? ==&lt;br /&gt;
&lt;br /&gt;
Almost every invocation of a Slurm client command (e.g. squeue, sacct, sprio or sshare) sends a remote procedure call (RPC) to the Slurm control daemon and/or database. &lt;br /&gt;
If enough remote procedure calls come in at once, this can result in a degradation of performance of the Slurm services for all users, possibly resulting in a denial of service. &lt;br /&gt;
&lt;br /&gt;
Therefore, &#039;&#039;&#039;do not run Slurm client commands that send remote procedure calls from loops in shell scripts or other programs&#039;&#039;&#039; (such as &#039;watch squeue&#039;). Always limit calls to squeue, sstat, sacct etc. to the minimum necessary for the information you are trying to gather. &lt;br /&gt;
&lt;br /&gt;
Slurm does collect RPC counts and timing statistics by message type and user for diagnostic purposes.&lt;br /&gt;
&lt;br /&gt;
== How to view information about submitted jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] command, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue                  # all jobs owned by user (all jobs owned by all users for admins)&lt;br /&gt;
$ squeue --me             # all jobs owned by user (same as squeue for regular users)&lt;br /&gt;
$ squeue -u &amp;lt;username&amp;gt;    # jobs of specific user&lt;br /&gt;
$ squeue -t PENDING       # pending jobs only&lt;br /&gt;
$ squeue -t RUNNING       # running jobs only&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
* The output format of [https://slurm.schedmd.com/squeue.html squeue] (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.&lt;br /&gt;
&lt;br /&gt;
* Every invocation of squeue sends a remote procedure call to the Slurm database server. &#039;&#039;&#039;Do not run squeue or other Slurm client commands from loops in shell scripts or other programs&#039;&#039;&#039; as this can result in a degradation of performance. Ensure that programs limit calls to squeue to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
== How to cancel jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scancel.html scancel] command, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;         # cancel specific job&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;_&amp;lt;index&amp;gt; # cancel indexed job in a job array&lt;br /&gt;
$ scancel -u &amp;lt;username&amp;gt;   # cancel all jobs of specific user&lt;br /&gt;
$ scancel -t PENDING      # cancel pending jobs&lt;br /&gt;
$ scancel -t RUNNING      # cancel running jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to show job script of a running job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scontrol.html scontrol] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; &amp;lt;file&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If the file name is omitted, the default file name will be slurm-&amp;lt;job_id&amp;gt;.sh&lt;br /&gt;
* If the file name is - (i.e. a dash), the job script will be written to stdout.&lt;br /&gt;
&lt;br /&gt;
== How to get estimated start time of a job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ squeue --start&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Estimated start times are dynamic and can change at any moment. Exact start times of individual jobs are usually unpredictable.&lt;br /&gt;
* Slurm will report N/A for the start time estimate if nodes are not currently being reserved by the scheduler for the job to run on.&lt;br /&gt;
&lt;br /&gt;
== How to show remaining walltime of running jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format option &amp;quot;%L&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ squeue -t r -o &amp;quot;%u %i %L&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check priority of jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format options &amp;quot;%Q&amp;quot; and/or &amp;quot;%p&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -o &amp;quot;%8i %8u %15a %.10r %.10L %.5D %.10Q&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sprio.html sprio] command to display the priority components (age/fairshare/...) for each job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sprio&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sshare.html sshare] command for listing the shares of associations, e.g. accounts.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sshare&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent (hold) jobs from being scheduled for execution? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol hold &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to unhold job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol release &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to suspend a running job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol suspend &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume a suspended job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol resume &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to requeue (cancel and resubmit) a particular job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol requeue &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to monitor resource usage of running job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sstat.html sstat] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sstat -e&#039; command shows a list of fields that can be specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sstat --format=JobId,AveCPU,AveRSS,MaxRSS -j &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will show average CPU time, average and maximum memory consumption of all tasks in the running job.&lt;br /&gt;
Ideally, average CPU time equals the number of cores allocated for the job multiplied by the current run time of the job. &lt;br /&gt;
The maximum memory consumption gives an estimate of the peak amount of memory actually needed so far. This can be compared with the amount of memory requested for the job. Over-requesting memory can result in significant waste of compute resources.       &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Users can also ssh into compute nodes that they have one or more running jobs on. Once logged in, they can use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ...&lt;br /&gt;
&lt;br /&gt;
* Users can also attach an interactive shell under an already allocated job by running the following command: &amp;lt;pre&amp;gt;srun --jobid &amp;lt;job&amp;gt; --overlap --pty /bin/bash&amp;lt;/pre&amp;gt; Once logged in, they can again use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ... For a single node job the user does not even need to know the node that the job is running on. For a multinode job, the user can still use &#039;-w &amp;lt;node&amp;gt;&#039; option to specify a specific node.&lt;br /&gt;
&lt;br /&gt;
== How to get detailed job information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show job 1234  # For job id 1234&lt;br /&gt;
$ scontrol show jobs      # For all jobs&lt;br /&gt;
$ scontrol -o show jobs   # One line per job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to modify a pending/running job? ==&lt;br /&gt;
&lt;br /&gt;
Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol update JobId=&amp;lt;jobid&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
E.g.: &amp;lt;pre&amp;gt;$ scontrol update JobId=42 TimeLimit=7-0&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will modify the time limit of the job to 7 days.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Update requests for &#039;&#039;&#039;running&#039;&#039;&#039; jobs are mostly restricted to Slurm administrators. In particular, only an administrator can increase the TimeLimit of a job.&lt;br /&gt;
&lt;br /&gt;
== How to show accounting data of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sacct -e&#039; command shows a list of fields that can be&lt;br /&gt;
specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
== How to retrieve job history and accounting? ==&lt;br /&gt;
&lt;br /&gt;
For a specific job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a specific user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The default time window is the current day.&lt;br /&gt;
&lt;br /&gt;
Starting from a specific date:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within a time window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 -E 2020-01-31 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
You can also set the environment variable $SACCT_FORMAT to specify the default format. To get a general idea of how efficiently a job utilized its resources, the following format can be used:  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export SACCT_FORMAT=&amp;quot;JobID,JobName,Elapsed,NCPUs,TotalCPU,CPUTime,ReqMem,MaxRSS,MaxDiskRead,MaxDiskWrite,State,ExitCode&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find how efficiently the CPUs were used, divide TotalCPU by CPUTime. To find how efficiently memory was used, divide MaxRSS by ReqMem. But be aware that the sacct memory usage measurement doesn&#039;t catch very rapid memory spikes. If your job got killed for running out of memory, it &#039;&#039;&#039;did run out of memory&#039;&#039;&#039;, even if sacct reports a lower memory usage than would trigger an out-of-memory kill. A job that reads or writes excessively to disk might be bogged down significantly by I/O operations.&lt;br /&gt;
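&lt;br /&gt;
For example, a job that allocated 48 CPUs and ran for 2 hours has a CPUTime of 96 CPU-hours; if its TotalCPU is 48 hours, the CPU efficiency is 48/96 = 50%.&lt;br /&gt;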
&lt;br /&gt;
== How to get efficiency information of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;pre&amp;gt;$ seff &amp;lt;jobid&amp;gt; &amp;lt;/pre&amp;gt; command for brief efficiency information.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; It is good practice to have a look at the efficiency of your job(s) on completion &#039;&#039;&#039;and we expect you to do so&#039;&#039;&#039;. This way you can improve your job specifications in the future.&lt;br /&gt;
&lt;br /&gt;
== How to get complete field values from sstat and sacct commands? ==&lt;br /&gt;
&lt;br /&gt;
When using the [https://slurm.schedmd.com/sacct.html#OPT_format --format] option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.&lt;br /&gt;
&lt;br /&gt;
E.g. &#039;--format=User%30&#039; will print 30 characters for the user name (right justified).  A %-30 will print 30 characters left justified.&lt;br /&gt;
&lt;br /&gt;
sstat and sacct also provide the &#039;--parsable&#039; and &#039;--parsable2&#039; options to always print full field values, delimited with a pipe &#039;|&#039; character by default.&lt;br /&gt;
The delimiting character can be specified by using the &#039;--delimiter&#039; option, e.g. &#039;--delimiter=&amp;quot;,&amp;quot;&#039; for comma separated values.&lt;br /&gt;
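&lt;br /&gt;
For example (the field selection is arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --parsable2 --delimiter=&amp;quot;,&amp;quot; --format=JobID,JobName,State,Elapsed,MaxRSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;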
&lt;br /&gt;
== How to retrieve job records for all jobs running/pending at a certain point in time? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] with [https://slurm.schedmd.com/sacct.html#OPT_state -s &amp;lt;state&amp;gt;] and [https://slurm.schedmd.com/sacct.html#OPT_starttime -S &amp;lt;start time&amp;gt;] options, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$  sacct -n -a -X -S 2021-04-01T00:00:00 -s R -o JobID,User%15,Account%10,NCPUS,NNodes,NodeList%1500&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When specifying the state &amp;quot;-s &amp;lt;state&amp;gt;&amp;quot; &#039;&#039;&#039;and&#039;&#039;&#039; the start time &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;, the default &lt;br /&gt;
time window will be set to end time &amp;quot;-E&amp;quot; equal to start time. Thus, you will get a snapshot of all running/pending &lt;br /&gt;
jobs at the instance given by &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== How to get a parsable list of hostnames from $SLURM_JOB_NODELIST? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show hostnames $SLURM_JOB_NODELIST&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
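&lt;br /&gt;
This can be used, e.g., to iterate over the allocated nodes inside a job script (a minimal sketch):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do&lt;br /&gt;
    echo &amp;quot;allocated node: $host&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;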
&lt;br /&gt;
= ADMINISTRATION =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Most commands in this section are restricted to system administrators.&lt;br /&gt;
&lt;br /&gt;
== How to stop Slurm from scheduling jobs? ==&lt;br /&gt;
&lt;br /&gt;
You can stop Slurm from scheduling jobs on a per partition basis by&lt;br /&gt;
setting that partition&#039;s state to DOWN. Set its state UP to resume&lt;br /&gt;
scheduling. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update PartitionName=foo State=DOWN&lt;br /&gt;
$ scontrol update PartitionName=foo State=UP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to print actual hardware configuration of a node? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ slurmd -C   # print hardware configuration plus uptime&lt;br /&gt;
$ slurmd -G   # print generic resource configuration&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to reboot (all) nodes as soon as they become idle? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # specific nodes&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME ALL              # all nodes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to cancel pending reboot of nodes? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol cancel_reboot &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check current node status? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show node &amp;lt;node&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to instruct all Slurm daemons to re-read the configuration file? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reconfigure&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent a user from submitting new jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the following [https://slurm.schedmd.com/sacctmgr.html sacctmgr] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Job submission is then rejected with the following message:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch job.slurm&lt;br /&gt;
sbatch: error: AssocMaxSubmitJobLimit&lt;br /&gt;
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user&#039;s size and/or time limits)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Use the following command to release the limit:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=-1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to drain node(s)? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=DRAIN Reason=&amp;quot;Some Reason&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Reason is mandatory.&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; just set state DOWN to drain nodes. This would kill any active jobs that may be running on those nodes.&lt;br /&gt;
&lt;br /&gt;
== How to list reason for nodes being drained or down? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume node state? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=RESUME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create a reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/reservations.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now duration=UNLIMITED flags=maint,ignore_jobs nodes=ALL&lt;br /&gt;
$ scontrol create reservation user=root starttime=2020-12-24T17:00 duration=12:00:00 flags=maint,ignore_jobs nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
$ scontrol show reservation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Add &amp;quot;FLEX&amp;quot; flag to allow jobs that qualify for the reservation to start before the reservation begins (and continue after it starts). &lt;br /&gt;
Add &amp;quot;MAGNETIC&amp;quot; flag to attract jobs that qualify for the reservation to run in that reservation without having requested it at submit time.&lt;br /&gt;
&lt;br /&gt;
== How to create a floating reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Use the flag &amp;quot;TIME_FLOAT&amp;quot; and a start time that is relative to the current time (use the keyword &amp;quot;now&amp;quot;).&lt;br /&gt;
In the example below, the nodes are prevented from starting any jobs exceeding a walltime of 2 days.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now+2days duration=UNLIMITED flags=maint,ignore_jobs,time_float nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Floating reservations are not intended to run jobs, but to prevent long running jobs from being initiated on specific nodes. Attempts by users to make use of a floating reservation will be rejected. When ready to perform the maintenance, place the nodes in DRAIN state and delete the reservation.&lt;br /&gt;
&lt;br /&gt;
== How to use a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --reservation=foo_6 ... script.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol delete ReservationName=foo_6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;mdiag -n&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fields can be individually customized. See [https://slurm.schedmd.com/sinfo.html sinfo] man page. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N --format=&amp;quot;%8N %12P %.4C %.8O %.6m %.6e %.8T %.20E&amp;quot;&lt;br /&gt;
&lt;br /&gt;
NODELIST PARTITION    CPUS CPU_LOAD MEMORY FREE_M    STATE               REASON&lt;br /&gt;
n0001    standard*    0/16     0.01 128000 120445     idle                 none&lt;br /&gt;
n0002    standard*    0/16     0.01 128000 120438     idle                 none&lt;br /&gt;
n0003    standard*    0/0/      N/A 128000    N/A    down*       Not responding&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;pbsnodes&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show nodes                     # One paragraph per node (all nodes)&lt;br /&gt;
$ scontrol show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;     # One paragraph per node (specified nodes) &lt;br /&gt;
$ scontrol -o show nodes                  # One line per node (all nodes)&lt;br /&gt;
$ scontrol -o show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # One line per node (specified nodes)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to update multiple jobs of a user with a single scontrol command? ==&lt;br /&gt;
&lt;br /&gt;
Not possible. But you can, e.g., use squeue to build a script, taking&lt;br /&gt;
advantage of its filtering and formatting options.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -tpd -h -o &amp;quot;scontrol update jobid=%i priority=1000&amp;quot; &amp;gt;my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
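&lt;br /&gt;
Then review the generated commands and execute the script, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sh my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;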
&lt;br /&gt;
You can also identify the list of jobs and specify them all at once in the JobID argument, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobID=123 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=123,456,789 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=[123-400],[500-600] qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option is to use the JobName, if all the jobs have the same name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobName=&amp;quot;foobar&amp;quot; UserID=johndoe qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, Slurm does not allow the UserID filter alone.&lt;br /&gt;
&lt;br /&gt;
== How to create a new account? ==&lt;br /&gt;
&lt;br /&gt;
Add account at top level in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; Cluster=justus Description=&amp;quot;Account description&amp;quot; Organization=&amp;quot;none&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Add account as child of some parent account in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; parent=&amp;lt;parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to move account to another parent? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set parent=&amp;lt;new_parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr delete account name=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add a new user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; DefaultAccount=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add/remove users from an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname&amp;gt;                  # Add user to account&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname2&amp;gt;                 # Add user to a second account&lt;br /&gt;
$ sacctmgr remove user &amp;lt;username&amp;gt; where account=&amp;lt;accountname&amp;gt;         # Remove user from this account&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to change default account of a user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;  &lt;br /&gt;
$ sacctmgr modify user where user=&amp;lt;username&amp;gt; set DefaultAccount=&amp;lt;default_account&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The user must already be associated with the account you want to set as default.&lt;br /&gt;
&lt;br /&gt;
== How to show account information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show assoc&lt;br /&gt;
$ sacctmgr show assoc tree&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to implement user resource throttling policies? ==&lt;br /&gt;
&lt;br /&gt;
Quoting from https://bugs.schedmd.com/show_bug.cgi?id=3600#c4&lt;br /&gt;
&lt;br /&gt;
 With Slurm, the associations are meant to establish base limits on the&lt;br /&gt;
 defined partitions, accounts and users. Because limits propagate down&lt;br /&gt;
 through the association tree, you only need to define limits at a high&lt;br /&gt;
 level and those limits will be applied to all partitions, accounts and&lt;br /&gt;
 users that are below it (parent to child). You can also override those&lt;br /&gt;
 high level (parent) limits by explicitly setting different limits at&lt;br /&gt;
 any lower level (on the child). So using the association tree is the&lt;br /&gt;
 best way to get some base limits applied that you want for most cases. &lt;br /&gt;
 QOS&#039;s are meant to override any of those base limits for exceptional&lt;br /&gt;
 cases. Like Maui, you can use QOS&#039;s to set a different priority.&lt;br /&gt;
 Again, the QOS would be overriding the base priority that could be set&lt;br /&gt;
 in the associations.&lt;br /&gt;
&lt;br /&gt;
== How to set a resource limit for an individual user? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/resource_limits.html&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=1            # Limit maximum number of running jobs for user&lt;br /&gt;
$ sacctmgr list assoc user=&amp;lt;username&amp;gt; format=user,maxjobs  # Show that limit&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=-1           # Remove that limit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to retrieve historical resource usage for a specific user or account? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sreport.html sreport] command.&lt;br /&gt;
&lt;br /&gt;
Examples: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours user=&amp;lt;username&amp;gt;    # Report cluster utilization of given user broken down by accounts&lt;br /&gt;
$ sreport cluster AccountUtilizationByUser Start=2021-01-01 End=2021-12-31 -t Hours account=&amp;lt;account&amp;gt;  # Report cluster utilization of given account broken down by users    &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* By default CPU resources will be reported. Use the &#039;-T&#039; option for other trackable resources, e.g. &#039;-T cpu,mem,gres/gpu,gres/scratch&#039; (see the example below).&lt;br /&gt;
* On JUSTUS 2 registered compute projects (&amp;quot;Rechenvorhaben&amp;quot;) are uniquely mapped to Slurm accounts of the same name. Thus, &#039;AccountUtilizationByUser&#039; can also be used to report the aggregated cluster utilization of compute projects.&lt;br /&gt;
* These commands can also be executed by regular users, in which case Slurm will only report their own usage records (but along with the total usage of the associated account in the case of &#039;AccountUtilizationByUser&#039;).&lt;br /&gt;
&lt;br /&gt;
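For example, to report GPU hours instead of CPU hours (a sketch based on the &#039;-T&#039; option above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours -T gres/gpu user=&amp;lt;username&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;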
== How to fix/reset a user&#039;s RawUsage value? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; where Account=&amp;lt;account&amp;gt; set RawUsage=&amp;lt;number&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create/modify/delete QOSes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/qos.html&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show qos                                      # Show existing QOSes&lt;br /&gt;
$ sacctmgr add qos verylong                              # Create new QOS verylong&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxWall=28-00:00:00   # Set maximum walltime limit&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxTRESPerUser=cpu=4  # Set maximum number of CPUs a user can allocate at a given time&lt;br /&gt;
$ sacctmgr modify qos verylong set flags=denyonlimit     # Prevent submission if job requests exceed any limits of QOS&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos+=verylong      # Add a QOS to a user account&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos-=verylong      # Remove a QOS from a user account&lt;br /&gt;
$ sacctmgr delete qos verylong                           # Delete that QOS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to find (and fix) runaway jobs? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr show runaway&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Runaway jobs are orphaned jobs that don&#039;t exist in the Slurm controller but have a start time and no end time in the Slurm database. Runaway jobs mess with accounting and affect new jobs of users who have too many runaway jobs. &lt;br /&gt;
* If there are jobs in this state this command will also provide an option to fix them. This will set the end time for each job to the latest out of the start, eligible, or submit times, and set the state to completed.&lt;br /&gt;
&lt;br /&gt;
== How to show a history of database transactions? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr list transactions&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Useful to get timestamps for when a user/account/qos has been created/modified/removed etc.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=14288</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=14288"/>
		<updated>2025-03-06T16:02:25Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces replace [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Funding for a new JUSTUS 3 system has been secured and the procurement of the new system is underway. &lt;br /&gt;
* Our ticketing system has been upgraded. Please check out the visual guide on how to create tickets [[BwSupportPortal|here]].&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Jobscripts: Running Your Calculations|Jobscripts: Running Your Calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=MediaWiki:Sidebar&amp;diff=13941</id>
		<title>MediaWiki:Sidebar</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=MediaWiki:Sidebar&amp;diff=13941"/>
		<updated>2025-02-10T15:06:17Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* SEARCH&lt;br /&gt;
* bwHPC Wiki&lt;br /&gt;
** mainpage|Home&lt;br /&gt;
** https://www.bwhpc.de|bwHPC web page&lt;br /&gt;
* bwHPC Systems&lt;br /&gt;
** BwUniCluster2.0|bwUniCluster 2.0&lt;br /&gt;
** JUSTUS2|JUSTUS 2&lt;br /&gt;
** Helix|Helix&lt;br /&gt;
** NEMO|NEMO&lt;br /&gt;
** BinAC|BinAC&lt;br /&gt;
* Documentation&lt;br /&gt;
** Registration|Registration&lt;br /&gt;
** Running_Calculations|Running Calculations&lt;br /&gt;
** Software_Modules|Software Modules&lt;br /&gt;
** https://www.bwhpc.de/software.html|Software Search&lt;br /&gt;
** HPC Glossary|HPC Glossary&lt;br /&gt;
* Support&lt;br /&gt;
** https://training.bwhpc.de|eLearning&lt;br /&gt;
** https://www.bwhpc.de/supportportal.php|Ticketing System&lt;br /&gt;
** Feedback|Feedback&lt;br /&gt;
* Data Storage&lt;br /&gt;
** SDS@hd|SDS@hd&lt;br /&gt;
** https://www.rda.kit.edu/english|bwDataArchive&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=13466</id>
		<title>JUSTUS2/Jobscripts: Running Your Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=13466"/>
		<updated>2024-12-09T14:42:17Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
The JUSTUS 2 cluster uses Slurm ([https://slurm.schedmd.com/ https://slurm.schedmd.com/]) for scheduling compute jobs. &lt;br /&gt;
&lt;br /&gt;
= JUSTUS 2 Slurm Howto =&lt;br /&gt;
&lt;br /&gt;
This page presents only a very basic introduction. &lt;br /&gt;
&lt;br /&gt;
Please see  the &#039;&#039;&#039;[[bwForCluster JUSTUS 2 Slurm HOWTO|JUSTUS 2 Slurm HOWTO]]&#039;&#039;&#039; for many more examples and commands for common tasks.&lt;br /&gt;
&lt;br /&gt;
= Slurm Command Overview =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/salloc.html salloc] || Request resources for an interactive job&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sstat.html sstat] || Displays status information about a running job&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the bwForCluster JUSTUS 2 =&lt;br /&gt;
Batch jobs are submitted with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A job script contains options for Slurm in lines beginning with #SBATCH, followed by the commands that you want to execute on the compute nodes. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --time=00:14:00&lt;br /&gt;
#SBATCH --mem=1gb&lt;br /&gt;
echo &#039;Here starts the calculation&#039;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
You can override options from the script on the command-line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch --time=03:00:00 &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note: &amp;lt;font color=&amp;quot;red&amp;quot;&amp;gt; Compute jobs must not read from/write to the global file systems for temporary (scratch/swap) files of a calculation. &amp;lt;/font&amp;gt; &lt;br /&gt;
&lt;br /&gt;
Use local storage for this purpose: /tmp in the ramdisk for small files, or /scratch (see [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F|How to request NVMe]]) for larger amounts of data.&lt;br /&gt;
&lt;br /&gt;
To keep a calculation off the central file systems, you must often configure the program you are using to write its temporary files elsewhere. &lt;br /&gt;
&lt;br /&gt;
If the program looks for files in the current directory, you must copy them to a temporary directory first and copy/save the results of the calculation at the end; otherwise your results get deleted by the automated cleanup that happens after the job.&lt;br /&gt;
&lt;br /&gt;
The diskless nodes provide a disk in RAM that can grow to at most half of the total RAM. Note that the files created plus the memory requirement of your job need to fit into the total memory. &lt;br /&gt;
&lt;br /&gt;
There are more diskless nodes than nodes with disks, so if your job can run on a diskless node, you should choose this option. &lt;br /&gt;
&lt;br /&gt;
Example job script with requesting 700GB disk space and copying files:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --time=00:14:00&lt;br /&gt;
#SBATCH --mem=1gb&lt;br /&gt;
#SBATCH --gres=scratch:700 &lt;br /&gt;
# copy input file&lt;br /&gt;
cp $HOME/inputfiles/myinput.inp $SCRATCH&lt;br /&gt;
# switch directory&lt;br /&gt;
cd $SCRATCH&lt;br /&gt;
echo &#039;Here starts the calculation&#039;&lt;br /&gt;
myprogram --input=$SCRATCH/myinput.inp&lt;br /&gt;
# calculation ends&lt;br /&gt;
# copy result&lt;br /&gt;
cp outfile.out results2.txt $HOME/resultdir/job12345&lt;br /&gt;
# clean up&lt;br /&gt;
rm myinput.inp outfile.out results2.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Testing Your Jobs = &lt;br /&gt;
&lt;br /&gt;
JUSTUS 2 has three compute nodes reserved for jobs with a walltime under 15 minutes. You can test whether your jobs start properly simply by specifying a short walltime, e.g. --time=00:14:00, and your job should start very quickly. &lt;br /&gt;
&lt;br /&gt;
= Monitoring Your Jobs =&lt;br /&gt;
== squeue ==&lt;br /&gt;
&lt;br /&gt;
After you have submitted the job, you can see it waiting using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
(also read the man page with &amp;lt;code&amp;gt;man squeue&amp;lt;/code&amp;gt; for more information on how to use the command)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
&amp;gt; squeue&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
           6260301  standard r_60_b_2  ul_yxz1 PD       0:00      1 (AssocGrpMemRunMinutes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Output shows: &lt;br /&gt;
* JOBID: a unique number assigned to your job&lt;br /&gt;
* PARTITION: the partition the job has been assigned to; the cluster is divided into different types of nodes.&lt;br /&gt;
* NAME: the name you gave your job with the --job-name= option&lt;br /&gt;
* USER: your username&lt;br /&gt;
* ST: the state the job is in. R = running, PD = pending, CD = completed. See the man page for a full list of states. &lt;br /&gt;
* TIME: how long the job has been running&lt;br /&gt;
* NODES: how many nodes were requested&lt;br /&gt;
* NODELIST(REASON): either shows the node(s) the job is running on, or a reason why it hasn&#039;t started yet&lt;br /&gt;
&lt;br /&gt;
==scontrol==&lt;br /&gt;
&lt;br /&gt;
You can then show more info on one specific running job using the &amp;lt;code&amp;gt;scontrol&amp;lt;/code&amp;gt; command, e.g. for the job with ID 6260301 listed above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
&amp;gt; scontrol show job 6260301&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring a Started Job ==&lt;br /&gt;
&lt;br /&gt;
After a job has started, you can ssh to the node(s) the job is running on, using the node name from NODELIST, e.g. if your job runs on n0603:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&amp;gt; ssh n0603 &lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
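Once logged in, the usual Linux tools can be used to inspect your processes on that node, e.g. (a sketch):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; ssh n0603&lt;br /&gt;
&amp;gt; top -u $USER   # show only your own processes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;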
= Partitions =&lt;br /&gt;
Job allocations at JUSTUS 2 are routed automatically to the most suitable compute node(s) that can provide the requested resources for the job (e.g. amount of cores, memory, local scratch space). This is to prevent fragmentation of the cluster system and to ensure most efficient usage of available compute resources. Thus, there is no point in requesting a partition in batch job scripts, i.e. users &#039;&#039;&#039;should not&#039;&#039;&#039; specify any partition &amp;quot;-p, --partition=&amp;lt;partition_name&amp;gt;&amp;quot; on job submission. This is of particular importance if you adapt job scripts from other cluster systems (e.g. bwUniCluster 2.0) to JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
= Job Priorities =&lt;br /&gt;
Job priorities at JUSTUS 2 depend on [https://slurm.schedmd.com/priority_multifactor.html multiple factors ]:&lt;br /&gt;
* Age: The amount of time a job has been waiting in the queue, eligible to be scheduled.&lt;br /&gt;
* Fairshare: The difference between the portion of the computing resource allocated to an association and the amount of resources that has been consumed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
Jobs that are pending because the user reached one of the resource usage limits (see below) are not eligible to be scheduled and, thus, do not accrue priority by their age.  &lt;br /&gt;
&lt;br /&gt;
Fairshare does &#039;&#039;&#039;not&#039;&#039;&#039; introduce a fixed allotment, in that a user&#039;s ability to run new jobs is cut off as soon as a fixed target utilization is reached. Instead, the fairshare factor ensures that jobs from users who were under-served in the past are given higher priority than jobs from users who were over-served in the past. This keeps individual groups from monopolizing the resources over the long term, which would be unfair to groups that have not used their fair share for quite some time.&lt;br /&gt;
&lt;br /&gt;
Slurm features &#039;&#039;&#039;backfilling&#039;&#039;&#039;, meaning that the scheduler will start lower priority jobs if doing so does not delay the expected start time of &#039;&#039;&#039;any&#039;&#039;&#039; higher priority job. Since the expected start time of pending jobs depends upon the expected completion time of running jobs, reasonably accurate time limits are valuable for backfill scheduling to work well. This &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=161 video]&#039;&#039;&#039; gives an illustrative description of how backfilling works.&lt;br /&gt;
&lt;br /&gt;
In summary, an approximate model of Slurm&#039;s behavior for scheduling jobs is this:&lt;br /&gt;
&lt;br /&gt;
* Step 1: Can the job in position one (highest priority) start now?&lt;br /&gt;
* Step 2: If it can, remove it from the queue, start it and continue with step 1.&lt;br /&gt;
* Step 3: If it can not, look at next job.&lt;br /&gt;
* Step 4: Can it start now, without delaying the start time of any job before it in the queue?&lt;br /&gt;
* Step 5: If it can, remove it from the queue, start it, recalculate what nodes are free, look at next job and continue with step 4.&lt;br /&gt;
* Step 6: If it can not, look at next job, and continue with step 4.&lt;br /&gt;
&lt;br /&gt;
As soon as a new job is submitted and as soon as a job finishes, Slurm restarts its main scheduling cycle with step 1.&lt;br /&gt;
&lt;br /&gt;
= Usage Limits/Throttling Policies =&lt;br /&gt;
&lt;br /&gt;
While the fairshare factor ensures fair long term balance of resource utilization between users and groups, there are additional usage limits that constrain the total cumulative resources at a given time. This is to prevent individual users from short term monopolizing large fractions of the whole cluster system.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum walltime&#039;&#039;&#039; for a job is &#039;&#039;&#039;14 days&#039;&#039;&#039; (336 hours)&lt;br /&gt;
  --time=336:00:00 or --time=14-0&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum amount of cores&#039;&#039;&#039; used at any given time from running jobs is &#039;&#039;&#039;1920&#039;&#039;&#039; per user (aggregated over all running jobs). This translates to 40 nodes. An equivalent limit for allocated memory also applies. If this limit is reached, new jobs will be queued (with REASON: AssocGrpCpuLimit) but only allowed to run after resources have been relinquished. &lt;br /&gt;
&lt;br /&gt;
* The maximum amount of &#039;&#039;&#039;remaining allocated core-minutes&#039;&#039;&#039; per user is &#039;&#039;&#039;3300000&#039;&#039;&#039; (aggregated over all running jobs). For example, if a user has a 4-core job running that will complete in 1 hour and a 2-core job that will complete in 6 hours, this translates to 4 * 1 * 60 + 2 * 6 * 60 = 16 * 60 = 960 remaining core-minutes. Once a user reaches the limit, no more jobs are allowed to start (REASON: AssocGrpCPURunMinutesLimit). As the jobs continue to run, the remaining core time will decrease and eventually allow more jobs to start in a staggered way. This limit also &#039;&#039;&#039;correlates the maximum walltime and amount of cores that can be allocated&#039;&#039;&#039; for this amount of time. Thus, shorter walltimes for the jobs allow more resources to be allocated at a given time (but capped by the maximum amount of cores limit above). Watch this &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=306 video]&#039;&#039;&#039; for an illustrative description. An equivalent limit applies for remaining time of memory allocation in which case jobs may be held back from starting with REASON AssocGrpMemRunMinutes.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum amount of GPUs&#039;&#039;&#039; allocated by running jobs is &#039;&#039;&#039;8&#039;&#039;&#039; per user. If this limit is reached, new jobs will be queued (with REASON: AssocGrpGRES) but only allowed to run after GPU resources have been relinquished. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Usage limits are subject to change.&lt;br /&gt;
&lt;br /&gt;
= Other Considerations =&lt;br /&gt;
&lt;br /&gt;
== Default Values ==&lt;br /&gt;
&lt;br /&gt;
Default values for jobs are:&lt;br /&gt;
&lt;br /&gt;
* Runtime: --time=02:00:00 (2 hours)&lt;br /&gt;
* Nodes: --nodes=1 (one node)&lt;br /&gt;
* Tasks: --ntasks-per-node=1 (one task per node)&lt;br /&gt;
* Cores: --cpus-per-task=1 (one core per task)&lt;br /&gt;
* Memory: --mem-per-cpu=2gb (2 GB per core)&lt;br /&gt;
&lt;br /&gt;
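A job submitted without any of these options is therefore treated as if it had been submitted with (a sketch for illustration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --time=02:00:00 --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --mem-per-cpu=2gb &amp;lt;job-script&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;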
== Node Access Policy ==&lt;br /&gt;
&lt;br /&gt;
Node access policy for jobs is &amp;quot;&#039;&#039;&#039;exclusive user&#039;&#039;&#039;&amp;quot;. Nodes will be exclusively allocated to users. &#039;&#039;&#039;Multiple jobs (up to 48) of the same user can run on a single node&#039;&#039;&#039; at any time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; This implies that for &#039;&#039;&#039;sub-node jobs&#039;&#039;&#039;, it is advisable for efficient resource utilization and maximum job throughput to &#039;&#039;&#039;adjust the number of cores to be an integer divisor of 48&#039;&#039;&#039; (total number of cores on each node). For example, two 24-core jobs can run simultaneously on one and the same node, while two 32-core jobs will always have to allocate two separate nodes, but leave 16 cores unused on each of them. Users must therefore always &#039;&#039;&#039;think carefully about how many cores to request&#039;&#039;&#039; and whether their applications really benefit from allocating more cores for their jobs. Similar considerations apply - at the same time - to the &#039;&#039;&#039;requested amount of memory per job&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
Think of it as the scheduler playing a game of multi-dimensional Tetris, where the dimensions are number of cores, amount of memory and other resources. &#039;&#039;&#039;Users can support this by making resource allocations that allow the scheduler to pack their jobs as densely as possible on the nodes&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Memory Management ==&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;wait time of a job also depends largely on the amount of requested resources&#039;&#039;&#039; and the available number of nodes providing this amount of resources. This must be taken into account &#039;&#039;&#039;in particular when requesting a certain amount of memory&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
For example, there is a total of 692 compute nodes in JUSTUS 2, of which 456 nodes have 192 GB RAM. However, &#039;&#039;&#039;not the entire amount of physical RAM is available exclusively for user jobs&#039;&#039;&#039;, because the operating system, system services and local file systems also require a certain amount of RAM.&lt;br /&gt;
This means that if a job requests 192 GB RAM per node (i.e. --mem=192gb, or --ntasks-per-node=48 and --mem-per-cpu=4gb), Slurm will rule out the 192 GB nodes as being suitable for this job and consider only 220 out of 692 nodes as being eligible for running this job.&lt;br /&gt;
&lt;br /&gt;
The following table provides an overview of how much memory can be allocated by user jobs on the various node types and how many nodes can serve this memory requirement:&lt;br /&gt;
&lt;br /&gt;
{| width=500px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Physical RAM on node !! Available RAM on node !! Number of suitable nodes &lt;br /&gt;
|-&lt;br /&gt;
| 192 GB || 187 GB || 692 &lt;br /&gt;
|-&lt;br /&gt;
| 384 GB || 376 GB || 220&lt;br /&gt;
|-&lt;br /&gt;
| 768 GB || 754 GB || 28&lt;br /&gt;
|-&lt;br /&gt;
| 1536 GB || 1510 GB || 8&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
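For example, to remain eligible for all node types, a job should request no more than the available RAM of the smallest nodes:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --mem=187gb   # fits on every node type listed above&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;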
Also note that allocated memory is factored into resource usage accounting for fair share. This means over-requesting memory may have a negative impact on the priority of subsequent jobs.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=13237</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=13237"/>
		<updated>2024-11-11T23:34:00Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces replace [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* None &lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Running your calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=13036</id>
		<title>BwForCluster JUSTUS 2 Slurm HOWTO</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=13036"/>
		<updated>2024-10-23T15:21:00Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
This is a collection of howtos and convenient Slurm commands for JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
Some commands behave slightly differently depending on whether they are executed &lt;br /&gt;
by a system administrator or by a regular user, as Slurm prevents regular users from accessing critical system information and viewing job and usage information of other users.  &lt;br /&gt;
&lt;br /&gt;
= GENERAL INFORMATION =&lt;br /&gt;
&lt;br /&gt;
== How to find a general quick start user guide? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/quickstart.html&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm FAQ? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/faq.html&lt;br /&gt;
&lt;br /&gt;
== How to find a Slurm cheat sheet? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm tutorials? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
== How to get more information on Slurm? ==&lt;br /&gt;
&lt;br /&gt;
(Almost) every Slurm command has a man page. Use it.&lt;br /&gt;
&lt;br /&gt;
Online versions: https://slurm.schedmd.com/man_index.html&lt;br /&gt;
&lt;br /&gt;
== How to find hardware specific details about JUSTUS 2? ==&lt;br /&gt;
&lt;br /&gt;
See our Wiki page: [[Hardware and Architecture (bwForCluster JUSTUS 2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
= JOB SUBMISSION =&lt;br /&gt;
&lt;br /&gt;
== How to submit a serial batch job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html sbatch]  command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample job script template for serial job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=serial_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=serial_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=serial_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Run serial program&lt;br /&gt;
./my_serial_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for serial program: [[Media:Hello_serial.c | Hello_serial.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* --nodes=1 and --ntasks-per-node=1 may be replaced by --ntasks=1.&lt;br /&gt;
* If not specified, stdout and stderr are both written to slurm-%j.out.&lt;br /&gt;
&lt;br /&gt;
== How to find working sample scripts for my program? ==&lt;br /&gt;
&lt;br /&gt;
Most software modules for applications provide working sample batch scripts.&lt;br /&gt;
Check with [[Software_Modules_Lmod#Module_specific_help | module help]] command, e.g. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module help chem/vasp     # display module help for VASP&lt;br /&gt;
$ module help math/matlab   # display module help for Matlab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to harden job scripts against common errors? ==&lt;br /&gt;
&lt;br /&gt;
The bash shell provides several options that support users in disclosing hidden bugs and writing safer job scripts.&lt;br /&gt;
In order to activate these safeguard settings, users can insert the following lines in their scripts (after all #SBATCH directives):    &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
set -o errexit   # (or set -e) cause batch script to exit immediately when a command fails.&lt;br /&gt;
set -o pipefail  # cause batch script to exit immediately also when the command that failed is embedded in a pipeline&lt;br /&gt;
set -o nounset   # (or set -u) causes the script to treat unset variables as an error and exit immediately &lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/salloc.html salloc] command, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ salloc --nodes=1 --ntasks-per-node=8&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
In Slurm versions prior to 20.11, using [https://slurm.schedmd.com/srun.html srun] was the recommended way of launching interactive jobs, e.g.:   &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ srun --nodes=1 --ntasks-per-node=8 --pty bash &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although this still works with current Slurm versions, it is considered &#039;&#039;&#039;deprecated&#039;&#039;&#039;, as it may cause issues when launching additional job steps from within the interactive job environment. Use the [https://slurm.schedmd.com/salloc.html salloc] command instead.&lt;br /&gt;
&lt;br /&gt;
== How to enable X11 forwarding for an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--x11&#039; flag, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --nodes=1 --ntasks-per-node=8 --x11     # run shell with X11 forwarding enabled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
* For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:&lt;br /&gt;
 &amp;lt;pre&amp;gt;local&amp;gt; ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to convert Moab batch job scripts to Slurm? ==&lt;br /&gt;
&lt;br /&gt;
Replace Moab/Torque job specification flags and environment variables in your job&lt;br /&gt;
scripts by their corresponding Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab job specification flags and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Option !! Moab (msub) !! Slurm (sbatch)&lt;br /&gt;
|-&lt;br /&gt;
| Script directive                            || #MSUB                                  || #SBATCH&lt;br /&gt;
|-&lt;br /&gt;
| Job name                                    || -N &amp;lt;name&amp;gt;                              || --job-name=&amp;lt;name&amp;gt;  (-J &amp;lt;name&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Account                                     || -A &amp;lt;account&amp;gt;                           || --account=&amp;lt;account&amp;gt; (-A &amp;lt;account&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Queue                                       || -q &amp;lt;queue&amp;gt;                             || --partition=&amp;lt;partition&amp;gt; (-p &amp;lt;partition&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Wall time limit                             || -l walltime=&amp;lt;hh:mm:ss&amp;gt;                 || --time=&amp;lt;hh:mm:ss&amp;gt; (-t &amp;lt;hh:mm:ss&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node count                                  || -l nodes=&amp;lt;count&amp;gt;                       || --nodes=&amp;lt;count&amp;gt; (-N &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Core count                                  || -l procs=&amp;lt;count&amp;gt;                       || --ntasks=&amp;lt;count&amp;gt; (-n &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Process count per node                      || -l ppn=&amp;lt;count&amp;gt;                         || --ntasks-per-node=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Core count per process                      ||                                        || --cpus-per-task=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per node                       || -l mem=&amp;lt;limit&amp;gt;                         || --mem=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per process                    || -l pmem=&amp;lt;limit&amp;gt;                        || --mem-per-cpu=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Job array                                   || -t &amp;lt;array indices&amp;gt;                     || --array=&amp;lt;indices&amp;gt; (-a &amp;lt;indices&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node exclusive job                          || -l naccesspolicy=singlejob             || --exclusive&lt;br /&gt;
|-&lt;br /&gt;
| Initial working directory                   || -d &amp;lt;directory&amp;gt; (default: $HOME)        || --chdir=&amp;lt;directory&amp;gt; (-D &amp;lt;directory&amp;gt;) (default: submission directory)&lt;br /&gt;
|-&lt;br /&gt;
| Standard output file                        || -o &amp;lt;file path&amp;gt;                         || --output=&amp;lt;file&amp;gt; (-o &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Standard error file                         || -e &amp;lt;file path&amp;gt;                         || --error=&amp;lt;file&amp;gt;  (-e &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Combine stdout/stderr to stdout             || -j oe                                  || --output=&amp;lt;combined stdout/stderr file&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Mail notification events                    || -m &amp;lt;event&amp;gt;                             || --mail-type=&amp;lt;events&amp;gt; (valid types include: NONE, BEGIN, END, FAIL, ALL)&lt;br /&gt;
|-&lt;br /&gt;
| Export environment to job                   || -V                                     || --export=ALL (default)&lt;br /&gt;
|-&lt;br /&gt;
| Don&#039;t export environment to job             || (default)                              || --export=NONE&lt;br /&gt;
|-&lt;br /&gt;
| Export environment variables to job         || -v &amp;lt;var[=value][,var2=value2[, ...]]&amp;gt;  || --export=&amp;lt;var[=value][,var2=value2[,...]]&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.&lt;br /&gt;
* By default Moab does not export any environment variables to the job&#039;s runtime environment. With Slurm most of the login environment variables are exported to your job&#039;s runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).&lt;br /&gt;
&lt;br /&gt;
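As an illustration, here is the header of a simple Moab job script converted line by line using the table above (my_program is a hypothetical placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Moab:  #MSUB -N testjob&lt;br /&gt;
#SBATCH --job-name=testjob&lt;br /&gt;
# Moab:  #MSUB -l nodes=1&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Moab:  #MSUB -l ppn=4&lt;br /&gt;
#SBATCH --ntasks-per-node=4&lt;br /&gt;
# Moab:  #MSUB -l walltime=01:00:00&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
# Moab:  #MSUB -l pmem=2gb&lt;br /&gt;
#SBATCH --mem-per-cpu=2gb&lt;br /&gt;
&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;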
&#039;&#039;&#039;Commonly used Moab/Torque script environment variables and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Information                 !! Moab                !! Torque               !! Slurm                                     &lt;br /&gt;
|-&lt;br /&gt;
| Job name                     || $MOAB_JOBNAME        || $PBS_JOBNAME        || $SLURM_JOB_NAME                           &lt;br /&gt;
|-&lt;br /&gt;
| Job ID                       || $MOAB_JOBID          || $PBS_JOBID          || $SLURM_JOB_ID                             &lt;br /&gt;
|-&lt;br /&gt;
| Submit directory             || $MOAB_SUBMITDIR      || $PBS_O_WORKDIR      || $SLURM_SUBMIT_DIR                         &lt;br /&gt;
|-&lt;br /&gt;
| Number of nodes allocated    || $MOAB_NODECOUNT      || $PBS_NUM_NODES      || $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) &lt;br /&gt;
|-&lt;br /&gt;
| Node list                    || $MOAB_NODELIST       || cat $PBS_NODEFILE   || $SLURM_JOB_NODELIST                       &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes          || $MOAB_PROCCOUNT      || $PBS_TASKNUM        || $SLURM_NTASKS                             &lt;br /&gt;
|-&lt;br /&gt;
| Requested tasks per node     || ---                    || $PBS_NUM_PPN        || $SLURM_NTASKS_PER_NODE                    &lt;br /&gt;
|-&lt;br /&gt;
| Requested CPUs per task      || ---                  || ---                 || $SLURM_CPUS_PER_TASK                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array index              || $MOAB_JOBARRAYINDEX  || $PBS_ARRAY_INDEX    || $SLURM_ARRAY_TASK_ID                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array range              || $MOAB_JOBARRAYRANGE  || ---                 || $SLURM_ARRAY_TASK_COUNT                   &lt;br /&gt;
|-&lt;br /&gt;
| Queue name                   || $MOAB_CLASS          || $PBS_QUEUE          || $SLURM_JOB_PARTITION                      &lt;br /&gt;
|-&lt;br /&gt;
| QOS name                     || $MOAB_QOS            || ---                 || $SLURM_JOB_QOS                            &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes per node || ---                  || $PBS_NUM_PPN        || $SLURM_TASKS_PER_NODE                     &lt;br /&gt;
|-&lt;br /&gt;
| Job user                     || $MOAB_USER           || $PBS_O_LOGNAME      || $SLURM_JOB_USER                           &lt;br /&gt;
|-&lt;br /&gt;
| Hostname                     || $MOAB_MACHINE        || $PBS_O_HOST         || $SLURMD_NODENAME                          &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See [https://slurm.schedmd.com/sbatch.html sbatch] man page for a complete list of flags and environment variables.&lt;br /&gt;
&lt;br /&gt;
== How to emulate Moab output file names? ==&lt;br /&gt;
&lt;br /&gt;
Use the following directives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --output=&amp;quot;%x.o%j&amp;quot;&lt;br /&gt;
#SBATCH --error=&amp;quot;%x.e%j&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to pass command line arguments to the job script? ==&lt;br /&gt;
&lt;br /&gt;
Run &amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; arg1 arg2 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inside the job script the arguments can be accessed as $1, $2, ...&lt;br /&gt;
&lt;br /&gt;
E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
infile=&amp;quot;$1&amp;quot;&lt;br /&gt;
outfile=&amp;quot;$2&amp;quot;&lt;br /&gt;
./my_serial_program &amp;lt; &amp;quot;$infile&amp;quot; &amp;gt; &amp;quot;$outfile&amp;quot; 2&amp;gt;&amp;amp;1&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; use $1, $2, ... in &amp;quot;#SBATCH&amp;quot; lines. These parameters can be used only within the regular shell script.&lt;br /&gt;
&lt;br /&gt;
== How to request local scratch (SSD/NVMe) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=scratch:nnn&#039; option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=scratch:100&#039; will allocate 100 GB scratch space on a locally attached NVMe device.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; add any unit (such as --gres=scratch:100G). This would be treated as requesting an amount of 10^9 * 100GB of scratch space.&lt;br /&gt;
&lt;br /&gt;
* Multinode jobs get nnn GB of local scratch space on every node of the job.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$SCRATCH&#039;&#039;&#039; will point to &lt;br /&gt;
** /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when local scratch has been requested. This will be on locally attached SSD/NVMe devices.&lt;br /&gt;
** /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when no local scratch has been requested. This will be in memory and, thus, be limited in size.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039; always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;. This will always be in memory and, thus, limited in size.&lt;br /&gt;
&lt;br /&gt;
* For backward compatibility environment variable $RAMDISK always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Scratch space allocation in /scratch will be enforced by quota limits&lt;br /&gt;
&lt;br /&gt;
* Data written to $TMPDIR will always count against allocated memory.&lt;br /&gt;
&lt;br /&gt;
* Data written to local scratch space will automatically be removed at the end of the job.&lt;br /&gt;
&lt;br /&gt;
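A minimal sketch of a batch script that requests local scratch and works inside it (the program placeholder is hypothetical):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=00:30:00&lt;br /&gt;
#SBATCH --gres=scratch:100         # 100 GB node-local scratch, no unit suffix (see note above)&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Scratch directory: $SCRATCH&amp;quot;   # points to /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; because local scratch was requested&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
# ./my_program                     # run your calculation here; /scratch is cleaned up automatically after the job&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;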
== How to request GPGPU nodes at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=gpu:&amp;lt;count&amp;gt;&#039; option to allocate 1 or 2 GPUs per node for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=gpu:1&#039; will allocate one GPU per node for this job.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* GPGPU nodes are equipped with two Nvidia V100S cards &lt;br /&gt;
&lt;br /&gt;
* Environment variables $CUDA_VISIBLE_DEVICES, $SLURM_JOB_GPUS and $GPU_DEVICE_ORDINAL will denote card(s) allocated for the job.&lt;br /&gt;
&lt;br /&gt;
* CUDA Toolkit is available as software module devel/cuda.&lt;br /&gt;
&lt;br /&gt;
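A minimal sketch of a single-GPU batch job based on the notes above (the program name is a hypothetical placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --gres=gpu:1               # one of the two V100S cards per node&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
&lt;br /&gt;
module load devel/cuda             # CUDA Toolkit module (see note above)&lt;br /&gt;
echo &amp;quot;Allocated GPU(s): $CUDA_VISIBLE_DEVICES&amp;quot;&lt;br /&gt;
# ./my_gpu_program                 # run your GPU calculation here&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;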
== How to clean-up or save files before a job times out? ==&lt;br /&gt;
&lt;br /&gt;
Possibly you would like to clean up the work directory or save intermediate result files in case a job times out.&lt;br /&gt;
&lt;br /&gt;
The following sample script may serve as a blueprint for implementing a pre-termination function to perform clean-up or file recovery actions. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 2 GB memory required per node&lt;br /&gt;
#SBATCH --mem=2G&lt;br /&gt;
# Request 10 GB local scratch space&lt;br /&gt;
#SBATCH --gres=scratch:10&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
# Send the USR1 signal 120 seconds before end of time limit&lt;br /&gt;
#SBATCH --signal=B:USR1@120&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=signal_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=signal_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=signal_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Define the signal handler function&lt;br /&gt;
# Note: This is not executed here, but rather when the associated &lt;br /&gt;
# signal is received by the shell.&lt;br /&gt;
finalize_job()&lt;br /&gt;
{&lt;br /&gt;
    # Do whatever cleanup you want here. In this example we copy&lt;br /&gt;
    # output file(s) back to $SLURM_SUBMIT_DIR, but you may implement &lt;br /&gt;
    # your own job finalization code here.&lt;br /&gt;
    echo &amp;quot;function finalize_job called at `date`&amp;quot;&lt;br /&gt;
    cd $SCRATCH&lt;br /&gt;
    mkdir -vp &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results&lt;br /&gt;
    tar czvf &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results/${SLURM_JOB_ID}.tgz output*.txt&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Call finalize_job function as soon as we receive USR1 signal&lt;br /&gt;
trap &#039;finalize_job&#039; USR1&lt;br /&gt;
&lt;br /&gt;
# Copy input files for this job to the scratch directory (if needed).&lt;br /&gt;
# Note: Environment variable $SCRATCH always points to a scratch directory &lt;br /&gt;
# automatically created for this job. Environment variable $SLURM_SUBMIT_DIR &lt;br /&gt;
# points to the path where this script was submitted from.&lt;br /&gt;
# Example:&lt;br /&gt;
# cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input*.txt &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Change working directory to local scratch directory&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# This is where the actual work is done. In this case we just create &lt;br /&gt;
# a sample output file for 900 (=15*60) seconds, but since we asked &lt;br /&gt;
# Slurm for 600 seconds only it will not be able finish within this &lt;br /&gt;
# wall time.&lt;br /&gt;
# Note: It is important to run this task in the background &lt;br /&gt;
# by placing the &amp;amp; symbol at the end. Otherwise the signal handler &lt;br /&gt;
# would not be executed until that process has finished, which is not &lt;br /&gt;
# what we want.&lt;br /&gt;
(for i in `seq 15`; do echo &amp;quot;Hello World at `date +%H:%M:%S`.&amp;quot;; sleep 60; done) &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Note: The command above is just for illustration. Normally you would just run&lt;br /&gt;
# my_program &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Tell the shell to wait for background task(s) to finish. &lt;br /&gt;
# Note: This is important because otherwise the parent shell &lt;br /&gt;
# (this script) would proceed (and terminate) without waiting for &lt;br /&gt;
# background task(s) to finish.&lt;br /&gt;
wait&lt;br /&gt;
&lt;br /&gt;
# If we get here, the job did not time out but finished in time.&lt;br /&gt;
&lt;br /&gt;
# Release user defined signal handler for USR1&lt;br /&gt;
trap - USR1&lt;br /&gt;
&lt;br /&gt;
# Do regular cleanup and save files. In this example we simply call &lt;br /&gt;
# the same function that we defined as a signal handler above, but you &lt;br /&gt;
# may implement your own code here. &lt;br /&gt;
finalize_job&lt;br /&gt;
&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The number of seconds specified in the --signal option must be large enough to cover the runtime of the pre-termination function and must not exceed 65535 seconds.&lt;br /&gt;
&lt;br /&gt;
* Due to the resolution of event handling by Slurm, the signal may be sent a little earlier than specified.&lt;br /&gt;
&lt;br /&gt;
== How to submit a multithreaded batch job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a job running one multithreaded program instance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Number of cores per program instance&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=multithreaded_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=multithreaded_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=multithreaded_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
&lt;br /&gt;
# Run multithreaded program&lt;br /&gt;
./my_multithreaded_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for multithreaded program: [[Media:Hello_openmp.c | Hello_openmp.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* In our configuration each physical core is considered a &amp;quot;CPU&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* On JUSTUS 2 it is recommended to specify a number of cores per task (&#039;--cpus-per-task&#039;) that is either an integer divisor of 24 or (at most) 48.&lt;br /&gt;
&lt;br /&gt;
* Required memory can also by specified per allocated CPU with &#039;--mem-per-cpu&#039; option. &lt;br /&gt;
&lt;br /&gt;
* The &#039;--mem&#039; and &#039;--mem-per-cpu&#039; options are mutually exclusive.&lt;br /&gt;
&lt;br /&gt;
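For instance, the 8 GB of the sample script above could equivalently be requested per allocated core (a sketch):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
#SBATCH --mem-per-cpu=1G    # 8 cores x 1 GB = 8 GB in total&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;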
==  How to submit an array job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_array -a] (or [https://slurm.schedmd.com/sbatch.html#OPT_array --array]) option, e.g. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -a 1-16%8 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an array job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of cores per individual array task&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --array=1-16%8&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=array_job&lt;br /&gt;
#SBATCH --output=array_job-%A_%a.out&lt;br /&gt;
#SBATCH --error=array_job-%A_%a.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Print the task id.&lt;br /&gt;
echo &amp;quot;My SLURM_ARRAY_TASK_ID: &amp;quot; $SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
# ./my_program &amp;lt;input.$SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.&lt;br /&gt;
&lt;br /&gt;
* Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the job ID of the first array task and is identical for all tasks of the array.&lt;br /&gt;
&lt;br /&gt;
* The remaining options in the sample job script are the same as the options used in other, non-array jobs. In the example above, we are requesting that each array task be allocated 1 CPU (--ntasks=1) and 4 GB of memory (--mem=4G) for up to one hour (--time=01:00:00).&lt;br /&gt;
&lt;br /&gt;
* More information: https://slurm.schedmd.com/job_array.html&lt;br /&gt;
&lt;br /&gt;
== How to delay the start of a job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_begin -b] (or [https://slurm.schedmd.com/sbatch.html#OPT_begin --begin]) option in order to defer the allocation of the job until the specified time.&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --begin=20:00 ...               # job can start after 8 p.m. &lt;br /&gt;
sbatch --begin=now+1hour ...           # job can start 1 hour after submission&lt;br /&gt;
sbatch --begin=teatime ...             # job can start at teatime (4 p.m.)&lt;br /&gt;
sbatch --begin=2023-12-24T20:00:00 ... # job can start after specified date/time&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit dependency (chain) jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_dependency -d] (or [https://slurm.schedmd.com/sbatch.html#OPT_dependency --dependency]) option, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterany:123456 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified job 123456 has terminated.&lt;br /&gt;
&lt;br /&gt;
Slurm supports a number of different dependency types, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-d after:123456      # job can begin execution after the specified job has begun execution&lt;br /&gt;
-d afterany:123456   # job can begin execution after the specified job has finished&lt;br /&gt;
-d afternotok:123456 # job can begin execution after the specified job has failed (exit code not equal zero)&lt;br /&gt;
-d afterok:123456    # job can begin execution after the specified job has successfully finished (exit code zero)&lt;br /&gt;
-d singleton         # job can begin execution after all previously launched jobs with the same job name and user have finished&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Multiple jobs can be specified by separating their job ids by colon characters (:), e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ sbatch -d afterany:123456:123457 ... &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified jobs 123456 and 123457 have both finished.&lt;br /&gt;
&lt;br /&gt;
== How to deal with invalid job dependencies? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_kill-on-invalid-dep --kill-on-invalid-dep=yes] option in order to automatically terminate jobs which can never run due to invalid dependencies. By default the job stays pending with reason &#039;DependencyNeverSatisfied&#039; to allow review and appropriate action by the user.  &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; A job dependency may also become invalid if a job has been submitted with &#039;-d afterok:&amp;lt;jobid&amp;gt;&#039; but the specified dependency job has failed, e.g. because it timed out (i.e. exceeded its wall time limit).&lt;br /&gt;
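&lt;br /&gt;
For example, to submit a chain job that is cancelled automatically instead of pending forever when its predecessor fails (the job id is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterok:123456 --kill-on-invalid-dep=yes ...&amp;lt;/pre&amp;gt;&lt;br /&gt;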
&lt;br /&gt;
== How to submit an MPI batch job? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/mpi_guide.html&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an MPI job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate two nodes&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
# Allocate 32 GB memory per node&lt;br /&gt;
#SBATCH --mem=32gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=mpi_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=mpi_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
#&lt;br /&gt;
# Option 1: Launch MPI tasks by using mpirun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/intel&lt;br /&gt;
# module load mpi/impi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# Option 2: Launch MPI tasks by using srun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# srun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun  ./my_mpi_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for MPI program: [[Media:Hello_mpi.c | Hello_mpi.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently across MPI implementations and versions, and there are reports of &amp;quot;strange behavior&amp;quot; of mpirun, especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.&lt;br /&gt;
* Do not run batch jobs that launch a large number (hundreds or thousands) of short-running (a few minutes or less) MPI programs, e.g. from a shell loop. Every single MPI invocation generates its own job step and sends remote procedure calls to the Slurm controller. This can result in degraded performance for both Slurm and the application, especially if many such jobs happen to run at the same time. Jobs of that kind can even get stuck without showing any further activity until hitting the wall time limit. For high throughput computing (e.g. processing a large number of files, with every task running independently of the others and only for a short time), consider a more appropriate parallelization paradigm that invokes independent serial (non-MPI) processes in parallel at the same time. This approach is sometimes referred to as a &amp;quot;[https://en.wikipedia.org/wiki/Embarrassingly_parallel pleasingly parallel]&amp;quot; workload. GNU Parallel is a shell tool that facilitates executing serial tasks in parallel; on JUSTUS 2 it is available as the software module &amp;quot;system/parallel&amp;quot; (see the sketch below).&lt;br /&gt;
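&lt;br /&gt;
A minimal sketch of such a pleasingly parallel job with GNU Parallel (program and input file names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# One task with as many cores as serial processes to run concurrently&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
&lt;br /&gt;
# Load GNU Parallel&lt;br /&gt;
module load system/parallel&lt;br /&gt;
&lt;br /&gt;
# Run one serial (non-MPI) process per allocated core;&lt;br /&gt;
# {} is replaced by one input file name at a time.&lt;br /&gt;
parallel -j ${SLURM_CPUS_PER_TASK} ./my_serial_program {} ::: input_*.dat&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;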
&lt;br /&gt;
== How to submit a hybrid MPI/OpenMP job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a hybrid job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of nodes to allocate&lt;br /&gt;
#SBATCH --nodes=4&lt;br /&gt;
# Number of MPI instances (ranks) to be executed per node&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
# Number of threads per MPI instance&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
# Allocate 8 GB memory per node&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=hybrid_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=hybrid_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=hybrid_job-%j.err&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
# or:&lt;br /&gt;
# mpirun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for hybrid program: [[Media:Hello_hybrid.c | Hello_hybrid.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* $SLURM_CPUS_PER_TASK is only set if the &#039;--cpus-per-task&#039; option is specified.&lt;br /&gt;
&lt;br /&gt;
== How to request specific node(s) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_nodelist -w] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodelist --nodelist]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -w &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also see [https://slurm.schedmd.com/sbatch.html#OPT_nodefile -F] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodefile --nodefile]) option.&lt;br /&gt;
&lt;br /&gt;
== How to exclude specific nodes from job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_exclude -x] (or [https://slurm.schedmd.com/sbatch.html#OPT_exclude --exclude]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -x &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get exclusive jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive&#039; option on job submission. This makes sure that there will be no other jobs running on your nodes. Very useful for benchmarking!&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The &#039;--exclusive&#039; option does &#039;&#039;&#039;not&#039;&#039;&#039; mean that you automatically get full access to all the resources the node might provide; you still have to request them explicitly.&lt;br /&gt;
&lt;br /&gt;
== How to avoid sharing nodes with other users? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive=user&#039; option on job submission. This will still allow multiple jobs of one and the same user on the nodes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Depending on configuration, exclusive=user may (and probably will) be the default node access policy on JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
==  How to submit batch job without job script? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_wrap --wrap] option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --nodes=2 --ntasks-per-node=16 --wrap &amp;quot;sleep 600&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; May be useful for testing purposes.&lt;br /&gt;
&lt;br /&gt;
= JOB MONITORING AND CONTROL =&lt;br /&gt;
&lt;br /&gt;
== How to prevent Slurm performance degradation? ==&lt;br /&gt;
&lt;br /&gt;
Almost every invocation of a Slurm client command (e.g. squeue, sacct, sprio or sshare) sends a remote procedure call (RPC) to the Slurm control daemon and/or database. &lt;br /&gt;
If enough remote procedure calls come in at once, this can result in a degradation of performance of the Slurm services for all users, possibly resulting in a denial of service. &lt;br /&gt;
&lt;br /&gt;
Therefore, &#039;&#039;&#039;do not run Slurm client commands that send remote procedure calls from loops in shell scripts or other programs&#039;&#039;&#039; (such as &#039;watch squeue&#039;). Always limit calls to squeue, sstat, sacct etc. to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
Slurm does collect RPC counts and timing statistics by message type and user for diagnostic purposes.&lt;br /&gt;
&lt;br /&gt;
== How to view information about submitted jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] command, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue                  # all jobs owned by user (all jobs owned by all users for admins)&lt;br /&gt;
$ squeue --me             # all jobs owned by user (same as squeue for regular users)&lt;br /&gt;
$ squeue -u &amp;lt;username&amp;gt;    # jobs of specific user&lt;br /&gt;
$ squeue -t PENDING       # pending jobs only&lt;br /&gt;
$ squeue -t RUNNING       # running jobs only&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
* The output format of [https://slurm.schedmd.com/squeue.html squeue] (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.&lt;br /&gt;
&lt;br /&gt;
* Every invocation of squeue sends a remote procedure call to the Slurm controller. &#039;&#039;&#039;Do not run squeue or other Slurm client commands from loops in shell scripts or other programs&#039;&#039;&#039; as this can result in a degradation of performance. Ensure that programs limit calls to squeue to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
== How to cancel jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scancel.html scancel] command, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;         # cancel specific job&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;_&amp;lt;index&amp;gt; # cancel indexed job in a job array&lt;br /&gt;
$ scancel -u &amp;lt;username&amp;gt;   # cancel all jobs of specific user&lt;br /&gt;
$ scancel -t PENDING      # cancel pending jobs&lt;br /&gt;
$ scancel -t RUNNING      # cancel running jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to show job script of a running job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scontrol.html scontrol] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; &amp;lt;file&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If the file name is omitted, the default file name will be slurm-&amp;lt;job_id&amp;gt;.sh.&lt;br /&gt;
* If the file name is - (i.e. a dash), the job script will be written to stdout.&lt;br /&gt;
&lt;br /&gt;
== How to get estimated start time of a job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ squeue --start&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Estimated start times are dynamic and can change at any moment. Exact start times of individual jobs are usually unpredictable.&lt;br /&gt;
* Slurm will report N/A for the start time estimate if nodes are not currently being reserved by the scheduler for the job to run on.&lt;br /&gt;
&lt;br /&gt;
== How to show remaining walltime of running jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format option &amp;quot;%L&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ squeue -t r -o &amp;quot;%u %i %L&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check priority of jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format options &amp;quot;%Q&amp;quot; and/or &amp;quot;%p&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -o &amp;quot;%8i %8u %15a %.10r %.10L %.5D %.10Q&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sprio.html sprio] command to display the priority components (age/fairshare/...) for each job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sprio&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sshare.html sshare] command to list the shares of associations, e.g. accounts.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sshare&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent (hold) jobs from being scheduled for execution? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol hold &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to unhold job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol release &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to suspend a running job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol suspend &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume a suspended job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol resume &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to requeue (cancel and resubmit) a particular job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol requeue &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to monitor resource usage of running job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sstat.html sstat] command.&lt;br /&gt;
&lt;br /&gt;
The &#039;sstat -e&#039; command shows a list of fields that can be specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sstat --format=JobId,AveCPU,AveRSS,MaxRSS -j &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will show the average CPU time as well as the average and maximum memory consumption of all tasks in the running job.&lt;br /&gt;
Ideally, the average CPU time equals the number of cores allocated for the job multiplied by the current run time of the job. For example, a job that has been running on 8 cores for 2 hours should ideally show about 16 hours of accumulated CPU time.&lt;br /&gt;
The maximum memory consumption gives an estimate of the peak amount of memory actually needed so far. This can be compared with the amount of memory requested for the job. Over-requesting memory can result in significant waste of compute resources.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Users can also ssh into compute nodes that they have one or more running jobs on. Once logged in, they can use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ...&lt;br /&gt;
&lt;br /&gt;
* Users can also attach an interactive shell under an already allocated job by running the following command: &amp;lt;pre&amp;gt;srun --jobid &amp;lt;job&amp;gt; --overlap --pty /bin/bash&amp;lt;/pre&amp;gt; Once logged in, they can again use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ... For a single node job the user does not even need to know the node that the job is running on. For a multinode job, the user can still use &#039;-w &amp;lt;node&amp;gt;&#039; option to specify a specific node.&lt;br /&gt;
&lt;br /&gt;
== How to get detailed job information ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show job 1234  # For job id 1234&lt;br /&gt;
$ scontrol show jobs      # For all jobs&lt;br /&gt;
$ scontrol -o show jobs   # One line per job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to modify a pending/running job? ==&lt;br /&gt;
&lt;br /&gt;
Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol update JobId=&amp;lt;jobid&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
E.g.: &amp;lt;pre&amp;gt;$ scontrol update JobId=42 TimeLimit=7-0&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will modify the time limit of the job to 7 days.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Update requests for &#039;&#039;&#039;running&#039;&#039;&#039; jobs are mostly restricted to Slurm administrators. In particular, only an administrator can increase the TimeLimit of a job.&lt;br /&gt;
&lt;br /&gt;
== How to show accounting data of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sacct.html sacct] command.&lt;br /&gt;
&lt;br /&gt;
The &#039;sacct -e&#039; command shows a list of fields that can be specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
== How to retrieve job history and accounting? ==&lt;br /&gt;
&lt;br /&gt;
For a specific job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a specific user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note: Default time window is the current day.&lt;br /&gt;
&lt;br /&gt;
Starting from a specific date:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within a time window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 -E 2020-01-31 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
You can also set the environment variable $SACCT_FORMAT to specify the default format. To get a general idea of how efficiently a job utilized its resources, the following format can be used:  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export SACCT_FORMAT=&amp;quot;JobID,JobName,Elapsed,NCPUs,TotalCPU,CPUTime,ReqMem,MaxRSS,MaxDiskRead,MaxDiskWrite,State,ExitCode&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find out how efficiently the CPUs were used, divide TotalCPU by CPUTime. To find out how efficiently memory was used, divide MaxRSS by ReqMem. But be aware that sacct&#039;s memory usage measurement does not catch very rapid memory spikes. If your job got killed for running out of memory, it &#039;&#039;&#039;did run out of memory&#039;&#039;&#039;, even if sacct reports a lower memory usage than would trigger an out-of-memory kill. A job that reads or writes excessively to disk might be bogged down significantly by I/O operations.&lt;br /&gt;
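&lt;br /&gt;
A minimal sketch of such a check (the job id is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j 123456 --format=JobID,Elapsed,NCPUs,TotalCPU,CPUTime,ReqMem,MaxRSS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If, say, TotalCPU were 10 hours and CPUTime 20 hours, the CPU efficiency would be 10/20 = 50%.&lt;br /&gt;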
&lt;br /&gt;
== How to get efficiency information of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;pre&amp;gt;$ seff &amp;lt;jobid&amp;gt; &amp;lt;/pre&amp;gt; command for some brief information.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; It is good practice to have a look at the efficiency of your job(s) on completion &#039;&#039;&#039;and we expect you to do so&#039;&#039;&#039;. This way you can improve your job specifications in the future.&lt;br /&gt;
&lt;br /&gt;
== How to get complete field values from sstat and sacct commands? ==&lt;br /&gt;
&lt;br /&gt;
When using the [https://slurm.schedmd.com/sacct.html#OPT_format --format] option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.&lt;br /&gt;
&lt;br /&gt;
E.g. &#039;--format=User%30&#039; will print 30 characters for the user name (right justified).  A %-30 will print 30 characters left justified.&lt;br /&gt;
&lt;br /&gt;
sstat and sacct also provide the &#039;--parsable&#039; and &#039;--parsable2&#039; options to always print full field values, delimited with a pipe &#039;|&#039; character by default.&lt;br /&gt;
The delimiting character can be specified with the &#039;--delimiter&#039; option, e.g. &#039;--delimiter=&amp;quot;,&amp;quot;&#039; for comma separated values.&lt;br /&gt;
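&lt;br /&gt;
Examples (the job id is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j 123456 --format=User%-30,JobID,State            # 30 characters for the user name, left justified&lt;br /&gt;
$ sacct -j 123456 --format=User,JobID,State --parsable2    # full field values, pipe delimited&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;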
&lt;br /&gt;
== How to retrieve job records for all jobs running/pending at a certain point in time? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] with [https://slurm.schedmd.com/sacct.html#OPT_state -s &amp;lt;state&amp;gt;] and [https://slurm.schedmd.com/sacct.html#OPT_starttime -S &amp;lt;start time&amp;gt;] options, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$  sacct -n -a -X -S 2021-04-01T00:00:00 -s R -o JobID,User%15,Account%10,NCPUS,NNodes,NodeList%1500&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When specifying the state &amp;quot;-s &amp;lt;state&amp;gt;&amp;quot; &#039;&#039;&#039;and&#039;&#039;&#039; the start time &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;, the default time window will be set with the end time &amp;quot;-E&amp;quot; equal to the start time. Thus, you will get a snapshot of all running/pending jobs at the instant given by &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== How to get a parsable list of hostnames from $SLURM_JOB_NODELIST? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show hostnames $SLURM_JOB_NODELIST&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
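&lt;br /&gt;
This is handy, e.g., for iterating over the nodes of a job from within a job script. A minimal sketch:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Expand the compact node list (e.g. n[0001-0002]) into one hostname per line&lt;br /&gt;
for host in $(scontrol show hostnames &amp;quot;$SLURM_JOB_NODELIST&amp;quot;); do&lt;br /&gt;
    echo &amp;quot;allocated node: $host&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;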
&lt;br /&gt;
= ADMINISTRATION =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Most commands in this section are restricted to system administrators.&lt;br /&gt;
&lt;br /&gt;
== How to stop Slurm from scheduling jobs? ==&lt;br /&gt;
&lt;br /&gt;
You can stop Slurm from scheduling jobs on a per partition basis by&lt;br /&gt;
setting that partition&#039;s state to DOWN. Set its state UP to resume&lt;br /&gt;
scheduling. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update PartitionName=foo State=DOWN&lt;br /&gt;
$ scontrol update PartitionName=foo State=UP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to print actual hardware configuration of a node? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ slurmd -C   # print hardware configuration plus uptime&lt;br /&gt;
$ slurmd -G   # print generic resource configuration&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to reboot (all) nodes as soon as they become idle? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # specific nodes&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME ALL              # all nodes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to cancel pending reboot of nodes? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol cancel_reboot &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check current node status? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show node &amp;lt;node&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to instruct all Slurm daemons to re-read the configuration file ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reconfigure&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent a user from submitting new jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the following [https://slurm.schedmd.com/sacctmgr.html sacctmgr] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Job submission is then rejected with the following message:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch job.slurm&lt;br /&gt;
sbatch: error: AssocMaxSubmitJobLimit&lt;br /&gt;
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user&#039;s size and/or time limits)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Use the following command to release the limit:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=-1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to drain node(s)? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=DRAIN Reason=&amp;quot;Some Reason&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Reason is mandatory.&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; just set state DOWN to drain nodes. This will kill any active jobs that may be running on those nodes.&lt;br /&gt;
&lt;br /&gt;
== How to list reason for nodes being drained or down? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume node state? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=RESUME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create a reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/reservations.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now duration=UNLIMITED flags=maint,ignore_jobs nodes=ALL&lt;br /&gt;
$ scontrol create reservation user=root starttime=2020-12-24T17:00 duration=12:00:00 flags=maint,ignore_jobs nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
$ scontrol show reservation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Add the &amp;quot;FLEX&amp;quot; flag to allow jobs that qualify for the reservation to start before the reservation begins (and continue after it ends).&lt;br /&gt;
Add &amp;quot;MAGNETIC&amp;quot; flag to attract jobs that qualify for the reservation to run in that reservation without having requested it at submit time.&lt;br /&gt;
&lt;br /&gt;
== How to create a floating reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Use the flag &amp;quot;TIME_FLOAT&amp;quot; and a start time that is relative to the current time (use the keyword &amp;quot;now&amp;quot;).&lt;br /&gt;
In the example below, the nodes are prevented from starting any jobs exceeding a walltime of 2 days.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now+2days duration=UNLIMITED flags=maint,ignore_jobs,time_float nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Floating reservations are not intended to run jobs, but to prevent long running jobs from being initiated on specific nodes. Attempts by users to make use of a floating reservation will be rejected. When ready to perform the maintenance, place the nodes in DRAIN state and delete the reservation.&lt;br /&gt;
&lt;br /&gt;
== How to use a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --reservation=foo_6 ... script.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol delete ReservationName=foo_6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;mdiag -n&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fields can be individually customized. See [https://slurm.schedmd.com/sinfo.html sinfo] man page. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N --format=&amp;quot;%8N %12P %.4C %.8O %.6m %.6e %.8T %.20E&amp;quot;&lt;br /&gt;
&lt;br /&gt;
NODELIST PARTITION    CPUS CPU_LOAD MEMORY FREE_M    STATE               REASON&lt;br /&gt;
n0001    standard*    0/16     0.01 128000 120445     idle                 none&lt;br /&gt;
n0002    standard*    0/16     0.01 128000 120438     idle                 none&lt;br /&gt;
n0003    standard*    0/0/      N/A 128000    N/A    down*       Not responding&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;pbsnodes&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show nodes                     # One paragraph per node (all nodes)&lt;br /&gt;
$ scontrol show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;     # One paragraph per node (specified nodes) &lt;br /&gt;
$ scontrol -o show nodes                  # One line per node (all nodes)&lt;br /&gt;
$ scontrol -o show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # One line per node (specified nodes)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to update multiple jobs of a user with a single scontrol command? ==&lt;br /&gt;
&lt;br /&gt;
Not possible. But you can, e.g., use squeue to build such a script, taking advantage of its filtering and formatting options.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -tpd -h -o &amp;quot;scontrol update jobid=%i priority=1000&amp;quot; &amp;gt;my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
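&lt;br /&gt;
After reviewing the generated commands, execute them:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ cat my.script     # review the generated scontrol commands first&lt;br /&gt;
$ sh my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;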
&lt;br /&gt;
You can also identify the list of jobs and add them to the JobID all at once, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobID=123 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=123,456,789 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=[123-400],[500-600] qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option is to use the JobName, if all the jobs have the same name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobName=&amp;quot;foobar&amp;quot; UserID=johndoe qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, Slurm does not allow the UserID filter alone.&lt;br /&gt;
&lt;br /&gt;
== How to create a new account? ==&lt;br /&gt;
&lt;br /&gt;
Add account at top level in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; Cluster=justus Description=&amp;quot;Account description&amp;quot; Organization=&amp;quot;none&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Add account as child of some parent account in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; parent=&amp;lt;parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to move account to another parent? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set parent=&amp;lt;new_parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr delete account name=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add a new user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; DefaultAccount=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add/remove users from an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname&amp;gt;                  # Add user to account&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname2&amp;gt;                 # Add user to a second account&lt;br /&gt;
$ sacctmgr remove user &amp;lt;username&amp;gt; where account=&amp;lt;accountname&amp;gt;         # Remove user from this account&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to change default account of a user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;  &lt;br /&gt;
$ sacctmgr modify user where user=&amp;lt;username&amp;gt; set DefaultAccount=&amp;lt;default_account&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The user must already be associated with the account you want to set as default.&lt;br /&gt;
&lt;br /&gt;
== How to show account information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show assoc&lt;br /&gt;
$ sacctmgr show assoc tree&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to implement user resource throttling policies? ==&lt;br /&gt;
&lt;br /&gt;
Quoting from https://bugs.schedmd.com/show_bug.cgi?id=3600#c4&lt;br /&gt;
&lt;br /&gt;
 With Slurm, the associations are meant to establish base limits on the&lt;br /&gt;
 defined partitions, accounts and users. Because limits propagate down&lt;br /&gt;
 through the association tree, you only need to define limits at a high&lt;br /&gt;
 level and those limits will be applied to all partitions, accounts and&lt;br /&gt;
 users that are below it (parent to child). You can also override those&lt;br /&gt;
 high level (parent) limits by explicitly setting different limits at&lt;br /&gt;
 any lower level (on the child). So using the association tree is the&lt;br /&gt;
 best way to get some base limits applied that you want for most cases. &lt;br /&gt;
 QOS&#039;s are meant to override any of those base limits for exceptional&lt;br /&gt;
 cases. Like Maui, you can use QOS&#039;s to set a different priority.&lt;br /&gt;
 Again, the QOS would be overriding the base priority that could be set&lt;br /&gt;
 in the associations.&lt;br /&gt;
&lt;br /&gt;
== How to set a resource limit for an individual user? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/resource_limits.html&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=1            # Limit maximum number of running jobs for user&lt;br /&gt;
$ sacctmgr list assoc user=&amp;lt;username&amp;gt; format=user,maxjobs  # Show that limit&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=-1           # Remove that limit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to retrieve historical resource usage for a specific user or account? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sreport.html sreport] command.&lt;br /&gt;
&lt;br /&gt;
Examples: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours user=&amp;lt;username&amp;gt;    # Report cluster utilization of given user broken down by accounts&lt;br /&gt;
$ sreport cluster AccountUtilizationByUser Start=2021-01-01 End=2021-12-31 -t Hours account=&amp;lt;account&amp;gt;  # Report cluster utilization of given account broken down by users    &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* By default CPU resources will be reported. Use &#039;-T&#039; option for other trackable resources, e.g. &#039;-T cpu,mem,gres/gpu,gres/scratch&#039;.&lt;br /&gt;
* On JUSTUS 2 registered compute projects (&amp;quot;Rechenvorhaben&amp;quot;) are uniquely mapped to Slurm accounts of the same name. Thus, &#039;AccountUtilizationByUser&#039; can also be used to report the aggregated cluster utilization of compute projects.&lt;br /&gt;
* Can be executed by regular users as well, in which case Slurm will only report their own usage records (but along with the total usage of the associated account in the case of &#039;AccountUtilizationByUser&#039;).&lt;br /&gt;
&lt;br /&gt;
== How to fix/reset a user&#039;s RawUsage value? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; where Account=&amp;lt;account&amp;gt; set RawUsage=&amp;lt;number&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create/modify/delete QOSes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/qos.html&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show qos                                      # Show existing QOSes&lt;br /&gt;
$ sacctmgr add qos verylong                              # Create new QOS verylong&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxWall=28-00:00:00   # Set maximum walltime limit&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxTRESPerUser=cpu=4  # Set maximum number of CPUs a user can allocate at a given time&lt;br /&gt;
$ sacctmgr modify qos verylong set flags=denyonlimit     # Prevent submission if job requests exceed any limits of QOS&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos+=verylong      # Add a QOS to a user account&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos-=verylong      # Remove a QOS from a user account&lt;br /&gt;
$ sacctmgr delete qos verylong                           # Delete that QOS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to find (and fix) runaway jobs? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr show runaway&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Runaway jobs are orphaned jobs that don&#039;t exist in the Slurm controller but have a start time and no end time in the Slurm database. Runaway jobs mess with accounting and affect new jobs of users who have too many runaway jobs.&lt;br /&gt;
* If there are jobs in this state, this command will also provide an option to fix them. This will set the end time for each job to the latest of the start, eligible, or submit times, and set the state to completed.&lt;br /&gt;
&lt;br /&gt;
== How to show a history of database transactions? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr list transactions&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Useful to get timestamps for when a user/account/qos has been created/modified/removed etc.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=13035</id>
		<title>BwForCluster JUSTUS 2 Slurm HOWTO</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=13035"/>
		<updated>2024-10-23T15:19:01Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
This is a collection of howtos and convenient Slurm commands for JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
Some commands behave slightly differently depending on whether they are executed &lt;br /&gt;
by a system administrator or by a regular user, as Slurm prevents regular users from accessing critical system information and viewing job and usage information of other users.&lt;br /&gt;
&lt;br /&gt;
= GENERAL INFORMATION =&lt;br /&gt;
&lt;br /&gt;
== How to find a general quick start user guide? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/quickstart.html&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm FAQ? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/faq.html&lt;br /&gt;
&lt;br /&gt;
== How to find a Slurm cheat sheet? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm tutorials? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
== How to get more information on Slurm? ==&lt;br /&gt;
&lt;br /&gt;
(Almost) every Slurm command has a man page. Use it.&lt;br /&gt;
&lt;br /&gt;
Online versions: https://slurm.schedmd.com/man_index.html&lt;br /&gt;
&lt;br /&gt;
== How to find hardware specific details about JUSTUS 2? ==&lt;br /&gt;
&lt;br /&gt;
See our Wiki page: [[Hardware and Architecture (bwForCluster JUSTUS 2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
= JOB SUBMISSION =&lt;br /&gt;
&lt;br /&gt;
== How to submit a serial batch job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html sbatch]  command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a serial job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=serial_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=serial_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=serial_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Run serial program&lt;br /&gt;
./my_serial_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for serial program: [[Media:Hello_serial.c | Hello_serial.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* --nodes=1 and --ntasks-per-node=1 may be replaced by --ntasks=1.&lt;br /&gt;
* If not specified, stdout and stderr are both written to slurm-%j.out.&lt;br /&gt;
&lt;br /&gt;
== How to find working sample scripts for my program? ==&lt;br /&gt;
&lt;br /&gt;
Most software modules for applications provide working sample batch scripts.&lt;br /&gt;
Check with [[Software_Modules_Lmod#Module_specific_help | module help]] command, e.g. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module help chem/vasp     # display module help for VASP&lt;br /&gt;
$ module help math/matlab   # display module help for Matlab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to harden job scripts against common errors? ==&lt;br /&gt;
&lt;br /&gt;
The bash shell provides several options that help users uncover hidden bugs and write safer job scripts.&lt;br /&gt;
To activate these safeguards, insert the following lines into your scripts (after all #SBATCH directives):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
set -o errexit   # (or set -e) cause batch script to exit immediately when a command fails.&lt;br /&gt;
set -o pipefail  # cause batch script to exit immediately also when the command that failed is embedded in a pipeline&lt;br /&gt;
set -o nounset   # (or set -u) causes the script to treat unset variables as an error and exit immediately &lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/salloc.html salloc] command, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ salloc --nodes=1 --ntasks-per-node=8&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
In Slurm versions prior to 20.11, using [https://slurm.schedmd.com/srun.html srun] was the recommended way to launch interactive jobs, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ srun --nodes=1 --ntasks-per-node=8 --pty bash &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although this still works with current Slurm versions, it is considered &#039;&#039;&#039;deprecated&#039;&#039;&#039;, as it may cause issues when launching additional job steps from within the interactive job environment. Use the [https://slurm.schedmd.com/salloc.html salloc] command instead.&lt;br /&gt;
&lt;br /&gt;
== How to enable X11 forwarding for an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--x11&#039; flag, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --nodes=1 --ntasks-per-node=8 --x11     # run shell with X11 forwarding enabled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
* For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:&lt;br /&gt;
&amp;lt;pre&amp;gt;local&amp;gt; ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to convert Moab batch job scripts to Slurm? ==&lt;br /&gt;
&lt;br /&gt;
Replace Moab/Torque job specification flags and environment variables in your job&lt;br /&gt;
scripts by their corresponding Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab job specification flags and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Option !! Moab (msub) !! Slurm (sbatch)&lt;br /&gt;
|-&lt;br /&gt;
| Script directive                            || #MSUB                                  || #SBATCH&lt;br /&gt;
|-&lt;br /&gt;
| Job name                                    || -N &amp;lt;name&amp;gt;                              || --job-name=&amp;lt;name&amp;gt;  (-J &amp;lt;name&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Account                                     || -A &amp;lt;account&amp;gt;                           || --account=&amp;lt;account&amp;gt; (-A &amp;lt;account&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Queue                                       || -q &amp;lt;queue&amp;gt;                             || --partition=&amp;lt;partition&amp;gt; (-p &amp;lt;partition&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Wall time limit                             || -l walltime=&amp;lt;hh:mm:ss&amp;gt;                 || --time=&amp;lt;hh:mm:ss&amp;gt; (-t &amp;lt;hh:mm:ss&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node count                                  || -l nodes=&amp;lt;count&amp;gt;                       || --nodes=&amp;lt;count&amp;gt; (-N &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Core count                                  || -l procs=&amp;lt;count&amp;gt;                       || --ntasks=&amp;lt;count&amp;gt; (-n &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Process count per node                      || -l ppn=&amp;lt;count&amp;gt;                         || --ntasks-per-node=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Core count per process                      ||                                        || --cpus-per-task=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per node                       || -l mem=&amp;lt;limit&amp;gt;                         || --mem=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per process                    || -l pmem=&amp;lt;limit&amp;gt;                        || --mem-per-cpu=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Job array                                   || -t &amp;lt;array indices&amp;gt;                     || --array=&amp;lt;indices&amp;gt; (-a &amp;lt;indices&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node exclusive job                          || -l naccesspolicy=singlejob             || --exclusive&lt;br /&gt;
|-&lt;br /&gt;
| Initial working directory                   || -d &amp;lt;directory&amp;gt; (default: $HOME)        || --chdir=&amp;lt;directory&amp;gt; (-D &amp;lt;directory&amp;gt;) (default: submission directory)&lt;br /&gt;
|-&lt;br /&gt;
| Standard output file                        || -o &amp;lt;file path&amp;gt;                         || --output=&amp;lt;file&amp;gt; (-o &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Standard error file                         || -e &amp;lt;file path&amp;gt;                         || --error=&amp;lt;file&amp;gt;  (-e &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Combine stdout/stderr to stdout             || -j oe                                  || --output=&amp;lt;combined stdout/stderr file&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Mail notification events                    || -m &amp;lt;event&amp;gt;                             || --mail-type=&amp;lt;events&amp;gt; (valid types include: NONE, BEGIN, END, FAIL, ALL)&lt;br /&gt;
|-&lt;br /&gt;
| Export environment to job                   || -V                                     || --export=ALL (default)&lt;br /&gt;
|-&lt;br /&gt;
| Don&#039;t export environment to job             || (default)                              || --export=NONE&lt;br /&gt;
|-&lt;br /&gt;
| Export environment variables to job         || -v &amp;lt;var[=value][,var2=value2[, ...]]&amp;gt;  || --export=&amp;lt;var[=value][,var2=value2[,...]]&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.&lt;br /&gt;
* By default Moab does not export any environment variables to the job&#039;s runtime environment. With Slurm most of the login environment variables are exported to your job&#039;s runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).&lt;br /&gt;
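&lt;br /&gt;
As a brief illustration, a Moab job header and a possible Slurm equivalent side by side (following the table above; the values are just examples):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Moab:                       # Slurm:&lt;br /&gt;
#MSUB -N myjob                #SBATCH --job-name=myjob&lt;br /&gt;
#MSUB -l nodes=1              #SBATCH --nodes=1&lt;br /&gt;
#MSUB -l ppn=8                #SBATCH --ntasks-per-node=8&lt;br /&gt;
#MSUB -l walltime=01:00:00    #SBATCH --time=01:00:00&lt;br /&gt;
#MSUB -l mem=8gb              #SBATCH --mem=8gb&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;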
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab/Torque script environment variables and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Information                 !! Moab                !! Torque               !! Slurm                                     &lt;br /&gt;
|-&lt;br /&gt;
| Job name                     || $MOAB_JOBNAME        || $PBS_JOBNAME        || $SLURM_JOB_NAME                           &lt;br /&gt;
|-&lt;br /&gt;
| Job ID                       || $MOAB_JOBID          || $PBS_JOBID          || $SLURM_JOB_ID                             &lt;br /&gt;
|-&lt;br /&gt;
| Submit directory             || $MOAB_SUBMITDIR      || $PBS_O_WORKDIR      || $SLURM_SUBMIT_DIR                         &lt;br /&gt;
|-&lt;br /&gt;
| Number of nodes allocated    || $MOAB_NODECOUNT      || $PBS_NUM_NODES      || $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) &lt;br /&gt;
|-&lt;br /&gt;
| Node list                    || $MOAB_NODELIST       || cat $PBS_NODEFILE   || $SLURM_JOB_NODELIST                       &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes          || $MOAB_PROCCOUNT      || $PBS_TASKNUM        || $SLURM_NTASKS                             &lt;br /&gt;
|-&lt;br /&gt;
| Requested tasks per node     || ---                    || $PBS_NUM_PPN        || $SLURM_NTASKS_PER_NODE                    &lt;br /&gt;
|-&lt;br /&gt;
| Requested CPUs per task      || ---                  || ---                 || $SLURM_CPUS_PER_TASK                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array index              || $MOAB_JOBARRAYINDEX  || $PBS_ARRAY_INDEX    || $SLURM_ARRAY_TASK_ID                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array range              || $MOAB_JOBARRAYRANGE  || -                   || $SLURM_ARRAY_TASK_COUNT                   &lt;br /&gt;
|-&lt;br /&gt;
| Queue name                   || $MOAB_CLASS          || $PBS_QUEUE          || $SLURM_JOB_PARTITION                      &lt;br /&gt;
|-&lt;br /&gt;
| QOS name                     || $MOAB_QOS            || ---                 || $SLURM_JOB_QOS                            &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes per node || ---                  || $PBS_NUM_PPN        || $SLURM_TASKS_PER_NODE                     &lt;br /&gt;
|-&lt;br /&gt;
| Job user                     || $MOAB_USER           || $PBS_O_LOGNAME      || $SLURM_JOB_USER                           &lt;br /&gt;
|-&lt;br /&gt;
| Hostname                     || $MOAB_MACHINE        || $PBS_O_HOST         || $SLURMD_NODENAME                          &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See [https://slurm.schedmd.com/sbatch.html sbatch] man page for a complete list of flags and environment variables.&lt;br /&gt;
&lt;br /&gt;
== How to emulate Moab output file names? ==&lt;br /&gt;
&lt;br /&gt;
Use the following directives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --output=&amp;quot;%x.o%j&amp;quot;&lt;br /&gt;
#SBATCH --error=&amp;quot;%x.e%j&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to pass command line arguments to the job script? ==&lt;br /&gt;
&lt;br /&gt;
Run &amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; arg1 arg2 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inside the job script the arguments can be accessed as $1, $2, ...&lt;br /&gt;
&lt;br /&gt;
E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
infile=&amp;quot;$1&amp;quot;&lt;br /&gt;
outfile=&amp;quot;$2&amp;quot;&lt;br /&gt;
./my_serial_program &amp;lt; &amp;quot;$infile&amp;quot; &amp;gt; &amp;quot;$outfile&amp;quot; 2&amp;gt;&amp;amp;1&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; use $1, $2, ... in &amp;quot;#SBATCH&amp;quot; lines. These parameters can be used only within the regular shell script.&lt;br /&gt;
&lt;br /&gt;
== How to request local scratch (SSD/NVMe) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--gres=scratch:nnn&#039; option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=scratch:100&#039; will allocate 100 GB of scratch space on a locally attached NVMe device. A minimal job script sketch follows the notes below.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; add any unit (such as --gres=scratch:100G). Since the count is already interpreted in GB, this would be treated as requesting 100 * 10^9 GB of scratch space.&lt;br /&gt;
&lt;br /&gt;
* Multinode jobs get nnn GB of local scratch space on every node of the job.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$SCRATCH&#039;&#039;&#039; will point to &lt;br /&gt;
** /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when local scratch has been requested. This will be on locally attached SSD/NVMe devices.&lt;br /&gt;
** /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when no local scratch has been requested. This will be in memory and, thus, be limited in size.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039; always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;. This will always be in memory and, thus, limited in size.&lt;br /&gt;
&lt;br /&gt;
* For backward compatibility, environment variable $RAMDISK always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Scratch space allocation in /scratch will be enforced by quota limits.&lt;br /&gt;
&lt;br /&gt;
* Data written to $TMPDIR will always count against allocated memory.&lt;br /&gt;
&lt;br /&gt;
* Data written to local scratch space will automatically be removed at the end of the job.&lt;br /&gt;
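&lt;br /&gt;
Putting these pieces together, a minimal job script sketch (the program name and input/output files are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
# Request 50 GB local scratch space (no unit!)&lt;br /&gt;
#SBATCH --gres=scratch:50&lt;br /&gt;
&lt;br /&gt;
# $SCRATCH now points to /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; on a local SSD/NVMe device&lt;br /&gt;
cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input.dat &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
./my_program &amp;lt; input.dat &amp;gt; output.dat&lt;br /&gt;
&lt;br /&gt;
# Local scratch is wiped at the end of the job, so save results first&lt;br /&gt;
cp -v output.dat &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;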
&lt;br /&gt;
== How to request GPGPU nodes at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--gres=gpu:&amp;lt;count&amp;gt;&#039; option to allocate 1 or 2 GPUs per node for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=gpu:1&#039; will allocate one GPU per node for this job. A minimal job script sketch follows the notes below.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* GPGPU nodes are equipped with two Nvidia V100S cards &lt;br /&gt;
&lt;br /&gt;
* Environment variables $CUDA_VISIBLE_DEVICES, $SLURM_JOB_GPUS and $GPU_DEVICE_ORDINAL will denote card(s) allocated for the job.&lt;br /&gt;
&lt;br /&gt;
* CUDA Toolkit is available as software module devel/cuda.&lt;br /&gt;
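&lt;br /&gt;
A minimal sketch of a single-GPU job script (the program name is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Request one GPU per node&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
&lt;br /&gt;
# Load the CUDA Toolkit&lt;br /&gt;
module load devel/cuda&lt;br /&gt;
&lt;br /&gt;
# $CUDA_VISIBLE_DEVICES denotes the card(s) allocated for the job&lt;br /&gt;
echo &amp;quot;Allocated GPU(s): $CUDA_VISIBLE_DEVICES&amp;quot;&lt;br /&gt;
./my_gpu_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;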
&lt;br /&gt;
== How to clean-up or save files before a job times out? ==&lt;br /&gt;
&lt;br /&gt;
You may want to clean up the work directory or save intermediate result files in case a job times out.&lt;br /&gt;
&lt;br /&gt;
The following sample script may serve as a blueprint for implementing a pre-termination function to perform clean-up or file recovery actions. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 2 GB memory required per node&lt;br /&gt;
#SBATCH --mem=2G&lt;br /&gt;
# Request 10 GB local scratch space&lt;br /&gt;
#SBATCH --gres=scratch:10&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
# Send the USR1 signal 120 seconds before end of time limit&lt;br /&gt;
#SBATCH --signal=B:USR1@120&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=signal_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=signal_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=signal_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Define the signal handler function&lt;br /&gt;
# Note: This is not executed here, but rather when the associated &lt;br /&gt;
# signal is received by the shell.&lt;br /&gt;
finalize_job()&lt;br /&gt;
{&lt;br /&gt;
    # Do whatever cleanup you want here. In this example we copy&lt;br /&gt;
    # output file(s) back to $SLURM_SUBMIT_DIR, but you may implement &lt;br /&gt;
    # your own job finalization code here.&lt;br /&gt;
    echo &amp;quot;function finalize_job called at `date`&amp;quot;&lt;br /&gt;
    cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
    mkdir -vp &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results&lt;br /&gt;
    tar czvf &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results/${SLURM_JOB_ID}.tgz output*.txt&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Call finalize_job function as soon as we receive USR1 signal&lt;br /&gt;
trap &#039;finalize_job&#039; USR1&lt;br /&gt;
&lt;br /&gt;
# Copy input files for this job to the scratch directory (if needed).&lt;br /&gt;
# Note: Environment variable $SCRATCH always points to a scratch directory &lt;br /&gt;
# automatically created for this job. Environment variable $SLURM_SUBMIT_DIR &lt;br /&gt;
# points to the path where this script was submitted from.&lt;br /&gt;
# Example:&lt;br /&gt;
# cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input*.txt &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Change working directory to local scratch directory&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# This is where the actual work is done. In this case we just create &lt;br /&gt;
# a sample output file for 900 (=15*60) seconds, but since we asked &lt;br /&gt;
# Slurm for 600 seconds only it will not be able to finish within this &lt;br /&gt;
# wall time.&lt;br /&gt;
# Note: It is important to run this task in the background &lt;br /&gt;
# by placing the &amp;amp; symbol at the end. Otherwise the signal handler &lt;br /&gt;
# would not be executed until that process has finished, which is not &lt;br /&gt;
# what we want.&lt;br /&gt;
(for i in `seq 15`; do echo &amp;quot;Hello World at `date +%H:%M:%S`.&amp;quot;; sleep 60; done) &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Note: The command above is just for illustration. Normally you would just run&lt;br /&gt;
# my_program &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Tell the shell to wait for background task(s) to finish. &lt;br /&gt;
# Note: This is important because otherwise the parent shell &lt;br /&gt;
# (this script) would proceed (and terminate) without waiting for &lt;br /&gt;
# background task(s) to finish.&lt;br /&gt;
wait&lt;br /&gt;
&lt;br /&gt;
# If we get here, the job did not time out but finished in time.&lt;br /&gt;
&lt;br /&gt;
# Release user defined signal handler for USR1&lt;br /&gt;
trap - USR1&lt;br /&gt;
&lt;br /&gt;
# Do regular cleanup and save files. In this example we simply call &lt;br /&gt;
# the same function that we defined as a signal handler above, but you &lt;br /&gt;
# may implement your own code here. &lt;br /&gt;
finalize_job&lt;br /&gt;
&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The number of seconds specified in the --signal option must be large enough to cover the runtime of the pre-termination function and must not exceed 65535 seconds.&lt;br /&gt;
&lt;br /&gt;
* Due to the resolution of event handling by Slurm, the signal may be sent a little earlier than specified.&lt;br /&gt;
&lt;br /&gt;
== How to submit a multithreaded batch job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a job running one multithreaded program instance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Number of cores per program instance&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=multithreaded_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=multithreaded_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=multithreaded_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
&lt;br /&gt;
# Run multithreaded program&lt;br /&gt;
./my_multithreaded_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for multithreaded program: [[Media:Hello_openmp.c | Hello_openmp.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* In our configuration each physical core is considered a &amp;quot;CPU&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* On JUSTUS 2 it is recommended to specify a number of cores per task (&#039;--cpus-per-task&#039;) that is either an integer divisor of 24 (e.g. 6, 8, 12 or 24) or (at most) 48.&lt;br /&gt;
&lt;br /&gt;
* Required memory can also be specified per allocated CPU with the &#039;--mem-per-cpu&#039; option.&lt;br /&gt;
&lt;br /&gt;
* The &#039;--mem&#039; and &#039;--mem-per-cpu&#039; options are mutually exclusive.&lt;br /&gt;
&lt;br /&gt;
==  How to submit an array job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_array -a] (or [https://slurm.schedmd.com/sbatch.html#OPT_array --array]) option, e.g. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -a 1-16%8 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an array job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one CPU for each individual array task&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --array=1-16%8&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=array_job&lt;br /&gt;
#SBATCH --output=array_job-%A_%a.out&lt;br /&gt;
#SBATCH --error=array_job-%A_%a.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Print the task id.&lt;br /&gt;
echo &amp;quot;My SLURM_ARRAY_TASK_ID: &amp;quot; $SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
# ./my_program &amp;lt;input.$SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.&lt;br /&gt;
&lt;br /&gt;
* Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the job ID of the first array task and is the same for all tasks of the array.&lt;br /&gt;
&lt;br /&gt;
* The remaining options in the sample job script are the same as the options used in other, non-array jobs. In the example above, we are requesting that each array task be allocated 1 CPU (--ntasks=1) and 4 GB of memory (--mem=4G) for up to one hour (--time=01:00:00).&lt;br /&gt;
&lt;br /&gt;
* More information: https://slurm.schedmd.com/job_array.html&lt;br /&gt;
&lt;br /&gt;
== How to delay the start of a job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_begin -b] (or [https://slurm.schedmd.com/sbatch.html#OPT_begin --begin]) option in order to defer the allocation of the job until the specified time.&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --begin=20:00 ...               # job can start after 8 p.m. &lt;br /&gt;
sbatch --begin=now+1hour ...           # job can start 1 hour after submission&lt;br /&gt;
sbatch --begin=teatime ...             # job can start at teatime (4 p.m.)&lt;br /&gt;
sbatch --begin=2023-12-24T20:00:00 ... # job can start after specified date/time&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit dependency (chain) jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_dependency -d] (or [https://slurm.schedmd.com/sbatch.html#OPT_dependency --dependency]) option, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterany:123456 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified job 123456 has terminated.&lt;br /&gt;
&lt;br /&gt;
Slurm supports a number of different dependency types, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-d after:123456      # job can begin execution after the specified job has begun execution&lt;br /&gt;
-d afterany:123456   # job can begin execution after the specified job has finished&lt;br /&gt;
-d afternotok:123456 # job can begin execution after the specified job has failed (exit code not equal zero)&lt;br /&gt;
-d afterok:123456    # job can begin execution after the specified job has successfully finished (exit code zero)&lt;br /&gt;
-d singleton         # job can begin execution after all previously launched jobs with the same job name and user have finished&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Multiple jobs can be specified by separating their job ids by colon characters (:), e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ sbatch -d afterany:123456:123457 ... &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified jobs 123456 and 123457 have both finished.&lt;br /&gt;
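&lt;br /&gt;
To build a job chain in a script, the job id of each submission can be captured with the &#039;--parsable&#039; option, which makes sbatch print only the job id. A small sketch (the job script names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Submit the first job and remember its job id&lt;br /&gt;
jobid1=$(sbatch --parsable step1.slurm)&lt;br /&gt;
# Submit the second job; it starts only if the first one succeeded&lt;br /&gt;
jobid2=$(sbatch --parsable -d afterok:${jobid1} step2.slurm)&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;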
&lt;br /&gt;
== How to deal with invalid job dependencies? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_kill-on-invalid-dep --kill-on-invalid-dep=yes] option in order to automatically terminate jobs which can never run due to invalid dependencies. By default the job stays pending with reason &#039;DependencyNeverSatisfied&#039; to allow review and appropriate action by the user.  &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; A job dependency may also become invalid if a job has been submitted with &#039;-d afterok:&amp;lt;jobid&amp;gt;&#039; but the specified dependency job has failed, e.g. because it timed out (i.e. exceeded its wall time limit).&lt;br /&gt;
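&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --kill-on-invalid-dep=yes -d afterok:123456 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;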
&lt;br /&gt;
== How to submit an MPI batch job? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/mpi_guide.html&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an MPI job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate two nodes&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
# Allocate 32 GB memory per node&lt;br /&gt;
#SBATCH --mem=32gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=mpi_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=mpi_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
#&lt;br /&gt;
# Option 1: Launch MPI tasks by using mpirun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/intel&lt;br /&gt;
# module load mpi/impi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# Option 2: Launch MPI tasks by using srun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# srun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_mpi_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for MPI program: [[Media:Hello_mpi.c | Hello_mpi.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently for different MPI implementations and versions. There are reports of &amp;quot;strange behavior&amp;quot; of mpirun, especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.&lt;br /&gt;
* Do not run batch jobs that launch a large number (hundreds or thousands) of short-running (a few minutes or less) MPI programs, e.g. from a shell loop. Every single MPI invocation generates its own job step and sends remote procedure calls to the Slurm controller server. This can result in degraded performance for both Slurm and the application, especially if many such jobs happen to run at the same time. Jobs of that kind can even get stuck without showing any further activity until hitting the wall time limit. For high throughput computing (e.g. processing a large number of files, with each task running independently of the others and only for a short time), consider a more appropriate parallelization paradigm that invokes independent serial (non-MPI) processes in parallel at the same time. This approach is sometimes referred to as a &amp;quot;[https://en.wikipedia.org/wiki/Embarrassingly_parallel pleasingly parallel]&amp;quot; workload. GNU Parallel is a shell tool that facilitates executing serial tasks in parallel; on JUSTUS 2 it is available as software module &amp;quot;system/parallel&amp;quot; (see the sketch below).&lt;br /&gt;
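&lt;br /&gt;
A minimal sketch of this pattern inside a job script (the program name and input files are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Load GNU Parallel&lt;br /&gt;
module load system/parallel&lt;br /&gt;
&lt;br /&gt;
# Process all input files, running at most $SLURM_NTASKS serial&lt;br /&gt;
# (non-MPI) instances at a time (single-node job assumed)&lt;br /&gt;
find . -name &#039;input_*.dat&#039; | parallel -j &amp;quot;$SLURM_NTASKS&amp;quot; ./my_serial_program {}&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;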
&lt;br /&gt;
== How to submit a hybrid MPI/OpenMP job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a hybrid job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of nodes to allocate&lt;br /&gt;
#SBATCH --nodes=4&lt;br /&gt;
# Number of MPI instances (ranks) to be executed per node&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
# Number of threads per MPI instance&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
# Allocate 8 GB memory per node&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=hybrid_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=hybrid_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=hybrid_job-%j.err&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
# or:&lt;br /&gt;
# mpirun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for hybrid program: [[Media:Hello_hybrid.c | Hello_hybrid.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* $SLURM_CPUS_PER_TASK is only set if the &#039;--cpus-per-task&#039; option is specified.&lt;br /&gt;
&lt;br /&gt;
== How to request specific node(s) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_nodelist -w] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodelist --nodelist]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -w &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also see [https://slurm.schedmd.com/sbatch.html#OPT_nodefile -F] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodefile --nodefile]) option.&lt;br /&gt;
&lt;br /&gt;
== How to exclude specific nodes from job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_exclude -x] (or [https://slurm.schedmd.com/sbatch.html#OPT_exclude --exclude]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -x &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get exclusive jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--exclusive&#039; option on job submission. This makes sure that there will be no other jobs running on your nodes. Very useful for benchmarking!&lt;br /&gt;
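&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --exclusive ...&amp;lt;/pre&amp;gt;&lt;br /&gt;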
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The --exclusive option does &#039;&#039;&#039;not&#039;&#039;&#039; mean that you automatically get full access to all the resources a node provides; anything beyond your request must still be requested explicitly.&lt;br /&gt;
&lt;br /&gt;
== How to avoid sharing nodes with other users? ==&lt;br /&gt;
&lt;br /&gt;
Use the &#039;--exclusive=user&#039; option on job submission. This will still allow multiple jobs of one and the same user on the nodes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Depending on configuration, exclusive=user may (and probably will) be the default node access policy on JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
== How to submit a batch job without a job script? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_wrap --wrap] option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --nodes=2 --ntasks-per-node=16 --wrap &amp;quot;sleep 600&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; May be useful for testing purposes.&lt;br /&gt;
&lt;br /&gt;
= JOB MONITORING AND CONTROL =&lt;br /&gt;
&lt;br /&gt;
== How to prevent Slurm performance degradation? ==&lt;br /&gt;
&lt;br /&gt;
Almost every invocation of a Slurm client command (e.g. squeue, sacct, sprio or sshare) sends a remote procedure call (RPC) to the Slurm control daemon and/or database. &lt;br /&gt;
If enough remote procedure calls come in at once, this can result in a degradation of performance of the Slurm services for all users, possibly resulting in a denial of service. &lt;br /&gt;
&lt;br /&gt;
Therefore, &#039;&#039;&#039;do not run Slurm client commands that send remote procedure calls from loops in shell scripts or other programs&#039;&#039;&#039; (such as &#039;watch squeue&#039;). Always limit calls to squeue, sstat, sacct etc. to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
Slurm does collect RPC counts and timing statistics by message type and user for diagnostic purposes.&lt;br /&gt;
&lt;br /&gt;
== How to view information about submitted jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] command, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue                  # all jobs owned by user (all jobs owned by all users for admins)&lt;br /&gt;
$ squeue --me             # all jobs owned by user (same as squeue for regular users)&lt;br /&gt;
$ squeue -u &amp;lt;username&amp;gt;    # jobs of specific user&lt;br /&gt;
$ squeue -t PENDING       # pending jobs only&lt;br /&gt;
$ squeue -t RUNNING       # running jobs only&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
* The output format of [https://slurm.schedmd.com/squeue.html squeue] (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.&lt;br /&gt;
&lt;br /&gt;
* Every invocation of squeue sends a remote procedure call to the Slurm controller. &#039;&#039;&#039;Do not run squeue or other Slurm client commands from loops in shell scripts or other programs&#039;&#039;&#039; as this can result in a degradation of performance. Ensure that programs limit calls to squeue to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
== How to cancel jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scancel.html scancel] command, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;         # cancel specific job&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;_&amp;lt;index&amp;gt; # cancel indexed job in a job array&lt;br /&gt;
$ scancel -u &amp;lt;username&amp;gt;   # cancel all jobs of specific user&lt;br /&gt;
$ scancel -t PENDING      # cancel pending jobs&lt;br /&gt;
$ scancel -t RUNNING      # cancel running jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to show job script of a running job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scontrol.html scontrol] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; &amp;lt;file&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If the file name is omitted, the default file name will be slurm-&amp;lt;job_id&amp;gt;.sh.&lt;br /&gt;
* If the file name is - (i.e. a dash), the job script will be written to stdout.&lt;br /&gt;
&lt;br /&gt;
== How to get estimated start time of a job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ squeue --start&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Estimated start times are dynamic and can change at any moment. Exact start times of individual jobs are usually unpredictable.&lt;br /&gt;
* Slurm will report N/A for the start time estimate if nodes are not currently being reserved by the scheduler for the job to run on.&lt;br /&gt;
&lt;br /&gt;
== How to show remaining walltime of running jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format option &amp;quot;%L&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ squeue -t r -o &amp;quot;%u %i %L&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check priority of jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format options &amp;quot;%Q&amp;quot; and/or &amp;quot;%p&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -o &amp;quot;%8i %8u %15a %.10r %.10L %.5D %.10Q&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sprio.html sprio] command to display the priority components (age/fairshare/...) for each job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sprio&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sshare.html sshare] command for listing the shares of associations, e.g. accounts.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sshare&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent (hold) jobs from being scheduled for execution? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol hold &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to unhold job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol release &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to suspend a running job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol suspend &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume a suspended job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol resume &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to requeue (cancel and resubmit) a particular job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol requeue &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to monitor resource usage of running job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sstat.html sstat] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sstat -e&#039; command shows a list of fields that can be specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sstat --format=JobId,AveCPU,AveRSS,MaxRSS -j &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will show average CPU time, average and maximum memory consumption of all tasks in the running job.&lt;br /&gt;
Ideally, average CPU time equals the number of cores allocated for the job multiplied by the current run time of the job. &lt;br /&gt;
The maximum memory consumption gives an estimate of the peak amount of memory actually needed so far. This can be compared with the amount of memory requested for the job. Over-requesting memory can result in significant waste of compute resources.       &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Users can also ssh into compute nodes that they have one or more running jobs on. Once logged in, they can use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ...&lt;br /&gt;
&lt;br /&gt;
* Users can also attach an interactive shell under an already allocated job by running the following command: &amp;lt;pre&amp;gt;srun --jobid &amp;lt;job&amp;gt; --overlap --pty /bin/bash&amp;lt;/pre&amp;gt; Once logged in, they can again use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ... For a single node job the user does not even need to know the node that the job is running on. For a multinode job, the user can still use &#039;-w &amp;lt;node&amp;gt;&#039; option to specify a specific node.&lt;br /&gt;
&lt;br /&gt;
== How to get detailed job information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show job 1234  # For job id 1234&lt;br /&gt;
$ scontrol show jobs      # For all jobs&lt;br /&gt;
$ scontrol -o show jobs   # One line per job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to modify a pending/running job? ==&lt;br /&gt;
&lt;br /&gt;
Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol update JobId=&amp;lt;jobid&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
E.g.: &amp;lt;pre&amp;gt;$ scontrol update JobId=42 TimeLimit=7-0&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will modify the time limit of the job to 7 days.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Update requests for &#039;&#039;&#039;running&#039;&#039;&#039; jobs are mostly restricted to Slurm administrators. In particular, only an administrator can increase the TimeLimit of a job.&lt;br /&gt;
&lt;br /&gt;
== How to show accounting data of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sacct -e&#039; command shows a list of fields that can be&lt;br /&gt;
specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
== How to retrieve job history and accounting? ==&lt;br /&gt;
&lt;br /&gt;
For a specific job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a specific user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note: Default time window is the current day.&lt;br /&gt;
&lt;br /&gt;
Starting from a specific date:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within a time window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 -E 2020-01-31 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
You can also set the environment variable $SACCT_FORMAT to specify the default format. To get a general idea of how efficiently a job utilized its resources, the following format can be used:  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export SACCT_FORMAT=&amp;quot;JobID,JobName,Elapsed,NCPUs,TotalCPU,CPUTime,ReqMem,MaxRSS,MaxDiskRead,MaxDiskWrite,State,ExitCode&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find how efficiently the CPUs were used, divide TotalCPU by CPUTime. To find how efficiently memory was used, divide MaxRSS by ReqMem. But be aware that sacct&#039;s memory usage measurement doesn&#039;t catch very rapid memory spikes. If your job got killed for running out of memory, it &#039;&#039;&#039;did run out of memory&#039;&#039;&#039; even if sacct reports a lower memory usage than would trigger an out-of-memory kill. A job that reads or writes excessively to disk might be bogged down significantly by I/O operations.&lt;br /&gt;
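&lt;br /&gt;
For example, a job with CPUTime=20:00:00 (4 CPUs allocated for 5 hours) and TotalCPU=10:00:00 used its allocated CPUs at roughly 50% efficiency.&lt;br /&gt;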
&lt;br /&gt;
== How to get efficiency information of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;pre&amp;gt;$ seff &amp;lt;jobid&amp;gt; &amp;lt;/pre&amp;gt; command for some brief information.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; It is good practice to have a look at the efficiency of your job(s) on completion &#039;&#039;&#039;and we expect you to do so&#039;&#039;&#039;. This way you can improve your job specifications in the future.&lt;br /&gt;
&lt;br /&gt;
== How to get complete field values from sstat and sacct commands? ==&lt;br /&gt;
&lt;br /&gt;
When using the [https://slurm.schedmd.com/sacct.html#OPT_format --format] option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.&lt;br /&gt;
&lt;br /&gt;
E.g. &#039;--format=User%30&#039; will print 30 characters for the user name (right justified).  A %-30 will print 30 characters left justified.&lt;br /&gt;
&lt;br /&gt;
sstat and sacct also provide the &#039;--parsable&#039; and &#039;--parsable2&#039; options to always print full field values, delimited with a pipe &#039;|&#039; character by default.&lt;br /&gt;
The delimiting character can be specified by using the &#039;--delimiter&#039; option, e.g. &#039;--delimiter=&amp;quot;,&amp;quot;&#039; for comma separated values.&lt;br /&gt;
&lt;br /&gt;
== How to retrieve job records for all jobs running/pending at a certain point in time? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] with [https://slurm.schedmd.com/sacct.html#OPT_state -s &amp;lt;state&amp;gt;] and [https://slurm.schedmd.com/sacct.html#OPT_starttime -S &amp;lt;start time&amp;gt;] options, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$  sacct -n -a -X -S 2021-04-01T00:00:00 -s R -o JobID,User%15,Account%10,NCPUS,NNodes,NodeList%1500&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When specifying the state &amp;quot;-s &amp;lt;state&amp;gt;&amp;quot; &#039;&#039;&#039;and&#039;&#039;&#039; the start time &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;, the default &lt;br /&gt;
time window will be set to end time &amp;quot;-E&amp;quot; equal to start time. Thus, you will get a snapshot of all running/pending &lt;br /&gt;
jobs at the instance given by &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== How to get a parsable list of hostnames from $SLURM_JOB_NODELIST? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show hostnames $SLURM_JOB_NODELIST&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= ADMINISTRATION =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Most commands in this section are restricted to system administrators.&lt;br /&gt;
&lt;br /&gt;
== How to stop Slurm from scheduling jobs? ==&lt;br /&gt;
&lt;br /&gt;
You can stop Slurm from scheduling jobs on a per partition basis by&lt;br /&gt;
setting that partition&#039;s state to DOWN. Set its state UP to resume&lt;br /&gt;
scheduling. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update PartitionName=foo State=DOWN&lt;br /&gt;
$ scontrol update PartitionName=foo State=UP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to print actual hardware configuration of a node? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ slurmd -C   # print hardware configuration plus uptime&lt;br /&gt;
$ slurmd -G   # print generic resource configuration&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to reboot (all) nodes as soon as they become idle? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # specific nodes&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME ALL              # all nodes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to cancel pending reboot of nodes? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol cancel_reboot &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check current node status? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show node &amp;lt;node&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to instruct all Slurm daemons to re-read the configuration file ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reconfigure&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent a user from submitting new jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the following [https://slurm.schedmd.com/sacctmgr.html sacctmgr] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Job submission is then rejected with the following message:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch job.slurm&lt;br /&gt;
sbatch: error: AssocMaxSubmitJobLimit&lt;br /&gt;
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user&#039;s size and/or time limits)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Use the following command to release the limit:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=-1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to drain node(s)? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=DRAIN Reason=&amp;quot;Some Reason&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Reason is mandatory.&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; just set state DOWN to drain nodes. This would kill any active jobs running on those nodes.&lt;br /&gt;
&lt;br /&gt;
== How to list reason for nodes being drained or down? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume node state? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=RESUME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create a reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/reservations.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now duration=UNLIMITED flags=maint,ignore_jobs nodes=ALL&lt;br /&gt;
$ scontrol create reservation user=root starttime=2020-12-24T17:00 duration=12:00:00 flags=maint,ignore_jobs nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
$ scontrol show reservation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Add &amp;quot;FLEX&amp;quot; flag to allow jobs that qualify for the reservation to start before the reservation begins (and continue after it starts). &lt;br /&gt;
Add &amp;quot;MAGNETIC&amp;quot; flag to attract jobs that qualify for the reservation to run in that reservation without having requested it at submit time.&lt;br /&gt;
&lt;br /&gt;
== How to create a floating reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Use the flag &amp;quot;TIME_FLOAT&amp;quot; and a start time that is relative to the current time (use the keyword &amp;quot;now&amp;quot;).&lt;br /&gt;
In the example below, the nodes are prevented from starting any jobs exceeding a walltime of 2 days.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now+2days duration=UNLIMITED flags=maint,ignore_jobs,time_float nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Floating reservations are not intended to run jobs, but to prevent long running jobs from being initiated on specific nodes. Attempts by users to make use of a floating reservation will be rejected. When ready to perform the maintenance, place the nodes in DRAIN state and delete the reservation.&lt;br /&gt;
&lt;br /&gt;
== How to use a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --reservation=foo_6 ... script.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol delete ReservationName=foo_6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;mdiag -n&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fields can be individually customized. See [https://slurm.schedmd.com/sinfo.html sinfo] man page. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N --format=&amp;quot;%8N %12P %.4C %.8O %.6m %.6e %.8T %.20E&amp;quot;&lt;br /&gt;
&lt;br /&gt;
NODELIST PARTITION    CPUS CPU_LOAD MEMORY FREE_M    STATE               REASON&lt;br /&gt;
n0001    standard*    0/16     0.01 128000 120445     idle                 none&lt;br /&gt;
n0002    standard*    0/16     0.01 128000 120438     idle                 none&lt;br /&gt;
n0003    standard*    0/0/      N/A 128000    N/A    down*       Not responding&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;pbsnodes&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show nodes                     # One paragraph per node (all nodes)&lt;br /&gt;
$ scontrol show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;     # One paragraph per node (specified nodes) &lt;br /&gt;
$ scontrol -o show nodes                  # One line per node (all nodes)&lt;br /&gt;
$ scontrol -o show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # One line per node (specified nodes)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to update multiple jobs of a user with a single scontrol command? ==&lt;br /&gt;
&lt;br /&gt;
Not possible. But you can e.g. use squeue to build a script, taking&lt;br /&gt;
advantage of its filtering and formatting options.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -tpd -h -o &amp;quot;scontrol update jobid=%i priority=1000&amp;quot; &amp;gt;my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also identify the jobs in question and pass them to JobID as a list all at once, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobID=123 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=123,456,789 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=[123-400],[500-600] qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option is to use the JobName, if all the jobs have the same name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobName=&amp;quot;foobar&amp;quot; UserID=johndoe qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, Slurm does not allow the UserID filter alone.&lt;br /&gt;
&lt;br /&gt;
== How to create a new account? ==&lt;br /&gt;
&lt;br /&gt;
Add account at top level in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; Cluster=justus Description=&amp;quot;Account description&amp;quot; Organization=&amp;quot;none&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Add account as child of some parent account in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; parent=&amp;lt;parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to move account to another parent? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set parent=&amp;lt;new_parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr delete account name=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add a new user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; DefaultAccount=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add/remove users from an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname&amp;gt;                  # Add user to account&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname2&amp;gt;                 # Add user to a second account&lt;br /&gt;
$ sacctmgr remove user &amp;lt;username&amp;gt; where account=&amp;lt;accountname&amp;gt;         # Remove user from this account&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to change default account of a user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;  &lt;br /&gt;
$  sacctmgr modify user where user=&amp;lt;username&amp;gt; set DefaultAccount=&amp;lt;default_account&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The user must already be associated with the account you want to set as default.&lt;br /&gt;
&lt;br /&gt;
== How to show account information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show assoc&lt;br /&gt;
$ sacctmgr show assoc tree&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to implement user resource throttling policies? ==&lt;br /&gt;
&lt;br /&gt;
Quoting from https://bugs.schedmd.com/show_bug.cgi?id=3600#c4&lt;br /&gt;
&lt;br /&gt;
 With Slurm, the associations are meant to establish base limits on the&lt;br /&gt;
 defined partitions, accounts and users. Because limits propagate down&lt;br /&gt;
 through the association tree, you only need to define limits at a high&lt;br /&gt;
 level and those limits will be applied to all partitions, accounts and&lt;br /&gt;
 users that are below it (parent to child). You can also override those&lt;br /&gt;
 high level (parent) limits by explicitly setting different limits at&lt;br /&gt;
 any lower level (on the child). So using the association tree is the&lt;br /&gt;
 best way to get some base limits applied that you want for most cases. &lt;br /&gt;
 QOS&#039;s are meant to override any of those base limits for exceptional&lt;br /&gt;
 cases. Like Maui, you can use QOS&#039;s to set a different priority.&lt;br /&gt;
 Again, the QOS would be overriding the base priority that could be set&lt;br /&gt;
 in the associations.&lt;br /&gt;
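&lt;br /&gt;
As an illustration of this pattern (the account name, user name and limit values below are placeholders), a base limit could be set on a parent account and a different limit directly on an individual user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set GrpTRES=cpu=960  # base limit on the account&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=20                  # different limit on one user (child)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;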
&lt;br /&gt;
== How to set a resource limit for an individual user? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/resource_limits.html&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=1            # Limit maximum number of running jobs for user&lt;br /&gt;
$ sacctmgr list assoc user=&amp;lt;username&amp;gt; format=user,maxjobs  # Show that limit&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=-1           # Remove that limit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to retrieve historical resource usage for a specific user or account? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sreport.html sreport] command.&lt;br /&gt;
&lt;br /&gt;
Examples: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours user=&amp;lt;username&amp;gt;    # Report cluster utilization of given user broken down by accounts&lt;br /&gt;
$ sreport cluster AccountUtilizationByUser Start=2021-01-01 End=2021-12-31 -t Hours account=&amp;lt;account&amp;gt;  # Report cluster utilization of given account broken down by users    &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* By default CPU resources will be reported. Use &#039;-T&#039; option for other trackable resources, e.g. &#039;-T cpu,mem,gres/gpu,gres/scratch&#039;.&lt;br /&gt;
* On JUSTUS 2 registered compute projects (&amp;quot;Rechenvorhaben&amp;quot;) are uniquely mapped to Slurm accounts of the same name. Thus, &#039;AccountUtilizationByUser&#039; can also be used to report the aggregated cluster utilization of compute projects.&lt;br /&gt;
* Can be executed by regular users as well in which case Slurm will only report their own usage records (but along with the total usage of the associated account in the case of &#039;AccountUtilizationByUser&#039;).&lt;br /&gt;
&lt;br /&gt;
== How to fix/reset a user&#039;s RawUsage value? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; where Account=&amp;lt;account&amp;gt; set RawUsage=&amp;lt;number&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create/modify/delete QOSes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/qos.html&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show qos                                      # Show existing QOSes&lt;br /&gt;
$ sacctmgr add qos verylong                              # Create new QOS verylong&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxWall=28-00:00:00   # Set maximum walltime limit&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxTRESPerUser=cpu=4  # Set maximum number of CPUs a user can allocate at a given time&lt;br /&gt;
$ sacctmgr modify qos verylong set flags=denyonlimit     # Prevent submission if job requests exceed any limits of QOS&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos+=verylong      # Add a QOS to a user account&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos-=verylong      # Remove a QOS from a user account&lt;br /&gt;
$ sacctmgr delete qos verylong                           # Delete that QOS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to find (and fix) runaway jobs? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr show runaway&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Runaway jobs are orphaned jobs that don&#039;t exist in the Slurm controller but have a start time and no end time in the Slurm database. Runaway jobs mess with accounting and affect new jobs of users who have too many runaway jobs.&lt;br /&gt;
* If there are jobs in this state this command will also provide an option to fix them. This will set the end time for each job to the latest out of the start, eligible, or submit times, and set the state to completed.&lt;br /&gt;
&lt;br /&gt;
== How to show a history of database transactions? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr list transactions&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Useful to get timestamps for when a user/account/qos has been created/modified/removed etc.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=13034</id>
		<title>BwForCluster JUSTUS 2 Slurm HOWTO</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=13034"/>
		<updated>2024-10-23T15:17:28Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
This is a collection of howtos and convenient Slurm commands for JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
Some commands behave slightly differently depending on whether they are executed &lt;br /&gt;
by a system administrator or by a regular user, as Slurm prevents regular users from accessing critical system information and viewing job and usage information of other users.  &lt;br /&gt;
&lt;br /&gt;
= GENERAL INFORMATION =&lt;br /&gt;
&lt;br /&gt;
== How to find a general quick start user guide? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/quickstart.html&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm FAQ? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/faq.html&lt;br /&gt;
&lt;br /&gt;
== How to find a Slurm cheat sheet? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm tutorials? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
== How to get more information on Slurm? ==&lt;br /&gt;
&lt;br /&gt;
(Almost) every Slurm command has a man page. Use it.&lt;br /&gt;
&lt;br /&gt;
Online versions: https://slurm.schedmd.com/man_index.html&lt;br /&gt;
&lt;br /&gt;
== How to find hardware specific details about JUSTUS 2? ==&lt;br /&gt;
&lt;br /&gt;
See our Wiki page: [[Hardware and Architecture (bwForCluster JUSTUS 2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
= JOB SUBMISSION =&lt;br /&gt;
&lt;br /&gt;
== How to submit a serial batch job? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sbatch.html sbatch] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample job script template for serial job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=serial_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=serial_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=serial_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Run serial program&lt;br /&gt;
./my_serial_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for serial program: [[Media:Hello_serial.c | Hello_serial.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* --nodes=1 and --ntasks-per-node=1 may be replaced by --ntasks=1.&lt;br /&gt;
* If not specified, stdout and stderr are both written to slurm-%j.out.&lt;br /&gt;
&lt;br /&gt;
== How to find working sample scripts for my program? ==&lt;br /&gt;
&lt;br /&gt;
Most software modules for applications provide working sample batch scripts.&lt;br /&gt;
Check with [[Software_Modules_Lmod#Module_specific_help | module help]] command, e.g. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module help chem/vasp     # display module help for VASP&lt;br /&gt;
$ module help math/matlab   # display module help for Matlab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to harden job scripts against common errors? ==&lt;br /&gt;
&lt;br /&gt;
The bash shell provides several options that help users uncover hidden bugs and write safer job scripts.&lt;br /&gt;
To activate these safeguards, users can insert the following lines in their scripts (after all #SBATCH directives):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
set -o errexit   # (or set -e) causes the script to exit immediately when a command fails&lt;br /&gt;
set -o pipefail  # causes the script to exit immediately also when the command that failed is embedded in a pipeline&lt;br /&gt;
set -o nounset   # (or set -u) causes the script to treat unset variables as an error and exit immediately&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
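&lt;br /&gt;
These three options can also be combined into the common one-line shorthand:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
set -euo pipefail&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;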
&lt;br /&gt;
== How to submit an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/salloc.html salloc] command, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ salloc --nodes=1 --ntasks-per-node=8&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
In Slurm versions prior to 20.11, [https://slurm.schedmd.com/srun.html srun] was the recommended way to launch interactive jobs, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ srun --nodes=1 --ntasks-per-node=8 --pty bash &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although this still works with current Slurm versions, it is considered &#039;&#039;&#039;deprecated&#039;&#039;&#039;, as it may cause issues when launching additional job steps from within the interactive job environment. Use the [https://slurm.schedmd.com/salloc.html salloc] command instead.&lt;br /&gt;
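&lt;br /&gt;
A typical interactive session (a minimal sketch; &#039;./my_program&#039; is a placeholder) first allocates resources with salloc and then launches parallel tasks inside the allocation with srun:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --nodes=1 --ntasks-per-node=8 --time=1:00:00&lt;br /&gt;
$ srun ./my_program    # run 8 tasks inside the allocation&lt;br /&gt;
$ exit                 # release the allocation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;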
&lt;br /&gt;
== How to enable X11 forwarding for an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--x11&#039; flag, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --nodes=1 --ntasks-per-node=8 --x11     # run shell with X11 forwarding enabled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
* For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:&lt;br /&gt;
 &amp;lt;pre&amp;gt;local&amp;gt; ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to convert Moab batch job scripts to Slurm? ==&lt;br /&gt;
&lt;br /&gt;
Replace Moab/Torque job specification flags and environment variables in your job&lt;br /&gt;
scripts by their corresponding Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab job specification flags and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Option !! Moab (msub) !! Slurm (sbatch)&lt;br /&gt;
|-&lt;br /&gt;
| Script directive                            || #MSUB                                  || #SBATCH&lt;br /&gt;
|-&lt;br /&gt;
| Job name                                    || -N &amp;lt;name&amp;gt;                              || --job-name=&amp;lt;name&amp;gt;  (-J &amp;lt;name&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Account                                     || -A &amp;lt;account&amp;gt;                           || --account=&amp;lt;account&amp;gt; (-A &amp;lt;account&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Queue                                       || -q &amp;lt;queue&amp;gt;                             || --partition=&amp;lt;partition&amp;gt; (-p &amp;lt;partition&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Wall time limit                             || -l walltime=&amp;lt;hh:mm:ss&amp;gt;                 || --time=&amp;lt;hh:mm:ss&amp;gt; (-t &amp;lt;hh:mm:ss&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node count                                  || -l nodes=&amp;lt;count&amp;gt;                       || --nodes=&amp;lt;count&amp;gt; (-N &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Core count                                  || -l procs=&amp;lt;count&amp;gt;                       || --ntasks=&amp;lt;count&amp;gt; (-n &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Process count per node                      || -l ppn=&amp;lt;count&amp;gt;                         || --ntasks-per-node=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Core count per process                      ||                                        || --cpus-per-task=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per node                       || -l mem=&amp;lt;limit&amp;gt;                         || --mem=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per process                    || -l pmem=&amp;lt;limit&amp;gt;                        || --mem-per-cpu=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Job array                                   || -t &amp;lt;array indices&amp;gt;                     || --array=&amp;lt;indices&amp;gt; (-a &amp;lt;indices&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node exclusive job                          || -l naccesspolicy=singlejob             || --exclusive&lt;br /&gt;
|-&lt;br /&gt;
| Initial working directory                   || -d &amp;lt;directory&amp;gt; (default: $HOME)        || --chdir=&amp;lt;directory&amp;gt; (-D &amp;lt;directory&amp;gt;) (default: submission directory)&lt;br /&gt;
|-&lt;br /&gt;
| Standard output file                        || -o &amp;lt;file path&amp;gt;                         || --output=&amp;lt;file&amp;gt; (-o &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Standard error file                         || -e &amp;lt;file path&amp;gt;                         || --error=&amp;lt;file&amp;gt;  (-e &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Combine stdout/stderr to stdout             || -j oe                                  || --output=&amp;lt;combined stdout/stderr file&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Mail notification events                    || -m &amp;lt;event&amp;gt;                             || --mail-type=&amp;lt;events&amp;gt; (valid types include: NONE, BEGIN, END, FAIL, ALL)&lt;br /&gt;
|-&lt;br /&gt;
| Export environment to job                   || -V                                     || --export=ALL (default)&lt;br /&gt;
|-&lt;br /&gt;
| Don&#039;t export environment to job             || (default)                              || --export=NONE&lt;br /&gt;
|-&lt;br /&gt;
| Export environment variables to job         || -v &amp;lt;var[=value][,var2=value2[, ...]]&amp;gt;  || --export=&amp;lt;var[=value][,var2=value2[,...]]&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.&lt;br /&gt;
* By default Moab does not export any environment variables to the job&#039;s runtime environment. With Slurm most of the login environment variables are exported to your job&#039;s runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab/Torque script environment variables and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Information                 !! Moab                !! Torque               !! Slurm                                     &lt;br /&gt;
|-&lt;br /&gt;
| Job name                     || $MOAB_JOBNAME        || $PBS_JOBNAME        || $SLURM_JOB_NAME                           &lt;br /&gt;
|-&lt;br /&gt;
| Job ID                       || $MOAB_JOBID          || $PBS_JOBID          || $SLURM_JOB_ID                             &lt;br /&gt;
|-&lt;br /&gt;
| Submit directory             || $MOAB_SUBMITDIR      || $PBS_O_WORKDIR      || $SLURM_SUBMIT_DIR                         &lt;br /&gt;
|-&lt;br /&gt;
| Number of nodes allocated    || $MOAB_NODECOUNT      || $PBS_NUM_NODES      || $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) &lt;br /&gt;
|-&lt;br /&gt;
| Node list                    || $MOAB_NODELIST       || cat $PBS_NODEFILE   || $SLURM_JOB_NODELIST                       &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes          || $MOAB_PROCCOUNT      || $PBS_TASKNUM        || $SLURM_NTASKS                             &lt;br /&gt;
|-&lt;br /&gt;
| Requested tasks per node     || ---                    || $PBS_NUM_PPN        || $SLURM_NTASKS_PER_NODE                    &lt;br /&gt;
|-&lt;br /&gt;
| Requested CPUs per task      || ---                  || ---                 || $SLURM_CPUS_PER_TASK                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array index              || $MOAB_JOBARRAYINDEX  || $PBS_ARRAY_INDEX    || $SLURM_ARRAY_TASK_ID                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array range              || $MOAB_JOBARRAYRANGE  || ---                 || $SLURM_ARRAY_TASK_COUNT                   &lt;br /&gt;
|-&lt;br /&gt;
| Queue name                   || $MOAB_CLASS          || $PBS_QUEUE          || $SLURM_JOB_PARTITION                      &lt;br /&gt;
|-&lt;br /&gt;
| QOS name                     || $MOAB_QOS            || ---                 || $SLURM_JOB_QOS                            &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes per node || ---                  || $PBS_NUM_PPN        || $SLURM_TASKS_PER_NODE                     &lt;br /&gt;
|-&lt;br /&gt;
| Job user                     || $MOAB_USER           || $PBS_O_LOGNAME      || $SLURM_JOB_USER                           &lt;br /&gt;
|-&lt;br /&gt;
| Hostname                     || $MOAB_MACHINE        || $PBS_O_HOST         || $SLURMD_NODENAME                          &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See [https://slurm.schedmd.com/sbatch.html sbatch] man page for a complete list of flags and environment variables.&lt;br /&gt;
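&lt;br /&gt;
As a minimal before/after sketch (the program name is a placeholder), a Moab job script such as&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#MSUB -N my_job&lt;br /&gt;
#MSUB -l nodes=1:ppn=4&lt;br /&gt;
#MSUB -l walltime=1:00:00&lt;br /&gt;
#MSUB -l mem=8gb&lt;br /&gt;
cd $MOAB_SUBMITDIR&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
would translate to the following Slurm script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=my_job&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=4&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
# Slurm starts the job in the submission directory by default,&lt;br /&gt;
# so no explicit cd is needed.&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;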
&lt;br /&gt;
== How to emulate Moab output file names? ==&lt;br /&gt;
&lt;br /&gt;
Use the following directives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --output=&amp;quot;%x.o%j&amp;quot;&lt;br /&gt;
#SBATCH --error=&amp;quot;%x.e%j&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to pass command line arguments to the job script? ==&lt;br /&gt;
&lt;br /&gt;
Run &amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; arg1 arg2 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inside the job script the arguments can be accessed as $1, $2, ...&lt;br /&gt;
&lt;br /&gt;
E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
infile=&amp;quot;$1&amp;quot;&lt;br /&gt;
outfile=&amp;quot;$2&amp;quot;&lt;br /&gt;
./my_serial_program &amp;lt; &amp;quot;$infile&amp;quot; &amp;gt; &amp;quot;$outfile&amp;quot; 2&amp;gt;&amp;amp;1&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; use $1, $2, ... in &amp;quot;#SBATCH&amp;quot; lines. These parameters can be used only within the regular shell script.&lt;br /&gt;
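&lt;br /&gt;
For example, with the script above (the file names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch job.slurm input.txt output.txt&amp;lt;/pre&amp;gt;&lt;br /&gt;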
&lt;br /&gt;
== How to request local scratch (SSD/NVMe) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=scratch:nnn&#039; option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=scratch:100&#039; will allocate 100 GB scratch space on a locally attached NVMe device.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; add any unit (such as --gres=scratch:100G). This would be treated as requesting an amount of 10^9 * 100GB of scratch space.&lt;br /&gt;
&lt;br /&gt;
* Multinode jobs get nnn GB of local scratch space on every node of the job.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$SCRATCH&#039;&#039;&#039; will point to &lt;br /&gt;
** /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when local scratch has been requested. This will be on locally attached SSD/NVMe devices.&lt;br /&gt;
** /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when no local scratch has been requested. This will be in memory and, thus, be limited in size.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039; always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;. This will always be in memory and, thus, limited in size.&lt;br /&gt;
&lt;br /&gt;
* For backward compatibility, environment variable $RAMDISK always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
* Scratch space allocation in /scratch will be enforced by quota limits.&lt;br /&gt;
&lt;br /&gt;
* Data written to $TMPDIR will always count against allocated memory.&lt;br /&gt;
&lt;br /&gt;
* Data written to local scratch space will automatically be removed at the end of the job.&lt;br /&gt;
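&lt;br /&gt;
A minimal usage sketch (the input/output file names and the program are placeholders): request scratch space, stage data in, work in $SCRATCH, and save results before the job ends:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
#SBATCH --gres=scratch:50       # 50 GB local scratch space&lt;br /&gt;
&lt;br /&gt;
# Stage input data to local scratch&lt;br /&gt;
cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input.dat &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
./my_program &amp;lt; input.dat &amp;gt; output.dat&lt;br /&gt;
&lt;br /&gt;
# Save results (local scratch is removed when the job ends)&lt;br /&gt;
cp -v output.dat &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;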
&lt;br /&gt;
== How to request GPGPU nodes at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=gpu:&amp;lt;count&amp;gt;&#039; option to allocate 1 or 2 GPUs per node for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=gpu:1&#039; will allocate one GPU per node for this job.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* GPGPU nodes are equipped with two Nvidia V100S cards &lt;br /&gt;
&lt;br /&gt;
* Environment variables $CUDA_VISIBLE_DEVICES, $SLURM_JOB_GPUS and $GPU_DEVICE_ORDINAL will denote card(s) allocated for the job.&lt;br /&gt;
&lt;br /&gt;
* CUDA Toolkit is available as software module devel/cuda.&lt;br /&gt;
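&lt;br /&gt;
A minimal GPU job sketch (the program name is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --gres=gpu:1            # allocate one GPU&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
#SBATCH --job-name=gpu_job&lt;br /&gt;
&lt;br /&gt;
# Load the CUDA toolkit module&lt;br /&gt;
module load devel/cuda&lt;br /&gt;
&lt;br /&gt;
echo &amp;quot;Allocated GPU(s): $CUDA_VISIBLE_DEVICES&amp;quot;&lt;br /&gt;
./my_gpu_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;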
&lt;br /&gt;
== How to clean-up or save files before a job times out? ==&lt;br /&gt;
&lt;br /&gt;
You may want to clean up the work directory or save intermediate result files in case a job times out.&lt;br /&gt;
&lt;br /&gt;
The following sample script may serve as a blueprint for implementing a pre-termination function to perform clean-up or file recovery actions. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 2 GB memory required per node&lt;br /&gt;
#SBATCH --mem=2G&lt;br /&gt;
# Request 10 GB local scratch space&lt;br /&gt;
#SBATCH --gres=scratch:10&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
# Send the USR1 signal 120 seconds before end of time limit&lt;br /&gt;
#SBATCH --signal=B:USR1@120&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=signal_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=signal_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=signal_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Define the signal handler function&lt;br /&gt;
# Note: This is not executed here, but rather when the associated &lt;br /&gt;
# signal is received by the shell.&lt;br /&gt;
finalize_job()&lt;br /&gt;
{&lt;br /&gt;
    # Do whatever cleanup you want here. In this example we copy&lt;br /&gt;
    # output file(s) back to $SLURM_SUBMIT_DIR, but you may implement &lt;br /&gt;
    # your own job finalization code here.&lt;br /&gt;
    echo &amp;quot;function finalize_job called at `date`&amp;quot;&lt;br /&gt;
    cd $SCRATCH&lt;br /&gt;
    mkdir -vp &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results&lt;br /&gt;
    tar czvf &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results/${SLURM_JOB_ID}.tgz output*.txt&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Call finalize_job function as soon as we receive USR1 signal&lt;br /&gt;
trap &#039;finalize_job&#039; USR1&lt;br /&gt;
&lt;br /&gt;
# Copy input files for this job to the scratch directory (if needed).&lt;br /&gt;
# Note: Environment variable $SCRATCH always points to a scratch directory &lt;br /&gt;
# automatically created for this job. Environment variable $SLURM_SUBMIT_DIR &lt;br /&gt;
# points to the path where this script was submitted from.&lt;br /&gt;
# Example:&lt;br /&gt;
# cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input*.txt &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Change working directory to local scratch directory&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# This is where the actual work is done. In this case we just create &lt;br /&gt;
# a sample output file for 900 (=15*60) seconds, but since we asked &lt;br /&gt;
# Slurm for 600 seconds only, it will not be able to finish within this &lt;br /&gt;
# wall time.&lt;br /&gt;
# Note: It is important to run this task in the background &lt;br /&gt;
# by placing the &amp;amp; symbol at the end. Otherwise the signal handler &lt;br /&gt;
# would not be executed until that process has finished, which is not &lt;br /&gt;
# what we want.&lt;br /&gt;
(for i in `seq 15`; do echo &amp;quot;Hello World at `date +%H:%M:%S`.&amp;quot;; sleep 60; done) &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Note: The command above is just for illustration. Normally you would just run&lt;br /&gt;
# my_program &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Tell the shell to wait for background task(s) to finish. &lt;br /&gt;
# Note: This is important because otherwise the parent shell &lt;br /&gt;
# (this script) would proceed (and terminate) without waiting for &lt;br /&gt;
# background task(s) to finish.&lt;br /&gt;
wait&lt;br /&gt;
&lt;br /&gt;
# If we get here, the job did not time out but finished in time.&lt;br /&gt;
&lt;br /&gt;
# Release user defined signal handler for USR1&lt;br /&gt;
trap - USR1&lt;br /&gt;
&lt;br /&gt;
# Do regular cleanup and save files. In this example we simply call &lt;br /&gt;
# the same function that we defined as a signal handler above, but you &lt;br /&gt;
# may implement your own code here. &lt;br /&gt;
finalize_job&lt;br /&gt;
&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The number of seconds specified in the --signal option must be large enough to cover the runtime of the pre-termination function and must not exceed 65535 seconds.&lt;br /&gt;
&lt;br /&gt;
* Due to the resolution of event handling by Slurm, the signal may be sent a little earlier than specified.&lt;br /&gt;
&lt;br /&gt;
== How to submit a multithreaded batch job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a job running one multithreaded program instance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Number of cores per program instance&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=multithreaded_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=multithreaded_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=multithreaded_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
&lt;br /&gt;
# Run multithreaded program&lt;br /&gt;
./my_multithreaded_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for multithreaded program: [[Media:Hello_openmp.c | Hello_openmp.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* In our configuration each physical core is considered a &amp;quot;CPU&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* On JUSTUS 2 it is recommended to specify a number of cores per task (&#039;--cpus-per-task&#039;) that is either an integer divisor of 24 or (at most) 48.&lt;br /&gt;
&lt;br /&gt;
* Required memory can also be specified per allocated CPU with the &#039;--mem-per-cpu&#039; option, as shown in the example below. &lt;br /&gt;
&lt;br /&gt;
* The &#039;--mem&#039; and &#039;--mem-per-cpu&#039; options are mutually exclusive.&lt;br /&gt;
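&lt;br /&gt;
For example, the following two specifications request the same total amount of memory on one node:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
#SBATCH --mem=8G            # 8 GB per node&lt;br /&gt;
&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
#SBATCH --mem-per-cpu=1G    # 8 x 1 GB = 8 GB in total&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;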
&lt;br /&gt;
==  How to submit an array job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_array -a] (or [https://slurm.schedmd.com/sbatch.html#OPT_array --array]) option, e.g. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -a 1-16%8 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an array job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one task (i.e. one core) for each individual array task&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --array=1-16%8&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=array_job&lt;br /&gt;
#SBATCH --output=array_job-%A_%a.out&lt;br /&gt;
#SBATCH --error=array_job-%A_%a.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Print the task id.&lt;br /&gt;
echo &amp;quot;My SLURM_ARRAY_TASK_ID: &amp;quot; $SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
# ./my_program &amp;lt;input.$SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.&lt;br /&gt;
&lt;br /&gt;
* Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the same value (the job id of the array master job) for all tasks.&lt;br /&gt;
&lt;br /&gt;
* The remaining options in the sample job script are the same as the options used in other, non-array jobs. In the example above, we are requesting that each array task be allocated 1 CPU (--ntasks=1) and 4 GB of memory (--mem=4G) for up to one hour (--time=01:00:00).&lt;br /&gt;
&lt;br /&gt;
* More information: https://slurm.schedmd.com/job_array.html&lt;br /&gt;
&lt;br /&gt;
== How to delay the start of a job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_begin -b] (or [https://slurm.schedmd.com/sbatch.html#OPT_begin --begin]) option in order to defer the allocation of the job until the specified time.&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --begin=20:00 ...               # job can start after 8 p.m. &lt;br /&gt;
sbatch --begin=now+1hour ...           # job can start 1 hour after submission&lt;br /&gt;
sbatch --begin=teatime ...             # job can start at teatime (4 p.m.)&lt;br /&gt;
sbatch --begin=2023-12-24T20:00:00 ... # job can start after specified date/time&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit dependency (chain) jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_dependency -d] (or [https://slurm.schedmd.com/sbatch.html#OPT_dependency --dependency]) option, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterany:123456 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified job 123456 has terminated.&lt;br /&gt;
&lt;br /&gt;
Slurm supports a number of different dependency types, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-d after:123456      # job can begin execution after the specified job has begun execution&lt;br /&gt;
-d afterany:123456   # job can begin execution after the specified job has finished&lt;br /&gt;
-d afternotok:123456 # job can begin execution after the specified job has failed (exit code not equal zero)&lt;br /&gt;
-d afterok:123456    # job can begin execution after the specified job has successfully finished (exit code zero)&lt;br /&gt;
-d singleton         # job can begin execution after all previously launched jobs with the same name and user have finished&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Multiple jobs can be specified by separating their job ids by colon characters (:), e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ sbatch -d afterany:123456:123457 ... &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified jobs 123456 and 123457 have both finished.&lt;br /&gt;
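&lt;br /&gt;
A convenient way to build job chains in a script is the &#039;--parsable&#039; option of sbatch, which prints just the id of the submitted job (a minimal sketch; the script names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ jobid=$(sbatch --parsable step1.slurm)&lt;br /&gt;
$ sbatch -d afterok:$jobid step2.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;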
&lt;br /&gt;
== How to deal with invalid job dependencies? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_kill-on-invalid-dep --kill-on-invalid-dep=yes] option in order to automatically terminate jobs which can never run due to invalid dependencies. By default the job stays pending with reason &#039;DependencyNeverSatisfied&#039; to allow review and appropriate action by the user.  &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; A job dependency may also become invalid if a job has been submitted with &#039;-d afterok:&amp;lt;jobid&amp;gt;&#039; but the specified dependency job has failed, e.g. because it timed out (i.e. exceeded its wall time limit).&lt;br /&gt;
&lt;br /&gt;
== How to submit an MPI batch job? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/mpi_guide.html&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an MPI job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate two nodes&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
# Allocate 32 GB memory per node&lt;br /&gt;
#SBATCH --mem=32gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=mpi_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=mpi_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
#&lt;br /&gt;
# Option 1: Launch MPI tasks by using mpirun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/intel&lt;br /&gt;
# module load mpi/impi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# Option 2: Launch MPI tasks by using srun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# srun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun  ./my_mpi_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for MPI program: [[Media:Hello_mpi.c | Hello_mpi.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently for different MPI implementations and versions. There are reports that claim &amp;quot;strange behavior&amp;quot; of mpirun, especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.&lt;br /&gt;
* Do not run batch jobs that launch a large number (hundreds or thousands) of short-running (a few minutes or less) MPI programs, e.g. from a shell loop. Every single MPI invocation generates its own job step and sends remote procedure calls to the Slurm controller server. This can result in degraded performance for both Slurm and the application, especially if many such jobs happen to run at the same time. Jobs of that kind can even get stuck without showing any further activity until hitting the wall time limit. For high throughput computing (e.g. processing a large number of files, with each task running independently of the others and for only a short time), consider a more appropriate parallelization paradigm that invokes independent serial (non-MPI) processes in parallel at the same time. This approach is sometimes referred to as a &amp;quot;[https://en.wikipedia.org/wiki/Embarrassingly_parallel pleasingly parallel]&amp;quot; workload. GNU Parallel is a shell tool that facilitates executing serial tasks in parallel; on JUSTUS 2 it is available as software module &amp;quot;system/parallel&amp;quot;. A minimal sketch follows below.&lt;br /&gt;
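&lt;br /&gt;
A minimal sketch of such a &amp;quot;pleasingly parallel&amp;quot; job using GNU Parallel (the input file pattern and the program name are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
&lt;br /&gt;
module load system/parallel&lt;br /&gt;
&lt;br /&gt;
# Run one independent serial task per allocated core; GNU Parallel&lt;br /&gt;
# keeps 48 tasks running until all input files are processed.&lt;br /&gt;
parallel -j $SLURM_NTASKS_PER_NODE ./my_serial_program {} ::: input_*.dat&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;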
&lt;br /&gt;
== How to submit a hybrid MPI/OpenMP job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a hybrid job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of nodes to allocate&lt;br /&gt;
#SBATCH --nodes=4&lt;br /&gt;
# Number of MPI instances (ranks) to be executed per node&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
# Number of threads per MPI instance&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
# Allocate 8 GB memory per node&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=hybrid_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=hybrid_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=hybrid_job-%j.err&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
# or:&lt;br /&gt;
# mpirun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for hybrid program: [[Media:Hello_hybrid.c | Hello_hybrid.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* $SLURM_CPUS_PER_TASK is only set if the &#039;--cpus-per-task&#039; option is specified.&lt;br /&gt;
&lt;br /&gt;
== How to request specific node(s) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_nodelist -w] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodelist --nodelist]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -w &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also see [https://slurm.schedmd.com/sbatch.html#OPT_nodefile -F] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodefile --nodefile]) option.&lt;br /&gt;
&lt;br /&gt;
== How to exclude specific nodes from job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_exclude -x] (or [https://slurm.schedmd.com/sbatch.html#OPT_exclude --exclude]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -x &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get exclusive jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive&#039; option on job submission. This makes sure that there will be no other jobs running on your nodes. Very useful for benchmarking!&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* --exclusive option does &#039;&#039;&#039;not&#039;&#039;&#039; mean that you automatically get full access to all the resources which the node might provide without explicitly requesting them.&lt;br /&gt;
&lt;br /&gt;
== How to avoid sharing nodes with other users? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive=user&#039; option on job submission. This will still allow multiple jobs of one and the same user on the nodes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Depending on configuration, exclusive=user may (and probably will) be the default node access policy on JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
==  How to submit batch job without job script? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_wrap --wrap] option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --nodes=2 --ntasks-per-node=16 --wrap &amp;quot;sleep 600&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; May be useful for testing purposes.&lt;br /&gt;
&lt;br /&gt;
= JOB MONITORING AND CONTROL =&lt;br /&gt;
&lt;br /&gt;
== How to prevent Slurm performance degradation? ==&lt;br /&gt;
&lt;br /&gt;
Almost every invocation of a Slurm client command (e.g. squeue, sacct, sprio or sshare) sends a remote procedure call (RPC) to the Slurm control daemon and/or database. &lt;br /&gt;
If enough remote procedure calls come in at once, this can result in a degradation of performance of the Slurm services for all users, possibly resulting in a denial of service. &lt;br /&gt;
&lt;br /&gt;
Therefore, &#039;&#039;&#039;do not run Slurm client commands that send remote procedure calls from loops in shell scripts or other programs&#039;&#039;&#039; (such as &#039;watch squeue&#039;). Always limit calls to squeue, sstat, sacct etc. to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
Slurm does collect RPC counts and timing statistics by message type and user for diagnostic purposes.&lt;br /&gt;
&lt;br /&gt;
== How to view information about submitted jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] command, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue                  # all jobs owned by user (all jobs owned by all users for admins)&lt;br /&gt;
$ squeue --me             # all jobs owned by user (same as squeue for regular users)&lt;br /&gt;
$ squeue -u &amp;lt;username&amp;gt;    # jobs of specific user&lt;br /&gt;
$ squeue -t PENDING       # pending jobs only&lt;br /&gt;
$ squeue -t RUNNING       # running jobs only&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
* The output format of [https://slurm.schedmd.com/squeue.html squeue] (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.&lt;br /&gt;
&lt;br /&gt;
* Every invocation of squeue sends a remote procedure call to the Slurm controller. &#039;&#039;&#039;Do not run squeue or other Slurm client commands from loops in shell scripts or other programs&#039;&#039;&#039; as this can result in a degradation of performance. Ensure that programs limit calls to squeue to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
== How to cancel jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scancel.html scancel] command, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;         # cancel specific job&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;_&amp;lt;index&amp;gt; # cancel indexed job in a job array&lt;br /&gt;
$ scancel -u &amp;lt;username&amp;gt;   # cancel all jobs of specific user&lt;br /&gt;
$ scancel -t PENDING      # cancel pending jobs&lt;br /&gt;
$ scancel -t RUNNING      # cancel running jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to show job script of a running job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scontrol.html scontrol] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; &amp;lt;file&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If the file name is omitted, the default file name will be slurm-&amp;lt;job_id&amp;gt;.sh.&lt;br /&gt;
* If the file name is - (i.e. a dash), the job script will be written to stdout.&lt;br /&gt;
&lt;br /&gt;
== How to get estimated start time of a job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ squeue --start&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Estimated start times are dynamic and can change at any moment. Exact start times of individual jobs are usually unpredictable.&lt;br /&gt;
* Slurm will report N/A for the start time estimate if nodes are not currently being reserved by the scheduler for the job to run on.&lt;br /&gt;
&lt;br /&gt;
== How to show remaining walltime of running jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format option &amp;quot;%L&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ squeue -t r -o &amp;quot;%u %i %L&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check priority of jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format options &amp;quot;%Q&amp;quot; and/or &amp;quot;%p&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -o &amp;quot;%8i %8u %15a %.10r %.10L %.5D %.10Q&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sprio.html sprio] command to display the priority components (age/fairshare/...) for each job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sprio&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sshare.html sshare] command for listing the shares of associations, e.g. accounts.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sshare&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent (hold) jobs from being scheduled for execution? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol hold &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to unhold job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol release &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to suspend a running job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol suspend &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume a suspended job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol resume &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to requeue (cancel and resubmit) a particular job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol requeue &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to monitor resource usage of running job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sstat.html sstat] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sstat -e&#039; command shows a list of fields that can be specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sstat --format=JobId,AveCPU,AveRSS,MaxRSS -j &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will show average CPU time, average and maximum memory consumption of all tasks in the running job.&lt;br /&gt;
Ideally, average CPU time equals the number of cores allocated for the job multiplied by the current run time of the job. &lt;br /&gt;
The maximum memory consumption gives an estimate of the peak amount of memory actually needed so far. This can be compared with the amount of memory requested for the job. Over-requesting memory can result in significant waste of compute resources.       &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Users can also ssh into compute nodes that they have one or more running jobs on. Once logged in, they can use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ...&lt;br /&gt;
&lt;br /&gt;
* Users can also attach an interactive shell under an already allocated job by running the following command: &amp;lt;pre&amp;gt;srun --jobid &amp;lt;job&amp;gt; --overlap --pty /bin/bash&amp;lt;/pre&amp;gt; Once logged in, they can again use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ... For a single node job the user does not even need to know the node that the job is running on. For a multinode job, the user can still use &#039;-w &amp;lt;node&amp;gt;&#039; option to specify a specific node.&lt;br /&gt;
&lt;br /&gt;
== How to get detailed job information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show job 1234  # For job id 1234&lt;br /&gt;
$ scontrol show jobs      # For all jobs&lt;br /&gt;
$ scontrol -o show jobs   # One line per job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to modify a pending/running job? ==&lt;br /&gt;
&lt;br /&gt;
Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol update JobId=&amp;lt;jobid&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
E.g.: &amp;lt;pre&amp;gt;$ scontrol update JobId=42 TimeLimit=7-0&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will modify the time limit of the job to 7 days.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Update requests for &#039;&#039;&#039;running&#039;&#039;&#039; jobs are mostly restricted to Slurm administrators. In particular, only an administrator can increase the TimeLimit of a job.&lt;br /&gt;
&lt;br /&gt;
== How to show accounting data of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sacct -e&#039; command shows a list of fields that can be&lt;br /&gt;
specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
== How to retrieve job history and accounting? ==&lt;br /&gt;
&lt;br /&gt;
For a specific job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a specific user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The default time window is the current day.&lt;br /&gt;
&lt;br /&gt;
Starting from a specific date:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within a time window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 -E 2020-01-31 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
You can also set the environment variable $SACCT_FORMAT to specify the default format. To get a general idea of how efficiently a job utilized its resources, the following format can be used:  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export SACCT_FORMAT=&amp;quot;JobID,JobName,Elapsed,NCPUs,TotalCPU,CPUTime,ReqMem,MaxRSS,MaxDiskRead,MaxDiskWrite,State,ExitCode&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find how efficiently the CPUs were used, divide TotalCPU by CPUTime. To find how efficiently memory was used, divide MaxRSS by ReqMem. But be aware that sacct&#039;s memory usage measurement doesn&#039;t catch very rapid memory spikes. If your job got killed for running out of memory, it &#039;&#039;&#039;did run out of memory&#039;&#039;&#039;, even if sacct reports a lower memory usage than would trigger an out-of-memory kill. A job that reads or writes excessively to disk might be bogged down significantly by I/O operations.&lt;br /&gt;
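&lt;br /&gt;
For example (illustrative numbers): a job with TotalCPU=10:00:00 and CPUTime=20:00:00 used its CPUs at 50% efficiency; a job that requested ReqMem=8G but peaked at a MaxRSS of about 2G used only a quarter of the memory it reserved.&lt;br /&gt;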
&lt;br /&gt;
== How to get efficiency information of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;pre&amp;gt;$ seff &amp;lt;jobid&amp;gt; &amp;lt;/pre&amp;gt; command for some brief information.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; It is good practice to have a look at the efficiency of your job(s) on completion &#039;&#039;&#039;and we expect you to do so&#039;&#039;&#039;. This way you can improve your job specifications in the future.&lt;br /&gt;
&lt;br /&gt;
== How to get complete field values from sstat and sacct commands? ==&lt;br /&gt;
&lt;br /&gt;
When using the [https://slurm.schedmd.com/sacct.html#OPT_format --format] option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.&lt;br /&gt;
&lt;br /&gt;
E.g. &#039;--format=User%30&#039; will print 30 characters for the user name (right justified).  A %-30 will print 30 characters left justified.&lt;br /&gt;
&lt;br /&gt;
sstat and sacct also provide the &#039;--parsable&#039; and &#039;--parsable2&#039; option to always print full field values delimited with a pipe &#039;|&#039; character by default.&lt;br /&gt;
The delimiting character can be specified by using the &#039;--delimiter&#039; option, e.g. &#039;--delimiter=&amp;quot;,&amp;quot;&#039; for comma separated values.&lt;br /&gt;
&lt;br /&gt;
== How to retrieve job records for all jobs running/pending at a certain point in time? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] with [https://slurm.schedmd.com/sacct.html#OPT_state -s &amp;lt;state&amp;gt;] and [https://slurm.schedmd.com/sacct.html#OPT_starttime -S &amp;lt;start time&amp;gt;] options, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$  sacct -n -a -X -S 2021-04-01T00:00:00 -s R -o JobID,User%15,Account%10,NCPUS,NNodes,NodeList%1500&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When specifying the state &amp;quot;-s &amp;lt;state&amp;gt;&amp;quot; &#039;&#039;&#039;and&#039;&#039;&#039; the start time &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;, the default &lt;br /&gt;
time window will be set to end time &amp;quot;-E&amp;quot; equal to start time. Thus, you will get a snapshot of all running/pending &lt;br /&gt;
jobs at the instance given by &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== How to get a parsable list of hostnames from $SLURM_JOB_NODELIST? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show hostnames $SLURM_JOB_NODELIST&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
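&lt;br /&gt;
This can be used, for example, to loop over all nodes of a job from within the job script:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
for host in $(scontrol show hostnames $SLURM_JOB_NODELIST); do&lt;br /&gt;
    echo &amp;quot;allocated node: $host&amp;quot;&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;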
&lt;br /&gt;
= ADMINISTRATION =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Most commands in this section are restricted to system administrators.&lt;br /&gt;
&lt;br /&gt;
== How to stop Slurm from scheduling jobs? ==&lt;br /&gt;
&lt;br /&gt;
You can stop Slurm from scheduling jobs on a per partition basis by&lt;br /&gt;
setting that partition&#039;s state to DOWN. Set its state UP to resume&lt;br /&gt;
scheduling. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update PartitionName=foo State=DOWN&lt;br /&gt;
$ scontrol update PartitionName=foo State=UP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to print actual hardware configuration of a node? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ slurmd -C   # print hardware configuration plus uptime&lt;br /&gt;
$ slurmd -G   # print generic resource configuration&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to reboot (all) nodes as soon as they become idle? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # specific nodes&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME ALL              # all nodes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to cancel pending reboot of nodes? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol cancel_reboot &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check current node status? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show node &amp;lt;node&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to instruct all Slurm daemons to re-read the configuration file? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reconfigure&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent a user from submitting new jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the following [https://slurm.schedmd.com/sacctmgr.html sacctmgr] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Job submission is then rejected with the following message:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch job.slurm&lt;br /&gt;
sbatch: error: AssocMaxSubmitJobLimit&lt;br /&gt;
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user&#039;s size and/or time limits)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Use the following command to release the limit:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=-1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to drain node(s)? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=DRAIN Reason=&amp;quot;Some Reason&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Reason is mandatory.&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; just set state DOWN to drain nodes. This will kill any active jobs that may be running on those nodes.&lt;br /&gt;
&lt;br /&gt;
== How to list reason for nodes being drained or down? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume node state? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=RESUME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create a reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/reservations.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now duration=UNLIMITED flags=maint,ignore_jobs nodes=ALL&lt;br /&gt;
$ scontrol create reservation user=root starttime=2020-12-24T17:00 duration=12:00:00 flags=maint,ignore_jobs nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
$ scontrol show reservation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Add &amp;quot;FLEX&amp;quot; flag to allow jobs that qualify for the reservation to start before the reservation begins (and continue after it starts). &lt;br /&gt;
Add &amp;quot;MAGNETIC&amp;quot; flag to attract jobs that qualify for the reservation to run in that reservation without having requested it at submit time.&lt;br /&gt;
&lt;br /&gt;
== How to create a floating reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Use the flag &amp;quot;TIME_FLOAT&amp;quot; and a start time that is relative to the current time (use the keyword &amp;quot;now&amp;quot;).&lt;br /&gt;
In the example below, the nodes are prevented from starting any jobs exceeding a walltime of 2 days.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now+2days duration=UNLIMITED flags=maint,ignore_jobs,time_float nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Floating reservations are not intended for running jobs, but for preventing long-running jobs from being initiated on specific nodes. Attempts by users to make use of a floating reservation will be rejected. When ready to perform the maintenance, place the nodes in DRAIN state and delete the reservation.&lt;br /&gt;
&lt;br /&gt;
== How to use a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --reservation=foo_6 ... script.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol delete ReservationName=foo_6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;mdiag -n&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fields can be individually customized. See [https://slurm.schedmd.com/sinfo.html sinfo] man page. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N --format=&amp;quot;%8N %12P %.4C %.8O %.6m %.6e %.8T %.20E&amp;quot;&lt;br /&gt;
&lt;br /&gt;
NODELIST PARTITION    CPUS CPU_LOAD MEMORY FREE_M    STATE               REASON&lt;br /&gt;
n0001    standard*    0/16     0.01 128000 120445     idle                 none&lt;br /&gt;
n0002    standard*    0/16     0.01 128000 120438     idle                 none&lt;br /&gt;
n0003    standard*    0/0/      N/A 128000    N/A    down*       Not responding&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;pbsnodes&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show nodes                     # One paragraph per node (all nodes)&lt;br /&gt;
$ scontrol show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;     # One paragraph per node (specified nodes) &lt;br /&gt;
$ scontrol -o show nodes                  # One line per node (all nodes)&lt;br /&gt;
$ scontrol -o show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # One line per node (specified nodes)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to update multiple jobs of a user with a single scontrol command? ==&lt;br /&gt;
&lt;br /&gt;
This is not possible with a single scontrol invocation. However, you can use squeue to build such a script, taking&lt;br /&gt;
advantage of its filtering and formatting options.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -tpd -h -o &amp;quot;scontrol update jobid=%i priority=1000&amp;quot; &amp;gt;my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can also identify the list of jobs and pass them to JobID all at once, as a comma-separated list or as ranges, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobID=123 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=123,456,789 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=[123-400],[500-600] qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option is to use the JobName, if all the jobs have the same name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobName=&amp;quot;foobar&amp;quot; UserID=johndoe qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
However, Slurm does not accept the UserID filter on its own; it must be combined with another filter such as JobName.&lt;br /&gt;
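&lt;br /&gt;
If the jobs do not share a common name, you can again fall back to building the commands with squeue (a sketch; adjust the filter and update fields as needed):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -u johndoe -h -o &amp;quot;scontrol update jobid=%i qos=reallylargeqos&amp;quot; | sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;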
&lt;br /&gt;
== How to create a new account? ==&lt;br /&gt;
&lt;br /&gt;
Add account at top level in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; Cluster=justus Description=&amp;quot;Account description&amp;quot; Organization=&amp;quot;none&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Add account as child of some parent account in association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; parent=&amp;lt;parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to move account to another parent? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set parent=&amp;lt;new_parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr delete account name=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add a new user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; DefaultAccount=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add/remove users from an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname&amp;gt;                  # Add user to account&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname2&amp;gt;                 # Add user to a second account&lt;br /&gt;
$ sacctmgr remove user &amp;lt;username&amp;gt; where account=&amp;lt;accountname&amp;gt;         # Remove user from this account&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to change default account of a user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user where user=&amp;lt;username&amp;gt; set DefaultAccount=&amp;lt;default_account&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The user must already be associated with the account you want to set as default.&lt;br /&gt;
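&lt;br /&gt;
To verify the association and the new default (a sketch using standard sacctmgr format options):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show user &amp;lt;username&amp;gt; format=user,defaultaccount&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;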
&lt;br /&gt;
== How to show account information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show assoc&lt;br /&gt;
$ sacctmgr show assoc tree&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
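&lt;br /&gt;
The output can be restricted and formatted, e.g. (a sketch; choose format fields as needed):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show assoc where user=&amp;lt;username&amp;gt; format=cluster,account,user,maxjobs,qos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;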
&lt;br /&gt;
== How to implement user resource throttling policies? ==&lt;br /&gt;
&lt;br /&gt;
Quoting from https://bugs.schedmd.com/show_bug.cgi?id=3600#c4&lt;br /&gt;
&lt;br /&gt;
 With Slurm, the associations are meant to establish base limits on the&lt;br /&gt;
 defined partitions, accounts and users. Because limits propagate down&lt;br /&gt;
 through the association tree, you only need to define limits at a high&lt;br /&gt;
 level and those limits will be applied to all partitions, accounts and&lt;br /&gt;
 users that are below it (parent to child). You can also override those&lt;br /&gt;
 high level (parent) limits by explicitly setting different limits at&lt;br /&gt;
 any lower level (on the child). So using the association tree is the&lt;br /&gt;
 best way to get some base limits applied that you want for most cases. &lt;br /&gt;
 QOS&#039;s are meant to override any of those base limits for exceptional&lt;br /&gt;
 cases. Like Maui, you can use QOS&#039;s to set a different priority.&lt;br /&gt;
 Again, the QOS would be overriding the base priority that could be set&lt;br /&gt;
 in the associations.&lt;br /&gt;
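&lt;br /&gt;
For illustration, a base limit on a parent account combined with an explicit override on a child association might look like this (a sketch; account, user and limit values are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account &amp;lt;parent_accountname&amp;gt; set GrpTRES=cpu=512   # base limit, inherited by all children&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set GrpTRES=cpu=64                 # explicit override at the child level&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;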
&lt;br /&gt;
== How to set a resource limit for an individual user? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/resource_limits.html&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=1            # Limit maximum number of running jobs for user&lt;br /&gt;
$ sacctmgr list assoc user=&amp;lt;username&amp;gt; format=user,maxjobs  # Show that limit&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=-1           # Remove that limit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to retrieve historical resource usage for a specific user or account? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sreport.html sreport] command.&lt;br /&gt;
&lt;br /&gt;
Examples: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours user=&amp;lt;username&amp;gt;    # Report cluster utilization of given user broken down by accounts&lt;br /&gt;
$ sreport cluster AccountUtilizationByUser Start=2021-01-01 End=2021-12-31 -t Hours account=&amp;lt;account&amp;gt;  # Report cluster utilization of given account broken down by users    &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* By default, CPU resources are reported. Use the &#039;-T&#039; option for other trackable resources, e.g. &#039;-T cpu,mem,gres/gpu,gres/scratch&#039; (see the example below).&lt;br /&gt;
* On JUSTUS 2, registered compute projects (&amp;quot;Rechenvorhaben&amp;quot;) are uniquely mapped to Slurm accounts of the same name. Thus, &#039;AccountUtilizationByUser&#039; can also be used to report the aggregated cluster utilization of compute projects.&lt;br /&gt;
* These commands can be executed by regular users as well, in which case Slurm will only report their own usage records (but along with the total usage of the associated account in the case of &#039;AccountUtilizationByUser&#039;).&lt;br /&gt;
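&lt;br /&gt;
For example, to include GPU usage in the report (a sketch; the available TRES names depend on the cluster configuration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours -T cpu,gres/gpu user=&amp;lt;username&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;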
&lt;br /&gt;
== How to fix/reset a user&#039;s RawUsage value? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; where Account=&amp;lt;account&amp;gt; set RawUsage=&amp;lt;number&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create/modify/delete QOSes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/qos.html&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show qos                                      # Show existing QOSes&lt;br /&gt;
$ sacctmgr add qos verylong                              # Create new QOS verylong&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxWall=28-00:00:00   # Set maximum walltime limit&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxTRESPerUser=cpu=4  # Set maximum number of CPUs a user can allocate at a given time&lt;br /&gt;
$ sacctmgr modify qos verylong set flags=denyonlimit     # Prevent submission if job requests exceed any limits of QOS&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos+=verylong      # Add a QOS to a user account&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos-=verylong      # Remove a QOS from a user account&lt;br /&gt;
$ sacctmgr delete qos verylong                           # Delete that QOS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to find (and fix) runaway jobs? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr show runaway&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Runaway jobs are orphaned jobs that do not exist in the Slurm controller but have a start time and no end time in the Slurm database. Runaway jobs distort accounting and affect new jobs of users who have too many of them. &lt;br /&gt;
* If there are jobs in this state, this command will also provide an option to fix them. This sets the end time of each job to the latest of its start, eligible, or submit times, and sets its state to completed.&lt;br /&gt;
&lt;br /&gt;
== How to show a history of database transactions? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr list transactions&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Useful to get timestamps for when a user/account/qos has been created/modified/removed etc.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=12593</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=12593"/>
		<updated>2024-02-26T07:21:26Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* None &lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Running your calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=12561</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=12561"/>
		<updated>2024-02-06T17:21:17Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Scheduled cluster maintenance will take place on Wednesday, Feb. 14th, beginning at 9:00 am. Work is expected to be completed by Tuesday, Feb. 20th. Login will be denied during maintenance. &lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Running your calculations]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Data_Transfer&amp;diff=11876</id>
		<title>Data Transfer</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Data_Transfer&amp;diff=11876"/>
		<updated>2023-03-16T17:53:54Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* Filezilla */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Transfer Tools ==&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! rowspan=&amp;quot;2&amp;quot; | Type&lt;br /&gt;
! rowspan=&amp;quot;2&amp;quot; | Software&lt;br /&gt;
! rowspan=&amp;quot;2&amp;quot; | Remarks&lt;br /&gt;
! colspan=&amp;quot;4&amp;quot;  style=&amp;quot;text-align:center&amp;quot; | Executable on&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot;  style=&amp;quot;text-align:center&amp;quot; | Transfer from/to&lt;br /&gt;
|-&lt;br /&gt;
!Local°&lt;br /&gt;
!bwUniCluster&lt;br /&gt;
!bwForCluster&lt;br /&gt;
!www&lt;br /&gt;
!bwHPC cluster&lt;br /&gt;
![[SDS@hd]]&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;5&amp;quot; | Command-line&lt;br /&gt;
! scp&lt;br /&gt;
| rowspan=&amp;quot;3&amp;quot; | Throughput &amp;lt; 150 MB/s (depending on cipher)&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
! sftp&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
! rsync&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
! rdata&lt;br /&gt;
| Throughput of 350-400 MB/s&lt;br /&gt;
| &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| &lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
! wget&lt;br /&gt;
| Download from http/ftp address only&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|  &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot; | Graphical &lt;br /&gt;
! [https://winscp.net/eng/download.php WinSCP]&lt;br /&gt;
| based on SCP/SFTP, Windows only &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
! [https://filezilla-project.org/download.php?show_all=1 FileZilla]&lt;br /&gt;
| based on SFTP&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
° Depending on the installed operating system (OS).&lt;br /&gt;
&lt;br /&gt;
== Linux/Unix/Mac commandline sftp/scp Usage Examples ==&lt;br /&gt;
&lt;br /&gt;
=== sftp===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; sftp  ka_xy1234@bwfilestorage.lsdf.kit.edu&lt;br /&gt;
Connecting to bwfilestorage.lsdf.kit.edu&lt;br /&gt;
ka_xy1234@bwfilestorage.lsdf.kit.edu&#039;s password: &lt;br /&gt;
sftp&amp;gt; ls&lt;br /&gt;
snapshots&lt;br /&gt;
temp test&lt;br /&gt;
sftp&amp;gt; help&lt;br /&gt;
...&lt;br /&gt;
sftp&amp;gt; put myfile&lt;br /&gt;
sftp&amp;gt; get myfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== scp ===&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
&amp;gt; scp mylocalfile ul_xy1234@justus2.uni-ulm.de: # copies to home directory&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
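&lt;br /&gt;
Further scp examples (a sketch; file and directory names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; scp -r mylocaldir ul_xy1234@justus2.uni-ulm.de:    # copies a directory recursively to the home directory&lt;br /&gt;
&amp;gt; scp ul_xy1234@justus2.uni-ulm.de:myfile .          # copies a file from the cluster to the current local directory&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;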
&lt;br /&gt;
== Using SFTP from Windows and Mac graphical clients ==&lt;br /&gt;
&lt;br /&gt;
Windows clients do not have an SCP/SFTP client installed by default, so one must be installed before this protocol can be used. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Tools:&#039;&#039;&#039;&lt;br /&gt;
*[https://winscp.net/eng/download.php WinSCP] (for Windows)&lt;br /&gt;
*[https://filezilla-project.org/download.php?show_all=1 FileZilla] (for Windows, Mac and Linux)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;network drive over SFTP:&#039;&#039;&#039;&lt;br /&gt;
*[https://www.southrivertechnologies.com/download/downloadwd.html WebDrive] (for Windows and Mac) &lt;br /&gt;
*[https://www.eldos.com/sftp-net-drive/comparison.php  SFTP Net Drive (ELDOS)] (for Windows)&lt;br /&gt;
*[https://www.netdrive.net/ NetDrive] (for Windows)&lt;br /&gt;
*[https://www.expandrive.com/expandrive ExpanDrive] (for Windows and Mac)&lt;br /&gt;
&lt;br /&gt;
=== FileZilla ===&lt;br /&gt;
&lt;br /&gt;
Start FileZilla, select &amp;quot;File -&amp;gt; Site Manager...&amp;quot; from the main menu, and set up a new connection with the following settings:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Protocol: SFTP - SSH File Transfer Protocol&lt;br /&gt;
Host: &amp;lt;hostname&amp;gt;&lt;br /&gt;
Logon Type: Interactive&lt;br /&gt;
User: &amp;lt;username&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; By default FileZilla will close the connection after 20 seconds of inactivity. To adjust this timeout, select &amp;quot;Edit -&amp;gt; Settings ... -&amp;gt; Connections&amp;quot; and either increase &amp;quot;Timeout in seconds&amp;quot; to a reasonable value or set it to 0 to disable the connection timeout.&lt;br /&gt;
&lt;br /&gt;
== Best practices ==&lt;br /&gt;
&lt;br /&gt;
=== Ciphers ===&lt;br /&gt;
&lt;br /&gt;
Encrypting all the transferred data via scp/sftp takes time, which can become significant for very large data transfers. &lt;br /&gt;
&lt;br /&gt;
In these cases, you can choose a faster encryption cipher to speed up that part of your data transfer via options to ssh/sftp.&lt;br /&gt;
In our tests, these ciphers achieved the listed transfer speedups over the default. Whether the speedup is noticeable for you depends on processor type, network connection and the hard disks involved. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Cipher &lt;br /&gt;
!style=&amp;quot;text-align:left;&amp;quot;| performance&lt;br /&gt;
|-&lt;br /&gt;
|chacha20-poly1305@openssh.com (default)&lt;br /&gt;
| 100%&lt;br /&gt;
|-&lt;br /&gt;
|aes128-gcm@openssh.com&lt;br /&gt;
|~200%&lt;br /&gt;
|-&lt;br /&gt;
|aes128-ctr&lt;br /&gt;
|~188%&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
With ssh, scp and sftp you can select a different cipher with the -c option (for sshfs, pass it as -o Ciphers=... instead):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ssh -c aes128-gcm@openssh.com &amp;lt;hostname&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A list of the available ciphers can be obtained with the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ssh -Q cipher&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11875</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11875"/>
		<updated>2023-03-16T17:32:46Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* None&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11777</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11777"/>
		<updated>2023-02-27T21:45:43Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* 2023-02-27: Compute nodes are receiving the latest version of Rocky Linux 8.7&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Login&amp;diff=11699</id>
		<title>JUSTUS2/Login</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Login&amp;diff=11699"/>
		<updated>2023-01-26T09:21:47Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* Hostnames */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Access to the bwForCluster is only possible from IP addresses within the [https://www.belwue.de BelWü] network.&lt;br /&gt;
If your computer is in your University network (e.g. at your office), you should be able to connect directly. &lt;br /&gt;
From outside the BelWü network (e.g. at home), a VPN (virtual private network) connection to your University network must be established first. Please consult the VPN documentation of your University. You can learn your current hostname/IP at e.g. http://displaymyhostname.com/&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prerequisites for successful login:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You need to have&lt;br /&gt;
* followed the 3-step [[Registration]] procedure. &lt;br /&gt;
* [[Registration/bwForCluster/JUSTUS2|created an account]]  at the registration server for JUSTUS2. &lt;br /&gt;
* [[Registration/Password|set a service password]] for JUSTUS2.&lt;br /&gt;
* [[Registration/2FA|set up a time-based one-time password (TOTP)]] for the two factor authentication (2FA) &lt;br /&gt;
&lt;br /&gt;
= Login to JUSTUS 2 =&lt;br /&gt;
&lt;br /&gt;
Login to bwForCluster JUSTUS 2 is only possible with a Secure Shell (SSH) client, for which you must know your username on the cluster and the hostname of the login nodes.&lt;br /&gt;
For more general information on SSH clients, visit the [[Registration/Login/Client|SSH clients Guide]].&lt;br /&gt;
&lt;br /&gt;
== Username ==&lt;br /&gt;
&lt;br /&gt;
Your username on bwForCluster JUSTUS 2 consists of a prefix and your local username.&lt;br /&gt;
For prefixes please refer to the [[Registration/Login/Username|Username Guide]].&lt;br /&gt;
&lt;br /&gt;
Example: If your local username at your University is &amp;lt;code&amp;gt;abc12&amp;lt;/code&amp;gt; and you are a user from Ulm University, your username on the cluster is &amp;lt;code&amp;gt;ul_abc12&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Hostnames ==&lt;br /&gt;
&lt;br /&gt;
JUSTUS 2 has four login nodes. We use DNS round-robin scheduling to load-balance the incoming connections between the nodes. If you log in multiple times, different sessions might run on different login nodes, and hence programs started in one session might not be visible in other sessions. &lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;justus2.uni-ulm.de&#039;&#039;&#039; || any one of the login nodes&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login01.rz.uni-ulm.de || login node 01&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login02.rz.uni-ulm.de || login node 02&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login03.rz.uni-ulm.de || login node 03&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login04.rz.uni-ulm.de || login node 04&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; justus2-login02.rz.uni-ulm.de serves as a test environment for internal use and is omitted from the DNS round-robin.&lt;br /&gt;
&lt;br /&gt;
In addition, there are two visualization nodes for use with [[VNC]]:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;justus2-vis.uni-ulm.de&#039;&#039;&#039; || any one of the visualization nodes&lt;br /&gt;
|-&lt;br /&gt;
| justus2-vis01.rz.uni-ulm.de || vis node 01&lt;br /&gt;
|-&lt;br /&gt;
| justus2-vis02.rz.uni-ulm.de || vis node 02&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login with SSH command (Linux, Mac, Windows) ==&lt;br /&gt;
&lt;br /&gt;
Most Unix and Unix-like operating systems like Linux or MacOS come with a built-in SSH client provided by the OpenSSH project.&lt;br /&gt;
More recent versions of Windows 10 and Windows 11 using the [https://docs.microsoft.com/en-us/windows/wsl/install Windows Subsystem for Linux] (WSL) also come with a built-in OpenSSH client. &lt;br /&gt;
&lt;br /&gt;
From those machines, you can log in using:&lt;br /&gt;
&lt;br /&gt;
 ssh &amp;lt;username&amp;gt;@justus2.uni-ulm.de&lt;br /&gt;
&lt;br /&gt;
During login you must enter the current TOTP value (a 6-digit number), created with the help of the TOTP app on your smartphone, and your service password.&lt;br /&gt;
&lt;br /&gt;
To run graphical applications, you can pass the -X flag to ssh:&lt;br /&gt;
&lt;br /&gt;
 ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&lt;br /&gt;
&lt;br /&gt;
For better performance on slow connections you should use e.g. [[VNC]].&lt;br /&gt;
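&lt;br /&gt;
If you log in frequently, an entry in your local ~/.ssh/config can save typing (a sketch; &amp;lt;username&amp;gt; is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Host justus2&lt;br /&gt;
    HostName justus2.uni-ulm.de&lt;br /&gt;
    User &amp;lt;username&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Afterwards, &amp;lt;code&amp;gt;ssh justus2&amp;lt;/code&amp;gt; is sufficient.&lt;br /&gt;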
&lt;br /&gt;
&lt;br /&gt;
== Login with graphical SSH client (Windows) ==&lt;br /&gt;
&lt;br /&gt;
Example MobaXterm for login and file transfer:&lt;br /&gt;
 &lt;br /&gt;
Start MobaXterm and fill in the following fields:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Remote name              : justus2.uni-ulm.de&lt;br /&gt;
Specify user name        : &amp;lt;username&amp;gt;&lt;br /&gt;
Port                     : 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After that, click on &#039;OK&#039;. A terminal will then open where you can enter your credentials.&lt;br /&gt;
&lt;br /&gt;
== Login Example ==&lt;br /&gt;
&lt;br /&gt;
To login to bwForCluster JUSTUS 2, proceed as follows:&lt;br /&gt;
# Login with the SSH command or MobaXterm as shown above.&lt;br /&gt;
# The system will ask for a one-time password &amp;lt;code&amp;gt;Your OTP:&amp;lt;/code&amp;gt;. Please enter your OTP and confirm it with Enter/Return. The OTP is not displayed when typing. If you do not have a second factor yet, please create one (see [[Registration/2FA]]).&lt;br /&gt;
# The system will ask you for your service password &amp;lt;code&amp;gt;Password:&amp;lt;/code&amp;gt;. Please enter it and confirm it with Enter/Return. The password is not displayed when typing. If you do not have a service password yet or have forgotten it, please create one (see [[Registration/Password]]).&lt;br /&gt;
# You will be greeted by the cluster, followed by a shell.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l ul_abc12 justus2.uni-ulm.de&lt;br /&gt;
Your OTP:&lt;br /&gt;
Password: &lt;br /&gt;
&lt;br /&gt;
********************************************************************************&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                     Baden-Wuerttemberg Research Cluster                      *&lt;br /&gt;
*                 Computational Chemistry and Quantum Sciences                 *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                                 bwForCluster                                 *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*              __   __  __   _____  ______   __  __   _____      ___           *&lt;br /&gt;
*             / /  / / / /  / ___/ /_  __/  / / / /  / ___/     |__ \          *&lt;br /&gt;
*        __  / /  / / / /   \__ \   / /    / / / /   \__ \      __/ /          *&lt;br /&gt;
*       / /_/ /  / /_/ /   ___/ /  / /    / /_/ /   ___/ /     / __/           *&lt;br /&gt;
*       \____/   \____/   /____/  /_/     \____/   /____/     /____/           *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                   (Rocky 8 / Kernel 4.18 / Lustre 2.12)                      *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                       https://wiki.bwhpc.de/e/JUSTUS2                        *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*               ticket system: https://www.bwhpc.de/supportportal              *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
********************************************************************************&lt;br /&gt;
Last login: ...&lt;br /&gt;
[ul_abc12@login01 ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allowed Activities on Login Nodes ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| To guarantee usability for all users of the bwForCluster you must not run your compute jobs on the login nodes. Compute jobs must be submitted as&lt;br /&gt;
[[BwForCluster_JUSTUS_2_Slurm_HOWTO|Batch Jobs]]. Any compute job running on the login nodes will be terminated without any notice.&lt;br /&gt;
|}&lt;br /&gt;
 &lt;br /&gt;
The login nodes are the access point to the compute system and its $HOME directory. The login nodes are shared with all users of the cluster. Therefore, your activities on the login nodes are primarily limited to setting up your batch jobs. Your activities may also include:&lt;br /&gt;
* compilation of your program code and&lt;br /&gt;
* short pre- and postprocessing of your batch jobs.&lt;br /&gt;
&lt;br /&gt;
= Related Information =&lt;br /&gt;
&lt;br /&gt;
* If you want to reset your service password, consult the [[Registration/Password|Password Guide]].&lt;br /&gt;
* If you want to register a new token for the two factor authentication (2FA), consult the [[Registration/2FA|2FA Guide]].&lt;br /&gt;
* If you want to de-register, consult the [[Registration/Deregistration|De-registration Guide]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
= Further reading =&lt;br /&gt;
&lt;br /&gt;
* [[Data Transfer]] - how to get your files on the cluster&lt;br /&gt;
&lt;br /&gt;
* Scientific software is made accessible using the [[Environment Modules]] system&lt;br /&gt;
&lt;br /&gt;
* Compute jobs must be submitted as [[BwForCluster_JUSTUS_2_Slurm_HOWTO|Batch Jobs]]&lt;br /&gt;
&lt;br /&gt;
* Jobs needing disk space will need to request it in their job script. See [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F|Batch Jobs - request local scratch]]&lt;br /&gt;
&lt;br /&gt;
* What hardware is available is described in [https://wiki.bwhpc.de/e/Hardware_and_Architecture_(bwForCluster_JUSTUS_2) Hardware and Architecture of bwForCluster JUSTUS 2]&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[Category:BwForCluster_JUSTUS_2]][[Category:Access]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Login&amp;diff=11696</id>
		<title>JUSTUS2/Login</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Login&amp;diff=11696"/>
		<updated>2023-01-23T14:53:08Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* Login Example */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
Access to the bwForCluster is only possible from IP addresses within the [https://www.belwue.de BelWü] network.&lt;br /&gt;
If your computer is in your University network (e.g. at your office), you should be able to connect directly. &lt;br /&gt;
From outside the BelWü network (e.g. at home), a VPN (virtual private network) connection to your University network must be established first. Please consult the VPN documentation of your University. You can learn your current hostname/IP at e.g. http://displaymyhostname.com/&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Prerequisites for successful login:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
You need to have&lt;br /&gt;
* followed the 3-step [[Registration]] procedure. &lt;br /&gt;
* [[Registration/bwForCluster/JUSTUS2|created an account]]  at the registration server for JUSTUS2. &lt;br /&gt;
* [[Registration/Password|set a service password]] for JUSTUS2.&lt;br /&gt;
* [[Registration/2FA|set up a time-based one-time password (TOTP)]] for the two factor authentication (2FA) &lt;br /&gt;
&lt;br /&gt;
= Login to JUSTUS 2 =&lt;br /&gt;
&lt;br /&gt;
Login to bwForCluster JUSTUS 2 is only possible with a Secure Shell (SSH) client, for which you must know your username on the cluster and the hostname of the login nodes.&lt;br /&gt;
For more general information on SSH clients, visit the [[Registration/Login/Client|SSH clients Guide]].&lt;br /&gt;
&lt;br /&gt;
== Username ==&lt;br /&gt;
&lt;br /&gt;
Your username on bwForCluster JUSTUS 2 consists of a prefix and your local username.&lt;br /&gt;
For prefixes please refer to the [[Registration/Login/Username|Username Guide]].&lt;br /&gt;
&lt;br /&gt;
Example: If your local username at your University is &amp;lt;code&amp;gt;abc12&amp;lt;/code&amp;gt; and you are a user from Ulm University, your username on the cluster is &amp;lt;code&amp;gt;ul_abc12&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
== Hostnames ==&lt;br /&gt;
&lt;br /&gt;
JUSTUS 2 has four login nodes. We use DNS round-robin scheduling to load-balance the incoming connections between the nodes. If you log in multiple times, different sessions might run on different login nodes, and hence programs started in one session might not be visible in other sessions. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;justus2.uni-ulm.de&#039;&#039;&#039; || any one of the login nodes&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login01.rz.uni-ulm.de || login node 01&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login02.rz.uni-ulm.de || login node 02&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login03.rz.uni-ulm.de || login node 03&lt;br /&gt;
|-&lt;br /&gt;
| justus2-login04.rz.uni-ulm.de || login node 04&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
In addition, there are two visualization nodes for use with [[VNC]]:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Hostname !! Destination&lt;br /&gt;
|-&lt;br /&gt;
| &#039;&#039;&#039;justus2-vis.uni-ulm.de&#039;&#039;&#039; || any one of the visualization nodes&lt;br /&gt;
|-&lt;br /&gt;
| justus2-vis01.rz.uni-ulm.de || vis node 01&lt;br /&gt;
|-&lt;br /&gt;
| justus2-vis02.rz.uni-ulm.de || vis node 02&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Login with SSH command (Linux, Mac, Windows) ==&lt;br /&gt;
&lt;br /&gt;
Most Unix and Unix-like operating systems like Linux or MacOS come with a built-in SSH client provided by the OpenSSH project.&lt;br /&gt;
More recent versions of Windows 10 and Windows 11 using the [https://docs.microsoft.com/en-us/windows/wsl/install Windows Subsystem for Linux] (WSL) also come with a built-in OpenSSH client. &lt;br /&gt;
&lt;br /&gt;
From those machines, you can log in using:&lt;br /&gt;
&lt;br /&gt;
 ssh &amp;lt;username&amp;gt;@justus2.uni-ulm.de&lt;br /&gt;
&lt;br /&gt;
During login you must enter the current TOTP value (a 6-digit number), created with the help of the TOTP app on your smartphone, and your service password.&lt;br /&gt;
&lt;br /&gt;
To run graphical applications, you can pass the -X flag to ssh:&lt;br /&gt;
&lt;br /&gt;
 ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&lt;br /&gt;
&lt;br /&gt;
For better performance on slow connections you should use e.g. [[VNC]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Login with graphical SSH client (Windows) ==&lt;br /&gt;
&lt;br /&gt;
Example MobaXterm for login and file transfer:&lt;br /&gt;
 &lt;br /&gt;
Start MobaXterm and fill in the following fields:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Remote name              : justus2.uni-ulm.de&lt;br /&gt;
Specify user name        : &amp;lt;username&amp;gt;&lt;br /&gt;
Port                     : 22&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After that, click on &#039;OK&#039;. A terminal will then open where you can enter your credentials.&lt;br /&gt;
&lt;br /&gt;
== Login Example ==&lt;br /&gt;
&lt;br /&gt;
To login to bwForCluster JUSTUS 2, proceed as follows:&lt;br /&gt;
# Login with the SSH command or MobaXterm as shown above.&lt;br /&gt;
# The system will ask for a one-time password &amp;lt;code&amp;gt;Your OTP:&amp;lt;/code&amp;gt;. Please enter your OTP and confirm it with Enter/Return. The OTP is not displayed when typing. If you do not have a second factor yet, please create one (see [[Registration/2FA]]).&lt;br /&gt;
# The system will ask you for your service password &amp;lt;code&amp;gt;Password:&amp;lt;/code&amp;gt;. Please enter it and confirm it with Enter/Return. The password is not displayed when typing. If you do not have a service password yet or have forgotten it, please create one (see [[Registration/Password]]).&lt;br /&gt;
# You will be greeted by the cluster, followed by a shell.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ ssh -l ul_abc12 justus2.uni-ulm.de&lt;br /&gt;
Your OTP:&lt;br /&gt;
Password: &lt;br /&gt;
&lt;br /&gt;
********************************************************************************&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                     Baden-Wuerttemberg Research Cluster                      *&lt;br /&gt;
*                 Computational Chemistry and Quantum Sciences                 *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                                 bwForCluster                                 *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*              __   __  __   _____  ______   __  __   _____      ___           *&lt;br /&gt;
*             / /  / / / /  / ___/ /_  __/  / / / /  / ___/     |__ \          *&lt;br /&gt;
*        __  / /  / / / /   \__ \   / /    / / / /   \__ \      __/ /          *&lt;br /&gt;
*       / /_/ /  / /_/ /   ___/ /  / /    / /_/ /   ___/ /     / __/           *&lt;br /&gt;
*       \____/   \____/   /____/  /_/     \____/   /____/     /____/           *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                   (Rocky 8 / Kernel 4.18 / Lustre 2.12)                      *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*                       https://wiki.bwhpc.de/e/JUSTUS2                        *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
*               ticket system: https://www.bwhpc.de/supportportal              *&lt;br /&gt;
*                                                                              *&lt;br /&gt;
********************************************************************************&lt;br /&gt;
Last login: ...&lt;br /&gt;
[ul_abc12@login01 ~]$ &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Allowed Activities on Login Nodes ==&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;| To guarantee usability for all users of the bwForCluster you must not run your compute jobs on the login nodes. Compute jobs must be submitted as&lt;br /&gt;
[[BwForCluster_JUSTUS_2_Slurm_HOWTO|Batch Jobs]]. Any compute job running on the login nodes will be terminated without any notice.&lt;br /&gt;
|}&lt;br /&gt;
 &lt;br /&gt;
The login nodes are the access point to the compute system and its $HOME directory. The login nodes are shared with all users of the cluster. Therefore, your activities on the login nodes are primarily limited to setting up your batch jobs. Your activities may also include:&lt;br /&gt;
* compilation of your program code and&lt;br /&gt;
* short pre- and postprocessing of your batch jobs.&lt;br /&gt;
&lt;br /&gt;
= Related Information =&lt;br /&gt;
&lt;br /&gt;
* If you want to reset your service password, consult the [[Registration/Password|Password Guide]].&lt;br /&gt;
* If you want to register a new token for the two factor authentication (2FA), consult the [[Registration/2FA|2FA Guide]].&lt;br /&gt;
* If you want to de-register, consult the [[Registration/Deregistration|De-registration Guide]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
= Further reading =&lt;br /&gt;
&lt;br /&gt;
* [[Data Transfer]] - how to get your files on the cluster&lt;br /&gt;
&lt;br /&gt;
* Scientific software is made accessible using the [[Environment Modules]] system&lt;br /&gt;
&lt;br /&gt;
* Compute jobs must be submitted as [[BwForCluster_JUSTUS_2_Slurm_HOWTO|Batch Jobs]]&lt;br /&gt;
&lt;br /&gt;
* Jobs needing disk space will need to request it in their job script. See [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F|Batch Jobs - request local scratch]]&lt;br /&gt;
&lt;br /&gt;
* What hardware is available is described in [https://wiki.bwhpc.de/e/Hardware_and_Architecture_(bwForCluster_JUSTUS_2) Hardware and Architecture of bwForCluster JUSTUS 2]&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[Category:BwForCluster_JUSTUS_2]][[Category:Access]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=11497</id>
		<title>BwForCluster JUSTUS 2 Slurm HOWTO</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_JUSTUS_2_Slurm_HOWTO&amp;diff=11497"/>
		<updated>2022-12-02T15:34:44Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* How to find working sample scripts for my program? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
This is a collection of howtos and convenient Slurm commands for JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
Some commands behave slightly differently depending on whether they are executed &lt;br /&gt;
by a system administrator or by a regular user, as Slurm prevents regular users from accessing critical system information and from viewing job and usage information of other users.  &lt;br /&gt;
&lt;br /&gt;
= GENERAL INFORMATION =&lt;br /&gt;
&lt;br /&gt;
== How to find a general quick start user guide? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/quickstart.html&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm FAQ? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/faq.html&lt;br /&gt;
&lt;br /&gt;
== How to find a Slurm cheat sheet? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/pdfs/summary.pdf&lt;br /&gt;
&lt;br /&gt;
== How to find Slurm tutorials? ==&lt;br /&gt;
&lt;br /&gt;
https://slurm.schedmd.com/tutorials.html&lt;br /&gt;
&lt;br /&gt;
== How to get more information on Slurm? ==&lt;br /&gt;
&lt;br /&gt;
(Almost) every Slurm command has a man page. Use it.&lt;br /&gt;
&lt;br /&gt;
Online versions: https://slurm.schedmd.com/man_index.html&lt;br /&gt;
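&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ man sbatch&amp;lt;/pre&amp;gt;&lt;br /&gt;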
&lt;br /&gt;
== How to find hardware specific details about JUSTUS 2? ==&lt;br /&gt;
&lt;br /&gt;
See our Wiki page: [[Hardware and Architecture (bwForCluster JUSTUS 2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
= JOB SUBMISSION =&lt;br /&gt;
&lt;br /&gt;
== How to submit a serial batch job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html sbatch]  command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample job script template for serial job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=serial_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=serial_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=serial_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Run serial program&lt;br /&gt;
./my_serial_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for serial program: [[Media:Hello_serial.c | Hello_serial.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* --nodes=1 and --ntasks-per-node=1 may be replaced by --ntasks=1.&lt;br /&gt;
* If not specified, stdout and stderr are both written to slurm-%j.out.&lt;br /&gt;
&lt;br /&gt;
== How to find working sample scripts for my program? ==&lt;br /&gt;
&lt;br /&gt;
Most software modules for applications provide working sample batch scripts.&lt;br /&gt;
Check with [[Software_Modules_Lmod#Module_specific_help | module help]] command, e.g. &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module help chem/vasp     # display module help for VASP&lt;br /&gt;
$ module help math/matlab   # display module help for Matlab&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to harden job scripts against common errors? ==&lt;br /&gt;
&lt;br /&gt;
The bash shell provides several options that help users uncover hidden bugs and write safer job scripts.&lt;br /&gt;
To activate these safeguard settings, insert the following lines into your scripts (after all #SBATCH directives):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
set -o errexit   # (or set -e) causes the batch script to exit immediately when a command fails.&lt;br /&gt;
set -o pipefail  # causes the batch script to exit immediately even when the failing command is embedded in a pipeline.&lt;br /&gt;
set -o nounset   # (or set -u) causes the script to treat unset variables as an error and exit immediately.&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/salloc.html salloc] command, e.g.:&lt;br /&gt;
&amp;lt;pre&amp;gt;$ salloc --nodes=1 --ntasks-per-node=8&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
In Slurm versions prior to 20.11, using [https://slurm.schedmd.com/srun.html srun] was the recommended way to launch interactive jobs, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ srun --nodes=1 --ntasks-per-node=8 --pty bash &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Although this still works with current Slurm versions, it is considered &#039;&#039;&#039;deprecated&#039;&#039;&#039; as it may cause issues when launching additional job steps from within the interactive job environment. Use the [https://slurm.schedmd.com/salloc.html salloc] command instead.&lt;br /&gt;
&lt;br /&gt;
== How to enable X11 forwarding for an interactive job? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--x11&#039; flag, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ salloc --nodes=1 --ntasks-per-node=8 --x11     # run shell with X11 forwarding enabled&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
* For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:&lt;br /&gt;
 &amp;lt;pre&amp;gt;local&amp;gt; ssh -X &amp;lt;username&amp;gt;@justus2.uni-ulm.de&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to convert Moab batch job scripts to Slurm? ==&lt;br /&gt;
&lt;br /&gt;
Replace Moab/Torque job specification flags and environment variables in your job&lt;br /&gt;
scripts by their corresponding Slurm counterparts.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Commonly used Moab job specification flags and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Option !! Moab (msub) !! Slurm (sbatch)&lt;br /&gt;
|-&lt;br /&gt;
| Script directive                            || #MSUB                                  || #SBATCH&lt;br /&gt;
|-&lt;br /&gt;
| Job name                                    || -N &amp;lt;name&amp;gt;                              || --job-name=&amp;lt;name&amp;gt;  (-J &amp;lt;name&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Account                                     || -A &amp;lt;account&amp;gt;                           || --account=&amp;lt;account&amp;gt; (-A &amp;lt;account&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Queue                                       || -q &amp;lt;queue&amp;gt;                             || --partition=&amp;lt;partition&amp;gt; (-p &amp;lt;partition&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Wall time limit                             || -l walltime=&amp;lt;hh:mm:ss&amp;gt;                 || --time=&amp;lt;hh:mm:ss&amp;gt; (-t &amp;lt;hh:mm:ss&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node count                                  || -l nodes=&amp;lt;count&amp;gt;                       || --nodes=&amp;lt;count&amp;gt; (-N &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Core count                                  || -l procs=&amp;lt;count&amp;gt;                       || --ntasks=&amp;lt;count&amp;gt; (-n &amp;lt;count&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Process count per node                      || -l ppn=&amp;lt;count&amp;gt;                         || --ntasks-per-node=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Core count per process                      ||                                        || --cpus-per-task=&amp;lt;count&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per node                       || -l mem=&amp;lt;limit&amp;gt;                         || --mem=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Memory limit per process                    || -l pmem=&amp;lt;limit&amp;gt;                        || --mem-per-cpu=&amp;lt;limit&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Job array                                   || -t &amp;lt;array indices&amp;gt;                     || --array=&amp;lt;indices&amp;gt; (-a &amp;lt;indices&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Node exclusive job                          || -l naccesspolicy=singlejob             || --exclusive&lt;br /&gt;
|-&lt;br /&gt;
| Initial working directory                   || -d &amp;lt;directory&amp;gt; (default: $HOME)        || --chdir=&amp;lt;directory&amp;gt; (-D &amp;lt;directory&amp;gt;) (default: submission directory)&lt;br /&gt;
|-&lt;br /&gt;
| Standard output file                        || -o &amp;lt;file path&amp;gt;                         || --output=&amp;lt;file&amp;gt; (-o &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Standard error file                         || -e &amp;lt;file path&amp;gt;                         || --error=&amp;lt;file&amp;gt;  (-e &amp;lt;file&amp;gt;)&lt;br /&gt;
|-&lt;br /&gt;
| Combine stdout/stderr to stdout             || -j oe                                  || --output=&amp;lt;combined stdout/stderr file&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Mail notification events                    || -m &amp;lt;event&amp;gt;                             || --mail-type=&amp;lt;events&amp;gt; (valid types include: NONE, BEGIN, END, FAIL, ALL)&lt;br /&gt;
|-&lt;br /&gt;
| Export environment to job                   || -V                                     || --export=ALL (default)&lt;br /&gt;
|-&lt;br /&gt;
| Don&#039;t export environment to job             || (default)                              || --export=NONE&lt;br /&gt;
|-&lt;br /&gt;
| Export environment variables to job         || -v &amp;lt;var[=value][,var2=value2[, ...]]&amp;gt;  || --export=&amp;lt;var[=value][,var2=value2[,...]]&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
* Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.&lt;br /&gt;
* By default Moab does not export any environment variables to the job&#039;s runtime environment. With Slurm most of the login environment variables are exported to your job&#039;s runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).&lt;br /&gt;
&lt;br /&gt;
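A minimal before/after sketch for a typical job header (values arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Moab header ...&lt;br /&gt;
#MSUB -N my_job&lt;br /&gt;
#MSUB -l nodes=2&lt;br /&gt;
#MSUB -l ppn=24&lt;br /&gt;
#MSUB -l walltime=01:00:00&lt;br /&gt;
#MSUB -l mem=8gb&lt;br /&gt;
&lt;br /&gt;
# ... and its Slurm equivalent:&lt;br /&gt;
#SBATCH --job-name=my_job&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
#SBATCH --ntasks-per-node=24&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;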
&#039;&#039;&#039;Commonly used Moab/Torque script environment variables and their Slurm equivalents&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Information                 !! Moab                !! Torque               !! Slurm                                     &lt;br /&gt;
|-&lt;br /&gt;
| Job name                     || $MOAB_JOBNAME        || $PBS_JOBNAME        || $SLURM_JOB_NAME                           &lt;br /&gt;
|-&lt;br /&gt;
| Job ID                       || $MOAB_JOBID          || $PBS_JOBID          || $SLURM_JOB_ID                             &lt;br /&gt;
|-&lt;br /&gt;
| Submit directory             || $MOAB_SUBMITDIR      || $PBS_O_WORKDIR      || $SLURM_SUBMIT_DIR                         &lt;br /&gt;
|-&lt;br /&gt;
| Number of nodes allocated    || $MOAB_NODECOUNT      || $PBS_NUM_NODES      || $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) &lt;br /&gt;
|-&lt;br /&gt;
| Node list                    || $MOAB_NODELIST       || cat $PBS_NODEFILE   || $SLURM_JOB_NODELIST                       &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes          || $MOAB_PROCCOUNT      || $PBS_TASKNUM        || $SLURM_NTASKS                             &lt;br /&gt;
|-&lt;br /&gt;
| Requested tasks per node     || ---                    || $PBS_NUM_PPN        || $SLURM_NTASKS_PER_NODE                    &lt;br /&gt;
|-&lt;br /&gt;
| Requested CPUs per task      || ---                  || ---                 || $SLURM_CPUS_PER_TASK                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array index              || $MOAB_JOBARRAYINDEX  || $PBS_ARRAY_INDEX    || $SLURM_ARRAY_TASK_ID                      &lt;br /&gt;
|-&lt;br /&gt;
| Job array range              || $MOAB_JOBARRAYRANGE  || -                   || $SLURM_ARRAY_TASK_COUNT                   &lt;br /&gt;
|-&lt;br /&gt;
| Queue name                   || $MOAB_CLASS          || $PBS_QUEUE          || $SLURM_JOB_PARTITION                      &lt;br /&gt;
|-&lt;br /&gt;
| QOS name                     || $MOAB_QOS            || ---                 || $SLURM_JOB_QOS                            &lt;br /&gt;
|-&lt;br /&gt;
| Number of processes per node || ---                 || $PBS_NUM_PPN        || $SLURM_TASKS_PER_NODE                     &lt;br /&gt;
|-&lt;br /&gt;
| Job user                     || $MOAB_USER           || $PBS_O_LOGNAME      || $SLURM_JOB_USER                           &lt;br /&gt;
|-&lt;br /&gt;
| Hostname                     || $MOAB_MACHINE        || $PBS_O_HOST         || $SLURMD_NODENAME                          &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* See [https://slurm.schedmd.com/sbatch.html sbatch] man page for a complete list of flags and environment variables.&lt;br /&gt;
&lt;br /&gt;
== How to emulate Moab output file names? ==&lt;br /&gt;
&lt;br /&gt;
Use the following directives:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --output=&amp;quot;%x.o%j&amp;quot;&lt;br /&gt;
#SBATCH --error=&amp;quot;%x.e%j&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to pass command line arguments to the job script? ==&lt;br /&gt;
&lt;br /&gt;
Run &amp;lt;pre&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; arg1 arg2 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Inside the job script the arguments can be accessed as $1, $2, ...&lt;br /&gt;
&lt;br /&gt;
E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
infile=&amp;quot;$1&amp;quot;&lt;br /&gt;
outfile=&amp;quot;$2&amp;quot;&lt;br /&gt;
./my_serial_program &amp;lt; &amp;quot;$infile&amp;quot; &amp;gt; &amp;quot;$outfile&amp;quot; 2&amp;gt;&amp;amp;1&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; use $1, $2, ... in &amp;quot;#SBATCH&amp;quot; lines. These parameters can be used only within the regular shell script.&lt;br /&gt;
&lt;br /&gt;
== How to request local scratch (SSD/NVMe) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=scratch:nnn&#039; option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=scratch:100&#039; will allocate 100 GB scratch space on a locally attached NVMe device.&lt;br /&gt;
&lt;br /&gt;
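A minimal sketch of how this may be used in a job script (program name is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Request 100 GB of node-local scratch space&lt;br /&gt;
#SBATCH --gres=scratch:100&lt;br /&gt;
&lt;br /&gt;
# $SCRATCH points to the scratch directory created for this job&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
./my_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;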
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; add any unit (such as --gres=scratch:100G). This would be treated as requesting an amount of 10^9 * 100GB of scratch space.&lt;br /&gt;
&lt;br /&gt;
* Multinode jobs get nnn GB of local scratch space on every node of the job.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$SCRATCH&#039;&#039;&#039; will point to &lt;br /&gt;
** /scratch/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when local scratch has been requested. This will be on locally attached SSD/NVMe devices.&lt;br /&gt;
** /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt; when no local scratch has been requested. This will be in memory and, thus, be limited in size.&lt;br /&gt;
&lt;br /&gt;
* Environment variable &#039;&#039;&#039;$TMPDIR&#039;&#039;&#039; always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;. This will always be in memory and, thus, limited in size.&lt;br /&gt;
&lt;br /&gt;
* For backward compatibility environment variable $RAMDISK always points to /tmp/&amp;lt;user&amp;gt;.&amp;lt;jobid&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Scratch space allocation in /scratch will be enforced by quota limits&lt;br /&gt;
&lt;br /&gt;
* Data written to $TMPDIR will always count against allocated memory.&lt;br /&gt;
&lt;br /&gt;
* Data written to local scratch space will automatically be removed at the end of the job.&lt;br /&gt;
&lt;br /&gt;
== How to request GPGPU nodes at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--gres=gpu:&amp;lt;count&amp;gt;&#039; option to allocate 1 or 2 GPUs per node for the entire job.&lt;br /&gt;
&lt;br /&gt;
Example: &#039;--gres=gpu:1&#039; will allocate one GPU per node for this job.&lt;br /&gt;
&lt;br /&gt;
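A minimal sketch (program name is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Allocate one GPU per node for this job&lt;br /&gt;
#SBATCH --gres=gpu:1&lt;br /&gt;
&lt;br /&gt;
# Load the CUDA Toolkit module and run a GPU program&lt;br /&gt;
module load devel/cuda&lt;br /&gt;
./my_gpu_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;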
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* GPGPU nodes are equipped with two Nvidia V100S cards &lt;br /&gt;
&lt;br /&gt;
* Environment variables $CUDA_VISIBLE_DEVICES, $SLURM_JOB_GPUS and $GPU_DEVICE_ORDINAL will denote card(s) allocated for the job.&lt;br /&gt;
&lt;br /&gt;
* CUDA Toolkit is available as software module devel/cuda.&lt;br /&gt;
&lt;br /&gt;
== How to clean-up or save files before a job times out? ==&lt;br /&gt;
&lt;br /&gt;
Possibly you would like to clean up the work directory or save intermediate result files in case a job times out.&lt;br /&gt;
&lt;br /&gt;
The following sample script may serve as a blueprint for implementing a pre-termination function to perform clean-up or file recovery actions. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# 2 GB memory required per node&lt;br /&gt;
#SBATCH --mem=2G&lt;br /&gt;
# Request 10 GB local scratch space&lt;br /&gt;
#SBATCH --gres=scratch:10&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=10:00&lt;br /&gt;
# Send the USR1 signal 120 seconds before end of time limit&lt;br /&gt;
#SBATCH --signal=B:USR1@120&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=signal_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=signal_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=signal_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Define the signal handler function&lt;br /&gt;
# Note: This is not executed here, but rather when the associated &lt;br /&gt;
# signal is received by the shell.&lt;br /&gt;
finalize_job()&lt;br /&gt;
{&lt;br /&gt;
    # Do whatever cleanup you want here. In this example we copy&lt;br /&gt;
    # output file(s) back to $SLURM_SUBMIT_DIR, but you may implement &lt;br /&gt;
    # your own job finalization code here.&lt;br /&gt;
    echo &amp;quot;function finalize_job called at `date`&amp;quot;&lt;br /&gt;
    cd $SCRATCH&lt;br /&gt;
    mkdir -vp &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results&lt;br /&gt;
    tar czvf &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/results/${SLURM_JOB_ID}.tgz output*.txt&lt;br /&gt;
    exit&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
# Call finalize_job function as soon as we receive USR1 signal&lt;br /&gt;
trap &#039;finalize_job&#039; USR1&lt;br /&gt;
&lt;br /&gt;
# Copy input files for this job to the scratch directory (if needed).&lt;br /&gt;
# Note: Environment variable $SCRATCH always points to a scratch directory &lt;br /&gt;
# automatically created for this job. Environment variable $SLURM_SUBMIT_DIR &lt;br /&gt;
# points to the path where this script was submitted from.&lt;br /&gt;
# Example:&lt;br /&gt;
# cp -v &amp;quot;$SLURM_SUBMIT_DIR&amp;quot;/input*.txt &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Change working directory to local scratch directory&lt;br /&gt;
cd &amp;quot;$SCRATCH&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# This is where the actual work is done. In this case we just create &lt;br /&gt;
# a sample output file for 900 (=15*60) seconds, but since we asked &lt;br /&gt;
# Slurm for 600 seconds only, it will not be able to finish within this &lt;br /&gt;
# wall time.&lt;br /&gt;
# Note: It is important to run this task in the background &lt;br /&gt;
# by placing the &amp;amp; symbol at the end. Otherwise the signal handler &lt;br /&gt;
# would not be executed until that process has finished, which is not &lt;br /&gt;
# what we want.&lt;br /&gt;
(for i in `seq 15`; do echo &amp;quot;Hello World at `date +%H:%M:%S`.&amp;quot;; sleep 60; done) &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Note: The command above is just for illustration. Normally you would just run&lt;br /&gt;
# my_program &amp;gt;output.txt 2&amp;gt;&amp;amp;1 &amp;amp;&lt;br /&gt;
&lt;br /&gt;
# Tell the shell to wait for background task(s) to finish. &lt;br /&gt;
# Note: This is important because otherwise the parent shell &lt;br /&gt;
# (this script) would proceed (and terminate) without waiting for &lt;br /&gt;
# background task(s) to finish.&lt;br /&gt;
wait&lt;br /&gt;
&lt;br /&gt;
# If we get here, the job did not time out but finished in time.&lt;br /&gt;
&lt;br /&gt;
# Release user defined signal handler for USR1&lt;br /&gt;
trap - USR1&lt;br /&gt;
&lt;br /&gt;
# Do regular cleanup and save files. In this example we simply call &lt;br /&gt;
# the same function that we defined as a signal handler above, but you &lt;br /&gt;
# may implement your own code here. &lt;br /&gt;
finalize_job&lt;br /&gt;
&lt;br /&gt;
exit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* The number of seconds specified in the --signal option must be large enough to cover the runtime of the pre-termination function and must not exceed 65535 seconds.&lt;br /&gt;
&lt;br /&gt;
* Due to the resolution of event handling by Slurm, the signal may be sent a little earlier than specified.&lt;br /&gt;
&lt;br /&gt;
== How to submit a multithreaded batch job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a job running one multithreaded program instance:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate one node&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
# Number of cores per program instance&lt;br /&gt;
#SBATCH --cpus-per-task=8&lt;br /&gt;
# 8 GB memory required per node&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=multithreaded_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=multithreaded_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=multithreaded_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}&lt;br /&gt;
&lt;br /&gt;
# Run multithreaded program&lt;br /&gt;
./my_multithreaded_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for multithreaded program: [[Media:Hello_openmp.c | Hello_openmp.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* In our configuration each physical core is considered a &amp;quot;CPU&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
* On JUSTUS 2 it is recommended to specify a number of cores per task (&#039;--cpus-per-task&#039;) that is either an integer divisor of 24 or (at most) 48.&lt;br /&gt;
&lt;br /&gt;
* Required memory can also be specified per allocated CPU with the &#039;--mem-per-cpu&#039; option. &lt;br /&gt;
&lt;br /&gt;
* The &#039;--mem&#039; and &#039;--mem-per-cpu&#039; options are mutually exclusive.&lt;br /&gt;
&lt;br /&gt;
==  How to submit an array job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_array -a] (or [https://slurm.schedmd.com/sbatch.html#OPT_array --array]) option, e.g. &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -a 1-16%8 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an array job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of cores per individual array task&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --array=1-16%8&lt;br /&gt;
#SBATCH --mem=4G&lt;br /&gt;
#SBATCH --time=01:00:00&lt;br /&gt;
#SBATCH --job-name=array_job&lt;br /&gt;
#SBATCH --output=array_job-%A_%a.out&lt;br /&gt;
#SBATCH --error=array_job-%A_%a.err&lt;br /&gt;
&lt;br /&gt;
# Load software modules as needed, e.g.&lt;br /&gt;
# module load foo/bar&lt;br /&gt;
&lt;br /&gt;
# Print the task id.&lt;br /&gt;
echo &amp;quot;My SLURM_ARRAY_TASK_ID: &amp;quot; $SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
# ./my_program &amp;lt;input.$SLURM_ARRAY_TASK_ID&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.&lt;br /&gt;
&lt;br /&gt;
* Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the first job ID of the array and is the same for all tasks.&lt;br /&gt;
&lt;br /&gt;
* The remaining options in the sample job script are the same as the options used in other, non-array jobs. In the example above, we are requesting that each array task be allocated 1 CPU (--ntasks=1) and 4 GB of memory (--mem=4G) for up to one hour (--time=01:00:00).&lt;br /&gt;
&lt;br /&gt;
* More information: https://slurm.schedmd.com/job_array.html&lt;br /&gt;
&lt;br /&gt;
== How to delay the start of a job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_begin -b] (or [https://slurm.schedmd.com/sbatch.html#OPT_begin --begin]) option in order to defer the allocation of the job until the specified time.&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch --begin=20:00 ...               # job can start after 8 p.m. &lt;br /&gt;
sbatch --begin=now+1hour ...           # job can start 1 hour after submission&lt;br /&gt;
sbatch --begin=teatime ...             # job can start at teatime (4 p.m.)&lt;br /&gt;
sbatch --begin=2023-12-24T20:00:00 ... # job can start after specified date/time&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to submit dependency (chain) jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_dependency -d] (or [https://slurm.schedmd.com/sbatch.html#OPT_dependency --dependency]) option, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterany:123456 ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified job 123456 has terminated.&lt;br /&gt;
&lt;br /&gt;
Slurm supports a number of different dependency types, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
-d after:123456      # job can begin execution after the specified job has begun execution&lt;br /&gt;
-d afterany:123456   # job can begin execution after the specified job has finished&lt;br /&gt;
-d afternotok:123456 # job can begin execution after the specified job has failed (exit code not equal zero)&lt;br /&gt;
-d afterok:123456    # job can begin execution after the specified job has successfully finished (exit code zero)&lt;br /&gt;
-d singleton         # job can begin execution after any previously launched jobs with the same job name and user have finished&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Multiple jobs can be specified by separating their job ids by colon characters (:), e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ sbatch -d afterany:123456:123457 ... &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will defer the submitted job until the specified jobs 123456 and 123457 have both finished.&lt;br /&gt;
&lt;br /&gt;
== How to deal with invalid job dependencies? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_kill-on-invalid-dep --kill-on-invalid-dep=yes] option in order to automatically terminate jobs which can never run due to invalid dependencies. By default the job stays pending with reason &#039;DependencyNeverSatisfied&#039; to allow review and appropriate action by the user.  &lt;br /&gt;
&lt;br /&gt;
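E.g. (job ID 123456 is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -d afterok:123456 --kill-on-invalid-dep=yes &amp;lt;job-script&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;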
&#039;&#039;&#039;Note:&#039;&#039;&#039; A job dependency may also become invalid if a job has been submitted with &#039;-d afterok:&amp;lt;jobid&amp;gt;&#039; but the specified dependency job has failed, e.g. because it timed out (i.e. exceeded its wall time limit).&lt;br /&gt;
&lt;br /&gt;
== How to submit an MPI batch job? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/mpi_guide.html&lt;br /&gt;
&lt;br /&gt;
Sample job script template for an MPI job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Allocate two nodes&lt;br /&gt;
#SBATCH --nodes=2&lt;br /&gt;
# Number of program instances to be executed&lt;br /&gt;
#SBATCH --ntasks-per-node=48&lt;br /&gt;
# Allocate 32 GB memory per node&lt;br /&gt;
#SBATCH --mem=32gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=mpi_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=mpi_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=mpi_job-%j.err&lt;br /&gt;
&lt;br /&gt;
# Add lines here to run your computations, e.g.&lt;br /&gt;
#&lt;br /&gt;
# Option 1: Lauch MPI tasks by using mpirun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/intel&lt;br /&gt;
# module load mpi/impi&lt;br /&gt;
# mpirun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# Option 2: Launch MPI tasks by using srun&lt;br /&gt;
#&lt;br /&gt;
# for OpenMPI and GNU compiler:&lt;br /&gt;
#&lt;br /&gt;
# module load compiler/gnu&lt;br /&gt;
# module load mpi/openmpi&lt;br /&gt;
# srun ./my_mpi_program&lt;br /&gt;
#&lt;br /&gt;
# for Intel MPI and Intel compiler:&lt;br /&gt;
#&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun  ./my_mpi_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for MPI program: [[Media:Hello_mpi.c | Hello_mpi.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently for different MPI implementations and versions. There are reports that claim &amp;quot;strange behavior&amp;quot; of mpirun, especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.&lt;br /&gt;
* Do not run batch jobs that launch a large number (hundreds or thousands) of short-running (a few minutes or less) MPI programs, e.g. from a shell loop. Every single MPI invocation generates its own job step and sends remote procedure calls to the Slurm controller server. This can degrade performance for both Slurm and the application, especially if many such jobs happen to run at the same time. Jobs of that kind can even get stuck without showing any further activity until hitting the wall time limit. For high throughput computing (e.g. processing a large number of files, with every single task running independently of the others and finishing quickly), consider a more appropriate parallelization paradigm that invokes independent serial (non-MPI) processes in parallel at the same time. This approach is sometimes referred to as a &amp;quot;[https://en.wikipedia.org/wiki/Embarrassingly_parallel pleasingly parallel]&amp;quot; workload. GNU Parallel is a shell tool that facilitates executing serial tasks in parallel. On JUSTUS 2 this tool is available as the software module &amp;quot;system/parallel&amp;quot; (see the sketch below).&lt;br /&gt;
&lt;br /&gt;
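A minimal sketch of this approach with GNU Parallel (program and input file names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
module load system/parallel&lt;br /&gt;
# Run one independent serial process per input file, with at most&lt;br /&gt;
# $SLURM_NTASKS processes running at the same time&lt;br /&gt;
parallel -j &amp;quot;$SLURM_NTASKS&amp;quot; ./my_serial_program {} ::: input*.txt&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;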
== How to submit a hybrid MPI/OpenMP job? ==&lt;br /&gt;
&lt;br /&gt;
Sample job script template for a hybrid job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Number of nodes to allocate&lt;br /&gt;
#SBATCH --nodes=4&lt;br /&gt;
# Number of MPI instances (ranks) to be executed per node&lt;br /&gt;
#SBATCH --ntasks-per-node=2&lt;br /&gt;
# Number of threads per MPI instance&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
# Allocate 8 GB memory per node&lt;br /&gt;
#SBATCH --mem=8gb&lt;br /&gt;
# Maximum run time of job&lt;br /&gt;
#SBATCH --time=1:00:00&lt;br /&gt;
# Give job a reasonable name&lt;br /&gt;
#SBATCH --job-name=hybrid_job&lt;br /&gt;
# File name for standard output (%j will be replaced by job id)&lt;br /&gt;
#SBATCH --output=hybrid_job-%j.out&lt;br /&gt;
# File name for error output&lt;br /&gt;
#SBATCH --error=hybrid_job-%j.err&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}&lt;br /&gt;
&lt;br /&gt;
module load compiler/intel&lt;br /&gt;
module load mpi/impi&lt;br /&gt;
srun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
# or:&lt;br /&gt;
# mpirun ./my_hybrid_program&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Sample code for hybrid program: [[Media:Hello_hybrid.c | Hello_hybrid.c]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* $SLURM_CPUS_PER_TASK is only set if the &#039;--cpus-per-task&#039; option is specified.&lt;br /&gt;
&lt;br /&gt;
== How to request specific node(s) at job submission? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_nodelist -w] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodelist --nodelist]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -w &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Also see [https://slurm.schedmd.com/sbatch.html#OPT_nodefile -F] (or [https://slurm.schedmd.com/sbatch.html#OPT_nodefile --nodefile]) option.&lt;br /&gt;
&lt;br /&gt;
== How to exclude specific nodes from job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_exclude -x] (or [https://slurm.schedmd.com/sbatch.html#OPT_exclude --exclude]) option, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch -x &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get exclusive jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive&#039; option on job submission. This makes sure that there will be no other jobs running on your nodes. Very useful for benchmarking!&lt;br /&gt;
&lt;br /&gt;
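E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --exclusive &amp;lt;job-script&amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;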
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* --exclusive option does &#039;&#039;&#039;not&#039;&#039;&#039; mean that you automatically get full access to all the resources which the node might provide without explicitly requesting them.&lt;br /&gt;
&lt;br /&gt;
== How to avoid sharing nodes with other users? ==&lt;br /&gt;
&lt;br /&gt;
Use &#039;--exclusive=user&#039; option on job submission. This will still allow multiple jobs of one and the same user on the nodes.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039; Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Depending on configuration, exclusive=user may (and probably will) be the default node access policy on JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
==  How to submit batch job without job script? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sbatch.html#OPT_wrap --wrap] option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sbatch --nodes=2 --ntasks-per-node=16 --wrap &amp;quot;sleep 600&amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; May be useful for testing purposes.&lt;br /&gt;
&lt;br /&gt;
= JOB MONITORING AND CONTROL =&lt;br /&gt;
&lt;br /&gt;
== How to prevent Slurm performance degradation? ==&lt;br /&gt;
&lt;br /&gt;
Almost every invocation of a Slurm client command (e.g. squeue, sacct, sprio or sshare) sends a remote procedure call (RPC) to the Slurm control daemon and/or database. &lt;br /&gt;
If enough remote procedure calls come in at once, the performance of the Slurm services can degrade for all users, possibly resulting in a denial of service. &lt;br /&gt;
&lt;br /&gt;
Therefore, &#039;&#039;&#039;do not run Slurm client commands that send remote procedure calls from loops in shell scripts or other programs&#039;&#039;&#039; (such as &#039;watch squeue&#039;). Always limit calls to squeue, sstat, sacct etc. to the minimum necessary for the information you are trying to gather. &lt;br /&gt;
&lt;br /&gt;
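For example, a single call with a job ID list replaces a whole loop of per-job queries (job IDs are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# instead of:  for j in 123 124 125; do squeue -j $j; done&lt;br /&gt;
$ squeue -j 123,124,125&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;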
Slurm does collect RPC counts and timing statistics by message type and user for diagnostic purposes.&lt;br /&gt;
&lt;br /&gt;
== How to view information about submitted jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] command, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue                  # all jobs owned by user (all jobs owned by all users for admins)&lt;br /&gt;
$ squeue --me             # all jobs owned by user (same as squeue for regular users)&lt;br /&gt;
$ squeue -u &amp;lt;username&amp;gt;    # jobs of specific user&lt;br /&gt;
$ squeue -t PENDING       # pending jobs only&lt;br /&gt;
$ squeue -t RUNNING       # running jobs only&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
* The output format of [https://slurm.schedmd.com/squeue.html squeue] (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.&lt;br /&gt;
&lt;br /&gt;
* Every invocation of squeue sends a remote procedure call to the Slurm database server. &#039;&#039;&#039;Do not run squeue or other Slurm client commands from loops in shell scripts or other programs&#039;&#039;&#039; as this can result in a degradation of performance. Ensure that programs limit calls to squeue to the minimum necessary for the information you are trying to gather.&lt;br /&gt;
&lt;br /&gt;
== How to cancel jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scancel.html scancel] command, e.g.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;         # cancel specific job&lt;br /&gt;
$ scancel &amp;lt;jobid&amp;gt;_&amp;lt;index&amp;gt; # cancel indexed job in a job array&lt;br /&gt;
$ scancel -u &amp;lt;username&amp;gt;   # cancel all jobs of specific user&lt;br /&gt;
$ scancel -t PENDING      # cancel pending jobs&lt;br /&gt;
$ scancel -t RUNNING      # cancel running jobs&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to show job script of a running job? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/scontrol.html scontrol] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; &amp;lt;file&amp;gt;&lt;br /&gt;
$ scontrol write batch_script &amp;lt;job_id&amp;gt; -&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* If the file name is omitted, the default file name will be slurm-&amp;lt;job_id&amp;gt;.sh&lt;br /&gt;
* If the file name is - (i.e. a dash), the job script will be written to stdout.&lt;br /&gt;
&lt;br /&gt;
== How to get estimated start time of a job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ squeue --start&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Estimated start times are dynamic and can change at any moment. Exact start times of individual jobs are usually unpredictable.&lt;br /&gt;
* Slurm will report N/A for the start time estimate if nodes are not currently being reserved by the scheduler for the job to run on.&lt;br /&gt;
&lt;br /&gt;
== How to show remaining walltime of running jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format option &amp;quot;%L&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt; $ squeue -t r -o &amp;quot;%u %i %L&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check priority of jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/squeue.html squeue] with format options &amp;quot;%Q&amp;quot; and/or &amp;quot;%p&amp;quot;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -o &amp;quot;%8i %8u %15a %.10r %.10L %.5D %.10Q&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sprio.html sprio] command to display the priority components (age/fairshare/...) for each job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sprio&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sshare.html sshare] command for listing the shares of associations, e.g. accounts.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sshare&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent (hold) jobs from being scheduled for execution? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol hold &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to unhold job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol release &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to suspend a running job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol suspend &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume a suspended job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol resume &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to requeue (cancel and resubmit) a particular job? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol requeue &amp;lt;job_id&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to monitor resource usage of running job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use the [https://slurm.schedmd.com/sstat.html sstat] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sstat -e&#039; command shows a list of fields that can be specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sstat --format=JobId,AveCPU,AveRSS,MaxRSS -j &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will show average CPU time, average and maximum memory consumption of all tasks in the running job.&lt;br /&gt;
Ideally, average CPU time equals the number of cores allocated for the job multiplied by the current run time of the job. &lt;br /&gt;
The maximum memory consumption gives an estimate of the peak amount of memory actually needed so far. This can be compared with the amount of memory requested for the job. Over-requesting memory can result in significant waste of compute resources.       &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Users can also ssh into compute nodes that they have one or more running jobs on. Once logged in, they can use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ...&lt;br /&gt;
&lt;br /&gt;
* Users can also attach an interactive shell under an already allocated job by running the following command: &amp;lt;pre&amp;gt;srun --jobid &amp;lt;job&amp;gt; --overlap --pty /bin/bash&amp;lt;/pre&amp;gt; Once logged in, they can again use standard Linux process monitoring tools like ps, (h)top, free, vmstat, iostat, du, ... For a single node job the user does not even need to know the node that the job is running on. For a multinode job, the user can still use &#039;-w &amp;lt;node&amp;gt;&#039; option to specify a specific node.&lt;br /&gt;
&lt;br /&gt;
== How to get detailed job information ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show job 1234  # For job id 1234&lt;br /&gt;
$ scontrol show jobs      # For all jobs&lt;br /&gt;
$ scontrol -o show jobs   # One line per job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to modify a pending/running job? ==&lt;br /&gt;
&lt;br /&gt;
Use&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol update JobId=&amp;lt;jobid&amp;gt; ...&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
E.g.: &amp;lt;pre&amp;gt;$ scontrol update JobId=42 TimeLimit=7-0&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This will modify the time limit of the job to 7 days.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Update requests for &#039;&#039;&#039;running&#039;&#039;&#039; jobs are mostly restricted to Slurm administrators. In particular, only an administrator can increase the TimeLimit of a job.&lt;br /&gt;
&lt;br /&gt;
== How to show accounting data of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] command.&lt;br /&gt;
&lt;br /&gt;
&#039;sacct -e&#039; command shows a list of fields that can be&lt;br /&gt;
specified with the &#039;--format&#039; option.&lt;br /&gt;
&lt;br /&gt;
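Example (field list arbitrary):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacct -j &amp;lt;jobid&amp;gt; --format=JobID,JobName,State,ExitCode,Elapsed&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;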
== How to retrieve job history and accounting? ==&lt;br /&gt;
&lt;br /&gt;
For a specific job:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -j &amp;lt;jobid&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For a specific user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Default time window is the current day.&lt;br /&gt;
&lt;br /&gt;
Starting from a specific date:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Within a time window:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacct -u &amp;lt;user&amp;gt; -S 2020-01-15 -E 2020-01-31 --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
You can also set the environment variable $SACCT_FORMAT to specify the default format. To get a general idea of how efficiently a job utilized its resources, the following format can be used:  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
export SACCT_FORMAT=&amp;quot;JobID,JobName,Elapsed,NCPUs,TotalCPU,CPUTime,ReqMem,MaxRSS,MaxDiskRead,MaxDiskWrite,State,ExitCode&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To find how efficiently the CPUs were used, divide TotalCPU by CPUTime. To find how efficiently memory was used, divide MaxRSS by ReqMem. But be aware that the sacct memory usage measurement doesn&#039;t catch very rapid memory spikes. If your job got killed for running out of memory, it &#039;&#039;&#039;did run out of memory&#039;&#039;&#039; even if sacct reports a lower memory usage than would trigger an out-of-memory kill. A job that reads or writes excessively to disk might be bogged down significantly by I/O operations.&lt;br /&gt;
&lt;br /&gt;
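A worked example with hypothetical numbers:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
TotalCPU = 12:00:00, CPUTime = 24:00:00  =&amp;gt;  CPU efficiency    = 12/24 = 50%&lt;br /&gt;
MaxRSS   = 2 GB,     ReqMem  = 8 GB      =&amp;gt;  memory efficiency = 2/8   = 25%&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;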
== How to get efficiency information of completed job(s)? ==&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;pre&amp;gt;$ seff &amp;lt;jobid&amp;gt; &amp;lt;/pre&amp;gt; command for some brief information.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; It is good practice to have a look at the efficiency of your job(s) on completion &#039;&#039;&#039;and we expect you to do so&#039;&#039;&#039;. This way you can improve your job specifications in the future.&lt;br /&gt;
&lt;br /&gt;
== How to get complete field values from sstat and sacct commands? ==&lt;br /&gt;
&lt;br /&gt;
When using the [https://slurm.schedmd.com/sacct.html#OPT_format --format] option for listing various fields you can put a %NUMBER afterwards to specify how many characters should be printed.&lt;br /&gt;
&lt;br /&gt;
E.g. &#039;--format=User%30&#039; will print 30 characters for the user name (right justified).  A %-30 will print 30 characters left justified.&lt;br /&gt;
&lt;br /&gt;
sstat and sacct also provide the &#039;--parsable&#039; and &#039;--parsable2&#039; option to always print full field values delimited with a pipe &#039;|&#039; character by default.&lt;br /&gt;
The delimiting character can be specified by using the &#039;--delimiter&#039; option, e.g. &#039;--delimiter=&amp;quot;,&amp;quot;&#039; for comma separated values.&lt;br /&gt;
&lt;br /&gt;
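E.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacct -j &amp;lt;jobid&amp;gt; --parsable2 --delimiter=&amp;quot;,&amp;quot; --format=JobID,JobName,State&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;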
== How to retrieve job records for all jobs running/pending at a certain point in time? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sacct.html sacct] with [https://slurm.schedmd.com/sacct.html#OPT_state -s &amp;lt;state&amp;gt;] and [https://slurm.schedmd.com/sacct.html#OPT_starttime -S &amp;lt;start time&amp;gt;] options, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$  sacct -n -a -X -S 2021-04-01T00:00:00 -s R -o JobID,User%15,Account%10,NCPUS,NNodes,NodeList%1500&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; When specifying the state &amp;quot;-s &amp;lt;state&amp;gt;&amp;quot; &#039;&#039;&#039;and&#039;&#039;&#039; the start time &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;, the default &lt;br /&gt;
time window will be set to end time &amp;quot;-E&amp;quot; equal to start time. Thus, you will get a snapshot of all running/pending &lt;br /&gt;
jobs at the instance given by &amp;quot;-S &amp;lt;start time&amp;gt;&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== How to get a parsable list of hostnames from $SLURM_JOB_NODELIST? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show hostnames $SLURM_JOB_NODELIST&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= ADMINISTRATION =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Most commands in this section are restricted to system administrators.&lt;br /&gt;
&lt;br /&gt;
== How to stop Slurm from scheduling jobs? ==&lt;br /&gt;
&lt;br /&gt;
You can stop Slurm from scheduling jobs on a per partition basis by&lt;br /&gt;
setting that partition&#039;s state to DOWN. Set its state UP to resume&lt;br /&gt;
scheduling. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update PartitionName=foo State=DOWN&lt;br /&gt;
$ scontrol update PartitionName=foo State=UP&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to print actual hardware configuration of a node? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ slurmd -C   # print hardware configuration plus uptime&lt;br /&gt;
$ slurmd -G   # print generic resource configuration&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to reboot (all) nodes as soon as they become idle? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # specific nodes&lt;br /&gt;
$ scontrol reboot ASAP nextstate=RESUME ALL              # all nodes&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to cancel pending reboot of nodes? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol cancel_reboot &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to check current node status? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show node &amp;lt;node&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to instruct all Slurm daemons to re-read the configuration file ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol reconfigure&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to prevent a user from submitting new jobs? ==&lt;br /&gt;
&lt;br /&gt;
Use the following [https://slurm.schedmd.com/sacctmgr.html sacctmgr] command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Job submission is then rejected with the following message:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch job.slurm&lt;br /&gt;
sbatch: error: AssocMaxSubmitJobLimit&lt;br /&gt;
sbatch: error: Batch job submission failed: Job violates accounting/QOS policy (job submit limit, user&#039;s size and/or time limits)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* Use the following command to release the limit:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr update user &amp;lt;username&amp;gt; set maxsubmitjobs=-1&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to drain node(s)? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=DRAIN Reason=&amp;quot;Some Reason&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
* Reason is mandatory.&lt;br /&gt;
&lt;br /&gt;
* Do &#039;&#039;&#039;not&#039;&#039;&#039; just set state DOWN to drain nodes. This will kill any active jobs that may run on that nodes.&lt;br /&gt;
&lt;br /&gt;
== How to list reason for nodes being drained or down? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -R&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to resume node state? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update NodeName=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt; State=RESUME&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create a reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/reservations.html&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now duration=UNLIMITED flags=maint,ignore_jobs nodes=ALL&lt;br /&gt;
$ scontrol create reservation user=root starttime=2020-12-24T17:00 duration=12:00:00 flags=maint,ignore_jobs nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
$ scontrol show reservation&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Add &amp;quot;FLEX&amp;quot; flag to allow jobs that qualify for the reservation to start before the reservation begins (and continue after it starts). &lt;br /&gt;
Add &amp;quot;MAGNETIC&amp;quot; flag to attract jobs that qualify for the reservation to run in that reservation without having requested it at submit time.&lt;br /&gt;
&lt;br /&gt;
== How to create a floating reservation on nodes? ==&lt;br /&gt;
&lt;br /&gt;
Use the flag &amp;quot;TIME_FLOAT&amp;quot; and a start time that is relative to the current time (use the keyword &amp;quot;now&amp;quot;).&lt;br /&gt;
In the example below, the nodes are prevented from starting any jobs exceeding a walltime of 2 days.&lt;br /&gt;
 &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol create reservation user=root starttime=now+2days duration=UNLIMITED flags=maint,ignore_jobs,time_float nodes=&amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Floating reservations are not intended to run jobs, but to prevent long-running jobs from being initiated on specific nodes. Attempts by users to make use of a floating reservation will be rejected. When ready to perform the maintenance, place the nodes in DRAIN state and delete the reservation.&lt;br /&gt;
&lt;br /&gt;
== How to use a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --reservation=foo_6 ... script.slurm&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete a reservation? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol delete ReservationName=foo_6&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;mdiag -n&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N -l&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Fields can be individually customized. See [https://slurm.schedmd.com/sinfo.html sinfo] man page. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sinfo -N --format=&amp;quot;%8N %12P %.4C %.8O %.6m %.6e %.8T %.20E&amp;quot;&lt;br /&gt;
&lt;br /&gt;
NODELIST PARTITION    CPUS CPU_LOAD MEMORY FREE_M    STATE               REASON&lt;br /&gt;
n0001    standard*    0/16     0.01 128000 120445     idle                 none&lt;br /&gt;
n0002    standard*    0/16     0.01 128000 120438     idle                 none&lt;br /&gt;
n0003    standard*    0/0/      N/A 128000    N/A    down*       Not responding&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to get node oriented information similar to &#039;pbsnodes&#039;? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show nodes                     # One paragraph per node (all nodes)&lt;br /&gt;
$ scontrol show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;     # One paragraph per node (specified nodes) &lt;br /&gt;
$ scontrol -o show nodes                  # One line per node (all nodes)&lt;br /&gt;
$ scontrol -o show nodes &amp;lt;node1&amp;gt;,&amp;lt;node2&amp;gt;  # One line per node (specified nodes)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to update multiple jobs of a user with a single scontrol command? ==&lt;br /&gt;
&lt;br /&gt;
This is not possible with a single scontrol command. However, you can, for example, use squeue to build such a script, taking advantage of its filtering and formatting options.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue -tpd -h -o &amp;quot;scontrol update jobid=%i priority=1000&amp;quot; &amp;gt;my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
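&lt;br /&gt;
Review the generated script, then execute it:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sh my.script&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;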
&lt;br /&gt;
You can also identify the list of jobs yourself and pass them all at once via the JobID specification, for example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobID=123 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=123,456,789 qos=reallylargeqos&lt;br /&gt;
$ scontrol update JobID=[123-400],[500-600] qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Another option is to use the JobName, if all the jobs have the same name.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol update JobName=&amp;quot;foobar&amp;quot; UserID=johndoe qos=reallylargeqos&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note, however, that Slurm does not allow the UserID filter on its own; it must be combined with another filter such as JobName.&lt;br /&gt;
&lt;br /&gt;
== How to create a new account? ==&lt;br /&gt;
&lt;br /&gt;
Add an account at the top level of the association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; Cluster=justus Description=&amp;quot;Account description&amp;quot; Organization=&amp;quot;none&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Add an account as a child of some parent account in the association tree:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add account &amp;lt;accountname&amp;gt; parent=&amp;lt;parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to move account to another parent? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set parent=&amp;lt;new_parent_accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to delete an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr delete account name=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add a new user? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; DefaultAccount=&amp;lt;accountname&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to add/remove users from an account? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname&amp;gt;                  # Add user to account&lt;br /&gt;
$ sacctmgr add user &amp;lt;username&amp;gt; account=&amp;lt;accountname2&amp;gt;                 # Add user to a second account&lt;br /&gt;
$ sacctmgr remove user &amp;lt;username&amp;gt; where account=&amp;lt;accountname&amp;gt;         # Remove user from this account&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to show account information? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show assoc&lt;br /&gt;
$ sacctmgr show assoc tree&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to implement user resource throttling policies? ==&lt;br /&gt;
&lt;br /&gt;
Quoting from https://bugs.schedmd.com/show_bug.cgi?id=3600#c4&lt;br /&gt;
&lt;br /&gt;
 With Slurm, the associations are meant to establish base limits on the&lt;br /&gt;
 defined partitions, accounts and users. Because limits propagate down&lt;br /&gt;
 through the association tree, you only need to define limits at a high&lt;br /&gt;
 level and those limits will be applied to all partitions, accounts and&lt;br /&gt;
 users that are below it (parent to child). You can also override those&lt;br /&gt;
 high level (parent) limits by explicitly setting different limits at&lt;br /&gt;
 any lower level (on the child). So using the association tree is the&lt;br /&gt;
 best way to get some base limits applied that you want for most cases. &lt;br /&gt;
 QOS&#039;s are meant to override any of those base limits for exceptional&lt;br /&gt;
 cases. Like Maui, you can use QOS&#039;s to set a different priority.&lt;br /&gt;
 Again, the QOS would be overriding the base priority that could be set&lt;br /&gt;
 in the associations.&lt;br /&gt;
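&lt;br /&gt;
In practice this could look as follows (a minimal sketch; account name, user name and limit values are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify account name=&amp;lt;accountname&amp;gt; set GrpTRES=cpu=1000   # Base limit, propagates down to child associations&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set GrpTRES=cpu=200               # Override on a child association&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;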
&lt;br /&gt;
== How to set a resource limit for an individual user? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/resource_limits.html&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=1            # Limit maximum number of running jobs for user&lt;br /&gt;
$ sacctmgr list assoc user=&amp;lt;username&amp;gt; format=user,maxjobs  # Show that limit&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set maxjobs=-1           # Remove that limit&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to retrieve historical resource usage for a specific user or account? ==&lt;br /&gt;
&lt;br /&gt;
Use [https://slurm.schedmd.com/sreport.html sreport] command.&lt;br /&gt;
&lt;br /&gt;
Examples: &lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours user=&amp;lt;username&amp;gt;    # Report cluster utilization of given user broken down by accounts&lt;br /&gt;
$ sreport cluster AccountUtilizationByUser Start=2021-01-01 End=2021-12-31 -t Hours account=&amp;lt;account&amp;gt;  # Report cluster utilization of given account broken down by users    &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* By default only CPU resources will be reported. Use the &#039;-T&#039; option for other trackable resources, e.g. &#039;-T cpu,mem,gres/gpu,gres/scratch&#039; (see the example below).&lt;br /&gt;
* On JUSTUS 2 registered compute projects (&amp;quot;Rechenvorhaben&amp;quot;) are uniquely mapped to Slurm accounts of the same name. Thus, &#039;AccountUtilizationByUser&#039; can also be used to report the aggregated cluster utilization of compute projects.&lt;br /&gt;
* These commands can be executed by regular users as well, in which case Slurm will only report their own usage records (but along with the total usage of the associated account in the case of &#039;AccountUtilizationByUser&#039;).&lt;br /&gt;
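&lt;br /&gt;
For example (a sketch; the gres types available for reporting depend on the cluster&#039;s TRES configuration):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sreport cluster UserUtilizationByAccount Start=2021-01-01 End=2021-12-31 -t Hours -T cpu,gres/gpu user=&amp;lt;username&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;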
&lt;br /&gt;
== How to fix/reset a user&#039;s RawUsage value? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; where Account=&amp;lt;account&amp;gt; set RawUsage=&amp;lt;number&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to create/modify/delete QOSes? ==&lt;br /&gt;
&lt;br /&gt;
Suggested reading: https://slurm.schedmd.com/qos.html&lt;br /&gt;
&lt;br /&gt;
Examples:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sacctmgr show qos                                      # Show existing QOSes&lt;br /&gt;
$ sacctmgr add qos verylong                              # Create new QOS verylong&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxWall=28-00:00:00   # Set maximum walltime limit&lt;br /&gt;
$ sacctmgr modify qos verylong set MaxTRESPerUser=cpu=4  # Set maximum number of CPUs a user can allocate at a given time&lt;br /&gt;
$ sacctmgr modify qos verylong set flags=denyonlimit     # Prevent submission if job requests exceed any limits of QOS&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos+=verylong      # Add a QOS to a user account&lt;br /&gt;
$ sacctmgr modify user &amp;lt;username&amp;gt; set qos-=verylong      # Remove a QOS from a user account&lt;br /&gt;
$ sacctmgr delete qos verylong                           # Delete that QOS&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== How to find (and fix) runaway jobs? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr show runaway&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
* Runaway jobs are orphaned jobs that no longer exist in the Slurm controller but have a start time and no end time in the Slurm database. Runaway jobs mess with accounting and affect new jobs of users who have too many runaway jobs. &lt;br /&gt;
* If there are jobs in this state, this command will also provide an option to fix them. This will set the end time for each job to the latest of its start, eligible, or submit times, and set its state to completed.&lt;br /&gt;
&lt;br /&gt;
== How to show a history of database transactions? ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sacctmgr list transactions&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Useful to get timestamps for when a user/account/qos has been created/modified/removed etc.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11432</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11432"/>
		<updated>2022-11-09T23:40:57Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|700px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11431</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11431"/>
		<updated>2022-11-09T23:40:14Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|800px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11430</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11430"/>
		<updated>2022-11-09T23:38:34Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|1120px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11429</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11429"/>
		<updated>2022-11-09T23:19:57Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!-- [[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]] --&amp;gt;&lt;br /&gt;
[[File:JUSTUS2_header.png|600px]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is the state-wide high-performance compute cluster dedicated to &#039;&#039;&#039;Computational Chemistry and Quantum Sciences&#039;&#039;&#039; in Baden-Württemberg. &lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=File:JUSTUS2_header.png&amp;diff=11427</id>
		<title>File:JUSTUS2 header.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=File:JUSTUS2_header.png&amp;diff=11427"/>
		<updated>2022-11-09T22:59:54Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11340</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11340"/>
		<updated>2022-11-09T18:50:14Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
** [[JUSTUS2/Hardware#Node_Specifications|Node Specifications]]&lt;br /&gt;
** [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
** [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Support&amp;diff=11339</id>
		<title>JUSTUS2/Support</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Support&amp;diff=11339"/>
		<updated>2022-11-09T18:37:00Z</updated>

		<summary type="html">&lt;p&gt;J Salk: Created page with &amp;quot;==Entitlement==  In case of questions or problems regarding your entitlement for using any bwForCluster, please contact your local university first level help desk.  ==Registr...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Entitlement==&lt;br /&gt;
&lt;br /&gt;
In case of questions or problems regarding your entitlement for using any bwForCluster, please&lt;br /&gt;
contact your local university&#039;s first-level help desk.&lt;br /&gt;
&lt;br /&gt;
==Registration and HPC Support==&lt;br /&gt;
&lt;br /&gt;
In case of registration problems or questions about using JUSTUS 2, please submit a trouble ticket at the [https://bw-support.scc.kit.edu/ bwSupport Portal] and assign it&lt;br /&gt;
to the support unit &#039;&#039;&#039;bwForCluster JUSTUS&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
[[Category:BwForCluster_Chemistry]]&lt;br /&gt;
[[Category:Support]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11338</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11338"/>
		<updated>2022-11-09T18:33:59Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* Online Tutorial [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC] (external link)&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [[JUSTUS2/Support|Contact and Support]]&lt;br /&gt;
* Send [[:Category:Feedback|Feedback]] about Wiki pages&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11337</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11337"/>
		<updated>2022-11-09T18:04:53Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please &#039;&#039;&#039;[[JUSTUS2/Acknowledgement|acknowledge]]&#039;&#039;&#039; the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11336</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11336"/>
		<updated>2022-11-09T18:03:35Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[JUSTUS2/Acknowledgement|acknowledge]] the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11335</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11335"/>
		<updated>2022-11-09T17:59:53Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#d1dadf; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#bed1db; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* Please [[JUSTUS2/Acknowledgement|acknowledge]] the JUSTUS 2 cluster in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11334</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11334"/>
		<updated>2022-11-09T17:21:04Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS 2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[JUSTUS2/Acknowledgement|Acknowledge]] the cluster in your publications&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Hardware&amp;diff=11333</id>
		<title>JUSTUS2/Hardware</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Hardware&amp;diff=11333"/>
		<updated>2022-11-09T17:18:53Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
= System Architecture =&lt;br /&gt;
&lt;br /&gt;
The HPC cluster is composed of [[login nodes]], [[compute nodes]] and parallel storage systems connected by fast data networks. It is connected to the Internet via Baden-Württemberg&#039;s extended LAN [https://www.belwue.de/ BelWü] (light blue).&lt;br /&gt;
&lt;br /&gt;
{|  style=&amp;quot;margin: 1em auto 1em auto;&amp;quot;&lt;br /&gt;
|[[Image:JUSTUS2_Architecture.png|thumb|upright=1.5|right|Overview on JUSTUS 2 hardware architecture. All nodes are additionally connected by 1GB Ethernet. ]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Users log in on one of the four login nodes and have access to their home and working directories (darker blue) stored in the parallel file system [[Lustre]].&lt;br /&gt;
&lt;br /&gt;
Two additional dedicated visualization login nodes enable users to visualize compute results directly on the cluster. &lt;br /&gt;
&lt;br /&gt;
Calculations are done on several types of compute nodes (top), which are accessed via the batch queuing system [[Slurm JUSTUS 2| Slurm ]].&lt;br /&gt;
&lt;br /&gt;
== Operating System and Software  ==&lt;br /&gt;
&lt;br /&gt;
* Operating System: [https://rockylinux.org Rocky Linux 8]&lt;br /&gt;
* Queuing System: [[Slurm JUSTUS 2| Slurm ]] (also see: [[bwForCluster JUSTUS 2 Slurm HOWTO|Slurm HOWTO (JUSTUS 2)]])&lt;br /&gt;
* [[Software_Modules_Lmod|Environment Modules]] for site specific scientific applications,  developer tools and libraries&lt;br /&gt;
&lt;br /&gt;
== Common Hardware Features ==&lt;br /&gt;
&lt;br /&gt;
The system consists of 702 nodes (692 compute nodes and 10 dedicated login, service and visualization nodes) with 2 processors each and a total of 33,696 processor cores. &lt;br /&gt;
&lt;br /&gt;
* Processor: Intel Xeon Gold 6252 (Cascade Lake, 24-core, 2.1 GHz)&lt;br /&gt;
* Two processors per node (2 x 24 cores)&lt;br /&gt;
* [https://en.wikipedia.org/wiki/Omni-Path Omni-Path] 100 Gbit/s interconnect&lt;br /&gt;
&lt;br /&gt;
== Node Specifications ==&lt;br /&gt;
&lt;br /&gt;
The nodes are tiered in terms of hardware configuration (amount of memory, local NVMe, hardware accelerators) in order to be able to serve a large range of different job requirements flexibly and efficiently.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center;&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Standard Nodes&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Fast I/O Nodes&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Large fast I/O Nodes&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Special Nodes&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Login- and Service Nodes&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Visualization Nodes&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot;| Quantity&lt;br /&gt;
| 456 / 44&lt;br /&gt;
| 148 / 20 &lt;br /&gt;
| 8&lt;br /&gt;
| 14 / 2&lt;br /&gt;
| 8&lt;br /&gt;
| 2&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | CPU Type&lt;br /&gt;
|colspan=&amp;quot;6&amp;quot; style=&amp;quot;text-align:center;&amp;quot;  | 2 x [https://ark.intel.com/content/www/de/de/ark/products/192447/intel-xeon-gold-6252-processor-35-75m-cache-2-10-ghz.html Intel Xeon Gold 6252 (Cascade Lake)]&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | CPU Frequency&lt;br /&gt;
|colspan=&amp;quot;6&amp;quot; | 2.1 GHz Base Frequency, 3.7 GHz Max. Turbo Frequency&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Cores per Node&lt;br /&gt;
|colspan=&amp;quot;6&amp;quot; | 48&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Accelerator&lt;br /&gt;
|colspan=&amp;quot;3&amp;quot; | ---&lt;br /&gt;
| 2 x [https://www.nvidia.com/en-us/data-center/v100/ Nvidia V100S] / FPGA &lt;br /&gt;
| ---&lt;br /&gt;
| Nvidia Quadro P4000 Graphics &lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Memory&lt;br /&gt;
| 192 GB / 384 GB  &lt;br /&gt;
| 384 GB / 768 GB &lt;br /&gt;
| 1536 GB &lt;br /&gt;
|colspan=&amp;quot;3&amp;quot; | 192 GB&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Local SSD&lt;br /&gt;
| ---&lt;br /&gt;
| 2 x 1.6 TB NVMe (RAID 0) &lt;br /&gt;
| 5 x 1.6 TB NVMe (RAID 0)&lt;br /&gt;
| ---&lt;br /&gt;
|colspan=&amp;quot;2&amp;quot; |  2 x 2.0 TB NVMe (RAID 1)&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Interconnect&lt;br /&gt;
|colspan=&amp;quot;6&amp;quot; | Omni-Path 100 Gbit/s&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The special nodes with FPGA are not yet available.&lt;br /&gt;
&lt;br /&gt;
== Storage Architecture ==&lt;br /&gt;
&lt;br /&gt;
The bwForCluster JUSTUS 2 provides two independent distributed parallel file systems, one for the users&#039; home directories &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt; and another one for global workspaces. This storage architecture is based on [https://en.wikipedia.org/wiki/Lustre_(file_system) Lustre] and can be accessed in parallel from all nodes. Additionally, some compute nodes (fast I/O nodes) provide locally attached NVMe storage devices for I/O-demanding applications.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;|&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$HOME&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| Workspace&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$SCRATCH&amp;lt;/tt&amp;gt;&lt;br /&gt;
! style=&amp;quot;width:10%&amp;quot;| &amp;lt;tt&amp;gt;$TMPDIR&amp;lt;/tt&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Visibility&lt;br /&gt;
| global&lt;br /&gt;
| global&lt;br /&gt;
| node local&lt;br /&gt;
| node local&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Lifetime&lt;br /&gt;
| permanent&lt;br /&gt;
| workspace lifetime (max. 90 days, extension possible)&lt;br /&gt;
| batch job walltime&lt;br /&gt;
| batch job walltime&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Total Capacity&lt;br /&gt;
| 250 TB&lt;br /&gt;
| 1200 TB&lt;br /&gt;
| 3000 GB / 7300 GB per node&lt;br /&gt;
| max. half of RAM per node&lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Disk [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 400 GB per user&lt;br /&gt;
| 20 TB per user&lt;br /&gt;
| none &lt;br /&gt;
| none &lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | File [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]&lt;br /&gt;
| 2,000,000 files per user&lt;br /&gt;
| 5,000,000 files per user&lt;br /&gt;
| none &lt;br /&gt;
| none &lt;br /&gt;
|-&lt;br /&gt;
!scope=&amp;quot;column&amp;quot; | Backup&lt;br /&gt;
| yes&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
| no&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
  global             : accessible from all nodes&lt;br /&gt;
  local              : accessible from allocated node only &lt;br /&gt;
  permanent          : files are stored permanently (as long as user can access the system)&lt;br /&gt;
  batch job walltime : files are removed at end of the batch job&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Disk and file quota limits are soft limits and are subject to change. Quotas feature a grace period where users may exceed their limits to some extent (currently 20%) for a brief period of time (currently 4 weeks).&lt;br /&gt;
&lt;br /&gt;
=== $HOME ===&lt;br /&gt;
&lt;br /&gt;
Home directories are meant for permanent storage of files that are used continually, such as source code, configuration files and executable programs; the content of home directories is backed up on a regular basis.&lt;br /&gt;
&lt;br /&gt;
Current disk usage on home directory and quota status can be checked with the  command &#039;&#039;&#039;lfs quota -h -u $USER /lustre/home&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; Compute jobs on nodes must not write temporary data to $HOME. Instead they should use the local $SCRATCH or $TMPDIR directories for very I/O intensive jobs and workspaces for less I/O intensive jobs.&lt;br /&gt;
&lt;br /&gt;
=== Workspaces ===&lt;br /&gt;
&lt;br /&gt;
Workspaces can be created with the [[workspace]] tools. This generates a directory with a limited lifetime on the parallel global work file system. When this &lt;br /&gt;
lifetime is reached, the workspace will be deleted automatically after a grace period. &lt;br /&gt;
Users will be notified by daily e-mail reminders starting 7 days before the expiration of &lt;br /&gt;
a workspace. Workspaces can (and must) be extended to prevent deletion at the expiration date.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Defaults and maximum values&#039;&#039;&#039;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| Default lifetime (days)&lt;br /&gt;
| 7&lt;br /&gt;
|-&lt;br /&gt;
| Maximum lifetime&lt;br /&gt;
| 90&lt;br /&gt;
|-&lt;br /&gt;
| Maximum extensions&lt;br /&gt;
| unlimited&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Examples&#039;&#039;&#039;&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!style=&amp;quot;width:30%&amp;quot; | Command&lt;br /&gt;
!style=&amp;quot;width:70%&amp;quot; | Action&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;ws_allocate my_workspace 30&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Allocate a workspace named &amp;quot;my_workspace&amp;quot; for 30 days.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;ws_list&amp;lt;/tt&amp;gt;&lt;br /&gt;
|List all your workspaces.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;ws_find my_workspace&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Get absolute path of workspace &amp;quot;my_workspace&amp;quot;.&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;ws_extend my_workspace 30&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Set expiration date of workspace &amp;quot;my_workspace&amp;quot; to 30 days (regardless of remaining days).&lt;br /&gt;
|-&lt;br /&gt;
|&amp;lt;tt&amp;gt;ws_release my_workspace&amp;lt;/tt&amp;gt;&lt;br /&gt;
|Manually erase your workspace &amp;quot;my_workspace&amp;quot; and release used space on storage (remove data first for immediate deletion of the data).&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Current disk usage on workspace file system and quota status can be checked with the  command &#039;&#039;&#039;lfs quota -h -u $USER /lustre/work&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; The parallel work file system works optimally for medium to large file sizes and non-[https://en.wikipedia.org/wiki/Random_access random access] patterns. Large quantities of small files significantly decrease I/O performance and must be avoided. Consider using &#039;&#039;&#039;[https://wiki.bwhpc.de/e/BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F local scratch]&#039;&#039;&#039; for these instead.&lt;br /&gt;
&lt;br /&gt;
=== $SCRATCH and $TMPDIR ===&lt;br /&gt;
&lt;br /&gt;
On compute nodes the environment variables $SCRATCH and $TMPDIR always point to local scratch space that is not shared across nodes.&lt;br /&gt;
&lt;br /&gt;
$TMPDIR always points to a directory on a local RAM disk which will provide up to 50% of the total RAM capacity of the node. Thus, data written to $TMPDIR will always count against allocated memory. &lt;br /&gt;
&lt;br /&gt;
$SCRATCH will point to a directory on locally attached NVMe devices if (and only if) local scratch has been explicitly requested at job submission (i.e. with the --gres=scratch:nnn option; see the sketch below). If no local scratch has been requested at job submission, $SCRATCH will point to the very same directory as $TMPDIR (i.e. to the RAM disk). &lt;br /&gt;
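&lt;br /&gt;
A minimal job script sketch (assuming the requested scratch size &amp;lt;tt&amp;gt;nnn&amp;lt;/tt&amp;gt; is given in GB; the task count and size are placeholders):&lt;br /&gt;
&lt;br /&gt;
  #!/bin/bash&lt;br /&gt;
  #SBATCH --ntasks=1&lt;br /&gt;
  #SBATCH --gres=scratch:100   # request 100 GB local NVMe scratch; $SCRATCH will point to it&lt;br /&gt;
  cd $SCRATCH                  # run I/O intensive work on node-local storage&lt;br /&gt;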
&lt;br /&gt;
On the login nodes $TMPDIR and $SCRATCH point to a local scratch directory on that node. This is located at /scratch/&amp;lt;username&amp;gt; and is also not shared across nodes. The data stored in there is private but will be deleted automatically if not accessed for 7 consecutive days. Like any other local scratch space, the data stored in there is NOT included in any backup.&lt;br /&gt;
&lt;br /&gt;
[[Category:BwForCluster_JUSTUS_2]][[Category:Hardware_and_Architecture|bwForCluster_JUSTUS2]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11332</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11332"/>
		<updated>2022-11-09T15:29:53Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and  located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[JUSTUS2/Acknowledgement|Acknowledge]] the cluster in your publications&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Software&amp;diff=11331</id>
		<title>JUSTUS2/Software</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Software&amp;diff=11331"/>
		<updated>2022-11-09T15:26:50Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Environment Modules ==&lt;br /&gt;
Most software is provided as Modules.&lt;br /&gt;
&lt;br /&gt;
Required reading before use: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Available Software ==&lt;br /&gt;
&lt;br /&gt;
* Web: Visit [https://www.bwhpc.de/software.php https://www.bwhpc.de/software.php], select &amp;lt;code&amp;gt;Cluster → bwForCluster JUSTUS2&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* On the cluster: &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software in Containers ==&lt;br /&gt;
&lt;br /&gt;
Instructions for loading software in containers: [[Singularity]]&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
Documentation for environment modules available on the cluster:  &lt;br /&gt;
&lt;br /&gt;
* with command &amp;lt;code&amp;gt;module help&amp;lt;/code&amp;gt;&lt;br /&gt;
* examples in &amp;lt;code&amp;gt;$SOFTNAME_EXA_DIR&amp;lt;/code&amp;gt;&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
{| style=&amp;quot;width:100%;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;background:#eeeeee;&amp;quot; |&lt;br /&gt;
|  &amp;lt;code&amp;gt;module help&amp;lt;/code&amp;gt; ||  See section: [[Environment_Modules#module_help]]&lt;br /&gt;
|-  style=&amp;quot;background:#dddddd; &amp;quot; | &lt;br /&gt;
| examples in &amp;lt;code&amp;gt;$SOFTNAME_EXA_DIR&amp;lt;/code&amp;gt; || See section: [[Environment_Modules#Software_job_examples]]&lt;br /&gt;
|- style=&amp;quot;background:#eeeeee; &amp;quot; | &lt;br /&gt;
| this wiki || See below&lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Documentation in the Wiki ==&lt;br /&gt;
For some environment modules, additional documentation is provided here:&lt;br /&gt;
&amp;lt;!-- this list could be generated via {{Special:PrefixIndex/JUSTUS2/Software/|stripprefix=yes}} --&amp;gt;&lt;br /&gt;
* [[ADF]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Dalton|Dalton]]&lt;br /&gt;
&lt;br /&gt;
* [[Gaussian]]&lt;br /&gt;
&lt;br /&gt;
* [[Gaussview]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Molden|Molden]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Schrodinger|Schrodinger]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Turbomole|Turbomole]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/VASP|VASP]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Software&amp;diff=11330</id>
		<title>JUSTUS2/Software</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Software&amp;diff=11330"/>
		<updated>2022-11-09T15:18:23Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* Documentation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Environment Modules ==&lt;br /&gt;
Most software is provided as Modules.&lt;br /&gt;
&lt;br /&gt;
Required reading before use: [[Environment Modules]]&lt;br /&gt;
&lt;br /&gt;
== Available Software ==&lt;br /&gt;
&lt;br /&gt;
* Web: Visit [https://www.bwhpc.de/software.php https://www.bwhpc.de/software.php], select &amp;lt;code&amp;gt;Cluster → bwForCluster JUSTUS2&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* On the cluster: &amp;lt;code&amp;gt;module avail&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Software in Containers ==&lt;br /&gt;
&lt;br /&gt;
Instructions for loading software in containers: [[Singularity]]&lt;br /&gt;
&lt;br /&gt;
== Documentation ==&lt;br /&gt;
Documentation for environment modules available on the cluster:  &lt;br /&gt;
{| style=&amp;quot;width:100%;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;background:#eeeeee;&amp;quot; |&lt;br /&gt;
|  &amp;lt;code&amp;gt;module help&amp;lt;/code&amp;gt; ||  See section: [[Environment_Modules#module_help]]&lt;br /&gt;
|-  style=&amp;quot;background:#dddddd; &amp;quot; | &lt;br /&gt;
| examples in &amp;lt;code&amp;gt;$SOFTNAME_EXA_DIR&amp;lt;/code&amp;gt; || See section: [[Environment_Modules#Software_job_examples]]&lt;br /&gt;
|- style=&amp;quot;background:#eeeeee; &amp;quot; | &lt;br /&gt;
| this wiki || See below&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Documentation in the Wiki ==&lt;br /&gt;
Modules with additional documentation here in the wiki:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!-- this list could be generated via {{Special:PrefixIndex/JUSTUS2/Software/|stripprefix=yes}} --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* [[ADF]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Dalton|Dalton]]&lt;br /&gt;
&lt;br /&gt;
* [[Gaussian]]&lt;br /&gt;
&lt;br /&gt;
* [[Gaussview]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Molden|Molden]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Schrodinger|Schrodinger]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/Turbomole|Turbomole]]&lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software/VASP|VASP]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11329</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11329"/>
		<updated>2022-11-09T15:01:26Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
&amp;lt;!-- Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page. --&amp;gt;&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and  located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Currently no news.&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System (Slurm)]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[JUSTUS2/Acknowledgement|Acknowledge]] the cluster in your publications&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11328</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11328"/>
		<updated>2022-11-09T14:54:50Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page.&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and  located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System (Slurm)]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[JUSTUS2/Acknowledgement|Acknowledge]] the cluster in your publications&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Getting_Started&amp;diff=11327</id>
		<title>JUSTUS2/Getting Started</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Getting_Started&amp;diff=11327"/>
		<updated>2022-11-09T14:49:27Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* Find information about installed software and examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!--&lt;br /&gt;
Here is a short list of things you may need to do first when you get onto the cluster&lt;br /&gt;
== Basics ==&lt;br /&gt;
* log in to the cluster: [[JUSTUS2/Login]]&lt;br /&gt;
* get accustomed to the Linux command line: &lt;br /&gt;
** [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC/The_Command_Line introduction on the (external) hpc wiki] or &lt;br /&gt;
** linux course at [https://training.bwhpc.de/ training.bwhpc.de]&lt;br /&gt;
&lt;br /&gt;
== Running an Example with Preinstalled  Software ==&lt;br /&gt;
* scientific software: read on how to load [[Software Modules]]&lt;br /&gt;
* continue reading until you find the example job scripts: [[Environment_Modules#Software_job_examples]]&lt;br /&gt;
* submit a sample job from a software as mentioned in the job example. Also see: [[JUSTUS2/Slurm]]&lt;br /&gt;
* monitor your job: [[JUSTUS2/Slurm#Monitoring_Your_Jobs]]&lt;br /&gt;
== Running Your Own Calculations ==&lt;br /&gt;
* transfer your own data to the cluster: [[Data Transfer]] &lt;br /&gt;
* adapt the sample job script to run your own job&lt;br /&gt;
&lt;br /&gt;
Note that your jobs should not read or write much on the Lustre file system while they run; use the RAM disk in /tmp instead, or request /scratch if the RAM disk space is not sufficient. The [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_clean-up_or_save_files_before_a_job_times_out.3F| Slurm Howto]] shows how to copy and clean up your data from /tmp or /scratch at the end of the job.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Get access to the cluster ==&lt;br /&gt;
&lt;br /&gt;
Follow the registration process for the bwForCluster. &amp;amp;rarr; [[Registration/bwForCluster|How to Register for a bwForCluster]]&lt;br /&gt;
&lt;br /&gt;
== Login to the cluster ==&lt;br /&gt;
&lt;br /&gt;
Setup service password and 2FA token and login to the cluster. &amp;amp;rarr; [[JUSTUS2/Login|Login JUSTUS2]]&lt;br /&gt;
&lt;br /&gt;
== Transfer your data to the cluster ==&lt;br /&gt;
&lt;br /&gt;
Get familiar with available file systems on the cluster. &amp;amp;rarr; [[Hardware_and_Architecture_(bwForCluster_JUSTUS_2)#Storage_Architecture|File Systems]]&lt;br /&gt;
&lt;br /&gt;
Transfer your data to the cluster using appropriate tools. &amp;amp;rarr; [[Data Transfer|Data Transfer]]&lt;br /&gt;
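&lt;br /&gt;
For instance, a single file can be copied to the cluster with scp; a minimal sketch (username, host name and file names are placeholders, see the Data Transfer page for the actual login hosts):&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
# copy a local input file into your home directory on the cluster&lt;br /&gt;
scp input.dat USERNAME@CLUSTER-LOGIN-HOST:~/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;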
&lt;br /&gt;
== Find information about installed software and examples ==&lt;br /&gt;
&lt;br /&gt;
Compilers, libraries and application software are provided as software modules. Learn how to work with software modules. &amp;amp;rarr; [[JUSTUS2/Software|Software Modules]]&lt;br /&gt;
&amp;lt;!-- Overview of available software modules &amp;amp;rarr; [https://www.bwhpc.de/software.php https://www.bwhpc.de/software.php], select &amp;lt;code&amp;gt;Cluster → bwForCluster JUSTUS 2&amp;lt;/code&amp;gt; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Submit your application as a batch job ==&lt;br /&gt;
&lt;br /&gt;
Get familiar with the available node types on the cluster. &amp;amp;rarr; [[Hardware_and_Architecture_(bwForCluster_JUSTUS_2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
Submit and monitor your jobs with Slurm commands. &amp;amp;rarr; [[JUSTUS2/Slurm|Batch System Slurm]]&lt;br /&gt;
&lt;br /&gt;
== Acknowledge the cluster ==&lt;br /&gt;
&lt;br /&gt;
Remember to mention the cluster in your publications. &amp;amp;rarr; [[bwForCluster JUSTUS 2  Acknowledgement|Acknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11326</id>
		<title>JUSTUS2</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2&amp;diff=11326"/>
		<updated>2022-11-09T14:46:44Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[File:JUSTUS2_pre.jpg|right|frameless|thumb|alt=JUSTUS2 |upright=0.4| JUSTUS 2 ]]&lt;br /&gt;
Note: This page replaces [[:Category:BwForCluster_JUSTUS_2]] as an overview page.&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;bwForCluster JUSTUS2&#039;&#039;&#039; is a high-performance computer dedicated to  Computational Chemistry and Quantum Sciences and  located at Ulm University.&lt;br /&gt;
&amp;lt;!--{| style=&amp;quot;  background:#FEF4AB; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#FFE856; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | News&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Training &amp;amp; Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [[JUSTUS2/Getting Started|Getting Started]]&lt;br /&gt;
* E-Learning Course [https://training.bwhpc.de/goto.php?target=crs_629_rcodeM6n48kAUsT&amp;amp;client_id=BWHPC  Introduction to JUSTUS2 ] &lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket] to support unit &#039;bwForCluster JUSTUS&#039;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | User Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration/bwForCluster|Registration]]&lt;br /&gt;
* [[JUSTUS2/Login|Login]]&lt;br /&gt;
* [[JUSTUS2/Hardware|Hardware and Architecture]] &lt;br /&gt;
* [[JUSTUS2/Hardware#Storage_Architecture|File Systems and Workspaces]] &lt;br /&gt;
&lt;br /&gt;
* [[JUSTUS2/Software|Software]]&lt;br /&gt;
* [[JUSTUS2/Slurm|Batch System (Slurm)]]&lt;br /&gt;
* [[JUSTUS2/Visualization|Visualisation]] &lt;br /&gt;
* [[Development]] - compiling software, parallel programming, etc.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Cluster Funding&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[JUSTUS2/Acknowledgement|Acknowledge]] the cluster in your publications&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=11325</id>
		<title>JUSTUS2/Jobscripts: Running Your Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=11325"/>
		<updated>2022-11-09T14:17:48Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
The JUSTUS 2 cluster uses Slurm ([https://slurm.schedmd.com/ https://slurm.schedmd.com/]) for scheduling compute jobs. &lt;br /&gt;
&lt;br /&gt;
= JUSTUS 2 Slurm Howto =&lt;br /&gt;
&lt;br /&gt;
This page presents only a very basic introduction.&lt;br /&gt;
&lt;br /&gt;
Please see  the &#039;&#039;&#039;[[bwForCluster JUSTUS 2 Slurm HOWTO|JUSTUS 2 Slurm HOWTO]]&#039;&#039;&#039; for many more examples and commands for common tasks.&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the bwForCluster JUSTUS 2 =&lt;br /&gt;
Batch jobs are submitted with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A job script contains options for Slurm in lines beginning with #SBATCH as well as your commands which you want to execute on the compute nodes. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --time=00:20:00&lt;br /&gt;
#SBATCH --mem=1gb&lt;br /&gt;
#SBATCH --export=NONE&lt;br /&gt;
echo &#039;Here starts the calculation&#039;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can override options from the script on the command-line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch --time=03:00:00 &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note: &amp;lt;font color=&amp;quot;red&amp;quot;&amp;gt; Compute jobs must not use the global file systems for scratch or swap files during a calculation. &amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use local storage for this purpose instead: /tmp on the RAM disk for small files, or /scratch (see [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F|How to request NVMe]]) if the RAM disk is not sufficient.&lt;br /&gt;
&lt;br /&gt;
= Monitoring Your Jobs =&lt;br /&gt;
== squeue ==&lt;br /&gt;
&lt;br /&gt;
After you have submitted the job, you can see it waiting in the queue using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
(also read the man page with &amp;lt;code&amp;gt;man squeue&amp;lt;/code&amp;gt; for more information on how to use the command)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
&amp;gt; squeue&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
             6260301  standard r_60_b_2 ul_yxz1 PD       0:00      1 (AssocGrpMemRunMinutes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Output shows: &lt;br /&gt;
* JOBID: a unique number assigned to your job&lt;br /&gt;
* PARTITION: the partition the job was routed to; the cluster can be divided into different types of nodes&lt;br /&gt;
* NAME: the name you gave your job with the --job-name= option&lt;br /&gt;
* USER: your username&lt;br /&gt;
* ST: the state the job is in. R = running, PD = pending, CD = completed. See the man page for a full list of states.&lt;br /&gt;
* TIME: how long the job has been running&lt;br /&gt;
* NODES: how many nodes were requested&lt;br /&gt;
* NODELIST(REASON): either shows the node(s) the job is running on, or the reason why it has not started yet&lt;br /&gt;
&lt;br /&gt;
== scontrol ==&lt;br /&gt;
&lt;br /&gt;
You can show more information about one specific job using the &amp;lt;code&amp;gt;scontrol&amp;lt;/code&amp;gt; command, e.g. for the job with ID 6260301 listed above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
&amp;gt; scontrol show job 6260301&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring a Started Job ==&lt;br /&gt;
&lt;br /&gt;
After a job has started, you can ssh to the node(s) the job is running on, using the node name from NODELIST, e.g. if your job runs on n0603:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&amp;gt; ssh n0603 &lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Slurm Command Overview =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Description&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Partitions =&lt;br /&gt;
Job allocations at JUSTUS 2 are routed automatically to the most suitable compute node(s) that can provide the requested resources for the job (e.g. amount of cores, memory, local scratch space). This is to prevent fragmentation of the cluster system and to ensure most efficient usage of available compute resources. Thus, there is no point in requesting a partition in batch job scripts, i.e. users &#039;&#039;&#039;should not&#039;&#039;&#039; specify any partition &amp;quot;-p, --partition=&amp;lt;partition_name&amp;gt;&amp;quot; on job submission. This is of particular importance if you adapt job scripts from other cluster systems (e.g. bwUniCluster 2.0) to JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
= Job Priorities =&lt;br /&gt;
Job priorities at JUSTUS 2 depend on [https://slurm.schedmd.com/priority_multifactor.html multiple factors ]:&lt;br /&gt;
* Age: The amount of time a job has been waiting in the queue, eligible to be scheduled.&lt;br /&gt;
* Fairshare: The difference between the portion of the computing resource allocated to an association and the amount of resources that has been consumed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
Jobs that are pending because the user reached one of the resource usage limits (see below) are not eligible to be scheduled and, thus, do not accrue priority by their age.  &lt;br /&gt;
&lt;br /&gt;
Fairshare does &#039;&#039;&#039;not&#039;&#039;&#039; introduce a fixed allotment where a user&#039;s ability to run new jobs is cut off as soon as a fixed target utilization is reached. Instead, the fairshare factor ensures that jobs from users who were under-served in the past are given higher priority than jobs from users who were over-served in the past. This keeps individual groups from monopolizing the resources in the long term, which would be unfair to groups that have not used their fair share for quite some time.&lt;br /&gt;
&lt;br /&gt;
Slurm features &#039;&#039;&#039;backfilling&#039;&#039;&#039;, meaning that the scheduler will start lower priority jobs if doing so does not delay the expected start time of &#039;&#039;&#039;any&#039;&#039;&#039; higher priority job. Since the expected start time of pending jobs depends upon the expected completion time of running jobs, reasonably accurate time limits are valuable for backfill scheduling to work well. This &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=161 video]&#039;&#039;&#039; gives an illustrative description of how backfilling works.&lt;br /&gt;
&lt;br /&gt;
In summary, an approximate model of Slurm&#039;s behavior for scheduling jobs is this:&lt;br /&gt;
&lt;br /&gt;
* Step 1: Can the job in position one (highest priority) start now?&lt;br /&gt;
* Step 2: If it can, remove it from the queue, start it and continue with step 1.&lt;br /&gt;
* Step 3: If it cannot, look at the next job.&lt;br /&gt;
* Step 4: Can it start now, without delaying the start time of any job before it in the queue?&lt;br /&gt;
* Step 5: If it can, remove it from the queue, start it, recalculate what nodes are free, look at the next job and continue with step 4.&lt;br /&gt;
* Step 6: If it cannot, look at the next job, and continue with step 4.&lt;br /&gt;
&lt;br /&gt;
As soon as a new job is submitted and as soon as a job finishes, Slurm restarts its main scheduling cycle with step 1.&lt;br /&gt;
&lt;br /&gt;
= Usage Limits/Throttling Policies =&lt;br /&gt;
&lt;br /&gt;
While the fairshare factor ensures a fair long-term balance of resource utilization between users and groups, there are additional usage limits that constrain the total cumulative resources in use at any given time. This is to prevent individual users from monopolizing large fractions of the whole cluster system in the short term.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum walltime&#039;&#039;&#039; for a job is &#039;&#039;&#039;14 days&#039;&#039;&#039; (336 hours)&lt;br /&gt;
  --time=336:00:00 or --time=14-0&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum amount of cores&#039;&#039;&#039; used at any given time by running jobs is &#039;&#039;&#039;1920&#039;&#039;&#039; per user (aggregated over all running jobs). This translates to 40 nodes. An equivalent limit for allocated memory also applies. If this limit is reached, new jobs will be queued (with REASON: AssocGrpCpuLimit) but only allowed to run after resources have been relinquished.&lt;br /&gt;
&lt;br /&gt;
* The maximum amount of &#039;&#039;&#039;remaining allocated core-minutes&#039;&#039;&#039; per user is &#039;&#039;&#039;3300000&#039;&#039;&#039; (aggregated over all running jobs). For example, if a user has a 4-core job running that will complete in 1 hour and a 2-core job that will complete in 6 hours, this translates to 4 * 1 * 60 + 2 * 6 * 60 = 16 * 60 = 960 remaining core-minutes (see the sketch after this list). Once a user reaches the limit, no more jobs are allowed to start (REASON: AssocGrpCPURunMinutesLimit). As the jobs continue to run, the remaining core time decreases and eventually allows more jobs to start in a staggered way. This limit also &#039;&#039;&#039;correlates the maximum walltime and the amount of cores that can be allocated&#039;&#039;&#039; for this amount of time. Thus, shorter walltimes allow more resources to be allocated at a given time (capped by the maximum amount of cores limit above). Watch this &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=306 video]&#039;&#039;&#039; for an illustrative description. An equivalent limit applies to the remaining time of memory allocations, in which case jobs may be held back from starting with REASON: AssocGrpMemRunMinutes.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum amount of GPUs&#039;&#039;&#039; allocated by running jobs is &#039;&#039;&#039;4&#039;&#039;&#039; per user. If this limit is reached, new jobs will be queued (with REASON: AssocGrpGRES) but only allowed to run after GPU resources have been relinquished.&lt;br /&gt;
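&lt;br /&gt;
To make the core-minutes accounting concrete, here is a toy recalculation of the example above in shell arithmetic (job sizes and remaining runtimes are taken from the text):&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
# remaining core-minutes = sum over running jobs of (cores * remaining walltime in minutes)&lt;br /&gt;
job1=$(( 4 * 1 * 60 ))   # 4 cores, 1 hour remaining:  240 core-minutes&lt;br /&gt;
job2=$(( 2 * 6 * 60 ))   # 2 cores, 6 hours remaining: 720 core-minutes&lt;br /&gt;
echo $(( job1 + job2 ))  # 960 core-minutes, far below the 3300000 limit&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;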
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Usage limits are subject to change.&lt;br /&gt;
&lt;br /&gt;
= Other Considerations =&lt;br /&gt;
&lt;br /&gt;
== Default Values ==&lt;br /&gt;
&lt;br /&gt;
Default values for jobs are:&lt;br /&gt;
&lt;br /&gt;
* Runtime: --time=02:00:00 (2 hours)&lt;br /&gt;
* Nodes: --nodes=1 (one node)&lt;br /&gt;
* Tasks: --tasks-per-node=1 (one task per node)&lt;br /&gt;
* Cores: --cpus-per-task=1 (one core per task)&lt;br /&gt;
* Memory: --mem-per-cpu=2gb (2 GB per core)&lt;br /&gt;
&lt;br /&gt;
== Node Access Policy ==&lt;br /&gt;
&lt;br /&gt;
Node access policy for jobs is &amp;quot;&#039;&#039;&#039;exclusive user&#039;&#039;&#039;&amp;quot;. Nodes will be exclusively allocated to users. &#039;&#039;&#039;Multiple jobs (up to 48) of the same user can run on a single node&#039;&#039;&#039; at any time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; This implies that for &#039;&#039;&#039;sub-node jobs&#039;&#039;&#039; it is advisable, for efficient resource utilization and maximum job throughput, to &#039;&#039;&#039;adjust the number of cores to be an integer divisor of 48&#039;&#039;&#039; (the total number of cores on each node). For example, two 24-core jobs can run simultaneously on one and the same node, while two 32-core jobs will always have to allocate two separate nodes, leaving 16 cores unused on each of them. Users must therefore always &#039;&#039;&#039;think carefully about how many cores to request&#039;&#039;&#039; and whether their applications really benefit from allocating more cores. Similar considerations apply, at the same time, to the &#039;&#039;&#039;requested amount of memory per job&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Think of it as the scheduler playing a game of multi-dimensional Tetris, where the dimensions are number of cores, amount of memory and other resources. &#039;&#039;&#039;Users can support this by making resource allocations that allow the scheduler to pack their jobs as densely as possible on the nodes&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Memory Management ==&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;wait time of a job also depends largely on the amount of requested resources&#039;&#039;&#039; and the available number of nodes providing this amount of resources. This must be taken into account &#039;&#039;&#039;in particular when requesting a certain amount of memory&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
For example, there is a total of 692 compute nodes in JUSTUS, of which 456 nodes have 192 GB RAM. However, &#039;&#039;&#039;not the entire amount of physical RAM is available for user jobs&#039;&#039;&#039;, because the operating system, system services and local file systems also require a certain amount of RAM.&lt;br /&gt;
This means that if a job requests 192 GB RAM per node (i.e. --mem=192gb, or --tasks-per-node=48 and --mem-per-cpu=4gb), Slurm will rule out those 456 of the 692 nodes as unsuitable for this job and consider only 220 of the 692 nodes as eligible for running it.&lt;br /&gt;
&lt;br /&gt;
The following table provides an overview of how much memory can be allocated by user jobs on the various node types and how many nodes can serve this memory requirement:&lt;br /&gt;
&lt;br /&gt;
{| width=500px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Physical RAM on node !! Available RAM on node !! Number of suitable nodes &lt;br /&gt;
|-&lt;br /&gt;
| 192 GB || 187 GB || 692 &lt;br /&gt;
|-&lt;br /&gt;
| 384 GB || 376 GB || 220&lt;br /&gt;
|-&lt;br /&gt;
| 768 GB || 754 GB || 28&lt;br /&gt;
|-&lt;br /&gt;
| 1536 GB || 1510 GB || 8&lt;br /&gt;
|}&lt;br /&gt;
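&lt;br /&gt;
As a practical consequence, capping the memory request at the available (rather than physical) RAM of a node type keeps more nodes eligible. A minimal sketch based on the table above (job.sh is a placeholder script):&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;&lt;br /&gt;
# eligible on all 692 nodes: stays within the 187 GB available on the smallest nodes&lt;br /&gt;
sbatch --nodes=1 --mem=187gb job.sh&lt;br /&gt;
&lt;br /&gt;
# eligible on only 220 nodes: requests the full 192 GB physical RAM of the smallest nodes&lt;br /&gt;
sbatch --nodes=1 --mem=192gb job.sh&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;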
&lt;br /&gt;
Also note that allocated memory is factored into resource usage accounting for fair share. This means over-requesting memory may have a negative impact on the priority of subsequent jobs.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=11324</id>
		<title>JUSTUS2/Jobscripts: Running Your Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=11324"/>
		<updated>2022-11-09T14:08:52Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* squeue */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The JUSTUS 2 cluster uses Slurm ([https://slurm.schedmd.com/ https://slurm.schedmd.com/]) for scheduling compute jobs. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= JUSTUS 2 Slurm Howto =&lt;br /&gt;
&lt;br /&gt;
This page presents only a very basic introduction.&lt;br /&gt;
&lt;br /&gt;
Please see  the &#039;&#039;&#039;[[bwForCluster JUSTUS 2 Slurm HOWTO|JUSTUS 2 Slurm HOWTO]]&#039;&#039;&#039; for many more examples and commands for common tasks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the bwForCluster JUSTUS 2 =&lt;br /&gt;
Batch jobs are submitted with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A job script contains options for Slurm in lines beginning with #SBATCH as well as your commands which you want to execute on the compute nodes. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --time=00:20:00&lt;br /&gt;
#SBATCH --mem=1gb&lt;br /&gt;
#SBATCH --export=NONE&lt;br /&gt;
echo &#039;Here starts the calculation&#039;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can override options from the script on the command-line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch --time=03:00:00 &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note: &amp;lt;font color=&amp;quot;red&amp;quot;&amp;gt; Compute jobs must not use the global file systems for scratch or swap files during a calculation. &amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use local storage for this purpose instead: /tmp on the RAM disk for small files, or /scratch (see [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F|How to request NVMe]]) if the RAM disk is not sufficient.&lt;br /&gt;
&lt;br /&gt;
= Monitoring Your Jobs =&lt;br /&gt;
== squeue ==&lt;br /&gt;
&lt;br /&gt;
After you have submitted the job, you can see it waiting in the queue using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
(also read the man page with &amp;lt;code&amp;gt;man squeue&amp;lt;/code&amp;gt; for more information on how to use the command)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
&amp;gt; squeue&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
             6260301  standard r_60_b_2 ul_yxz1 PD       0:00      1 (AssocGrpMemRunMinutes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Output shows: &lt;br /&gt;
* JOBID: a unique number assigned to your job&lt;br /&gt;
* PARTITION: the partition the job was routed to; the cluster can be divided into different types of nodes&lt;br /&gt;
* NAME: the name you gave your job with the --job-name= option&lt;br /&gt;
* USER: your username&lt;br /&gt;
* ST: the state the job is in. R = running, PD = pending, CD = completed. See the man page for a full list of states.&lt;br /&gt;
* TIME: how long the job has been running&lt;br /&gt;
* NODES: how many nodes were requested&lt;br /&gt;
* NODELIST(REASON): either shows the node(s) the job is running on, or the reason why it has not started yet&lt;br /&gt;
&lt;br /&gt;
== scontrol ==&lt;br /&gt;
&lt;br /&gt;
You can show more information about one specific job using the &amp;lt;code&amp;gt;scontrol&amp;lt;/code&amp;gt; command, e.g. for the job with ID 6260301 listed above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
&amp;gt; scontrol show job 6260301&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring a Started Job ==&lt;br /&gt;
&lt;br /&gt;
After a job has started, you can ssh to the node(s) the job is running on, using the node name from NODELIST, e.g. if your job runs on n0603:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&amp;gt; ssh n0603 &lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Slurm Command Overview =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Description&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Partitions =&lt;br /&gt;
Job allocations at JUSTUS 2 are routed automatically to the most suitable compute node(s) that can provide the requested resources for the job (e.g. amount of cores, memory, local scratch space). This is to prevent fragmentation of the cluster system and to ensure most efficient usage of available compute resources. Thus, there is no point in requesting a partition in batch job scripts, i.e. users &#039;&#039;&#039;should not&#039;&#039;&#039; specify any partition &amp;quot;-p, --partition=&amp;lt;partition_name&amp;gt;&amp;quot; on job submission. This is of particular importance if you adapt job scripts from other cluster systems (e.g. bwUniCluster 2.0) to JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
= Job Priorities =&lt;br /&gt;
Job priorities at JUSTUS 2 depend on [https://slurm.schedmd.com/priority_multifactor.html multiple factors ]:&lt;br /&gt;
* Age: The amount of time a job has been waiting in the queue, eligible to be scheduled.&lt;br /&gt;
* Fairshare: The difference between the portion of the computing resource allocated to an association and the amount of resources that has been consumed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
Jobs that are pending because the user reached one of the resource usage limits (see below) are not eligible to be scheduled and, thus, do not accrue priority by their age.  &lt;br /&gt;
&lt;br /&gt;
Fairshare does &#039;&#039;&#039;not&#039;&#039;&#039; introduce a fixed allotment where a user&#039;s ability to run new jobs is cut off as soon as a fixed target utilization is reached. Instead, the fairshare factor ensures that jobs from users who were under-served in the past are given higher priority than jobs from users who were over-served in the past. This keeps individual groups from monopolizing the resources in the long term, which would be unfair to groups that have not used their fair share for quite some time.&lt;br /&gt;
&lt;br /&gt;
Slurm features &#039;&#039;&#039;backfilling&#039;&#039;&#039;, meaning that the scheduler will start lower priority jobs if doing so does not delay the expected start time of &#039;&#039;&#039;any&#039;&#039;&#039; higher priority job. Since the expected start time of pending jobs depends upon the expected completion time of running jobs, reasonably accurate time limits are valuable for backfill scheduling to work well. This &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=161 video]&#039;&#039;&#039; gives an illustrative description of how backfilling works.&lt;br /&gt;
&lt;br /&gt;
In summary, an approximate model of Slurm&#039;s behavior for scheduling jobs is this:&lt;br /&gt;
&lt;br /&gt;
* Step 1: Can the job in position one (highest priority) start now?&lt;br /&gt;
* Step 2: If it can, remove it from the queue, start it and continue with step 1.&lt;br /&gt;
* Step 3: If it cannot, look at the next job.&lt;br /&gt;
* Step 4: Can it start now, without delaying the start time of any job before it in the queue?&lt;br /&gt;
* Step 5: If it can, remove it from the queue, start it, recalculate what nodes are free, look at the next job and continue with step 4.&lt;br /&gt;
* Step 6: If it cannot, look at the next job, and continue with step 4.&lt;br /&gt;
&lt;br /&gt;
As soon as a new job is submitted and as soon as a job finishes, Slurm restarts its main scheduling cycle with step 1.&lt;br /&gt;
&lt;br /&gt;
= Usage Limits/Throttling Policies =&lt;br /&gt;
&lt;br /&gt;
While the fairshare factor ensures a fair long-term balance of resource utilization between users and groups, there are additional usage limits that constrain the total cumulative resources in use at any given time. This is to prevent individual users from monopolizing large fractions of the whole cluster system in the short term.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum walltime&#039;&#039;&#039; for a job is &#039;&#039;&#039;14 days&#039;&#039;&#039; (336 hours)&lt;br /&gt;
  --time=336:00:00 or --time=14-0&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum amount of cores&#039;&#039;&#039; used at any given time by running jobs is &#039;&#039;&#039;1920&#039;&#039;&#039; per user (aggregated over all running jobs). This translates to 40 nodes. An equivalent limit for allocated memory also applies. If this limit is reached, new jobs will be queued (with REASON: AssocGrpCpuLimit) but only allowed to run after resources have been relinquished.&lt;br /&gt;
&lt;br /&gt;
* The maximum amount of &#039;&#039;&#039;remaining allocated core-minutes&#039;&#039;&#039; per user is &#039;&#039;&#039;3300000&#039;&#039;&#039; (aggregated over all running jobs). For example, if a user has a 4-core job running that will complete in 1 hour and a 2-core job that will complete in 6 hours, this translates to 4 * 1 * 60 + 2 * 6 * 60 = 16 * 60 = 960 remaining core-minutes. Once a user reaches the limit, no more jobs are allowed to start (REASON: AssocGrpCPURunMinutesLimit). As the jobs continue to run, the remaining core time decreases and eventually allows more jobs to start in a staggered way. This limit also &#039;&#039;&#039;correlates the maximum walltime and the amount of cores that can be allocated&#039;&#039;&#039; for this amount of time. Thus, shorter walltimes allow more resources to be allocated at a given time (capped by the maximum amount of cores limit above). Watch this &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=306 video]&#039;&#039;&#039; for an illustrative description. An equivalent limit applies to the remaining time of memory allocations, in which case jobs may be held back from starting with REASON: AssocGrpMemRunMinutes.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum amount of GPUs&#039;&#039;&#039; allocated by running jobs is &#039;&#039;&#039;4&#039;&#039;&#039; per user. If this limit is reached, new jobs will be queued (with REASON: AssocGrpGRES) but only allowed to run after GPU resources have been relinquished.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Usage limits are subject to change.&lt;br /&gt;
&lt;br /&gt;
= Other Considerations =&lt;br /&gt;
&lt;br /&gt;
== Default Values ==&lt;br /&gt;
&lt;br /&gt;
Default values for jobs are:&lt;br /&gt;
&lt;br /&gt;
* Runtime: --time=02:00:00 (2 hours)&lt;br /&gt;
* Nodes: --nodes=1 (one node)&lt;br /&gt;
* Tasks: --tasks-per-node=1 (one task per node)&lt;br /&gt;
* Cores: --cpus-per-task=1 (one core per task)&lt;br /&gt;
* Memory: --mem-per-cpu=2gb (2 GB per core)&lt;br /&gt;
&lt;br /&gt;
== Node Access Policy ==&lt;br /&gt;
&lt;br /&gt;
Node access policy for jobs is &amp;quot;&#039;&#039;&#039;exclusive user&#039;&#039;&#039;&amp;quot;. Nodes will be exclusively allocated to users. &#039;&#039;&#039;Multiple jobs (up to 48) of the same user can run on a single node&#039;&#039;&#039; at any time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; This implies that for &#039;&#039;&#039;sub-node jobs&#039;&#039;&#039; it is advisable, for efficient resource utilization and maximum job throughput, to &#039;&#039;&#039;adjust the number of cores to be an integer divisor of 48&#039;&#039;&#039; (the total number of cores on each node). For example, two 24-core jobs can run simultaneously on one and the same node, while two 32-core jobs will always have to allocate two separate nodes, leaving 16 cores unused on each of them. Users must therefore always &#039;&#039;&#039;think carefully about how many cores to request&#039;&#039;&#039; and whether their applications really benefit from allocating more cores. Similar considerations apply, at the same time, to the &#039;&#039;&#039;requested amount of memory per job&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Think of it as the scheduler playing a game of multi-dimensional Tetris, where the dimensions are number of cores, amount of memory and other resources. &#039;&#039;&#039;Users can support this by making resource allocations that allow the scheduler to pack their jobs as densely as possible on the nodes&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Memory Management ==&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;wait time of a job also depends largely on the amount of requested resources&#039;&#039;&#039; and the available number of nodes providing this amount of resources. This must be taken into account &#039;&#039;&#039;in particular when requesting a certain amount of memory&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
For example, there is a total of 692 compute nodes in JUSTUS, of which 456 nodes have 192 GB RAM. However, &#039;&#039;&#039;not the entire amount of physical RAM is available for user jobs&#039;&#039;&#039;, because the operating system, system services and local file systems also require a certain amount of RAM.&lt;br /&gt;
This means that if a job requests 192 GB RAM per node (i.e. --mem=192gb, or --tasks-per-node=48 and --mem-per-cpu=4gb), Slurm will rule out those 456 of the 692 nodes as unsuitable for this job and consider only 220 of the 692 nodes as eligible for running it.&lt;br /&gt;
&lt;br /&gt;
The following table provides an overview of how much memory can be allocated by user jobs on the various node types and how many nodes can serve this memory requirement:&lt;br /&gt;
&lt;br /&gt;
{| width=500px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Physical RAM on node !! Available RAM on node !! Number of suitable nodes &lt;br /&gt;
|-&lt;br /&gt;
| 192 GB || 187 GB || 692 &lt;br /&gt;
|-&lt;br /&gt;
| 384 GB || 376 GB || 220&lt;br /&gt;
|-&lt;br /&gt;
| 768 GB || 754 GB || 28&lt;br /&gt;
|-&lt;br /&gt;
| 1536 GB || 1510 GB || 8&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Also note that allocated memory is factored into resource usage accounting for fair share. This means over-requesting memory may have a negative impact on the priority of subsequent jobs.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=11323</id>
		<title>JUSTUS2/Jobscripts: Running Your Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Jobscripts:_Running_Your_Calculations&amp;diff=11323"/>
		<updated>2022-11-09T14:04:49Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* Submitting Jobs on the bwForCluster JUSTUS 2 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{Justus2}}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The JUSTUS 2 cluster uses Slurm ([https://slurm.schedmd.com/ https://slurm.schedmd.com/]) for scheduling compute jobs. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= JUSTUS 2 Slurm Howto =&lt;br /&gt;
&lt;br /&gt;
This page presents only a very basic introduction.&lt;br /&gt;
&lt;br /&gt;
Please see  the &#039;&#039;&#039;[[bwForCluster JUSTUS 2 Slurm HOWTO|JUSTUS 2 Slurm HOWTO]]&#039;&#039;&#039; for many more examples and commands for common tasks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Submitting Jobs on the bwForCluster JUSTUS 2 =&lt;br /&gt;
Batch jobs are submitted with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
A job script contains options for Slurm in lines beginning with #SBATCH as well as your commands which you want to execute on the compute nodes. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;bash&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks-per-node=1&lt;br /&gt;
#SBATCH --time=00:20:00&lt;br /&gt;
#SBATCH --mem=1gb&lt;br /&gt;
#SBATCH --export=NONE&lt;br /&gt;
echo &#039;Here starts the calculation&#039;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
You can override options from the script on the command-line:&lt;br /&gt;
&amp;lt;source lang=bash&amp;gt;$ sbatch --time=03:00:00 &amp;lt;job-script&amp;gt; &amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note: &amp;lt;font color=&amp;quot;red&amp;quot;&amp;gt; Compute jobs must not use the global file systems for scratch or swap files during a calculation. &amp;lt;/font&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Use local storage for this purpose instead: /tmp on the RAM disk for small files, or /scratch (see [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F|How to request NVMe]]) if the RAM disk is not sufficient.&lt;br /&gt;
&lt;br /&gt;
= Monitoring Your Jobs =&lt;br /&gt;
== squeue ==&lt;br /&gt;
&lt;br /&gt;
After you have submitted the job, you can see it waiting in the queue using the &amp;lt;code&amp;gt;squeue&amp;lt;/code&amp;gt; command:&lt;br /&gt;
&lt;br /&gt;
(also read the man page with &amp;lt;code&amp;gt;man squeue&amp;lt;/code&amp;gt; for more information on how to use the command)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
&amp;gt; squeue&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
             6260301  standard r_60_b_2 ul_yxz1 PD       0:00      1 (AssocGrpMemRunMinutes)&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Output shows: &lt;br /&gt;
* JOBID: a unique number assigned to your job&lt;br /&gt;
* PARTITION: the partition the job was routed to; JUSTUS 2 only has the standard and gpu partitions&lt;br /&gt;
* NAME: the name you gave your job with the --job-name= option&lt;br /&gt;
* USER: your username&lt;br /&gt;
* ST: the state the job is in. R = running, PD = pending, CD = completed. See the man page for a full list of states.&lt;br /&gt;
* TIME: how long the job has been running&lt;br /&gt;
* NODES: how many nodes were requested&lt;br /&gt;
* NODELIST(REASON): either shows the node(s) the job is running on, or the reason why it has not started yet&lt;br /&gt;
== scontrol ==&lt;br /&gt;
&lt;br /&gt;
You can show more information about one specific job using the &amp;lt;code&amp;gt;scontrol&amp;lt;/code&amp;gt; command, e.g. for the job with ID 6260301 listed above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
&amp;gt; scontrol show job 6260301&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Monitoring a Started Job ==&lt;br /&gt;
&lt;br /&gt;
After a job has started, you can ssh to the node(s) the job is running on, using the node name from NODELIST, e.g. if your job runs on n0603:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&amp;gt; ssh n0603 &lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Slurm Command Overview =&lt;br /&gt;
&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Description&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/sbatch.html sbatch] || Submits a job and queues it in an input queue&lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/squeue.html squeue] || Displays information about active, eligible, blocked, and/or recently completed jobs &lt;br /&gt;
|-&lt;br /&gt;
| [https://slurm.schedmd.com/scontrol.html scontrol] || Displays detailed job state information&lt;br /&gt;
|- &lt;br /&gt;
| [https://slurm.schedmd.com/scancel.html scancel] || Cancels a job&lt;br /&gt;
|- &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Partitions =&lt;br /&gt;
Job allocations at JUSTUS 2 are routed automatically to the most suitable compute node(s) that can provide the requested resources for the job (e.g. amount of cores, memory, local scratch space). This is to prevent fragmentation of the cluster system and to ensure most efficient usage of available compute resources. Thus, there is no point in requesting a partition in batch job scripts, i.e. users &#039;&#039;&#039;should not&#039;&#039;&#039; specify any partition &amp;quot;-p, --partition=&amp;lt;partition_name&amp;gt;&amp;quot; on job submission. This is of particular importance if you adapt job scripts from other cluster systems (e.g. bwUniCluster 2.0) to JUSTUS 2.&lt;br /&gt;
&lt;br /&gt;
= Job Priorities =&lt;br /&gt;
Job priorities at JUSTUS 2 depend on [https://slurm.schedmd.com/priority_multifactor.html multiple factors ]:&lt;br /&gt;
* Age: The amount of time a job has been waiting in the queue, eligible to be scheduled.&lt;br /&gt;
* Fairshare: The difference between the portion of the computing resource allocated to an association and the amount of resources that has been consumed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Notes:&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
Jobs that are pending because the user reached one of the resource usage limits (see below) are not eligible to be scheduled and, thus, do not accrue priority by their age.  &lt;br /&gt;
&lt;br /&gt;
Fairshare does &#039;&#039;&#039;not&#039;&#039;&#039; introduce a fixed allotment in which a user&#039;s ability to run new jobs is cut off as soon as a fixed target utilization is reached. Instead, the fairshare factor ensures that jobs from users who were under-served in the past are given higher priority than jobs from users who were over-served in the past. This keeps individual groups from monopolizing the resources in the long term, which would be unfair to groups that have not used their fair share for quite some time.&lt;br /&gt;
&lt;br /&gt;
Slurm features &#039;&#039;&#039;backfilling&#039;&#039;&#039;, meaning that the scheduler will start lower-priority jobs if doing so does not delay the expected start time of &#039;&#039;&#039;any&#039;&#039;&#039; higher-priority job. Since the expected start time of pending jobs depends upon the expected completion time of running jobs, reasonably accurate time limits are valuable for backfill scheduling to work well. This &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=161 video]&#039;&#039;&#039; gives an illustrative description of how backfilling works.&lt;br /&gt;
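&lt;br /&gt;
Since accurate time limits help the backfill scheduler, it can pay off to lower the limit of a queued job that you know will finish sooner. A hedged sketch (Slurm lets users decrease, but not increase, their own job&#039;s time limit; the job ID is illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
# reduce the time limit of your job to 4 hours&lt;br /&gt;
&amp;gt; scontrol update jobid=6260301 TimeLimit=04:00:00&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;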
&lt;br /&gt;
In summary, an approximate model of Slurm&#039;s behavior for scheduling jobs is this:&lt;br /&gt;
&lt;br /&gt;
* Step 1: Can the job in position one (highest priority) start now?&lt;br /&gt;
* Step 2: If it can, remove it from the queue, start it and continue with step 1.&lt;br /&gt;
* Step 3: If it cannot, look at the next job.&lt;br /&gt;
* Step 4: Can it start now, without delaying the start time of any job before it in the queue?&lt;br /&gt;
* Step 5: If it can, remove it from the queue, start it, recalculate which nodes are free, look at the next job and continue with step 4.&lt;br /&gt;
* Step 6: If it cannot, look at the next job and continue with step 4.&lt;br /&gt;
&lt;br /&gt;
As soon as a new job is submitted and as soon as a job finishes, Slurm restarts its main scheduling cycle with step 1.&lt;br /&gt;
&lt;br /&gt;
= Usage Limits/Throttling Policies =&lt;br /&gt;
&lt;br /&gt;
While the fairshare factor ensures a fair long-term balance of resource utilization between users and groups, there are additional usage limits that constrain the total resources a user can occupy at any given time. This prevents individual users from monopolizing large fractions of the whole cluster system in the short term.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum walltime&#039;&#039;&#039; for a job is &#039;&#039;&#039;14 days&#039;&#039;&#039; (336 hours)&lt;br /&gt;
  --time=336:00:00 or --time=14-0&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum number of cores&#039;&#039;&#039; used at any given time is &#039;&#039;&#039;1920&#039;&#039;&#039; per user (aggregated over all running jobs). This translates to 40 nodes. An equivalent limit for allocated memory also applies. If this limit is reached, new jobs will be queued (with REASON: AssocGrpCpuLimit) but only allowed to run after resources have been relinquished.&lt;br /&gt;
&lt;br /&gt;
* The maximum amount of &#039;&#039;&#039;remaining allocated core-minutes&#039;&#039;&#039; per user is &#039;&#039;&#039;3300000&#039;&#039;&#039; (aggregated over all running jobs). For example, if a user has a 4-core job running that will complete in 1 hour and a 2-core job that will complete in 6 hours, this translates to 4 * 1 * 60 + 2 * 6 * 60 = 16 * 60 = 960 remaining core-minutes. Once a user reaches the limit, no more jobs are allowed to start (REASON: AssocGrpCPURunMinutesLimit). As the jobs continue to run, the remaining core time decreases and eventually allows more jobs to start in a staggered way. This limit also &#039;&#039;&#039;correlates the maximum walltime with the number of cores that can be allocated&#039;&#039;&#039; for this amount of time. Thus, shorter walltimes allow more resources to be allocated at a given time (capped by the maximum number of cores above). Watch this &#039;&#039;&#039;[https://youtu.be/OKhWwem1XZg?t=306 video]&#039;&#039;&#039; for an illustrative description. An equivalent limit applies to the remaining time of memory allocations, in which case jobs may be held back from starting with REASON: AssocGrpMemRunMinutes.&lt;br /&gt;
&lt;br /&gt;
* The &#039;&#039;&#039;maximum number of GPUs&#039;&#039;&#039; allocated by running jobs is &#039;&#039;&#039;4&#039;&#039;&#039; per user. If this limit is reached, new jobs will be queued (with REASON: AssocGrpGRES) but only allowed to run after GPU resources have been relinquished.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Usage limits are subject to change.&lt;br /&gt;
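&lt;br /&gt;
To see which (if any) limit is currently holding your jobs back, check the REASON column of squeue. A minimal sketch using standard squeue format options:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
# show job ID, state and pending reason for your own waiting jobs&lt;br /&gt;
&amp;gt; squeue -u $USER -t PENDING -o &amp;quot;%.10i %.8T %.30r&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;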
&lt;br /&gt;
= Other Considerations =&lt;br /&gt;
&lt;br /&gt;
== Default Values ==&lt;br /&gt;
&lt;br /&gt;
Default values for jobs are:&lt;br /&gt;
&lt;br /&gt;
* Runtime: --time=02:00:00 (2 hours)&lt;br /&gt;
* Nodes: --nodes=1 (one node)&lt;br /&gt;
* Tasks: --tasks-per-node=1 (one task per node)&lt;br /&gt;
* Cores: --cpus-per-task=1 (one core per task)&lt;br /&gt;
* Memory: --mem-per-cpu=2gb (2 GB per core)&lt;br /&gt;
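&lt;br /&gt;
Putting these options together, a minimal job script that overrides each default explicitly could look like this (the resource values are illustrative, not recommendations):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --job-name=my_calc       # illustrative job name&lt;br /&gt;
#SBATCH --time=08:00:00          # 8 hours instead of the 2-hour default&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --tasks-per-node=1&lt;br /&gt;
#SBATCH --cpus-per-task=4        # 4 cores instead of 1&lt;br /&gt;
#SBATCH --mem-per-cpu=4gb        # 4 GB per core instead of 2&lt;br /&gt;
&lt;br /&gt;
./my_program                     # placeholder for your application&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;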
&lt;br /&gt;
== Node Access Policy ==&lt;br /&gt;
&lt;br /&gt;
Node access policy for jobs is &amp;quot;&#039;&#039;&#039;exclusive user&#039;&#039;&#039;&amp;quot;, i.e. nodes are allocated exclusively to a single user at a time. &#039;&#039;&#039;Multiple jobs (up to 48) of the same user can run on a single node&#039;&#039;&#039; at any time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Note:&#039;&#039;&#039; This implies that for &#039;&#039;&#039;sub-node jobs&#039;&#039;&#039;, it is advisable for efficient resource utilization and maximum job throughput to &#039;&#039;&#039;adjust the number of cores to be an integer divisor of 48&#039;&#039;&#039; (the total number of cores on each node). For example, two 24-core jobs can run simultaneously on the same node, while two 32-core jobs will always have to allocate two separate nodes, leaving 16 cores unused on each of them. Users must therefore always &#039;&#039;&#039;think carefully about how many cores to request&#039;&#039;&#039; and whether their applications really benefit from allocating more cores for their jobs. At the same time, similar considerations apply to the &#039;&#039;&#039;requested amount of memory per job&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Think of it as the scheduler playing a game of multi-dimensional Tetris, where the dimensions are number of cores, amount of memory and other resources. &#039;&#039;&#039;Users can support this by making resource allocations that allow the scheduler to pack their jobs as densely as possible on the nodes&#039;&#039;&#039;.&lt;br /&gt;
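&lt;br /&gt;
For instance, sizing a sub-node job as an integer divisor of 48 only takes one line in the job script (a sketch; the core count is illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
# 24 cores: two such jobs of the same user can share one 48-core node&lt;br /&gt;
#SBATCH --cpus-per-task=24&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;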
&lt;br /&gt;
== Memory Management ==&lt;br /&gt;
&lt;br /&gt;
The &#039;&#039;&#039;wait time of a job also depends largely on the amount of requested resources&#039;&#039;&#039; and the available number of nodes providing this amount of resources. This must be taken into account &#039;&#039;&#039;in particular when requesting a certain amount of memory&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
For example, there is a total of 692 compute nodes in JUSTUS 2, of which 456 nodes have 192 GB RAM. However, &#039;&#039;&#039;not the entire amount of physical RAM is available exclusively for user jobs&#039;&#039;&#039;, because the operating system, system services and local file systems also require a certain amount of RAM.&lt;br /&gt;
This means that if a job requests 192 GB RAM per node (i.e. --mem=192gb, or --tasks-per-node=48 and --mem-per-cpu=4gb), Slurm will rule out 456 of the 692 nodes as unsuitable for this job and consider only 220 of the 692 nodes as eligible to run it.&lt;br /&gt;
&lt;br /&gt;
The following table provides an overview of how much memory can be allocated by user jobs on the various node types and how many nodes can serve this memory requirement:&lt;br /&gt;
&lt;br /&gt;
{| width=500px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Physical RAM on node !! Available RAM on node !! Number of suitable nodes &lt;br /&gt;
|-&lt;br /&gt;
| 192 GB || 187 GB || 692 &lt;br /&gt;
|-&lt;br /&gt;
| 384 GB || 376 GB || 220&lt;br /&gt;
|-&lt;br /&gt;
| 768 GB || 754 GB || 28&lt;br /&gt;
|-&lt;br /&gt;
| 1536 GB || 1510 GB || 8&lt;br /&gt;
|}&lt;br /&gt;
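&lt;br /&gt;
So, to stay eligible for the full set of nodes, request no more than the available (not the physical) RAM. A hedged example based on the table above:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;source lang=&#039;shell&#039;&amp;gt;&lt;br /&gt;
# fits on all 692 nodes (187 GB available on the smallest node type)&lt;br /&gt;
#SBATCH --mem=187gb&lt;br /&gt;
# would exclude the 192 GB nodes and leave only the larger node types:&lt;br /&gt;
# #SBATCH --mem=192gb&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;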
&lt;br /&gt;
Also note that allocated memory is factored into resource usage accounting for fair share. This means over-requesting memory may have a negative impact on the priority of subsequent jobs.&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Getting_Started&amp;diff=11322</id>
		<title>JUSTUS2/Getting Started</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Getting_Started&amp;diff=11322"/>
		<updated>2022-11-09T13:59:03Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;!--&lt;br /&gt;
Here is a short list of things you may need to do first when you get onto the cluster&lt;br /&gt;
== Basics ==&lt;br /&gt;
* log in to the cluster: [[JUSTUS2/Login]]&lt;br /&gt;
* get accustomed with the linux commandline: &lt;br /&gt;
** [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC/The_Command_Line introduction on the (external) hpc wiki] or &lt;br /&gt;
** linux course at [https://training.bwhpc.de/ training.bwhpc.de]&lt;br /&gt;
&lt;br /&gt;
== Running an Example with Preinstalled  Software ==&lt;br /&gt;
* scientific software: read on how to load [[Software Modules]]&lt;br /&gt;
* continue reading until you found that there are example job scripts: [[Environment_Modules#Software_job_examples]]&lt;br /&gt;
* submit a sample job from a software as mentioned in the job example. Also see: [[JUSTUS2/Slurm]]&lt;br /&gt;
* monitor your job: [[JUSTUS2/Slurm#Monitoring_Your_Jobs]]&lt;br /&gt;
== Running Your Own Calculations ==&lt;br /&gt;
* transfer your own data to the cluster: [[Data Transfer]] &lt;br /&gt;
* adapt the sample job script to run your own job&lt;br /&gt;
&lt;br /&gt;
Note that your jobs should not write/read much on the lustre filesystem while the job runs, but either use the ram disk in /tmp or request /scratch if the space of the ram disk isn&#039;t sufficient. The [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_clean-up_or_save_files_before_a_job_times_out.3F| Slurm Howto]] shows how to copy and clean up your data from /tmp or /scratch at the end of the job&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Get access to the cluster ==&lt;br /&gt;
&lt;br /&gt;
Follow the registration process for the bwForCluster. &amp;amp;rarr; [[Registration/bwForCluster|How to Register for a bwForCluster]]&lt;br /&gt;
&lt;br /&gt;
== Login to the cluster ==&lt;br /&gt;
&lt;br /&gt;
Setup service password and 2FA token and login to the cluster. &amp;amp;rarr; [[JUSTUS2/Login|Login JUSTUS2]]&lt;br /&gt;
&lt;br /&gt;
== Transfer your data to the cluster ==&lt;br /&gt;
&lt;br /&gt;
Get familiar with available file systems on the cluster. &amp;amp;rarr; [[Hardware_and_Architecture_(bwForCluster_JUSTUS_2)#Storage_Architecture|File Systems]]&lt;br /&gt;
&lt;br /&gt;
Transfer your data to the cluster using appropriate tools. &amp;amp;rarr; [[Data Transfer|Data Transfer]]&lt;br /&gt;
&lt;br /&gt;
== Find information about installed software and examples ==&lt;br /&gt;
&lt;br /&gt;
Compiler, Libraries and application software are provided as software modules. Learn how to work with software modules. &amp;amp;rarr; [[Software Modules]]&lt;br /&gt;
&amp;lt;!-- Overview of available software modules &amp;amp;rarr; [https://www.bwhpc.de/software.php https://www.bwhpc.de/software.php], select &amp;lt;code&amp;gt;Cluster → bwForCluster JUSTUS 2&amp;lt;/code&amp;gt; --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Submit your application as a batch job ==&lt;br /&gt;
&lt;br /&gt;
Get familiar with the available node types on the cluster. &amp;amp;rarr; [[Hardware_and_Architecture_(bwForCluster_JUSTUS_2)|Hardware and Architecture]]&lt;br /&gt;
&lt;br /&gt;
Submit and monitor your jobs with Slurm commands. &amp;amp;rarr; [[JUSTUS2/Slurm|Batch System Slurm]]&lt;br /&gt;
&lt;br /&gt;
== Acknowledge the cluster ==&lt;br /&gt;
&lt;br /&gt;
Remember to mention the cluster in your publications. &amp;amp;rarr; [[bwForCluster JUSTUS 2  Acknowledgement|Acknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Data_Transfer&amp;diff=11321</id>
		<title>Data Transfer</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Data_Transfer&amp;diff=11321"/>
		<updated>2022-11-09T13:27:49Z</updated>

		<summary type="html">&lt;p&gt;J Salk: /* scp */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Transfer Tools ==&lt;br /&gt;
&lt;br /&gt;
{|class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! rowspan=&amp;quot;2&amp;quot; | Type&lt;br /&gt;
! rowspan=&amp;quot;2&amp;quot; | Software&lt;br /&gt;
! rowspan=&amp;quot;2&amp;quot; | Remarks&lt;br /&gt;
! colspan=&amp;quot;4&amp;quot;  style=&amp;quot;text-align:center&amp;quot; | Executable on&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot;  style=&amp;quot;text-align:center&amp;quot; | Transfer from/to&lt;br /&gt;
|-&lt;br /&gt;
!Local°&lt;br /&gt;
!bwUniCluster&lt;br /&gt;
!bwForCluster&lt;br /&gt;
!www&lt;br /&gt;
!bwHPC cluster&lt;br /&gt;
![[SDS@hd]]&lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;5&amp;quot; | Command-line&lt;br /&gt;
! scp&lt;br /&gt;
| rowspan=&amp;quot;3&amp;quot; | Throughput &amp;lt; 150 MB/s (depending on cipher)&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
! sftp&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
! rsync&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
! rdata&lt;br /&gt;
| Throughput of 350-400 MB/s&lt;br /&gt;
| &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| &lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
! wget&lt;br /&gt;
| Download from http/ftp address only&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|  &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
| rowspan=&amp;quot;2&amp;quot; | Graphical &lt;br /&gt;
! [https://winscp.net/eng/download.php WinSCP]&lt;br /&gt;
| based on SCP/SFTP, Windows only &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
! [https://filezilla-project.org/download.php?show_all=1 FileZilla]&lt;br /&gt;
| based on SFTP&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
| style=&amp;quot;text-align:center&amp;quot; | +&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
° Depending on the installed operating system (OS).&lt;br /&gt;
&lt;br /&gt;
== Linux/Unix/Mac commandline sftp/scp Usage Examples ==&lt;br /&gt;
&lt;br /&gt;
=== sftp===&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;gt; sftp  ka_xy1234@bwfilestorage.lsdf.kit.edu&lt;br /&gt;
Connecting to bwfilestorage.lsdf.kit.edu&lt;br /&gt;
ka_xy1234@bwfilestorage.lsdf.kit.edu&#039;s password: &lt;br /&gt;
sftp&amp;gt; ls&lt;br /&gt;
snapshots&lt;br /&gt;
temp test&lt;br /&gt;
sftp&amp;gt; help&lt;br /&gt;
...&lt;br /&gt;
sftp&amp;gt; put myfile&lt;br /&gt;
sftp&amp;gt; get myfile&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== scp ===&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
&amp;gt; scp mylocalfile ul_xy1234@justus2.uni-ulm.de: # copies to home directory&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
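&lt;br /&gt;
A few more scp variants can be handy (file and directory names are placeholders):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# copy a file from the cluster to the current local directory&lt;br /&gt;
&amp;gt; scp ul_xy1234@justus2.uni-ulm.de:myresult.dat .&lt;br /&gt;
# copy a whole directory recursively to the cluster&lt;br /&gt;
&amp;gt; scp -r mydir ul_xy1234@justus2.uni-ulm.de:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;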
&lt;br /&gt;
== Using SFTP from Windows and Mac graphical clients ==&lt;br /&gt;
&lt;br /&gt;
Windows does not have an SCP/SFTP client installed by default, so one needs to be installed before this protocol can be used.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Example tools:&#039;&#039;&#039;&lt;br /&gt;
* [https://www.openssh.com/ OpenSSH] &lt;br /&gt;
*[https://www.chiark.greenend.org.uk/~sgtatham/putty/download.html Putty suite] (for Windows and Unix)&lt;br /&gt;
*[https://winscp.net/eng/download.php WinSCP] (for Windows)&lt;br /&gt;
*[https://filezilla-project.org/download.php?show_all=1 FileZilla] (for Windows, Mac and Linux)&lt;br /&gt;
*[https://cygwin.com/install.html Cygwin] (for Windows)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Network drive over SFTP:&#039;&#039;&#039;&lt;br /&gt;
*[https://www.southrivertechnologies.com/download/downloadwd.html WebDrive] (for Windows and Mac) &lt;br /&gt;
*[https://www.eldos.com/sftp-net-drive/comparison.php  SFTP Net Drive (ELDOS)] (for Windows)&lt;br /&gt;
*[https://www.netdrive.net/ NetDrive] (for Windows)&lt;br /&gt;
*[https://www.expandrive.com/expandrive ExpanDrive] (for Windows and Mac)&lt;br /&gt;
&lt;br /&gt;
== Best practices ==&lt;br /&gt;
&lt;br /&gt;
=== Ciphers ===&lt;br /&gt;
&lt;br /&gt;
Encrypting all the transferred data via scp/sftp takes time, which can become significant for really large data transfers. &lt;br /&gt;
&lt;br /&gt;
In these cases, you can choose a faster encryption cipher to speed up that part of your data transfer via options to ssh/sftp.&lt;br /&gt;
In our tests, these ciphers achieved the listed transfer speedups over the default. Whether the speedups are noticeable for you depends on the processor type, the network connection and the hard disks used.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!Cipher &lt;br /&gt;
!style=&amp;quot;text-align:left;&amp;quot;| Performance&lt;br /&gt;
|-&lt;br /&gt;
|chacha20-poly1305@openssh.com (default)&lt;br /&gt;
| 100%&lt;br /&gt;
|-&lt;br /&gt;
|aes128-gcm@openssh.com&lt;br /&gt;
|~200%&lt;br /&gt;
|-&lt;br /&gt;
|aes128-ctr&lt;br /&gt;
|~188%&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
With ssh/sshfs you can use different ciphers with the -c option:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ssh -c aes128-gcm@openssh.com&amp;lt;/pre&amp;gt;&lt;br /&gt;
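&lt;br /&gt;
The same -c option also works for scp and sftp, so a large transfer could be sped up like this (the file name is a placeholder):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;scp -c aes128-gcm@openssh.com mybigfile ul_xy1234@justus2.uni-ulm.de:&amp;lt;/pre&amp;gt;&lt;br /&gt;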
&lt;br /&gt;
A list of available ciphers can be obtained with the command&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;ssh -Q cipher&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Login&amp;diff=11320</id>
		<title>JUSTUS2/Login</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=JUSTUS2/Login&amp;diff=11320"/>
		<updated>2022-11-09T13:26:00Z</updated>

		<summary type="html">&lt;p&gt;J Salk: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
= Login to JUSTUS 2 =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Prerequisites:&lt;br /&gt;
&lt;br /&gt;
You should already have&lt;br /&gt;
* followed the 3-step [[Registration]] procedure. &lt;br /&gt;
* [[Registration/bwForCluster/JUSTUS2|created an account]]  at the registration server for JUSTUS2. &lt;br /&gt;
* [[Registration/Password|set a service password]] for JUSTUS2.&lt;br /&gt;
* [[Registration/2FA|set up a time-based one-time password (TOTP)]] for the two factor authentication (2FA) &lt;br /&gt;
* and you should connect from an internet address (IP) that is inside your university network (check your hostname/IP with http://displaymyhostname.com/ )&lt;br /&gt;
&lt;br /&gt;
For the last point: You cannot connect directly from a computer at home to JUSTUS2. You either need to use an on-campus computer or connect to your university via a virtual private network (VPN). Please consult the documentation of your university on how to set up a VPN connection.&lt;br /&gt;
&lt;br /&gt;
== Login to JUSTUS 2 ==&lt;br /&gt;
&lt;br /&gt;
When all prerequisites are fulfilled you can access the bwForCluster JUSTUS 2 for Computational Chemistry and Quantum Sciences via [[ssh]]. Only the secure shell ssh is allowed for login. &lt;br /&gt;
&lt;br /&gt;
From Linux machines, you can log in using &lt;br /&gt;
&lt;br /&gt;
 ssh &amp;lt;UserID&amp;gt;@justus2.uni-ulm.de&lt;br /&gt;
&lt;br /&gt;
During login you must enter the current TOTP value (a 6-digit number) generated by the TOTP app on your smartphone, as well as your service password.&lt;br /&gt;
&lt;br /&gt;
To run graphical applications, you can pass the -X flag to ssh:&lt;br /&gt;
&lt;br /&gt;
 ssh -X &amp;lt;UserID&amp;gt;@justus2.uni-ulm.de&lt;br /&gt;
&lt;br /&gt;
For better performance on slow connections you should use e.g. [[VNC]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The bwForCluster JUSTUS 2 in Ulm has four dedicated login nodes. The selection of the login node is done automatically. If you log in multiple times, different sessions might run on different login nodes.&lt;br /&gt;
&lt;br /&gt;
The names of the four login nodes are justus2-login01.rz.uni-ulm.de, justus2-login02.rz.uni-ulm.de, justus2-login03.rz.uni-ulm.de, justus2-login04.rz.uni-ulm.de. &lt;br /&gt;
&lt;br /&gt;
These names can be used to access a specific login node. In general, however, you should use justus2.uni-ulm.de so that the load can be balanced over the four login nodes.&lt;br /&gt;
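&lt;br /&gt;
If you log in frequently, an entry in your local ~/.ssh/config saves typing. A minimal sketch (the Host alias and the username are placeholders):&lt;br /&gt;
&lt;br /&gt;
 Host justus2&lt;br /&gt;
     HostName justus2.uni-ulm.de&lt;br /&gt;
     User ul_username&lt;br /&gt;
&lt;br /&gt;
With this in place, &amp;lt;code&amp;gt;ssh justus2&amp;lt;/code&amp;gt; is enough.&lt;br /&gt;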
&lt;br /&gt;
== About UserID / Username ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;UserID&amp;gt; in the ssh command is a placeholder composed of a prefix denoting your home organization and your username at that organization. Prefixes and the resulting usernames are as follows:&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;border:3px solid darkgray; margin: 5em auto 5em auto;&amp;quot; width=&amp;quot;60%&amp;quot;&lt;br /&gt;
|- &lt;br /&gt;
!scope=&amp;quot;row&amp;quot; {{Darkgray}} |  Site &lt;br /&gt;
!scope=&amp;quot;row&amp;quot; {{Darkgray}}| Prefix&lt;br /&gt;
!scope=&amp;quot;row&amp;quot; {{Darkgray}}|  Username&lt;br /&gt;
|-&lt;br /&gt;
| Freiburg&lt;br /&gt;
| fr&lt;br /&gt;
| fr_username&lt;br /&gt;
|-&lt;br /&gt;
|Heidelberg&lt;br /&gt;
|hd&lt;br /&gt;
|hd_username&lt;br /&gt;
|-&lt;br /&gt;
|Hohenheim&lt;br /&gt;
|ho&lt;br /&gt;
|ho_username&lt;br /&gt;
|-&lt;br /&gt;
|Karlsruhe&lt;br /&gt;
|ka&lt;br /&gt;
|ka_username&lt;br /&gt;
|-&lt;br /&gt;
|Konstanz&lt;br /&gt;
|kn&lt;br /&gt;
|kn_username&lt;br /&gt;
|-&lt;br /&gt;
|Mannheim&lt;br /&gt;
|ma&lt;br /&gt;
|ma_username&lt;br /&gt;
|-&lt;br /&gt;
|Stuttgart&lt;br /&gt;
|st&lt;br /&gt;
|st_username&lt;br /&gt;
|-&lt;br /&gt;
|Tübingen&lt;br /&gt;
|tu&lt;br /&gt;
|tu_username&lt;br /&gt;
|-&lt;br /&gt;
|Ulm&lt;br /&gt;
|ul&lt;br /&gt;
|ul_username&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Allowed activities on login nodes ==&lt;br /&gt;
&lt;br /&gt;
The login nodes are the access point to the compute system and your $HOME directory. The login nodes are shared with all users of the cluster. Therefore, your activities on the login nodes are limited primarily to setting up your batch jobs. Acceptable activities also include:&lt;br /&gt;
* compilation of your program code and&lt;br /&gt;
* short pre- and postprocessing of your batch jobs.&lt;br /&gt;
&lt;br /&gt;
To guarantee usability for all users of the bwForCluster, you must not run your compute jobs on the login nodes. Compute jobs must be submitted as&lt;br /&gt;
[[BwForCluster_JUSTUS_2_Slurm_HOWTO|Batch Jobs]]. Any compute job running on the login nodes will be terminated without notice.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
= Further reading =&lt;br /&gt;
&lt;br /&gt;
* [[Data Transfer]] - how to get your files on the cluster&lt;br /&gt;
&lt;br /&gt;
* Scientific software is made accessible using the [[Environment Modules]] system&lt;br /&gt;
&lt;br /&gt;
* Compute jobs must be submitted as [[BwForCluster_JUSTUS_2_Slurm_HOWTO|Batch Jobs]]&lt;br /&gt;
&lt;br /&gt;
* Jobs needing disk space will need to request it in their job script. See [[BwForCluster_JUSTUS_2_Slurm_HOWTO#How_to_request_local_scratch_.28SSD.2FNVMe.29_at_job_submission.3F|Batch Jobs - request local scratch]]&lt;br /&gt;
&lt;br /&gt;
* What hardware is available is described in [https://wiki.bwhpc.de/e/Hardware_and_Architecture_(bwForCluster_JUSTUS_2) Hardware and Architecture of bwForCluster JUSTUS 2]&lt;br /&gt;
&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[Category:BwForCluster_JUSTUS_2]][[Category:Access]]&lt;/div&gt;</summary>
		<author><name>J Salk</name></author>
	</entry>
</feed>