<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J+Steuer</id>
	<title>bwHPC Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.bwhpc.de/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J+Steuer"/>
	<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/e/Special:Contributions/J_Steuer"/>
	<updated>2026-04-12T02:25:27Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.39.17</generator>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12815</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12815"/>
		<updated>2024-06-18T10:28:39Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;span style=&amp;quot;font-size:140%;&amp;quot;&amp;gt;&#039;&#039;&#039;Welcome to the bwHPC Wiki.&#039;&#039;&#039;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
bwHPC represents services and resources in the State of &#039;&#039;&#039;B&#039;&#039;&#039;aden-&#039;&#039;&#039;W&#039;&#039;&#039;ürttemberg, Germany, for High Performance Computing (&#039;&#039;&#039;HPC&#039;&#039;&#039;), Data Intensive Computing (&#039;&#039;&#039;DIC&#039;&#039;&#039;) and Large Scale Scientific Data Management (&#039;&#039;&#039;LS2DM&#039;&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
The main bwHPC web page is at [https://www.bwhpc.de/ https://www.bwhpc.de/].&lt;br /&gt;
&lt;br /&gt;
Many topics depend on the cluster system you use. &lt;br /&gt;
First choose the cluster you use,  then select the correct topic.&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot; background:#eeeefe; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Courses / eLearning&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [https://training.bwhpc.de/ eLearning and Online Courses]&lt;br /&gt;
* [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC (external resource)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Need Access to a Cluster?&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | HPC System Specific Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
bwHPC Clusters are dedicated to [https://www.bwhpc.de/bwhpc-domains.php specific research domains].  &lt;br /&gt;
Documentation differs between compute clusters; please see the cluster-specific overview pages:&lt;br /&gt;
{|&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BwUniCluster2.0|bwUniCluster 2.0]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | General Purpose, Teaching&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[:JUSTUS2| bwForCluster JUSTUS 2]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Theoretical Chemistry, Condensed Matter Physics, and Quantum Sciences&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot; | [[Helix|bwForCluster Helix]]&lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  |   Structural and Systems Biology, Medical Science, Soft Matter, Computational Humanities, and Mathematics and Computer Science&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[NEMO|bwForCluster NEMO]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Neurosciences, Particle Physics, Materials Science, and Microsystems Engineering&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BinAC|bwForCluster BinAC]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Bioinformatics, Geosciences and Astrophysics. &lt;br /&gt;
|}&lt;br /&gt;
|-&lt;br /&gt;
|bwHPC Clusters: [https://www.bwhpc.de/cluster.php operational status] &lt;br /&gt;
Further Compute Clusters in Baden-Württemberg (separate access policies):&lt;br /&gt;
* bwHPC tier 1: [https://kb.hlrs.de/platforms/index.php/HPE_Hawk Hawk] ([https://www.hlrs.de/solutions-services/academic-users/ getting access])&lt;br /&gt;
* bwHPC tier 2: [https://www.nhr.kit.edu/userdocs/horeka HoreKa] ([https://www.nhr.kit.edu/userdocs/horeka/projects/ getting access])&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Documentation valid for all Clusters&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Environment Modules| Software Modules]] and software documentation explained&lt;br /&gt;
* [https://www.bwhpc.de/software.html List of Software] on all clusters&lt;br /&gt;
* [[Development| Software Development and Parallel Programming]]&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
* [[HPC Glossary]]&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Scientific Data Storage&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
For user guides of the scientific data storage services:&lt;br /&gt;
* [[SDS@hd]]&lt;br /&gt;
* [https://www.rda.kit.edu/english bwDataArchive]&lt;br /&gt;
* [https://zas.bwsfs.uni-tuebingen.de/info/uebersicht bwSFS]&lt;br /&gt;
Associated, but local scientific storage services are:&lt;br /&gt;
* [https://wiki.scc.kit.edu/lsdf/index.php/Category:LSDF_Online_Storage LSDF Online Storage] (only for KIT and KIT partners)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Data Management&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Data Transfer|Data Transfer]]&lt;br /&gt;
* [https://www.forschungsdaten.org/index.php/FDM-Kontakte#Deutschland Research Data Management (RDM)] contact persons&lt;br /&gt;
* [https://www.forschungsdaten.info Portal for Research Data Management] (Forschungsdaten.info)&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Support is provided by the [https://www.bwhpc.de/teams.php bwHPC Competence Centers]:&lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket]&lt;br /&gt;
* Extended Support via [https://zas.bwhpc.de/en/zas_info_tigerteamsupport.php &amp;quot;Tiger Teams&amp;quot;]&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Acknowledgement&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Please [[Acknowledgement|acknowledge]] our resources in your publications.&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12814</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12814"/>
		<updated>2024-06-18T10:28:15Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;span style=&amp;quot;font-size:140%;&amp;quot;&amp;gt;&#039;&#039;&#039;Welcome to the bwHPC Wiki.&#039;&#039;&#039;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
bwHPC represents services and resources in the State of &#039;&#039;&#039;B&#039;&#039;&#039;aden-&#039;&#039;&#039;W&#039;&#039;&#039;ürttemberg, Germany, for High Performance Computing (&#039;&#039;&#039;HPC&#039;&#039;&#039;), Data Intensive Computing (&#039;&#039;&#039;DIC&#039;&#039;&#039;) and Large Scale Scientific Data Management (&#039;&#039;&#039;LS2DM&#039;&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
The main bwHPC web page is at [https://www.bwhpc.de/ https://www.bwhpc.de/].&lt;br /&gt;
&lt;br /&gt;
Many topics depend on the cluster system you use. &lt;br /&gt;
First choose the cluster you use,  then select the correct topic.&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot; background:#eeeefe; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Courses / eLearning&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [https://training.bwhpc.de/ eLearning and Online Courses]&lt;br /&gt;
* [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC (external resource)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Need Access to a Cluster?&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | HPC System Specific Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
bwHPC Clusters are dedicated to [https://www.bwhpc.de/bwhpc-domains.php specific research domains].  &lt;br /&gt;
Documentation differs between compute clusters; please see the cluster-specific overview pages:&lt;br /&gt;
{|&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BwUniCluster2.0|bwUniCluster 2.0]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | General Purpose, Teaching&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[:JUSTUS2| bwForCluster JUSTUS 2]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Theoretical Chemistry, Condensed Matter Physics, and Quantum Sciences&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot; | [[Helix|bwForCluster Helix]]&lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  |   Structural and Systems Biology, Medical Science, Soft Matter, Computational Humanities, and Mathematics and Computer Science&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[NEMO|bwForCluster NEMO]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Neurosciences, Particle Physics, Materials Science, and Microsystems Engineering&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BinAC|bwForCluster BinAC]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Bioinformatics, Geosciences and Astrophysics. &lt;br /&gt;
|}&lt;br /&gt;
|-&lt;br /&gt;
|bwHPC Clusters: [https://www.bwhpc.de/cluster.php operational status] &lt;br /&gt;
Further Compute Clusters in Baden-Württemberg (separate access policies):&lt;br /&gt;
* bwHPC tier 1: [https://kb.hlrs.de/platforms/index.php/HPE_Hawk Hawk] ([https://www.hlrs.de/solutions-services/academic-users/ getting access])&lt;br /&gt;
* bwHPC tier 2: [https://www.nhr.kit.edu/userdocs/horeka HoreKa] ([https://www.nhr.kit.edu/userdocs/horeka/projects/ getting access])&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Documentation valid for all Clusters&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Environment Modules| Software Modules]] and software documentation explained&lt;br /&gt;
* [https://www.bwhpc.de/software.html List of Software] on all clusters&lt;br /&gt;
* [[Development| Software Development and Parallel Programming]]&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
* [[HPC Glossary]]&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Scientific Data Storage&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
For user guides of the scientific data storage services:&lt;br /&gt;
* [[SDS@hd]]&lt;br /&gt;
* [https://www.rda.kit.edu/english bwDataArchive]&lt;br /&gt;
* [https://zas.bwsfs.uni-tuebingen.de/info/uebersicht bwSFS]&lt;br /&gt;
Associated, but local scientific storage services are:&lt;br /&gt;
* [https://wiki.scc.kit.edu/lsdf/index.php/Category:LSDF_Online_Storage LSDF Online Storage] (only for KIT and KIT partners)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Data Management&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Data Transfer|Data Transfer]]&lt;br /&gt;
* [https://www.forschungsdaten.org/index.php/FDM-Kontakte#Deutschland Research Data Management (RDM)] contact persons&lt;br /&gt;
* [https://www.forschungsdaten.info Portal for Research Data Management] (Forschungsdaten.info)&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Support is provided by the [https://www.bwhpc.de/teams.php bwHPC Competence Centers]:&lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket]&lt;br /&gt;
* Extended Support via [https://zas.bwhpc.de/en/zas_info_tigerteamsupport.php &amp;quot;Tiger Teams&amp;quot;]&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Acknowledgement&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Please [[Acknowledgement|acknowledge]] our resources in your publications.]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12813</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12813"/>
		<updated>2024-06-18T10:27:43Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;span style=&amp;quot;font-size:140%;&amp;quot;&amp;gt;&#039;&#039;&#039;Welcome to the bwHPC Wiki.&#039;&#039;&#039;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
bwHPC represents services and resources in the State of &#039;&#039;&#039;B&#039;&#039;&#039;aden-&#039;&#039;&#039;W&#039;&#039;&#039;ürttemberg, Germany, for High Performance Computing (&#039;&#039;&#039;HPC&#039;&#039;&#039;), Data Intensive Computing (&#039;&#039;&#039;DIC&#039;&#039;&#039;) and Large Scale Scientific Data Management (&#039;&#039;&#039;LS2DM&#039;&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
The main bwHPC web page is at [https://www.bwhpc.de/ https://www.bwhpc.de/].&lt;br /&gt;
&lt;br /&gt;
Many topics depend on the cluster system you use. &lt;br /&gt;
First choose the cluster you use,  then select the correct topic.&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot; background:#eeeefe; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Courses / eLearning&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [https://training.bwhpc.de/ eLearning and Online Courses]&lt;br /&gt;
* [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC (external resource)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Need Access to a Cluster?&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | HPC System Specific Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
bwHPC Clusters are dedicated to [https://www.bwhpc.de/bwhpc-domains.php specific research domains].  &lt;br /&gt;
Documentation differs between compute clusters; please see the cluster-specific overview pages:&lt;br /&gt;
{|&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BwUniCluster2.0|bwUniCluster 2.0]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | General Purpose, Teaching&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[:JUSTUS2| bwForCluster JUSTUS 2]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Theoretical Chemistry, Condensed Matter Physics, and Quantum Sciences&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot; | [[Helix|bwForCluster Helix]]&lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  |   Structural and Systems Biology, Medical Science, Soft Matter, Computational Humanities, and Mathematics and Computer Science&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[NEMO|bwForCluster NEMO]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Neurosciences, Particle Physics, Materials Science, and Microsystems Engineering&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BinAC|bwForCluster BinAC]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Bioinformatics, Geosciences and Astrophysics. &lt;br /&gt;
|}&lt;br /&gt;
|-&lt;br /&gt;
|bwHPC Clusters: [https://www.bwhpc.de/cluster.php operational status] &lt;br /&gt;
Further Compute Clusters in Baden-Württemberg (separate access policies):&lt;br /&gt;
* bwHPC tier 1: [https://kb.hlrs.de/platforms/index.php/HPE_Hawk Hawk] ([https://www.hlrs.de/solutions-services/academic-users/ getting access])&lt;br /&gt;
* bwHPC tier 2: [https://www.nhr.kit.edu/userdocs/horeka HoreKa] ([https://www.nhr.kit.edu/userdocs/horeka/projects/ getting access])&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Documentation valid for all Clusters&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Environment Modules| Software Modules]] and software documentation explained&lt;br /&gt;
* [https://www.bwhpc.de/software.html List of Software] on all clusters&lt;br /&gt;
* [[Development| Software Development and Parallel Programming]]&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
* [[HPC Glossary]]&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Scientific Data Storage&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
For user guides of the scientific data storage services:&lt;br /&gt;
* [[SDS@hd]]&lt;br /&gt;
* [https://www.rda.kit.edu/english bwDataArchive]&lt;br /&gt;
* [https://zas.bwsfs.uni-tuebingen.de/info/uebersicht bwSFS]&lt;br /&gt;
Associated, but local scientific storage services are:&lt;br /&gt;
* [https://wiki.scc.kit.edu/lsdf/index.php/Category:LSDF_Online_Storage LSDF Online Storage] (only for KIT and KIT partners)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Data Management&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Data Transfer|Data Transfer]]&lt;br /&gt;
* [https://www.forschungsdaten.org/index.php/FDM-Kontakte#Deutschland Research Data Management (RDM)] contact persons&lt;br /&gt;
* [https://www.forschungsdaten.info Portal for Research Data Management] (Forschungsdaten.info)&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Support is provided by the [https://www.bwhpc.de/teams.php bwHPC Competence Centers]:&lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket]&lt;br /&gt;
* extended Support via [https://zas.bwhpc.de/en/zas_info_tigerteamsupport.php &amp;quot;Tiger Teams&amp;quot;]&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Acknowledgement&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Please [[Acknowledgement|acknowledge]] our resources in your publications.]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12812</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12812"/>
		<updated>2024-06-18T10:26:51Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to acknowledge our resources in your publications!&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website: https://www.bwhpc.de/user_publications.html&lt;br /&gt;
&lt;br /&gt;
== HPC Clusters ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
== Data Facilities ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[SDS@hd/Acknowledgement| SDS@hd Acknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12811</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12811"/>
		<updated>2024-06-18T10:26:40Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Data Facilities */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to acknowledge our resources in your publications!&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website: https://www.bwhpc.de/user_publications.html&lt;br /&gt;
&lt;br /&gt;
== HPC Cluster ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
== Data Facilities ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[SDS@hd/Acknowledgement| SDS@hd Acknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12810</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12810"/>
		<updated>2024-06-18T10:26:30Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to acknowledge our resources in your publications!&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website: https://www.bwhpc.de/user_publications.html&lt;br /&gt;
&lt;br /&gt;
== HPC Cluster ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
== Data Facilities ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[SDS@hd/Acknowledgement| SDS@hdAcknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12809</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12809"/>
		<updated>2024-06-18T10:26:12Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to acknowledge our resources in your publications!&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website: https://www.bwhpc.de/user_publications.html&lt;br /&gt;
&lt;br /&gt;
== HPC Cluster ==&lt;br /&gt;
&lt;br /&gt;
Cluster-specific information can be found here:&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
== Data Facilities ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[SDS@hd/Acknowledgement| SDS@hdAcknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12808</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12808"/>
		<updated>2024-06-18T10:24:32Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;span style=&amp;quot;font-size:140%;&amp;quot;&amp;gt;&#039;&#039;&#039;Welcome to the bwHPC Wiki.&#039;&#039;&#039;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
bwHPC represents services and resources in the State of &#039;&#039;&#039;B&#039;&#039;&#039;aden-&#039;&#039;&#039;W&#039;&#039;&#039;ürttemberg, Germany, for High Performance Computing (&#039;&#039;&#039;HPC&#039;&#039;&#039;), Data Intensive Computing (&#039;&#039;&#039;DIC&#039;&#039;&#039;) and Large Scale Scientific Data Management (&#039;&#039;&#039;LS2DM&#039;&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
The main bwHPC web page is at [https://www.bwhpc.de/ https://www.bwhpc.de/].&lt;br /&gt;
&lt;br /&gt;
Many topics depend on the cluster system you use. &lt;br /&gt;
First choose the cluster you use,  then select the correct topic.&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot; background:#eeeefe; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Courses / eLearning&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [https://training.bwhpc.de/ eLearning and Online Courses]&lt;br /&gt;
* [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC (external resource)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Need Access to a Cluster?&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | HPC System Specific Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
bwHPC Clusters are dedicated to [https://www.bwhpc.de/bwhpc-domains.php specific research domains].  &lt;br /&gt;
Documentation differs between compute clusters; please see the cluster-specific overview pages:&lt;br /&gt;
{|&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BwUniCluster2.0|bwUniCluster 2.0]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | General Purpose, Teaching&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[:JUSTUS2| bwForCluster JUSTUS 2]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Theoretical Chemistry, Condensed Matter Physics, and Quantum Sciences&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot; | [[Helix|bwForCluster Helix]]&lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  |   Structural and Systems Biology, Medical Science, Soft Matter, Computational Humanities, and Mathematics and Computer Science&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[NEMO|bwForCluster NEMO]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Neurosciences, Particle Physics, Materials Science, and Microsystems Engineering&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BinAC|bwForCluster BinAC]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Bioinformatics, Geosciences and Astrophysics. &lt;br /&gt;
|}&lt;br /&gt;
|-&lt;br /&gt;
|bwHPC Clusters: [https://www.bwhpc.de/cluster.php operational status] &lt;br /&gt;
Further Compute Clusters in Baden-Württemberg (separate access policies):&lt;br /&gt;
* bwHPC tier 1: [https://kb.hlrs.de/platforms/index.php/HPE_Hawk Hawk] ([https://www.hlrs.de/solutions-services/academic-users/ getting access])&lt;br /&gt;
* bwHPC tier 2: [https://www.nhr.kit.edu/userdocs/horeka HoreKa] ([https://www.nhr.kit.edu/userdocs/horeka/projects/ getting access])&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Documentation valid for all Clusters&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Environment Modules| Software Modules]] and software documentation explained&lt;br /&gt;
* [https://www.bwhpc.de/software.html List of Software] on all clusters&lt;br /&gt;
* [[Development| Software Development and Parallel Programming]]&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
* [[HPC Glossary]]&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Scientific Data Storage&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
For user guides of the scientific data storage services:&lt;br /&gt;
* [[SDS@hd]]&lt;br /&gt;
* [https://www.rda.kit.edu/english bwDataArchive]&lt;br /&gt;
* [https://zas.bwsfs.uni-tuebingen.de/info/uebersicht bwSFS]&lt;br /&gt;
Associated, but local scientific storage services are:&lt;br /&gt;
* [https://wiki.scc.kit.edu/lsdf/index.php/Category:LSDF_Online_Storage LSDF Online Storage] (only for KIT and KIT partners)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Data Management&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Data Transfer|Data Transfer]]&lt;br /&gt;
* [https://www.forschungsdaten.org/index.php/FDM-Kontakte#Deutschland Research Data Management (RDM)] contact persons&lt;br /&gt;
* [https://www.forschungsdaten.info Portal for Research Data Management] (Forschungsdaten.info)&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Support is provided by the [https://www.bwhpc.de/teams.php bwHPC Competence Centers]:&lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket]&lt;br /&gt;
* extended Support via [https://zas.bwhpc.de/en/zas_info_tigerteamsupport.php &amp;quot;Tiger Teams&amp;quot;]&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#d1dadf; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Acknowledgement&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Please [[Acknowledgement|acknowledge]] our resources in your publications.]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12807</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Main_Page&amp;diff=12807"/>
		<updated>2024-06-18T10:24:06Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;span style=&amp;quot;font-size:140%;&amp;quot;&amp;gt;&#039;&#039;&#039;Welcome to the bwHPC Wiki.&#039;&#039;&#039;&amp;lt;/span&amp;gt;&lt;br /&gt;
&lt;br /&gt;
bwHPC represents services and resources in the State of &#039;&#039;&#039;B&#039;&#039;&#039;aden-&#039;&#039;&#039;W&#039;&#039;&#039;ürttemberg, Germany, for High Performance Computing (&#039;&#039;&#039;HPC&#039;&#039;&#039;), Data Intensive Computing (&#039;&#039;&#039;DIC&#039;&#039;&#039;) and Large Scale Scientific Data Management (&#039;&#039;&#039;LS2DM&#039;&#039;&#039;).&lt;br /&gt;
&lt;br /&gt;
The main bwHPC web page is at [https://www.bwhpc.de/ https://www.bwhpc.de/].&lt;br /&gt;
&lt;br /&gt;
Many topics depend on the cluster system you use. &lt;br /&gt;
First choose the cluster you use,  then select the correct topic.&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot; background:#eeeefe; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Courses / eLearning&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* [https://training.bwhpc.de/ eLearning and Online Courses]&lt;br /&gt;
* [https://hpc-wiki.info/hpc/Introduction_to_Linux_in_HPC Introduction to Linux in HPC (external resource)]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Need Access to a Cluster?&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Registration]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;  background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | HPC System Specific Documentation&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
bwHPC Clusters are dedicated to [https://www.bwhpc.de/bwhpc-domains.php specific research domains].  &lt;br /&gt;
Documentation differs between compute clusters; please see the cluster-specific overview pages:&lt;br /&gt;
{|&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BwUniCluster2.0|bwUniCluster 2.0]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | General Purpose, Teaching&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[:JUSTUS2| bwForCluster JUSTUS 2]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Theoretical Chemistry, Condensed Matter Physics, and Quantum Sciences&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot; | [[Helix|bwForCluster Helix]]&lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  |   Structural and Systems Biology, Medical Science, Soft Matter, Computational Humanities, and Mathematics and Computer Science&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[NEMO|bwForCluster NEMO]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Neurosciences, Particle Physics, Materials Science, and Microsystems Engineering&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;padding:5px; width:30%&amp;quot;  | [[BinAC|bwForCluster BinAC]] &lt;br /&gt;
| style=&amp;quot;padding-left:20px;&amp;quot;  | Bioinformatics, Geosciences and Astrophysics. &lt;br /&gt;
|}&lt;br /&gt;
|-&lt;br /&gt;
|bwHPC Clusters: [https://www.bwhpc.de/cluster.php operational status] &lt;br /&gt;
Further Compute Clusters in Baden-Württemberg (separate access policies):&lt;br /&gt;
* bwHPC tier 1: [https://kb.hlrs.de/platforms/index.php/HPE_Hawk Hawk] ([https://www.hlrs.de/solutions-services/academic-users/ getting access])&lt;br /&gt;
* bwHPC tier 2: [https://www.nhr.kit.edu/userdocs/horeka HoreKa] ([https://www.nhr.kit.edu/userdocs/horeka/projects/ getting access])&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#cef2e0; font-size:120%; font-weight:bold; text-align:left&amp;quot; | Documentation valid for all Clusters&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Environment Modules| Software Modules]] and software documentation explained&lt;br /&gt;
* [https://www.bwhpc.de/software.html List of Software] on all clusters&lt;br /&gt;
* [[Development| Software Development and Parallel Programming]]&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
* [[HPC Glossary]]&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Scientific Data Storage&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
For user guides of the scientific data storage services:&lt;br /&gt;
* [[SDS@hd]]&lt;br /&gt;
* [https://www.rda.kit.edu/english bwDataArchive]&lt;br /&gt;
* [https://zas.bwsfs.uni-tuebingen.de/info/uebersicht bwSFS]&lt;br /&gt;
Associated, but local scientific storage services are:&lt;br /&gt;
* [https://wiki.scc.kit.edu/lsdf/index.php/Category:LSDF_Online_Storage LSDF Online Storage] (only for KIT and KIT partners)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
{| style=&amp;quot;height:100%; background:#ffeaef; width:100%&amp;quot;&lt;br /&gt;
| style=&amp;quot;padding:8px; background:#f5dfdf; font-size:120%; font-weight:bold;  text-align:left&amp;quot;   | Data Management&lt;br /&gt;
|-&lt;br /&gt;
|&lt;br /&gt;
* [[Data Transfer|Data Transfer]]&lt;br /&gt;
* [https://www.forschungsdaten.org/index.php/FDM-Kontakte#Deutschland Research Data Management (RDM)] contact persons&lt;br /&gt;
* [https://www.forschungsdaten.info Portal for Research Data Management] (Forschungsdaten.info)&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#eeeefe; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#dedefe; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Support&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
Support is provided by the [https://www.bwhpc.de/teams.php bwHPC Competence Centers]:&lt;br /&gt;
* [https://bw-support.scc.kit.edu/ Submit a Ticket]&lt;br /&gt;
* extended Support via [https://zas.bwhpc.de/en/zas_info_tigerteamsupport.php &amp;quot;Tiger Teams&amp;quot;]&lt;br /&gt;
|}&lt;br /&gt;
{| style=&amp;quot;  background:#e6e9eb; width:100%;&amp;quot; &lt;br /&gt;
| style=&amp;quot;padding:8px; background:#e6e9eb; font-size:120%; font-weight:bold;  text-align:left&amp;quot; | Acknowledgement&lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
* Please [[Acknowledgement|acknowledge]] our resources in your publications.]&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12806</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12806"/>
		<updated>2024-06-18T10:19:06Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to acknowledge our resources in your publications!&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
== HPC Cluster ==&lt;br /&gt;
&lt;br /&gt;
Cluster-specific information can be found here:&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
https://www.bwhpc.de/user_publications.html&lt;br /&gt;
&lt;br /&gt;
== Data Facilities ==&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[SDS@hd/Acknowledgement| SDS@hdAcknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12805</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12805"/>
		<updated>2024-06-18T10:18:27Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to acknowledge our resources in your publications!&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
* HPC Cluster &lt;br /&gt;
&lt;br /&gt;
Cluster-specific information can be found here:&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
https://www.bwhpc.de/user_publications.html&lt;br /&gt;
&lt;br /&gt;
# Data Facilities&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[SDS@hd/Acknowledgement| SDS@hdAcknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12804</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12804"/>
		<updated>2024-06-18T10:18:06Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to acknowledge our resources in your publications!&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
# HPC Cluster &lt;br /&gt;
&lt;br /&gt;
Cluster-specific information can be found here:&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
https://www.bwhpc.de/user_publications.html&lt;br /&gt;
&lt;br /&gt;
# Data Facilities&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[DS@hd/Acknowledgement| SDS@hdAcknowledgement]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Slurm&amp;diff=12466</id>
		<title>BwUniCluster2.0/Slurm</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwUniCluster2.0/Slurm&amp;diff=12466"/>
		<updated>2023-11-20T13:35:10Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Slurm Commands (excerpt) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;div id=&amp;quot;top&amp;quot;&amp;gt;&amp;lt;/div&amp;gt;&lt;br /&gt;
=  Slurm HPC Workload Manager = &lt;br /&gt;
== Specification == &lt;br /&gt;
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Any calculation on the compute nodes of [[bwUniCluster 2.0|bwUniCluster 2.0]] requires the user to define it as a single command or a sequence of commands, together with the required run time, number of CPU cores and main memory, and to submit all of this, i.e. the &#039;&#039;&#039;batch job&#039;&#039;&#039;, to a resource and workload management software. On bwUniCluster 2.0 this software is Slurm, so every job submission is performed with Slurm commands. Slurm queues and runs user jobs based on fair-sharing policies.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Slurm Commands (excerpt) ==&lt;br /&gt;
Below are some of the most commonly used Slurm commands for non-administrators working on bwUniCluster 2.0; a short usage sketch follows the table.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Slurm commands !! Brief explanation&lt;br /&gt;
|-&lt;br /&gt;
| [[#Job Submission : sbatch|sbatch]] || Submits a job and queues it in an input queue [[https://slurm.schedmd.com/sbatch.html sbatch]] &lt;br /&gt;
|-&lt;br /&gt;
| [[#Detailed job information : scontrol show job|scontrol show job]] || Displays detailed job state information [[https://slurm.schedmd.com/scontrol.html scontrol]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#List of your submitted jobs : squeue|squeue]] || Displays information about active, eligible, blocked, and/or recently completed jobs [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Start time of job or resources : squeue|squeue --start]] || Returns start time of submitted job or requested resources [[https://slurm.schedmd.com/squeue.html squeue]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Shows free resources : sinfo_t_idle|sinfo_t_idle]] || Shows what resources are available for immediate use [[https://slurm.schedmd.com/sinfo.html sinfo]]&lt;br /&gt;
|-&lt;br /&gt;
| [[#Canceling own jobs : scancel|scancel]] || Cancels a job (obsoleted!) [[https://slurm.schedmd.com/scancel.html scancel]]&lt;br /&gt;
|}&lt;br /&gt;
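&lt;br /&gt;
A minimal usage sketch of the commands above (the job ID 12345 and the script name &#039;&#039;job.sh&#039;&#039; are placeholders):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch job.sh               # submit the batch script; prints the job ID&lt;br /&gt;
$ squeue                      # list your own pending and running jobs&lt;br /&gt;
$ squeue --start              # show estimated start times of pending jobs&lt;br /&gt;
$ scontrol show job 12345     # detailed state information for job 12345&lt;br /&gt;
$ sinfo_t_idle                # show resources that are free for immediate use&lt;br /&gt;
$ scancel 12345               # cancel job 12345&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;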
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
* [https://slurm.schedmd.com/tutorials.html  Slurm Tutorials]&lt;br /&gt;
* [https://slurm.schedmd.com/pdfs/summary.pdf  Slurm command/option summary (2 pages)]&lt;br /&gt;
* [https://slurm.schedmd.com/man_index.html  Slurm Commands]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Job Submission : sbatch ==&lt;br /&gt;
Batch jobs are submitted with the command &#039;&#039;&#039;sbatch&#039;&#039;&#039;. The main purpose of the &#039;&#039;&#039;sbatch&#039;&#039;&#039; command is to specify the resources that are needed to run the job. &#039;&#039;&#039;sbatch&#039;&#039;&#039; will then queue the batch job. However, when the batch job starts depends on the availability of the requested resources and on the fair-share value.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== sbatch Command Parameters ===&lt;br /&gt;
The syntax and use of &#039;&#039;&#039;sbatch&#039;&#039;&#039; can be displayed via:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ man sbatch&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;sbatch&#039;&#039;&#039; options can be used on the command line or in your job script; a combined example header is sketched after the table.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! colspan=&amp;quot;3&amp;quot; | sbatch Options&lt;br /&gt;
|-&lt;br /&gt;
! Command line&lt;br /&gt;
! Script&lt;br /&gt;
! Purpose&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -t &#039;&#039;time&#039;&#039;  or  --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| #SBATCH --time=&#039;&#039;time&#039;&#039;&lt;br /&gt;
| Wall clock time limit.&amp;lt;br&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -N &#039;&#039;count&#039;&#039;  or  --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --nodes=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of nodes to be used.&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -n &#039;&#039;count&#039;&#039;  or  --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of tasks to be launched.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --ntasks-per-node=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Maximum count (&amp;lt;= 28 and &amp;lt;= 40 resp.) of tasks per node.&amp;lt;br&amp;gt;(Replaces the option ppn of MOAB.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -c &#039;&#039;count&#039;&#039; or --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| #SBATCH --cpus-per-task=&#039;&#039;count&#039;&#039;&lt;br /&gt;
| Number of CPUs required per (MPI-)task.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Memory in MegaByte per node.&amp;lt;br&amp;gt;(Default value is 128000 and 96000 MB resp., i.e. you should omit &amp;lt;br&amp;gt; the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039;&lt;br /&gt;
| #SBATCH --mem-per-cpu=&#039;&#039;value_in_MB&#039;&#039; &lt;br /&gt;
| Minimum Memory required per allocated CPU.&amp;lt;br&amp;gt;(Replaces the option pmem of MOAB. You should omit &amp;lt;br&amp;gt; the setting of this option.)&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-type=&#039;&#039;type&#039;&#039;&lt;br /&gt;
| Notify user by email when certain event types occur.&amp;lt;br&amp;gt;Valid type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
| #SBATCH --mail-user=&#039;&#039;mail-address&#039;&#039;&lt;br /&gt;
|  The specified mail-address receives email notification of state&amp;lt;br&amp;gt;changes as defined by --mail-type.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --output=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job output is stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --error=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| File in which job error messages are stored. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -J &#039;&#039;name&#039;&#039; or --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| #SBATCH --job-name=&#039;&#039;name&#039;&#039;&lt;br /&gt;
| Job name.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| #SBATCH --export=[ALL,] &#039;&#039;env-variables&#039;&#039;&lt;br /&gt;
| Identifies which environment variables from the submission &amp;lt;br&amp;gt; environment are propagated to the launched application. Default &amp;lt;br&amp;gt; is ALL. If adding an environment variable to the submission&amp;lt;br&amp;gt; environment is intended, the argument ALL must be added.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -A &#039;&#039;group-name&#039;&#039; or --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| #SBATCH --account=&#039;&#039;group-name&#039;&#039;&lt;br /&gt;
| Charge resources used by this job to the specified group. You may &amp;lt;br&amp;gt; need this option if your account is assigned to more &amp;lt;br&amp;gt; than one group. The project group a job is accounted on is shown &amp;lt;br&amp;gt; behind &amp;quot;Account=&amp;quot; in the output of &amp;quot;scontrol show job&amp;quot;. &lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -p &#039;&#039;queue-name&#039;&#039; or --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| #SBATCH --partition=&#039;&#039;queue-name&#039;&#039;&lt;br /&gt;
| Request a specific queue for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| #SBATCH --reservation=&#039;&#039;reservation-name&#039;&#039;&lt;br /&gt;
| Use a specific reservation for the resource allocation.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C &#039;&#039;LSDF&#039;&#039; or --constraint=&#039;&#039;LSDF&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=LSDF&lt;br /&gt;
| Job constraint LSDF Filesystems.&lt;br /&gt;
|-&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -C &#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039; or --constraint=&#039;&#039;BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&#039;&#039;&lt;br /&gt;
| #SBATCH --constraint=BEEOND (BEEOND_4MDS, BEEOND_MAXMDS)&lt;br /&gt;
| Job constraint BeeOND file system.&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== sbatch --partition  &#039;&#039;queues&#039;&#039; ====&lt;br /&gt;
Queue classes define the maximum resources per queue of the compute system, such as walltime, number of nodes and processes per node. Details can be found here:&lt;br /&gt;
* [[BwUniCluster_2.0_Batch_Queues#sbatch_-p_queue|bwUniCluster 2.0 queue settings]]&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== sbatch Examples ===&lt;br /&gt;
==== Serial Programs ====&lt;br /&gt;
To submit a serial job that runs the script &#039;&#039;&#039;job.sh&#039;&#039;&#039; and that requires 5000 MB of main memory and 10 minutes of wall clock time&lt;br /&gt;
&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p dev_single -n 1 -t 10:00 --mem=5000  job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) add after the initial line of your script &#039;&#039;&#039;job.sh&#039;&#039;&#039; the lines (here with a high memory request):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=10&lt;br /&gt;
#SBATCH --mem=180gb&lt;br /&gt;
#SBATCH --job-name=simple&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
and execute the modified script with the command line option &#039;&#039;--partition=fat&#039;&#039; (with &#039;&#039;--partition=(dev_)single&#039;&#039; at most &#039;&#039;--mem=96gb&#039;&#039; is possible):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=fat job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that sbatch command line options overrule script options.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded Programs ====&lt;br /&gt;
Multithreaded programs operate faster than serial programs on CPUs with multiple cores.&amp;lt;br&amp;gt;&lt;br /&gt;
Moreover, multiple threads of one process share resources such as memory.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Because hyperthreading is switched on on bwUniCluster 2.0, the option --cpus-per-task (-c) must be set to 2*n, if you want to use n threads.&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To submit a batch job called &#039;&#039;OpenMP_Test&#039;&#039; that runs a 40-fold threaded program &#039;&#039;omp_exe&#039;&#039; which requires 6000 MByte of total physical memory and total wall clock time of 40 minutes:&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
a) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p single --export=ALL,OMP_NUM_THREADS=40 -J OpenMP_Test -N 1 -c 80 -t 40 --mem=6000 ./omp_exe&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or&lt;br /&gt;
b) generate the script &#039;&#039;&#039;job_omp.sh&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --cpus-per-task=80&lt;br /&gt;
#SBATCH --time=40:00&lt;br /&gt;
#SBATCH --mem=6000mb   &lt;br /&gt;
#SBATCH --export=ALL,EXECUTABLE=./omp_exe&lt;br /&gt;
#SBATCH -J OpenMP_Test&lt;br /&gt;
&lt;br /&gt;
#Usually you should set&lt;br /&gt;
export KMP_AFFINITY=compact,1,0&lt;br /&gt;
#export KMP_AFFINITY=verbose,compact,1,0 prints messages concerning the supported affinity&lt;br /&gt;
#KMP_AFFINITY Description: https://software.intel.com/en-us/node/524790#KMP_AFFINITY_ENVIRONMENT_VARIABLE&lt;br /&gt;
&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_JOB_CPUS_PER_NODE}/2))&lt;br /&gt;
echo &amp;quot;Executable ${EXECUTABLE} running on ${SLURM_JOB_CPUS_PER_NODE} cores with ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=${EXECUTABLE}&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
When using the Intel compiler, the environment variable KMP_AFFINITY switches on binding of threads to specific cores. If necessary, replace &amp;lt;placeholder&amp;gt; with the required modulefile to enable the OpenMP environment. Then execute the script &#039;&#039;&#039;job_omp.sh&#039;&#039;&#039;, adding the queue class &#039;&#039;single&#039;&#039; as sbatch option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p single job_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Note that sbatch command line options overrule script options, e.g.,&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=single --mem=200 job_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
overwrites the script setting of 6000 MByte with 200 MByte.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== MPI Parallel Programs ====&lt;br /&gt;
MPI parallel programs run faster than serial programs on multi CPU and multi core systems. N-fold spawned processes of the MPI program, i.e., &#039;&#039;&#039;MPI tasks&#039;&#039;&#039;,  run simultaneously and communicate via the Message Passing Interface (MPI) paradigm. MPI tasks do not share memory but can be spawned over different nodes.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Multiple MPI tasks must be launched via &#039;&#039;&#039;mpirun&#039;&#039;&#039;, e.g. 4 MPI tasks of &#039;&#039;my_par_program&#039;&#039;:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun -n 4 my_par_program&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
This command runs 4 MPI tasks of &#039;&#039;my_par_program&#039;&#039; on the node you are logged in to.&lt;br /&gt;
To run this command with a loaded Intel MPI, the environment variable I_MPI_HYDRA_BOOTSTRAP must be unset first ($ unset I_MPI_HYDRA_BOOTSTRAP).&lt;br /&gt;
&lt;br /&gt;
When running MPI parallel programs in a batch job, the interactive environment - particularly the loaded modules - will also be set in the batch job. If you want a defined module environment in your batch job, you have to purge all modules before loading the desired modules. &lt;br /&gt;
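For example, a minimal sketch of enforcing such a defined module environment at the top of a job script (the version string is a placeholder):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Remove all modules inherited from the interactive (submission) environment&lt;br /&gt;
module purge&lt;br /&gt;
# Load only the modules the job actually needs&lt;br /&gt;
module load mpi/openmpi/&amp;lt;placeholder_for_version&amp;gt;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;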
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
===== OpenMPI =====&lt;br /&gt;
&lt;br /&gt;
If you want to run jobs on batch nodes, generate a wrapper script &#039;&#039;job_ompi.sh&#039;&#039; for &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; containing the following lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Use when a defined module environment related to OpenMPI is wished&lt;br /&gt;
module load mpi/openmpi/&amp;lt;placeholder_for_version&amp;gt;&lt;br /&gt;
mpirun --bind-to core --map-by core -report-bindings my_par_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;Attention:&#039;&#039;&#039; Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add mpirun options &#039;&#039;-n &amp;lt;number_of_processes&amp;gt;&#039;&#039; or any other option defining processes or nodes, since Slurm instructs mpirun about the number of processes and node hostnames. &#039;&#039;&#039;ALWAYS&#039;&#039;&#039; use the MPI options &#039;&#039;&#039;&#039;&#039;--bind-to core&#039;&#039;&#039;&#039;&#039; and &#039;&#039;&#039;&#039;&#039;--map-by core|socket|node&#039;&#039;&#039;&#039;&#039;. Please type &#039;&#039;mpirun --help&#039;&#039; for an explanation of the different arguments of the &#039;&#039;--map-by&#039;&#039; option.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Considering 4 OpenMPI tasks on a single node, each requiring 2000 MByte, and running for 1 hour, execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p single -N 1 -n 4 --mem-per-cpu=2000 --time=01:00:00 ./job_ompi.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== Intel MPI =====&lt;br /&gt;
&lt;br /&gt;
Generate a wrapper script for &#039;&#039;&#039;Intel MPI&#039;&#039;&#039;, &#039;&#039;job_impi.sh&#039;&#039; containing the following lines:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Use when a defined module environment related to Intel MPI is wished&lt;br /&gt;
module load mpi/impi/&amp;lt;placeholder_for_version&amp;gt;   &lt;br /&gt;
mpiexec.hydra -bootstrap slurm my_par_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&amp;lt;font color=red&amp;gt;&#039;&#039;&#039;Attention:&#039;&#039;&#039;&amp;lt;/font&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
Do &#039;&#039;&#039;NOT&#039;&#039;&#039; add mpirun options &#039;&#039;-n &amp;lt;number_of_processes&amp;gt;&#039;&#039; or any other option defining processes or nodes, since Slurm instructs mpirun about number of processes and node hostnames.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
To launch and run 200 Intel MPI tasks on 5 nodes (40 tasks per node), with 80 GByte of memory per node and a wall clock time of 5 hours, execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch --partition=multiple -N 5 --ntasks-per-node=40 --mem=80gb -t 300 ./job_impi.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt; &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to use 128 or more nodes, you must also set the environment variable as follows:           &amp;lt;BR&amp;gt;&lt;br /&gt;
export I_MPI_HYDRA_BRANCH_COUNT=-1&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to use the options perhost, ppn or rr, you must additionally set the environment variable I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off.&lt;br /&gt;
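For example, a sketch of &#039;&#039;job_impi.sh&#039;&#039; with these variables set before the mpiexec.hydra call (only add the lines that apply to your job):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
module load mpi/impi/&amp;lt;placeholder_for_version&amp;gt;&lt;br /&gt;
# Only required for jobs using 128 or more nodes&lt;br /&gt;
export I_MPI_HYDRA_BRANCH_COUNT=-1&lt;br /&gt;
# Only required if the options perhost, ppn or rr are used&lt;br /&gt;
export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off&lt;br /&gt;
mpiexec.hydra -bootstrap slurm my_par_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;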
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Multithreaded + MPI parallel Programs ====&lt;br /&gt;
Multithreaded + MPI parallel programs operate faster than serial programs on multi CPUs with multiple cores. All threads of one process share resources such as memory. On the contrary MPI tasks do not share memory but can be spawned over different nodes. &#039;&#039;&#039;Because hyperthreading is switched on on bwUniCluster 2.0, the option --cpus-per-task (-c) must be set to 2*n, if you want to use n threads.&#039;&#039;&#039;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
===== OpenMPI with Multithreading =====&lt;br /&gt;
Multiple MPI tasks using &#039;&#039;&#039;OpenMPI&#039;&#039;&#039; must be launched by the MPI parallel program &#039;&#039;&#039;mpirun&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) the number of threads is defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&#039;&#039;&#039;For OpenMPI&#039;&#039;&#039; a job script to submit a batch job called &#039;&#039;job_ompi_omp.sh&#039;&#039; that runs an MPI program with 4 tasks and a 28-fold threaded program &#039;&#039;ompi_omp_program&#039;&#039;, requiring 3000 MByte of physical memory per thread (with 28 threads per MPI task this is 28*3000 MByte = 84000 MByte per MPI task) and a total wall clock time of 3 hours, looks like:&lt;br /&gt;
&amp;lt;!--b)--&amp;gt;&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=4&lt;br /&gt;
#SBATCH --cpus-per-task=56&lt;br /&gt;
#SBATCH --time=03:00:00&lt;br /&gt;
#SBATCH --mem=83gb    # 84000 MB = 84000/1024 GB = 82.1 GB&lt;br /&gt;
#SBATCH --export=ALL,MPI_MODULE=mpi/openmpi/3.1,EXECUTABLE=./ompi_omp_program&lt;br /&gt;
#SBATCH --output=&amp;quot;parprog_hybrid_%j.out&amp;quot;  &lt;br /&gt;
&lt;br /&gt;
# Use when a defined module environment related to OpenMPI is wished&lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_CPUS_PER_TASK}/2))&lt;br /&gt;
export MPIRUN_OPTIONS=&amp;quot;--bind-to core --map-by socket:PE=${OMP_NUM_THREADS} -report-bindings&amp;quot;&lt;br /&gt;
export NUM_CORES=$((${SLURM_NTASKS}*${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXECUTABLE} running on ${NUM_CORES} cores with ${SLURM_NTASKS} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpirun -n ${SLURM_NTASKS} ${MPIRUN_OPTIONS} ${EXECUTABLE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_ompi_omp.sh&#039;&#039;&#039; by command sbatch:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p multiple ./job_ompi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* With the mpirun option &#039;&#039;--bind-to core&#039;&#039; MPI tasks and OpenMP threads are bound to physical cores.&lt;br /&gt;
* With the option &#039;&#039;--map-by node:PE=&amp;lt;value&amp;gt;&#039;&#039; neighboring MPI tasks will be attached to different nodes and each MPI task is bound to the first core of a node. &amp;lt;value&amp;gt; must be set to ${OMP_NUM_THREADS}.&lt;br /&gt;
* The option &#039;&#039;-report-bindings&#039;&#039; shows the bindings between MPI tasks and physical cores.&lt;br /&gt;
* The mpirun-options &#039;&#039;&#039;--bind-to core&#039;&#039;&#039;, &#039;&#039;&#039;--map-by socket|...|node:PE=&amp;lt;value&amp;gt;&#039;&#039;&#039; should always be used when running a multithreaded MPI program.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===== Intel MPI with Multithreading =====&lt;br /&gt;
Multithreaded + MPI parallel programs operate faster than serial programs on multi CPUs with multiple cores. All threads of one process share resources such as memory. On the contrary MPI tasks do not share memory but can be spawned over different nodes.  &lt;br /&gt;
&lt;br /&gt;
Multiple Intel MPI tasks must be launched by the MPI parallel program &#039;&#039;&#039;mpiexec.hydra&#039;&#039;&#039;. For multithreaded programs based on &#039;&#039;&#039;Open&#039;&#039;&#039; &#039;&#039;&#039;M&#039;&#039;&#039;ulti-&#039;&#039;&#039;P&#039;&#039;&#039;rocessing (OpenMP) number of threads are defined by the environment variable OMP_NUM_THREADS. By default this variable is set to 1 (OMP_NUM_THREADS=1).&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;For Intel MPI&#039;&#039;&#039; a job script to submit a batch job called &#039;&#039;job_impi_omp.sh&#039;&#039; that runs an Intel MPI program with 10 tasks and a 40-fold threaded program &#039;&#039;impi_omp_program&#039;&#039;, requiring 96000 MByte of total physical memory per task and a total wall clock time of 1 hour, looks like: &lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--b)--&amp;gt; &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=10&lt;br /&gt;
#SBATCH --cpus-per-task=80&lt;br /&gt;
#SBATCH --time=60&lt;br /&gt;
#SBATCH --mem=96000&lt;br /&gt;
#SBATCH --export=ALL,MPI_MODULE=mpi/impi,EXE=./impi_omp_program&lt;br /&gt;
#SBATCH --output=&amp;quot;parprog_impi_omp_%j.out&amp;quot;&lt;br /&gt;
&lt;br /&gt;
#If using more than one MPI task per node please set&lt;br /&gt;
export KMP_AFFINITY=compact,1,0&lt;br /&gt;
#export KMP_AFFINITY=verbose,scatter  prints messages concerning the supported affinity &lt;br /&gt;
#KMP_AFFINITY Description: https://software.intel.com/en-us/node/524790#KMP_AFFINITY_ENVIRONMENT_VARIABLE&lt;br /&gt;
&lt;br /&gt;
# Use when a defined module environment related to Intel MPI is wished &lt;br /&gt;
module load ${MPI_MODULE}&lt;br /&gt;
export OMP_NUM_THREADS=$((${SLURM_CPUS_PER_TASK}/2))&lt;br /&gt;
export MPIRUN_OPTIONS=&amp;quot;-binding domain=omp:compact -print-rank-map -envall&amp;quot;&lt;br /&gt;
export NUM_PROCS=$((${SLURM_NTASKS}*${OMP_NUM_THREADS}))&lt;br /&gt;
echo &amp;quot;${EXE} running on ${NUM_PROCS} cores with ${SLURM_NTASKS} MPI-tasks and ${OMP_NUM_THREADS} threads&amp;quot;&lt;br /&gt;
startexe=&amp;quot;mpiexec.hydra -bootstrap slurm ${MPIRUN_OPTIONS} -n ${SLURM_NTASKS} ${EXE}&amp;quot;&lt;br /&gt;
echo $startexe&lt;br /&gt;
exec $startexe&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
Using Intel compiler the environment variable KMP_AFFINITY switches on binding of threads to specific cores. If you only run one MPI task per node please set KMP_AFFINITY=compact,1,0.&lt;br /&gt;
&amp;lt;BR&amp;gt;&lt;br /&gt;
If you want to use 128 or more nodes, you must also set the environment variable as follows:           &amp;lt;BR&amp;gt;&lt;br /&gt;
export I_MPI_HYDRA_BRANCH_COUNT=-1&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
If you want to use the options perhost, ppn or rr, you must additionally set the environment variable I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=off. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Execute the script &#039;&#039;&#039;job_impi_omp.sh&#039;&#039;&#039; by command sbatch:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p multiple ./job_impi_omp.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The mpirun option &#039;&#039;-print-rank-map&#039;&#039; shows the bindings between MPI tasks and nodes (not very beneficial). The option &#039;&#039;-binding&#039;&#039; binds MPI tasks (processes) to a particular processor; &#039;&#039;domain=omp&#039;&#039; means that the domain size is determined by the number of threads. If you choose 2 MPI tasks per node, you should use &#039;&#039;-binding &amp;quot;cell=unit;map=bunch&amp;quot;&#039;&#039;; this binding maps one MPI process to each socket. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Chain jobs ====&lt;br /&gt;
The CPU time requirements of many applications exceed the limits of the job classes. In those situations it is recommended to solve the problem by a job chain. A job chain is a sequence of jobs where each job automatically starts its successor. &lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
############################################&lt;br /&gt;
## simple Slurm submitter script to setup ##&lt;br /&gt;
## a chain of jobs using Slurm            ##&lt;br /&gt;
############################################&lt;br /&gt;
## ver.  : 2018-11-27, KIT, SCC&lt;br /&gt;
&lt;br /&gt;
## Define maximum number of jobs via positional parameter 1, default is 5&lt;br /&gt;
max_nojob=${1:-5}&lt;br /&gt;
&lt;br /&gt;
## Define your jobscript (e.g. &amp;quot;~/chain_job.sh&amp;quot;)&lt;br /&gt;
chain_link_job=${PWD}/chain_job.sh&lt;br /&gt;
&lt;br /&gt;
## Define type of dependency via positional parameter 2, default is &#039;afterok&#039;&lt;br /&gt;
dep_type=&amp;quot;${2:-afterok}&amp;quot;&lt;br /&gt;
## -&amp;gt; List of all dependencies:&lt;br /&gt;
## https://slurm.schedmd.com/sbatch.html&lt;br /&gt;
&lt;br /&gt;
myloop_counter=1&lt;br /&gt;
## Submit loop&lt;br /&gt;
while [ ${myloop_counter} -le ${max_nojob} ] ; do&lt;br /&gt;
   ##&lt;br /&gt;
   ## Differ slurm_opt depending on chain link number&lt;br /&gt;
   if [ ${myloop_counter} -eq 1 ] ; then&lt;br /&gt;
      slurm_opt=&amp;quot;&amp;quot;&lt;br /&gt;
   else&lt;br /&gt;
      slurm_opt=&amp;quot;-d ${dep_type}:${jobID}&amp;quot;&lt;br /&gt;
   fi&lt;br /&gt;
   ##&lt;br /&gt;
   ## Print current iteration number and sbatch command&lt;br /&gt;
   echo &amp;quot;Chain job iteration = ${myloop_counter}&amp;quot;&lt;br /&gt;
   echo &amp;quot;   sbatch --export=myloop_counter=${myloop_counter} ${slurm_opt} ${chain_link_job}&amp;quot;&lt;br /&gt;
   ## Store the job ID for the next iteration by parsing the output of the sbatch command&lt;br /&gt;
   jobID=$(sbatch -p &amp;lt;queue&amp;gt; --export=ALL,myloop_counter=${myloop_counter} ${slurm_opt} ${chain_link_job} 2&amp;gt;&amp;amp;1 | sed &#039;s/[S,a-z]* //g&#039;)&lt;br /&gt;
   ##   &lt;br /&gt;
   ## Check if ERROR occured&lt;br /&gt;
   if [[ &amp;quot;${jobID}&amp;quot; =~ &amp;quot;ERROR&amp;quot; ]] ; then&lt;br /&gt;
      echo &amp;quot;   -&amp;gt; submission failed!&amp;quot; ; exit 1&lt;br /&gt;
   else&lt;br /&gt;
      echo &amp;quot;   -&amp;gt; job number = ${jobID}&amp;quot;&lt;br /&gt;
   fi&lt;br /&gt;
   ##&lt;br /&gt;
   ## Increase counter&lt;br /&gt;
   let myloop_counter+=1&lt;br /&gt;
done&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
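The chain link script &#039;&#039;chain_job.sh&#039;&#039; referenced above is application specific and not shown here; a minimal, hypothetical sketch (resources and the restart logic must be adapted to your program) could look like:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=10&lt;br /&gt;
&lt;br /&gt;
# myloop_counter is exported by the submitter script above&lt;br /&gt;
echo &amp;quot;This is chain link number ${myloop_counter:-1}&amp;quot;&lt;br /&gt;
&lt;br /&gt;
# Run the restartable part of the calculation, e.g. continuing&lt;br /&gt;
# from the checkpoint written by the previous chain link.&lt;br /&gt;
./my_restartable_program&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;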
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== GPU jobs ====&lt;br /&gt;
&lt;br /&gt;
The nodes in the gpu_4 and gpu_8 queues have 4 or 8 NVIDIA Tesla V100 GPUs. Just submitting a job to these queues is not enough to also allocate one or more GPUs; you have to request them with the &amp;quot;--gres=gpu&amp;quot; parameter. You have to specify how many GPUs your job needs, e.g. &amp;quot;--gres=gpu:2&amp;quot; will request two GPUs.&lt;br /&gt;
&lt;br /&gt;
The GPU nodes are shared between multiple jobs if the jobs don&#039;t request all the GPUs in a node and there are enough resources to run more than one job. The individual GPUs are always bound to a single job and will not be shared between different jobs.&lt;br /&gt;
&lt;br /&gt;
a) add after the initial line of your script job.sh the line including the&lt;br /&gt;
information about the GPU usage:&amp;lt;br&amp;gt;   #SBATCH --gres=gpu:2&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=40&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=4000&lt;br /&gt;
#SBATCH --gres=gpu:2&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or b) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p &amp;lt;queue&amp;gt; -n 40 -t 02:00:00 --mem 4000 --gres=gpu:2 job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
If you start an interactive session on one of the GPU nodes, you can use the &amp;quot;nvidia-smi&amp;quot; command to list the GPUs allocated to your job:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ nvidia-smi&lt;br /&gt;
Sun Mar 29 15:20:05 2020       &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |&lt;br /&gt;
|-------------------------------+----------------------+----------------------+&lt;br /&gt;
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |&lt;br /&gt;
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |&lt;br /&gt;
|===============================+======================+======================|&lt;br /&gt;
|   0  Tesla V100-SXM2...  Off  | 00000000:3A:00.0 Off |                    0 |&lt;br /&gt;
| N/A   29C    P0    39W / 300W |      9MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
|   1  Tesla V100-SXM2...  Off  | 00000000:3B:00.0 Off |                    0 |&lt;br /&gt;
| N/A   30C    P0    41W / 300W |      8MiB / 32510MiB |      0%      Default |&lt;br /&gt;
+-------------------------------+----------------------+----------------------+&lt;br /&gt;
                                                                               &lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
| Processes:                                                       GPU Memory |&lt;br /&gt;
|  GPU       PID   Type   Process name                             Usage      |&lt;br /&gt;
|=============================================================================|&lt;br /&gt;
|    0     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
|    1     14228      G   /usr/bin/X                                     8MiB |&lt;br /&gt;
+-----------------------------------------------------------------------------+&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br/&amp;gt;&lt;br /&gt;
In case of using Open MPI, the underlying communication infrastructure (UCX and Open MPI&#039;s BTL) is CUDA-aware.&lt;br /&gt;
However, there may be warnings, e.g. when running&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ module load compiler/gnu/10.3 mpi/openmpi devel/cuda&lt;br /&gt;
$ mpirun -np 2 ./mpi_cuda_app&lt;br /&gt;
--------------------------------------&lt;br /&gt;
WARNING: There are more than one active ports on host &#039;uc2n520&#039;, but the&lt;br /&gt;
default subnet GID prefix was detected on more than one of these&lt;br /&gt;
ports.  If these ports are connected to different physical IB&lt;br /&gt;
networks, this configuration will fail in Open MPI.  This version of&lt;br /&gt;
Open MPI requires that every physically separate IB subnet that is&lt;br /&gt;
used between connected MPI processes must have different subnet ID&lt;br /&gt;
values.&lt;br /&gt;
&lt;br /&gt;
Please see this FAQ entry for more details:&lt;br /&gt;
&lt;br /&gt;
  http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid&lt;br /&gt;
&lt;br /&gt;
NOTE: You can turn off this warning by setting the MCA parameter&lt;br /&gt;
      btl_openib_warn_default_gid_prefix to 0.&lt;br /&gt;
--------------------------------------------------------------------------&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Please run Open MPI&#039;s mpirun using the following command:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl_openib_warn_default_gid_prefix 0 -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
or by disabling the (older) communication layer BTL (Byte Transfer Layer) altogether:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ mpirun --mca pml ucx --mca btl ^openib -np 2 ./mpi_cuda_app&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Please note that CUDA as of v11.4 only supports GCC up to version 10.)&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== LSDF Online Storage ====&lt;br /&gt;
On bwUniCluster 2.0 you can, for special use cases, access the LSDF Online Storage on the HPC cluster nodes. Please request this service separately ([https://www.lsdf.kit.edu/os/storagerequest/ LSDF Storage Request]).&lt;br /&gt;
To mount the LSDF Online Storage on the compute nodes during the job runtime,&lt;br /&gt;
the constraint flag &amp;quot;LSDF&amp;quot; has to be set.  &lt;br /&gt;
&lt;br /&gt;
a) add after the initial line of your script job.sh the line including the&lt;br /&gt;
information about the LSDF Online Storage usage:&amp;lt;br&amp;gt;   #SBATCH --constraint=LSDF&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --time=120&lt;br /&gt;
#SBATCH --mem=200&lt;br /&gt;
#SBATCH --constraint=LSDF&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or b) execute:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ sbatch -p &amp;lt;queue&amp;gt; -n 1 -t 2:00:00 --mem 200 -C LSDF job.sh&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
For the usage of the LSDF Online Storage&lt;br /&gt;
the following environment variables are available: $LSDF, $LSDFPROJECTS, $LSDFHOME.&lt;br /&gt;
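For example, a sketch of copying results into the LSDF Online Storage at the end of a job (&amp;lt;project&amp;gt; is a placeholder for your LSDF project directory):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# Copy the results from the job directory to your LSDF project&lt;br /&gt;
rsync -av ./results/ ${LSDFPROJECTS}/&amp;lt;project&amp;gt;/results/&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;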
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
====BeeOND (BeeGFS On-Demand)====&lt;br /&gt;
&lt;br /&gt;
BeeOND instances are integrated into the prolog and epilog script of the cluster batch system Slurm. It can be used on the exclusive compute nodes during the job runtime with the constraint flag &amp;quot;BEEOND&amp;quot;, &amp;quot;BEEOND_4MDS&amp;quot; or &amp;quot;BEEOND_MAXMDS&amp;quot; ([[BwUniCluster_2.0_Slurm_common_Features#sbatch_Command_Parameters|Slurm Command Parameters]])&lt;br /&gt;
* BEEOND: one metadata server is started on the first node&lt;br /&gt;
* BEEOND_4MDS: 4 metadata servers are started within your job. If you have fewer than 4 nodes, fewer metadata servers are started.&lt;br /&gt;
* BEEOND_MAXMDS: on every node of your job a metadata server for the on-demand file system is started&lt;br /&gt;
&lt;br /&gt;
As a starting point we recommend using the &amp;quot;BEEOND&amp;quot; option. If you are unsure whether this is sufficient for you, feel free to contact the support team.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH ...&lt;br /&gt;
#SBATCH --constraint=BEEOND   # or BEEOND_4MDS or BEEOND_MAXMDS&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
After your job has started, you can find the private on-demand file system in the directory &#039;&#039;&#039;/mnt/odfs/${SLURM_JOB_ID}&#039;&#039;&#039;. The mountpoint comes with five pre-configured directories:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
# For small files (stripe count = 1)&lt;br /&gt;
/mnt/odfs/${SLURM_JOB_ID}/stripe_1&lt;br /&gt;
# Stripe count = 4&lt;br /&gt;
/mnt/odfs/${SLURM_JOB_ID}/stripe_default &lt;br /&gt;
# or &lt;br /&gt;
/mnt/odfs/${SLURM_JOB_ID}/stripe_4&lt;br /&gt;
# Stripe count = 8, 16 or 32; use these directories for medium sized and large files or when using MPI-IO&lt;br /&gt;
/mnt/odfs/${SLURM_JOB_ID}/stripe_8&lt;br /&gt;
/mnt/odfs/${SLURM_JOB_ID}/stripe_16 &lt;br /&gt;
# or &lt;br /&gt;
/mnt/odfs/${SLURM_JOB_ID}/stripe_32&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If you request fewer nodes than the stripe count, the stripe count will be reduced to the number of nodes. For example, if you only request 8 nodes, the directory stripe_16 only has a stripe count of 8.&lt;br /&gt;
&lt;br /&gt;
; &amp;lt;font color=red&amp;gt;&#039;&#039;&#039;Attention:&#039;&#039;&#039;&amp;lt;/font&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
:Be careful when creating large files: always use the directory with the maximum stripe count for large files.&lt;br /&gt;
:For example, if your largest file is 1.1 TByte, you have to use a stripe count of at least 2, &lt;br /&gt;
:otherwise the capacity provided by a single node (750 GByte, see below) is exceeded.  &lt;br /&gt;
&lt;br /&gt;
The capacity of the private file system depends on the number of nodes. For each node you get 750 GByte.&lt;br /&gt;
If you request 100 nodes for your job, the private file system has a capacity of approx. 100 * 750 GByte = 75 TByte.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Possible optimization:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
A possible optimization is to give the private file system its own metadata server. With the constraint BEEOND the metadata server is started on the first node. Depending on your application, the metadata server could consume a considerable amount of CPU power. In this case, adding an extra node to your job could improve the performance of the on-demand file system and the total runtime of your application. In order to use this option, start your application with the MPI option:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
mpirun -nolocal myapplication&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
With the -nolocal option the node where mpirun is initiated is not used for your application. This node is fully available for the metadata server of your requested on-demand file system.&lt;br /&gt;
&lt;br /&gt;
Example job script:&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# Very simple example on how to use a private on-demand file system.&lt;br /&gt;
#SBATCH -N 10&lt;br /&gt;
#SBATCH --constraint=BEEOND&lt;br /&gt;
&lt;br /&gt;
# Create a workspace. &lt;br /&gt;
ws_allocate myresults-${SLURM_JOB_ID} 90&lt;br /&gt;
RESULTDIR=$(ws_find myresults-${SLURM_JOB_ID})&lt;br /&gt;
&lt;br /&gt;
# Set ENV variable to on-demand file system.&lt;br /&gt;
ODFSDIR=/mnt/odfs/${SLURM_JOB_ID}/stripe_16/&lt;br /&gt;
&lt;br /&gt;
# Start application and write results to on-demand file system.&lt;br /&gt;
mpirun -nolocal myapplication -o $ODFSDIR/results&lt;br /&gt;
&lt;br /&gt;
# Copy back data after your job application end.&lt;br /&gt;
rsync -av $ODFSDIR/results $RESULTDIR&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Start time of job or resources : squeue --start ==&lt;br /&gt;
The command can be used by any user to display the estimated start time of a job, based on a number of analysis types: historical usage, the earliest available reservable resources, and the priority based backlog. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by &#039;&#039;&#039;any user&#039;&#039;&#039;. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
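=== Example ===&lt;br /&gt;
Display the estimated start times of your own pending jobs (the exact output columns may differ slightly between Slurm versions):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue --start&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;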
&lt;br /&gt;
== List of your submitted jobs : squeue ==&lt;br /&gt;
Displays information about YOUR active, pending and/or recently completed jobs. The command squeue is explained in detail on the webpage https://slurm.schedmd.com/squeue.html or via manpage (man squeue).&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be run by any user.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Flags ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Flag !! Description&lt;br /&gt;
|-&lt;br /&gt;
| -l, --long&lt;br /&gt;
| Report more of the available information for the selected jobs or job steps, subject to any constraints specified.&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Examples ===&lt;br /&gt;
&#039;&#039;squeue&#039;&#039; example on bwUniCluster 2.0 &amp;lt;small&amp;gt;(Only your own jobs are displayed!)&amp;lt;/small&amp;gt;.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ squeue &lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18088744    single CPV.sbat   ab1234 PD       0:00      1 (Priority)&lt;br /&gt;
          18098414  multiple CPV.sbat   ab1234 PD       0:00      2 (Priority) &lt;br /&gt;
          18090089  multiple CPV.sbat   ab1234  R       2:27      2 uc2n[127-128]&lt;br /&gt;
$ squeue -l&lt;br /&gt;
            JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) &lt;br /&gt;
         18088654    single CPV.sbat   ab1234 COMPLETI       4:29   2:00:00      1 uc2n374&lt;br /&gt;
         18088785    single CPV.sbat   ab1234  PENDING       0:00   2:00:00      1 (Priority)&lt;br /&gt;
         18098414  multiple CPV.sbat   ab1234  PENDING       0:00   2:00:00      2 (Priority)&lt;br /&gt;
         18088683    single CPV.sbat   ab1234  RUNNING       0:14   2:00:00      1 uc2n413  &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* The output of &#039;&#039;squeue&#039;&#039; shows how many jobs of yours are running or pending and how many nodes are in use by your jobs.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Shows free resources : sinfo_t_idle ==&lt;br /&gt;
The Slurm command sinfo is used to view partition and node information for a system running Slurm. It incorporates down time, reservations, and node state information in determining the available backfill window. The sinfo command can only be used by the administrator.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
SCC has prepared a special script (sinfo_t_idle) to find out how many processors are available for immediate use on the system. It is anticipated that users will use this information to submit jobs that meet these criteria and thus obtain quick job turnaround times. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
By default, this command can be used by any user or administrator. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Example ===&lt;br /&gt;
* The following command displays which resources are available for immediate use in each partition.&lt;br /&gt;
&amp;lt;pre&amp;gt;$ sinfo_t_idle&lt;br /&gt;
Partition dev_multiple  :      8 nodes idle&lt;br /&gt;
Partition multiple      :    332 nodes idle&lt;br /&gt;
Partition dev_single    :      4 nodes idle&lt;br /&gt;
Partition single        :     76 nodes idle&lt;br /&gt;
Partition long          :     80 nodes idle&lt;br /&gt;
Partition fat           :      5 nodes idle&lt;br /&gt;
Partition dev_special   :    342 nodes idle&lt;br /&gt;
Partition special       :    342 nodes idle&lt;br /&gt;
Partition dev_multiple_e:      7 nodes idle&lt;br /&gt;
Partition multiple_e    :    335 nodes idle&lt;br /&gt;
Partition gpu_4         :     12 nodes idle&lt;br /&gt;
Partition gpu_8         :      6 nodes idle&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
* For the above example, jobs in all partitions can be run immediately.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Detailed job information : scontrol show job ==&lt;br /&gt;
scontrol show job displays detailed job state information and diagnostic output for all or a specified job of yours. Detailed information is available for active, pending and recently completed jobs. The command scontrol is explained in detail on the webpage https://slurm.schedmd.com/scontrol.html or via manpage (man scontrol). &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of all your jobs in normal mode: scontrol show job&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Display the state of a job with &amp;lt;jobid&amp;gt; in normal mode: scontrol show job &amp;lt;jobid&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Access ===&lt;br /&gt;
* End users can use scontrol show job to view the status of their &#039;&#039;&#039;own jobs&#039;&#039;&#039; only. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Arguments ===&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Option !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;width:12%;&amp;quot; &lt;br /&gt;
| -d&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Detailed mode&lt;br /&gt;
| Example: Display the state with jobid 18089884 in detailed mode. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt;scontrol -d show job 18089884&amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Scontrol show job Example ===&lt;br /&gt;
Here is an example from bwUniCluster 2.0.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
squeue    # show my own jobs (here the userid is replaced!)&lt;br /&gt;
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)&lt;br /&gt;
          18089884  multiple CPV.sbat   bq0742  R      33:44      2 uc2n[165-166]&lt;br /&gt;
&lt;br /&gt;
$&lt;br /&gt;
$ # now, see what&#039;s up with my pending job with jobid 18089884&lt;br /&gt;
$ &lt;br /&gt;
$ scontrol show job 18089884&lt;br /&gt;
&lt;br /&gt;
JobId=18089884 JobName=CPV.sbatch&lt;br /&gt;
   UserId=bq0742(8946) GroupId=scc(12345) MCS_label=N/A&lt;br /&gt;
   Priority=3 Nice=0 Account=kit QOS=normal&lt;br /&gt;
   JobState=RUNNING Reason=None Dependency=(null)&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
   RunTime=00:35:06 TimeLimit=02:00:00 TimeMin=N/A&lt;br /&gt;
   SubmitTime=2020-03-16T14:14:54 EligibleTime=2020-03-16T14:14:54&lt;br /&gt;
   AccrueTime=2020-03-16T14:14:54&lt;br /&gt;
   StartTime=2020-03-16T15:12:51 EndTime=2020-03-16T17:12:51 Deadline=N/A&lt;br /&gt;
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-03-16T15:12:51&lt;br /&gt;
   Partition=multiple AllocNode:Sid=uc2n995:5064&lt;br /&gt;
   ReqNodeList=(null) ExcNodeList=(null)&lt;br /&gt;
   NodeList=uc2n[165-166]&lt;br /&gt;
   BatchHost=uc2n165&lt;br /&gt;
   NumNodes=2 NumCPUs=160 NumTasks=80 CPUs/Task=1 ReqB:S:C:T=0:0:*:1&lt;br /&gt;
   TRES=cpu=160,mem=96320M,node=2,billing=160&lt;br /&gt;
   Socks/Node=* NtasksPerN:B:S:C=40:0:*:1 CoreSpec=*&lt;br /&gt;
   MinCPUsNode=40 MinMemoryCPU=1204M MinTmpDiskNode=0&lt;br /&gt;
   Features=(null) DelayBoot=00:00:00&lt;br /&gt;
   OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)&lt;br /&gt;
   Command=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/CPV.sbatch&lt;br /&gt;
   WorkDir=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin&lt;br /&gt;
   StdErr=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   StdIn=/dev/null&lt;br /&gt;
   StdOut=/pfs/data5/home/kit/scc/bq0742/git/CPV/bin/slurm-18089884.out&lt;br /&gt;
   Power=&lt;br /&gt;
   MailUser=(null) MailType=NONE&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
You can use standard Linux pipe commands to filter the very detailed scontrol show job output.&lt;br /&gt;
* In which state is the job?&lt;br /&gt;
&amp;lt;pre&amp;gt;$ scontrol show job 18089884 | grep -i State&lt;br /&gt;
   JobState=COMPLETED Reason=None Dependency=(null)&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Cancel Slurm Jobs ==&lt;br /&gt;
The scancel command is used to cancel jobs. The command scancel is explained in detail on the webpage https://slurm.schedmd.com/scancel.html or via manpage (man scancel).   &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
=== Canceling own jobs : scancel ===&lt;br /&gt;
scancel is used to signal or cancel jobs, job arrays or job steps. The command is:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel [-i] &amp;lt;job-id&amp;gt;&lt;br /&gt;
$ scancel -t &amp;lt;job_state_name&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Flag !! Default !! Description !! Example&lt;br /&gt;
|- style=&amp;quot;vertical-align:top;&amp;quot;&lt;br /&gt;
| -i, --interactive&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Interactive mode.&lt;br /&gt;
| Cancel the job 987654 interactively. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -i 987654 &amp;lt;/pre&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| -t, --state&lt;br /&gt;
| (n/a)&lt;br /&gt;
| Restrict the scancel operation to jobs in a certain state. &amp;lt;br&amp;gt; &amp;quot;job_state_name&amp;quot; may have a value of either &amp;quot;PENDING&amp;quot;, &amp;quot;RUNNING&amp;quot; or &amp;quot;SUSPENDED&amp;quot;.&lt;br /&gt;
| Cancel all jobs in state &amp;quot;PENDING&amp;quot;. &amp;lt;br&amp;gt; &amp;lt;pre&amp;gt; scancel -t &amp;quot;PENDING&amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
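For example, to cancel all of your own jobs at once you can additionally use the -u/--user flag:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scancel -u $USER&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;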
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Resource Managers =&lt;br /&gt;
=== Batch Job (Slurm) Variables ===&lt;br /&gt;
The following environment variables of Slurm are added to your environment once your job has started&lt;br /&gt;
&amp;lt;small&amp;gt;(only an excerpt of the most important ones)&amp;lt;/small&amp;gt;.&lt;br /&gt;
{| width=750px class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
! Environment !! Brief explanation&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_CPUS_PER_NODE &lt;br /&gt;
| Number of processes per node dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NODELIST &lt;br /&gt;
| List of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_JOB_NUM_NODES &lt;br /&gt;
| Number of nodes dedicated to the job&lt;br /&gt;
|- &lt;br /&gt;
| SLURM_MEM_PER_NODE &lt;br /&gt;
| Memory per node dedicated to the job &lt;br /&gt;
|- &lt;br /&gt;
| SLURM_NPROCS&lt;br /&gt;
| Total number of processes dedicated to the job &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CLUSTER_NAME&lt;br /&gt;
| Name of the cluster executing the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_CPUS_PER_TASK &lt;br /&gt;
| Number of CPUs requested per task&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ACCOUNT&lt;br /&gt;
| Account name &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_ID&lt;br /&gt;
| Job ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_NAME&lt;br /&gt;
| Job Name&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_PARTITION&lt;br /&gt;
| Partition/queue running the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_UID&lt;br /&gt;
| User ID of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_SUBMIT_DIR&lt;br /&gt;
| Job submit folder.  The directory from which sbatch was invoked. &lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_USER&lt;br /&gt;
| User name of the job&#039;s owner&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_RESTART_COUNT&lt;br /&gt;
| Number of times job has restarted&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_PROCID&lt;br /&gt;
| Task ID (MPI rank)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_NTASKS&lt;br /&gt;
| The total number of tasks available for the job&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_ID&lt;br /&gt;
| Job step ID&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_STEP_NUM_TASKS&lt;br /&gt;
| Task count (number of MPI ranks)&lt;br /&gt;
|-&lt;br /&gt;
| SLURM_JOB_CONSTRAINT&lt;br /&gt;
| Job constraints&lt;br /&gt;
|}&lt;br /&gt;
See also:&lt;br /&gt;
* [https://slurm.schedmd.com/sbatch.html#lbAI Slurm input and output environment variables]&lt;br /&gt;
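A small sketch showing how some of these variables can be used inside a job script (the resource requests are only placeholders):&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=4&lt;br /&gt;
#SBATCH --time=10&lt;br /&gt;
&lt;br /&gt;
# These variables are only defined inside the running batch job&lt;br /&gt;
echo &amp;quot;Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) runs on ${SLURM_JOB_NUM_NODES} node(s): ${SLURM_JOB_NODELIST}&amp;quot;&lt;br /&gt;
echo &amp;quot;Partition: ${SLURM_JOB_PARTITION}, tasks: ${SLURM_NTASKS}, submitted from: ${SLURM_SUBMIT_DIR}&amp;quot;&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;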
&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Job Exit Codes ===&lt;br /&gt;
A job&#039;s exit code (also known as exit status, return code and completion code) is captured by SLURM and saved as part of the job record. &lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Any non-zero exit code will be assumed to be a job failure and will result in a Job State of FAILED with a reason of &amp;quot;NonZeroExitCode&amp;quot;.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
The exit code is an 8 bit unsigned number ranging between 0 and 255. While it is possible for a job to return a negative exit code, SLURM will display it as an unsigned value in the 0 - 255 range.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Displaying Exit Codes and Signals ====&lt;br /&gt;
SLURM displays a job&#039;s exit code in the output of the &#039;&#039;&#039;scontrol show job&#039;&#039;&#039; and the sview utility.&lt;br /&gt;
&amp;lt;br&amp;gt; &lt;br /&gt;
When a signal was responsible for a job or step&#039;s termination, the signal number will be displayed after the exit code, delineated by a colon(:).&lt;br /&gt;
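For example, the exit code (and, if present, the terminating signal) of a job can be filtered from the scontrol output:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
$ scontrol show job 18089884 | grep -i exitcode&lt;br /&gt;
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;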
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
==== Submitting Termination Signal ====&lt;br /&gt;
Here is an example of how to &#039;save&#039; a Slurm termination signal in a typical job script.&lt;br /&gt;
&amp;lt;source lang=&amp;quot;bash&amp;quot;&amp;gt;&lt;br /&gt;
[...]&lt;br /&gt;
mpirun  -np &amp;lt;#cores&amp;gt;  &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; ... (options)  2&amp;gt;&amp;amp;1&lt;br /&gt;
# Capture the exit code directly after the mpirun call&lt;br /&gt;
exit_code=$?&lt;br /&gt;
[ &amp;quot;$exit_code&amp;quot; -eq 0 ] &amp;amp;&amp;amp; echo &amp;quot;all clean...&amp;quot; || \&lt;br /&gt;
   echo &amp;quot;Executable &amp;lt;EXE_BIN_DIR&amp;gt;/&amp;lt;executable&amp;gt; finished with exit code ${exit_code}&amp;quot;&lt;br /&gt;
[...]&lt;br /&gt;
&amp;lt;/source&amp;gt;&lt;br /&gt;
* Do not use &#039;&#039;time&#039;&#039; mpirun! The exit code will then be the one returned by the first program (time).&lt;br /&gt;
* You do not need an &#039;&#039;&#039;exit $exit_code&#039;&#039;&#039; in the scripts.&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
----&lt;br /&gt;
[[Category:bwUniCluster 2.0|bwUniCluster 2.0]]&lt;br /&gt;
[[#top|Back to top]]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12363</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12363"/>
		<updated>2023-09-12T13:35:48Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A [[compute cluster]] has several hundred, maybe a thousand computers ([[compute node]]s), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a [[script]] (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;[[batch script]]&#039;&#039; has a header section, in which you specify details like required [[compute core]]s, [[estimated runtime]], [[memory requirements]], disk space needed, etc.&lt;br /&gt;
# &#039;&#039;[[Submit]]&#039;&#039; the script into a [[queue]], where it is registered as a &#039;&#039;[[job]]&#039;&#039; (calculation). &lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
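&lt;br /&gt;
A minimal sketch of such a batch script for a Slurm system (the resource requests and the program call are placeholders and must be adapted to your calculation and to the cluster you use):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --ntasks=1          # number of compute cores&lt;br /&gt;
#SBATCH --time=60           # estimated runtime in minutes&lt;br /&gt;
#SBATCH --mem=2000          # memory requirement in MB&lt;br /&gt;
&lt;br /&gt;
# commands that run the calculation from start to finish&lt;br /&gt;
./my_calculation&lt;br /&gt;
# save the results back to your home directory&lt;br /&gt;
cp results.dat ${HOME}/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;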
&lt;br /&gt;
There are two types of [[batch system]]s currently used on bwHPC clusters, called &amp;quot;[[Moab]]&amp;quot; (legacy installs) and &amp;quot;[[Slurm]]&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== How to Use Computing Resources Efficiently ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When you are running your calculations, you will have to decide on how many compute cores your calculation will run simultaneously. &lt;br /&gt;
For this, your computational problem will have to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
Guidance on finding a reasonable number of compute cores for your calculation can be found under &#039;&#039;&#039;[[Scaling]]&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Information regarding the supported parallel programming paradigms and specific hints on their usage are summarized at &#039;&#039;&#039;[[Parallel_Programming]]&#039;&#039;&#039; &lt;br /&gt;
&lt;br /&gt;
Running calculations on an HPC node consumes a lot of energy. To make the most of the available resources and keep cluster and energy use as efficient as possible, please also see our advice for &#039;&#039;&#039;[[Energy Efficient Cluster Usage]]&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC&lt;br /&gt;
: short for &#039;&#039;&#039;H&#039;&#039;&#039;igh &#039;&#039;&#039;P&#039;&#039;&#039;erformance &#039;&#039;&#039;C&#039;&#039;&#039;omputing &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
: Modern computers can be configured so that one real compute-[[core]] appears like two &amp;quot;logical&amp;quot; cores on the system. These two &amp;quot;hyperthreads&amp;quot; can sometimes do computations in parallel, if the calculations use two different sub-units of the compute-core - but most of the time, two calculations on two hyperthreads run on the same physical hardware and both run half as fast as if one thread had a full core. Some programs (e.g. gromacs) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster if run in that way. &lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
: Multithreading means that one computer program runs calculations on more than one compute-core using several logical &amp;quot;threads&amp;quot; of serial compute instructions to do so (eg. to work through different and independent data arrays in parallel). Specific types of  multithreaded parallelization are [[OpenMP]] or [[MPI]].&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
&lt;br /&gt;
; Moab&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
&lt;br /&gt;
; Slurm&lt;br /&gt;
&lt;br /&gt;
; Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Scheduler&lt;br /&gt;
&lt;br /&gt;
; Submit&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12362</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12362"/>
		<updated>2023-09-12T13:34:05Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* How to Use Computing Ressources Efficiently */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A [[compute cluster]] has several hundred, maybe a thousand computers ([[compute node]]s), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a [[script]] (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;[[batch script]]&#039;&#039; has a header section, in which you specify details like required [[compute core]]s, [[estimated runtime]], [[memory requirements]], disk space needed, etc.&lt;br /&gt;
# &#039;&#039;[[Submit]]&#039;&#039; the script into a [[queue]], where it is registered as a &#039;&#039;[[job]]&#039;&#039; (calculation). &lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of [[batch system]]s currently used on bwHPC clusters, called &amp;quot;[[Moab]]&amp;quot; (legacy installs) and &amp;quot;[[Slurm]]&amp;quot;.&lt;br /&gt;
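&lt;br /&gt;
As an illustration only, a minimal batch script for one of the Slurm systems could look like the following sketch (the resource values, module and program names are placeholders; the exact options depend on the cluster, see the cluster-specific pages below):&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# --- header section: resource requests read by the batch system ---&lt;br /&gt;
#SBATCH --ntasks=1              # number of tasks (processes)&lt;br /&gt;
#SBATCH --cpus-per-task=4       # compute cores per task&lt;br /&gt;
#SBATCH --time=02:00:00         # estimated runtime (walltime limit)&lt;br /&gt;
#SBATCH --mem=8G                # memory requirement&lt;br /&gt;
&lt;br /&gt;
# --- commands to run the calculation from start to finish ---&lt;br /&gt;
module load my_application      # placeholder module name&lt;br /&gt;
my_application input.dat &amp;gt; output.log&lt;br /&gt;
&lt;br /&gt;
# --- save results back to your home directory ---&lt;br /&gt;
cp output.log $HOME/results/&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Such a script would typically be submitted with &amp;lt;code&amp;gt;sbatch jobscript.sh&amp;lt;/code&amp;gt; on the Slurm systems (the legacy Moab systems use &amp;lt;code&amp;gt;msub&amp;lt;/code&amp;gt; instead); see the cluster-specific documentation linked in the next section for the exact commands.&lt;br /&gt;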
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== How to Use Computing Resources Efficiently ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. &lt;br /&gt;
For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described under &#039;&#039;&#039;[[Scaling]]&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Information regarding the supported parallel programming paradigms and specific hints on their usage is summarized at &#039;&#039;&#039;[[Parallel_Programming]]&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
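As a sketch of what this means in practice (assuming a Slurm system; the program name is a placeholder), a multithreaded OpenMP calculation could request its cores in the batch script header and pass the requested number of cores on to the program:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=8       # one process with 8 threads&lt;br /&gt;
&lt;br /&gt;
# let the OpenMP program use exactly the requested cores&lt;br /&gt;
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK&lt;br /&gt;
./my_openmp_program             # placeholder executable&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
An MPI calculation would instead request several tasks (possibly on several nodes) and start the program with &amp;lt;code&amp;gt;srun&amp;lt;/code&amp;gt;; details are given on the &#039;&#039;&#039;[[Parallel_Programming]]&#039;&#039;&#039; page.&lt;br /&gt;
&lt;br /&gt;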
Running calculations on an HPC node consumes a lot of energy. To make the most of the available resources and keep cluster and energy use as efficient as possible, please also see our advice for &#039;&#039;&#039;[[Energy Efficient Cluster Usage]]&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC&lt;br /&gt;
: short for &#039;&#039;&#039;H&#039;&#039;&#039;igh &#039;&#039;&#039;P&#039;&#039;&#039;erformance &#039;&#039;&#039;C&#039;&#039;&#039;omputing &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
: Modern computers can be configured so that one real compute [[core]] appears as two &amp;quot;logical&amp;quot; cores to the system. These two &amp;quot;hyperthreads&amp;quot; can sometimes perform computations in parallel if the calculations use two different sub-units of the core; most of the time, however, two calculations on two hyperthreads share the same physical hardware and each runs about half as fast as it would on a full core. Some programs (e.g. GROMACS) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster that way. &lt;br /&gt;
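: As an illustration (assuming a Slurm system where hyperthreading is enabled; whether this helps is program-dependent, as noted above), a job could ask to use both hardware threads of each core with header options such as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#SBATCH --cpus-per-task=16    # e.g. 8 physical cores with 2 hardware threads each&lt;br /&gt;
#SBATCH --hint=multithread    # allow the job to use both hyperthreads per core&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;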
&lt;br /&gt;
;Multithreading&lt;br /&gt;
: Multithreading means that one computer program runs calculations on more than one compute core, using several logical &amp;quot;threads&amp;quot; of serial compute instructions to do so (e.g. to work through different and independent data arrays in parallel). Typical frameworks for such parallelization are [[OpenMP]] (threads within one node) and [[MPI]] (separate processes, also across nodes).&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
&lt;br /&gt;
; Moab&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
&lt;br /&gt;
; Slurm&lt;br /&gt;
&lt;br /&gt;
; Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Scheduler&lt;br /&gt;
&lt;br /&gt;
; Submit&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12361</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12361"/>
		<updated>2023-09-12T13:32:33Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Link to Batch System per Cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A [[compute cluster]] has several hundred, maybe a thousand computers ([[compute node]]s); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a [[script]] (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;[[batch script]]&#039;&#039; has a header section, in which you specify details like required [[compute core]]s, [[estimated runtime]], [[memory requirements]], disk space needed, etc.&lt;br /&gt;
# &#039;&#039;[[Submit]]&#039;&#039; the script into a [[queue]].&lt;br /&gt;
# Queueing: your &#039;&#039;[[job]]&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of [[batch system]]s currently used on bwHPC clusters, called &amp;quot;[[Moab]]&amp;quot; (legacy installs) and &amp;quot;[[Slurm]]&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== How to Use Computing Resources Efficiently ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described under &#039;&#039;&#039;[[Scaling]]&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Information regarding the supported parallel programming paradigms and specific hints on their usage is summarized at &#039;&#039;&#039;[[Parallel_Programming]]&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Running calculations on an HPC node consumes a lot of energy. To make the most of the available resources and keep cluster and energy use as efficient as possible, please also see our advice for &#039;&#039;&#039;[[Energy Efficient Cluster Usage]]&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC&lt;br /&gt;
: short for &#039;&#039;&#039;H&#039;&#039;&#039;igh &#039;&#039;&#039;P&#039;&#039;&#039;erformance &#039;&#039;&#039;C&#039;&#039;&#039;omputing &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
: Modern computers can be configured so that one real compute [[core]] appears as two &amp;quot;logical&amp;quot; cores to the system. These two &amp;quot;hyperthreads&amp;quot; can sometimes perform computations in parallel if the calculations use two different sub-units of the core; most of the time, however, two calculations on two hyperthreads share the same physical hardware and each runs about half as fast as it would on a full core. Some programs (e.g. GROMACS) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster that way. &lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
: Multithreading means that one computer program runs calculations on more than one compute core, using several logical &amp;quot;threads&amp;quot; of serial compute instructions to do so (e.g. to work through different and independent data arrays in parallel). Typical frameworks for such parallelization are [[OpenMP]] (threads within one node) and [[MPI]] (separate processes, also across nodes).&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
&lt;br /&gt;
; Moab&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
&lt;br /&gt;
; Slurm&lt;br /&gt;
&lt;br /&gt;
; Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Scheduler&lt;br /&gt;
&lt;br /&gt;
; Submit&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12356</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12356"/>
		<updated>2023-09-12T13:09:09Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A compute cluster has several hundred, maybe a thousand computers (compute nodes); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script into a queue.&lt;br /&gt;
# Queueing: your &#039;&#039;job&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC&lt;br /&gt;
: short for &#039;&#039;&#039;H&#039;&#039;&#039;igh &#039;&#039;&#039;P&#039;&#039;&#039;erformance &#039;&#039;&#039;C&#039;&#039;&#039;omputing &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
:A hardware feature by which a single physical CPU core presents itself to the system as two &amp;quot;logical&amp;quot; cores and can execute two execution paths (threads) in parallel, which can improve throughput for some workloads.&lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
:In contrast to hardware hyperthreads, multithreading refers to software threads, i.e. &amp;quot;parallel execution paths&amp;quot; within the same program (e.g. to work through different and independent data arrays in parallel). See OpenMP.&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
&lt;br /&gt;
; Moab&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
&lt;br /&gt;
; Slurm&lt;br /&gt;
&lt;br /&gt;
; Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Scheduler&lt;br /&gt;
&lt;br /&gt;
; Submit&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12355</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12355"/>
		<updated>2023-09-12T13:05:01Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Scaling */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A compute cluster has several hundred, maybe a thousand computers (compute nodes); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script into a queue.&lt;br /&gt;
# Queueing: your &#039;&#039;job&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== How to Use Computing Resources Efficiently ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described under &#039;&#039;&#039;[[Scaling]]&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
Information regarding the supported parallel programming paradigms and specific hints on their usage is summarized at &#039;&#039;&#039;[[Parallel_Programming]]&#039;&#039;&#039;. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Running calculations on an HPC node consumes a lot of energy. To make the most of the available resources and keep cluster and energy use as efficient as possible, please also see our advice for &#039;&#039;&#039;[[Energy Efficient Cluster Usage]]&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC&lt;br /&gt;
: short for &#039;&#039;&#039;H&#039;&#039;&#039;igh &#039;&#039;&#039;P&#039;&#039;&#039;erformance &#039;&#039;&#039;C&#039;&#039;&#039;omputing &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
:A hardware feature by which a single physical CPU core presents itself to the system as two &amp;quot;logical&amp;quot; cores and can execute two execution paths (threads) in parallel, which can improve throughput for some workloads.&lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
:In contrast to hardware hyperthreads, multithreading refers to software threads, i.e. &amp;quot;parallel execution paths&amp;quot; within the same program (e.g. to work through different and independent data arrays in parallel). See OpenMP.&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
&lt;br /&gt;
; Moab&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
&lt;br /&gt;
; Slurm&lt;br /&gt;
&lt;br /&gt;
; Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Scheduler&lt;br /&gt;
&lt;br /&gt;
; Submit&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12354</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12354"/>
		<updated>2023-09-12T13:03:44Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A compute cluster has several hundred, maybe a thousand computers (compute nodes); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script into a queue.&lt;br /&gt;
# Queueing: your &#039;&#039;job&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC&lt;br /&gt;
: short for &#039;&#039;&#039;H&#039;&#039;&#039;igh &#039;&#039;&#039;P&#039;&#039;&#039;erformance &#039;&#039;&#039;C&#039;&#039;&#039;omputing &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
:A hardware feature by which a single physical CPU core presents itself to the system as two &amp;quot;logical&amp;quot; cores and can execute two execution paths (threads) in parallel, which can improve throughput for some workloads.&lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
:In contrast to hardware hyperthreads, multithreading refers to software threads, i.e. &amp;quot;parallel execution paths&amp;quot; within the same program (e.g. to work through different and independent data arrays in parallel). See OpenMP.&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
&lt;br /&gt;
; Moab&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
&lt;br /&gt;
; Slurm&lt;br /&gt;
&lt;br /&gt;
; Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Scheduler&lt;br /&gt;
&lt;br /&gt;
; Submit&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12353</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12353"/>
		<updated>2023-09-12T13:03:33Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A compute cluster has several hundred, maybe a thousand computers (compute nodes); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script into a queue.&lt;br /&gt;
# Queueing: your &#039;&#039;job&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC&lt;br /&gt;
: short for &#039;&#039;&#039;H&#039;&#039;&#039;igh &#039;&#039;&#039;P&#039;&#039;&#039;erformance &#039;&#039;&#039;C&#039;&#039;&#039;omputing &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
:A hardware feature by which a single physical CPU core presents itself to the system as two &amp;quot;logical&amp;quot; cores and can execute two execution paths (threads) in parallel, which can improve throughput for some workloads.&lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
:In contrast to hardware hyperthreads, multithreading refers to software threads, i.e. &amp;quot;parallel execution paths&amp;quot; within the same program (e.g. to work through different and independent data arrays in parallel). See OpenMP.&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
&lt;br /&gt;
; Moab&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
&lt;br /&gt;
; Slurm&lt;br /&gt;
&lt;br /&gt;
; Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Scheduler&lt;br /&gt;
&lt;br /&gt;
; Submit&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12349</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12349"/>
		<updated>2023-09-12T12:13:51Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A compute cluster has several hundred, maybe a thousand computers (compute nodes); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script into a queue.&lt;br /&gt;
# Queueing: your &#039;&#039;job&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
:A hardware feature by which a single physical CPU core presents itself to the system as two &amp;quot;logical&amp;quot; cores and can execute two execution paths (threads) in parallel, which can improve throughput for some workloads.&lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
:In contrast to hardware hyperthreads, multithreading refers to software threads, i.e. &amp;quot;parallel execution paths&amp;quot; within the same program (e.g. to work through different and independent data arrays in parallel). See OpenMP.&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
: incl. Scheduler, Moab, Slurm&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
:Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12348</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12348"/>
		<updated>2023-09-12T12:13:36Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A compute cluster has several hundred, maybe a thousand computers (compute nodes); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script into a queue.&lt;br /&gt;
# Queueing: your &#039;&#039;job&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
;Core&lt;br /&gt;
:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
;Thread&lt;br /&gt;
:Logical unit that can be executed independently. &lt;br /&gt;
&lt;br /&gt;
;Hyperthreading&lt;br /&gt;
:A hardware feature by which a single physical CPU core presents itself to the system as two &amp;quot;logical&amp;quot; cores and can execute two execution paths (threads) in parallel, which can improve throughput for some workloads.&lt;br /&gt;
&lt;br /&gt;
;Multithreading&lt;br /&gt;
:In contrast to hardware hyperthreads, multithreading refers to software threads, i.e. &amp;quot;parallel execution paths&amp;quot; within the same program (e.g. to work through different and independent data arrays in parallel). See OpenMP.&lt;br /&gt;
&lt;br /&gt;
;CPU&lt;br /&gt;
:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
;GPU&lt;br /&gt;
:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
;RAM &lt;br /&gt;
:Random Access Memory. It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
: incl. Scheduler, Moab, Slurm&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
:Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12347</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12347"/>
		<updated>2023-09-12T12:10:17Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, a calculation starts immediately when you launch it, runs until it is finished, and afterwards the machine sits mostly idle until you start the next one. A compute cluster has several hundred, maybe a thousand computers (compute nodes); most of them are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script into a queue.&lt;br /&gt;
# Queueing: your &#039;&#039;job&#039;&#039; (calculation) waits in line with other compute jobs until the resources you requested in the header become available. &lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you will have to decide how many compute cores your calculation will use simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead. &lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:The physical mount on the mainboard into which a CPU (processor package) is placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute two execution paths in parallel, this is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads, i.e. &amp;quot;parallel execution paths&amp;quot; within the same program (e.g. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, e.g. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
: incl. Scheduler, Moab, Slurm&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
:Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; Runtime&lt;br /&gt;
&lt;br /&gt;
; Scaling&lt;br /&gt;
&lt;br /&gt;
; Parallelization&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12346</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12346"/>
		<updated>2023-09-12T12:08:48Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, and then your desktop does mostly nothing until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of which are busy most of the time, and many people want to run a great number of calculations. So running your job involves some extra steps:&lt;br /&gt;
&lt;br /&gt;
# Prepare a script (usually a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc. (see the example below).&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
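&lt;br /&gt;
A minimal sketch of such a batch script for a Slurm system is shown below; the resource values, the module name and the program name are placeholders and differ between clusters and applications. The &#039;&#039;#SBATCH&#039;&#039; lines form the header section described above (here: one task with four cores, two hours of runtime and 8 GB of memory).&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# header section: resource requests read by the batch system&lt;br /&gt;
#SBATCH --job-name=my_calc&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --cpus-per-task=4&lt;br /&gt;
#SBATCH --time=02:00:00&lt;br /&gt;
#SBATCH --mem=8G&lt;br /&gt;
&lt;br /&gt;
# commands that run the calculation&lt;br /&gt;
module load my_software         # load the required software environment (placeholder name)&lt;br /&gt;
./my_calculation input.dat      # run the actual calculation&lt;br /&gt;
cp results.dat $HOME/results/   # save results back to the home directory&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
Such a script is handed to the batch system with &amp;lt;code&amp;gt;sbatch&amp;lt;/code&amp;gt;; the exact options are described on the cluster-specific pages linked below.&lt;br /&gt;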
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
: incl. Scheduler, Moab, Slurm&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
:Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;br /&gt;
&lt;br /&gt;
; runtime&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12345</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12345"/>
		<updated>2023-09-12T12:07:51Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a script (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;br /&gt;
&lt;br /&gt;
;Batch System&lt;br /&gt;
: incl. Scheduler, Moab, Slurm&lt;br /&gt;
&lt;br /&gt;
; Script &lt;br /&gt;
:Shell Script / Bash&lt;br /&gt;
&lt;br /&gt;
; Job&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12344</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12344"/>
		<updated>2023-09-12T11:42:09Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a script (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
;Node&lt;br /&gt;
:An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
;Socket&lt;br /&gt;
:Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12343</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12343"/>
		<updated>2023-09-12T11:41:34Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a script (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Node&#039;&#039;&#039;: An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Socket&#039;&#039;&#039;: Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12342</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12342"/>
		<updated>2023-09-12T11:41:24Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a script (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Node&#039;&#039;&#039;: An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Socket&#039;&#039;&#039;: Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12341</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12341"/>
		<updated>2023-09-12T11:41:12Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a script (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Node&#039;&#039;&#039;: An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Socket&#039;&#039;&#039;: Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12340</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12340"/>
		<updated>2023-09-12T11:39:49Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* HPC Glossary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a script (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
;HPC Cluster&lt;br /&gt;
:Collection of compute nodes with (usually) high bandwidth and low latency communication. &lt;br /&gt;
:They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Node&#039;&#039;&#039;: An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Socket&#039;&#039;&#039;: Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12339</id>
		<title>Running Calculations</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Running_Calculations&amp;diff=12339"/>
		<updated>2023-09-12T11:37:39Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
[[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]&lt;br /&gt;
On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:&lt;br /&gt;
&lt;br /&gt;
# prepare a script (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this &#039;&#039;batch script&#039;&#039; has a header section, in which you specify details like required compute cores, estimated runtime, memory requirements, disk space needed, etc.&lt;br /&gt;
# &#039;&#039;Submit&#039;&#039; the script to a queue; from this point on, your calculation is called a &#039;&#039;job&#039;&#039;.&lt;br /&gt;
# The job is queued and waits in line with other compute jobs until the resources you requested in the header become available.&lt;br /&gt;
# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit. &lt;br /&gt;
# Save results: At the end of your script, include commands to save the calculation results back to your home directory.&lt;br /&gt;
&lt;br /&gt;
There are two types of batch systems currently used on bwHPC clusters, called &amp;quot;Moab&amp;quot; (legacy installs) and &amp;quot;Slurm&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
== Link to Batch System per Cluster ==&lt;br /&gt;
&lt;br /&gt;
Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:&lt;br /&gt;
&lt;br /&gt;
* Slurm systems&lt;br /&gt;
**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]&lt;br /&gt;
** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]&lt;br /&gt;
** [[Helix/Slurm   | Slurm Helix]]&lt;br /&gt;
* Moab systems (legacy systems with deprecated queuing system)&lt;br /&gt;
** [[NEMO/Moab|Moab NEMO specific information]]&lt;br /&gt;
** [[BinAC/Moab|Moab BinAC specific information]]&lt;br /&gt;
&lt;br /&gt;
== Scaling ==&lt;br /&gt;
&lt;br /&gt;
When you run your calculations, you have to decide on how many compute cores your calculation will run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.&lt;br /&gt;
&lt;br /&gt;
How to find a reasonable number of compute cores for your calculation is described on the page&lt;br /&gt;
* [[Scaling]]&lt;br /&gt;
&lt;br /&gt;
== Energy Efficiency ==&lt;br /&gt;
&lt;br /&gt;
Please also see our advice for&lt;br /&gt;
* [[Energy Efficient Cluster Usage]]&lt;br /&gt;
&lt;br /&gt;
== HPC Glossary ==&lt;br /&gt;
&lt;br /&gt;
A short definition of the typical elements of an HPC cluster. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;HPC Cluster&#039;&#039;&#039;: Collection of compute nodes with (usually) high bandwidth and low latency communication. &lt;br /&gt;
They can be accessed via login nodes. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Node&#039;&#039;&#039;: An individual computer with one or more sockets, part of an HPC cluster.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Socket&#039;&#039;&#039;: Physical socket where the CPU capsules are placed.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core&#039;&#039;&#039;: The physical unit that can independently execute tasks on a CPU. &lt;br /&gt;
Modern CPUs generally have multiple cores. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Thread&#039;&#039;&#039;: A logical unit that can be executed independently. &lt;br /&gt;
If the same processor core is configured to execute one or more execution paths in parallel it is called &#039;&#039;&#039;Hyperthreading&#039;&#039;&#039; (a hardware setting). &lt;br /&gt;
Alternatively, there can be software threads that can be understood in the context of &amp;quot;parallel execution paths&amp;quot; within the same program (eg. to work through different and independent data arrays in parallel). &lt;br /&gt;
This is referred to as &#039;&#039;&#039;Multithreading&#039;&#039;&#039;, eg. via OpenMP.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;CPU&#039;&#039;&#039;: Central Processing Unit. &lt;br /&gt;
It performs the actual computation in a compute node. &lt;br /&gt;
A modern CPU is composed of numerous cores and layers of cache.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;GPU&#039;&#039;&#039;: Graphics Processing Unit. &lt;br /&gt;
GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in the fields of Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be designed specifically to use GPUs. &lt;br /&gt;
CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;RAM&#039;&#039;&#039;: Random Access Memory. &lt;br /&gt;
It is used as the working memory for the cores.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12316</id>
		<title>Energy Efficient Cluster Usage</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12316"/>
		<updated>2023-08-21T11:58:23Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* How many and which kind of hardware resources do I require for it */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Energy consumption of data centers has been increasing continuously throughout the last decade. In 2020, the energy consumption of all data centers in Germany amounted to around  [https://www.bundestag.de/resource/blob/863850/423c11968fcb5c9995e9ef9090edf9e6/WD-8-070-21-pdf-data.pdf 3 percent] of the total electricity produced. Accompanying this large energy consumption are large-scale emissions of CO2 to the atmosphere and thus significant contributions to climate change.&lt;br /&gt;
To illustrate this, an average compute job running on a single node for one day may easily consume 10 kWh or even more. That translates roughly to brewing 700 cups of coffee.&lt;br /&gt;
Assuming that a typical bwHPC cluster has a few hundred compute nodes, this amounts to the energy consumption of a village for each cluster. &lt;br /&gt;
&lt;br /&gt;
Although a large amount of this energy consumption is an intrinsic requirement of running large HPC clusters (even when its processors are idle, a cluster uses a lot of energy), efficient use of the available resources is important. Using as many resources as possible does not make you a power user. Using them wisely does.&lt;br /&gt;
In the following, a basic introduction to some of the most important aspects of energy-efficient HPC usage from a user perspective is given. &lt;br /&gt;
&lt;br /&gt;
We can generally distinguish three tasks when optimizing for running HPC jobs efficiently.&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  What do I want to do and why do I need an HPC Cluster for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How many and which kind of hardware resources do I require for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How do I optimize my code to use these resources most efficiently?&lt;br /&gt;
&lt;br /&gt;
= What do I want to do and why do I need an HPC Cluster for it? =&lt;br /&gt;
&lt;br /&gt;
The bwHPC clusters are used to almost full capacity, and running a job on an HPC node consumes a lot of energy, as shown above. &lt;br /&gt;
Therefore, users are requested to run only necessary jobs.&lt;br /&gt;
&lt;br /&gt;
Please consider testing new setups and their output for validity prior to submitting jobs that require lots of resources. This also includes projects where a lot of (smaller) similar jobs are submitted. &lt;br /&gt;
&lt;br /&gt;
Make sure to double-check your jobs prior to submission; having to discard the output data of an HPC project due to faulty input files wastes a lot of computational resources.&lt;br /&gt;
&lt;br /&gt;
Finally, identifying the specific resource requirements for a given job is important for allocating the optimal resources to your compute job, and for deciding whether an HPC cluster is needed at all.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= How many and which kind of hardware resources do I require for it =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Resource allocation is a crucial part of working on an HPC cluster.&lt;br /&gt;
It depends both on the job and on the specific cluster hardware and architecture available.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;A small number of jobs and few resources&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Submit to the scheduler. No extended testing and resource scaling analysis are needed. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Medium-sized projects&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Run only necessary jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Start small: Run your problem on a small set of resources first.&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. See: [[Development#Documentation_in_the_Wiki|Development]].&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  A look at the job feedback can help you determine if you are using the cluster efficiently&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Large projects&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Same approach as for medium-sized projects. &lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Run a scaling analysis for your project with regard to how many resources work best. See: [[Scaling]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Many short jobs&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Handling via the scheduler is inefficient. &lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Simple parallelization by hand is advisable. See: A basic introduction to [[Parallel Programming]].&lt;br /&gt;
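&lt;br /&gt;
A minimal sketch of such hand-made parallelization inside a single batch job is shown below; the program name and the input file pattern are placeholders, and the inputs are assumed to be independent of each other.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
# run several short, independent tasks in parallel within one job&lt;br /&gt;
for input in input_*.dat; do&lt;br /&gt;
    ./short_task $input &amp;amp;   # start each task in the background&lt;br /&gt;
done&lt;br /&gt;
wait                           # continue only after all background tasks have finished&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
The number of tasks started this way should not exceed the number of cores requested for the job; if there are more input files than cores, a tool such as &amp;lt;code&amp;gt;xargs -P&amp;lt;/code&amp;gt; or GNU parallel can limit how many tasks run at the same time.&lt;br /&gt;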
&lt;br /&gt;
= How do I optimize my code to use these resources most efficiently? =&lt;br /&gt;
&lt;br /&gt;
The above recommendations will help you use the cluster resources more efficiently.&lt;br /&gt;
Regarding software development, power efficiency obviously correlates heavily with &#039;&#039;&#039;computing performance&#039;&#039;&#039;, but also with memory usage, i.e. both the amount of memory used and how efficiently it is accessed.&lt;br /&gt;
&lt;br /&gt;
Here, we have gathered a few results based on other research:&lt;br /&gt;
&amp;amp;rarr;  Use an efficient programming language such as Rust, C, or C++ -- in general, any compiled language. Avoid interpreted languages like Perl or Python for the compute-intensive parts. Since Machine Learning is a hot topic, this deserves a few words: any ML code in Python using TensorFlow or other libraries will make heavy use of NumPy and other math packages, which in turn use C-based implementations. Please make sure you use the provided Python modules, which are optimized to use Intel MKL and other mathematical libraries.&lt;br /&gt;
&lt;br /&gt;
Further reading:&lt;br /&gt;
Rui Pereira, et al: &amp;quot;&#039;&#039;Energy efficiency across programming languages: how do energy, time, and memory relate?&#039;&#039;&amp;quot;, SLE 2017: Proc. of the 10th ACM SIGPLAN Int. Conf. on SW Language Eng., Oct. 2017, pp. 256–267, [https://doi.org/10.1145/3136014.3136031 doi:10.1145/3136014.3136031]&lt;br /&gt;
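&lt;br /&gt;
A quick way to check which math libraries a Python installation actually uses (the module name is only an example and differs between clusters) is to load the provided Python module and let NumPy report its build configuration:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load devel/python                          # example module name, see the cluster documentation&lt;br /&gt;
python -c &#039;import numpy; numpy.show_config()&#039;     # lists the BLAS/LAPACK backends, e.g. MKL&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;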
&lt;br /&gt;
&amp;amp;rarr;  Analyse memory access patterns&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  For small tight loops checking for locks, use the &amp;lt;code&amp;gt;pause&amp;lt;/code&amp;gt; instruction.&lt;br /&gt;
&lt;br /&gt;
= Summary: General Recommendations =&lt;br /&gt;
&lt;br /&gt;
* Choose the most &#039;&#039;&#039;efficient algorithms&#039;&#039;&#039; for the given problem&lt;br /&gt;
* Run only &#039;&#039;&#039;necessary&#039;&#039;&#039; jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start &#039;&#039;&#039;small&#039;&#039;&#039;: Run your problem on a small number of parallel entities (be it processes or threads) first.&lt;br /&gt;
* &#039;&#039;&#039;Estimate&#039;&#039;&#039; the runtime of the parallel job as &#039;&#039;&#039;exactly&#039;&#039;&#039; as possible to increase the efficiency of the scheduling of the whole system&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. More information is available on the bwHPC Wiki.&lt;br /&gt;
* A look at the &#039;&#039;&#039;job feedback&#039;&#039;&#039; can help you determine if you are using the cluster efficiently&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12315</id>
		<title>Energy Efficient Cluster Usage</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12315"/>
		<updated>2023-08-21T11:57:58Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* How many and which kind of hardware resources do I require for it */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Energy consumption of data centers has been increasing continuously throughout the last decade. In 2020, the energy consumption of all data centers in Germany amounted to around  [https://www.bundestag.de/resource/blob/863850/423c11968fcb5c9995e9ef9090edf9e6/WD-8-070-21-pdf-data.pdf 3 percent] of the total electricity produced. Accompanying this large energy consumption are large-scale emissions of CO2 to the atmosphere and thus significant contributions to climate change.&lt;br /&gt;
To illustrate this, an average compute job running on a single node for one day may easily consume 10 kWh or even more. That translates roughly to brewing 700 cups of coffee.&lt;br /&gt;
Assuming that a typical bwHPC cluster has a few hundred compute nodes, this amounts to the energy consumption of a village for each cluster. &lt;br /&gt;
&lt;br /&gt;
Although a large amount of this energy consumption is an intrinsic requirement of running large HPC clusters (even when its processors are idle, a cluster uses a lot of energy), efficient use of the available resources is important. Using as many resources as possible does not make you a power user. Using them wisely does.&lt;br /&gt;
In the following, a basic introduction to some of the most important aspects of energy-efficient HPC usage from a user perspective is given. &lt;br /&gt;
&lt;br /&gt;
We can generally distinguish three tasks when optimizing for running HPC jobs efficiently.&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  What do I want to do and why do I need an HPC Cluster for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How many and which kind of hardware resources do I require for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How do I optimize my code to use these resources most efficiently?&lt;br /&gt;
&lt;br /&gt;
= What do I want to do and why do I need an HPC Cluster for it? =&lt;br /&gt;
&lt;br /&gt;
The bwHPC clusters are used to almost full capacity, and running a job on an HPC node consumes a lot of energy, as shown above. &lt;br /&gt;
Therefore, users are requested to run only necessary jobs.&lt;br /&gt;
&lt;br /&gt;
Please consider testing new setups and their output for validity prior to submitting jobs that require lots of resources. This also includes projects where a lot of (smaller) similar jobs are submitted. &lt;br /&gt;
&lt;br /&gt;
Make sure to double-check your jobs prior to submission; having to discard the output data of an HPC project due to faulty input files wastes a lot of computational resources.&lt;br /&gt;
&lt;br /&gt;
Finally, identifying the specific resource requirements for a given job is important for allocating the optimal resources to your compute job, and for deciding whether an HPC cluster is needed at all. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= How many and which kind of hardware resources do I require for it =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Resource allocation is a crucial part of working on an HPC cluster, &lt;br /&gt;
as it depends on both the job and the specific cluster hardware and architecture available. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A small number of jobs and few resources&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Submit to the scheduler. No extended testing and resource scaling analysis are needed. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Medium-sized projects&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Run only necessary jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Start small: Run your problem on a small set of resources first.&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. See: [[Development#Documentation_in_the_Wiki|Development]].&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  A look at the job feedback can help you determine if you are using the cluster efficiently&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Large projects&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Same approach as for medium-sized projects. &lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Run a scaling analysis for your project with regard to how many resources work best. See: [[Scaling]].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Many short jobs&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Handling via the scheduler is inefficient. &lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Simple parallelization by hand is advisable. See: A basic introduction to [[Parallel Programming]].&lt;br /&gt;
&lt;br /&gt;
= How do I optimize my code to use these resources most efficiently? =&lt;br /&gt;
&lt;br /&gt;
The above recommendations will help you use the cluster resources more efficiently.&lt;br /&gt;
Regarding software development, power efficiency obviously correlates heavily with &#039;&#039;&#039;computing performance&#039;&#039;&#039;, but also with memory usage, i.e. both the amount of memory used and how efficiently it is accessed.&lt;br /&gt;
&lt;br /&gt;
Here, we have gathered a few results based on other research:&lt;br /&gt;
&amp;amp;rarr;  Use an efficient programming language such as Rust, C, or C++ -- indeed, any compiled language. Do not use an interpreted language like Perl or Python for the compute-intensive parts. Since Machine Learning is a hot topic, this deserves a few words: Any ML Python code using Tensorflow or other libraries will make heavy use of NumPy and other math packages, which in turn use C-based implementations. Please make sure you use the provided Python modules, which are optimized to use Intel MKL and other mathematical libraries.&lt;br /&gt;
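&lt;br /&gt;
As a quick check (a minimal sketch; the exact module names differ between clusters), you can verify, after loading one of the provided Python modules, that NumPy is linked against an optimized BLAS/LAPACK such as Intel MKL:&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 # Prints the BLAS/LAPACK libraries this NumPy build was linked against.&lt;br /&gt;
 # On the provided modules this should list an optimized library such as&lt;br /&gt;
 # MKL or OpenBLAS; a plain reference BLAS is considerably slower.&lt;br /&gt;
 np.show_config()&lt;br /&gt;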
&lt;br /&gt;
Further reading:&lt;br /&gt;
Rui Pereira, et al: &amp;quot;&#039;&#039;Energy efficiency across programming languages: how do energy, time, and memory relate?&#039;&#039;&amp;quot;, SLE 2017: Proc. of the 10th ACM SIGPLAN Int. Conf. on SW Language Eng., Oct. 2017, pp. 256–267, [https://doi.org/10.1145/3136014.3136031 doi:10.1145/3136014.3136031]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Analyse memory access patterns&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  For small tight loops checking for locks, use the &amp;lt;code&amp;gt;pause&amp;lt;/code&amp;gt; instruction.&lt;br /&gt;
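&lt;br /&gt;
A minimal sketch of such a spin-wait loop (assuming an x86 CPU and a GCC-compatible compiler; the helper name &amp;lt;code&amp;gt;spin_wait&amp;lt;/code&amp;gt; is only illustrative), using the &amp;lt;code&amp;gt;_mm_pause()&amp;lt;/code&amp;gt; intrinsic:&lt;br /&gt;
 #include &amp;lt;immintrin.h&amp;gt;    /* _mm_pause() */&lt;br /&gt;
 #include &amp;lt;stdatomic.h&amp;gt;    /* C11 atomics */&lt;br /&gt;
 /* Spin until the lock flag is released; the pause hint lowers power   */&lt;br /&gt;
 /* draw and pipeline pressure of the waiting core vs. a plain loop.    */&lt;br /&gt;
 static void spin_wait(atomic_int *locked)&lt;br /&gt;
 {&lt;br /&gt;
     while (atomic_load_explicit(locked, memory_order_acquire))&lt;br /&gt;
         _mm_pause();&lt;br /&gt;
 }&lt;br /&gt;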
&lt;br /&gt;
= Summary: General Recommendations =&lt;br /&gt;
&lt;br /&gt;
* Choose the most &#039;&#039;&#039;efficient algorithms&#039;&#039;&#039; for the given problem&lt;br /&gt;
* Run only &#039;&#039;&#039;necessary&#039;&#039;&#039; jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start &#039;&#039;&#039;small&#039;&#039;&#039;: Run your problem on a small number of parallel entities (be it processes or threads) first.&lt;br /&gt;
* &#039;&#039;&#039;Estimate&#039;&#039;&#039; the runtime of the parallel job as &#039;&#039;&#039;accurately&#039;&#039;&#039; as possible to increase the efficiency of the scheduling of the whole system&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. More information is available on the bwHPC Wiki.&lt;br /&gt;
* A look at the &#039;&#039;&#039;job feedback&#039;&#039;&#039; can help you determine if you are using the cluster efficiently&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12314</id>
		<title>Energy Efficient Cluster Usage</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12314"/>
		<updated>2023-08-21T11:57:20Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* How do I optimize my code to use these resources most efficiently? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Energy consumption of data centers has been increasing continuously throughout the last decade. In 2020, the energy consumption of all data centers in Germany amounted to around  [https://www.bundestag.de/resource/blob/863850/423c11968fcb5c9995e9ef9090edf9e6/WD-8-070-21-pdf-data.pdf 3 percent] of the total electricity produced. Accompanying this large energy consumption are large-scale emissions of CO2 to the atmosphere and thus significant contributions to climate change.&lt;br /&gt;
To illustrate this, an average compute job running on a single node for one day may easily consume 10 kWh or even more. That translates roughly to brewing 700 cups of coffee.&lt;br /&gt;
Assuming that a typical bwHPC cluster has a few hundred compute nodes, this amounts to the energy consumption of a village for each cluster. &lt;br /&gt;
&lt;br /&gt;
Although a large amount of this energy consumption is an intrinsic requirement of running large HPC clusters (even when its processors are idle, a cluster uses a lot of energy), efficient use of the available resources is important. Using as many resources as possible does not make you a power user. Using them wisely does.&lt;br /&gt;
In the following, a basic introduction to some of the most important aspects of energy-efficient HPC usage from a user perspective is given. &lt;br /&gt;
&lt;br /&gt;
We can generally distinguish three tasks when optimizing for running HPC jobs efficiently.&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  What do I want to do and why do I need an HPC Cluster for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How many and which kind of hardware resources do I require for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How do I optimize my code to use these resources most efficiently?&lt;br /&gt;
&lt;br /&gt;
= What do I want to do and why do I need an HPC Cluster for it? =&lt;br /&gt;
&lt;br /&gt;
The bwHPC clusters are used to almost full capacity, and running a job on an HPC node consumes a lot of energy, as shown above. &lt;br /&gt;
Therefore, users are requested to run only necessary jobs.&lt;br /&gt;
&lt;br /&gt;
Please consider testing new setups and their output for validity prior to submitting jobs that require lots of resources. This also includes projects where a lot of (smaller) similar jobs are submitted. &lt;br /&gt;
&lt;br /&gt;
Make sure to double-check your jobs prior to submission; having to discard the output data of an HPC project due to faulty input files wastes a lot of computational resources.&lt;br /&gt;
&lt;br /&gt;
Finally, identifying the specific resource requirements for a given job is important for allocating the optimal resources to your compute job, and for deciding whether an HPC cluster is needed at all. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= How many and which kind of hardware resources do I require for it =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Resource allocation is a crucial part of working on an HPC cluster, &lt;br /&gt;
as it depends on both the job and the specific cluster hardware and architecture available. &lt;br /&gt;
&lt;br /&gt;
A small number of jobs and few resources&lt;br /&gt;
* Submit to the scheduler. No extended testing and resource scaling analysis are needed. &lt;br /&gt;
&lt;br /&gt;
Medium-sized projects&lt;br /&gt;
* Run only necessary jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start small: Run your problem on a small set of resources first.&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. See: [[Development#Documentation_in_the_Wiki|Development]].&lt;br /&gt;
* A look at the job feedback can help you determine if you are using the cluster efficiently&lt;br /&gt;
&lt;br /&gt;
Large projects&lt;br /&gt;
* Same approach as for medium-sized projects. &lt;br /&gt;
* Run a scaling analysis for your project with regard to how many resources work best. See: [[Scaling]].&lt;br /&gt;
&lt;br /&gt;
Many short jobs&lt;br /&gt;
* Handling via the scheduler is inefficient. &lt;br /&gt;
* Simple parallelization by hand is advisable. See: A basic introduction to [[Parallel Programming]].&lt;br /&gt;
&lt;br /&gt;
= How do I optimize my code to use these resources most efficiently? =&lt;br /&gt;
&lt;br /&gt;
The above recommendations will help you use the cluster resources more efficiently.&lt;br /&gt;
Regarding software development, power efficiency obviously correlates heavily with &#039;&#039;&#039;computing performance&#039;&#039;&#039;, but also with memory usage, i.e. both the amount of memory used and how efficiently it is accessed.&lt;br /&gt;
&lt;br /&gt;
Here, we have gathered a few results based on other research:&lt;br /&gt;
&amp;amp;rarr;  Use an efficient programming language such as Rust, C, or C++ -- indeed, any compiled language. Do not use an interpreted language like Perl or Python for the compute-intensive parts. Since Machine Learning is a hot topic, this deserves a few words: Any ML Python code using Tensorflow or other libraries will make heavy use of NumPy and other math packages, which in turn use C-based implementations. Please make sure you use the provided Python modules, which are optimized to use Intel MKL and other mathematical libraries.&lt;br /&gt;
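&lt;br /&gt;
As a quick check (a minimal sketch; the exact module names differ between clusters), you can verify, after loading one of the provided Python modules, that NumPy is linked against an optimized BLAS/LAPACK such as Intel MKL:&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 # Prints the BLAS/LAPACK libraries this NumPy build was linked against.&lt;br /&gt;
 # On the provided modules this should list an optimized library such as&lt;br /&gt;
 # MKL or OpenBLAS; a plain reference BLAS is considerably slower.&lt;br /&gt;
 np.show_config()&lt;br /&gt;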
&lt;br /&gt;
Further reading:&lt;br /&gt;
Rui Pereira, et al: &amp;quot;&#039;&#039;Energy efficiency across programming languages: how do energy, time, and memory relate?&#039;&#039;&amp;quot;, SLE 2017: Proc. of the 10th ACM SIGPLAN Int. Conf. on SW Language Eng., Oct. 2017, pp. 256–267, [https://doi.org/10.1145/3136014.3136031 doi:10.1145/3136014.3136031]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  Analyse memory access patterns&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  For small tight loops checking for locks, use the &amp;lt;code&amp;gt;pause&amp;lt;/code&amp;gt; instruction.&lt;br /&gt;
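&lt;br /&gt;
A minimal sketch of such a spin-wait loop (assuming an x86 CPU and a GCC-compatible compiler; the helper name &amp;lt;code&amp;gt;spin_wait&amp;lt;/code&amp;gt; is only illustrative), using the &amp;lt;code&amp;gt;_mm_pause()&amp;lt;/code&amp;gt; intrinsic:&lt;br /&gt;
 #include &amp;lt;immintrin.h&amp;gt;    /* _mm_pause() */&lt;br /&gt;
 #include &amp;lt;stdatomic.h&amp;gt;    /* C11 atomics */&lt;br /&gt;
 /* Spin until the lock flag is released; the pause hint lowers power   */&lt;br /&gt;
 /* draw and pipeline pressure of the waiting core vs. a plain loop.    */&lt;br /&gt;
 static void spin_wait(atomic_int *locked)&lt;br /&gt;
 {&lt;br /&gt;
     while (atomic_load_explicit(locked, memory_order_acquire))&lt;br /&gt;
         _mm_pause();&lt;br /&gt;
 }&lt;br /&gt;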
&lt;br /&gt;
= Summary: General Recommendations =&lt;br /&gt;
&lt;br /&gt;
* Choose the most &#039;&#039;&#039;efficient algorithms&#039;&#039;&#039; for the given problem&lt;br /&gt;
* Run only &#039;&#039;&#039;necessary&#039;&#039;&#039; jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start &#039;&#039;&#039;small&#039;&#039;&#039;: Run your problem on a small number of parallel entities (be it processes or threads) first.&lt;br /&gt;
* &#039;&#039;&#039;Estimate&#039;&#039;&#039; the runtime of the parallel job as &#039;&#039;&#039;accurately&#039;&#039;&#039; as possible to increase the efficiency of the scheduling of the whole system&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. More information is available on the bwHPC Wiki.&lt;br /&gt;
* A look at the &#039;&#039;&#039;job feedback&#039;&#039;&#039; can help you determine if you are using the cluster efficiently&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12313</id>
		<title>Energy Efficient Cluster Usage</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12313"/>
		<updated>2023-08-21T11:57:05Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Energy consumption of data centers has been increasing continuously throughout the last decade. In 2020, the energy consumption of all data centers in Germany amounted to around  [https://www.bundestag.de/resource/blob/863850/423c11968fcb5c9995e9ef9090edf9e6/WD-8-070-21-pdf-data.pdf 3 percent] of the total electricity produced. Accompanying this large energy consumption are large-scale emissions of CO2 to the atmosphere and thus significant contributions to climate change.&lt;br /&gt;
To illustrate this, an average compute job running on a single node for one day may easily consume 10 kWh or even more. That translates roughly to brewing 700 cups of coffee.&lt;br /&gt;
Assuming that a typical bwHPC cluster has a few hundred compute nodes, this amounts to the energy consumption of a village for each cluster. &lt;br /&gt;
&lt;br /&gt;
Although a large amount of this energy consumption is an intrinsic requirement of running large HPC clusters (even when its processors are idle, a cluster uses a lot of energy), efficient use of the available resources is important. Using as many resources as possible does not make you a power user. Using them wisely does.&lt;br /&gt;
In the following, a basic introduction to some of the most important aspects of energy-efficient HPC usage from a user perspective is given. &lt;br /&gt;
&lt;br /&gt;
We can generally distinguish three tasks when optimizing for running HPC jobs efficiently.&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  What do I want to do and why do I need an HPC Cluster for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How many and which kind of hardware resources do I require for it?&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;  How do I optimize my code to use these resources most efficiently?&lt;br /&gt;
&lt;br /&gt;
= What do I want to do and why do I need an HPC Cluster for it? =&lt;br /&gt;
&lt;br /&gt;
The bwHPC clusters are used to almost full capacity, and running a job on an HPC node consumes a lot of energy, as shown above. &lt;br /&gt;
Therefore, users are requested to run only necessary jobs.&lt;br /&gt;
&lt;br /&gt;
Please consider testing new setups and their output for validity prior to submitting jobs that require lots of resources. This also includes projects where a lot of (smaller) similar jobs are submitted. &lt;br /&gt;
&lt;br /&gt;
Make sure to double-check your jobs prior to submission; having to discard the output data of an HPC project due to faulty input files wastes a lot of computational resources.&lt;br /&gt;
&lt;br /&gt;
Finally, identifying the specific resource requirements for a given job is important for allocating the optimal resources to your compute job, and for deciding whether an HPC cluster is needed at all. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= How many and which kind of hardware resources do I require for it =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Resource allocation is a crucial part of working on an HPC cluster, &lt;br /&gt;
as it depends on both the job and the specific cluster hardware and architecture available. &lt;br /&gt;
&lt;br /&gt;
A small number of jobs and few resources&lt;br /&gt;
* Submit to the scheduler. No extended testing and resource scaling analysis are needed. &lt;br /&gt;
&lt;br /&gt;
Medium-sized projects&lt;br /&gt;
* Run only necessary jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start small: Run your problem on a small set of resources first.&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. See: [[Development#Documentation_in_the_Wiki|Development]].&lt;br /&gt;
* A look at the job feedback can help you determine if you are using the cluster efficiently&lt;br /&gt;
&lt;br /&gt;
Large projects&lt;br /&gt;
* Same approach as for medium-sized projects. &lt;br /&gt;
* Run a scaling analysis for your project with regard to how many resources work best. See: [[Scaling]].&lt;br /&gt;
&lt;br /&gt;
Many short jobs&lt;br /&gt;
* Handling via the scheduler is inefficient. &lt;br /&gt;
* Simple parallelization by hand is advisable. See: A basic introduction to [[Parallel Programming]].&lt;br /&gt;
&lt;br /&gt;
= How do I optimize my code to use these resources most efficiently? =&lt;br /&gt;
&lt;br /&gt;
The above recommendations will help you use the cluster resources more efficiently.&lt;br /&gt;
Regarding software development, power efficiency obviously correlates heavily with &#039;&#039;&#039;computing performance&#039;&#039;&#039;, but also with memory usage, i.e. both the amount of memory used and how efficiently it is accessed.&lt;br /&gt;
&lt;br /&gt;
Here, we have gathered a few results based on other research:&lt;br /&gt;
* Use an efficient programming language such as Rust, C, or C++ -- indeed, any compiled language. Do not use an interpreted language like Perl or Python for the compute-intensive parts. Since Machine Learning is a hot topic, this deserves a few words: Any ML Python code using Tensorflow or other libraries will make heavy use of NumPy and other math packages, which in turn use C-based implementations. Please make sure you use the provided Python modules, which are optimized to use Intel MKL and other mathematical libraries.&lt;br /&gt;
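&lt;br /&gt;
As a quick check (a minimal sketch; the exact module names differ between clusters), you can verify, after loading one of the provided Python modules, that NumPy is linked against an optimized BLAS/LAPACK such as Intel MKL:&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 # Prints the BLAS/LAPACK libraries this NumPy build was linked against.&lt;br /&gt;
 # On the provided modules this should list an optimized library such as&lt;br /&gt;
 # MKL or OpenBLAS; a plain reference BLAS is considerably slower.&lt;br /&gt;
 np.show_config()&lt;br /&gt;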
&lt;br /&gt;
Further reading:&lt;br /&gt;
Rui Pereira, et al: &amp;quot;&#039;&#039;Energy efficiency across programming languages: how do energy, time, and memory relate?&#039;&#039;&amp;quot;, SLE 2017: Proc. of the 10th ACM SIGPLAN Int. Conf. on SW Language Eng., Oct. 2017, pp. 256–267, [https://doi.org/10.1145/3136014.3136031 doi:10.1145/3136014.3136031]&lt;br /&gt;
&lt;br /&gt;
* Analyse memory access patterns&lt;br /&gt;
&lt;br /&gt;
* For small tight loops checking for locks, use the &amp;lt;code&amp;gt;pause&amp;lt;/code&amp;gt; instruction.&lt;br /&gt;
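&lt;br /&gt;
A minimal sketch of such a spin-wait loop (assuming an x86 CPU and a GCC-compatible compiler; the helper name &amp;lt;code&amp;gt;spin_wait&amp;lt;/code&amp;gt; is only illustrative), using the &amp;lt;code&amp;gt;_mm_pause()&amp;lt;/code&amp;gt; intrinsic:&lt;br /&gt;
 #include &amp;lt;immintrin.h&amp;gt;    /* _mm_pause() */&lt;br /&gt;
 #include &amp;lt;stdatomic.h&amp;gt;    /* C11 atomics */&lt;br /&gt;
 /* Spin until the lock flag is released; the pause hint lowers power   */&lt;br /&gt;
 /* draw and pipeline pressure of the waiting core vs. a plain loop.    */&lt;br /&gt;
 static void spin_wait(atomic_int *locked)&lt;br /&gt;
 {&lt;br /&gt;
     while (atomic_load_explicit(locked, memory_order_acquire))&lt;br /&gt;
         _mm_pause();&lt;br /&gt;
 }&lt;br /&gt;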
&lt;br /&gt;
= Summary: General Recommendations =&lt;br /&gt;
&lt;br /&gt;
* Choose the most &#039;&#039;&#039;efficient algorithms&#039;&#039;&#039; for the given problem&lt;br /&gt;
* Run only &#039;&#039;&#039;necessary&#039;&#039;&#039; jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start &#039;&#039;&#039;small&#039;&#039;&#039;: Run your problem on a small number of parallel entities (be it processes or threads) first.&lt;br /&gt;
* &#039;&#039;&#039;Estimate&#039;&#039;&#039; the runtime of the parallel job as &#039;&#039;&#039;accurately&#039;&#039;&#039; as possible to increase the efficiency of the scheduling of the whole system&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. More information is available on the bwHPC Wiki.&lt;br /&gt;
* A look at the &#039;&#039;&#039;job feedback&#039;&#039;&#039; can help you determine if you are using the cluster efficiently&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12279</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12279"/>
		<updated>2023-08-21T08:39:51Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to mention the cluster in your publications. Cluster-specific information can be found here:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
 https://www.bwhpc.de/user_publications.html&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12278</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12278"/>
		<updated>2023-08-21T08:38:59Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Acknowledge the cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Remember to mention the cluster in your publications. Cluster-specific information can be found here:&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
 https://www.bwhpc.de/user_publications.html&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12273</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12273"/>
		<updated>2023-08-21T08:34:45Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Acknowledge the cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Acknowledge the cluster ==&lt;br /&gt;
&lt;br /&gt;
Remember to mention the cluster in your publications. Cluster-specific information can be found here:&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&amp;amp;rarr;[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
 https://www.bwhpc.de/user_publications.html&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12272</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12272"/>
		<updated>2023-08-21T08:33:32Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Acknowledge the cluster */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Acknowledge the cluster ==&lt;br /&gt;
&lt;br /&gt;
Remember to mention the cluster in your publications. &lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[JUSTUS2/Acknowledgement| bwForCluster JUSTUS2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
 https://www.bwhpc.de/user_publications.html&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12271</id>
		<title>Acknowledgement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Acknowledgement&amp;diff=12271"/>
		<updated>2023-08-21T08:32:56Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: Created page with &amp;quot;== Acknowledge the cluster ==  Remember to mention the cluster in your publications.    bwUniCluster Acknowledgement  BinAC/Acknowledgeme...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Acknowledge the cluster ==&lt;br /&gt;
&lt;br /&gt;
Remember to mention the cluster in your publications. &lt;br /&gt;
&lt;br /&gt;
[[BwUniCluster2.0/Acknowledgement| bwUniCluster Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[BinAC/Acknowledgement| bwForCluster BinAC Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[Helix/Acknowledgement| bwForCluster Helix Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[Justus2/Acknowledgement| bwForCluster Justus2 Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
[[NEMO/Acknowledgement| bwForCluster NEMO Acknowledgement]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Such recognition is important for acquiring funding for the next generation of hardware, support services, data storage, and infrastructure.&lt;br /&gt;
&lt;br /&gt;
The publications will be referenced on the bwHPC website:&lt;br /&gt;
 https://www.bwhpc.de/user_publications.html&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=File:Overview_batch_job_workflow.png&amp;diff=12264</id>
		<title>File:Overview batch job workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=File:Overview_batch_job_workflow.png&amp;diff=12264"/>
		<updated>2023-08-21T08:23:23Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: J Steuer uploaded a new version of File:Overview batch job workflow.png&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Illustration of how batch jobs on an HPC Cluster are processed.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=File:Overview_batch_job_workflow.png&amp;diff=12200</id>
		<title>File:Overview batch job workflow.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=File:Overview_batch_job_workflow.png&amp;diff=12200"/>
		<updated>2023-08-17T08:07:47Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: Illustration of how batch jobs on an HPC Cluster are processed.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Illustration of how batch jobs on an HPC Cluster are processed.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=File:Basic_slurm_script.png&amp;diff=12173</id>
		<title>File:Basic slurm script.png</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=File:Basic_slurm_script.png&amp;diff=12173"/>
		<updated>2023-08-16T15:03:37Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: Basic example of the structure of a SLURM batch script.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Summary ==&lt;br /&gt;
Basic example of the structure of a SLURM batch script.&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Registration/bwUniCluster/Entitlement&amp;diff=12103</id>
		<title>Registration/bwUniCluster/Entitlement</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Registration/bwUniCluster/Entitlement&amp;diff=12103"/>
		<updated>2023-07-25T08:48:09Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{|style=&amp;quot;background:#ffffff; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffffff; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#ffffff; text-align:left&amp;quot;|&lt;br /&gt;
The bwUniCluster entitlement (see [https://www.bwidm.de/attribute.php#Berechtigung eduPersonEntitlement]) issued by a university assures the operator of the bwUniCluster that its university members&#039; compute activities comply with the German Foreign Trade Act (Außenwirtschaftsgesetz - AWG) and German Foreign Trade Regulations (Außenwirtschaftsverordnung - AWV).&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
= Step A: bwUniCluster Entitlement =&lt;br /&gt;
&lt;br /&gt;
To register for the bwUniCluster 2.0 you need the  &#039;&#039;&#039;bwUniCluster Entitlement&#039;&#039;&#039; issued by your university.&lt;br /&gt;
&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
The entitlement is called &#039;&#039;&#039;bwUniCluster&#039;&#039;&#039; (and not bwUniCluster 2.0) and each university assigns the entitlement &#039;&#039;&#039;only&#039;&#039;&#039; for its own members.&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
If you are not sure if you already have an entitlement, please check it first with the [[Registration/bwUniCluster/Entitlement#Check_your_Entitlements|&#039;&#039;&#039;Check your Entitlements&#039;&#039;&#039;]] guide below.&lt;br /&gt;
If you need the entitlement, please follow the link for your institution or contact your local service desk if no information is provided:&lt;br /&gt;
* [https://www.hs-esslingen.de/informatik-und-informationstechnik/forschung-labore/projekte/forschungsprojekte/high-performance-computing/ Hochschule Esslingen]&lt;br /&gt;
* [[BwCluster_User_Access_Uni_Freiburg|Universität Freiburg]]&lt;br /&gt;
* [https://bwunicluster.urz.uni-heidelberg.de/ Universität Heidelberg]&lt;br /&gt;
* [https://kim.uni-hohenheim.de/bwhpc-account Universität Hohenheim]&lt;br /&gt;
* [http://www.scc.kit.edu/downloads/ISM/Accessform_bwUniCluster_DE_EN.pdf Karlsruhe Institute of Technology (KIT)]&lt;br /&gt;
* [https://www.kim.uni-konstanz.de/en/services/research-and-teaching/high-performance-computing/access-to-bwunicluster Universität Konstanz]&lt;br /&gt;
* [[BWUniCluster_User_Access_Members_Uni_Mannheim|Universität Mannheim]]&lt;br /&gt;
* [https://www.hlrs.de/apply-for-computing-time/bw-uni-cluster Universität Stuttgart]&lt;br /&gt;
* [https://uni-tuebingen.de/de/155157 Universität Tübingen]&lt;br /&gt;
* [[BWUniCluster_User_Access_Members_Uni_Ulm|Universität Ulm]]&lt;br /&gt;
* [[Registration/HAW|HAW BW e.V.]] and Duale Hochschule Baden-Württemberg: Please contact your local service desk / compute center&lt;br /&gt;
&lt;br /&gt;
== Check your Entitlements ==&lt;br /&gt;
&lt;br /&gt;
To make sure you do not already have the entitlement, please log in to &#039;&#039;&#039;https://login.bwidm.de/user/index.xhtml&#039;&#039;&#039;.&lt;br /&gt;
To see the list of your entitlements, first select the &#039;&#039;&#039;Shibboleth&#039;&#039;&#039; tab at the top.&lt;br /&gt;
If the list below &amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;urn:oid:1.3.6.1.4.1.5923.1.1.1.7&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; contains&lt;br /&gt;
&amp;lt;pre&amp;gt;http://bwidm.de/entitlement/bwUniCluster&amp;lt;/pre&amp;gt;&lt;br /&gt;
you already have the entitlement and can skip step A.&lt;br /&gt;
{|style=&amp;quot;background:#deffee; width:100%;&amp;quot;&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
[[Image:Attention.svg|center|25px]]&lt;br /&gt;
|style=&amp;quot;padding:5px; background:#cef2e0; text-align:left&amp;quot;|&lt;br /&gt;
&amp;lt;code&amp;gt;&amp;lt;nowiki&amp;gt;http://bwidm.de/entitlement/bwUniCluster&amp;lt;/nowiki&amp;gt;&amp;lt;/code&amp;gt; is an attribute and not a link!&lt;br /&gt;
See [https://www.bwidm.de/dienste.php bwUniCluster und bwForCluster] for more information about needed attributes for this service.&lt;br /&gt;
|}&lt;br /&gt;
[[File:BwIDM-idp.png|center|600px|thumb|Verify Entitlement.]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;p style=&amp;quot;text-align:right;&amp;quot;&amp;gt;[[Registration/bwUniCluster/Service | Go to step B]]&amp;lt;/p&amp;gt;&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Scaling&amp;diff=12096</id>
		<title>Scaling</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Scaling&amp;diff=12096"/>
		<updated>2023-07-04T11:26:38Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction = &lt;br /&gt;
&lt;br /&gt;
Before you submit large production runs on a bwHPC cluster you should define an optimal number of resources required for your compute job. Poor job efficiency means that hardware resources are wasted and a similar overall result could have been achieved using fewer hardware resources, leaving those for other jobs and reducing the queue wait time for all users.&lt;br /&gt;
&lt;br /&gt;
The main advantage of today&#039;s compute clusters is that they are able to perform calculations in parallel. Whether and how your code can be parallelized is of fundamental importance for achieving good job efficiency and performance on an HPC cluster. A scaling analysis is done by identifying the number of resources (such as the number of cores, nodes, or GPUs) that enables the best performance for a given compute job.&lt;br /&gt;
&lt;br /&gt;
[[Energy Efficient Cluster Usage]] offers additional information on how to make the most out of the available HPC resources.&lt;br /&gt;
&lt;br /&gt;
= Considering Resources vs. Queue Time = &lt;br /&gt;
&lt;br /&gt;
When a job is submitted to the scheduler of an HPC cluster, the job first waits in the queue before being executed on the compute nodes. &lt;br /&gt;
The amount of time spent in the queue is called the queue time. &lt;br /&gt;
The amount of time it takes for the job to run on the compute nodes is called the execution time.&lt;br /&gt;
&lt;br /&gt;
The figure below shows that the queue time increases with increasing resources (e.g., CPU cores) while the execution time decreases with increasing resources. &lt;br /&gt;
One should try to find the optimal set of resources that minimizes the &amp;quot;time to solution&amp;quot; which is the sum of the queue and execution times. &lt;br /&gt;
A simple rule is to choose the smallest set of resources that gives a reasonable speed-up over the baseline case.&lt;br /&gt;
&lt;br /&gt;
[[File:Fig_resources_vs_queue_time.jpg|800px|center]]&lt;br /&gt;
&lt;br /&gt;
= Scaling Efficiency =&lt;br /&gt;
&lt;br /&gt;
When you run a parallel program, the problem has to be cut into several independent pieces. For some problems, this is easier than for others - but in every case, this produces an overhead of time used to divide the problem, distribute parts of it to tasks, and stitch the results together.&lt;br /&gt;
For a theoretical amount of &amp;quot;infinite calculations&amp;quot;, calculating each problem on one single core would be the most efficient way to use the hardware.&lt;br /&gt;
In extreme cases, when the problem is very hard to divide, using more compute cores can even make the job finish later.&lt;br /&gt;
&lt;br /&gt;
For real calculations, it is often impractical to wait for calculations to finish if they are done on a single core. &lt;br /&gt;
Typical calculation times for a job should stay under 2 days, or up to 2 weeks for jobs that cannot use more cores efficiently. &lt;br /&gt;
Any longer, and risks such as node failures, cluster downtimes due to maintenance, and only discovering (possibly wrong) results after a long wait can become too much of a problem.&lt;br /&gt;
&lt;br /&gt;
A common way to assess the efficiency of a parallel program is through its speedup. &lt;br /&gt;
Here, the speedup is defined as the ratio of the time a serial program needs to run to the time for the parallel program that accomplishes the same work. &lt;br /&gt;
&lt;br /&gt;
 Speedup= Time(serial program) / Time(parallel program)&lt;br /&gt;
&lt;br /&gt;
A simple example would be a calculation that takes 1000 hours on 1 core.&lt;br /&gt;
Without any overhead from parallelization, the same calculation run on 100 cores would need 1000/100 = 10 hours, the ideal speedup.&lt;br /&gt;
More realistically, such a calculation for parallelized code would need around 30 hours.&lt;br /&gt;
&lt;br /&gt;
[[File:Fig_speedup.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
However, there is a theoretical upper limit on how much faster you can solve the original problem by using additional cores ([[Wikipedia:Amdahl%27s_law|Amdahl&#039;s Law]]). &lt;br /&gt;
While a considerable part of a compute job might parallelize nicely, there is always some portion of time spent on I/O, such as saving or reading from disc, network limitations, communication overhead, or performing calculations that cannot be parallelized, thus reducing the speedup that is possible by simply adding more computational resources.&lt;br /&gt;
&lt;br /&gt;
From the speedup, a useful definition of efficiency can be derived:&lt;br /&gt;
&lt;br /&gt;
 Efficiency = Speedup / Number of cores = Time(serial program) / (Time(parallel program) * Number of cores)&lt;br /&gt;
&lt;br /&gt;
The efficiency allows for an estimation of how well your code is using additional cores, and how much of the resources are lost by doing parallelization overhead calculations.&lt;br /&gt;
Coming back to the previous example, we can now use the time of a serial calculation (1000 hours), the time our parallelized code took to finish (30 hours), the number of cores we used (100 cores), and calculate the efficiency.&lt;br /&gt;
&lt;br /&gt;
 Efficiency = 1000 / (30 * 100) = 0.3&lt;br /&gt;
&lt;br /&gt;
This shows that for this example, only 30% of the resources are used to solve the problem, while 70% of our resources are spent on parallelization overhead.&lt;br /&gt;
A semi-arbitrary cut-off for determining if a job is well-scaled is if 50% or less of the computation is wasted on parallelization overhead.&lt;br /&gt;
Therefore, we can determine that for this example too many resources are used.&lt;br /&gt;
&lt;br /&gt;
In many cases, the time needed for calculating a given code in serial, on a single core, is not accessible, as this would take a very long time and is usually the reason why an HPC cluster is needed in the first place.&lt;br /&gt;
To circumvent this, the relative speedup when doubling the number of cores is calculated.&lt;br /&gt;
&lt;br /&gt;
 Relative Speedup (N Cores -&amp;gt; 2N Cores) = Time(N Cores) / Time(2N Cores)&lt;br /&gt;
&lt;br /&gt;
The relative speedup obtained by doubling the number of cores can be used as a rough guideline for a scaling analysis. &lt;br /&gt;
If doubling the number of cores results in a relative speedup of above 1.8, the scaling is considered good.&lt;br /&gt;
Above 1.7 is considered acceptable, while a relative speedup of less than 1.7 should usually be avoided.&lt;br /&gt;
We can illustrate this by using our simple parallelization example from above. &lt;br /&gt;
If we assume that our code would have finished in 45 hours when using 50 cores, we can calculate the relative speedup:&lt;br /&gt;
&lt;br /&gt;
 Relative Speedup (50 Cores -&amp;gt; 100 Cores) = Time(50 Cores) / Time(100 Cores) = 45 h / 30 h = 1.5&lt;br /&gt;
&lt;br /&gt;
A relative speedup of 1.5 is considered undesirable, so we should run our example code using 50 rather than 100 cores on the HPC cluster.&lt;br /&gt;
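&lt;br /&gt;
The quantities above can be reproduced with a few lines of Python (a sketch using the illustrative timings from this section):&lt;br /&gt;
 # Illustrative timings from the example above, in hours.&lt;br /&gt;
 t_serial = 1000.0   # 1 core&lt;br /&gt;
 t_50     = 45.0     # 50 cores&lt;br /&gt;
 t_100    = 30.0     # 100 cores&lt;br /&gt;
 speedup     = t_serial / t_100     # about 33&lt;br /&gt;
 efficiency  = speedup / 100        # about 0.33, i.e. roughly 30% useful work&lt;br /&gt;
 rel_speedup = t_50 / t_100         # 1.5, below the 1.7 guideline&lt;br /&gt;
 print(speedup, efficiency, rel_speedup)&lt;br /&gt;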
&lt;br /&gt;
In the following, a scaling analysis from a real example using the program VASP is shown.&lt;br /&gt;
&lt;br /&gt;
[[File:Fig_speedup_and_efficiency_1.png|700px|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Fig_speedup_and_efficiency_2.png|700px|center]]&lt;br /&gt;
&lt;br /&gt;
= Basic Recipe to Determine Core Numbers =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
From the previous chapter, the following rules of thumb for determining a suitable number of cores for HPC compute jobs can be summarized:&lt;br /&gt;
&lt;br /&gt;
(1) Optimizing resource usage is most relevant when you submit many jobs or resource-heavy jobs. &lt;br /&gt;
For smaller projects, simply try to use a reasonable core number and you are done.&lt;br /&gt;
&lt;br /&gt;
(2) If you plan to submit many jobs, verify that the core number is acceptable.&lt;br /&gt;
If the jobs use N cores (i.e. N is 96 for a two-node job), then run the same job with N/2 cores (in this example 48 cores).&lt;br /&gt;
&lt;br /&gt;
(3) To calculate the speedup, you then divide the (longer) run time of the N/2-core-job by the (shorter) run time of the N-core-job. Typically the speedup is a number between 1.0 (no speedup at all) and 2.0 (perfect speedup - all additional cores speed up the job).&lt;br /&gt;
&lt;br /&gt;
(4a) IF the speedup is better than a factor of 1.7, THEN using N cores is perfectly fine.&lt;br /&gt;
&lt;br /&gt;
(4b) IF the speedup is worse than a factor of 1.7, THEN using N cores wastes too many resources and N/2 cores should be used.&lt;br /&gt;
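&lt;br /&gt;
This recipe can be written down as a small helper (a sketch; the function name is only illustrative and the 1.7 threshold is the rule of thumb from above):&lt;br /&gt;
 # Decide whether running with N cores (time_n) instead of N/2 cores&lt;br /&gt;
 # (time_half_n) still uses the additional cores efficiently.&lt;br /&gt;
 def n_cores_acceptable(time_half_n, time_n):&lt;br /&gt;
     speedup = time_half_n / time_n   # between 1.0 and 2.0&lt;br /&gt;
     return speedup &amp;gt; 1.7&lt;br /&gt;
 # Example from this page: 45 h with N/2 = 50 cores vs. 30 h with N = 100 cores&lt;br /&gt;
 print(n_cores_acceptable(45.0, 30.0))   # False, so use N/2 cores instead&lt;br /&gt;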
&lt;br /&gt;
= Better Resource Usage by Increasing the System Size =&lt;br /&gt;
&lt;br /&gt;
Amdahl’s law, as illustrated above, gives the upper limit of speedup for a problem of fixed size.&lt;br /&gt;
By simply increasing the number of cores to speed up a calculation your compute job can quickly become very inefficient, and wasteful. &lt;br /&gt;
While this appears to be a bottleneck for parallel computing, a different strategy was pointed out ([[Wikipedia:Gustafson&#039;s_law|Gustafson&#039;s law]]). &lt;br /&gt;
&lt;br /&gt;
If a problem only requires a small number of resources, it is not beneficial to use a large number of resources to carry out the computation. &lt;br /&gt;
A more reasonable choice is to use small amounts of resources for small problems and larger quantities of resources for big problems.&lt;br /&gt;
Thus, researchers can take advantage of available cores by scaling up parallel programs to explore their questions in higher resolution or at a larger scale. &lt;br /&gt;
With increases in computational power, researchers can be increasingly ambitious about the scale and complexity of their programs.&lt;br /&gt;
&lt;br /&gt;
= Reasons For Poor Job Efficiency = &lt;br /&gt;
&lt;br /&gt;
Some simple causes for poor overall job efficiency are:&lt;br /&gt;
&lt;br /&gt;
* Poor choice of resources compared to the size of the nodes leaves part of the node blocked, but doing nothing:&lt;br /&gt;
** The value of --ntasks-per-node does not evenly divide the number of cores on a node (e.g. 48)&lt;br /&gt;
** Too much (un-needed) memory or disk space requested&lt;br /&gt;
* More cores requested than are actually used by the job&lt;br /&gt;
* More cores used for a single MPI/OpenMP parallel computation than is useful&lt;br /&gt;
* Many small jobs with a short runtime (seconds in extreme cases)&lt;br /&gt;
* One-core jobs with very different run-times (because of single-user policy)&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12095</id>
		<title>Energy Efficient Cluster Usage</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Energy_Efficient_Cluster_Usage&amp;diff=12095"/>
		<updated>2023-07-04T11:15:46Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Energy consumption of data centers has been increasing continuously throughout the last decade. In 2020, the energy consumption of all data centers in Germany amounted to around  [https://www.bundestag.de/resource/blob/863850/423c11968fcb5c9995e9ef9090edf9e6/WD-8-070-21-pdf-data.pdf 3 percent] of the total electricity produced. Accompanying this large energy consumption are large-scale emissions of CO2 to the atmosphere and thus significant contributions to climate change.&lt;br /&gt;
To illustrate this, an average compute job running on a single node for one day may easily consume 10 kWh or even more. That translates roughly to brewing 700 cups of coffee.&lt;br /&gt;
Assuming that a typical bwHPC cluster has a few hundred compute nodes, this amounts to the energy consumption of a village for each cluster. &lt;br /&gt;
&lt;br /&gt;
Although a large amount of this energy consumption is an intrinsic requirement of running large HPC clusters (even when its processors are idle, a cluster uses a lot of energy), efficient use of the available resources is important. Using as many resources as possible does not make you a power user. Using them wisely does.&lt;br /&gt;
In the following, a basic introduction to some of the most important aspects of energy-efficient HPC usage from a user perspective is given. &lt;br /&gt;
&lt;br /&gt;
We can generally distinguish three tasks when optimizing for running HPC jobs efficiently.&lt;br /&gt;
&lt;br /&gt;
* What do I want to do and why do I need an HPC Cluster for it?&lt;br /&gt;
* How many and which kind of hardware resources do I require for it?&lt;br /&gt;
* How do I optimize my code to use these resources most efficiently?&lt;br /&gt;
&lt;br /&gt;
= What do I want to do and why do I need an HPC Cluster for it? =&lt;br /&gt;
&lt;br /&gt;
The bwHPC clusters are used to almost full capacity, and running a job on an HPC node consumes a lot of energy, as shown above. &lt;br /&gt;
Therefore, users are requested to run only necessary jobs.&lt;br /&gt;
&lt;br /&gt;
Please consider testing new setups and their output for validity prior to submitting jobs that require lots of resources. This also includes projects where a lot of (smaller) similar jobs are submitted. &lt;br /&gt;
&lt;br /&gt;
Make sure to double-check your jobs prior to submission; having to discard the output data of an HPC project due to faulty input files wastes a lot of computational resources.&lt;br /&gt;
&lt;br /&gt;
Finally, identifying the specific resource requirements for a given job is important for allocating the optimal resources to your compute job, and for deciding whether an HPC cluster is needed at all. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= How many and which kind of hardware resources do I require for it =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Resource allocation is a crucial part of working on an HPC cluster, &lt;br /&gt;
as it depends on both the job and the specific cluster hardware and architecture available. &lt;br /&gt;
&lt;br /&gt;
A small number of jobs and few resources&lt;br /&gt;
* Submit to the scheduler. No extended testing and resource scaling analysis are needed. &lt;br /&gt;
&lt;br /&gt;
Medium-sized projects&lt;br /&gt;
* Run only necessary jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start small: Run your problem on a small set of resources first.&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. See: [[Development#Documentation_in_the_Wiki|Development]].&lt;br /&gt;
* A look at the job feedback can help you determine if you are using the cluster efficiently&lt;br /&gt;
&lt;br /&gt;
Large projects&lt;br /&gt;
* Same approach as for medium-sized projects. &lt;br /&gt;
* Run a scaling analysis for your project with regard to how many resources work best. See: [[Scaling]].&lt;br /&gt;
&lt;br /&gt;
Many short jobs&lt;br /&gt;
* Handling via the scheduler is inefficient. &lt;br /&gt;
* Simple parallelization by hand is advisable. See: A basic introduction to [[Parallel Programming]].&lt;br /&gt;
&lt;br /&gt;
= How do I optimize my code to use these resources most efficiently? =&lt;br /&gt;
&lt;br /&gt;
The above recommendations will help you use the cluster resources more efficiently.&lt;br /&gt;
Regarding software development, power efficiency obviously correlates heavily with &#039;&#039;&#039;computing performance&#039;&#039;&#039;, but also with memory usage, i.e. both the amount of memory used and how efficiently it is accessed.&lt;br /&gt;
&lt;br /&gt;
Here, we have gathered a few results based on other research:&lt;br /&gt;
* Use an efficient programming language such as Rust, C, or C++ -- indeed, any compiled language. Do not use an interpreted language like Perl or Python for the compute-intensive parts. Since Machine Learning is a hot topic, this deserves a few words: Any ML Python code using Tensorflow or other libraries will make heavy use of NumPy and other math packages, which in turn use C-based implementations. Please make sure you use the provided Python modules, which are optimized to use Intel MKL and other mathematical libraries.&lt;br /&gt;
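&lt;br /&gt;
As a quick check (a minimal sketch; the exact module names differ between clusters), you can verify, after loading one of the provided Python modules, that NumPy is linked against an optimized BLAS/LAPACK such as Intel MKL:&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 # Prints the BLAS/LAPACK libraries this NumPy build was linked against.&lt;br /&gt;
 # On the provided modules this should list an optimized library such as&lt;br /&gt;
 # MKL or OpenBLAS; a plain reference BLAS is considerably slower.&lt;br /&gt;
 np.show_config()&lt;br /&gt;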
&lt;br /&gt;
Further reading:&lt;br /&gt;
Rui Pereira, et al: &amp;quot;&#039;&#039;Energy efficiency across programming languages: how do energy, time, and memory relate?&#039;&#039;&amp;quot;, SLE 2017: Proc. of the 10th ACM SIGPLAN Int. Conf. on SW Language Eng., Oct. 2017, pp. 256–267, [https://doi.org/10.1145/3136014.3136031 doi:10.1145/3136014.3136031]&lt;br /&gt;
&lt;br /&gt;
* Analyse memory access patterns&lt;br /&gt;
&lt;br /&gt;
* For small tight loops checking for locks, use the &amp;lt;code&amp;gt;pause&amp;lt;/code&amp;gt; instruction.&lt;br /&gt;
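&lt;br /&gt;
A minimal sketch of such a spin-wait loop (assuming an x86 CPU and a GCC-compatible compiler; the helper name &amp;lt;code&amp;gt;spin_wait&amp;lt;/code&amp;gt; is only illustrative), using the &amp;lt;code&amp;gt;_mm_pause()&amp;lt;/code&amp;gt; intrinsic:&lt;br /&gt;
 #include &amp;lt;immintrin.h&amp;gt;    /* _mm_pause() */&lt;br /&gt;
 #include &amp;lt;stdatomic.h&amp;gt;    /* C11 atomics */&lt;br /&gt;
 /* Spin until the lock flag is released; the pause hint lowers power   */&lt;br /&gt;
 /* draw and pipeline pressure of the waiting core vs. a plain loop.    */&lt;br /&gt;
 static void spin_wait(atomic_int *locked)&lt;br /&gt;
 {&lt;br /&gt;
     while (atomic_load_explicit(locked, memory_order_acquire))&lt;br /&gt;
         _mm_pause();&lt;br /&gt;
 }&lt;br /&gt;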
&lt;br /&gt;
= Summary: General Recommendations =&lt;br /&gt;
&lt;br /&gt;
* Choose the most &#039;&#039;&#039;efficient algorithms&#039;&#039;&#039; for the given problem&lt;br /&gt;
* Run only &#039;&#039;&#039;necessary&#039;&#039;&#039; jobs: Please consider testing new setups and their output for validity prior to submitting a huge amount of similar jobs&lt;br /&gt;
* Start &#039;&#039;&#039;small&#039;&#039;&#039;: Run your problem on a small number of parallel entities (be it processes or threads) first.&lt;br /&gt;
* &#039;&#039;&#039;Estimate&#039;&#039;&#039; the runtime of the parallel job as &#039;&#039;&#039;accurately&#039;&#039;&#039; as possible to increase the efficiency of the scheduling of the whole system&lt;br /&gt;
* Use the proper tools for development: If you develop your own code, please use the proper tools for debugging and parallel performance analysis. More information is available on the bwHPC Wiki.&lt;br /&gt;
* A look at the &#039;&#039;&#039;job feedback&#039;&#039;&#039; can help you determine if you are using the cluster efficiently&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_User_Access_Members_Uni_Konstanz&amp;diff=12089</id>
		<title>BwForCluster User Access Members Uni Konstanz</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_User_Access_Members_Uni_Konstanz&amp;diff=12089"/>
		<updated>2023-07-04T09:05:22Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;A valid account for the University of Konstanz is required to access the bwForCluster. You therefore need at least an employee or student ID (Matrikelnummer).&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==  ==&lt;br /&gt;
&lt;br /&gt;
Your [[Registration/bwForCluster/RV | registration request]] for a new &#039;&#039;rechenvorhaben&#039;&#039; will be delivered to your local support team at the University of Konstanz. They will automatically check your request and set the bwForCluster-Entitlement, or contact you directly if further information is necessary.&lt;br /&gt;
&lt;br /&gt;
In case of joining an existing &#039;&#039;rechenvorhaben&#039;&#039;, please contact the [https://www.kim.uni-konstanz.de/services/forschen-und-lehren/high-performance-computing/ local support] to obtain the bwForCluster-Entitlement.&lt;br /&gt;
&lt;br /&gt;
==   ==&lt;br /&gt;
&lt;br /&gt;
For more information about registration, visit the related web pages and follow the instructions documented there.&lt;br /&gt;
&lt;br /&gt;
German version: [https://www.kim.uni-konstanz.de/services/forschen-und-lehren/high-performance-computing/zugang-bwforcluster/ Zugang bwForCluster]&lt;br /&gt;
&lt;br /&gt;
English version: [https://www.kim.uni-konstanz.de/en/services/research-and-teaching/high-performance-computing/access-bwforcluster/ Access bwForCluster (in progress)]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_User_Access_Members_Uni_Konstanz&amp;diff=12088</id>
		<title>BwForCluster User Access Members Uni Konstanz</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_User_Access_Members_Uni_Konstanz&amp;diff=12088"/>
		<updated>2023-07-04T09:04:38Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;A valid account for the University of Konstanz is required to access the bwForCluster. You therefore need at least an employee or student ID (Matrikelnummer).&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
==  ==&lt;br /&gt;
&lt;br /&gt;
Your [[Registration/bwForCluster/RV | registration request]] for a new &#039;&#039;rechenvorhaben&#039;&#039; will be delivered to your local support team at the University of Konstanz. They will automatically check your request and set the bwForCluster-Entitlement, or contact you directly if further information is necessary.&lt;br /&gt;
&lt;br /&gt;
In case of joining an existing &#039;&#039;rechenvorhaben&#039;&#039;, please contact the [http://www.rz.uni-konstanz.de/en/support/ local support] to obtain the bwForCluster-Entitlement.&lt;br /&gt;
&lt;br /&gt;
==   ==&lt;br /&gt;
&lt;br /&gt;
For more information about registration, visit the related web pages and follow the instructions documented there.&lt;br /&gt;
&lt;br /&gt;
German version: [https://www.kim.uni-konstanz.de/services/forschen-und-lehren/high-performance-computing/zugang-bwforcluster/ Zugang bwForCluster]&lt;br /&gt;
&lt;br /&gt;
English version: [https://www.kim.uni-konstanz.de/en/services/research-and-teaching/high-performance-computing/access-bwforcluster/ Access bwForCluster (in progress)]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_User_Access_Members_Uni_Konstanz&amp;diff=12086</id>
		<title>BwForCluster User Access Members Uni Konstanz</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=BwForCluster_User_Access_Members_Uni_Konstanz&amp;diff=12086"/>
		<updated>2023-07-04T09:00:42Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;A valid account at the University of Konstanz is required to access the bwForCluster. You need at least an employee or student ID (Matrikelnummer).&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
Your [[#BwForCluster_User_Access | registration request]] for a new &#039;&#039;Rechenvorhaben&#039;&#039; (compute project) will be delivered to your local support team at the University of Konstanz. They will review your request and set the bwForCluster-Entitlement, or contact you directly if further information is needed.&lt;br /&gt;
&lt;br /&gt;
If you are joining an existing &#039;&#039;Rechenvorhaben&#039;&#039;, please contact [http://www.rz.uni-konstanz.de/en/support/ local support] to obtain the bwForCluster-Entitlement.&lt;br /&gt;
&lt;br /&gt;
For more information about registration, visit the related web pages below and follow the instructions documented there.&lt;br /&gt;
&lt;br /&gt;
German version: [https://www.kim.uni-konstanz.de/services/forschen-und-lehren/high-performance-computing/zugang-bwforcluster/ Zugang bwForCluster]&lt;br /&gt;
&lt;br /&gt;
English version: [https://www.kim.uni-konstanz.de/en/services/research-and-teaching/high-performance-computing/access-bwforcluster/ Access bwForCluster (in progress)]&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
	<entry>
		<id>https://wiki.bwhpc.de/wiki/index.php?title=Scaling&amp;diff=12083</id>
		<title>Scaling</title>
		<link rel="alternate" type="text/html" href="https://wiki.bwhpc.de/wiki/index.php?title=Scaling&amp;diff=12083"/>
		<updated>2023-07-04T08:39:25Z</updated>

		<summary type="html">&lt;p&gt;J Steuer: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Introduction = &lt;br /&gt;
&lt;br /&gt;
Before you submit large production runs on a bwHPC cluster, you should determine the optimal amount of resources for your compute job. Poor job efficiency means that hardware resources are wasted: a similar overall result could have been achieved with fewer hardware resources, leaving those free for other jobs and reducing the queue wait time for all users.&lt;br /&gt;
&lt;br /&gt;
The main advantage of today&#039;s compute clusters is that they are able to perform calculations in parallel. Whether and how your code can be parallelized is of fundamental importance for achieving good job efficiency and performance on an HPC cluster. A scaling analysis identifies the amount of resources (such as the number of cores, nodes, or GPUs) that enables the best performance for a given compute job.&lt;br /&gt;
&lt;br /&gt;
See also [[Energy Efficient Cluster Usage]].&lt;br /&gt;
&lt;br /&gt;
= Considering Resources vs. Queue Time = &lt;br /&gt;
&lt;br /&gt;
When a job is submitted to the scheduler of an HPC cluster, the job first waits in the queue before being executed on the compute nodes. &lt;br /&gt;
The amount of time spent in the queue is called the queue time. &lt;br /&gt;
The amount of time it takes for the job to run on the compute nodes is called the execution time.&lt;br /&gt;
&lt;br /&gt;
The figure below shows that the queue time increases with increasing resources (e.g., CPU cores) while the execution time decreases with increasing resources. &lt;br /&gt;
One should try to find the optimal set of resources that minimizes the &amp;quot;time to solution&amp;quot;, which is the sum of the queue and execution times. &lt;br /&gt;
A simple rule is to choose the smallest set of resources that gives a reasonable speed-up over the baseline case.&lt;br /&gt;
&lt;br /&gt;
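To make this trade-off concrete, here is a minimal sketch in Python (the queue and execution times are made-up placeholder values, not measurements from a real cluster) that picks the core count with the smallest time to solution:&lt;br /&gt;
&lt;br /&gt;
 # Hypothetical (queue time, execution time) pairs in hours for several core counts.&lt;br /&gt;
 timings = {&lt;br /&gt;
     16: (1.0, 40.0),&lt;br /&gt;
     32: (2.0, 22.0),&lt;br /&gt;
     64: (5.0, 13.0),&lt;br /&gt;
     128: (12.0, 8.0),&lt;br /&gt;
 }&lt;br /&gt;
 # Time to solution = queue time + execution time; pick the core count that minimizes it.&lt;br /&gt;
 best_cores = min(timings, key=lambda c: sum(timings[c]))&lt;br /&gt;
 print(best_cores, sum(timings[best_cores]))  # 64 cores, 18.0 hours in total&lt;br /&gt;
&lt;br /&gt;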
[[File:Fig_resources_vs_queue_time.jpg|800px|center]]&lt;br /&gt;
&lt;br /&gt;
= Scaling Efficiency =&lt;br /&gt;
&lt;br /&gt;
When you run a parallel program, the problem has to be cut into several independent pieces. For some problems this is easier than for others, but in every case it produces an overhead: time is spent dividing the problem, distributing its parts to tasks, and stitching the results back together.&lt;br /&gt;
In purely theoretical terms, if run time did not matter, calculating each problem on a single core would be the most efficient way to use the hardware.&lt;br /&gt;
In extreme cases, when the problem is very hard to divide, using more compute cores can even make the job finish later.&lt;br /&gt;
&lt;br /&gt;
For real calculations, it is often impractical to wait for calculations to finish if they are done on a single core. &lt;br /&gt;
Typical calculation times for a job should stay under 2 days, or up to 2 weeks for jobs that cannot use more cores efficiently. &lt;br /&gt;
Any longer, and risks such as node failures, cluster downtimes due to maintenance, and obtaining (possibly wrong) results only after a very long wait become too much of a problem.&lt;br /&gt;
&lt;br /&gt;
A common way to assess the efficiency of a parallel program is through its speedup. &lt;br /&gt;
Here, the speedup is defined as the ratio of the run time of a serial program to the run time of a parallel program that accomplishes the same work. &lt;br /&gt;
&lt;br /&gt;
 Speedup = Time(serial program) / Time(parallel program)&lt;br /&gt;
&lt;br /&gt;
A simple example would be a calculation that takes 1000 hours on 1 core.&lt;br /&gt;
Without any overhead from parallelization, the same calculation run on 100 cores would need 1000/100 = 10 hours, corresponding to the ideal speedup of 100.&lt;br /&gt;
More realistically, a parallelized code would need around 30 hours for this calculation.&lt;br /&gt;
&lt;br /&gt;
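As a rough sketch, the speedup from the example above can be computed with a few lines of Python (the run times are the illustrative values from the text, not real measurements):&lt;br /&gt;
&lt;br /&gt;
 # Speedup = Time(serial program) / Time(parallel program)&lt;br /&gt;
 def speedup(t_serial, t_parallel):&lt;br /&gt;
     return t_serial / t_parallel&lt;br /&gt;
 # Example from the text: 1000 h on 1 core, run again on 100 cores.&lt;br /&gt;
 print(speedup(1000.0, 10.0))  # ideal case without overhead: 100.0&lt;br /&gt;
 print(speedup(1000.0, 30.0))  # realistic case: about 33.3&lt;br /&gt;
&lt;br /&gt;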
[[File:Fig_speedup.png|400px|center]]&lt;br /&gt;
&lt;br /&gt;
However, there is a theoretical upper limit on how much faster you can solve the original problem by using additional cores ([[Wikipedia:Amdahl%27s_law|Amdahl&#039;s Law]]). &lt;br /&gt;
While a considerable part of a compute job might parallelize nicely, some portion of the time is always spent on I/O (such as saving to or reading from disk), network limitations, communication overhead, or calculations that cannot be parallelized. This limits the speedup that can be achieved by simply adding more computational resources.&lt;br /&gt;
&lt;br /&gt;
From the speedup, a useful definition of efficiency can be derived:&lt;br /&gt;
&lt;br /&gt;
 Efficiency = Speedup / Number of cores = Time(serial program) / (Time(parallel program) * Number of cores)&lt;br /&gt;
&lt;br /&gt;
The efficiency estimates how well your code uses additional cores and how much of the resources is lost to parallelization overhead.&lt;br /&gt;
Returning to the previous example, we can use the serial run time (1000 hours), the run time of the parallelized code (30 hours), and the number of cores used (100) to calculate the efficiency.&lt;br /&gt;
&lt;br /&gt;
 Efficiency = 1000 / (30 * 100) ≈ 0.33&lt;br /&gt;
&lt;br /&gt;
This shows that for this example only about a third of the resources are used to solve the problem, while roughly two thirds are spent on parallelization overhead.&lt;br /&gt;
A semi-arbitrary cut-off for a well-scaled job is that no more than 50% of the computation is wasted on parallelization overhead.&lt;br /&gt;
Therefore, too many resources are used in this example.&lt;br /&gt;
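&lt;br /&gt;
The efficiency of the example can be reproduced with a small Python sketch (again using the illustrative numbers from the text):&lt;br /&gt;
&lt;br /&gt;
 # Efficiency = Speedup / Number of cores&lt;br /&gt;
 def efficiency(t_serial, t_parallel, n_cores):&lt;br /&gt;
     return (t_serial / t_parallel) / n_cores&lt;br /&gt;
 # Example from the text: 1000 h serial, 30 h on 100 cores.&lt;br /&gt;
 print(efficiency(1000.0, 30.0, 100))  # about 0.33, i.e. roughly two thirds of the resources go to overhead&lt;br /&gt;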
&lt;br /&gt;
In many cases, the serial run time of a given code on a single core is not available, because obtaining it would take a very long time, which is usually the reason an HPC cluster is needed in the first place.&lt;br /&gt;
To circumvent this, the relative speedup when doubling the number of cores is calculated.&lt;br /&gt;
&lt;br /&gt;
 Relative Speedup (N Cores -&amp;gt; 2N Cores) = Time(N Cores) / Time(2N Cores)&lt;br /&gt;
&lt;br /&gt;
The relative speedup obtained by doubling the number of cores can be used as a rough guideline for a scaling analysis. &lt;br /&gt;
If doubling the number of cores results in a relative speedup of above 1.8, the scaling is considered good.&lt;br /&gt;
A value above 1.7 is still considered acceptable, while a relative speedup below 1.7 should usually be avoided.&lt;br /&gt;
We can illustrate this by using our simple parallelization example from above. &lt;br /&gt;
If we assume that our code would have finished in 45 hours when using 50 cores, we can calculate the relative speedup:&lt;br /&gt;
&lt;br /&gt;
 Relative Speedup (50 Cores -&amp;gt; 100 Cores) = Time(50 Cores) / Time(100 Cores) = 45 h / 30 h = 1.5&lt;br /&gt;
&lt;br /&gt;
A relative speedup of 1.5 is considered undesirable, so we should run our example code using 50 rather than 100 cores on the HPC cluster.&lt;br /&gt;
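&lt;br /&gt;
A minimal Python sketch of this check (the 1.8 and 1.7 thresholds are the rough guidelines from above; the run times are the illustrative values from the example):&lt;br /&gt;
&lt;br /&gt;
 # Relative Speedup (N cores -&amp;gt; 2N cores) = Time(N cores) / Time(2N cores)&lt;br /&gt;
 def relative_speedup(t_n_cores, t_2n_cores):&lt;br /&gt;
     return t_n_cores / t_2n_cores&lt;br /&gt;
 rs = relative_speedup(45.0, 30.0)  # 45 h on 50 cores, 30 h on 100 cores&lt;br /&gt;
 print(rs)           # 1.5&lt;br /&gt;
 print(rs &amp;gt; 1.8)    # good scaling? False&lt;br /&gt;
 print(rs &amp;gt; 1.7)    # acceptable scaling? False&lt;br /&gt;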
&lt;br /&gt;
In the following, a scaling analysis of a real example using the program VASP is shown.&lt;br /&gt;
&lt;br /&gt;
[[File:Fig_speedup_and_efficiency_1.png|700px|center]]&lt;br /&gt;
&lt;br /&gt;
[[File:Fig_speedup_and_efficiency_2.png|700px|center]]&lt;br /&gt;
&lt;br /&gt;
= Basic Recipe to Determine Core Numbers =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
From the previous chapter, the following rules of thumb for determining a suitable number of cores for HPC compute jobs can be summarized (a short code sketch of the resulting decision rule follows the list):&lt;br /&gt;
&lt;br /&gt;
(1) Optimizing resource usage is most relevant when you submit many jobs or resource-heavy jobs. &lt;br /&gt;
For smaller projects, simply try to use a reasonable core number and you are done.&lt;br /&gt;
&lt;br /&gt;
(2) If you plan to submit many jobs, verify that the core number is acceptable.&lt;br /&gt;
If the jobs use N cores (e.g. N = 96 for a two-node job), then run the same job with N/2 cores (in this example 48 cores).&lt;br /&gt;
&lt;br /&gt;
(3) To calculate the speedup, you then divide the (longer) run time of the N/2-core job by the (shorter) run time of the N-core job. Typically the speedup is a number between 1.0 (no speedup at all) and 2.0 (perfect speedup: all additional cores speed up the job).&lt;br /&gt;
&lt;br /&gt;
(4a) IF the speedup is better than a factor of 1.7, THEN using N cores is perfectly fine.&lt;br /&gt;
&lt;br /&gt;
(4b) IF the speedup is worse than a factor of 1.7, THEN using N cores wastes too many resources and N/2 cores should be used.&lt;br /&gt;
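&lt;br /&gt;
The recipe can be written down as a minimal Python sketch (the threshold of 1.7 and the example core counts follow the steps above; the run times are placeholders you would measure yourself):&lt;br /&gt;
&lt;br /&gt;
 # Decision rule from steps (2)-(4b): keep N cores only if halving them&lt;br /&gt;
 # slows the job down by more than a factor of 1.7.&lt;br /&gt;
 def recommended_cores(n, t_half_n, t_n):&lt;br /&gt;
     speedup = t_half_n / t_n  # step (3): longer run time / shorter run time&lt;br /&gt;
     if speedup &amp;gt; 1.7:&lt;br /&gt;
         return n      # step (4a): using N cores is fine&lt;br /&gt;
     return n // 2     # step (4b): fall back to N/2 cores&lt;br /&gt;
 # Placeholder measurements for a two-node job (N = 96) and the same job on 48 cores.&lt;br /&gt;
 print(recommended_cores(96, 45.0, 30.0))  # speedup 1.5, so 48 cores are recommended&lt;br /&gt;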
&lt;br /&gt;
= Better Resource Usage by Increasing the System Size =&lt;br /&gt;
&lt;br /&gt;
Amdahl’s law, as illustrated above, gives the upper limit of speedup for a problem of fixed size.&lt;br /&gt;
By simply increasing the number of cores to speed up a calculation, your compute job can quickly become inefficient and wasteful. &lt;br /&gt;
While this appears to be a fundamental bottleneck for parallel computing, a different strategy was pointed out ([[Wikipedia:Gustafson&#039;s_law|Gustafson&#039;s law]]). &lt;br /&gt;
&lt;br /&gt;
If a problem only requires a small number of resources, it is not beneficial to use a large number of resources to carry out the computation. &lt;br /&gt;
A more reasonable choice is to use small amounts of resources for small problems and larger quantities of resources for big problems.&lt;br /&gt;
Thus, researchers can take advantage of available cores by scaling up parallel programs to explore their questions in higher resolution or at a larger scale. &lt;br /&gt;
With increases in computational power, researchers can be increasingly ambitious about the scale and complexity of their programs.&lt;br /&gt;
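&lt;br /&gt;
Gustafson&#039;s law expresses this as a scaled speedup, S(N) = N - s * (N - 1), where s is the serial fraction of the (scaled) workload. A small Python sketch, with an assumed serial fraction of 5% (an illustrative value, not a measurement):&lt;br /&gt;
&lt;br /&gt;
 # Gustafson&#039;s law: scaled speedup S(N) = N - s * (N - 1),&lt;br /&gt;
 # where s is the serial fraction of the scaled workload.&lt;br /&gt;
 def scaled_speedup(n_cores, serial_fraction):&lt;br /&gt;
     return n_cores - serial_fraction * (n_cores - 1)&lt;br /&gt;
 print(scaled_speedup(100, 0.05))  # 95.05 for an assumed serial fraction of 5%&lt;br /&gt;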
&lt;br /&gt;
= Reasons For Poor Job Efficiency = &lt;br /&gt;
&lt;br /&gt;
Some simple causes for poor overall job efficiency are:&lt;br /&gt;
&lt;br /&gt;
* A poor choice of resources relative to the size of the nodes leaves part of a node blocked but idle:&lt;br /&gt;
** --ntasks-per-node does not evenly divide the number of cores on a node (e.g. 48)&lt;br /&gt;
** Too much (unneeded) memory or disk space is requested&lt;br /&gt;
* More cores requested than are actually used by the job&lt;br /&gt;
* More cores used for a single MPI/OpenMP parallel computation than is useful&lt;br /&gt;
* Many small jobs with a short runtime (seconds in extreme cases)&lt;br /&gt;
* One-core jobs with very different run times (because of the single-user node policy)&lt;/div&gt;</summary>
		<author><name>J Steuer</name></author>
	</entry>
</feed>