JUSTUS2/Software/Gaussian

From bwHPC Wiki
Latest revision as of 11:55, 6 December 2022

The main documentation is available via module help chem/gaussian on the cluster. Most software modules for applications provide working example batch scripts.


Description          Content
module load          chem/gaussian
License              Commercial - see Pricing for Gaussian Products
Citing               See Gaussian Citation
Links                Homepage; Reference Manual; Running Gaussian; Keywords; IOps Reference
Graphical Interface  See Gaussview

1 Description

Gaussian is a general purpose quantum chemistry software package for ab initio electronic structure calculations. It provides:

  • ground state calculations for methods such as HF, many DFT functionals, MP2/3/4 or CCSD(T);
  • basic excited state calculations such as TDHF or TDDFT;
  • coupled multi-layer QM/MM calculations (ONIOM);
  • geometry optimizations, transition state searches, molecular dynamics calculations;
  • property and spectra calculations such as IR, UV/VIS, Raman or CD; as well as
  • shared-memory parallel versions for almost all kinds of jobs.

For more information on features please visit Gaussian's Overview of Capabilities and Features and Release Notes web pages.

2 Parallel computing

The binaries of the Gaussian module can run in serial and shared-memory parallel mode. Switching between the serial and the parallel version is done via the statement

%NProcShared=N

in the Link 0 commands section before the route section at the beginning of the Gaussian input file. The number of cores requested from the queueing system (i.e. --ntasks-per-node=N) must be identical to the N specified via %NProcShared=N in the Gaussian input file. The installed Gaussian binaries are shared-memory parallel only, therefore only single-node jobs make sense. Without %NProcShared, Gaussian will use only one core by default.
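As an illustration, a complete parallel input file could start like this. The molecule, method, and memory value are arbitrary placeholders and not part of the examples shipped with the module; only the %NProcShared line and its match with the queueing system request carry the point.

```text
! Link 0 commands come first; N must match the cores requested from the
! queueing system, e.g. sbatch --nodes=1 --ntasks-per-node=4
%NProcShared=4
%Mem=4000MB
! route section
#P B3LYP/6-31G(d) SP

water single point (placeholder molecule)

0 1
O   0.000   0.000   0.117
H   0.000   0.757  -0.471
H   0.000  -0.757  -0.471

```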

3 Usage

3.1 Loading the module

You can load the default version of Gaussian with the command

$ module load chem/gaussian

3.2 Running Gaussian interactively

After loading the Gaussian module you can run a quick interactive example by executing

$ time g16 < $GAUSSIAN_EXA_DIR/test0553-4core-parallel.com

In most cases running Gaussian requires setting up a command input file and redirecting that input into the g16 command.

3.3 Creating Gaussian input files

For documentation about how to construct input files see the Gaussian manual. In addition, the program Gaussview is a very good graphical user interface for constructing molecules and for setting up calculations. Finally, these calculation setups can be saved as Gaussian command files and thereafter submitted to the cluster with the help of the queueing system examples below.

3.4 Disk usage

By default, scratch files of Gaussian are placed in GAUSS_SCRDIR, as displayed when loading the Gaussian module. In most cases the module load command of Gaussian should set GAUSS_SCRDIR to point to an optimal node-local file system. When running multiple Gaussian jobs together on one node, a user may want to add one more sub-directory level containing e.g. job id and job name for clarity, if the queueing system has not already done so.
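Such an extra sub-directory level can be sketched as follows. The Slurm variable names are real, but the fallback values and the /tmp default are illustrative assumptions that only serve to make the snippet runnable outside of a job:

```shell
#!/bin/sh
# Extend GAUSS_SCRDIR by one per-job sub-directory (job id plus job name).
# Outside of a Slurm job the fallbacks "manual" and "test" are used instead.
GAUSS_SCRDIR="${GAUSS_SCRDIR:-/tmp}"
GAUSS_SCRDIR="${GAUSS_SCRDIR}/${SLURM_JOB_ID:-manual}_${SLURM_JOB_NAME:-test}"
export GAUSS_SCRDIR
mkdir -p "$GAUSS_SCRDIR"
echo "scratch directory: $GAUSS_SCRDIR"
```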

Predicting how much disk space a specific Gaussian calculation will need can be a difficult task. It requires experience with the methods, the basis sets, the calculated properties and the system you are investigating. The best advice is probably to start with small basis sets and small example systems, run such example calculations and observe their (hopefully small) disk usage while the job is running. Then read the Gaussian documentation about scaling behaviour and basis set sizes (the basis set size of the current calculation is printed at the beginning of the output of the Gaussian job). Finally, try to extrapolate to your desired final system and basis set.

You can also try to specify a fixed amount of disk space for a calculation. This is done by adding a statement like

MaxDisk=50000MB

to the route section of the Gaussian input file. But please be aware that (a) Gaussian does not necessarily obey the specified value and (b) you might force Gaussian to select a slower algorithm when specifying an inappropriate value.

In any case, please make sure that you request more node-local disk space from the queueing system than you have specified in the Gaussian input file. For information on how much node-local disk space is available at the cluster and how to request a certain amount of node-local disk space for a calculation from the queueing system, please consult the cluster specific queueing system documentation as well as the queueing system examples of the Gaussian module as described below.
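As a sketch of that rule, a bit of shell arithmetic can derive a node-local scratch request that is strictly larger than the MaxDisk value from the input file. The 20% safety margin and the variable names are illustrative assumptions, not module conventions:

```shell
#!/bin/sh
# Derive a scratch request in GB from a MaxDisk value in MB,
# adding a ~20% safety margin and rounding up to whole GB.
MAXDISK_MB=50000
SCRATCH_GB=$(( (MAXDISK_MB * 12 / 10 + 1023) / 1024 ))
echo "MaxDisk value: ${MAXDISK_MB} MB -> request --gres=scratch:${SCRATCH_GB}"
```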

Except for very short interactive test jobs please never run Gaussian calculations in any globally mounted directory like your $HOME or $WORK directory.

3.5 Memory usage

Predicting the memory requirements of a job is as difficult as predicting the disk requirements. The strategies are very similar. So for a large new unknown system, start with smaller test systems and smaller basis sets and then extrapolate.

You may specify the memory for a calculation explicitly in the Link 0 section of the Gaussian input file, for example

%Mem=10000MB

Gaussian usually obeys this value rather well. We have seen calculations that exceed the Mem value by at most 2 GB. Therefore it is usually sufficient to request Mem+2GB from the queueing system.
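The Mem+2GB rule of thumb can be sketched with shell arithmetic (the variable names are illustrative):

```shell
#!/bin/sh
# Derive the memory request for the queueing system from the %Mem value
# in the input file plus the ~2 GB overhead mentioned above.
GAUSSIAN_MEM_MB=10000
OVERHEAD_MB=2048
REQUEST_MB=$(( GAUSSIAN_MEM_MB + OVERHEAD_MB ))
echo "use e.g. sbatch --mem=${REQUEST_MB} for %Mem=${GAUSSIAN_MEM_MB}MB"
```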

But please carefully monitor the output of Gaussian when restricting the memory in the input file. Gaussian automatically switches between algorithms (e.g. recalculating values instead of storing them) when too little memory is specified. So when the output indicates that with more memory the integrals could be kept in memory (just one example of such a message), the calculation will run faster when assigning more memory.

In case of shared-memory parallel jobs, the number of workers has only minor influence on the memory consumption (maybe up to 10%). This is because all workers operate on one common data set.

3.6 Using SSD systems efficiently

Compared with conventional disks, SSDs are far more than 1000 times faster when serving random-I/O requests. Therefore some of the default strategies of Gaussian, e.g. recalculating some values instead of storing them on disk, might not be optimal in all cases. Of course this is only relevant when there is not enough RAM to store the intermediate values, e.g. two-centre integrals.

So if you plan to run many huge calculations that do not fit into RAM, you may want to compare the execution time of a job that re-calculates the intermediate values whenever needed with that of a job that forces these values to be written to and read from the node-local SSDs. Depending on how much time it costs to re-calculate the intermediate values, using the SSDs can be much faster.

4 Examples

4.1 Queueing system template provided by Gaussian module

The Gaussian module provides a simple Slurm example (hexanitroethane, C2N6O12) that runs a 4-core parallel single-point energy calculation using method B3LYP and basis set 6-31g(df,pd). To submit the example, do the following steps:

$ module load chem/gaussian
$ cp -v ${GAUSSIAN_EXA_DIR}/bwforcluster-gaussian-example.sbatch ./
$ sbatch bwforcluster-gaussian-example.sbatch

The last step submits the example job script bwforcluster-gaussian-example.sbatch to the queueing system. Once started, all temporary files are kept below the directory $SCRATCH, which is only visible on the compute node where the job is running. When the option --gres=scratch:nnn has been specified while submitting the job script, $SCRATCH points to the node-local SSDs. Otherwise (option --gres=scratch:nnn not specified), $SCRATCH points to a RAM disk. Please carefully read this local file system documentation as well as the comments in the queueing system example script bwforcluster-gaussian-example.sbatch.

4.2 Direct submission of Gaussian command files

For users who do not want to deal with queueing system scripts we have created a submit command that automatically creates and submits queueing system scripts for Gaussian. For example:

$ module load chem/gaussian
$ cp -v $GAUSSIAN_EXA_DIR/test0553-4core-parallel.com ./
$ gauss_sub test0553-4core-parallel.com

4.3 Caveat for Windows users

If you have transferred a Gaussian input file from a Windows computer to Unix, make sure to convert the Windows line breaks (<CR>+<LF>) to Unix line breaks (<LF> only). Otherwise Gaussian will emit strange error messages. Typical Unix commands for this are 'dos2unix' and 'unix2dos'. Example:

$ dos2unix test0553-4core-parallel.com
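If dos2unix is not available, the same conversion can be done with tr. A minimal sketch with placeholder file names:

```shell
#!/bin/sh
# Create a tiny input file with Windows line endings (<CR>+<LF>),
# then delete the <CR> bytes to obtain Unix line endings (<LF> only).
printf '%%NProcShared=4\r\n#P B3LYP/6-31G(d) SP\r\n' > input-crlf.com
tr -d '\r' < input-crlf.com > input-unix.com
if grep -q "$(printf '\r')" input-unix.com; then
    echo "conversion failed"
else
    echo "conversion ok"
fi
```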