Difference between revisions of "JUSTUS2/Software/Gaussian"

From bwHPC Wiki
(Running Gaussian interactively)
 
(30 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  +
{{Softwarepage|chem/gaussian}}
  +
 
{| width=600px class="wikitable"
 
{| width=600px class="wikitable"
 
|-
 
|-
Line 7: Line 9:
 
|-
 
|-
 
| License
 
| License
| Commercial. See: [http://gaussian.com/pricing/ Pricing for Gaussian Products]
+
| Commercial - see [http://gaussian.com/pricing/ Pricing for Gaussian Products]
 
|-
 
|-
 
| Citing
 
| Citing
Line 13: Line 15:
 
|-
 
|-
 
| Links
 
| Links
| [http://www.gaussian.com Homepage]; [http://gaussian.com/running/ Running Gaussian]; [http://gaussian.com/keywords/ Keywords], [http://gaussian.com/iops/ IOps Reference]
+
| [http://www.gaussian.com Homepage]; [http://gaussian.com/man/ Reference Manual]; [http://gaussian.com/running/ Running Gaussian]; [http://gaussian.com/keywords/ Keywords], [http://gaussian.com/iops/ IOps Reference]
 
|-
 
|-
 
| Graphical Interface
 
| Graphical Interface
| Yes. See [[Gaussview]].
+
| See [[Gaussview]]
 
|}
 
|}
   
Line 30: Line 32:
 
<br>
 
<br>
 
<br>
 
<br>
 
= Versions and Availability =
 
A list of versions currently available on the bwForCluster Chemistry can be obtained via command line after login to the bwForCluster:
 
<pre>
 
$ module avail chem/gaussian
 
</pre>
 
   
 
= Parallel computing =
 
= Parallel computing =
 
The binaries of the Gaussian module can run in serial and shared-memory parallel mode. Switching between the serial and parallel version is done via statement
 
The binaries of the Gaussian module can run in serial and shared-memory parallel mode. Switching between the serial and parallel version is done via statement
 
<pre>
 
<pre>
%NProcShare=PPN
+
%NProcShare=N
 
</pre>
 
</pre>
in section ''Link 0 commands'' before the ''route section'' at the beginning of the Gaussian input file. The '''number of cores''' requested from the queueing system '''must''' be identical to '''NProcShared''' as specified in the Gaussian input file. The installed Gaussian binaries are shared-memory parallel only. Therefore only single node jobs do make sense. Without ''NProcShare'' Gaussian will use only one core by default.
+
in section ''Link 0 commands'' before the ''route section'' at the beginning of the Gaussian input file. The '''number of cores''' requested from the queueing system (i.e. ''--ntasks-per-node=N'') '''must''' be identical to '''%NProcShared=N''' as specified in the Gaussian input file. The installed Gaussian binaries are shared-memory parallel only. Therefore only single node jobs do make sense. Without ''NProcShare'' Gaussian will use only one core by default.
 
<br>
 
<br>
   
Line 48: Line 44:
   
 
== Loading the module ==
 
== Loading the module ==
 
You can load the default version of ''Gaussian'' with command:
 
<pre>
 
$ module load chem/gaussian
 
</pre>
 
The Gaussian module does not depend on any other module (no dependencies).
 
 
If you wish to load a specific version you may do so by specifying the version explicitly, e.g.
 
<pre>
 
$ module load chem/gaussian/g16.C.01
 
</pre>
 
to load version ''g16.C.01'' of Gaussian.
 
<br>
 
 
'''In production jobs we strongly recommend to always specify
 
the version when loading a module.'''
 
   
 
== Running Gaussian interactively ==
 
== Running Gaussian interactively ==
 
After loading the Gaussian module you can run a quick interactive example by executing
 
After loading the Gaussian module you can run a quick interactive example by executing
 
<pre>
 
<pre>
$ time g16 < $GAUSSIAN_EXA_DIR/test0553-8core-parallel.com
+
$ time g16 < $GAUSSIAN_EXA_DIR/test0553-4core-parallel.com
 
</pre>
 
</pre>
In most cases running Gaussian requires setting up the command input file and piping that input into g09.
+
In most cases running Gaussian requires setting up the command input file and redirecting that input into command g16.
   
 
== Creating Gaussian input files ==
 
== Creating Gaussian input files ==
   
For documentation about how to construct input files see the [http://www.gaussian.com/g_tech/g_ur/g09help.htm Gaussian manual]. In addition the program [[Gaussview]] is a very good graphical user interface for constructing molecules and for setting up calculations. Finally these calculation setups can be saved as Gaussian command files and thereafter can be submitted to the cluster with help of the queueing system examples below.
+
For documentation about how to construct input files see the [http://gaussian.com/man/ Gaussian manual]. In addition the program [[Gaussview]] is a very good graphical user interface for constructing molecules and for setting up calculations. Finally these calculation setups can be saved as Gaussian command files and thereafter can be submitted to the cluster with help of the queueing system examples below.
 
<br>
 
<br>
   
Line 81: Line 61:
 
By default, scratch files of Gaussian are placed in GAUSS_SCRDIR as displayed when loading the Gaussian module. In most cases the module load command of Gaussian should set the GAUSS_SCRDIR pointing to an optimal node-local file system. When running multiple Gaussian jobs together on one node a user may want to add one more sub-directory level containing e.g. job id and job name for clarity - if not done so already by the queueing system.
 
By default, scratch files of Gaussian are placed in GAUSS_SCRDIR as displayed when loading the Gaussian module. In most cases the module load command of Gaussian should set the GAUSS_SCRDIR pointing to an optimal node-local file system. When running multiple Gaussian jobs together on one node a user may want to add one more sub-directory level containing e.g. job id and job name for clarity - if not done so already by the queueing system.
   
Predicting how much disk space a specific Gaussian calculation requires is a very difficult task. It requires experience with the methods, the basis sets, the calculated properties and the system you are investigating. The best advice is probably to start with small basis sets and small example systems, run such example calculations and observe their (hopefully small) disk usage while the job is running. Then read the Gaussian documentation about scaling behaviour and basis set sizes (the basis set size of the current calculation is printed at the beginning of the output of the Gaussian job). Finally try to extrapolate to your desired final system and basis set.
+
Predicting how much disk space a specific Gaussian calculation can be a difficult task. It requires experience with the methods, the basis sets, the calculated properties and the system you are investigating. The best advice is probably to start with small basis sets and small example systems, run such example calculations and observe their (hopefully small) disk usage while the job is running. Then read the Gaussian documentation about scaling behaviour and basis set sizes (the basis set size of the current calculation is printed at the beginning of the output of the Gaussian job). Finally try to extrapolate to your desired final system and basis set.
   
 
You can also try to specify a fixed amount of disk space for a calculation. This is done by adding a statement like
 
You can also try to specify a fixed amount of disk space for a calculation. This is done by adding a statement like
Line 87: Line 67:
 
%MaxDisk=50000MB
 
%MaxDisk=50000MB
 
</pre>
 
</pre>
to the route section of the Gaussian input file. But please be aware that (a) [[http://www.gaussian.com/g_tech/g_ur/k_maxdisk.htm Gaussian does not necessarily obey the specified value]] and (b) you might force Gaussian to select a slower algorithm when specifying an inappropriate value.
+
to the route section of the Gaussian input file. But please be aware that (a) [http://gaussian.com/maxdisk/ Gaussian does not necessarily obey the specified value]] and (b) you might force Gaussian to select a slower algorithm when specifying an inappropriate value.
   
In any case please make sure that you request sufficient but not far too much node-local disk space from the queueing system. For information on how much node-local disk space is available at the cluster and how to request a certain amount of node-local disk space for a calculation from the queueing system, please consult the cluster specific queueing system documentation as well as the queueing system examples of the Gaussian module as described below.
+
In any case please make sure that you request more node-local disk space from the queueing system then you have specified in the Gausian input file. For information on how much node-local disk space is available at the cluster and how to request a certain amount of node-local disk space for a calculation from the queueing system, please consult the cluster specific queueing system documentation as well as the queueing system examples of the Gaussian module as described below.
   
 
Except for very short interactive test jobs please never run Gaussian calculations in any globally mounted directory like your $HOME or $WORK directory.
 
Except for very short interactive test jobs please never run Gaussian calculations in any globally mounted directory like your $HOME or $WORK directory.
Line 96: Line 76:
 
== Memory usage ==
 
== Memory usage ==
   
Predicting the memory requirements of a job is nearly as difficult as predicting the disk requirements. But the strategies can be very similar. So start with small test systems and small basis sets and then extrapolate.
+
Predicting the memory requirements of a job is as difficult as predicting the disk requirements. The strategies are very similar. So for a large new unknown system, start with smaller test systems and smaller basis sets and then extrapolate.
   
 
You may specify the memory for a calculation explicitly in the route section of the Gaussian input file, for example
 
You may specify the memory for a calculation explicitly in the route section of the Gaussian input file, for example
Line 104: Line 84:
 
Gaussian usually obeys this value rather well. We have seen calculations that exceed the Mem value by at most by 2GB. Therefore it is usually sufficient to request Mem+2GB from the queueing system.
 
Gaussian usually obeys this value rather well. We have seen calculations that exceed the Mem value by at most by 2GB. Therefore it is usually sufficient to request Mem+2GB from the queueing system.
   
But please carefully monitor the output of Gaussian when restricting the memory in the input file. Gaussian automatically switches between algorithms (e.g. recalculating values instead of storing them) when specifying too low memory values. So when the output is indicating that with more memory e.g. the ''integrals'' could be kept in memory the calculation might be much faster when assigning more memory.
+
But please carefully monitor the output of Gaussian when restricting the memory in the input file. Gaussian automatically switches between algorithms (e.g. recalculating values instead of storing them) when specifying too low memory values. So when the output is indicating that with more memory the ''integrals could be kept in memory'' (just an example for one of the messages), the calculation will be faster when assigning more memory.
   
 
In case of shared-memory parallel jobs the number of workers has only minor influence on the memory consumption (maybe up to 10%). This is since all workers work together on one common data set.
 
In case of shared-memory parallel jobs the number of workers has only minor influence on the memory consumption (maybe up to 10%). This is since all workers work together on one common data set.
Line 118: Line 98:
 
== Queueing system template provided by Gaussian module ==
 
== Queueing system template provided by Gaussian module ==
   
The Gaussian module provides a simple Moab example of Hexanitroethan (C2N6O12) that runs an 8 core parallel single energy point calculation using method B3LYP and basis set 6-31g(df,pd). To submit the example do the following steps:
+
The Gaussian module provides a simple Slurm example (Hexanitroethan C2N6O12) that runs a 4 core parallel single energy point calculation using method B3LYP and basis set 6-31g(df,pd). To submit the example do the following steps:
 
<pre>
 
<pre>
$ ws_allocate calc_repo 30; cd $(ws_find calc_repo)
 
$ mkdir my_first_job; cd my_first_job
 
 
$ module load chem/gaussian
 
$ module load chem/gaussian
$ cp -v ${GAUSSIAN_EXA_DIR}/{bwforcluster-gaussian-example.moab,test0553-*.com} ./
+
$ cp -v ${GAUSSIAN_EXA_DIR}/bwforcluster-gaussian-example.sbatch ./
$ msub bwforcluster-gaussian-example.moab
+
$ sbatch bwforcluster-gaussian-example.sbatch
 
</pre>
 
</pre>
The last step submits the job example script ''bwforcluster-gaussian-example.moab'' to the queueing system. Once started on a compute node, all calculations will be done under an unique directory on the [[Batch_Jobs_-_bwForCluster_Chemistry_Features#Disk_Space_and_Resources|local file system ($TMPDIR)]] of that particular compute node. Please '''carefully''' read this ''local file system'' documentation as well as the comments in the queueing system example script ''bwforcluster-gaussian-example.moab''.
+
The last step submits the job example script ''bwforcluster-gaussian-example.sbatch'' to the queueing system. Once started all temporary files are kept below directory [[Hardware_and_Architecture_(bwForCluster_JUSTUS_2)#Storage_Architecture|'''$SCRATCH''']] only visible on the compute node where the job is running. When option '''--gres=scratch:nnn''' has been specified while submitting the job script, then '''$SCRATCH''' points to the node-local SSDs. Otherwise (option '''--gres=scratch:nnn''' has not been specified) '''$SCRATCH''' points to a RAM disk. Please '''carefully''' read this ''local file system'' documentation as well as the comments in the queueing system example script ''bwforcluster-gaussian-example.sbatch''.
 
<br>
 
<br>
   
Line 134: Line 112:
   
 
<pre>
 
<pre>
$ ws_allocate calc_repo 30; cd $(ws_find calc_repo)
 
$ mkdir my_first_job; cd my_first_job
 
 
$ module load chem/gaussian
 
$ module load chem/gaussian
$ cp $GAUSSIAN_EXA_DIR/test0553-8core-parallel.com ./
+
$ cp -v $GAUSSIAN_EXA_DIR/test0553-4core-parallel.com ./
$ gauss_sub test0553-8core-parallel.com
+
$ gauss_sub test0553-4core-parallel.com
 
</pre>
 
</pre>
   
Line 149: Line 125:
   
 
<pre>
 
<pre>
$ dos2unix test0553-8core-parallel.com
+
$ dos2unix test0553-4core-parallel.com
 
</pre>
 
</pre>
   
= Version-specific information =
 
 
For specific information about version ''VERSION'' see the information available via the module system with the command
 
<pre>
 
$ module help chem/gaussian/VERSION
 
</pre>
 
'''Please read the local version-specific module help documentation''' before using the software. The module help contains links to additional documentation and resources as well as information about support contact.
 
 
<br>
 
<br>
   
 
----
 
----
[[Category:Chemistry software]][[Category:bwForCluster_Chemistry]][[Category:BwForCluster_BinAC]]
+
[[Category:Chemistry software]][[Category:bwForCluster_Chemistry]]

Latest revision as of 11:55, 6 December 2022

The main documentation is available via module help chem/gaussian on the cluster. Most software modules for applications provide working example batch scripts.


Description | Content
module load | chem/gaussian
License | Commercial - see Pricing for Gaussian Products
Citing | See Gaussian Citation
Links | Homepage; Reference Manual; Running Gaussian; Keywords; IOps Reference
Graphical Interface | See Gaussview

1 Description

Gaussian is a general purpose quantum chemistry software package for ab initio electronic structure calculations. It provides:

  • ground state calculations for methods such as HF, many DFT functionals, MP2/3/4 or CCSD(T);
  • basic excited state calculations such as TDHF or TDDFT;
  • coupled multi-shell QM/MM calculations (ONIOM);
  • geometry optimizations, transition state searches, molecular dynamics calculations;
  • property and spectra calculations such as IR, UV/VIS, Raman or CD; as well as
  • shared-memory parallel versions for almost all kinds of jobs.

For more information on features please visit Gaussian's Overview of Capabilities and Features and Release Notes web page.

2 Parallel computing

The binaries of the Gaussian module can run in serial and in shared-memory parallel mode. You switch between serial and parallel execution with the statement

%NProcShare=N

in the Link 0 commands section, before the route section at the beginning of the Gaussian input file. The number of cores requested from the queueing system (i.e. --ntasks-per-node=N) must be identical to the value given as %NProcShare=N in the Gaussian input file. The installed Gaussian binaries are shared-memory parallel only, so only single-node jobs make sense. Without NProcShare, Gaussian uses only one core by default.
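
The requirement that both core counts agree can be checked before a calculation starts. The following is a minimal sketch, not part of the Gaussian module: the function names `nproc_from_input` and `cores_match` are hypothetical, and the sketch assumes the input file spells the statement as %NProcShare or %NProcShared.

```shell
#!/bin/bash
# Hypothetical helper: print the core count from the %NProcShare(d) line of a
# Gaussian input file, or 1 if the line is absent (Gaussian's default).
nproc_from_input() {
    local n
    n=$(grep -i -m1 '^[[:space:]]*%nprocshared\{0,1\}=' "$1" \
        | cut -d= -f2 | tr -d '[:space:]')
    echo "${n:-1}"
}

# Hypothetical check: succeed only if the input file asks for exactly the
# number of cores given as second argument.
cores_match() {
    [ "$(nproc_from_input "$1")" -eq "$2" ]
}

# Possible use inside a job script (SLURM_NTASKS_PER_NODE is set by Slurm
# when --ntasks-per-node was given):
#   cores_match job.com "$SLURM_NTASKS_PER_NODE" || echo "core mismatch" >&2
```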

3 Usage

3.1 Loading the module

3.2 Running Gaussian interactively

After loading the Gaussian module you can run a quick interactive example by executing

$ time g16 < $GAUSSIAN_EXA_DIR/test0553-4core-parallel.com

In most cases running Gaussian requires setting up the command input file and redirecting that input into command g16.

3.3 Creating Gaussian input files

For documentation about how to construct input files see the Gaussian manual. In addition the program Gaussview is a very good graphical user interface for constructing molecules and for setting up calculations. Finally these calculation setups can be saved as Gaussian command files and thereafter can be submitted to the cluster with help of the queueing system examples below.
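
To illustrate the layout of such a command file (Link 0 commands first, then the route section, title, charge and multiplicity, and geometry), here is a minimal sketch that writes a small input file from the shell. The molecule, method and resource values are illustrative assumptions, and the function name `write_water_input` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write a small, illustrative Gaussian input file.
# Layout: Link 0 commands, route section, blank line, title, blank line,
# charge and multiplicity, geometry, terminating blank line.
write_water_input() {
    cat > "$1" <<'EOF'
%NProcShare=4
%Mem=10000MB
# B3LYP/6-31G(d) SP

Water single point (illustrative input)

0 1
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200

EOF
}

# Possible use:
#   write_water_input water.com
```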

3.4 Disk usage

By default, scratch files of Gaussian are placed in GAUSS_SCRDIR as displayed when loading the Gaussian module. In most cases the module load command of Gaussian should set the GAUSS_SCRDIR pointing to an optimal node-local file system. When running multiple Gaussian jobs together on one node a user may want to add one more sub-directory level containing e.g. job id and job name for clarity - if not done so already by the queueing system.
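
The extra sub-directory level mentioned above could be created as follows. This is a sketch under assumptions: the function name `make_job_scrdir` is hypothetical, and the directory naming scheme (job id, a dot, job name) is only one possible choice.

```shell
#!/bin/bash
# Hypothetical helper: create a per-job sub-directory below the current
# GAUSS_SCRDIR and print its path. SLURM_JOB_ID and SLURM_JOB_NAME are set by
# Slurm inside a job; the fallbacks only keep the sketch usable outside one.
make_job_scrdir() {
    local base="${GAUSS_SCRDIR:?GAUSS_SCRDIR is not set, load the Gaussian module first}"
    local dir="${base}/${SLURM_JOB_ID:-nojob}.${SLURM_JOB_NAME:-interactive}"
    mkdir -p "$dir" && echo "$dir"
}

# Possible use after loading the module:
#   GAUSS_SCRDIR=$(make_job_scrdir); export GAUSS_SCRDIR
```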

Predicting how much disk space a specific Gaussian calculation requires can be a difficult task. It requires experience with the methods, the basis sets, the calculated properties and the system you are investigating. The best advice is probably to start with small basis sets and small example systems, run such example calculations and observe their (hopefully small) disk usage while the job is running. Then read the Gaussian documentation about scaling behaviour and basis set sizes (the basis set size of the current calculation is printed at the beginning of the output of the Gaussian job). Finally, try to extrapolate to your desired final system and basis set.
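
Observing the disk usage of a running test job can be as simple as calling `du` on the scratch directory from time to time. A minimal sketch, assuming the scratch files live below GAUSS_SCRDIR; the function name `scratch_usage_mb` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the current disk usage of a directory in MB.
scratch_usage_mb() {
    du -sm "$1" | cut -f1
}

# Possible use on the compute node while the job is running, printing the
# usage once per minute:
#   while sleep 60; do scratch_usage_mb "$GAUSS_SCRDIR"; done
```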

You can also try to specify a fixed amount of disk space for a calculation. This is done by adding a statement like

MaxDisk=50000MB

to the route section of the Gaussian input file. But please be aware that (a) Gaussian does not necessarily obey the specified value and (b) you might force Gaussian to select a slower algorithm when specifying an inappropriate value.

In any case, please make sure that you request more node-local disk space from the queueing system than you have specified in the Gaussian input file. For information on how much node-local disk space is available on the cluster and how to request a certain amount of node-local disk space for a calculation from the queueing system, please consult the cluster-specific queueing system documentation as well as the queueing system examples of the Gaussian module as described below.

Except for very short interactive test jobs please never run Gaussian calculations in any globally mounted directory like your $HOME or $WORK directory.

3.5 Memory usage

Predicting the memory requirements of a job is as difficult as predicting the disk requirements. The strategies are very similar. So for a large new unknown system, start with smaller test systems and smaller basis sets and then extrapolate.

You may specify the memory for a calculation explicitly in the Link 0 section of the Gaussian input file, for example

%Mem=10000MB

Gaussian usually obeys this value rather well. We have seen calculations that exceed the Mem value by at most 2GB. Therefore it is usually sufficient to request Mem+2GB from the queueing system.
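
The Mem+2GB rule can be turned into a small calculation. The sketch below is an illustration only: the function name `slurm_mem_mb` is hypothetical, it handles only the units MB and GB, and it assumes 1 GB = 1024 MB.

```shell
#!/bin/bash
# Hypothetical helper: read %Mem from a Gaussian input file and print the
# amount to request from the queueing system (Mem + 2 GB), in MB.
# Only the units MB and GB are handled in this sketch.
slurm_mem_mb() {
    local val num unit
    val=$(grep -i -m1 '^[[:space:]]*%mem=' "$1" \
        | cut -d= -f2 | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]')
    num=${val%%[A-Z]*}      # leading digits, e.g. 10000
    unit=${val#"$num"}      # remaining unit, e.g. MB
    case "$unit" in
        MB) echo $(( num + 2048 )) ;;
        GB) echo $(( num * 1024 + 2048 )) ;;
        *)  echo "unsupported or missing %Mem value: '$val'" >&2; return 1 ;;
    esac
}

# Possible use when submitting (the M suffix tells Slurm the value is in MB):
#   sbatch --mem="$(slurm_mem_mb job.com)M" job.sbatch
```

For the example value %Mem=10000MB this yields a request of 12048 MB.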

But please monitor the Gaussian output carefully when restricting the memory in the input file. Gaussian automatically switches between algorithms (e.g. recalculating values instead of storing them) when the specified memory is too low. So when the output indicates that, with more memory, the integrals could be kept in memory (just one example of such messages), the calculation will be faster if you assign more memory.

In case of shared-memory parallel jobs the number of workers has only minor influence on the memory consumption (maybe up to 10%). This is since all workers work together on one common data set.

3.6 Using SSD systems efficiently

Compared with conventional disks, SSDs are far more than 1000 times faster when serving random-IO requests. Therefore some of Gaussian's default strategies, e.g. recalculating some values instead of storing them on disk, might not be optimal in all cases. Of course this is only relevant when there is not enough RAM to store the intermediate values, e.g. two centre integrals, etc.

So if you plan to do many huge calculations that do not fit into the RAM, you may want to compare the execution time of a job that re-calculates the intermediate values whenever needed with that of a job that forces these values to be written to and read from the node-local SSDs. Depending on how much time it costs to re-calculate the intermediate values, using the SSDs can be much faster.

4 Examples

4.1 Queueing system template provided by Gaussian module

The Gaussian module provides a simple Slurm example (hexanitroethane, C2N6O12) that runs a 4-core parallel single-point energy calculation using the method B3LYP and the basis set 6-31g(df,pd). To submit the example, do the following steps:

$ module load chem/gaussian
$ cp -v ${GAUSSIAN_EXA_DIR}/bwforcluster-gaussian-example.sbatch ./
$ sbatch bwforcluster-gaussian-example.sbatch

The last step submits the example job script bwforcluster-gaussian-example.sbatch to the queueing system. Once the job has started, all temporary files are kept below the directory $SCRATCH, which is visible only on the compute node where the job is running. If the option --gres=scratch:nnn was specified when submitting the job script, $SCRATCH points to the node-local SSDs; otherwise it points to a RAM disk. Please carefully read the local file system documentation as well as the comments in the example script bwforcluster-gaussian-example.sbatch.
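
For orientation, the general shape of such a job script can be sketched as follows. This is not the script shipped with the module: the generator function `write_gaussian_job` is hypothetical, and the time limit, the memory request and the scratch size (including its unit) are placeholder assumptions that must be adapted to the cluster documentation.

```shell
#!/bin/bash
# Hypothetical generator: write a minimal single-node Slurm script for a
# Gaussian input file. This is NOT the example script shipped with the module;
# time limit, memory and scratch size are placeholder values.
# Arguments: input file name, number of cores, name of the script to write.
write_gaussian_job() {
    local input="$1" cores="$2" script="$3"
    cat > "$script" <<EOF
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=${cores}
#SBATCH --time=01:00:00
#SBATCH --mem=12048M
#SBATCH --gres=scratch:50

module load chem/gaussian
# Run in the node-local scratch directory, not in a globally mounted one.
cp "\${SLURM_SUBMIT_DIR}/${input}" "\${SCRATCH}/"
cd "\${SCRATCH}"
g16 < "${input}" > "${input%.com}.log"
cp "${input%.com}.log" "\${SLURM_SUBMIT_DIR}/"
EOF
}

# Possible use (the core count must equal %NProcShare in the input file):
#   write_gaussian_job test0553-4core-parallel.com 4 my_job.sbatch
#   sbatch my_job.sbatch
```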

4.2 Direct submission of Gaussian command files

For users who do not want to deal with queueing system scripts we have created a submit command that automatically creates and submits queueing system scripts for Gaussian. For example:

$ module load chem/gaussian
$ cp -v $GAUSSIAN_EXA_DIR/test0553-4core-parallel.com ./
$ gauss_sub test0553-4core-parallel.com

4.3 Caveat for Windows users

If you have transferred the Gaussian input file from a Windows computer to Unix then make sure to convert the line breaks of Windows (<CR>+<LF>) to Unix (only <LF>). Otherwise Gaussian will write strange error messages. Typical Unix commands for that are: 'dos2unix' and 'unix2dos'. Example:

$ dos2unix test0553-4core-parallel.com
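
Whether a file still contains Windows line breaks can be checked before converting it. A minimal sketch, assuming a bash shell; the function name `has_crlf` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the file contains at least one line that
# ends with a carriage return, i.e. a Windows (<CR>+<LF>) line break.
has_crlf() {
    grep -q $'\r$' "$1"
}

# Possible use:
#   has_crlf test0553-4core-parallel.com && dos2unix test0553-4core-parallel.com
```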