BwUniCluster 2.0 Maintenance/2020-10/Software Issues

From bwHPC Wiki
After the last regular [[BwUniCluster_2.0_Maintenance/2020-10|maintenance]] interval (from 06.10.2020 to 13.10.2020), the following issues with Intel MPI exist:

* Intel MPI 2018 is incompatible with Red Hat 8.2. Any invocation, even a simple "Hello World" MPI program, will result in a crash. The ''mpi/impi/2018'' module has therefore been removed.

* A bug in Intel MPI 2019.x causes crashes when multiple MPI applications linked against Intel MPI 2019.x run on the same node (e.g. in the "single" partition). The first application runs normally, but all others crash. This can be worked around by setting the environment variable '''I_MPI_HYDRA_TOPOLIB="ipl"'''. The ''mpi/impi/2019'' and ''mpi/impi/2020'' modules provided on the cluster already set this variable.

* A bug in Intel MPI 2019.x causes incorrect CPU binding/affinity in conjunction with the Slurm batch system used on the clusters. All MPI ranks run on the same CPU core instead of being distributed across all available CPU cores. This can be worked around by setting the environment variable '''I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--cpu-bind=none"'''. The ''mpi/impi/2019'' and ''mpi/impi/2020'' modules provided on the cluster already set this variable.
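
Users who run a private copy of Intel MPI 2019.x instead of the cluster modules would need to set both variables themselves. A minimal sketch for the top of a job script follows; the application name in the final comment is hypothetical, and the values are the ones listed above.

```shell
# Workaround for crashes when several Intel MPI 2019.x applications
# share one node (value taken from the list above).
export I_MPI_HYDRA_TOPOLIB="ipl"

# Workaround for all MPI ranks being bound to the same CPU core
# when launched under Slurm (value taken from the list above).
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS="--cpu-bind=none"

# The application would then be started as usual, for example:
#   mpirun ./my_mpi_app      (hypothetical application name)
```

This is not needed when ''mpi/impi/2019'' or ''mpi/impi/2020'' is loaded, since those modules already export both variables.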

A number of third-party software modules installed on the cluster ship with their own copies of various Intel MPI library versions. These software modules fall into the following categories:

== Corrected software modules ==

The following software modules have been corrected by the HPC software maintainers. They should currently work as expected.

* StarCCM+: The included Intel MPI 2018 library was replaced with a more recent version.

* LS-DYNA: The included Intel MPI library was replaced with a more recent version.

* CST: The license does not allow multi-node jobs, so the problematic code paths cannot be used.

== Software modules with known fixes ==

The following software modules require additional user interaction to work:

* ''ANSYS Mechanical'' and ''Fluent'': The software has to be switched to OpenMPI using the '''-mpi=openmpi''' command line argument.

* ''ANSYS CFX'': The software has to be switched to OpenMPI using the '''-start-method 'Open MPI Distributed Parallel' ''' command line argument.
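
One way to avoid forgetting these arguments is to wrap the launchers in small shell functions. The sketch below assumes that the loaded ANSYS module puts the usual ''fluent'' and ''cfx5solve'' launchers on the PATH. The wrapper names are hypothetical, and every other argument is passed through unchanged.

```shell
# Hypothetical wrapper: start Fluent with the OpenMPI switch appended.
fluent_openmpi() {
    fluent "$@" -mpi=openmpi
}

# Hypothetical wrapper: start the CFX solver with the Open MPI start method.
cfx_openmpi() {
    cfx5solve "$@" -start-method 'Open MPI Distributed Parallel'
}
```

A call such as <code>fluent_openmpi 3d -g</code> would then run <code>fluent 3d -g -mpi=openmpi</code>; the arguments shown here are only an illustration.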

== Software modules without known fixes ==

For the following software modules there is currently no known fix:

* ''cae/abaqus/2019'' (comes with Intel MPI 2017). We are working on a solution.

Non-working software modules will not be removed because they can still be used for pre-/post-processing and single-node parallelisation using e.g. OpenMP.

Latest revision as of 10:17, 15 August 2023