Helix/Software/Matlab: Difference between revisions
H Winkhardt (talk | contribs) (Seite erstellt) |
H Winkhardt (talk | contribs) (Formatting) |
||
(9 intermediate revisions by the same user not shown) | |||
Line 37: | Line 37: | ||
Note: Do not start a long-duration interactive MATLAB session on a login node of the cluster. Submit an [[Helix/Slurm#Interactive_Jobs | interactive job]] and start MATLAB from within the dedicated compute node assigned to you by the queueing system. |
Note: Do not start a long-duration interactive MATLAB session on a login node of the cluster. Submit an [[Helix/Slurm#Interactive_Jobs | interactive job]] and start MATLAB from within the dedicated compute node assigned to you by the queueing system. |
||
The following command will execute a MATLAB script or function named "example" |
The following generic command will execute a MATLAB script or function named "example": |
||
<pre>$ matlab -nodisplay -r example > result.out 2>&1</pre> |
<pre>$ matlab -nodisplay -r example > result.out 2>&1</pre> |
||
The output of this session will be redirected to the file result.out. The option < |
The output of this session will be redirected to the file result.out. The option <syntaxhighlight style="border:0px" inline=1>-r</syntaxhighlight> executes the MATLAB statement non-interactively. |
||
= Parallel Computing Using MATLAB = |
= Parallel Computing Using MATLAB = |
||
Line 50: | Line 50: | ||
A large number of built-in MATLAB functions may utilize multiple cores automatically without any code modifications required. This is referred to as implicit multi-threading and must be strictly distinguished from explicit parallelism provided by the Parallel Computing Toolbox (PCT) which requires specific commands in your code in order to create threads. |
A large number of built-in MATLAB functions may utilize multiple cores automatically without any code modifications required. This is referred to as implicit multi-threading and must be strictly distinguished from explicit parallelism provided by the Parallel Computing Toolbox (PCT) which requires specific commands in your code in order to create threads. |
||
Implicit threading particularly takes place for linear algebra operations (such as the solution to a linear system A\b or matrix products A*B) and FFT operations. Many other high-level MATLAB functions do also benefit from multi-threading capabilities of their underlying routines. |
Implicit threading particularly takes place for linear algebra operations (such as the solution to a linear system A\b or matrix products A*B) and FFT operations. Many other high-level MATLAB functions do also benefit from multi-threading capabilities of their underlying routines. If multi-threading is not desired, single-threading can be enforced by adding the command line option <syntaxhighlight style="border:0px" inline=1>-singleCompThread</syntaxhighlight>. |
||
Whenever implicit threading takes place, MATLAB will detect the total number of cores that exist on a machine and by default makes use of all of them. This has very important implications for MATLAB jobs in HPC environments with shared-node job scheduling policy (i.e. with multiple users sharing one compute node). Due to this behaviour, a MATLAB job may take over more compute resources than assigned by the queueing system of the cluster (and thereby taking away these resources from all other users with running jobs on the same node - including your own jobs). |
Whenever implicit threading takes place, MATLAB will detect the total number of cores that exist on a machine and by default makes use of all of them. This has very important implications for MATLAB jobs in HPC environments with shared-node job scheduling policy (i.e. with multiple users sharing one compute node). Due to this behaviour, a MATLAB job may take over more compute resources than assigned by the queueing system of the cluster (and thereby taking away these resources from all other users with running jobs on the same node - including your own jobs). |
||
Therefore, when running in multi-threaded mode, MATLAB always requires the user's intervention to not allocate all cores of the machine (unless requested so from the queueing system). The number of threads must be controlled from within the code by means of the < |
Therefore, when running in multi-threaded mode, MATLAB always requires the user's intervention to not allocate all cores of the machine (unless requested so from the queueing system). The number of threads must be controlled from within the code by means of the <syntaxhighlight style="border:0px" inline=1>maxNumCompThreads(N)</syntaxhighlight> function or, alternatively, with the <syntaxhighlight style="border:0px" inline=1>feature('numThreads', N)</syntaxhighlight> function (which is undocumented). |
||
== Using the Parallel Computing Toolbox (PCT) == |
== Using the Parallel Computing Toolbox (PCT) == |
||
Line 64: | Line 64: | ||
To solve this issue, each MATLAB job should explicitly set a unique location where these files are created. This can be accomplished by the following snippet of code added to your MATLAB script. |
To solve this issue, each MATLAB job should explicitly set a unique location where these files are created. This can be accomplished by the following snippet of code added to your MATLAB script. |
||
<pre> |
|||
{{bwFrameA| |
|||
<source lang="Matlab"> |
|||
% create a local cluster object |
% create a local cluster object |
||
pc = parcluster('local') |
pc = parcluster('local') |
||
% get the number of dedicated cores from environment |
% get the number of dedicated cores from environment |
||
nprocs = str2num(getenv('SLURM_NPROCS')) |
|||
% explicitly set the JobStorageLocation to the tmp directory that is unique to each cluster job (and is on local, fast scratch) |
% you may explicitly set the JobStorageLocation to the tmp directory that is unique to each cluster job (and is on local, fast scratch) |
||
parpool_tmpdir = [getenv('TMP'),'/.matlab/local_cluster_jobs/slurm_jobID_',getenv('SLURM_JOB_ID')] |
parpool_tmpdir = [getenv('TMP'),'/.matlab/local_cluster_jobs/slurm_jobID_',getenv('SLURM_JOB_ID')] |
||
mkdir(parpool_tmpdir) |
mkdir(parpool_tmpdir) |
||
Line 79: | Line 77: | ||
% start the parallel pool |
% start the parallel pool |
||
parpool(pc, |
parpool(pc, nprocs) |
||
</pre> |
|||
</source> |
|||
}} |
|||
If a large number of MATLAB-jobs are run in parallel, they can also conflict when writing generic information to <syntaxhighlight style="border:0px" inline=1>~/.matlab</syntaxhighlight>. This can be circumvented by setting <syntaxhighlight style="border:0px" inline=1>$MATLAB_PREFDIR</syntaxhighlight> to different directories in your Batch-script, e.g. |
|||
Note: The code snippet also sets the correct number of parallel workers in MATLAB according to the total number of processes dedicated to the job given by the environment variable <span style="background:#edeae2;margin:2px;padding:1px;border:1px dotted #808080">$SLURM_NPROCS</span> in the job submission file. |
|||
<pre>export MATLAB_PREFDIR=$TMP</pre> |
|||
== Using a different implementation of BLAS/LAPACK == |
|||
By default, Matlab uses a version of Intel MKL as its BLAS/LAPACK library. It is possible to manually change this to different libraries by setting the <syntaxhighlight style="border:0px" inline=1>BLAS_VERSION</syntaxhighlight> and <syntaxhighlight style="border:0px" inline=1>LAPACK_VERSION</syntaxhighlight> environment variables. The following lines can be added to the batch-script to change it, in this example to BLIS and Flame, which are optimized for AMD processors: |
|||
<pre> |
|||
module load numlib/aocl/3.2.0 |
|||
export BLAS_VERSION=$AOCL_LIB_DIR/libblis-mt.so |
|||
export LAPACK_VERSION=$AOCL_LIB_DIR/libflame.so |
|||
export BLIS_NUM_THREADS=$SLURM_NTASKS |
|||
</pre> |
|||
This can increase performance depending on the task, for example large matrix multiplications, but caution is advised. |
|||
= General Performance Tips for MATLAB = |
= General Performance Tips for MATLAB = |
||
Line 94: | Line 108: | ||
The performance benefit of pre-allocation is illustrated with the following example code. |
The performance benefit of pre-allocation is illustrated with the following example code. |
||
<pre> |
|||
{{bwFrameA| |
|||
<source lang="Matlab"> |
|||
% prealloc.m |
% prealloc.m |
||
Line 117: | Line 129: | ||
end |
end |
||
toc |
toc |
||
</pre> |
|||
</source> |
|||
}} |
|||
On a compute node, the result may look like this: |
On a compute node, the result may look like this: |
||
Line 131: | Line 141: | ||
Please recognize that the code runs almost 30 times faster with pre-allocation. |
Please recognize that the code runs almost 30 times faster with pre-allocation. |
||
= Compile MATLAB binaries with mcc = |
|||
MATLAB on Helix comes with a compiler, <code>mcc</code>, that can be used to create binaries from MATLAB-code. |
|||
Stand-alone MATLAB programs compiled with <code>mcc</code> do not require any license tokens at runtime and you can start jobs in parallel without any risk of running out of licences. |
Latest revision as of 14:04, 8 October 2024
The main documentation is available via |
Description | Content |
---|---|
module load | math/matlab |
License | Academic License/Commercial |
Citing | n/a |
Links | MATLAB Homepage | MathWorks Homepage | Support and more |
Graphical Interface | No |
Description
MATLAB (MATrix LABoratory) is a high-level programming language and interactive computing environment for numerical calculation and data visualization.
Loading MATLAB
The preferable way is to run the MATLAB command line interface without GUI:
$ matlab -nodisplay
An interactive MATLAB session with graphical user interface (GUI) can be started with the command (requires X11 forwarding enabled for your ssh login):
$ matlab
Note: Do not start a long-duration interactive MATLAB session on a login node of the cluster. Submit an interactive job and start MATLAB from within the dedicated compute node assigned to you by the queueing system.
The following generic command will execute a MATLAB script or function named "example":
$ matlab -nodisplay -r example > result.out 2>&1
The output of this session will be redirected to the file result.out. The option -r executes the MATLAB statement non-interactively.
Parallel Computing Using MATLAB
Parallelization of MATLAB jobs is realized via the built-in multi-threading provided by MATLAB's BLAS and FFT implementation and the parallel computing functionality of MATLAB's Parallel Computing Toolbox (PCT).
Implicit Threading
A large number of built-in MATLAB functions may utilize multiple cores automatically without any code modifications required. This is referred to as implicit multi-threading and must be strictly distinguished from explicit parallelism provided by the Parallel Computing Toolbox (PCT) which requires specific commands in your code in order to create threads.
Implicit threading takes place in particular for linear algebra operations (such as the solution of a linear system A\b or matrix products A*B) and FFT operations. Many other high-level MATLAB functions also benefit from the multi-threading capabilities of their underlying routines. If multi-threading is not desired, single-threading can be enforced by adding the command line option -singleCompThread.
Whenever implicit threading takes place, MATLAB detects the total number of cores on the machine and by default uses all of them. This has important implications for MATLAB jobs in HPC environments with a shared-node job scheduling policy (i.e. with multiple users sharing one compute node). Because of this behaviour, a MATLAB job may take over more compute resources than the queueing system of the cluster assigned to it, taking those resources away from all other jobs running on the same node, including your own.
Therefore, when running in multi-threaded mode, you must always make sure that MATLAB does not use all cores of the machine (unless you requested all of them from the queueing system). The number of threads must be controlled from within the code by means of the maxNumCompThreads(N) function or, alternatively, with the undocumented feature('numThreads', N) function.
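As a minimal sketch of how the thread limit could be tied to the job's allocation from a batch script, the helper below builds the MATLAB statement that is passed to -r, so that maxNumCompThreads is called before the script itself runs. It assumes that the allocated core count is exposed through SLURM_CPUS_PER_TASK (an assumption, not stated on this page; the helper falls back to a single thread when the variable is unset). The function name matlab_thread_statement is hypothetical.

```shell
# Hypothetical helper: print a MATLAB statement that first limits implicit
# threading to the allocated cores and then runs the given script.
# Assumption: SLURM_CPUS_PER_TASK holds the number of cores for this job;
# if it is unset, one thread is used.
matlab_thread_statement() {
    printf 'maxNumCompThreads(%s); %s' "${SLURM_CPUS_PER_TASK:-1}" "$1"
}

# In a job script the statement would be handed to -r, for example:
#   matlab -nodisplay -r "$(matlab_thread_statement example)" > result.out 2>&1
```

Building the statement in the batch script keeps the thread count next to the resource request, so the two are less likely to drift apart than with a number hard-coded in the MATLAB file.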
Using the Parallel Computing Toolbox (PCT)
By using the PCT one can make explicit use of several cores on multicore processors to parallelize MATLAB applications without MPI programming. Under MATLAB version 8.4 and earlier, this toolbox provides 12 workers (MATLAB computational engines) to execute applications locally on a single multicore node. Under MATLAB version 8.5 and later, the number of workers available is equal to the number of cores on a single node (up to a maximum of 512).
If multiple PCT jobs are running at the same time, they all write temporary MATLAB job information to the same location. This race condition can cause one or more of the parallel MATLAB jobs to fail to use the parallel functionality of the toolbox.
To solve this issue, each MATLAB job should explicitly set a unique location where these files are created. This can be accomplished by the following snippet of code added to your MATLAB script.
% create a local cluster object
pc = parcluster('local')
% get the number of dedicated cores from environment
nprocs = str2num(getenv('SLURM_NPROCS'))
% you may explicitly set the JobStorageLocation to the tmp directory that is unique to each cluster job (and is on local, fast scratch)
parpool_tmpdir = [getenv('TMP'),'/.matlab/local_cluster_jobs/slurm_jobID_',getenv('SLURM_JOB_ID')]
mkdir(parpool_tmpdir)
pc.JobStorageLocation = parpool_tmpdir
% start the parallel pool
parpool(pc, nprocs)
If a large number of MATLAB jobs run in parallel, they can also conflict when writing generic information to ~/.matlab. This can be circumvented by setting $MATLAB_PREFDIR to a different directory for each job in your batch script, e.g.
export MATLAB_PREFDIR=$TMP
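If it is not certain that $TMP is already unique to each job, a dedicated subdirectory can be created instead. The following is only a sketch under assumptions: the directory name matlab_prefs_<job id> is made up for illustration, and the fallbacks to /tmp and to the shell's process ID are there so the lines also work outside a Slurm job.

```shell
# Sketch: one preferences directory per job.
# Assumptions: TMP may be unset (fall back to /tmp) and SLURM_JOB_ID may be
# unset outside a job (fall back to the PID of this shell).
prefdir="${TMP:-/tmp}/matlab_prefs_${SLURM_JOB_ID:-$$}"
mkdir -p "$prefdir"
export MATLAB_PREFDIR="$prefdir"
```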
Using a different implementation of BLAS/LAPACK
By default, MATLAB uses a version of Intel MKL as its BLAS/LAPACK library. It is possible to switch to different libraries by setting the BLAS_VERSION and LAPACK_VERSION environment variables. The following lines can be added to the batch script to do so, in this example to BLIS and libFLAME, which are optimized for AMD processors:
module load numlib/aocl/3.2.0
export BLAS_VERSION=$AOCL_LIB_DIR/libblis-mt.so
export LAPACK_VERSION=$AOCL_LIB_DIR/libflame.so
export BLIS_NUM_THREADS=$SLURM_NTASKS
This can increase performance depending on the task, for example large matrix multiplications, but caution is advised.
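One cautious way to apply these settings is to export the variables only when both library files can actually be read, so that a missing module or a changed path leaves MATLAB on its default libraries. The sketch below assumes nothing beyond the file names shown above; the function name use_aocl_blas is hypothetical.

```shell
# Hypothetical helper: point MATLAB at the AOCL libraries only if both
# shared objects are readable; otherwise leave the environment untouched.
use_aocl_blas() {
    libdir="$1"
    if [ -r "$libdir/libblis-mt.so" ] && [ -r "$libdir/libflame.so" ]; then
        export BLAS_VERSION="$libdir/libblis-mt.so"
        export LAPACK_VERSION="$libdir/libflame.so"
        return 0
    fi
    echo "AOCL libraries not found in '$libdir'; keeping MATLAB defaults" >&2
    return 1
}

# In a batch script, after "module load numlib/aocl/3.2.0":
#   use_aocl_blas "$AOCL_LIB_DIR" && export BLIS_NUM_THREADS="$SLURM_NTASKS"
```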
General Performance Tips for MATLAB
MATLAB data structures (arrays or matrices) are dynamic in size, i.e. MATLAB will automatically resize the structure on demand. Although this seems to be convenient, MATLAB continually needs to allocate a new chunk of memory and copy over the data to the new block of memory as the array or matrix grows in a loop. This may take a significant amount of extra time during execution of the program.
Code performance can often be drastically improved by pre-allocating memory for the final expected size of the array or matrix before actually starting the processing loop. In order to pre-allocate an array of strings, you can use MATLAB's built-in cell function. In order to pre-allocate an array or matrix of numbers, you can use MATLAB's built-in zeros function.
The performance benefit of pre-allocation is illustrated with the following example code.
% prealloc.m
clear all;
num=10000000;
disp('Without pre-allocation:')
tic
for i=1:num
    a(i)=i;
end
toc
disp('With pre-allocation:')
tic
b=zeros(1,num);
for i=1:num
    b(i)=i;
end
toc
On a compute node, the result may look like this:
Without pre-allocation:
Elapsed time is 2.879446 seconds.
With pre-allocation:
Elapsed time is 0.097557 seconds.
Note that the code runs almost 30 times faster with pre-allocation.
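The factor follows directly from the two elapsed times in the sample output. As a small sketch, the division can be reproduced on the command line; the variable names are chosen here for illustration only.

```shell
# Elapsed times copied from the sample output above (in seconds).
without_prealloc=2.879446
with_prealloc=0.097557

# awk performs the floating-point division; LC_ALL=C keeps "." as the
# decimal mark regardless of the user's locale.
speedup=$(LC_ALL=C awk -v a="$without_prealloc" -v b="$with_prealloc" \
    'BEGIN { printf "%.1f", a / b }')
echo "speedup: ${speedup}x"   # prints "speedup: 29.5x"
```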
Compile MATLAB binaries with mcc
MATLAB on Helix comes with a compiler, mcc, that can be used to create binaries from MATLAB code.
Stand-alone MATLAB programs compiled with mcc do not require any license tokens at runtime, so you can start jobs in parallel without any risk of running out of licenses.