Helix/Software/Matlab
The main documentation is available via `module help math/matlab` on the cluster.
| Description | Content |
|---|---|
| module load | math/matlab |
| License | Academic License/Commercial |
| Citing | n/a |
| Links | MATLAB Homepage \| MathWorks Homepage \| Support and more |
| Graphical Interface | No |
Description
MATLAB (MATrix LABoratory) is a high-level programming language and interactive computing environment for numerical calculation and data visualization.
Loading MATLAB
The preferred way is to run the MATLAB command line interface without GUI:
$ matlab -nodisplay
An interactive MATLAB session with graphical user interface (GUI) can be started with the command (requires X11 forwarding enabled for your ssh login):
$ matlab
Note: Do not start a long-duration interactive MATLAB session on a login node of the cluster. Submit an interactive job and start MATLAB from within the dedicated compute node assigned to you by the queueing system.
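For example, an interactive session on a compute node could be requested like this (a sketch; the resource values are placeholders and the exact salloc options depend on the cluster's Slurm configuration):

```shell
# request an interactive job with 4 cores for one hour (example values)
$ salloc --ntasks=1 --cpus-per-task=4 --time=01:00:00 --mem=8gb
# on the allocated compute node, load the module and start MATLAB
$ module load math/matlab
$ matlab -nodisplay
```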
The following generic command will execute a MATLAB script or function named "example":
$ matlab -nodisplay -r example > result.out 2>&1
The output of this session is redirected to the file result.out. The option -r executes the given MATLAB statement (here, the name of the script or function) non-interactively.
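In batch mode, the same command can be embedded in a Slurm job script (a minimal sketch; the resource values and file names are placeholders, adjust them to your job):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00
#SBATCH --mem=4gb

module load math/matlab
matlab -nodisplay -r example > result.out 2>&1
```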
Parallel Computing Using MATLAB
Parallelization of MATLAB jobs is realized either via the built-in multi-threading of MATLAB's BLAS and FFT implementations, or via the explicit parallel computing functionality of MATLAB's Parallel Computing Toolbox (PCT).
Implicit Threading
Many built-in MATLAB functions utilize multiple cores automatically, without any code modifications. This is referred to as implicit multi-threading and must be strictly distinguished from the explicit parallelism provided by the Parallel Computing Toolbox (PCT), which requires specific commands in your code to distribute work.
Implicit threading particularly takes place for linear algebra operations (such as the solution to a linear system A\b or matrix products A*B) and FFT operations. Many other high-level MATLAB functions do also benefit from multi-threading capabilities of their underlying routines. However, the user can still enforce single-threaded mode by adding the command line option -singleCompThread.
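For example, to run the script from above in strictly single-threaded mode:

```shell
$ matlab -nodisplay -singleCompThread -r example > result.out 2>&1
```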
Whenever implicit threading takes place, MATLAB detects the total number of cores on the machine and by default uses all of them. This has important implications for MATLAB jobs in HPC environments with a shared-node job scheduling policy (i.e. multiple users sharing one compute node): a MATLAB job may occupy more compute resources than assigned by the queueing system of the cluster, thereby taking these resources away from all other users with jobs running on the same node (including your own).
Therefore, when running in multi-threaded mode on a shared node, the user must ensure that MATLAB does not allocate all cores of the machine unless they were actually requested from the queueing system. The number of threads can be controlled from within the code by means of the maxNumCompThreads(N) function (which is officially marked as deprecated) or, alternatively, with the feature('numThreads', N) function (which is undocumented).
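A minimal sketch of how the thread count can be tied to the Slurm allocation (this assumes the job was submitted with --cpus-per-task, which Slurm exposes as SLURM_CPUS_PER_TASK):

```matlab
% read the number of cores allocated by Slurm
% (assumption: the job was submitted with --cpus-per-task)
ncores = str2double(getenv('SLURM_CPUS_PER_TASK'));
if isnan(ncores)
    ncores = 1;   % fall back to a single thread outside of a Slurm job
end
% limit MATLAB's implicit threading to the allocated cores
maxNumCompThreads(ncores);
```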
Using the Parallel Computing Toolbox (PCT)
By using the PCT one can make explicit use of several cores on multicore processors to parallelize MATLAB applications without MPI programming. Under MATLAB version 8.4 and earlier, this toolbox provides 12 workers (MATLAB computational engines) to execute applications locally on a single multicore node. Under MATLAB version 8.5 and later, the number of workers available is equal to the number of cores on a single node (up to a maximum of 512).
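As a minimal illustration of explicit parallelism with the PCT, the following sketch estimates pi with a parfor loop (the pool size of 4 is a placeholder; in a cluster job it should match the number of allocated cores):

```matlab
% minimal PCT example: estimate pi via Monte Carlo with a parfor loop
parpool('local', 4);                 % start 4 local workers (placeholder value)
n = 1e6;
hits = 0;
parfor i = 1:n
    p = rand(1, 2);
    hits = hits + (sum(p.^2) <= 1);  % reduction variable across workers
end
fprintf('pi is approximately %f\n', 4*hits/n);
delete(gcp('nocreate'));             % shut down the pool
```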
If multiple PCT jobs run at the same time, they all write temporary MATLAB job information to the same location. This race condition can cause one or more of the parallel MATLAB jobs to fail to use the parallel functionality of the toolbox.
To avoid this, each MATLAB job should explicitly set a unique location where these files are created. This can be accomplished by adding the following snippet of code to your MATLAB script:
% create a local cluster object
pc = parcluster('local');
% get the number of dedicated cores from the environment
num_workers = str2num(getenv('SLURM_NPROCS'));
% explicitly set the JobStorageLocation to a tmp directory that is unique
% to each cluster job (and resides on local, fast scratch)
parpool_tmpdir = [getenv('TMP'),'/.matlab/local_cluster_jobs/slurm_jobID_',getenv('SLURM_JOB_ID')];
mkdir(parpool_tmpdir);
pc.JobStorageLocation = parpool_tmpdir;
% start the parallel pool
parpool(pc,num_workers);
Additionally, if a large number of MATLAB jobs are run in parallel, they can also conflict when writing generic information to ~/.matlab. This can be circumvented by setting $MATLAB_PREFDIR to different directories, e.g.
export MATLAB_PREFDIR=$TMP
General Performance Tips for MATLAB
MATLAB data structures (arrays or matrices) are dynamic in size, i.e. MATLAB automatically resizes the structure on demand. Although this is convenient, MATLAB continually needs to allocate a new chunk of memory and copy over the data to the new block of memory as the array or matrix grows in a loop. This may take a significant amount of extra time during execution of the program.
Code performance can often be drastically improved by pre-allocating memory for the final expected size of the array or matrix before actually starting the processing loop. To pre-allocate an array of strings, you can use MATLAB's built-in cell function. To pre-allocate an array or matrix of numbers, you can use MATLAB's built-in zeros function.
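For example, pre-allocating a cell array of strings with the cell function might look like this (the file names are made up for illustration):

```matlab
n = 1000;
names = cell(1, n);          % pre-allocate the cell array once, up front
for i = 1:n
    names{i} = sprintf('sample_%04d.dat', i);  % hypothetical file names
end
```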
The performance benefit of pre-allocation is illustrated with the following example code.
% prealloc.m
clear all;
num=10000000;
disp('Without pre-allocation:')
tic
for i=1:num
a(i)=i;
end
toc
disp('With pre-allocation:')
tic
b=zeros(1,num);
for i=1:num
b(i)=i;
end
toc
On a compute node, the result may look like this:
Without pre-allocation:
Elapsed time is 2.879446 seconds.
With pre-allocation:
Elapsed time is 0.097557 seconds.
Note that the code runs almost 30 times faster with pre-allocation.