BwUniCluster2.0/Software/Matlab
The main documentation is available on the cluster via |
Description | Content |
---|---|
module load | math/matlab |
License | Academic License/Commercial |
Citing | n/a |
Links | MATLAB Homepage | MathWorks Homepage | Support and more |
Graphical Interface | No |
Description
MATLAB (MATrix LABoratory) is a high-level programming language and interactive computing environment for numerical calculation and data visualization.
Loading MATLAB
It is not advisable to invoke an interactive MATLAB session on a login node of the cluster. Such sessions will be terminated automatically. The recommended way to run a long-duration interactive MATLAB session is to submit an interactive job and start MATLAB from within the dedicated compute node assigned to you by the queueing system (consult the specific cluster users guide on how to submit interactive jobs).
An interactive MATLAB session with graphical user interface (GUI) can be started with the command (requires X11 forwarding enabled for your ssh login):
$ matlab
Since graphics rendering can be very slow on remote connections, the preferable way is to run the MATLAB command line interface without GUI:
$ matlab -nodisplay
The following command will execute a MATLAB script or function named "example" on a single thread:
$ matlab -nodisplay -singleCompThread -r example > result.out 2>&1
The output of this session is redirected to the file result.out. The option -r executes the given MATLAB statement non-interactively. The option -singleCompThread limits MATLAB to a single computational thread. Most of the time, running MATLAB in single-threaded mode will meet your needs. If, however, you have mathematically intensive computations that benefit from the built-in multithreading provided by MATLAB's BLAS and FFT implementations, you can experiment with running in multi-threaded mode by omitting this option (see the section "Implicit Threading").
As with all processes that require more than a few minutes to run, non-trivial MATLAB jobs must be submitted to the cluster queuing system. Example batch scripts are available in the directory pointed to by the environment variable $MATLAB_EXA_DIR.
Parallel Computing Using MATLAB
Parallelization of MATLAB jobs is realized via the built-in multithreading provided by MATLAB's BLAS and FFT implementation and the parallel computing functionality of MATLAB's Parallel Computing Toolbox (PCT). The MATLAB Parallel/Distributed Computing Server is not available on the bwHPC-Clusters.
Implicit Threading
A large number of built-in MATLAB functions may utilize multiple cores automatically without any code modifications required. This is referred to as implicit multithreading and must be strictly distinguished from explicit parallelism provided by the Parallel Computing Toolbox (PCT) which requires specific commands in your code in order to create threads.
Implicit threading takes place in particular for linear algebra operations (such as the solution of a linear system A\b or matrix products A*B) and for FFT operations. Many other high-level MATLAB functions also benefit from the multithreading capabilities of their underlying routines. However, the user can still enforce single-threaded mode by adding the command line option -singleCompThread.
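To get a feeling for how much these operations gain from implicit threading, one can time them once in a session started with -singleCompThread and once in a session started without it. The following is only a minimal sketch: the problem size n and all variable names are arbitrary choices made for illustration.

```matlab
% Time three operations that typically use implicit multithreading.
% The problem size is an arbitrary choice; adjust it to your use case.
n = 1500;
A = rand(n);
B = rand(n);
b = rand(n, 1);

tic; C = A * B;  tMul   = toc;   % matrix product
tic; x = A \ b;  tSolve = toc;   % solution of a linear system
tic; F = fft(A); tFft   = toc;   % column-wise FFT

fprintf('A*B: %.3f s, A\\b: %.3f s, fft(A): %.3f s\n', tMul, tSolve, tFft);
```

Timings depend strongly on the node type and on the load caused by other jobs, so such a comparison should be run inside a batch or interactive job rather than on a login node.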
Whenever implicit threading takes place, MATLAB detects the total number of cores on the machine and by default uses all of them. This has very important implications for MATLAB jobs in HPC environments with a shared-node job scheduling policy (i.e. with multiple users sharing one compute node). Due to this behaviour, a MATLAB job may take over more compute resources than were assigned to it by the queueing system of the cluster, thereby taking these resources away from all other jobs running on the same node, including your own.
Therefore, when running in multi-threaded mode, the user must always intervene so that MATLAB does not occupy all cores of the machine (unless all cores were requested from the queueing system). The number of threads must be controlled from within the code by means of the maxNumCompThreads(N) function (which is supposed to be deprecated) or, alternatively, with the feature('numThreads', N) function (which is currently undocumented).
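As an illustration, the thread limit could be derived from the resources granted to the job. This is a sketch under the assumption that the environment variable SLURM_NPROCS holds the number of cores dedicated to the job, as in the PCT snippet on this page; the variable names nCores and previousLimit are hypothetical.

```matlab
% Number of cores assigned to the job. Assumption: SLURM_NPROCS is set by
% the batch system, as in the PCT snippet on this page.
nCores = str2double(getenv('SLURM_NPROCS'));
if isnan(nCores) || nCores < 1
    nCores = 1;   % outside of a batch job: fall back to a single thread
end

% Limit the number of computational threads; the call returns the old limit.
previousLimit = maxNumCompThreads(nCores);
fprintf('Thread limit changed from %d to %d\n', previousLimit, maxNumCompThreads);
```

Placing these lines at the very top of the script ensures that the limit is in effect before the first multithreaded operation is executed.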
Using the Parallel Computing Toolbox (PCT)
By using the PCT one can make explicit use of several cores on multicore processors to parallelize MATLAB applications without MPI programming. Under MATLAB version 8.4 and earlier, this toolbox provides 12 workers (MATLAB computational engines) to execute applications locally on a single multicore node. Under MATLAB version 8.5 and later, the number of workers available is equal to the number of cores on a single node (up to a maximum of 512).
If multiple PCT jobs are running at the same time, they all write temporary MATLAB job information to the same location. This race condition can cause one or more of the parallel MATLAB jobs to fail to use the parallel functionality of the toolbox.
To solve this issue, each MATLAB job should explicitly set a unique location where these files are created. This can be accomplished by the following snippet of code added to your MATLAB script.
% create a local cluster object
pc = parcluster('local');
% get the number of dedicated cores from the environment
pc.NumWorkers = str2double(getenv('SLURM_NPROCS'));
% explicitly set the JobStorageLocation to a tmp directory that is unique to each cluster job (and is on local, fast scratch)
parpool_tmpdir = [getenv('TMP'),'/.matlab/local_cluster_jobs/slurm_jobID_',getenv('SLURM_JOB_ID')];
mkdir(parpool_tmpdir);
pc.JobStorageLocation = parpool_tmpdir;
% start the parallel pool
parpool(pc);
Note: The code snippet also sets the correct number of parallel workers in MATLAB according to the total number of processes dedicated to the job given by the environment variable $SLURM_NPROCS in the job submission file.
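Once the pool is running, explicitly parallel constructs such as parfor can make use of the workers. The following is a minimal sketch, assuming a pool has been started as shown above; the loop body and the variable names are placeholders for your own computation.

```matlab
% Assumption: a parallel pool has already been started as shown above.
% Without a pool, MATLAB may start a default one or run the loop serially.
n = 100;
squares = zeros(1, n);          % preallocate the sliced output variable
parfor k = 1:n
    squares(k) = k^2;           % iterations must be independent of each other
end
disp(sum(squares));

% Shut down the pool at the end of the job (no-op if none is running).
delete(gcp('nocreate'));
```

Shutting the pool down explicitly at the end of the script releases the workers before the batch job terminates.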
General Performance Tips for MATLAB
MATLAB data structures (arrays or matrices) are dynamic in size, i.e. MATLAB will automatically resize the structure on demand. Although this seems to be convenient, MATLAB continually needs to allocate a new chunk of memory and copy over the data to the new block of memory as the array or matrix grows in a loop. This may take a significant amount of extra time during execution of the program.
Code performance can often be drastically improved by preallocating memory for the final expected size of the array or matrix before actually starting the processing loop. In order to preallocate an array of strings, you can use MATLAB's built-in cell function. In order to preallocate an array or matrix of numbers, you can use MATLAB's built-in zeros function.
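A short sketch of both preallocation variants might look as follows. The array size and the names labels and values are hypothetical and chosen only for illustration.

```matlab
n = 1000;
labels = cell(1, n);     % preallocated cell array for the strings
values = zeros(1, n);    % preallocated numeric array
for k = 1:n
    labels{k} = sprintf('sample_%04d', k);   % fill an existing cell
    values(k) = k / 2;                       % fill an existing element
end
```

Because both containers already have their final size, the loop only overwrites existing elements and never forces MATLAB to reallocate and copy the data.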
The performance benefit of preallocation is illustrated with the following example code.
% prealloc.m
clear all;
num=10000000;
disp('Without preallocation:')
tic
for i=1:num
a(i)=i;
end
toc
disp('With preallocation:')
tic
b=zeros(1,num);
for i=1:num
b(i)=i;
end
toc
On a compute node, the result may look like this:
Without preallocation:
Elapsed time is 2.879446 seconds.
With preallocation:
Elapsed time is 0.097557 seconds.
Note that in this run the code is almost 30 times faster with preallocation.