Matlab

From bwHPC Wiki

Accessing and basic usage

MATLAB is a high-level language and interactive environment for numerical computation, visualization and programming.

In order to check what MATLAB versions are installed on the system, run the following command:

$ module avail math/matlab

Typically, several MATLAB versions might be available.

The default version can be accessed by loading the appropriate module:

$ module load math/matlab

Other versions of MATLAB (if available) can be loaded as:

$ module load math/matlab/<version>

with <version> specifying the desired version.

An interactive MATLAB session with graphical user interface (GUI) can be started with the command (requires X11 forwarding enabled for your ssh login):

$ matlab

Since graphics rendering can be very slow on remote connections, the preferable way is to run the MATLAB command line interface without GUI:

$ matlab -nodisplay

In general, it is not advisable to run an interactive MATLAB session on a login node of the cluster. The recommended way to run a long interactive MATLAB session is to submit an interactive job and start MATLAB on the dedicated compute node assigned to you by the queueing system (consult the user guide of the specific cluster on how to submit interactive jobs).

The following command will execute a MATLAB script on a single thread:

$ matlab -nodisplay -singleCompThread -r script >result.out 2>&1

The output of this session is redirected to the file result.out. The -singleCompThread option prevents the creation of multiple computational threads. Most of the time, running MATLAB in single-threaded mode (as described above) will meet your needs. If your computations are dominated by linear algebra or FFT operations that can benefit from the built-in multithreading of MATLAB's BLAS and FFT implementations, you can experiment with multi-threaded mode by omitting the -singleCompThread option (but see below on how to control the number of threads).
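Note that MATLAB started with -r normally stays at its prompt after the statement has finished, so a non-interactive run should end with an explicit exit. The following is a minimal sketch, assuming a hypothetical wrapper function run_job (the function name and the placeholder workload are illustrative and not part of the MATLAB module). It catches errors so that a failing run is reported in the output file rather than leaving the session open:

```matlab
% run_job.m: hypothetical wrapper function (name chosen for illustration)
function ok = run_job()
    % Run the workload and report success (true) or failure (false).
    ok = true;
    try
        A = magic(4);              % placeholder workload
        total = sum(A(:));         % replace with your own computation
        fprintf('Sum of all elements: %d\n', total);
    catch err
        % Write the error message to stderr instead of stopping at a prompt.
        fprintf(2, 'Job failed: %s\n', err.message);
        ok = false;
    end
end
```

With this file in the working directory, the command above could be written as matlab -nodisplay -singleCompThread -r "run_job; exit" >result.out 2>&1 so that MATLAB terminates once the function returns.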

As with all processes that require more than a few minutes to run, non-trivial MATLAB jobs must be submitted to the cluster queuing system.

Example batch scripts are available in the directory pointed to by the environment variable $MATLAB_EXA_DIR after having loaded the module.

The help page of the MATLAB module may provide more version specific information:

$ module help math/matlab

General performance tips for MATLAB

Vectorization

Preallocation

In MATLAB, data structures (such as arrays or matrices) are dynamic in size, i.e. MATLAB automatically resizes the structure on demand. Although this is convenient, every time an array or matrix grows inside a loop MATLAB has to allocate a new chunk of contiguous memory and copy the existing data to the new block. This can take a significant amount of extra time during execution of the program. Code performance can often be improved drastically by preallocating memory for the final expected size of the array or matrix before starting the processing loop. To preallocate an array (or matrix) of numbers, you can use the zeros function. To preallocate an array of strings, you can use the cell function.

The performance benefit of preallocation is illustrated in the following example code:

% prealloc.m

clear all;

num=10000000;

disp('Without preallocation:')
tic
for i=1:num
    a(i)=i;
end
toc


disp('With preallocation:')
tic
b=zeros(1,num);
for i=1:num
    b(i)=i;
end
toc


On a compute node, the result may look like this:


>>prealloc
Without preallocation:
Elapsed time is 2.879446 seconds.
With preallocation:
Elapsed time is 0.097557 seconds.

In this case, the code runs almost 30 times faster with preallocation.
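The same idea carries over to cell arrays, and simple loops like the one above can often be replaced by a single vectorized expression. The following is a small sketch with illustrative variable names and a deliberately tiny size; it only checks that the variants produce the same result and makes no claim about timings:

```matlab
n = 5;                          % small size, for illustration only

% Preallocate a cell array for n strings, then fill it in the loop.
names = cell(1, n);
for k = 1:n
    names{k} = sprintf('item_%d', k);
end

% Numeric case: preallocated loop versus the vectorized colon expression.
b = zeros(1, n);
for k = 1:n
    b(k) = k;
end
c = 1:n;                        % same values as b, built without a loop

disp(names);
disp(isequal(b, c));
```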

Parallel Computing with MATLAB

Implicit Threading

A large number of built-in MATLAB functions may utilize multiple cores automatically without any code modifications required. This is referred to as implicit multithreading and must be strictly distinguished from explicit parallelism provided by the Parallel Computing Toolbox (PCT) which requires specific commands in your code in order to create threads.

Implicit threading takes place in particular for linear algebra operations (such as the solution of a linear system A\b or matrix products A*B) and FFT operations. Many other high-level MATLAB functions also benefit from the multithreading capabilities of their underlying routines. However, the user can still enforce single-threaded mode by adding the -singleCompThread command line option.

Whenever implicit threading takes place, MATLAB detects the total number of cores on the machine and by default makes use of all of them. This has very important implications for MATLAB jobs in HPC environments with a shared-node job scheduling policy (i.e. with multiple users sharing one compute node). Due to the behaviour described above, a MATLAB job may take over more compute resources than assigned by the queueing system of the cluster, and thereby take these resources away from all other jobs running on the same node, including your own.

Therefore, when running in multi-threaded mode, the user must make sure that MATLAB does not use all cores of the machine (unless all of them were requested from the queueing system). The number of threads must be controlled from within the code by means of the maxNumCompThreads(N) function (which has been flagged as deprecated in some MATLAB releases) or, alternatively, with the feature('numThreads', N) function (which is currently undocumented).
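As a sketch of how the thread limit could be set at the start of a script: the example below assumes that the queueing system exports the number of allocated cores in an environment variable. SLURM_CPUS_PER_TASK is used purely as an illustration; the variable name depends on the scheduler and cluster and is not taken from this page. If the variable is missing or invalid, the script falls back to a single thread.

```matlab
% Assumption: the scheduler exports the number of allocated cores in an
% environment variable. SLURM_CPUS_PER_TASK is only an example name.
coreStr = getenv('SLURM_CPUS_PER_TASK');   % empty string if unset
nCores  = str2double(coreStr);             % NaN if empty or not numeric

if isnan(nCores) || nCores < 1
    nCores = 1;                            % conservative fallback
end

% Limit implicit multithreading; the call returns the previous setting.
previous = maxNumCompThreads(nCores);
fprintf('Computational threads: %d (previously %d)\n', ...
        maxNumCompThreads, previous);
```

Placing these lines before any heavy computation keeps the implicit threading within the cores requested from the queueing system.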


Remote job submission from your local MATLAB GUI

Prerequisite: You need a MATLAB installation on your desktop computer. For maximum compatibility, your local MATLAB version should match the version on the cluster.

Using the MATLAB Distributed Computing Server (MDCS), you can now run computationally intensive MATLAB programs and Simulink models on the bwUniCluster.

You develop your program or model on a multicore desktop computer using Parallel Computing Toolbox and then scale up to many computers by running it on MDCS. The server supports batch jobs, parallel computations, and distributed large data.

MATLAB remote job submission is intended to make submitting jobs to the HPC clusters as easy as possible. Please download the scripts listed below and follow the steps described in the manual.

The MDCS license comprises a total of 32 workers. Note that one worker is required to manage the batch job and pool of workers. This means a job that needs for example eight workers will consume nine CPU cores.

The manual linked below describes how to configure your desktop MATLAB environment to run serial and parallel jobs on the bwUniCluster, using a simple example scenario. The cluster profiles required for R2014b and R2015a can also be downloaded from the links below.

Download manual (Installation and usage)

Download scripts 2014b (Windows/Mac Version)

Download scripts 2014b (Linux Version)

Download scripts 2015a (Windows/Mac Version)

Download scripts 2015a (Linux Version)