Matlab

From bwHPC Wiki
Revision as of 09:13, 8 May 2014 by R Barthel

1 Accessing and basic usage

MATLAB is a high-level language and interactive environment for numerical computation, visualization and programming.

In order to check what MATLAB versions are installed on the system, run the following command:

$ module avail math/matlab

Typically, several MATLAB versions might be available.

The default version can be accessed by loading the appropriate module:

$ module load math/matlab

Other versions of MATLAB (if available) can be loaded as:

$ module load math/matlab/<version>

with <version> specifying the desired version.

An interactive MATLAB session with graphical user interface (GUI) can be started with the command (requires X11 forwarding enabled for your ssh login):

$ matlab

Since graphics rendering can be very slow on remote connections, the preferable way is to run the MATLAB command line interface without GUI:

$ matlab -nodisplay

In general, it is not advisable to run an interactive MATLAB session on a login node of the cluster. The recommended way to run a long interactive MATLAB session is to submit an interactive job and start MATLAB on the dedicated compute node assigned to you by the queueing system (consult the user guide of the specific cluster on how to submit interactive jobs).

The following command will execute a MATLAB script on a single thread:

$ matlab -nodisplay -singleCompThread -r script >result.out 2>&1

The output of this session is redirected to the file result.out. The -singleCompThread option prevents the creation of multiple computational threads. Most of the time, running MATLAB in single-threaded mode (as described above) will meet your needs. If, however, your computations are numerically intensive and might benefit from the built-in multithreading provided by MATLAB's BLAS and FFT implementations, you can experiment with multi-threaded mode by omitting the -singleCompThread option (but see below on how to control the number of threads).
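
For unattended runs it can be convenient to have MATLAB terminate with an exit code that reflects whether the script succeeded. The following is only a minimal sketch of that idea and is not part of the example scripts shipped with the module; the function name run_guarded and the script name myscript are hypothetical.

```matlab
% run_guarded.m  (hypothetical helper, not provided by the MATLAB module)
function status = run_guarded(scriptName)
% Run the given script and return 0 on success, 1 if an error occurred.
try
    run(scriptName);          % execute the script by name
    status = 0;
catch err
    % Write the full error report to the (redirected) output
    disp(getReport(err));
    status = 1;
end
end
```

Assuming both files are in the current directory, the helper could be called as `matlab -nodisplay -singleCompThread -r "exit(run_guarded('myscript'))" >result.out 2>&1`, so that the shell sees a non-zero exit code when the script fails.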

As with all processes that require more than a few minutes to run, non-trivial MATLAB jobs must be submitted to the cluster queuing system.

Example batch scripts are available in the directory pointed to by the environment variable $MATLAB_EXA_DIR after having loaded the module.

The help page of the MATLAB module may provide more version-specific information:

$ module help math/matlab

2 General performance tips for MATLAB

2.1 Vectorization
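
As a minimal sketch of what vectorization usually means in MATLAB (replacing an explicit loop over elements by a single operation on the whole array), consider the following comparison. The file name, array size and test function are arbitrary choices made for illustration, and the timing difference depends on the MATLAB version and the hardware.

```matlab
% vectorize_demo.m  (illustrative sketch)
n = 1000000;
x = linspace(0, 1, n);

% Loop version: one element per iteration
tic
y_loop = zeros(1, n);
for k = 1:n
    y_loop(k) = sin(x(k))^2;
end
toc

% Vectorized version: the same computation on the whole array at once
tic
y_vec = sin(x).^2;
toc

% Both versions should agree up to rounding errors
disp(max(abs(y_loop - y_vec)));
```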

2.2 Preallocation

In MATLAB, data structures (such as arrays or matrices) are dynamic in size, i.e. MATLAB automatically resizes the structure on demand. Although this is very convenient, every time an array or matrix grows inside a loop MATLAB has to allocate a new chunk of contiguous memory and copy the existing data to the new block. This may take a significant amount of extra time during execution of the program. Code performance can often be improved drastically by preallocating memory for the final expected size of the array or matrix before starting the processing loop. To preallocate an array (or matrix) of numbers, you can use the zeros function. To preallocate an array of strings, you can use the cell function.

The performance benefit of preallocation is illustrated in the following example code:

% prealloc.m

clear all;

num=10000000;

disp('Without preallocation:')
tic
for i=1:num
    a(i)=i;
end
toc


disp('With preallocation:')
tic
b=zeros(1,num);
for i=1:num
    b(i)=i;
end
toc

On a compute node, the result may look like this:


>>prealloc
Without preallocation:
Elapsed time is 2.879446 seconds.
With preallocation:
Elapsed time is 0.097557 seconds.

In this case, the code runs almost 30 times faster with preallocation.
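
The example above uses zeros for numeric data. For an array of strings, preallocation with the cell function could look like the following sketch; the file name, the variable names and the string pattern are made up for illustration.

```matlab
% prealloc_cell.m  (illustrative sketch)
num = 1000;

% Preallocate a 1-by-num cell array; every cell is initially empty
names = cell(1, num);

% Fill the preallocated cells with strings
for i = 1:num
    names{i} = sprintf('item_%04d', i);
end

disp(names{1});       % first string
disp(numel(names));   % number of cells
```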

3 Parallel Computing with MATLAB

3.1 Implicit Threading

A large number of built-in MATLAB functions may utilize multiple cores automatically without any code modifications. This is referred to as implicit multithreading and must be strictly distinguished from the explicit parallelism provided by the Parallel Computing Toolbox (PCT), which requires specific commands in your code in order to create parallel workers.

Implicit threading takes place in particular for linear algebra operations (such as the solution of a linear system A\b or matrix products A*B) and FFT operations. Many other high-level MATLAB functions also benefit from the multithreading capabilities of their underlying routines. However, the user can still enforce single-threaded mode by adding the -singleCompThread command line option.
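
To get a rough impression of whether implicit threading matters for a given workload, one can time the operations mentioned above and run the same script once with and once without the -singleCompThread option. The following is a minimal sketch; the file name and the problem size are arbitrary, and the measured times depend on the hardware and the MATLAB version.

```matlab
% implicit_threads.m  (illustrative sketch)
n = 2000;
A = rand(n);
B = rand(n);
b = rand(n, 1);

% Report how many computational threads MATLAB may currently use
fprintf('Computational threads: %d\n', maxNumCompThreads);

tic; C = A*B;    t_mul   = toc;   % matrix product
tic; x = A\b;    t_solve = toc;   % solution of a linear system
tic; F = fft(A); t_fft   = toc;   % column-wise FFT

fprintf('A*B    : %.3f s\n', t_mul);
fprintf('A\\b    : %.3f s\n', t_solve);
fprintf('fft(A) : %.3f s\n', t_fft);
```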

Whenever implicit threading takes place, MATLAB detects the total number of cores available on the machine and by default makes use of all of them. This has very important implications for MATLAB jobs in HPC environments with a shared-node job scheduling policy (i.e. with multiple users sharing one compute node). Due to the behaviour described above, a MATLAB job may take over more compute resources than assigned by the queueing system of the cluster, and thereby take these resources away from all other jobs running on the same node (including your own).

Therefore, when running in multi-threaded mode, the user must explicitly prevent MATLAB from using all cores of the machine (unless all cores have been requested from the queueing system). The number of threads must be controlled from within the code by means of the maxNumCompThreads(N) function (which is reportedly deprecated) or, alternatively, with the feature('numThreads', N) function (which is currently undocumented).
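
A minimal sketch of how the thread limit could be set at the beginning of a script is given below. It assumes that the job script exports the number of requested cores in an environment variable; the name NUM_THREADS is hypothetical and must be replaced by whatever your job script or queueing system actually provides.

```matlab
% set_threads.m  (illustrative sketch; NUM_THREADS is a hypothetical variable)
n = str2double(getenv('NUM_THREADS'));   % NaN if unset or not a number

% Fall back to a single thread if no valid value was provided
if isnan(n) || n < 1
    n = 1;
end
n = floor(n);

% Limit the number of computational threads; the previous limit is returned
previous = maxNumCompThreads(n);
fprintf('Computational threads: %d (previous limit: %d)\n', n, previous);
```

Falling back to one thread when the variable is missing is a deliberately conservative choice, so that a misconfigured job does not occupy cores assigned to other users.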