BwUniCluster2.0/Jupyter: Difference between revisions






Revision as of 16:46, 19 January 2022

Jupyter can be used as an alternative to accessing HPC resources via SSH; only a web browser is required. Within the web interface, source code in different programming languages can be edited and executed. Different user interfaces and terminals are also available.

Short description of Jupyter

Jupyter is a web application whose central component is the Jupyter Notebook: a document that can contain formatted text, executable code sections and (interactive) visualizations (image, sound, video, 3D views).

The Jupyter notebooks are executed in an interactive session on the compute nodes of the respective cluster. Access is via any modern web browser. Data is prepared and visualized on the server and therefore does not have to be transmitted over the network. Only the resulting text, image, sound and video data is transmitted. Starting point of a Jupyter session is the HOME directory of the user on the respective cluster.

JupyterLab is a modern user interface, within which one or more Jupyter notebooks can be opened, edited and executed. The individual notebooks can be arranged as tabs or tiled. JupyterLab is the standard user interface. Besides JupyterLab the classic notebook user interface is available, in which only one Jupyter notebook per browser tab can be opened at a time.

A Jupyter Kernel describes a separate process, in which one Jupyter Notebook is executed at a time. Different kernels are available for different programming languages or language versions.

Before a Jupyter session is started, the access authorization must be checked. This is done via JupyterHub, which is also where the resources are selected, for example the number of CPU cores, the number of GPUs or the required main memory.

A detailed documentation of the Jupyter project can be found at https://jupyter.readthedocs.io.

Access requirements

To use Jupyter on the HPC resources of SCC, the access requirements for bwUniCluster 2.0 or ForHLR apply. Registration at https://bwidm.scc.kit.edu/ is required.

The Jupyter service is only accessible from within the network of your home organization. To access it from outside, you must first establish a VPN connection to your home organization.

Currently, it is necessary to log in to the bwUniCluster once via SSH in order to use the Jupyter service. In the future, this step will be omitted.

Login process

Login takes place at

Login requires your KIT username, your KIT password and 2-factor authentication.

If you are not yet logged in to KIT, you are first redirected automatically to the corresponding login page. Select your home organization (e.g. KIT) and press Continue. In the Login section that appears, enter your KIT username and password (not the service password). After pressing the Login button, you are redirected to the second-factor query page. Enter the one-time password (e.g. from the KIT Token or the Google Authenticator app) and press Validate. You are then redirected to the JupyterHub page, where pressing the "Sign in with your KIT Account" button completes the login.


Selection of the compute resources

The Jupyter notebooks are executed in an interactive session on the compute nodes of the HPC clusters. Just like accessing an interactive session with SSH, resource allocation is done by the Workload Manager Slurm. The selection of resources for Jupyter is realized via drop-down menus. Only jobs with a maximum of one node are possible.

Available resources for selection are

  • Number of CPU cores
  • Number of GPUs
  • Runtime
  • Partition/Queue
  • Amount of main memory

In normal mode, the grayed-out fields contain reasonable preselections of resources, depending on the number of required CPU cores or GPUs respectively. The preselections can be bypassed in advanced mode, where further options are available.

After the selection is made, the interactive job is started with the spawn button. As when requesting interactive compute resources with the `salloc` command, waiting times may occur, and they usually grow with the amount of requested resources. Even if the chosen resources are available immediately, the spawning process may take up to one minute.
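Once the session is running, it can be useful to check from a notebook cell which resources the job actually received. The following is a minimal sketch, assuming the kernel inherits the usual Slurm job environment (`SLURM_JOB_ID`, `SLURM_JOB_PARTITION`, `SLURM_CPUS_ON_NODE`, `SLURM_MEM_PER_NODE`, `CUDA_VISIBLE_DEVICES`). Which of these variables are set depends on the site configuration and on the options the job was submitted with. The function name `slurm_allocation` is only illustrative.

```python
import os


def slurm_allocation(env=None):
    """Return the resources of the current Slurm job as seen in the environment.

    Variables that are not set are reported as None. The selection of
    variables is an assumption about a typical Slurm setup.
    """
    env = os.environ if env is None else env
    names = {
        "job_id": "SLURM_JOB_ID",
        "partition": "SLURM_JOB_PARTITION",
        "cpus_on_node": "SLURM_CPUS_ON_NODE",
        "mem_per_node_mb": "SLURM_MEM_PER_NODE",
        "visible_gpus": "CUDA_VISIBLE_DEVICES",
    }
    return {key: env.get(var) for key, var in names.items()}


if __name__ == "__main__":
    for key, value in slurm_allocation().items():
        print(f"{key}: {value if value is not None else 'not set'}")
```

Outside of a Slurm job, all entries are simply reported as not set.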


If by mistake an impossible resource combination is selected, an error message is displayed.


The spawning timeout is currently set to 10 minutes. With a normal workload of the HPC facility, this time is usually sufficient to get interactive resources.

Prioritized access to computing resources on bwUniCluster 2.0

The use of Jupyter requires the immediate availability of computing resources since the JupyterLab server is started within an interactive Slurm session. To improve the availability of GPUs for interactive supercomputing with Jupyter, an automatic reservation for GPU (gpu_8) resources has been set up on bwUniCluster 2.0. It is active between 8am and 8pm every day. The reservation is used automatically if

  • no other reservation is set manually
  • advanced mode is disabled

To give you a better overview of the currently available resources, a status indicator has been implemented. It appears when selecting the number of required GPUs and shows whether a Jupyter job of the selected size can currently be started. Green means the selected GPU resources are available instantly. Yellow means only a single additional job of the selected size can be started. Red means there are no GPU resources left that could satisfy the selected amount of resources.

If there are no more resources available within the reservation, you can try selecting a different number of GPUs or activate Advanced Mode and select a different partition. Availability can be estimated using sinfo_t_idle, which is available when logging in via SSH.
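The colour logic of the status indicator described above can be illustrated with a short sketch. This is not the actual implementation of the indicator; it only assumes that the number of free GPUs per node in the reservation is known and that, since jobs are limited to one node, a job fits only if a single node has enough free GPUs. The function name `gpu_status` and its parameters are hypothetical.

```python
def gpu_status(free_gpus_per_node, requested_gpus):
    """Map free GPUs to an indicator colour (illustrative sketch only).

    free_gpus_per_node: list with the number of idle GPUs on each node.
    requested_gpus:     number of GPUs selected for the Jupyter job.
    """
    if requested_gpus < 1:
        raise ValueError("requested_gpus must be at least 1")

    # A job must fit on a single node, so count the jobs of this size
    # that each node could host and sum over all nodes.
    startable_jobs = sum(free // requested_gpus for free in free_gpus_per_node)

    if startable_jobs == 0:
        return "red"     # no job of this size can be started
    if startable_jobs == 1:
        return "yellow"  # exactly one more job of this size fits
    return "green"       # resources are available instantly
```

With these assumptions, two nodes with one idle GPU each cannot serve a request for two GPUs, although two GPUs are idle in total.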

JupyterLab

JupyterLab is the standard user interface. In the following only its essential functions are briefly introduced. A detailed documentation is available at https://jupyterlab.readthedocs.io.

Menu bar

The menu bar at the upper edge of JupyterLab has higher-level menus that display the actions available in JupyterLab along with their shortcut keys. The default menus are:

  • File: Actions related to files and directories
  • Edit: Actions related to editing documents and other activities
  • View: Actions that change the appearance of JupyterLab
  • Run: Actions to execute code in various activities like notebooks and code consoles
  • Kernel: Actions to manage kernels, which are separate processes for executing code
  • Tabs: A list of open documents and activities in the Dock Panel
  • Settings: General settings and an editor for advanced settings
  • Help: A list of help links to JupyterLab and the kernel

Left sidebar

In the left sidebar there are foldable tabs. The most relevant ones are:

  • File browser: Switch to directories and open files with left mouse button, context menu with right mouse button
  • Running kernels: Overview of running kernels
  • Command overview
  • Tab Overview
  • Lmod software selection: Search and load/unload Lmod software modules

Main working area

The main working area in JupyterLab lets you arrange, resize and divide documents (notebooks, text files, etc.) and other activities (terminals, code consoles, etc.) in tabs. By holding down the left mouse button, the tabs can be grabbed and repositioned.

In a new JupyterLab session the Launcher tab is opened first. It contains buttons for starting new notebooks, code consoles and other functions. When a notebook is open, a new Launcher tab can be started by pressing the plus symbol in the file browser tab of the left sidebar, by calling File > New Launcher in the upper menu bar or by the key combination Ctrl+Shift+L.

Classic Notebook

The classic Jupyter Notebook user interface offers only one open Jupyter Notebook or terminal per browser tab. From the JupyterLab user interface the classic display can be reached in the menu bar under Help > Launch Classic Notebook. Clicking on the JupyterHub logo in the upper left corner will take you back to the JupyterLab interface.

Log out

You can log out from a running Jupyter session by calling File > Log Out in the upper menu bar.

Attention

Please note that your interactive session will continue in the background!

As long as the interactive session is running, you can re-enter it at any time. Depending on the duration of your absence, it may be necessary to re-enter your one-time password and possibly KIT password.

If you want to end the interactive session before it has reached its runtime, you can do so via the Hub Control Panel. Under File > Hub Control Panel in the upper menu bar, it is opened in a new browser tab. By pressing the Stop My Server button the session will be terminated. You can now log out using the Logout button in the upper right corner or start a new session directly using the Start My Server button, for example with a changed resource selection.


Selection of software

To select the required Lmod software modules, use the Softwares tab in the left sidebar. The list of available modules can be narrowed down by typing into the search field. The desired module is loaded by pressing the Load button. Loaded modules appear in a separate list, where they can be removed with the Unload button.

Note

On already opened Jupyter Notebooks, newly loaded software modules become active only after restarting the kernel (Kernel > Restart Kernel in the upper menu bar). Terminals must be closed and reopened.
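To check from within a notebook which modules the running kernel actually sees, the environment of the kernel process can be inspected. The sketch below assumes the standard Lmod behaviour of recording the loaded modules in the colon-separated environment variable `LOADEDMODULES`. The helper name `loaded_modules` is only illustrative.

```python
import os


def loaded_modules(env=None):
    """Return the Lmod modules visible to this process as a list.

    Reads the colon-separated LOADEDMODULES variable; an unset or empty
    variable yields an empty list.
    """
    env = os.environ if env is None else env
    raw = env.get("LOADEDMODULES", "")
    return [name for name in raw.split(":") if name]


if __name__ == "__main__":
    print(loaded_modules())
```

A kernel receives its environment when it starts, so a module loaded afterwards is expected to be missing from this list until the kernel has been restarted.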


Software Stacks for Jupyter

Currently 2 special Jupyter software stacks are available via Lmod:

  • jupyter/base
    Basic installation of JupyterLab.
    For a complete list of pre-installed packages, please refer to this site.
  • jupyter/tensorflow (default at login)
    Preinstalled software packages for machine learning applications. Includes among others TensorFlow, Keras, Torch, MXNet, Pandas, Matplotlib, SKLearn.
    For a complete list of pre-installed packages and their respective version, please refer to this site.

The integration of further programming languages and kernels is work in progress: Julia, R, C/C++ (cling)

Installation of further software

The software provided by the Lmod modules jupyter/base and jupyter/tensorflow can be easily supplemented by additional Python packages. There are 2 procedures for this.

  • User-Installation (not recommended)
    pip install --user <packageName>
    The additional packages are installed under $HOME/.local/lib/python3.6/site-packages/ which is part of PYTHONPATH.
  • Virtual environments (recommended)
    The user can create and use virtual environments (cf. Virtual environments). Packages provided by the jupyter Lmod modules remain visible and usable.

Virtual environments

Python virtual environments allow you to use different versions of a package and to keep your local site-packages (accessible under PYTHONPATH) free from conflicts.

Creation of virtual environment

python -m venv <myEnv>
source <myEnv>/bin/activate
pip install <packageName>
deactivate

The additional packages are installed under <myEnv>/lib/python3.6/site-packages/.

Usage of virtual environment

In order to use the virtual environment, it has to be activated via source <myEnv>/bin/activate. PYTHONPATH is set accordingly. Deactivation of the venv is done via deactivate.
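Whether a Python process (for example a notebook kernel or a terminal session) is really running inside a virtual environment can be verified from Python itself. The following is a small sketch that relies only on the standard library: inside a venv, `sys.prefix` points to the environment while `sys.base_prefix` points to the Python installation it was created from. The function name `environment_report` is only illustrative.

```python
import site
import sys


def environment_report():
    """Describe which Python installation and environment is in use."""
    return {
        # Interpreter that runs this process
        "executable": sys.executable,
        # Root of the active environment (the venv directory if one is active)
        "prefix": sys.prefix,
        # True when a virtual environment created with venv is active
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Target directory of "pip install --user"
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```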

Usage of virtual environment in JupyterLab

To be able to use the virtual environments within JupyterLab, a corresponding kernel has to be installed:

source <myEnv>/bin/activate
python -m ipykernel install \
   --user \
   --name myEnv \
   --display-name "Python (myEnv)"

After installing the kernel (and possibly refreshing the browser window), a button labeled with the chosen display name, here "Python (myEnv)", is available in JupyterLab. The kernel can also be selected from the drop-down menu.
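To see which kernels have been installed for the current user, the kernel specifications can be listed. The sketch below assumes the default per-user location on Linux, `~/.local/share/jupyter/kernels`, where each kernel has its own directory containing a `kernel.json` file. The function name `list_user_kernels` is only illustrative, and kernels provided by the Lmod modules are stored elsewhere and are not covered.

```python
import json
from pathlib import Path


def list_user_kernels(kernels_dir=None):
    """Map kernel directory names to their display names.

    kernels_dir defaults to the assumed per-user kernel directory.
    A missing directory results in an empty dictionary.
    """
    if kernels_dir is None:
        kernels_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"

    kernels = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        with spec_file.open(encoding="utf-8") as handle:
            spec = json.load(handle)
        name = spec_file.parent.name
        # Fall back to the directory name if no display name is given
        kernels[name] = spec.get("display_name", name)
    return kernels


if __name__ == "__main__":
    for name, display_name in list_user_kernels().items():
        print(f"{name}: {display_name}")
```

A kernel that is no longer needed can be removed in a terminal with `jupyter kernelspec uninstall <name>`.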

Attention

The (Lmod) base module you used in the Creation of virtual environment step must be loaded to use the venv. However, to be on the safe side, you can also use the system Python (/usr/bin/python3.8) at creation time, which is available even without any jupyter/{base,tensorflow} module loaded.

R language

To use the R language in JupyterLab, the Lmod module math/R has to be loaded (blue button in JupyterLab or module add math/R in a terminal) and a corresponding kernel has to be installed.

R
install.packages('IRkernel')
IRkernel::installspec()

After installing the kernel, a button named "R" is available in JupyterLab. The kernel can also be selected from the drop-down menu.

Attention

Don't forget to load the math/R module (blue button) before using the kernel.

Julia language

To use the Julia language in JupyterLab, the Lmod module devel/julia/1.6.2 has to be loaded (blue button in JupyterLab or module add devel/julia/1.6.2 in a terminal) and a corresponding kernel has to be installed.

julia
] add IJulia

After installing the kernel, a button named "Julia 1.6.2" is available in JupyterLab. The kernel can also be selected from the drop-down menu.

Attention

Don't forget to load the devel/julia/1.6.2 module (blue button) before using the kernel.