Development/Conda

{|style="background:#ffdeee; width:100%;"
{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#f2cece; text-align:left"|
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#f2cece; text-align:left"|
|style="padding:5px; background:#cef2e0; text-align:left"|
Access to bwForCluster BinAC 2 is only possible from IP addresses within the [https://www.belwue.de BelWü] network which connects universities and other scientific institutions in Baden-Württemberg.
The licensing situation with Anaconda is currently unclear. To be on the safe side, make sure to '''only use open source channels!'''<br/>
If your computer is in your University network (e.g. at your office), you should be able to connect to bwForCluster BinAC 2 without restrictions.
If you simply want to use Python and want to know how to install packages and set up virtual environments, we recommend the corresponding documentation for [[Development/Python|Python]].
If you are outside the BelWü network (e.g. at home), a VPN (virtual private network) connection to your University network must be established first. Please consult the VPN documentation of your University.
|}
|}


[https://conda.io/docs/index.html Conda] helps to manage software environments and packages. Installing software packages into independent environments improves programming flexibility and leads to a higher reproducibility of research results. A majority of scientific software is available as conda packages, which allows for convenient installations.


= Conda Modules and Usage =


Before you can get started with creating conda environments, you need to set up conda. Some clusters provide a centrally installed conda module and others require you to install conda yourself. The following table provides an overview of the necessary initial steps depending on the cluster.


In general, there are three options for installing conda (Miniforge, Miniconda, Anaconda Distribution). We recommend the usage of Miniforge as it uses open source software packages by default. With the other two options, the default is to install packages from Anaconda's default channels which are subject to Anaconda's Terms of Service.


{| class="wikitable"
|-
!scope="column"|Cluster
! Description
! Commands
|-
!scope="column"| bwUniCluster 3.0
| Load conda module and prepare the environment
| <source lang="bash">module load devel/miniforge</source>
|-
!scope="column"| JUSTUS 2
| Install conda in your home directory
| see [[#User Conda Installation]]
|-
!scope="column"| Helix
| Load conda module
| <source lang="bash">module load devel/miniforge</source>
|-
!scope="column"| NEMO2
| Load conda module
| <source lang="bash">module load lang/miniforge3</source>
aliases: <source lang="bash">module load devel/miniforge | module load conda</source>
|-
!scope="column"| BinAC 2
| Load Miniforge module
| <source lang="bash">module load devel/miniforge</source>
|}
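After loading the module you can check that the <code>conda</code> command is available, for example:
<source lang="bash">
# Load the module (the name may differ per cluster, see table above)
module load devel/miniforge
# Verify that conda is found and working
which conda
conda --version
</source>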


== Create Environments and Install Software ==


An environment is an isolated space that allows you to manage a custom constellation of software packages and versions.


If you want, you can set a specific installation directory for your environments, for example a workspace:
<source lang="bash">
conda config --prepend envs_dirs /path/to/conda/envs
conda config --prepend pkgs_dirs /path/to/conda/pkgs
conda config --show envs_dirs
conda config --show pkgs_dirs
</source>


If you don't specify a new <syntaxhighlight style="border:0px" inline=1>envs_dirs</syntaxhighlight> entry, conda will use <syntaxhighlight style="border:0px" inline=1>~/.conda/envs</syntaxhighlight> in your home directory as the default installation path (the same applies to <syntaxhighlight style="border:0px" inline=1>pkgs_dirs</syntaxhighlight>).
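You can inspect the directories conda actually uses with <code>conda info</code>:
<source lang="bash">
# Shows, among other things, the active envs directories and package cache
conda info
</source>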


You can create empty environments and install packages into them afterwards, or add packages already during the setup of the environment:
<source lang="bash">
# Create an environment
conda create -n scipy
# Activate this environment
conda activate scipy
# Install software into this environment
(scipy) $ conda install scipy
</source>
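To verify the installation, you can import the package from the environment's Python; a minimal check:
<source lang="bash">
# Print the installed scipy version from inside the activated environment
(scipy) $ python -c "import scipy; print(scipy.__version__)"
</source>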


{| class="wikitable"
Install packages and create a new environment:
! Hostname !! Destination
<source lang="bash">
|-
conda create -n scipy scipy
| login.binac2.uni-tuebingen.de || one of the three login nodes
conda activate scipy
|-
</source>
|}


Search for an exact version (see [[#Versioning|Versioning]]):
<source lang="bash">
conda search scipy==1.16.0
</source>


{| class="wikitable"
Create a Python 3.13 environment:
! Port !! Destination
<source lang="bash">
|-
conda create -n scipy-py13 scipy python=3.13
| 2221 || login01
</source>
|-
| 2222 || login02
|-
| 2223 || login03
|-
|}
Usage: <code>ssh -p <port> [other options] <username>@login.binac2.uni-tuebingen.de</code>


Remove unused packages and temporary files to free up space:
<source lang="bash">
conda clean --all
</source>
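If you first want to see what would be removed, <code>conda clean</code> supports a dry run:
<source lang="bash">
# Preview the cleanup without deleting anything
conda clean --all --dry-run
</source>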


== Activate/Deactivate/Delete Environments ==


In order to use the software in an environment you'll need to activate it first:
<source lang="bash">
conda activate scipy
</source>


Deactivate this environment to work with software outside of it, or to activate an environment with a different Python or software version:
<source lang="bash">
conda deactivate
</source>


Delete an environment:
<source lang="bash">
conda env remove -n scipy-1.7.3
</source>


== List Environments and Packages ==


List environments:
<source lang="bash">
conda env list
</source>
In the output, the <code>*</code> denotes the currently activated environment. The <code>base</code> environment is conda's default environment. It is not advised to install software into the default environment, and on some clusters this possibility is even disabled.
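The output looks similar to this (paths depend on your installation and <syntaxhighlight style="border:0px" inline=1>envs_dirs</syntaxhighlight> setting):
<pre>
# conda environments:
#
base                     /home/<username>/miniforge3
scipy                 *  /home/<username>/.conda/envs/scipy
</pre>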


List packages of the current environment:
<source lang="bash">
conda list
</source>


List packages in a given environment:
<source lang="bash">
conda list -n scipy
</source>


== Use Channels ==
Different channels enable the installation of different software packages. Some software packages require specific channels. We suggest trying the following channels:


<source lang="bash">
conda-forge
bioconda
</source>


Search the default and an extra channel:
<source lang="bash">
conda search -c conda-forge scipy
</source>


You can add a channel to your channel list, but then conda will automatically search and install from this channel:
<source lang="bash">
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --show channels
conda config --remove channels bioconda # remove channel again
</source>
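When several channels are configured, it can help to enable strict channel priority, so that packages are preferentially resolved from the highest-priority channel (a common recommendation for conda-forge based setups):
<source lang="bash">
# Prefer the highest-priority channel and avoid mixing packages across channels
conda config --set channel_priority strict
</source>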


=== Use conda-forge Conda Packages ===


The full list of conda-forge packages can be found in the [https://anaconda.org/conda-forge/repo?sort=_name&sort_order=asc conda-forge channel].


= User Conda Installation =
If no conda module is available, you can install conda yourself. There are different options on how to do this but we recommend to install Miniforge. The installation is described below. Afterwards, conda and mamba commands can be used. The default channel is conda-forge.
<source lang="bash">
# Download installer
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
# To be on the safe side, back up your .bashrc to a file with the current date
cp ~/.bashrc ~/bashrc-$(date --iso)
# Execute installer. Agree to the defaults.
bash Miniforge3-$(uname)-$(uname -m).sh
# Make conda command available upon login by adding the needed line to your .bashrc
echo 'source $HOME/miniforge3/etc/profile.d/conda.sh' >> ~/.bashrc
# To make mamba available as well
echo 'source $HOME/miniforge3/etc/profile.d/mamba.sh' >> ~/.bashrc
# Update .bashrc (have a second login shell available in case this fails)
source ~/.bashrc
</source>
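You can verify the installation afterwards; both commands should print a version number:
<source lang="bash">
# conda and mamba should now be available in new login shells
conda --version
mamba --version
</source>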


You need to add the following lines to your jobscript to make your conda environment available on the compute nodes:


<source lang="bash">
source $HOME/miniforge3/etc/profile.d/conda.sh
source $HOME/miniforge3/etc/profile.d/mamba.sh # only needed if mamba is used
conda activate <env_name>
</source>
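A minimal example jobscript could look like this (a sketch assuming a Slurm cluster, a Miniforge installation in your home directory, and an environment named <code>scipy</code>; adjust resources and partition for your cluster):
<source lang="bash">
#!/bin/bash
#SBATCH --job-name=scipy-test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# Make conda available in the non-interactive job shell
source $HOME/miniforge3/etc/profile.d/conda.sh
conda activate scipy

# Actual workload
python -c "import scipy; print(scipy.__version__)"
</source>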


= Reproducible Conda Environments =


This section describes how to preserve environments in a reproducible manner.


For more detailed documentation on environments, refer to the [https://conda.io/docs/user-guide/tasks/manage-environments.html conda documentation].


Create an environment file for re-creation:
<source lang="bash">
conda env export -n scipy-1.7.3 -f scipy-1.7.3.yml
</source>


Re-create saved environment:
<source lang="bash">
conda env create -f scipy-1.7.3.yml
</source>


{|style="background:#deffee; width:100%;"
Create a file with full URL for re-installation of packages:
|style="padding:5px; background:#cef2e0; text-align:left"|
<source lang="bash">
conda list --explicit -n scipy-1.7.3 >scipy-1.7.3.txt
|style="padding:5px; background:#cef2e0; text-align:left"|
</source>

Install the requirements file into a new environment:
<source lang="bash">
conda create --name scipy-1.7.3 --file scipy-1.7.3.txt
</source>

The first backup option comes from the <syntaxhighlight style="border:0px" inline=1>conda-env</syntaxhighlight> command and reproduces the environment by package names and versions. The second option comes from the <syntaxhighlight style="border:0px" inline=1>conda</syntaxhighlight> command itself and records the full download URL of every package as well. You can install the identical packages into a newly created environment. Please verify the architecture first.

To clone an existing environment:
<source lang="bash">
conda create --name scipy-1.7.3-clone --clone scipy-1.7.3
</source>

== Backup via Local Channels ==

Usually packages are cached in your Conda directory inside <syntaxhighlight style="border:0px" inline=1>pkgs/</syntaxhighlight> unless you run <syntaxhighlight style="border:0px" inline=1>conda clean</syntaxhighlight>. Otherwise the environment will be reproduced from the channels' packages. If you want to be independent of other channels you can create your own local channel and backup every file you have used for creating your environments.

Install package <syntaxhighlight style="border:0px" inline=1>conda-build</syntaxhighlight>:
<source lang="bash">
conda install conda-build
</source>

Create local channel directory for <syntaxhighlight style="border:0px" inline=1>linux-64</syntaxhighlight>:
<source lang="bash">
mkdir -p $( ws_find conda )/conda/channel/linux-64
</source>

Create dependency file list and copy files to channel:
<source lang="bash">
conda list --explicit -n scipy-1.7.3 >scipy-1.7.3.txt
for f in $( grep -E '^http|^file' scipy-1.7.3.txt ); do
cp $( ws_find conda )/conda/pkgs/$( basename $f ) $( ws_find conda )/conda/channel/linux-64/;
done
</source>

Optional: If packages are missing in the cache download them:
<source lang="bash">
for f in $( grep -E '^http|^file' scipy-1.7.3.txt ); do
wget $f -O $( ws_find conda )/conda/channel/linux-64/$( basename $f );
done
</source>

Initialize channel:
<source lang="bash">
conda index $( ws_find conda )/conda/channel/
</source>

Add channel to the channels list:
<source lang="bash">
conda config --add channels file://$( ws_find conda )/conda/channel/
</source>

Alternatively, use <syntaxhighlight style="border:0px" inline=1>-c file://$( ws_find conda )/conda/channel/</syntaxhighlight> when installing.
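For example, to re-create the environment using only the local channel (a sketch; <code>--offline</code> prevents downloads and <code>--override-channels</code> ignores all other configured channels):
<source lang="bash">
conda create -n scipy-1.7.3-restore --offline --override-channels \
    -c file://$( ws_find conda )/conda/channel/ scipy=1.7.3
</source>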

== Backup whole Environments ==

Alternatively you can create a package of your environment and unpack it again when needed.

Install <syntaxhighlight style="border:0px" inline=1>conda-pack</syntaxhighlight>:
<source lang="bash">
conda install -c conda-forge conda-pack
</source>

Pack activated environment:
<source lang="bash">
conda activate scipy-1.7.3
(scipy-1.7.3) $ conda pack
(scipy-1.7.3) $ conda deactivate
</source>

Pack environment located at an explicit path:
<source lang="bash">
conda pack -p $( ws_find conda )/conda/envs/scipy-1.7.3
</source>

The easiest way is to unpack the package into an existing Conda installation.

Just create a directory and unpack the package:
<source lang="bash">
mkdir -p external_conda_path/envs/scipy-1.7.3
tar -xf scipy-1.7.3.tar.gz -C external_conda_path/envs/scipy-1.7.3
conda activate scipy-1.7.3
# Clean up prefixes in the active environment
(scipy-1.7.3) $ conda-unpack
(scipy-1.7.3) $ conda deactivate
</source>
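If there is no Conda installation on the target system, the packed environment can also be used directly by sourcing its activate script, as described in the conda-pack documentation:
<source lang="bash">
# Unpack to any directory and use the environment without a conda installation
mkdir -p /tmp/scipy-1.7.3
tar -xf scipy-1.7.3.tar.gz -C /tmp/scipy-1.7.3
source /tmp/scipy-1.7.3/bin/activate
conda-unpack
python -V
source /tmp/scipy-1.7.3/bin/deactivate
</source>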

== Versioning ==

Please keep in mind that modifying, updating and installing new packages into existing environments can change the outcome of your results. We strongly encourage researchers to create new environments (or clone existing ones) before installing or updating packages. Consider using meaningful names for your environments, including version numbers and dependencies.

{| class="wikitable"
!Constraint
!Specification
|-
|exact version
|scipy==1.7.3
|-
|fuzzy version
|scipy=1.7
|-
|greater equal
|"scipy>=1.7"
|}


For more information see the [[#Cheat_Sheet]].

Example:
<source lang="bash">
conda create -c conda-forge -n scipy-1.7.3 scipy=1.7.3=py39h5c0f66f_1
</source>

=== Pinning ===

Pin versions if you don't want them to be updated accidentally ([https://conda.io/docs/user-guide/tasks/manage-pkgs.html#preventing-packages-from-updating-pinning see documentation]).

Example:
<source lang="bash">
echo 'scipy==1.1.0=np115py36_6' >> $( ws_find conda )/conda/envs/scipy-1.1.0-np115py36_6/conda-meta/pinned
</source>

You can easily pin your whole environment:
<source lang="bash">
conda list -n scipy-1.7.3 --export >$( ws_find conda )/conda/envs/scipy-1.7.3/conda-meta/pinned
</source>
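You can inspect the resulting pin file and use a dry run to check that pinned packages are held back:
<source lang="bash">
# Show the pinned specifications
cat $( ws_find conda )/conda/envs/scipy-1.7.3/conda-meta/pinned
# Preview an update; pinned packages should not be changed
conda update -n scipy-1.7.3 --all --dry-run
</source>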

== Using Singularity Containers ==

Using [[Singularity_Containers|Singularity Containers]] can create more robust software environments.

Build the container on your local machine!

This is a Singularity recipe example for a Rocky Linux image with a Conda environment:
<source lang="bash">
# Quote EOF so that "$@" below is written literally instead of being expanded
cat << 'EOF' >scipy-1.7.3.def
Bootstrap: docker
From: rockylinux:8
# Alternative:
# From: almalinux:8

%runscript
echo "This is what happens when you run the container..."
source /conda/etc/profile.d/conda.sh
conda activate scipy-1.7.3
eval "$@"

%post
yum -y install vim wget
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p conda
source /conda/etc/profile.d/conda.sh
conda update -y -n base conda
conda create -y -c conda-forge -n scipy-1.7.3 scipy=1.7.3=py39h5c0f66f_1
rm -f miniconda.sh
EOF
</source>

Build container (on local machine):
<source lang="bash">
singularity build scipy-1.7.3.sif scipy-1.7.3.def
</source>

Copy the container to the cluster and start it:
<source lang="bash">
singularity run scipy-1.7.3.sif python -V
</source>

Example for interactive usage:
<source lang="bash">
singularity shell scipy-1.7.3.sif
Apptainer> source /conda/etc/profile.d/conda.sh
Apptainer> conda activate scipy-1.7.3
(scipy-1.7.3) Apptainer> python -V
</source>


See the [https://docs.sylabs.io/guides/latest/user-guide/ Singularity user documentation] for more information on containers.


= Cheat Sheet =


[https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html Conda official cheat sheet]

BinAC2/Login

{|style="background:#deffee; width:100%;"
|style="padding:5px; background:#cef2e0; text-align:left"|
[[Image:Attention.svg|center|25px]]
|style="padding:5px; background:#cef2e0; text-align:left"|
Access to bwForCluster BinAC 2 is only possible from IP addresses within the [https://www.belwue.de BelWü] network which connects universities and other scientific institutions in Baden-Württemberg.
If your computer is in your University network (e.g. at your office), you should be able to connect to bwForCluster BinAC 2 without restrictions.
If you are outside the BelWü network (e.g. at home), a VPN (virtual private network) connection to your University network must be established first. Please consult the VPN documentation of your University.
|}

'''Prerequisites for successful login:'''

You need to have
* completed the 3-step [[registration/bwForCluster|bwForCluster registration]] procedure.
* [[Registration/Password|set a service password]] for bwForCluster BinAC 2.


= Login to bwForCluster BinAC 2 =

Login to bwForCluster BinAC 2 is only possible with a Secure Shell (SSH) client, for which you must know your [[BinAC2/Login#Username|username]] on the cluster and the [[BinAC2/Login#Hostnames|hostname]] of the BinAC 2 login node.

For more general information on SSH clients, visit the [[Registration/Login/Client|SSH clients Guide]].

== TOTP Second Factor ==

At the moment no second factor is needed. We are currently implementing a new TOTP procedure.

== Username ==

Your <code><username></code> on BinAC 2 consists of a prefix and your local username.
For prefixes please refer to the [[Registration/Login/Username|Username Guide]].

'''Example''': If your local username at your University is <code>ab123</code> and you are a user from Tübingen University, your username on the cluster is: <code>tu_ab123</code>.

== Hostnames ==

BinAC 2 has one login address serving as a load balancer. We use DNS round-robin scheduling to distribute the incoming connections between the three actual login nodes. If you log in multiple times, different sessions might run on different login nodes, and hence programs started in one session might not be visible in another session.

{| class="wikitable"
! Hostname !! Destination
|-
| login.binac2.uni-tuebingen.de || one of the three login nodes
|}

You can choose a specific login node by using specific ports on the load balancer. Please only do this if there is a real reason for it (e.g. connecting to a running tmux/screen session).

{| class="wikitable"
! Port !! Destination
|-
| 2221 || login01
|-
| 2222 || login02
|-
| 2223 || login03
|}

Usage: <code>ssh -p <port> [other options] <username>@login.binac2.uni-tuebingen.de</code>

== Login with SSH command (Linux, Mac, Windows) ==

Most Unix and Unix-like operating systems like Linux or macOS come with a built-in SSH client provided by the OpenSSH project. Windows 10 and Windows 11 also come with a built-in OpenSSH client.

For login use one of the following ssh commands:
<pre>
ssh <username>@login.binac2.uni-tuebingen.de
</pre>

To run graphical applications on the cluster, you need to enable X11 forwarding with the <code>-X</code> flag:
<pre>
ssh -X <username>@login.binac2.uni-tuebingen.de
</pre>

For login to a specific login node (here: login03):
<pre>
ssh -p 2223 <username>@login.binac2.uni-tuebingen.de
</pre>
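For convenience, you can define a host alias in your <code>~/.ssh/config</code> (a sketch; replace the example username with your own):
<pre>
Host binac2
    HostName login.binac2.uni-tuebingen.de
    User tu_ab123
</pre>
Afterwards, <code>ssh binac2</code> is sufficient to log in.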

== Login with graphical SSH client (Windows) ==

For Windows we suggest using MobaXterm for login and file transfer.

Start MobaXterm and fill in the following fields:
<pre>
Remote name       : login.binac2.uni-tuebingen.de
Specify user name : <username>
Port              : 22
</pre>

After that, click 'OK'. A terminal will then open where you can enter your credentials.