{{Justus2}}

== Introduction ==

Python is a versatile, easy-to-learn, interpreted programming language. It offers a wide range of libraries for scientific computing and visualization and is among the most popular languages for machine learning. In particular, Python can serve as an open-source alternative for tasks that have traditionally been done in Matlab.

== Installation and Versions ==

Python is available on all systems. With <code>python --version</code> you can see the currently active default Python version. In general, you can choose from several types of Python installations:

* '''System python:''' This Python version comes with the operating system and is available upon login to the cluster. Other Python versions might be installed along with it. All installed versions can be listed with
*: <code>ls /usr/bin/python[0-9].*[0-9] | sort -V | cut -d"/" -f4 | xargs</code>
*: The installed versions can change over time. You can run a specific version by naming it explicitly in the command (see the example after this list).
* '''[[Environment_Modules | Software module]]:''' Available versions can be identified via
*: <syntaxhighlight lang="bash">
module avail devel/python
</syntaxhighlight>
* '''Python distributions and virtual environments:''' With Python distributions such as Anaconda, you can easily install the needed Python version into a virtual environment. For the use of conda on bwHPC clusters, please refer to [[Development/Conda|Conda]]. Alternatively, you can use more Python-specific tools. Some options are listed in [[#Virtual Environments and Package Management | Virtual Environments and Package Management]].
* '''[[Development/Containers | Container]]:''' Containers can ship their own Python installation. Keep this in mind when you are working with containers provided by others.

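For example, you can query the default interpreter, call a specific system version directly, or load a Python software module (the version number and module name below are illustrative; use what the commands above list on your system):

<syntaxhighlight lang="bash">
python --version           # Version of the currently active default interpreter
python3.9 --version        # Call one specific system version directly (if installed)
module load devel/python   # Switch to a Python software module instead
</syntaxhighlight>
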
== Running Python Code ==

There are three ways to run Python commands:

* Within a '''terminal''' by executing the command <code>python</code>. This starts an interactive Python shell, where all commands are evaluated by the Python interpreter.
* Within a '''script''' (a file ending in ''.py''; run it with <code>python myProgram.py</code>, see the example after this list)
* Within a '''notebook''' (a file ending in ''.ipynb''). Alongside your Python code, a notebook can contain markdown and code in other programming languages. Besides software development itself, notebooks are well suited for teaching, prototyping and visualization.

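A minimal script example (the file name and its contents are illustrative):

<syntaxhighlight lang="python">
# myProgram.py -- a minimal example script
import math

def main():
    print(f"sqrt(2) = {math.sqrt(2):.6f}")

if __name__ == "__main__":
    main()
</syntaxhighlight>

Running <code>python myProgram.py</code> prints the computed value and exits.
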
== Development Environments ==

Development environments are usually more convenient than running code directly from the shell. Some common options are:

* [[Jupyter]]
* [[Development/VS_Code | VS Code]]
* PyCharm

== Virtual Environments and Package Management ==

Packages contain sets of functions that offer additional functionality and are installed with a package manager. Virtual environments prevent conflicts between different Python packages by using separate installation directories.

At least one virtual environment should be defined per project. This makes it clear which packages a specific project needs. All virtual environment tools allow you to save the required packages with their exact version numbers to a file, so that they can be reinstalled elsewhere, which improves the reproducibility of projects. Furthermore, it makes finding and removing packages that are no longer needed easier.

=== Overview ===

The following table provides an overview of common tools for virtual environments and package management and highlights the main differences between them. After deciding on a specific tool, it can be installed by following the given link.

In short, if you plan to use...

* ...Python only:
** venv is the most basic option; other tools build upon it.
** poetry is widely used and offers a broad set of functionality.
** uv is the newest option and much faster than poetry, while offering the same (or more) functionality.
* ...Python + Conda:
** Using conda alone is not recommended, but if you do: install the conda packages into the conda environment first and the Python packages afterwards, otherwise problems might arise (see the sketch after this list).
** For a faster and more up-to-date solution, choose pixi.

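A sketch of that recommended ordering (environment and package names are illustrative):

<syntaxhighlight lang="bash">
conda create -n myenv python=3.11    # Create the conda environment
conda activate myenv
conda install numpy scipy            # 1. Install conda packages first
pip install some-pypi-only-package   # 2. Install pip/PyPI packages afterwards
</syntaxhighlight>
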
{| class="wikitable"
|- style="text-align:center;"
! Tool
! Description
! Can install python versions
! Installs packages from PyPI
! Installs packages from conda
! Dependency Resolver
! Dependency Management
! Creates Virtual Environments
! Supports building, packaging and publishing code (to PyPI)
|-
| pyenv
| Manages python versions on your system.
| yes
| no
| no
| no
| no
| no
| no
|-
| pip
| For installing python packages.
| no
| yes
| no
| yes
| no
| no
| no
|-
| venv
| For creating virtual environments; packages are then installed with the bundled pip. Part of Python's standard library.
| no
| no
| no
| yes
| yes
| yes
| no
|-
| poetry
| For installing and managing python packages. Install it with pipx.
| no
| yes
| no
| yes
| yes
| yes
| yes
|-
| pipx
| For installing and running python applications (like poetry) globally while keeping each in its own isolated virtual environment. Use it when the installation instructions of an application offer this way of installation.
| no
| no
| no
| yes
| yes
| yes (only for single applications)
| yes
|-
| uv
| Replaces pixi, poetry, pyenv, pip etc. and is [https://www.loopwerk.io/articles/2025/uv-keeps-getting-better/ very fast].
| yes
| yes
| no
| yes
| yes
| yes
| yes
|-
| pixi
| For installing and managing python as well as conda packages. Uses uv.
| yes
| yes
| yes
| yes
| yes
| yes
| yes
|}

=== Pip ===

The standard package manager for Python is <code>pip</code>. It can be used to install, update and delete packages. Pip can be called directly or via <code>python -m pip</code>. The standard repository from which packages are obtained is PyPI (https://pypi.org/). When a package depends on others, these dependencies are installed automatically as well.

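For example, to verify which interpreter the active <code>pip</code> belongs to:

<syntaxhighlight lang="bash">
python -m pip --version   # Prints pip's version and the Python installation it belongs to
</syntaxhighlight>
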
In the following, the most common pip commands are shown. Packages should always be installed within virtual environments to avoid conflicting installations. If you decide not to use a virtual environment, the install commands have to be supplemented with a <code>--user</code> flag or controlled via environment variables.

<b>Installation of packages</b><br/>
<syntaxhighlight lang="bash">
pip install pandas            # Installs the latest compatible version of pandas and its required dependencies
pip install pandas==1.5.3     # Installs an exact version
pip install "pandas>=1.5.3"   # Installs a version no older than 1.5.3 (quoted so the shell does not interpret the ">")
</syntaxhighlight>

Packages from PyPI usually ship as precompiled libraries. However, <code>pip</code> can also build packages from source code, which may require a C/C++ compiler and other build dependencies. In the following example, pip obtains the source code of matplotlib from github.com, installs its dependencies, compiles the library and installs it:

<syntaxhighlight lang="bash">
pip install git+https://github.com/matplotlib/matplotlib
</syntaxhighlight>

<b>Upgrade packages</b><br/>
<syntaxhighlight lang="bash">
pip install --upgrade pandas   # Updates the package if an update is available
</syntaxhighlight>

<b>Removing packages</b><br/>
<syntaxhighlight lang="bash">
pip uninstall pandas   # Removes pandas
</syntaxhighlight>

<b>Show packages</b><br/>
<syntaxhighlight lang="bash">
pip list     # Shows the installed packages and their versions
pip freeze   # Shows the installed packages in requirements format
</syntaxhighlight>

<b>Save State</b><br/>
To allow for reproducibility, it is important to record the full list of packages and their exact versions ([https://pip.pypa.io/en/stable/topics/repeatable-installs/ see version pinning]).

<syntaxhighlight lang="bash">
pip freeze > requirements.txt    # Redirect package and version information to a text file
pip install -r requirements.txt  # Install all packages that are listed in the file
</syntaxhighlight>

=== Venv ===

The module <code>venv</code> enables the creation of virtual environments and is a standard component of Python. Creating a venv creates a folder that contains a copy of (or symlink to) the Python binary as well as <code>pip</code> and <code>setuptools</code>. After activating the venv, the binary in this folder is used whenever <code>python</code> or <code>pip</code> is called. This folder is also the installation target for other Python packages.

==== Creation ====

Create, activate, install software, deactivate:

<syntaxhighlight lang="bash">
python3.11 -m venv myEnv       # Create the venv
source myEnv/bin/activate      # Activate the venv
pip install --upgrade pip      # Update the venv-local pip
pip install <list of packages> # Install packages/modules
deactivate                     # Leave the venv
</syntaxhighlight>

Install a list of packages from a requirements file:

<syntaxhighlight lang="bash">
pip install -r requirements.txt  # Install packages/modules
</syntaxhighlight>

==== Usage ====

To use the virtual environment after all dependencies have been installed, it is sufficient to simply activate it:

<syntaxhighlight lang="bash">
source myEnv/bin/activate  # Activate venv
</syntaxhighlight>

With the venv activated, the terminal prompt will reflect that accordingly:

<syntaxhighlight lang="bash">
(myEnv) $
</syntaxhighlight>

It is no longer necessary to specify the Python version; the plain command <code>python</code> starts the Python version that was used to create the virtual environment.<br/>
You can check which Python version is in use via:

<syntaxhighlight lang="bash">
(myEnv) $ which python
/path/to/project/myEnv/bin/python
</syntaxhighlight>

=== Poetry ===

==== Creation ====

To create a virtual environment for an already existing project, go to its top-level directory and run

<syntaxhighlight lang="bash">
poetry init   # Interactively create a pyproject.toml for the project
</syntaxhighlight>

Otherwise, start with a fresh demo project:

<syntaxhighlight lang="bash">
poetry new poetry-demo   # Create a new project skeleton
</syntaxhighlight>

You can set the allowed Python versions in the <code>pyproject.toml</code> (see the example below). To switch between Python installations on your system, you can use

<syntaxhighlight lang="bash">
poetry env use python3.11   # Use this interpreter for the project's venv
</syntaxhighlight>

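The allowed Python range in <code>pyproject.toml</code> might, for example, look like this (the version bounds are illustrative):

<syntaxhighlight lang="toml">
[tool.poetry.dependencies]
python = ">=3.10,<3.13"
</syntaxhighlight>
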
==== Usage ====

Install and update packages:

<syntaxhighlight lang="bash">
poetry add <package_name>   # Add a package to the project and install it
poetry update               # Update all packages to the latest versions allowed by pyproject.toml
</syntaxhighlight>

To execute something within the virtual environment:

<syntaxhighlight lang="bash">
poetry run <command>
</syntaxhighlight>

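For example, <code>poetry run python myProgram.py</code> runs a script with the interpreter of the project's virtual environment.
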
Helpful links:

* [https://python-poetry.org/docs/managing-environments/#activating-the-environment Activate Environment]

Helpful commands:

<syntaxhighlight lang="bash">
poetry env info              # Show environment information
poetry env list              # List all virtual environments associated with the current project
poetry env list --full-path  # List them with their full paths
poetry env remove <env>      # Delete the given virtual environment
</syntaxhighlight>

== Best Practices ==

* Always use virtual environments! Use one environment per project.
* When creating a new Python project or working on an existing one, the following procedure is recommended:
# Put the Python source files and the <code>requirements.txt</code> file under a version control system, e.g. git. Exclude unnecessary folders and files such as <code>venv</code> via an entry in the ignore file, e.g. <code>.gitignore</code>.
# Create and activate a virtual environment.
# Use specialized number-crunching Python modules (e.g. numpy and scipy); don't use plain Python for serious calculations (see the first sketch after this list).
## Check whether optimized compiled modules are available on the cluster (numpy, scipy).
# Update pip: <code>pip install --upgrade pip</code>.
# Install all required packages, via the <code>requirements.txt</code> file in the case of venv or with the corresponding command of your chosen tool.
* List or dictionary comprehensions are to be preferred over explicit loops, as they are generally faster.
* Be aware of the differences between references, shallow copies and deep copies (see the second sketch after this list).
* Do not parallelize by hand, but use libraries where possible (Dask, ...).

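A small sketch illustrating the number-crunching and comprehension hints above (sizes are arbitrary):

<syntaxhighlight lang="python">
import numpy as np

n = 1_000_000

# Plain Python loop: slow for serious number crunching
squares_loop = []
for x in range(n):
    squares_loop.append(x * x)

# List comprehension: same result, generally faster than the explicit loop
squares_comp = [x * x for x in range(n)]

# NumPy: vectorized, compiled code -- usually fastest by a wide margin
arr = np.arange(n)
squares_np = arr * arr
</syntaxhighlight>
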
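And a small sketch of the difference between references, shallow copies and deep copies:

<syntaxhighlight lang="python">
import copy

a = [[1, 2], [3, 4]]
b = a                 # Reference: b is the same object as a
c = copy.copy(a)      # Shallow copy: new outer list, shared inner lists
d = copy.deepcopy(a)  # Deep copy: fully independent

a[0][0] = 99
print(b[0][0])  # 99 -- the reference sees the change
print(c[0][0])  # 99 -- the shallow copy shares the inner lists
print(d[0][0])  # 1  -- the deep copy is unaffected
</syntaxhighlight>
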
<!--
== Example ==

== Do's and don'ts ==

Using Conda and Pip/Venv together

In some cases, Pip/Venv might be the preferred method for setting up an environment. This could be because a central Python package comes with installation instructions that only list pip as a supported option (like Tensorflow), or because a project was written with pip in mind, for example by offering a requirements.txt, and a rewrite with testing is not feasible.

In this case, it makes sense to use conda as a replacement for venv and use it to supply the virtual environment with the required Python version. After activating the environment, pip will be available and refer to this Python environment.
-->