Development/Python: Difference between revisions

From bwHPC Wiki

Latest revision as of 15:30, 3 December 2024

Introduction

Python is a versatile, easy-to-learn, usually interpreted programming language. It offers a wide range of libraries for scientific tasks and visualization and is the de facto standard interface for machine learning applications. In particular, Python can serve as an open source alternative for tasks that have traditionally been done in Matlab.

Installation and Versions

Python is available on all systems, either as system Python, which comes bundled with the operating system, or via Lmod software modules. Installation is not required. The available Python versions may differ from site to site.

System Python

System Python is available in different versions. The default system Python is typically too old for most users and applications, so newer versions are installed alongside it. You can access a specific Python version by specifying the version in the Python command.

On bwUniCluster, several versions are currently available. The command ls /usr/bin/python[0-9].*[0-9] | sort -V | cut -d"/" -f4 | xargs prints: python2.7 python3.6 python3.8 python3.9 python3.11.

Please note that these exact versions will change over time and differ from site to site!

$ python --version # This is system default Python
Python 3.6.8
$ python3.11 --version
Python 3.11.2
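
The same check can be made from inside a script, which may help when a program relies on features of a newer interpreter. The following is a minimal sketch; the minimum version (3, 6) and the helper name version_ok are illustrative assumptions, not site requirements:

```python
import sys

# Hypothetical minimum version for illustration; adjust to your project's needs.
MIN_VERSION = (3, 6)

def version_ok(current=sys.version_info, minimum=MIN_VERSION):
    """Return True if the interpreter version is at least `minimum` (major, minor)."""
    return tuple(current[:2]) >= tuple(minimum)

print("Running Python", sys.version.split()[0])
if not version_ok():
    sys.exit("This script needs Python %d.%d or newer" % MIN_VERSION)
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.11".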

Python Modules

Python is also offered via software modules. Available versions can be identified via:

$ module avail devel/python

A specific version of Python can then be chosen e.g. as follows:

$ module load devel/python/3.12.3_gnu_13.3
$ python --version
Python 3.12.3

Python Distributions

If Python versions are required that are not provided by modules or are not installed natively, it is possible to use distributions such as Anaconda. In some use cases conda is beneficial or, depending on the scientific community, the standard approach to distributing Python packages. For the use of conda on bwHPC clusters, please refer to Conda.

Usage

There are two ways to run Python commands:

  1. Interactive Mode
  2. Script Mode

In order to start the interactive mode, simply run the python command. This starts a Python shell, where all commands are evaluated by the Python interpreter.

In script mode, all Python commands are stored in a text file with the extension .py.
The program is then executed via python myProgram.py
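
As an illustration, a script file could look like the following minimal sketch. Only the file name myProgram.py comes from the text above; the content and the function name main are made up for this example:

```python
# myProgram.py (the content is only an illustration)
import sys

def main(argv):
    """Greet every name given on the command line, or the world by default."""
    names = argv[1:] or ["world"]
    lines = ["Hello, %s!" % name for name in names]
    print("\n".join(lines))
    return lines

if __name__ == "__main__":
    main(sys.argv)
```

Calling python myProgram.py Alice Bob would then greet both names, while python myProgram.py without arguments greets the world.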

For teaching purposes, code prototyping and visualization, the use of Jupyter notebooks is advantageous. Further information about interactive supercomputing and the use of Jupyter can be found at Jupyter.

Package Manager (pip)

The functionality of Python is extended via modules, which are distributed as packages. The standard package manager for Python is pip. It can be used to install, update and remove packages. pip can be called directly or via python -m pip. The standard repository from which packages are obtained is PyPI (https://pypi.org/). Package dependencies are installed automatically.

Usage

The most common pip commands are shown below as examples. Please note that you should always use pip in conjunction with venv. If you decide not to use a virtual environment, the install commands have to be accompanied by the --user flag or controlled via environment variables.

Installation of packages

$ pip install pandas           # Installs the latest compatible version of pandas and its required dependencies
$ pip install pandas==1.5.3    # Installs exact version
$ pip install "pandas>=1.5.3"  # Installs version newer or equal to 1.5.3 (the quotes keep the shell from treating > as a redirection)

The packages from PyPI usually consist of precompiled libraries. However, pip is also capable of building packages from source code. This may require a C/C++ compiler and other dependencies needed to build the libraries. In the following example, pip obtains the source code of matplotlib from github.com, installs its dependencies, compiles the library and installs it:

$ pip install git+https://github.com/matplotlib/matplotlib

Several external packages are usually required for a software project. It is advisable to create the file requirements.txt, i.e. a text file with all dependencies:

$ pip install -r requirements.txt   # Installs these packages in one go

Removing packages

$ pip uninstall pandas         # Removes pandas

Upgrade packages

$ pip install --upgrade pandas # Updates the library if update is available

Show packages

$ pip list           # Shows the installed packages and their versions as a table
$ pip freeze         # Shows the installed packages and their versions in requirements.txt format (name==version)

The output can be redirected to a requirements.txt file: $ pip freeze > requirements.txt
This is recommended if version pinning (https://pip.pypa.io/en/stable/topics/repeatable-installs/) is explicitly desired.
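
To illustrate what such pinned lines look like, the following sketch builds a similar list with the standard library module importlib.metadata (available since Python 3.8). It is only an approximation for illustration: the function name pinned_requirements is made up here, and special cases such as editable installs are ignored, so pip freeze remains the tool to use in practice.

```python
from importlib import metadata  # standard library since Python 3.8

def pinned_requirements():
    """Return sorted 'name==version' lines for the packages visible to this interpreter."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installations without a name
            lines.add("%s==%s" % (name, dist.version))
    return sorted(lines, key=str.lower)

for line in pinned_requirements():
    print(line)
```

Inside an activated venv, the output covers only the packages of that environment, which is one reason why pinning works best with one venv per project.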

Virtual Environments (venv)

Virtual environments allow Python packages to be installed in a separate installation directory for a specific application instead of installing them globally. This prevents version conflicts, makes clear which packages are required for which software and keeps the home directory from being cluttered with libraries that are not (or no longer) required.

It is highly recommended to always use venv whenever a package is to be installed.

Creation

The module venv enables the creation of virtual environments and is a standard component of Python. Creating a venv creates a folder that contains its own python executable (a copy of, or a link to, the interpreter used to create it) as well as pip; Python versions before 3.12 also add setuptools. After activating the venv, the executables in this folder are used when python or pip is called. This folder is also the installation target for other Python packages.

Create, activate, install software, deactivate:

$ python3.11 -m venv myEnv        # Create venv
$ source myEnv/bin/activate       # Activate venv
$ pip install --upgrade pip       # Update of the venv-local pip
$ pip install <list of packages>  # Install packages/modules
$ deactivate
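
A venv can also be created from within Python via the same standard library module. The sketch below is meant only to show what the created folder contains. It works in a temporary directory, the helper name create_env is made up for this example, and with_pip=False is an assumption chosen to keep the example fast and offline (the command line call above installs pip by default):

```python
import tempfile
import venv
from pathlib import Path

def create_env(parent_dir, name="myEnv"):
    """Create a virtual environment below parent_dir and return its path."""
    env_dir = Path(parent_dir) / name
    # with_pip=False skips the pip installation that `python -m venv` performs by default.
    venv.create(env_dir, with_pip=False)
    return env_dir

with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(tmp)
    # pyvenv.cfg marks the folder as a virtual environment.
    created = (env_dir / "pyvenv.cfg").is_file()
    print("pyvenv.cfg present:", created)
    print("contents:", sorted(p.name for p in env_dir.iterdir()))
```

For everyday work the shell commands above are the simpler route, since they also provide the activate script workflow and a venv-local pip.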

Install list of software:

$ pip install -r requirements.txt # Install packages/modules

Usage

To use the virtual environment after all dependencies have been installed there, it is sufficient to simply activate it:

$ source myEnv/bin/activate       # Activate venv

With the venv activated, the terminal prompt shows the name of the environment:

(myEnv) $

It is no longer necessary to specify the Python version. The simple command python starts the Python version that was used to create the virtual environment.
You can check which Python interpreter is in use via:

(myEnv) $ which python
</path/to/project>/myEnv/bin/python
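
A script can perform a similar check itself. Inside a venv, sys.prefix points to the environment folder, while sys.base_prefix still points to the Python installation the venv was created from. A minimal sketch, with the function name in_virtualenv chosen only for illustration:

```python
import sys

def in_virtualenv():
    """Return True if the interpreter runs inside a venv-style virtual environment."""
    # The two prefixes differ only while a virtual environment is in use.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

Such a check can be useful at the start of a job script to catch a forgotten activation before any packages are imported.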

Best Practice

  • Always use virtual environments! One venv per project.
  • If a new or existing Python project is to be created or used, the following procedure is recommended:
  1. Version the Python source files and the requirements.txt file with a version control system, e.g. git. When versioning, exclude the venv folder via an entry in the .gitignore file.
  2. Create and activate a venv.
  3. Update the venv-local pip.
  4. Install all required packages via a requirements.txt file.