Python Environments: Difference between revisions
(Created page with "stub") |
H Winkhardt (talk | contribs) (Initial page) |
= Introduction =

When installing Python packages, the usual approach is <code>pip install xyz</code>. Without further measures, these packages end up in the global Python installation of the system. When working on multiple projects, many packages pile up over time, and it is only a matter of time until this leads to problems.

Say you have <code>pandas 2.2.2</code> installed for project A. After a while you start a new project B, which requires <code>auto-sklearn 0.15.0</code>. This package, however, is only compatible with versions of <code>pandas</code> up to <code>pandas 2.0.0</code>. In this case, Pip might uninstall the existing version of <code>pandas</code> and install a compatible one, namely <code>pandas 1.5.3</code>. This might in turn break project A, because these versions of <code>pandas</code> are not entirely backwards-compatible.

This is one of the major reasons why it has become common practice to work in '''virtual environments''', support for which is built into a fresh Python install. On HPC systems, virtual environments are even more necessary because many users share the system. Allowing each of them to install packages into a global Python environment would quickly end in a mess, which is why this is disabled. Instead, users create and configure environments within their own directories.
= Pip / Venv =

<blockquote>Venv/Pip is not the recommended way to set up an environment, but might still be necessary in some cases. Skip to Conda if you want to get started in the recommended way.</blockquote>

== Pip ==

'''Pip''' and '''Venv''' are two modules that often come with Python by default. [https://pip.pypa.io/en/stable/ Pip] is Python's default package manager and downloads packages from [https://pypi.org PyPI] (the ''Python Package Index'').
<pre>
$ pip install pandas            # Installs the latest compatible version
$ pip install pandas==1.5.3     # Installs exactly version 1.5.3
$ pip install "pandas>=1.5.3"   # Installs version 1.5.3 or newer (quoted so the shell does not treat > as a redirect)
</pre>
Pip is, however, not a standalone program but is tied to a specific installation of Python. This can cause confusion, because systems often ship with one or more preinstalled versions of Python. When several Python installations exist, it is not always clear which one is currently active. The easiest remedy is to launch Pip through a specific Python interpreter, which makes it clear which interpreter a package is installed for.
<pre>
$ which python3                    # prints the path of the python3 found first on PATH
/usr/bin/python3
$ python3 -m pip install pandas    # -m mod : run library module as a script
</pre>
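The same question, which installation is active, can also be answered from inside Python. The following is a minimal sketch using only the standard library; the function name <code>describe_interpreter</code> is made up for this illustration.

```python
import sys
import sysconfig


def describe_interpreter() -> dict:
    """Collect the paths that identify the running Python installation."""
    return {
        # Interpreter binary that is executing this script
        "executable": sys.executable,
        # Root directory of the active installation or environment
        "prefix": sys.prefix,
        # Directory into which pure-Python packages are installed
        "site_packages": sysconfig.get_paths()["purelib"],
    }


if __name__ == "__main__":
    for name, path in describe_interpreter().items():
        print(f"{name}: {path}")
```

Running this script with different interpreters (for example <code>python3</code> versus <code>/usr/bin/python3</code>) shows whether they refer to the same installation.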
== requirements.txt ==

Because it is cumbersome to install each package of a project manually, a project can provide a file that lists its dependencies, typically named <code>requirements.txt</code>. It lists one package per line, optionally with a specific version.
<pre>
pexpect
requests==2.32.3
</pre>
The packages listed in the file can then be installed with
<pre>
$ python3 -m pip install -r requirements.txt
</pre>
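To see whether the active interpreter already satisfies the pinned versions in such a file, the installed versions can be looked up with the standard library. This is only a rough sketch: the helper <code>check_pins</code> is hypothetical, and it understands nothing but plain <code>name==version</code> lines, so extras, environment markers and version ranges are not handled.

```python
from importlib import metadata


def check_pins(requirement_lines):
    """Compare 'name==version' pins with what is installed.

    Returns a dict mapping each pinned package name to a tuple
    (wanted_version, installed_version), where installed_version is
    None if the package is not installed. Unpinned lines, blank lines
    and comments are skipped.
    """
    report = {}
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    # Same content as the example requirements.txt above
    for name, (wanted, installed) in check_pins(["pexpect", "requests==2.32.3"]).items():
        print(f"{name}: wanted {wanted}, installed {installed}")
```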
A project often accumulates packages over time. The following command writes the names and versions of all currently installed packages to a new <code>requirements.txt</code>, which can later be used to rebuild the same environment.
<pre>
$ python3 -m pip freeze > requirements.txt
</pre>
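For illustration, a similar listing can be produced from within Python using <code>importlib.metadata</code>. This is a simplified approximation and not how Pip itself is implemented; <code>pip freeze</code> covers more cases (for example editable installs), and the function name <code>freeze_lines</code> is made up here.

```python
from importlib import metadata


def freeze_lines():
    """Return sorted 'name==version' strings for the installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```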
Pip on its own cannot separate package installations into distinct environments. For this, '''Venv''' is required.

== Venv ==

[https://docs.python.org/3/library/venv.html Venv] creates a lightweight, isolated Python environment in a local folder, e.g. a project folder. The environment has its own package directory and an interpreter that is linked to, or copied from, the Python installation used to create it.
<pre>
$ python3 -m venv <path_to_environment>   # general form
$ python3 -m venv .venv                   # common choice: a .venv folder inside the project
</pre>
Run from the project directory, the second command creates a <code>.venv</code> directory with a new interpreter inside. The environment can be activated with
<pre>
$ source .venv/bin/activate
</pre>
When the virtual environment has been activated successfully, the terminal prompt reflects this:
<pre>
(.venv) $
</pre>
Using <code>which</code> also shows the path of the new interpreter:
<pre>
(.venv) $ which python
</path/to/project>/.venv/bin/python
</pre>
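A script can perform the same check itself by comparing <code>sys.prefix</code> with <code>sys.base_prefix</code>, which differ inside an environment created by Venv. A minimal sketch, with <code>in_virtual_environment</code> being a name chosen for this example:

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("environment root:", sys.prefix)
```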
With the environment active, Pip installs packages into the new environment:
<pre>
(.venv) $ python -m pip install pandas
</pre>
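Creating an environment does not have to happen on the command line; the <code>venv</code> module can also be called from Python. The sketch below assumes a throwaway target directory and a helper name, <code>create_environment</code>, invented for this example. It passes <code>with_pip=False</code> in the demo so that it runs quickly; pass <code>True</code> to get Pip installed into the environment as the command-line form does.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at `target` and return the path of its config file."""
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(target)
    # Every environment created by venv contains a pyvenv.cfg file at its root
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is left behind
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_environment(Path(tmp) / ".venv", with_pip=False)
        print(cfg.read_text())
```

The printed <code>pyvenv.cfg</code> records, among other things, the directory of the Python installation the environment was created from, which illustrates the dependency discussed in the next section.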
== Limitations of Pip / Venv ==

While the combination of Pip and Venv is common for projects run on single-user systems, it can cause problems on HPC systems. The largest is that this solution always depends on a centrally installed version of Python, and there are too many Python versions for administrators to provide every one that the many users of an HPC system might need. Since users are not permitted to install new global versions of Python, a more sustainable approach is to let users install Python versions themselves within their home directory. This is why '''Conda''' is currently considered the go-to solution.
= Conda =

In the context of Python, Conda combines a package manager, a virtual environment creator, and a manager for Python versions. Instead of depending on an already available Python interpreter, a Conda environment is created with an explicit Python version, which is downloaded if it is not already available. This allows users to install their own versions as needed. On top of that, Conda comes with its own package repository solution.

Detailed instructions for Conda environments can be found under [[Development/Conda]].
= Poetry =

TBA
Revision as of 16:58, 4 November 2024