Using LLMs, even just for inferencing, requires large computational resources - currently best served by a powerful GPU - as provided by the bwHPC clusters. This page explains how to make use of bwHPC resources, using Ollama as an example to show best practices at work.

== Introduction ==
Ollama is an inferencing framework that provides access to a multitude of powerful, large models and allows performant access to a variety of accelerators, e.g. CPUs using AVX-512, APUs like the AMD MI300A, and GPUs like multiple NVIDIA H100.

Installing the inference server Ollama normally assumes you have root permission to install the server globally for all users into the directory <code>/usr/local/bin</code>. Of course, this is not sensible on a shared cluster. Therefore, the clusters provide [[Environment_Modules|Environment Modules]], in this case:
 module load devel/ollama
More information is available on the [https://github.com/ollama/ollama/tree/main/docs Ollama GitHub documentation] page.
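
As a minimal sketch (assuming the module puts the <code>ollama</code> binary on your <code>PATH</code>; the model name is only an illustration), an interactive session on a compute node could start the server like this:
 # load the Ollama environment module
 module load devel/ollama
 # start the inference server; by default it listens on port 11434
 ollama serve &
 # pull a model and chat with it (the model name is just an example)
 ollama run llama3.2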

The inference server Ollama listens on the well-known port 11434. The compute node's IP is on the internal network, e.g. 10.1.0.101, which is not visible to any outside computer such as your laptop. Therefore we need a way to forward this port to an IP that is visible to the outside.
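
As a quick sanity check on the compute node itself (a sketch assuming the server is already running there):
 # show the compute node's internal IP addresses
 hostname -I
 # verify that Ollama answers locally on port 11434
 curl http://localhost:11434/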

== Port forwarding ==
The login nodes of course have externally visible IP addresses, e.g. <code>bwunicluster.scc.kit.edu</code>, which gets resolved to one of the multiple login nodes.
Using the secure shell <code>ssh</code>, one may forward a port from the login node to the compute node (see the sketch below).
First, we need to allocate a compute node using [[BwUniCluster2.0/Slurm|Slurm]].
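
A rough sketch of both steps (the partition name, time limit, COMPUTE_NODE, and USERNAME are placeholders to adapt; note that Ollama on the compute node must listen on a non-loopback address, e.g. via the <code>OLLAMA_HOST</code> environment variable, for this kind of forwarding to reach it):
 # on the login node: request one GPU for an interactive job (placeholder partition/time)
 salloc --partition=gpu_4 --gres=gpu:1 --time=01:00:00
 # look up the name of the allocated compute node
 squeue -u $USER
 # on your laptop: forward local port 11434 through the login node to the compute node
 ssh -L 11434:COMPUTE_NODE:11434 USERNAME@bwunicluster.scc.kit.edu
 # now http://localhost:11434 on the laptop reaches the Ollama server on the compute node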