NEMO2/Easybuild Modules/EB Build Module

From bwHPC Wiki
Revision as of 15:31, 24 November 2025

EasyBuild Module Builder Script

The eb-build-module.sh script is a wrapper around EasyBuild that simplifies building software modules with customizable configurations. It supports parallel builds, custom walltime limits, architecture-specific builds, and SLURM job submission.

This script is provided system-wide and can be called directly without specifying a path.

Features

  • Architecture-specific builds: Support for genoa, milan, mi300a, l40s, and h200 architectures
  • Automatic GPU allocation: Allocates 1 GPU for the build job when a GPU architecture (mi300a, l40s, h200) is selected
  • Flexible configuration: Customizable cores, walltime, and installation prefix
  • Multiple build modes: Standard build, rebuild, module-only, and dry-run
  • EasyConfig search: Built-in search functionality for finding available easyconfigs
  • Environment integration: Automatically sources local EasyBuild environment
  • Additional options: Pass any EasyBuild option directly to the eb command

Usage

Basic Syntax

eb-build-module.sh [-c cores] [-w walltime] [-a arch] [-p prefix] [-r] [-o] [-d] [-s pattern] -m module [-- extra_eb_options]

Options

Option      | Description                                                    | Default
-m module   | Name of the module to build (required unless using -s)         | -
-c cores    | Number of cores to allocate for the job                        | 20
-w walltime | Maximum walltime in hours                                      | 12
-a arch     | Architecture/partition: genoa, milan, mi300a, l40s, h200       | milan
-p prefix   | Installation prefix directory                                  | ~/.local/easybuild
-r          | Rebuild software, even if module already exists (--rebuild)    | -
-o          | Only generate module file(s), skip build steps (--module-only) | -
-d          | Perform a dry-run, print build overview (--dry-run)            | -
-s pattern  | Search for easyconfig files matching pattern (--search-short)  | -
-h          | Display help message                                           | -
--          | Pass everything after this to eb command                       | -
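
The option table above maps naturally onto a getopts loop. The following is only a minimal sketch of how such parsing could look, not the actual code of eb-build-module.sh; the function name parse_build_args and all variable names are hypothetical, and only the option letters and defaults come from the table.

```shell
# Hypothetical sketch: parse the documented options with POSIX getopts.
# Function and variable names are illustrative, not taken from the real script.
parse_build_args() {
    # Defaults as documented in the options table
    CORES=20
    WALLTIME=12
    ARCH=milan
    INSTALL_PREFIX="${PREFIX:-$HOME/.local/easybuild}"
    MODULE=""
    SEARCH=""
    REBUILD=0
    MODULE_ONLY=0
    DRY_RUN=0

    OPTIND=1
    while getopts "c:w:a:p:m:s:rodh" opt; do
        case "$opt" in
            c) CORES=$OPTARG ;;
            w) WALLTIME=$OPTARG ;;
            a) ARCH=$OPTARG ;;
            p) INSTALL_PREFIX=$OPTARG ;;
            m) MODULE=$OPTARG ;;
            s) SEARCH=$OPTARG ;;
            r) REBUILD=1 ;;
            o) MODULE_ONLY=1 ;;
            d) DRY_RUN=1 ;;
            h) echo "see the Options table for usage"; return 0 ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))

    # getopts stops at "--" and skips it; whatever remains is meant for eb
    EXTRA_EB_OPTS="$*"
}
```

Called as parse_build_args -c 32 -a genoa -m Python-3.11.3-GCCcore-12.3.0 -- --force, this sketch would leave --force in EXTRA_EB_OPTS and keep the default walltime of 12 hours.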

Environment Variables

The script respects the following environment variables:

Variable                       | Description
PREFIX                         | Installation prefix (can be overridden with -p)
EASYBUILD_MODULE_NAMING_SCHEME | Module naming scheme
EASYBUILD_ACCEPT_EULA_FOR      | Comma-separated list of EULAs to accept
EB_COMSOL_LICENSE_FILE         | COMSOL license file path
EB_MATLAB_KEY                  | MATLAB license key
EB_MATLAB_LICENSE_SERVER       | MATLAB license server hostname
EB_MATLAB_LICENSE_SERVER_PORT  | MATLAB license server port
EB_MATHEMATICA_LICENSE_SERVER  | Mathematica license server

Note: If ~/.local/easybuild/env exists, it will be sourced automatically.
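
Sourcing an optional environment file is usually a guarded one-liner. The sketch below shows one plausible way to do it; the helper name source_local_env is hypothetical and the real script may implement this differently.

```shell
# Hypothetical helper: source an environment file only if it is readable.
# With no argument it falls back to the documented default location.
source_local_env() {
    env_file="${1:-$HOME/.local/easybuild/env}"
    if [ -r "$env_file" ]; then
        # Variables exported in the file become visible to the caller
        . "$env_file"
    fi
}
```

Because a missing file is silently skipped, users without a personal env file would simply get the built-in defaults.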

Examples

Basic Build

Build a module with default settings:

eb-build-module.sh -m Python-3.11.3-GCCcore-12.3.0

Custom Configuration

Build with custom cores, walltime, and architecture:

eb-build-module.sh -c 32 -w 24 -a genoa -m Python-3.11.3-GCCcore-12.3.0

Rebuild Existing Module

Force rebuild of an existing module:

eb-build-module.sh -r -m Python-3.11.3-GCCcore-12.3.0

Module File Only

Generate only the module file without building:

eb-build-module.sh -o -m Python-3.11.3-GCCcore-12.3.0

Dry-Run

Preview what would be built without actually building:

eb-build-module.sh -d -m Python-3.11.3-GCCcore-12.3.0

Search for EasyConfigs

Search for available easyconfig files:

# Simple search
eb-build-module.sh -s Python

# Search with regex pattern
eb-build-module.sh -s "GCC.*12.3"

Pass Additional EasyBuild Options

Pass extra options directly to EasyBuild:

# Build with force and debug options
eb-build-module.sh -m Python-3.11.3-GCCcore-12.3.0 -- --force --debug

# Skip test step
eb-build-module.sh -m Python-3.11.3-GCCcore-12.3.0 -- --skip-test-step

Using Environment Variables

Set custom prefix and accept EULAs:

export PREFIX=/opt/easybuild
export EASYBUILD_ACCEPT_EULA_FOR="CUDA,cuDNN,Intel-oneAPI"
eb-build-module.sh -m CUDA-12.0.0

Architecture Support

The script supports the following architectures:

Architecture | Type                 | GPU Support           | Notes
genoa        | CPU (AMD EPYC Genoa) | No                    | Physical architecture
milan        | CPU (AMD EPYC Milan) | No                    | Physical architecture
mi300a       | GPU (AMD MI300A)     | Yes (1 GPU allocated) | On NEMO2: symbolic link to genoa
l40s         | GPU (NVIDIA L40S)    | Yes (1 GPU allocated) | Physical architecture (Intel-based)
h200         | GPU (NVIDIA H200)    | Yes (1 GPU allocated) | On NEMO2: symbolic link to genoa

Important: On NEMO2, there are only three physical architectures:

  • milan - AMD EPYC Milan CPUs
  • genoa - AMD EPYC Genoa CPUs
  • l40s - Intel CPUs with NVIDIA L40S GPUs

The architectures mi300a and h200 are symbolic links pointing to genoa. This means that modules built for mi300a or h200 are actually built on genoa nodes and stored in the genoa prefix directory.

Note: GPU architectures (mi300a, l40s, h200) automatically allocate 1 GPU for the build job.
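
The relationship between requested and physical architectures can be written down as a small lookup. This is an illustrative sketch only: the function names physical_arch and gpu_count are hypothetical, and on NEMO2 the mapping is realized through symbolic links rather than through code like this.

```shell
# Hypothetical lookup mirroring the architecture table above.
# Prints the physical architecture that a requested architecture resolves to.
physical_arch() {
    case "$1" in
        mi300a|h200)      echo genoa ;;   # symbolic links to genoa on NEMO2
        genoa|milan|l40s) echo "$1" ;;    # physical architectures
        *) echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# Prints the number of GPUs allocated for a build job on the given architecture.
gpu_count() {
    case "$1" in
        mi300a|l40s|h200) echo 1 ;;
        *)                echo 0 ;;
    esac
}
```

For example, physical_arch h200 prints genoa while gpu_count h200 prints 1, which matches the table: the build still gets a GPU, but the result lands in the genoa prefix directory.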

Workflow

Standard Build Workflow

  1. Search for available versions: eb-build-module.sh -s Python
  2. Dry-run to check dependencies: eb-build-module.sh -d -m Python-3.11.3-GCCcore-12.3.0
  3. Build the module: eb-build-module.sh -m Python-3.11.3-GCCcore-12.3.0
  4. Verify module is installed: module avail python
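
Steps 2 and 3 of the standard workflow can be chained so that the real build only starts if the dry-run succeeds. The wrapper below is a sketch under assumptions: build_with_preview and the EB_BUILD variable are hypothetical names, and it assumes eb-build-module.sh returns a non-zero exit code when the dry-run fails.

```shell
# Command to invoke; overridable so the function can be tried without a cluster.
EB_BUILD="${EB_BUILD:-eb-build-module.sh}"

# Hypothetical wrapper: dry-run first, then build with the same options.
# Usage: build_with_preview <module> [options for eb-build-module.sh...]
build_with_preview() {
    easyconfig=$1
    shift
    # Step 2: dry-run to check dependencies; stop here if it fails
    "$EB_BUILD" -d "$@" -m "$easyconfig" || return 1
    # Step 3: the actual build
    "$EB_BUILD" "$@" -m "$easyconfig"
}
```

A call such as build_with_preview Python-3.11.3-GCCcore-12.3.0 -a genoa -c 32 would run the dry-run and the build with identical architecture and core settings.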

GPU Software Workflow

  1. Choose appropriate GPU architecture
  2. Build with GPU arch: eb-build-module.sh -a l40s -m CUDA-12.0.0
  3. Module will be installed under the architecture-specific prefix

Build Process

The script performs the following steps:

  1. Sources local EasyBuild environment if available (~/.local/easybuild/env)
  2. Validates input parameters (module name, architecture)
  3. Configures architecture-specific settings (including GPU allocation)
  4. Automatically appends .eb extension to module name if not present
  5. Creates log directory: ~/.eb/robot/{architecture}/logs
  6. Purges existing modules to ensure clean build environment
  7. Constructs and executes EasyBuild command with all specified options
  8. Reports build status (success or failure with exit code)
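
Steps 4 and 5 above are simple string operations. As a rough sketch (the helper names easyconfig_name and log_dir_for are hypothetical, and the real script may differ), they could look like this:

```shell
# Hypothetical helper for step 4: append .eb unless it is already present.
easyconfig_name() {
    case "$1" in
        *.eb) printf '%s\n' "$1" ;;
        *)    printf '%s.eb\n' "$1" ;;
    esac
}

# Hypothetical helper for step 5: log directory for a given architecture.
log_dir_for() {
    printf '%s/.eb/robot/%s/logs\n' "$HOME" "$1"
}
```

The directory itself could then be created with mkdir -p "$(log_dir_for milan)", which succeeds even if the directory already exists.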

Script Output

The script provides detailed output including:

  • Module name and architecture
  • Number of cores and walltime
  • Installation prefix
  • Log directory
  • Build mode (rebuild, module-only, dry-run)
  • Full EasyBuild command being executed
  • Build status (success or failure with exit code)

Example output:

==========================================
EasyBuild Module Build Configuration
==========================================
Module:        Python-3.11.3-GCCcore-12.3.0.eb
Architecture:  milan
Cores:         20
Walltime:      12 hours
Prefix:        /home/user/.local/easybuild/milan
Log directory: /home/user/.eb/robot/milan/logs
==========================================

Starting EasyBuild...
Command: eb Python-3.11.3-GCCcore-12.3.0.eb --prefix /home/user/.local/easybuild/milan --robot --module-extensions --job --job-cores 20 --job-max-walltime 12

...

==========================================
Build completed successfully!
==========================================
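
The "Command:" line in the sample output can be reproduced from the four configuration values. The function below is a sketch for illustration; build_eb_command is a hypothetical name, and where the real script places an optional mode flag (--rebuild, --module-only, --dry-run) is an assumption.

```shell
# Hypothetical sketch: assemble the eb command line shown in the sample output.
# $1 easyconfig, $2 prefix, $3 cores, $4 walltime in hours, $5 optional mode flag
build_eb_command() {
    cmd="eb $1 --prefix $2 --robot --module-extensions --job --job-cores $3 --job-max-walltime $4"
    # Assumption: a mode flag, if given, is appended at the end
    if [ -n "${5:-}" ]; then
        cmd="$cmd $5"
    fi
    printf '%s\n' "$cmd"
}
```

This builds a display string only. A script that actually executes the command would more safely pass each argument separately so that paths containing spaces survive.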

Error Handling

The script checks its input before starting a build and reports failures explicitly:

  • Validates module name is provided (unless searching)
  • Validates architecture is one of the supported values
  • Ensures log directory can be created
  • Exits with appropriate error codes
  • Provides clear error messages
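
The first two checks in this list can be expressed as a short validation function. This is a sketch, not the script's actual code: validate_request is a hypothetical name and the specific return codes 1 and 2 are assumptions chosen for illustration.

```shell
# Hypothetical validation sketch. Return codes are illustrative assumptions.
# $1 module name (may be empty), $2 search pattern (may be empty), $3 architecture
validate_request() {
    mod=$1
    search=$2
    arch=$3
    # A module name is required unless a search was requested
    if [ -z "$mod" ] && [ -z "$search" ]; then
        echo "Error: module name is required (-m) unless searching (-s)" >&2
        return 1
    fi
    # The architecture must be one of the supported values
    case "$arch" in
        genoa|milan|mi300a|l40s|h200) ;;
        *) echo "Error: unsupported architecture '$arch'" >&2; return 2 ;;
    esac
    return 0
}
```

Distinct return codes let a calling script tell a missing module name apart from an unsupported architecture without parsing the error text.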

Tips and Best Practices

1. Use Dry-Run First

Always preview what will be built before starting a build:

eb-build-module.sh -d -m Python-3.11.3-GCCcore-12.3.0

This shows all dependencies and build steps without actually building.

2. Search Before Building

Search for available versions to ensure you're building the correct module:

eb-build-module.sh -s Python

3. Set Up Local Environment

Create ~/.local/easybuild/env to set default environment variables:

# Example ~/.local/easybuild/env
export EASYBUILD_MODULE_NAMING_SCHEME=CategorizedModuleNamingScheme
export EASYBUILD_ACCEPT_EULA_FOR="CUDA,cuDNN,Intel-oneAPI"
export PREFIX=/custom/path/easybuild

This file is sourced automatically by the script.

4. Monitor Build Logs

Build logs are stored in:

~/.eb/robot/{architecture}/logs/

Check these logs if a build fails or behaves unexpectedly.
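
To jump straight to the most recent log, something like the following helper may be convenient. It is a sketch under assumptions: latest_log is a hypothetical name, the naming of the log files is not documented here, and file names are assumed not to contain newlines.

```shell
# Hypothetical helper: print the path of the most recently modified file
# in a log directory (default: the milan log directory).
latest_log() {
    dir="${1:-$HOME/.eb/robot/milan/logs}"
    # ls -t sorts by modification time, newest first
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    if [ -n "$newest" ]; then
        printf '%s/%s\n' "$dir" "$newest"
    else
        return 1
    fi
}
```

A typical use would be tail -n 50 "$(latest_log ~/.eb/robot/genoa/logs)" after a failed genoa build.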

5. Choose Correct Architecture

Use the architecture that matches your target hardware:

  • CPU-only software: genoa or milan
  • GPU software: mi300a, l40s, or h200

6. Use Appropriate Resources

Adjust cores and walltime based on the software being built:

  • Small packages: Default settings (20 cores, 12 hours) are usually sufficient
  • Large packages (e.g., GCC, LLVM): Increase cores and walltime, for example:

eb-build-module.sh -c 64 -w 48 -m GCC-13.2.0

Troubleshooting

Module Not Found

If EasyBuild cannot find the module:

# Search for available versions
eb-build-module.sh -s "ModuleName"

# Pass the full easyconfig file name (the script appends .eb automatically if it is missing)
eb-build-module.sh -m ModuleName-Version.eb

Build Fails

Check the log files in ~/.eb/robot/{architecture}/logs/ for detailed error messages.

Permission Issues

Ensure you have write permissions to:

  • Installation prefix (PREFIX)
  • Log directory (~/.eb/robot/{architecture}/logs)
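
Both locations can be checked before submitting a build. The helper below is a minimal sketch; check_writable is a hypothetical name and is not part of eb-build-module.sh.

```shell
# Hypothetical helper: report whether each given directory exists and is writable.
# Returns 0 only if all directories pass.
check_writable() {
    status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "writable: $dir"
        else
            echo "missing or not writable: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the defaults described on this page, a call could look like check_writable "${PREFIX:-$HOME/.local/easybuild}" "$HOME/.eb/robot/milan/logs". A directory that does not exist yet is reported as well, since the sketch does not try to create it.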

EULA Acceptance

For software requiring EULA acceptance:

export EASYBUILD_ACCEPT_EULA_FOR="CUDA,cuDNN,Intel-oneAPI"
eb-build-module.sh -m CUDA-12.0.0

See Also

Version Information

  • Script version: 1.0.0
  • Last updated: November 2025
  • Default architecture: milan
  • Default cores: 20
  • Default walltime: 12 hours