BinAC2/Software/Nextflow

From bwHPC Wiki
Revision as of 17:20, 16 April 2025 by F Bartusch

Description

Nextflow is a scientific workflow system primarily used for bioinformatics data analysis. This documentation also introduces nf-core, a community-driven initiative that maintains a curated collection of analysis pipelines built with Nextflow.

The documentation in the bwHPC Wiki serves as a getting-started guide for installing and using both Nextflow and nf-core pipelines on the bwForCluster BinAC 2. Additionally, the nf-core documentation provides an overview of the available pipelines.

Please note that this documentation does not cover how to develop your own pipelines. For that, refer to the official Nextflow documentation.

Installation

We recommend installing Nextflow using Conda.

Install Nextflow

The following commands will create a new Conda environment and install Nextflow in it.

# Load Miniforge and create a Conda environment with Nextflow from the bioconda channel
module load devel/miniforge
conda create --name nextflow --channel conda-forge --channel bioconda nextflow
conda activate nextflow
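To check that the installation worked, you can print the Nextflow version from inside the activated environment. This is a minimal check and assumes the nextflow environment created above is active.

```shell
# Print the installed Nextflow version; this fails if the environment is not active
nextflow -version
```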

Update Nextflow

If Nextflow is already installed in your environment, you can update it with the following commands:

module load devel/miniforge
conda activate nextflow
conda update --channel conda-forge --channel bioconda nextflow
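After an update, you can confirm which Nextflow version Conda installed in the active environment. This sketch only queries Conda and assumes the nextflow environment is active.

```shell
# Show the Nextflow package (name, version, build, channel) in the active environment
conda list nextflow
```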

Specific Nextflow version

You may want to install a specific version of Nextflow if your pipeline was developed some time ago with an older version in mind. In this example, we will install Nextflow version 20.07:

module load devel/miniforge
conda create --name nextflow_20.07 --channel conda-forge --channel bioconda nextflow=20.07
conda activate nextflow_20.07
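If you are unsure which older versions exist, you can ask Conda before creating the environment. The commands below are a sketch that assumes Nextflow is obtained from the bioconda channel.

```shell
module load devel/miniforge
# List the Nextflow versions available on the bioconda channel
conda search --channel bioconda nextflow
# List your existing Conda environments, e.g. nextflow and nextflow_20.07
conda env list
```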

Configuration

There are some BinAC 2-specific configurations that you may want to use.

BinAC 2 Nextflow profile

Nextflow configuration files can define one or more profiles, which tell Nextflow how to execute pipeline processes on a specific system, such as an HPC cluster. The nf-core project maintains a collection of these profiles, including a binac2 profile that runs each pipeline process as a Slurm job on BinAC 2.

nf-core pipelines

If you are using an nf-core pipeline, you can specify the profile with the following command:

nextflow run <pipeline> -profile binac2,<other profiles> [...]
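Before launching a real analysis, you may want to check which settings the binac2 profile applies. The sketch below uses nf-core/hlatyping (the pipeline used later on this page) as an example; other nf-core pipeline names should work the same way.

```shell
# Download or update the pipeline code in the local Nextflow cache
nextflow pull nf-core/hlatyping
# Print the configuration Nextflow resolves when the binac2 profile is selected
nextflow config nf-core/hlatyping -profile binac2
```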

Other Nextflow pipelines

If you are writing your own pipeline or using one that is not based on nf-core, you will need to include the nf-core profiles in the pipeline configuration manually. Add the following to your nextflow.config:

params {
    // ... your other pipeline parameters ...
    custom_config_version      = 'master'
    custom_config_base         = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
}

// ... the rest of your configuration ...

// Load nf-core custom profiles from different institutions (skipped when NXF_OFFLINE is set)
includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null"

// Optional: load a pipeline-specific config (here for nf-core/demo) if one exists at https://github.com/nf-core/configs
// includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/demo.config" : "/dev/null"

Now, your pipeline should be able to find the binac2 profile, and you can run the following command:

nextflow run <pipeline> -profile binac2,<other profiles> [...]
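To check that your pipeline can reach the nf-core profile index, you can fetch the same nfcore_custom.config file that the includeConfig statement loads, look for the binac2 entry, and then print the resolved configuration from your pipeline directory. This sketch assumes the default custom_config_version of master and internet access from the login node.

```shell
# Fetch the nf-core profile index used by includeConfig and look for binac2
curl -fsSL https://raw.githubusercontent.com/nf-core/configs/master/nfcore_custom.config | grep binac2

# Run from your pipeline directory: print the configuration with the binac2 profile applied
nextflow config -profile binac2
```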

Apptainer

Nextflow uses either Conda packages or container images to deploy the tools used in a pipeline. On BinAC 2, Apptainer is installed on every node, and the binac2 profile automatically enables Apptainer and sets a cache directory for your images with the following settings:

apptainer {
    enabled      = true
    autoMounts   = true
    pullTimeout  = '120m'
    cacheDir     = "/pfs/10/project/apptainer_cache/${USER}"
    envWhitelist = 'CUDA_VISIBLE_DEVICES'
}
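Container images can take up considerable space over time. The sketch below inspects the cache directory from the settings above and only assumes that it may not exist yet if no image has been pulled.

```shell
CACHE_DIR="/pfs/10/project/apptainer_cache/${USER}"
if [ -d "$CACHE_DIR" ]; then
    # List the cached images with human-readable sizes
    ls -lh "$CACHE_DIR"
    # Show the total size of the cache
    du -sh "$CACHE_DIR"
else
    echo "No Apptainer cache found at $CACHE_DIR yet"
fi
```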

Usage

Nextflow pipelines do not run in the background by default, so we recommend starting them inside a terminal multiplexer such as screen or tmux on the login node. A terminal multiplexer lets you open multiple windows within a single terminal and, more importantly, lets you detach from a session and reattach to it later. The detached session keeps running even if you disconnect from the cluster, so you can log in again at any time and check on the pipeline's progress.

Start a screen session:

screen
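Giving the session a name makes it easier to find again, and tmux works in much the same way if you prefer it. The session name nextflow below is only an example, and these are alternative commands rather than a script to run in order.

```shell
# screen: start a named session, and later reattach to it by name
screen -S nextflow
screen -r nextflow

# tmux: start a named session (detach with CTRL + b, then d) and reattach later
tmux new -s nextflow
tmux attach -t nextflow
```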

Since this is a new terminal session, you will need to load the Conda environment again.

module load devel/miniforge
conda activate nextflow

nf-core Pipelines

If you plan to use an nf-core pipeline, please run it once with the test profile. This will download the pipeline, execute it, and pull all the required containers into the Apptainer cache.

You should always specify two directories when running a pipeline so that you know exactly where the results are stored. The output directory (--outdir) is where the pipeline stores its final results. The work directory (-work-dir) is where Nextflow stores intermediate results and job scripts. By default, Nextflow creates the work directory inside the directory you launch the pipeline from, so please set it to a location on either the work or project file system. Otherwise, it may clutter your backed-up home directory.

nextflow run nf-core/hlatyping -profile binac2,test --outdir <your output directory> -work-dir <your work directory>
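If a run stops, for example because a job failed or the session was closed, you can restart it with Nextflow's -resume option, which reuses the cached results of tasks that already completed. This relies on launching from the same directory and using the same work directory as before. The variable values below are placeholders.

```shell
# Placeholder values: replace them with the directories of the original run
OUTDIR="your_output_directory"
WORKDIR="your_work_directory"

# Resume the previous run from the same launch directory
nextflow run nf-core/hlatyping -profile binac2,test \
    --outdir "$OUTDIR" -work-dir "$WORKDIR" -resume
```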

As mentioned, the pipeline runs in a screen session. You can detach from the screen session, and the pipeline will continue to run. The keyboard shortcut for detaching is CTRL + a, followed by d. This means you press the CTRL and a keys simultaneously, then release them and press d. Do not press CTRL + c, as this would stop the running Nextflow process. You should now be detached from the screen session and back in your login terminal.

While in your login terminal (or another window within your screen session), you can observe that Nextflow has submitted a job to the cluster for each pipeline process execution. Your output may differ, but it should show some pipeline jobs whose names begin with nf-NFCORE.

[tu_iioba01@login03 ~]$ squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            162040   compute nf-NFCOR tu_iioba PD       0:00      1 (None)
            162039   compute nf-NFCOR tu_iioba  R       0:01      1 node1-083
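By default, squeue truncates job names to eight characters, which is why the nf-core process names appear as nf-NFCOR. A custom output format shows more of the name; the column widths below are an arbitrary choice.

```shell
# Job ID, partition, job name (up to 50 characters), state, elapsed time, nodes, nodelist/reason
squeue -u "$USER" --format="%.10i %.9P %.50j %.2t %.10M %.6D %R"
```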

Now, we return to the Nextflow process in the screen session where the pipeline is running. You can list your screen sessions and their IDs with the command screen -ls.

(nf-core) [tu_iioba01@login03 nextflow_tests]$ screen -ls
There is a screen on:
        <screen session ID>.pts-2.login03     (Detached)
1 Socket in /var/run/screen/S-tu_iioba01.

If there is only one screen session, you can reattach using the following command:

screen -r

Otherwise, you will need to specify the screen session ID:

screen -r <screen session ID>

You can now monitor the pipeline's progress. The test profile typically finishes in less than 10 minutes. At the end, the output should look like this:

-[nf-core/hlatyping] Pipeline completed successfully-
Completed at: 16-Apr-2025 16:18:26
Duration    : 9m 16s
CPU hours   : 0.4
Succeeded   : 7
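After the run, Nextflow keeps a record of it that you can query from the launch directory, and nf-core pipelines typically write execution reports into the pipeline_info folder of the output directory. The output directory name below is a placeholder.

```shell
# List previous runs started from this directory (run name, status, session ID, ...)
nextflow log

# Execution reports, timeline and software versions written by nf-core pipelines
ls "your_output_directory/pipeline_info"
```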

The test run was successful. You can now run the pipeline with your own data.

Run nf-core pipeline with your own data

Typically, you specify the input files for an nf-core pipeline in a samplesheet and run the pipeline with the parameter --input <your samplesheet>. You can also override any of the pipeline's default settings by using a custom configuration file. For more information, please refer to the pipeline's documentation.
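As an illustration of a custom configuration file, the sketch below raises the resources of a single process using a withName selector. The process name OPTITYPE is a hypothetical example; look up the actual process names in the pipeline's documentation or in the execution report of your test run.

```shell
# Write a custom config that overrides the resources of one process.
# 'OPTITYPE' is a hypothetical selector; adapt it to your pipeline.
cat > custom.config <<'EOF'
process {
    withName: 'OPTITYPE' {
        cpus   = 8
        memory = 32.GB
        time   = 4.h
    }
}
EOF
```

You would then pass the file to the pipeline with the -c option, for example: nextflow run nf-core/hlatyping -profile binac2 -c custom.config --input <your samplesheet> --outdir <your output directory> -work-dir <your work directory>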

As usual, you can contact BinAC support if you have any problems or questions.