BinAC/Software/Nextflow: Difference between revisions

Revision as of 12:15, 12 August 2024

Description

Nextflow is a scientific workflow system predominantly used for bioinformatics data analysis. This documentation also covers nf-core, a community-driven initiative to curate a collection of analysis pipelines built using Nextflow.


The documentation in the bwHPC Wiki serves as a 'getting started' guide for installing and using Nextflow with nf-core on BinAC. The nf-core documentation provides detailed information on each pipeline: https://nf-co.re/pipelines/


This documentation does not cover how to write your own pipelines. This information can be found in the Nextflow documentation: https://www.nextflow.io/docs/latest/index.html



Installation

We recommend installing Nextflow via Miniconda. Since Nextflow is often used with nf-core pipelines, we also recommend installing the nf-core tools.

The following commands create a new Conda environment that provides Nextflow and the nf-core tools. They also set a shared Singularity cache directory in your ~/.bashrc, where all Singularity containers are stored.

conda create --name nf-core python nf-core nextflow

echo "export NXF_SINGULARITY_CACHEDIR=/beegfs/work/container/apptainer_cache" >> ~/.bashrc
echo "export SINGULARITY_CACHEDIR=/beegfs/work/tu_iioba01/apptainer_cache" >> ~/.bashrc
source ~/.bashrc

conda activate nf-core
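
To confirm that the cache settings took effect in a new shell, a small check along the following lines may help. This is only a sketch: the function name check_cache_dir is hypothetical, and it assumes the two variables were exported as shown above.

```shell
# Hypothetical helper: report whether a cache variable points to an existing directory.
# Usage: check_cache_dir VARIABLE_NAME
check_cache_dir() {
    var_name="$1"
    # Read the value of the variable whose name was passed in $1.
    eval "dir=\${$var_name:-}"
    if [ -z "$dir" ]; then
        echo "$var_name is not set"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$var_name points to a missing directory: $dir"
        return 1
    fi
    echo "$var_name is set to $dir"
}

# Check both variables; "|| true" keeps the script going if one check fails.
check_cache_dir NXF_SINGULARITY_CACHEDIR || true
check_cache_dir SINGULARITY_CACHEDIR || true
```

If either check reports a problem, re-run source ~/.bashrc or open a new login shell before starting a pipeline.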

Usage

Install an nf-core pipeline

You could start pipelines right away, and each pipeline would pull all required containers itself. However, we experienced problems when a pipeline started more than one job that pulled the same image. We therefore recommend downloading the pipeline first using the nf-core tools. In this example we will use the rnaseq pipeline. The following command downloads the pipeline into your current working directory and also pulls the Singularity containers that are not in the cache yet.

nf-core download -x none -d -u amend --container-system singularity -r 3.14.0 rnaseq
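
For readability, the same call can be written with long option names. The sketch below wraps it in a hypothetical function, download_pipeline, and uses a hypothetical NFCORE_CMD variable so the command can be printed instead of executed. The long option names are assumed to match the nf-core tools version used on this page; nf-core download --help shows what your installation accepts.

```shell
# Hypothetical wrapper: download one nf-core pipeline release with its Singularity images.
# Usage: download_pipeline PIPELINE REVISION
# Set NFCORE_CMD=echo to print the command instead of running it.
download_pipeline() {
    pipeline="$1"
    revision="$2"
    # --compress none                        (-x) keep the download as a plain directory
    # --download-configuration               (-d) also download the nf-core configuration files
    # --container-cache-utilisation amend    (-u) add images that are not in the cache yet
    # --container-system singularity              fetch Singularity images
    # --revision                             (-r) pipeline release to download
    ${NFCORE_CMD:-nf-core} download \
        --compress none \
        --download-configuration \
        --container-cache-utilisation amend \
        --container-system singularity \
        --revision "$revision" \
        "$pipeline"
}

# Example (requires the nf-core tools): download_pipeline rnaseq 3.14.0
```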

Test nf-core pipeline

The first thing you should do after downloading the pipeline is a test run. nf-core pipelines come with a test profile, which should work out of the box. There is also a BinAC profile for nf-core, which contains settings for BinAC's job scheduler and queues.

Nextflow pipelines do not run in the background by default, so it is best to use a terminal multiplexer when running a long pipeline. Terminal multiplexers allow you to have multiple windows within a single terminal window. The benefit for running Nextflow pipelines is that you can detach from these terminal windows and reattach to them later (even through an ssh connection) to check on the pipeline’s progress. The detached session, and with it the pipeline, continues to run even when you disconnect from the cluster.

Start a screen session:

screen

As this is a new terminal session, you will need to activate the Conda environment again. Then you can run the pipeline test. You should specify two directories: the work-dir, where intermediate results are stored, and the outdir, where the pipeline results are stored.

The BinAC profile in the nf-core repository is currently not up to date. You should therefore add the --custom_config_base option.

conda activate nf-core

nextflow run nf-core-rnaseq_3.14.0/3_14_0/ -profile binac,test \
  --custom_config_base 'https://raw.githubusercontent.com/fbartusch/configs/patch-1' \
  -work-dir <directory for intermediate results> \
  --outdir <directory for pipeline output>
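
The two directories can be made explicit with a small wrapper. The following is a minimal sketch, not part of the BinAC setup: the function name run_rnaseq_test and the NEXTFLOW_CMD variable are hypothetical, and the pipeline path assumes you start the run from the directory into which the pipeline was downloaded.

```shell
# Hypothetical wrapper around the test run shown above.
# Usage: run_rnaseq_test WORK_DIR OUT_DIR
# Set NEXTFLOW_CMD=echo to print the command instead of running it.
run_rnaseq_test() {
    work_dir="$1"
    out_dir="$2"
    if [ -z "$work_dir" ] || [ -z "$out_dir" ]; then
        echo "usage: run_rnaseq_test WORK_DIR OUT_DIR" >&2
        return 2
    fi
    # Create both directories up front so that a typo in a path is noticed early.
    mkdir -p "$work_dir" "$out_dir" || return 1
    ${NEXTFLOW_CMD:-nextflow} run nf-core-rnaseq_3.14.0/3_14_0/ -profile binac,test \
        --custom_config_base 'https://raw.githubusercontent.com/fbartusch/configs/patch-1' \
        -work-dir "$work_dir" \
        --outdir "$out_dir"
}
```

Keeping the work directory separate from the output directory makes it easy to delete intermediate results once the run has finished.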

As mentioned, the pipeline runs in a screen session. You can detach from the screen session and the pipeline will continue to run. The keyboard shortcut for detaching is CTRL-a followed by d.

To list your screen sessions:

(nf-core) [tu_iioba01@login03 nextflow_tests]$ screen -ls
There is a screen on:
        41342.pts-2.login03     (Detached)
1 Socket in /var/run/screen/S-tu_iioba01.

If there is only one screen session, you can reattach with:

screen -r

Otherwise you will need to specify the screen session number:

screen -r 41342
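
With several sessions, it can help to print only the detached ones before choosing which to reattach. The sketch below filters screen -ls output of the form shown above; the function name list_detached_sessions is hypothetical.

```shell
# Hypothetical helper: print the names of detached sessions.
# Reads `screen -ls` output on standard input.
list_detached_sessions() {
    # Session lines look like "  41342.pts-2.login03  (Detached)";
    # the first field is the session name.
    awk '/\(Detached\)/ { print $1 }'
}

# Typical use (requires screen): screen -ls | list_detached_sessions
```

Any name printed this way can be passed to screen -r.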

Nextflow will start a job for each pipeline step:

(base) [tu_iioba01@login03 ~]$ qstat -u $USER

mgmt02: 
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
11626226                tu_iioba01  short    nf-NFCORE_RNASE   19779   --     --        6gb  04:00:00 C       -- 
11626227                tu_iioba01  short    nf-NFCORE_RNASE   19788     1      2       6gb  06:00:00 C       -- 
11626228                tu_iioba01  short    nf-NFCORE_RNASE   19805     1      2       6gb  06:00:00 C       -- 
11626229                tu_iioba01  short    nf-NFCORE_RNASE   19819   --     --        6gb  04:00:00 C       -- 
11626230                tu_iioba01  short    nf-NFCORE_RNASE   19839   --     --        6gb  04:00:00 C       -- 
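
For longer runs, a per-state summary is easier to read than the full job list. The following sketch counts Nextflow jobs by the state column (S) of the qstat output shown above. The function name count_nf_jobs is hypothetical, and it assumes the column layout of that listing, where the job name is the fourth field and the state is the tenth.

```shell
# Hypothetical helper: count Nextflow jobs per state.
# Reads `qstat -u $USER` output on standard input.
count_nf_jobs() {
    # Job rows start with a numeric job ID; Nextflow job names start with "nf-".
    awk '$1 ~ /^[0-9]+/ && $4 ~ /^nf-/ { count[$10]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Typical use on the cluster: qstat -u "$USER" | count_nf_jobs
```

In the listing above, all five jobs are in state C (completed), so the summary would be a single line for that state.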

Run pipeline with your own data

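See the usage documentation of the pipeline: https://nf-co.re/rnaseq/3.14.0/#usage

An example samplesheet is included in the downloaded pipeline: /beegfs/work/tu_iioba01/nextflow_tests/nf-core-rnaseq_3.14.0/3_14_0/assets/samplesheet.csv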