BinAC/Software/Nextflow: Difference between revisions

From bwHPC Wiki

Revision as of 10:32, 14 August 2024

Description

Nextflow is a scientific workflow system predominantly used for bioinformatics data analysis. This documentation also covers nf-core, a community-driven initiative to curate a collection of analysis pipelines built using Nextflow.

The documentation in the bwHPC Wiki serves as a 'getting started' guide for installing and using Nextflow with nf-core on BinAC. The nf-core documentation provides detailed information for each pipeline.

This documentation does not cover how to write your own pipelines. This information is available in the Nextflow documentation.

Installation

We recommend installing Nextflow via Miniconda. Since Nextflow is often used with nf-core pipelines, we also recommend installing the nf-core tools.

The following commands create a new Conda environment that provides Nextflow and the nf-core tools. They also set a shared Singularity cache directory in your ~/.bashrc, where all Singularity containers are stored.

conda create --name nf-core python=3.12 nf-core nextflow

echo "export NXF_SINGULARITY_CACHEDIR=/beegfs/work/container/apptainer_cache/$USER" >> ~/.bashrc
echo "export SINGULARITY_CACHEDIR=/beegfs/work/container/apptainer_cache/$USER" >> ~/.bashrc
source ~/.bashrc

conda activate nf-core
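
Running the two echo commands more than once appends duplicate lines to ~/.bashrc. A minimal sketch of an idempotent variant follows. The helper name append_once is hypothetical and not part of any tool, and the demo writes to a scratch file instead of your real ~/.bashrc.

```shell
# append_once LINE FILE: add LINE to FILE unless an identical line already exists.
# (Hypothetical helper for illustration.)
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Scratch file for the demo; use "$HOME/.bashrc" for the real setup.
demo_rc=$(mktemp)
cache_dir="/beegfs/work/container/apptainer_cache/${USER:-$(id -un)}"

# Calling the helper twice still leaves exactly one line per variable.
for i in 1 2; do
  append_once "export NXF_SINGULARITY_CACHEDIR=${cache_dir}" "$demo_rc"
  append_once "export SINGULARITY_CACHEDIR=${cache_dir}" "$demo_rc"
done

cat "$demo_rc"
```

To apply this to your account, replace the scratch file with "$HOME/.bashrc" and run source ~/.bashrc afterwards.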

Usage

Install an nf-core pipeline

You can start and run pipelines now, and Nextflow will pull all containers automatically. However, we encountered issues when a pipeline started more than one job pulling the same image simultaneously. Therefore, we recommend downloading the pipeline and its containers first using the nf-core tools.

In this guide, we will use the rnaseq pipeline in revision 3.14.0. To make the code examples more readable and broadly applicable, we will first specify some environment variables. If you use another pipeline and/or another revision, simply change the pipeline and revision environment variables. The current working directory should be one of your workspaces under /beegfs/work.

cd /beegfs/work/<path to your workspace>

export pipeline=rnaseq
export revision=3.14.0

export pipeline_dir=${PWD}/nf-core-${pipeline}/$(echo $revision | tr . _)
export nxf_work_dir=${PWD}/work
export nxf_output_dir=${PWD}/output

echo "Pipeline will be downloaded to: ${pipeline_dir}"
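
The tr . _ call replaces every dot in the revision with an underscore, and the result becomes the last component of pipeline_dir. The following sketch shows the substitution in isolation. The helper name revision_to_dir is hypothetical and only used for illustration.

```shell
# Hypothetical helper: turn a revision such as 3.14.0 into a directory-safe name.
revision_to_dir() {
  echo "$1" | tr . _
}

pipeline=rnaseq
revision=3.14.0
pipeline_dir="${PWD}/nf-core-${pipeline}/$(revision_to_dir "${revision}")"

revision_to_dir "${revision}"   # prints 3_14_0
echo "${pipeline_dir}"
```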

The following command will download the pipeline into your current working directory and also pull any Singularity containers that aren't yet in the cache. This can take some time if the images aren't in your container cache yet, so grab a coffee.

nf-core download -o ${pipeline_dir} -x none -d -u amend --container-system singularity -r ${revision} ${pipeline}

If there are errors during this step, contact BinAC support and provide the commands you used along with the error message.

Test nf-core pipeline

The first thing you should do after downloading the pipeline is to perform a test run. nf-core pipelines come with a test profile that should work right out of the box. Additionally, there is a BinAC profile for nf-core, which includes settings for BinAC's job scheduler and queue configurations.

Nextflow pipelines do not run in the background by default, so it is best to use a terminal multiplexer (like screen or tmux) when running a long pipeline. Terminal multiplexers allow you to have multiple windows within a single terminal. The advantage for Nextflow pipelines is that you can detach from a session and reattach to it later (even through an SSH connection) to check on the pipeline’s progress. The detached session keeps running, so the pipeline continues even if you disconnect from the cluster.

Start a screen session:

screen

Since this is a new terminal, you will need to load the Conda environment again. Note that environment variables like pipeline are already set because we defined them using the export keyword, which makes them available to child processes.

conda activate nf-core

Now you can run the pipeline test. You should always specify two directories when running the pipeline to ensure you know exactly where the results are stored. One directory is work-dir, where Nextflow stores intermediate results. The other directory is outdir, where Nextflow stores the final pipeline results.

nextflow run nf-core-${pipeline}_3.14.0/3_14_0/ -profile binac,test \
  -work-dir <directory for intermediate results> \
  --custom_config_base 'https://raw.githubusercontent.com/fbartusch/configs/patch-1' \
  --outdir <directory for pipeline output>
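
Because pipeline, revision, nxf_work_dir and nxf_output_dir were exported earlier, the placeholders in the command can be filled from those variables. The sketch below assumes the pipeline is located at the path used in the command above; adjust it if your download location differs. The function name run_pipeline_test and the DRY_RUN switch are hypothetical conveniences, not Nextflow features.

```shell
# Fall back to the values used in this guide if the variables are not exported yet.
pipeline="${pipeline:-rnaseq}"
revision="${revision:-3.14.0}"
nxf_work_dir="${nxf_work_dir:-${PWD}/work}"
nxf_output_dir="${nxf_output_dir:-${PWD}/output}"

run_pipeline_test() {
  # With DRY_RUN=1 the command is only printed, not executed.
  ${DRY_RUN:+echo} nextflow run "nf-core-${pipeline}_${revision}/$(echo "${revision}" | tr . _)/" \
    -profile binac,test \
    -work-dir "${nxf_work_dir}" \
    --custom_config_base 'https://raw.githubusercontent.com/fbartusch/configs/patch-1' \
    --outdir "${nxf_output_dir}"
}

# Print the command first to check the paths.
DRY_RUN=1 run_pipeline_test
```

Once the printed command looks right, calling run_pipeline_test without DRY_RUN starts the test run.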

As mentioned, the pipeline runs in a screen session. You can detach from the screen session and the pipeline will continue to run. The keyboard shortcut for detaching is CTRL-a followed by d. Do not press CTRL-c, as this would interrupt the running pipeline.

To list your screen sessions:

(nf-core) [tu_iioba01@login03 nextflow_tests]$ screen -ls
There is a screen on:
        41342.pts-2.login03     (Detached)
1 Socket in /var/run/screen/S-tu_iioba01.

If there is only one screen session, you can reattach with:

screen -r

Otherwise, you will need to specify the session number shown by screen -ls:

screen -r 41342
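
Looking up session numbers can be avoided by naming the session when it is created. The sketch below uses the screen options -S (set a session name) and -r (reattach). The session name and the two wrapper functions are hypothetical and only defined here, not executed.

```shell
# Hypothetical session name; falls back to rnaseq if pipeline is not exported.
session_name="nf-${pipeline:-rnaseq}"

# Start a named session (screen -S) and reattach to it by name (screen -r).
start_session()    { screen -S "${session_name}"; }
reattach_session() { screen -r "${session_name}"; }

echo "Session name: ${session_name}"
```

Call start_session instead of plain screen, and reattach_session instead of screen -r with a number.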

Nextflow will start a job for each pipeline step:

(base) [tu_iioba01@login03 ~]$ qstat -u $USER

mgmt02: 
                                                                                  Req'd       Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
11626226                tu_iioba01  short    nf-NFCORE_RNASE   19779   --     --        6gb  04:00:00 C       -- 
11626227                tu_iioba01  short    nf-NFCORE_RNASE   19788     1      2       6gb  06:00:00 C       -- 
11626228                tu_iioba01  short    nf-NFCORE_RNASE   19805     1      2       6gb  06:00:00 C       -- 
11626229                tu_iioba01  short    nf-NFCORE_RNASE   19819   --     --        6gb  04:00:00 C       -- 
11626230                tu_iioba01  short    nf-NFCORE_RNASE   19839   --     --        6gb  04:00:00 C       -- 
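
With many jobs, a per-state summary is easier to read than the full list. The following sketch counts jobs per state, assuming the qstat layout shown above, where the state (for example Q for queued, R for running, C for completed) is the 10th column. The helper name count_job_states is hypothetical.

```shell
# Hypothetical helper: read `qstat -u $USER` output on stdin and count jobs per state.
# Assumes the layout shown above, where the state is the 10th column.
count_job_states() {
  awk '$1 ~ /^[0-9]+/ { n[$10]++ } END { for (s in n) print s, n[s] }' | sort
}

# Demo with two rows copied from the output above.
printf '%s\n' \
  '11626226  tu_iioba01  short  nf-NFCORE_RNASE  19779  --  --  6gb  04:00:00 C  --' \
  '11626227  tu_iioba01  short  nf-NFCORE_RNASE  19788   1   2  6gb  06:00:00 C  --' \
  | count_job_states
```

On the cluster, the helper would be used as qstat -u $USER | count_job_states.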

Run pipeline with your own data

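The usage documentation of the pipeline describes the required input:

https://nf-co.re/rnaseq/3.14.0/#usage

The downloaded pipeline includes a samplesheet in its assets directory, for example:

/beegfs/work/tu_iioba01/nextflow_tests/nf-core-rnaseq_3.14.0/3_14_0/assets/samplesheet.csv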
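
As a rough sketch, an input samplesheet for rnaseq 3.14.0 can be written as shown below. The column names follow the usage documentation linked above, while the sample names and FASTQ paths are purely hypothetical placeholders that you need to replace with your own data.

```shell
# Write a minimal samplesheet; sample names and FASTQ paths are placeholders.
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,/path/to/control_rep1_R1.fastq.gz,/path/to/control_rep1_R2.fastq.gz,auto
TREATED_REP1,/path/to/treated_rep1_R1.fastq.gz,/path/to/treated_rep1_R2.fastq.gz,auto
EOF

# Show the header line to check the column names.
head -n 1 samplesheet.csv
```

The samplesheet is passed to the pipeline with --input samplesheet.csv. The usage documentation lists the further parameters, such as reference genome options, that a real run needs.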