BinAC/Software/Nextflow: Difference between revisions
F Bartusch (talk | contribs) No edit summary |
Revision as of 11:00, 12 August 2024
Description
Nextflow is a scientific workflow system predominantly used for bioinformatic data analyses. This documentation also addresses nf-core, which is a community effort to collect a curated set of analysis pipelines built using Nextflow.
Installation
There is no environment module for Nextflow on BinAC. We encourage users to install Nextflow via Miniconda. Because Nextflow is often used with nf-core pipelines, we also recommend installing the nf-core tools.
The following commands create a new Conda environment that provides Nextflow and the nf-core tools. They also set the cache directories (NXF_SINGULARITY_CACHEDIR and SINGULARITY_CACHEDIR) in which Singularity containers are stored.
conda create --name nf-core python nf-core nextflow
echo "export NXF_SINGULARITY_CACHEDIR=/beegfs/work/container/apptainer_cache" >> ~/.bashrc
echo "export SINGULARITY_CACHEDIR=/beegfs/work/tu_iioba01/apptainer_cache" >> ~/.bashrc
source ~/.bashrc
conda activate nf-core
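After sourcing ~/.bashrc it can be worth confirming that both cache variables are set and point to existing directories. The following is a minimal sketch, not part of the official setup; the helper name check_cache_dir is hypothetical.

```shell
# Hypothetical helper: report whether a cache variable is usable.
# $1 = variable name (for the message), $2 = its current value.
check_cache_dir() {
    name="$1"
    dir="$2"
    if [ -z "$dir" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$name points to a missing directory: $dir"
        return 1
    fi
    echo "$name is set to $dir"
}

# Check both variables written to ~/.bashrc; keep going if one check fails.
check_cache_dir NXF_SINGULARITY_CACHEDIR "${NXF_SINGULARITY_CACHEDIR:-}" || true
check_cache_dir SINGULARITY_CACHEDIR "${SINGULARITY_CACHEDIR:-}" || true
```

If either check reports a problem, open a new shell or run source ~/.bashrc again before downloading a pipeline.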
Usage
Install an nf-core pipeline
You could start a pipeline right away and let it pull all required containers itself. However, we experienced problems when a pipeline started more than one job that pulls the same image. We therefore recommend downloading the pipeline first with the nf-core tools. This example uses the rnaseq pipeline. The following command downloads the pipeline into your current working directory and pulls any Singularity containers that are not yet in the cache.
nf-core download -x none -d -u amend --container-system singularity -r 3.14.0 rnaseq
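The download ends up in a directory whose name is derived from the pipeline name and revision. The sketch below reconstructs that path and checks that the download is present. The naming scheme (dots in the revision replaced by underscores) is inferred from the path used in the run command on this page and may differ for other nf-core tools versions; the helper name pipeline_dir and the check for main.nf are assumptions.

```shell
# Hypothetical helper: build the expected download directory from the
# pipeline name ($1) and revision ($2), e.g. rnaseq and 3.14.0.
pipeline_dir() {
    name="$1"
    rev="$2"
    # Assumption: the inner directory is the revision with dots as underscores.
    rev_dir=$(printf '%s' "$rev" | tr '.' '_')
    printf 'nf-core-%s_%s/%s/\n' "$name" "$rev" "$rev_dir"
}

dir=$(pipeline_dir rnaseq 3.14.0)
# Assumption: a complete download contains the pipeline entry script main.nf.
if [ -f "${dir}main.nf" ]; then
    echo "pipeline found in $dir"
else
    echo "no pipeline found in $dir"
fi
```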
Test nf-core pipeline
The first thing you should do after downloading the pipeline is a test run. nf-core pipelines come with a test profile, which should work out of the box. There is also a BinAC profile for nf-core, which contains settings for BinAC's job scheduler and queues.
Nextflow pipelines do not run in the background by default, so it is best to use a terminal multiplexer when running a long pipeline. A terminal multiplexer lets you open multiple windows within a single terminal and, more importantly here, detach from a session and reattach to it later (even through a new ssh connection) to check on the pipeline's progress. The detached session continues to run, so the pipeline keeps running even when you disconnect from the cluster.
Start a screen session:
screen
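For reference, a short sketch of the usual screen workflow. The session name nextflow is only an example, and the helper screen_status is hypothetical; it relies on screen setting the STY environment variable inside a session.

```shell
# Typical screen commands (shown as comments, run them interactively):
#   screen -S nextflow    start a session named "nextflow"
#   Ctrl-a d              detach from the session, it keeps running
#   screen -ls            list existing sessions
#   screen -r nextflow    reattach to the session

# Hypothetical helper: tell whether the current shell runs inside screen.
screen_status() {
    if [ -n "${STY:-}" ]; then
        echo "inside screen session $STY"
    else
        echo "not inside a screen session"
    fi
}

screen_status
```

Running the check before starting a long pipeline helps avoid launching it in a shell that ends when the ssh connection drops.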
As this is a new terminal session, you need to activate the Conda environment again. Then you can run the pipeline test. You should specify two directories: work-dir, where intermediate results are stored, and outdir, where the pipeline results are stored.
The BinAC profile in the nf-core repository is currently not up to date. You should therefore add the custom_config_base option.
conda activate nf-core
nextflow run nf-core-rnaseq_3.14.0/3_14_0/ -profile binac,test \
    --custom_config_base 'https://raw.githubusercontent.com/fbartusch/configs/patch-1' \
    -work-dir <directory for intermediate results> \
    --outdir <directory for pipeline output>
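The two directories can be created up front so that the placeholders in the command are filled with real paths. The following is a sketch under stated assumptions: the helper prepare_test_run, the variable RUN_BASE, and the subdirectory names work and results are all hypothetical. The helper only prints the resulting command instead of running it.

```shell
# Hypothetical helper: create both directories below a base directory ($1)
# and print the test command with the placeholders filled in.
prepare_test_run() {
    base="$1"                 # assumed parent directory for this run
    work_dir="$base/work"     # intermediate results
    out_dir="$base/results"   # pipeline output
    mkdir -p "$work_dir" "$out_dir" || return 1
    printf "nextflow run nf-core-rnaseq_3.14.0/3_14_0/ -profile binac,test --custom_config_base 'https://raw.githubusercontent.com/fbartusch/configs/patch-1' -work-dir '%s' --outdir '%s'\n" "$work_dir" "$out_dir"
}

# RUN_BASE is an assumption; it defaults to a subdirectory of the current directory.
prepare_test_run "${RUN_BASE:-$PWD/rnaseq_test}" || echo "could not create directories"
```

If a run is interrupted, Nextflow's -resume option can be appended to the same command so that finished tasks in the work directory are reused.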