BinAC/Software/Nextflow
Description
Nextflow is a scientific workflow system predominantly used for bioinformatics data analysis. This documentation also covers nf-core, a community-driven initiative to curate a collection of analysis pipelines built using Nextflow.
The documentation in the bwHPC Wiki serves as a 'getting started' guide for installing and using Nextflow with nf-core on BinAC. The nf-core documentation provides detailed information for each pipeline.
This documentation does not cover how to write your own pipelines. This information is available in the Nextflow documentation.
Installation
We recommend installing Nextflow via Miniconda. Since Nextflow is often used with nf-core pipelines, we also recommend installing the nf-core tools.
The following commands create a new Conda environment that provides Nextflow and the nf-core tools. They also set a shared Singularity cache directory in your ~/.bashrc, where all Singularity containers are stored.
conda create --name nf-core python nf-core nextflow
echo "export NXF_SINGULARITY_CACHEDIR=/beegfs/work/container/apptainer_cache" >> ~/.bashrc
echo "export SINGULARITY_CACHEDIR=/beegfs/work/tu_iioba01/apptainer_cache" >> ~/.bashrc
source ~/.bashrc
conda activate nf-core
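After opening a new shell, it can be worth confirming that both cache variables are set and point to an existing directory. The following is a minimal sketch of such a check; the helper name check_cache_var is hypothetical and is not part of Nextflow or nf-core.

```shell
# Hypothetical helper (not part of Nextflow or nf-core): check that the
# variable named in $1 is set and points to an existing directory.
check_cache_var() {
    cache_var_name="$1"
    # Look up the value of the variable whose name is stored in cache_var_name.
    eval "cache_var_value=\${$cache_var_name:-}"
    if [ -z "$cache_var_value" ]; then
        echo "$cache_var_name is not set"
        return 1
    fi
    if [ ! -d "$cache_var_value" ]; then
        echo "$cache_var_name points to a missing directory: $cache_var_value"
        return 1
    fi
    echo "$cache_var_name is usable: $cache_var_value"
}

# Check both variables that the commands above append to ~/.bashrc.
for cache_var in NXF_SINGULARITY_CACHEDIR SINGULARITY_CACHEDIR; do
    check_cache_var "$cache_var" || true
done
```

If one of the variables is reported as not set, re-run source ~/.bashrc or open a new login shell before starting a pipeline.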
Usage
Install an nf-core pipeline
You can start and run pipelines now, and each pipeline will pull all of its containers automatically. However, we encountered issues when a pipeline starts more than one job that pulls the same image simultaneously. We therefore recommend downloading the pipeline first using the nf-core tools.
In this guide, we will use the rnaseq pipeline in revision 3.14.0. To make the code examples more readable and broadly applicable, we will first specify some environment variables.
If you use another pipeline and/or another revision, simply change the following environment variables:
export pipeline=rnaseq
export revision=3.14.0
export pipeline_dir=<install directory for the pipeline files>
export nxf_work_dir=<directory for intermediate results of the pipeline>
export nxf_output_dir=<directory for the final pipeline output>
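Because the later commands depend on these variables, a small guard can catch an unset one before a command runs with an empty path. This is only a sketch; require_vars is a hypothetical helper name, not an nf-core or Nextflow command.

```shell
# Hypothetical helper: return non-zero (with a message on stderr) if any of
# the named variables is unset or empty.
require_vars() {
    for required_name in "$@"; do
        # Look up the value of the variable whose name is in required_name.
        eval "required_value=\${$required_name:-}"
        if [ -z "$required_value" ]; then
            echo "Variable '$required_name' is not set" >&2
            return 1
        fi
    done
}

# Check the variables used in this guide before running any command.
if require_vars pipeline revision pipeline_dir nxf_work_dir nxf_output_dir; then
    echo "All variables are set."
fi
```

Keep in mind that a new terminal session, such as a screen session started from a different shell, may not have these variables and would need them to be set again.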
The following command will download the pipeline into a directory you specify and also pull any Singularity containers that aren't yet in the cache.
nf-core download -o ${pipeline_dir} -x none -d -u amend --container-system singularity -r ${revision} ${pipeline}
Test nf-core pipeline
The first thing you should do after downloading the pipeline is to perform a test run. nf-core pipelines come with a test profile, which should work right out of the box. Additionally, there is a BinAC profile for nf-core. This profile contains settings for BinAC's job scheduler and queue configurations.
Nextflow pipelines do not run in the background by default, so it is best to use a terminal multiplexer when running a long pipeline. Terminal multiplexers allow you to have multiple windows within a single terminal window. For running Nextflow pipelines, the benefit is that you can detach from these windows and reattach to them later (even through a new SSH connection) to check on the pipeline's progress. The detached session keeps running, so the pipeline continues even if you disconnect from the cluster.
Start a screen session:
screen
As this is a new terminal session, you will need to activate the Conda environment again. Then you can run the pipeline test. You should specify two directories: work-dir, where intermediate results are stored, and outdir, where the pipeline results are stored.
The BinAC profile in the nf-core repository is currently not up to date. You should therefore add the custom_config_base option.
conda activate nf-core
nextflow run nf-core-${pipeline}_3.14.0/3_14_0/ -profile binac,test \
    --custom_config_base 'https://raw.githubusercontent.com/fbartusch/configs/patch-1' \
    -work-dir <directory for intermediate results> \
    --outdir <directory for pipeline output>
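The pipeline path in the command above contains the revision twice, once with dots and once with underscores. If you use another revision, the path can be derived rather than typed by hand. The sketch below assumes that your download follows the same directory pattern as in the command above; pipeline_path is a hypothetical helper name.

```shell
# Assumption: the downloaded pipeline lives in a directory named
#   nf-core-<pipeline>_<revision>/<revision with dots replaced by underscores>/
# as in the nextflow run command above. Adjust if your layout differs.
pipeline_path() {
    # $1 = pipeline name, $2 = revision
    printf 'nf-core-%s_%s/%s/\n' "$1" "$2" "$(printf '%s' "$2" | tr '.' '_')"
}

# Example for the pipeline and revision used in this guide.
pipeline_path rnaseq 3.14.0
# prints: nf-core-rnaseq_3.14.0/3_14_0/
```

You could then pass "$(pipeline_path "$pipeline" "$revision")" to nextflow run instead of the hard-coded path.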
As mentioned, the pipeline runs in a screen session. You can detach from the screen session and the pipeline will continue to run. The keyboard shortcut for detaching is CTRL-a followed by d. Do not press CTRL-c in the session, as this would interrupt the running pipeline.
For listing your screen sessions:
(nf-core) [tu_iioba01@login03 nextflow_tests]$ screen -ls
There is a screen on:
        41342.pts-2.login03     (Detached)
1 Socket in /var/run/screen/S-tu_iioba01.
If there is only one screen session, you can reattach with:
screen -r
Otherwise, you will need to specify the screen session ID:
screen -r 41342
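If you have several sessions, you can extract the session IDs from the listing instead of reading them off by eye. The following sketch parses output in the format shown above; detached_screen_ids is a hypothetical helper name, and the sample input is copied from the listing in this guide.

```shell
# Hypothetical helper: read `screen -ls` output on stdin and print the numeric
# ID of every detached session, one per line.
detached_screen_ids() {
    awk '/\(Detached\)/ {
        id = $1
        sub(/\..*$/, "", id)   # keep only the part before the first dot
        print id
    }'
}

# Example with the listing shown above. On the cluster you would run
#   screen -ls | detached_screen_ids
detached_screen_ids <<'EOF'
There is a screen on:
        41342.pts-2.login03     (Detached)
1 Socket in /var/run/screen/S-tu_iioba01.
EOF
# prints: 41342
```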
Nextflow will start a job for each pipeline step:
(base) [tu_iioba01@login03 ~]$ qstat -u $USER

mgmt02:
                                                                                  Req'd     Req'd       Elap
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory    Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
11626226                tu_iioba01  short    nf-NFCORE_RNASE  19779  --    --     6gb       04:00:00  C --
11626227                tu_iioba01  short    nf-NFCORE_RNASE  19788  1     2      6gb       06:00:00  C --
11626228                tu_iioba01  short    nf-NFCORE_RNASE  19805  1     2      6gb       06:00:00  C --
11626229                tu_iioba01  short    nf-NFCORE_RNASE  19819  --    --     6gb       04:00:00  C --
11626230                tu_iioba01  short    nf-NFCORE_RNASE  19839  --    --     6gb       04:00:00  C --
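For pipelines with many steps, a per-state summary is easier to read than the full job list. The sketch below counts jobs by the value in the S column of output formatted like the listing above. It assumes that the state is the tenth whitespace-separated field and that job names contain no spaces; count_job_states is a hypothetical helper name. In Torque-style schedulers, the states Q, R, and C usually stand for queued, running, and completed.

```shell
# Hypothetical helper: read `qstat -u $USER` output on stdin and print one line
# per job state with the number of jobs in that state, e.g. "C 5".
# Assumption: the state is the 10th whitespace-separated field, as in the
# listing above.
count_job_states() {
    awk '$1 ~ /^[0-9]+/ && NF >= 10 { count[$10]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Example with two rows copied from the listing above. On the cluster you
# would run
#   qstat -u "$USER" | count_job_states
count_job_states <<'EOF'
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory    Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
11626226                tu_iioba01  short    nf-NFCORE_RNASE  19779  --    --     6gb       04:00:00  C --
11626227                tu_iioba01  short    nf-NFCORE_RNASE  19788  1     2      6gb       06:00:00  C --
EOF
# prints: C 2
```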
Run pipeline with your own data
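The pipeline's usage documentation describes the required input and parameters: https://nf-co.re/rnaseq/3.14.0/#usage
An example samplesheet is included in the downloaded pipeline, for example at /beegfs/work/tu_iioba01/nextflow_tests/nf-core-rnaseq_3.14.0/3_14_0/assets/samplesheet.csv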