Burrows Wheeler Aligner - bwHPC Wiki

Burrows Wheeler Aligner

From bwHPC Wiki
Description          Content
module load          bio/bwa (Burrows Wheeler Aligner)
License              GPLv3, MIT License
Citing               see section 4 (Citing)
Links                Burrows Wheeler Aligner Homepage
Graphical Interface  no
Plugin               Samtools 1.3

1 Description/What is BWA?

BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. The first algorithm is designed for Illumina sequence reads up to 100bp, while the other two are designed for longer sequences ranging from 70bp to 1Mbp. BWA-MEM and BWA-SW share similar features such as long-read support and split alignment, but BWA-MEM, the latest of the three, is generally recommended for high-quality queries because it is faster and more accurate. BWA-MEM also performs better than BWA-backtrack for 70-100bp Illumina reads.
For more information on its features, please visit the BWA homepage.
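The choice of algorithm described above can be sketched as a small shell helper. The function name suggest_bwa_algorithm is hypothetical (it is not part of BWA), and the 70bp threshold is only the rule of thumb from the description: BWA-backtrack (bwa aln) for shorter Illumina reads, BWA-MEM (bwa mem) from 70bp up to 1Mbp.

```shell
# Hypothetical helper, not part of BWA: suggest a BWA subcommand for a given
# read length in bp, following the rule of thumb in the description above.
suggest_bwa_algorithm() {
    case $1 in
        ''|*[!0-9]*) echo "usage: suggest_bwa_algorithm <read length in bp>" >&2
                     return 2 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "read length must be > 0" >&2
        return 2
    elif [ "$1" -lt 70 ]; then
        echo "aln"          # BWA-backtrack: bwa aln, then bwa samse/sampe
    elif [ "$1" -le 1000000 ]; then
        echo "mem"          # BWA-MEM: recommended for 70bp-1Mbp reads
    else
        echo "unsupported"  # longer than the 1Mbp range quoted above
        return 1
    fi
}
```

For example, suggest_bwa_algorithm 150 prints mem, while suggest_bwa_algorithm 50 prints aln.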

2 Versions and Availability

A list of versions currently available on all bwHPC-C5 clusters can be obtained from the Cluster Information System (CIS).

On the command line, you can list the available versions with the command module avail bio/bwa.

$ module avail bio/bwa
------------- /opt/bwhpc/common/modulefiles -----------------

3 License

The program Burrows Wheeler Aligner (BWA) is a free software package.

4 Citing

Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760. (PMID: 19451168). (if you use the BWA-backtrack algorithm)

Li H. and Durbin R. (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 26, 589-595. (PMID: 20080505). (if you use the BWA-SW algorithm)

Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v2 (q-bio.GN). (if you use the BWA-MEM algorithm or the fastmap command, or want to cite the whole BWA package)

5 Usage

5.1 Loading the module

5.1.1 Default

You can load the default version of BWA with the command module load bio/bwa.

$ module load bio/bwa
$ module list
Currently Loaded Modulefiles:
  1) bio/bwa/0.7.12

The module will try to load the other modules it depends on. If loading the module fails, check whether you have already loaded one of those modules in a version other than the one BWA needs.

5.1.2 Special Version

If you wish to load a specific version of BWA, use module load bio/bwa/<version>.

$ module avail bio/bwa
------------------- /opt/bwhpc/common/modulefiles -------------
$ module load bio/bwa/0.7.12
$ module list
Currently Loaded Modulefiles:
  1) bio/bwa/0.7.12
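Loading the default or a specific version can be wrapped in a small helper that also verifies the result, in the same spirit as the BWA_HOME check in the Moab example script. The name load_bwa is hypothetical; the extra BWA_HOME check is there because, depending on the module implementation, module load may return exit status 0 even when loading fails.

```shell
# Hypothetical helper: load bio/bwa (default version) or bio/bwa/<version>
# and verify that the module really set BWA_HOME. The extra check guards
# against module implementations that may return 0 even when the load fails,
# e.g. because of a conflicting module that is already loaded.
load_bwa() {
    local modname="bio/bwa${1:+/$1}"
    if ! module load "$modname"; then
        echo "ERROR: 'module load $modname' failed" >&2
        return 1
    fi
    if [ -z "${BWA_HOME:-}" ]; then
        echo "ERROR: $modname did not set BWA_HOME; check 'module list' for conflicts" >&2
        return 1
    fi
    echo "Loaded $modname (BWA_HOME=$BWA_HOME)"
}
```

Call it as load_bwa for the default version or load_bwa 0.7.12 for a specific one.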

5.2 Program Binaries

$ ls -xF $BWA_HOME
bwa*          bwakit/       bwa-postalt.js    bwhpc-build-log/     bwhpc-examples/
doc/          fermi2*       fermi2.pl*        htsbox*              k8*
modulefiles/  README.md     resource-GRCh38/  resource-human-HLA/  ropebwt2*
run-bwamem*   run-gen-ref*  run-HLA*          samblaster*          samtools.1.1*
samtools.d/   seqtk*        trimadap*         typeHLA.js           typeHLA-selctg.js

'*' marks an executable file, '/' marks a directory.
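To see only the programs in this directory (the entries ls -F marks with '*'), a short loop is enough. The function name list_bwa_executables is hypothetical; it defaults to $BWA_HOME, which is set by the module.

```shell
# Print the names of the executable files in a directory (default: $BWA_HOME),
# i.e. the entries that `ls -F` marks with '*'. Directories are skipped.
list_bwa_executables() {
    local dir=${1:-${BWA_HOME:-}} f
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "usage: list_bwa_executables [dir]  (or load bio/bwa first)" >&2
        return 1
    fi
    for f in "$dir"/*; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            printf '%s\n' "${f##*/}"   # strip the directory part
        fi
    done
}
```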

6 bwHPC Examples for Burrows Wheeler Aligner

  • MPI is not implemented in this version of BWA
    (an MPI version, pBWA, is not yet available).
  • Some BWA commands can run multithreaded via the option -t <threads> (e.g. bwa aln and bwa mem).
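A value for -t can be derived from the job environment. This sketch assumes, as the example script below does, that MOAB_PROCCOUNT holds the number of processors allocated to the Moab job; the function name bwa_threads is hypothetical. Outside a job it falls back to nproc (GNU coreutils), which on a shared login node reports all cores, so prefer running it inside a job.

```shell
# Hypothetical helper: pick a value for bwa's -t option. Inside a Moab job,
# MOAB_PROCCOUNT holds the number of allocated processors; outside a job,
# fall back to nproc (GNU coreutils), then to 1.
bwa_threads() {
    case ${MOAB_PROCCOUNT:-} in
        ''|0*|*[!0-9]*) ;;                  # unset, zero/leading zero or not a number
        *) echo "$MOAB_PROCCOUNT"; return 0 ;;
    esac
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 1
    fi
}

# e.g.: bwa aln -t "$(bwa_threads)" genome.fa Sp_ds.left.fq > Sp_ds.left.sai
```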

The folder $BWA_EXA_DIR contains an example of how to use BWA.

$ ls -lF $BWA_EXA_DIR
[...] bwhpc_build-large-index.d/
[...] bwhpc-bwa-example.moab
[...] genome.fa
[...] README.bwhpc-examples
[...] Sp_ds.left.fq
[...] Sp_ds.right.fq
[dir] bwhpc_build-large-index.d/bwhpc-BWAMEM-example.moab # Build large indexes!

6.1 bwHPC example workflow

  • bwhpc-bwa-example.moab

Use this Moab start script to run your own BWA session in interactive mode. Look for the relevant section inside the file and make your modifications there.

6.1.1 How to use the bwhpc-BWA Test-Script

  • Create your own work-space
#           WS-Name        Days alive (max. 60)
ws_allocate bwa_repo 30
  • Change dir to your workspace
cd $(ws_find bwa_repo)
  • Copy the Moab example file from $BWA_EXA_DIR into your workspace and make your modifications
cp $BWA_EXA_DIR/bwhpc-bwa-example.moab .
  • Submit your job
msub bwhpc-bwa-example.moab
  • Wait for a while...

... until you see some more files created. The *.tgz file contains your data.

Extract its contents with tar xvzf *.tgz (this works as long as there is exactly one *.tgz file in the directory).
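If a job directory ends up with several result archives, a single tar xvzf *.tgz fails, because tar treats every file name after the first as a member to extract from the first archive. A loop over the archives avoids this; the function name extract_results is hypothetical.

```shell
# Extract every *.tgz archive in a directory (default: current directory).
# Unlike a single `tar xvzf *.tgz`, this also works when several archives
# exist: with more than one match, tar would treat the 2nd, 3rd, ... file
# names as members to extract from the first archive.
extract_results() {
    local dir=${1:-.} archive found=0
    for archive in "$dir"/*.tgz; do
        [ -e "$archive" ] || continue      # no match: the pattern stays literal
        tar -xvzf "$archive" -C "$dir" || return 1
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no *.tgz file found in $dir" >&2
        return 1
    fi
}
```

Run extract_results inside your workspace, or pass the directory explicitly: extract_results "$(ws_find bwa_repo)".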

6.1.2 Excerpt from bwhpc-bwa-example.moab

These parameters apply to running BWA on the bwUniCluster.

#MSUB -N bwa_job
#MSUB -j oe
#MSUB -m ae
#MSUB -M 'your e-mail@your.DN'
#MSUB -q singlenode
#MSUB -l walltime=00:10:00
echo " "
echo "### Loading module: bwa/0.7.12"
echo " "
module load bio/bwa/0.7.12
[ -z "$BWA_HOME" ] && { echo 'ERROR: Failed to load module bio/bwa/0.7.12'; exit 1; }
echo "BWA_HOME = ${BWA_HOME}"
module list

echo " "
echo "### Copying input test files for job (if required):"
echo " "
cp $BWA_EXA_DIR/{*.fq,genome.fa} . 
echo " "
echo "### Run BWA in single-node-mode..."
echo " "
echo "build index in Fasta format..."
# http://bio-bwa.sourceforge.net/bwa.shtml
# bwa index [-p prefix] [-a algoType] <in.db.fasta>
# Index database sequences in the FASTA format.
bwa index genome.fa
rc=$?; [ "$rc" -ne 0 ] && { echo "bwa index returned with an error: $rc"; exit 1; }
echo "generate gapped/ungapped alignment..."
# bwa aln [-n maxDiff] [-o maxGapO] [-e maxGapE] [-d nDelTail] 
# [-i nIndelEnd] [-k maxSeedDiff] [-l seedLen] [-t nThrds] [-cRN]
# [-M misMsc] [-O gapOsc] [-E gapEsc] [-q trimQual] 
# <in.db.fasta> <in.query.fq> > <out.sai>
# Find the SA coordinates of the input reads. 
# Maximum maxSeedDiff differences are allowed in the first
# seedLen subsequence and maximum maxDiff differences are
# allowed in the whole sequence.
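# Save bwa's exit status first: inside the error message, $? would
# otherwise refer to the status of the [ test, not of bwa.
bwa aln -t "${MOAB_PROCCOUNT:-1}" genome.fa Sp_ds.left.fq > Sp_ds.left.sai
rc=$?; [ "$rc" -ne 0 ] && { echo "bwa-left aln returned with an error: $rc"; exit 1; }
bwa aln -t "${MOAB_PROCCOUNT:-1}" genome.fa Sp_ds.right.fq > Sp_ds.right.sai
rc=$?; [ "$rc" -ne 0 ] && { echo "bwa-right aln returned with an error: $rc"; exit 1; }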
echo "generate alignment..."
# bwa sampe [-a maxInsSize] [-o maxOcc] [-n maxHitPaired] 
# [-N maxHitDis] [-P] <in.db.fasta> <in1.sai> <in2.sai> <in1.fq> 
# <in2.fq> > <out.sam>
# Generate alignments in the SAM format given paired-end reads. 
# Repetitive read pairs will be placed randomly. 
bwa sampe genome.fa Sp_ds.left.sai Sp_ds.right.sai Sp_ds.left.fq Sp_ds.right.fq > aln-pe.sam
rc=$?; [ "$rc" -ne 0 ] && { echo "bwa sampe returned with an error: $rc"; exit 1; }
echo "done"

Piping the command to 'parallel' will not work!
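Since BWA-MEM is recommended for reads of 70bp and longer, the bwa aln/bwa sampe steps above could be replaced by a single bwa mem call for paired-end reads. This is a sketch assuming the index built by bwa index genome.fa; run_bwa_mem_pe is a hypothetical name, MOAB_PROCCOUNT falls back to 1 thread outside a job, and the provided bwhpc-BWAMEM-example.moab may do things differently.

```shell
# Hypothetical helper: paired-end alignment with BWA-MEM. bwa mem reads both
# FASTQ files directly (no .sai files needed) and writes SAM to stdout.
run_bwa_mem_pe() {
    local ref=$1 reads1=$2 reads2=$3 out=$4
    bwa mem -t "${MOAB_PROCCOUNT:-1}" "$ref" "$reads1" "$reads2" > "$out" || {
        local rc=$?                        # exit status of bwa mem
        echo "bwa mem returned with an error: $rc" >&2
        return "$rc"
    }
}

# Usage inside the job script, after `bwa index genome.fa`:
# run_bwa_mem_pe genome.fa Sp_ds.left.fq Sp_ds.right.fq aln-pe.sam || exit 1
```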

7 BWA-Specific Environments

To see all BWA environment variables set by the module load command, use env | grep BWA, or use the command module display bio/bwa.

$ module display bio/bwa

module-whatis	 Burrows Weeler Aligner (bwa) 0.7.12 is a program for aligning sequencing 
          reads against a large reference genome (e.g. human genome). 
setenv		 BWA_VERSION 0.7.12 
setenv		 BWA_HOME /opt/bwhpc/common/bio/bwa/0.7.12 
setenv		 BWA_EXA_DIR /opt/bwhpc/common/bio/bwa/0.7.12/bwhpc-examples 
setenv		 BWA_BIN_DIR /opt/bwhpc/common/bio/bwa/0.7.12 
setenv		 SAMTOOLS_BIN_DIR /opt/bwhpc/common/bio/bwa/0.7.12/samtools.d/bin 
setenv		 BWA_BPR_URL http://www.bwhpc-c5.de/wiki/index.php/Burrows_Wheeler_Aligner 
prepend-path	 PATH /opt/bwhpc/common/bio/bwa/0.7.12 
prepend-path	 PATH /opt/bwhpc/common/bio/bwa/0.7.12/samtools.d/bin 

The module display command will not load the module!
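After module load bio/bwa, the variables listed by module display can be checked in the current shell. The sketch below (show_bwa_env is a hypothetical name) anchors the pattern at the start of each line, so it only reports variables named BWA_* or SAMTOOLS_BIN_DIR, not other variables whose values merely contain the string BWA.

```shell
# List the variables that `module load bio/bwa` sets, using the names shown
# in the module display output above (BWA_* and SAMTOOLS_BIN_DIR).
show_bwa_env() {
    env | grep -E '^(BWA_[A-Z_]*|SAMTOOLS_BIN_DIR)=' | sort
}
```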

8 Version-Specific Information

For more detailed information on a specific BWA version, use the module system command module help bio/bwa.
For a short summary of what BWA is about, use the command module whatis bio/bwa.

$ module whatis bio/bwa
bio/bwa              : Burrows Weeler Aligner (bwa) 0.7.12 is a program for aligning
      sequencing reads against a large reference genome (e.g. human genome).

$ module help bio/bwa
----------- Module Specific Help for 'bio/bwa/0.7.12' ----------
   Burrows Weeler Aligner (BWA) is a software package for mapping 
   low-divergent sequences against a large reference genome, such as
   the human genome.
   It has two major components, one for read shorter than 150bp 
   and the other for longer reads.  

*  BWA: get started and FAQs
   Read the /opt/bwhpc/common/bio/bwa/0.7.12/README.md file
   For BWA command line options enter: bwa (exit-status 1).

*  SAMTOOLS: Home and tutorial

*  BWA manual page

*  BWA repository (binaries/sources)

*  bwHPC examples and a moab example script can be found here:
    Please read the 'README.bwhpc-examples' file.

9 Useful Links