BinAC/Software/Bowtie

Description          Content
module load          bio/bowtie
License              Free Software
Citing               Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
Links                Homepage | Manual
Graphical Interface  No

1 Description

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).

2 Versions and Availability

A list of versions currently available on all bwHPC-C5-Clusters can be obtained from the Cluster Information System (CIS): https://cis-hpc.uni-konstanz.de/prod.cis/bwUniCluster/bio/bowtie

On the command line interface of any bwHPC cluster, a list of the available versions can be obtained using

$ module avail bio/bowtie

3 License

Copyright 2014, Ben Langmead

Bowtie is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Bowtie is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Bowtie. If not, see <http://www.gnu.org/licenses/>.

4 Usage

4.1 Loading the module

You can load the default version of Bowtie with the command

$ module load bio/bowtie

The module will try to load the modules it needs to function (e.g. compiler/intel). If loading the module fails, check whether you have already loaded one of those modules in a version other than the one Bowtie needs. If you wish to load a specific (older) version, you can do so using e.g.

$ module load bio/bowtie/1.0.1

to load the version 1.0.1.

4.2 Program Binaries

$ bowtie

Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. bowtie takes an index and a set of reads as input and outputs a list of alignments.

$ bowtie-build

bowtie-build builds a Bowtie index from a set of DNA sequences. bowtie-build outputs a set of 6 files with suffixes .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt, and .rev.2.ebwt. (If the total length of all the input sequences is greater than about 4 billion, then the index files will end in ebwtl instead of ebwt.) These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence files are no longer used by Bowtie once the index is built.
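
Since an index is only usable when all six files are present, it can be worth checking a basename before starting a long alignment. The following is a minimal sketch; the function name index_is_complete is a hypothetical helper and not part of Bowtie, and the second argument is assumed to be either ebwt (default) or ebwtl for large indexes.

```shell
#!/bin/bash
# Sketch: check that all six files of a Bowtie index exist and are non-empty.
# index_is_complete is a hypothetical helper name, not a Bowtie command.
index_is_complete() {
    local base="$1"          # index basename, e.g. /path/to/hg19
    local ext="${2:-ebwt}"   # pass "ebwtl" for large indexes
    local missing=0
    local part
    for part in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "${base}.${part}.${ext}" ]; then
            echo "missing or empty: ${base}.${part}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns 0 only if every file is found, so it can be used directly in a condition, for example: index_is_complete /path/to/hg19 && echo "index looks complete".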

$ bowtie-inspect

bowtie-inspect extracts information from a Bowtie index about what kind of index it is and what reference sequences were used to build it. When run without any options, the tool will output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns). It can also be used to extract just the reference sequence names using the -n/--names option or a more verbose summary using the -s/--summary option.
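
The two options mentioned above can be combined into a quick look at an unknown index. This is only a sketch: describe_index is a hypothetical wrapper name, and it assumes that the bio/bowtie module has been loaded so that bowtie-inspect is on the PATH.

```shell
#!/bin/bash
# Sketch: print the summary and the reference names of a Bowtie index.
# describe_index is a hypothetical wrapper, not part of Bowtie.
describe_index() {
    local base="$1"   # index basename, without the .1.ebwt suffix
    if ! command -v bowtie-inspect >/dev/null 2>&1; then
        echo "bowtie-inspect not found; load the bio/bowtie module first" >&2
        return 127
    fi
    echo "== summary =="
    bowtie-inspect -s "$base" || return 1
    echo "== reference names =="
    bowtie-inspect -n "$base"
}
```

Called as describe_index /path/to/hg19, it avoids dumping the full FASTA output that bowtie-inspect produces when run without options.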

4.3 Disk Usage

Scratch files are written to the current directory by default. Please change to a local directory before starting your calculations. For example

$ TMP_DIR=$TMP/$USER/job_sub_dir
$ mkdir -p $TMP_DIR
$ cd $TMP_DIR

Alternatively, you can run your calculations in a workspace located on the parallel file system. This is advisable in particular when the input and output data of the alignment are large and when you want to use the results in subsequent analyses.

$ WS_PATH=`ws_allocate bowtie_test 20`
$ cd ${WS_PATH}/
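
For the local scratch variant, a unique directory per job avoids collisions between jobs that run on the same node. The following is a minimal sketch under stated assumptions: make_scratch_dir is a hypothetical helper, and the fallbacks to /tmp and to id -un are assumptions for environments in which $TMP or $USER is not set.

```shell
#!/bin/bash
# Sketch: create a unique scratch directory below $TMP/$USER.
# make_scratch_dir is a hypothetical helper; the /tmp fallback is an assumption.
make_scratch_dir() {
    local base="${TMP:-/tmp}/${USER:-$(id -un)}"
    mkdir -p "$base" || return 1
    # mktemp replaces the X characters with a random suffix and prints the path
    mktemp -d "$base/bowtie_job.XXXXXX"
}
```

A job script could then use SCRATCH=$(make_scratch_dir) followed by cd "$SCRATCH", and register trap 'rm -rf "$SCRATCH"' EXIT so that the directory is removed when the script ends. Results should be copied to a workspace or home directory before that point.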


Bowtie-Indices

Please contact the HPC-Competence Center for Bioinformatics and Astrophysics via the bwSupport Portal if you need a Bowtie index permanently. The indices usually need a lot of disk space. Therefore it is better to make them available to all users in a common location like ${DBDATA_BOWTIE_INDEX_DNA}.

5 Examples

Aligning

The following example shows you how to align simulated short reads against the human genome HG19:

$ msub -I -lnodes=1:ppn=2,walltime=00:00:30:00
$ START_DIR=`pwd`
$ TMP_DIR=$TMP/$USER/job_sub_dir
$ mkdir -p $TMP_DIR
$ cd $TMP_DIR
$ module load bio/bowtie/1.0.1
$ module load dbdata/homo_sapiens/hg19_ncbi
$ time bowtie -S -p ${MOAB_PROCCOUNT} \
${DBDATA_BOWTIE_INDEX_DNA} \
${BOWTIE_EXA_DIR}/hg19_sim.read1.fastq \
bowtie.sam \
&>statistics.txt
$ mkdir -p $START_DIR/bowtie_test_results/
$ mv * $START_DIR/bowtie_test_results/
$ cd $START_DIR/bowtie_test_results/
$ rm -rfv $TMP_DIR/


Explanation of the parameters:

-S Output will be written in SAM format
-p Calculation will be performed on X cores; the value is taken from the MOAB_PROCCOUNT environment variable. This calculation will be done on two cores since we requested them with -lnodes=1:ppn=2
${DBDATA_BOWTIE_INDEX_DNA} Location of the bowtie index, in this case hg19 is used
${BOWTIE_EXA_DIR}/hg19_sim.read1.fastq Input file containing the short reads. In this example simulated short reads created with dwgsim 0.1.11 are used.
bowtie.sam Output file in SAM format named bowtie.sam
&>statistics.txt Statistics (standard output and standard error) are redirected into the file statistics.txt
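
After the run, the SAM output can be cross-checked against the numbers in statistics.txt. The sketch below counts records by the unmapped bit (0x4) of the SAM FLAG column. The function name sam_alignment_counts is a hypothetical helper, and the counts equal read counts only under the assumption that the file holds one record per read, as for single-end input with default reporting.

```shell
#!/bin/bash
# Sketch: count aligned and unaligned records in a SAM file.
# sam_alignment_counts is a hypothetical helper, not part of Bowtie.
sam_alignment_counts() {
    awk -F '\t' '
        /^@/ { next }                            # skip SAM header lines
        {
            total++
            if (int($2 / 4) % 2 == 1) unaligned++   # FLAG bit 0x4: read unmapped
        }
        END {
            printf "total=%d aligned=%d unaligned=%d\n", total, total - unaligned, unaligned
        }
    ' "$1"
}
```

For the example above it would be called as sam_alignment_counts bowtie.sam.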


Indexing

The following script can be used to create a Bowtie index. However, please contact the HPC-Competence Center for Bioinformatics and Astrophysics (bwSupport Portal) if you need additional Bowtie indices that are not already located in ${DBDATA_BOWTIE_INDEX_DNA}.

Content of the batch script create_bowtie_indices.moab

#!/bin/bash
#MSUB -l nodes=1:ppn=1
#MSUB -l walltime=01:00:00:00
#MSUB -m abe
##MSUB -M PUT_YOUR_EMAIL
#MSUB -l mem=20gb

module load bio/bowtie/1.0.1
cd $MOAB_SUBMITDIR/
time bowtie-build hg19.fa hg19.bowtie

More examples can be found in $BOWTIE_EXA_DIR.
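
Because building an index for a large genome can occupy a node for hours, a quick sanity check of the FASTA input before submitting the job may save a failed run. The following sketch counts sequences and bases. fasta_summary is a hypothetical helper name, and the check assumes an uncompressed FASTA file.

```shell
#!/bin/bash
# Sketch: report the number of sequences and bases in a FASTA file.
# fasta_summary is a hypothetical helper, not part of Bowtie.
fasta_summary() {
    awk '
        /^>/ { seqs++; next }                     # header line starts a new sequence
        {
            gsub(/[ \t\r]/, "")                   # ignore whitespace and CR characters
            bases += length($0)
        }
        END { printf "sequences=%d bases=%.0f\n", seqs, bases }
    ' "$1"
}
```

For example, fasta_summary hg19.fa prints one line with both counts. A base count above about 4 billion indicates that the resulting index files will carry the ebwtl suffix.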

6 Version-Specific Information

For information specific to a single version, see the information available via the module system with the command

$ module help bio/bowtie