BinAC/Software/Alphafold

''Latest revision as of 10:46, 22 April 2024''

The main documentation is available via:

{| class="wikitable"
! Description !! Content
|-
| module load || <code>bio/alphafold</code>
|-
| License || Apache License 2.0 - see [1]
|-
| Citing || See [2]
|-
| Links || DeepMind AlphaFold Website: [3]
|}

= Description =

AlphaFold, developed by DeepMind, predicts protein structures from the amino acid sequence at or near experimental resolution.
= Usage =
BinAC provides AlphaFold via an Apptainer container. Both the container and the AlphaFold database are stored on the <code>WORK</code> filesystem.
The module <code>bio/alphafold</code> provides a wrapper script called <code>alphafold</code>.
Upon loading the module, the wrapper <code>alphafold</code> is in <code>PATH</code> and can be used directly.
The wrapper behaves like the run script from DeepMind's AlphaFold GitHub repository. Thus, all options documented there are also applicable to our <code>alphafold</code> wrapper.
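As an illustration (a sketch, not official documentation), a minimal call could look like this. The flag names come from DeepMind's run script; the FASTA file and output directory are placeholders:

<pre>
# Minimal example call; file names and paths are placeholders.
alphafold \
  --fasta_paths=query.fasta \
  --output_dir=$HOME/alphafold_output \
  --max_template_date=2024-01-01
</pre>

Depending on how the wrapper is set up, paths such as the database directory may already be configured for you; otherwise they have to be passed explicitly (e.g. via <code>--data_dir</code>).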
= Parallel Computing =
Depending on how you run AlphaFold, there are two optimal resource profiles regarding the number of cores and GPUs. The memory requirement depends on the protein size.
== Compute MSAs == |
In the beginning, AlphaFold computes three multiple sequence alignments (MSAs). These MSAs are computed on the CPU sequentially, and the number of threads is hard-coded:
* jackhmmer on UniRef90 using 8 threads
* jackhmmer on MGnify using 8 threads
* HHblits on BFD + Uniclust30 using 4 threads
After computing the MSAs, AlphaFold performs model inference on the GPU. Only one GPU is used. Thus, it does not make sense to request more than 8 cores for your job; the additional cores would be idle. This use case has the following optimal resource profile:

<pre>
#PBS -l nodes=1:ppn=8:gpus=1
</pre>
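Putting this together, a job script for this use case could look like the following sketch. The walltime and file names are assumptions; adapt them to your protein and consult the BinAC batch system documentation:

<pre>
#!/bin/bash
#PBS -l nodes=1:ppn=8:gpus=1
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR

module load bio/alphafold/2.3.2

# query.fasta and the output directory are placeholders.
alphafold \
  --fasta_paths=query.fasta \
  --output_dir=$HOME/alphafold_output \
  --max_template_date=2024-01-01
</pre>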
The three MSAs are stored in the directory specified by <code>--output_dir</code> and can be reused with <code>--use_precomputed_msas=true</code>.
== Use existing MSAs == |
There is a switch (<code>--use_precomputed_msas=true</code>) that lets you reuse MSAs computed by an earlier AlphaFold run.

In this case AlphaFold skips the computation of the MSAs, and only the model inference step runs, using a single GPU. Thus, the optimal resource profile is:

<pre>
#PBS -l nodes=1:ppn=1:gpus=1
</pre>
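A corresponding job script sketch for reusing MSAs could look like this; point <code>--output_dir</code> at the directory of the earlier run so AlphaFold finds the precomputed MSAs (walltime and file names are again placeholders):

<pre>
#!/bin/bash
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -l walltime=6:00:00

cd $PBS_O_WORKDIR

module load bio/alphafold/2.3.2

# Reuse the MSAs from a previous run stored in the output directory.
alphafold \
  --fasta_paths=query.fasta \
  --output_dir=$HOME/alphafold_output \
  --max_template_date=2024-01-01 \
  --use_precomputed_msas=true
</pre>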
= Example on BinAC = |
The AlphaFold module contains two example job scripts. One computes the MSAs on the CPU, followed by inference on the GPU. The other uses precomputed MSAs and runs only the inference on the GPU.
<pre>
module load bio/alphafold/2.3.2
qsub ${ALPHAFOLD_EXA_DIR}/binac-alphafold-2.3.2-bwhpc-examples.pbs
qsub ${ALPHAFOLD_EXA_DIR}/binac-alphafold-2.3.2-bwhpc-examples_precomputed_msa.pbs
</pre>
<!-- This is a comment
= Benchmark on BinAC =

We ran some CASP14 targets with <code>--benchmark=true</code> on BinAC. The following table gives you some guidance for choosing meaningful memory and walltime values.

{| class="wikitable" style="margin:auto"
|+ Benchmark results on BinAC (work in progress)
|-
! Target !! #Residues !! jackhmmer UniRef90 [s] !! jackhmmer MGnify [s] !! HHblits on BFD [s] !! Inference [s] !! Memory Usage [GB]
|-
| ... || ... || ... || ... || ... || ... || ...
|-
| ... || ... || ... || ... || ... || ... || ...
|-
| ... || ... || ... || ... || ... || ... || ...
|}
-->