Using MPI-HMMER on Big Red at IU
On this page:
- Introduction
- Using the hmmerjob script to submit parallel hmmsearch jobs
- Using the hmmerjob script to submit parallel hmmpfam jobs
- Using non-parallel HMMER programs
Introduction
HMMER is a suite of programs that you can use to create and query
hidden Markov models that describe molecular sequences. A parallel
port of HMMER known as MPI-HMMER is available on Big Red
at Indiana University. It contains all the HMMER programs, but only
hmmpfam and hmmsearch have been
parallelized.
MPI-HMMER is installed in the directory
/N/soft/linux-sles9-ppc64/hmmer-2.3.2-MPI-0.92. Documentation for
HMMER programs is available as man pages. You can also
visit the MPI-HMMER page for more
information.
Using the hmmerjob script to submit parallel hmmsearch jobs
The hmmerjob script is written to submit both
hmmpfam and hmmsearch jobs. Use the
hmmpfam and hmmsearch options with
hmmerjob just as you would with serial versions of these
programs. If you use only the hmmpfam and
hmmsearch options, a job will be submitted that uses four
processes for up to two hours in the NORMAL queue on Big Red. You can
use other options to change those settings.
The form of the hmmerjob command using
hmmsearch is:
Replace items in brackets with your chosen values. The
-CPUS option specifies the number of processes to start,
-wallhours the length of time the job may run, and
-queue the name of the queue that is to receive the
job. The SERIAL, NORMAL, LONG, and DEBUG queues are available;
see Big Red usage policies.
To run a simple hmmsearch with models in
models.hmm and sequences in experiment56.fa
in the NORMAL queue using 512 processes for 8 hours, the command is:
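The shape of such a command can be sketched by analogy with the hmmpfam example later on this page. This is an assumption rather than a confirmed command line: it presumes that hmmsearch takes the same --mpi flag as hmmpfam and that the hmmerjob options follow the HMMER arguments. The sketch only prints the command so that it can be checked against the hmmerjob documentation before it is run.

```shell
# Sketch only: the argument order is assumed by analogy with the
# hmmpfam example on this page, not taken from hmmerjob documentation.
models=models.hmm        # HMM file named in the text above
seqs=experiment56.fa     # sequence file named in the text above

cmd="hmmerjob hmmsearch --mpi $models $seqs -CPUS 512 -wallhours 8 -queue NORMAL"

# Print the command for review instead of submitting it.
echo "$cmd"
```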
When you run hmmerjob, you'll receive a message that your
job has been submitted to the queue. You will receive mail when the
job finishes. You can check the status of your job by using the
llq command.
Output from the job is stored in a file with a name of the form
hmmerjob.999999.0.out, where the nines are replaced by
other digits that represent the job ID. Errors and debugging output
are stored in a separate file with a name of the form
hmmerjob.999999.err.
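The naming pattern above makes it easy to check on a finished job from the shell. The following is a minimal sketch: the job ID 123456 is a hypothetical placeholder, and the script assumes it is run from the directory that holds the output files.

```shell
# Hypothetical job ID; replace it with the ID reported at submission.
jobid=123456

out_file="hmmerjob.${jobid}.0.out"   # job output, per the pattern above
err_file="hmmerjob.${jobid}.err"     # errors and debugging output

# Report whether each file exists and has content.
for f in "$out_file" "$err_file"; do
    if [ -s "$f" ]; then
        echo "$f: present and non-empty"
    else
        echo "$f: missing or empty"
    fi
done
```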
Using the hmmerjob script to submit parallel hmmpfam jobs
Because hmmpfam is more I/O intensive than
hmmsearch, it is better to index the HMM database
first. This step is required only the first time you use a particular
database. You can reuse the resulting .ssi file in
subsequent searches.
First, add the SoftEnv key (softkey):
soft add +mpi_hmmer-0.92-mpich-ibm-32
Then index the HMM database:
serialjob hmmindex /path/to/HMM/database
Replace /path/to/HMM/database with the path to the
database you will be using.
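Because the index is needed only once per database, a small check can avoid re-indexing. The sketch below assumes that hmmindex writes its index next to the database with .ssi appended to the file name; confirm this against the hmmindex man page. It prints the indexing command rather than running it.

```shell
# Placeholder path from the text above; replace with your database.
db=/path/to/HMM/database

# Assumption: the index file is named <database>.ssi.
if [ -e "${db}.ssi" ]; then
    msg="Index found: ${db}.ssi (no need to run hmmindex again)"
else
    msg="No index found; run: serialjob hmmindex ${db}"
fi

echo "$msg"
```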
The form of the hmmerjob command using hmmpfam is:
-CPUS [count] -wallhours [n] -queue [queue_name]
The rules for using hmmpfam are the same as those
described above for hmmsearch. For example, suppose you
would like to compare all the sequences in a file named
unknowns.fa with all the models in
models.hmm and select matches that have an E-value of 1
or lower, using four processes for up to two hours. Because those are
the default settings, the command would be:
hmmerjob hmmpfam --mpi -E 1 models.hmm unknowns.fa
To run the same job using 64 processes for up to 72 hours, you would use:
hmmerjob hmmpfam --mpi -E 1 models.hmm unknowns.fa -CPUS 64 -wallhours 72
Using non-parallel HMMER programs
The serial (single-process) HMMER programs are also available on
Big Red. The simplest way to use them is to put them on your path by
using the +mpi-hmmer softkey. To permanently make HMMER
available at the command prompt, run the commands:
You should then be able to run serial HMMER programs, and all HMMER
manual pages should be available to you by calling the
man command, e.g., man hmmbuild. If
you need to run serial HMMER programs in batch jobs, the simplest way
to do so is to use the serialjob script. A manual page for it is
available on Big Red.
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on July 28, 2011.







