Indiana University
University Information Technology Services

Using MPI-HMMER on Big Red at IU

Introduction

HMMER is a suite of programs that you can use to create and query hidden Markov models that describe molecular sequences. A parallel port of HMMER known as MPI-HMMER is available on Big Red at Indiana University. It contains all the HMMER programs, but only hmmpfam and hmmsearch have been parallelized.

MPI-HMMER is installed in the directory /N/soft/linux-sles9-ppc64/hmmer-2.3.2-MPI-0.92. Documentation for HMMER programs is available as man pages. You can also visit the MPI-HMMER page for more information.

Using the hmmerjob script to submit parallel hmmsearch jobs

The hmmerjob script submits both hmmpfam and hmmsearch jobs. Pass hmmpfam and hmmsearch the same options you would use with the serial versions of these programs. If you supply only those program options, hmmerjob submits a job that uses four processes for up to two hours in the NORMAL queue on Big Red. Additional hmmerjob options change these settings.

The form of the hmmerjob command using hmmsearch is:

hmmerjob hmmsearch --mpi options_to_hmmsearch -CPUS [count] -wallhours [n] -queue [queue_name]

Replace items in brackets with your chosen values. The -CPUS option specifies the number of processes to start, -wallhours the maximum wall-clock time (in hours) the job may run, and -queue the name of the queue that is to receive the job. The SERIAL, NORMAL, LONG, and DEBUG queues are available; see Big Red usage policies.

To run a simple hmmsearch with models in models.hmm and sequences in experiment56.fa in the NORMAL queue using 512 processes for 8 hours, the command is:

hmmerjob hmmsearch --mpi models.hmm experiment56.fa -CPUS 512 -wallhours 8 -queue NORMAL
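If you generate many submissions from a script, it can help to assemble the hmmerjob command line programmatically. The following is a minimal Python sketch, not part of the MPI-HMMER distribution. The function name build_hmmerjob_command is hypothetical. The queue list and the omitted-flag defaults (four processes, two hours, NORMAL) are taken from the description above. The sketch only builds the argument list and does not submit anything.

```python
import shlex
from typing import List, Optional, Sequence

# Assumptions copied from this article: only these two programs are
# parallelized, and only these queues are available on Big Red.
PARALLEL_PROGRAMS = {"hmmsearch", "hmmpfam"}
QUEUES = {"SERIAL", "NORMAL", "LONG", "DEBUG"}


def build_hmmerjob_command(
    program: str,
    program_args: Sequence[str],
    cpus: Optional[int] = None,
    wallhours: Optional[int] = None,
    queue: Optional[str] = None,
) -> List[str]:
    """Return the hmmerjob argument list (hypothetical helper).

    Settings left as None are omitted, so hmmerjob falls back to its
    defaults (4 processes, 2 hours, NORMAL queue).
    """
    if program not in PARALLEL_PROGRAMS:
        raise ValueError(f"only {sorted(PARALLEL_PROGRAMS)} are parallelized")
    # Program options and file names are passed through unchanged.
    cmd = ["hmmerjob", program, "--mpi", *program_args]
    if cpus is not None:
        if cpus < 1:
            raise ValueError("cpus must be positive")
        cmd += ["-CPUS", str(cpus)]
    if wallhours is not None:
        if wallhours < 1:
            raise ValueError("wallhours must be positive")
        cmd += ["-wallhours", str(wallhours)]
    if queue is not None:
        if queue not in QUEUES:
            raise ValueError(f"unknown queue {queue!r}")
        cmd += ["-queue", queue]
    return cmd


if __name__ == "__main__":
    cmd = build_hmmerjob_command(
        "hmmsearch", ["models.hmm", "experiment56.fa"],
        cpus=512, wallhours=8, queue="NORMAL",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
```

Omitting a setting, rather than filling in the default value, keeps the generated command identical to what you would type by hand.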

When you run hmmerjob, you'll receive a message that your job has been submitted to the queue. You will receive mail when the job finishes. You can check the status of your job by using the llq command.

Output from the job is stored in a file with a name of the form hmmerjob.999999.0.out, where the nines are replaced by other digits that represent the job ID. Errors and debugging output are stored in a separate file with a name of the form hmmerjob.999999.err.
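When several jobs have run in the same directory, you may want to match each output file with its error file. The following is a minimal Python sketch that assumes the file names follow exactly the patterns given above (hmmerjob.<ID>.0.out and hmmerjob.<ID>.err, with a numeric job ID). The helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed naming scheme, copied from the description above.
OUT_RE = re.compile(r"^hmmerjob\.(\d+)\.0\.out$")
ERR_RE = re.compile(r"^hmmerjob\.(\d+)\.err$")


def job_files(job_id: str) -> Tuple[str, str]:
    """Expected (output, error) file names for a numeric job ID."""
    return f"hmmerjob.{job_id}.0.out", f"hmmerjob.{job_id}.err"


def job_id_of(filename: str) -> Optional[str]:
    """Return the job ID encoded in an hmmerjob file name, or None."""
    for pattern in (OUT_RE, ERR_RE):
        match = pattern.match(filename)
        if match:
            return match.group(1)
    return None


def group_by_job(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group hmmerjob output and error files by job ID."""
    groups: Dict[str, List[str]] = {}
    for name in filenames:
        job_id = job_id_of(name)
        if job_id is not None:
            groups.setdefault(job_id, []).append(name)
    return groups
```

For example, group_by_job(os.listdir(".")) returns one entry per job ID found in the current directory. Files that do not match either pattern are ignored.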

Using the hmmerjob script to submit parallel hmmpfam jobs

Because hmmpfam is more I/O intensive than hmmsearch, it is better to index the HMM database first. This step is required only the first time you use a particular database. You can reuse the resulting .ssi file in subsequent searches.

First, add the SoftEnv key (softkey):

soft add +mpi_hmmer-0.92-mpich-ibm-32

Then index the HMM database:

serialjob hmmindex /path/to/HMM/database

Replace /path/to/HMM/database with the path to the database you will be using.
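Because indexing is needed only once per database, a submission script can skip it when an index already exists. The following is a minimal Python sketch of that check. It assumes hmmindex writes its index next to the database as <database>.ssi; verify this on Big Red before relying on it. The function names are hypothetical.

```python
import os
from typing import List, Optional


def ssi_index_path(database: str) -> str:
    # Assumption: hmmindex names the index <database>.ssi.
    return database + ".ssi"


def index_command(database: str) -> Optional[List[str]]:
    """Return the serialjob command that indexes `database`, or None
    if an index file already exists and can be reused."""
    if os.path.exists(ssi_index_path(database)):
        return None
    return ["serialjob", "hmmindex", database]
```

The sketch checks only whether the index exists. It does not detect an index that is older than a database you have since modified.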

The form of the hmmerjob command using hmmpfam is:

hmmerjob hmmpfam --mpi [other_options_to_hmmpfam] /path/to/HMM/database /path/to/search/sequence -CPUS [count] -wallhours [n] -queue [queue_name]

The rules for using hmmpfam are the same as those described above for hmmsearch. For example, suppose you would like to compare all the sequences in a file named unknowns.fa with all the models in models.hmm and select matches that have an E-value of 1 or better (that is, 1 or lower), using four processes for up to two hours. The command would be:

hmmerjob hmmpfam --mpi -E 1 models.hmm unknowns.fa

To run the same job using 64 processes for up to 72 hours, you would use:

hmmerjob hmmpfam --mpi -E 1 models.hmm unknowns.fa -CPUS 64 -wallhours 72
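For E-values, "better" means smaller: a lower E-value indicates a more significant match. The following minimal Python sketch illustrates that direction of the cutoff. It assumes you have already extracted (model name, E-value) pairs from the hmmpfam report. Parsing the report is not shown, and the function name and sample names are hypothetical. The sketch illustrates the intent of -E 1; it is not a reimplementation of hmmpfam's own filtering.

```python
from typing import List, Sequence, Tuple

Hit = Tuple[str, float]  # (model name, E-value)


def filter_by_evalue(hits: Sequence[Hit], max_evalue: float = 1.0) -> List[Hit]:
    """Keep hits whose E-value does not exceed the cutoff, most significant first."""
    kept = [hit for hit in hits if hit[1] <= max_evalue]
    # Smaller E-values are more significant, so sort ascending.
    return sorted(kept, key=lambda hit: hit[1])
```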

Using non-parallel HMMER programs

The serial (single-process) HMMER programs are also available on Big Red. The simplest way to use them is to put them on your path by using the +mpi-hmmer softkey. To permanently make HMMER available at the command prompt, run the commands:

echo +mpi-hmmer-0.92-mpich-ibm-32 >> ~/.soft
resoft
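The echo command appends the key every time it runs, so running it twice leaves a duplicate line in ~/.soft. The following minimal Python sketch appends the key only when it is not already listed. It assumes ~/.soft holds one entry per line, as the echo command produces. The function name add_softkey is hypothetical, and you still need to run resoft afterwards.

```python
from pathlib import Path


def add_softkey(soft_file: Path, key: str) -> bool:
    """Append `key` to a SoftEnv file unless it is already listed.

    Returns True if the file was changed.
    """
    text = soft_file.read_text() if soft_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    # Start the key on its own line if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with soft_file.open("a") as fh:
        fh.write(prefix + key + "\n")
    return True
```

For example, add_softkey(Path.home() / ".soft", "+mpi-hmmer-0.92-mpich-ibm-32") adds the key once. Later calls return False and leave the file unchanged.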

You should then be able to run serial HMMER programs, and all HMMER manual pages should be available through the man command (e.g., man hmmbuild). If you need to run serial HMMER programs in batch jobs, the simplest way is to use the serialjob script. A manual page for it is available on Big Red.

This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.

This is document awwb in domain all.
Last modified on July 28, 2011.
