SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters

Chunlin Wang; Elliot J Lefkowitz

doi:10.1186/1471-2105-5-171

BMC Bioinformatics. 2004 Oct 28;5:171. doi: 10.1186/1471-2105-5-171.

Authors

Chunlin Wang¹, Elliot J Lefkowitz

Affiliation

¹ Department of Microbiology, University of Alabama at Birmingham, Birmingham, Alabama 35294-2170, USA. wangcl@uab.edu

Abstract

Background: Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high-throughput DNA sequencing techniques. Thus, running traditional bioinformatics tools at this scale is impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high-performance computing becomes necessary.

Results: We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper uses a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It balances the load across the nodes of the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM, using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments with BLAST and HMMPFAM showed that QS-search sped up these programs almost linearly with the number of CPUs used. We have also implemented a wrapper, DS-BLAST, that uses a database segmentation approach and provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node.
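As a rough illustration of the query segmentation idea, the sketch below splits a set of FASTA queries into one chunk per worker and balances the chunks by total sequence length. It is not the SS-Wrapper implementation: the abstract does not say how load is balanced, so the greedy longest-first assignment, the use of sequence length as a cost estimate, and the function names parse_fasta and segment_queries are all assumptions made for this example. Each resulting chunk would then be handed, unmodified, to a separate run of the wrapped search program.

```python
import heapq


def parse_fasta(text):
    """Return a list of (header, sequence) records from FASTA-formatted text."""
    records, header, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def segment_queries(records, n_workers):
    """Split query records into n_workers chunks of similar total length.

    Assumption: sequence length is used as a stand-in for search cost.
    Strategy (hypothetical, not taken from the paper): take the longest
    query first and give it to the currently least-loaded worker.
    """
    chunks = [[] for _ in range(n_workers)]
    # Min-heap of (current load, worker index).
    heap = [(0, i) for i in range(n_workers)]
    heapq.heapify(heap)
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        load, i = heapq.heappop(heap)
        chunks[i].append(rec)
        heapq.heappush(heap, (load + len(rec[1]), i))
    return chunks
```

Because the search program itself is never touched, the same splitting step could in principle sit in front of any tool that reads FASTA queries, which is the property the abstract attributes to QS-search.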

Conclusions: Used together, QS-search and DS-BLAST provide a flexible solution for adapting sequential similarity search applications to high-performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Algorithms
Base Sequence
Cluster Analysis
Computational Biology / instrumentation*
Computational Biology / methods*
Computer Graphics
Computers
Computing Methodologies
Databases, Factual
Databases, Genetic
Databases, Protein
Information Storage and Retrieval
Programming Languages
Sequence Alignment
Sequence Analysis, Protein
Software
User-Computer Interface
