March 31, 2006
Research Cluster Packs Major Computational Punch for Researchers

A powerful tool for computational research is available to HSPH scientists. Called a research computing cluster, the tool consists of small-form-factor, high-performance servers whose collective punch can crunch enormous amounts of data, run simulation studies, test complicated methodologies, and aid software development.

Information Technology (IT) purchased the cluster last year after IT, working with faculty, researchers, and the faculty advisory committee, defined the need to set up a high-performance computing cluster at the School.

"Research is mission critical for the School," noted Greg Mazzu, the HSPH System Administrator who manages the cluster with Bill Mahoney. "A goal of IT is to find ways in which we can best support that mission."

The mapping of the human genome, along with the development of technologies such as gene chips, has produced reams of data and has propelled the field of bioinformatics. Scientists are trying to catch up, determining how best to analyze the data.

[Image: DNA chip. The research computing cluster can help analyze data related to DNA chips such as the one pictured. © iStockphoto.com/Andrei Tchernov]

The HSPH research computing cluster consists of nodes, each of which has two processors and currently carries six gigabytes of RAM. There are 32 nodes at this time, but a planned upgrade will increase that number to 52 nodes and add six terabytes of disk space. Unlike a mainframe computer, the cluster can theoretically be expanded without limit simply by adding more servers. Each additional server adds that much more computational power.

"The research computing cluster is like a desktop computer on steroids," said Mazzu. "It's an amazing amount of power."

A case in point is the work of HSPH Assistant Professor Christoph Lange, a statistician who uses the cluster frequently. In 2003, he helped develop a software package with Professor Nan Laird to aid the detection of genes underlying complex diseases. The package, named PBAT, was revised last summer. Lange and his colleagues used the cluster to test the revision: running simulation studies, comparing the new approach to traditional methodologies, and analyzing the subsequent data. The result: Lange can now screen an entire genome for associations with complex diseases using his own desktop computer. The latest version of PBAT is freely available to academicians. A commercial version of the package is expected in the future.

In separate but related work, Lange was part of a team that used the cluster to identify candidate genes for asthma and chronic obstructive pulmonary disease. That research was part of the Childhood Asthma Management Program (CAMP) Genetics Ancillary Study based at the Channing Laboratory, Brigham and Women's Hospital.

Take Advantage of the Research Computing Cluster

HSPH scientists interested in using the research computing cluster may contact Greg Mazzu or Bill Mahoney at cluster_admin@hsph.harvard.edu or at 617-432-4357. There is a usage fee, which helps pay for additional nodes. The cost remains lower than that of buying a single server, and users avoid the need to purchase a server contract. IT staff are available to offer technical support.

Specifications of the research cluster

  • Dual CPU machines; 32 nodes currently; planned upgrade to increase to 52 nodes
  • 64 processors; planned upgrade to increase to 104 processors
  • 8 GB RAM/Node
  • Planned upgrade to add six terabytes of disk space
  • Intel-based hardware
  • Major software packages include: R, Intel compiler, Matlab, SAS, Mathematica
For more information, visit the cluster website.