Information Technology Department

High Performance Research Cluster

Research Cluster General Information

Cluster hardware

The cluster runs Red Hat Enterprise Linux version 4.5 on 64-bit Intel processors.

The cluster contains 81 nodes with a total of 262 CPU cores. Each node has two processors: 31 nodes have single-core Xeon 3.2 GHz processors and 6 Gb RAM per node, 20 nodes have single-core Xeon 3.2 GHz processors and 8 Gb RAM per node, 20 nodes have dual-core Xeon 2.33 GHz processors and 8 Gb RAM, and 10 nodes have quad-core Xeon 2.33 GHz processors and 16 Gb RAM.

Only one job is run on a given core at one time to ensure that each job finishes as fast as possible. This gives the cluster the capacity to run 262 jobs simultaneously.
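
As a quick check, the job capacity follows from the node list above. The arithmetic can be sketched in shell (the variable names are illustrative only):

```shell
#!/bin/bash
# Each input line: number of nodes, processors per node, cores per processor.
# Figures are taken from the hardware description above.
total_nodes=0
total_cores=0
while read -r nodes procs cores; do
    total_nodes=$(( total_nodes + nodes ))
    total_cores=$(( total_cores + nodes * procs * cores ))
done <<'EOF'
31 2 1
20 2 1
20 2 2
10 2 4
EOF
echo "nodes: $total_nodes, cores (job slots): $total_cores"
# prints "nodes: 81, cores (job slots): 262"
```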

There is a 6 Gb memory limit per job.

Email aliases for questions and communications are:

cluster_admin: cluster administrators
cluster_users: all individuals with accounts
cluster_owners: all individuals who have purchased nodes

Cluster access privileges

To use the cluster, you must be affiliated with a faculty member who has leased nodes on the cluster. Contact cluster_admin for more information. For biostatistics department members, the department has leased nodes for general use - contact the department computing committee chair for access.

Using the cluster

Accessing the cluster

To access the cluster, ssh (see the SSH/SFTP client software section below) to hpcc.sph.harvard.edu. Your user name is the first initial of your first name followed by your last name, up to eight characters. Your initial password is hu followed by the first 6 digits of your Harvard ID. You should change your password with the passwd command.


Copying your files to the cluster
 
To copy files to the cluster you must use the sftp protocol, i.e., using an ftp client that supports secure file transfer (see next section). 

SSH/SFTP client software

To access the cluster for submitting jobs and to transfer files you need client software on your local machine that uses the ssh and sftp protocols.

For Windows, PuTTY is an ssh client available from the IT downloads page (http://www.hsph.harvard.edu/administrative-offices/information-technology/downloads/).

WinSCP is a Windows sftp client that is also available on the HSPH IT downloads page.

From another Linux/UNIX machine, you can use the UNIX ssh and scp commands. These should also work from an X terminal on Mac OS X.
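
From a Linux/UNIX or Mac OS X terminal, logging in and copying files might look like the following sketch. The user name jsmith, the variable names, and the helper function names are hypothetical; substitute your own account.

```shell
#!/bin/bash
CLUSTER_HOST="hpcc.sph.harvard.edu"
CLUSTER_USER="jsmith"   # hypothetical account name; use your own

# Log in to the head node.
cluster_login() {
    ssh "${CLUSTER_USER}@${CLUSTER_HOST}"
}

# Copy a local file to the cluster (default destination: your home directory).
cluster_put() {
    scp "$1" "${CLUSTER_USER}@${CLUSTER_HOST}:${2:-}"
}

# Copy a file from the cluster to a local directory (default: current directory).
cluster_get() {
    scp "${CLUSTER_USER}@${CLUSTER_HOST}:$1" "${2:-.}"
}
```

With these definitions loaded, cluster_put job1.R would copy job1.R to your home directory on the cluster, and cluster_get job1.out would bring the output file back.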

Displaying graphical output

To be able to open X windows (UNIX graphical windows) from the cluster on your desktop, you need X windows server software on your machine.

For Windows, you'll need to install X server software such as Cygwin or X-Win32. Start the X server. Then in your ssh software (e.g., PuTTY), enable X11 forwarding (under 'Connection', then 'SSH', then 'X11'). Then ssh into hpcc.sph.harvard.edu and open an X window (e.g., using xterm or starting SAS or creating an R graphic).

From Linux/UNIX/Mac OS X, just use the command 'ssh -X hpcc.sph.harvard.edu' to enable X11 forwarding.


Running jobs

The cluster uses a scheduling program called LSF from Platform Computing. A comprehensive list of commands for the scheduler is available at http://my.platform.com/docs/lsf/7.0/reference/index.html.  Here is an overview of the key ones.

To submit a job to the cluster, type bsub followed by your usual UNIX batch command.

[gmazzu@hpcc /]$ bsub R CMD BATCH --no-save job1.R job1.out

In this case, user gmazzu is submitting an R job called job1.R.

What jobs of mine are running?

Once you have submitted jobs you can see what is running by using the bjobs command.

[gmazzu@hpcc /]$ bjobs

JOBID  USER    STAT  QUEUE       FROM_HOST  EXEC_HOST    JOB_NAME    SUBMIT_TIME
1014   gmazzu  RUN   preemptabl  hpcc       compute-0-2  *.2 1 1000  Aug 31 10:22
1011   gmazzu  RUN   preemptabl  hpcc       compute-0-1  *.2 1 1000  Aug 31 10:22
1015   gmazzu  RUN   preemptabl  hpcc       compute-0-0  *.2 1 1000  Aug 31 10:22
1018   gmazzu  RUN   preemptabl  hpcc       compute-0-2  *.2 1 1000  Aug 31 10:22

Stopping jobs

To stop a job, use the bkill command followed by the JOBID, which you can see in the bjobs output above.

[gmazzu@hpcc /]$ bkill 1018

Choosing a queue for your job

There are currently 3 queues available for submission, in addition to  group-specific queues (see below for information about these). They can be viewed by typing bqueues.

[gmazzu@hpcc /]$ bqueues

Each queue has special attributes and can be used for different purposes. Users should use the normal, preemptable, and long queues unless they are in a group that has set up its own queue.

 

normal (default queue) - This queue is best used for jobs that are time-sensitive and need to run without being paused. The time limit for jobs in this queue is 5 days; jobs running longer will be killed. Jobs here have the highest priority and users can run up to 4 simultaneous jobs. At any one time each group can only run as many normal jobs as the number of slots leased by the group owner, so an individual user may not be able to run 4 simultaneous jobs if other users in the group are tying up the slots.

preemptable - This queue is for submitting jobs on other owners' nodes when they are not in use or to run more than 4 jobs. Users can run up to 16 simultaneous jobs. There is no time limit for jobs running in this queue. A job here can be paused if a user submits a job to the normal queue. A paused job will restart when the normal queue job finishes.

long - The long queue is unique because it has no time limit or limit on the number of jobs running. If no one else is using the cluster, you can run as many jobs as there are job slots on the cluster. It also has the lowest priority of all the queues. Both normal and preemptable can pause jobs in this queue. This is used when you have used up all of your normal and preemptable job slots but there are still open slots on the cluster.  Also, if you have jobs that are low priority, you may wish to use this queue solely to allow other jobs of your own or others to run at higher priority. Please understand that jobs in this queue can be suspended for a long time.

 

To submit a job to a queue other than normal, use bsub -q followed by the queue name:

[gmazzu@hpcc /]$ bsub -q long R CMD BATCH --no-save job1.R job1.out

In this case job1.R was submitted to the long queue.
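
When you have several scripts to run, the same bsub call can be wrapped in a loop. The sketch below is illustrative only: the function name submit_r_jobs and the BSUB override variable are assumptions made for this example, not part of LSF.

```shell
#!/bin/bash
# Submit each R script given on the command line to the named queue.
# Usage: submit_r_jobs QUEUE script1.R [script2.R ...]
# BSUB may be set to another command (e.g. echo) for a dry run; this
# override is a convenience of this sketch, not an LSF feature.
submit_r_jobs() {
    local queue="$1"
    shift
    local script
    for script in "$@"; do
        # job1.R -> job1.out, matching the naming used in the examples above
        "${BSUB:-bsub}" -q "$queue" R CMD BATCH --no-save "$script" "${script%.R}.out"
    done
}
```

For example, BSUB=echo submit_r_jobs long job1.R job2.R prints the arguments that would be passed to bsub without submitting anything; dropping BSUB=echo submits the jobs. Keep the per-queue job limits described above in mind when submitting in bulk.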

 

Prioritization of jobs

The queuing software accounts for the number of jobs that users are currently running as a function of their group. The PENDING list of jobs will be ordered so that, when a slot on the cluster becomes available, the first job (within a given queue) to start will belong to a user whose group is using the lowest proportion of its leased portion of the cluster. Note that there is no prioritization within a group; jobs run first come, first served.
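
As a worked illustration of that ordering rule, the sketch below picks, from three hypothetical groups, the one using the smallest fraction of its leased slots. The group names and numbers are invented for the example, and the real scheduler's calculation may include factors not described here.

```shell
#!/bin/bash
# Input columns: group name, leased job slots, jobs currently running.
# All values are made up for illustration.
next_group=$(awk '
    { used = $3 / $2 }                                  # proportion of leased slots in use
    NR == 1 || used < best { best = used; name = $1 }   # keep the smallest proportion seen
    END { print name }
' <<'EOF'
groupA 8 6
groupB 16 4
groupC 4 4
EOF
)
echo "Next pending job to start belongs to: $next_group"
# prints "Next pending job to start belongs to: groupB"
```

Here groupB is using 4 of 16 slots (25%), compared with 75% for groupA and 100% for groupC, so a pending job from groupB would start first.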

 

Switching queues after job submission

You can also move a running or pending job from one queue to another using the bswitch command.

[gmazzu@hpcc]$ bjobs
JOBID   USER    STAT  QUEUE   FROM_HOST  EXEC_HOST    JOB_NAME    SUBMIT_TIME
143217  gmazzu  RUN   normal  hpcc       compute-0-7  * Sens12.R  Sep 14 14:08
143218  gmazzu  RUN   normal  hpcc       compute-0-1  * Sens12.R  Sep 14 14:08
143219  gmazzu  PEND  normal  hpcc                    * Sens12.R  Sep 14 14:08
143220  gmazzu  PEND  normal  hpcc                    * Sens12.R  Sep 14 14:08
143221  gmazzu  PEND  normal  hpcc                    * Sens12.R  Sep 14 14:08
143222  gmazzu  PEND  normal  hpcc                    * Sens12.R  Sep 14 14:08

[gmazzu@hpcc]$ bswitch long 143219
Job <143219> is switched to queue <long>

[gmazzu@hpcc]$ bjobs
JOBID   USER    STAT  QUEUE   FROM_HOST  EXEC_HOST    JOB_NAME    SUBMIT_TIME
143217  gmazzu  RUN   normal  hpcc       compute-0-7  * Sens12.R  Sep 14 14:08
143218  gmazzu  RUN   normal  hpcc       compute-0-1  * Sens12.R  Sep 14 14:08
143219  gmazzu  RUN   long    hpcc       compute-1-6  * Sens12.R  Sep 14 14:08
143220  gmazzu  PEND  normal  hpcc                    * Sens12.R  Sep 14 14:08
143221  gmazzu  PEND  normal  hpcc                    * Sens12.R  Sep 14 14:08
143222  gmazzu  PEND  normal  hpcc                    * Sens12.R  Sep 14 14:08

In this example job 143219 was moved from the normal queue to the long queue.
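
With many jobs, it can help to summarize bjobs output by queue and status. The sketch below runs awk over data rows copied from the second listing above rather than over live output, so it can be tried anywhere. It assumes STAT and QUEUE are the third and fourth whitespace-separated fields, as in the listings shown here; the function name summarize_jobs is hypothetical.

```shell
#!/bin/bash
# Count jobs per (queue, status) pair, skipping the header line.
summarize_jobs() {
    awk 'NR > 1 && NF > 0 { count[$4 " " $3]++ }
         END { for (key in count) print key, count[key] }' | sort
}

summary=$(summarize_jobs <<'EOF'
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME
143217 gmazzu RUN normal hpcc compute-0-7 * Sens12.R Sep 14 14:08
143218 gmazzu RUN normal hpcc compute-0-1 * Sens12.R Sep 14 14:08
143219 gmazzu RUN long hpcc compute-1-6 * Sens12.R Sep 14 14:08
143220 gmazzu PEND normal hpcc * Sens12.R Sep 14 14:08
143221 gmazzu PEND normal hpcc * Sens12.R Sep 14 14:08
143222 gmazzu PEND normal hpcc * Sens12.R Sep 14 14:08
EOF
)
echo "$summary"
# prints:
# long RUN 1
# normal PEND 3
# normal RUN 2
```

On the cluster, piping bjobs into summarize_jobs would apply the same summary to your live jobs.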

 

 

Running Interactive Jobs

Interactive jobs are run through the normal queue so users can only run 4 at once. Please make sure to shut down the job when you're done or you will tie up a job slot and prevent someone else from running a job in that slot.

The basic syntax is:

bsub -Ip R

bsub -Ip sas

bsub -Ip bash

depending on what you want to run.  Matlab should NOT be run interactively as the licensing is such that the single interactive job will prevent any new Matlab jobs from being submitted by any other user.

For applications that start graphical windows (e.g., R when making a graphic, or SAS), see the information above on X windows, as your local machine needs to be set up to display the X window.

                    

 

64-bit processing

The cluster processors can handle 64-bit processing. This means, among other things, that applications can address more than the 4 Gb of RAM that one is limited to in 32-bit processing.

All core software on the cluster runs in 64-bit mode, including R, SAS and Matlab.

 

Software

 

R

R is compiled using the Intel compilers and uses Goto's BLAS, both of which greatly improve speed, particularly of basic linear algebra operations.

Many R packages have been installed by the administrators and are accessible to all users. For additional packages, we recommend that you email cluster_admin to have them installed. However, if you need the package right away or it is a specialized package that others are unlikely to need, you can install the package locally in your home directory as follows.

1.)    create a directory to store the packages, e.g., 'mkdir ~/Rlibs'

2.)    start R on the head node

3.)    > install.packages('packageName',lib='~/Rlibs')

4.)    quit R

5.)    start R using bsub and load the package as '> library(packageName,lib.loc='~/Rlibs')'

 

Note that this is the only time you should run R from the head node without invoking the bsub command.  Package installation often involves compiling C or Fortran code, which under our setup requires the Intel Fortran or C compilers, which are only available on the head node.
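
Steps 1 through 4 can also be scripted from the head node's shell. This is a sketch under assumptions: the function name install_r_pkg, the R_LIB_DIR and RSCRIPT variables, and the choice of CRAN mirror are illustrative and not part of the cluster setup. It also assumes the installed version of R provides the Rscript front end.

```shell
#!/bin/bash
# Install one R package into a personal library directory (default ~/Rlibs).
# Run on the head node, without bsub, as the note above explains.
install_r_pkg() {
    local pkg="$1"
    local lib="${R_LIB_DIR:-$HOME/Rlibs}"
    mkdir -p "$lib"    # step 1: create the library directory if needed
    # steps 2-4: start R, install the package, and quit
    "${RSCRIPT:-Rscript}" -e \
        "install.packages('$pkg', lib='$lib', repos='https://cloud.r-project.org')"
}
```

Calling install_r_pkg packageName would then install into ~/Rlibs, after which a job started through bsub can load the package with library(packageName,lib.loc='~/Rlibs') as in step 5.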

 

SAS

Owners have purchased SAS licenses for 14 nodes.  In general to use SAS we ask that you be affiliated with an owner who has contributed to the cost of the software.  Email cluster_admin for more information.  For biostat department members, the department has purchased SAS licenses for moderate amounts of use. If you are a biostat department member and you use SAS intensively we ask that you talk to the department computing committee chair about contributing to the cost of the licensing, which is approximately $900 per node per year.

 

To run SAS in batch mode:

 

[gmazzu@hpcc /]$ bsub sas -noterminal code.sas -log file.log

 

Matlab

Batch mode:

To properly submit a batch job that does not tie up the licenses except during the submission process, you must follow the instructions in the readme.txt file in the attached zip file. Submitting jobs in other ways may tie up the licenses and prevent other users from starting jobs.


Interactive use:

We have bought only two licenses for Matlab. If two interactive jobs were running at once, no one could submit a batch job, so we need to make sure that only one interactive job runs at a time, keeping a license free for batch mode submission. To check, type the following after logging on:
bjobs -u all -l | grep 'matlab'
The '-u all' option lists jobs for all users, the '-l' option prints the full information on each job, and the grep part searches for matlab jobs.

If you see something like
"terminal mode, Command "
that means there is another interactive Matlab job running and you should not submit an interactive job.

If you do this and find no other interactive jobs running, you can start an interactive session using only the command line as
bsub -Ip matlab -nodisplay
or, if you have an X windows server set up on your local machine, you can start the Matlab GUI as
bsub -Ip matlab

 

 

OpenBUGS

 Users can run OpenBUGS, the open source version of WinBUGS, through the command 'linbugs'. This uses the old command line functionality of BUGS. The easiest way to use this is to run from R using code prepared by Chris Paciorek. This allows one to run batch jobs. Please see this zip file for instructions and template code.

 

Other software

  

Other installed software includes Mathematica, Octave (an open source program largely compatible with Matlab), PBAT, FBAT, SaTScan, and S-PLUS.

 

If you need to run other software, you can install it locally in your home directory or contact cluster_admin about having it installed on the cluster for anyone's use.

 

The HSPH IT Department does not support third-party software; check with your department or sponsor for support.

 

Compilation

To compile C, C++, or FORTRAN code, you can use either the GNU compilers or the Intel compilers (icc and ifort) from the command line on the head node when you log in to the cluster (i.e., don't submit a compilation job using bsub). The Intel compilers are generally expected to give faster code as they are optimized for the Intel processors.

 
Disk space

Each node is allotted 5 Gb of disk space, so if you belong to a group that owns 4 nodes, your group has access to 20 Gb of space, to be split among users in the group as per the owner's request. Users can email cluster_admin to find out their individual quota. A user who goes above their quota will receive an email notification. After approximately 5 days over the limit, the user will not be able to write to disk until they have removed enough files to get below their quota. Users can check their disk usage by typing "du -s" from their home directory.
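
To make the quota arithmetic concrete, the sketch below computes a group's allotment from its node count and provides a helper that reports a directory's usage in Gb. The function and variable names are hypothetical, and since individual quotas are set by the group owner, the group total is only an upper bound for any one user.

```shell
#!/bin/bash
GB_PER_NODE=5    # disk allotment per leased node, from the text above

# Print the group allotment in Gb for a given number of leased nodes.
group_quota_gb() {
    echo $(( $1 * GB_PER_NODE ))
}

# Print the disk usage of a directory in Gb (one decimal place).
# du -sk reports kilobytes; 1 Gb is taken as 1024 * 1024 kilobytes here.
usage_gb() {
    du -sk "$1" | awk '{ printf "%.1f\n", $1 / (1024 * 1024) }'
}

echo "Group allotment for 4 nodes: $(group_quota_gb 4) Gb"
# prints "Group allotment for 4 nodes: 20 Gb"
```

Calling usage_gb ~ from a login shell on the cluster would give the figure to compare against your individual quota.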

The disk space is backed up to tape daily.

Additional disk space will be leased to users for a cost of $2.58/Gb ($2580/Tb) annually. Contact cluster_admin.

Purchasing Nodes

Only SPH-affiliated researchers are allowed to purchase nodes in the cluster. If interested, please contact Bill Mahoney, Assistant Director of Information Technology, 432-1751, cluster_admin@hsph.harvard.edu.

 

 

Setting up a queue specific to your group

If you are the owner of nodes, you can choose to have your nodes enter the overall pool of nodes with your group assigned the number of job slots that you have leased in the normal queue, or you can set up your own queue.  With your own queue, you guarantee access at any time to as many job slots as you have leased and your individual users can run more than 4 jobs in the normal queue and run high priority jobs for longer than 5 days, if you so choose.   Please contact cluster_admin to better understand the tradeoffs involved in setting up your own queue. 

Cluster policies

Cluster policies were initially determined based on conversations between node owners and IT personnel.  At this stage, there is no formal procedure for determining policies, but owners and users can email cluster_admin or cluster_owners with comments and suggestions.