TORQUE/Maui Cluster

TORQUE stands for Terascale Open-source Resource and QUEue Manager. It is an open-source distributed resource manager originally based on OpenPBS, the Portable Batch System (PBS). Our installation is used for running parallel jobs and making use of dedicated reservations. We use a separate program, Maui, to schedule jobs in TORQUE.

TORQUE and Maui are installed at /opt/UMtorque and /opt/UMmaui respectively. Please make sure /opt/UMtorque/bin and /opt/UMmaui/bin are added to your PATH environment variable.

The cluster is composed of a frontend (the submit nodes) and compute nodes. The frontend is to be used for editing your code, compiling, and submitting jobs. To run any processing or testing of your code, you must submit it through the scheduler.

The scheduler takes care of assigning compute nodes to jobs. When you are assigned a node, you will be the only person on it for the duration of your job. After your time limit is up or your process ends, the node is cleaned and locked down for the next submission.

  • Logging in

    To gain access to any of the nodes on a cluster, you will first need to log into one of the submit nodes using ssh. These machines act as a gateway to the rest of the cluster. No intensive processing is to be run on the submit nodes; they are shared with everyone else using the cluster across various research projects throughout the institute. If you run an intensive process on a submit node, it will be killed so that other research is not affected.

    More information about UMIACS cluster submit nodes and compute nodes can be found here

  • Setting up your environment

    After you are logged in, you will have to set up your account so that PBS can access it from any of the compute nodes. This is required because PBS writes stdout and stderr to files in your account. Use ssh-keygen with no passphrase to create keypairs that can be used to grant access for your jobs. These can be generated by running the following:

    
    cd $HOME
    ssh-keygen -t rsa1 -N ""  -f $HOME/.ssh/identity
    ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
    ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
    cd .ssh
    touch authorized_keys authorized_keys2
    cat identity.pub >> authorized_keys
    cat id_rsa.pub id_dsa.pub >> authorized_keys2
    chmod 640 authorized_keys authorized_keys2
    
    To test your keys, you should be able to run 'ssh submitnode' and be returned to a prompt without being asked for a password.
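
    As a quick sanity check (a sketch, assuming the 'submitnode' hostname above resolves for you), the following should print a hostname without a password prompt; BatchMode makes ssh fail instead of prompting if the keys are not being picked up:

    # Should print the submit node's hostname with no password prompt.
    ssh -o BatchMode=yes submitnode hostname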

  • Requesting interactive usage

    Sometimes you will want to test an intensive program without preparing a submission script and going through the hassle of the scheduler. You can run '/opt/UMtorque/bin/qsub -I' to request interactive usage on a node. After running qsub -I, your shell will hang until a resource can be allocated to you. When the resource has been allocated, a new shell is opened on the allocated node. You can then also ssh into the node for the duration of the allocated shell. When you log out from the initial shell, or your time limit is up, the node will again be locked down and you will have to ask the scheduler for access again.

    Below is an example of getting an interactive session:

     [xhe@opensub01 24] qsub -I
    qsub: waiting for job 152.opensrv.umiacs.umd.edu to start
    qsub: job 152.opensrv.umiacs.umd.edu ready
    
     [xhe@openlab00 21] echo hello
    hello
     [xhe@openlab00 22] exit
    logout
    
    qsub: job 152.opensrv.umiacs.umd.edu completed
     [xhe@opensub01 25] 
  • Running your first job

    We will walk through a simple 'hello world' submission script that will help you understand how submitting jobs works.

    1. Create a submission file
      In your home directory on a submit node, create a file called test.sh that contains the following:
      #!/bin/bash
      #PBS -lwalltime=10:00
      #PBS -lnodes=3
      
      echo hello world
      hostname
      echo finding each node I have access to
      for node in `cat ${PBS_NODEFILE}` ; do
       echo ----------
       /usr/bin/ssh $node hostname
       echo ---------- 
      done
      
      

      The script is a normal shell script except that it includes extra #PBS directives. These directives control how you request resources on the cluster. In this case we are requesting 10 minutes of total node time split across 3 nodes, so each node is available to you for roughly a third of that time. People often forget to specify walltime for jobs over 2 nodes. The default walltime is 48 hrs/node, so requesting 3 nodes without a walltime will try to schedule 144 hours of cluster time, which exceeds the maximum allowed.
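
      The same resource request can also be given on the qsub command line instead of (or in addition to) the #PBS lines. For example:

      qsub -l nodes=3,walltime=10:00 test.sh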

    2. Submit the job to the scheduler using /opt/UMtorque/bin/qsub
      [xhe@opensub00 28]$ /opt/UMtorque/bin/qsub test.sh
      123.opensrv.umiacs.umd.edu
      

      You can check the status of your job by running /opt/UMtorque/bin/qstat

      [xhe@opensub00 29]$ /opt/UMtorque/bin/qstat -n
      
      
      opensrv.umiacs.umd.edu: 
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      123.opensrv.umiacs.u xhe      dque     test.sh       --      3  --    --  48:00 R   -- 
         openlab00/0+openlab01/0+openlab02/0
       [opensub00 30]
      

      This shows us that the job is running ('R') and is using nodes openlab00, openlab01, and openlab02. A status of 'Q' means that your job is waiting in line for resources to free up. If you requested more resources than the cluster can provide, your job will sit in the queue indefinitely.
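
      Since Maui does the scheduling on this cluster, its client commands can help explain why a job is still queued. A minimal sketch, assuming the Maui clients in /opt/UMmaui/bin are in your PATH as described above:

      # Show the queue as Maui sees it (running, idle, and blocked jobs).
      showq
      # Ask Maui why a specific job has not started yet; use the leading
      # number from the job id that qsub printed.
      checkjob 123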

    3. Check the output

      When your job is finished, you will have two files in the directory you submitted the job from. They contain stdout (<jobname>.oJOBID) and stderr (<jobname>.eJOBID).

      The job we submitted above generated an empty error file test.sh.e123 and the following stdout file:

      [xhe@opensub00 30]$ cat test.sh.o123 
      Warning: no access to tty (Bad file descriptor).
      Thus no job control in this shell.
      hello world
      openlab00.umiacs.umd.edu
      finding each node I have access to
      ----------
      openlab00.umiacs.umd.edu
      ----------
      ----------
      openlab01.umiacs.umd.edu
      ----------
      ----------
      openlab02.umiacs.umd.edu
      ----------
       [xhe@opensub00 31]
      

      The warning lines at the top of your output are a standard part of how we have our cluster configured and do not affect how your program runs.

  • Running MPI programs as batch jobs

    At UMIACS, we have LAM, OpenMPI, MPICH1, and MPICH2 installed. LAM is installed at /usr/local/stow/lam-version; MPICH1 is available in /usr/local/stow/mpich1-version; MPICH2 is available in /usr/local/stow/mpich2-version; OpenMPI is available in /usr/local/stow/openmpi-version.

    First, you need to have an MPI-based program written. Here's a simple one:

    alltoall.c

    • LAM

      To compile this program and run it under LAM, make sure /usr/local/stow/lam-7.1.4/bin is in your PATH environment variable.

      It can be compiled by doing: mpicc alltoall.c -o alltoall-lam
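
      For example (a sketch assuming a bash shell and the lam-7.1.4 install named above), you can put LAM first in your PATH and confirm which mpicc you are picking up before compiling:

      export PATH=/usr/local/stow/lam-7.1.4/bin:$PATH
      which mpicc   # should report /usr/local/stow/lam-7.1.4/bin/mpicc
      mpicc alltoall.c -o alltoall-lam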

      The following submission file, lamsub.sh, can be submitted to run your program:

      #!/bin/bash
      #PBS -l nodes=8
      #PBS -l walltime=0:10:0
      cd ~/torquejobs/lamtest
      mpiexec -machinefile ${PBS_NODEFILE} alltoall-lam 
      

      Here is what it looks like on your terminal after submitting the job:

      [opensub00 142] qsub lamsub.sh
      127.opensrv.umiacs.umd.edu
       [opensub00 143] qstat -n
      
      opensrv.umiacs.umd.edu:
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      127.opensrv.umiacs.u xhe      dque     lamsub.sh     --      8  --    --  48:00 R   --
         openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
         +openlab06/0+openlab07/0
       [opensub00 144]
      

      Output files for this job: lamsub.sh.o127 and lamsub.sh.e127 (empty)

      The submission file lamsub2.sh uses mpirun instead of mpiexec, which requires the user to set up the MPI environment by starting lamboot and then running lamhalt to stop it afterwards; a sketch of such a script is shown below. We recommend using mpiexec, since it sets up the MPI runtime environment for your job.
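
      A sketch of what such a script might look like (hypothetical, shown only to illustrate the extra lamboot/lamhalt bookkeeping that mpiexec otherwise does for you):

      #!/bin/bash
      #PBS -l nodes=8
      #PBS -l walltime=0:10:0
      cd ~/torquejobs/lamtest
      # Start the LAM runtime on the nodes PBS allocated to this job...
      lamboot ${PBS_NODEFILE}
      # ...run the program across those nodes...
      mpirun -np 8 ./alltoall-lam
      # ...and shut the LAM runtime down again.
      lamhalt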

    • OpenMPI

      To compile and run this program using OpenMPI, you need to include the bin directory of /usr/local/stow/openmpi-version in your PATH. The following example uses /usr/local/stow/openmpi-1.2.6.

      The following script (csh/tcsh syntax) will set up your path variables.
       setenv PATH /usr/local/stow/openmpi-1.2.6/bin:$PATH
       if ( $?LD_LIBRARY_PATH ) then
            setenv LD_LIBRARY_PATH /usr/local/stow/openmpi-1.2.6/lib:$LD_LIBRARY_PATH
       else
            setenv  LD_LIBRARY_PATH /usr/local/stow/openmpi-1.2.6/lib
       endif 

      The sample C code can be compiled by doing: mpicc alltoall.c -o alltoall-openmpi (we changed our environment to point to OpenMPI's mpicc)

      The following is the submission file openmpisub.sh
      #!/bin/bash
      # Special PBS control comments
      #PBS -l nodes=8,walltime=0:10:0
      
      # Set up the path
      export PATH=/usr/local/stow/openmpi-1.2.6/bin:$PATH
      export LD_LIBRARY_PATH=/usr/local/stow/openmpi-1.2.6/lib:$LD_LIBRARY_PATH
      
      cd ~/torquejobs/openmpitest
      echo starting
      mpiexec -mca btl tcp,self -n 8 ./alltoall-openmpi
      echo ending
      

      Here is what it looks like when you submit the script at your prompt:

       [xhe@opensub00 91] qsub openmpisub.sh
      133.opensrv.umiacs.umd.edu
       [xhe@opensub00 92] qstat -n
      
      opensrv.umiacs.umd.edu: 
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      133.opensrv.umiacs.u xhe      dque     openmpisub  16472     8  --    --  48:00 R   -- 
         openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
         +openlab06/0+openlab07/0
       [xhe@opensub00 93]
      

      Output files for this job: openmpisub.sh.o133 and openmpisub.sh.e133

    • MPICH1

      To compile and run this program under MPICH1, you need to set up your environment.

      The following script (csh/tcsh syntax) will set the appropriate environment.
      setenv MPI_ROOT /usr/local/stow/mpich-version
      setenv MPI_LIB  $MPI_ROOT/lib
      setenv MPI_INC  $MPI_ROOT/include
      setenv MPI_BIN $MPI_ROOT/bin
      # add MPICH commands to your path (includes mpirun and mpicc)
      set path=($MPI_BIN $path)
      # add the MPICH libraries to your LD_LIBRARY_PATH
      if ( $?LD_LIBRARY_PATH ) then
           setenv LD_LIBRARY_PATH  $MPI_LIB:$LD_LIBRARY_PATH
      else
           setenv LD_LIBRARY_PATH  $MPI_LIB
      endif
      

      It can be compiled by doing: mpicc alltoall.c -o alltoall-mpich1 (remember we changed our environment to point to MPICH's mpicc)

      The submission file mpich1sub.sh is almost the same except you need to call mpirun instead of mpiexec.
      #!/bin/bash
      # Special PBS control comments
      #PBS -l nodes=8,walltime=60
      
      # Set up the path
      PATH=/usr/local/stow/mpichgm-1.2.7p1-20/bin:$PATH
      export PATH
      
      cd ~/mpich1test/
      echo $PBS_NODEFILE
      # Run the program
      mpirun -np $( wc -l < $PBS_NODEFILE ) ./alltoall-mpich1
      
      

      Here is what it looks like when you submit the job from a submit machine at your prompt:

      
      [xhe@brood00 ~/mpich1test]$ qsub mpich1sub.sh
      167.queen.umiacs.umd.edu
      [xhe@brood00 ~/mpich1test]$ qstat -n
      
      queen.umiacs.umd.edu: 
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      167.queen.umiacs.umd xhe      dque     mpich1sub.    --      8  --    --  04:00 R   -- 
         bug00/0+bug01/0+bug02/0+bug03/0+bug04/0+bug05/0+bug06/0+bug07/0
      [xhe@brood00 ~/mpich1test]$ 
      

      Output files for this job: mpich1sub.sh.o167 and mpich1sub.sh.e167 (empty)

    • MPICH2

      To compile using MPICH2, you need to set up your environment for it.

      You need to set up your path variables to include the MPICH2 version you want to use. You will make two changes: one to the PATH variable and one to LD_LIBRARY_PATH.

      The following example uses version MPICH2-1.0.7

      For a bash shell user, append the following to your .bash_profile:
          export MPICH2_HOME=/usr/local/stow/mpich2-1.0.7
          export PATH=$MPICH2_HOME/bin:$PATH
      
          export LD_LIBRARY_PATH=$MPICH2_HOME/lib:$LD_LIBRARY_PATH
      
      For a C shell user, append the following in your .cshrc:
          setenv MPICH2_HOME /usr/local/stow/mpich2-1.0.7
          setenv PATH $MPICH2_HOME/bin:$PATH
      
          setenv LD_LIBRARY_PATH $MPICH2_HOME/lib:$LD_LIBRARY_PATH
      
      

      The sample C code can be compiled by doing: mpicc alltoall.c -o alltoall-mpich2 (we use MPICH2's mpicc)

      Here is a sample submission file for MPICH2: mpich2sub.sh.
      #!/bin/bash
      
      #PBS -lwalltime=0:10:0
      #PBS -lnodes=8
      
      # Set up the path
      export MPICH2_HOME=/usr/local/stow/mpich2-1.0.7
      export PATH=$MPICH2_HOME/bin:$PATH
      
      echo starting
      mpiexec -n 8   /nfshomes/xhe/torquejobs/mpich2test/alltoall-mpich2
      echo ending
      

      Before you submit your job to the cluster, you need to do the following to start the mpd daemons, which must be running on each compute node used by your program.

      Make sure you have a file .mpd.conf in your home directory, with a line like this:
       secretword=your-favorite-word
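
       For example (a sketch; mpd generally refuses to start if this file is readable by other users, so restrict its permissions):

       # Create the config file and make it readable only by you.
       echo "secretword=your-favorite-word" > ~/.mpd.conf
       chmod 600 ~/.mpd.conf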
       Create a hostfile for the mpd daemons in a directory that you can reference. It lists the compute nodes on which you want daemons started. List the node names, one per line, as follows:
       openlab00
       openlab01
       openlab02
       ...
        openlab07

      Then start the mpd daemons by typing:

       mpdboot -n #ofnodes -f path-to-hostfile/hostfile
      (You will need to run mpdallexit later to shut down the daemons after your job has finished.)

      After the daemons have started, you can submit the above mpich2sub.sh script using the qsub command.
      Here is what it looks like when you submit from your prompt:

      [xhe@opensub01 68] mpdboot -n 8 -f mpd.hostfile
      [xhe@opensub01 69] qsub mpich2sub.sh
      140.opensrv.umiacs.umd.edu
      [xhe@opensub01 70] qstat -n
      
      opensrv.umiacs.umd.edu:
                                                                        Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      140.opensrv.umiacs.u xhe      dque     mpich2sub.   3403     8  --    --  48:00 R   --
        openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
        +openlab06/0+openlab07/0
      [xhe@opensub01 71] qstat -n
      [xhe@opensub01 72] mpdallexit
      [xhe@opensub01 73] 
      
      

      Here are the standard output and standard error for this job: mpich2sub.sh.o140 and mpich2sub.sh.e140 (empty)

    Please note that if you compile your program with MPICH, LAM, or OpenMPI, you MUST execute it in the same environment. If you compile your program using mpicc from LAM and then attempt to run it using MPICH's mpiexec, it will fail and you will get an error message similar to the following:
    It seems that there is no lamd running on the host openlab02.umiacs.umd.edu.
     
    This indicates that the LAM/MPI runtime environment is not operating.
    The LAM/MPI runtime environment is necessary for MPI programs to run
    (the MPI program tried to invoke the "MPI_Init" function).
     
    Please run the "lamboot" command the start the LAM/MPI runtime
    environment.  See the LAM/MPI documentation for how to invoke
    "lamboot" across multiple machines.
    

  • Commands

    Please make sure /opt/UMtorque/bin is in your PATH environment variable.


    • qsub
      • Basic usage

        The qsub program is the mechanism for submitting a job. A job is a shell script, taken either from standard input or as an argument on the command line.

        The basic syntax of qsub, that you will probably be using
        most of the time, is:

        qsub -l nodes=<nodes> <scriptname>

        where <nodes> is the number of machines you'd like to allocate.
        Then, when PBS runs your job, the name of the file with the nodes
        allocated to you will be in $PBS_NODEFILE, and PBS will begin
        running your job on one single node from that allocation.

        When you run qsub, you will get a message like:

        123.opensrv.umiacs.umd.edu

        This is your job id. This is used for many things, and you should
        probably keep a record of it.

        When a job finishes, PBS deposits the standard output and standard
        error as <jobname>.o<number> and
        <jobname>.e<number>, where
        <jobname> is the name of the script you submitted (or
        STDIN if it came from qsub's standard in), and <number>
        is the leading number in the job id.

      • -l option

        The -l option is used to specify resources used by a PBS job.
        Two important ones are nodes, which specifies the number of nodes
        used, and walltime, which specifies the maximum amount of
        wall clock time that the process will use. The following invocation
        of qsub runs a job on 2 nodes for one hour:

        qsub -l nodes=2,walltime=01:00:00

        It is important that you specify walltime. Without it, your
        job may be scheduled unfavorably (because your job takes less than the
        thirty minute default). Even worse, your job may be terminated
        prematurely if you go over the thirty minute default.

        See pbs_resources(7) for more information.

        • The nodes resource

          In addition to specifying the number of nodes in a job, you can also
          use the nodes resource to specify features required for your job.

        • Submitting to specific nodes

          To submit to a specific set of nodes, you can specify those nodes,
          separated by a "+" character, in the nodes resources. For
          instance:

          qsub -l nodes=openlab00+openlab01,walltime=60

          ... will submit a two node job on openlab00 and openlab01,
          with a maximum time of sixty seconds.

          In general, this should be avoided, since you are limited to the
          nodes that you specify. However, if you have files that reside
          only on particular nodes (for instance, in their scratch space),
          you might want to use this option.

      • -I option

        To submit an interactive job, use the -I option:

        qsub -l <resources> -I

        Then, instead of enqueuing a batch job and exiting, the qsub
        program will wait until your interactive job runs. When it does, PBS
        will present you with a shell on one of the nodes that you have been
        allocated. You can then use all nodes allocated, until your time
        allocation is consumed.
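
        For example, to ask for a two-node interactive session that is
        limited to one hour:

        qsub -I -l nodes=2,walltime=01:00:00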

      • Extended job descriptions

        The qsub program will let you put information about your job
        in your script, by including comments that begin with '#PBS'
        and contain a single command line option. For instance, if I always
        want my job to use two nodes, I could put the following at the
        beginning of my script:

        #PBS -l nodes=2

        The "EXTENDED DESCRIPTION" heading in qsub(1) has
        more information about using this feature.
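
        As a minimal sketch, a script that carries its own resource request
        might start like this (the -N and -j oe options are standard qsub
        options for naming the job and merging stderr into the stdout file;
        they are shown here only as an illustration):

        #!/bin/bash
        # Name the job (as shown by qstat) and merge stderr into stdout.
        #PBS -N mytest
        #PBS -j oe
        #PBS -l nodes=2,walltime=00:30:00

        echo running on $(hostname)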

    • qstat

      This program tells you the status of your jobs and other people's jobs.
      The basic case of running qstat is very simple: you just run
      qstat, with no options. If it gives no output, it means
      there are no jobs in the queue.
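
      A few common invocations (the -n and -f flags are standard TORQUE
      qstat options; treat the exact output layout as site-dependent):

      qstat            # one line per job currently in the queue
      qstat -n         # also list the nodes assigned to each running job
      qstat -f 123     # full details for job 123, including its Resource_List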

    • qdel

      The qdel program is used to remove your job from the queue,
      and cancel it if it's running. The syntax for qdel is "qdel <job id>",
      but you can abbreviate the job ID with just the leading number.
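
      For example, to cancel job 123.opensrv.umiacs.umd.edu:

      qdel 123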

    • pbsnodes

      The pbsnodes command is used to list nodes and their status. You will
      probably only use this one way, with the "-a" argument:

      pbsnodes -a

    • pbsdsh

      Runs a shell command on all of the nodes allocated to your job.
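
      For example, from inside a submission script (a sketch; pbsdsh only
      works from within a running job, since it uses the job's node
      allocation):

      #!/bin/bash
      #PBS -l nodes=3,walltime=5:00
      # Print the hostname of every node allocated to this job.
      pbsdsh /bin/hostname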


    For more information, see the man pages for PBS commands. If, for some reason, these can't be viewed with the default manpath, you can use:

    man -M /opt/UMtorque/man <topic>

  • Compilers

    The following compilers and memory analysis tools are available at UMIACS.


    • GNU compilers

      Besides the default GNU C/C++ and Fortran compilers, UMIACS also has several versions of gcc installed in the /usr/local/stow/gcc-version directories.

    • PGI compiler

      The Portland Group C and Fortran compilers are installed in the /opt/pgi directory.

    • Intel

      Intel C and Fortran compilers are installed at /opt/intel.

    • NAGWare

      UMIACS has the NAGWare Fortran compiler; it is installed in the /opt/NAGWare_f95 directory.

    • Insure++

      Parasoft Insure++ is a runtime analysis and memory error detection tool for C and C++. It is installed in the /opt/insure directory.

    To find other software installed at UMIACS, please check the /usr/local/stow and /opt directories.

 

UMIACS Condor pool information can be found at:

condorintro.html

 

 
