TORQUE/Maui Cluster

TORQUE stands for Terascale Open-source Resource and QUEue Manager. It is an open-source distributed resource manager originally based on OpenPBS, the Portable Batch System (PBS). Our installation is used for running parallel jobs and making use of dedicated reservations. We use a separate program, Maui, to schedule jobs in TORQUE.

TORQUE and Maui are installed at /opt/UMtorque and /opt/UMmaui respectively. Please make sure /opt/UMtorque/bin and /opt/UMmaui/bin are added to your PATH environment variable.

The cluster is composed of a frontend (the submit nodes) and compute nodes. The frontend is to be used for editing your code, compiling, and submitting jobs. To run any processing or testing of your code, you must submit it through the scheduler.

The scheduler takes care of assigning compute nodes to jobs. When you are assigned a node, you will be the only person on it for the duration of your job. After your time limit is up or your process ends, the node is cleaned and locked down for the next submission.

  • Logging in

    To gain access to any of the nodes on a cluster, you will first need to log into one of the submit nodes using ssh. These machines act as a gateway to the rest of the cluster. No intensive processing is to be run on the submit nodes; they are shared with everyone else using the cluster across various research projects throughout the institute. If you run an intensive process on a submit node, it will be killed so that other research is not affected.

    More information about UMIACS cluster submit nodes and compute nodes can be found here

  • Setting up your environment

    After you are logged in, you will have to set up your account so that PBS can access it from any of the compute nodes. This is required because PBS writes stdout and stderr to files in your account. Use ssh-keygen with no passphrase to create keypairs that can be used to grant access for your jobs. These can be generated by running the following:

    
    cd $HOME
    ssh-keygen -t rsa1 -N ""  -f $HOME/.ssh/identity
    ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
    ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
    cd .ssh
    touch authorized_keys authorized_keys2
    cat identity.pub >> authorized_keys
    cat id_rsa.pub id_dsa.pub >> authorized_keys2
    chmod 640 authorized_keys authorized_keys2
    
    To test your keys, you should be able to run 'ssh submitnode' and be returned to a prompt without being asked for a password.
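
    As a quick sanity check (a sketch, assuming the 'submitnode' hostname above resolves for you), the following should print a hostname without a password prompt; BatchMode makes ssh fail instead of prompting if the keys are not being picked up:

    # Should print the submit node's hostname with no password prompt.
    ssh -o BatchMode=yes submitnode hostname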

  • Requesting interactive usage

    Sometimes you will want to test an intensive program without preparing a submission script and going through the hassle of the scheduler. You can run '/opt/UMtorque/bin/qsub -I' to request interactive usage on a node. After running qsub -I, your shell will hang until a resource can be allocated to you. When the resource has been allocated, a new shell is opened on the allocated node. You can then also ssh into the node for the duration of the allocated shell. When you log out from the initial shell, or your time limit is up, the node will again be locked down and you will have to ask the scheduler for access again.

    Below is an example of getting an interactive session:

     [xhe@opensub01 24] qsub -I
    qsub: waiting for job 152.opensrv.umiacs.umd.edu to start
    qsub: job 152.opensrv.umiacs.umd.edu ready
    
     [xhe@openlab00 21] echo hello
    hello
     [xhe@openlab00 22] exit
    logout
    
    qsub: job 152.opensrv.umiacs.umd.edu completed
     [xhe@opensub01 25] 
  • Running your first job

    We will walk through a simple 'hello world' submission script that will help you understand how submitting jobs works.

    1. Create a submission file
      In your home directory on a submit node, create a file called test.sh that contains the following:
      #!/bin/bash
      #PBS -lwalltime=10:00
      #PBS -lnodes=3
      
      echo hello world
      hostname
      echo finding each node I have access to
      for node in `cat ${PBS_NODEFILE}` ; do
       echo ----------
       /usr/bin/ssh $node hostname
       echo ---------- 
      done
      
      

      The script is a normal shell script except that it includes extra #PBS directives. These directives control how you request resources on the cluster. In this case we are requesting 10 minutes of total node time split across 3 nodes, so each node is available to you for roughly a third of that time. People often forget to specify walltime for jobs over 2 nodes. The default walltime is 48 hrs/node, so requesting 3 nodes without a walltime will try to schedule 144 hours of cluster time, which exceeds the maximum allowed.
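
      The same resource request can also be given on the qsub command line instead of (or in addition to) the #PBS lines. For example:

      qsub -l nodes=3,walltime=10:00 test.sh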

    2. Submit the job to the scheduler using /opt/UMtorque/bin/qsub
      [xhe@opensub00 28]$ /opt/UMtorque/bin/qsub test.sh
      123.opensrv.umiacs.umd.edu
      

      You can check the status of your job by running /opt/UMtorque/bin/qstat

      [xhe@opensub00 29]$ /opt/UMtorque/bin/qstat -n
      
      
      opensrv.umiacs.umd.edu: 
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      123.opensrv.umiacs.u xhe      dque     test.sh       --      3  --    --  48:00 R   -- 
         openlab00/0+openlab01/0+openlab02/0
       [opensub00 30]
      

      This shows us that the job is running ('R') and is using nodes openlab00, openlab01, and openlab02. A status of 'Q' means that your job is waiting in line for resources to free up. If you requested more resources than the cluster can provide, your job will sit in the queue indefinitely.
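
      Since Maui does the scheduling on this cluster, its client commands can help explain why a job is still queued. A minimal sketch, assuming the Maui clients in /opt/UMmaui/bin are in your PATH as described above:

      # Show the queue as Maui sees it (running, idle, and blocked jobs).
      showq
      # Ask Maui why a specific job has not started yet; use the leading
      # number from the job id that qsub printed.
      checkjob 123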

    3. Check the output

      When your job is finished, you will have two files in the directory you submitted the job from. They contain stdout (<jobname>.oJOBID) and stderr (<jobname>.eJOBID).

      The job we submitted above generated an empty error file test.sh.e123 and the following stdout file:

      [xhe@opensub00 30]$ cat test.sh.o123 
      Warning: no access to tty (Bad file descriptor).
      Thus no job control in this shell.
      hello world
      openlab00.umiacs.umd.edu
      finding each node I have access to
      ----------
      openlab00.umiacs.umd.edu
      ----------
      ----------
      openlab01.umiacs.umd.edu
      ----------
      ----------
      openlab02.umiacs.umd.edu
      ----------
       [xhe@opensub00 31]
      

      The warning lines at the top of your output are a standard part of how we have our cluster configured and do not affect how your program runs.

  • Running MPI programs as batch jobs

    At UMIACS, we have LAM, OpenMPI, MPICH1, and MPICH2 installed. LAM is installed at /usr/local/stow/lam-version; MPICH1 is available in /usr/local/stow/mpich1-version; MPICH2 is available in /usr/local/stow/mpich2-version; OpenMPI is available in /usr/local/stow/openmpi-version.

    First, you need to have an MPI-based program written. Here's a simple one:

    alltoall.c

    • LAM

      To compile this program and run it under LAM, make sure /usr/local/stow/lam-7.1.4/bin is in your PATH environment variable.

      It can be compiled by doing: mpicc alltoall.c -o alltoall-lam
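
      For example (a sketch assuming a bash shell and the lam-7.1.4 install named above), you can put LAM first in your PATH and confirm which mpicc you are picking up before compiling:

      export PATH=/usr/local/stow/lam-7.1.4/bin:$PATH
      which mpicc   # should report /usr/local/stow/lam-7.1.4/bin/mpicc
      mpicc alltoall.c -o alltoall-lam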

      The following submission file, lamsub.sh, can be submitted to run your program:

      #!/bin/bash
      #PBS -l nodes=8
      #PBS -l walltime=0:10:0
      cd ~/torquejobs/lamtest
      mpiexec -machinefile ${PBS_NODEFILE} alltoall-lam 
      

      Here is what it looks like on your terminal after submitting the job:

      [opensub00 142] qsub lamsub.sh
      127.opensrv.umiacs.umd.edu
       [opensub00 143] qstat -n
      
      opensrv.umiacs.umd.edu:
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      127.opensrv.umiacs.u xhe      dque     lamsub.sh     --      8  --    --  48:00 R   --
         openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
         +openlab06/0+openlab07/0
       [opensub00 144]
      

      Output files for this job: lamsub.sh.o127 and lamsub.sh.e127 (empty)

      The submission file lamsub2.sh uses mpirun instead of mpiexec, which requires the user to set up the MPI environment by starting lamboot and then running lamhalt to stop it afterwards; a sketch of such a script is shown below. We recommend using mpiexec, since it sets up the MPI runtime environment for your job.
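
      A sketch of what such a script might look like (hypothetical, shown only to illustrate the extra lamboot/lamhalt bookkeeping that mpiexec otherwise does for you):

      #!/bin/bash
      #PBS -l nodes=8
      #PBS -l walltime=0:10:0
      cd ~/torquejobs/lamtest
      # Start the LAM runtime on the nodes PBS allocated to this job...
      lamboot ${PBS_NODEFILE}
      # ...run the program across those nodes...
      mpirun -np 8 ./alltoall-lam
      # ...and shut the LAM runtime down again.
      lamhalt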

    • OpenMPI

      To compile and run this program using OpenMPI, you need to include the bin directory of /usr/local/stow/openmpi-version in your PATH. The following example uses /usr/local/stow/openmpi-1.2.6.

      The following script (csh/tcsh syntax) will set up your path variables.
       setenv PATH /usr/local/stow/openmpi-1.2.6/bin:$PATH
       if ( $?LD_LIBRARY_PATH ) then
            setenv LD_LIBRARY_PATH /usr/local/stow/openmpi-1.2.6/lib:$LD_LIBRARY_PATH
       else
            setenv  LD_LIBRARY_PATH /usr/local/stow/openmpi-1.2.6/lib
       endif 

      The sample C code can be compiled by doing: mpicc alltoall.c -o alltoall-openmpi (we changed our environment to point to OpenMPI's mpicc)

      The following is the submission file openmpisub.sh
      #!/bin/bash
      # Special PBS control comments
      #PBS -l nodes=8,walltime=0:10:0
      
      # Set up the path
      export PATH=/usr/local/stow/openmpi-1.2.6/bin:$PATH
      export LD_LIBRARY_PATH=/usr/local/stow/openmpi-1.2.6/lib:$LD_LIBRARY_PATH
      
      cd ~/torquejobs/openmpitest
      echo starting
      mpiexec -mca btl tcp,self -n 8 ./alltoall-openmpi
      echo ending
      

      Here is what it looks like when you submit the script at your prompt:

       [xhe@opensub00 91] qsub openmpisub.sh
      133.opensrv.umiacs.umd.edu
       [xhe@opensub00 92] qstat -n
      
      opensrv.umiacs.umd.edu: 
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      133.opensrv.umiacs.u xhe      dque     openmpisub  16472     8  --    --  48:00 R   -- 
         openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
         +openlab06/0+openlab07/0
       [xhe@opensub00 93]
      

      Output files for this job: openmpisub.sh.o133 and openmpisub.sh.e133

    • MPICH1

      To compile and run this program under MPICH1, you need to set up your environment.

      The following script (csh/tcsh syntax) will set the appropriate environment.
      setenv MPI_ROOT /usr/local/stow/mpich-version
      setenv MPI_LIB  $MPI_ROOT/lib
      setenv MPI_INC  $MPI_ROOT/include
      setenv MPI_BIN $MPI_ROOT/bin
      # add MPICH commands to your path (includes mpirun and mpicc)
      set path=($MPI_BIN $path)
      # add the MPICH libraries to your LD_LIBRARY_PATH
      if ( $?LD_LIBRARY_PATH ) then
           setenv LD_LIBRARY_PATH  $MPI_LIB:$LD_LIBRARY_PATH
      else
           setenv LD_LIBRARY_PATH  $MPI_LIB
      endif
      

      It can be compiled by doing: mpicc alltoall.c -o alltoall-mpich1 (remember we changed our environment to point to MPICH's mpicc)

      The submission file mpich1sub.sh is almost the same except you need to call mpirun instead of mpiexec.
      #!/bin/bash
      # Special PBS control comments
      #PBS -l nodes=8,walltime=60
      
      # Set up the path
      PATH=/usr/local/stow/mpichgm-1.2.7p1-20/bin:$PATH
      export PATH
      
      cd ~/mpich1test/
      echo $PBS_NODEFILE
      # Run the program
      mpirun -np $( wc -l < $PBS_NODEFILE ) ./alltoall-mpich1
      
      

      Here is what it looks like when you submit the job from a submit machine at your prompt:

      
      [xhe@brood00 ~/mpich1test]$ qsub mpich1sub.sh
      167.queen.umiacs.umd.edu
      [xhe@brood00 ~/mpich1test]$ qstat -n
      
      queen.umiacs.umd.edu: 
                                                                         Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      167.queen.umiacs.umd xhe      dque     mpich1sub.    --      8  --    --  04:00 R   -- 
         bug00/0+bug01/0+bug02/0+bug03/0+bug04/0+bug05/0+bug06/0+bug07/0
      [xhe@brood00 ~/mpich1test]$ 
      

      Output files for this job: mpich1sub.sh.o167 and mpich1sub.sh.e167 (empty)

    • MPICH2

      To compile using MPICH2, you need to set up your environment for it.

      You need to set up your path variables to include the MPICH2 version you want to use. You will make two changes: one to the PATH variable and one to LD_LIBRARY_PATH.

      The following example uses version MPICH2-1.0.7

      For a bash shell user, append the following to your .bash_profile:
          export MPICH2_HOME=/usr/local/stow/mpich2-1.0.7
          export PATH=$MPICH2_HOME/bin:$PATH
      
          export LD_LIBRARY_PATH=$MPICH2_HOME/lib:$LD_LIBRARY_PATH
      
      For a C shell user, append the following in your .cshrc:
          setenv MPICH2_HOME /usr/local/stow/mpich2-1.0.7
          setenv PATH $MPICH2_HOME/bin:$PATH
      
          setenv LD_LIBRARY_PATH $MPICH2_HOME/lib:$LD_LIBRARY_PATH
      
      

      The sample C code can be compiled by doing: mpicc alltoall.c -o alltoall-mpich2 (we use MPICH2's mpicc)

      Here is a sample submission file for MPICH2: mpich2sub.sh.
      #!/bin/bash
      
      #PBS -lwalltime=0:10:0
      #PBS -lnodes=8
      
      # Set up the path
      export MPICH2_HOME=/usr/local/stow/mpich2-1.0.7
      export PATH=$MPICH2_HOME/bin:$PATH
      
      echo starting
      mpiexec -n 8   /nfshomes/xhe/torquejobs/mpich2test/alltoall-mpich2
      echo ending
      

      Before you submit your job to the cluster, you need to do the following to start the mpd daemons, which must be running on each compute node used by your program.

      Make sure you have a file .mpd.conf in your home directory, with a line like this:
       secretword=your-favorite-word
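
       For example (a sketch; mpd generally refuses to start if this file is readable by other users, so restrict its permissions):

       # Create the config file and make it readable only by you.
       echo "secretword=your-favorite-word" > ~/.mpd.conf
       chmod 600 ~/.mpd.conf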
       Create a hostfile for the mpd daemons in a directory that you can reference. It lists the compute nodes on which you want daemons started. List the node names, one per line, as follows:
       openlab00
       openlab01
       openlab02
       ...
        openlab07

      Then start the mpd daemons by typing:

       mpdboot -n #ofnodes -f path-to-hostfile/hostfile
      (You will need to run mpdallexit later to shut down the daemons after your job has finished.)

      After the daemons have started, you can submit the above mpich2sub.sh script using the qsub command.
      Here is what it looks like when you submit from your prompt:

      [xhe@opensub01 68] mpdboot -n 8 -f mpd.hostfile
      [xhe@opensub01 69] qsub mpich2sub.sh
      140.opensrv.umiacs.umd.edu
      [xhe@opensub01 70] qstat -n
      
      opensrv.umiacs.umd.edu:
                                                                        Req'd  Req'd   Elap
      Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
      -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
      140.opensrv.umiacs.u xhe      dque     mpich2sub.   3403     8  --    --  48:00 R   --
        openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
        +openlab06/0+openlab07/0
      [xhe@opensub01 71] qstat -n
      [xhe@opensub01 72] mpdallexit
      [xhe@opensub01 73] 
      
      

      Here are the standard output and standard error for this job: mpich2sub.sh.o140 and mpich2sub.sh.e140 (empty)

    Please note that if you compile your program with MPICH, LAM, or OpenMPI, you MUST execute it in the same environment. If you compile your program using mpicc from LAM and then attempt to run it using MPICH's mpiexec, it will fail and you will get an error message similar to the following:
    It seems that there is no lamd running on the host openlab02.umiacs.umd.edu.
     
    This indicates that the LAM/MPI runtime environment is not operating.
    The LAM/MPI runtime environment is necessary for MPI programs to run
    (the MPI program tried to invoke the "MPI_Init" function).
     
    Please run the "lamboot" command the start the LAM/MPI runtime
    environment.  See the LAM/MPI documentation for how to invoke
    "lamboot" across multiple machines.
    

  • Commands

    Please make sure /opt/UMtorque/bin is in your PATH environment variable.


    • qsub
      • Basic usage

        The qsub program is the mechanism for submitting a job. A job is a shell script, taken either from standard input or as an argument on the command line.

        The basic syntax of qsub, that you will probably be using
        most of the time, is:

        qsub -l nodes=<nodes> <scriptname>

        where <nodes> is the number of machines you'd like to allocate.
        Then, when PBS runs your job, the name of the file with the nodes
        allocated to you will be in $PBS_NODEFILE, and PBS will begin
        running your job on one single node from that allocation.

        When you run qsub, you will get a message like:

        123.opensrv.umiacs.umd.edu

        This is your job id. This is used for many things, and you should
        probably keep a record of it.

        When a job finishes, PBS deposits the standard output and standard
        error as <jobname>.o<number> and
        <jobname>.e<number>, where
        <jobname> is the name of the script you submitted (or
        STDIN if it came from qsub's standard in), and <number>
        is the leading number in the job id.

      • -l option

        The -l option is used to specify resources used by a PBS job.
        Two important ones are nodes, which specifies the number of nodes
        used, and walltime, which specifies the maximum amount of
        wall clock time that the process will use. The following invocation
        of qsub runs a job on 2 nodes for one hour:

        qsub -l nodes=2,walltime=01:00:00

        It is important that you specify walltime. Without it, your
        job may be scheduled unfavorably (because your job takes less than the
        thirty minute default). Even worse, your job may be terminated
        prematurely if you go over the thirty minute default.

        See pbs_resources(7) for more information.

        • The nodes resource

          In addition to specifying the number of nodes in a job, you can also
          use the nodes resource to specify features required for your job.

        • Submitting to specific nodes

          To submit to a specific set of nodes, you can specify those nodes,
          separated by a "+" character, in the nodes resources. For
          instance:

          qsub -l nodes=openlab00+openlab01,walltime=60

          ... will submit a two node job on openlab00 and openlab01,
          with a maximum time of sixty seconds.

          In general, this should be avoided, since you are limited to the
          nodes that you specify. However, if you have files that reside
          only on particular nodes (for instance, in their scratch space),
          you might want to use this option.

      • -I option

        To submit an interactive job, use the -I option:

        qsub -l <resources> -I

        Then, instead of enqueuing a batch job and exiting, the qsub
        program will wait until your interactive job runs. When it does, PBS
        will present you with a shell on one of the nodes that you have been
        allocated. You can then use all nodes allocated, until your time
        allocation is consumed.
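
        For example, to ask for a two-node interactive session that is
        limited to one hour:

        qsub -I -l nodes=2,walltime=01:00:00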

      • Extended job descriptions

        The qsub program will let you put information about your job
        in your script, by including comments that begin with '#PBS'
        and contain a single command line option. For instance, if I always
        want my job to use two nodes, I could put the following at the
        beginning of my script:

        #PBS -l nodes=2

        The "EXTENDED DESCRIPTION" heading in qsub(1) has
        more information about using this feature.
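
        As a minimal sketch, a script that carries its own resource request
        might start like this (the -N and -j oe options are standard qsub
        options for naming the job and merging stderr into the stdout file;
        they are shown here only as an illustration):

        #!/bin/bash
        # Name the job (as shown by qstat) and merge stderr into stdout.
        #PBS -N mytest
        #PBS -j oe
        #PBS -l nodes=2,walltime=00:30:00

        echo running on $(hostname)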

    • qstat

      This program tells you the status of your jobs and other people's jobs.
      The basic case of running qstat is very simple: you just run
      qstat, with no options. If it gives no output, it means
      there are no jobs in the queue.
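
      A few common invocations (the -n and -f flags are standard TORQUE
      qstat options; treat the exact output layout as site-dependent):

      qstat            # one line per job currently in the queue
      qstat -n         # also list the nodes assigned to each running job
      qstat -f 123     # full details for job 123, including its Resource_List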

    • qdel

      The qdel program is used to remove your job from the queue,
      and cancel it if it's running. The syntax for qdel is "qdel <job id>",
      but you can abbreviate the job ID with just the leading number.
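
      For example, to cancel job 123.opensrv.umiacs.umd.edu:

      qdel 123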

    • pbsnodes

      The pbsnodes command is used to list nodes and their status. You will
      probably only use this one way, with the "-a" argument:

      pbsnodes -a

    • pbsdsh

      Runs a shell command on all of the nodes allocated to your job.
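
      For example, from inside a submission script (a sketch; pbsdsh only
      works from within a running job, since it uses the job's node
      allocation):

      #!/bin/bash
      #PBS -l nodes=3,walltime=5:00
      # Print the hostname of every node allocated to this job.
      pbsdsh /bin/hostname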


    For more information, see the man pages for PBS commands. If, for some reason, these can't be viewed with the default manpath, you can use:

    man -M /opt/UMtorque/man <topic>

  • Compilers

    The following compilers and memory analysis tools are available at UMIACS.


    • GNU compilers

      Besides the default GNU C/C++ and Fortran compilers, UMIACS also has several versions of gcc installed in the /usr/local/stow/gcc-version directories.

    • PGI compiler

      The Portland Group C and Fortran compilers are installed in the /opt/pgi directory.

    • Intel

      Intel C and Fortran compilers are installed at /opt/intel.

    • NAGWare

      UMIACS has the NAGWare Fortran compiler; it is installed in the /opt/NAGWare_f95 directory.

    • Insure++

      Parasoft Insure++ is a runtime analysis and memory error detection tool for C and C++. It is installed in the /opt/insure directory.

    To find other software installed at UMIACS, please check the /usr/local/stow and /opt directories.

 

UMIACS Condor pool information can be found at:

condorintro.html

 

 
