What you need to know to run your project

Developing projects on the red/blue cluster is a bit different from developing and testing a project on the CSIC or Detective cluster. Unlike on those clusters, resources here are managed by a scheduler and access is more restricted.

The cluster is composed of a frontend (redleader) and 27 processing nodes. The frontend is to be used for editing your code, doing light compiles and such. To run any processing or testing of your code, you must submit it through the scheduler.

The scheduler takes care of assigning processing nodes to jobs. Basically, when you get assigned a node, you will be the only person on it for the duration of your job. After your time limit is up or your process ends, the node will be cleaned and locked down for the next submission.

  • Logging in

    To gain access to any of the nodes you will first need to log into redleader.umiacs.umd.edu using ssh. This machine acts as a gateway to the rest of the cluster. No intensive processing is to be run on redleader. This machine is shared with every other person in the class and in various research projects throughout the institute. If you run an intensive process on redleader, it will be killed so other research will not be affected.
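
    For example, from your own workstation you would connect as follows (using the example account toaster that appears throughout this page):

    ssh toaster@redleader.umiacs.umd.edu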

  • Changing your password

    The UMIACS cluster is part of a larger DCE/Kerberos installation. Unfortunately, it's not possible to change your password on any of the cluster's Linux machines at this time. You will have to ssh into odin.cfar.umd.edu, mellum.cfar.umd.edu, or fenris.cfar.umd.edu and run the 'passwd' command to change your password.
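
    For example (again using the example account toaster):

    ssh toaster@odin.cfar.umd.edu
    passwd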

  • Setting up your environment

    After you are logged in, you will have to set your account up to allow PBS to access it from any of the processing nodes. This is required since PBS writes stdout and stderr to files in your home directory. Use ssh-keygen with an empty passphrase to create keypairs that can be used to grant access for your jobs. These can be generated by running the following:

    
    cd $HOME
    # generate SSH protocol 1 and protocol 2 keypairs with empty passphrases
    ssh-keygen -t rsa1 -N "" -f $HOME/.ssh/identity
    ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
    ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
    cd .ssh
    # authorize the new keys for logins to your own account
    touch authorized_keys authorized_keys2
    cat identity.pub >> authorized_keys
    cat id_rsa.pub id_dsa.pub >> authorized_keys2
    chmod 640 authorized_keys authorized_keys2
    
    To test your keys, you should be able to 'ssh redleader' and be returned to a prompt without being asked for a password.
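
    A quick way to check that the keys are being used is to run a single command over ssh and confirm you are not prompted for a password:

    ssh redleader hostname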

  • Requesting interactive usage

    Sometimes you will want to test an intensive program without preparing a submission script and going through the hassle of the scheduler. You can run '/opt/UMtorque/bin/qsub -I' to request interactive usage on a node. After running qsub -I your shell will hang until a resource can be allocated to you. When the resource has been allocated, it will open up a new shell on the allocated node. You can now ssh into the node for the duration of the allocated shell. When you log out from the initial shell, or your time limit is up, the node will again be locked down and you will have to ask the scheduler for access again.
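
    For example, the following request asks for a single node for 30 minutes (the resource limits here are only an illustration; they use the same -l syntax as the submission scripts below):

    /opt/UMtorque/bin/qsub -I -l nodes=1,walltime=30:00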

  • Running your first job

    We know that your project is likely to require MPI, PVM, or parallel libraries, but walking through a simple 'hello world' submission script will help you understand how submitting jobs works a bit better.

    1. Create a submission file
      In your home directory on redleader, create a file called test.sh that contains the following:
      #!/bin/bash
      #PBS -lwalltime=10:00
      #PBS -lnodes=3
      
      echo hello world
      hostname
      echo finding each node I have access to
      for node in `cat ${PBS_NODEFILE}` ; do
       echo ----------
       /usr/bin/ssh $node hostname
       echo ---------- 
      done
      
      

      The script is a normal shell script except that it includes extra #PBS directives. These directives control how you request resources on the cluster. In this case we are requesting 10 minutes of total node time split across 3 nodes, so each node will be yours for roughly 3 minutes and 20 seconds. People often forget to specify a walltime for jobs that use more than 2 nodes. The default walltime is 48 hours per node, so requesting 3 nodes without a walltime will try to schedule 144 hours of cluster time, which exceeds the maximum allowed.
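
      For example, if you want each of 3 nodes for a full hour, request the total explicitly rather than relying on the default (the values below are only an illustration of the total-time-split-across-nodes convention described above):

      #PBS -l walltime=3:00:00
      #PBS -l nodes=3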

    2. Submit the job to the scheduler using /opt/UMtorque/bin/qsub
      [toaster@redleader ~]$ /opt/UMtorque/bin/qsub test.sh
      20483.rogueleader.umiacs.umd.edu
      

      You can check the status of your job by running /opt/UMtorque/bin/qstat

      [toaster@redleader ~]$ /opt/UMtorque/bin/qstat -n
      
      rogueleader.umiacs.umd.edu: 
                                                                  Req'd  Req'd   Elap
      Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
      --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
      20483.roguelead toaster  dque     test.sh      8210   3  --    --  48:00 R   -- 
         red03/0+blue12/0+blue11/0
      

      This shows us that the job is running (status 'R') and is using nodes red03, blue12, and blue11. A status of 'Q' means that your job is waiting in line for resources to free up. If you requested more resources than the cluster can provide, your job will sit in the queue forever and never run.
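
      If a job is stuck in the queue (or you simply want to cancel it), you can delete it with qdel, giving it the job ID that qsub printed. Assuming qdel lives alongside qsub and qstat:

      /opt/UMtorque/bin/qdel 20483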

    3. Check output

      When your job is finished, you will have two new files in the directory you submitted the job from: one containing stdout (ending in .oJOBID) and one containing stderr (ending in .eJOBID).

      The job we submitted above generated an empty error file test.sh.e20483 and the following stdout file:

      [toaster@redleader ~]$ cat test.sh.o20483 
      echo toaster hard maxlogins 18 >> /etc/security/limits.conf
      Warning: no access to tty (Bad file descriptor).
      Thus no job control in this shell.
      hello world
      red03.umiacs.umd.edu
      finding each node I have access to
      ----------
      red03.umiacs.umd.edu
      ----------
      ----------
      blue12.umiacs.umd.edu
      ----------
      ----------
      blue11.umiacs.umd.edu
      ----------
      

      The first three lines in your output are a standard part of how we have our cluster configured and do not affect how your program runs.

  • Running an MPI program

    Now down to the part you actually care about: running an MPI program. We have two different MPI installations at UMIACS, LAM and MPICH. LAM is the default and lives in /usr/local/; several versions of MPICH are available under /usr/local/stow/mpich-version.
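
    You can see which MPICH versions are installed by listing the stow directory, for example:

    ls /usr/local/stow/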

    First, you need to have an MPI-based program written. Here's a simple one:

    alltoall.c

    To compile this program and run it under LAM, do the following.

    Compile it with: mpicc alltoall.c -o alltoall

    The submission file needs to initialize the MPI environment on your allocated nodes (lamboot) before your program runs, and shut it down (lamhalt) afterwards:

    #PBS -l nodes=4
    #PBS -l walltime=5:00
    cd ~/
    # start the LAM runtime on the nodes PBS allocated to this job
    /usr/local/bin/lamboot $PBS_NODEFILE
    # run the program on all available CPUs ("C")
    /usr/local/bin/mpirun C alltoall
    # shut the LAM runtime down again
    /usr/local/bin/lamhalt
    
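    Save the script (for example as lam_test.sh, a name chosen only for illustration) and submit it with qsub just like the hello-world example:

    /opt/UMtorque/bin/qsub lam_test.sh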

    Output files for this job: STDOUT and STDERR

    To compile and run this program under MPICH, you need to change your environment a little. The following commands (csh/tcsh syntax) will set the appropriate environment:

    setenv MPI_ROOT /usr/local/stow/mpich-version
    setenv MPI_LIB  $MPI_ROOT/lib
    setenv MPI_INC  $MPI_ROOT/include
    setenv MPI_BIN $MPI_ROOT/bin
    # add MPICH commands to your path (includes mpirun and mpicc)
    set path=($MPI_BIN $path)
    # add MPICH man pages to your manpath
    if ( $?MANPATH ) then
         setenv MANPATH  $MPI_ROOT/man:$MANPATH
    else
         setenv MANPATH  $MPI_ROOT/man
    endif
    
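    If your shell (or your submission script) is bash rather than csh/tcsh, a rough equivalent of the script above would be:

    export MPI_ROOT=/usr/local/stow/mpich-version
    export MPI_LIB=$MPI_ROOT/lib
    export MPI_INC=$MPI_ROOT/include
    export MPI_BIN=$MPI_ROOT/bin
    # add MPICH commands to your path (includes mpirun and mpicc)
    export PATH=$MPI_BIN:$PATH
    # add MPICH man pages to your manpath
    export MANPATH=$MPI_ROOT/man${MANPATH:+:$MANPATH}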

    Compile it with: mpicc alltoall.c -o alltoall (remember, we changed our environment so this is now MPICH's mpicc).

    The submission file is almost the same, except that the job is launched through the pbsmpich wrapper rather than with lamboot/mpirun:
    #PBS -l nodes=10
    #PBS -l walltime=40:00
    cd ~/mpitest/mpich
    # pbsmpich sets up the MPICH environment on the allocated nodes and runs the program
    exec pbsmpich -vCD ./alltoall
    
    Please note that if you compile your program with either MPICH or LAM, you MUST execute it under the same environment. A common mistake when using MPICH is to compile with /usr/local/bin/mpicc (LAM's compiler) and then attempt to run with pbsmpich. This will fail, and you will get an error message similar to the following:
    It seems that there is no lamd running on the host blue12.umiacs.umd.edu.
     
    This indicates that the LAM/MPI runtime environment is not operating.
    The LAM/MPI runtime environment is necessary for MPI programs to run
    (the MPI program tired to invoke the "MPI_Init" function).
     
    Please run the "lamboot" command the start the LAM/MPI runtime
    environment.  See the LAM/MPI documentation for how to invoke
    "lamboot" across multiple machines.
    
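    A quick sanity check before compiling is to confirm which mpicc is first on your PATH (LAM's lives in /usr/local/bin, MPICH's under the stow directory set up above):

    which mpicc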

 

This introduction is just barely enough to get you started. You'll understand the cluster much better if you read the user manual at:

cluster-manual.html
