• Scheduling Overview

    The primary job queuing system on the Vnode cluster is the Portable Batch System (PBS). Our installation is used for running parallel jobs and for making use of dedicated reservations. We use a separate program called Maui for scheduling jobs and for reserving resources.

  • Vnode Quick Start

    The vnode cluster is heterogeneous in that nodes play different roles: some are attached to the LCD wall (vnodelcdwalls) and others are not connected to a screen (vnodes). You can use different queues in our scheduler to access these different resources.

    To allocate a single machine with a GPU for interactive use, just run qsub -I. This leads to the default queue, called 'single', which allocates a single node for up to 8 hours. To allocate two nodes with GPUs for interactive use, run qsub -q double -I. This leads to the 'double' queue, which allocates two nodes for up to four hours. Neither of these queues leads to nodes that have physical displays, so they are suitable for remote processing, visualization, and general-purpose GPU work.

    You can access the display wall as a whole or as any of four 2x2 arrays named lcdquad0, lcdquad1, lcdquad2, and lcdquad3. To access lcdquad0, run qsub -q lcdquad0 -I; any of the lcdquads can be allocated with a similar invocation. To access the whole wall, run qsub -q lcdwall -I.

  • Advanced Reservations

    A lot of the work on the vnode cluster involves demonstrations or collaborations, so it's important that users can reserve resources through Maui. You make reservations with the setres command, list them with the showres command, and cancel them with the releaseres command. These commands are installed in /opt/UMmaui/bin, so you may want to add that directory to your path if you make frequent use of advanced reservations.

    For example, to reserve lcdquad0 for an hour and 15 minutes at 2pm on June 3rd, run setres -u username -s 14:00:00_6/3 -d 01:15:00 -f lcdquad0 ALL. In this example, username should be your username. You can reserve lcdquad0, lcdquad1, lcdquad2, lcdquad3, or the lcdwall similarly by changing the feature/queue specified by the -f argument.
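
    The reservation command above can be sketched as a small script. This is a hypothetical helper, not part of Maui; the username, date, and duration are the ones from the example, and the setres command line is only echoed so you can review it before running it:

    ```shell
    # Sketch: assemble the setres invocation from the example above.
    # USER, START, DURATION, and FEATURE are placeholders to edit.
    USER=fmccall            # your username
    START=14:00:00_6/3      # 2pm on June 3rd, in setres's time_date form
    DURATION=01:15:00       # one hour and 15 minutes
    FEATURE=lcdquad0        # or lcdquad1..3, or lcdwall
    CMD="/opt/UMmaui/bin/setres -u $USER -s $START -d $DURATION -f $FEATURE ALL"
    echo "$CMD"             # review, then run the reservation for real
    ```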

    You can list your reservations as follows:

    [fmccall@vnodesub00 ~]$ /opt/UMmaui/bin/showres                                          
    ReservationID       Type S       Start         End    Duration    N/P    StartTime
    fmccall.0           User -    00:00:54    00:10:54    00:10:00    2/2    Wed Apr 26 19:09:36
    1 reservation located

    You can delete reservations with the releaseres command as follows:

    [fmccall@vnodesub00 ~]$ /opt/UMmaui/bin/releaseres fmccall.0
    released User reservation 'fmccall.0'

    To use your reservation, run qsub as qsub -q lcdquad0 -W x=FLAGS:ADVRES:fmccall.0 -I, where lcdquad0 is the feature that you requested in your reservation and fmccall.0 is your reservation id.

    If your reservation has already begun, then you may need to specify a shorter runtime to qsub. For example, if only 30 minutes remain on your reservation, the command above will not work because it asks for the one-hour default runtime. Specify a shorter walltime with qsub -q lcdquad0 -l walltime=00:29:00 -W x=FLAGS:ADVRES:fmccall.0 -I, which requests 29 minutes of runtime.
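
    The walltime arithmetic above can be sketched as follows. The 29 minutes matches the example; the printf format is ordinary shell, not anything PBS-specific:

    ```shell
    # Sketch: turn a number of remaining minutes into the HH:MM:SS string
    # that qsub's -l walltime= option expects.
    REMAINING_MIN=29
    WALLTIME=$(printf '%02d:%02d:00' $((REMAINING_MIN / 60)) $((REMAINING_MIN % 60)))
    echo "walltime=$WALLTIME"
    ```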

    The cluster is a shared resource. Only reserve the entire wall for critical demos or for research that requires it. Most jobs and all academic coursework should use the lcdquads. Don't reserve more than you need and try to limit reservations to no more than 2 hours.

  • PBS Usage

    There are many other options available through the cluster's scheduler. To see the current policy on the cluster, you can use the qmgr(8) command:

    [bargle@brood01 ~]$ qmgr
    Max open servers: 4
    Qmgr: print queue dque
    # Create queues and set their attributes.
    # Create and define queue dque
    create queue dque
    set queue dque queue_type = Execution
    set queue dque resources_max.cput = 192:00:00
    set queue dque resources_max.walltime = 96:00:00
    set queue dque resources_min.cput = 00:00:01
    set queue dque resources_default.cput = 192:00:00
    set queue dque resources_default.nodes = 1:ppn=1
    set queue dque max_user_run = 10
    set queue dque enabled = True
    set queue dque started = True
    Qmgr: quit
    [bargle@brood01 ~]$ 

    This command starts the queue management command for PBS. You cannot manipulate the queue from here, but you can inspect it. Here we print out the configuration for the dque queue. The dque queue is the default -- there are other queues, but their use is out of the scope of this document. Here, the resources_max.walltime value tells us the current maximum walltime for a job, and the max_user_run property tells us the maximum number of jobs that will run for any user at any time.

    Aside from qmgr, which you would only use for inspecting the current policy, there are several commands that you will use for submitting, inspecting, and controlling jobs. The following is by no means a complete reference. Unfortunately, there is not a lot of documentation available online. You should look at the man pages if you have further questions.

    • qstat

      The qstat(1B) command is used for querying the status of the queue, as well as the status of individual jobs. For the most part, you will be invoking the qstat command without arguments to examine the state of the entire queue. However, one can specify one or more jobs on the command line to pick one out in particular, or give additional flags such as -n or -f to get allocated node information, or full job information, respectively. The curious should consult the man page for more information.

      Here are some examples of the use and output of qstat. Assume that I have already submitted a job, identified by 11216.queen, and it has not run yet:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle                  0 Q dque      

      The output of this command can be interpreted as follows:

      • Job id is the PBS identifier for the job. This is unique in the queue. In this case, 11216.queen indicates that my job is the 11216th job submitted to queen, the host where the PBS service runs.
      • Name is the name of the script that was submitted. This is not unique. In this case, STDIN indicates that I piped the script directly to the submission program instead of using a persistent script on disk. This is a useful but rarely used technique.
      • User is the UNIX username of the user who submitted the job. User bargle is my username.
      • Time Use is the amount of CPU time accumulated by the job. No time has been used by this job, because it is still queued.
      • "S" is the current state of the job. "Q" indicates that the job is queued. State "R" indicates that the job is running.
      • Queue is the name of the queue where the job has been submitted. This will almost always be dque.
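
      As a sketch of putting these fields to use, here is an awk filter that picks out the ids of queued jobs. The sample line is the queued job from the transcript above; in real use you would pipe qstat itself through the same filter:

      ```shell
      # Sketch: print ids of jobs whose state column ("S", field 5) is "Q".
      # The sample line stands in for live qstat output.
      sample='11216.queen      STDIN            bargle                  0 Q dque'
      queued=$(echo "$sample" | awk '$5 == "Q" { print $1 }')
      echo "$queued"
      ```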

      Now, the job has been scheduled to run, but the PBS service has not accounted any CPU time use for the job yet:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle                  0 R dque            

      Here the job has started to accumulate CPU time:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle           00:00:13 R dque

      Finally, after the job has finished executing (note that there is no output, since the queue is empty):

      [bargle@brood01 factor]$ qstat
      [bargle@brood01 factor]$ 

      In the directory that was current when the job was submitted, PBS also leaves files containing the job's output to stdout and stderr. They are called STDIN.o11216 and STDIN.e11216, respectively. We will go over the output of PBS a little more later.

    • qsub

      The qsub(1B) program is used for submitting jobs to PBS. It has two primary modes of use: interactive jobs, and batch jobs. Interactive jobs are useful for testing your programs, but not well suited to running many jobs, since they require your input. We will look at interactive jobs first. The following command asks for two nodes and sixty seconds (-l nodes=2,walltime=60) in interactive mode (-I). Here, after I get my allocation, I look at the contents of the $PBS_NODEFILE (which lists the nodes I have allocated) and exit:

      [bargle@brood01 factor]$ qsub -l nodes=2,walltime=60 -I
      qsub: waiting for job to start
      qsub: job ready
      [bargle@bug60 ~]$ cat $PBS_NODEFILE
      [bargle@bug60 ~]$ exit
      qsub: job completed
      [bargle@brood01 factor]$ 
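
      Inside a job, the $PBS_NODEFILE contents can be examined with ordinary shell tools. PBS writes one line per allocated processor, so a repeated hostname means several slots on one node. The sketch below uses a temporary file standing in for a real $PBS_NODEFILE from a nodes=2 allocation; the hostnames are made up:

      ```shell
      # Sketch: count processors and distinct nodes in a PBS_NODEFILE-style
      # file. Inside a real job, use "$PBS_NODEFILE" instead of $nodefile.
      nodefile=$(mktemp)
      printf 'bug60\nbug61\n' > "$nodefile"   # stand-in for $PBS_NODEFILE
      procs=$(wc -l < "$nodefile")            # one line per processor
      nodes=$(sort -u "$nodefile" | wc -l)    # distinct hostnames
      echo "processors: $procs, nodes: $nodes"
      rm -f "$nodefile"
      ```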

      Next, we submit a job from a script to use the pbsdsh program to run a process on all allocated nodes. The script, called helloworld.qsub, is as follows:

      # Set up the path
      export PATH
      # Make all hosts print out "Hello World"
      pbsdsh echo Hello World

      To submit the job:

      [bargle@brood01 examples]$ qsub -l nodes=4 helloworld.qsub
      [bargle@brood01 examples]$

      When a job finishes, PBS drops two output files in the directory that was current when the job was submitted. These files are named for the script and the job number. In this case, the files are called helloworld.qsub.o11220 and helloworld.qsub.e11220 for the standard output and standard error, respectively. The error file is empty, but here is the result of the output:

      Warning: no access to tty (Bad file descriptor).
      Thus no job control in this shell.
      Hello World
      Hello World
      Hello World
      Hello World

      The warning in the first two lines of the output is innocuous, and occurs in every output file from PBS. The next four lines are the result of "Hello World" being printed out from the four nodes where the job was scheduled, as a result of the pbsdsh command. There are more examples in the next section.
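
      The output-file naming described above can be sketched as follows; the script name and job number are the ones from this example:

      ```shell
      # Sketch: reconstruct the names PBS gives a job's output files from
      # the script name and the numeric part of the job id.
      script=helloworld.qsub
      jobid=11220
      stdout_file="${script}.o${jobid}"   # standard output
      stderr_file="${script}.e${jobid}"   # standard error
      echo "$stdout_file $stderr_file"
      ```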

    • qdel

      The qdel(1B) program is used for deleting jobs from the queue when they are in the queued state. For example:

      [bargle@brood01 examples]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11222.queen      STDIN            bargle                  0 Q dque      
      [bargle@brood01 examples]$ qdel 11222
      [bargle@brood01 examples]$ qstat
      [bargle@brood01 examples]$ 
    • qsig

      The qsig(1B) program can be used to send UNIX signals to running jobs. For instance, it can be used to kill running jobs:

      [bargle@brood01 examples]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11221.queen      STDIN            bargle           00:00:01 R dque      
      [bargle@brood01 examples]$ qsig -s TERM 11221
      [bargle@brood01 examples]$ qstat
      [bargle@brood01 examples]$ 
    • pbsnodes

      The pbsnodes(1B) program can be used to inspect the state of the nodes. It can be used to examine offline nodes, or all nodes. To list all offline nodes:

      [bargle@brood01 examples]$ pbsnodes -l
      bug63                offline
      [bargle@brood01 examples]$ 

      To examine all nodes:

      [bargle@brood01 examples]$ pbsnodes -a
           state = free
           np = 2
           ntype = cluster
           state = free
           np = 2
           ntype = cluster
      ... deleted ...
           state = free
           np = 2
           ntype = cluster
           state = offline
           np = 2
           ntype = cluster
      [bargle@brood01 examples]$ 
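
      The listing output lends itself to simple scripting. The sketch below counts offline nodes from pbsnodes -l style output; the sample line is the one from the transcript above, and in real use you would pipe pbsnodes -l itself through the same filter:

      ```shell
      # Sketch: count nodes reported offline. The sample line stands in
      # for a live `pbsnodes -l` call.
      sample='bug63                offline'
      offline=$(echo "$sample" | awk '$2 == "offline"' | wc -l)
      echo "offline nodes: $offline"
      ```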
  • Condor

    Condor is used for high-throughput computing. It does not deal well with jobs that require parallel access to more than one machine, so it is generally only used for serial jobs. Among other things, Condor supports I/O redirection and automatic checkpointing to add a level of fault tolerance to computing, as well as letting jobs get pre-empted and move from machine to machine. Jobs in Condor will get pre-empted by jobs scheduled through PBS, or if the job runs too long and there are others waiting. We have local documentation and examples, both introductory, and for running Matlab code under Condor. There is extensive documentation available online.



    © Copyright 2005, Institute for Advanced Computer Study, University of Maryland, All rights reserved.