TORQUE stands for Terascale Open-Source Resource and QUEue Manager. It is an open-source distributed resource manager originally based on OpenPBS, the Portable Batch System (PBS). Our installation is used for running parallel jobs and for making use of dedicated reservations. We use a separate program called Maui for scheduling jobs in TORQUE, but users do not interact with it directly, so we say little more about it here. If you have been given an account on the cluster, then you probably need PBS for running your jobs.

    Use of the cluster through PBS is governed by a policy that PBS and Maui enforce. Currently, jobs are not limited in the number of nodes they can use. However, there is a fixed limit on the length of a job, and each user can have only a certain number of jobs running at once. At the time of this writing, there is a 96-hour limit on the time allowed for any job, and a maximum of ten running jobs per user. The time limit refers to "wall clock" time, that is, the amount of time elapsed irrespective of how it is used. This contrasts with CPU time, which accumulates only while the job is actually running on a processor. We do not use CPU time for policy enforcement.
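
    As a rough illustration of checking a request against a wall clock limit before submitting, the sketch below converts an HH:MM:SS walltime into seconds and compares it with a limit. The function names walltime_to_seconds and within_limit are hypothetical helpers, not part of TORQUE, and 96:00:00 is simply the limit quoted above.

```shell
# Convert an HH:MM:SS walltime string into seconds (hypothetical helper).
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if the requested walltime ($1) does not exceed the limit ($2).
within_limit() {
    [ "$(walltime_to_seconds "$1")" -le "$(walltime_to_seconds "$2")" ]
}

if within_limit 48:30:00 96:00:00; then
    echo "request fits within the limit"
else
    echo "request exceeds the limit"
fi
```

    Because the limit can change, it is safer to read the current value from the queue configuration than to hard-code it.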

    Please make sure /opt/UMtorque/bin comes before /usr/local/bin in your PATH environment variable. We will make this the default after we upgrade all of the cluster to use TORQUE.

    To see the current policy on the cluster, you can use the qmgr(8) command:

    [xhe@brood00 ~]$ qmgr -c " p s"
    # Create queues and set their attributes.
    # Create and define queue dque
    create queue dque
    set queue dque queue_type = Execution
    set queue dque resources_max.cput = 04:00:00
    set queue dque resources_max.walltime = 02:00:00
    set queue dque resources_min.cput = 00:00:01
    set queue dque resources_default.cput = 04:00:00
    set queue dque resources_default.nodes = 1:ppn=1
    set queue dque resources_default.walltime = 01:00:00
    set queue dque max_user_run = 2
    set queue dque enabled = True
    set queue dque started = True
    # Create and define queue long
    create queue long
    set queue long queue_type = Execution
    set queue long acl_user_enable = True
    set queue long resources_max.cput = 192:00:00
    set queue long resources_max.walltime = 96:00:00
    set queue long resources_min.cput = 00:00:01
    set queue long resources_default.cput = 192:00:00
    set queue long resources_default.nodes = 1:ppn=1
    set queue long enabled = True
    set queue long started = True
    # Set server attributes.
    set server scheduling = True
    set server managers =
    set server operators =
    set server default_queue = dque
    set server log_events = 511
    set server mail_from = adm
    set server query_other_jobs = True
    set server scheduler_iteration = 600
    set server node_check_rate = 600
    set server tcp_timeout = 6
    set server pbs_version = 2.1.8
    [xhe@brood00 ~]$ 

    This command runs qmgr, the queue management program for PBS, with the directive "p s" (print server). You cannot manipulate the queues from here, but you can inspect them. The output shows the configuration of each queue, followed by the server attributes. The dque queue is the default; there are other queues, but their use is outside the scope of this document. For each queue, the resources_max.walltime value tells us the current maximum walltime for a job, and the max_user_run property tells us the maximum number of jobs that will run for any one user at any time.
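
    To pull a single value out of that listing, something like the following sketch may help. It filters qmgr-style text on standard input with awk; queue_attr is a hypothetical helper, and the sample lines are copied from the output shown above so that the sketch runs without access to the cluster.

```shell
# Print the value of one queue attribute from "qmgr -c 'p s'" style text on stdin.
# $1 = queue name, $2 = attribute name (hypothetical helper, not part of TORQUE).
queue_attr() {
    awk -v q="$1" -v a="$2" \
        '$1 == "set" && $2 == "queue" && $3 == q && $4 == a { print $6 }'
}

# Sample lines copied from the qmgr output in this document.
sample='set queue dque resources_max.walltime = 02:00:00
set queue dque max_user_run = 2
set queue long resources_max.walltime = 96:00:00'

printf '%s\n' "$sample" | queue_attr dque resources_max.walltime
printf '%s\n' "$sample" | queue_attr dque max_user_run
```

    On the cluster, the same helper could read live data, for example: qmgr -c "p s" | queue_attr long resources_max.walltime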

    Aside from qmgr, which you would only use for inspecting the current policy, there are several commands that you will use for submitting, inspecting, and controlling jobs. The following is by no means a complete reference. Unfortunately, there is not a lot of documentation available online. You should look at the man pages if you have further questions.

    • qstat

      The qstat(1B) command is used for querying the status of the queue, as well as the status of individual jobs. For the most part, you will be invoking the qstat command without arguments to examine the state of the entire queue. However, one can specify one or more jobs on the command line to pick one out in particular, or give additional flags such as -n or -f to get allocated node information, or full job information, respectively. The curious should consult the man page for more information.

      Here are some examples of the use and output of qstat. Assume that I have already submitted a job, identified by 11216.queen, and it has not run yet:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle                  0 Q dque      

      The output of this command can be interpreted as follows:

      • Job id is the PBS identifier for the job. This is unique in the queue. In this case, 11216.queen indicates that my job is the 11216th job submitted to queen, the host where the PBS service runs.
      • Name is the name of the script that was submitted. This is not unique. In this case, STDIN indicates that I piped the script directly to the submission program instead of using a persistent script on disk. This is a useful but rarely used technique.
      • User is the UNIX username of the user who submitted the job. User bargle is my username.
      • Time Use is the amount of CPU time accumulated by the job. No time has been used by this job, because it is still queued.
      • "S" is the current state of the job. "Q" indicates that the job is queued. State "R" indicates that the job is running.
      • Queue is the name of the queue where the job has been submitted. This will almost always be dque.
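
      The column layout described above lends itself to simple scripting. The following sketch counts the jobs of one user in a given state by reading qstat output on standard input. The helper name count_jobs is hypothetical, and the sketch assumes that job names contain no spaces, since it splits columns on whitespace.

```shell
# Count the jobs of one user in one state, reading qstat output on stdin.
# $1 = username, $2 = state letter such as Q or R (hypothetical helper).
count_jobs() {
    awk -v u="$1" -v s="$2" \
        'NR > 2 && $3 == u && $5 == s { n++ } END { print n + 0 }'
}

# Sample copied from the qstat output in this document.
sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11216.queen      STDIN            bargle                  0 Q dque'

printf '%s\n' "$sample" | count_jobs bargle Q
```

      On the cluster, the live equivalent would be: qstat | count_jobs "$USER" R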

      Now, the job has been scheduled to run, but the PBS service has not accounted any CPU time use for the job yet:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle                  0 R dque            

      Here the job has started to accumulate CPU time:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle           00:00:13 R dque

      Finally, after the job has finished executing (note that there is no output, since the queue is empty):

      [bargle@brood01 factor]$ qstat
      [bargle@brood01 factor]$ 

      In the directory that was current when the job was submitted, PBS also left two files containing the job's standard output and standard error. They are called STDIN.o11216 and STDIN.e11216, respectively. We will go over the output of PBS in more detail later.
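
      The naming pattern of these files can be expressed as a small sketch. The helper pbs_output_files is hypothetical; it only reproduces the pattern seen in this document (job name, then .o or .e, then the numeric part of the job id) and does not query PBS.

```shell
# Print the default stdout and stderr file names for a job.
# $1 = job name, $2 = job id such as 11216.queen (hypothetical helper).
pbs_output_files() {
    num=${2%%.*}              # keep only the numeric part of the job id
    echo "$1.o$num" "$1.e$num"
}

pbs_output_files STDIN 11216.queen
```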

    • qsub

      The qsub(1B) program is used for submitting jobs to PBS. It has two primary modes of use: interactive jobs and batch jobs. Interactive jobs are useful for testing your programs, but not for running many jobs, since they require your input. We will look at interactive jobs first. The following command asks for two nodes and sixty seconds of walltime (-l nodes=2,walltime=60) in interactive mode (-I). Here, after I get my allocation, I look at the contents of $PBS_NODEFILE (the file that lists the nodes allocated to me) and exit:

      [bargle@brood01 factor]$ qsub -l nodes=2,walltime=60 -I
      qsub: waiting for job to start
      qsub: job ready
      [bargle@bug60 ~]$ cat $PBS_NODEFILE
      [bargle@bug60 ~]$ exit
      qsub: job completed
      [bargle@brood01 factor]$ 
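
      Inside a job, the node file can also be summarized rather than just printed. The sketch below assumes the usual TORQUE layout, in which the file holds one line per allocated processor slot, so a host may appear more than once. The helper name node_summary is hypothetical.

```shell
# Summarize a PBS node file: one output line per host with its slot count.
# $1 = path to the node file, normally "$PBS_NODEFILE" inside a job.
node_summary() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a job the file exists already; outside one, skip quietly.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    node_summary "$PBS_NODEFILE"
fi
```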

      Next, we submit a job from a script to use the pbsdsh program to run a process on all allocated nodes. The script, called helloworld.qsub, is as follows:

      # Set up the path
      PATH=/opt/UMtorque/bin:$PATH
      export PATH
      # Make all hosts print out "Hello World"
      pbsdsh echo Hello World

      To submit the job:

      [bargle@brood01 examples]$ qsub -l nodes=4 helloworld.qsub
      [bargle@brood01 examples]$

      When a job finishes, PBS drops two output files in the directory that was current when the job was submitted. These files are named for the script and the job number. In this case, the files are called helloworld.qsub.o11220 and helloworld.qsub.e11220 for the standard output and standard error, respectively. The error file is empty, but here is the result of the output:

      Warning: no access to tty (Bad file descriptor).
      Thus no job control in this shell.
      Hello World
      Hello World
      Hello World
      Hello World

      The warning in the first two lines of the output is innocuous, and occurs in every output file from PBS. The next four lines are the result of "Hello World" being printed out from the four nodes where the job was scheduled, as a result of the pbsdsh command. There are more examples in the next section.
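
      Resource requests do not have to be given on the qsub command line. qsub also reads lines beginning with #PBS in the script itself as options. The sketch below prints such a script; make_job_script is a hypothetical helper, and the node count and walltime are example values only.

```shell
# Print a job script whose resource request is embedded as a #PBS directive.
# $1 = number of nodes, $2 = walltime as HH:MM:SS (hypothetical helper).
make_job_script() {
    cat <<EOF
#!/bin/sh
#PBS -l nodes=$1,walltime=$2
# Run from the directory where qsub was invoked.
cd "\$PBS_O_WORKDIR"
# Make all hosts print out "Hello World"
pbsdsh echo Hello World
EOF
}

make_job_script 4 00:05:00
```

      Redirecting the output to a file (the file name is up to you) and passing that file to qsub should then be equivalent to giving -l nodes=4,walltime=00:05:00 on the command line.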

    • qdel

      The qdel(1B) program is used for deleting jobs from the queue when they are in the queued state. For example:

      [bargle@brood01 examples]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11222.queen      STDIN            bargle                  0 Q dque      
      [bargle@brood01 examples]$ qdel 11222
      [bargle@brood01 examples]$ qstat
      [bargle@brood01 examples]$ 
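
      When several jobs are queued, their ids can be collected from qstat instead of being typed by hand. The sketch below prints the ids of one user's queued jobs; queued_job_ids is a hypothetical helper, and it assumes job names without spaces.

```shell
# Print the ids of one user's queued jobs, reading qstat output on stdin.
# $1 = username (hypothetical helper, not part of TORQUE).
queued_job_ids() {
    awk -v u="$1" 'NR > 2 && $3 == u && $5 == "Q" { print $1 }'
}

# Sample copied from the qstat output in this document.
sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11222.queen      STDIN            bargle                  0 Q dque'

printf '%s\n' "$sample" | queued_job_ids bargle
```

      On the cluster, the ids could then be passed to qdel one at a time, for example: for id in $(qstat | queued_job_ids "$USER"); do qdel "$id"; done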

    • qsig

      The qsig(1B) program can be used to send UNIX signals to running jobs. For instance, it can be used to kill running jobs:

      [bargle@brood01 examples]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11221.queen      STDIN            bargle           00:00:01 R dque      
      [bargle@brood01 examples]$ qsig -s TERM 11221
      [bargle@brood01 examples]$ qstat
      [bargle@brood01 examples]$ 
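
      A common pattern is to ask a job to stop politely and fall back to a stronger signal if needed. The following is only a sketch of that idea: stop_job is a hypothetical helper, the QSIG variable exists purely so the command can be swapped out for testing, and the grace period is an arbitrary example value.

```shell
# Ask a job to stop with TERM, wait, then send KILL as a fallback.
# $1 = job id, $2 = grace period in seconds (default 10).
# QSIG lets the qsig command be overridden; it is a hypothetical knob for this sketch.
stop_job() {
    "${QSIG:-qsig}" -s TERM "$1" || return 1
    sleep "${2:-10}"
    # If the job has already exited, this second signal fails harmlessly.
    "${QSIG:-qsig}" -s KILL "$1" 2>/dev/null || true
}
```

      For example, stop_job 11221 30 would send TERM to job 11221, wait thirty seconds, and then send KILL.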

    • pbsnodes

      The pbsnodes(1B) program can be used to inspect the state of the nodes. It can be used to examine offline nodes, or all nodes. To list all offline nodes:

      [bargle@brood01 examples]$ pbsnodes -l
      bug63                offline
      [bargle@brood01 examples]$ 

      To examine all nodes:

      [bargle@brood01 examples]$ pbsnodes -a
           state = free
           np = 2
           ntype = cluster
           state = free
           np = 2
           ntype = cluster
      ... deleted ...
           state = free
           np = 2
           ntype = cluster
           state = offline
           np = 2
           ntype = cluster
      [bargle@brood01 examples]$ 
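
      The long listing is easier to digest as a tally. The sketch below counts how many nodes report each state by reading pbsnodes -a output on standard input; count_node_states is a hypothetical helper, and the sample is abbreviated from the listing above.

```shell
# Tally node states from "pbsnodes -a" output read on stdin.
count_node_states() {
    awk '$1 == "state" && $2 == "=" { c[$3]++ }
         END { for (s in c) print s, c[s] }' | sort
}

# Abbreviated sample in the same layout as the listing in this document.
sample='     state = free
     np = 2
     ntype = cluster
     state = offline
     np = 2
     ntype = cluster
     state = free
     np = 2
     ntype = cluster'

printf '%s\n' "$sample" | count_node_states
```

      On the cluster, the live equivalent would be: pbsnodes -a | count_node_states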

  • Condor

    Condor is used for high-throughput computing. It does not deal well with jobs that require parallel access to more than one machine, so it is generally only used for serial jobs. Among other things, Condor supports I/O redirection and automatic checkpointing to add a level of fault tolerance to computing, as well as letting jobs get pre-empted and move from machine to machine. Jobs in Condor will get pre-empted by jobs scheduled through PBS, or if the job runs too long and there are others waiting. We have local documentation and examples, both introductory, and for running Matlab code under Condor. There is extensive documentation available online.



© Copyright 2005, Institute for Advanced Computer Study, University of Maryland, All rights reserved.