Hive Cluster Manual

Submitting

TORQUE/MAUI

TORQUE stands for Terascale Open-Source Resource and QUEue Manager. It is an open-source distributed resource manager originally based on OpenPBS, the Portable Batch System (PBS). Our installation is used for running parallel jobs or making use of dedicated reservations. We use a separate program called Maui for scheduling jobs in TORQUE, but users do not interact with it directly, so we will not mention it further. If you have been given an account on the cluster, then you probably need PBS for running your jobs.

Use of the cluster through PBS is governed by a policy that is enforced by PBS and Maui. Currently, jobs are not limited in the number of nodes they can use. However, there is a fixed limit on the length of jobs, and no user can have more than a certain number of jobs running at once. At the time of this writing, there is a 96 hour limit on the time allowed for any job, and a maximum of ten running jobs for any user. The time limit is "wall clock" time, that is, the amount of time elapsed irrespective of how it is used. This contrasts with CPU time, which is only counted while the job is actually running on a processor. We do not use CPU time for policy enforcement.
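Because the limit is expressed as wall clock time, it can help to check a planned walltime request against it before submitting. The following is a minimal sketch, not part of TORQUE: the helper names walltime_to_seconds and fits_walltime_limit are made up for illustration, and the 96 hour figure is simply the limit quoted above written in HH:MM:SS form.

```shell
#!/bin/sh
# Hypothetical helpers (not part of TORQUE) for checking a walltime request.

# Convert an HH:MM:SS walltime string to a number of seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }'
}

# Succeed if the requested walltime ($1) does not exceed the limit ($2).
fits_walltime_limit() {
    [ "$(walltime_to_seconds "$1")" -le "$(walltime_to_seconds "$2")" ]
}

# The 96 hour limit quoted in the text, as HH:MM:SS (assumed still current).
LIMIT="96:00:00"

if fits_walltime_limit "48:30:00" "$LIMIT"; then
    echo "48:30:00 fits within $LIMIT"
fi
```

The same check could be run against whatever limit qmgr currently reports, since the policy may change over time.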

Please make sure /opt/UMtorque/bin comes before /usr/local/bin in your PATH environment variable. We will make this the default after we upgrade the whole cluster to use TORQUE.
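One way to arrange this is a few lines in a shell startup file. The sketch below assumes a Bourne-style shell and that you place it in a file such as ~/.profile or ~/.bashrc (which file your login shell reads is an assumption here); prepend_to_path is a hypothetical helper, not something the cluster provides.

```shell
# Directory holding the TORQUE commands, as given in the text above.
TORQUE_BIN=/opt/UMtorque/bin

# Hypothetical helper: print PATH value $2 with directory $1 placed first.
prepend_to_path() {
    case "$2" in
        "$1"|"$1":*) printf '%s\n' "$2" ;;      # already first, leave unchanged
        "")          printf '%s\n' "$1" ;;      # empty PATH, use the directory alone
        *)           printf '%s\n' "$1:$2" ;;   # otherwise put the directory in front
    esac
}

PATH=$(prepend_to_path "$TORQUE_BIN" "$PATH")
export PATH
```

After logging in again, `which qsub` should report a path under /opt/UMtorque/bin if the change took effect.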

To see the current policy on the cluster, you can use the qmgr(8) command:

    [xhe@brood00 ~]$ qmgr -c " p s"
    #
    # Create queues and set their attributes.
    #
    #
    # Create and define queue dque
    #
    create queue dque
    set queue dque queue_type = Execution
    set queue dque resources_max.cput = 04:00:00
    set queue dque resources_max.walltime = 02:00:00
    set queue dque resources_min.cput = 00:00:01
    set queue dque resources_default.cput = 04:00:00
    set queue dque resources_default.nodes = 1:ppn=1
    set queue dque resources_default.walltime = 01:00:00
    set queue dque max_user_run = 2
    set queue dque enabled = True
    set queue dque started = True
    #
    # Create and define queue long
    #
    create queue long
    set queue long queue_type = Execution
    set queue long acl_user_enable = True
    set queue long resources_max.cput = 192:00:00
    set queue long resources_max.walltime = 96:00:00
    set queue long resources_min.cput = 00:00:01
    set queue long resources_default.cput = 192:00:00
    set queue long resources_default.nodes = 1:ppn=1
    set queue long enabled = True
    set queue long started = True
    #
    # Set server attributes.
    #
    set server scheduling = True
    set server managers = root@queen.umiacs.umd.edu
    set server operators = root@queen.umiacs.umd.edu
    set server default_queue = dque
    set server log_events = 511
    set server mail_from = adm
    set server query_other_jobs = True
    set server scheduler_iteration = 600
    set server node_check_rate = 600
    set server tcp_timeout = 6
    set server pbs_version = 2.1.8
    [xhe@brood00 ~]$

The qmgr command is the queue management tool for PBS. You cannot manipulate the queues from here, but you can inspect them. Here we print out the server configuration, which includes the definition of the dque queue. The dque queue is the default; there are other queues, but their use is outside the scope of this document. The resources_max.walltime value tells us the current maximum walltime for a job in that queue, and the max_user_run property tells us the maximum number of jobs that will run for any user at any time.
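If you only want one or two of these values, the qmgr listing can be filtered with standard tools. The following is a rough sketch that assumes the "set queue NAME ATTRIBUTE = VALUE" line format shown in the listing; queue_attr is a hypothetical helper name, and the sample lines are copied from the listing so the example runs without access to the cluster.

```shell
# Hypothetical helper: print the value of one queue attribute from
# qmgr "print server" output read on stdin.
#   $1: queue name, $2: attribute name
queue_attr() {
    awk -v q="$1" -v a="$2" '
        $1 == "set" && $2 == "queue" && $3 == q && $4 == a && $5 == "=" {
            print $6
        }'
}

# On the cluster (requires qmgr on your PATH) one would run, for example:
#   qmgr -c "p s" | queue_attr dque resources_max.walltime

# Self-contained demonstration using lines copied from the listing.
sample='set queue dque resources_max.walltime = 02:00:00
set queue dque max_user_run = 2
set queue long resources_max.walltime = 96:00:00'

printf '%s\n' "$sample" | queue_attr dque resources_max.walltime
printf '%s\n' "$sample" | queue_attr dque max_user_run
```

Matching on whole fields rather than with a loose pattern keeps resources_max.walltime from also matching resources_default.walltime.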

Aside from qmgr, which you would only use for inspecting the current policy, there are several commands that you will use for submitting, inspecting, and controlling jobs. The following is by no means a complete reference. Unfortunately, there is not a lot of documentation available online. You should look at the man pages if you have further questions.
- qstat
  
  The qstat(1B) command is used for querying the status of the queue, as well as the status of individual jobs. For the most part, you will be invoking the qstat command without arguments to examine the state of the entire queue. However, one can specify one or more jobs on the command line to pick one out in particular, or give additional flags such as -n or -f to get allocated node information, or full job information, respectively. The curious should consult the man page for more information.
  
  Here are some examples of the use and output of qstat. Assume that I have already submitted a job, identified by 11216.queen, and it has not run yet:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle                  0 Q dque

  The output of this command can be interpreted as follows:
  - Job id is the PBS identifier for the job. This is unique in the queue. In this case, 11216.queen indicates that my job is the 11216th job submitted to queen, the host where the PBS service runs.
  - Name is the name of the script that was submitted. This is not unique. In this case, STDIN indicates that I piped the script directly to the submission program instead of using a persistent script on disk. This is a useful but rarely used technique.
  - User is the UNIX username of the user who submitted the job. User bargle is my username.
  - Time Use is the amount of CPU time accumulated by the job. No time has been used by this job, because it is still queued.
  - "S" is the current state of the job. "Q" indicates that the job is queued. State "R" indicates that the job is running.
  - Queue is the name of the queue where the job has been submitted. This will almost always be dque.
  Now, the job has been scheduled to run, but the PBS service has not accounted any CPU time use for the job yet:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle                  0 R dque

  Here the job has started to accumulate CPU time:

      [bargle@brood01 factor]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11216.queen      STDIN            bargle           00:00:13 R dque

  Finally, after the job has finished executing (note that there is no output, since the queue is empty):

      [bargle@brood01 factor]$ qstat
      [bargle@brood01 factor]$

  In the directory that was current when the job was submitted, PBS also left the results of output to stdout and stderr. They are called STDIN.o11216 and STDIN.e11216 respectively. We will go over the output of PBS a little more, later.
- qsub
  
  The qsub(1B) program is used for submitting jobs to PBS. It has two primary modes of use: interactive jobs and batch jobs. Interactive jobs are useful for testing your programs, but not very useful for running many jobs, since they require your input. We will look at interactive jobs first. The following command asks for two nodes and sixty seconds (-l nodes=2,walltime=60) in interactive mode (-I). Here, after I get my allocation, I look at the contents of the $PBS_NODEFILE (which lists the nodes I have been allocated) and exit:

      [bargle@brood01 factor]$ qsub -l nodes=2,walltime=60 -I
      qsub: waiting for job 11212.queen.umiacs.umd.edu to start
      qsub: job 11212.queen.umiacs.umd.edu ready
      [bargle@bug60 ~]$ cat $PBS_NODEFILE
      bug60
      bug59
      [bargle@bug60 ~]$ exit
      logout
      qsub: job 11212.queen.umiacs.umd.edu completed
      [bargle@brood01 factor]$

  Next, we submit a job from a script to use the pbsdsh program to run a process on all allocated nodes. The script, called helloworld.qsub, is as follows:

      #!/bin/bash

      # Set up the path
      PATH=/usr/local/bin:$PATH
      export PATH

      # Make all hosts print out "Hello World"
      pbsdsh echo Hello World

  To submit the job:

      [bargle@brood01 examples]$ qsub -l nodes=4 helloworld.qsub
      11220.queen.umiacs.umd.edu
      [bargle@brood01 examples]$

  When a job finishes, PBS drops two output files in the directory that was current when the job was submitted. These files are named for the script and the job number. In this case, the files are called helloworld.qsub.o11220 and helloworld.qsub.e11220 for the standard output and standard error, respectively. The error file is empty, but here is the result of the output:

      Warning: no access to tty (Bad file descriptor).
      Thus no job control in this shell.
      Hello World
      Hello World
      Hello World
      Hello World

  The warning in the first two lines of the output is innocuous, and occurs in every output file from PBS. The next four lines are the result of "Hello World" being printed out from the four nodes where the job was scheduled, as a result of the pbsdsh command. There are more examples in the next section.
- qdel
  
  The qdel(1B) program is used for deleting jobs from the queue when they are in the queued state. For example:

      [bargle@brood01 examples]$ qstat 11222.queen.umiacs.umd.edu
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11222.queen      STDIN            bargle                  0 Q dque
      [bargle@brood01 examples]$ qdel 11222
      [bargle@brood01 examples]$ qstat
      [bargle@brood01 examples]$

- qsig
  
  The qsig(1B) program can be used to send UNIX signals to running jobs. For instance, it can be used to kill running jobs:

      [bargle@brood01 examples]$ qstat
      Job id           Name             User             Time Use S Queue
      ---------------- ---------------- ---------------- -------- - -----
      11221.queen      STDIN            bargle           00:00:01 R dque
      [bargle@brood01 examples]$ qsig -s TERM 11221
      [bargle@brood01 examples]$ qstat
      [bargle@brood01 examples]$

- pbsnodes
  
  The pbsnodes(1B) program can be used to inspect the state of the nodes. It can be used to examine offline nodes, or all nodes. To list all offline nodes:

      [bargle@brood01 examples]$ pbsnodes -l
      bug63                offline
      [bargle@brood01 examples]$

  To examine all nodes:

      [bargle@brood01 examples]$ pbsnodes -a
      bug00
           state = free
           np = 2
           ntype = cluster

      bug01
           state = free
           np = 2
           ntype = cluster

      ... deleted ...

      bug62
           state = free
           np = 2
           ntype = cluster

      bug63
           state = offline
           np = 2
           ntype = cluster

      [bargle@brood01 examples]$

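The commands above can also be combined in scripts. As an illustration, the sketch below reads the state column from plain qstat output and uses it to wait for a job to leave the queue. Both job_state and wait_for_job are hypothetical helper names, the 30 second polling interval is an arbitrary choice, and the parsing assumes the six-column qstat layout shown in the examples above.

```shell
# Hypothetical helper: print the state (S column) of one job, reading
# plain qstat output on stdin. Prints nothing if the job is not listed.
#   $1: numeric job id, for example 11216
job_state() {
    awk -v id="$1" '
        # Job rows begin with "<id>."; the state is the second-to-last field.
        index($1, id ".") == 1 { print $(NF - 1) }'
}

# Hypothetical helper: block until the job no longer appears in qstat.
# Requires qstat on your PATH, so it is defined here but not called.
wait_for_job() {
    while [ -n "$(qstat | job_state "$1")" ]; do
        sleep 30
    done
}

# Self-contained demonstration using output copied from the qstat examples.
sample='Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11216.queen      STDIN            bargle           00:00:13 R dque'

printf '%s\n' "$sample" | job_state 11216
```

On the cluster, a call such as `wait_for_job 11216` would return once the job has finished and dropped out of the qstat listing, at which point the output files should be in the submission directory.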
Condor

Condor is used for high-throughput computing. It does not deal well with jobs that require parallel access to more than one machine, so it is generally only used for serial jobs. Among other things, Condor supports I/O redirection and automatic checkpointing to add a level of fault tolerance to computing, and it lets jobs be pre-empted and moved from machine to machine. Jobs in Condor will be pre-empted by jobs scheduled through PBS, or when a job runs too long and others are waiting. We have local documentation and examples, both introductory and for running Matlab code under Condor. There is also extensive documentation available online.

© Copyright 2005, Institute for Advanced Computer Study, University of Maryland, All rights reserved.
