Introduction to Condor

To accommodate long-running serial jobs, we have configured the red and blue nodes to accept Condor jobs when they are not in use by PBS.
 
  • What is Condor?

    From the Condor website, http://www.cs.wisc.edu/condor:

    The goal of the Condor Project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such a computing environment, the Condor Team has been building software tools that enable scientists and engineers to increase their computing throughput.

    Unlike PBS, Condor has a few 'universes' that run different types of jobs. Depending on the universe, different jobs may or may not be allowed to run and may be assigned different priorities. We currently have the 'standard' and 'vanilla' universes configured.

    The standard universe accepts jobs that have been relinked using condor_compile. These jobs proxy most I/O calls back through the host that submitted the job. This allows jobs to be scheduled on any machine of the same architecture, regardless of the underlying filesystems available to it. Using this universe also allows your job to be checkpointed, moved to another machine, and restarted.

    The vanilla universe runs jobs that cannot be relinked against Condor. These range from scripts (perl, tcl, etc.) to binary-only packages that cannot be recompiled. Since I/O cannot be redirected, jobs submitted to this universe can only run on machines that share file systems (the same NFS shares visible, etc.). These jobs also cannot be checkpointed and restarted. If Condor needs to move a vanilla job, it will kill the process and restart it from the beginning.

  • That's nice, but how do I use it?

    If you can recompile your code using condor_compile, you should do so:

    condor_compile gcc program.c -o program

    Next, you will need to create a submission file similar to the one listed below.

    test.cmd

    ####################
    ##
    ## Test Condor command file
    ##
    ####################
    
    # uncomment if your program wasn't compiled using condor_compile
    # universe = vanilla
    # name of executable to submit
    executable = program
    # where to dump stdout from your program
    output = program.stdout
    # where to dump stderr from your program
    error = program.stderr
    # logfile for condor
    log = program.log
    # arguments to pass program
    arguments = foo bar glarch
    # set environment variables for program
    environment = alpha=a;bravo=b;charlie=c
    # directory to change to before running program
    initialdir = /clusterhomes/testuser/condor
    queue
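
    The submission file above queues a single job. As a rough sketch of how several runs might be queued at once, the script below writes a second submission file that uses Condor's $(Process) macro together with a count on the queue line. The file name sweep.cmd and the per-run output names are assumptions made for illustration; they are not part of the UMIACS setup.

```shell
#!/bin/sh
# Sketch: write a submit file that queues three runs of the same executable.
# sweep.cmd and the per-run file names are illustrative assumptions.
cat > sweep.cmd <<'EOF'
# 'program' is assumed to have been relinked with condor_compile
universe   = standard
executable = program
# $(Process) expands to 0, 1, 2 for the three queued jobs
arguments  = $(Process)
output     = program.$(Process).stdout
error      = program.$(Process).stderr
log        = program.log
queue 3
EOF

# Submit only where the Condor tools are installed
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit sweep.cmd
else
    echo "condor_submit not found; wrote sweep.cmd without submitting"
fi
```

    Each queued job writes to its own stdout and stderr files, so the runs do not overwrite one another's output.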

    Finally, submit your job to Condor using condor_submit:

    condor_submit test.cmd
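
    For a job that cannot be relinked, the same steps apply with the vanilla universe selected explicitly. The following is a minimal sketch, assuming a hypothetical Perl script analyze.pl that takes a hypothetical input file input.dat; neither name comes from the cluster configuration.

```shell
#!/bin/sh
# Sketch: submit a Perl script through the vanilla universe.
# analyze.pl, input.dat, and vanilla.cmd are hypothetical names.
cat > vanilla.cmd <<'EOF'
# vanilla universe: no relinking and no checkpointing;
# the job can only run on machines that share the needed file systems
universe   = vanilla
executable = analyze.pl
arguments  = input.dat
output     = analyze.stdout
error      = analyze.stderr
log        = analyze.log
queue
EOF

# Submit only where the Condor tools are installed
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit vanilla.cmd
else
    echo "condor_submit not found; wrote vanilla.cmd without submitting"
fi
```

    The script itself needs a #! line and execute permission, and because vanilla jobs depend on shared file systems, it should live on a filesystem that the execute nodes can see.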

    After your program is submitted, you can use condor_q and condor_status to check how your job is being run.
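
    A typical monitoring session might look like the sketch below. The helper run_if_present is a hypothetical wrapper added here so the script runs harmlessly on a machine without Condor installed; the cluster number 42 is a placeholder.

```shell
#!/bin/sh
# Run a Condor command if it is installed; otherwise report that it was skipped.
# run_if_present is a hypothetical helper, not a Condor tool.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipping: $1 not found on this machine"
    fi
}

run_if_present condor_q         # jobs currently in the local queue
run_if_present condor_status    # machines in the pool and their current state

# To remove a job, pass the cluster number reported by condor_q
# (42 is a placeholder):
# condor_rm 42
```

    condor_q shows where your job sits in the queue, while condor_status shows which machines are available to run it.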

  • Additional Resources

    Condor project website: http://www.cs.wisc.edu/condor

    Condor installation at UMIACS: /opt/stow/condor-6.4.7

 

© Copyright 2003, Institute for Advanced Computer Study, University of Maryland, All rights reserved.