Using the Batch Cluster
The FLC batch cluster is part of the common BIRD infrastructure at DESY which makes use of the Sun N1 Grid Engine cluster management software. You may have to contact one of your administrators to enable this resource for your account in the registry.
To use the batch cluster, log in to one of the cluster servers lc3 or lc4 and source the file /usr/sge/default/common/settings.sh in your shell to set up the required environment variables. Then write a shell script which will run on the cluster nodes. It should do whatever is needed for your job (prepare input data, process it, and store output data) and invoke other commands and programs as needed. This script can then be submitted to the cluster as a batch job.
The main commands for job control are qsub (submit a job), qstat (display the job status), qmon (invoke a graphical monitoring tool), and qdel (delete a job). These commands have many options, some of which are:
“-P” – specify your project name (should be “-P flc”)
“-l h_rt=time” – request run time, e.g. “-l h_rt=02:30:00” for 2.5 hours
“-l arch=type” – request a hardware architecture (“type” can be “x86” or “amd64” or “x86|amd64”)
“-l os=type” – request an operating system (“type” can be “sld3” or “sld4” or “sld3|sld4”)
“-w” – specify a validation level, e.g. “-w e” to reject jobs with invalid requests as errors
“-j” – choose whether stderr is merged into stdout, e.g. “-j yes” to merge them
“-m” – notify the job owner by e-mail, e.g. “-m ae” to send an e-mail when the job is aborted or ends
“-M” – specify the mail address to which the notification should be sent
“-v variable=value” – set environment variables for your job
“-cwd” – run the job from the current working directory, so that the log files are written there
Note that you can supply these options either as command line arguments or in the form of special comments beginning with “#$” at the top of your job script.
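As an illustration of the “#$” comment form, a job script might look like the following minimal sketch. The file name job.sh, the mail address, the variable INPUT_NAME and the function process are placeholders chosen for this example, and the resource values are only samples taken from the option list above.

```shell
#!/bin/sh
# job.sh: hypothetical example job script.
# Lines beginning with "#$" are read as qsub options;
# to the shell itself they are ordinary comments.
#$ -P flc
#$ -l h_rt=02:30:00
#$ -l os=sld4
#$ -w e
#$ -j yes
#$ -m ae
#$ -M user@example.org
#$ -cwd

# Placeholder for the real work of the job.
process() {
    echo "processing $1"
}

# INPUT_NAME is an assumed variable; it can be passed with "-v INPUT_NAME=...".
process "${INPUT_NAME:-sample}"
```

Such a script could then be submitted with “qsub -v INPUT_NAME=run42 job.sh”, watched with “qstat”, and removed with “qdel” followed by the job ID that qsub reports.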
There will be a temporary directory $TMPDIR assigned to your job on the worker node. Make sure that you change to that directory at the very beginning of your job script with the command “cd $TMPDIR”. There is no need to retrieve the stdout and stderr output of your job manually – the cluster management software will automatically return the contents of both streams when your job is done.
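The following sketch shows one way a job body could work inside $TMPDIR and save its result files. The function name run_in_scratch, the file name output.txt and the output directory are hypothetical. It also assumes that the temporary directory does not outlive the job, so data files are copied elsewhere before the script ends.

```shell
#!/bin/sh
# Sketch of a job body that works inside the per-job temporary directory.
# run_in_scratch, output.txt and the output directory are hypothetical names.

# The body runs in a subshell "( ... )" so the cd does not affect the caller.
run_in_scratch() (
    output_dir=$1    # should be an absolute path

    # On a worker node TMPDIR is set for the job; fall back to a fresh
    # temporary directory so the sketch can also be tried locally.
    scratch=${TMPDIR:-$(mktemp -d)}
    cd "$scratch" || exit 1

    # Placeholder for the real processing step.
    echo "result" > output.txt

    # Copy data files away before the job ends (assumption: the
    # temporary directory is not kept once the job has finished).
    mkdir -p "$output_dir" && cp output.txt "$output_dir/"
)

# In a real job script, finish with a call such as:
#   run_in_scratch /path/to/your/output/directory
```

Only data files need to be copied like this; anything the job prints to stdout or stderr is handled by the cluster software as described above.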
For further information, see the sge_intro(1) man page and the man pages of the various cluster-related commands, or write to the mailing list sge-users@desy.de.
