NAF for Dummies
The National Analysis Facility (NAF) is set up in the framework of the Helmholtz Alliance "Physics at the TeraScale". It is intended as the analysis platform for the LHC and ILC experiments. Documentation on how to use it is available on the Terascale pages. Here I will try to give a short summary of the most important steps.
Getting Permissions
First of all, you need a Grid certificate. And, as usage is restricted to German universities and institutions, you must be a member of the German group of a Virtual Organisation (VO), e.g. /ilc/de.
Unfortunately, that's not enough. In addition, you have to register to be able to log in to the non-Grid part of the NAF. But this is quite easy: open the "Getting a NAF Account" page of the NAF documentation website and click on the registration form. If this does not work for you, just write an email with
- Lastname
- Firstname
- Email address
- Desired account name (4 to 8 letters, lowercase only)
- Experiment
- Certificate DN (like /O=GermanGrid/OU=Institute/CN=Donald Duck)
- Do you already have another account?
to naf-helpdesk@desy.de. Usually you should get a reply within a few hours. You can find out your certificate DN with the following command:
openssl x509 -in ~/.globus/usercert.pem -noout -subject
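Recent OpenSSL releases (1.1.0 and later) print the subject by default as comma-separated fields (O = GermanGrid, OU = ..., CN = ...) rather than in the slash form the registration asks for. A minimal sketch of a helper that requests the old slash format with -nameopt compat; the function name cert_dn is hypothetical, and it assumes the default certificate location ~/.globus/usercert.pem used above.

```shell
# Print the DN of a Grid certificate in the /O=.../OU=.../CN=... form.
# "-nameopt compat" selects OpenSSL's old slash-separated output;
# sed strips the leading "subject=" label (with or without a space).
cert_dn() {
  local cert="${1:-$HOME/.globus/usercert.pem}"
  openssl x509 -in "$cert" -noout -subject -nameopt compat | sed 's/^subject= *//'
}
```

Calling cert_dn without arguments prints the DN of your own certificate, ready to paste into the email.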
Login to a NAF Workgroupserver
Once you have the permissions, just type
$ . /afs/desy.de/project/glite/UI/etc/profile.d/grid-env.sh
$ voms-proxy-init -rfc -debug --valid 365:0
...
Enter GRID pass phrase:
...
$ gsissh -Y ilc.naf.desy.de
[username@tcx042]~%
And that's it! Congratulations, you made it to the NAF.
(Note: currently the grid-env.sh script checks if your operating system is SLD5. If you are using SLD6, you can still use the same gLite build - just make your own copy of that script and remove the parts with the echo $os | egrep "5." check.)
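If you log in often, you may not want to type your pass phrase every time. A sketch of a shell function (e.g. for your ~/.bashrc) that wraps the login steps above and only runs voms-proxy-init when no proxy with enough lifetime is left. The function name naf_login and the 12-hour threshold are hypothetical, and the -exists and -valid options of voms-proxy-info are an assumption here, so check voms-proxy-info --help on your user interface first.

```shell
# Hypothetical login helper; script path and host name are taken from the steps above
GRID_ENV=/afs/desy.de/project/glite/UI/etc/profile.d/grid-env.sh
NAF_HOST=ilc.naf.desy.de

naf_login() {
  # Set up the gLite user interface (only possible where AFS is mounted)
  if [ -r "$GRID_ENV" ]; then
    . "$GRID_ENV"
  fi
  # Assumption: "voms-proxy-info -exists -valid H:M" returns 0 if a proxy
  # valid for at least H:M already exists; only otherwise ask for the pass phrase
  if ! voms-proxy-info -exists -valid 12:00 >/dev/null 2>&1; then
    voms-proxy-init -rfc --valid 365:0 || return 1
  fi
  gsissh -Y "$NAF_HOST"
}
```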
You can check which workgroup servers are available, and what they offer, with
$ wgsinfo
If you want to access the DESY AFS space, you have to run
$ klog username@desy.de
Password:
and enter your DESY password.
Using the NAF
The NAF offers a large scratch disk called Lustre, which can be used by everyone and accessed from any NAF workgroup server. You can create your own directory
$ cd /scratch/hh/lustre/ilc
$ mkdir username
and copy everything you need to run your jobs there, e.g. with cp from your AFS directory. But be aware that it is a scratch area: there is no guarantee that your data will be kept, and in particular there is no backup.
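The two steps above (creating your directory and copying files into it) can be wrapped in a small function. A sketch, assuming the /scratch/hh/lustre/ilc area from above and your login name as directory name; stage_to_lustre and the LUSTRE_BASE override are hypothetical.

```shell
# Copy job inputs (libraries, steering files, ...) to your own Lustre directory.
# LUSTRE_BASE can be overridden, e.g. for another experiment's area.
LUSTRE_BASE="${LUSTRE_BASE:-/scratch/hh/lustre/ilc}"

stage_to_lustre() {
  local dest="$LUSTRE_BASE/${USER:-$(id -un)}"
  mkdir -p "$dest" || return 1
  # -r allows whole directories, -p keeps timestamps and permissions
  cp -rp "$@" "$dest"/ || return 1
  # print the target directory so scripts can reuse it
  echo "$dest"
}
```

For example, stage_to_lustre ~/lib/myPrivateLib.so mySteering.xml copies both files and prints the directory they ended up in.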
There are two ways to use the NAF: as a local batch cluster, or in interplay with the Grid. If you want to use the NAF as a local batch system, just log in to a workgroup server as described above. Some useful commands are:
- qmon - to get a graphical interface
- qsub - to submit a job
- qstat -u yourUserName - to get the status of your jobs
- qstat -r - to get the full job name as well as the requested job resources as output; of course this can be combined with the -u <username> option and/or other options
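A job for qsub is an ordinary shell script. A minimal sketch, assuming the batch system is Grid Engine (which the qmon and qstat commands above suggest); the script name myjob.sh, the job name and the job_main function are hypothetical. Lines starting with #$ are read by qsub as submit options and are plain comments to bash, so the script can also be tried out by hand.

```shell
#!/bin/bash
# myjob.sh - minimal Grid Engine job script (sketch). Submit with: qsub myjob.sh
#$ -S /bin/bash
#$ -cwd
#$ -N myjob

job_main() {
  # JOB_ID is set by Grid Engine; it is empty when the script is run by hand
  echo "job ${JOB_ID:-interactive} on ${HOSTNAME:-unknown} in $PWD"
  # the actual work goes here, e.g. a Marlin call with your steering file
}

job_main "$@"
```

After qsub myjob.sh, qstat -u yourUserName shows the job; -cwd makes it start in the directory you submitted from.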
The job status can be one of:
- d(eletion)
- E(rror) -> find out the reason with qstat -j job_list
- h(old)
- r(unning)
- R(estarted)
- s(uspended)
- S(uspended)
- t(ransferring)
- T(hreshold)
- w(aiting)
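With many jobs in the queue, it helps to summarise the qstat listing by state and to pick out the jobs in an error state for qstat -j. A sketch with awk, assuming the default qstat layout (two header lines, state in the fifth column, e.g. r, qw or Eqw); the function names are hypothetical.

```shell
# Count jobs per state in "qstat -u yourUserName" output read from stdin
count_job_states() {
  awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# Print the IDs of jobs whose state contains E (error), one per line
list_error_jobs() {
  awk 'NR > 2 && $5 ~ /E/ { print $1 }'
}
```

For example, qstat -u yourUserName | count_job_states gives one line per state, and for each ID printed by qstat -u yourUserName | list_error_jobs, qstat -j ID tells you why it failed.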
Running Marlin
Running Marlin jobs on the NAF is not very challenging. Just copy all your libraries, steering files, etc. to Lustre and run them as you would on your own PC.
As on any other machine, it's very important, and often the most challenging task, to set the proper environment, e.g.
ILC_SOFT=/afs/desy.de/group/it/ilcsoft/v01-05
export MYSQL_PATH=${ILC_SOFT}/mysql/5.0.26
export LD_PRELOAD=/afs/desy.de/group/it/ilcsoft/dcap/lib/libpdcap.so
export ROOTSYS="/afs/desy.de/group/it/ilcsoft/v01-04/root/5.16.00"
And don't forget to adapt LD_LIBRARY_PATH and PATH, as well as to tell MARLIN_DLL where to find the required libraries. These can come from the AFS installation
export MARLIN_DLL=$MARLIN_DLL:${ILC_SOFT}/MarlinReco/v00-10-04/lib/libMarlinReco.so
or from some private library stored on Lustre
export MARLIN_DLL=$MARLIN_DLL:/scratch/hh/lustre/ilc/myUserName/myPrivateLib.so
Just copy the environment script you run before executing Marlin on your own machine, and make sure all private libraries needed by your process are really available under the paths you specified.
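Two points from the paragraph above can be sketched in shell: extending PATH and LD_LIBRARY_PATH, shown here only for ROOT's standard bin/ and lib/ directories (the other ilcsoft packages need their own entries, which depend on the installation), and checking that every library listed in MARLIN_DLL really exists before Marlin starts. The function name check_marlin_dll is hypothetical.

```shell
# Add ROOT's executables and libraries if ROOTSYS is set (standard ROOT layout)
if [ -n "${ROOTSYS:-}" ]; then
  export PATH="$ROOTSYS/bin:$PATH"
  export LD_LIBRARY_PATH="$ROOTSYS/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi

# Return non-zero and list the offenders if a MARLIN_DLL entry is not readable
check_marlin_dll() {
  local lib missing=0
  local IFS=':'
  for lib in ${MARLIN_DLL:-}; do
    # skip the empty entry produced by a leading ':'
    [ -z "$lib" ] && continue
    if [ ! -r "$lib" ]; then
      echo "missing library: $lib" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In a job script, check_marlin_dll && Marlin mySteering.xml then only starts Marlin when all libraries are in place (the steering file name is just an example).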
Pimp My Job
If your job has only moderate resource requirements (<1500 MB RAM or <48 hours CPU time), you can reduce its waiting time in the queue by specifying them with the -l option of qsub; see the NAF batch documentation for the resource names to use.
The output (stdout and stderr) of your jobs is by default written to files in your NAF home directory. This becomes a bottleneck when you submit many jobs at once, since they all access the same AFS directory more or less simultaneously. The output can, however, be redirected elsewhere with the -o and -e options of qsub.
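A sketch of a submit wrapper covering both points above: it requests moderate resources with -l and sends stdout and stderr to a log directory on Lustre instead of the AFS home directory. The resource names h_vmem (memory) and h_cpu (CPU time) are the standard Grid Engine ones, but whether the NAF queue selection uses exactly these, and the example values, are assumptions. naf_qsub, LOG_DIR and the QSUB override (which allows a dry run with QSUB=echo) are hypothetical.

```shell
LOG_DIR="${LOG_DIR:-/scratch/hh/lustre/ilc/${USER:-$(id -un)}/logs}"

naf_qsub() {
  mkdir -p "$LOG_DIR" || return 1
  # Assumed resource names, example values below the limits quoted above.
  # With a directory as -o/-e argument, Grid Engine writes
  # <jobname>.o<jobid> and <jobname>.e<jobid> into it.
  ${QSUB:-qsub} -l h_vmem=1000M -l h_cpu=24:00:00 \
    -o "$LOG_DIR" -e "$LOG_DIR" "$@"
}
```

Usage: naf_qsub myjob.sh; with QSUB=echo naf_qsub myjob.sh you only see the command line that would be submitted.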
If you want to give your job an advance warning before it is terminated when a resource limit is reached, you can submit the job with the option -notify. This will send the signal USR2 to your job one minute before the termination, which you can trap.
(Short example: add the line trap "my_finishing_function" USR2 to your submit script to execute some end-of-job function, like copying intermediate results, before the job is killed).
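A slightly longer sketch of the same idea. Bash only runs a trap after the current foreground command has finished, so if the job script waits for a long program in the foreground, the trap may run too late. Starting the program in the background and using wait avoids this: wait returns as soon as a trapped signal arrives, the trap runs, and the script can keep waiting. RESULT_DIR, WORK_DIR, run_payload and the *.slcio output pattern are hypothetical; my_finishing_function is the name from the example above.

```shell
#!/bin/bash
#$ -notify
# Hypothetical locations: where the job writes its output, and where to save it
WORK_DIR="${WORK_DIR:-$PWD}"
RESULT_DIR="${RESULT_DIR:-/scratch/hh/lustre/ilc/${USER:-$(id -un)}/results}"

my_finishing_function() {
  echo "USR2 received, saving intermediate results to $RESULT_DIR" >&2
  mkdir -p "$RESULT_DIR"
  # assumes the payload writes LCIO files (*.slcio) into WORK_DIR
  cp -p "$WORK_DIR"/*.slcio "$RESULT_DIR"/ 2>/dev/null || true
}
trap "my_finishing_function" USR2

# Run a command in the background and wait for it. If USR2 interrupts
# "wait", the trap runs and we go on waiting until the command is gone.
run_payload() {
  "$@" &
  local pid=$!
  local status
  while true; do
    status=0
    wait "$pid" || status=$?
    # still alive means wait was only interrupted by the signal
    kill -0 "$pid" 2>/dev/null || return "$status"
  done
}
```

In the job script, the payload is then started with e.g. run_payload Marlin mySteering.xml (steering file name hypothetical).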
