Hi all,

I'd like to have your input on a problem we are facing right now:

We have a small script that parses the SGE (6.2u5) accounting file
and writes the information to an SQL database. We just found what
seems to be a problem in the accounting file. From man 5 accounting:

       ru_wallclock
              Difference between end_time and start_time (see above).

We use that particular field to gather statistics for our users. We
found that when the "failed" field is 37, the ru_wallclock field is
always 0, even though the job did run. We don't yet know exactly
under which circumstances this happens.

Here's one such entry from the accounting file:

h_rt=86400 -pe default 512:0.000000:NONE:0.000000:0:0

And its qacct output:

qname        med
hostname     r104-n7
group        nne-790-01
owner        sboisver12
project      nne-790-ab
department   defaultdepartment
jobname      SRA024407-Ray-1.4.0-k31-group1
jobnumber    2903640
taskid       undefined
account      sge
priority     0
qsub_time    Mon May 30 14:49:45 2011
start_time   Sat Jun  4 09:45:50 2011
end_time     Tue Jun  7 14:19:15 2011
granted_pe   default
slots        512
failed       37  : qmaster enforced h_rt limit
exit_status  0
ru_wallclock 0
ru_utime     1023454.939
ru_stime     617405.204
ru_maxrss    0
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    134261699
ru_majflt    23127
ru_nswap     0
ru_inblock   0
ru_oublock   0
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     23568146
ru_nivcsw    18934035
cpu          0.000
mem          0.000
io           0.000
iow          0.000
maxvmem      0.000
arid         undefined
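
As a stopgap, a parsing script could fall back on the timestamps when
ru_wallclock is 0, since the man page defines the field as end_time
minus start_time. Below is a minimal Python sketch, assuming the qacct
timestamp layout shown above and an English (C) locale for the day and
month names. The helper names are made up for illustration and are not
part of SGE:

```python
from datetime import datetime

# Assumed layout of qacct timestamps, e.g. "Sat Jun  4 09:45:50 2011".
QACCT_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_qacct_time(text):
    """Parse a qacct timestamp into a naive datetime."""
    # Collapse the padding qacct uses for single-digit days ("Jun  4").
    return datetime.strptime(" ".join(text.split()), QACCT_TIME_FORMAT)


def effective_wallclock(ru_wallclock, start_time, end_time):
    """Return ru_wallclock, or end_time - start_time in seconds when it is 0."""
    if ru_wallclock > 0:
        return ru_wallclock
    delta = parse_qacct_time(end_time) - parse_qacct_time(start_time)
    # Guard against negative values if the timestamps are inconsistent.
    return max(int(delta.total_seconds()), 0)


# Values taken from the job shown above.
print(effective_wallclock(0, "Sat Jun  4 09:45:50 2011",
                          "Tue Jun  7 14:19:15 2011"))  # prints 275605
```

For the job above this gives 275605 seconds (3 days, 4:33:25). The
timestamps are treated as naive local times, so a daylight saving
change during a job would skew the result by an hour.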

Has anyone experienced this before? Is this a known "bug/feature"?


Laurent Duchesne
CLUMEQ, Université Laval
