Hi,

On 01.04.2011 at 12:33, lars van der bijl wrote:

> Hey everyone.
> 
> We're having some issues with jobs being killed with exit status 137.

137 = 128 + 9

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     ...

So, the job was killed by SIGKILL. Did you request too small a value for h_vmem or h_rt?
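
The 128 + N arithmetic above can be checked with a few lines of shell. This is only a sketch: `decode_status` is a hypothetical helper name, not part of Grid Engine, and it assumes the status follows the usual shell convention where a value above 128 means the process was ended by signal (status - 128). A program can also return such a value on its own, so treat the result as a hint.

```shell
#!/bin/sh
# Sketch: translate a shell-style exit status into a readable description.
# decode_status is a hypothetical helper, not part of Grid Engine.
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # POSIX `kill -l <exit_status>` prints the signal name without "SIG".
        if name=$(kill -l "$status" 2>/dev/null); then
            echo "killed by signal $signum (SIG$name)"
        else
            echo "killed by signal $signum"
        fi
    else
        echo "exited with code $status"
    fi
}

decode_status 137   # prints: killed by signal 9 (SIGKILL)
decode_status 1     # prints: exited with code 1
```

A wrapper script could call such a helper with `$?` right after the real command and refuse to report success when a signal is detected.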

-- Reuti


> This causes the task to finish and start its dependent task, which is causing 
> all kinds of havoc.
> 
> Submitting a job with a very small max memory limit gives me this, as an 
> example.
> 
> $ qacct -j 21141
> ==============================================================
> qname        test.q              
> hostname     atom12.**
> group        **          
> owner        lars                
> project      NONE                
> department   defaultdepartment   
> jobname      stest__out__geometry2
> jobnumber    21141               
> taskid       101                 
> account      sge                 
> priority     0                   
> qsub_time    Fri Apr  1 11:22:30 2011
> start_time   Fri Apr  1 11:22:31 2011
> end_time     Fri Apr  1 11:22:39 2011
> granted_pe   smp                 
> slots        4                   
> failed       100 : assumedly after job
> exit_status  137                 
> ru_wallclock 8            
> ru_utime     0.281        
> ru_stime     0.167        
> ru_maxrss    3744                
> ru_ixrss     0                   
> ru_ismrss    0                   
> ru_idrss     0                   
> ru_isrss     0                   
> ru_minflt    70739               
> ru_majflt    0                   
> ru_nswap     0                   
> ru_inblock   8                   
> ru_oublock   224                 
> ru_msgsnd    0                   
> ru_msgrcv    0                   
> ru_nsignals  0                   
> ru_nvcsw     1072                
> ru_nivcsw    439                 
> cpu          2.240        
> mem          0.573             
> io           0.145             
> iow          0.000             
> maxvmem      405.820M
> arid         undefined
> 
> Does anyone know of a reason why the task would be killed with this error state, 
> or how to catch it?
> 
> Lars
> 
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

