Thanks for the response! I had forgotten to turn off execd_params KEEP_ACTIVE. (I was having an issue with jobs and needed more information to figure out what was going wrong, so the execd was keeping every finished job's active_jobs directory around.)
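For anyone else who lands here: KEEP_ACTIVE lives on the execd_params line of the cluster configuration and can be inspected and cleared with qconf. A sketch (run as an SGE admin user; output will vary by site):

```shell
# Show the current execd_params in the global configuration.
qconf -sconf | grep execd_params

# Open the global configuration in an editor and remove KEEP_ACTIVE,
# or set it explicitly, e.g.:
#   execd_params    KEEP_ACTIVE=FALSE
qconf -mconf
```

With KEEP_ACTIVE disabled, the execd removes each job's spool directory under active_jobs when the job finishes, so the directory count stops growing.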
-----Original Message-----
From: William Hay [mailto:w....@ucl.ac.uk]
Sent: Wednesday, June 08, 2016 2:48 AM
To: Coleman, Marcus [JRDUS Non-J&J]
Cc: users@gridengine.org
Subject: Re: [gridengine users] Queue instance stuck in E

On Thu, Jun 02, 2016 at 10:46:46PM +0000, Coleman, Marcus [JRDUS Non-J&J] wrote:
> Hi all
>
> I am having a crazy time fixing an issue I am having with 3 queue
> instances stuck in E.
>
> [root@c1 active_jobs]# pwd
> /opt/sge/default/spool/c1/active_jobs
> [root@c1 active_jobs]#
>
> [root@c1 c1]# ls -l
> total 5980
> drwxrwxrwx 32000 sgeadmin sgeadmin 999424 May 30 04:54 active_jobs

I suspect that 32000 there may be the problem. Linux artificially caps the maximum number of links to a file at 32000 for ext[23] (and possibly other file systems). Since every subdirectory's ".." entry is a hard link to its parent, this in turn limits the number of subdirectories in a directory to two fewer than the cap:
https://www.redhat.com/archives/rhl-list/2005-July/msg03301.html

The obvious question is: why do you have 31998 subdirectories in active_jobs? There shouldn't be more than one per job task, and hitting 31998 tasks on a node is not what one normally expects. I would look in active_jobs with ls -al to see what directories are present.

One thing that might cause this is the spool directory being exported over NFS, in which case the NFS server may translate attempts to delete files/directories that it thinks are open remotely into a rename to a hidden file.

Once you get the number of hard links/subdirectories down to a more sane number, grid engine should be able to create directories for new jobs/tasks normally, and clearing the error state on the queue should stick.

William
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
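William's link-count arithmetic can be checked on any throwaway directory rather than the live spool. A minimal sketch, assuming GNU coreutils on an ext-family filesystem (the job1/job2/job3 names are just placeholders):

```shell
# On ext2/ext3/ext4 a directory's hard-link count is
# 2 + number of subdirectories ("." plus each child's "..").
# ext2/ext3 cap the link count at 32000, hence the 31998-subdirectory
# limit that active_jobs apparently hit.
tmp=$(mktemp -d)
mkdir "$tmp/job1" "$tmp/job2" "$tmp/job3"

stat -c '%h' "$tmp"   # GNU stat; 2 + 3 = 5 on ext*/tmpfs (btrfs always reports 1)
find "$tmp" -mindepth 1 -maxdepth 1 -type d | wc -l   # counts the 3 subdirectories

rm -rf "$tmp"
```

If the spool really is NFS-mounted, the hidden files left by the server's "silly rename" behaviour are conventionally named `.nfsXXXX`, so something like `find active_jobs -name '.nfs*'` should reveal them.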