Yes. The execd spool directory will fill up fast with that turned on. Glad you 
found it. 
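
For anyone who finds this thread later, here is a minimal Python sketch of the headroom check. It assumes the 32000-link cap quoted further down applies to the filesystem holding the spool directory. It counts the subdirectories directly rather than trusting the reported link count, since some filesystems report that differently. The function names and the `EXT3_LINK_MAX` constant are made up for illustration and are not part of Grid Engine.

```python
import os

# Assumption: per-inode link cap quoted in this thread for ext2/ext3.
EXT3_LINK_MAX = 32000


def count_subdirs(path):
    """Count the immediate subdirectories of path (symlinks are not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def subdir_headroom(path, link_max=EXT3_LINK_MAX):
    """Return (subdirs, remaining): how many subdirectories exist and how many
    more could be created before the link cap is reached."""
    subdirs = count_subdirs(path)
    # A directory uses 2 links for itself ("." and its entry in the parent),
    # plus 1 for the ".." entry of every subdirectory.
    links_used = subdirs + 2
    return subdirs, link_max - links_used


# Example call, using the spool path from the original report:
# subdirs, remaining = subdir_headroom("/opt/sge/default/spool/c1/active_jobs")
```

A remaining count at or near zero would match the symptom in this thread, where the execd can no longer create a directory for a new job or task.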

Bill

> On Jun 8, 2016, at 7:37 PM, Coleman, Marcus [JRDUS Non-J&J] 
> <[email protected]> wrote:
> 
> Thanks for the response!
> 
> I had forgotten to turn off execd_params = KEEP_ACTIVE.
> (I was having an issue with jobs and needed more information to figure out 
> what was going wrong.)
> 
> 
> 
> -----Original Message-----
> From: William Hay [mailto:[email protected]] 
> Sent: Wednesday, June 08, 2016 2:48 AM
> To: Coleman, Marcus [JRDUS Non-J&J]
> Cc: [email protected]
> Subject: Re: [gridengine users] Queinstance stuck in E
> 
>> On Thu, Jun 02, 2016 at 10:46:46PM +0000, Coleman, Marcus [JRDUS Non-J&J] 
>> wrote:
>>   Hi all
>> 
>> 
>> 
>>   I am having a crazy time fixing an issue with 3 queue instances
>>   stuck in E.
>> 
>> 
>> 
> 
> 
>>   [root@c1 active_jobs]# pwd
>> 
>>   /opt/sge/default/spool/c1/active_jobs
>> 
>>   [root@c1 active_jobs]#
>> 
>> 
>> 
>>   [root@c1 c1]# ls -l
>> 
>>   total 5980
>> 
>>   drwxrwxrwx 32000 sgeadmin sgeadmin  999424 May 30 04:54 active_jobs
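> I suspect that 32000 there may be the problem here.  Apparently Linux 
> artificially caps the maximum number of links to a file at 32000 for ext[23] 
> and possibly other file systems.  A directory's link count is 2 (its own '.' 
> entry plus its entry in the parent) plus one for each subdirectory, so this 
> in turn limits the number of subdirectories in a directory to two fewer, 
> i.e. 31998.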
> 
> https://www.redhat.com/archives/rhl-list/2005-July/msg03301.html
> 
> The obvious question is 'Why do you have 31998 subdirectories in 
> active_jobs?'.  There shouldn't be more than one per job task and hitting 
> 31998 tasks on a node is not what one normally expects.  I would look in 
> active_jobs with ls -al to see what directories are present.  One thing that 
> might cause this is if the spool directory were exported over NFS, in which 
> case NFS may translate attempts to delete files/directories that are still 
> open into a rename to a hidden file.
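
A rough Python sketch of that inspection, for a directory too large to read comfortably with ls -al. It tallies the top-level entries by kind and then walks the tree looking for leftover hidden files. The `.nfs` name prefix is an assumption based on how Linux NFS clients usually name such renamed files, and both function names are hypothetical.

```python
import os
from collections import Counter


def classify_entries(path):
    """Tally the immediate entries of path as 'hidden', 'dir' or 'other'."""
    counts = Counter()
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.name.startswith("."):
                counts["hidden"] += 1
            elif entry.is_dir(follow_symlinks=False):
                counts["dir"] += 1
            else:
                counts["other"] += 1
    return counts


def find_nfs_leftovers(path):
    """Yield paths of files under path whose names start with '.nfs'.

    Assumption: renamed-on-delete files use this prefix.
    """
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.startswith(".nfs"):
                yield os.path.join(root, name)
```

If the second function turns up files inside old job directories, that would suggest the directories are being kept alive by still-open files rather than by running tasks.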
> 
> Once you get the number of hard links/subdirectories down to a more sane 
> number then grid engine should be able to create directories for new 
> jobs/tasks normally and clearing the error state on the queue should stick.
> 
> 
> William
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
