Yes. The execd spool directory will fill up fast with that turned on. Glad you found it.
Bill

Sent from my iPad - cell: 647-974-2841

> On Jun 8, 2016, at 7:37 PM, Coleman, Marcus [JRDUS Non-J&J]
> <[email protected]> wrote:
>
> Thanks for the response!
>
> I had forgotten to turn off execd_params = Keep_active.
> (I was having an issue with jobs and needed more information to figure out
> what was going wrong.)
>
> -----Original Message-----
> From: William Hay [mailto:[email protected]]
> Sent: Wednesday, June 08, 2016 2:48 AM
> To: Coleman, Marcus [JRDUS Non-J&J]
> Cc: [email protected]
> Subject: Re: [gridengine users] Queinstance stuck in E
>
>> On Thu, Jun 02, 2016 at 10:46:46PM +0000, Coleman, Marcus [JRDUS Non-J&J]
>> wrote:
>> Hi all
>>
>> I am having a crazy time fixing an issue I am having with 3 qinstances
>> stuck in E.
>>
>> [root@c1 active_jobs]# pwd
>> /opt/sge/default/spool/c1/active_jobs
>> [root@c1 active_jobs]#
>>
>> [root@c1 c1]# ls -l
>> total 5980
>> drwxrwxrwx 32000 sgeadmin sgeadmin 999424 May 30 04:54 active_jobs
>
> I suspect that 32000 there may be the problem here. Apparently Linux
> artificially caps the maximum number of links to a file at 32000 for ext[23]
> and possibly other file systems. This in turn limits the number of
> subdirectories in a directory to two fewer than that (31998), because the
> directory's own "." entry and its entry in the parent each count as a link.
>
> https://www.redhat.com/archives/rhl-list/2005-July/msg03301.html
>
> The obvious question is 'Why do you have 31998 subdirectories in
> active_jobs?'. There shouldn't be more than one per job task, and hitting
> 31998 tasks on a node is not what one normally expects. I would look in
> active_jobs with ls -al to see what directories are present. One thing that
> might cause this is if the spool directory were exported over NFS, in which
> case the NFS server may translate attempts to delete files/directories that
> it thinks are open remotely into a rename into a hidden file.
>
> Once you get the number of hard links/subdirectories down to a more sane
> number then grid engine should be able to create directories for new
> jobs/tasks normally and clearing the error state on the queue should stick.
>
> William
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
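
For anyone hitting the same thing, the check William suggests (see what is actually in active_jobs, hidden directories included, and how close the count is to the link cap) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name subdir_report and the LINK_MAX constant are made up here, and the 32000 figure is the ext2/ext3 cap quoted in the thread, so it may not hold on other file systems.

```python
import os

# Link cap quoted in the thread for ext2/ext3 (assumption: other file
# systems may use a different limit or none at all).
LINK_MAX = 32000


def subdir_report(path, link_max=LINK_MAX):
    """Count the subdirectories of `path` and compare the count with the cap.

    Hypothetical helper for illustration; it is not part of Grid Engine.
    """
    with os.scandir(path) as entries:
        # Count real directories only; symlinks do not add hard links.
        names = [e.name for e in entries if e.is_dir(follow_symlinks=False)]

    # Hidden directories (names starting with ".") are the ones a plain
    # `ls` would not show, so report them separately.
    hidden = [n for n in names if n.startswith(".")]

    # "." and the directory's entry in its parent each use one link,
    # which leaves link_max - 2 for subdirectories.
    max_subdirs = link_max - 2

    return {
        "subdirs": len(names),
        "hidden": len(hidden),
        "max_subdirs": max_subdirs,
        "full": len(names) >= max_subdirs,
    }
```

On the node from the original post this would be called as subdir_report("/opt/sge/default/spool/c1/active_jobs"). If "full" comes back True, the directory cannot take another subdirectory until old entries are cleaned out.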
