Thanks for the response! I had forgotten to turn off execd_params KEEP_ACTIVE. (I was having an issue with jobs and needed more information to figure out what was going wrong, so the execd was keeping every finished job's active_jobs directory around.)
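For anyone else who lands here: KEEP_ACTIVE lives on the execd_params line of the cluster configuration and can be inspected and cleared with qconf. A sketch (run as an SGE admin user; output will vary by site):

```shell
# Show the current execd_params in the global configuration.
qconf -sconf | grep execd_params

# Open the global configuration in an editor and remove KEEP_ACTIVE,
# or set it explicitly, e.g.:
#   execd_params    KEEP_ACTIVE=FALSE
qconf -mconf
```

With KEEP_ACTIVE disabled, the execd removes each job's spool directory under active_jobs when the job finishes, so the directory count stops growing.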
-----Original Message-----
From: William Hay [mailto:w....@ucl.ac.uk]
Sent: Wednesday, June 08, 2016 2:48 AM
To: Coleman, Marcus [JRDUS Non-J&J]
Cc: users@gridengine.org
Subject: Re: [gridengine users] Queue instance stuck in E

On Thu, Jun 02, 2016 at 10:46:46PM +0000, Coleman, Marcus [JRDUS Non-J&J] wrote:
> Hi all
>
> I am having a crazy time fixing an issue I am having with 3 queue
> instances stuck in E.
>
> [root@c1 active_jobs]# pwd
> /opt/sge/default/spool/c1/active_jobs
> [root@c1 active_jobs]#
>
> [root@c1 c1]# ls -l
> total 5980
> drwxrwxrwx 32000 sgeadmin sgeadmin 999424 May 30 04:54 active_jobs

I suspect that 32000 there may be the problem. Linux artificially caps the maximum number of links to a file at 32000 for ext[23] (and possibly other file systems). Since every subdirectory's ".." entry is a hard link to its parent, this in turn limits the number of subdirectories in a directory to two fewer than the cap:
https://www.redhat.com/archives/rhl-list/2005-July/msg03301.html

The obvious question is: why do you have 31998 subdirectories in active_jobs? There shouldn't be more than one per job task, and hitting 31998 tasks on a node is not what one normally expects. I would look in active_jobs with ls -al to see what directories are present.

One thing that might cause this is the spool directory being exported over NFS, in which case the NFS server may translate attempts to delete files/directories that it thinks are open remotely into a rename to a hidden file.

Once you get the number of hard links/subdirectories down to a more sane number, grid engine should be able to create directories for new jobs/tasks normally, and clearing the error state on the queue should stick.

William
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
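William's link-count arithmetic can be checked on any throwaway directory rather than the live spool. A minimal sketch, assuming GNU coreutils on an ext-family filesystem (the job1/job2/job3 names are just placeholders):

```shell
# On ext2/ext3/ext4 a directory's hard-link count is
# 2 + number of subdirectories ("." plus each child's "..").
# ext2/ext3 cap the link count at 32000, hence the 31998-subdirectory
# limit that active_jobs apparently hit.
tmp=$(mktemp -d)
mkdir "$tmp/job1" "$tmp/job2" "$tmp/job3"

stat -c '%h' "$tmp"   # GNU stat; 2 + 3 = 5 on ext*/tmpfs (btrfs always reports 1)
find "$tmp" -mindepth 1 -maxdepth 1 -type d | wc -l   # counts the 3 subdirectories

rm -rf "$tmp"
```

If the spool really is NFS-mounted, the hidden files left by the server's "silly rename" behaviour are conventionally named `.nfsXXXX`, so something like `find active_jobs -name '.nfs*'` should reveal them.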