I changed gid_range; it used to be just 20000, and now it's 20000-20200. However, when I submit a job now, the host goes into an error state. I checked the messages log:

08/21/2017 15:06:55|  main|BJSMICDS126|E|shepherd of job 89.1 exited with exit status = 7
08/21/2017 15:06:55|  main|BJSMICDS126|E|can't open pid file "active_jobs/89.1/pid" for job 89.1

There must be another config problem. Any ideas?

-----Original Message-----
From: Reuti [mailto:[email protected]]
Sent: Friday, August 18, 2017 4:37
To: John_Tai
Cc: [email protected]
Subject: Re: [gridengine users] error reason 1: can not find an unused add_grp_id

Hi,

> On 18.08.2017 at 02:30, John_Tai <[email protected]> wrote:
>
> When I submit more than 1 job to a queue, the job is queued even though there
> are free slots available. When I check this waiting job status with qstat -j
> I find this error message:
>
> error reason 1: can not find an unused add_grp_id
>
> What does it mean?

Each job in SGE gets an additional group ID attached, which enables SGE to track the consumed resources. What is your setting of:

$ qconf -sconf
#global:
…
gid_range                    20000-20100

Is this range in your case lower than the number of installed cores per exechost? As there might be a delay when old group IDs are released again, it would help to have some more IDs than the real number of cores (resp. threads in case you use them).

-- Reuti
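
To make the sizing advice above concrete, here is a minimal shell sketch that counts how many group IDs a gid_range value provides, so the result can be compared with the number of cores on an exechost. The helper name gid_range_size is hypothetical and not part of SGE; the sketch assumes the value is a single ID, one n-m range, or a comma-separated list of these.

```shell
#!/bin/sh
# Count the group IDs covered by a gid_range value.
# Hypothetical helper (not an SGE command). Accepts a single ID ("20000"),
# a range ("20000-20200"), or a comma-separated list of either.
# The body runs in a subshell so the IFS change does not leak to the caller.
gid_range_size() (
    total=0
    IFS=,
    for part in $1; do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # "n-m": split at the dash
            *)   lo=$part; hi=$part ;;             # single ID counts as one
        esac
        total=$((total + hi - lo + 1))
    done
    echo "$total"
)

# Only query the live configuration when qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    range=$(qconf -sconf | awk '$1 == "gid_range" { print $2 }')
    echo "gid_range $range covers $(gid_range_size "$range") IDs"
fi
```

With the values from this thread, gid_range_size 20000 gives 1, which would be consistent with a second job finding no unused add_grp_id, while gid_range_size 20000-20200 gives 201.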
