I changed gid_range; it used to be just 20000. Now it's 20000-20200.

However, now when I submit a job, the host goes into an error state. I checked the 
messages log:

08/21/2017 15:06:55|  main|BJSMICDS126|E|shepherd of job 89.1 exited with exit status = 7
08/21/2017 15:06:55|  main|BJSMICDS126|E|can't open pid file "active_jobs/89.1/pid" for job 89.1

There must be another config problem.

Any ideas?





-----Original Message-----
From: Reuti [mailto:[email protected]]
Sent: Friday, August 18, 2017 4:37
To: John_Tai
Cc: [email protected]
Subject: Re: [gridengine users] error reason 1: can not find an unused 
add_grp_id

Hi,

> On 18.08.2017 at 02:30, John_Tai <[email protected]> wrote:
>
> When I submit more than one job to a queue, the job is queued even though there 
> are free slots available. When I check this waiting job's status with qstat -j, 
> I find this error message:
>
> error reason    1:          can not find an unused add_grp_id
>
> What does it mean?

Each job in SGE gets an additional group ID attached, which enables SGE to 
track the consumed resources.

What is your setting of:

$ qconf -sconf
#global:
…
gid_range                    20000-20100

Is this range in your case smaller than the number of installed cores per 
exechost? As there might be a delay before old group IDs are released again, it 
would help to have somewhat more IDs than the actual number of cores (or threads, 
if you use them).

-- Reuti
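
The sizing advice above can be checked with a few lines of shell. This is only a rough sketch, not part of SGE: the helper names gid_range_size and gid_range_ok are hypothetical, and it assumes gid_range holds a single value or a single n-m range (a comma-separated list of ranges is not handled).

```shell
#!/bin/sh
# Sketch: compare the size of an SGE gid_range with a host's core count.
# gid_range_size and gid_range_ok are hypothetical helper names, not SGE commands.
# Assumption: the range is a single "n" or "n-m" expression.

# Print how many group IDs a range such as "20000-20100" contains.
# A single value such as "20000" counts as one ID.
gid_range_size() {
    range=$1
    first=${range%%-*}   # text before the first "-" (whole string if no "-")
    last=${range##*-}    # text after the last "-" (whole string if no "-")
    echo $(( last - first + 1 ))
}

# Succeed when the range holds more IDs than the host has cores (or threads).
gid_range_ok() {
    [ "$(gid_range_size "$1")" -gt "$2" ]
}

# Possible use on an exechost (left commented out because it needs a running cell):
# range=$(qconf -sconf | awk '$1 == "gid_range" { print $2 }')
# gid_range_ok "$range" "$(getconf _NPROCESSORS_ONLN)" || echo "gid_range too small"
```

With these helpers, a gid_range of just 20000 yields a size of 1, which matches the symptom of a second job waiting for an unused add_grp_id, while 20000-20200 yields 201.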
