Hi,

> On 21.08.2017 at 09:18, John_Tai <john_...@smics.com> wrote:
> 
> I changed gid_range, it used to be just 20000. Now it's 20000-20200

Unless you have more than 201 cores per exechost, this is fine.
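
As a rough sanity check, the number of IDs in a gid_range can be counted and compared with the core count of a host. The following is only a sketch: `gid_range_size` and `range_is_sufficient` are hypothetical helper names, the headroom of 10 extra IDs is an arbitrary assumption rather than an SGE recommendation, and the parsing assumes the value is either a single ID or a comma-separated list of n-m ranges.

```python
import os


def gid_range_size(gid_range: str) -> int:
    """Count the group IDs covered by a gid_range value such as '20000-20200'."""
    total = 0
    for part in gid_range.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            if high < low:
                raise ValueError(f"descending range: {part!r}")
            total += high - low + 1  # both ends are inclusive
        else:
            int(part)  # raises ValueError if the entry is not a number
            total += 1
    return total


def range_is_sufficient(gid_range: str, cores: int, headroom: int = 10) -> bool:
    """True if the range holds at least `cores + headroom` IDs (headroom is an assumption)."""
    return gid_range_size(gid_range) >= cores + headroom


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cores of the machine this runs on
    for value in ("20000", "20000-20200"):
        print(value, gid_range_size(value), range_is_sufficient(value, cores))
```

With the old setting of just 20000 the range holds a single ID, which would match the symptom of only one job starting per host; 20000-20200 holds 201.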


> However now when I submit a job the host goes in error state. I checked the 
> messages log:
> 
> 08/21/2017 15:06:55|  main|BJSMICDS126|E|shepherd of job 89.1 exited with exit status = 7
> 08/21/2017 15:06:55|  main|BJSMICDS126|E|can't open pid file "active_jobs/89.1/pid" for job 89.1
> 
> There must be another config problem.

Can the exechosts write to the location of the spool directory?
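
One way to test this directly on an exechost is to try creating a file in the spool location. This is a minimal sketch, not an SGE tool: `can_write` is a hypothetical helper, /var/spool/sge is only an example default, and it should be run as the account that actually writes the spool files (presumably the SGE admin user) for the result to be meaningful.

```python
import os
import sys
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a file can be created and removed inside `directory`."""
    if not os.path.isdir(directory):
        return False
    try:
        # Creating a real file also catches read-only or root-squashed NFS
        # mounts, which a plain permission-bit check could miss.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="sge_write_test_"):
            pass
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Pass the spool directory as the first argument; the default is an example path.
    spool_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/spool/sge"
    print(spool_dir, "writable" if can_write(spool_dir) else "NOT writable")
```

The directory to test is the one shown as execd_spool_dir in the output of `qconf -sconf`.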

Often it's better to have at least the nodes write to a local directory. This 
can even be changed after installation: shut down the exechosts, point the 
spool directory setting (`execd_spool_dir` in `qconf -mconf`) to a local path 
on the exechosts, create that directory on each node (e.g. /var/spool/sge), and 
start the execds again. The exechost-specific subdirectory will be created when 
sge_execd starts up.

https://arc.liv.ac.uk/SGE/howto/nfsreduce.html

-- Reuti


> Any ideas?
> 
> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Friday, August 18, 2017 4:37
> To: John_Tai
> Cc: users@gridengine.org
> Subject: Re: [gridengine users] error reason 1: can not find an unused 
> add_grp_id
> 
> Hi,
> 
>> On 18.08.2017 at 02:30, John_Tai <john_...@smics.com> wrote:
>> 
>> When I submit more than 1 job to a queue, the job is queued even though 
>> there are free slots available. When I check this waiting job status with 
>> qstat -j I find this error message:
>> 
>> error reason    1:          can not find an unused add_grp_id
>> 
>> What does it mean?
> 
> Each job in SGE gets an additional group ID attached, which enables SGE to 
> track the consumed resources.
> 
> What is your setting of:
> 
> $ qconf -sconf
> #global:
> …
> gid_range                    20000-20100
> 
> Does this range in your case contain fewer IDs than the number of installed 
> cores per exechost? As there might be a delay before old group IDs are 
> released again, it would help to have some more IDs than the real number of 
> cores (or threads, in case you use them).
> 
> -- Reuti

