Yes, I set up local spool directories, and the sgeadmin account can write to 
them. In fact, it is writing to the messages file in that local directory, but 
I still get that error.

$ ll
total 108
drwxr-xr-x 3 sgeadmin sgeadmin  4096 Aug 22 13:44 active_jobs
-rw-r--r-- 1 sgeadmin sgeadmin     6 Jul 10 14:18 execd.pid
drwxr-xr-x 3 sgeadmin sgeadmin  4096 Aug 18 16:58 jobs
drwxr-xr-x 2 sgeadmin sgeadmin  4096 Jul 10 14:18 job_scripts
-rw-r--r-- 1 sgeadmin sgeadmin 93014 Aug 22 13:44 messages
[johnt@BJSMICDS126 BJSMICDS126]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       486G  8.2G  453G   2% /
[johnt@BJSMICDS126 BJSMICDS126]$ qconf -sconf |grep spool
execd_spool_dir              /opt/sge/default/spool
[johnt@BJSMICDS126 BJSMICDS126]$ pwd
/opt/sge/default/spool/BJSMICDS126
[johnt@BJSMICDS126 BJSMICDS126]$



-----Original Message-----
From: Reuti [mailto:[email protected]]
Sent: Monday, August 21, 2017 5:13
To: John_Tai
Cc: [email protected]
Subject: Re: [gridengine users] error reason 1: can not find an unused 
add_grp_id

Hi,

> On 21.08.2017 at 09:18, John_Tai <[email protected]> wrote:
>
> I changed gid_range; it used to be just 20000. Now it's 20000-20200.

Unless you have more than 201 cores per exechost, this is fine.
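
The figure of 201 is simply the number of IDs in the range. A minimal sketch 
for checking this, assuming gid_range is a single value or a single 
"start-end" range as in this thread; the helper name gid_range_size is 
hypothetical and not part of SGE:

```shell
#!/bin/sh
# Hypothetical helper: count the group IDs in a gid_range given either as a
# single value ("20000") or as one "start-end" range ("20000-20200").
gid_range_size() {
    start=${1%-*}   # text before the dash (whole value if there is no dash)
    end=${1#*-}     # text after the dash (whole value if there is no dash)
    echo $((end - start + 1))
}

gid_range_size 20000-20200   # prints 201
```

With the earlier single value 20000 the same helper prints 1, which would be 
consistent with a second job on the same host not finding an unused ID.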


> However, now when I submit a job, the host goes into an error state. I 
> checked the messages log:
>
> 08/21/2017 15:06:55|  main|BJSMICDS126|E|shepherd of job 89.1 exited with 
> exit status = 7
> 08/21/2017 15:06:55|  main|BJSMICDS126|E|can't open pid file 
> "active_jobs/89.1/pid" for job 89.1
>
> There must be another config problem.

Can the exechosts write to the location of the spool directory?
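
One way to test this on an exechost is to try creating a file in the spool 
directory. A minimal sketch, assuming it is run as the SGE admin user; the 
helper name spool_writable and the probe file name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the current user can create a file in
# the given directory. The probe file is removed again afterwards.
spool_writable() {
    probe="$1/.write_probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}
```

For example, `spool_writable /opt/sge/default/spool && echo ok`, run as the 
admin user on each exechost, with the path taken from execd_spool_dir.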

Often it's better to have at least the nodes write to a local location. This 
can even be done after installation: shut down the exechosts, change the spool 
directory setting to a local path on the exechosts (`qconf -mconf`), and create 
that directory, e.g. /var/spool/sge, on each exechost (the exechost-specific 
subdirectory will be created when sge_execd starts up).

https://arc.liv.ac.uk/SGE/howto/nfsreduce.html
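
The directory-creation step above can be sketched as follows. This is only an 
illustration under assumptions: the helper name prepare_local_spool is 
hypothetical, and the path /var/spool/sge and the owner sgeadmin are examples 
taken from this thread, not required values.

```shell
#!/bin/sh
# Hypothetical helper: create a local spool base directory and hand it to the
# SGE admin user. Changing the owner to another user requires root.
prepare_local_spool() {
    dir=$1
    owner=$2
    mkdir -p "$dir" && chown "$owner" "$dir"
}

# Sketch of the surrounding sequence (not run here):
#   1. stop sge_execd on the exechosts
#   2. qconf -mconf   # set execd_spool_dir to the local path
#   3. on each exechost, as root: prepare_local_spool /var/spool/sge sgeadmin
#   4. start sge_execd; it creates the host-specific subdirectory itself
```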

-- Reuti


> Any ideas?
>
>
>
>
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Friday, August 18, 2017 4:37
> To: John_Tai
> Cc: [email protected]
> Subject: Re: [gridengine users] error reason 1: can not find an unused 
> add_grp_id
>
> Hi,
>
> >> On 18.08.2017 at 02:30, John_Tai <[email protected]> wrote:
>>
>> When I submit more than 1 job to a queue, the job is queued even though 
>> there are free slots available. When I check this waiting job status with 
> >> qstat -j I find this error message:
>>
>> error reason    1:          can not find an unused add_grp_id
>>
>> What does it mean?
>
> Each job in SGE gets an additional group ID attached, which enables SGE to 
> track the consumed resources.
>
> What is your setting of:
>
> $ qconf -sconf
> #global:
> …
> gid_range                    20000-20100
>
> > Does this range contain fewer IDs than the number of installed cores per 
> > exechost? As there might be a delay before old group IDs are released 
> > again, it would help to have some more IDs than the real number of cores 
> > (or threads, if you use them).
>
> -- Reuti
> ________________________________
>
> This email (including its attachments, if any) may be confidential and 
> proprietary information of SMIC, and intended only for the use of the named 
> recipient(s) above. Any unauthorized use or disclosure of this email is 
> strictly prohibited. If you are not the intended recipient(s), please notify 
> the sender immediately and delete this email from your computer.
>

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
