Thank you, Hugh, for the input!

But the issue with the '-soft' option is that it does not enforce the request, 
so we will not necessarily get the required station.
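
One way to at least detect a miss might be to have the job check its own host 
before doing any work. This is only a minimal sketch: the function name 
require_host is hypothetical, and it assumes the job is submitted as a script 
rather than as a bare binary.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the job runs on the requested host.
# $1 = wanted host, $2 = actual host (defaults to the output of `hostname`).
require_host() {
    wanted="$1"
    actual="${2:-$(hostname)}"
    # Compare short names so "hpcc001" also matches "hpcc001.example.com".
    if [ "${actual%%.*}" = "${wanted%%.*}" ]; then
        return 0
    fi
    echo "wanted $wanted but running on $actual" >&2
    return 1
}
```

A job script could then start with something like 
"require_host hpcc001 || exit 1", so a job that landed on the wrong host fails 
early instead of running there unnoticed. This only detects the miss; it does 
not make the scheduler honor the request.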

Thanks.

-----Original Message-----
From: MacMullan, Hugh [mailto:hugh...@wharton.upenn.edu] 
Sent: Friday, October 30, 2015 4:13 PM
To: Reuti <re...@staff.uni-marburg.de>; Yuri Burmachenko <yur...@mellanox.com>
Cc: users@gridengine.org; EdaIt <ed...@mellanox.com>; Leior Varon 
<lei...@mellanox.com>
Subject: RE: [gridengine users] SoGE 8.1.8 - Qsub issue when using request able 
variable and parallel environment - need your help.

I don't know why, but when we do the same, we need to specify '-soft' for the 
'-l hostname=XXXX' request. Like:

qsub -b y -N test -j y -pe somepe 2 -soft -l hostname=hpcc001 hostname

I hope that helps!

-Hugh

-----Original Message-----
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On 
Behalf Of Reuti
Sent: Friday, October 30, 2015 6:17 AM
To: Yuri Burmachenko <yur...@mellanox.com>
Cc: users@gridengine.org; EdaIt <ed...@mellanox.com>; Leior Varon 
<lei...@mellanox.com>
Subject: Re: [gridengine users] SoGE 8.1.8 - Qsub issue when using request able 
variable and parallel environment - need your help.


> On 30.10.2015 at 10:40, Yuri Burmachenko <yur...@mellanox.com> wrote:
> 
> Hello to distinguished forum members,
>  
> Recently we have needed to submit jobs in such a way that qsub requests both 
> the requestable variable hostname and a parallel environment.
>  
> For example, if we submit an ‘xterm’ job:
> $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -l hostname=host_in_grid -pe somePe 1 xterm
>  
> This kind of request makes the scheduler behave strangely. The submission 
> ends up in one of the following states:
>  
> 1. The xterm job opens as expected.
> 2. There is a very long delay and then xterm opens.
> 3. The job enters the ‘qw’ state with an error similar to the one below:
> cannot run because it exceeds limit "/////" in rule "some_rule/1"
> cannot run in PE "somePe" because it only offers 0 slots
>  
> In all of the above cases, “host_in_grid” has enough free slots, and the 
> quota rule “some_rule” is not related in any way to the consumable/requestable 
> variable in the job submission request.
> If we remove the “some_rule” quota from the SGE quotas, the error picks up 
> another rule and again states that its limit was exceeded.
> NOTE: somePe parallel environment has enough free slots – it is initially 
> defined with 999 slots.
>  
> Basically, these “cannot run” messages do not reflect the real reason why the 
> job can’t run, since all conditions are actually met. This is very 
> confusing. Why does this happen?
>  
> We also found a workaround that avoids the requestable variable “hostname”. 
> The command below ALWAYS works:
> $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -q host_in_grid -pe testpe 1 xterm
>  
> Any ideas why this strange behavior occurs? Is this some kind of bug? 
> How can this be resolved?

Unfortunately I have no idea, but I have already observed in former versions 
that instead of:

-l h=foo -q bar

it's better to request:

-q bar@foo

Maybe it is similar to the issue you faced.
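
Applied to the command from the original post, the queue-instance form might 
look like the sketch below. It is a dry run that only prints the qsub line. 
The helper name queue_instance_cmd is made up for illustration, and the 
wildcard queue '*' is an assumption; substitute the real cluster queue name 
if wildcards do not suit your setup.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the qsub line instead of submitting it.
# $1 = host, $2 = PE name, $3 = slot count, remaining args = command to run.
# Assumption: any queue on the host is acceptable, hence the '*' wildcard.
queue_instance_cmd() {
    host="$1"; pe="$2"; slots="$3"
    shift 3
    # The queue instance is quoted so the shell does not expand the '*'.
    printf "qsub -V -cwd -b y -q '*@%s' -pe %s %s %s\n" \
        "$host" "$pe" "$slots" "$*"
}

queue_instance_cmd host_in_grid somePe 1 xterm
```

Unlike '-soft', a '-q queue@host' request is a hard request, so the job should 
either run on that host or stay pending.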

-- Reuti
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
