Thank you, Hugh, for the input! But the issue with the '-soft' option is that it does not enforce the request, so we will not necessarily get the required host.
Thanks.

-----Original Message-----
From: MacMullan, Hugh [mailto:hugh...@wharton.upenn.edu]
Sent: Friday, October 30, 2015 4:13 PM
To: Reuti <re...@staff.uni-marburg.de>; Yuri Burmachenko <yur...@mellanox.com>
Cc: users@gridengine.org; EdaIt <ed...@mellanox.com>; Leior Varon <lei...@mellanox.com>
Subject: RE: [gridengine users] SoGE 8.1.8 - Qsub issue when using request able variable and parallel environment - need your help.

I don't know why, but when we do the same, we need to specify '-soft' for the '-l hostname=XXXX' request, like:

qsub -b y -N test -j y -pe somepe 2 -soft -l hostname=hpcc001 hostname

I hope that helps!

-Hugh

-----Original Message-----
From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On Behalf Of Reuti
Sent: Friday, October 30, 2015 6:17 AM
To: Yuri Burmachenko <yur...@mellanox.com>
Cc: users@gridengine.org; EdaIt <ed...@mellanox.com>; Leior Varon <lei...@mellanox.com>
Subject: Re: [gridengine users] SoGE 8.1.8 - Qsub issue when using request able variable and parallel environment - need your help.

> On 30.10.2015 at 10:40, Yuri Burmachenko <yur...@mellanox.com> wrote:
>
> Hello, distinguished forum members,
>
> We recently needed to submit jobs so that the qsub request asks for both the requestable variable hostname and a parallel environment.
>
> For example, if we submit an 'xterm' job:
>
>   $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -l hostname=host_in_grid -pe somePe 1 xterm
>
> this kind of request causes strange scheduler behavior. The submission ends up in one of the following states:
>
> 1. The xterm job opens as expected.
> 2. There is a very long delay, and then the xterm opens.
> 3. The job enters the 'qw' state with errors similar to the ones below:
>
>    cannot run because it exceeds limit "/////" in rule "some_rule/1"
>    cannot run in PE "somePe" because it only offers 0 slots
>
> In all of the above states, "host_in_grid" has enough free slots, and the quota rule "some_rule" is not related in any way to the consumable/requestable variable in the job submission request.
> If we remove the "some_rule" quota from the SGE quotas, the error picks up another rule and again states that its limit was exceeded.
> NOTE: the somePe parallel environment has enough free slots; it is initially defined with 999 slots.
>
> Basically, these "cannot run" messages do not reflect the real reason why the job can't run, since all conditions are actually met. This is very confusing. Why does this happen?
>
> We also found a workaround without the requestable variable "hostname", like the one below, which ALWAYS works:
>
>   $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -q host_in_grid -pe testpe 1 xterm
>
> Any ideas why this strange behavior occurs? Is this some kind of bug? How can it be resolved?

Unfortunately I have no idea, but I already observed in former versions that instead of:

-l h=foo -q bar

it's better to request:

-q bar@foo

Maybe it is similar to the issue you faced.

--
Reuti

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users