On 31.10.2015 at 19:36, Leior Varon wrote:
> Hi Yuri,
> I think it's best that you be the contact with the forum, but I think it would
> be good to answer something similar to the following:
>
> Hi Reuti,
>
> Thank you for the reply.
> We also noticed that requesting "-q bar@foo" works around this behavior, and
> it looks like we will need to use this method.
>
> There are many cases where this workaround is hard to implement (for
> instance, if a program's default flags request -q bar and the user wants to
> specify the host foo, the default flag has to be removed...)
It should work to mangle this in a JSV (job submission verifier) to request
"-q bar@foo", i.e. read the already set "-q bar", change it by adding the
requested host, and remove "-l h=foo". Even without a default queue, the job
could be submitted to "-q *@foo".

-- 
Reuti

>
> We are still looking for additional fixes/workarounds, and we are also
> interested in any information regarding similar strange behavior.
>
> Thank you,
>
> -----Original Message-----
> From: Reuti [mailto:re...@staff.uni-marburg.de]
> Sent: Friday, October 30, 2015 12:17 PM
> To: Yuri Burmachenko <yur...@mellanox.com>
> Cc: users@gridengine.org; EdaIt <ed...@mellanox.com>; Leior Varon
> <lei...@mellanox.com>
> Subject: Re: [gridengine users] SoGE 8.1.8 - Qsub issue when using request
> able variable and parallel environment - need your help.
>
>
>> On 30.10.2015 at 10:40, Yuri Burmachenko <yur...@mellanox.com> wrote:
>>
>> Hello to the distinguished forum members,
>>
>> Recently we needed to submit jobs where qsub requests both the requestable
>> variable "hostname" and a parallel environment.
>>
>> For example, if we submit an ‘xterm’ job:
>>
>>   $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -l hostname=host_in_grid -pe somePe 1 xterm
>>
>> this kind of request results in strange scheduler behavior; the submission
>> ends up in one of the following states:
>>
>> 1. The xterm job opens as expected.
>> 2. There is a very long delay, and then xterm opens.
>> 3. The job enters the ‘qw’ state with errors similar to the following:
>>      cannot run because it exceeds limit "/////" in rule "some_rule/1"
>>      cannot run in PE "somePe" because it only offers 0 slots
>>
>> In all of the above states, “host_in_grid” has enough free slots, and the
>> quota rule “some_rule” is not related in any way to the
>> consumable/requestable variable in the job submission request.
>> If we remove the “some_rule” quota from the SGE quotas, the error picks up
>> another rule and again states that its limit was exceeded.
>>
>> NOTE: the somePe parallel environment has enough free slots; it is
>> initially defined with 999 slots.
>>
>> Basically, these “cannot run” messages do not reflect the real reason why
>> the job can’t run, since all conditions are actually met. This is very
>> confusing: why does this happen?
>>
>> We also found a workaround without the requestable variable “hostname”,
>> which ALWAYS works:
>>
>>   $SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -q host_in_grid -pe testpe 1 xterm
>>
>> Any ideas why this strange behavior occurs? Is this some kind of bug?
>> How can this be resolved?
>
> Unfortunately I have no idea, but I already observed in former versions
> that instead of:
>
>   -l h=foo -q bar
>
> it's better to request:
>
>   -q bar@foo
>
> Maybe it is similar to the issue you faced.
>
> -- 
> Reuti

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
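Reuti's suggestion to mangle the request in a job submission verifier (JSV) comes down to a small string transformation: every queue in the hard queue request gets the host from "-l h=foo" appended, and an empty queue request becomes "*@foo". The POSIX shell function below is a minimal sketch of only that transformation, not a complete JSV. The name `add_host_to_queues` is hypothetical and not part of Grid Engine, and it assumes that any host or host-group part already present in a queue entry (e.g. `bar@other` or `bar@@group`) should be replaced by the requested host. Reading the queue list and host from the job, writing the result back, and deleting the `h`/`hostname` resource would be the JSV's job and should be checked against the Grid Engine JSV documentation.

```shell
#!/bin/sh
# Sketch only: rewrite a queue list plus a requested host into the
# "queue@host" form recommended instead of "-l h=host -q queue".
# add_host_to_queues is a hypothetical helper, not a Grid Engine command.
#
# Usage: add_host_to_queues QUEUE_LIST HOST
#   QUEUE_LIST  comma-separated queue list as given to -q (may be empty)
#   HOST        host taken from the "-l h=..." / "-l hostname=..." request
# Prints the new value for -q on stdout.
add_host_to_queues() (
   # The body runs in a subshell, so the IFS and "set -f" changes stay local.
   queues=$1
   host=$2

   if [ -z "$queues" ]; then
      # No queue requested: allow any queue instance on that host.
      printf '%s\n' "*@$host"
   else
      set -f      # do not glob-expand wildcards such as "*" in queue names
      IFS=,       # split the queue list on commas
      result=
      for q in $queues; do
         [ -n "$q" ] || continue          # skip empty entries, e.g. "bar,,baz"
         q=${q%%@*}                       # assumption: drop any existing host part
         result=${result:+$result,}$q@$host
      done
      printf '%s\n' "$result"
   fi
)
```

For example, `add_host_to_queues bar foo` prints `bar@foo`, `add_host_to_queues "bar,baz" foo` prints `bar@foo,baz@foo`, and `add_host_to_queues "" foo` prints `*@foo`, which matches the "-q *@foo" case for jobs without a default queue. Running the body in a subshell keeps the `IFS` and `set -f` changes from leaking into the calling script.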