Hello, distinguished forum members,

We recently needed to submit jobs in such a way that qsub requests both the
requestable variable "hostname" and a parallel environment.

For example, if we submit an 'xterm' job:

$SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -l hostname=host_in_grid -pe somePe 1 xterm

This kind of request makes the scheduler behave inconsistently. Each
submission ends in one of the following states:

1. The xterm job opens as expected.

2. There is a very long delay, and then the xterm opens.

3. The job enters the 'qw' state with messages similar to these:

cannot run because it exceeds limit "/////" in rule "some_rule/1"

cannot run in PE "somePe" because it only offers 0 slots

In all of the above cases, "host_in_grid" has enough free slots, and the
quota rule "some_rule" is not related in any way to the consumable/requestable
variables in the job submission request.
If we remove the "some_rule" quota from the SGE quotas, the message simply
picks another rule and again states that its limit was exceeded.
NOTE: the somePe parallel environment has enough free slots; it was initially
defined with 999 slots.
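
In case someone wants to check the same things on their side, here is a
minimal sketch of how the scheduler's view of such a pending job could be
collected. The helper names (run, diagnose_job), the DRY_RUN switch and the
job id 12345 are placeholders of ours, not part of SGE; only the qstat, qconf,
qquota and qhost calls are real commands. With DRY_RUN left at its default the
script only prints the commands.

```shell
#!/bin/sh
# Sketch only: collect the scheduler's view of a pending job.
# DRY_RUN=1 (the default here) prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: job id (placeholder), PE name, quota rule set name, host name.
diagnose_job() {
    job_id=$1
    pe=$2
    rule_set=$3
    host=$4
    run qstat -j "$job_id"       # pending reasons recorded for the job
    run qconf -sp "$pe"          # PE definition, including its slot count
    run qconf -srqs "$rule_set"  # the quota rule set named in the message
    run qquota -h "$host"        # quota usage currently counted for the host
    run qhost -q -h "$host"      # queue instances and slot usage on the host
}
```

For the job above this would be called as
"diagnose_job 12345 somePe some_rule host_in_grid", and with DRY_RUN=0 to
actually run the commands.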

Basically, these "cannot run" messages do not reflect the real reason why the
job can't run, since all conditions are actually met. This is very confusing.
Why does this happen?
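
To compare the reasons reported across several attempts, the "cannot run"
lines can be pulled out of the "qstat -j" output. This is only a sketch: the
function name pending_reasons is ours, and it assumes the messages appear in
the text form quoted above.

```shell
#!/bin/sh
# Sketch only: reduce `qstat -j <job_id>` output (read from stdin) to its
# distinct "cannot run" reasons, dropping any leading label or indentation.
pending_reasons() {
    sed -n 's/.*\(cannot run.*\)$/\1/p' | sort -u
}
```

It would be used as "qstat -j 12345 | pending_reasons", where 12345 is a
placeholder job id.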

We also found a workaround that does not use the requestable variable
"hostname" and that ALWAYS works:
$SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y -q host_in_grid -pe testpe 1 xterm
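
To isolate the difference between the two forms, both submissions can be
generated from the same parameters so that only the host selection option
changes. A rough sketch follows; build_qsub is a hypothetical helper of ours,
it only prints the command lines, and it assumes $SGE_ROOT is set as in the
commands above.

```shell
#!/bin/sh
# Sketch only: print a qsub command line for one of the two host selection
# forms, keeping every other option identical.
build_qsub() {
    mode=$1   # "resource" for -l hostname=..., "queue" for -q ...
    host=$2
    pe=$3
    slots=$4
    shift 4   # whatever remains is the command to run, e.g. xterm
    case $mode in
        resource) host_opt="-l hostname=$host" ;;
        queue)    host_opt="-q $host" ;;
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "$SGE_ROOT/bin/lx-amd64/qsub -V -cwd -b y $host_opt -pe $pe $slots $*"
}

build_qsub resource host_in_grid somePe 1 xterm
build_qsub queue    host_in_grid somePe 1 xterm
```

Using the same PE name in both lines keeps the comparison limited to
"-l hostname=" versus "-q".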

Any ideas why this strange behavior occurs? Is this some kind of bug? How
can it be resolved?

Appreciate your help.
Thanks.


Yuri Burmachenko | Sr. Engineer | IT | Mellanox Technologies Ltd.
Work: +972 74 7236386 | Cell +972 54 7542188 |Fax: +972 4 959 3245
