Re: [gridengine users] queue instance "all.q@comp065.local" dropped because it is temporarily not available

2014-10-21 Thread Waleed Lutfi
Update: Going through the spool messages of comp065, I found this message: 10/21/2014 14:48:34| main|comp065|E|can't start job "155": can't open file /opt/gridengine/default/spool/comp065/active_jobs/155.1/pe_hostfile: No such file or Note that the spool directory is a mounted NFS directory. I tried

[gridengine users] queue instance "all.q@comp065.local" dropped because it is temporarily not available

2014-10-21 Thread Waleed Lutfi
Dear all, I am currently configuring Grid Engine on a fresh install of a Rocks cluster. I have 3 compute nodes. Whenever I submit a job, it only runs on 1 of the nodes, and the jobs on the other nodes hang in the 't' state. Running 'qconf -tsm', I get the following log: Tue Oct 21 14:36:49 2014|

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)
> "schedd_job_info" is switched on then? But even if switched off it should show up in `qalter -w p`.
Yes, it is on.
> And 24 slots per machine then - `qstat -g c` reveals the slots as being free too?
Good question: it reveals that of the 72 slots, 70 are now (!) free and none is in use. Wh

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Reuti
On 21.10.2014 at 13:45, Winkler, Ursula (ursula.wink...@uni-graz.at) wrote: >> The qsub man page states that -w p and -w v don't take into account load >> values. Possibly the job is requesting a complex whose value is determined >> by a load sensor and the returned value is not suitable but n

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)
> The qsub man page states that -w p and -w v don't take into account load values. Possibly the job is requesting a complex whose value is determined by a load sensor and the returned value is not suitable but not causing an alarm.
Should not "qstat -j " list the shortage of a complex?

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread William Hay
On Tue, 21 Oct 2014 10:45:30 + "Winkler, Ursula (ursula.wink...@uni-graz.at)" wrote: > Hi Reuti, > > no - and there is no (other than reputed slots) resource shortage. And no > host is in error state. > > Ursula > > > -Original Message- > From: Reuti [mailto:re...@staff.un

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)
The allocation rule is "$fill_up" and not "$pe_slots". So that should be ok. Ursula -Original Message- From: users-boun...@gridengine.org [mailto:users-boun...@gridengine.org] On Behalf Of Simon Andrews Sent: Tuesday, 21 October 2014 13:05 To: Gridengine Users Group Subject

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Simon Andrews
What is your allocation rule for the mpios parallel environment (qconf -sp mpios)? Could it be that the allocation says that the slots all have to be on the same physical node, and no single node has more than 64 slots available? -Original Message- From: users-boun...@gridengine.org [m
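
To make the check above concrete, here is a minimal Python sketch that reads the text printed by `qconf -sp mpios` and reports whether the allocation rule forces all slots onto one host. It assumes the usual one "key value" pair per line layout of that output; the helper names `parse_pe_config` and `single_host_only` are made up for illustration, and the `slots` value in the sample is invented (only the PE name "mpios" and the "$fill_up" rule come from this thread).

```python
# Hypothetical sample of `qconf -sp mpios` output; the slots value is invented.
SAMPLE = """\
pe_name            mpios
slots              999
allocation_rule    $fill_up
"""


def parse_pe_config(text):
    """Turn whitespace-separated "key value" lines into a dict."""
    config = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            config[parts[0]] = parts[1].strip()
    return config


def single_host_only(config):
    """True if every slot of a job must be placed on one host ($pe_slots)."""
    return config.get("allocation_rule") == "$pe_slots"


if __name__ == "__main__":
    pe = parse_pe_config(SAMPLE)
    print(pe["allocation_rule"], single_host_only(pe))
```

With "$pe_slots", a 72-slot request could never fit on hosts that offer 24 slots each; with "$fill_up" or "$round_robin" the job may span hosts, so the allocation rule alone would not explain the limit.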

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Winkler, Ursula (ursula.wink...@uni-graz.at)
Hi Reuti, no - and there is no (other than reputed slots) resource shortage. And no host is in error state. Ursula -Original Message- From: Reuti [mailto:re...@staff.uni-marburg.de] Sent: Tuesday, 21 October 2014 12:25 To: Gridengine Users Group Cc: Winkler, Ursula (ursula

Re: [gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Reuti
Hi, On 21.10.2014 at 11:21, Ursula Winkler wrote: > Hi gridengine members, > > I have run out of ideas for an annoying problem: > > A job requesting 72 slots does not start. "qstat -j " reports > "cannot run in PE "mpios" because it only offers 64 slots", but there are 72 > free ("qalt

[gridengine users] Weird scheduler calculation error?

2014-10-21 Thread Ursula Winkler
Hi gridengine members, I have run out of ideas for an annoying problem: A job requesting 72 slots does not start. "qstat -j " reports "cannot run in PE "mpios" because it only offers 64 slots", but there are 72 free ("qalter -w p and "qalter -w -v " report "verification: found possible