Re: [gridengine users] PE suddenly has zero/few slots

2012-10-16 Thread Andrew Pearson
This has been working fine for many months, however. In addition, when I disable this feature, the problem remains. On Tue, Oct 16, 2012 at 11:32 AM, Reuti wrote: > On 16.10.2012 at 16:58, Andrew Pearson wrote: > > > Hi. I have a cluster running Rocks 5.4 that has been working

[gridengine users] PE suddenly has zero/few slots

2012-10-16 Thread Andrew Pearson
Hi. I have a cluster running Rocks 5.4 that has been working perfectly well for a long time. Now, suddenly, a problem has emerged. Jobs requesting more than a few slots fail to run, remaining in qw indefinitely. When I do qstat -j the problem, I get the message " cannot run in PE "orte_old" be

[gridengine users] Over-subscription of hosts -- the role of slots and queues

2012-06-05 Thread Andrew Pearson
Hi all, I'm having an oversubscription problem on my cluster. I'll describe the problem and my proposed solution. I can't implement my solution yet since there are some several-day jobs running right now, so I thought I'd run it past everyone on the mailing list. My problem is simple - parallel

Re: [gridengine users] Job runs on nodes that are not part of queue!

2012-01-24 Thread Andrew Pearson
These JSV scripts look very useful - I'll read about them. Thanks for the example. On Mon, Jan 23, 2012 at 5:04 PM, Reuti wrote: > On 23.01.2012 at 21:55, Andrew Pearson wrote: > > > Thanks Reuti > > You're welcome. > > > > OK - I made duplicates of

Re: [gridengine users] Job runs on nodes that are not part of queue!

2012-01-23 Thread Andrew Pearson
However, if the user doesn't include a -pe line in their submission script, I don't see how they would specify the number of processors they need. Sorry for my basic questions. I'd appreciate any comments you have. On Mon, Jan 23, 2012 at 2:57 PM, Reuti wrote: > On 23.01.

[gridengine users] Job runs on nodes that are not part of queue!

2012-01-23 Thread Andrew Pearson
Hi. I'm trying to move from load-based to sequence-based scheduling, and I have a problem. First, a little something about my setup: I have two sets of machines - 176 'fast' cores in 16-core nodes, and 90 'slow' cores in 2-core nodes. I have two corresponding queues - slow.q and fast.q. The qu