Hi Reuti,

>> I use multiple queues to divide up available resources based on job run 
>> times.
> 
> So you are requesting "-l h_rt=..."?

Yes.

> Yes, the problem is that you can't address a specific queue in `qrsh -inherit 
> ...` and if you get several queues on a machine you might have used up the 
> slots of the queue that is selected first for the `qrsh -inherit ...`.
> https://arc.liv.ac.uk/trac/SGE/ticket/813

Thanks for this information.  Switching the allocation rule from "round_robin" 
to "fill_up" gets rid of the problem for me, but I am not sure whether this is 
just because fewer queues are being used on each host.
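
For reference, a minimal sketch of how that switch might be scripted instead of 
edited by hand with `qconf -mp`. The PE name `mpich2` and the temporary file are 
placeholders chosen for illustration; in the PE definition itself the rule 
values are spelled `$round_robin` and `$fill_up`.

```shell
#!/bin/sh
# Hypothetical PE name, used only for illustration.
PE=mpich2

# Read a PE definition on stdin and rewrite its allocation_rule to $fill_up.
# All other lines pass through unchanged.
set_fill_up() {
    sed 's/^\(allocation_rule[[:space:]][[:space:]]*\).*/\1$fill_up/'
}

# Apply the change only where Grid Engine's qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    tmp=$(mktemp)
    qconf -sp "$PE" | set_fill_up > "$tmp"   # dump and edit the PE definition
    qconf -Mp "$tmp"                          # load the modified definition
    rm -f "$tmp"
fi
```

Running `qconf -sp` again afterwards should show whether the new rule took effect.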

> It should help to have a PE for each queue, but you end up with 9 PEs for 
> each PE you have right now.

But this would limit the maximum size of parallel jobs to the maximum number of 
slots in a single queue, right?

> BUT: What type of parallel applications are you using? With a tight 
> integration of MPICH2/3 and Open MPI there is only one `qrsh -inherit ...` 
> call per exechost and all other processes are forks. And as you get 
> "Execution daemon on host <hostname> didn't accept task" you are having a 
> tight integration.

I am using MPICH2 version 1.4, which has tight integration built in.

Thanks!
Brendan
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users