On 17.09.2013 at 01:53, Brendan Moloney wrote:

>>> I use multiple queues to divide up available resources based on job run
>>> times.
>>
>> So you are requesting "-l h_rt=..."?
>
> Yes.
>
>> Yes, the problem is that you can't address a specific queue in `qrsh
>> -inherit ...` and if you get several queues on a machine you might have used
>> up the slots of the queue that is selected first for the `qrsh -inherit ...`.
>> https://arc.liv.ac.uk/trac/SGE/ticket/813
>
> Thanks for this information. Switching the allocation rule from
> "round_robin" to "fill_up" gets rid of the problem for me, but I am not sure
> if this is just because fewer queues are being used on each host.
>
>> It should help to have a PE for each queue, but you end up with 9 PEs for
>> each PE you have right now.
>
> But this would limit the maximum size of parallel jobs to the maximum number
> of slots on a single queue, right?

Yes. But this queue can have the total slot count of the machine. Or are you currently assigning 4 cores to a short queue, 8 to a medium one and the remaining 4 cores of a 16-core machine to a long queue?

>> BUT: What type of parallel applications are you using? With a tight
>> integration of MPICH2/3 and Open MPI there is only one `qrsh -inherit ...`
>> call per exechost and all other processes are forks. And as you get
>> "Execution daemon on host <hostname> didn't accept task" you are having a
>> tight integration.
>
> I am using MPICH2 version 1.4 which does have tight integration built in.

The $PE_HOSTFILE will contain an entry for each granted queue. The MPI library needs to sum up the entries residing on one and the same host and fork its processes across that total. Not doing so was a bug in Open MPI, but it was fixed some time ago; in MPICH2 1.4.1p1 it's still there, and even in 3.0.4 and 3.1b1. As a result, several `qrsh -inherit ...` calls will be made to one and the same machine, and you can face the error you got. I'll bring it up on the MPICH list.

-- Reuti
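
To illustrate the summing step described above, here is a minimal Python sketch. It is not what MPICH or Open MPI do internally. It assumes the usual $PE_HOSTFILE layout of one line per granted queue instance with the columns host, slots, queue and processor range. The host and queue names in the sample are made up, and `slots_per_host` is a hypothetical helper name.

```python
def slots_per_host(pe_hostfile_text):
    """Sum the granted slots of all queue instances on the same host.

    Assumes each line looks like: <host> <slots> <queue@host> <processor range>
    """
    totals = {}
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        totals[host] = totals.get(host, 0) + slots
    return totals


# Made-up example: node01 was granted slots from two different queues.
sample = """\
node01 4 short.q@node01 UNDEFINED
node01 2 long.q@node01 UNDEFINED
node02 4 short.q@node02 UNDEFINED
"""

for host, slots in slots_per_host(sample).items():
    print(f"{host}:{slots}")
```

With one summed entry per host, a launcher would need only a single `qrsh -inherit ...` per machine instead of one per granted queue. In a real job you would read the file named by $PE_HOSTFILE rather than the inline sample.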
