On 18.09.2013 at 01:19, Brendan Moloney <molo...@ohsu.edu> wrote:

> Hi Reuti,
> 
>> Yes. But this queue can have the total slot count of the machine. Or are you 
>> assigning right now 4 cores to a short queue, 8 to a medium one and the 
>> remaining 4 cores of a 16-core machine to a long queue?
> 
> I limit the total number of slots for each queue using an RQS.  The shortest 
> queue (30 minute time limit) is unlimited, and each other queue can use up to 
> 10% of the total slots (120 total). For example the RQS for one of the queues:
> 
> {
>   name         longlimit
>   description  NONE
>   enabled      TRUE
>   limit        queues long.q hosts * to slots=12
> }
> 
> 
>> The $PE_HOSTFILE will contain an entry for each granted queue. The MPI 
>> library will need to sum up the ones residing on one and the same host and 
>> use forks across the overall amount. This was a bug in Open MPI but it was 
>> fixed some time ago; in MPICH2 1.4.1p1 it's still there, and even in 3.0.4
>> and 3.1b1.
>> I'll bring it up on the MPICH list (because of this, several
>> `qrsh -inherit ...` calls will be made to one and the same machine, and you
>> can face the error you got).
> 
> Thank you so much for this!  I saw your message to the MPICH list and will 
> definitely keep an eye on it.  I may also try out OpenMPI soon.

AFAICS you can also apply the change I introduced to 1.4.1p1. As there were
further changes in that file at the time, it's necessary to copy and paste just
the relevant subroutine. After recompiling MPICH2 it should also work with
already compiled applications, as only a part of the Hydra startup was touched.
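
To illustrate the summing step mentioned above, here is a minimal Python
sketch. It is not the actual Hydra code, only the idea. It assumes the usual
four-column $PE_HOSTFILE layout (host, slots, queue instance, processor range);
the host names, the queue names and the function name slots_per_host are made
up for the example.

```python
from collections import OrderedDict


def slots_per_host(hostfile_text):
    """Sum the granted slots of all $PE_HOSTFILE entries that share a host.

    Assumed line layout: <host> <slots> <queue instance> <processor range>.
    A host shows up once per granted queue, so it may appear several times.
    """
    totals = OrderedDict()  # keeps hosts in order of first appearance
    for line in hostfile_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        totals[host] = totals.get(host, 0) + slots
    return totals


# Hypothetical hostfile: node01 was granted slots from two different queues.
SAMPLE = """\
node01 2 short.q@node01 UNDEFINED
node01 3 long.q@node01 UNDEFINED
node02 4 short.q@node02 UNDEFINED
"""

if __name__ == "__main__":
    for host, slots in slots_per_host(SAMPLE).items():
        print(host, slots)
```

With one merged entry per host, a launcher would need only a single
`qrsh -inherit ...` per machine and could fork the local processes from there.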

-- Reuti

> 
> Thanks again,
> Brendan

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
