> On 11.08.2014 at 22:54, Riccardo Murri <[email protected]> wrote:
> 
> Hello,
> 
> we're running an ageing cluster, which was initially built a few years
> ago with Myrinet as its high-performance interconnect.  The cluster
> has recently acquired some new "fat" nodes with 32 cores, and things
> have started to break: apparently the Myrinet MX kernel module only
> allows 16 endpoints, but MPI processes allocate one MX endpoint per
> process. So on a fat node, 16 processes out of 32 are not able to
> communicate over Myrinet, and die with an error.
> 
> Is there a way I can tell SGE that there are only 16 endpoints on a
> node, so it would not allocate more than 16 MPI processes to a single
> node?  (This seems to call for per-node consumables, which AFAIK do not
> exist.)

Sure they exist. Just define a consumable complex and attach it to each exec host 
(`qconf -me node001` …) in the entry "complex_values".

In principle it could also be done by an RQS, but as it's a feature of each 
particular node, the attached complex is better suited IMO.
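
As a rough sketch of what that could look like: the script below only prints the complex definition line (to be pasted into the editor opened by `qconf -mc`) and a non-interactive `qconf -mattr` command that sets "complex_values" on a host. The complex name `mx_endpoints`, its shortcut `mx`, and the PE name `mpi` in the last comment are made-up for illustration; adjust them to your setup.

```shell
#!/bin/sh
# Sketch only: the complex name, shortcut and PE name are hypothetical.
COMPLEX=mx_endpoints   # hypothetical consumable name
SHORTCUT=mx            # hypothetical shortcut
LIMIT=16               # endpoints the MX kernel module allows per node

# Line to add to the complex list opened by `qconf -mc`. Columns:
# name shortcut type relop requestable consumable default urgency
complex_line() {
    printf '%s %s INT <= YES YES 0 0\n' "$COMPLEX" "$SHORTCUT"
}

# Command that sets the per-host capacity without opening an editor;
# same effect as editing "complex_values" via `qconf -me <host>`.
attach_cmd() {
    printf 'qconf -mattr exechost complex_values %s=%s %s\n' \
        "$COMPLEX" "$LIMIT" "$1"
}

complex_line
for host in node001; do
    attach_cmd "$host"
done

# Jobs would then request one endpoint per slot, e.g.:
#   qsub -pe mpi 32 -l mx_endpoints=1 job.sh
```

With a default of 0, only jobs that explicitly request the complex consume it, so the MPI jobs have to carry the `-l` request. For a parallel job the request is counted per slot, so no more than 16 slots of such a job should land on one node.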

-- Reuti


> 
> Thanks for any suggestion!
> 
> Riccardo
> 
> --
> Riccardo Murri
> http://www.s3it.uzh.ch/about/team/
> 
> S3IT: Services and Support for Science IT
> University of Zurich
> Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
> 
