> On 11.08.2014 at 22:54, Riccardo Murri <[email protected]> wrote:
>
> Hello,
>
> we're running an ageing cluster, which was initially built a few years
> ago with Myrinet as its high-performance interconnect. The cluster
> has recently acquired some new "fat" nodes with 32 cores, and things
> have started to break: apparently the Myrinet MX kernel module only
> allows 16 endpoints, but MPI processes allocate one MX endpoint per
> process. So on a fat node, 16 processes out of 32 are not able to
> communicate over Myrinet, and die with an error.
>
> Is there a way I can tell SGE that there are only 16 endpoints on a
> node, so it would not allocate more than 16 MPI processes to a single
> node? (This seems to call for per-node consumables, which AFAIK do not
> exist.)

Sure they exist. Just define a consumable complex and attach it to each exechost (`qconf -me node001` …) in the entry "complex_values". In principle it could also be done with a resource quota set (RQS), but as this is a property of each particular node, the attached complex is better suited IMO.

-- Reuti

>
> Thanks for any suggestion!
>
> Riccardo
>
> --
> Riccardo Murri
> http://www.s3it.uzh.ch/about/team/
>
> S3IT: Services and Support for Science IT
> University of Zurich
> Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
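
For reference, the per-host consumable approach from the reply might look roughly like the sketch below. The complex name `mx_endpoints`, its shortcut `mxe`, the parallel environment name `mpi`, and the script `job.sh` are assumptions made up for illustration; only `node001` and the limit of 16 come from the thread. Adjust everything to the local setup.

```shell
# 1. Define the consumable complex. This opens the complex list in $EDITOR;
#    add one line like the following (name and shortcut are hypothetical):
#
#    #name          shortcut  type  relop  requestable  consumable  default  urgency
#    mx_endpoints   mxe       INT   <=     YES          YES         1        0
qconf -mc

# 2. Attach the capacity to each exec host. Either edit the host interactively
#    and set "complex_values mx_endpoints=16":
qconf -me node001
#    or add the entry non-interactively:
qconf -aattr exechost complex_values mx_endpoints=16 node001

# 3. Check what the scheduler sees for that host.
qhost -F mx_endpoints -h node001

# 4. Submit a parallel job. With consumable=YES the request is counted per slot,
#    so no more than 16 slots of such jobs should be placed on one node.
qsub -pe mpi 32 -l mx_endpoints=1 job.sh
```

With a default of 1 in the complex definition, jobs that do not request the resource explicitly should still be charged one endpoint per slot on hosts where the complex is attached, which may or may not be desirable for non-MPI jobs.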
