I have a problem which may or may not be related to Open MPI, but since this list was helpful before with a race condition, I am posting it here.
I am trying to use pbsdsh as an ssh replacement. Our sysadmins are pushing this because Torque does not know about ssh tasks launched from within a task.

In a simple case, a script launches three MPI tasks in parallel:

Task1: NodeA
Task2: NodeB and NodeC
Task3: NodeD

(some cores on each node, all handled correctly). Reproducibly (though with different nodes and numbers of cores), Task1 and Task3 work fine. For Task2, the MPI task starts on NodeB but nothing starts on NodeC; it appears that NodeC does not communicate.

The layout does not have to be exactly this. It could be

Task1: NodeA NodeB
Task2: NodeC NodeD

Here NodeC will start, and it looks as if NodeD never starts anything. I have also run it with four tasks (1, 3 and 4 work), and if Task2 uses only one node (the number of cores does not matter) it is fine.

--
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Research is to see what everybody else has seen, and to think what nobody else has thought
Albert Szent-Györgi
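
To make the failing layout concrete, here is a minimal Python sketch of how the nodes are split between tasks. It is not the actual launch script and does not call pbsdsh or mpirun. The node names, the core counts, and the helpers `count_cores`, `assign_tasks` and `multi_node_tasks` are hypothetical, and the input is assumed to look like a Torque nodefile (one hostname per allocated core).

```python
from collections import Counter


def count_cores(nodefile_lines):
    """Map each node to its core count, keeping first-seen node order.

    Assumes nodefile-style input: one hostname per allocated core.
    """
    return dict(Counter(line.strip() for line in nodefile_lines if line.strip()))


def assign_tasks(nodes, nodes_per_task):
    """Split an ordered node list into consecutive per-task groups."""
    if sum(nodes_per_task) != len(nodes):
        raise ValueError("layout does not match the number of nodes")
    groups, start = [], 0
    for n in nodes_per_task:
        groups.append(nodes[start:start + n])
        start += n
    return groups


def multi_node_tasks(groups):
    """Return 1-based indices of the tasks that span more than one node."""
    return [i + 1 for i, group in enumerate(groups) if len(group) > 1]


# Hypothetical allocation: four nodes, two cores each.
lines = ["NodeA", "NodeA", "NodeB", "NodeB", "NodeC", "NodeC", "NodeD", "NodeD"]
cores = count_cores(lines)

# First layout from the report: Task1 = NodeA, Task2 = NodeB + NodeC, Task3 = NodeD.
groups = assign_tasks(list(cores), [1, 2, 1])
for task, group in enumerate(groups, start=1):
    print(f"Task{task}: " + " ".join(f"{node}({cores[node]})" for node in group))

# Only tasks spanning several nodes show the problem described above.
print("multi-node tasks:", multi_node_tasks(groups))
```

In both reported layouts, the node that never starts anything is the second node of a task that spans more than one node.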