I have a problem that may or may not be an Open MPI issue, but since this
list was helpful before with a race condition, I am posting it here.

I am trying to use pbsdsh as an ssh replacement, as pushed by our
sysadmins, because Torque does not know about ssh tasks launched from
within a task. In a simple case, a script launches three MPI tasks in
parallel:

Task1: NodeA
Task2: NodeB and NodeC
Task3: NodeD

(some cores on each, all handled correctly). Reproducibly (though with
different nodes and numbers of cores), Task1 and Task3 work fine, and the
MPI task starts on NodeB, but nothing starts on NodeC; it appears that
NodeC does not communicate. The layout does not have to be this one; it
could also be
Task1: NodeA NodeB
Task2: NodeC NodeD

Here NodeC starts, but it looks as if NodeD never starts anything.
I have also run it with four tasks (Tasks 1, 3 and 4 work), and if Task2
uses only one node it is fine, regardless of the number of cores.

-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
Research is to see what everybody else has seen, and to think what
nobody else has thought
Albert Szent-Györgyi
