On Wed, Dec 2, 2009 at 14:23, Ralph Castain <r...@open-mpi.org> wrote:
> Hmm....if you are willing to keep trying, could you perhaps let it run for
> a brief time, ctrl-z it, and then do an ls on a directory from a process
> that has already terminated? The pids will be in order, so just look for an
> early number (not mpirun or the parent, of course).
>
> It would help if you could give us the contents of a directory from a child
> process that has terminated - would tell us what subsystem is failing to
> properly cleanup.
>

Ok, so I Ctrl-Z the master. In
/tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
directory

/tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857

I can't find that PID though. mpirun has PID 4230, orted does not exist,
master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
slave has a different PID as expected. I Ctrl-Z'ed in iteration 68, there are
70 sequentially numbered directories starting at 0. Every directory contains
another directory called "0". There is nothing in any of those directories. I
see for instance:

/tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
total 4.0K
drwx------ 2 nbock users 4.0K Dec 2 14:41 0

and

nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
total 0

I hope this information helps. Did I understand your question correctly?

nick