On Wed, Dec 2, 2009 at 14:23, Ralph Castain <r...@open-mpi.org> wrote:

> Hmm....if you are willing to keep trying, could you perhaps let it run for
> a brief time, ctrl-z it, and then do an ls on a directory from a process
> that has already terminated? The pids will be in order, so just look for an
> early number (not mpirun or the parent, of course).
>
> It would help if you could give us the contents of a directory from a child
> process that has terminated - would tell us what subsystem is failing to
> properly cleanup.
>

OK, so I Ctrl-Z'ed the master. In
/tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
directory:

/tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857

I can't find that PID, though. mpirun has PID 4230, orted does not exist,
master is 4231, and slave is 4275. When I "fg" the master and Ctrl-Z it
again, the slave has a different PID, as expected. I Ctrl-Z'ed in iteration
68; there are 70 sequentially numbered directories starting at 0. Every
directory contains another directory called "0". There is nothing in any of
those directories. I see for instance:

/tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
total 4.0K
drwx------ 2 nbock users 4.0K Dec  2 14:41 0

and

nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
total 0
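
Checking all of the numbered directories by hand is tedious, so the same inspection could be scripted. This is only a minimal sketch, assuming the layout described above (one job directory holding numbered directories, each with subdirectories such as "0"). The function name summarize_session_dir is hypothetical and not part of Open MPI.

```python
import os
import tempfile


def summarize_session_dir(job_dir):
    """Return (name, entries, no_files) for each numbered subdirectory.

    job_dir is assumed to be a directory such as .../857 from the listing
    above; only subdirectories with purely numeric names are considered.
    """
    rows = []
    names = [n for n in os.listdir(job_dir) if n.isdigit()]
    for name in sorted(names, key=int):  # numeric order: 0, 1, ..., 10, ...
        path = os.path.join(job_dir, name)
        if not os.path.isdir(path):
            continue
        # True when no file exists anywhere below this directory.
        no_files = all(not files for _, _, files in os.walk(path))
        rows.append((name, sorted(os.listdir(path)), no_files))
    return rows


if __name__ == "__main__":
    # Build a throwaway tree that mimics the reported layout, then summarize it.
    with tempfile.TemporaryDirectory() as root:
        for i in range(3):
            os.makedirs(os.path.join(root, str(i), "0"))
        for name, entries, no_files in summarize_session_dir(root):
            print(name, entries, "empty" if no_files else "has files")
```

Pointing the function at the real job directory instead of the temporary tree would list every leftover directory and flag any that still contain files.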

I hope this information helps. Did I understand your question correctly?

nick
