Hi Ralph,

I have confirmed that openmpi-1.4a1r22335 works with my master/slave example. The temporary directories are cleaned up properly.

Thanks for the help!

nick

On Thu, Dec 17, 2009 at 13:38, Nicolas Bock <nicolasb...@gmail.com> wrote:
> Ok, I'll give it a try.
>
> Thanks, nick
>
> On Thu, Dec 17, 2009 at 12:44, Ralph Castain <r...@open-mpi.org> wrote:
>
>> In case you missed it, this patch should be in the 1.4 nightly tarballs -
>> feel free to test and let me know what you find.
>>
>> Thanks
>> Ralph
>>
>> On Dec 2, 2009, at 10:06 PM, Nicolas Bock wrote:
>>
>> That was quick. I will try the patch as soon as you release it.
>>
>> nick
>>
>> On Wed, Dec 2, 2009 at 21:06, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Patch is built and under review...
>>>
>>> Thanks again
>>> Ralph
>>>
>>> On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:
>>>
>>> Thanks
>>>
>>> On Wed, Dec 2, 2009 at 17:04, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Yeah, that's the one all right! Definitely missing from 1.3.x.
>>>>
>>>> Thanks - I'll build a patch for the next bug-fix release
>>>>
>>>> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>>>>
>>>> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>> >> Indeed - that is very helpful! Thanks!
>>>> >> Looks like we aren't cleaning up high enough - missing the directory level.
>>>> >> I seem to recall seeing that error go by and that someone fixed it on our
>>>> >> devel trunk, so this is likely a repair that didn't get moved over to the
>>>> >> release branch as it should have done.
>>>> >> I'll look into it and report back.
>>>> >
>>>> > You are probably referring to
>>>> > https://svn.open-mpi.org/trac/ompi/changeset/21498
>>>> >
>>>> > There was an issue about orte_session_dir_finalize() not
>>>> > cleaning up the session directories properly.
>>>> >
>>>> > Hope that helps.
>>>> >
>>>> > Abhishek
>>>> >
>>>> >> Thanks again
>>>> >> Ralph
>>>> >> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
>>>> >>
>>>> >> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <r...@open-mpi.org> wrote:
>>>> >>>
>>>> >>> Hmm....if you are willing to keep trying, could you perhaps let it run for
>>>> >>> a brief time, ctrl-z it, and then do an ls on a directory from a process
>>>> >>> that has already terminated? The pids will be in order, so just look for an
>>>> >>> early number (not mpirun or the parent, of course).
>>>> >>> It would help if you could give us the contents of a directory from a
>>>> >>> child process that has terminated - would tell us what subsystem is failing
>>>> >>> to properly cleanup.
>>>> >>
>>>> >> Ok, so I Ctrl-Z the master. In
>>>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
>>>> >> directory
>>>> >>
>>>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857
>>>> >>
>>>> >> I can't find that PID though. mpirun has PID 4230, orted does not exist,
>>>> >> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
>>>> >> slave has a different PID as expected. I Ctrl-Z'ed in iteration 68, there
>>>> >> are 70 sequentially numbered directories starting at 0. Every directory
>>>> >> contains another directory called "0". There is nothing in any of those
>>>> >> directories. I see for instance:
>>>> >>
>>>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
>>>> >> total 4.0K
>>>> >> drwx------ 2 nbock users 4.0K Dec 2 14:41 0
>>>> >>
>>>> >> and
>>>> >>
>>>> >> nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
>>>> >> total 0
>>>> >>
>>>> >> I hope this information helps. Did I understand your question correctly?
>>>> >>
>>>> >> nick
>>>> >>
>>>> >> _______________________________________________
>>>> >> users mailing list
>>>> >> us...@open-mpi.org
>>>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
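
The manual check Nick ran earlier in the thread (suspend the master with Ctrl-Z, then `ls` the numbered job directories under the session directory) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not an Open MPI tool: the helper names `contains_files` and `find_empty_job_dirs` are hypothetical, the default path is copied from the thread, and the assumed layout (integer-named job directories, each holding a `0` subdirectory) is inferred from the `ls -lh 70` and `ls -lh 70/0/` listings above rather than from Open MPI documentation.

```python
import os
import sys


def contains_files(path):
    """Return True if any non-directory entry exists anywhere under path."""
    for _dirpath, _dirnames, filenames in os.walk(path):
        if filenames:
            return True
    return False


def find_empty_job_dirs(session_dir):
    """Return the integer-named subdirectories of session_dir that hold no
    files at all, only (possibly nested) empty directories such as 70/0/.
    Names are returned sorted numerically."""
    leftovers = []
    for name in os.listdir(session_dir):
        full = os.path.join(session_dir, name)
        # Assumption: job directories are named with plain integers (0, 1, ..., 70).
        if name.isdigit() and os.path.isdir(full) and not contains_files(full):
            leftovers.append(name)
    return sorted(leftovers, key=int)


if __name__ == "__main__":
    # Default path taken from the thread; pass your own session directory as argv[1].
    default_root = "/tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857"
    root = sys.argv[1] if len(sys.argv) > 1 else default_root
    if not os.path.isdir(root):
        print(f"no such session directory: {root}")
    else:
        empty = find_empty_job_dirs(root)
        print(f"{len(empty)} empty job directories under {root}")
        for name in empty:
            print(os.path.join(root, name))
```

Run it while the master is suspended. On the unpatched 1.3.x build described above, one would expect it to report the sequentially numbered leftover directories; after the fix from changeset 21498, the count should stay near zero.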