Hi Ralph,
Some new information about this "bug": the computer had a defective disk,
which then caused filesystem errors... The disk was replaced 2 days ago and
everything seems to work well since (the problem had re-occurred after the
last time I wrote about it).
Sorry for the bother!
Eric
On 02/05/2014 11:38 AM, Ralph Castain wrote:
I'm afraid it isn't quite that simple, Jeff. We also have the race condition at startup - multiple
procs on the same machine, from the same job, will be trying to create the session directory tree.
At the moment, when we see that some other proc has already created it, we simply create our own entry
underneath as required. So I don't know how to tell the difference between "some other proc
from my job created it first" and "this is a stale directory and should be deleted".
However, I might be able to rig something up when the daemons start, and for
singletons. I'll give that a try.
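A minimal Python sketch of the startup race described above, assuming a hypothetical <root>/<job>/<pid> layout; the helper name create_session_entry is invented for illustration and this is not Open MPI's actual session-directory code. It shows why the check is ambiguous: tolerating an already-existing job directory covers the "a sibling proc got there first" case and the "stale leftover" case in exactly the same way.

```python
import os


def create_session_entry(session_root: str, job_dir: str, pid: int) -> str:
    """Create <session_root>/<job_dir>/<pid> (hypothetical layout).

    Several local procs of the same job may run this concurrently.
    """
    job_path = os.path.join(session_root, job_dir)
    # exist_ok=True turns "another proc created it first" into a no-op
    # instead of an error. It equally accepts a stale directory left by an
    # earlier run: from this call, the two cases are indistinguishable.
    os.makedirs(job_path, exist_ok=True)
    proc_path = os.path.join(job_path, str(pid))
    # A plain mkdir raises FileExistsError if something already occupies
    # this proc's entry, e.g. a leftover from a process whose PID was reused.
    os.mkdir(proc_path)
    return proc_path
```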
On Feb 4, 2014, at 6:11 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
On Feb 3, 2014, at 6:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
If I may suggest a way to test the behavior of 1.7.x... what about this: have a test
case that creates a bunch of files (from 0 to 65536) in
/tmp/openmpi-sessions-${USER}... before launching an executable without mpirun...
>:)
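The suggested test could be scripted roughly as follows. This is a Python sketch in which the directory argument stands in for the elided /tmp/openmpi-sessions-${USER}... path, and plant_stale_pid_entries is a hypothetical helper. With every name from 0 to 65536 already taken, whichever PID up to 65536 the next process receives, a plain mkdir of its PID entry will collide with an existing name.

```python
import os


def plant_stale_pid_entries(session_dir: str, max_pid: int = 65536) -> int:
    """Create empty files named 0..max_pid in session_dir.

    Returns how many files were newly created; existing names are left alone.
    """
    os.makedirs(session_dir, exist_ok=True)
    created = 0
    for pid in range(max_pid + 1):
        path = os.path.join(session_dir, str(pid))
        try:
            # O_EXCL makes creation fail instead of clobbering an existing entry.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            continue
        os.close(fd)
        created += 1
    return created
```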
Ick - it will actually only conflict if/when the PIDs wrap, so it's a pretty
rare issue.
Ralph: what do you think about modifying this for 1.7.5? I.e., if the PID directory
already exists in the session directory, remove it. This is always safe to do
(assuming /tmp is a local filesystem) because the OS will never assign the same
PID to 2 concurrent processes.
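A Python sketch of the proposal above, assuming the same hypothetical <job>/<pid> layout; claim_pid_dir is an invented name. It removes whatever already sits at this process's PID path before creating it. That is only safe under the stated assumption that the session directory is on a local filesystem, so no other live process on the host can hold the same PID.

```python
import os
import shutil


def claim_pid_dir(job_path: str, pid: int) -> str:
    """Return a fresh, empty <job_path>/<pid> directory (hypothetical layout)."""
    os.makedirs(job_path, exist_ok=True)
    proc_path = os.path.join(job_path, str(pid))
    # No live local process can share our PID, so anything already at this
    # path must be left over from an earlier process whose PID was reused.
    if os.path.isdir(proc_path) and not os.path.islink(proc_path):
        shutil.rmtree(proc_path)
    elif os.path.lexists(proc_path):
        # Stale regular file or symlink: remove the entry itself only.
        os.remove(proc_path)
    os.mkdir(proc_path)
    return proc_path
```

This only covers the per-PID entry; it does not answer the question of whether the shared job-level directory itself is stale.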
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users