Hi Ralph,

Some new information about this "bug": the disk in this computer was defective, and that caused filesystem errors... The disk was replaced 2 days ago and everything seems to work well since then (the problem had recurred after the last time I wrote about it).

Sorry for the bother!

Eric


On 02/05/2014 11:38 AM, Ralph Castain wrote:
I'm afraid it isn't quite that simple, Jeff. We also have the race condition at startup - multiple 
procs on the same machine, from the same job, will be trying to create the session directory tree. 
At the moment, when we find that some other proc already created it, we simply create our own entry 
underneath it as required. So I don't know how to tell the difference between "some other proc 
from my job created it first" and "this is a stale directory and should be deleted".

However, I might be able to rig something up when the daemons start, and for 
singletons. Will give that a try.
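
For readers following along, here is a minimal C sketch (not Open MPI's actual code) of the race-tolerant creation Ralph describes: each process walks the path like `mkdir -p` and treats EEXIST on an existing directory as success, because another proc from the same job may have created it first. The function name `make_tree_race_tolerant` and the 4096-byte path limit are assumptions for illustration. It also shows why a stale directory cannot be detected at this level: EEXIST looks the same whether the directory was just made by a sibling or left over from a dead job.

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create every component of `path` (like `mkdir -p`),
 * tolerating the race in which a sibling process creates a directory first.
 * Returns 0 on success, -1 on error with errno set. */
int make_tree_race_tolerant(const char *path, mode_t mode)
{
    char buf[4096];
    size_t len = strlen(path);

    if (len == 0) {
        errno = EINVAL;
        return -1;
    }
    if (len >= sizeof(buf)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* Temporarily terminate the string at each '/' to create each prefix. */
    for (char *p = buf + 1; ; p++) {
        if (*p == '/' || *p == '\0') {
            char saved = *p;
            *p = '\0';
            if (mkdir(buf, mode) != 0) {
                struct stat st;
                /* EEXIST: someone got there first (possibly a proc from this
                 * job, possibly a stale leftover; we cannot tell which).
                 * Accept it as long as it really is a directory. */
                if (errno != EEXIST || stat(buf, &st) != 0 || !S_ISDIR(st.st_mode)) {
                    return -1;
                }
            }
            *p = saved;
            if (saved == '\0') {
                break;
            }
        }
    }
    return 0;
}
```

Calling it twice with the same path succeeds both times, which is exactly the "someone else created it first" case.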

On Feb 4, 2014, at 6:11 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

On Feb 3, 2014, at 6:44 PM, Ralph Castain <r...@open-mpi.org> wrote:

If I may suggest a way to test the behavior of 1.7.x: have a test 
case that creates a bunch of files (named 0 through 65536) in 
/tmp/openmpi-sessions-${USER}... before launching an executable without mpirun... 
>:)
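
The suggested test could be sketched as a small C helper that pre-populates a session directory with empty files named 0 through N, so that whatever PID the next singleton gets, a same-named entry already exists. The name `plant_stale_pid_entries` is hypothetical, and the caller must pass the real session directory (the path above is elided, so it is not reconstructed here). Real PID ranges are OS-dependent; on Linux the upper bound is in /proc/sys/kernel/pid_max.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical test helper: ensure `base` exists, then create empty files
 * named 0..max_pid inside it. Entries that already exist are left alone.
 * Returns the number of files newly created, or -1 on error. */
long plant_stale_pid_entries(const char *base, long max_pid)
{
    char path[4096];
    long created = 0;

    if (mkdir(base, 0700) != 0 && errno != EEXIST) {
        return -1;
    }
    for (long pid = 0; pid <= max_pid; pid++) {
        int n = snprintf(path, sizeof(path), "%s/%ld", base, pid);
        if (n < 0 || (size_t)n >= sizeof(path)) {
            return -1;
        }
        /* O_EXCL makes "already there" distinguishable from other errors. */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd < 0) {
            if (errno == EEXIST) {
                continue;   /* already planted: still a collision target */
            }
            return -1;
        }
        close(fd);
        created++;
    }
    return created;
}
```

Running the executable without mpirun after `plant_stale_pid_entries(dir, 65536)` would then exercise the collision path on every launch instead of only after a PID wrap.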

Ick - it will actually only conflict if/when the PIDs wrap, so it's a pretty 
rare issue.


Ralph: what do you think about modifying this for 1.7.5? I.e., if the PID directory 
already exists in the session directory, remove it. This is always safe to do 
(assuming /tmp is a local filesystem) because the OS will never assign the same 
PID to 2 concurrent processes.
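
A minimal C sketch of this proposal, assuming a local /tmp and a hypothetical helper name `make_fresh_pid_dir` (not an Open MPI function): if an entry named after our own PID already exists under the given parent, it is removed recursively with POSIX nftw() (FTW_DEPTH visits children before their directory) and then recreated. It touches only the per-PID leaf; whether anything similar is possible for the shared job-level directories is the question raised in the reply above.

```c
#define _XOPEN_SOURCE 700   /* expose nftw() and FTW_* flags */
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* nftw callback: delete each entry. With FTW_DEPTH, a directory is visited
 * only after its contents, so remove() sees it empty. */
static int remove_entry(const char *fpath, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)typeflag;
    (void)ftwbuf;
    return remove(fpath);
}

/* Hypothetical helper: build "<parent>/<our pid>" in `out`, delete any
 * leftover entry with that name, then create a fresh directory.
 * No live process can share our PID, so an existing entry must be stale
 * (assuming `parent` is on a local filesystem). Returns 0 or -1. */
int make_fresh_pid_dir(const char *parent, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s/%ld", parent, (long)getpid());
    if (n < 0 || (size_t)n >= outlen) {
        errno = ENAMETOOLONG;
        return -1;
    }

    struct stat st;
    if (lstat(out, &st) == 0) {
        /* FTW_PHYS: do not follow symlinks out of the stale tree. */
        if (nftw(out, remove_entry, 16, FTW_DEPTH | FTW_PHYS) != 0) {
            return -1;
        }
    } else if (errno != ENOENT) {
        return -1;
    }
    return mkdir(out, 0700);
}
```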

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
