Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread George Bosilca
On Aug 18, 2011, at 14:58 , TERRY DONTJE wrote: > > > On 8/18/2011 2:32 PM, George Bosilca wrote: >> Terry, >> >> The test succeeded in both of your runs. >> > Not really. Granted the test aborted in both cases however the case you > show below has further issues while the orte is trying t

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
Thought I'd throw this out there: I retraced my MTT steps and did find that there were failures of this test back until r24774. r24775 has a comment that looks very relevant. I am talking to the committer of that change now. Sorry for the false accusation. --td On 8/18/2011 2:32 PM, George

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
On 8/18/2011 2:32 PM, George Bosilca wrote: Terry, The test succeeded in both of your runs. Not really. Granted, the test aborted in both cases; however, the case you show below has further issues while the orte is trying to clean things up. It certainly is not what I would call friendly. B

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread George Bosilca
Terry, The test succeeded in both of your runs. However, I rolled back before the epoch change (24814) and the output is the following: MPITEST info (0): Starting MPI_Errhandler_fatal test MPITEST info (0): This test will abort after printing the results message MPITEST info (0): If it does

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
Just ran MPI_Errhandler_fatal_c with r25063 and it still fails. Everything is the same except I don't see the "readv failed.." message. Have you tried to run this code yourself? It is pretty simple and fails with one node using np=4. --td On 8/18/2011 10:57 AM, Wesley Bland wrote: I just

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread Ralph Castain
I doubt that will solve the problem. The issue is that procs are continuing to fail while you are trying to respond to the first one. Here is what happens: 1. first proc fails, causing a "connection failed" error that gets reported to the orted errmgr. 2. errmgr_orted starts trying to send "pro

Re: [OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread Wesley Bland
I just checked in a fix (I hope). I think the problem was that the errmgr was removing children from the list of odls children without using the mutex to prevent race conditions. Let me know if the MTT is still having problems tomorrow. Wes > I am seeing the intel test suite tests MPI_Errhandler_
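
The kind of race described in the message above (children being removed from a shared list without holding a mutex) can be illustrated with a small sketch. This is not Open MPI code: `ChildList`, `simulate_failures`, and the rank numbers are hypothetical names made up for illustration, and Python's `threading.Lock` stands in for whatever mutex the errmgr and odls actually use. The only point is that the membership check and the removal happen as one step under the lock.

```python
import threading


class ChildList:
    """Hypothetical stand-in for a daemon's list of local child processes."""

    def __init__(self, ranks):
        self._lock = threading.Lock()
        self._children = list(ranks)

    def remove(self, rank):
        """Remove one child under the lock; return True if it was present."""
        with self._lock:
            if rank in self._children:
                self._children.remove(rank)
                return True
            return False

    def snapshot(self):
        """Copy the list under the lock so callers can iterate safely."""
        with self._lock:
            return list(self._children)


def simulate_failures(num_children=4, reports_per_child=2):
    """Report each child's failure from several threads at once."""
    children = ChildList(range(num_children))
    removed = []
    removed_lock = threading.Lock()

    def report_failure(rank):
        # Only the thread that actually removed the child records it.
        if children.remove(rank):
            with removed_lock:
                removed.append(rank)

    threads = [
        threading.Thread(target=report_failure, args=(rank,))
        for rank in range(num_children)
        for _ in range(reports_per_child)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(removed), children.snapshot()
```

Calling `simulate_failures(4, 3)` reports each of four hypothetical children three times from separate threads. Because the check and the removal share one critical section, each child is removed exactly once and the list ends up empty, regardless of how the threads interleave.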

[OMPI devel] MPI_Errhandler_fatal_c failure

2011-08-18 Thread TERRY DONTJE
I am seeing the Intel test suite tests MPI_Errhandler_fatal_c and MPI_Errhandler_fatal_f fail with an oob failure quite a bit. I have not seen this test failing under MTT until the epoch code was added. So I have a suspicion the epoch code might be at fault. Could someone familiar with the ep