Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Ralph Castain
Aha - I see what happened. I have that param set to false in my default mca param file. If I set it to true on the cmd line, then I run without segfaulting. Thanks! Ralph
> On Nov 26, 2014, at 5:55 PM, Gilles Gouaillardet wrote:
> Ralph,
> let me correct and enhance my previous statem

Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
Ralph, let me correct and enhance my previous statement:
- i cannot reproduce your crash in my environment (RHEL6 like vs your RHEL7 like) (i configured with --enable-debug --enable-picky)
- i can reproduce the crash with mpirun --mca mpi_param_check false
- if you configured with --without-mp

Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Ralph Castain
> On Nov 26, 2014, at 5:06 PM, Gilles Gouaillardet wrote:
> I will double check this (afk right now)
> Are you running on a rhel6 like distro with gcc ?
Yeah, I’m running CentOS7 and gcc 4.8.2
> Iirc, crash vs mpi error is ruled by --with-param-check or something like this…
Sounds r

Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
I will double check this (afk right now). Are you running on a rhel6 like distro with gcc ? Iirc, crash vs mpi error is ruled by --with-param-check or something like this... Cheers, Gilles
Ralph Castain wrote:
> I tried it with both the fortran and c versions - got the same result.
> This wa

Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Ralph Castain
I tried it with both the fortran and c versions - got the same result. This was indeed with a debug build. I wouldn’t expect a segfault even with an optimized build, though - I would expect an MPI error, yes?
> On Nov 26, 2014, at 4:26 PM, Gilles Gouaillardet wrote:
> I will have a look

Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
I will have a look. Btw, i was running the fortran version, not the c one. Did you configure with --enable-debug ? The program sends to a rank *not* in the communicator, so this behavior could make some sense on an optimized build. Cheers, Gilles
Ralph Castain wrote:
> Ick - I’m getting a segfa

Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Ralph Castain
Ick - I’m getting a segfault when trying to run that test:
MPITEST info (0): Starting MPI_Errhandler_fatal test
MPITEST info (0): This test will abort after printing the results message
MPITEST info (0): If it does not, then a f.a.i.l.u.r.e will be noted
[bend001:07714] *** Process received sig

Re: [OMPI devel] question to OMPI_DECLSPEC

2014-11-26 Thread Edgar Gabriel
On 11/26/2014 11:02 AM, George Bosilca wrote:
> We had similar problems in the PML V, and we decided to try to minimize the increase in size of the main library. Thus, instead of moving everything into the base, we added a structure in the base that will contain all the pointers to the functions we w
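The approach George describes (a structure in the base that holds function pointers, filled in by whichever component provides them) can be sketched roughly as follows. This is only an illustration of the pattern in Python, not Open MPI code: `BaseFunctionTable`, `base_functions`, `provider_component_open`, `consumer_component_write` and the `aggregate` slot are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BaseFunctionTable:
    """Hypothetical table owned by the always-loaded base.

    Each slot stays None until some component registers an implementation.
    """
    aggregate: Optional[Callable[[list], int]] = None


# Single shared instance, standing in for a structure exported by the base.
base_functions = BaseFunctionTable()


def provider_component_open() -> None:
    """Called when the (hypothetical) provider component is opened.

    It publishes its function through the base table instead of
    exporting a symbol that other components would link against.
    """
    base_functions.aggregate = lambda chunks: sum(len(c) for c in chunks)


def consumer_component_write(chunks: list) -> int:
    """A second component calls through the table slot only.

    It never references the provider's symbol directly, so it does not
    matter whether the provider's symbols are visible to it.
    """
    if base_functions.aggregate is None:
        raise RuntimeError("provider component has not registered yet")
    return base_functions.aggregate(chunks)
```

With this layout only the table itself has to live in the base; the implementations stay in their components, which keeps the growth of the main library small.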

Re: [OMPI devel] question to OMPI_DECLSPEC

2014-11-26 Thread George Bosilca
Edgar, The restriction you are facing doesn't come from Open MPI, but instead it comes from the default behavior of how dlopen loads the .so files. As we do not manually force the RTLD_GLOBAL flag, the scope of our modules is local, which means that the symbols defined in this library are not made
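As a rough illustration of the two dlopen scopes mentioned here, Python's `ctypes.CDLL` passes its `mode` argument through to dlopen on POSIX systems, so the choice between `RTLD_LOCAL` and `RTLD_GLOBAL` can be shown in a few lines. This is a sketch only, assuming a POSIX platform; `dlopen_mode` and `load_component` are hypothetical helper names, and this is not how Open MPI's own component loader is written.

```python
import ctypes


def dlopen_mode(make_global: bool) -> int:
    """Pick the dlopen scope flag.

    RTLD_LOCAL: symbols of the loaded library are not used to resolve
    references from libraries loaded later.
    RTLD_GLOBAL: symbols are made available to subsequently loaded libraries.
    """
    return ctypes.RTLD_GLOBAL if make_global else ctypes.RTLD_LOCAL


def load_component(path, make_global: bool = False) -> ctypes.CDLL:
    """Load a shared object with an explicit scope (hypothetical helper).

    `path` is whatever dlopen accepts; None opens the running program
    itself, which is convenient for trying the helper without a .so file.
    """
    return ctypes.CDLL(path, mode=dlopen_mode(make_global))
```

Calling `load_component(None)` returns a handle to the running program. Observing the actual difference between the two scopes requires two shared objects where one references a symbol defined in the other, which is the situation described in this thread.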

Re: [OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Ralph Castain
Hmmm….yeah, I know we saw this and resolved it in the trunk, but it looks like the fix indeed failed to come over to 1.8. I’ll take a gander (pretty sure I remember how I fixed it) - thanks!
> On Nov 26, 2014, at 12:03 AM, Gilles Gouaillardet wrote:
> Ralph,
> i noted several hangs in

Re: [OMPI devel] question to OMPI_DECLSPEC

2014-11-26 Thread Ralph Castain
> On Nov 26, 2014, at 7:16 AM, Edgar Gabriel wrote:
> ok, so I thought about it a bit, and while I am still baffled by the actual outcome and the missing symbol (for the main reason that the function of the fcoll component is being called from the ompio module, so the function of the

Re: [OMPI devel] question to OMPI_DECLSPEC

2014-11-26 Thread Edgar Gabriel
ok, so I thought about it a bit, and while I am still baffled by the actual outcome and the missing symbol (for the main reason that the function of the fcoll component is being called from the ompio module, so the function of the ompio that was called from the fcoll component is guaranteed to

[OMPI devel] race condition in abort can cause mpirun v1.8 hang

2014-11-26 Thread Gilles Gouaillardet
Ralph, i noted several hangs in mtt with the v1.8 branch. a simple way to reproduce it is to use the MPI_Errhandler_fatal_f test from the intel_tests suite, invoke mpirun on one node and run the tasks on another node:
node0$ mpirun -np 3 -host node1 --mca btl tcp,self ./MPI_Errhandler_fatal_f