Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
Nathan, fixing the classes to correctly tear down everything was a two-line patch. However, this doesn't fix the bigger issue, which is that not all frameworks are correctly torn down, and when they are, they leave behind char* parameters not set to NULL, and that the fram
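The char* problem George mentions is easy to show in isolation. The sketch below assumes a single hypothetical string-valued parameter (`demo_param`, not a real Open MPI variable): teardown both frees the string and resets the pointer to NULL, so a later re-initialization never sees, or frees a second time, a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string-valued framework parameter (not a real Open MPI variable). */
static char *demo_param = NULL;

/* Set the parameter; any previous value is released first. */
static void demo_param_set(const char *value)
{
    size_t n = strlen(value) + 1;
    free(demo_param);
    demo_param = malloc(n);
    assert(demo_param != NULL);
    memcpy(demo_param, value, n);
}

/* Teardown: free AND reset to NULL, so a repeated close or a later
 * re-initialization never touches a dangling pointer. */
static void demo_param_close(void)
{
    free(demo_param);
    demo_param = NULL;
}
```

Calling `demo_param_close()` twice in a row is harmless here, because the second call passes NULL to `free()`, which the C standard defines as a no-op.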

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-15 Thread George Bosilca
Ralph, there are two reasons that prevent me from pushing this RFC forward. 1. Minor: the code has some minor issues related to the last set of BTL/PML changes, and I haven't found the time to fix them. 2. Major: not all BTLs have been updated and validated. What we need at this point from their

[OMPI devel] New crash on trunk (r32246)

2014-07-15 Thread Rolf vandeVaart
With the latest trunk (r32246) I am getting crashes while the program is shutting down. I assume this is related to some of the changes George just made. George, can you take a look when you get a chance? Looks like everyone is getting the segv during shutdown (mpirun, orted, and application)

[OMPI devel] 100% test failures

2014-07-15 Thread Ralph Castain
Hi folks, the changes to opal_class_finalize are generating 100% segfaults on the trunk: 175 free(cls->cls_construct_array); Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.2.x86_64 libgcc-4.4.7-4.el6.x86_64 numactl-2.0.7-8.el6.x86_64 (gdb) where #0 0x

Re: [OMPI devel] 100% test failures

2014-07-15 Thread Gilles GOUAILLARDET
r32236 is a suspect. I am AFK. I just read the code: a class is initialized with opal_class_initialize the first time an object is instantiated with OBJ_NEW. I would simply revert r32236, or update opal_class_finalize to call free(cls->cls_construct_array) only if cls->cls_construct_array is not N
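Gilles' point is that class state is built lazily, so a finalize routine must cope with classes that were never instantiated. The sketch below models that with a hypothetical, heavily simplified descriptor (`demo_class_t` is not the real `opal_class_t`). Note that `free(NULL)` is already a no-op in C, so the guard on its own changes little; what makes finalize safe to repeat is resetting the pointer to NULL afterwards. This sketch does not address the separate failure mode raised later in the thread, class descriptors that live in an already-unloaded shared library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified class descriptor; NOT the real opal_class_t. */
typedef void (*demo_ctor_fn)(void *obj);
typedef struct demo_class {
    int initialized;                /* set on first instantiation */
    demo_ctor_fn *construct_array;  /* built lazily, like cls_construct_array */
} demo_class_t;

/* Analogue of opal_class_initialize: does real work only on first use. */
static void demo_class_initialize(demo_class_t *cls)
{
    assert(cls != NULL);
    if (cls->initialized) {
        return;
    }
    cls->construct_array = calloc(2, sizeof(demo_ctor_fn));
    assert(cls->construct_array != NULL);
    cls->initialized = 1;
}

/* Analogue of OBJ_NEW: the class is initialized lazily on first use. */
static void *demo_obj_new(demo_class_t *cls, size_t size)
{
    demo_class_initialize(cls);
    return malloc(size);
}

/* Release only what was built, then reset the fields so a class that was
 * never instantiated, or one finalized twice, is handled safely. */
static void demo_class_finalize(demo_class_t *cls)
{
    if (NULL != cls->construct_array) {
        free(cls->construct_array);
        cls->construct_array = NULL;
    }
    cls->initialized = 0;
}
```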

Re: [OMPI devel] 100% test failures

2014-07-15 Thread Nathan Hjelm
On Tue, Jul 15, 2014 at 11:40:38PM +0900, Gilles GOUAILLARDET wrote: >r32236 is a suspect > >i am afk > >I just read the code and a class is initialized with opal_class_initialize >the first time an object is instantiated with OBJ_NEW > >I would simply revert r32236 or update

Re: [OMPI devel] 100% test failures

2014-07-15 Thread George Bosilca
I'm also looking into it. George. On Tue, Jul 15, 2014 at 10:50 AM, Nathan Hjelm wrote: > On Tue, Jul 15, 2014 at 11:40:38PM +0900, Gilles GOUAILLARDET wrote: > >r32236 is a suspect > > > >i am afk > > > >I just read the code and a class is initialized with > opal_class_initiali

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-15 Thread Dave Goodell (dgoodell)
This commit (and the subsequent amendments to the feature) doesn't appear to support escaping the separator. A later commit allows you to change the separator character, which helps, but AFAICS you still can't actually escape the separator itself. That seems like a real deficiency to me... Fu

Re: [OMPI devel] 100% test failures

2014-07-15 Thread George Bosilca
r32248 should be the fix for this issue. I was overly optimistic about the cleanup of the classes. It turns out this is not possible without deep rearrangement of the class infrastructure. More info in the commit log. Sorry for the mess, George. On Tue, Jul 15, 2014 at 11:38 AM, George Bosilc

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
I withdraw my comment on this; it turns out I “misspoke” (or in other words I was wrong about the class cleanup). The base class structures are stored as objects in the corresponding shared library memory region, and these regions become unavailable once a shared library is unloaded. As a result

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32163 - in trunk: opal/mca/base orte/tools/orterun

2014-07-15 Thread Mike Dubman
These are two separate issues: 1. -x var=val (or -mca opal_base_envlist var=val) will work in the same way; opal_base_envlist does the same as "-x" and can be used in the very same fashion as -x. 2. When a list of vars is passed with the help of opal_base_envlist, escaping is possible, but escaped cha
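Dave's objection is that a list-valued parameter needs a way to carry the separator itself inside a value. One common approach, sketched below under stated assumptions, is a backslash escape: `\<sep>` yields a literal separator and `\\` a literal backslash. The function `split_escaped` is hypothetical and is not how opal_base_envlist is actually parsed; the separator is passed in as a parameter because the thread notes it can be changed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first n bytes of s into a new NUL-terminated string. */
static char *copy_field(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    assert(p != NULL);
    memcpy(p, s, n);
    p[n] = '\0';
    return p;
}

/* Hypothetical helper: split `in` on `sep`, treating "\<sep>" as a literal
 * separator and "\\" as a literal backslash. Stores up to `max` malloc'd
 * fields in `out` and returns how many were stored. */
static size_t split_escaped(const char *in, char sep, char **out, size_t max)
{
    size_t len = strlen(in);
    char *buf = malloc(len + 1);  /* a field is never longer than the input */
    size_t blen = 0, nfields = 0;

    assert(buf != NULL);
    for (size_t i = 0; i <= len && nfields < max; ++i) {
        char c = in[i];
        if (c == '\\' && i + 1 < len && (in[i + 1] == sep || in[i + 1] == '\\')) {
            buf[blen++] = in[++i];                  /* keep the escaped char */
        } else if (c == sep || c == '\0') {
            out[nfields++] = copy_field(buf, blen); /* end of one field */
            blen = 0;
        } else {
            buf[blen++] = c;
        }
    }
    free(buf);
    return nfields;
}
```

With `;` as the separator, the input `A=1;B=x\;y` splits into the two fields `A=1` and `B=x;y`.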

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Pritchard, Howard r
Hi Folks, is the opal library explicitly closed by a dlclose? I don't think there's anything wrong with using ctors/dtors in shared libraries, but one does need to make sure that these functions make no assumptions about their ordering with respect to other ctors/dtors. Shared libraries explic

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Paul Hargrove
On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r wrote: > I don't think there's anything wrong with using ctor/dtors in shared > libraries, > but one does need to make sure that in these functions there's no > assumptions > about ordering of them wrt to other ctors/dtors. > The ELF specific

Re: [OMPI devel] RFC: Move the Open MPI communication infrastructure in OPAL

2014-07-15 Thread Ralph Castain
George: I've asked the various BTL developers of the components you listed below (minus Portals4 as I couldn't get hold of them), and we are agreed that we can move forward. So please go ahead and commit this merge - it'll break things, but we all agreed it would be easier to resolve in the tru

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Joshua Ladd
According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html: "constructor / destructor / constructor (priority) / destructor (priority): The constructor attribute causes the function to be called automatically before execution enters main (). Similarly, the destructor attribute c

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Paul Hargrove
The priority appears to have been added in gcc 4.3. You'll note it is not described in https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html I also don't think the presence of the priority argument fixes anything... An OpenMPI code author cannot change the "priority" of a ctor or
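For reference, the priority form of the GCC attributes looks like the sketch below (GCC 4.3 or later, or a compatible compiler such as Clang, on an ELF target). Priorities 0 through 100 are reserved for the implementation; a smaller number runs earlier for constructors and later for destructors. As Paul points out, this only orders functions within one executable or shared object; it cannot reorder them relative to another library's ctors/dtors. The function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Records the order in which the constructors ran (filled before main()). */
static int ctor_order[2];
static int ctor_count = 0;

/* Lower priority number => this constructor runs first. */
static void __attribute__((constructor(101))) init_early(void)
{
    ctor_order[ctor_count++] = 101;
}

static void __attribute__((constructor(102))) init_late(void)
{
    ctor_order[ctor_count++] = 102;
}

/* For destructors the relationship is reversed: 102 runs before 101. */
static void __attribute__((destructor(102))) fini_first(void)
{
    fputs("fini 102\n", stderr);
}

static void __attribute__((destructor(101))) fini_last(void)
{
    fputs("fini 101\n", stderr);
}
```

By the time `main()` starts, `ctor_order` holds `{101, 102}`; after it returns, `fini 102` is printed before `fini 101`.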

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Ralph Castain
I wonder if we aren't using a howitzer to swat a gnat. It seems to me that this is loaded with potential problems, as Paul describes, and I shudder to think of how hard this is going to be when we consider all the compiler/environment combinations we support and the range of libraries our variou

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread George Bosilca
Enforcing the portability of this sounds like a huge [almost impossible] mess, without a clean portable solution (more about this below). However, a few things should be considered: - Except for reinit, Open MPI works without it! If we provide such a capability it will be more a convenience capabilit

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Ralph Castain
I'm unsure where Intel's compilers sit on that list. When you say it works except for reinit, are you saying that the only issue here is that MPI_T_Finalize is calling opal_finalize_util solely because of the valgrind cleanup? And if it didn't do that, we would leak but would otherwise be just

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Paul Hargrove
On Tue, Jul 15, 2014 at 5:48 PM, George Bosilca wrote: > - Except for reinit, Open MPI works without it! If we provide such a > capability it will be more a convenience capability to keep valgrind happy, > than a necessity A valgrind suppression file seems like the most appropriate tool for that

Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-15 Thread Ralph Castain
I've attached a solution that blocks the segfault without requiring any gyrations. Can someone explain why this isn't adequate? An alternate solution was to simply decrement opal_util_initialized in MPI_T_finalize rather than calling finalize itself. Either way resolves the problem in a very simple man
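The "just decrement the counter" alternative amounts to reference-counted init/finalize: every init increments the count, and only the finalize that drops it to zero performs real teardown. The sketch below uses hypothetical stand-ins (`util_initialized`, `util_init`, `util_finalize`) and assumes, for illustration only, that opal_util_initialized behaves as such a counter.

```c
#include <assert.h>

/* Hypothetical stand-in for opal_util_initialized, modeled as a counter. */
static int util_initialized = 0;
static int teardown_count = 0;  /* how many times real cleanup actually ran */

static void util_init(void)
{
    if (util_initialized++ > 0) {
        return;                 /* already set up by another caller */
    }
    /* ... one-time setup would go here ... */
}

static void util_finalize(void)
{
    assert(util_initialized > 0);
    if (--util_initialized > 0) {
        return;                 /* other users remain: just drop our reference */
    }
    teardown_count++;           /* last user: real cleanup happens exactly once */
}
```

With this scheme, an MPI_T init/finalize pair nested inside MPI_Init/MPI_Finalize only moves the count between 2 and 1, so shared state stays intact until the outer finalize.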