Nathan,
Fixing the classes to correctly tear down everything was a two-line patch.
However, this doesn’t fix the bigger issue, which is that
not all frameworks are correctly torn down, and when they are they leave
behind char* parameters not set to NULL, and that the fram
Ralph,
There are two reasons that prevent me from pushing this RFC forward.
1. Minor: The code has some minor issues related to the last set of BTL/PML
changes, and I didn't find the time to fix them.
2. Major: Not all BTLs have been updated and validated. What we need at
this point from their
With the latest trunk (r32246) I am getting crashes while the program is
shutting down. I assume this is related to some of the changes George just
made. George, can you take a look when you get a chance?
Looks like everyone is getting the segv during shutdown (mpirun, orted, and
application)
Hi folks
The changes to opal_class_finalize are generating 100% segfaults on the trunk:
175 free(cls->cls_construct_array);
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.132.el6_5.2.x86_64 libgcc-4.4.7-4.el6.x86_64
numactl-2.0.7-8.el6.x86_64
(gdb) where
#0 0x
r32236 is a suspect
i am afk
I just read the code and a class is initialized with opal_class_initialize the
first time an object is instantiated with OBJ_NEW
I would simply revert r32236 or update opal_class_finalize and
free(cls->cls_construct_array); only if cls->cls_construct_array is not N
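
A minimal sketch of the lazy-initialize / guarded-finalize idea described above. Everything here (demo_class_t, demo_class_initialize, demo_class_finalize, the array size) is a hypothetical stand-in and not the actual opal_class_t code. Note that free(NULL) is already a no-op in standard C, so the NULL check mostly documents intent; resetting the pointer after the free is what protects a second finalize call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a class descriptor; not the opal_class_t layout. */
typedef struct demo_class {
    int    initialized;      /* set on first instantiation */
    void **construct_array;  /* allocated lazily, owned by the class */
} demo_class_t;

/* Lazy setup, meant to run the first time an object of the class is created. */
static int demo_class_initialize(demo_class_t *cls)
{
    assert(cls != NULL);
    if (cls->initialized) {
        return 0;                        /* already set up, nothing to do */
    }
    cls->construct_array = calloc(4, sizeof(void *));
    if (cls->construct_array == NULL) {
        return -1;                       /* allocation failed */
    }
    cls->initialized = 1;
    return 0;
}

/* Teardown that tolerates never-initialized classes and repeated calls. */
static void demo_class_finalize(demo_class_t *cls)
{
    assert(cls != NULL);
    if (cls->construct_array != NULL) {
        free(cls->construct_array);
        cls->construct_array = NULL;     /* avoids a double free later */
    }
    cls->initialized = 0;                /* permits a later re-initialization */
}
```

With this shape, calling demo_class_finalize on a class that was never instantiated, or calling it twice in a row, leaves the descriptor in the same clean state.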
On Tue, Jul 15, 2014 at 11:40:38PM +0900, Gilles GOUAILLARDET wrote:
>r32236 is a suspect
>
>i am afk
>
>I just read the code and a class is initialized with opal_class_initialize
>the first time an object is instantiated with OBJ_NEW
>
>I would simply revert r32236 or update
I'm also looking into it.
George.
On Tue, Jul 15, 2014 at 10:50 AM, Nathan Hjelm wrote:
> On Tue, Jul 15, 2014 at 11:40:38PM +0900, Gilles GOUAILLARDET wrote:
> >r32236 is a suspect
> >
> >i am afk
> >
> >I just read the code and a class is initialized with
> opal_class_initiali
This commit (and the subsequent amendments to the feature) doesn't appear to
support escaping the separator. A later commit allows you to change the
separator character, which helps, but AFAICS you still can't actually escape
the separator itself. That seems like a real deficiency to me...
Fu
r32248 should be the fix for this issue. I was overly optimistic about the
cleanup of the classes. It turns out this is not possible without deep
rearrangement of the class infrastructure. More info in the commit log.
Sorry for the mess,
George.
On Tue, Jul 15, 2014 at 11:38 AM, George Bosilc
I withdraw my comment on this, it turns out I “misspoke” (or in other words I
was wrong about the class cleanup). The base class structures are stored as
objects in the corresponding shared library memory region, and these regions
become unavailable once a shared library is unloaded. As a result
these are two separate issues:
1. -x var=val (or -mca opal_base_envlist var=val) will work in the same way:
opal_base_envlist does the same as "-x" and can be used in the very same
fashion as -x
2. When a list of vars is passed with the help of opal_base_envlist, the escaping
is possible but escaped cha
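
For illustration only, here is one way a separator-delimited variable list could honor a backslash escape. demo_next_item and its "backslash followed by the separator" convention are hypothetical; this is not the parser behind opal_base_envlist, and the real escaping rules may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the next item of `list` into `out`, splitting on
 * `sep` and treating a backslash directly followed by `sep` as a literal
 * separator. Items longer than the buffer are silently truncated.
 * Returns a pointer to the rest of the list (possibly empty), or NULL if
 * there was nothing left to read.
 */
static const char *demo_next_item(const char *list, char sep,
                                  char *out, size_t out_len)
{
    size_t n = 0;

    assert(out != NULL && out_len > 0);
    if (list == NULL || *list == '\0') {
        out[0] = '\0';
        return NULL;
    }
    while (*list != '\0') {
        if (list[0] == '\\' && list[1] == sep) {
            list++;              /* drop the backslash, keep the separator */
        } else if (*list == sep) {
            list++;              /* unescaped separator ends this item */
            break;
        }
        if (n + 1 < out_len) {
            out[n++] = *list;    /* copy one character of the item */
        }
        list++;
    }
    out[n] = '\0';
    return list;
}
```

Calling it in a loop until it returns NULL walks the whole list; each call leaves one unescaped item in the caller's buffer.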
Hi Folks,
Is the opal library explicitly closed by a dlclose?
I don't think there's anything wrong with using ctor/dtors in shared libraries,
but one does need to make sure that in these functions there are no assumptions
about ordering of them wrt other ctors/dtors. shared libraries explic
On Tue, Jul 15, 2014 at 12:49 PM, Pritchard, Howard r
wrote:
> I don't think there's anything wrong with using ctor/dtors in shared
> libraries, but one does need to make sure that in these functions there
> are no assumptions about ordering of them wrt other ctors/dtors.
>
The ELF specific
George: I've asked the various BTL developers of the components you listed
below (minus Portals4 as I couldn't get hold of them), and we are agreed that
we can move forward.
So please go ahead and commit this merge - it'll break things, but we all
agreed it would be easier to resolve in the tru
According to http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html
"constructor
destructor
constructor (priority)
destructor (priority)
The constructor attribute causes the function to be called automatically
before execution enters main (). Similarly, the destructor attribute c
The priority appears to have been added in gcc 4.3.
You'll note it is not described in
https://gcc.gnu.org/onlinedocs/gcc-4.2.0/gcc/Function-Attributes.html
I also don't think the presence of the priority argument fixes anything...
An OpenMPI code author cannot change the "priority" of a ctor or
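
A small sketch of the GCC attributes under discussion, assuming a compiler that accepts the priority argument (gcc 4.3 or later per the note above, or a compatible compiler). The function and variable names are hypothetical. Priorities 0 to 100 are reserved for the implementation, and a constructor with a smaller priority number runs before one with a larger number. The sketch only demonstrates ordering inside one translation unit; it says nothing about ordering relative to ctors in other libraries, which is the concern raised in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping used only to record the order of the calls. */
static int demo_step = 0;
static int demo_early_ran_at = 0;
static int demo_late_ran_at = 0;

/* Runs before main(); lower priority number means earlier. */
__attribute__((constructor(101))) static void demo_early_init(void)
{
    demo_early_ran_at = ++demo_step;
}

/* Also runs before main(), but after demo_early_init. */
__attribute__((constructor(102))) static void demo_late_init(void)
{
    demo_late_ran_at = ++demo_step;
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor)) static void demo_fini(void)
{
    fputs("demo_fini: cleanup at exit\n", stderr);
}
```

By the time main() starts, both constructors have already run in priority order, so code in main() can rely on whatever they set up.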
I wonder if we aren't using a howitzer to swat a gnat. It seems to me that this
is loaded with potential problems, as Paul describes, and I shudder to think of
how hard this is going to be when we consider all the compiler/environment
combinations we support and the range of libraries our variou
Enforcing the portability of this sounds like a huge [almost impossible]
mess, without a clean portable solution (more about this below). However,
a few things should be considered:
- Except for reinit, Open MPI works without it! If we provide such a
capability it will be more a convenience capabilit
I'm unsure where Intel's compilers sit on that list.
When you say it works except for reinit, are you saying that the only issue
here is that MPI_T_Finalize is calling opal_finalize_util solely because of the
valgrind cleanup? And if it didn't do that, we would leak but would otherwise
be just
On Tue, Jul 15, 2014 at 5:48 PM, George Bosilca wrote:
> - Except for reinit, Open MPI works without it! If we provide such a
> capability it will be more a convenience capability to keep valgrind happy,
> than a necessity
A valgrind suppression file seems like the most appropriate tool for that
I've attached a solution that blocks the segfault without requiring any
gyrations. Can someone explain why this isn't adequate?
The alternate solution was to simply decrement opal_util_initialized in
MPI_T_finalize rather than calling finalize itself. Either way resolves the
problem in a very simple man
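
The counter-based alternative can be pictured roughly as below. This is a generic reference-counting sketch, not the attached patch: demo_util_refcount, demo_util_live, demo_util_init and demo_util_finalize are hypothetical names that only mimic the role of opal_util_initialized.

```c
#include <assert.h>

/* Hypothetical counter and flag standing in for the real utility layer. */
static int demo_util_refcount = 0;
static int demo_util_live = 0;

/* Each user calls this once; only the first call performs the setup. */
static void demo_util_init(void)
{
    if (demo_util_refcount++ == 0) {
        demo_util_live = 1;   /* real setup would happen here, once */
    }
}

/* Each user calls this once; only the last call performs the teardown. */
static void demo_util_finalize(void)
{
    assert(demo_util_refcount > 0);
    if (--demo_util_refcount == 0) {
        demo_util_live = 0;   /* real teardown would happen here */
    }
}
```

With two users, the first finalize call only drops the count, and the shared state stays usable until the second user is done.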