On Mar 5, 2010, at 7:22 PM, Jeff Squyres wrote:
> On Mar 5, 2010, at 6:10 PM, Ralph Castain wrote:
>
>>> I agree with Jeff's comments about the BTL_ERROR. How about a middle ground
>>> here? We let the BTLs use BTL_ERROR, eventually with some modifications,
>>> and we redirect the BTL_ERROR to
On Mar 5, 2010, at 6:10 PM, Ralph Castain wrote:
> > I agree with Jeff's comments about the BTL_ERROR. How about a middle ground
> > here? We let the BTLs use BTL_ERROR, eventually with some modifications,
> > and we redirect the BTL_ERROR to a more advanced macro including support
> > for orte
We already use global symbols; mca_base_component_repository.c invokes:
    if (lt_dladvise_global(&opal_mca_dladvise)) {
        return OPAL_ERROR;
    }
On Mar 5, 2010, at 6:18 PM, George Bosilca wrote:
> Unfortunately this will not fix his issues ;( I'm pretty sure that his problem
> is relat
On Mar 5, 2010, at 6:02 PM, Jeff Squyres (jsquyres) wrote:
> I wondered aloud on IM to Terry after your earlier emails if we should just
> custom-patch ltdl in OMPI to fix this issue. The problem is that libltdl is
> effectively reporting the "wrong" error back to OMPI, so the error string
> t
Unfortunately this will not fix his issues ;( I'm pretty sure that his problem is
related to the fact that mca_pml_v is exported by another dynamic module, and
therefore not available via dlsym. I don't think there is a simple solution for
this problem, except going back to GLOBAL symbols.
geor
On Mar 5, 2010, at 3:52 PM, George Bosilca wrote:
>
> On Mar 5, 2010, at 14:59 , Ralph Castain wrote:
>
>>> I have never found BTL_ERROR to be terribly helpful. All it is is
>>> essentially an fprintf -- it doesn't propagate errors upward or anything.
>>> I tend to prefer show_help because
Ick.
I wondered aloud on IM to Terry after your earlier emails if we should just
custom-patch ltdl in OMPI to fix this issue. The problem is that libltdl is
effectively reporting the "wrong" error back to OMPI, so the error string that
we get to print out ends up not being very useful (e.g.,
On Mar 5, 2010, at 14:59 , Ralph Castain wrote:
>> I have never found BTL_ERROR to be terribly helpful. All it is is
>> essentially an fprintf -- it doesn't propagate errors upward or anything. I
>> tend to prefer show_help because then you can provide a meaningful error
>> message that way
On Mar 5, 2010, at 12:55 PM, Jeff Squyres wrote:
> On Mar 5, 2010, at 2:34 PM, George Bosilca wrote:
>
>> Being user friendly is good, being way too user friendly is less so (but I
>> guess this is the price we have to pay for production-quality code, isn't
>> it).
>
> Agreed. None of these me
On Mar 5, 2010, at 2:34 PM, George Bosilca wrote:
> Being user friendly is good, being way too user friendly is less so (but I guess
> this is the price we have to pay for production-quality code, isn't it).
Agreed. None of these messages appear except in error cases or if you crank up
the verbo
Have you found the symbol being exposed by another .so (i.e., have you done
an nm on the .so that shows the symbol)? And are you sure that .so is
loaded by the time your .so is being loaded?
--td
Leonardo Fialho wrote:
No George, this trick does not change the problem. I'm looking for the proble
Because I guess it is declared by another module loaded dynamically at runtime.
Since libtool does not load the symbols in a global scope, mca_pml_v will not be
visible to other modules trying to use it.
george.
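
The scoping described above corresponds to the RTLD_LOCAL and RTLD_GLOBAL flags of POSIX dlopen. The following is a minimal sketch, assuming a POSIX system with <dlfcn.h>. The helper name open_module is hypothetical and is not Open MPI code (Open MPI loads components through libltdl, not by calling dlopen directly like this).

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not an Open MPI function): open a shared object and
 * choose whether its symbols become visible to objects loaded later. */
void *open_module(const char *path, int global_scope)
{
    /* RTLD_GLOBAL: symbols can satisfy references in later-loaded objects.
     * RTLD_LOCAL:  symbols are reachable only through this handle. */
    int flags = RTLD_NOW | (global_scope ? RTLD_GLOBAL : RTLD_LOCAL);
    void *handle = dlopen(path, flags);

    if (handle == NULL) {
        /* dlerror() returns the loader's description of the failure. */
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    }
    return handle;
}
```

If a module that defines a symbol is opened with local scope, a second module that references that symbol may fail to load with an unresolved-symbol error, which is the situation suspected here for mca_pml_v. On older glibc versions the program may need to be linked with -ldl.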
On Mar 5, 2010, at 14:35 , Leonardo Fialho wrote:
> No George, this trick does not
No George, this trick does not change the problem. I'm looking for the problem
in the mca_pml_v declaration, but I still can't figure out the reason why it
doesn't work.
Leonardo
On Mar 5, 2010, at 8:12 PM, George Bosilca wrote:
> I would first try the Open MPI configure option --disable-visib
Being user friendly is good, being way too user friendly is less so (but I guess
this is the price we have to pay for production-quality code, isn't it).
I have a few comments:
- In several places you replaced the BTL_ERROR (which was the way BTLs are
supposed to complain) by a call directly to o
From https://svn.open-mpi.org/trac/ompi/ticket/2045, I have added a lot more
diagnostic error and verbose messages to the TCP BTL that detail what
endpoints it creates, what IP addresses and ports it's trying to connect to,
etc. As part of this, I also added a magic ID string into the TCP BT
I would first try the Open MPI configure option --disable-visibility. If this
doesn't fix it, you should make sure that dlopen is called with the GLOBAL flag
on (don't remember where exactly in the code and unfortunately I can't check
right now). Use gdb and set a breakpoint on dlopen and you wi
I'm not going to commit this today - I think it would be a little quick :-)
However, I do have it all building on a Mercurial branch with the new options.
It would be REALLY GOOD if people interested in thread support were to check
this out prior to me bringing it to the trunk.
The branch can b
Yeah, probably ompi_request_null and opal_output are not good candidates. I'm
trying with mca_pml_v. But I'm not familiar with this framework, although it
is really small.
George, you said to change this (opal/mca/base/mca_base_component_find.c):
#if OPAL_HAVE_LTDL_ADVISE
component_handle
Leonardo Fialho wrote:
Yes,
I renamed all references to Aurelien's component name and removed all code
related to the component itself. There are only functions that return
OMPI_SUCCESS. No other function is called.
I'm debugging with LD_DEBUG=symbols, but the output is really huge! Proba
This might be an issue with the [new] way libtool loads the symbols, i.e., in a
private space and not in a global one. Try turning off the visibility feature
and see if you get the same error.
george.
On Mar 5, 2010, at 13:47 , Terry Dontje wrote:
> I would also start nm'ing the .so's you thi
I would also start nm'ing the .so's you think the U symbols are resolved
in to make sure they are exposed. Luckily you only have 3 symbols to
look for.
--td
Ralph Castain wrote:
It's probably a visibility issue - check for an OMPI_DECLSPEC missing from the
declaration of a symbol.
On Mar 5
It's probably a visibility issue - check for an OMPI_DECLSPEC missing from the
declaration of a symbol.
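
To illustrate what a missing export tag does, here is a minimal sketch, assuming a GCC-compatible compiler. EXAMPLE_DECLSPEC and both function names are hypothetical stand-ins. The actual definition of OMPI_DECLSPEC is not shown in this thread, so treating it as a visibility attribute is an assumption.

```c
/* EXAMPLE_DECLSPEC is a hypothetical stand-in for an export macro such as
 * OMPI_DECLSPEC; its real definition is not shown in this thread. */
#if defined(__GNUC__) && (__GNUC__ >= 4)
#define EXAMPLE_DECLSPEC __attribute__((visibility("default")))
#else
#define EXAMPLE_DECLSPEC
#endif

/* Tagged: remains in the dynamic symbol table even when the library is
 * built with -fvisibility=hidden, so other shared objects can resolve it. */
EXAMPLE_DECLSPEC int example_component_version(void);

/* Not tagged: hidden from other shared objects under -fvisibility=hidden,
 * so a module that references it would fail to load. */
int example_internal_helper(void);

int example_internal_helper(void)
{
    return 3;
}

int example_component_version(void)
{
    /* Calls within the same shared object work regardless of visibility. */
    return example_internal_helper();
}
```

Running nm -D on the resulting .so should list only the tagged symbol when the build uses -fvisibility=hidden, which is one way to check for a declaration that is missing its tag.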
On Mar 5, 2010, at 11:40 AM, Leonardo Fialho wrote:
> Yes,
>
> I renamed all references to Aurelien's component name and removed all code
> related to the component itself. There are only
Yes,
I renamed all references to Aurelien's component name and removed all code
related to the component itself. There are only functions that return
OMPI_SUCCESS. No other function is called.
I'm debugging with LD_DEBUG=symbols, but the output is really huge! Probably
the error is in the
You said this component was a copy of Aurelien's component? Did you rename the
critical elements (e.g., component, module) inside it to avoid name confusion?
On Mar 5, 2010, at 11:27 AM, Leonardo Fialho wrote:
> I see... but it is really strange because this module is clean, it does not
> use n
I see... but it is really strange because this module is clean, it does not use
anything. This is the output of the nm command, I can't see any symbol which is
not available.
[lfialho@aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
00201208 a _DYNAMIC
00201408 a _GLOBAL_OFFSET
Sorry meant to add this, but you might be able to try and find the
symbol causing the issue by twiddling with LD_DEBUG
--td
Terry Dontje wrote:
Possibly there is an external symbol in the .so that is being loaded
that cannot be resolved.
--td
Leonardo Fialho wrote:
Hi,
I know that libtool do
Possibly there is an external symbol in the .so that is being loaded
that cannot be resolved.
--td
Leonardo Fialho wrote:
Hi,
I know that libtool does not help us to find the source of this error, but,
what can generate the following error?
[aoclsb-clus.uab.es:11724] mca: base: component_fi
Hi,
I know that libtool does not help us to find the source of this error, but,
what can generate the following error?
[aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open
/home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing symbol, or
compiled for a different ve
A couple comments:
1. I really assume the timeout is March 5th not February.
2. As to keeping the deprecated variables I think you really need to
ditch the --enable-mpi-threads because if you make it a synonym for
--enable-mpi-thread-multiple you are not mimicking what it did before but
redefining i