Re: [OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread Ralph Castain
On Mar 5, 2010, at 7:22 PM, Jeff Squyres wrote: > On Mar 5, 2010, at 6:10 PM, Ralph Castain wrote: > >>> I agree with Jeff's comments about the BTL_ERROR. How about a middle ground >>> here? We let the BTLs use BTL_ERROR, possibly with some modifications, >>> and we redirect the BTL_ERROR to

Re: [OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread Jeff Squyres
On Mar 5, 2010, at 6:10 PM, Ralph Castain wrote: > > I agree with Jeff's comments about the BTL_ERROR. How about a middle ground > > here? We let the BTLs use BTL_ERROR, possibly with some modifications, > > and we redirect the BTL_ERROR to a more advanced macro including support > > for orte

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Jeff Squyres
We already use global symbols; mca_base_component_repository.c invokes: if (lt_dladvise_global(&opal_mca_dladvise)) { return OPAL_ERROR; } On Mar 5, 2010, at 6:18 PM, George Bosilca wrote: > Unfortunately this will not fix his issues ;( I'm pretty sure that his problem > is relat

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Jeff Squyres
On Mar 5, 2010, at 6:02 PM, Jeff Squyres (jsquyres) wrote: > I wondered aloud on IM to Terry after your earlier emails if we should just > custom-patch ltdl in OMPI to fix this issue. The problem is that libltdl is > effectively reporting the "wrong" error back to OMPI, so the error string > t

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread George Bosilca
Unfortunately this will not fix his issues ;( I'm pretty sure that his problem is related to the fact that mca_pml_v is exported by another dynamic module, and therefore not available via dlsym. I don't think there is a simple solution for this problem, except going back to GLOBAL symbols. geor

Re: [OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread Ralph Castain
On Mar 5, 2010, at 3:52 PM, George Bosilca wrote: > > On Mar 5, 2010, at 14:59 , Ralph Castain wrote: > >>> I have never found BTL_ERROR to be terribly helpful. All it is is >>> essentially an fprintf -- it doesn't propagate errors upward or anything. >>> I tend to prefer show_help because

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Jeff Squyres
Ick. I wondered aloud on IM to Terry after your earlier emails if we should just custom-patch ltdl in OMPI to fix this issue. The problem is that libltdl is effectively reporting the "wrong" error back to OMPI, so the error string that we get to print out ends up not being very useful (e.g.,

Re: [OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread George Bosilca
On Mar 5, 2010, at 14:59 , Ralph Castain wrote: >> I have never found BTL_ERROR to be terribly helpful. All it is is >> essentially an fprintf -- it doesn't propagate errors upward or anything. I >> tend to prefer show_help because then you can provide a meaningful error >> message that way
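The observation that BTL_ERROR is "essentially an fprintf" and does not propagate errors upward can be made concrete with a small sketch. Everything named below (EXAMPLE_BTL_ERROR, example_check_port, the EXAMPLE_* codes) is hypothetical and is not Open MPI's actual definition; the sketch only assumes a print-only macro of the kind described in the thread. Because such a macro reports and does nothing else, the caller still has to return an error code for upper layers to react.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical return codes, standing in for OMPI_SUCCESS / OMPI_ERROR. */
enum { EXAMPLE_SUCCESS = 0, EXAMPLE_ERR_BAD_PARAM = -5 };

/* Hypothetical print-only error macro: prefixes file and line and writes
 * to stderr. It does NOT change control flow or return anything. */
#define EXAMPLE_BTL_ERROR(...)                               \
    do {                                                     \
        fprintf(stderr, "[%s:%d] ", __FILE__, __LINE__);     \
        fprintf(stderr, __VA_ARGS__);                        \
        fputc('\n', stderr);                                 \
    } while (0)

/* Hypothetical endpoint check. Printing alone is not enough: the error
 * must also be returned so the caller can decide what to do. */
static int example_check_port(const char *addr, int port)
{
    assert(addr != NULL);
    if (port <= 0 || port > 65535) {
        EXAMPLE_BTL_ERROR("invalid port %d for peer %s", port, addr);
        return EXAMPLE_ERR_BAD_PARAM;   /* propagate upward */
    }
    return EXAMPLE_SUCCESS;
}
```

A show_help-style message would replace the body of the macro with a richer, user-oriented text, but the return path would stay the same.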

Re: [OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread Ralph Castain
On Mar 5, 2010, at 12:55 PM, Jeff Squyres wrote: > On Mar 5, 2010, at 2:34 PM, George Bosilca wrote: > >> Being user friendly is good, being way too user friendly is less (but I >> guess this is the price we have to pay for a production-quality code isn't >> it). > > Agreed. None of these me

Re: [OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread Jeff Squyres
On Mar 5, 2010, at 2:34 PM, George Bosilca wrote: > Being user friendly is good, being way too user friendly is less (but I guess > this is the price we have to pay for a production-quality code isn't it). Agreed. None of these messages appear except in error cases or if you crank up the verbo
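Keeping messages silent unless something fails or the verbosity is raised amounts to a level check in front of every diagnostic. A minimal sketch, assuming a single hypothetical global threshold (example_verbosity) and a hypothetical helper (example_output_verbose) rather than Open MPI's own output API:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical verbosity threshold. A real component would read this from
 * a run-time parameter; a plain global keeps the sketch self-contained. */
static int example_verbosity = 0;

/* Print the message only if its level is at or below the threshold.
 * Returns 1 if something was printed, 0 if it was suppressed. */
static int example_output_verbose(int level, const char *fmt, ...)
{
    assert(fmt != NULL);
    if (level > example_verbosity) {
        return 0;                      /* quiet by default */
    }
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

With the default threshold of 0, a level-10 "connecting to ..." message is suppressed; raising the threshold to 20 lets it through, while level-30 detail stays hidden.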

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Terry Dontje
Have you found the symbol being exposed by another .so (i.e., have you done an nm on the .so that shows the symbol)? And are you sure that .so is loaded by the time your .so is being loaded? --td Leonardo Fialho wrote: No George, this trick does not change the problem. I'm looking for the proble

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread George Bosilca
Because, I guess, it is declared by another module loaded dynamically at runtime. As libtool does not load the symbols in a global scope, this mca_pml_v will not be visible to other modules trying to use it. george. On Mar 5, 2010, at 14:35 , Leonardo Fialho wrote: > No George, this trick does not
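Whether a symbol such as mca_pml_v is in the global scope can be checked from inside the process. On glibc, dlopen(NULL, ...) returns a handle for the main program, and dlsym() on that handle searches the main program, the libraries loaded at startup, and objects later opened with RTLD_GLOBAL, but not objects opened with RTLD_LOCAL. The helper below (symbol_in_global_scope is a hypothetical name) is a sketch of that check using only <dlfcn.h>; on glibc older than 2.34 it may need -ldl at link time.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `name` resolves in the global symbol
 * scope, i.e. the scope that objects loaded later can bind against.
 * A symbol defined only by an object opened with RTLD_LOCAL is not found. */
static int symbol_in_global_scope(const char *name)
{
    assert(name != NULL);
    void *global = dlopen(NULL, RTLD_LAZY);   /* handle for the main program */
    if (global == NULL) {
        return 0;
    }
    (void)dlerror();                          /* clear any stale error */
    void *addr = dlsym(global, name);
    dlclose(global);
    return addr != NULL;
}
```

Called as symbol_in_global_scope("mca_pml_v") just before the failing component is opened, a result of 0 would be consistent with the symbol living in a module loaded with local scope.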

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Leonardo Fialho
No George, this trick does not change the problem. I'm looking for the problem in the mca_pml_v declaration, but I still can't figure out the reason why it doesn't work. Leonardo On Mar 5, 2010, at 8:12 PM, George Bosilca wrote: > I would first try the Open MPI configure option --disable-visib

Re: [OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread George Bosilca
Being user friendly is good, being way too user friendly is less so (but I guess this is the price we have to pay for production-quality code, isn't it?). I have a few comments: - In several places you replaced the BTL_ERROR (which was the way BTLs are supposed to complain) by a call directly to o

[OMPI devel] Adding error/verbose messages to the TCP BTL

2010-03-05 Thread Jeff Squyres
From https://svn.open-mpi.org/trac/ompi/ticket/2045, I have added a lot more diagnostic error and verbose messages to the TCP BTL that detail what endpoints it creates, what IP addresses and ports it's trying to connect to, etc. As part of this, I also added a magic ID string into the TCP BT

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread George Bosilca
I would first try the Open MPI configure option --disable-visibility. If this doesn't fix it, you should make sure that dlopen is called with the GLOBAL flag on (don't remember where exactly in the code and unfortunately I can't check right now). Use gdb and set a breakpoint to dlopen and you wi
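To see both the effect of the GLOBAL flag and what the loader itself reports, independent of libltdl, a component can be opened directly with dlopen(). The sketch below assumes a plain POSIX dlopen() call rather than Open MPI's libltdl path, and open_component is a hypothetical name. RTLD_NOW makes the loader resolve every undefined symbol at load time, so an unresolved reference makes dlopen() fail with a dlerror() message that typically names the symbol; RTLD_GLOBAL adds the object's own symbols to the global scope for modules loaded after it. On glibc older than 2.34, link with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a component so that all of its undefined
 * symbols are resolved immediately (RTLD_NOW) and its own symbols are
 * visible to objects loaded afterwards (RTLD_GLOBAL). On failure, print
 * the loader's error string instead of a generic message. */
static void *open_component(const char *path)
{
    assert(path != NULL);
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "cannot open %s: %s\n", path,
                err != NULL ? err : "(no loader message)");
    }
    return handle;
}
```

Pointing this at the mca_vprotocol_receiver.so file from the error message (the exact path is an assumption based on the component_find output and the nm listing in this thread) would print the loader's own reason instead of the generic "perhaps a missing symbol" text.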

Re: [OMPI devel] RFC: Rename --enable-*-threads and ENABLE*THREAD* (take 2)

2010-03-05 Thread Ralph Castain
I'm not going to commit this today - I think it would be a little quick :-) However, I do have it all building on a Mercurial branch with the new options. It would be REALLY GOOD if people interested in thread support were to check this out prior to me bringing it to the trunk. The branch can b

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Leonardo Fialho
Yeah, probably ompi_request_null and opal_output are not good candidates. I'm trying with mca_pml_v, but I'm not familiar with this framework, although it is really small. George, you said to change this (opal/mca/base/mca_base_component_find.c): #if OPAL_HAVE_LTDL_ADVISE component_handle

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Terry Dontje
Leonardo Fialho wrote: Yes, I renamed all references to Aurelien's component name and removed all code regarding the component itself. There are only functions which return OMPI_SUCCESS. No other function is called. I'm debugging with LD_DEBUG=symbols, but the output is really huge! Proba

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread George Bosilca
This might be an issue with the [new] way libtool loads the symbols, i.e., in a private space and not in a global one. Try turning off the visibility feature and see if you get the same error. george. On Mar 5, 2010, at 13:47 , Terry Dontje wrote: > I would also start nm'ing the .so's you thi

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Terry Dontje
I would also start nm'ing the .so's you think the U symbols are resolved in to make sure they are exposed. Luckily you only have 3 symbols to look for. --td Ralph Castain wrote: It's probably a visibility issue - check for an OMPI_DECLSPEC missing from the declaration of a symbol. On Mar 5
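The nm check can also be done programmatically, which helps when the question is what a library provides at run time. The sketch below (library_provides is a hypothetical name) opens one library with RTLD_LAZY | RTLD_LOCAL, so nothing leaks into the global scope, and asks dlsym() for the symbol. Unlike nm on a single file, dlsym() on a library handle also searches that library's dependencies, so a hit means "this library or something it links against provides it". Note also that RTLD_LAZY defers only function bindings: a library with an unresolved variable reference (mca_pml_v is a variable) still fails to load.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the library at `path` (or one of its
 * dependencies) provides `name`, 0 if not or if it cannot be opened. */
static int library_provides(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);
    /* RTLD_LAZY defers function binding only; unresolved variables still
     * make dlopen() fail here. RTLD_LOCAL keeps the symbols out of the
     * global scope. */
    void *handle = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "%s\n", err != NULL ? err : "dlopen failed");
        return 0;
    }
    (void)dlerror();                 /* clear any stale error */
    void *addr = dlsym(handle, name);
    dlclose(handle);
    return addr != NULL;
}
```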

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Ralph Castain
It's probably a visibility issue - check for an OMPI_DECLSPEC missing from the declaration of a symbol. On Mar 5, 2010, at 11:40 AM, Leonardo Fialho wrote: > Yes, > > I renamed all references to Aurelien's component name and removed all code > regarding the component itself. There are only
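The visibility suggestion concerns export annotations. When a library is compiled with -fvisibility=hidden, only symbols explicitly marked with default visibility are exported from the shared object, so a missing marker can leave a component that references the symbol with an unresolved reference at load time. The sketch below uses a hypothetical EXAMPLE_DECLSPEC macro built on the GCC/Clang visibility attribute; it illustrates the mechanism and is not Open MPI's actual definition of OMPI_DECLSPEC.

```c
#include <assert.h>

/* Hypothetical export macro: with GCC/Clang, mark a symbol as exported
 * even when the file is compiled with -fvisibility=hidden. */
#if defined(__GNUC__)
#  define EXAMPLE_DECLSPEC __attribute__((visibility("default")))
#else
#  define EXAMPLE_DECLSPEC
#endif

/* Exported: other shared objects (e.g. a dlopen'ed component) can bind to it. */
EXAMPLE_DECLSPEC int example_calls = 0;

/* Not annotated: under -fvisibility=hidden this stays private to the
 * shared object that defines it, even though it is not static. */
int example_internal_increment(int x) { return x + 1; }

/* Exported entry point that uses both. */
EXAMPLE_DECLSPEC int example_public_api(int x)
{
    assert(x >= 0);
    example_calls++;
    return example_internal_increment(x);
}
```

Built as a shared library with -fvisibility=hidden, nm -D on the result should list example_calls and example_public_api but not example_internal_increment.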

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Leonardo Fialho
Yes, I renamed all references to Aurelien's component name and removed all code regarding the component itself. There are only functions which return OMPI_SUCCESS. No other function is called. I'm debugging with LD_DEBUG=symbols, but the output is really huge! Probably the error is in the

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Ralph Castain
You said this component was a copy of Aurelien's component? Did you rename the critical elements (e.g., component, module) inside it to avoid name confusion? On Mar 5, 2010, at 11:27 AM, Leonardo Fialho wrote: > I see... but it is really strange because this module is clean, it does not > use n

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Leonardo Fialho
I see... but it is really strange because this module is clean; it does not use anything. This is the output of the nm command; I can't see any symbol that is not available.
[lfialho@aoclsb-clus openmpi]$ nm mca_vprotocol_receiver.so
00201208 a _DYNAMIC
00201408 a _GLOBAL_OFFSET

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Terry Dontje
Sorry, meant to add this: you might be able to find the symbol causing the issue by twiddling with LD_DEBUG. --td Terry Dontje wrote: Possibly there is an external symbol in the .so that is being loaded that cannot be resolved. --td Leonardo Fialho wrote: Hi, I know that libtool do

Re: [OMPI devel] Missing Symbol

2010-03-05 Thread Terry Dontje
Possibly there is an external symbol in the .so that is being loaded that cannot be resolved. --td Leonardo Fialho wrote: Hi, I know that libtool does not help us to find the source of this error, but, what can generate the following error? [aoclsb-clus.uab.es:11724] mca: base: component_fi

[OMPI devel] Missing Symbol

2010-03-05 Thread Leonardo Fialho
Hi, I know that libtool does not help us to find the source of this error, but what can generate the following error? [aoclsb-clus.uab.es:11724] mca: base: component_find: unable to open /home/lfialho/lib/openmpi/mca_vprotocol_receiver: perhaps a missing symbol, or compiled for a different ve

Re: [OMPI devel] RFC: Rename --enable-*-threads and ENABLE*THREAD* (take 2)

2010-03-05 Thread Terry Dontje
A couple of comments: 1. I assume the timeout is really March 5th, not February. 2. As to keeping the deprecated variables, I think you really need to ditch the --enable-mpi-threads, because if you make it a synonym for --enable-mpi-thread-multiple you are not mimicking what it did before but redefining i