Re: [OMPI devel] 1.3 release date?

2008-10-22 Thread Greg Watson
Brad, Many thanks for the update. Greg On Oct 22, 2008, at 8:43 PM, Brad Benton wrote: Greg, Here is the latest schedule that we have for getting 1.3 out the door: https://svn.open-mpi.org/trac/ompi/milestone/Open%20MPI%201.3 Basically, this schedule sets Nov. 10 as the release date with a

Re: [OMPI devel] Restarting processes on different node

2008-10-22 Thread Paul H. Hargrove
Leonardo, As you say, there is the possibility that moving from one node to another has caused problems due to different shared libraries. The result from this could be a segmentation fault, an illegal instruction or even a bus error. In all three cases, however, this failure generates a

Re: [OMPI devel] adding new functions to a BTL

2008-10-22 Thread Eugene Loh
Ralf Wildenhues wrote: Jeff Squyres wrote: We use lt_dlopen() to open the plugins (Libtool's wrapper for a portable dlopen). It opens all plugins (DSOs) in a private scope. That private scope is kept deep in the OPAL MCA base and not exposed elsewhere in the

Re: [OMPI devel] adding new functions to a BTL

2008-10-22 Thread Ralf Wildenhues
Hello Jeff, Eugene, Jeff Squyres wrote: We use lt_dlopen() to open the plugins (Libtool's wrapper for a portable dlopen). It opens all plugins (DSOs) in a private scope. That private scope is kept deep in the OPAL MCA base and not exposed elsewhere in the code base. So

Re: [OMPI devel] adding new functions to a BTL

2008-10-22 Thread Eugene Loh
Jeff Squyres wrote: We use lt_dlopen() to open the plugins (Libtool's wrapper for a portable dlopen). It opens all plugins (DSOs) in a private scope. That private scope is kept deep in the OPAL MCA base and not exposed elsewhere in the code base. So if you manually dlopen a plugin

Re: [OMPI devel] Component open

2008-10-22 Thread Ralph Castain
Hmmm...interesting. I see what's going on - I'm having a build system issue that is causing some of the dynamic libraries to not be seen. Red herring - thanks for clarifying! Camille: thanks for fixing this way back when. Ralph On Oct 22, 2008, at 1:17 PM, George Bosilca wrote: Ralph,

Re: [OMPI devel] Component open

2008-10-22 Thread George Bosilca
Ralph, This problem was fixed long ago by some of the work Camille did. The exact revision number is r15402 (https://svn.open-mpi.org/trac/ompi/changeset/15402). I'm using this feature daily and so far I haven't had any problems with it. To reuse your example, here is what Camille came up with. $

Re: [OMPI devel] Comm_spawn limits

2008-10-22 Thread Ralph Castain
I can't swear to this because I haven't fully grokked it yet, but I believe the answer is: 1. if child jobs have completed, it won't hurt. I think the various subsystems clean up their bookkeeping when a job completes, so we could possibly reuse the number. Might be some race conditions we

Re: [OMPI devel] Comm_spawn limits

2008-10-22 Thread George Bosilca
What happens if we roll around with the counter? george. On Oct 22, 2008, at 2:49 PM, Ralph Castain wrote: There recently was activity on the mailing lists where someone was attempting to call comm_spawn 100,000 times. Setting aside the threading issues that were the focus of that

[OMPI devel] Comm_spawn limits

2008-10-22 Thread Ralph Castain
There recently was activity on the mailing lists where someone was attempting to call comm_spawn 100,000 times. Setting aside the threading issues that were the focus of that exchange, the fact is that OMPI currently cannot handle that many comm_spawns. The ORTE jobid is composed of two
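
A rough illustration of why a fixed-width job counter caps the number of comm_spawn calls. The snippet above is cut off before it says how the jobid is divided, so the 16-bit width used here is purely an assumption for illustration, and `JobCounter` and `spawns_before_exhaustion` are hypothetical names, not ORTE code.

```python
# Hypothetical sketch: a fixed-width counter handing out job numbers.
# COUNTER_BITS is an assumed width, not taken from the ORTE source.
COUNTER_BITS = 16


class JobCounter:
    """Hands out increasing job numbers and refuses to wrap silently."""

    def __init__(self, bits=COUNTER_BITS):
        self.limit = 1 << bits   # first value that no longer fits
        self.next_value = 1      # assume 0 is reserved for the initial job

    def allocate(self):
        # Raise instead of rolling over and reusing a live job number.
        if self.next_value >= self.limit:
            raise OverflowError("job counter exhausted")
        value = self.next_value
        self.next_value += 1
        return value


def spawns_before_exhaustion(requested, bits=COUNTER_BITS):
    """Count how many of `requested` spawns get a job number."""
    counter = JobCounter(bits)
    done = 0
    try:
        for _ in range(requested):
            counter.allocate()
            done += 1
    except OverflowError:
        pass
    return done
```

Under that assumed width, a run of 100,000 spawns would stop receiving fresh job numbers well before it finished. Whether numbers from completed jobs can be reused safely is the question discussed in the replies.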

Re: [OMPI devel] adding new functions to a BTL

2008-10-22 Thread Jeff Squyres
George reminds me that I forgot to explain why you couldn't dlsym. We use lt_dlopen() to open the plugins (Libtool's wrapper for a portable dlopen). It opens all plugins (DSOs) in a private scope. That private scope is kept deep in the OPAL MCA base and not exposed elsewhere in the

Re: [OMPI devel] Direct routed module

2008-10-22 Thread George Bosilca
Youpiii! george. On Oct 21, 2008, at 4:53 PM, Ralph Castain wrote: Hello all I am working on adding a new radix tree routed module and am simultaneously doing a little streamlining to the overall routed-related code for scalability. One thing that would help clean up several areas

Re: [OMPI devel] adding new functions to a BTL

2008-10-22 Thread Jeff Squyres
Short answer because we're all still in Chicago... Terry tells me that you're just hacking around trying to see what works, etc. So adding direct calls to the BTL in this kind of scenario is ok. I'm sure you're aware that this is not good for real code. :-) To directly call a BTL

Re: [OMPI devel] OOB-TCP Retries

2008-10-22 Thread Ralph Castain
Sorry for the delayed response - had some things to finish, then had to stare at this code for a while. Unfortunately, the OOB is a snarled can of hideous worms. It looks to me that the OOB continues to attempt to complete any pending message requests once it detects that retries have exceeded

Re: [OMPI devel] Direct routed module

2008-10-22 Thread Jeff Squyres
Sounds good to me. On Oct 21, 2008, at 3:53 PM, Ralph Castain wrote: Hello all I am working on adding a new radix tree routed module and am simultaneously doing a little streamlining to the overall routed-related code for scalability. One thing that would help clean up several areas of

[OMPI devel] adding new functions to a BTL

2008-10-22 Thread Eugene Loh
I'm trying to prototype an idea inside OMPI and am running into a problem. I want to add a new function to a BTL and to have the PML call this function. I can't just put such a function call into the PML (not even for my prototype) since the PML is loaded before the BTL and so the PML will

Re: [OMPI devel] -display-map

2008-10-22 Thread Greg Watson
Ralph, I guess the issue for us is that we will have to run two commands to get the information we need. One to get the configuration information, such as version and MCA parameters, and one to get the host information, whereas it would seem more logical that this should all be available

[OMPI devel] Component open

2008-10-22 Thread Ralph Castain
I've been digging a little into optimization and found something that seems counterintuitive in the way OMPI is handling components. Specifically, if I specify a component I want used for a framework, OMPI still does a component load and open on every component in the framework - it only

[OMPI devel] Restarting processes on different node

2008-10-22 Thread Leonardo Fialho
Hi All, I'm trying to implement my FT architecture in Open MPI. Just now I need to restart a faulty process from a checkpoint. I saw that Josh uses orte-restart, which calls opal-restart through an ordinary mpirun call. It's not good for me because in this case the restarted process becomes in