Re: [OMPI devel] collective problems

2007-11-07 Thread Richard Graham
On 11/8/07 12:25 AM, "Patrick Geoffray" wrote:
> Richard Graham wrote:
>> The real problem, as you and others have pointed out is the lack of
>> predictable time slices for the progress engine to do its work, when relying
>> on the ULP to make calls into the library...
>
> The real, real prob

Re: [OMPI devel] collective problems

2007-11-07 Thread Shipman, Galen M.
The lengths we go to avoid progress :-)

On 11/7/07 10:19 PM, "Richard Graham" wrote:
> The real problem, as you and others have pointed out is the lack of
> predictable time slices for the progress engine to do its work, when relying
> on the ULP to make calls into the library...
>
> Rich

Re: [OMPI devel] collective problems

2007-11-07 Thread Patrick Geoffray
Richard Graham wrote:
> The real problem, as you and others have pointed out is the lack of predictable time slices for the progress engine to do its work, when relying on the ULP to make calls into the library...

The real, real problem is that the BTL should handle progression at their level, s

Re: [OMPI devel] collective problems

2007-11-07 Thread Richard Graham
The real problem, as you and others have pointed out is the lack of predictable time slices for the progress engine to do its work, when relying on the ULP to make calls into the library...

Rich

On 11/8/07 12:07 AM, "Brian Barrett" wrote:
> As it stands today, the problem is that we can inject

Re: [OMPI devel] collective problems

2007-11-07 Thread Brian Barrett
As it stands today, the problem is that we can inject things into the BTL successfully that are not injected into the NIC (due to software flow control). Once a message is injected into the BTL, the PML marks completion on the MPI request. If it was a blocking send that got marked as comp

Re: [OMPI devel] collective problems

2007-11-07 Thread Richard Graham
Does this mean that we don't have a queue to store btl level descriptors that are only partially complete ? Do we do an all or nothing with respect to btl level requests at this stage ? Seems to me like we want to mark things complete at the MPI level ASAP, and that this proposal is not to do

Re: [OMPI devel] collective problems

2007-11-07 Thread Jeff Squyres
On Nov 7, 2007, at 9:33 PM, Patrick Geoffray wrote: Remember that this is all in the context of Galen's proposal for btl_send() to be able to return NOT_ON_WIRE -- meaning that the send was successful, but it has not yet been sent (e.g., openib BTL buffered it because it ran out of credits). S

Re: [OMPI devel] collective problems

2007-11-07 Thread Patrick Geoffray
Jeff Squyres wrote:
> This is not a problem in the current code base. Remember that this is all in the context of Galen's proposal for btl_send() to be able to return NOT_ON_WIRE -- meaning that the send was successful, but it has not yet been sent (e.g., openib BTL buffered it because it ra

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Ralph Castain
What changed is that we never passed mca params to the orted before - they always went to the app, but it's the orted that has the issue. There is a bug ticket thread on this subject - I forget the number immediately. Basically, the problem was that we cannot generally pass the local environment t

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Tim Prins
I'm curious what changed to make this a problem. How were we passing mca param from the base to the app before, and why did it change? I think that options 1 & 2 below are no good, since we, in general, allow string mca params to have spaces (as far as I understand it). So a more general approa
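
Tim's point is that string-valued MCA params may legitimately contain spaces (a value such as "ssh -Y", for instance), so any scheme for forwarding them to the orted on a remote command line has to preserve those spaces. One general approach, sketched here purely as an assumption and not as what Open MPI actually does, is POSIX shell single-quote escaping. The helper name `shell_quote` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Open MPI): wrap a value in single quotes
 * for a POSIX shell so embedded spaces survive being forwarded on a remote
 * command line. Each embedded single quote becomes '\'' (close the quote,
 * emit an escaped quote, reopen). The caller must free() the result. */
char *shell_quote(const char *value)
{
    size_t len = strlen(value);
    size_t quotes = 0;

    for (size_t i = 0; i < len; i++) {
        if (value[i] == '\'') {
            quotes++;
        }
    }

    /* Each ' grows from 1 to 4 chars (+3), plus 2 outer quotes and a NUL. */
    char *out = malloc(len + 3 * quotes + 3);
    if (out == NULL) {
        return NULL;
    }

    char *p = out;
    *p++ = '\'';
    for (size_t i = 0; i < len; i++) {
        if (value[i] == '\'') {
            memcpy(p, "'\\''", 4);
            p += 4;
        } else {
            *p++ = value[i];
        }
    }
    *p++ = '\'';
    *p = '\0';
    return out;
}
```

With this, "ssh -Y" is forwarded as the single shell word 'ssh -Y' instead of being split into two arguments.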

Re: [OMPI devel] accessors to context id and message id's

2007-11-07 Thread George Bosilca
On Nov 6, 2007, at 8:38 AM, Terry Dontje wrote: George Bosilca wrote: If I understand correctly your question, then we don't need any extension. Each request has a unique ID (from PERUSE perspective). However, if I remember well this is only half implemented in our PERUSE layer (i.e. it works

Re: [OMPI devel] collective problems

2007-11-07 Thread Jeff Squyres
This is not a problem in the current code base. Remember that this is all in the context of Galen's proposal for btl_send() to be able to return NOT_ON_WIRE -- meaning that the send was successful, but it has not yet been sent (e.g., openib BTL buffered it because it ran out of credits).
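
To make the NOT_ON_WIRE idea concrete, here is a toy model of a credit-based BTL that either puts a fragment on the wire or buffers it under software flow control. All type and function names are hypothetical and this is not the real Open MPI BTL interface. It only illustrates the semantics described above and in Brian's and Rich's messages: a buffered fragment moves only when something drives the progress function.

```c
#include <stddef.h>

/* Hypothetical status codes modeled on the NOT_ON_WIRE proposal. */
typedef enum {
    TOY_SEND_ON_WIRE,     /* handed to the "NIC" immediately */
    TOY_SEND_NOT_ON_WIRE, /* accepted, but buffered by software flow control */
    TOY_SEND_ERROR        /* could not even be queued */
} toy_send_status;

#define TOY_QUEUE_MAX 8

/* Toy credit-based BTL: one credit per fragment the peer can accept. */
typedef struct {
    int credits;                 /* remaining send credits */
    size_t n_wire;               /* fragments actually sent so far */
    int last_wire;               /* most recent fragment put on the wire */
    int pending[TOY_QUEUE_MAX];  /* fragments waiting for credits (FIFO) */
    size_t n_pending;
} toy_btl;

static void toy_put_on_wire(toy_btl *btl, int frag)
{
    btl->credits--;
    btl->n_wire++;
    btl->last_wire = frag;
}

toy_send_status toy_btl_send(toy_btl *btl, int frag)
{
    /* Preserve ordering: if anything is already queued, queue behind it. */
    if (btl->credits > 0 && btl->n_pending == 0) {
        toy_put_on_wire(btl, frag);
        return TOY_SEND_ON_WIRE;
    }
    if (btl->n_pending == TOY_QUEUE_MAX) {
        return TOY_SEND_ERROR;
    }
    btl->pending[btl->n_pending++] = frag;
    return TOY_SEND_NOT_ON_WIRE;
}

/* Called when the peer returns credits. Queued fragments move only if
 * something calls this; if the application never re-enters the library,
 * they stay queued, which is the progress problem discussed in the thread. */
size_t toy_btl_progress(toy_btl *btl, int returned_credits)
{
    size_t moved = 0;

    btl->credits += returned_credits;
    while (btl->credits > 0 && btl->n_pending > 0) {
        int frag = btl->pending[0];
        for (size_t i = 1; i < btl->n_pending; i++) {
            btl->pending[i - 1] = btl->pending[i]; /* shift FIFO down */
        }
        btl->n_pending--;
        toy_put_on_wire(btl, frag);
        moved++;
    }
    return moved;
}
```

If the PML marked an MPI request complete as soon as `toy_btl_send` returned `TOY_SEND_NOT_ON_WIRE`, the fragment would still sit in `pending` until the next call into `toy_btl_progress`.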

Re: [OMPI devel] collective problems

2007-11-07 Thread George Bosilca
On Nov 7, 2007, at 12:51 PM, Jeff Squyres wrote: The same callback is called in both cases. In the case that you described, the callback is called just a little bit deeper into the recursion, when in the "normal case" it will get called from the first level of the recursion. Or maybe I miss som

Re: [OMPI devel] collective problems

2007-11-07 Thread Jeff Squyres
On Nov 7, 2007, at 12:29 PM, George Bosilca wrote: I finally talked with Galen and Don about this issue in depth. Our understanding is that the "request may get freed before recursion unwinds" issue is *only* a problem within the context of a single MPI call (e.g., MPI_SEND). Is that right?

Re: [OMPI devel] collective problems

2007-11-07 Thread George Bosilca
On Nov 7, 2007, at 11:06 AM, Jeff Squyres wrote:
> Gleb -- I finally talked with Galen and Don about this issue in depth. Our understanding is that the "request may get freed before recursion unwinds" issue is *only* a problem within the context of a single MPI call (e.g., MPI_SEND). Is that r

[OMPI devel] Incorrect one-sided test

2007-11-07 Thread Brian W. Barrett
Hi all - Lisa Glendenning, who's working on a Portals one-sided component, discovered that the test onesided/test_start1.c in our repository is incorrect. It assumes that MPI_Win_start is non-blocking, but the standard says that "MPI_WIN_START is allowed to block until the corresponding MPI_

Re: [OMPI devel] v1.2 branch mpi_preconnect_all

2007-11-07 Thread Jeff Squyres
Don, Galen, and I talked about this in depth on the phone today and think that it is a symptom of the same issue discussed in this thread: http://www.open-mpi.org/community/lists/devel/2007/10/2382.php Note my message in that thread from just a few minutes ago: http://www.open-mpi.org

Re: [OMPI devel] collective problems

2007-11-07 Thread Jeff Squyres
Gleb -- I finally talked with Galen and Don about this issue in depth. Our understanding is that the "request may get freed before recursion unwinds" issue is *only* a problem within the context of a single MPI call (e.g., MPI_SEND). Is that right? Specifically, if in an MPI_SEND, the B
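
The "request may get freed before recursion unwinds" hazard arises when a completion callback fires deep inside a recursive call chain within one MPI call and releases the request while outer frames still hold a pointer to it. The sketch below shows one common mitigation: the outer frame holds its own reference for the duration of the call, so the callback depth George mentions no longer matters. This is an illustrative technique with hypothetical names, not necessarily what the thread settled on or how Open MPI's requests work.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted request; not the Open MPI type. */
typedef struct toy_request {
    int refcount;
    bool complete;
} toy_request;

static int toy_freed = 0; /* counts frees so the behavior can be checked */

static void toy_request_release(toy_request *req)
{
    if (--req->refcount == 0) {
        free(req);
        toy_freed++;
    }
}

/* Completion callback: marks the request done and drops the user's
 * reference, as a completed blocking send might. */
static void toy_complete_cb(toy_request *req)
{
    req->complete = true;
    toy_request_release(req);
}

/* Simulated send path: the callback may fire at any recursion depth
 * (e.g. progress invoked from inside the send itself). */
static void toy_send_recursive(toy_request *req, int depth)
{
    if (depth == 0) {
        toy_complete_cb(req);
        return;
    }
    toy_send_recursive(req, depth - 1);
}

/* The outer frame takes a guard reference before descending, so the request
 * stays valid while the recursion unwinds and can still be inspected. */
bool toy_send(int depth)
{
    toy_request *req = malloc(sizeof *req);
    if (req == NULL) {
        return false;
    }
    req->refcount = 1;  /* user's reference */
    req->complete = false;
    req->refcount++;    /* guard reference for this call frame */

    toy_send_recursive(req, depth);

    bool done = req->complete; /* safe: guard reference keeps req alive */
    toy_request_release(req);  /* dropping the guard frees the request */
    return done;
}
```

Without the guard reference, reading `req->complete` after `toy_send_recursive` returns would be a use-after-free, at whatever depth the callback happened to fire.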

Re: [OMPI devel] Multiworld MCA parameter values broken

2007-11-07 Thread Ralph H Castain
Sorry for delay - wasn't ignoring the issue. There are several fixes to this problem - ranging in order from least to most work: 1. just alias "ssh" to be "ssh -Y" and run without setting the mca param. It won't affect anything on the backend because the daemon/procs don't use ssh. 2. include "p

[OMPI devel] carto framework requirements

2007-11-07 Thread Sharon Melamed
Hi, I wrote some SW requirements for the carto framework. Please review this and post comments. Thanks. Sharon. [Attachment: carto_framework_requirements.pdf]