Re: [OMPI devel] Modex and others

2008-11-14 Thread Jeff Squyres
Hmm. I'm not sure the BML is the right place to do this. The BML doesn't know anything about the internals of the BTLs; it's just a dispatch / multiplexer. Unfortunately, few of us are in a good place to respond at the moment -- SC is next week and we're all hosed trying to get ready for

Re: [OMPI devel] Modex and others

2008-11-13 Thread Leonardo Fialho
Ralph, Very good document. About the MPI layer (in case of a fault), my idea is to give the BML the ability to handle BTL errors that occur when a process dies (and has probably been migrated), discovering the new location. I think that it is possible because the HNP requests the restart for th

Re: [OMPI devel] Modex and others

2008-11-13 Thread Ralph Castain
If you look at the Dec meeting wiki, you will see that we are moving quickly to a modex-less launch anyway. It won't be the default because it requires pre-discovery of the cluster's network resources (for which we will provide a tool or method), but it will help resolve some of these probl

Re: [OMPI devel] Modex and others

2008-11-13 Thread Leonardo Fialho
Jeff, I agree with your viewpoint, principally about the "reachability". But... from the FT viewpoint, one sometimes (or in some FT architectures) wants to recover an application process on a node different from the original one. In this case a new modex should be called. It's fine for coordina

Re: [OMPI devel] Modex and others

2008-11-07 Thread Jeff Squyres
On Nov 7, 2008, at 10:18 AM, Leonardo Fialho wrote: I understand that a process needs to have the contact information to send MPI messages to other processes, and modex permits it. My question is, why not perform the contact exchange when it is necessary? For example: in an M/W applicati

[OMPI devel] Modex and others

2008-11-07 Thread Leonardo Fialho
Hi All, I understand that a process needs to have the contact information to send MPI messages to other processes, and modex permits it. My question is, why not perform the contact exchange when it is necessary? For example: in an M/W application, the workers do not need more information