Re: [OMPI devel] Modex and others

2008-11-13 Thread Leonardo Fialho
Jeff, I agree with your viewpoint, particularly about "reachability". But looking from the FT viewpoint, some FT architectures sometimes want to recover an application process on a node different from the original one. In this case a new modex should be called. It's fine for coordina

[OMPI devel] RML OOB, What´s wrong?

2008-11-13 Thread Leonardo Fialho
Ralph and others, I ran two tests with the RML/OOB while a PML module (I know it is strange, but I need it) waits for a message (orte_rml_recv_buffer(...)). The first one used --enable-progress-threads and the second one did not. My test was: Sent a message from an orted, i.e.

[OMPI devel] Open MPI at SC'08: win a Wii!

2008-11-13 Thread Jeff Squyres
Who wants to win a Wii? (you know, to take home and give *to your kids* -- yeah, that's it...) The Open MPI community will be at SC'08 in force this year, featuring: - Our usual Open MPI State of the Union BOF (Wednesday, 12:15-1:15pm, room #14 on level 4) - Drawings to win one of *6* Ninte

Re: [OMPI devel] RML OOB, What´s wrong?

2008-11-13 Thread Ralph Castain
Mainly because we know that the RML and OOB are not thread safe? :-) Seriously, we know that ORTE has thread safety issues, mostly in the RML/OOB area, which is why we do not allow it to be used with threading. You are responsible for thread locking above that layer, if you intend to use th

Re: [OMPI devel] Modex and others

2008-11-13 Thread Ralph Castain
If you look at the Dec meeting wiki, you will see that we are moving quickly to a modex-less launch anyway. It won't be the default because it requires pre-discovery of the cluster's network resources (for which we will provide a tool or method), but it will help resolve some of these probl

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r19991

2008-11-13 Thread Tim Mattox
I'm not 100% sure, but this looks like the changeset that caused all of IU's trunk MTT runs last night to segfault... yes, all. :-( Here's the magnitude of the problem: http://www.open-mpi.org/mtt/index.php?do_redir=883 Note how pretty much everything was passing for 1.4a1r19979, and everything f

Re: [OMPI devel] Modex and others

2008-11-13 Thread Leonardo Fialho
Ralph, Very good document. About the MPI layer (in case of a fault), my idea is to give the BML the ability to handle BTL errors that occur when a process dies (and has probably been migrated), discovering the new location. I think this is possible because the HNP requests the restart for th

Re: [OMPI devel] SM backing file size

2008-11-13 Thread Eugene Loh
Ralph Castain wrote: As has frequently been commented upon at one time or another, the shared memory backing file can be quite huge. There used to be a param for controlling this size, but I can't find it in 1.3 - or at least, the name or method for controlling file size has morphed into