Re: [OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-11 Thread Jeff Squyres
Hmm. I don't think that we want to put knowledge of XRC in the OOB CPC (and vice versa). That seems like an abstraction violation. I didn't like that XRC knowledge was put in the connect base either, but I was too busy to argue with it. :-) Isn't there a better way somehow? Perhaps we s

[OMPI devel] Fwd: Subversion and trac outage

2007-12-11 Thread Jeff Squyres
Begin forwarded message: From: DongInn Kim Date: December 11, 2007 6:20:03 PM EST To: Jeff Squyres Subject: Subversion and trac outage Hi, I am sorry for the unexpected outage of Subversion and Trac for Open MPI. There was a mistake in handling the ACL information about blocking some sp

[OMPI devel] [PATCH] openib: clean-up connect to allow for new cm's

2007-12-11 Thread Jon Mason
Currently, alternate CMs cannot be called because ompi_btl_openib_connect_base_open forces a choice of either oob or xoob (and goes into an erroneous error path if you pick something else). This patch reorganizes ompi_btl_openib_connect_base_open so that new functions can easily be added. New Open

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Richard Graham
I will re-iterate my concern. The code that is there now is mostly nine years old (with some mods made when it was brought over to Open MPI). It took about 2 months of testing on systems with 5-13 way network parallelism to track down all KNOWN race conditions. This code is at the center of MPI

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote: > Try UD, frags are reordered at a very high rate so should be a good test. mpi-ping works fine with UD BTL and the patch. > > Andrew > > Richard Graham wrote: > > Gleb, > > I would suggest that before this is checked in this be

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote: > Possibly, though I have results from a benchmark I've written indicating > the reordering happens at the sender. I believe I found it was due to > the QP striping trick I use to get more bandwidth -- if you back down to > one QP

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Andrew Friedley
Possibly, though I have results from a benchmark I've written indicating the reordering happens at the sender. I believe I found it was due to the QP striping trick I use to get more bandwidth -- if you back down to one QP (there's a define in the code you can change), the reordering rate drop

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote: > Try UD, frags are reordered at a very high rate so should be a good test. Good idea, I'll try this. BTW I think the reason for such a high rate of reordering in UD is that it polls for MCA_BTL_UD_NUM_WC completions (500) and process

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 10:00:08AM -0600, Brian W. Barrett wrote: > On Tue, 11 Dec 2007, Gleb Natapov wrote: > > > I did a rewrite of matching code in OB1. I made it much simpler and 2 > > times smaller (which is good, less code - less bugs). I also got rid > > of huge macros - very helpful if y

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 11:00:51AM -0500, Richard Graham wrote: > Gleb, > I would suggest that before this is checked in this be tested on a system > that has N-way network parallelism, where N is as large as you can find. > This is a key bit of code for MPI correctness, and out-of-order operatio

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Andrew Friedley
Try UD, frags are reordered at a very high rate so should be a good test. Andrew Richard Graham wrote: Gleb, I would suggest that before this is checked in this be tested on a system that has N-way network parallelism, where N is as large as you can find. This is a key bit of code for MPI cor

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Richard Graham
Gleb, I would suggest that before this is checked in this be tested on a system that has N-way network parallelism, where N is as large as you can find. This is a key bit of code for MPI correctness, and out-of-order operations will break it, so you want to maximize the chance for such operations
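The concern in this thread is that fragments arriving out of order must still be matched in send order. As a rough illustration only (this is not the OB1 code, and the class and field names below are hypothetical), a receiver can park early fragments by sequence number and release them once the gap is filled:

```python
class InOrderReceiver:
    """Illustrative reorder buffer; not taken from Open MPI."""

    def __init__(self):
        self.expected = 0      # next sequence number allowed to match
        self.parked = {}       # early fragments, keyed by sequence number
        self.delivered = []    # payloads released in send order

    def on_fragment(self, seq, payload):
        if seq != self.expected:
            # Arrived ahead of its turn: hold it until the gap is filled.
            self.parked[seq] = payload
            return
        self.delivered.append(payload)
        self.expected += 1
        # Filling the gap may unblock fragments that were parked earlier.
        while self.expected in self.parked:
            self.delivered.append(self.parked.pop(self.expected))
            self.expected += 1


def replay(arrival_order):
    """Feed (seq, payload) pairs in arrival order; return delivery order."""
    receiver = InOrderReceiver()
    for seq, payload in arrival_order:
        receiver.on_fragment(seq, payload)
    return receiver.delivered
```

A transport that reorders heavily, such as the UD case mentioned in the thread, exercises the parked path far more often than an in-order one. That is why it makes a useful test for this kind of logic.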

Re: [OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Brian W. Barrett
On Tue, 11 Dec 2007, Gleb Natapov wrote: I did a rewrite of matching code in OB1. I made it much simpler and 2 times smaller (which is good, less code - less bugs). I also got rid of huge macros - very helpful if you need to debug something. There is no performance degradation, actually I even

[OMPI devel] matching code rewrite in OB1

2007-12-11 Thread Gleb Natapov
Hi, I did a rewrite of matching code in OB1. I made it much simpler and 2 times smaller (which is good, less code - less bugs). I also got rid of huge macros - very helpful if you need to debug something. There is no performance degradation, actually I even see very small performance improvemen

Re: [OMPI devel] opal_condition_wait

2007-12-11 Thread Gleb Natapov
On Tue, Dec 11, 2007 at 10:27:55AM -0500, Tim Prins wrote: > My understanding was that this behavior was not right, but upon further > inspection of the pthreads documentation this behavior seems to be > allowable. > I think that Open MPI does not implement condition variables in the strict sense

Re: [OMPI devel] opal_condition_wait

2007-12-11 Thread Tim Prins
Well, this makes some sense, although it still seems like this violates the spirit of condition variables. Thanks, Tim Brian W. Barrett wrote: On Thu, 6 Dec 2007, Tim Prins wrote: Tim Prins wrote: First, in opal_condition_wait (condition.h:97) we do not release the passed mutex if opal_usi

Re: [OMPI devel] opal_condition_wait

2007-12-11 Thread Tim Prins
Ok, I think I am understanding this a bit now. By not decrementing the signaled count, we are allowing a single broadcast to wake up the same thread multiple times, and are allowing a single cond_signal to wake up multiple threads. My understanding was that this behavior was not right, but upo
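The point about extra wakeups being allowable matches the usual rule for condition variables: a waiter re-checks its predicate in a loop after every wakeup, so waking more threads than there is work for is harmless. A minimal sketch of that rule, using Python's `threading.Condition` rather than `opal_condition_wait` (the `Mailbox` class and `run_demo` helper are hypothetical names for illustration):

```python
import threading


class Mailbox:
    """Hypothetical counter-based mailbox, for illustration only."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def post(self):
        with self._cond:
            self._pending += 1
            # Deliberately wake every waiter, more than one item can satisfy.
            self._cond.notify_all()

    def take(self, timeout=0.5):
        with self._cond:
            # Re-check the predicate after every wakeup: a woken thread
            # may find that another thread already consumed the item.
            while self._pending == 0:
                if not self._cond.wait(timeout):
                    return False  # timed out with nothing to take
            self._pending -= 1
            return True


def run_demo(n_waiters, n_posts):
    """Return how many waiters actually obtained an item."""
    box = Mailbox()
    results = []
    results_lock = threading.Lock()

    def waiter():
        got = box.take()
        with results_lock:
            results.append(got)

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()
    for _ in range(n_posts):
        box.post()
    for t in threads:
        t.join()
    return sum(results)
```

With four waiters and two posts, every waiter may be woken, but only two find the predicate true. The others go back to waiting and eventually time out.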

Re: [OMPI devel] openmpi-1.2.4 compilation error in orte_abort.c on Fedora 8 - patch included

2007-12-11 Thread Jeff Squyres
Er, ya -- duh. Oops. I'll fix... On Dec 11, 2007, at 5:07 AM, George Bosilca wrote: You mean 0600? I don't really see why you want to share the file with the whole group. Thanks, george. On Dec 10, 2007, at 5:15 PM, Ralph Castain wrote: Nah, go ahead! Just change the permission t

Re: [OMPI devel] openmpi-1.2.4 compilation error in orte_abort.c on Fedora 8 - patch included

2007-12-11 Thread George Bosilca
You mean 0600? I don't really see why you want to share the file with the whole group. Thanks, george. On Dec 10, 2007, at 5:15 PM, Ralph Castain wrote: Nah, go ahead! Just change the permission to 0660 - that's a private file that others shouldn't really perturb. Ralph On 12/
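The difference between the two modes discussed here is only the group bits: 0600 gives read and write to the owner alone, while 0660 extends both to the file's group. A small sketch using Python's standard `stat` module (the helper names are hypothetical) makes that visible:

```python
import stat


def describe(mode):
    """Render permission bits the way `ls -l` shows a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def group_can_access(mode):
    """True if any group read/write/execute bit is set."""
    return bool(mode & stat.S_IRWXG)


if __name__ == "__main__":
    for mode in (0o600, 0o660):
        print(oct(mode), describe(mode), "group access:", group_can_access(mode))
```

For a file meant to be private to one user, 0600 is the mode that keeps the group out.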