On Mon, Dec 17, 2007 at 08:08:02PM -0500, Richard Graham wrote:
> Needless to say (for the nth time :-) ) that changing this bit of code
> makes me nervous.

I've noticed it already :)
Gleb,
Needless to say (for the nth time :-) ) that changing this bit of code
makes me nervous. However, it occurred to me that there is a much better
way to test this code than setting up an environment that generates some
out-of-order events without us being able to specify the order. Sin…
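One way to read Rich's suggestion is to drive the matching logic directly with every possible arrival order, instead of hoping a live network reorders fragments the right way. A minimal sketch of such a harness, in Python rather than the actual OB1 C code (the `OrderedMatcher` class and its fields are hypothetical stand-ins, not Open MPI structures):

```python
from itertools import permutations

class OrderedMatcher:
    """Toy stand-in for OB1's per-peer matching: deliver frags in seq order."""
    def __init__(self):
        self.next_seq = 0    # next expected sequence number
        self.cache = {}      # out-of-order frags parked until their turn
        self.delivered = []  # what reached "the user", in delivery order

    def arrive(self, seq):
        if seq != self.next_seq:
            self.cache[seq] = True   # not the expected frag: park it
            return
        self.delivered.append(seq)   # match the expected frag...
        self.next_seq += 1
        while self.next_seq in self.cache:  # ...then drain what it unblocks
            del self.cache[self.next_seq]
            self.delivered.append(self.next_seq)
            self.next_seq += 1

# Exercise every arrival order of 5 frags: delivery must always be in order.
for order in permutations(range(5)):
    m = OrderedMatcher()
    for seq in order:
        m.arrive(seq)
    assert m.delivered == [0, 1, 2, 3, 4], order
print("all 120 arrival orders deliver in order")
```

The point of the harness is exactly what Rich asks for: the test specifies the order, so every permutation (not just whatever a given fabric happens to produce) is exercised deterministically.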
On Fri, Dec 14, 2007 at 06:53:55AM -0500, Richard Graham wrote:
> If you have positive confirmation that such things have happened, this
> will go a long way.

I instrumented the code to log all kinds of info about fragment reordering
while I chased a bug in openib that caused the matching logic to malfunction…
If you have positive confirmation that such things have happened, this will
go a long way. I will not trust the code until this has also been done with
multiple independent network paths. I very rarely express such strong
opinions, even if I don't agree with what is being done, but this is the
co…
Yes, should be a bit more clear. We need an independent way to verify that
data is matched in the correct order; sending this information as payload
is one way to do this. So, sending unique data in every message, and
making sure that it arrives in the user buffers in the expected order, is
a way…
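The self-check Rich describes can be as simple as deriving every byte of a message's payload from its sequence number, so the receiver can verify both content and order from the user buffers alone. A sketch of the idea (the helper names are hypothetical, not Open MPI API):

```python
def make_payload(seq, size=16):
    """Fill a buffer with bytes derived from the message's sequence number,
    so every message carries unique, predictable data."""
    return bytes((seq + i) % 256 for i in range(size))

def check_payload(buf, expected_seq):
    """Verify a received user buffer holds exactly the payload for expected_seq."""
    return buf == make_payload(expected_seq, len(buf))

# Receiver side: user buffers must contain payloads 0, 1, 2, ... in posting order.
received = [make_payload(s) for s in (0, 1, 2)]   # correct matching order: passes
assert all(check_payload(b, i) for i, b in enumerate(received))

swapped = [make_payload(s) for s in (0, 2, 1)]    # mis-ordered matching: caught
assert not all(check_payload(b, i) for i, b in enumerate(swapped))
```

Because the check needs only the data that lands in the user buffers, it verifies matching order independently of any instrumentation inside the library.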
The situation that needs to be triggered, just as George mentioned, is
where we have a lot of unexpected messages, to make sure that when one
that we can match against comes in, all the unexpected messages that can
be matched with pre-posted receives are matched. Since we attempt to match
only…
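The invariant being described can be modeled in a few lines: fragments are considered strictly in sequence order, each matchable fragment is paired with the oldest pre-posted receive for its tag, and closing a sequence gap must release the whole run of fragments it was blocking. A toy model (simplified tag-only matching, not the OB1 data structures):

```python
from collections import deque

class Peer:
    """Toy model of per-peer matching with an unexpected-message queue."""
    def __init__(self):
        self.next_seq = 0
        self.ooo = {}              # frags that arrived ahead of the sequence
        self.posted = deque()      # pre-posted receive tags, oldest first
        self.unexpected = deque()  # in-sequence frags with no posted receive
        self.matches = []          # (seq, tag) pairs handed to receives

    def post_recv(self, tag):
        # a newly posted receive must first search the unexpected queue
        for frag in list(self.unexpected):
            if frag[1] == tag:
                self.unexpected.remove(frag)
                self.matches.append(frag)
                return
        self.posted.append(tag)

    def arrive(self, seq, tag):
        self.ooo[seq] = tag
        # consider frags strictly in sequence order; closing a gap may
        # release a whole run of previously unmatchable frags at once
        while self.next_seq in self.ooo:
            t = self.ooo.pop(self.next_seq)
            if t in self.posted:
                self.posted.remove(t)
                self.matches.append((self.next_seq, t))
            else:
                self.unexpected.append((self.next_seq, t))
            self.next_seq += 1

p = Peer()
for tag in (10, 11, 12):
    p.post_recv(tag)
p.arrive(2, 12)   # ahead of sequence: nothing can match yet
p.arrive(1, 11)   # still waiting on seq 0
assert p.matches == []
p.arrive(0, 10)   # gap closes: all three match, in sequence order
assert p.matches == [(0, 10), (1, 11), (2, 12)]
```

The last three lines are exactly the scenario Rich wants triggered: a backlog of unmatchable messages that must all be matched against pre-posted receives the moment the matchable one arrives.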
Rich was referring to the fact that the reordering of fragments other
than the matching ones is irrelevant to Gleb's change. In order to
trigger the changes we need to force a lot of small unexpected
messages over multiple networks. The testing environment should have
multiple similar networks…
Was Rich referring to ensuring that the test codes checked that their
payloads were correct (and not re-assembled in the wrong order)?
Re: [OMPI devel] matching code rewrite in OB1
This is better than nothing, but really not very helpful for looking at the
specific issues that can arise with this, unless these systems have several
parallel networks, with tests that will generate a lot of parallel network
traffic, and be able to self-check for out-of-order receives - i.e. this…
On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote:
> Gleb --
>
> How about making a tarball with this patch in it that can be thrown at
> everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)

I don't have access to www.open-mpi.org, but I can send you the patch.
I can send you a tarball too…
Gleb --
How about making a tarball with this patch in it that can be thrown at
everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
I will re-iterate my concern. The code that is there now is mostly nine
years old (with some mods made when it was brought over to Open MPI). It
took about 2 months of testing on systems with 5-13 way network parallelism
to track down all KNOWN race conditions. This code is at the center of MPI…
On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
> Try UD, frags are reordered at a very high rate so should be a good test.

mpi-ping works fine with UD BTL and the patch.
Possibly, though I have results from a benchmark I've written indicating
the reordering happens at the sender. I believe I found it was due to
the QP striping trick I use to get more bandwidth -- if you back down to
one QP (there's a define in the code you can change), the reordering
rate drops…
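The striping effect Andrew describes can be reproduced with a toy model: stripe sends round-robin across several queues that each stay FIFO internally but progress independently, then count how often a fragment arrives after a higher-numbered one. A hedged sketch (a simulation of the effect, not the UD BTL code; queue count and seed are illustrative):

```python
import random

def reorder_rate(num_qps, num_frags=10000, seed=42):
    """Stripe frags round-robin over num_qps FIFO queues, drain the queues in
    a random interleaving, and return the fraction of adjacent arrivals that
    are out of order."""
    rng = random.Random(seed)
    queues = [[] for _ in range(num_qps)]
    for seq in range(num_frags):
        queues[seq % num_qps].append(seq)        # sender-side striping
    arrivals = []
    while any(queues):
        q = rng.choice([q for q in queues if q]) # queues progress independently
        arrivals.append(q.pop(0))                # each queue is itself FIFO
    ooo = sum(1 for a, b in zip(arrivals, arrivals[1:]) if b < a)
    return ooo / num_frags

# One queue can never reorder; striping across queues reorders heavily,
# which matches the observation that backing down to one QP drops the rate.
assert reorder_rate(1) == 0.0
assert reorder_rate(4) > 0.05
```

This is consistent with Andrew's observation: the reordering originates at the sender's striping, so reducing to a single QP makes it vanish even though each individual queue delivers in order.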
On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
> Try UD, frags are reordered at a very high rate so should be a good test.

Good idea, I'll try this. BTW I think the reason for such a high rate of
reordering in UD is that it polls for MCA_BTL_UD_NUM_WC completions
(500) and process…
Try UD, frags are reordered at a very high rate so should be a good test.

Andrew
Gleb,
I would suggest that before this is checked in this be tested on a system
that has N-way network parallelism, where N is as large as you can find.
This is a key bit of code for MPI correctness, and out-of-order operations
will break it, so you want to maximize the chance for such operations…
On Tue, 11 Dec 2007, Gleb Natapov wrote:
I did a rewrite of the matching code in OB1. I made it much simpler and two
times smaller (which is good: less code, less bugs). I also got rid of the
huge macros - very helpful if you need to debug something. There is no
performance degradation, actually I even…