Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-11 Thread Jeff Squyres
On Mar 10, 2008, at 2:04 PM, Jon Mason wrote: Specifying only 1 PP QP via command line seems to be working. It now passes a test that failed 100% of the time with the credit issue on my 2 node cluster. Further tests on a larger setup are still pending, but this looks like a good workaround.
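
For readers who want to try the workaround described above, here is a minimal sketch of what "only 1 PP QP via command line" could look like. The thread does not give the exact command. This sketch assumes the openib BTL's `btl_openib_receive_queues` MCA parameter, where a `P,...` entry defines one per-peer QP. The buffer size and counts, the process count, and the application name `./my_mpi_app` are all illustrative assumptions, not values from the thread.

```shell
# Hypothetical values throughout: the thread does not state the queue spec used.
# A single "P,..." entry requests one per-peer (PP) QP and no SRQs.
# Assumed fields: P,<buffer size>,<num buffers>,<low watermark>,<credit window>
APP=./my_mpi_app                 # placeholder application name
QUEUES="P,65536,256,192,128"     # illustrative numbers only
CMD="mpirun -np 2 -mca btl_openib_receive_queues $QUEUES $APP"
echo "$CMD"                      # print the command rather than run it
```

With only one QP defined, credit updates and data share the same queue, which is presumably why the race between QPs no longer shows up.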

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-11 Thread Gleb Natapov
On Mon, Mar 10, 2008 at 01:52:22PM -0500, Steve Wise wrote: > >Does OMPI do lazy dereg to maintain a cache of registered user buffers? Not by default. You'll have to use -mca mpi_leave_pinned 1 to enable lazy dereg. -- Gleb.
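
As a small sketch of the option Gleb mentions, the flag can be passed on the `mpirun` command line as below. Only `-mca mpi_leave_pinned 1` comes from the message. The process count and the application name `./my_mpi_app` are placeholders assumed for illustration.

```shell
# Only the "-mca mpi_leave_pinned 1" flag is taken from the thread;
# the process count and application name are hypothetical placeholders.
APP=./my_mpi_app
NP=2
CMD="mpirun -np $NP -mca mpi_leave_pinned 1 $APP"
echo "$CMD"   # print the command rather than run it
```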

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Jon Mason
On Mon, Mar 10, 2008 at 10:03:27AM -0500, Jeff Squyres wrote: > On Mar 10, 2008, at 9:50 AM, Steve Wise wrote: > > (just thinking out loud here): The OMPI code could be designed to _not_ assume recv's are posted until the CPC indicates they are ready. IE sort of asynchronous

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Jeff Squyres wrote: On Mar 10, 2008, at 9:57 AM, Steve Wise wrote: A single PP QP might be fine for now, and chelsio's next-gen part will support SRQs and not have this funky issue. Good! But why use such a large buffer size for a single PP QP? Why not use s

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Gleb Natapov wrote: On Mon, Mar 10, 2008 at 09:50:13AM -0500, Steve Wise wrote: I personally don't like the idea to add another layer of complexity to openib BTL code just to work around HW that doesn't follow spec. If work around is simple that is OK, but in this case it is not so simple and

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Gleb Natapov
On Mon, Mar 10, 2008 at 09:50:13AM -0500, Steve Wise wrote: > > I personally don't like the idea to add another layer of complexity to openib BTL code just to work around HW that doesn't follow spec. If work around is simple that is OK, but in this case it is not so simple and will add

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Jeff Squyres
On Mar 10, 2008, at 9:57 AM, Steve Wise wrote: A single PP QP might be fine for now, and chelsio's next-gen part will support SRQs and not have this funky issue. Good! But why use such a large buffer size for a single PP QP? Why not use something around 16KB? You can do that, but you'll

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Jeff Squyres
On Mar 10, 2008, at 9:50 AM, Steve Wise wrote: (just thinking out loud here): The OMPI code could be designed to _not_ assume recv's are posted until the CPC indicates they are ready. IE sort of asynchronous behavior. When the recvs are ready, the CPC could up-call the btl and then the cre

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Jeff Squyres wrote: On Mar 9, 2008, at 3:39 PM, Gleb Natapov wrote: 1. There was a discussion about this on openfabrics mailing list and the conclusion was that what Open MPI does is correct according to IB/iWarp spec. 2. Is it possible to fix your FW to follow iWarp spec? Perhaps it is

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Steve Wise
Gleb Natapov wrote: On Sun, Mar 09, 2008 at 02:48:09PM -0500, Jon Mason wrote: Issue (as described by Steve Wise): Currently OMPI uses qp 0 for all credit updates (by design). This breaks when running over the chelsio rnic due to a race condition between advertising the availability of a bu

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-10 Thread Jeff Squyres
On Mar 9, 2008, at 3:39 PM, Gleb Natapov wrote: 1. There was a discussion about this on openfabrics mailing list and the conclusion was that what Open MPI does is correct according to IB/iWarp spec. 2. Is it possible to fix your FW to follow iWarp spec? Perhaps it is possible to implement i

Re: [OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-09 Thread Gleb Natapov
On Sun, Mar 09, 2008 at 02:48:09PM -0500, Jon Mason wrote: > Issue (as described by Steve Wise): > Currently OMPI uses qp 0 for all credit updates (by design). This breaks when running over the chelsio rnic due to a race condition between advertising the availability of a buffer using qp0 w

[OMPI devel] OMPI OpenIB Credit Schema breaks Chelsio HW

2008-03-09 Thread Jon Mason
After discussing this issue with Jeff via private e-mails, I would like to open the issue to the group for further discussion. Issue (as described by Steve Wise): Currently OMPI uses qp 0 for all credit updates (by design). This breaks when running over the chelsio rnic due to a race condition be