Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Andrew Friedley
Steve Wise wrote: I hope you guys are documenting this in a way that makes this issue extremely clear to both uDAPL and OFA verbs (is this the right naming?) users. Maybe it's been done already, but is it possible to emit some sort of loud warning/error when the accept()'ing side tries to send

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Caitlin Bestler
general-boun...@lists.openfabrics.org wrote: > On Wed, 2007-05-09 at 17:55 -0700, Andrew Friedley wrote: >> >> Steve Wise wrote: >>> On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote: Steve Wise wrote: > There have been a series of discussions on the ofa general list > about th

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 15:01 -0700, Sean Hefty wrote: > > The reason it is hard or impossible to solve this in the DAPL layer is > > that any rdma operation on the QP affects the state of that QP and the > > associate CQs. In addition, if you use an RDMA send to enforce this you > > impact the othe

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 17:55 -0700, Andrew Friedley wrote: > > Steve Wise wrote: > > On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote: > >> Steve Wise wrote: > >>> There have been a series of discussions on the ofa general list about > >>> this issue, and the conclusion to date is that it c

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Caitlin Bestler
devel-boun...@open-mpi.org wrote: > Steve Wise wrote: >> There have been a series of discussions on the ofa general list about >> this issue, and the conclusion to date is that it cannot be resolved >> in the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly >> because sending an RDMA messa

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 17:46 -0700, Andrew Friedley wrote: > > Therefore, the only truly safe thing for an iWARP btl to do (or a > > udapl btl since that is also an iWARP btl) is to have the active > > layer send an MPI Layer "nop" of some kind immediately after > > establishing the connection if t

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Andrew Friedley
Steve Wise wrote: On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote: Steve Wise wrote: There have been a series of discussions on the ofa general list about this issue, and the conclusion to date is that it cannot be resolved in the rdma-cm or iwarp-cm code of the linux rdma stack. Ma

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Caitlin Bestler
general-boun...@lists.openfabrics.org wrote: >> Therefore, the only truly safe thing for an iWARP btl to do (or a >> udapl btl since that is also an iWARP btl) is to have the active >> layer send an MPI Layer "nop" of some kind immediately after >> establishing the connection if there is nothing el

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Andrew Friedley
Therefore, the only truly safe thing for an iWARP btl to do (or a udapl btl since that is also an iWARP btl) is to have the active layer send an MPI Layer "nop" of some kind immediately after establishing the connection if there is nothing else to send. This is fine for an iWARP/RDMACM/whatev
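The "initiator sends a nop if it has nothing else to send" idea can be sketched in C as shown here. This is a minimal illustration only, not OMPI's actual BTL code. The names `struct endpoint`, `post_send`, `on_established`, `MSG_NOP` and `MSG_DATA` are hypothetical. `post_send` only records what would be posted, so the sketch runs without an RDMA device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message kinds; not OMPI's real fragment types. */
enum msg_kind { MSG_NOP, MSG_DATA };

/* Hypothetical endpoint holding only the fields this sketch needs. */
struct endpoint {
    bool is_initiator;        /* true on the side that issued the connect */
    bool sent_first;          /* has this side sent anything yet? */
    enum msg_kind sent_log[8];/* stands in for messages posted to the QP */
    size_t nlog;
};

/* Stand-in for posting an RDMA send; a real BTL would post to its QP. */
static void post_send(struct endpoint *ep, enum msg_kind kind)
{
    assert(ep->nlog < sizeof ep->sent_log / sizeof ep->sent_log[0]);
    ep->sent_log[ep->nlog++] = kind;
    ep->sent_first = true;
}

/* Called once the connection is established. Under the iWARP rule
 * discussed in this thread, the initiator must send first, so it posts
 * a NOP when it has no real fragments queued. */
static void on_established(struct endpoint *ep, size_t pending_frags)
{
    if (!ep->is_initiator)
        return;                       /* the acceptor waits to hear first */
    if (pending_frags == 0) {
        post_send(ep, MSG_NOP);       /* nothing queued: send a nop */
    } else {
        for (size_t i = 0; i < pending_frags; i++)
            post_send(ep, MSG_DATA);  /* real data satisfies the rule */
    }
}
```

With this approach the acceptor never needs to guess whether the initiator has data. The cost is at most one extra small message per connection.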

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:15 -0700, Andrew Friedley wrote: > > Steve Wise wrote: > > There have been a series of discussions on the ofa general list about > > this issue, and the conclusion to date is that it cannot be resolved in > > the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly be

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Jeff Squyres
Understood, and I agree. FWIW: note that the CONNECTED state that I referred to is internal to OMPI's endpoint abstraction (not an iwarp/udapl/verbs/etc. state). It's part of our connection dance protocol. On May 9, 2007, at 5:33 PM, Caitlin Bestler wrote: Jeff Squyres wrote: - The ot

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Caitlin Bestler
Jeff Squyres wrote: > > - The other peer (the receiver of the connection) must wait > to send its pending fragment(s) until it receives the first > frag from the connection initiator. This can be accomplished > either with another flag on the OMPI module struct or perhaps > making it part of the
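The accepting side's rule in this message ("wait to send pending fragments until the first frag from the initiator arrives", tracked with a flag on the endpoint) can be sketched in C as shown here. This is a hypothetical illustration, not the real OMPI module struct. `struct acc_endpoint`, `acc_send`, `acc_recv`, `post_frag` and `MAXQ` are invented for the example, and fragments are plain integers. `post_frag` only records what would be posted to the QP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXQ 8  /* hypothetical fixed queue depth for the sketch */

/* Hypothetical endpoint state for the accepting side. */
struct acc_endpoint {
    bool got_first_frag;   /* set when the initiator's first frag arrives */
    int pending[MAXQ];     /* fragments held back until then */
    size_t npending;
    int sent[MAXQ];        /* stands in for frags actually posted to the QP */
    size_t nsent;
};

/* Stand-in for posting a fragment to the QP. */
static void post_frag(struct acc_endpoint *ep, int frag)
{
    assert(ep->nsent < MAXQ);
    ep->sent[ep->nsent++] = frag;
}

/* Send path: hold fragments until the initiator has sent first. */
static void acc_send(struct acc_endpoint *ep, int frag)
{
    if (ep->got_first_frag) {
        post_frag(ep, frag);
    } else {
        assert(ep->npending < MAXQ);
        ep->pending[ep->npending++] = frag;
    }
}

/* Receive path: the first incoming frag releases queued sends in FIFO order. */
static void acc_recv(struct acc_endpoint *ep)
{
    if (ep->got_first_frag)
        return;
    ep->got_first_frag = true;
    for (size_t i = 0; i < ep->npending; i++)
        post_frag(ep, ep->pending[i]);
    ep->npending = 0;
}
```

Flushing the queue in FIFO order keeps the fragment order the upper layer asked for. After the first receive, `acc_send` posts directly and the flag costs only a single branch.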

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Jeff Squyres
I talked with Steve a bunch on the phone about this. 1. This "connector must RDMA first" issue is an iWARP restriction -- it's not specific to udapl or verbs. For example, if you try to use udapl with iWARP on Solaris, you'll have the same issue (I have no idea whether you have iWARP drive

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Caitlin Bestler
> > 2) OMPI is not adhering to the iwarp protocol requirement > that the ULP, > in this case OMPI, initiating the iwarp connection (the side > issuing the > dat_ep_connect() or rdma_connect()) _MUST_ be the first to > send an RDMA > message. So if a OMPI process _accepts_ an rdma connection, the

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Donald Kerr
I guess I have not read enough about iwarp yet but if iwarp is sitting below ib verbs or udapl in the stack and is trying to impose restrictions which ib verbs or udapl do not adhere to then maybe iwarp is in the wrong place in the ofed stack. Having said that I do agree the OMPI community nee

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:27 -0400, Donald Kerr wrote: > So then I agree with Andrew, I think you are trying to impose > restrictions on uDAPL which are not part of the Spec. > true, but if you want a single btl for IB and IW, then you'll need to address this issue in some way...

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Donald Kerr
So then I agree with Andrew, I think you are trying to impose restrictions on uDAPL which are not part of the Spec. -DON Steve Wise wrote: On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote: I'm missing some context here. Where are you plugging iwarp and OMPI together? ofed-1.2 su

Re: [OMPI devel] [ofa-general] Re: OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 16:20 -0400, Donald Kerr wrote: > I'm missing some context here. Where are you plugging iwarp and OMPI > together? ofed-1.2 supports iwarp and the chelsio rnic. It can be accessed directly via the ofa verbs and ofa rdma-cm _as well as_ via udapl. I'm attempting to run OMP

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Donald Kerr
I'm missing some context here. Where are you plugging iwarp and OMPI together? Steve Wise wrote: On Wed, 2007-05-09 at 11:42 -0400, Donald Kerr wrote: I agree OMPI trac ticket #890 should cover this. I will test the suggested fix, just removing that one line from btl_udapl.c, on Solaris. I

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Andrew Friedley
Steve Wise wrote: There have been a series of discussions on the ofa general list about this issue, and the conclusion to date is that it cannot be resolved in the rdma-cm or iwarp-cm code of the linux rdma stack. Mainly because sending an RDMA message involves the ULP's work queue and complet

[OMPI devel] Nightly trunk tarball AC/AM change

2007-05-09 Thread Brian Barrett
Hi all - After a minor hiccup last night, nightly tarballs for the trunk (and eventually v1.3 branch) are now made with AC 2.61, AM 1.10, and LT 2.1a. Don't forget the mandatory update of AC and AM for the trunk coming Saturday morning! Brian

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 11:42 -0400, Donald Kerr wrote: > I agree OMPI trac ticket #890 should cover this. I will test the > suggested fix, just removing that one line from btl_udapl.c, on Solaris. > I am still not set up on Linux so hopefully Steve can confirm there. > All, First, I haven't tes

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Donald Kerr
I agree OMPI trac ticket #890 should cover this. I will test the suggested fix, just removing that one line from btl_udapl.c, on Solaris. I am still not set up on Linux so hopefully Steve can confirm there. -DON Jeff Squyres wrote: FWIW, I would marginally prefer if this bug is tracked in t

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-09 Thread Jeff Squyres
On May 9, 2007, at 10:30 AM, Steve Wise wrote: Agreed. enabling udapl will get OMPI over iwarp immediately (and hopefully in ofed-1.2). Post ofed-1.2, I think OMPI _should_ create a rdma-cm btl. That's the plan... Yes and no. Please see my other reply about an "rdma cm" BTL... -- Jeff Squ

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Jeff Squyres
FWIW, I would marginally prefer if this bug is tracked in the Open MPI trac ticket system, not the OFA bugzilla (Steve W. will have write access there as soon as Chelsio submits their OMPI 3rd party contribution agreement). We've traditionally [mostly] tracked OMPI bugs in the OMPI bug sys

Re: [OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
Although as Boris pointed out, perhaps the hack in OMPI is no longer needed at all... On Wed, 2007-05-09 at 08:41 -0500, Steve Wise wrote: > 606 opened to track the udapl change. > > 607 opened to track the ompi change to remove the port number stashing > hack. > > Status: I have a patch from

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-09 Thread Steve Wise
On Wed, 2007-05-09 at 08:37 +0300, Or Gerlitz wrote: > Andrew Friedley wrote: > > Jeff Squyres wrote: > FWIW, yes, adding RDMA CM support has actually been on my to-do list > for a while, but it keeps getting bumped by higher priority items. > It would be *much* better if some iWARP

[OMPI devel] OMPI over ofed udapl - bugs opened

2007-05-09 Thread Steve Wise
606 opened to track the udapl change. 607 opened to track the ompi change to remove the port number stashing hack. Status: I have a patch from Arlin to test today. I will test with that patch and with the OMPI port hack removed. Stay tuned... Steve. On Tue, 2007-05-08 at 15:47 -0700, Arlin

Re: [OMPI devel] [ofa-general] OpenMPI and RDMA-CM

2007-05-09 Thread Jeff Squyres
On May 9, 2007, at 1:37 AM, Or Gerlitz wrote: Doing a bit of zoom out from the "how to make ofed's udapl work for ompi" thread, my thinking is that the ompi udapl btl enablement is actually only the first step, where for production/long-term/etc you want to have an rdmacm btl. I think this