Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Galen Shipman
The patch applies to ib_multifrag as is without a conflict. But the branch doesn't compile with or without the patch so I was not able to test it. Do you have some uncommitted changes that may generate a conflict? Can you commit them so they can be resolved? If there is no conflict between

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Jeff Squyres
On Jun 14, 2007, at 7:11 AM, Jeff Squyres wrote: Now I see that my fix was in the right place, but still a little bit wrong. I committed a fix to my fix in r15073. Can you check it? My cluster is still running MTT from last night; I'll need to wait for several jobs to finish. I'll check it la

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Jeff Squyres
On Jun 14, 2007, at 6:32 AM, Gleb Natapov wrote: 794:mca_btl_openib_endpoint_recv] can't find suitable endpoint for this peer Now I see that my fix was in the right place, but still a little bit wrong. I committed a fix to my fix in r15073. Can you check it? My cluster is still running MTT f

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 07:08:51PM +0300, Gleb Natapov wrote: > On Wed, Jun 13, 2007 at 09:38:21AM -0600, Galen Shipman wrote: > > Hi Gleb, > > > > As we have discussed before I am working on adding support for > > multiple QPs with either per peer resources or shared resources. > > As a result

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-14 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 01:54:28PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 1:37 PM, Gleb Natapov wrote: > > >> I have 2 hosts: one with 3 active ports and one with 2 active ports. > >> If I run an MPI job between them, the openib BTL wireup goes badly and > >> it aborts. So handling a het

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman
On Jun 13, 2007, at 12:07 PM, Gleb Natapov wrote: On Wed, Jun 13, 2007 at 02:05:00PM -0400, Jeff Squyres wrote: On Jun 13, 2007, at 1:54 PM, Jeff Squyres wrote: With today's trunk, I still see the problem: Same thing happens on v1.2 branch. I'll re-open #548. I am sure it was never test

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 02:05:00PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 1:54 PM, Jeff Squyres wrote: > > > With today's trunk, I still see the problem: > > Same thing happens on v1.2 branch. I'll re-open #548. > I am sure it was never tested with multiple subnets. I'll try to get su

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres
On Jun 13, 2007, at 1:54 PM, Jeff Squyres wrote: With today's trunk, I still see the problem: Same thing happens on v1.2 branch. I'll re-open #548. -- Jeff Squyres Cisco Systems

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres
On Jun 13, 2007, at 1:37 PM, Gleb Natapov wrote: I have 2 hosts: one with 3 active ports and one with 2 active ports. If I run an MPI job between them, the openib BTL wireup goes badly and it aborts. So a heterogeneous number of ports is not currently handled properly in the code. Are

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 10:52:53AM -0600, Galen Shipman wrote: > > On Jun 13, 2007, at 10:48 AM, Jeff Squyres wrote: > > > I wonder if this is bringing up the point that there are several of > > us working in the openib code base -- I wonder if it would be > > worthwhile to have a [short] telecon

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman
On Jun 13, 2007, at 11:33 AM, Jeff Squyres wrote: On Jun 13, 2007, at 1:15 PM, Nysal Jan wrote: There is a ticket (closed) here: https://svn.open-mpi.org/trac/ompi/ticket/548 It was fixed by Galen for 1.2. Ah -- I forgot to look at closed tickets. I think we broke it again; it certainly f

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 12:45:01PM -0400, Jeff Squyres wrote: > On Jun 13, 2007, at 12:08 PM, Gleb Natapov wrote: > > > I am not committing this yet. I want people to review my logic and the > > patch. If the change is OK with everyone who cares then I want this > > change to go into 1.2 branch. >

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres
On Jun 13, 2007, at 1:15 PM, Nysal Jan wrote: There is a ticket (closed) here: https://svn.open-mpi.org/trac/ompi/ticket/548 It was fixed by Galen for 1.2. Ah -- I forgot to look at closed tickets. I think we broke it again; it certainly fails on the trunk (perhaps related to what Gleb

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman
On Jun 13, 2007, at 11:15 AM, Nysal Jan wrote: I was just bitten yesterday by a problem that I've known about for a while but had never gotten around to looking into (I could have sworn that there was an open trac ticket on this, but I can't find one anywhere). I have 2 hosts: one with 3 act

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Nysal Jan
I was just bitten yesterday by a problem that I've known about for a while but had never gotten around to looking into (I could have sworn that there was an open trac ticket on this, but I can't find one anywhere). I have 2 hosts: one with 3 active ports and one with 2 active ports. If I run an

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman
On Jun 13, 2007, at 10:48 AM, Jeff Squyres wrote: I wonder if this is bringing up the point that there are several of us working in the openib code base -- I wonder if it would be worthwhile to have a [short] teleconference to discuss what we're all doing in openib, where we're doing it (trunk,

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres
I wonder if this is bringing up the point that there are several of us working in the openib code base -- I wonder if it would be worthwhile to have a [short] teleconference to discuss what we're all doing in openib, where we're doing it (trunk, branch, whatever), when we expect to have it

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Jeff Squyres
On Jun 13, 2007, at 12:08 PM, Gleb Natapov wrote: I am not committing this yet. I want people to review my logic and the patch. If the change is OK with everyone who cares then I want this change to go into 1.2 branch. I don't care how this change will get to the trunk. I can use patched versio

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
On Wed, Jun 13, 2007 at 09:38:21AM -0600, Galen Shipman wrote: > Hi Gleb, > > As we have discussed before I am working on adding support for > multiple QPs with either per peer resources or shared resources. > As a result of this I am trying to clean up a lot of the OpenIB code. > It has grown

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman
On Jun 13, 2007, at 9:49 AM, Torsten Hoefler wrote: Hi Galen, Gleb, there is also something weird going on if I call the basic alltoall during the module_init() of a collective module (I need to wire up my own QPs in my coll component). It takes 7 seconds for 4 nodes and more than 30 minutes for

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Torsten Hoefler
Hi Galen, Gleb, there is also something weird going on if I call the basic alltoall during the module_init() of a collective module (I need to wire up my own QPs in my coll component). It takes 7 seconds for 4 nodes and more than 30 minutes for 120 nodes. It seems to be an OpenIB wireup issue becaus

Re: [OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Galen Shipman
Hi Gleb, As we have discussed before I am working on adding support for multiple QPs with either per peer resources or shared resources. As a result of this I am trying to clean up a lot of the OpenIB code. It has grown up organically over the years and needs some attention. Perhaps we can co

[OMPI devel] Problem with openib on demand connection bring up.

2007-06-13 Thread Gleb Natapov
Hello everyone, I encountered a problem with the openib on-demand connection code. Basically it works only by pure luck if you have more than one endpoint for the same proc and sometimes breaks in mysterious ways. The algo works like this: A wants to connect to B so it creates a QP and sends it to B.
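
The message above is cut off, so what follows is only an illustration of the kind of ambiguity it describes, not the actual Open MPI code or Gleb's patch. It assumes that B keeps one endpoint per port of A and has to decide which of them an incoming connection request belongs to. The names Endpoint, match_by_proc and match_by_port, and the port indices, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    """Hypothetical stand-in for one connection endpoint toward a peer."""
    peer: str                        # name of the remote process
    remote_port: int                 # assumed: index of the peer port this endpoint targets
    remote_qp: Optional[int] = None  # QP number received from the peer, if any


def match_by_proc(endpoints: List[Endpoint], sender: str) -> Optional[Endpoint]:
    """Pick the first unconnected endpoint for the sender, ignoring ports."""
    for ep in endpoints:
        if ep.peer == sender and ep.remote_qp is None:
            return ep
    return None


def match_by_port(endpoints: List[Endpoint], sender: str,
                  sender_port: int) -> Optional[Endpoint]:
    """Pick the unconnected endpoint that targets the sender's actual port."""
    for ep in endpoints:
        if (ep.peer == sender and ep.remote_port == sender_port
                and ep.remote_qp is None):
            return ep
    return None


# B has two endpoints toward A, one per port of A.
b_endpoints = [Endpoint("A", 0), Endpoint("A", 1)]

# A's request arrives from its port 1.
naive = match_by_proc(b_endpoints, "A")     # lands on the port-0 endpoint
exact = match_by_port(b_endpoints, "A", 1)  # lands on the port-1 endpoint
```

With a single endpoint per proc both lookups agree. With several, matching on the proc alone picks the right endpoint only when the request order happens to line up with the list order, which is one way a scheme could appear to work "by pure luck".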