Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Brian W. Barrett wrote: On Tue, 3 Mar 2009, Jeff Squyres wrote: 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only difference between rc3 and rc4 was a fix for that race condition. Please test ASAP: http://www.open-mpi.org/software/ompi/v1.3/ I'm

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh
Jeff Squyres wrote: How about an MCA parameter to switch between this mechanism (early sendi) and the original behavior (late sendi)? This is the usual way that we resolve "I want to do X / I want to do Y" disputes. :-) I see the smiley face, but am unsure how much of the message to appl

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh
Brian W. Barrett wrote: On Tue, 3 Mar 2009, Eugene Loh wrote: First, this behavior is basically what I was proposing and what George didn't feel comfortable with. It is arguably no compromise at all. (Uggh, why must I be so honest?) For eager messages, it favors BTLs with sendi functions,

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Jeff Squyres wrote: On Mar 3, 2009, at 3:31 PM, Eugene Loh wrote: First, this behavior is basically what I was proposing and what George didn't feel comfortable with. It is arguably no compromise at all. (Uggh, why must I be so honest?) For eager messages, it favors BTL

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh
Terry Dontje wrote: Eugene Loh wrote: I'm on the verge of giving up moving the sendi call in the PML. I will try one or two last things, including this e-mail asking for feedback. The idea is that when a BTL goes over a very low-latency interconnect (like sm), we really want to shave off

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Jeff Squyres
On Mar 3, 2009, at 3:31 PM, Eugene Loh wrote: First, this behavior is basically what I was proposing and what George didn't feel comfortable with. It is arguably no compromise at all. (Uggh, why must I be so honest?) For eager messages, it favors BTLs with sendi functions, which could le

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Eugene Loh wrote: First, this behavior is basically what I was proposing and what George didn't feel comfortable with. It is arguably no compromise at all. (Uggh, why must I be so honest?) For eager messages, it favors BTLs with sendi functions, which could lead to those

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Eugene Loh
Jeff Squyres wrote: How about a compromise... Keep a separate list somewhere of the sendi-enabled BTLs (this avoids looping over all the btl's and testing -- you can just loop over the btl's that you *know* have a sendi). Put that at the top of the PML and avoid the costly overhead, yadd

Re: [OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Brian W. Barrett
On Tue, 3 Mar 2009, Jeff Squyres wrote: 1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only difference between rc3 and rc4 was a fix for that race condition. Please test ASAP: http://www.open-mpi.org/software/ompi/v1.3/ I'm sorry, I've failed to test rc1 & rc2 on Catam

[OMPI devel] 1.3.1rc3 was borked; 1.3.1rc4 is out

2009-03-03 Thread Jeff Squyres
1.3.1rc3 had a race condition in the ORTE shutdown sequence. The only difference between rc3 and rc4 was a fix for that race condition. Please test ASAP: http://www.open-mpi.org/software/ompi/v1.3/ -- Jeff Squyres Cisco Systems

Re: [OMPI devel] How to configure Open MPI on multi-port IB HCA cluster

2009-03-03 Thread Jeff Squyres
On Mar 3, 2009, at 2:48 AM, Jie Cai wrote: We have installed a dual-port ConnectX HCA cluster with PCI-E 2.0 slots, and each port is represented as an individual interface. How should we configure Open MPI and the hardware to correctly use both ports for communication? Open MPI should just se

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Jeff Squyres
How about a compromise... Keep a separate list somewhere of the sendi-enabled BTLs (this avoids looping over all the btl's and testing -- you can just loop over the btl's that you *know* have a sendi). Put that at the top of the PML and avoid the costly overhead, yadda yadda yadda. But i
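The separate-list idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_btl_t`, `toy_endpoint_t`, `endpoint_add_btl`, `try_early_sendi`, and `toy_sm_sendi` are hypothetical names made up for this sketch, not Open MPI's actual BTL or PML types. The point is just that the sendi-capable BTLs are recorded once at setup, so the send path never has to test each BTL for a sendi function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a BTL module (not Open MPI's type). */
typedef struct toy_btl {
    const char *name;
    /* Immediate-send hook; NULL when the BTL has no sendi. Returns 0 on success. */
    int (*sendi)(const void *buf, size_t len);
} toy_btl_t;

#define MAX_BTLS 8

/* Hypothetical endpoint: all BTLs, plus a second list of sendi-capable ones. */
typedef struct toy_endpoint {
    toy_btl_t *all[MAX_BTLS];
    size_t     n_all;
    toy_btl_t *sendi_only[MAX_BTLS];
    size_t     n_sendi;
} toy_endpoint_t;

/* Setup time: classify each BTL once, so the send path never tests for NULL. */
static void endpoint_add_btl(toy_endpoint_t *ep, toy_btl_t *btl)
{
    assert(ep->n_all < MAX_BTLS);
    ep->all[ep->n_all++] = btl;
    if (btl->sendi != NULL) {
        ep->sendi_only[ep->n_sendi++] = btl;
    }
}

/* Send path: loop only over BTLs known to have a sendi.
 * Returns 1 if some BTL sent the message immediately, 0 if the caller
 * must fall back to the regular (later) send path. */
static int try_early_sendi(const toy_endpoint_t *ep, const void *buf, size_t len)
{
    for (size_t i = 0; i < ep->n_sendi; i++) {
        if (ep->sendi_only[i]->sendi(buf, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Example hook standing in for a low-latency BTL; always succeeds. */
static int toy_sm_sendi(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0;
}
```

An endpoint that holds no sendi-capable BTL falls through the loop immediately, so in this sketch the extra cost on the regular path is a single comparison against `n_sendi`.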

[OMPI devel] Writeup of new release methodology

2009-03-03 Thread Jeff Squyres
Sorry I missed the call this morning. I wrote up the new release methodology, including the bootstrapping-to-the-v1.3-series stuff on this wiki page: https://svn.open-mpi.org/trac/ompi/wiki/ReleaseMethodology -- Jeff Squyres Cisco Systems

Re: [OMPI devel] PML/ob1 problem

2009-03-03 Thread Lenny Verkhovsky
sorry, missed this commit. Thanks, George, On 3/3/09, George Bosilca wrote: > Which solution seems to be working ? > > This bug was fixed a while ago in the trunk > (https://svn.open-mpi.org/trac/ompi/changeset/20591) and in > the 1.3 branch. It even made it in the 1.3.2. > > george. > > > On

Re: [OMPI devel] [PATCH 3/4] opal-ps: Use the return value from asprintf as the header length.

2009-03-03 Thread Jeff Squyres
Done. On Feb 19, 2009, at 7:29 AM, Bert Wesarg wrote: From: Bert Wesarg asprintf returns the length of the written header, use this as the length. Regards, Bert Wesarg --- orte/tools/orte-ps/orte-ps.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --quilt old/orte/tools
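For readers unfamiliar with the point of this patch: `asprintf` returns the number of bytes it wrote (excluding the terminating NUL), so a separate `strlen` on the result is redundant. Here is a minimal sketch of that pattern; `make_header` and its format string are hypothetical and are not the actual orte-ps code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* asprintf is a GNU/BSD extension, not ISO C */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: builds a header line in a freshly allocated buffer.
 * Returns the header length (excluding the NUL), or -1 on failure.
 * On success the caller owns *out and must free() it. */
static int make_header(char **out, const char *name, int width)
{
    assert(out != NULL);
    int len = asprintf(out, "%*s | PID", width, name);
    if (len < 0) {
        *out = NULL;   /* buffer contents are unspecified on failure */
        return -1;
    }
    return len;        /* already the length; no strlen(*out) needed */
}
```

A caller would use the returned value directly, for example to size a separator line under the header, and then `free()` the buffer.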

Re: [OMPI devel] [PATCH 1/4] opal-ps: fix memory leak

2009-03-03 Thread Jeff Squyres
I committed the rest of these in 20697. Thanks! On Feb 19, 2009, at 7:29 AM, Bert Wesarg wrote: From: Bert Wesarg --- orte/tools/orte-ps/orte-ps.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --quilt old/orte/tools/orte-ps/orte-ps.c new/orte/tools/orte-ps/orte-ps.c

Re: [OMPI devel] PML/ob1 problem

2009-03-03 Thread George Bosilca
Which solution seems to be working? This bug was fixed a while ago in the trunk (https://svn.open-mpi.org/trac/ompi/changeset/20591) and in the 1.3 branch. It even made it in the 1.3.2. george. On Mar 3, 2009, at 05:01 , Lenny Verkhovsky wrote: Seems to be working. George, can you commi

Re: [OMPI devel] [PATCH 1/4] opal-ps: fix memory leak

2009-03-03 Thread Jeff Squyres
Oops; this got missed. Thanks for the reminder; applied in r20694. On Mar 3, 2009, at 2:26 AM, Bert Wesarg wrote: 2009/2/19 Bert Wesarg : From: Bert Wesarg Free the memory allocated by the call to asprintf. Regards, Bert Wesarg --- orte/tools/orte-ps/orte-ps.c |1 + 1 file changed,
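The kind of leak this patch describes is easy to reintroduce, because `asprintf` allocates its result with `malloc` and leaves ownership with the caller. A minimal sketch of the fix pattern follows; `print_header` is a hypothetical function for illustration only, and the actual one-line change in orte-ps.c is not reproduced here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* asprintf is a GNU/BSD extension, not ISO C */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats a line, writes it to fp, and releases it.
 * Returns the number of characters in the line, or -1 on failure. */
static int print_header(FILE *fp, int pid)
{
    assert(fp != NULL);
    char *line = NULL;
    int len = asprintf(&line, "PID %d\n", pid);
    if (len < 0) {
        return -1;     /* nothing to free: buffer is unspecified on failure */
    }
    fputs(line, fp);
    free(line);        /* the one-line fix: release what asprintf allocated */
    return len;
}
```

Without the `free(line)` call, every invocation would leak one buffer, which adds up in a tool that prints a header per job or per process.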

Re: [OMPI devel] help-orte-top.txt: add missing [

2009-03-03 Thread Jeff Squyres
Done; thanks. On Mar 3, 2009, at 2:17 AM, Bert Wesarg wrote: Regards, Bert Index: orte/tools/orte-top/help-orte-top.txt === --- orte/tools/orte-top/help-orte-top.txt (revision 20692) +++ orte/tools/orte-top/help-orte-top.txt

[OMPI devel] 1.3.1rc3 escapes

2009-03-03 Thread Jeff Squyres
The only difference between 1.3.1rc2 and rc3 is George's datatype fix: https://svn.open-mpi.org/trac/ompi/changeset/20684 Please test it ASAP: http://www.open-mpi.org/software/ompi/v1.3/ -- Jeff Squyres Cisco Systems

Re: [OMPI devel] calling sendi earlier in the PML

2009-03-03 Thread Terry Dontje
Eugene Loh wrote: I'm on the verge of giving up moving the sendi call in the PML. I will try one or two last things, including this e-mail asking for feedback. The idea is that when a BTL goes over a very low-latency interconnect (like sm), we really want to shave off whatever we can from th

Re: [OMPI devel] PML/ob1 problem

2009-03-03 Thread Lenny Verkhovsky
Seems to be working. George, can you commit it, pls. Thanks Lenny. On Thu, Feb 19, 2009 at 3:05 PM, Jeff Squyres wrote: > George -- any thoughts on this one? > > On Feb 11, 2009, at 1:01 AM, Mike Dubman wrote: > >> >> Hello guys, >> >> I'm running some experimental tcp btl which implements rdma

[OMPI devel] How to configure Open MPI on multi-port IB HCA cluster

2009-03-03 Thread Jie Cai
We have installed a dual-port ConnectX HCA cluster with PCI-E 2.0 slots, and each port is represented as an individual interface. How should we configure Open MPI and the hardware to correctly use both ports for communication? Are we expecting to see wider bandwidth with Open MPI? In order to see

Re: [OMPI devel] [PATCH 1/4] opal-ps: fix memory leak

2009-03-03 Thread Bert Wesarg
2009/2/19 Bert Wesarg : > From: Bert Wesarg > > Free the memory allocated by the call to asprintf. > > Regards, > Bert Wesarg > > --- > >  orte/tools/orte-ps/orte-ps.c |    1 + >  1 file changed, 1 insertion(+) > > diff --quilt old/orte/tools/orte-ps/orte-ps.c new/orte/tools/orte-ps/orte-ps.c > ---

[OMPI devel] help-orte-top.txt: add missing [

2009-03-03 Thread Bert Wesarg
Regards, Bert Index: orte/tools/orte-top/help-orte-top.txt === --- orte/tools/orte-top/help-orte-top.txt (revision 20692) +++ orte/tools/orte-top/help-orte-top.txt (working copy) @@ -46,7 +46,7 @@ keyword "file". Please u

Re: [OMPI devel] ompi v1.3 compilation problem on ia64/gcc/rhel4.7

2009-03-03 Thread Mike Dubman
Thanks. We will test it and update you promptly. On Mon, Mar 2, 2009 at 10:28 PM, Jeff Squyres wrote: > Disregard -- it looks like the VT guys have fixed this issue. > > Can you test 1.3.1rc2 or later? > > > > On Feb 24, 2009, at 2:02 AM, Mike Dubman wrote: > > I searched for similar problems rep