Re: [OMPI devel] NIC Failover and Message Stripping of Open MPI

2012-10-25 Thread Lirong Jian
Thanks, guys. I will check the code of OB1 more carefully. Thanks. Best, Lirong Message: 7 > Date: Thu, 25 Oct 2012 10:55:51 -0700 > From: Ralph Castain > Subject: Re: [OMPI devel] NIC Failover and Message Stripping of Open > MPI. > To: Open MPI Developers > Message-ID: > Content-Type

Re: [OMPI devel] 1.7.0rc3 available - PLEASE test

2012-10-25 Thread Ralph Castain
Okay, 1.7.0rc4 has been posted, with ompi_info fixed :-) On Oct 25, 2012, at 2:47 PM, Ralph Castain wrote: > Ah, I see the problem - it is localized to ompi_info and due to the fact that > we aren't setting things up completely in that code (trying to avoid a > complete start). > > FWIW: you

Re: [OMPI devel] MX BTL segfaults

2012-10-25 Thread Brice Goglin
On 25/10/2012 23:56, Barrett, Brian W wrote: > Hi all - > > The MX BTL segfaults during MPI_FINALIZE in the trunk (and did before my > mpool change in r27485). I'm not really interested in fixing it; the > problem does not occur with the MX MTL. Does anyone else have interest in > fixing it?

[OMPI devel] MX BTL segfaults

2012-10-25 Thread Barrett, Brian W
Hi all - The MX BTL segfaults during MPI_FINALIZE in the trunk (and did before my mpool change in r27485). I'm not really interested in fixing it; the problem does not occur with the MX MTL. Does anyone else have interest in fixing it? If not, should we remove it from the trunk (we already remo

Re: [OMPI devel] 1.7.0rc3 available - PLEASE test

2012-10-25 Thread Ralph Castain
Ah, I see the problem - it is localized to ompi_info and due to the fact that we aren't setting things up completely in that code (trying to avoid a complete start). FWIW: you are most certainly allowed to amend event callbacks in the base open function. I'll fix it. Meantime, the branch seems

Re: [OMPI devel] 1.7.0rc3 available - PLEASE test

2012-10-25 Thread Ralph Castain
Strange - okay, will look. On Oct 25, 2012, at 2:34 PM, George Bosilca wrote: > Broken enough that even the ompi_info fails. And for a good reason: one is > not allowed to amend event callbacks in the component base open function. > > Here is the stack: > 0 0x77a82fc4 in opal_libevent

Re: [OMPI devel] 1.7.0rc3 available - PLEASE test

2012-10-25 Thread George Bosilca
Broken enough that even the ompi_info fails. And for a good reason: one is not allowed to amend event callbacks in the component base open function. Here is the stack: 0 0x77a82fc4 in opal_libevent2019_event_priority_set (ev=0x63b100, pri=3) at ../../../../../../ompi/opal/mca/event/

Re: [OMPI devel] NIC Failover and Message Stripping of Open MPI.

2012-10-25 Thread Ralph Castain
Just an FYI - I asked a similar question recently and got the following answer from Rolf: > In my case, it was specific to openib only and it required you to be running > with two or more IB rails. > Then, if one of them failed, we just shut it down, and continued with the > working ones. > You

Re: [OMPI devel] NIC Failover and Message Stripping of Open MPI.

2012-10-25 Thread George Bosilca
On Oct 25, 2012, at 17:54, Lirong Jian wrote: > Hi folks, > > Sorry to bother you guys, but I have some questions about Open MPI and really > want your help. > > There are some papers (e.g., [1, 2, 3], although they are sort of old-aged) > mentioning that Open MPI is supporting NIC failover

Re: [OMPI devel] [EXTERNAL] Re: Latency perf: v1.6 vs. v1.7 vs. trunk

2012-10-25 Thread Jeff Squyres
On Oct 25, 2012, at 1:00 PM, Barrett, Brian W wrote: > Your first e-mail got eaten by our virus scanner (it doesn't like .bz2 > files), See http://www.open-mpi.org/community/lists/devel/2012/10/11638.php. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisc

Re: [OMPI devel] [EXTERNAL] Re: Latency perf: v1.6 vs. v1.7 vs. trunk

2012-10-25 Thread Barrett, Brian W
Your first e-mail got eaten by our virus scanner (it doesn't like .bz2 files), but we could probably only register the libnbc progress function on first use, but it would slightly slow down all non blocking collectives. Probably worth it, but not sure I'll have time to add that code today. Brian
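The "register on first use" idea in Brian's message can be sketched roughly as follows. This is only an illustration under stated assumptions: the registry (register_progress, run_progress) and the names nbc_start_collective and nbc_progress are hypothetical stand-ins, not the actual Open MPI or libnbc API. The point it shows is the trade-off Brian describes: the progress loop stays empty until the first non-blocking collective is started, at the cost of one extra check on every collective start.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical progress-callback registry; NOT the real Open MPI API. */
typedef int (*progress_fn_t)(void);

#define MAX_PROGRESS_FNS 8
static progress_fn_t progress_fns[MAX_PROGRESS_FNS];
static size_t num_progress_fns = 0;

/* Add a callback to the progress loop; returns 0 on success, -1 if full. */
static int register_progress(progress_fn_t fn)
{
    if (num_progress_fns >= MAX_PROGRESS_FNS) {
        return -1;
    }
    progress_fns[num_progress_fns++] = fn;
    return 0;
}

/* One pass of the progress loop: every registered callback is invoked,
 * so each registered callback adds cost to every pass. */
static int run_progress(void)
{
    int events = 0;
    for (size_t i = 0; i < num_progress_fns; i++) {
        events += progress_fns[i]();
    }
    return events;
}

/* Hypothetical stand-in for the non-blocking collective component. */
static int nbc_pending = 0;
static bool nbc_registered = false;

/* Pretend every pending collective completes on the next progress pass. */
static int nbc_progress(void)
{
    int completed = nbc_pending;
    nbc_pending = 0;
    return completed;
}

/* Start a non-blocking collective. The callback is registered lazily,
 * on first use; the branch below is the small per-call overhead. */
static int nbc_start_collective(void)
{
    if (!nbc_registered) {
        if (register_progress(nbc_progress) != 0) {
            return -1;
        }
        nbc_registered = true;
    }
    nbc_pending++;
    return 0;
}
```

With this arrangement, an application that never starts a non-blocking collective never pays for the extra callback in run_progress(), which is the latency concern raised in this thread.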

Re: [OMPI devel] Latency perf: v1.6 vs. v1.7 vs. trunk

2012-10-25 Thread Jeff Squyres
Something that might not be clear from my initial writeup: 1. I had to go change C code to disable libnbc. Since non-blocking collectives are part of MPI-3: a) we have no convenient configure argument to not build the libnbc coll component (there is a way, but it's laborious), and b) eve

Re: [OMPI devel] Latency perf: v1.6 vs. v1.7 vs. trunk

2012-10-25 Thread Jeff Squyres
On Oct 25, 2012, at 12:32 PM, Jeff Squyres (jsquyres) wrote: > 1. sm NetPipe latencies up to size 150 bytes (run on a Sandy Bridge, 2 procs > same core) s/core/socket/ -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/

[OMPI devel] Latency perf: v1.6 vs. v1.7 vs. trunk

2012-10-25 Thread Jeff Squyres
Attached are the following graphs: 1. sm NetPipe latencies up to size 150 bytes (run on a Sandy Bridge, 2 procs same core) 2. openib NetPipe latencies up to size 150 bytes (run on 2 old Xeons [pre-Nehalem] with old Mellanox ConnectX IB HCAs) 3. Same as #1, but all the way up to 8MB 4. Same as #2,

[OMPI devel] NIC Failover and Message Stripping of Open MPI.

2012-10-25 Thread Lirong Jian
Hi folks, Sorry to bother you guys, but I have some questions about Open MPI and really want your help. There are some papers (e.g., [1, 2, 3], although they are sort of old-aged) mentioning that Open MPI is supporting NIC failover and message stripping over multiple NICs. However, when I read the

[OMPI devel] 1.7.0rc3 available - PLEASE test

2012-10-25 Thread Ralph Castain
Hi folks We have posted the first release candidate for the 1.7.0 release in the usual place: http://www.open-mpi.org/software/ompi/v1.7/ Please put it thru the wringer to help us validate it prior to release later this month. We are still updating the NEWS section, but we REALLY need to vali