Re: [OMPI devel] trunk hangs since r19010

2008-07-29 Thread Jeff Squyres
On Jul 29, 2008, at 9:47 AM, Jeff Squyres wrote: Ok. FWIW, Pasha and I think that openib has supported "send-to-self" for a while (we don't know exactly when; but Pasha thinks it is very old code that we don't check for self in add_procs). But it only broke recently. More in the FWIW c

Re: [OMPI devel] trunk hangs since r19010

2008-07-29 Thread Jeff Squyres
Ok. FWIW, Pasha and I think that openib has supported "send-to-self" for a while (we don't know exactly when; but Pasha thinks it is very old code that we don't check for self in add_procs). But it only broke recently. On Jul 29, 2008, at 9:31 AM, George Bosilca wrote: I ran few tests a

Re: [OMPI devel] trunk hangs since r19010

2008-07-29 Thread George Bosilca
I ran a few tests and the only combination leading to a deadlock is openib and self. As openib is the only BTL supporting self communications (except self of course), I guess it interferes with self in some more or less strange ways. I didn't have the time to dig deeper yet to see what exactly

Re: [OMPI devel] trunk hangs since r19010

2008-07-29 Thread Pavel Shamis (Pasha)
Jeff Squyres wrote: This used to be true, but I think we changed it a while ago (Pasha: do you remember?) because Mellanox HCAs are capable of send-to-self (process) and there were no code changes necessary to enable it. So it allowed a slightly simpler command line. This was quite a while

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Brad Benton
On Mon, Jul 28, 2008 at 12:08 PM, Terry Dontje wrote: > Jeff Squyres wrote: >> On Jul 28, 2008, at 12:03 PM, George Bosilca wrote: >>> Interesting. The self is only used for local communications. I don't expect that any benchmark executes such communications, but apparently I was wron

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Terry Dontje
Jeff Squyres wrote: On Jul 28, 2008, at 12:03 PM, George Bosilca wrote: Interesting. The self is only used for local communications. I don't expect that any benchmark executes such communications, but apparently I was wrong. Please let me know the failing test, I will take a look this evening.

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
On Jul 28, 2008, at 11:05 AM, Ralph Castain wrote: only openib works for me too, but Glebs said to me once that it's illegal and I always need to use self btl. Don't know - could be true. But if that is true, then we should check to see if that condition is met and error out - with an

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
My test wasn't a benchmark - I was just testing with a little program that calls mpi_init, mpi_barrier, and mpi_finalize. A test with just mpi_init/finalize works fine, so it looks like we simply hang when trying to communicate. This also only happens on multi-node operations. On Jul 28,

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
On Jul 28, 2008, at 12:03 PM, George Bosilca wrote: Interesting. The self is only used for local communications. I don't expect that any benchmark executes such communications, but apparently I was wrong. Please let me know the failing test, I will take a look this evening. FWIW, my manual

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread George Bosilca
Interesting. The self is only used for local communications. I don't expect that any benchmark executes such communications, but apparently I was wrong. Please let me know the failing test, I will take a look this evening. Thanks, george. On Jul 28, 2008, at 5:56 PM, Ralph Castain wro

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
I just re-tested to confirm, and that is correct. -mca btl openib: works; -mca btl openib,self: hangs; -mca btl openib,sm: works. On Jul 28, 2008, at 9:49 AM, George Bosilca wrote: I'm a little bit lost here. You're stating that openib,self doesn't w
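
For anyone wanting to repeat this comparison, here is a minimal sketch of a helper that builds the mpirun command line for each of the three BTL selections reported above and can classify a run as working, failing, or hanging by means of a timeout. Only the "-mca btl ..." selections and their reported outcomes come from the thread; the program name ./barrier_test, the process count of 2, the 60-second timeout, and the function names mpirun_cmd and classify are assumptions made for illustration.

```python
import subprocess

# BTL selections from the message above, with the outcome reported there.
REPORTED = {
    "openib": "works",
    "openib,self": "hangs",
    "openib,sm": "works",
}


def mpirun_cmd(btl, program="./barrier_test", nprocs=2):
    """Build an mpirun command line that selects the given BTL list.

    The program name and process count are hypothetical placeholders.
    """
    return ["mpirun", "-np", str(nprocs), "-mca", "btl", btl, program]


def classify(cmd, timeout_s=60):
    """Run cmd and return 'works', 'fails', or 'hangs'.

    'hangs' means the command did not exit before the timeout;
    subprocess.run kills the child process in that case.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hangs"
    except OSError:
        # For example, the executable was not found on this machine.
        return "fails"
    return "works" if result.returncode == 0 else "fails"


# Print the command lines that correspond to the reported results.
for btl, reported in REPORTED.items():
    print(" ".join(mpirun_cmd(btl)), "(reported: " + reported + ")")
```

Calling classify(mpirun_cmd("openib,self")) on a cluster with the affected trunk revision would be expected to return "hangs" if the report above holds; a timeout can only suggest a deadlock, since a slow run looks the same.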

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread George Bosilca
I'm a little bit lost here. You're stating that openib,self doesn't work while openib does? In other words that adding self to the BTL leads to deadlocks? george. PS: Btw, it is not supposed to work at all, except in the case where openib handles internal messages (where the source and de

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
On Jul 28, 2008, at 8:52 AM, Lenny Verkhovsky wrote: only openib works for me too, but Glebs said to me once that it's illegal and I always need to use self btl. Don't know - could be true. But if that is true, then we should check to see if that condition is met and error out - with a

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
only openib works for me too, but Glebs said to me once that it's illegal and I always need to use self btl. On 7/28/08, Jeff Squyres wrote: > FWIW, all my MTT runs are hanging as well. > On Jul 28, 2008, at 10:37 AM, Brad Benton wrote: >> My experience is the same as Lenny's. I've teste

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
FWIW, all my MTT runs are hanging as well. On Jul 28, 2008, at 10:37 AM, Brad Benton wrote: My experience is the same as Lenny's. I've tested on x86_64 and ppc64 systems and tests using --mca btl openib,self hang in all cases. --brad 2008/7/28 Lenny Verkhovsky I failed to run on diffe

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
Interesting - you are quite correct and I should have been more precise. I ran with -mca btl openib and it worked. So having just openib seems to be okay. On Jul 28, 2008, at 8:37 AM, Brad Benton wrote: My experience is the same as Lenny's. I've tested on x86_64 and ppc64 systems and tes

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Brad Benton
My experience is the same as Lenny's. I've tested on x86_64 and ppc64 systems and tests using --mca btl openib,self hang in all cases. --brad 2008/7/28 Lenny Verkhovsky > I failed to run on different nodes or on the same node via self,openib > On 7/28/08, Ralph Castain wrote: >> I c

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
I failed to run on different nodes or on the same node via self,openib On 7/28/08, Ralph Castain wrote: > I checked this out some more and I believe it is ticket #1378 related. We lock up if SM is included in the BTL's, which is what I had done on my test. If I ^sm, I can run fine. > On

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
I checked this out some more and I believe it is ticket #1378 related. We lock up if SM is included in the BTL's, which is what I had done on my test. If I ^sm, I can run fine. On Jul 28, 2008, at 6:41 AM, Ralph Castain wrote: It could also be something new. Brad and I noted on Fri that IB

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Ralph Castain
It could also be something new. Brad and I noted on Fri that IB was locking up as soon as we tried any cross-node communications. Hadn't seen that before, and at least I haven't explored it further - planned to do so today. On Jul 28, 2008, at 6:01 AM, Lenny Verkhovsky wrote: I believe i

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
I believe it is. On 7/28/08, Jeff Squyres wrote: > On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote: >> Is this related to r1378? > Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket. > On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote: >> Hi, >>> I experience hang

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote: Is this related to r1378? Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket. On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote: Hi, I experience hanging of tests (latency) since r19010 Best Regards Lenny.

Re: [OMPI devel] trunk hangs since r19010

2008-07-28 Thread Jeff Squyres
Is this related to r1378? On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote: Hi, I experience hanging of tests (latency) since r19010 Best Regards Lenny.

[OMPI devel] trunk hangs since r19010

2008-07-28 Thread Lenny Verkhovsky
Hi, I experience hanging of tests (latency) since r19010 Best Regards Lenny.