On Jul 29, 2008, at 9:47 AM, Jeff Squyres wrote:
Ok. FWIW, Pasha and I think that openib has supported "send-to-
self" for a while (we don't know exactly when; but Pasha thinks it
is very old code that we don't check for self in add_procs). But it
only broke recently.
More in the FWIW c
On Jul 29, 2008, at 9:31 AM, George Bosilca wrote:
I ran a few tests, and the only combination leading to a deadlock is
openib and self. As openib is the only BTL supporting self
communications (except self, of course), I guess it interferes with self
in some more or less strange ways. I didn't have the time to dig deeper
yet to see what exactly
Jeff Squyres wrote:
This used to be true, but I think we changed it a while ago (Pasha: do
you remember?) because Mellanox HCAs are capable of send-to-self
(process) and there were no code changes necessary to enable it. So
it allowed a slightly simpler command line. This was quite a while
On Mon, Jul 28, 2008 at 12:08 PM, Terry Dontje wrote:
Jeff Squyres wrote:
On Jul 28, 2008, at 12:03 PM, George Bosilca wrote:
Interesting. The self is only used for local communications. I don't
expect that any benchmark executes such communications, but apparently
I was wrong. Please let me know the failing test, I will take a look
this evening.
On Jul 28, 2008, at 11:05 AM, Ralph Castain wrote:
only openib works for me too,
but Glebs once told me that it's illegal and I always need to
use self btl.
Don't know - could be true. But if that is true, then we should
check to see if that condition is met and error out - with an
My test wasn't a benchmark - I was just testing with a little program
that calls mpi_init, mpi_barrier, and mpi_finalize.
A test with just mpi_init/finalize works fine, so it looks like we
simply hang when trying to communicate. This also only happens on
multi-node operations.
On Jul 28, 2008, at 12:03 PM, George Bosilca wrote:
Interesting. The self is only used for local communications. I don't
expect that any benchmark executes such communications, but
apparently I was wrong. Please let me know the failing test, I will
take a look this evening.
FWIW, my manual
Interesting. The self is only used for local communications. I don't
expect that any benchmark executes such communications, but apparently
I was wrong. Please let me know the failing test, I will take a look
this evening.
Thanks,
george.
On Jul 28, 2008, at 5:56 PM, Ralph Castain wrote:
I just re-tested to confirm, and that is correct.
-mca btl openib works
-mca btl openib,self hangs
-mca btl openib,sm works
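
To rerun this matrix, the three cases can be spelled out as full command lines. The following is only a rough Python sketch: the program name `./barrier_test` and the process count are hypothetical placeholders, and only the `-mca btl` values and the reported outcomes come from the results above.

```python
import shlex

# BTL lists and outcomes exactly as reported in the message above.
REPORTED = {
    "openib": "works",
    "openib,self": "hangs",
    "openib,sm": "works",
}


def mpirun_command(btl, program="./barrier_test", nprocs=2):
    """Build an mpirun argument list that selects the given BTL list.

    `program` and `nprocs` are hypothetical placeholders, not values
    taken from the thread.
    """
    return ["mpirun", "-np", str(nprocs), "-mca", "btl", btl, program]


if __name__ == "__main__":
    # Print one shell-ready command line per reported combination.
    for btl, outcome in REPORTED.items():
        print(f"{shlex.join(mpirun_command(btl))}  # reported: {outcome}")
```

The hang was reported only for multi-node runs, so whatever host selection options a given cluster needs would have to be added; they are left out here because the thread does not state them.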
On Jul 28, 2008, at 9:49 AM, George Bosilca wrote:
I'm a little bit lost here. You're stating that openib,self doesn't
work while openib does? In other words that adding self to the BTL
leads to deadlocks?
george.
PS: Btw, it is not supposed to work at all, except in the case where
openib handles internal messages (where the source and de
On Jul 28, 2008, at 8:52 AM, Lenny Verkhovsky wrote:
only openib works for me too,
but Glebs once told me that it's illegal and I always need to use
self btl.
Don't know - could be true. But if that is true, then we should check
to see if that condition is met and error out - with a
On 7/28/08, Jeff Squyres wrote:
FWIW, all my MTT runs are hanging as well.
On Jul 28, 2008, at 10:37 AM, Brad Benton wrote:
My experience is the same as Lenny's. I've tested on x86_64 and
ppc64 systems and tests using --mca btl openib,self hang in all
cases.
--brad
2008/7/28 Lenny Verkhovsky
I failed to run on diffe
Interesting - you are quite correct and I should have been more
precise. I ran with -mca btl openib and it worked. So having just
openib seems to be okay.
On Jul 28, 2008, at 8:37 AM, Brad Benton wrote:
My experience is the same as Lenny's. I've tested on x86_64 and ppc64
systems and tests using --mca btl openib,self hang in all cases.
--brad
2008/7/28 Lenny Verkhovsky
I failed to run on different nodes or on the same node via self,openib
On 7/28/08, Ralph Castain wrote:
I checked this out some more and I believe it is ticket #1378 related.
We lock up if SM is included in the BTLs, which is what I had done in
my test. If I ^sm, I can run fine.
On Jul 28, 2008, at 6:41 AM, Ralph Castain wrote:
It could also be something new. Brad and I noted on Fri that IB was
locking up as soon as we tried any cross-node communications. Hadn't
seen that before, and at least I haven't explored it further - planned
to do so today.
On Jul 28, 2008, at 6:01 AM, Lenny Verkhovsky wrote:
I believe it is.
On 7/28/08, Jeff Squyres wrote:
On Jul 28, 2008, at 7:51 AM, Jeff Squyres wrote:
Is this related to r1378?
Gah -- I meant #1378, meaning the "PML ob1 deadlock" ticket.
On Jul 28, 2008, at 7:13 AM, Lenny Verkhovsky wrote:
Hi,
I have been seeing tests (latency) hang since r19010.
Best Regards
Lenny.
Hi,
I have been seeing tests (latency) hang since r19010.
Best Regards
Lenny.