I am pretty sure MTLs and BTLs are very different, but just as a note,
this user's code (CRASH) hangs at MPI_Allreduce() on:

openib

But runs on:
tcp
psm (an MTL, on different hardware)

Putting it out there in case it has any bearing.  Otherwise, ignore.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 12, 2011, at 10:20 AM, Brock Palen wrote:

> On May 12, 2011, at 10:13 AM, Jeff Squyres wrote:
> 
>> On May 11, 2011, at 3:21 PM, Dave Love wrote:
>> 
>>> We can reproduce it with IMB.  We could provide access, but we'd have to
>>> negotiate with the owners of the relevant nodes to give you interactive
>>> access to them.  Maybe Brock's would be more accessible?  (If you
>>> contact me, I may not be able to respond for a few days.)
>> 
>> Brock has replied off-list that he, too, is able to reliably reproduce the 
>> issue with IMB, and is working to get access for us.  Many thanks for your 
>> offer; let's see where Brock's access takes us.
> 
> I should also note that, as far as I know, I have three codes (CRASH, NAMD 
> in some cases, and another user code) that lock up on a collective on 
> OpenIB but run with the same library on Gig-E.
> 
> So I am not sure it is limited to IMB, or I could be conflating separate 
> errors; normally I would assume unmatched eager recvs for this sort of 
> problem.
> 
>> 
>>>> -- we have not closed this issue,
>>> 
>>> Which issue?   I couldn't find a relevant-looking one.
>> 
>> https://svn.open-mpi.org/trac/ompi/ticket/2714
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
> 
> 
> 
> 

