Re: [OMPI devel] Device failover on ob1

2009-08-03 Thread Mouhamed Gueye
as once a receive is complete, who knows what the user has done with that buffer? A general treatment needs to be able to handle false negatives, and attempts to deliver the data more than once. How are you detecting missing acknowledgements? Are you using some sort of timer? Rich On 7/31/09 5:49 A

[OMPI devel] Device failover on ob1

2009-07-31 Thread Mouhamed Gueye
Hi list, Here is an update on our work concerning device failover. As many of you suggested, we refocused our work on ob1 rather than dr, and we now have a working prototype on top of ob1. The approach is to store btl descriptors sent to peers and delete them when we receive proof of delivery

[OMPI devel] Multi-rail on openib

2009-06-05 Thread Mouhamed Gueye
Hi all, I am working on multi-rail IB and I was wondering how connections are established between ports. I have two hosts, each with 2 ports on the same IB card, connected to the same switch. My question is: how are the ports connected to each other? Is there a queue pair between all ports or o

[OMPI devel] Device failover in dr pml

2009-04-15 Thread Mouhamed Gueye
Hi all, We are currently working on the dr pml component and specifically on device failover. The failover mechanism seems to work fine across different components, but if we want to do it across different modules of the same component - say 2 Infiniband rails - the code seems to be broken. Actually,