Re: [OMPI devel] new btl

2011-06-16 Thread George Bosilca
If there is no need to support multiple devices simultaneously, no need for checksums, and no need for some level of fault-tolerance support, then an MTL will be enough. george. On Jun 16, 2011, at 09:47, Peter Kjellström wrote: > On Tuesday, June 14, 2011 06:25:52 PM Jeff Squyres wrote: >> Thank

Re: [OMPI devel] new btl

2011-06-16 Thread Jeff Squyres
On Jun 16, 2011, at 3:47 AM, Peter Kjellström wrote: >> I should say that if anyone is contemplating writing a new BTL, I'm happy >> to get on the phone / webex with you for an intro to the OMPI code base, >> point you in the right direction, etc. Ping me on/off list and we can >> set up a time. >

Re: [OMPI devel] Ideas for notifying completion of ompi-restart

2011-06-16 Thread Josh Hursey
So the HNP/mpirun knows when the job is fully restarted. The code for that is at: orte/mca/snapc/full/snapc_full_global.c:1758 This should prevent ompi-checkpoint from starting a checkpoint before the restart is complete. I suspect those are the errors that you are talking about. Since you are

Re: [OMPI devel] Fake Modex

2011-06-16 Thread Hugo Meyer
Hello. Thanks for your answers. It's as you said, Josh: I'm trying to do something uncoordinated and on demand. What I'm doing now is putting some code in btl_tcp_endpoint.c and other files that allows me to change the communication attempts on the sockets when a failure occurs. At the mom

Re: [OMPI devel] new btl

2011-06-16 Thread Peter Kjellström
On Tuesday, June 14, 2011 06:25:52 PM Jeff Squyres wrote: > Thanks Tim! > > I should say that if anyone is contemplating writing a new BTL, I'm happy > to get on the phone / webex with you for an intro to the OMPI code base, > point you in the right direction, etc. Ping me on/off list and we can