Re: [OMPI devel] OMPI devel] Checkpoint/restart + migration

2015-11-13 Thread Federico Reghenzani
2015-10-26 8:04 GMT+01:00 Gilles Gouaillardet : > Federico, > > that looks good to me. > The image does not show the channel between orted and its children. > This is currently a TCP socket (v1.10) and we are moving to a Unix socket > (already in master) > > Which is the framework involved in this

Re: [OMPI devel] OMPI devel] Checkpoint/restart + migration

2015-10-26 Thread Gilles Gouaillardet
Federico, that looks good to me. The image does not show the channel between orted and its children. This is currently a TCP socket (v1.10) and we are moving to a Unix socket (already in master). Cheers, Gilles On 10/26/2015 3:28 PM, Federico Reghenzani wrote: Hi Gilles, thank you again fo

Re: [OMPI devel] OMPI devel] Checkpoint/restart + migration

2015-10-26 Thread Federico Reghenzani
Hi Gilles, thank you again for your great answer. Our idea is to migrate tasks between nodes, possibly individually, while the other tasks keep running (obviously, if they want to communicate with the "migrating" node, we should pause them). Just to be sure we have understood correctly, is the attached i

Re: [OMPI devel] OMPI devel] Checkpoint/restart + migration

2015-10-23 Thread George Bosilca
Each module has the opportunity to provide an ft_event function that is supposedly called when a change in the module behavior is necessary. Thus, it is relatively easy to let the BTL know that a particular destination process will migrate to a new location. George. On Fri, Oc
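
The callback idea George describes can be illustrated with a small Python sketch. This is not Open MPI code: the `Module` class, the `"migrate"` event name, the `peers` table, and `notify_all` are all hypothetical names chosen for illustration, and the real ft_event hook is a C function whose signature is not shown in this thread. The sketch only shows the pattern of each module deciding for itself how to react when a peer moves.

```python
class Module:
    """Hypothetical stand-in for a transport module (e.g. a BTL)."""

    def __init__(self, name):
        self.name = name
        self.peers = {}  # rank -> current address of that peer

    def ft_event(self, event, rank=None, new_address=None):
        """React to a fault-tolerance event; return True if the module changed state."""
        if event == "migrate" and rank in self.peers:
            # Redirect future traffic for this rank to its new location.
            self.peers[rank] = new_address
            return True
        return False


def notify_all(modules, event, **kwargs):
    """Call ft_event on every module and return the names of those that reacted."""
    return [m.name for m in modules if m.ft_event(event, **kwargs)]
```

For example, if a module named `tcp` knows rank 3 and a module named `sm` does not, `notify_all([tcp, sm], "migrate", rank=3, new_address="nodeB")` updates only the `tcp` peer table.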

Re: [OMPI devel] OMPI devel] Checkpoint/restart + migration

2015-10-23 Thread Gilles Gouaillardet
Gianmario, IIRC, there is one pipe between orted and each child's stderr. stdout is a pty, and stdin is /dev/null, but it might be a pipe on task 0. This is how stdout/stderr from the tasks end up being printed by mpirun: orted does I/O forwarding (aka IOF). Are you trying to migrate only one ta
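
The stdio wiring Gilles describes can be approximated with a short Python sketch. It is an illustration under stated assumptions, not how orted is implemented: `run_and_forward` is a hypothetical helper, and stdout is captured through a plain pipe here instead of a pty to keep the example portable. What it does show is a parent process that gives the child /dev/null as stdin, owns a pipe to the child's stderr, and collects the output so it can forward it.

```python
import os
import subprocess


def run_and_forward(argv):
    """Launch argv with stdin on /dev/null and stdout/stderr captured by the parent.

    Returns (exit_code, stdout_bytes, stderr_bytes).
    """
    with open(os.devnull, "rb") as devnull:
        child = subprocess.Popen(
            argv,
            stdin=devnull,            # child reads EOF immediately
            stdout=subprocess.PIPE,   # simplification: a pipe, not a pty
            stderr=subprocess.PIPE,   # one pipe per child for stderr
        )
        # communicate() drains both pipes without deadlocking and waits for exit.
        out, err = child.communicate()
    return child.returncode, out, err
```

A forwarding daemon would then write the captured bytes to its own output, typically tagged with the rank of the task they came from.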