2015-10-26 8:04 GMT+01:00 Gilles Gouaillardet <gil...@rist.or.jp>:

> Federico,
>
> that looks good to me.
> The image does not show the channel between orted and its children.
> This is currently a TCP socket (v1.10) and we are moving to a Unix socket
> (already in master).

Which framework is involved in this communication? I'm not sure what this
channel is used for.

> Cheers,
>
> Gilles
>
> On 10/26/2015 3:28 PM, Federico Reghenzani wrote:
>
> Hi Gilles,
> thank you again for your great answer. Our idea is to migrate tasks
> between nodes, possibly individually, while the other tasks keep running
> (obviously, if they want to communicate with the "migrating" node, we
> should pause them).
>
> Just to be sure we have understood correctly, is the attached image
> accurate?
>
> Cheers,
> Federico
> __
> Federico Reghenzani
> M.Eng. Student @ Politecnico di Milano
> Computer Science and Engineering
>
> 2015-10-23 11:45 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>
>> Gianmario,
>>
>> IIRC, there is one pipe between orted and each child's stderr.
>> stdout is a pty, and stdin is /dev/null, but it might be a pipe on task 0.
>> This is how stdout/stderr from tasks end up being printed by mpirun:
>> orted does I/O forwarding (aka IOF).
>>
>> Are you trying to migrate only one task (while the other tasks still run),
>> or are you trying to checkpoint and restart on a different set of nodes?
>>
>> Typically, a task uses shared memory for intra-node communications, and
>> InfiniBand or TCP for inter-node communications.
>> So if you migrate only one task, and I assume you have no virtual shared
>> memory, then you need to notify its neighbors that they have to switch
>> from shm to ib/tcp.
>> At first glance, that is much harder than moving orted and its children:
>> you would "only" have to re-establish all connections and migrate the shm.
>> Also, orted assumes/needs its children to be running on the same node (they
>> use a session dir in /tmp, orted waits for SIGCHLD when its child dies, ...),
>> so if you migrate everything, you do not have to worry about that part.
>>
>> You might also want to consider some virtualization:
>> if a node is running in its own VM, or its own container with a virtual
>> IP, you could reuse existing infrastructure, at least to migrate orted and
>> its TCP/IP connections.
>>
>> Cheers,
>>
>> Gilles
>>
>> Federico Reghenzani <federico1.reghenz...@mail.polimi.it> wrote:
>> Hi Adrian and Gilles,
>>
>> first of all thank you for your responses. I'm working with Gianmario on
>> this ambitious project.
>>
>> 2015-10-22 13:17 GMT+02:00 Gilles Gouaillardet <gilles.gouaillar...@gmail.com>:
>>
>>> Gianmario,
>>>
>>> there was C/R support in the v1.6 series but it has been removed.
>>> The current trend is to do application-level checkpointing
>>> (much more efficient and much smaller checkpoint file size).
>>>
>>> IIRC, ompi took care of closing/restoring all communication, and a
>>> third-party checkpointer was required to checkpoint/restart
>>> *standalone* processes.
>>>
>>> Generally speaking, mpirun and orted communicate via TCP;
>>> orted and MPI tasks (intra-node comms) currently use TCP but we are
>>> moving to Unix sockets;
>>> MPI tasks communicate via btl (InfiniBand, TCP, shared memory, ...).
>>>
>> We have also seen that orted opens 2 pipes to each child, is that correct?
>> Does orted use them to communicate with its children?
>>
>>> IMHO, moving only one MPI task to another node is much harder, not to
>>> say impossible, than moving orted and its children MPI tasks to another
>>> node.
>>>
>> Mmm, may I ask why? I mean, if we migrate the entire orted we need to
>> close/reopen *mpirun-orted* and *task-task* (btl) sockets, and if we
>> migrate a single task we need to close/reopen *orted-task* and
>> *task-task* sockets. In both cases we have to broadcast the information
>> about the "changing location" of the task or orted.
>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Thursday, October 22, 2015, Gianmario Pozzi <pozzigma...@gmail.com>
>>> wrote:
>>>
>>>> Hi everyone!
>>>>
>>>> My team and I are working on the possibility of checkpointing a process
>>>> and restarting it on another node. We are using the CRIU framework for
>>>> the checkpoint/restart part, but we are facing some issues related to
>>>> migration.
>>>>
>>>> First of all: we found out that some attempts to C/R an OMPI process
>>>> have already been made in the past. Is anything related to that still
>>>> supported/available/working?
>>>>
>>>> Then, we need to know which network communications are used at any
>>>> time, in order to "pause" them during migrations (at least the ones
>>>> involving the migrating node). Our code analysis makes us think that:
>>>> - The Open MPI runtime (HNP<->orteds) uses orte/OOB
>>>> - Running applications exchange data via ompi/BTL
>>>>
>>>> Is that correct? If not, can someone give us a hint?
>>>>
>>>> Questions on how to update topology info may be yet to come.
>>>>
>>>> Thank you guys!
>>>>
>>>> Gianmario
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2015/10/18242.php
>>>
>>
>> Cheers,
>> Federico
>> __
>> Federico Reghenzani
>> M.Eng. Student @ Politecnico di Milano
>> Computer Science and Engineering
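
One way to make the single-task versus whole-node comparison in this thread concrete is a toy model of which transport each pair of tasks would use. The sketch below is only an illustration under simplifying assumptions: the function names, the node labels (n0, n1, n2) and the rule "shm on the same node, tcp otherwise" are hypothetical and are not Open MPI code or its actual BTL selection logic.

```python
from itertools import combinations


def transport(node_a, node_b):
    """Assumed rule for this sketch: shared memory on the same node, TCP otherwise."""
    return "shm" if node_a == node_b else "tcp"


def pair_transports(placement):
    """Map each unordered rank pair to its transport under a rank -> node placement."""
    return {
        (a, b): transport(placement[a], placement[b])
        for a, b in combinations(sorted(placement), 2)
    }


def transport_changes(before, after):
    """Return the rank pairs whose transport differs between two placements."""
    old, new = pair_transports(before), pair_transports(after)
    return {pair: (old[pair], new[pair]) for pair in old if old[pair] != new[pair]}


if __name__ == "__main__":
    # Four ranks on two nodes (hypothetical placement).
    before = {0: "n0", 1: "n0", 2: "n1", 3: "n1"}

    # Case 1: migrate only rank 1 from n0 to n1.
    one_task = dict(before)
    one_task[1] = "n1"

    # Case 2: migrate everything on n0 (orted and its children) to a new node n2.
    whole_node = {rank: ("n2" if node == "n0" else node) for rank, node in before.items()}

    print("single task:", transport_changes(before, one_task))
    print("whole node: ", transport_changes(before, whole_node))
```

In this model, moving a single task changes the transport for several of its neighbors, while moving every task of a node together leaves each pair on the same kind of transport. The connections would still have to be re-established in both cases; the model only counts changes in transport type.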