Thanks Josh and Jeff.
I have already read "The Design and Implementation of Checkpoint/Restart
Process Fault Tolerance for Open MPI" and I'm going to read "A Composable
Runtime Recovery Policy Framework Supporting Resilient HPC Applications" now, and
then I will take a look at the code of the componen
So I can point you to some of the work that I did while at Indiana University
to support process migration in Open MPI in a coordinated manner. This should
introduce you to some of the internal pieces that fit together to provide this
support.
The transparent C/R in Open MPI webpage from IU is
Thanks for the reply and don't worry about the delay.
Yeah, I suppose it wouldn't be easy :(.
But my final goal is what you are mentioning: to stop one particular
process (previously checkpointed), then migrate it to another place (node,
core, slot, etc.) and restart it there, but without mak
Sorry for the delay; you wrote while many of us were on vacation and we're just
now starting to catch up on past mails...
I'm not entirely sure what you're trying to do. It sounds like you're trying
to replace one process with another. That's quite complicated; there will be a
lot of changes
Hello to all.
I'm new to the forum; at least, this is the first time I have written.
I'm working with Open MPI and I would like to do a little experiment: I will try to
replace one process with another process.
For example, assume that there are 2 processes that are communicating, say
ranks 1 and 2. And there is a proces