So the HNP/mpirun knows when the job is fully restarted. The code for
that is at:
orte/mca/snapc/full/snapc_full_global.c:1758
This should prevent ompi-checkpoint from starting a checkpoint before
the restart is complete. I suspect those are the errors that you are
talking about.
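The general idea can be shown with a small, generic sketch. mpirun records when every process of the restarted job has finished restarting, and it holds checkpoint requests until that point. This is not Open MPI's implementation, which lives in the snapc full component cited above. RestartGate, report_restarted and request_checkpoint are hypothetical names chosen for illustration, and the per-rank reporting is an assumption.

```python
import threading


class RestartGate:
    """Hypothetical sketch (not Open MPI code): hold checkpoint requests until
    every process of a restarted job has reported that it finished restarting."""

    def __init__(self, num_procs: int) -> None:
        self._num_procs = num_procs
        self._reported = set()            # ranks that have finished restarting
        self._lock = threading.Lock()
        self._restarted = threading.Event()

    def report_restarted(self, rank: int) -> None:
        # Record one process as restarted; open the gate once all have reported.
        with self._lock:
            self._reported.add(rank)
            if len(self._reported) == self._num_procs:
                self._restarted.set()

    def is_restarted(self) -> bool:
        return self._restarted.is_set()

    def request_checkpoint(self, timeout=None) -> bool:
        # Block until the whole job is restarted. False means the wait timed
        # out and no checkpoint should be started yet.
        return self._restarted.wait(timeout)


if __name__ == "__main__":
    gate = RestartGate(num_procs=4)
    for rank in range(3):
        gate.report_restarted(rank)
    print(gate.request_checkpoint(timeout=0.1))   # rank 3 has not reported yet
    threading.Timer(0.1, gate.report_restarted, args=(3,)).start()
    print(gate.request_checkpoint(timeout=5.0))   # succeeds once rank 3 reports
```

In this sketch a checkpoint request made too early either waits or times out. It is never allowed to start against a partially restarted job.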
Since you are
Hello!
I am working on some simulations where I have to perform periodic
kill-restart and checkpointing on an MPI application.
As a checkpoint can take place immediately after a restart, I need some way to
know whether ompi-restart has finished restarting the application.
If I do not ensure that restart of a