Re: [OMPI devel] Ideas for notifying completion of ompi-restart

2011-06-16 Thread Josh Hursey
So the HNP/mpirun knows when the job is fully restarted. The code for that is at orte/mca/snapc/full/snapc_full_global.c:1758. This should prevent ompi-checkpoint from starting a checkpoint before the restart is complete. I suspect those are the errors that you are talking about. Since you are

[OMPI devel] Ideas for notifying completion of ompi-restart

2011-06-15 Thread Kishor Kharbas
Hello! I am working on some simulations where I have to perform periodic kill-restart and checkpointing on an MPI application. As a checkpoint can take place immediately after restart, I need some way to know whether ompi-restart of the application is complete. If I do not ensure that restart of a