Hello!
I am working on some simulations where I have to perform periodic
kill-restart and checkpointing on an MPI application.
As a checkpoint can take place immediately after a restart, I need some way to
know whether ompi-restart of the application is complete.
If I do not ensure that restart of a
Hello,
I am trying to use the checkpoint-restart functionality of Open MPI. Most of the
time, checkpointing of the MPI application behaves correctly, but in some
situations the MPI application hangs indefinitely after the checkpoint is
taken. ompi-checkpoint terminates without error and I do get the snapsh
orate them in the
code in an appropriate way.
Please let me know if more information is needed about this.
Thank you.
Kishor Kharbas
Environment:
Open MPI version: openmpi-1.5.3
108-node cluster. All machines are 2-way SMPs with AMD Opteron 6128
(Magny-Cours) processors with 8 cores per socket (16
Hello Developers,
I am using Open MPI 1.5.3 for performing experiments with checkpoint and
restart.
However, when the number of nodes is more than 128, restart fails with a
segmentation fault.
After debugging the code, I found that the cause of this error is that
variables of type int_8 are used