[OMPI devel] Ideas for notifying completion of ompi-restart

2011-06-15 Thread Kishor Kharbas
Hello! I am working on some simulations where I have to perform periodic kill-restart and checkpointing on an MPI application. As a checkpoint can take place immediately after restart, I need some way to know whether ompi-restart of the application is complete. If I do not ensure that restart of a

[OMPI devel] MPI application hangs after a checkpoint

2011-06-07 Thread Kishor Kharbas
Hello, I am trying to use the checkpoint-restart functionality of Open MPI. Most of the time, checkpointing of the MPI application behaves correctly, but in some situations the MPI application hangs indefinitely after the checkpoint is taken. ompi-checkpoint terminates without error and I do get the snapsh

[OMPI devel] Open MPI Checkpoint-restart bug

2011-05-06 Thread Kishor Kharbas
orate them in the code in an appropriate way. Please let me know if more information is needed about this. Thank you. Kishor Kharbas Environment: Open MPI version: openmpi-1.5.3. 108-node cluster. All machines are 2-way SMPs with AMD Opteron 6128 (Magny-Cours) processors with 8 cores per socket (16

[OMPI devel] Open MPI error

2011-04-25 Thread Kishor Kharbas
Hello Developers, I am using Open MPI 1.5.3 for performing experiments with checkpoint and restart. However, when the number of nodes is more than 128, restart fails with a segmentation fault. After debugging the code, I found that the cause of this error is that variables of type int_8 are used