On Wed, Mar 31, 2010 at 7:39 PM, Addepalli, Srirangam V
<srirangam.v.addepa...@ttu.edu> wrote:
> Hello All.
> I am trying to checkpoint an MPI application that was started using the
> following mpirun command:
>
> mpirun -am ft-enable-cr -np 8 pw.x  < Ge46.pw.in > Ge46.ph.out
>
> ompi-checkpoint 31396 (works). However, when I try to terminate the process
>
> ompi-checkpoint  --term 31396  it never finishes.  How do I debug this issue?

ompi-checkpoint --term is exactly ompi-checkpoint + sending SIGTERM to your
app. If plain ompi-checkpoint finishes but the --term variant hangs, then
your app is not dealing with SIGTERM correctly.

Make sure you're not ignoring SIGTERM: you need to either handle it or
let it kill your app. If it's a multithreaded app, make sure you can
"distribute" the SIGTERM to ALL the threads, i.e., when you receive
SIGTERM, notify all other threads that they should join or quit.
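
To illustrate the multithreaded case, here is a minimal sketch in Python.
It is not taken from your application, and the same pattern applies in
whatever language your app is written in. All names in it
(stop_requested, handle_sigterm, worker, run, send_sigterm_to_self) are
hypothetical. It assumes a POSIX system and that run() is called from the
main thread, since Python only runs signal handlers there.

```python
import os
import signal
import threading
import time

# Hypothetical shared flag: set once SIGTERM arrives, polled by every thread.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Python runs signal handlers in the main thread.
    # Keep the handler tiny: just record that shutdown was requested.
    stop_requested.set()


def worker(results, index):
    # Stand-in for real work: do small units of work until told to stop.
    while not stop_requested.wait(0.05):
        pass  # one unit of work would go here
    results[index] = "stopped"


def send_sigterm_to_self():
    # Stand-in for an external `kill -TERM <pid>` aimed at this process.
    os.kill(os.getpid(), signal.SIGTERM)


def run(num_workers=4):
    # Install the handler before starting any threads.
    signal.signal(signal.SIGTERM, handle_sigterm)
    results = [None] * num_workers
    threads = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    # The main thread idles until the handler sets the flag.
    while not stop_requested.is_set():
        time.sleep(0.05)
    # "Distribute" the shutdown: every worker sees the flag and is joined.
    for t in threads:
        t.join()
    return results
```

If you call run() and then send the process a SIGTERM from another shell,
it should return once every worker has noticed the flag and been joined.
If one thread never checks the flag, the join blocks forever, which is the
same kind of hang you would see from the outside.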

Regards,
