Hi all,

I went through some earlier ompi-restart issues, but I couldn't
find anything similar to this problem.

I have installed BLCR and configured Open MPI 'openmpi-1.3a1r19645'.

i) If the sample MPI program is run on a single machine (np 4, without
any hostfile) and I checkpoint it, the checkpoint succeeds and
ompi-restart also works.

ii) If the sample MPI program is run across, say, 2 different nodes,
the checkpoint succeeds, BUT ompi-restart throws the following
error:

[audhakne@acl-cadi-pentd-1 ~]$ ompi-restart ompi_global_snapshot_7604.ckpt
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 9590 on node
acl-cadi-pentd-1.cse.buffalo.edu exited on signal 11 (Segmentation
fault).
--------------------------------------------------------------------------

Please let me know if more information is needed.

-- 
Thanks and Regards,
Arun U. Dhakne
