On Feb 10, 2010, at 9:45 AM, Addepalli, Srirangam V wrote:

> I am trying to test orte-checkpoint with an MPI job. However, it hangs for 
> all jobs. This is how the job is started:
> mpirun -np 8 -mca ft-enable cr /apps/nwchem-5.1.1/bin/LINUX64/nwchem 
> siosi6.nw 

This might be the problem, if it wasn't a typo. The command line flag is "-am 
ft-enable-cr" not "-mca ft-enable cr". The former activates a set of MCA 
parameters (in the AMCA file 'ft-enable-cr'). The latter should be ignored by 
the MCA system.
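
For reference, here is a minimal sketch of the corrected launch line. It assumes the same process count, binary path, and input file as in the original command; the variable names (APP, INPUT, LAUNCH) are only illustrative. The script prints the command rather than running it, so it can be checked before use.

```shell
#!/bin/sh
# Assumption: same binary and input file as in the original command.
APP=/apps/nwchem-5.1.1/bin/LINUX64/nwchem
INPUT=siosi6.nw

# "-am ft-enable-cr" selects the AMCA parameter file 'ft-enable-cr',
# replacing the "-mca ft-enable cr" from the original command line.
LAUNCH="mpirun -np 8 -am ft-enable-cr $APP $INPUT"

# Print the command for review; run it by hand once it looks right.
echo "$LAUNCH"
```

Once the job is running with that flag, the same ompi-checkpoint command as before can be retried against the PID of the new mpirun.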

Give that a try and let us know if the behavior changes.

-- Josh

> From another terminal I run orte-checkpoint:
> 
> ompi-checkpoint -v --term 9338
> [compute-19-12.local:09377] orte_checkpoint: Checkpointing...
> [compute-19-12.local:09377]      PID 9338
> [compute-19-12.local:09377]      Connected to Mpirun [[5009,0],0]
> [compute-19-12.local:09377]      Terminating after checkpoint
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Contact Head Node 
> Process PID 9338
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Requested a 
> checkpoint of jobid [INVALID]
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Receive a command 
> message.
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Status Update.
> 
> 
> Is there any way to debug the issue to get more information or log messages?
> 
> Rangam

