On Feb 10, 2010, at 9:45 AM, Addepalli, Srirangam V wrote:

> I am trying to test orte-checkpoint with an MPI job. However, it hangs for
> all jobs. This is how I start the job:
>
> mpirun -np 8 -mca ft-enable cr /apps/nwchem-5.1.1/bin/LINUX64/nwchem siosi6.nw

This might be the problem, if it wasn't a typo. The command-line flag is
"-am ft-enable-cr", not "-mca ft-enable cr". The former activates a set of MCA
parameters (those in the AMCA file 'ft-enable-cr'). The latter should be
ignored by the MCA system.

Give that a try and let us know if the behavior changes.

-- Josh

> From another terminal I try orte-checkpoint:
>
> ompi-checkpoint -v --term 9338
> [compute-19-12.local:09377] orte_checkpoint: Checkpointing...
> [compute-19-12.local:09377] PID 9338
> [compute-19-12.local:09377] Connected to Mpirun [[5009,0],0]
> [compute-19-12.local:09377] Terminating after checkpoint
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Contact Head Node Process PID 9338
> [compute-19-12.local:09377] orte_checkpoint: notify_hnp: Requested a checkpoint of jobid [INVALID]
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Receive a command message.
> [compute-19-12.local:09377] orte_checkpoint: hnp_receiver: Status Update.
>
> Is there any way to debug the issue to get more information or log messages?
>
> Rangam
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users