(Sorry for the delay in replying, more below)

On Apr 12, 2010, at 6:36 AM, Hideyuki Jitsumoto wrote:

Hi Members,

I tried to use checkpoint/restart with Open MPI,
but I cannot get correct checkpoint data.
I prepared the execution environment as follows. The names in
parentheses are the output files attached to my next e-mail (because of
the mail size limit):

1. Installed BLCR and verified that it works correctly with "make check"
2. Ran ./configure with some parameters in the Open MPI source dir
(config.output / config.log)
3. Ran make and make install (make.output.2 / install.output.2)
4. Confirmed that mca_crs_blcr.[la|so] and mca_crs_self.[la|so] exist in
/${INSTALL_DIR}/lib/openmpi
5. Created ~/.openmpi/mca-params.conf (mca-params.conf)
6. Compiled NPB and ran it with -am ft-enable-cr
7. Invoked ompi-checkpoint <MPIRUN_PID>

As a result, I got the message "Checkpoint failed: no processes checkpointed."
(cr_test_cg)

It is unclear from the output what caused the checkpoint to fail. Can you turn on some verbose arguments and send me the output?

Put the following options in your ~/.openmpi/mca-params.conf:
#---------------
orte_debug_daemons=1
snapc_full_verbose=20
crs_base_verbose=10
opal_cr_verbose=10
#---------------
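If it helps, here is a minimal sketch of one way to append those four options from the shell. It assumes the file lives at the path named above and that nothing else in it sets the same parameters; the variable name conf is just for illustration.

```shell
# Per-user MCA parameter file, as named in the reply above.
conf="$HOME/.openmpi/mca-params.conf"

# Create the directory if it does not exist yet.
mkdir -p "$(dirname "$conf")"

# Append the verbose/debug settings (existing lines in the file are kept).
cat >> "$conf" <<'EOF'
orte_debug_daemons=1
snapc_full_verbose=20
crs_base_verbose=10
opal_cr_verbose=10
EOF

# Show what was written so it can be double-checked.
grep -E 'verbose|debug' "$conf"
```

Remember to remove these lines again once the problem is sorted out, since they make every run noisy.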



In addition, when I checked the ompi_info output as shown in your demo movie, I got "MCA crs: none (MCA v2.0, API v2.0, Component v1.4.1)" (open_info.output)

This is actually a known bug with ompi_info. I have a fix in the works, and it should be available soon. Until then, the ticket is linked below:
  https://svn.open-mpi.org/trac/ompi/ticket/2097


What should I do to get checkpointing working?
Any guidance in this regard would be highly appreciated.

Let's see what the verbose output tells us, and go from there. What version of BLCR are you using?
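One way to capture that output, as a rough sketch only: rerun steps 6 and 7 with stdout and stderr redirected to files. The benchmark path ./cg.B.4, the process count, the 10-second wait, and the log file names are all placeholders I made up, not taken from your setup, so adjust them to match your run.

```shell
# Placeholders (assumptions): adjust to your NPB binary and process count.
APP=./cg.B.4
NP=4

if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Step 6: start the job with checkpoint/restart enabled, in the background,
    # keeping everything mpirun and the daemons print.
    mpirun -np "$NP" -am ft-enable-cr "$APP" > mpirun.log 2>&1 &
    MPIRUN_PID=$!

    # Give the job a moment to start before asking for a checkpoint.
    sleep 10

    # Step 7: request the checkpoint and keep its output and exit code.
    status=0
    ompi-checkpoint "$MPIRUN_PID" > checkpoint.log 2>&1 || status=$?

    # Let the job run to completion so mpirun.log is complete.
    wait "$MPIRUN_PID" || true
else
    echo "mpirun or $APP not found; nothing was run"
    status=skipped
fi

echo "ompi-checkpoint status: $status"
```

The two log files (mpirun.log and checkpoint.log in this sketch) are what I would like to see.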

-- Josh


Thank you,
Hideyuki

--
Sincerely Yours,
Hideyuki Jitsumoto (jitum...@gsic.titech.ac.jp)
Tokyo Institute of Technology
Global Scientific Information and Computing center (Matsuoka Lab.)
