Currently, ompi-restart does not know how to deal with an absolute or relative path in the command-line argument for the global snapshot handle. It always prepends the value of the MCA parameter:
  snapc_base_global_snapshot_dir
which defaults to $HOME.

So what you are seeing is (currently) to be expected. If you set the MCA parameter to the directory that contains the snapshot, and pass only the snapshot handle as the argument to ompi-restart, then it should work (something like the below):
  ompi-restart -mca snapc_base_global_snapshot_dir $HOME ompi_global_snapshot_7056.ckpt
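To illustrate the workaround, here is a minimal Python sketch that splits a full snapshot path into the directory (for the MCA parameter) and the bare handle, and builds the corresponding command line. The helper name restart_command and the path /scratch/ckpt are hypothetical, chosen only for the example; the sketch only assembles the arguments and does not run ompi-restart.

```python
import os
import shlex


def restart_command(snapshot_path):
    """Build an ompi-restart argument list from a full snapshot path.

    Hypothetical helper: it only splits the path so that the directory
    goes into snapc_base_global_snapshot_dir and the last component is
    passed as the global snapshot handle.
    """
    # abspath also strips a trailing slash, so split() yields the handle.
    path = os.path.abspath(os.path.expanduser(snapshot_path))
    base_dir, handle = os.path.split(path)
    return ["ompi-restart",
            "-mca", "snapc_base_global_snapshot_dir", base_dir,
            handle]


if __name__ == "__main__":
    # /scratch/ckpt is an assumed example location, not from the thread.
    cmd = restart_command("/scratch/ckpt/ompi_global_snapshot_7056.ckpt/")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to subprocess.run if you want to launch the restart from a script.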

I opened a bug to add this capability to orte-restart. You can track it at the link below:
https://svn.open-mpi.org/trac/ompi/ticket/1924

I am not 100% sure when I will have a chance to get to it, but hopefully in the next few weeks.

As a side note, if you want to move the global snapshot directory to another location you will need to update the 'global_snapshot_meta.data' file located at the root of the global snapshot directory to reflect the path changes for the 'Snapshot Location:' key.
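If you need to do this for more than one snapshot, the metadata update can be scripted. The Python sketch below is only an illustration: it assumes the metadata file is plain text with the key and its value on the same line, which I have not spelled out above, so check your own 'global_snapshot_meta.data' first. The function name relocate_snapshot_metadata and the example paths are hypothetical.

```python
KEY = "Snapshot Location:"


def relocate_snapshot_metadata(meta_path, old_dir, new_dir):
    """Rewrite 'Snapshot Location:' values that start with old_dir.

    Assumption: each entry is a single text line of the form
    '<anything>Snapshot Location: <path>'. Other lines are left alone.
    Returns the number of lines that were changed.
    """
    with open(meta_path) as f:
        lines = f.readlines()

    changed = 0
    updated = []
    for line in lines:
        idx = line.find(KEY)
        if idx != -1:
            prefix = line[:idx + len(KEY)]
            value = line[idx + len(KEY):].strip()
            if value.startswith(old_dir):
                # Swap only the leading directory, keep the rest of the path.
                value = new_dir + value[len(old_dir):]
                ending = "\n" if line.endswith("\n") else ""
                line = prefix + " " + value + ending
                changed += 1
        updated.append(line)

    with open(meta_path, "w") as f:
        f.writelines(updated)
    return changed
```

Keep a backup copy of the metadata file before running something like this, since the sketch overwrites it in place.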

Cheers,
Josh

On May 14, 2009, at 12:49 PM, Bouguerra mohamed slim wrote:

Hello,
I think there is a problem with ompi-restart in release r-21197: ompi-restart can restart only if the checkpoint directory is $HOME.
For example, suppose the checkpoint folder is $HOME.
If I try ompi-restart -i $HOME/ompi_global_snapshot_7056.ckpt/ it does not work, and I get:

msbouguerra@sol-5:~$ ompi-restart -i $HOME/ompi_global_snapshot_7056.ckpt/
--------------------------------------------------------------------------
Error: The filename (/home/grenoble/msbouguerra/ompi_global_snapshot_7056.ckpt/) is invalid because either you have not provided a filename
     or provided an invalid filename.
     Please see --help for usage.

--------------------------------------------------------------------------


And when I try ompi-restart -i ompi_global_snapshot_7056.ckpt/ it works, and I get:


msbouguerra@sol-5:~$ ompi-restart -i ompi_global_snapshot_7056.ckpt/
[sol-5.sophia.grid5000.fr:07466] Sequences: 1
[sol-5.sophia.grid5000.fr:07466] Seq: 0
[sol-5.sophia.grid5000.fr:07466] Begin Timestamp: Thu May 14 18:23:00 2009
[sol-5.sophia.grid5000.fr:07466] OPAL CRS Component: blcr
[sol-5.sophia.grid5000.fr:07466] Snapshot Reference: ompi_global_snapshot_7056.ckpt/
[sol-5.sophia.grid5000.fr:07466] Snapshot Location: /home/grenoble/msbouguerra/ompi_global_snapshot_7056.ckpt
[sol-5.sophia.grid5000.fr:07466] End Timestamp: Thu May 14 18:23:00 2009
[sol-5.sophia.grid5000.fr:07466] Processes: 4

msbouguerra@sol-5:~$

So when I use another folder as the checkpoint directory, the restart fails.


--
Cordialement,
Mohamed-Slim BOUGUERRA    PhD student INRIA-Grenoble / Projet MOAIS
ENSIMAG - antenne de Montbonnot
ZIRST 51, avenue Jean Kuntzmann
38330 MONTBONNOT SAINT MARTIN France
Tel :+33 (0)4 76 61 20 79
Fax :+33 (0)4 76 61 20 99
