A bug prevented ompi-checkpoint from finding the correct location in the session directory for mpirun's contact file. This was fixed in r19265, so you should no longer see the problem.

On Aug 20, 2008, at 2:11 AM, Matthias Hovestadt wrote:

Hi Gabriele!

In this case, mpirun works well, but the checkpoint procedure fails:
ompi-checkpoint 20109
[node0316:20134] Error: Unable to get the current working directory
[node0316:20134] [[42404,0],0] ORTE_ERROR_LOG: Not found in file orte-checkpoint.c at line 395
[node0316:20134] HNP with PID 20109 Not found!

I had exactly the same problem on my machine. Neither modifying
the configure parameters nor changing the way I invoked the
ompi-checkpoint command helped. Since I am using the source from a
Subversion checkout, I also updated the source several times, following
the day-to-day progress. However, the problem remained.

Luckily, updating the source to SVN revision 19265 finally solved
this checkpointing issue. Maybe the problem shows up again in later
versions...


Best,
Matthias
