Hello,

Three things...

1) Josh, the main developer for checkpoint/restart, has been away for a few weeks and has just returned. I suspect he will get unburied from e-mail in another day or two.

2) The 1.4 (and 1.3) branch is under rapid development, and there will be times when basic functionality just breaks for a day or so. If you run into a problem, please be specific about which version (including the r#) you tried.

3) The checkpoint/restart functionality currently supports only a subset of the network transports. I think all you should expect to work right now is TCP and shared memory. Josh is working on other transports, but those are very much a "work in progress".

On Wed, Aug 20, 2008 at 4:11 AM, Matthias Hovestadt <m...@cs.tu-berlin.de> wrote:
> Hi Gabriele!
>
>> In this case, mpirun works well, but the checkpoint procedure fails:
>>
>> ompi-checkpoint 20109
>> [node0316:20134] Error: Unable to get the current working directory
>> [node0316:20134] [[42404,0],0] ORTE_ERROR_LOG: Not found in file
>> orte-checkpoint.c at line 395
>> [node0316:20134] HNP with PID 20109 Not found!
>
> I had exactly the same problem on my machine. Neither modifying
> the configure parameters nor the way of invoking the ompi-checkpoint
> command did help. Since I am using the source from subversion checkout,
> I also updated the source several times, following the day to day
> progress. However, this problem remained.
>
> Luckily, updating the source to SVN revision 19265 finally solved
> this checkpointing issue. Maybe the problem shows up again in later
> versions...
>
>
> Best,
> Matthias
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/