Hello,
Three things...
1) Josh, the main developer for checkpoint/restart, has been away for a few
weeks and has just returned. I suspect he will get unburied from e-mail in
another day or two.

2) The 1.4 (and 1.3) branches are very much under rapid development, and
there will be times when basic functionality just breaks for a day or so.
If you run into a problem, please be specific about which version you tried
(include the SVN revision number, r#).

3) The checkpoint/restart functionality currently supports only a subset of
the network transports. I think all you should expect to work right now is
TCP and shared memory. Josh is working on other transports, but those are
very much a "work in progress".

On Wed, Aug 20, 2008 at 4:11 AM, Matthias Hovestadt
<m...@cs.tu-berlin.de> wrote:
> Hi Gabriele!
>
>> In this case, mpirun works well, but the checkpoint procedure fails:
>>
>> ompi-checkpoint 20109
>> [node0316:20134] Error: Unable to get the current working directory
>> [node0316:20134] [[42404,0],0] ORTE_ERROR_LOG: Not found in file
>> orte-checkpoint.c at line 395
>> [node0316:20134] HNP with PID 20109 Not found!
>
> I had exactly the same problem on my machine. Neither modifying
> the configure parameters nor changing the way I invoked the
> ompi-checkpoint command helped. Since I am using the source from a
> Subversion checkout, I also updated the source several times, following
> the day-to-day progress. However, the problem remained.
>
> Luckily, updating the source to SVN revision 19265 finally solved
> this checkpointing issue. Maybe the problem will show up again in later
> versions...
>
>
> Best,
> Matthias

-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/
