Gleb,
    I am trying to use BLCR as well. Which versions of Open MPI, OFED, and
BLCR are you using? I can get a serial checkpoint/restart to work, but not
the parallel case. I built my system using OFED 1.3.1, Open MPI 1.3.1, and
BLCR 0.8.1-1. I also used the same BLCR configuration options for Open MPI
that you did.
             Thanks,
              Pat


J.W. (Pat) O'Bryant,Jr.
Business Line Infrastructure
Technical Systems, HPC



From: Gleb "Crazy Sage" Igumnov <crazy.sage@gmail.com>
Sent by: users-bounces@open-mpi.org
To: Gleb Igumnov <crazy.s...@gmail.com>
cc: us...@open-mpi.org
Date: 06/10/09 12:06 PM
Subject: Re: [OMPI users] Problems with Open MPI/BLCR checkpoint/restart routine.
Please respond to: Gleb "Crazy Sage" Igumnov <crazy.sage@gmail.com>; Open MPI Users <users@open-mpi.org>

Fixed this: not all of the required paths were set in the environment variables. Sorry.
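
For anyone hitting the same error, here is a minimal sketch of a pre-restart check, under some assumptions. The tool list is only illustrative (the Open MPI and BLCR command-line programs I would expect the restart to need), the helper names `missing_tools` and `dirs_not_listed` are made up for this example, and `/path/to/blcr/lib` is a placeholder for wherever the BLCR libraries actually live. Run it on the restart node.

```python
import os
import shutil

# Assumed list of programs the restart relies on; adjust for your install.
REQUIRED_TOOLS = ["ompi-restart", "ompi-checkpoint", "cr_restart", "cr_checkpoint"]


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that cannot be found as executables on `path`.

    `path` is a PATH-style string; it defaults to the current PATH.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def dirs_not_listed(dirs, variable_value):
    """Return the entries of `dirs` that are absent from a PATH-style value."""
    listed = {
        os.path.normpath(entry)
        for entry in variable_value.split(os.pathsep)
        if entry
    }
    return [d for d in dirs if os.path.normpath(d) not in listed]


if __name__ == "__main__":
    print("Not found on PATH:", missing_tools())

    # Placeholder directory; replace with the real BLCR library location.
    blcr_lib_dirs = ["/path/to/blcr/lib"]
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("Not in LD_LIBRARY_PATH:", dirs_not_listed(blcr_lib_dirs, ld_path))
```

If either list comes back non-empty, the environment is the first thing to fix before looking further.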

> Hello. I've got the following problem. I ran an MPI program and
> successfully checkpointed it with BLCR.
> But now, when I try to restart it using ompi-restart -v
> ompi_global_snapshot_7190.ckpt, I get the following message:

> [umu2:07572] Checking for the existence of (/root/ompi_global_snapshot_7190.ckpt)
> [umu2:07572] Restarting from file (ompi_global_snapshot_7190.ckpt)
> [umu2:07572]     Exec in self
> --------------------------------------------------------------------------
> Error: Unable to obtain the proper restart command to restart from the
>        checkpoint file (ompi_global_snapshot_7190.ckpt). Returned -1.
> --------------------------------------------------------------------------


> Both Open MPI and BLCR are installed in a shared NFS directory, and the
> BLCR directories are included in the PATH and LD_LIBRARY_PATH variables
> on the restart node.
> Open MPI was initially configured with the options
>  --with-ft=cr --enable-ft-thread --enable-mpi-thread
>  --with-blcr=/path/to/blcr
>
> The program was run with -am ft-enable-cr.
> What could cause this problem?
> --------------------------------------------
> With best regards
> Gleb "Crazy Sage" Igumnov
> mailto:crazy.s...@gmail.com





--
With best regards,
 Gleb "Crazy Sage" Igumnov
mailto:crazy.s...@gmail.com


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
