Hey, I'm trying the newly released Open MPI 1.3 in conjunction with BLCR to provide checkpoint/restart support.
I configured it with:

    ./configure --prefix=/usr/local --with-ft=cr --enable-ft-thread --enable-mpi-threads --with-blcr=/

An MPI job on a single machine (several threads) checkpoints and restarts without problems. Checkpointing an MPI job across two hosts (Ethernet, TCP) also completes without warnings or errors (the home directory and the directory containing the MPI application are shared over NFS).

The restart works too, but all threads are started only on the host where I enter the ompi-restart command. Even if I add the -hostfile argument to ompi-restart, only that one host is used.

Does anybody have a hint?

Thanks,
Gregor