Hello,
I've noticed that ompi-restart doesn't support the --rankfile option; it only supports --hostfile/--machinefile. Is there a reason --rankfile isn't supported?

Suppose you have a cluster without a shared file system. When a node fails, you transfer its checkpoint to a spare node and invoke ompi-restart. In 1.5, ompi-restart automagically handles this situation (if you supply a hostfile) and is able to restart the process, but I'm afraid it might not always be able to find the checkpoints this way.

If you could tell ompi-restart where the ranks are (and thus where the checkpoints are), then maybe restart would always work, as long as you've specified the location of the checkpoints correctly, or maybe ompi-restart would be faster.

Regards,