Hello everyone,

I have a question about checkpoint overhead in Open MPI, which I measure as the difference between the application's runtime with and without checkpointing. I observe that as the data size and the number of processes increase, the time spent in BLCR is very small compared to the overall checkpoint overhead in Open MPI. Is this because the coordination time for the checkpoint increases? And what else is included in the overall checkpoint overhead besides BLCR's checkpoint time and the coordination time?

Thank you.
Best regards,
Nguyen Toan