It seems that there may be multiple issues. The parfile I sent before tests for NaNs in grid::x, which is not a checkpointed variable. With manual topology, grid::x is filled with NaNs during the recover step (the array pointer actually points to a new area of memory). With the standard topology, neither the array pointer nor the array contents change on recovery. I have also seen NaNs in the recovered variables, but this parfile does not reproduce that.
On 9/9/19 4:24 PM, Yosef Zlochower wrote:
> Hi,
>
> I have been trying to debug why some runs I was performing could not
> recover from a checkpoint file, but would otherwise proceed as normal.
>
> I attached a minimalist parfile showing the problem. A small grid is
> manually distributed over 8 processors and terminates at iteration 2. An
> attempt at recover fails with nans on grid::x. If the manual topology
> section is commented out, no problems are seen.
>
>
> _______________________________________________
> Users mailing list
> [email protected]
> http://lists.einsteintoolkit.org/mailman/listinfo/users
