Congxian,

Thank you for creating the ticket and providing the relevant code. I’m curious why you don’t think the directory collision is a problem. What we observe is that one of the operator states is not included in the checkpoint, and its data is lost on restore. That’s a pretty serious problem, especially since Flink doesn’t log any error. People could be losing state silently.
Please let me know how I can best help diagnose this issue and drive the ticket forward. I’m happy to collect any relevant information.

Thanks,
Ning

> On Apr 23, 2019, at 2:54 AM, Congxian Qiu <qcx978132...@gmail.com> wrote:
>
> From the log message you given, the two operate share the same directory, and
> when snapshot, the directory will be deleted first if it
> exists(RocksIncrementalSnapshotStrategy#prepareLocalSnapshotDirectory).
>
> I did not find an issue for this problem, and I don’t thinks this is a
> problem of UUID generation problem, please check the path generation logic in
> LocalRecoveryDirectoryProviderImpl#subtaskSpecificCheckpointDirectory.
>
> I’ve created an issue for this problem.