Congxian,

Thank you for creating the ticket and providing the relevant code. I’m curious 
why you don’t think the directory collision is a problem. What we observe 
is that the state of one operator is not included in the checkpoint, and its 
data is lost on restore. That’s a pretty serious problem, especially since 
Flink doesn’t log any error. People could be silently losing state.

Please let me know how I can best help diagnose this issue and drive the ticket 
forward. I’m happy to collect any relevant information.

Thanks,

—
Ning

> On Apr 23, 2019, at 2:54 AM, Congxian Qiu <qcx978132...@gmail.com> wrote:
> 
> From the log message you gave, the two operators share the same directory, 
> and when a snapshot is taken, the directory is deleted first if it exists 
> (RocksIncrementalSnapshotStrategy#prepareLocalSnapshotDirectory).
> 
> I did not find an existing issue for this problem, and I don’t think it is 
> a UUID generation problem. Please check the path generation logic in 
> LocalRecoveryDirectoryProviderImpl#subtaskSpecificCheckpointDirectory.
> 
> I’ve created an issue for this problem.
