Hi mates, while using BucketingSink I decided to enable checkpointing, so that files are not left in an open state when the job fails. But it seems I have not properly understood what checkpointing means.
I've enabled the fs backend for checkpoints, and while the job is running everything works fine: a file with the state is created, and if I kill the TaskManager, the state is restored. But when I kill the whole job and run it again, the state from the last checkpoint is not used, and a new state is created instead.

If I understood correctly, checkpoint state is used by the JobManager only while the job is running, and if I want to cancel or kill the job and resume it later, I should use savepoints.

So I have the following questions:

- Are my assumptions about how checkpoint and savepoint state is used correct?
- When I create a savepoint, can only HDFS be used as the backend?
- When I use RocksDB, can it only be used as a checkpointing backend, and if I decide to create a savepoint, will it be stored in HDFS?
- Is there any way to configure the job to use the last checkpoint as its starting state out of the box?

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

email: r.shari...@cleverdata.ru
mobile: +7 (925) 416-37-26