Hi mates, while working with BucketingSink, I've decided to enable
checkpointing to prevent files from hanging in the open state on job failure.
But it seems that I haven't properly understood the meaning of checkpointing…

I've enabled the fs backend for checkpoints, and while the job is running
everything works fine: the file with the state is created, and if I kill the
taskmanager, the state is restored.
But when I kill the whole job and run it again, the state from the last
checkpoint isn't used, and a new state is created instead.
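For reference, the setup described above might look roughly like the following minimal sketch. It assumes the Flink 1.3/1.4-era DataStream API with the flink-connector-filesystem dependency on the classpath; the class name, the 60-second interval, and the local paths are hypothetical placeholders (a cluster setup would normally point at hdfs:// URIs).

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

public class CheckpointedBucketingJob {

    // Hypothetical locations; replace with real hdfs:// URIs on a cluster.
    static final String CHECKPOINT_URI = "file:///tmp/flink/checkpoints";
    static final String OUTPUT_PATH = "/tmp/flink/bucketing-output";

    /** Enables periodic checkpoints and stores their state via the filesystem backend. */
    static void configure(StreamExecutionEnvironment env) {
        // Trigger a checkpoint every 60 s. With checkpointing enabled, BucketingSink
        // moves pending part files to their final state when a checkpoint completes.
        env.enableCheckpointing(60_000L);
        env.setStateBackend(new FsStateBackend(CHECKPOINT_URI));
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        configure(env);
        env.fromElements("a", "b", "c")
           .addSink(new BucketingSink<String>(OUTPUT_PATH));
        env.execute("checkpointed-bucketing-sketch");
    }
}
```

Note that these checkpoints are owned by the running job: by default they are cleaned up when the job is cancelled, which matches the behaviour described above.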

If I understand correctly, checkpoint state is used by the job manager while the
job is running, and if I want to cancel/kill the job, I should use
savepoints.

So I have the following questions:

1. Are my assumptions about checkpoint/savepoint state usage correct?
2. When I create a savepoint, can only HDFS be used as a backend?
3. When I use RocksDB, can it only be used as a checkpointing backend, so that
   when I decide to create a savepoint, it will be stored in HDFS?
4. Is there any way to configure the job to use the last checkpoint as its
   starting state out of the box?
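Regarding the last question, Flink 1.3/1.4 offers externalized checkpoints, which can be retained when a job is cancelled. A minimal sketch, assuming that API version; the class name and interval are hypothetical, and in those versions the `state.checkpoints.dir` key must also be set in flink-conf.yaml so that the checkpoint metadata has somewhere to be written.

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointsSketch {

    /** Call from the job's main method before env.execute(...). */
    static void configure(StreamExecutionEnvironment env) {
        // Hypothetical 60 s checkpoint interval.
        env.enableCheckpointing(60_000L);
        // Keep the latest completed checkpoint when the job is cancelled instead of
        // deleting it, so that it can serve as the starting state of a new submission.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    }
}
```

After cancelling, the job could then be resubmitted from the retained checkpoint in the same way as from a savepoint, e.g. `bin/flink run -s <checkpointMetaDataPath> ...`.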

Sincerely yours,
Rinat Sharipov
Software Engineer at 1DMP CORE Team

email: r.shari...@cleverdata.ru
mobile: +7 (925) 416-37-26

CleverDATA
make your data clever
