According to [1], checkpoints do not support Flink-specific features such as rescaling, but I can try. Thank you for the suggestion.

[1] Apache Flink 1.12 Documentation: Checkpoints, "Difference to Savepoints"
https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html#difference-to-savepoints

________________________________
From: ChangZhuo Chen (陳昌倬)
Sent: Wednesday, March 17, 2021 12:29 AM
To: Alexey Trenikhun
Cc: ro...@apache.org; Flink User Mail List
Subject: Re: Checkpoint fail due to timeout

On Wed, Mar 17, 2021 at 05:45:38AM +0000, Alexey Trenikhun wrote:
> In my opinion it looks similar. Were you able to tune Flink to make it work?
> I'm stuck with it. I wanted to scale up hoping to reduce backpressure, but to
> rescale I need to take a savepoint, which never completes (or at least takes
> longer than 3 hours).

You can use an aligned checkpoint to scale your job. Just restarting from the
checkpoint with the same jar file and the new parallelism should do the trick.

--
ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
http://czchen.info/
Key fingerprint = BA04 346D C2E1 FE63 C790 8793 CC65 B0CD EC27 5D5B