Thanks folks, I understand this can be a limitation when redeploying. I did
try to delete my job and start it from scratch using
"initialSavepointPath"... and I got the same issue. Going to investigate
this more today.

On Thu, Oct 13, 2022 at 12:18 AM Evgeniy Lyutikov <eblyuti...@avito.ru>
wrote:

> The problem is that, from the JobManager's point of view, a change to the
> FlinkDeployment specification (new jar version, changed pod resources,
> etc.) is just a restart.
>
> 2022-09-16 09:30:52,526 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Restoring
> job 00000000000000000000000000000000 from Checkpoint 34 @ 1663320593326 for
> 00000000000000000000000000000000 located at
> s3p://flink-checkpoints/k8s-checkpoint-test-k8s-deploy/00000000000000000000000000000000/chk-34.
> 2022-09-16 09:30:52,624 INFO
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job
> 00000000000000000000000000000000 reached terminal state FAILED.
> org.apache.flink.runtime.client.JobInitializationException: Could not
> start the JobMaster.
> Caused by: java.util.concurrent.CompletionException:
> java.lang.IllegalStateException: There is no operator for the state
> f215196137eeb29b6f14c1ac14a1dc9f
> Caused by: java.lang.IllegalStateException: There is no operator for the
> state f215196137eeb29b6f14c1ac14a1dc9f
>
> After starting, it restores everything (jobgraph, etc.) from the HA
> metadata saved in the ConfigMap.
> The only method that worked for us was to completely delete the
> FlinkDeployment object and create a new one with initialSavepointPath and
> allowNonRestoredState.
> After that, the startup log looks a little different:
>
> 2022-09-16 10:30:52,624 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Restoring
> job 00000000000000000000000000000000 from Savepoint 34 @ 0 for
> 00000000000000000000000000000000 located at
> s3p://flink-checkpoints/k8s-checkpoint-test-k8s-deploy/00000000000000000000000000000000/chk-34.
>
> ------------------------------
> *From:* Gyula Fóra <gyula.f...@gmail.com>
> *Sent:* October 13, 2022, 13:19:54
> *To:* Yaroslav Tkachenko
> *Cc:* user
> *Subject:* Re: allowNonRestoredState doesn't seem to be working
>
> Hi!
>
> If you have last-state upgrade mode configured it may happen that the
> allowNonRestoredState config is ignored by Flink (as the last-state upgrade
> mechanism somewhat bypasses the regular submission).
>
> In the worst case, you can suspend the deployment and manually record the
> last checkpoint/savepoint path, then delete the FlinkDeployment and
> recreate it with initialSavepointPath set to your checkpoint.
>
> Cheers,
> Gyula
>
> On Thu, Oct 13, 2022 at 7:36 AM Yaroslav Tkachenko <yaros...@goldsky.com>
> wrote:
>
>> Hey everyone,
>>
>> I'm trying to redeploy an application using a savepoint. The new version
>> of the application has a few operators with new uids and a few operators
>> with the old uids. I'd like to keep the state for the old ones.
>>
>> I passed the allowNonRestoredState flag (using the Flink Kubernetes
>> Operator, actually) and I can confirm that
>> "execution.savepoint.ignore-unclaimed-state" is "true" after that.
>>
>> However, the application still fails with the following exception:
>>
>> "java.lang.IllegalStateException: Failed to rollback to
>> checkpoint/savepoint s3p://<REDACTED>. Cannot map checkpoint/savepoint
>> state for operator d9ea0f9654a3395802138c72c1bfd35b to the new program,
>> because the operator is not available in the new program. If you want to
>> allow to skip this, you can set the --allowNonRestoredState option on the
>> CLI."
>>
>> Is there a situation where allowNonRestoredState may not work? Thanks.
>>
>
>
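
For reference, a minimal sketch of the workaround discussed in this thread
(delete the FlinkDeployment, then recreate it with initialSavepointPath and
allowNonRestoredState set). It only builds the manifest as JSON, which
kubectl accepts as well as YAML. The deployment name, image, jar URI,
flinkVersion and resource values are placeholders I made up, not taken from
anyone's actual setup; the field names are those of the operator's
flink.apache.org/v1beta1 FlinkDeployment resource as I understand it, so
please check them against the CRD version you run.

```python
import json


def build_flink_deployment(name, image, jar_uri, savepoint_path):
    """Return a FlinkDeployment manifest (as a dict) that starts from savepoint_path.

    All concrete values passed in by the caller are illustrative placeholders.
    """
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name},
        "spec": {
            "image": image,
            "flinkVersion": "v1_15",  # placeholder, use your own version
            "serviceAccount": "flink",  # placeholder
            "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "job": {
                "jarURI": jar_uri,
                # Avoid last-state here: per the thread, that mode may
                # bypass the regular submission and ignore the flag below.
                "upgradeMode": "savepoint",
                "state": "running",
                # Checkpoint/savepoint path recorded before deleting the
                # old FlinkDeployment.
                "initialSavepointPath": savepoint_path,
                # Skip state that belongs to operators no longer in the job.
                "allowNonRestoredState": True,
            },
        },
    }


if __name__ == "__main__":
    manifest = build_flink_deployment(
        name="k8s-checkpoint-test",  # hypothetical name
        image="flink:1.15",  # hypothetical image
        jar_uri="local:///opt/flink/usrlib/job.jar",  # hypothetical jar
        savepoint_path=(
            "s3p://flink-checkpoints/k8s-checkpoint-test-k8s-deploy/"
            "00000000000000000000000000000000/chk-34"
        ),
    )
    print(json.dumps(manifest, indent=2))
```

The output could be saved to a file and applied with kubectl apply -f after
the old FlinkDeployment object has been deleted. This only illustrates the
shape of the spec; it does not by itself explain why the flag was ignored in
the original report.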
