Re: Flink Operator - Supporting Recovery from Snapshot

2023-12-06 Thread Gyula Fóra
Hi All! Based on continued feedback and experience, we feel that it may be a good time to introduce this functionality in a way that doesn't unexpectedly affect existing users. Please see https://issues.apache.org/jira/browse/FLINK-33763 for details and review.

Re: Flink Operator - Supporting Recovery from Snapshot

2023-02-10 Thread Kevin Lam
Hey Yaroslav! Awesome, good to know that approach works well for you. Our plan for now is to do the same: delete the current FlinkDeployment when deploying from a specific snapshot. It'll be a separate workflow from normal deployments, so that we can still take advantage of the operator otherwise.

Re: Flink Operator - Supporting Recovery from Snapshot

2023-02-10 Thread Yaroslav Tkachenko
Hi Kevin! In my case, I automated this workflow by first deleting the current Flink deployment and then creating a new one. So, if the initialSavepointPath is different, it'll use it for recovery. This approach is indeed irreversible, but so far it's been working well.
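
The delete-then-recreate workflow described above can be sketched roughly as follows. This is a minimal illustration, not the automation actually used in the thread: the helper names, the deployment name, the namespace, and the savepoint path are all hypothetical, and it assumes the operator's `flink.apache.org/v1beta1` `FlinkDeployment` resource with the `spec.job.initialSavepointPath` field mentioned in the message. It only builds the manifest and the `kubectl` argument lists; it does not contact a cluster.

```python
import copy


def build_flink_deployment(name, namespace, savepoint_path, base_spec):
    """Return a FlinkDeployment manifest that starts from savepoint_path.

    base_spec is the spec of the existing deployment (image, jar, resources,
    and so on); it is copied so the caller's dict is left untouched.
    """
    spec = copy.deepcopy(base_spec)
    job = spec.setdefault("job", {})
    # Field named in the thread: a freshly created deployment restores from it.
    job["initialSavepointPath"] = savepoint_path
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def redeploy_commands(name, namespace):
    """Return (delete_cmd, apply_cmd) argument lists for the two-step workflow.

    Step 1 deletes the current deployment (irreversible, as noted above).
    Step 2 creates the new one from a manifest supplied on stdin.
    """
    delete_cmd = ["kubectl", "delete", "flinkdeployment", name, "-n", namespace]
    apply_cmd = ["kubectl", "apply", "-n", namespace, "-f", "-"]
    return delete_cmd, apply_cmd
```

A caller could serialize the manifest with `json.dumps` and feed it to the second command on stdin after the first command has completed, for example via `subprocess.run(apply_cmd, input=..., text=True, check=True)`.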

Re: Flink Operator - Supporting Recovery from Snapshot

2023-02-10 Thread Kevin Lam
Thanks for the response, Gyula! Those caveats make sense, and I see there's a bit of complexity to consider if the feature is implemented. I do think it would be useful, so I would also love to hear what others think!

Re: Flink Operator - Supporting Recovery from Snapshot

2023-02-08 Thread Gyula Fóra
Hi Kevin! Thanks for starting this discussion. At a high level, what you are proposing is quite simple: if the initial savepoint path changes, we use that for the upgrade. I see a few caveats here that may be important: 1. To use a new savepoint/checkpoint path for recovery we have to stop the

Flink Operator - Supporting Recovery from Snapshot

2023-02-07 Thread Kevin Lam
Hello, I was reading the Flink Kubernetes Operator documentation and noticed that if you want to redeploy a Flink job from a specific snapshot, you must follow these manual recovery steps. Are there plans to streamline this process? Deploying from a specific snapshot is a relatively common