Re: State bootstrapping for Flink SQL / Table API jobs

Илья Соин Mon, 24 Apr 2023 00:31:01 -0700

Hi Shammon FY,

I haven’t tried it because AFAIK it’s only available in the DataStream API, while our job is in SQL. I’m thinking of writing a custom HybridDynamicTableSource that uses HybridSource under the hood. This should make it possible to bootstrap any SQL / Table API job. Maybe it’s something worth adding to the Flink distribution?

Sincerely,

Ilya Soin

On 24 Apr 2023, at 03:37, Shammon FY <[email protected]> wrote:

Hi Илья

I think HybridSource may be a good way. Have you tried it before? Or have you encountered any problems?

Best,
Shammon FY

On Fri, Apr 21, 2023 at 5:59 PM Илья Соин <[email protected]> wrote:
Hi Flink community,

We have a fairly complex SQL job: it unions 5 topics, deduplicates by key, and does some daily aggregations. The state TTL is 40 days. We want to be able to bootstrap its state from S3 or ClickHouse, and we want a general solution that we can reuse for other SQL jobs as well.

So far I haven’t found a working solution to this. I’d like to discuss the best approach to take here and possibly contribute it to Flink.

I think a good solution would be to bring HybridSource to the Table / SQL API.

Another thought was to take the SQL, replace the unbounded sources with bounded ones, and run the job. Then take a savepoint at the end and use it to bootstrap the streaming job. The problems I see here:
- we have no control over the operator UIDs or the final table plan, so the plan of the batch job may differ slightly from that of the streaming job.

--
Sincerely,
Ilya Soin
