Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

Andrew Otto Thu, 23 May 2024 12:59:25 -0700

Ah I see, so just auto-restarting to pick up new stuff.

I'd love to understand how Paimon does this.  They have a database sync
action
<https://paimon.apache.org/docs/master/flink/cdc-ingestion/mysql-cdc/#synchronizing-databases>
which will sync entire databases, handle schema evolution, and I'm pretty
sure (I think I saw this in my local test) also pick up new tables.


https://github.com/apache/paimon/blob/master/paimon-flink/paimon-flink-cdc/src/main/java/org/apache/paimon/flink/action/cdc/SyncDatabaseActionBase.java#L45

I'm sure that Paimon table format is great, but at Wikimedia Foundation we
are on the Iceberg train.  Imagine if there was a flink-cdc full database
sync to Flink IcebergSink!




On Thu, May 23, 2024 at 3:47 PM Péter Váry <[email protected]>
wrote:

> I will ask Marton about the slides.
>
> The solution was something like this in a nutshell:
> - Make sure that on job start the latest Iceberg schema is read from the
> Iceberg table
> - Throw a SuppressRestartsException when data arrives with the wrong schema
> - Use Flink Kubernetes Operator to restart your failed jobs by setting
> kubernetes.operator.job.restart.failed
>
> Thanks, Peter
>
> On Thu, May 23, 2024, 20:29 Andrew Otto <[email protected]> wrote:
>
>> Wow, I would LOVE to see this talk.  If there is no recording, perhaps
>> there are slides somewhere?
>>
>> On Thu, May 23, 2024 at 11:00 AM Carlos Sanabria Miranda <
>> [email protected]> wrote:
>>
>>> Hi everyone!
>>>
>>> I have found in the Flink Forward website the following presentation: 
>>> "Self-service
>>> ingestion pipelines with evolving schema via Flink and Iceberg
>>> <https://www.flink-forward.org/seattle-2023/agenda#self-service-ingestion-pipelines-with-evolving-schema-via-flink-and-iceberg>"
>>> by Márton Balassi from the 2023 conference in Seattle, but I cannot find
>>> the recording anywhere. I have found the recordings of the other
>>> presentations in the Ververica Academy website
>>> <https://www.ververica.academy/app>, but not this one.
>>>
>>> Does anyone know where I can find it? Or at least the slides?
>>>
>>> We are using Flink with the Iceberg sink connector to write streaming
>>> events to Iceberg tables, and we are researching how to handle schema
>>> evolution properly. I saw that presentation and I thought it could be of
>>> great help to us.
>>>
>>> Thanks in advance!
>>>
>>> Carlos
>>>
>>

Re: "Self-service ingestion pipelines with evolving schema via Flink and Iceberg" presentation recording from Flink Forward Seattle 2023

Reply via email to