Re: Re: How to handle deletion of items using PyFlink SQL?

2022-06-15 Thread Xuyang
So when t=C2 arrives, the source connector must send a `DELETE` message downstream indicating that row C should be deleted, and send a new 'INSERT' message to notify downstream that a new row D should be inserted into the sink. This source connector is just like a CDC source but it seems that
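
A minimal sketch of emitting such changelog messages from Python, using the DataStream-to-Table bridge (from_changelog_stream requires Flink 1.14+, newer than the 1.13 runtime mentioned further down the thread); the field name and values here are hypothetical:

    from pyflink.common import Row, RowKind
    from pyflink.common.typeinfo import Types
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    t_env = StreamTableEnvironment.create(env)

    # Hypothetical changelog for crawl t=C2: row C has disappeared, row D
    # is new. Each Row carries an explicit RowKind, like a CDC source emits.
    ds = env.from_collection(
        [
            Row.of_kind(RowKind.DELETE, "C"),  # retract row C downstream
            Row.of_kind(RowKind.INSERT, "D"),  # insert the new row D
        ],
        type_info=Types.ROW_NAMED(["resource_id"], [Types.STRING()]),
    )

    # Interpret the stream as a changelog; the -D/+I messages then flow to
    # whatever sink the resulting table is written into.
    t_env.create_temporary_view("crawl_changes", t_env.from_changelog_stream(ds))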

Re: How to handle deletion of items using PyFlink SQL?

2022-06-14 Thread John Tipper
Yes, I’m interested in the best pattern to follow with SQL to allow a downstream DB using the JDBC SQL connector to reflect the state of rows added and deleted upstream. So imagine there is a crawl event at t=C1 that happens with an associated timestamp and which finds resources A, B, C. Is
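
For reference, a sketch of the kind of JDBC sink table this implies, with placeholder connection details: declaring a PRIMARY KEY puts the JDBC connector into upsert mode, so DELETE messages from upstream become SQL DELETEs against the database:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(
        EnvironmentSettings.new_instance().in_streaming_mode().build())

    # Hypothetical sink table; the PRIMARY KEY makes the JDBC connector
    # upsert on +I/+U messages and issue DELETEs on -D messages.
    t_env.execute_sql("""
        CREATE TABLE resources_db (
            resource_id STRING,
            crawl_ts TIMESTAMP(3),
            PRIMARY KEY (resource_id) NOT ENFORCED
        ) WITH (
            'connector' = 'jdbc',
            'url' = 'jdbc:postgresql://db-host:5432/crawler',
            'table-name' = 'resources',
            'username' = 'user',
            'password' = 'secret'
        )
    """)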

Re: Re: How to handle deletion of items using PyFlink SQL?

2022-06-09 Thread Xuyang
Hi, Dian Fu. I think John's requirement is like a CDC source: the source needs the ability to know which data should be deleted and then notify the framework, and that is why I recommended that John use a UDTF. And hi, John. I'm not sure this doc [1] is enough. BTW,
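
For orientation, a minimal sketch of a PyFlink UDTF (the function, schema, and table names are hypothetical); it emits rows with yield, although whether a UDTF can set the RowKind of those rows is exactly the open question in this thread:

    from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
    from pyflink.table.udf import udtf

    t_env = TableEnvironment.create(
        EnvironmentSettings.new_instance().in_streaming_mode().build())

    # Hypothetical UDTF: explode the comma-separated list of resources
    # found by one crawl event into one row per resource.
    @udtf(result_types=[DataTypes.STRING()])
    def explode_resources(resource_list: str):
        for resource_id in resource_list.split(","):
            yield resource_id

    t_env.create_temporary_function("explode_resources", explode_resources)

    # Used in SQL via a lateral join against the (hypothetical) source table:
    exploded = t_env.sql_query("""
        SELECT c.crawl_ts, r.resource_id
        FROM crawl_events c,
             LATERAL TABLE(explode_resources(c.resource_list)) AS r(resource_id)
    """)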

Re: How to handle deletion of items using PyFlink SQL?

2022-06-08 Thread Dian Fu
Hi John, If you are using Table API & SQL, the framework handles the RowKind and it is transparent to you. So usually you don't need to handle RowKind in Table API & SQL. Regards, Dian
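
To illustrate "the framework handles the RowKind": in plain SQL one can write a continuous query whose result contains only the resources seen by the latest crawl; when a newer crawl arrives, the planner retracts the superseded rows itself, and an upsert sink turns those retractions into deletes. A hedged sketch with hypothetical table names:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(
        EnvironmentSettings.new_instance().in_streaming_mode().build())

    # Rows belonging to superseded crawls are retracted by the planner
    # automatically; no explicit RowKind handling is needed in user code.
    t_env.execute_sql("""
        INSERT INTO resources_db
        SELECT r.resource_id, r.crawl_ts
        FROM crawl_resources r
        JOIN (SELECT MAX(crawl_ts) AS latest_ts FROM crawl_resources) m
            ON r.crawl_ts = m.latest_ts
    """)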

Re: How to handle deletion of items using PyFlink SQL?

2022-06-08 Thread John Tipper
Hi Xuyang, Thank you very much, I’ll experiment tomorrow. Do you happen to know whether there is a Python example of udtf() with a RowKind being set (or whether it’s supported)? Many thanks, John On 8 Jun 2022, at 16:41, Xuyang wrote: Hi, John. What about using udtf

How to handle deletion of items using PyFlink SQL?

2022-06-08 Thread John Tipper
Hi all, I have some reference data that is periodically emitted by a crawler mechanism into an upstream Kinesis data stream; those rows are used to populate a sink table (I am using Flink 1.13 PyFlink SQL within AWS Kinesis Data Analytics). What is the best pattern to handle
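
For context, a sketch of what the Kinesis-backed source table could look like in Flink 1.13 PyFlink SQL; the stream name, region, and schema are placeholders:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(
        EnvironmentSettings.new_instance().in_streaming_mode().build())

    # Hypothetical source table fed by the crawler's Kinesis stream.
    t_env.execute_sql("""
        CREATE TABLE crawl_resources (
            resource_id STRING,
            crawl_ts TIMESTAMP(3)
        ) WITH (
            'connector' = 'kinesis',
            'stream' = 'crawler-output',
            'aws.region' = 'eu-west-1',
            'scan.stream.initpos' = 'LATEST',
            'format' = 'json'
        )
    """)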