Re: What precombine field really is used for and its future?

2023-04-05 Thread Daniel Kaźmirski
Hi Vinoth, Thanks for your reply! Regarding the first part, I agree that precombine solves a lot of issues, especially during the ingestion. I think this is a valid behavior and should be preserved so that we can enjoy out-of-order events and duplicates handled by the framework. I'm also aware

Re: What precombine field really is used for and its future?

2023-04-05 Thread Ken Krugler
Hi Vinoth, I just want to make sure my issue was clear: it seems like Spark shouldn’t require a precombine field (or check that it exists) when dropping partitions. Thanks, -- Ken > On Apr 4, 2023, at 7:31 AM, Vinoth Chandar wrote: > > Thanks for raising this issue. > > Love to

Re: What precombine field really is used for and its future?

2023-04-04 Thread Vinoth Chandar
This current thread is another example of a practical need for the preCombine field: "[DISCUSS] split source of kafka partition by count" On Tue, Apr 4, 2023 at 7:31 AM Vinoth Chandar wrote: > Thanks for raising this issue. > > Love to use this opp to share more context on why the preCombine

Re: What precombine field really is used for and its future?

2023-04-04 Thread Vinoth Chandar
Thanks for raising this issue. I'd love to use this opportunity to share more context on why the preCombine field exists. - As you probably inferred already, we needed to eliminate duplicates while dealing with out-of-order data (e.g., database change records arriving in different orders from two

Re: What precombine field really is used for and its future?

2023-04-01 Thread Ken Krugler
Hi Daniel, Thanks for the detailed write-up. I can’t add much to the discussion, other than noting we also recently ran into the related oddity that we don’t need to define a precombine field when writing data to a COW table (using Flink), but then trying to use Spark to drop partitions failed

What precombine field really is used for and its future?

2023-03-31 Thread Daniel Kaźmirski
Hi all, I would like to bring up the topic of how the precombine field is used and what its purpose is. I would also like to know what the plans for it are in the future. At first glance the precombine field looks like it's only used to deduplicate records in the incoming batch. But when digging deeper
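
The batch-level deduplication mentioned in this post can be illustrated with a minimal sketch in plain Python. This is not Hudi's API or implementation: the function name `dedupe_by_precombine`, the field names `id` and `ts`, and the tie-breaking rule (the later-arriving record wins on equal values) are all assumptions made for illustration only.

```python
from typing import Any, Dict, Hashable, Iterable, List

Record = Dict[str, Any]


def dedupe_by_precombine(
    records: Iterable[Record],
    key_field: str,
    precombine_field: str,
) -> List[Record]:
    """Keep one record per key: the one with the largest precombine value.

    Hypothetical helper, not part of any library. On equal precombine
    values the later-arriving record wins, an arbitrary choice here.
    """
    latest: Dict[Hashable, Record] = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        if current is None or rec[precombine_field] >= current[precombine_field]:
            latest[key] = rec
    return list(latest.values())


# An incoming batch with duplicates for id=1 arriving out of order.
batch = [
    {"id": 1, "ts": 10, "value": "a"},
    {"id": 1, "ts": 30, "value": "c"},
    {"id": 1, "ts": 20, "value": "b"},  # arrives late, but is older than ts=30
    {"id": 2, "ts": 5, "value": "x"},
]

result = dedupe_by_precombine(batch, key_field="id", precombine_field="ts")
for rec in result:
    print(rec)
```

In this sketch the ordering field, not arrival order, decides which duplicate survives, which is why the late-arriving `ts=20` record does not overwrite the `ts=30` one.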