Hi Vinoth,
Thanks for your reply!
Regarding the first part, I agree that precombine solves a lot of issues,
especially during ingestion.
I think this is valid behavior and should be preserved, so that out-of-order
events and duplicates keep being handled by the framework.
I'm also aware
Hi Vinoth,
I just want to make sure my issue was clear: it seems like Spark shouldn’t
require a precombine field (or check that one exists) when dropping
partitions.
Thanks,
— Ken
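To make Ken’s point concrete, here is a toy sketch (plain Python, not Hudi’s actual API or internals, and all names are illustrative assumptions) of why dropping a partition is record-agnostic: it removes whole files by partition path and never merges record versions, which is the only job the precombine field has.

```python
# Toy model of a partitioned table: partition path -> list of data files.
# Dropping a partition is a file/metadata operation; it never compares
# record versions, which is what a precombine field exists for.

table = {
    "dt=2023-01-01": ["file-a.parquet", "file-b.parquet"],
    "dt=2023-01-02": ["file-c.parquet"],
}

def drop_partition(table, partition_path):
    """Remove every file under the given partition.

    No record-level merge happens here, so nothing in this
    operation needs a precombine field.
    """
    return {p: files for p, files in table.items() if p != partition_path}

remaining = drop_partition(table, "dt=2023-01-01")
print(sorted(remaining))  # prints ['dt=2023-01-02']
```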
> On Apr 4, 2023, at 7:31 AM, Vinoth Chandar wrote:
This current thread is another example of a practical need for the
precombine field:
"[DISCUSS] split source of kafka partition by count"
On Tue, Apr 4, 2023 at 7:31 AM Vinoth Chandar wrote:
Thanks for raising this issue.
I’d love to use this opportunity to share more context on why the preCombine
field exists.
- As you probably inferred already, we needed to eliminate duplicates
while dealing with out-of-order data (e.g. database change records arriving
in different orders from two
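Vinoth’s point about deduplicating out-of-order data can be sketched in plain Python. This is only an illustration of the merge semantics, not Hudi’s implementation; the record shape and field names (`key`, `ts`) are assumptions, with `ts` standing in for the precombine field.

```python
# Sketch of precombine-style deduplication: among records sharing a key,
# keep the one with the highest precombine value (here, a timestamp 'ts').
# Arrival order does not matter, so late/out-of-order older versions lose.

def precombine(batch):
    """Collapse an incoming batch to one record per key."""
    winners = {}
    for record in batch:
        key = record["key"]
        best = winners.get(key)
        if best is None or record["ts"] > best["ts"]:
            winners[key] = record
    return list(winners.values())

# Two change records for the same row arrive out of order (e.g. from two
# different sources); the newer state wins regardless of arrival order.
batch = [
    {"key": "user-1", "ts": 2, "state": "updated"},
    {"key": "user-1", "ts": 1, "state": "created"},
    {"key": "user-2", "ts": 5, "state": "created"},
]
print(precombine(batch))
```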
Hi Daniel,
Thanks for the detailed write-up.
I can’t add much to the discussion, other than to note that we also recently
ran into a related oddity: we don’t need to define a precombine field when
writing data to a COW table (using Flink), but then trying to use Spark to
drop partitions failed
Hi all,
I would like to bring up the topic of how the precombine field is used and
what its purpose is. I would also like to know what the plans for it are
going forward.
At first glance, the precombine field looks like it is only used to
deduplicate records in an incoming batch.
But when digging deeper