Hi Mark,
I guess you can add an offset within the pipeline before the sink itself.
The fact that the destination doesn't use transactions makes it hard to
imagine exactly-once semantics being implemented. We have to consider
failover cases where the sink sent some records to the destination, the
whole job was then restarted, the data was reprocessed, and the sink simply
sent the same records again. Without transactions (or idempotency) we will
always fall back to at-least-once.
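
For illustration, here is a minimal sketch of that idea, assuming a Kafka
source. It stamps each element with its (partition, offset) pair during
deserialization; that pair is unique and stable across restarts, so it can
serve as the idempotency key you check against the external store. The
StampedEvent type and the plain string payload are placeholders I made up,
not part of any existing connector:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializationSchema;
import org.apache.flink.util.Collector;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class StampedEventDeserializer
        implements KafkaRecordDeserializationSchema<StampedEventDeserializer.StampedEvent> {

    // Hypothetical POJO carrying the payload plus its Kafka coordinates.
    public static class StampedEvent {
        public int partition;
        public long offset;
        public String payload;

        public StampedEvent() {} // no-arg constructor so Flink treats this as a POJO

        public StampedEvent(int partition, long offset, String payload) {
            this.partition = partition;
            this.offset = offset;
            this.payload = payload;
        }
    }

    @Override
    public void deserialize(ConsumerRecord<byte[], byte[]> record,
                            Collector<StampedEvent> out) throws IOException {
        // (partition, offset) identifies this record uniquely, even after a
        // restart replays it, so downstream writes can be deduplicated.
        out.collect(new StampedEvent(
                record.partition(),
                record.offset(),
                new String(record.value(), StandardCharsets.UTF_8)));
    }

    @Override
    public TypeInformation<StampedEvent> getProducedType() {
        return TypeInformation.of(StampedEvent.class);
    }
}

On the write side, the sink would then consult the external store and skip
any record whose (partition, offset) is already marked committed; that
check is what turns replayed writes after a restart into no-ops,
effectively upgrading at-least-once delivery to exactly-once results.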

Best Regards
Ahmed Hamdy


On Mon, 1 Sept 2025 at 07:03, Mark Zitnik <[email protected]> wrote:

> Hi All,
>
> Thanks for your answer
> Our destination does not support transactions, and the data is
> append-only. Given those constraints, I was wondering: is there a unique
> counter I can attach to each record (maybe Kafka offsets), so that I can
> use an external store to verify what was committed and what was not?
>
> Regards,
>
> Mark
>
> On Thu, Aug 28, 2025 at 11:27 AM Ahmed Hamdy <[email protected]> wrote:
>
>> Hi Mark,
>>
>> > the sink has to stop at each checkpoint, wait for all outstanding
>> requests to finish, and force writes to happen in order so you can clearly
>> separate "records before this checkpoint" from "records after it." At that
>> point you’re basically running synchronously, so the whole async throughput
>> benefit is gone.
>>
>> I second Poorvank's view of it.
>>
>> You can work around it by making the requests themselves idempotent;
>> after all, the sink itself doesn't do any processing.
>>
>>
>> Best Regards
>> Ahmed Hamdy
>>
>>
>> On Wed, 27 Aug 2025 at 19:54, Poorvank Bhatia <[email protected]>
>> wrote:
>>
>>> Hi Mark,
>>>
>>> Afaik, AsyncSinkBase was mainly introduced as a generic helper for
>>> high-throughput async clients. The main goal was to allow many requests in
>>> flight at once with buffering and retries, while still being
>>> checkpoint-aware for *at-least-once*.
>>>
>>> From my understanding, exactly-once might not really fit this model.
>>> IMO, to make exactly-once work, the sink has to stop at each
>>> checkpoint, wait for all outstanding requests to finish, and force writes
>>> to happen in order so you can clearly separate "records before this
>>> checkpoint" from "records after it." At that point you’re basically running
>>> synchronously, so the whole async throughput benefit is gone.
>>>
>>> For systems that actually support transactions (Kafka EOS, JDBC XA,
>>> File sinks, etc.), Flink has separate *committing sink* designs that
>>> are a better fit.
>>>
>>> That’s my read of it, but I might be missing some angle. Could you
>>> share what destination / use case you’re thinking about? If the system
>>> supports transactional semantics, a committing sink might be worth
>>> considering; if not, AsyncSinkBase might already be the right tool (high
>>> throughput + at-least-once).
>>>
>>> On Wed, Aug 27, 2025 at 3:17 PM Mark Zitnik <[email protected]>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> Reading the AsyncSinkBase source code, I came across a section
>>>> listing its limitations.
>>>>
>>>> One of the points is "We are not considering support for exactly-once
>>>> semantics at this point."
>>>>
>>>> Can someone share the reason for this? Any plans to develop it in the
>>>> future?
>>>>
>>>> Regards,
>>>>
>>>> Mark
>>>>
>>>
