Matthias,

On your response "For at-least-once, you would still get output
continuously, depending on throughput and producer configs":
- I'm not sure what you mean by throughput here; which configuration would
dictate that?
- By producer configs, I hope you mean batching and other settings that can
delay the producing of events (see the snippet below). Correct me if I'm wrong.
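For context, this is roughly how I pass the embedded producer's batching
settings through Streams. This is just a minimal sketch assuming linger.ms
and batch.size are the configs you mean; the application id, bootstrap
servers, and values are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class ProducerBatchingConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // Producer-level settings are passed through with the "producer." prefix.
            // linger.ms and batch.size control how long / how large a batch may grow
            // before the embedded producer actually sends records downstream.
            props.put(StreamsConfig.producerPrefix(ProducerConfig.LINGER_MS_CONFIG), 100);     // placeholder value
            props.put(StreamsConfig.producerPrefix(ProducerConfig.BATCH_SIZE_CONFIG), 16384);  // placeholder value
        }
    }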

On your response "For regular rebalances/restarts, a longer commit interval
has no impact because offsets would be committed right away":
- Do you mean that Kafka Streams internally takes care of finishing the
processing of records that were already consumed, and commits their offsets,
before the streams instance hands off its partitions?
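For reference, below is a minimal sketch of the two settings this thread is
about. Setting them explicitly is only for illustration, since 30000 ms is
already the at-least-once default and 100 ms the EOS default; the application
id and bootstrap servers are placeholders:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    public class CommitIntervalConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");    // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            // processing.guarantee: at-least-once is the default guarantee.
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.AT_LEAST_ONCE);
            // commit.interval.ms: how often offsets are committed; defaults to
            // 30000 ms for at-least-once and 100 ms when EOS is enabled.
            props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30000);
        }
    }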

On Tue, Oct 5, 2021 at 11:43 AM Matthias J. Sax <mj...@apache.org> wrote:

> The main motivation for a shorter commit interval for EOS is
> end-to-end latency. A topology could consist of multiple sub-topologies,
> and the end-to-end latency for the EOS case is roughly the commit interval
> times the number of sub-topologies.
>
> For regular rebalances/restarts, a longer commit interval has no impact,
> because for a regular rebalance/restart, offsets would be committed
> right away to guarantee a clean hand-off. Only in case of failure can a
> longer commit interval lead to a larger number of duplicates (of
> course only for at-least-once guarantees).
>
> For at-least-once, you would still get output continuously, depending on
> throughput and producer configs. Only the offsets are committed every 30
> seconds by default. This continuous output is also the reason why there
> is no latency impact for at-least-once when using a longer commit interval.
>
> Besides the impact on latency, there is also a throughput impact. Using a
> longer commit interval provides higher throughput.
>
>
> -Matthias
>
>
> On 10/4/21 7:31 AM, Pushkar Deole wrote:
> > Hi All,
> >
> > I am looking into the commit.interval.ms config in Kafka Streams, which is
> > described as the time interval at which Streams commits offsets for source
> > topics.
> > However, for the exactly-once guarantee the default is 100 ms, whereas for
> > at-least-once it is 30000 ms (i.e. 30 sec).
> > Why is there such a huge difference between the two guarantees, and what
> > does it mean to have this interval as high as 30 seconds? Does it also
> > increase the probability of a higher number of duplicates in case of
> > application restarts or partition rebalances?
> > Does it mean that the streams application would also publish events to the
> > destination topic only at this interval, which would mean a delay in
> > publishing events to the destination topic?
> >
>
