It controls the minimum frequency at which the offsets are committed. The
StreamThread runs in a loop that looks something like this:

while(true)
   records = consumer.poll(..)
   for each record
      task = findTask(record)
      task.process(..)
   end
   maybeCommit()
end

This is grossly simplified, but each time through the loop the consumer may
receive some records. Those records may take milliseconds, seconds,
minutes, to process. At the end of the loop maybeCommit is called - it will
only commit the offsets if the value of COMMIT_INTERVAL_MS_CONFIG has
passed.



On Fri, 3 Feb 2017 at 10:56 Mahendra Kariya <mahendra.kar...@go-jek.com>
wrote:

> Thanks Damian for this info.
>
> On Fri, Feb 3, 2017 at 3:29 PM, Damian Guy <damian....@gmail.com> wrote:
>
> > The commit is done on the same thread as the processing, so only offsets
> > that have been fully processed by the topology will be committed.
> >
>
>
> I am still not clear about why do we need the COMMIT_INTERVAL_MS_CONFIG
> config
> for streams. If the offsets are only going to be committed after the
> topology processing is over, what does this config exactly do?
>

Reply via email to