Hi Jim,

In one of the applications, we implemented option #1:

messageList = getNext(1000)
process(messageList)
commit()

In case of failure, this resulted in duplicate processing for at most 1000
records per partition.

Regards,
Anand


On 1 August 2014 20:35, Jim <jimi...@gmail.com> wrote:

> Thanks Guozhang,
>
> I was looking for actual real world workflows. I realize you can commit
> after each message but if you’re using ZK for offsets for instance you’ll
> put too much write load on the nodes and crush your throughput. So I was
> interested in batching strategies people have used that balance high/full
> throughput and fully committed events.
>
>
> On Thu, Jul 31, 2014 at 8:16 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Jim,
> >
> > Whether to use high level or simple consumer depends on your use case. If
> > you need to manually manage partition assignments among your consumers,
> or
> > you need to commit your offsets elsewhere than ZK, or you do not want
> auto
> > rebalancing of consumers upon failures etc, you will use simple
> consumers;
> > otherwise you use high level consumer.
> >
> > From your description of pulling a batch of messages it seems you are
> > currently using the simple consumer. Suppose you are using the high level
> > consumer, to achieve at-lease-once basically you can do sth like:
> >
> > message = consumer.iter.next()
> > process(message)
> > consumer.commit()
> >
> > which is effectively the same as option 2 for using a simple consumer. Of
> > course, doing so has a heavy overhead of one-commit-per-message, you can
> > also do option 1, by the cost of duplicates, which is tolerable for
> > at-least-once.
> >
> > Guozhang
> >
> >
> > On Wed, Jul 30, 2014 at 8:25 PM, Jim <jimi...@gmail.com> wrote:
> >
> > > Curious on a couple questions...
> > >
> > > Are most people(are you?) using the simple consumer vs the high level
> > > consumer in production?
> > >
> > >
> > > What is the common processing paradigm for maintaining a full pipeline
> > for
> > > kafka consumers for at-least-once messaging? E.g. you pull a batch of
> > 1000
> > > messages and:
> > >
> > > option 1.
> > > you wait for the slowest worker to finish working on that message, when
> > you
> > > get back 1000 acks internally you commit your offset and pull another
> > batch
> > >
> > > option 2.
> > > you feed your workers n msgs at a time in sequence and move your offset
> > up
> > > as you work through your batch
> > >
> > > option 3.
> > > you maintain a full stream of 1000 messages ideally and as you get acks
> > > back from your workers you see if you can move your offset up in the
> > stream
> > > to pull n more messages to fill up your pipeline so you're not blocked
> by
> > > the slowest consumer (probability wise)
> > >
> > >
> > > any good docs or articles on the subject would be great, thanks!
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Reply via email to