Thanks, Guozhang.

So if I switch to SimpleConsumer, will these problems be taken care of? I assume I will need to manage all the offsets myself, including the error recovery logic, right? (Roughly the kind of fetch loop in the sketch below is what I have in mind.)

Chen
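A rough sketch of my understanding, based on the 0.8 SimpleConsumer javaapi. The broker address, topic, partition, and the readOffsetFromMyStore / saveOffsetToMyStore / process helpers are placeholders I would have to implement myself:

    import java.nio.ByteBuffer;
    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.message.MessageAndOffset;

    // With SimpleConsumer I track the offset myself, e.g. in my own store.
    long offset = readOffsetFromMyStore();               // placeholder helper
    SimpleConsumer consumer = new SimpleConsumer(
        "broker1", 9092, 100000, 64 * 1024, "myClient"); // example broker/client

    FetchRequest req = new FetchRequestBuilder()
        .clientId("myClient")
        .addFetch("myTopic", 0, offset, 100000)          // topic, partition, offset, maxBytes
        .build();
    FetchResponse resp = consumer.fetch(req);
    // Error recovery (resp.hasError(), leader changes, retries) is also on me.
    for (MessageAndOffset mo : resp.messageSet("myTopic", 0)) {
        ByteBuffer payload = mo.message().payload();
        process(payload);                                // placeholder processing logic
        offset = mo.nextOffset();
        saveOffsetToMyStore(offset);                     // persist only after success
    }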
On Fri, Aug 8, 2014 at 8:05 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> Hello Chen,
>
> 1. Manually committing offsets does carry the risk of duplicates. Consider
> the following pattern:
>
> message = consumer.next();
> process(message);
> consumer.commit();
>
> A rebalance can happen between lines 2 and 3, where the message has been
> processed but the offset has not yet been committed. If another consumer
> picks up this partition after the rebalance, it may re-consume the message.
> With auto.commit turned on, offsets will always be committed before the
> consumers release ownership of partitions during rebalances.
>
> In the 0.9 consumer design we have fixed this issue by introducing the
> onPartitionDeassigned callback; you can take a look at its current API
> here:
>
> http://people.apache.org/~nehanarkhede/kafka-0.9-producer-javadoc/doc/org/apache/kafka/clients/consumer/KafkaConsumer.html
>
> 2. Committing offsets too often does have an overhead, since each commit
> goes to ZooKeeper, and ZK is not write-scalable. We are fixing that issue
> as well by moving offset management from ZK to the Kafka servers. This is
> already checked in to trunk and will be included in the 0.8.2 release.
>
> Guozhang
>
>
> On Thu, Aug 7, 2014 at 5:36 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
> >
> > Guozhang,
> > Just to make it clear: suppose I have 10 threads with the same consumer
> > group id reading topic T, auto commit is turned off, and commitOffsets is
> > called only when a message is processed successfully. If thread 1 dies
> > while processing a message from partition P1 and the last committed
> > offset is Offset1, will Kafka ensure that one of the other 9 running
> > threads automatically picks up partition P1 from Offset1? Does that
> > thread risk reading the same message more than once?
> >
> > Also, I would assume committing the offset for every message is a bit
> > heavy. What do you usually do for error handling when reading from Kafka?
> > Thanks much!
> > Chen
> >
> >
> > On Thu, Aug 7, 2014 at 5:18 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > Yes, in that case you can turn off auto commit and call commitOffsets()
> > > manually after processing is finished. commitOffsets() will only write
> > > the offsets of the partitions that the consumer is currently fetching,
> > > so there is no need to coordinate this operation.
> > >
> > >
> > > On Thu, Aug 7, 2014 at 5:03 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
> > > >
> > > > But with auto commit turned on, I risk losing the failed message,
> > > > right? Should I turn off auto commit and only commit the offset when
> > > > the message is processed successfully? But that would require
> > > > coordination between threads in order to know the right timing to
> > > > commit the offset.
> > > >
> > > >
> > > > On Thu, Aug 7, 2014 at 4:54 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > >
> > > > > Hello Chen,
> > > > >
> > > > > With the high-level consumer, partition re-assignment is automatic
> > > > > upon consumer failures.
> > > > >
> > > > > Guozhang
> > > > >
> > > > >
> > > > > On Thu, Aug 7, 2014 at 4:41 PM, Chen Wang <chen.apache.s...@gmail.com> wrote:
> > > > > >
> > > > > > Folks,
> > > > > > I have a process that starts at a specific time and reads from a
> > > > > > specific topic.
> > > > > > I am currently using the High Level API (consumer group) to read
> > > > > > from Kafka (and will stop once there is nothing in the topic, by
> > > > > > specifying a timeout). I am most concerned about error recovery
> > > > > > in a multi-threaded context. If one thread dies, will the other
> > > > > > running bolt threads pick up the failed message? Or do I have to
> > > > > > start another thread in order to pick it up? What would be a good
> > > > > > practice to ensure the message is processed at least once?
> > > > > >
> > > > > > Note that all threads are using the same group id.
> > > > > >
> > > > > > Thanks,
> > > > > > Chen
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > >
> > >
> > > --
> > > -- Guozhang
>
>
> --
> -- Guozhang
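For reference, putting the advice in this thread together: the manual-commit, at-least-once loop with the 0.8 high-level consumer would look roughly like the sketch below. The ZooKeeper address, group id, topic name, and process() are illustrative placeholders, not values from this thread:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    Properties props = new Properties();
    props.put("zookeeper.connect", "zkhost:2181");  // placeholder ZK address
    props.put("group.id", "my-group");              // all threads share this group id
    props.put("auto.commit.enable", "false");       // commit manually instead

    ConsumerConnector consumer =
        Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    Map<String, List<KafkaStream<byte[], byte[]>>> streams =
        consumer.createMessageStreams(Collections.singletonMap("myTopic", 1));
    ConsumerIterator<byte[], byte[]> it =
        streams.get("myTopic").get(0).iterator();

    while (it.hasNext()) {
        MessageAndMetadata<byte[], byte[]> msg = it.next();
        process(msg.message());    // placeholder: commit only after this succeeds
        consumer.commitOffsets();  // as noted above, a rebalance between process()
                                   // and commitOffsets() can still cause duplicates
    }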