For #3, we need to checkpoint offsets to a central place so that if a consumer fails, another consumer in the same group can pick up where the failed one left off. A local file would only be visible to the machine that wrote it.
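
A minimal sketch of the checkpointing idea, not Kafka's actual consumer API. `OffsetStore` is a hypothetical in-memory stand-in for a central store such as ZooKeeper, and `consume`, the group name `g1`, and the sample log are likewise made up for illustration. It only shows why a shared checkpoint lets a second consumer resume after the first one fails.

```python
class OffsetStore:
    """Hypothetical in-memory stand-in for a central store such as ZooKeeper."""

    def __init__(self):
        self._offsets = {}

    def commit(self, group, partition, offset):
        # Record the next offset this group should read from the partition.
        self._offsets[(group, partition)] = offset

    def fetch(self, group, partition):
        # A group with no checkpoint yet starts from offset 0.
        return self._offsets.get((group, partition), 0)


def consume(store, group, partition, log, max_messages):
    """Read up to max_messages from log, starting at the checkpointed offset."""
    start = store.fetch(group, partition)
    end = min(start + max_messages, len(log))
    seen = []
    for offset in range(start, end):
        seen.append(log[offset])
        # Checkpoint after each message so a successor knows where to resume.
        store.commit(group, partition, offset + 1)
    return seen


log = ["m0", "m1", "m2", "m3", "m4"]
store = OffsetStore()

# Consumer A reads three messages and then "fails".
first = consume(store, "g1", 0, log, 3)

# Consumer B, in the same group, resumes from the shared checkpoint.
second = consume(store, "g1", 0, log, 10)
```

Because both consumers read the checkpoint from the same store, the second one starts at the first unread message. With a local file on the failed machine, the second consumer would have no way to find that position.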
For #4c, leader change doesn't introduce duplicates.

Thanks,

Jun

On Wed, May 8, 2013 at 9:17 AM, Yu, Libo <libo...@citi.com> wrote:

> Hi,
>
> I read this link
> https://cwiki.apache.org/KAFKA/consumer-group-example.html
> and have a few questions (if not too many).
>
> 1 When you say the iterator may block, do you mean hasNext() may block?
>
> 2 "Remember, you can only use a single process per Consumer Group."
> Do you mean we can only use a single process on one node of the
> cluster for a consumer group?
> Or there can be only one process on the whole cluster for a consumer
> group? Please clarify on this.
>
> 3 Why save offset to zookeeper? Is it easier to save it to a local file?
>
> 4 When client exits/crashes or leader for a partition is changed,
> duplicate messages may be replayed. "To help avoid this (replayed duplicate
> messages), make sure you provide a clean way for your client to exit
> instead of assuming it can be 'kill -9'd."
>
> a. For client exit, if the client is receiving data at the time, how
> to do a clean exit? How can client tell consumer to write offset to
> zookeeper before exiting?
>
> b. For client crash, what can client do to avoid duplicate messages
> when restarted? What I can think of is to read last message from log file
> and ignore the first few received duplicate messages until receiving the
> last read message. But is it possible for client to read log file directly?
>
> c. For the change of the partition leader, is there anything that
> clients can do to avoid duplicates?
>
> Thanks.
>
> Libo