There isn't much difference btw option 1 and 2 in terms of the offset commit overhead to Zookeeper. In 0.8.2, we will have a Kafka-based offset management, which is much more efficient than committing to Zookeeper.
Thanks, Jun On Tue, Jan 6, 2015 at 10:45 AM, Rafi Shamim <r...@knewton.com> wrote: > Hello, > > I would like to write a multi-threaded consumer for the high-level > consumer in Kafka 0.8.1. I have found two ways that seem feasible > while keeping the guarantee that messages in a partition are processed > in order. I would appreciate any feedback this list has. > > Option 1 > -------- > - Create multiple threads, so each thread has its own ConsumerConnector. > - Manually commit offsets in each thread after every N messages. > - This was discussed a bit on this list previously. See [1]. > > ### Questions > - Is there a problem with making multiple ConsumerConnectors per machine? > - What does it take for ZooKeeper to handle this much load? We have a > 3-node ZooKeeper cluster with relatively small machines. (I expect the > topic will have about 40 messages per second. There will be 3 consumer > groups. That would be 120 commits per second at most, but I can reduce > the frequency of commits to make this lower.) > > ### Extra info > Kafka 0.9 will have an entirely different commit API, which will allow > one connection to commit offsets per partition, but I can’t wait that > long. See [2]. > > > Option 2 > -------- > - Create one ConsumerConnector, but ask for multiple streams in that > connection. Give each thread one stream. > - Since there is no way to commit offsets per stream right now, we > need to do autoCommit. > - This sacrifices the at-least-once processing guarantee, which would > be nice to have. See KAFKA-1612 [3]. > > ### Extra info > - There was some discussion in KAFKA-996 about a markForCommit() > method so that autoCommit would preserve the at-least-once guarantee, > but it seems more likely that the consumer API will just be redesigned > to allow commits per partition instead. See [4]. > > > So basically I'm wondering if option 1 is feasible. If not, I'll just > do option 2. Of course, let me know if I was mistaken about anything > or if there is another design which is better. > > Thanks in advance. > Rafi > > [1] > http://mail-archives.apache.org/mod_mbox/kafka-users/201310.mbox/%3cff142f6b499ae34caed4d263f6ca32901d35a...@extxmb19.nam.nsroot.net%3E > [2] > https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI > [3] https://issues.apache.org/jira/browse/KAFKA-1612 > [4] https://issues.apache.org/jira/browse/KAFKA-966 >