There isn't much difference between options 1 and 2 in terms of the offset
commit overhead to ZooKeeper. In 0.8.2, we will have Kafka-based offset
management, which is much more efficient than committing to ZooKeeper.
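
For a rough sense of the commit load in question, here is a minimal sketch
in Java. It only reproduces the back-of-the-envelope estimate from the
original message (40 messages per second, 3 consumer groups, a commit every
N messages). The class and method names are made up for illustration, and it
counts commit calls rather than actual ZooKeeper writes.

```java
// Illustrative estimate only; CommitRateSketch and commitsPerSecond are hypothetical names.
final class CommitRateSketch {

    /**
     * Upper bound on commit calls per second when each consumer group
     * commits once for every messagesPerCommit messages it processes.
     */
    static double commitsPerSecond(double messagesPerSecond,
                                   int consumerGroups,
                                   int messagesPerCommit) {
        if (messagesPerCommit < 1) {
            throw new IllegalArgumentException("messagesPerCommit must be >= 1");
        }
        return messagesPerSecond * consumerGroups / messagesPerCommit;
    }

    public static void main(String[] args) {
        // Figures from the original message: 40 msg/s, 3 consumer groups.
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("commit every %d message(s): %.1f commits/s%n",
                    n, commitsPerSecond(40, 3, n));
        }
    }
}
```

With N = 1 this gives the 120 commits per second mentioned in the original
message; larger N lowers the rate proportionally.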

Thanks,

Jun

On Tue, Jan 6, 2015 at 10:45 AM, Rafi Shamim <r...@knewton.com> wrote:

> Hello,
>
> I would like to write a multi-threaded consumer using the high-level
> consumer in Kafka 0.8.1. I have found two approaches that seem feasible
> while keeping the guarantee that messages in a partition are processed
> in order. I would appreciate any feedback this list has.
>
> Option 1
> --------
> - Create multiple threads, so each thread has its own ConsumerConnector.
> - Manually commit offsets in each thread after every N messages.
> - This was discussed a bit on this list previously. See [1].
>
> ### Questions
> - Is there a problem with making multiple ConsumerConnectors per machine?
> - What does it take for ZooKeeper to handle this much load? We have a
> 3-node ZooKeeper cluster with relatively small machines. (I expect the
> topic will have about 40 messages per second. There will be 3 consumer
> groups. That would be 120 commits per second at most, but I can reduce
> the frequency of commits to make this lower.)
>
> ### Extra info
> Kafka 0.9 will have an entirely different commit API, which will allow
> one connection to commit offsets per partition, but I can’t wait that
> long. See [2].
>
>
> Option 2
> --------
> - Create one ConsumerConnector, but ask for multiple streams in that
> connection. Give each thread one stream.
> - Since there is no way to commit offsets per stream right now, we
> need to rely on autoCommit.
> - This sacrifices the at-least-once processing guarantee, which would
> be nice to have. See KAFKA-1612 [3].
>
> ### Extra info
> - There was some discussion in KAFKA-966 about a markForCommit()
> method so that autoCommit would preserve the at-least-once guarantee,
> but it seems more likely that the consumer API will just be redesigned
> to allow commits per partition instead. See [4].
>
>
> So basically I'm wondering if option 1 is feasible. If not, I'll just
> do option 2. Of course, let me know if I was mistaken about anything
> or if there is another design which is better.
>
> Thanks in advance.
> Rafi
>
> [1] http://mail-archives.apache.org/mod_mbox/kafka-users/201310.mbox/%3cff142f6b499ae34caed4d263f6ca32901d35a...@extxmb19.nam.nsroot.net%3E
> [2] https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI
> [3] https://issues.apache.org/jira/browse/KAFKA-1612
> [4] https://issues.apache.org/jira/browse/KAFKA-966
>
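
Option 1 above might be sketched roughly as follows. This is not the Kafka
API: OffsetCommitter is a hypothetical stand-in for the ConsumerConnector
each thread would own (it mimics that connector's commitOffsets() call), and
a plain Iterator stands in for the message stream so the example runs
without a broker. Treat it as a sketch of the "commit after every N
messages" pattern under those assumptions, not a definitive implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class BatchCommitSketch {

    /** Hypothetical stand-in for the ConsumerConnector owned by one thread. */
    interface OffsetCommitter {
        void commitOffsets();
    }

    /** Consumes one stream in order and commits after every n processed messages. */
    static final class Worker implements Runnable {
        private final Iterator<String> stream;
        private final OffsetCommitter committer;
        private final int n;
        final List<String> processed = new ArrayList<>();

        Worker(Iterator<String> stream, OffsetCommitter committer, int n) {
            if (n < 1) {
                throw new IllegalArgumentException("n must be >= 1");
            }
            this.stream = stream;
            this.committer = committer;
            this.n = n;
        }

        @Override
        public void run() {
            int sinceCommit = 0;
            while (stream.hasNext()) {
                processed.add(stream.next()); // placeholder for real processing
                if (++sinceCommit == n) {
                    committer.commitOffsets(); // commit only after processing
                    sinceCommit = 0;
                }
            }
            if (sinceCommit > 0) {
                committer.commitOffsets(); // flush the remainder before exiting
            }
        }
    }

    /** Runs one worker over a finite list and returns how many commits it issued. */
    static int countCommits(List<String> messages, int n) {
        final int[] commits = {0};
        new Worker(messages.iterator(), () -> commits[0]++, n).run();
        return commits[0];
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            final String name = "worker-" + i;
            // Each thread gets its own stream and its own committer.
            Iterator<String> stream =
                    Arrays.asList("m0", "m1", "m2", "m3", "m4").iterator();
            OffsetCommitter committer = () -> System.out.println(name + " committed");
            threads.add(new Thread(new Worker(stream, committer, 2), name));
        }
        for (Thread t : threads) t.start();
        for (Thread t : threads) t.join();
    }
}
```

Because the commit happens only after the messages are processed, a crash
would cause the uncommitted tail to be redelivered rather than skipped,
which is the at-least-once behavior option 1 is after.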
