Yes, I mean we can only consume half the messages produced. I followed the high-level consumer example here: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example.
Let me give a more complete scenario:

- We run 3 zookeepers
- We run 2 brokers
- We do not have a topic defined, but we have enabled topic auto-creation (with a replication factor of 2? must check this)
- We connect the producer to both brokers (pocmsg5:9092,pocmsg6:9092)
- We stuff the topic into the KeyedMessage key with no Partitioner. I was not aware of the use of the key until last night.
- We generate 10 messages
- Topic auto-creation results in the following partitions:
    topic: unittest-test-msg  partition: 0  leader: 0  replicas: 0  isr: 0
    topic: unittest-test-msg  partition: 1  leader: 1  replicas: 1  isr: 1
    topic: unittest-test-msg  partition: 2  leader: 0  replicas: 0  isr: 0
    topic: unittest-test-msg  partition: 3  leader: 1  replicas: 1  isr: 1
- We construct a single Kafka stream by calling createStreams with a zookeeper (pocmsg5:2181) and one thread
    public <K,V> Map<String, List<KafkaStream<K,V>>> createMessageStreams(
        Map<String, Integer> topicCountMap, Decoder<K> keyDecoder, Decoder<V> valueDecoder)
- We consume only half the messages
- It looks as if partitions 0 and 2 are on pocmsg5, while partitions 1 and 3 are on pocmsg6.

Is it best to view the situation as 2 partitions, each a leader, with a replica follower for each? which partitions are leaders and which are replicas? What happened with auto-creation and production and partitioning? Which partition(s) is the zookeeper pointing the high-level consumer to read from?

thanks,
rob

> -----Original Message-----
> From: Jun Rao [mailto:jun...@gmail.com]
> Sent: Wednesday, May 01, 2013 11:15 PM
> To: users@kafka.apache.org
> Subject: Re: consuming only half the messages produced
>
> Partition is different from replicas. A topic can have one or more partitions
> and each partition can have one or more replicas. A consumer consumes data
> at partition level. In other words, a consumer gets the same data no matter
> how many replicas are there.
>
> When you say the consumer only gets half of the messages, do you mean that
> it gets half of the messages that are produced?
>
> You may want to take a look at the consumer example in
> http://kafka.apache.org/08/api.html
>
> Thanks,
>
> Jun
>
>
> On Wed, May 1, 2013 at 7:14 PM, Rob Withers <reefed...@gmail.com> wrote:
>
> > Running a consumer group (createStreams()), pointing to the zookeeper
> > and with the topic and 1 consumer thread, results in only half the
> > messages being consumed. The topic was auto-created, with a
> > replication factor of 2, but the producer was configured to produce to
> > 2 brokers and so 4 partitions resulted. Are half getting sent to one
> > leader, in one broker, and the other half getting sent to another
> > leader, in the other broker, but the consumer stream is only reading
> > from one leader from the zk? Shouldn't there only be one leader?
> >
> > thanks,
> >
> > rob
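
To make the keyed-producer step and the partition listing in the scenario easier to reason about, here is a minimal standalone sketch in plain Java. It does not use the Kafka client. It assumes that, with no custom Partitioner, a keyed message is assigned to a partition by taking the non-negative hash of the key modulo the partition count; that rule is an assumption about the 0.8 default and should be checked against the broker and producer versions in use. The class PartitionSketch and its methods are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartitionSketch {

    // Assumed rule (not taken from the thread): non-negative key hash
    // modulo the number of partitions.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Group partition ids by the broker id that leads them.
    // leaders[p] is the leader broker id of partition p.
    static Map<Integer, List<Integer>> partitionsByLeader(int[] leaders) {
        Map<Integer, List<Integer>> byLeader = new TreeMap<>();
        for (int p = 0; p < leaders.length; p++) {
            byLeader.computeIfAbsent(leaders[p], k -> new ArrayList<>()).add(p);
        }
        return byLeader;
    }

    public static void main(String[] args) {
        // Leader column from the partition listing in the scenario.
        int[] leaders = {0, 1, 0, 1};
        System.out.println("partitions by leader: " + partitionsByLeader(leaders));

        // The scenario uses the topic name as the key for every message.
        String key = "unittest-test-msg";
        for (int i = 0; i < 10; i++) {
            int partition = partitionFor(key, leaders.length);
            System.out.println("message " + i + " -> partition " + partition);
        }
    }
}
```

If the assumed rule holds, a key that never changes maps every message to the same partition, so the listing shows 4 single-replica partitions split across two leaders rather than 2 partitions with a follower each. The sketch does not by itself explain why exactly half the messages were consumed.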