Yes, I mean we can only consume half the messages produced. I followed the
high-level consumer example here:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example.
Let me give a more complete scenario:
- We run 3 zookeepers
- We run 2 brokers
- We do not have a topic defined, but we have enabled topic auto-creation
(with a replication factor of 2? must check this)
- We connect the producer to both brokers (pocmsg5:9092,pocmsg6:9092)
- We use the topic name as the KeyedMessage key and set no Partitioner. I
was not aware of how the key is used until last night.
- We generate 10 messages
- Topic auto-creation results in the following partitions:
    topic: unittest-test-msg  partition: 0  leader: 0  replicas: 0  isr: 0
    topic: unittest-test-msg  partition: 1  leader: 1  replicas: 1  isr: 1
    topic: unittest-test-msg  partition: 2  leader: 0  replicas: 0  isr: 0
    topic: unittest-test-msg  partition: 3  leader: 1  replicas: 1  isr: 1
- We construct a single Kafka stream by calling createMessageStreams with a
zookeeper (pocmsg5:2181) and one thread:
    public <K,V> Map<String, List<KafkaStream<K,V>>> createMessageStreams(
        Map<String, Integer> topicCountMap,
        Decoder<K> keyDecoder,
        Decoder<V> valueDecoder)
- We consume only half the messages
- It looks as if partitions 0 and 2 are on pocmsg5, while partitions 1 and 3
are on pocmsg6.
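
To make the layout in the listing above concrete, here is a minimal sketch in
plain Java (no Kafka classes) that groups the four partitions by their leader
id. The class name LeaderGrouping and the hard-coded pairs are only for
illustration; the pairs are copied by hand from the listing, and mapping
broker ids 0 and 1 to pocmsg5 and pocmsg6 is my reading of it, not something
the code checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LeaderGrouping {
    // {partition, leader} pairs copied by hand from the partition listing.
    static final int[][] LISTING = { {0, 0}, {1, 1}, {2, 0}, {3, 1} };

    // Groups partition ids under the broker id that leads them.
    static Map<Integer, List<Integer>> groupByLeader(int[][] listing) {
        Map<Integer, List<Integer>> byLeader = new TreeMap<>();
        for (int[] row : listing) {
            byLeader.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        return byLeader;
    }

    public static void main(String[] args) {
        for (Map.Entry<Integer, List<Integer>> e : groupByLeader(LISTING).entrySet()) {
            System.out.println("leader " + e.getKey() + " -> partitions " + e.getValue());
        }
    }
}
```

This prints partitions [0, 2] under leader 0 and [1, 3] under leader 1, so
each broker leads two partitions and, per the listing, neither holds a
replica of the other's.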
Is it best to view the situation as 2 partitions, each a leader, with a
replica follower for each?
Which partitions are leaders and which are replicas?
What happened with auto-creation and production and partitioning?
Which partition(s) is the zookeeper pointing the high-level consumer to read
from?
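
On the partitioning question, here is a small sketch of what I think may be
happening with the key. It assumes the default partitioner picks a partition
as the non-negative hash of the key modulo the partition count; I have not
confirmed that against the 0.8 source, so treat the rule as an assumption.
KeyPartitionSketch and its methods are hypothetical names, not Kafka API.

```java
import java.util.Set;
import java.util.TreeSet;

public class KeyPartitionSketch {
    // Assumed rule (not confirmed against the Kafka source):
    // non-negative hash of the key, modulo the number of partitions.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    // Collects the distinct partitions hit when every message uses the same key.
    static Set<Integer> partitionsUsed(String key, int messages, int numPartitions) {
        Set<Integer> used = new TreeSet<>();
        for (int i = 0; i < messages; i++) {
            used.add(partitionFor(key, numPartitions));
        }
        return used;
    }

    public static void main(String[] args) {
        // Same key (the topic name) for all 10 messages, 4 partitions.
        System.out.println(partitionsUsed("unittest-test-msg", 10, 4));
    }
}
```

Under that assumption a constant key always maps to a single partition, which
is why I would like to understand how the key and the partition count
interact here.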
thanks,
rob
> -----Original Message-----
> From: Jun Rao [mailto:[email protected]]
> Sent: Wednesday, May 01, 2013 11:15 PM
> To: [email protected]
> Subject: Re: consuming only half the messages produced
>
> Partition is different from replicas. A topic can have one or more
> partitions and each partition can have one or more replicas. A consumer
> consumes data at partition level. In other words, a consumer gets the
> same data no matter how many replicas are there.
>
> When you say the consumer only gets half of the messages, do you mean that
> it gets half of the messages that are produced?
>
> You may want to take a look at the consumer example in
> http://kafka.apache.org/08/api.html
>
> Thanks,
>
> Jun
>
>
> On Wed, May 1, 2013 at 7:14 PM, Rob Withers <[email protected]> wrote:
>
> > Running a consumer group (createStreams()), pointing to the zookeeper
> > and with the topic and 1 consumer thread, results in only half the
> > messages being consumed. The topic was auto-created, with a
> > replication factor of 2, but the producer was configured to produce to
> > 2 brokers and so 4 partitions resulted. Are half getting sent to one
> > leader, in one broker, and the other half getting sent to another
> > leader, in the other broker, but the consumer stream is only reading
> > from one leader from the zk? Shouldn't there only be one leader?
> >
> >
> >
> > thanks,
> >
> > rob