Hi Kevin,

You can use partition.assignment.strategy=roundrobin.
This will balance all the partitions of all the topics across the
consumer threads.
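
As a rough illustration of why this helps, here is a simplified model in
Python (not Kafka's actual assignor code; the topic names, consumer ids,
and function names are made up for the example). It compares a per-topic
range-style assignment, where the surplus partitions of every topic land
on the consumers that sort first, with a round-robin pass over all
topic-partitions. The real roundrobin assignor orders partitions
differently and has its own preconditions, so treat this only as a
sketch of the idea.

```python
from collections import defaultdict


def range_assign(topics, consumers):
    """Simplified range-style assignment, done topic by topic.

    Consumers are sorted lexicographically; for each topic the first
    (partitions % consumers) consumers receive one extra partition.
    """
    assignment = defaultdict(list)
    ordered = sorted(consumers)
    for topic, n_partitions in sorted(topics.items()):
        per_consumer, extra = divmod(n_partitions, len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            count = per_consumer + (1 if i < extra else 0)
            assignment[consumer].extend(
                (topic, p) for p in range(start, start + count)
            )
            start += count
    return dict(assignment)


def round_robin_assign(topics, consumers):
    """Simplified round-robin: deal all topic-partitions out one by one."""
    assignment = defaultdict(list)
    ordered = sorted(consumers)
    all_partitions = [
        (topic, p)
        for topic, n_partitions in sorted(topics.items())
        for p in range(n_partitions)
    ]
    for i, topic_partition in enumerate(all_partitions):
        assignment[ordered[i % len(ordered)]].append(topic_partition)
    return dict(assignment)


# Hypothetical setup: 3 topics with 4 partitions each, 3 consumers.
topics = {"topic-a": 4, "topic-b": 4, "topic-c": 4}
consumers = ["host1-c0", "host2-c0", "host3-c0"]

range_counts = {
    c: len(tps) for c, tps in range_assign(topics, consumers).items()
}
rr_counts = {
    c: len(tps) for c, tps in round_robin_assign(topics, consumers).items()
}

print("range:      ", range_counts)
print("round-robin:", rr_counts)
```

In this toy setup the range-style assignment gives the first consumer
the extra partition of every topic (6 partitions versus 3 and 3), while
the round-robin pass gives each consumer 4.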

I think the rationale behind using the default consumer id is that it
gives you better information for identifying a consumer. But if you want
specific values in the consumer id, I think you can just set it yourself.


Jiangjie (Becket) Qin

On 3/9/15, 11:40 AM, "Kevin Scaldeferri" <ke...@scaldeferri.com> wrote:

>https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/ConsumerConfig.scala#L101
>suggests that 'consumer.id' should only be set explicitly for testing
>purposes.  Is there a reason that it would be a bad idea to set it
>ourselves for production use?
>
>The reason I am asking is that it seems like the standard value, which
>starts with the hostname, produces somewhat sub-optimal distribution of
>partitions under the lexicographical sort.  If the number of partitions is
>not an exact multiple of the number of consumers, the surplus or deficit
>tends to be concentrated on just one or two machines.  We'd much rather if
>the extra partitions were evenly striped across our cluster.
>
>(Also, in addition to the above concern, we'd also find it useful in
>debugging situations if we included some application-specific values in
>the consumer ID beyond just hostname.)
>
>Do other people run into this?  Are there problems with setting the
>consumer.id in order to affect the distribution of partitions?
>
>Thanks,
>-kevin
