Hi Kevin,

You can set partition.assignment.strategy=roundrobin. This balances the partitions of all topics across the consumer threads.
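
To illustrate the difference, here is a rough sketch in Python. It is a simplified model of per-topic range assignment versus round-robin assignment, not Kafka's actual assignor code, and the real implementations differ in detail (for example in how partitions are ordered). The topic names, partition counts, and consumer ids are made up for the example.

```python
def range_assign(topics, consumers):
    """Simplified per-topic range assignment.

    For each topic, consumers are sorted lexicographically and the
    leftover partitions go to the first consumers in that order.
    """
    ordered = sorted(consumers)
    owned = {c: [] for c in ordered}
    for topic, n_parts in sorted(topics.items()):
        base, extra = divmod(n_parts, len(ordered))
        start = 0
        for i, consumer in enumerate(ordered):
            count = base + (1 if i < extra else 0)
            owned[consumer].extend((topic, p) for p in range(start, start + count))
            start += count
    return owned


def roundrobin_assign(topics, consumers):
    """Simplified round-robin assignment.

    All partitions of all topics are laid out in one list and dealt
    out to the sorted consumers in a circle.
    """
    ordered = sorted(consumers)
    owned = {c: [] for c in ordered}
    all_parts = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, topic_partition in enumerate(all_parts):
        owned[ordered[i % len(ordered)]].append(topic_partition)
    return owned


def counts(assignment):
    """Number of partitions owned by each consumer."""
    return {consumer: len(parts) for consumer, parts in assignment.items()}


# Hypothetical setup: 3 topics with 4 partitions each, 3 consumers.
topics = {"a": 4, "b": 4, "c": 4}
consumers = ["host1-0", "host2-0", "host3-0"]

print("range:     ", counts(range_assign(topics, consumers)))
print("roundrobin:", counts(roundrobin_assign(topics, consumers)))
```

In this model the range strategy gives the first consumer in sort order the extra partition of every topic (6, 3, 3), while round-robin ends up with 4 each. As far as I know, in 0.8.2 roundrobin also requires every consumer in the group to subscribe to the same topics with the same number of streams, so please check that against your setup.
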

I think the rationale behind the default consumer id is that it gives you better information for identifying a consumer. But if you want specific values in the consumer id, I think you can just set it yourself.

Jiangjie (Becket) Qin

On 3/9/15, 11:40 AM, "Kevin Scaldeferri" <ke...@scaldeferri.com> wrote:

>https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/ConsumerConfig.scala#L101
>suggests that 'consumer.id' should only be set explicitly for testing
>purposes. Is there a reason that it would be a bad idea to set it
>ourselves for production use?
>
>The reason I am asking is that it seems like the standard value, which
>starts with the hostname, produces somewhat sub-optimal distribution of
>partitions under the lexicographical sort. If the number of partitions is
>not an exact multiple of the number of consumers, the surplus or deficit
>tends to be concentrated on just one or two machines. We'd much rather if
>the extra partitions were evenly striped across our cluster.
>
>(Also, in addition to the above concern, we'd also find it useful in
>debugging situations if we included some application-specific values in
>the consumer ID beyond just hostname.)
>
>Do other people run into this? Are there problems with setting the
>consumer.id in order to affect the distribution of partitions?
>
>Thanks,
>-kevin