https://github.com/apache/kafka/blob/0.8.2/core/src/main/scala/kafka/consumer/ConsumerConfig.scala#L101
suggests that 'consumer.id' should only be set explicitly for testing
purposes.  Is there a reason that it would be a bad idea to set it
ourselves for production use?

The reason I ask is that the default value, which starts with the
hostname, seems to produce a somewhat sub-optimal distribution of
partitions under the lexicographical sort.  If the number of partitions is
not an exact multiple of the number of consumers, the surplus or deficit
tends to be concentrated on just one or two machines.  We'd much rather
the extra partitions were evenly striped across our cluster.
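
To illustrate the effect, here is a rough sketch of a range-style
assignment as I understand it: sort the consumer thread IDs, then hand the
leftover partitions to the IDs that sort first.  This is a simplified model,
not Kafka's actual code.  The function names, the "host-threadIndex" ID
format, and the 3-host / 4-thread / 14-partition numbers are all made up for
the example.

```python
def range_assign(thread_ids, num_partitions):
    """Simplified range-style assignment (an assumption, not Kafka's code).

    IDs are sorted lexicographically; each gets a contiguous block, and the
    first (num_partitions % len(ids)) IDs in sorted order get one extra.
    """
    ordered = sorted(thread_ids)
    per_thread, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    for pos, tid in enumerate(ordered):
        start = per_thread * pos + min(pos, extra)
        count = per_thread + (1 if pos < extra else 0)
        assignment[tid] = list(range(start, start + count))
    return assignment


def partitions_per_host(assignment):
    """Sum partition counts by host, taking the host as the ID prefix."""
    totals = {}
    for tid, parts in assignment.items():
        host = tid.split("-")[0]
        totals[host] = totals.get(host, 0) + len(parts)
    return totals


# Hypothetical cluster: 3 hosts, 4 consumer threads each, 14 partitions.
threads = [f"{host}-{i}" for host in ("hostA", "hostB", "hostC") for i in range(4)]
totals = partitions_per_host(range_assign(threads, 14))
print(totals)  # {'hostA': 6, 'hostB': 4, 'hostC': 4}
```

Because every ID from hostA sorts before any ID from hostB, both surplus
partitions land on hostA instead of being spread across the hosts.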

(Separately, we'd find it useful when debugging if the consumer ID
included some application-specific values beyond just the hostname.)

Do other people run into this?  Are there problems with setting the
consumer.id in order to affect the distribution of partitions?

Thanks,
-kevin
