To avoid some consumers not consuming anything, one small trick might be that if a consumer finds itself without any partitions, it can force a rebalance by deleting its own registration path in ZK and re-registering.
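A minimal Go sketch of that trick, against a hypothetical `Registry` interface standing in for a real ZooKeeper client (the interface, the in-memory fake, and the znode paths are illustrative assumptions, not the thread's actual code; a real version would delete and re-create an ephemeral znode under the group's `ids` path so the children watch fires):

```go
package main

import "fmt"

// Registry is a hypothetical stand-in for a ZooKeeper client; a real
// implementation would delete and re-create an ephemeral znode under
// <consumer_root>/<group>/ids, firing the children watch held by every
// consumer in the group.
type Registry interface {
	Delete(path string) error
	Create(path string) error
}

// forceRebalance applies the trick from the thread: a consumer that ended
// up owning zero partitions removes its own registration znode and
// re-registers, which triggers a fresh round of lock grabbing group-wide.
// It reports whether a rebalance was actually triggered.
func forceRebalance(r Registry, idPath string, ownedPartitions int) (bool, error) {
	if ownedPartitions > 0 {
		return false, nil // got at least one partition; nothing to do
	}
	if err := r.Delete(idPath); err != nil {
		return false, err
	}
	if err := r.Create(idPath); err != nil {
		return false, err
	}
	return true, nil
}

// memRegistry is an in-memory fake used only to exercise the logic.
type memRegistry struct{ nodes map[string]bool }

func (m *memRegistry) Delete(p string) error { delete(m.nodes, p); return nil }
func (m *memRegistry) Create(p string) error { m.nodes[p] = true; return nil }

func main() {
	r := &memRegistry{nodes: map[string]bool{"/consumers/g1/ids/c1": true}}
	fired, _ := forceRebalance(r, "/consumers/g1/ids/c1", 0)
	fmt.Println("rebalance fired:", fired) // prints "rebalance fired: true"
}
```

One caveat with this approach: if two starved consumers keep forcing rebalances that again leave someone empty-handed, the group can churn, so some backoff between attempts is probably wise.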
On Mon, Jan 27, 2014 at 4:32 PM, David Birdsong <[email protected]> wrote:

> On Mon, Jan 27, 2014 at 4:19 PM, Guozhang Wang <[email protected]> wrote:
>
> > Hello David,
> >
> > One thing about using ZK locks to "own" a partition is load balancing.
> > If you are unlucky, some consumer may get all the locks and some may
> > get none, hence have no partitions to consume.
>
> I've considered this and even encountered it in testing. For our current
> load levels, it won't hurt us, but if there's a good solution, I'd rather
> codify smooth consumer balance.
>
> Got any suggestions?
>
> My thinking thus far is to establish some sort of identity on the consumer
> and derive an evenness or oddness or some modulo value that induces a small
> delay when encountering particular partition numbers. It's a hacky idea,
> but it's pretty simple and might be good enough for smoothing consumers.
>
> > Also, you may need some synchronization between the consumer threads and
> > the offset thread. For example, when an event is fired and the consumers
> > need to re-try grabbing the locks, they need to first stop the current
> > fetchers, commit offsets, and then start owning new partitions.
>
> This is the current design and what I have implemented so far. The last
> thread to exit is the offset thread; it has a direct communication channel
> to the consumer threads, so it waits for those channels to be closed
> before its last flush and exit.
>
> > Guozhang
>
> Thanks for the input!
>
> > On Mon, Jan 27, 2014 at 3:03 PM, David Birdsong <[email protected]> wrote:
> >
> > > Hey All, I've been cobbling together a high-level consumer for golang
> > > building on top of Shopify's Sarama package and wanted to run the basic
> > > design by the list and get some feedback or pointers on things I've
> > > missed or will eventually encounter on my own.
> > > I'm using zookeeper to coordinate topic-partition owners for consumer
> > > members in each consumer group. I followed the znode layout that's
> > > apparent from watching the console consumer:
> > >
> > > <consumer_root>/<consumer_group_name>/{offsets,owners,ids}
> > >
> > > The consumer uses an outer loop to discover the partition list for a
> > > given topic, attempts to grab a zookeeper lock on each
> > > (topic, partition) tuple, and then, for each (topic, partition) it
> > > successfully locks, launches a thread (goroutine) to read that
> > > partition's stream.
> > >
> > > The outer loop continues to watch for children events on either of:
> > >
> > > <consumer_root>/<consumer_group>/owners/<topic_name>
> > > <kafka_root>/brokers/topics/<topic_name>/partitions
> > >
> > > Any watch event that fires causes all offset data and consumer handles
> > > to be flushed and closed, the goroutines watching topic-partitions
> > > exit, and the loop is restarted.
> > >
> > > Another thread reads topic-partition-offset data and flushes the
> > > offsets to:
> > >
> > > <consumer_root>/<consumer_group>/offsets/<topic_name>/<partition_number>
> > >
> > > Have I oversimplified or missed any critical steps?
> >
> > --
> > -- Guozhang

--
-- Guozhang
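The shutdown ordering discussed in the thread — the offset thread exits last, after the consumer goroutines' channels close — can be sketched with Go channels. The type names and the in-memory "flush" callback are illustrative stand-ins, not David's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// offsetUpdate is one consumed position reported by a partition goroutine.
type offsetUpdate struct {
	topic     string
	partition int
	offset    int64
}

// runOffsetThread drains offset updates until the channel is closed, then
// performs the final flush. Because `range` only ends after close(), the
// last flush is guaranteed to happen after every consumer goroutine has
// stopped sending -- the ordering the thread describes.
func runOffsetThread(updates <-chan offsetUpdate, flush func(map[string]int64)) {
	latest := make(map[string]int64)
	for u := range updates {
		latest[fmt.Sprintf("%s/%d", u.topic, u.partition)] = u.offset
	}
	flush(latest) // e.g. write each entry to its offsets znode
}

func main() {
	updates := make(chan offsetUpdate)
	var wg sync.WaitGroup
	for p := 0; p < 3; p++ {
		wg.Add(1)
		go func(p int) { // stand-in for a partition-consuming goroutine
			defer wg.Done()
			updates <- offsetUpdate{"events", p, int64(100 + p)}
		}(p)
	}
	go func() { wg.Wait(); close(updates) }() // close only after all consumers exit
	runOffsetThread(updates, func(m map[string]int64) {
		fmt.Println("final flush of", len(m), "partitions") // prints 3
	})
}
```

The same channel-close signal can drive the outer loop's restart: once the offset thread's final flush returns, it is safe to release the locks and re-enter partition discovery.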
