I filed this jira, fwiw: https://issues.apache.org/jira/browse/KAFKA-2172
Jason

On Mon, Mar 23, 2015 at 2:44 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:

> Hi Jason,
>
> Yes, I agree the restriction makes the usage of round-robin less flexible.
> I think the focus of the round-robin strategy is workload balance. If
> different consumers are consuming from different topics, it is unbalanced
> by nature. In that case, is it possible for you to use a different consumer
> group for each set of topics?
> The rolling update is a good point. If you do a rolling bounce in a small
> window, the rebalance retry should handle it. But if you want to canary a
> new topic setting on one consumer for some time, it won’t work.
> Could you maybe share the use case in more detail, so we can see if
> there is any workaround?
>
> Jiangjie (Becket) Qin
>
> On 3/22/15, 10:04 AM, "Jason Rosenberg" <j...@squareup.com> wrote:
>
> >Jiangjie,
> >
> >Yeah, I welcome the round-robin strategy, as the 'range' strategy ('til now
> >the only one available) is not always good at balancing partitions, as you
> >observed above.
> >
> >The main thing I'm bringing up in this thread though is the question of why
> >there needs to be a restriction to having a homogeneous set of consumers in
> >the group being balanced. This is not a requirement for the range
> >algorithm, but is for the roundrobin algorithm. So, I'm just wanting to
> >understand why there's that limitation. (And sadly, in our case, we do
> >have heterogeneous consumers using the same groupid, so we can't easily turn
> >on roundrobin at the moment, without some effort :) ).
> >
> >I can see that it does simplify the implementation to have that limitation,
> >but I'm just wondering if there's anything fundamental that would prevent
> >an implementation that works over heterogeneous consumers. E.g. "Lay out
> >all partitions, and lay out all consumer threads, and proceed round robin
> >assigning each partition to the next consumer thread. *If the next consumer
> >thread doesn't have a selection for the current partition, then move on to
> >the next consumer-thread...."*
> >
> >The current implementation is also problematic if you are doing a rolling
> >restart of a consumer cluster. Let's say you are updating the topic
> >selection as part of an update to the cluster. Once the first node is
> >updated, the entire cluster will no longer be homogeneous until the last
> >node is updated, which means you will have a temporary outage consuming
> >data until all nodes have been updated. So, it makes it difficult to do
> >rolling restarts, or canary updates on a subset of nodes, etc.
> >
> >Jason
> >
> >On Fri, Mar 20, 2015 at 10:15 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> >wrote:
> >
> >> Hi Jason,
> >>
> >> The motivation behind round robin is to better balance the consumers'
> >> load. Imagine you have two topics each with two partitions. These topics
> >> are consumed by two consumers each with two consumer threads.
> >>
> >> The range assignment gives:
> >> T1-P1 -> C1-Thr1
> >> T1-P2 -> C1-Thr2
> >> T2-P1 -> C1-Thr1
> >> T2-P2 -> C1-Thr2
> >> Consumer 2 will not be consuming from any partitions.
> >>
> >> The round robin algorithm gives:
> >> T1-P1 -> C1-Thr1
> >> T1-P2 -> C1-Thr2
> >> T2-P1 -> C2-Thr1
> >> T2-P2 -> C2-Thr2
> >> It is much better than range assignment.
> >>
> >> That's the reason why we introduced the round robin strategy even though it
> >> has restrictions.
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >> On 3/20/15, 12:20 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
> >>
> >> >Jiangjie,
> >> >
> >> >The error messages I got (and the config doc) do clearly state that the
> >> >number of threads per consumer must match also....
> >> >
> >> >I'm not convinced that an easy to understand algorithm would work fine with
> >> >a heterogeneous set of selected topics between consumers.
> >> >
> >> >Jason
> >> >
> >> >On Thu, Mar 19, 2015 at 8:07 PM, Mayuresh Gharat <gharatmayures...@gmail.com>
> >> >wrote:
> >> >
> >> >> Hi Becket,
> >> >>
> >> >> Can you list down an example for this? It would be easier to understand :)
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Mayuresh
> >> >>
> >> >> On Thu, Mar 19, 2015 at 4:46 PM, Jiangjie Qin <j...@linkedin.com.invalid>
> >> >> wrote:
> >> >>
> >> >> > Hi Jason,
> >> >> >
> >> >> > The round-robin strategy first takes the partitions of all the topics a
> >> >> > consumer is consuming from, then distributes them across all the consumers.
> >> >> > If different consumers are consuming from different topics, the assigning
> >> >> > algorithm will generate different answers on different consumers.
> >> >> > It is OK for consumers to have different thread counts, but the consumers
> >> >> > have to consume from the same set of topics.
> >> >> >
> >> >> >
> >> >> > For the range strategy, the balance is for each individual topic instead of
> >> >> > across topics. So the balance is only done for the consumers consuming from
> >> >> > the same topic.
> >> >> >
> >> >> > Thanks.
> >> >> >
> >> >> > Jiangjie (Becket) Qin
> >> >> >
> >> >> > On 3/19/15, 4:14 PM, "Jason Rosenberg" <j...@squareup.com> wrote:
> >> >> >
> >> >> > >So,
> >> >> > >
> >> >> > >I've run into an issue migrating a consumer to use the new 'roundrobin'
> >> >> > >partition.assignment.strategy. It turns out that several of our consumers
> >> >> > >use the same group id, but instantiate several different consumer instances
> >> >> > >(with different topic selectors and thread counts). Often, this is done in
> >> >> > >a single shared process. It turns out this arrangement is not allowed when
> >> >> > >using the 'roundrobin' assignment strategy.
> >> >> > >
> >> >> > >I'm curious as to the reason for this restriction? Why is it not also a
> >> >> > >restriction for the 'range' strategy (which we've been happily using for
> >> >> > >some time now)?
> >> >> > >
> >> >> > >It would seem that as long as you always assign a partition to a consumer
> >> >> > >instance that is actually selecting it, you should still be able to proceed
> >> >> > >with the round-robin algorithm (potentially skipping consumers if they
> >> >> > >can't select the next partition in the list, etc.).
> >> >> > >
> >> >> > >Jason
> >> >>
> >> >> --
> >> >> -Regards,
> >> >> Mayuresh R. Gharat
> >> >> (862) 250-7125
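
For anyone reading this thread later, here is a rough Python sketch of the two strategies discussed above, plus the "skip consumer threads that don't select the partition" variant proposed for heterogeneous groups. It is only an illustration under stated assumptions, not Kafka's actual assignor code: the function names, the `subscriptions` mapping (thread id to set of topics), the 1-based partition numbers, and the simple sorted ordering of threads and partitions are all made up for this example, and Kafka may order partitions and threads differently.

```python
def range_assign(topics, subscriptions):
    """Range-style sketch: each topic is balanced on its own, and only
    among the threads that subscribe to that topic."""
    assignment = {thread: [] for thread in subscriptions}
    for topic, n_parts in sorted(topics.items()):
        members = sorted(t for t, subs in subscriptions.items() if topic in subs)
        if not members:
            continue  # nobody selects this topic
        per_thread, extra = divmod(n_parts, len(members))
        next_part = 1  # partitions numbered from 1, as in the example above
        for i, thread in enumerate(members):
            count = per_thread + (1 if i < extra else 0)
            assignment[thread].extend(
                (topic, p) for p in range(next_part, next_part + count)
            )
            next_part += count
    return assignment


def round_robin_assign(topics, subscriptions):
    """Round-robin sketch with skipping: lay out all partitions and all
    threads, give each partition to the next thread that selects its topic."""
    assignment = {thread: [] for thread in subscriptions}
    ordered = sorted(subscriptions)
    idx = 0  # position of the next thread to try
    for topic, n_parts in sorted(topics.items()):
        for p in range(1, n_parts + 1):
            for step in range(len(ordered)):
                thread = ordered[(idx + step) % len(ordered)]
                if topic in subscriptions[thread]:
                    assignment[thread].append((topic, p))
                    idx = (idx + step + 1) % len(ordered)
                    break
            # if no thread selects the topic, the partition stays unassigned
    return assignment


if __name__ == "__main__":
    topics = {"T1": 2, "T2": 2}
    # Homogeneous group: two consumers, two threads each, same topics.
    same = {t: {"T1", "T2"} for t in ("C1-Thr1", "C1-Thr2", "C2-Thr1", "C2-Thr2")}
    print("range:      ", range_assign(topics, same))
    print("round robin:", round_robin_assign(topics, same))
    # Heterogeneous group: the first consumer only selects T1.
    mixed = {"C1-Thr1": {"T1"}, "C2-Thr1": {"T1", "T2"}}
    print("mixed:      ", round_robin_assign(topics, mixed))
```

With the homogeneous group, the sketch reproduces the two layouts from the T1/T2 example earlier in the thread. In the heterogeneous case every partition still lands on a thread that selects its topic, but the result is only deterministic if every consumer computes it from the same global view of all subscriptions, which is presumably where the extra implementation effort would go.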