Topic creation should only cause a rebalance for wildcard consumers (and I believe that is regardless of whether or not the wildcard covers the topic: once the ZK watch fires, a rebalance is going to happen).
Back to the original concern: in that case, it would be helpful to see more of that log. When a rebalance is triggered, a log message will indicate why. It will be caused either by a change in group membership (which itself has a number of causes, but at least it narrows things down) or by a topic change. Figuring out why the consumers are rebalancing is the first step to trying to reduce it.

-Todd

On Saturday, September 26, 2015, noah <iamn...@gmail.com> wrote:

> Thanks, that gives us some more to look at.
>
> That is unfortunately a small section of the log file. When we hit this
> problem (which is not every time), it will continue like that for hours.
>
> We also still have developers creating topics semi-regularly, which it
> seems like can cause the high level consumer to disconnect?
>
> On Fri, Sep 25, 2015 at 6:16 PM Todd Palino <tpal...@gmail.com> wrote:
>
>> That rebalance cycle doesn't look endless. I see that you started 23
>> consumers, and I see 23 rebalances finishing successfully, which is
>> correct. You will see rebalance messages from all of the consumers you
>> started. It all happens within about 2 seconds, which is fine. I agree that
>> there are a lot of log messages, but I'm not seeing anything that is
>> particularly a problem here. After the segment of log you provided, your
>> consumers will be running properly. Now, given you have a topic with 16
>> partitions, and you're running 23 consumers, 7 of those consumer threads
>> are going to be idle because they do not own partitions.
>>
>> -Todd
>>
>> On Fri, Sep 25, 2015 at 3:27 PM, noah <iamn...@gmail.com> wrote:
>>
>>> We're seeing this the most on developer machines that are starting up
>>> multiple high level consumers on the same topic+group as part of service
>>> startup. The consumers do not seem to get a chance to consume anything
>>> before they disconnect.
>>>
>>> These are developer topics, so it is possible/likely that there isn't
>>> anything for them to consume in the topic, but the same service will start
>>> producing, so I would expect them to not be idle for long.
>>>
>>> Could it be that the way we are bringing up multiple consumers at the same
>>> time is hitting some sort of endless rebalance cycle? And/or the resulting
>>> thrashing is causing them to time out, rebalance, etc.?
>>>
>>> I've tried attaching the logs again. Thanks!
>>>
>>> On Fri, Sep 25, 2015 at 3:33 PM Todd Palino <tpal...@gmail.com> wrote:
>>>
>>>> I don't see the logs attached, but what does the GC look like in your
>>>> applications? A lot of times this is caused (at least on the consumer
>>>> side) by the ZooKeeper session expiring due to excessive GC activity,
>>>> which causes the consumers to go into a rebalance and change up their
>>>> connections.
>>>>
>>>> -Todd
>>>>
>>>> On Fri, Sep 25, 2015 at 1:25 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>
>>>> > How busy are the clients?
>>>> >
>>>> > The brokers occasionally close idle connections; this is normal and
>>>> > typically not something to worry about.
>>>> > However, this shouldn't happen to consumers that are actively reading
>>>> > data.
>>>> >
>>>> > I'm wondering if the "consumers not making any progress" could be due
>>>> > to a different issue, and because they are idle, the connection closes
>>>> > (vs the other way around).
>>>> >
>>>> > On Thu, Sep 24, 2015 at 2:32 PM, noah <iamn...@gmail.com> wrote:
>>>> >
>>>> > > We are having issues with producers and consumers frequently fully
>>>> > > disconnecting (from both the brokers and ZK) and reconnecting
>>>> > > without any apparent cause.
>>>> > > On our production systems it can happen anywhere from every
>>>> > > 10-15 seconds to 15-20 minutes. On our less beefy test systems and
>>>> > > developer laptops, it can happen almost constantly.
>>>> > >
>>>> > > We see no errors in the logs (sample attached), just a message for
>>>> > > each of our consumers and producers disconnecting, then reconnecting.
>>>> > > On the systems where it happens constantly, the consumers are not
>>>> > > making any progress.
>>>> > >
>>>> > > The logs on the brokers are equally unhelpful: they show only
>>>> > > frequent connects and reconnects, without any apparent cause.
>>>> > >
>>>> > > What could be causing this behavior?
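
On the point earlier in the thread about 16 partitions and 23 consumers: here is a minimal sketch of why 7 consumer threads end up idle. It assumes a range-style assignment, where partitions are handed out in contiguous blocks across the consumer threads in order. The function name `range_assign` and the integer thread indexes are illustrative only, not Kafka's API, and the assignment strategy actually configured for the group may differ.

```python
def range_assign(num_partitions, num_consumers):
    """Split partitions 0..num_partitions-1 into contiguous blocks,
    one block per consumer thread (hypothetical helper, not Kafka code)."""
    # Every thread gets per_consumer partitions; the first `extra`
    # threads get one more to absorb the remainder.
    per_consumer, extra = divmod(num_partitions, num_consumers)
    assignment = {}
    for i in range(num_consumers):
        start = per_consumer * i + min(i, extra)
        count = per_consumer + (1 if i < extra else 0)
        assignment[i] = list(range(start, start + count))
    return assignment


# The numbers from the thread: 16 partitions, 23 consumer threads.
assignment = range_assign(16, 23)
idle = [i for i, parts in assignment.items() if not parts]
print(len(idle))  # 7 threads own no partitions
```

With more threads than partitions, each partition still has exactly one owner, so the surplus threads simply receive an empty block. Dropping the thread count to 16 or fewer (or adding partitions) leaves no thread idle under this scheme.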