Re: Frequent Consumer and Producer Disconnects

Todd Palino Sat, 26 Sep 2015 11:23:43 -0700

Topic creation should only cause a rebalance for wildcard consumers (and I
believe that is regardless of whether or not the wildcard covers the topic
- once the ZK watch fires, a rebalance is going to happen).


Back to the original concern, it would be helpful to see more of that log,
in that case. When a rebalance is triggered, there will be a log message
that will indicate why. This is going to be caused by a change in the group
membership (which has a number of causes, but at least it narrows it down)
or a topic change. Figuring out why the consumers are rebalancing is the
first step to trying to reduce it.

-Todd


On Saturday, September 26, 2015, noah <iamn...@gmail.com> wrote:

> Thanks, that gives us some more to look at.
>
> That is unfortunately a small section of the log file. When we hit this
> problem (which is not every time), it will continue like that for hours.
>
> We also still have developers creating topics semi-regularly, which it
> seems like can cause the high level consumer to disconnect?
>
>
> On Fri, Sep 25, 2015 at 6:16 PM Todd Palino <tpal...@gmail.com> wrote:
>
>> That rebalance cycle doesn't look endless. I see that you started 23
>> consumers, and I see 23 rebalances finishing successfully, which is
>> correct. You will see rebalance messages from all of the consumers you
>> started. It all happens within about 2 seconds, which is fine. I agree that
>> there are a lot of log messages, but I'm not seeing anything that is
>> particularly a problem here. After the segment of log you provided, your
>> consumers will be running properly. Now, given you have a topic with 16
>> partitions, and you're running 23 consumers, 7 of those consumer threads
>> are going to be idle because they do not own partitions.
>>
>> -Todd
>>
>>
>> On Fri, Sep 25, 2015 at 3:27 PM, noah <iamn...@gmail.com> wrote:
>>
>>> We're seeing this the most on developer machines that are starting up
>>> multiple high level consumers on the same topic+group as part of service
>>> startup. The consumers do not seem to get a chance to consume anything
>>> before they disconnect.
>>>
>>> These are developer topics, so it is possible/likely that there isn't
>>> anything for them to consume in the topic, but the same service will start
>>> producing, so I would expect them to not be idle for long.
>>>
>>> Could it be the way we are bringing up multiple consumers at the same time
>>> is hitting some sort of endless rebalance cycle? And/or the resulting
>>> thrashing is causing them to time out, rebalance, etc.?
>>>
>>> I've tried attaching the logs again. Thanks!
>>>
>>> On Fri, Sep 25, 2015 at 3:33 PM Todd Palino <tpal...@gmail.com> wrote:
>>>
>>>> I don't see the logs attached, but what does the GC look like in your
>>>> applications? A lot of times this is caused (at least on the consumer
>>>> side) by the Zookeeper session expiring due to excessive GC activity,
>>>> which causes the consumers to go into a rebalance and change up their
>>>> connections.
>>>>
>>>> -Todd
>>>>
>>>>
>>>> On Fri, Sep 25, 2015 at 1:25 PM, Gwen Shapira <g...@confluent.io> wrote:
>>>>
>>>> > How busy are the clients?
>>>> >
>>>> > The brokers occasionally close idle connections, this is normal and
>>>> > typically not something to worry about.
>>>> > However, this shouldn't happen to consumers that are actively reading
>>>> > data.
>>>> >
>>>> > I'm wondering if the "consumers not making any progress" could be due
>>>> > to a different issue, and because they are idle, the connection closes
>>>> > (vs the other way around).
>>>> >
>>>> > On Thu, Sep 24, 2015 at 2:32 PM, noah <iamn...@gmail.com> wrote:
>>>> >
>>>> > > We are having issues with producers and consumers frequently fully
>>>> > > disconnecting (from both the brokers and ZK) and reconnecting
>>>> > > without any apparent cause. On our production systems it can happen
>>>> > > anywhere from every 10-15 seconds to 15-20 minutes. On our less beefy
>>>> > > test systems and developer laptops, it can happen almost constantly.
>>>> > >
>>>> > > We see no errors in the logs (sample attached), just a message for
>>>> > > each of our consumers and producers disconnecting, then reconnecting.
>>>> > > On the systems where it happens constantly, the consumers are not
>>>> > > making any progress.
>>>> > >
>>>> > > The logs on the brokers are equally unhelpful, they show only
>>>> > > frequent connects and reconnects, without any apparent cause.
>>>> > >
>>>> > > What could be causing this behavior?
>>>> > >
>>>> > >
>>>> >
>>>>
>>>
>>
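
The arithmetic in the quoted reply (16 partitions, 23 consumer threads, 7 of
them idle) can be illustrated with a simplified range-style assignment. This
is only a sketch of the idea, not Kafka's actual assignment code; the function
name range_assign and its parameters are made up for illustration.

```python
def range_assign(num_partitions, num_threads):
    """Map each consumer thread index to a contiguous list of partition ids.

    Simplified range-style assignment: partitions are split as evenly as
    possible, and the first `extra` threads each take one additional partition.
    """
    per_thread, extra = divmod(num_partitions, num_threads)
    assignment = {}
    for i in range(num_threads):
        start = per_thread * i + min(i, extra)
        count = per_thread + (1 if i < extra else 0)
        assignment[i] = list(range(start, start + count))
    return assignment


# The case from the thread: one topic with 16 partitions, 23 consumer threads.
assignment = range_assign(16, 23)
idle = [thread for thread, parts in assignment.items() if not parts]
print(len(idle))  # 7
```

Every partition is owned by exactly one thread, so any threads beyond the
partition count end up with nothing to read.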

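The GC question earlier in the thread comes down to comparing pause lengths
against the consumer's ZooKeeper session timeout. A rough sketch of that
check follows. It assumes the timeout is 6000 ms (which I believe is the
default for zookeeper.session.timeout.ms in the high level consumer, but
check your own config), and the pause values are made-up numbers rather than
anything from the logs in this thread.

```python
# Assumed value of zookeeper.session.timeout.ms; verify against your config.
SESSION_TIMEOUT_MS = 6000


def pauses_risking_expiry(pause_ms, session_timeout_ms=SESSION_TIMEOUT_MS):
    """Return the GC pauses at least as long as the ZK session timeout.

    A pause this long stops the client from heartbeating for the whole
    timeout window, so the session can expire and trigger a rebalance.
    Shorter pauses may also be risky; this is a deliberately coarse check.
    """
    return [p for p in pause_ms if p >= session_timeout_ms]


# Hypothetical pause durations (ms), e.g. pulled out of a GC log by hand.
sample_pauses_ms = [120, 450, 7300, 90, 6100]
risky = pauses_risking_expiry(sample_pauses_ms)
print(risky)  # [7300, 6100]
```
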