Thanks for the response. Reading through that thread, it appears this issue was addressed with KAFKA-3810 <https://issues.apache.org/jira/browse/KAFKA-3810>, which relaxes the fetch size restriction for replication of internal topics so that an oversized group metadata message can still be replicated. However, should the outcome instead be a more comprehensive change to the serialization format of the request? The size of the group metadata message currently grows linearly with the number of topic-partitions in the assignment, which is difficult to tune for in a deployment that uses topic auto creation, since the eventual partition count isn't known up front.
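To make that growth concrete, here is a rough back-of-envelope sketch of how the single group metadata message written to __consumer_offsets scales with group size and assignment size. The per-field byte counts are my own assumptions for illustration, not Kafka's actual serialization format; the member and partition counts come from the configuration in the quoted thread below.

```java
// Back-of-envelope estimate of the group metadata message size appended to
// __consumer_offsets on each rebalance. The per-field byte counts are
// illustrative assumptions, not Kafka's actual wire format.
public class GroupMetadataSizeEstimate {

    static long estimateBytes(int members, int partitionsPerMember,
                              int avgTopicNameLen, int avgClientIdLen) {
        // Per member: member id, client id, host, timeouts (assumed ~64 bytes of overhead)
        long perMemberFixed = 2L * avgClientIdLen + 64;
        // Per member: subscription metadata (topic names plus protocol overhead, assumed)
        long perMemberSubscription = avgTopicNameLen + 16;
        // Per member: assignment, roughly topic name + 4 bytes per assigned partition index (assumed)
        long perMemberAssignment = avgTopicNameLen + 4L * partitionsPerMember + 16;
        return members * (perMemberFixed + perMemberSubscription + perMemberAssignment);
    }

    public static void main(String[] args) {
        // Numbers from the thread below: 4000 consumers in "my_group" sharing 800 partitions,
        // so most members own at most one partition.
        long bytes = estimateBytes(4000, 1, "my_topic".length(), 32);
        System.out.printf("roughly %d KB per generation, against a ~1 MB default message limit%n",
                bytes / 1024);
    }
}
```

Even with these conservative assumptions the per-member bookkeeping alone approaches the default limit at 4000 members, which lines up with the RecordTooLargeException in the coordinator logs below.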
On Fri, Mar 17, 2017 at 3:17 AM, James Cheng <wushuja...@gmail.com> wrote:

> I think it's due to the high number of partitions and the high number of
> consumers in the group. The group coordination info to keep track of the
> assignments actually happens via a message that travels through the
> __consumer_offsets topic. So with so many partitions and consumers, the
> message gets too big to go through the topic.
>
> There is a long thread here that discusses it. I don't remember what
> specific actions came out of that discussion.
> http://search-hadoop.com/m/Kafka/uyzND1yd26N1rFtRd1?subj=+DISCUSS+scalability+limits+in+the+coordinator
>
> -James
>
> Sent from my iPhone
>
> > On Mar 15, 2017, at 9:40 AM, Robert Quinlivan <rquinli...@signal.co> wrote:
> >
> > I should also mention that this error was seen on broker version 0.10.1.1.
> > I found that this condition sounds somewhat similar to KAFKA-4362
> > <https://issues.apache.org/jira/browse/KAFKA-4362>, but that issue was
> > submitted in 0.10.1.1 so they appear to be different issues.
> >
> > On Wed, Mar 15, 2017 at 11:11 AM, Robert Quinlivan <rquinli...@signal.co>
> > wrote:
> >
> >> Good morning,
> >>
> >> I'm hoping for some help understanding the expected behavior for an
> >> offset commit request and why this request might fail on the broker.
> >>
> >> *Context:*
> >>
> >> For context, my configuration looks like this:
> >>
> >>    - Three brokers
> >>    - Consumer offsets topic replication factor set to 3
> >>    - Auto commit enabled
> >>    - The user application topic, which I will call "my_topic", has a
> >>    replication factor of 3 as well and 800 partitions
> >>    - 4000 consumers attached in consumer group "my_group"
> >>
> >> *Issue:*
> >>
> >> When I attach the consumers, the coordinator logs the following error
> >> message repeatedly for each generation:
> >>
> >> ERROR [Group Metadata Manager on Broker 0]: Appending metadata message
> >> for group my_group generation 2066 failed due to
> >> org.apache.kafka.common.errors.RecordTooLargeException, returning
> >> UNKNOWN error code to the client (kafka.coordinator.GroupMetadataManager)
> >>
> >> *Observed behavior:*
> >>
> >> The consumer group does not stay connected long enough to consume
> >> messages. It is effectively stuck in a rebalance loop and the "my_topic"
> >> data has become unavailable.
> >>
> >> *Investigation:*
> >>
> >> Following the Group Metadata Manager code, it looks like the broker is
> >> writing to a cache after it writes an Offset Commit Request to the log
> >> file. If this cache write fails, the broker then logs this error and
> >> returns an error code in the response. In this case, the error from the
> >> cache is MESSAGE_TOO_LARGE, which is logged as a RecordTooLargeException.
> >> However, the broker then sets the error code to UNKNOWN on the Offset
> >> Commit Response.
> >>
> >> It seems that the issue is the size of the metadata in the Offset Commit
> >> Request. I have the following questions:
> >>
> >> 1. What is the size limit for this request? Are we exceeding the size
> >> which is causing this request to fail?
> >> 2. If this is an issue with metadata size, what would cause abnormally
> >> large metadata?
> >> 3. How is this cache used within the broker?
> >>
> >> Thanks in advance for any insights you can provide.
> >>
> >> Regards,
> >> Robert Quinlivan
> >> Software Engineer, Signal
> >
> >
> > --
> > Robert Quinlivan
> > Software Engineer, Signal

--
Robert Quinlivan
Software Engineer, Signal
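P.S. Until there is a better answer on the serialization format, the workaround I'm planning to try is raising the maximum message size on the __consumer_offsets topic itself so the group metadata message fits. That is my own reading of the append path, not something confirmed upthread. On 0.10.x this would be a kafka-configs.sh topic-level override; the sketch below uses the newer Java AdminClient (not available on 0.10.1.1) just to show the idea, with a placeholder broker address and an arbitrary 2 MB limit. Note that before the KAFKA-3810 change, replica.fetch.max.bytes would also need to be at least as large for the offsets topic to keep replicating.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RaiseOffsetsTopicMessageLimit {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address; substitute one of the actual brokers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource offsetsTopic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "__consumer_offsets");
            // 2 MB is an arbitrary example value; size it to the observed group metadata message.
            AlterConfigOp raiseLimit = new AlterConfigOp(
                    new ConfigEntry("max.message.bytes", "2097152"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(offsetsTopic, Collections.singletonList(raiseLimit)))
                 .all()
                 .get();
        }
    }
}
```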