One of our consumers keeps getting an InvalidMessageSizeException. I'm
pretty sure we don't have any message anywhere near that size (1.7 GB). Two
other consumer groups have been happily consuming messages from the same
Kafka instance over the last few days.

Since we keep the logs around for a fixed retention interval and this
consumer group has fallen pretty far behind, is it possible that log
truncation is somehow causing this? We are on Kafka 0.6, BTW.
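For reference, this is roughly how I'm reading the stuck offset out of
ZooKeeper to see how far behind we are. It's just a sketch: the connect
string, group, topic, and the "<brokerId>-<partition>" node name are
placeholders, and it uses the plain ZooKeeper Java client rather than
anything Kafka-specific.

    import org.apache.zookeeper.ZooKeeper;

    public class ShowConsumerOffset {
        public static void main(String[] args) throws Exception {
            // Placeholders -- substitute the real ZK ensemble, group, topic,
            // and "<brokerId>-<partition>" node name.
            String zkConnect = "zkhost:2181";
            String offsetNode = "/consumers/my-group/offsets/my-topic/0-0";

            ZooKeeper zk = new ZooKeeper(zkConnect, 30000, null);
            try {
                // The consumer stores its offset as a plain string in the znode data.
                byte[] data = zk.getData(offsetNode, false, null);
                System.out.println(offsetNode + " = " + new String(data, "UTF-8"));
            } finally {
                zk.close();
            }
        }
    }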

At this point, I'm inclined to wipe out the "/consumers/<group>/offsets"
node in ZooKeeper to get this system going again. Would that be the
preferred way of getting out of this bad state?
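If wiping is indeed the way to go, the plan would be roughly the following
(again only a sketch with a placeholder ZooKeeper connect string and group
name, to be run with the group's consumers stopped first): walk the
per-topic and per-partition children under the offsets node and delete them
leaf-first with the plain ZooKeeper Java client.

    import org.apache.zookeeper.ZooKeeper;

    public class WipeConsumerOffsets {
        public static void main(String[] args) throws Exception {
            // Placeholders -- substitute the real ZK ensemble and consumer group.
            String zkConnect = "zkhost:2181";
            String offsetsRoot = "/consumers/my-group/offsets";

            ZooKeeper zk = new ZooKeeper(zkConnect, 30000, null);
            try {
                // Delete /consumers/<group>/offsets/<topic>/<brokerId-partition>
                // leaf-first, since ZooKeeper won't delete a node with children.
                for (String topic : zk.getChildren(offsetsRoot, false)) {
                    String topicPath = offsetsRoot + "/" + topic;
                    for (String partition : zk.getChildren(topicPath, false)) {
                        zk.delete(topicPath + "/" + partition, -1); // -1 = any version
                    }
                    zk.delete(topicPath, -1);
                }
            } finally {
                zk.close();
            }
        }
    }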

Let me know if there are any other troubleshooting/diagnostic steps I can
run on the system before I reboot!

Manish

[$DATE] ERROR k.c.FetcherRunnable [] - error in FetcherRunnable
kafka.common.InvalidMessageSizeException: invalid message size:1852339316 only received bytes:307196 at 0 possible causes (1) a single message larger than the fetch size; (2) log corruption
        at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:75) ~[rookery-vacuum.jar:na]
        at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:61) ~[rookery-vacuum.jar:na]
        at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:58) ~[rookery-vacuum.jar:na]
        at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:50) ~[rookery-vacuum.jar:na]
        at kafka.message.ByteBufferMessageSet.validBytes(ByteBufferMessageSet.scala:49) ~[rookery-vacuum.jar:na]
        at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:70) ~[rookery-vacuum.jar:na]
        at kafka.consumer.FetcherRunnable$$anonfun$run$4.apply(FetcherRunnable.scala:80) ~[rookery-vacuum.jar:na]
        at kafka.con...
