One of our consumers keeps getting an invalid message size exception. I'm pretty sure that we don't have a message size this big (1.7G). We have two other consumer groups consuming messages from the same Kafka instance happily over the last few days.
Since we keep the logs around for a fixed interval and this consumer group has fallen pretty far behind, is it possible that log truncation is somehow causing this? We are on Kafka 0.6, BTW.

At this point, I'm inclined to wipe out the "/consumers/<group>/offsets" node in ZooKeeper to get this system going again. Would that be the preferred way of getting out of this bad state? Let me know if there is any other troubleshooting/diagnostics I can run on the system before I reboot!

Manish

[$DATE] ERROR k.c.FetcherRunnable [] - error in FetcherRunnable
kafka.common.InvalidMessageSizeException: invalid message size:1852339316 only received bytes:307196 at 0 possible causes (1) a single message larger than the fetch size; (2) log corruption
    at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:75) ~[rookery-vacuum.jar:na]
    at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:61) ~[rookery-vacuum.jar:na]
    at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:58) ~[rookery-vacuum.jar:na]
    at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:50) ~[rookery-vacuum.jar:na]
    at kafka.message.ByteBufferMessageSet.validBytes(ByteBufferMessageSet.scala:49) ~[rookery-vacuum.jar:na]
    at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:70) ~[rookery-vacuum.jar:na]
    at kafka.consumer.FetcherRunnable$$anonfun$run$4.apply(FetcherRunnable.scala:80) ~[rookery-vacuum.jar:na]
    at kafka.con...
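For what it's worth, a "size" like 1852339316 (~1.7 GB) is consistent with the truncation theory: if the stored offset no longer lands on a message boundary, the consumer decodes arbitrary payload bytes as the 4-byte length header. A minimal sketch of that failure mode (plain Python for illustration, not Kafka code; the payload bytes are made up):

```python
import struct

# Hypothetical message bytes sitting in the log. If a consumer's saved
# offset points into the middle of a message (e.g. after log truncation),
# the next 4 bytes it reads are payload text, not a real length header.
payload = b"some message body text ..."

# Misaligned read: interpret 4 arbitrary ASCII bytes as a big-endian
# unsigned int, the way a length prefix would be decoded.
bogus_size = struct.unpack(">I", payload[3:7])[0]

print(bogus_size)  # an absurd "message size" in the gigabyte range
```

Printable ASCII bytes all have high numeric values, so a misaligned decode almost always yields a multi-gigabyte "message size" like the one in the log above.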