Jun,

We are already past the retention period, so we can't go back and run
DumpLogSegments. There are also other factors that make this exercise
difficult:
1) this topic has a very high traffic volume
2) we don't know the offset of the corrupted message

Anyhow, it doesn't happen often, but can you advise on the proper
action/handling in this case? Are there any other exceptions from
iterator.next() that we should handle?
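
To make the question concrete, here is roughly the catch-and-skip handling
I have in mind. This is just a minimal sketch: I'm assuming the CRC failure
surfaces as kafka.message.InvalidMessageException, and I'm not sure whether
the iterator is still usable after it throws.

import kafka.consumer.ConsumerIterator;
import kafka.message.InvalidMessageException;
import kafka.message.MessageAndMetadata;

public class SkipCorruptMessages {
    // Drain a high-level consumer stream, skipping corrupt messages.
    public static void consume(ConsumerIterator<byte[], byte[]> it) {
        while (it.hasNext()) {
            try {
                MessageAndMetadata<byte[], byte[]> msg = it.next();
                handle(msg.message());
            } catch (InvalidMessageException e) {
                // CRC mismatch: log it and move on to the next message?
                System.err.println("skipping corrupt message: " + e.getMessage());
            }
        }
    }

    private static void handle(byte[] payload) {
        // application logic
    }
}

If the iterator is left in a failed state after throwing (which is what
exception 2 in my earlier mail suggests), catching alone won't help and we
would have to recreate the stream instead.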

Thanks,
Steven



On Wed, Feb 4, 2015 at 9:33 PM, Jun Rao <j...@confluent.io> wrote:

> 1) Does the corruption happen with the console consumer as well? If so,
> could you run the DumpLogSegments tool to see if the data is corrupted on
> disk?
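>
> For reference, the invocation would be something like the following (the
> log path here is just an illustration; point --files at the actual segment
> file under your log directory):
>
> bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
>   --files /var/kafka-logs/<topic>-<partition>/00000000000000000000.log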
>
> Thanks,
>
> Jun
>
>
> On Wed, Feb 4, 2015 at 9:55 AM, Steven Wu <stevenz...@gmail.com> wrote:
>
> > Hi,
> >
> > We have recently observed these two exceptions from the consumer
> > *iterator.next()* and want to ask how we should handle them properly.
> >
> > *1) CRC corruption*
> > Message is corrupt (stored crc = 433657556, computed crc = 3265543163)
> >
> > I assume in this case we should just catch it and move on to the next
> > message? Are there any other iterator/consumer exceptions we should
> > catch and handle?
> >
> >
> > *2) Unrecoverable consumer error with "Iterator is in failed state"*
> >
> > Yesterday, one of our Kafka consumers got stuck with a very large max
> > lag and was throwing the following exception:
> >
> > 2015-02-04 08:35:19,841 ERROR KafkaConsumer-0 KafkaConsumer - Exception
> > on consuming kafka with topic: <topic_name>
> > java.lang.IllegalStateException: Iterator is in failed state
> >         at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:54)
> >         at kafka.utils.IteratorTemplate.next(IteratorTemplate.scala:38)
> >         at kafka.consumer.ConsumerIterator.next(ConsumerIterator.scala:46)
> >         at com.netflix.suro.input.kafka.KafkaConsumer$1.run(KafkaConsumer.java:103)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> > We had a surge of traffic on <topic_name>, so I guess the traffic storm
> > caused the problem. I tried restarting a few consumer instances, but
> > after rebalancing, another instance was assigned the problematic
> > partitions and got stuck again with the same errors.
> >
> > We decided to drop messages: we stopped all consumer instances, reset
> > all offsets by deleting the ZK entries, and restarted them. The problem
> > went away.
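> >
> > For the record, the reset amounted to something like the sketch below
> > (group/topic names and the ZK connect string are placeholders; the path
> > assumes the 0.8 high-level consumer's ZooKeeper layout):
> >
> > import org.I0Itec.zkclient.ZkClient;
> >
> > public class ResetOffsets {
> >     public static void main(String[] args) {
> >         ZkClient zk = new ZkClient("zk-host:2181", 10000, 10000);
> >         try {
> >             // removes /consumers/<group>/offsets/<topic> recursively, so
> >             // consumers fall back to auto.offset.reset on restart
> >             zk.deleteRecursive("/consumers/<group>/offsets/<topic>");
> >         } finally {
> >             zk.close();
> >         }
> >     }
> > }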
> >
> > Producer version is kafka_2.8.2-0.8.1.1 with snappy-java-1.0.5
> > Consumer version is kafka_2.9.2-0.8.2-beta with snappy-java-1.1.1.6
> >
> > We googled this issue, but it was already fixed a long time ago in
> > 0.7.x. Any ideas? Is the mismatched snappy version the culprit? Or is it
> > a bug in 0.8.2-beta?
> >
> >
> > Thanks,
> > Steven
> >
>
