replication.factor set to match source topics. (3 in our case).

what do you mean by retires? I don't see retries property in StreamConfig
class.

On Thu, Oct 5, 2017 at 7:55 PM, Stas Chizhov <schiz...@gmail.com> wrote:

> Hi
>
> Have you set replication.factor and retries properties?
>
> BR
>
> tors 5 okt. 2017 kl. 18:45 skrev Dmitriy Vsekhvalnov <
> dvsekhval...@gmail.com
> >:
>
> > Hi all,
> >
> > we were testing Kafka cluster outages by randomly crashing broker nodes
> (1
> > of 3 for instance) while still keeping majority of replicas available.
> >
> > Time to time our kafka-stream app crashing with exception:
> >
> > [ERROR] [StreamThread-1]
> > [org.apache.kafka.streams.processor.internals.StreamThread]
> [stream-thread
> > [StreamThread-1] Failed while executing StreamTask 0_1 due to flush
> state:
> > ] org.apache.kafka.streams.errors.StreamsException: task [0_1] exception
> > caught when producing at
> >
> > org.apache.kafka.streams.processor.internals.RecordCollectorImpl.
> checkForException(RecordCollectorImpl.java:121)
> > at
> >
> > org.apache.kafka.streams.processor.internals.RecordCollectorImpl.flush(
> RecordCollectorImpl.java:129)
> > at
> >
> > org.apache.kafka.streams.processor.internals.StreamTask.flushState(
> StreamTask.java:422)
> > at
> >
> > org.apache.kafka.streams.processor.internals.StreamThread$4.apply(
> StreamThread.java:555)
> > at
> >
> > org.apache.kafka.streams.processor.internals.
> StreamThread.performOnTasks(StreamThread.java:501)
> > at
> >
> > org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(
> StreamThread.java:551)
> > at
> >
> > org.apache.kafka.streams.processor.internals.StreamThread.
> shutdownTasksAndState(StreamThread.java:449)
> > at
> >
> > org.apache.kafka.streams.processor.internals.StreamThread.shutdown(
> StreamThread.java:391)
> > at
> >
> > org.apache.kafka.streams.processor.internals.
> StreamThread.run(StreamThread.java:372)
> > Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1
> > record(s) for
> >
> > audit-metrics-collector-store_context.operation-message.
> bucketStartMs-message.dc-repartition-0:
> > 30026 ms has passed since batch creation plus linger time
> >
> > after that clearly only restart of app and it continues processing.
> >
> > We believe it is correlated with our outage testing and question is: what
> > are recommended options for make kafka-stream application more resilient
> to
> > broker crashes?
> >
> > Thank you.
> >
>

Reply via email to