Could any of the experts kindly advise?

On Fri, May 19, 2017 at 6:00 PM, Jayadeep J <jayade...@gmail.com> wrote:

> Hi,
>
> I would appreciate some advice regarding an issue we are facing in
> Streaming Kafka Direct Consumer.
>
> We recently upgraded our Kafka Direct Stream application to Spark 2
> (spark-streaming-kafka-0-10, version 2.1.0) with Kafka 0.10.0.0. We see
> abnormal delays after the application has run for a couple of hours and
> consumed ~10 million records. The processing time suddenly jumps from
> ~15 seconds (usual for our app) to ~3 minutes, and from then on it keeps
> degrading, though without any failure.
>
> We have seen that the delay is due to certain tasks taking exactly the
> configured 'request.timeout.ms' of the Kafka consumer. We confirmed this
> by varying the timeout property. It looks like
> get(offset: Long, timeout: Long): ConsumerRecord[K, V] and the subsequent
> poll(timeout) call in CachedKafkaConsumer.scala are actually timing out on
> some of the partitions without reading any data, yet the executor logs the
> task as successfully completed after exactly the timeout duration. Note
> that most other tasks complete successfully within milliseconds. We found
> "org.apache.kafka.common.errors.DisconnectException" in the DEBUG logs
> without any actual failure. The Kafka issue 'KafkaConsumer susceptible to
> FetchResponse starvation' [KAFKA-4753] seems to be the underlying cause.
>
> Could anyone kindly say whether this is normal behaviour for Spark?
> Shouldn't Spark throw a timeout error or maybe fail the tasks in such
> cases? Currently the tasks appear to succeed and the job progresses very
> slowly. Thanks for your help.
>
> Thanks
> Jay
>
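For readers following the thread, here is a minimal sketch of where the two timeouts discussed above are usually set. It is only an illustration: the broker address, group id, and every numeric value are placeholders and do not come from the original message. The sketch assumes that 'request.timeout.ms' is passed through the Kafka parameter map given to the direct stream, and that the executor-side poll timeout is controlled by the Spark property 'spark.streaming.kafka.consumer.poll.ms' described in the Kafka 0.10 integration guide. It shows where the knobs live, not a fix for the slowdown.

```scala
// Kafka consumer settings handed to the direct stream.
// All values below are hypothetical placeholders, not recommendations.
val kafkaParams: Map[String, Object] = Map(
  "bootstrap.servers" -> "broker1:9092",   // placeholder broker address
  "group.id" -> "example-group",           // placeholder consumer group
  // The Kafka consumer timeout that the slow tasks were observed to match.
  "request.timeout.ms" -> Int.box(40000)
)

// Spark-side setting (e.g. supplied via --conf on spark-submit).
// Assumed here to be the timeout passed to poll() on the executors.
val sparkSettings: Map[String, String] = Map(
  "spark.streaming.kafka.consumer.poll.ms" -> "10000"
)
```

Varying these two values independently is one way to tell whether a stalled task is waiting on the Kafka request timeout or on the Spark poll timeout.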
