Hi: I am working with Spark Structured Streaming (2.2.1), reading data from Kafka (0.11).
I need to aggregate the data ingested every minute, and I am using spark-shell at the moment. The message ingestion rate is approximately 500k records/second. During some trigger intervals (1 minute), especially when the streaming query first starts, all tasks finish within 20 seconds, but during other triggers it takes up to 90 seconds.

I have tried reducing the number of partitions (from 300 to approximately 100) to reduce the number of Kafka consumers, but that has not helped. I also tried setting kafkaConsumer.pollTimeoutMs to 30 seconds, but then I see a lot of `java.util.concurrent.TimeoutException: Cannot fetch record for offset` errors. So I wanted to see if anyone has any thoughts/recommendations. Thanks
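For context, the query looks roughly like the sketch below (broker and topic names are placeholders, and `maxOffsetsPerTrigger` is an extra knob I have been considering to cap the batch size per trigger; my actual aggregation is more involved than a simple count):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("kafka-minute-agg").getOrCreate()
import spark.implicits._

// Read from Kafka; broker and topic names here are placeholders.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder
  .option("subscribe", "events")                        // placeholder
  .option("kafkaConsumer.pollTimeoutMs", "30000")       // raised poll timeout (triggers the TimeoutException for me)
  .option("maxOffsetsPerTrigger", "30000000")           // optional cap: ~500k/s * 60s per trigger
  .load()

// One-minute tumbling-window count as a stand-in for my real aggregation.
val counts = events
  .select($"timestamp")
  .withWatermark("timestamp", "2 minutes")
  .groupBy(window($"timestamp", "1 minute"))
  .count()

val query = counts.writeStream
  .outputMode("update")
  .format("console")
  .trigger(Trigger.ProcessingTime("1 minute"))
  .start()

query.awaitTermination()
```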