Hi all, I'm running Spark Streaming with a Kafka direct stream, but after it has been running for a couple of days, the batch processing time almost doubles. The JVM GC logs show no slowdown, but I did find that the time spent reading Spark broadcast variables keeps increasing: initially it takes less than 10 ms, but after 3 days it takes more than 60 ms. This is really puzzling, since I don't use broadcast variables at all.
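One way to track the broadcast read-time trend over several days is to extract the timings from the executor logs. The following is a minimal sketch, assuming the executor logs contain INFO lines of the form `Reading broadcast variable <id> took <n> ms` (the exact wording may differ between Spark versions). `broadcast_read_times` and `median_by_window` are hypothetical helper names, not part of Spark.

```python
import re
from statistics import median

# Assumed log format (may vary by Spark version), e.g.:
#   "... INFO TorrentBroadcast: Reading broadcast variable 42 took 8 ms"
BROADCAST_READ = re.compile(r"Reading broadcast variable (\d+) took\s*(\d+) ms")


def broadcast_read_times(lines):
    """Yield (broadcast_id, millis) for each matching log line, in file order."""
    for line in lines:
        m = BROADCAST_READ.search(line)
        if m:
            yield int(m.group(1)), int(m.group(2))


def median_by_window(times, window):
    """Return the median read time of each consecutive chunk of `window` samples."""
    if window < 1:
        raise ValueError("window must be >= 1")
    millis = [ms for _, ms in times]
    return [median(millis[i:i + window]) for i in range(0, len(millis), window)]
```

For example, passing an executor's stderr log with `broadcast_read_times(open(path))` and comparing the per-window medians from the first day against those from the third day should indicate whether the growth is gradual or sudden.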
My application needs to run 24/7, so I hope I'm just missing something that would correct this behavior. FYI, we're running on AWS EMR with Spark 1.6.1 in YARN client mode. I've attached the Spark application's environment settings file.

--
John Simon

environment.txt (7K) <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/27138/0/environment.txt>

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Long-Running-Spark-Streaming-getting-slower-tp27138.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.