Hi,

We are using Spark Streaming 1.6.2 and have come across some odd behavior. Our system pulls log event data from Flume servers, enriches the events, and saves them to Elasticsearch (ES). We use a window interval of 15 seconds, and the rate at peak hours is around 70K events.
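For reference, a simplified sketch of the shape of our pipeline (the hostnames, port, index name, and the enrich function below are placeholders, not our actual values):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils
    import org.elasticsearch.spark.rdd.EsSpark

    val conf = new SparkConf()
      .setAppName("flume-to-es")
      .set("es.nodes", "es-host:9200")  // placeholder ES endpoint

    // 15-second batch/window interval, as described above
    val ssc = new StreamingContext(conf, Seconds(15))

    // Receive events pushed by the Flume agents (host/port are placeholders)
    val events = FlumeUtils.createStream(ssc, "flume-host", 4545)

    // Stand-in for our actual enrichment logic
    def enrich(raw: String): Map[String, String] = Map("message" -> raw)

    val enriched = events.map(e => enrich(new String(e.event.getBody.array())))

    // Index each batch into ES (index/type is a placeholder)
    enriched.foreachRDD(rdd => EsSpark.saveToEs(rdd, "logs/event"))

    ssc.start()
    ssc.awaitTermination()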
The average time to process a window's data and index it into ES is about 12 seconds, but every 4-5 window intervals we see a spike to 18-22 seconds. The Spark UI shows strange behavior: most of the time each executor indexes a few thousand records to ES, around 5 MB in total, but when the spike happens, two jobs are created to index data to ES, and the second job takes 6-9 seconds to index a single record of roughly 1800 MB.

Two points I would like to clarify:
1. All of our original events are 3 KB - 5 KB in size.
2. When we change the application to save the RDD as a text file (which, of course, takes less time than ES), we see the same odd behavior, with a spike every 4-5 window intervals; see the variant below.
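For point 2, the text-file variant looks roughly like this (the output path is a placeholder):

    // Same pipeline, but saving each batch as text files instead of indexing to ES
    enriched.foreachRDD { (rdd, time) =>
      rdd.saveAsTextFile(s"hdfs:///tmp/log-events/${time.milliseconds}")  // placeholder path
    }

Even with this variant the spike recurs every 4-5 windows, so it does not seem ES-specific.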