Hello,

We have a Spark Streaming application, and the problem we are encountering is that the batch processing time keeps increasing and eventually causes the application to start lagging. I am hoping someone here can point me to an underlying cause of why this might happen.
The batch interval is currently 1 minute, and the app does some maps, filters, joins, and reduceByKeyAndWindow operations. All the reduces are invertible functions, so we provide the inverse-reduce functions in all of them. The largest window size we have right now is 1 hour.

When the app is started, the batch processing time is between 20 and 30 seconds. It keeps creeping up slowly, and by the time it hits the 1-hour mark it is somewhere around 35-40 seconds. That is somewhat expected, and still not bad! Since the largest window we have is 1 hour long, I would expect the application to stabilize around the 1-hour mark and process subsequent batches within that 35-40 second zone. However, that is not what is happening: the processing time keeps increasing, and within a few hours it exceeds the 1-minute batch interval, at which point the app starts lagging. Eventually the lag builds up to several minutes and we have to restart the system.

Any pointers on why this could be happening and what we can do to troubleshoot further?

Thanks,
Nikunj
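P.S. To clarify what I mean by the invertible reduces: conceptually, each slide of the window adds the newest batch and subtracts the batch that slid out, rather than re-reducing the whole window. Here is a plain-Python sketch of that mechanic with made-up data (not our actual Spark code; our real reduces run over key-value DStreams):

```python
from collections import deque

def windowed_counts(batches, window_len):
    """Simulate a reduceByKeyAndWindow with an inverse function:
    add the newest batch, subtract the batch that slid out."""
    window = deque()   # batches currently inside the window
    totals = {}        # running per-key totals over the window
    results = []
    for batch in batches:
        window.append(batch)
        for key, n in batch.items():       # reduce: add the new batch
            totals[key] = totals.get(key, 0) + n
        if len(window) > window_len:
            old = window.popleft()
            for key, n in old.items():     # inverse reduce: subtract the old batch
                totals[key] -= n
                # Note: keys are never dropped here, even once their count
                # reaches 0, so state grows with every key ever seen -- the
                # Spark analogue is the filterFunc argument to
                # reduceByKeyAndWindow.
        results.append(dict(totals))
    return results

# Four small "batches", window of 2 batches; after batch 4 the key "a"
# has slid out of the window but is still carried in the state at 0.
out = windowed_counts([{"a": 1}, {"a": 2, "b": 1}, {"b": 3}, {"b": 1}],
                      window_len=2)
# → final window totals: {"a": 0, "b": 4}
```

One thing we are not sure about is whether keys that decay to zero like this could be part of our problem, since we do not currently pass a filter function to remove them.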