Hi,

Please correct me if I'm wrong: in Spark Streaming, the next batch will not start processing until the previous batch has completed. Is there any way to start processing the next batch when the previous batch takes longer to process than the batch interval?
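To make the problem concrete, here is a toy simulation (plain Python, not Spark code; the interval and processing-time numbers are hypothetical) of how scheduling delay grows without bound when each batch takes longer than the batch interval and batches are processed strictly one at a time:

```python
# Toy model of serial batch processing: batches arrive every `interval`
# seconds, each takes `proc_time` seconds, and only one batch runs at a time.
def scheduling_delays(n_batches, interval, proc_time):
    """Return, per batch, the delay between its arrival and when processing starts."""
    delays = []
    next_free = 0.0  # time at which the single processing slot becomes free
    for i in range(n_batches):
        arrival = i * interval
        start = max(arrival, next_free)  # must wait for the previous batch
        delays.append(start - arrival)
        next_free = start + proc_time
    return delays

# Hypothetical numbers: 10 s batch interval, 15 s processing time.
print(scheduling_delays(5, 10.0, 15.0))  # -> [0.0, 5.0, 10.0, 15.0, 20.0]
```

Each batch ends up waiting 5 s longer than the one before it, even though the hardware might be idle during parts of each batch's run.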
The problem I am facing is that I don't see a hardware bottleneck in my Spark cluster, yet Spark cannot keep up with the amount of data I am pumping through it (the batch processing time is longer than the batch interval). What I see are spikes of CPU, network, and disk I/O usage, which I assume correspond to different stages of a job, but on average the hardware is underutilized. Concurrent batch processing would allow the average batch processing time to exceed the batch interval while still fully utilizing the hardware.

Any ideas on what can be done? One option I can think of is to split the application into multiple applications running concurrently and divide the initial stream of data between them. However, I would then lose the benefits of having a single application.

Thank you,
Matus