Hi,

Please correct me if I'm wrong, but in Spark Streaming the next batch
will not start processing until the previous batch has completed. Is
there any way to start processing the next batch when the previous
batch takes longer to process than the batch interval?

The problem I am facing is that I don't see a hardware bottleneck in
my Spark cluster, yet Spark cannot keep up with the amount of data I
am pumping through (the batch processing time is longer than the
batch interval). What I see are spikes of CPU, network, and disk I/O
usage, which I assume correspond to different stages of a job, but on
average the hardware is underutilized. Processing batches
concurrently would allow the average batch processing time to exceed
the batch interval while still fully utilizing the hardware.

Any ideas on what can be done? One option I can think of is to split
the application into multiple applications running concurrently and
to divide the initial stream of data among them (sketched below).
However, I would then lose the benefits of having a single
application.
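
As a rough sketch of what I mean, assuming a Kafka source and a
made-up one-topic-per-shard scheme (the topic names, ZooKeeper host,
and consumer-group naming below are illustrative only):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Run N copies of this application, passing each a different
    // shard id. Upstream, the producer would write to N topics
    // ("events-0" ... "events-<N-1>"), so each instance handles
    // 1/N of the data.
    object ShardedStreamingJob {
      def main(args: Array[String]): Unit = {
        val shard = args(0) // e.g. "0", "1", ...
        val conf = new SparkConf().setAppName(s"StreamingJob-$shard")
        val ssc = new StreamingContext(conf, Seconds(5))

        val stream = KafkaUtils.createStream(
          ssc,
          "zk-host:2181",             // ZooKeeper quorum (placeholder)
          s"consumer-group-$shard",   // one consumer group per instance
          Map(s"events-$shard" -> 1)) // each instance reads its own topic

        // Placeholder processing on the message values.
        stream.map(_._2).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }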

Thank you,
Matus
