I have a simple and probably dumb question about foreachRDD.

We are using Spark Streaming + Cassandra to compute concurrent users every
5 minutes. Our batch interval is 10 seconds and our block interval is 2.5 seconds.

At the end of the pipeline we use foreachRDD to join the data in each RDD
with the existing data in Cassandra, update the counters, and then save the
result back to Cassandra.

To the best of my understanding, in this scenario Spark Streaming produces
one RDD every 10 seconds and foreachRDD processes them sequentially, that is,
the foreachRDD function would never run for two batches at the same time.
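
To make the per-batch step concrete, here is a rough sketch in plain Python.
It is not our actual code and involves no Spark or Cassandra: a dict stands
in for the Cassandra table, the event shape (user_id, epoch_seconds) is
assumed, and the names apply_batch and run_sequentially are made up for
illustration.

```python
from collections import Counter

WINDOW_SECS = 300  # 5-minute window, as in the description above


def apply_batch(store, batch):
    """Read-modify-write for one batch: join with stored counters, update, save.

    `store` is a plain dict standing in for the Cassandra table (assumption).
    `batch` is a list of (user_id, epoch_seconds) events (assumed shape).
    """
    # Count the events in this batch per 5-minute window.
    delta = Counter((ts // WINDOW_SECS) * WINDOW_SECS for _user, ts in batch)
    for window_start, n in delta.items():
        # "Join" with the existing value, then write the updated counter back.
        store[window_start] = store.get(window_start, 0) + n
    return store


def run_sequentially(store, batches):
    # One batch at a time, in arrival order: the behaviour described above.
    for batch in batches:
        apply_batch(store, batch)
    return store
```

If two batches ran this read-modify-write against the same row at the same
time, one update could overwrite the other, which is why the ordering
matters to us.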

Am I right?

Regards,

Luis
