We have a use case with the following design in Spark Streaming. Within each batch, * data is read and partitioned by some key * forEachPartition is used to process the entire partition * within each partition, there are several REST clients created to connect to different REST services * for the list of records within each partition, it will call these services, each service call is independent of others; records are just pre-partitioned to make these calls more efficiently.
I have a question * Since each call is time taking and to prevent the calls to be executed sequentially, how can I parallelize the service calls within processing of each partition? Can I just use Scala future within forEachPartition(or mapPartitions)? Any suggestions greatly appreciated. Chen