We have a use case with the following design in Spark Streaming.

Within each batch,
* data is read and partitioned by some key
* forEachPartition is used to process the entire partition
* within each partition, there are several REST clients created to connect
to different REST services
* for the list of records within each partition, it will call these
services, each service call is independent of others; records are just
pre-partitioned to make these calls more efficiently.

I have a question
* Since each call is time taking and to prevent the calls to be executed
sequentially, how can I parallelize the service calls within processing of
each partition? Can I just use Scala future within forEachPartition(or
mapPartitions)?

Any suggestions greatly appreciated.

Chen

Reply via email to