Thanks TD and Ashish.
On Mon, Oct 5, 2015 at 9:14 PM, Tathagata Das wrote:
You could create a thread pool on demand within the foreachPartition
function, then hand off the REST calls to that thread pool, get back the
futures, and wait for them to finish. Should be pretty straightforward. Make
sure that your foreachPartition function shuts down the thread pool before
finishing.
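
A minimal sketch of that pattern, written in Python for illustration and assuming the partition function receives an iterator of records. The names call_rest_service and process_partition and the max_workers value are hypothetical placeholders, not part of Spark or of the original design; the REST call is stubbed so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def call_rest_service(record):
    """Hypothetical stand-in for a blocking REST call; swap in a real client."""
    return ("ok", record)


def process_partition(records, max_workers=8):
    """Handle one partition; `records` is an iterator over its elements."""
    # Create the pool on demand inside the partition function, so it is
    # built on the executor rather than serialized from the driver.
    pool = ThreadPoolExecutor(max_workers=max_workers)
    try:
        # Hand each REST call off to the pool and keep the futures.
        futures = [pool.submit(call_rest_service, r) for r in records]
        # Wait for every call; result() re-raises any exception from a worker.
        return [f.result() for f in futures]
    finally:
        # Always clean up the pool before the partition function returns.
        pool.shutdown(wait=True)
```

In PySpark this could be wired in as dstream.foreachRDD(lambda rdd: rdd.foreachPartition(process_partition)); the return value is ignored there. Note that the sketch submits every record up front, so a very large partition may need to be submitted in bounded chunks instead.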
Need more details, but you might want to filter the data first (create multiple
RDDs) and then process.
> On Oct 5, 2015, at 8:35 PM, Chen Song wrote:
We have a use case with the following design in Spark Streaming.
Within each batch,
* data is read and partitioned by some key
* foreachPartition is used to process the entire partition
* within each partition, there are several REST clients created to connect
to different REST services
* for the