Re: question on make multiple external calls within each partition

2015-10-07 Thread Chen Song
Thanks TD and Ashish.
On Mon, Oct 5, 2015 at 9:14 PM, Tathagata Das wrote:
> You could create a thread pool on demand within the foreachPartition
> function, then hand off the REST calls to that thread pool, get back the
> futures and wait for them to finish. Should be pretty straightforward. Make

Re: question on make multiple external calls within each partition

2015-10-05 Thread Tathagata Das
You could create a thread pool on demand within the foreachPartition function, then hand off the REST calls to that thread pool, get back the futures, and wait for them to finish. Should be pretty straightforward. Make sure that your foreachPartition function cleans up the thread pool before finishing.
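The approach described above might look roughly like the following. This is only a minimal sketch in plain Java with no Spark dependency: the class `PartitionCalls`, the method `processPartition`, and the `call` and `poolSize` parameters are hypothetical names introduced for illustration, and `call` stands in for whatever REST client invocation is actually used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: all names here are illustrative, not part of Spark.
final class PartitionCalls {

    // Runs `call` (a stand-in for one REST request) for every item of a
    // partition on a pool created on demand, waits for all futures, and
    // always shuts the pool down before returning.
    static <T, R> List<R> processPartition(Iterator<T> items,
                                           Function<T, R> call,
                                           int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Hand off one task per item and keep the futures in order.
            List<Future<R>> futures = new ArrayList<>();
            while (items.hasNext()) {
                T item = items.next();
                Callable<R> task = () -> call.apply(item);
                futures.add(pool.submit(task));
            }
            // Block until every call has finished.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("external call failed", e.getCause());
        } finally {
            // Clean up the pool even if a call failed.
            pool.shutdown();
        }
    }
}
```

In a Spark job, a helper like this would be invoked from inside the function passed to `foreachPartition`, with the partition's iterator as `items`. Note that the sketch holds one future per item in memory, so very large partitions may need to be processed in batches.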

Re: question on make multiple external calls within each partition

2015-10-05 Thread Ashish Soni
Need more details, but you might want to filter the data first (create multiple RDDs) and then process them.
> On Oct 5, 2015, at 8:35 PM, Chen Song wrote:
>
> We have a use case with the following design in Spark Streaming.
>
> Within each batch,
> * data is read and partitioned by some key
> * fo

question on make multiple external calls within each partition

2015-10-05 Thread Chen Song
We have a use case with the following design in Spark Streaming. Within each batch,
* data is read and partitioned by some key
* foreachPartition is used to process the entire partition
* within each partition, there are several REST clients created to connect to different REST services
* for the