Re: How to process one partition at a time?

2016-04-07 Thread Andrei
(Reply quoting Hemant Bhanawat's message of Wed, Apr 6, 2016 7:16 PM — "Instead of doing it in co…"; preview truncated by the archive.)

Re: How to process one partition at a time?

2016-04-06 Thread Hemant Bhanawat
(Quoted reply headers — From: Hemant Bhanawat, Sent: Wed, Apr 6, 2016 7:16 PM, To: Andrei, Cc: user — followed by "Instead of doing it in co…"; preview truncated by the archive.)

RE: How to process one partition at a time?

2016-04-06 Thread Sun, Rui
SimpleFutureAction <http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SimpleFutureAction.html> (followed by quoted reply headers — From: Hemant Bhanawat, Sent: Wed, Apr 6, 2016 7:16 PM, To: Andrei, Cc: user; preview truncated by the archive.)
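The SimpleFutureAction that Sun, Rui links to is what SparkContext.submitJob returns. A hedged sketch of using it to run a job asynchronously on a single partition (the RDD, app name, and partition index here are illustrative, not from the thread):

```scala
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("async-one-partition").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 100, numSlices = 10)

// submitJob runs only the listed partitions and immediately returns a
// SimpleFutureAction, so the driver can do other work (or cancel the job)
// while partition 0 is being processed.
val future = sc.submitJob(
  rdd,
  (iter: Iterator[Int]) => iter.size,                  // per-partition work
  Seq(0),                                              // only partition 0
  (index: Int, count: Int) => println(s"partition $index has $count rows"),
  ()                                                   // overall result
)
Await.result(future, Duration.Inf)
```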

Re: How to process one partition at a time?

2016-04-06 Thread Hemant Bhanawat
Instead of doing it in compute, you could override the getPartitions method of your RDD and return only the target partitions. That way, tasks are created only for the target partitions; currently, in your case, tasks are being created for all of them. I hope this helps. I would like to …
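A minimal sketch of the approach Hemant describes. Spark ships a developer-API class, PartitionPruningRDD, whose getPartitions returns only the partitions passing a filter, so the example uses it rather than a hand-rolled RDD subclass (the app name and data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.PartitionPruningRDD

val sc = new SparkContext(
  new SparkConf().setAppName("one-partition").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 100, numSlices = 10)

// getPartitions of the pruned RDD reports only the partitions whose index
// passes the filter, so Spark schedules tasks for those partitions alone.
val firstPartitionOnly =
  PartitionPruningRDD.create(rdd, partitionIndex => partitionIndex == 0)

println(firstPartitionOnly.partitions.length) // 1 task instead of 10
println(firstPartitionOnly.collect().toSeq)   // elements of partition 0 only
```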

Re: How to process one partition at a time?

2016-04-06 Thread Andrei
I'm writing a kind of sampler that in most cases will require only one partition, sometimes two, and very rarely more, so it doesn't make sense to process all partitions in parallel. What is the easiest way to limit computation to a single partition? So far the best idea I came up with is to create a …
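One way to do exactly this, offered as a sketch rather than something from the thread: SparkContext.runJob accepts an explicit list of partition indices, so a sampler can start with partition 0 and touch further partitions only when it needs more data (the RDD and sample function below are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setAppName("sampler").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 100, numSlices = 10)

// Run the per-partition function on partition 0 only; the other nine
// partitions are never computed.
val sample: Array[Seq[Int]] =
  sc.runJob(rdd, (iter: Iterator[Int]) => iter.take(5).toSeq, Seq(0))

// If the sample is too small, call runJob again with Seq(1), and so on.
```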