Thanks everyone — both `submitJob` and `PartitionPruningRDD` work for me.
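
For the archives, here is a minimal sketch of both approaches the way I ended up using them. The local SparkContext, the RDD[Int], and the target partition index are all made up for illustration (and note that PartitionPruningRDD is a DeveloperApi):

    import scala.concurrent.Await
    import scala.concurrent.duration.Duration
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("one-partition"))
    val rdd: RDD[Int] = sc.parallelize(1 to 100, numSlices = 10)
    val target = 3                                  // hypothetical target partition

    // Approach 1: PartitionPruningRDD -- tasks are created only for the
    // partitions that pass the filter function.
    val pruned = PartitionPruningRDD.create(rdd, (i: Int) => i == target)
    val viaPruning = pruned.collect()

    // Approach 2: SparkContext.submitJob -- runs a job on the listed
    // partitions only and returns a non-blocking handle.
    val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
    val job = sc.submitJob(
      rdd,
      (it: Iterator[Int]) => it.toArray,            // processPartition
      Seq(target),                                  // partitions to compute
      (_: Int, data: Array[Int]) => buf ++= data,   // resultHandler, called per partition
      buf.toArray                                   // resultFunc, evaluated on completion
    )
    val viaSubmitJob = Await.result(job, Duration.Inf)

Both end up scheduling a single task; submitJob additionally gives you a future, so the driver is not blocked while the partition is being processed.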
On Thu, Apr 7, 2016 at 8:22 AM, Hemant Bhanawat <hemant9...@gmail.com> wrote:

> Apparently, there is another way to do it. You can try creating a
> PartitionPruningRDD and passing a partition filter function to it. This
> RDD will do the same thing that I suggested in my mail, and you will not
> have to create a new RDD.
>
> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
> www.snappydata.io
>
> On Wed, Apr 6, 2016 at 5:35 PM, Sun, Rui <rui....@intel.com> wrote:
>
>> Maybe you can try SparkContext.submitJob:
>>
>>     def submitJob[T, U, R](
>>         rdd: RDD[T],
>>         processPartition: (Iterator[T]) => U,
>>         partitions: Seq[Int],
>>         resultHandler: (Int, U) => Unit,
>>         resultFunc: => R): SimpleFutureAction[R]
>>
>> RDD: http://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html
>> SimpleFutureAction: http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SimpleFutureAction.html
>>
>> From: Hemant Bhanawat [mailto:hemant9...@gmail.com]
>> Sent: Wednesday, April 6, 2016 7:16 PM
>> To: Andrei <faithlessfri...@gmail.com>
>> Cc: user <user@spark.apache.org>
>> Subject: Re: How to process one partition at a time?
>>
>> Instead of doing it in compute, you could instead override the
>> getPartitions method of your RDD and return only the target partitions.
>> That way, tasks are created only for the target partitions; currently in
>> your case, tasks are being created for all the partitions.
>>
>> I hope it helps. I would like to hear if you take some other approach.
>>
>> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
>> www.snappydata.io
>>
>> On Wed, Apr 6, 2016 at 3:49 PM, Andrei <faithlessfri...@gmail.com> wrote:
>>
>> I'm writing a kind of sampler which in most cases will require only 1
>> partition, sometimes 2, and very rarely more. So it doesn't make sense
>> to process all partitions in parallel. What is the easiest way to limit
>> computation to one partition only?
>>
>> So far the best idea I have come up with is to create a custom RDD whose
>> `compute` method looks something like:
>>
>>     def compute(split: Partition, context: TaskContext) = {
>>       if (split.index == targetPartition) {
>>         // do the computation
>>       } else {
>>         // return an empty iterator
>>       }
>>     }
>>
>> But it's quite ugly, and I'm unlikely to be the first person with such
>> a need. Is there an easier way to do it?
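
And for completeness, since Hemant's first suggestion (overriding getPartitions) is what led me here: below is a rough sketch of what such a wrapper RDD could look like. The class and names are hypothetical. Note that the kept partitions must be re-indexed, because a partition's index has to match its position in the array returned by getPartitions:

    import scala.reflect.ClassTag
    import org.apache.spark.{Partition, TaskContext}
    import org.apache.spark.rdd.RDD

    // Wrapper partition: a fresh index for the child RDD, plus a handle on
    // the parent partition it stands for.
    class PrunedPartition(idx: Int, val parent: Partition) extends Partition {
      override def index: Int = idx
    }

    // Hypothetical RDD exposing only the partitions accepted by `keep`, so
    // the scheduler creates tasks for those partitions alone.
    class KeepPartitionsRDD[T: ClassTag](prev: RDD[T], keep: Int => Boolean)
        extends RDD[T](prev) {

      override protected def getPartitions: Array[Partition] =
        prev.partitions
          .filter(p => keep(p.index))
          .zipWithIndex
          .map { case (p, i) => new PrunedPartition(i, p): Partition }

      // Delegate to the parent's iterator for the remembered parent partition.
      override def compute(split: Partition, context: TaskContext): Iterator[T] =
        prev.iterator(split.asInstanceOf[PrunedPartition].parent, context)
    }

    val onlyOne = new KeepPartitionsRDD(rdd, _ == 3).collect()  // rdd as above

One caveat with this hand-rolled version: the OneToOneDependency set up by the RDD(prev) constructor still maps child partition i to parent partition i, which is no longer true after re-indexing, so locality hints can be off. PartitionPruningRDD handles exactly this with its PruneDependency, which is why it seems the safer choice in practice.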