Is there any way to set the output location for each partition for the RDD?

2013-10-31 Thread Wenlei Xie
Hi, My iterative program written in Spark shows widely varying running times across iterations, although the computation load is supposed to be roughly the same. In each iteration, my program adds a batch of tuples and deletes roughly the same number of tuples. I suspect part of the reason is

Re: Is there any way to set the output location for each partition for the RDD?

2013-10-31 Thread dachuan
I guess this could be solved by extending an existing RDD and overriding its getPreferredLocations() definition. But I am not sure; I will wait for the answer.
On Thu, Oct 31, 2013 at 10:44 PM, Wenlei Xie wrote:
> Hi,
>
> My iterative program written in Spark got quite various running time for
>
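A minimal sketch of the approach suggested above, assuming a recent Spark release built on Scala 2.12 or later (the 2013-era API used ClassManifest where this uses ClassTag). The class name LocatedRDD and the hostsFor parameter are hypothetical and not part of Spark. Preferred locations are scheduling hints: the scheduler tries them first but may still run a task elsewhere, so this sketch alone does not guarantee where a partition is computed.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper: keeps the parent's partitions and data unchanged,
// and only overrides the locality hint reported for each partition.
// hostsFor (an assumption of this sketch) maps a partition index to host names.
class LocatedRDD[T: ClassTag](prev: RDD[T], hostsFor: Int => Seq[String])
    extends RDD[T](prev) {

  // Reuse the parent's partitioning one-to-one.
  override protected def getPartitions: Array[Partition] =
    firstParent[T].partitions

  // Delegate the actual computation to the parent RDD.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    firstParent[T].iterator(split, context)

  // Locality preference only: the scheduler may fall back to other hosts.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    hostsFor(split.index)
}
```

With an existing RDD named rdd, this could be used as new LocatedRDD(rdd, i => Seq("host-" + i)), where the host names are placeholders for real worker host names.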

Re: Is there any way to set the output location for each partition for the RDD?

2013-11-04 Thread Wenlei Xie
Any official answer from the developers? Is the partition guaranteed to be generated on the preferred location?
Best, Wenlei
On Thu, Oct 31, 2013 at 7:53 PM, dachuan wrote:
> I guess it could be solved by extending from existing RDD and override the
> getPreferredLocations() definition.
>
> Bu

Re: Is there any way to set the output location for each partition for the RDD?

2013-11-04 Thread Wenlei Xie
Thank you for this suggestion! :)
On Thu, Oct 31, 2013 at 7:53 PM, dachuan wrote:
> I guess it could be solved by extending from existing RDD and override the
> getPreferredLocations() definition.
>
> But I am not sure, I will wait for the answer.
>
>
> On Thu, Oct 31, 2013 at 10:44 PM, Wenlei X