Re: How to implement "getPreferredLocations" in Data source v2?

2020-01-18 Thread Russell Spitzer
> …support data locality in Spark data source v2. How can I provide Spark the ability to read and process data on the same node? I didn't find any interface that supports 'getPreferredLocations' (or equivalent). Thanks!

How to implement "getPreferredLocations" in Data source v2?

2020-01-18 Thread kineret M
Hi, I would like to support data locality in Spark data source v2. How can I provide Spark the ability to read and process data on the same node? I didn't find any interface that supports 'getPreferredLocations' (or equivalent). Thanks!
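In more recent Data Source V2 APIs (Spark 2.4 onward), the `InputPartition` interface carries a default `preferredLocations()` method returning executor host names as locality hints, which is the closest equivalent to the RDD hook the question asks about. A minimal sketch of that contract, using stand-in types rather than Spark's own classes so the example stands alone:

```java
// Sketch of the Data Source V2 locality hook. In Spark 2.4+ the real
// interface is InputPartition (org.apache.spark.sql.connector.read in 3.x),
// whose default preferredLocations() returns executor host names.
// InputPartitionLike and BlockPartition below are stand-in types.
interface InputPartitionLike {
    // Locality *hint* for the scheduler, not a placement guarantee.
    default String[] preferredLocations() { return new String[0]; }
}

final class BlockPartition implements InputPartitionLike {
    private final String host;
    BlockPartition(String host) { this.host = host; }
    @Override
    public String[] preferredLocations() { return new String[] { host }; }
}
```

The scheduler treats these hosts as hints only; as the 2014 thread further down notes, placement is never guaranteed.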

context.runJob() was suspended in getPreferredLocations() function

2016-12-30 Thread Fei Hu
Dear all, I tried to customize my own RDD. In the getPreferredLocations() function, I used the following code to query another RDD, which was used as an input to initialize this customized RDD: val results: Array[Array[DataChunkPartition]] = context.runJob(partitionsRDD…
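The hang described here is consistent with `getPreferredLocations()` being called from the DAGScheduler's own event loop while it plans the job: submitting a second job via `context.runJob` from inside it blocks the scheduler on itself. One common workaround, sketched here with hypothetical stand-in types, is to materialize the other RDD's locations in the driver *before* constructing the custom RDD, so that `getPreferredLocations()` becomes a pure lookup:

```java
// Sketch (stand-in types): resolve partition locations eagerly in the
// driver, then let getPreferredLocations() read the cached map instead
// of launching a job from inside the scheduler.
import java.util.*;

final class LocalityIndex {
    private final Map<Integer, List<String>> hostsByPartition;

    LocalityIndex(Map<Integer, List<String>> hostsByPartition) {
        this.hostsByPartition = hostsByPartition;  // built once, up front
    }

    // Safe to call from the scheduler: a plain lookup, no job submission.
    List<String> preferredLocations(int partition) {
        return hostsByPartition.getOrDefault(partition, Collections.emptyList());
    }
}
```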

Re: getPreferredLocations race condition in spark 1.6.0?

2016-03-02 Thread Andy Sloane
…at 3:46 PM, Andy Sloane wrote: > We are seeing something that looks a lot like a regression from spark 1.2. When we run jobs with multiple threads, we have a crash somewhere inside getPreferredLocations, as was fixed in…

Re: getPreferredLocations race condition in spark 1.6.0?

2016-03-02 Thread Shixiong(Ryan) Zhu
> …a crash somewhere inside getPreferredLocations, as was fixed in SPARK-4454. Except now it's inside org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs instead of DAGScheduler directly. I tried Spark 1.2 post-SPARK-4454 (before this patch it's only slightly…

getPreferredLocations race condition in spark 1.6.0?

2016-03-02 Thread Andy Sloane
We are seeing something that looks a lot like a regression from spark 1.2. When we run jobs with multiple threads, we have a crash somewhere inside getPreferredLocations, as was fixed in SPARK-4454. Except now it's inside org.apache.spark.MapOutputTrackerMaster.getLocationsWithLargestOutputs…

Re: getPreferredLocations

2014-05-31 Thread Patrick Wendell
> 1) Is there a guarantee that a partition will only be processed on a node which is in the "getPreferredLocations" set of nodes returned by the RDD?

No, there isn't; by default Spark may schedule in a "non-preferred" location after `spark.locality.wait` has expired.
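The fallback behaviour described in the reply is tunable via Spark's standard locality configuration. A sketch of the relevant properties (values illustrative; the property names are from Spark's documented configuration):

```
# spark-defaults.conf -- locality wait tuning (illustrative values)
spark.locality.wait        3s    # default wait before downgrading a locality level
spark.locality.wait.node   10s   # wait longer specifically for NODE_LOCAL slots
# Related per-level settings: spark.locality.wait.process, spark.locality.wait.rack
```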

getPreferredLocations

2014-05-29 Thread ansriniv
I am building my own custom RDD class. 1) Is there a guarantee that a partition will only be processed on a node which is in the "getPreferredLocations" set of nodes returned by the RDD? 2) I am implementing this custom RDD in Java and plan to extend JavaRDD. However, I…
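On question 2, which the reply above does not address: in Spark's API, `JavaRDD` is a thin wrapper around an underlying `RDD` rather than an extension point, so a custom RDD written in Java extends `org.apache.spark.rdd.RDD<T>` directly and overrides `getPreferredLocations(Partition)`. A self-contained sketch of that shape, with hypothetical stand-in types instead of Spark's classes:

```java
// Sketch only (stand-in types, not Spark's): the shape of the override.
// In real code, extend org.apache.spark.rdd.RDD<T> and override
// getPreferredLocations(Partition), returning the hosts holding the split.
import java.util.*;

abstract class SketchRDD<T> {
    // Mirrors RDD.getPreferredLocations: hosts where the split's data lives.
    List<String> getPreferredLocations(int split) {
        return Collections.emptyList();  // default: no preference
    }
}

final class HdfsBackedRDD extends SketchRDD<String> {
    private final Map<Integer, List<String>> blockHosts;

    HdfsBackedRDD(Map<Integer, List<String>> blockHosts) {
        this.blockHosts = blockHosts;
    }

    @Override
    List<String> getPreferredLocations(int split) {
        return blockHosts.getOrDefault(split, Collections.emptyList());
    }
}
```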