Sandy Ryza created SPARK-2089:
---------------------------------

             Summary: With YARN, preferredNodeLocalityData isn't honored 
                 Key: SPARK-2089
                 URL: https://issues.apache.org/jira/browse/SPARK-2089
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.0.0
            Reporter: Sandy Ryza


When running in YARN cluster mode, apps can pass preferred locality data when
constructing a SparkContext.  This data is meant to dictate where executor
containers are requested.

This is currently broken because of a race condition.  The Spark-YARN code runs
the user class and waits for it to start up a SparkContext.  During its
initialization, the SparkContext creates a YarnClusterScheduler, which
notifies a monitor in the Spark-YARN code.  The Spark-YARN code then
immediately fetches the preferredNodeLocationData from the SparkContext and
uses it to start requesting containers.

But in the SparkContext constructor that takes the preferredNodeLocationData,
the field is assigned only after the rest of the initialization, so if the
Spark-YARN code comes around quickly enough after being notified, the data it
fetches is the empty, unset default.  This occurred during all of my runs.
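
To make the ordering problem concrete, here is a minimal sketch in Python rather
than Spark's own Scala.  It is not the Spark-YARN code: FakeContext, Monitor and
run are hypothetical names, and the constructor deliberately waits for the
reader so that the losing interleaving happens on every run instead of only
when the timing is unlucky.  The second ordering is included only for contrast
and is not necessarily how the issue should be fixed.

```python
import threading


class Monitor:
    """Hypothetical stand-in for the Spark-YARN side that waits for a context."""

    def __init__(self):
        self.ready = threading.Event()        # set when a context is published
        self.reader_done = threading.Event()  # set once locality data was read
        self.context = None
        self.seen = None

    def publish(self, ctx):
        # Called during context initialization (the "notify" step).
        self.context = ctx
        self.ready.set()

    def allocator(self):
        # Wakes up on notification and immediately fetches the locality data.
        self.ready.wait(timeout=5)
        self.seen = dict(self.context.preferred_node_location_data)
        self.reader_done.set()


class FakeContext:
    """Hypothetical stand-in for SparkContext; not Spark's real API."""

    def __init__(self, preferred, monitor, set_before_notify):
        self.preferred_node_location_data = {}  # empty default
        if set_before_notify:
            self.preferred_node_location_data = preferred
        # Scheduler creation notifies the waiting thread.
        monitor.publish(self)
        if not set_before_notify:
            # Reported ordering: the field is assigned after the rest of
            # initialization. Waiting here forces the reader to win the race.
            monitor.reader_done.wait(timeout=5)
            self.preferred_node_location_data = preferred


def run(set_before_notify):
    monitor = Monitor()
    reader = threading.Thread(target=monitor.allocator)
    reader.start()
    FakeContext({"host1": {"split0"}}, monitor, set_before_notify)
    reader.join()
    return monitor.seen  # what the container requester actually saw
```

With the reported ordering, run(False) returns an empty dict, so containers
would be requested with no locality preference.  With the assignment moved
ahead of the notification, run(True) returns the data that was passed in.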



--
This message was sent by Atlassian JIRA
(v6.2#6252)
