Maxim Ivanov created SPARK-2669: ----------------------------------- Summary: Hadoop configuration is not localised when submitting job in yarn-cluster mode Key: SPARK-2669 URL: https://issues.apache.org/jira/browse/SPARK-2669 Project: Spark Issue Type: Bug Reporter: Maxim Ivanov
I'd like to propose a fix for a problem when Hadoop configuration is not localized when job is submitted in yarn-cluster mode. Here is a description from github pull request https://github.com/apache/spark/pull/1574 This patch fixes a problem when Spark driver is run in the container managed by YARN ResourceManager it inherits configuration from a NodeManager process, which can be different from the Hadoop configuration present on the client (submitting machine). Problem is most vivid when fs.defaultFS property differs between these two. Hadoop MR solves it by serializing client's Hadoop configuration into job.xml in application staging directory and then making Application Master to use it. That guarantees that regardless of execution nodes configurations all application containers use same config identical to one on the client side. This patch uses similar approach. YARN ClientBase serializes configuration and adds it to ClientDistributedCacheManager under "job.xml" link name. ClientDistributedCacheManager is then utilizes Hadoop localizer to deliver it to whatever container is started by this application, including the one running Spark driver. YARN ClientBase also adds "SPARK_LOCAL_HADOOPCONF" env variable to AM container request which is then used by SparkHadoopUtil.newConfiguration to trigger new behavior when machine-wide hadoop configuration is merged with application specific job.xml (exactly how it is done in Hadoop MR). SparkContext is then follows same approach, adding SPARK_LOCAL_HADOOPCONF env to all spawned containers to make them use client-side Hadopo configuration. Also all the references to "new Configuration()" which might be executed on YARN cluster side are changed to use SparkHadoopUtil.get.conf Please note that it fixes only core Spark, the part which I am comfortable to test and verify the result. I didn't descend into steaming/shark directories, so things might need to be changed there too. -- This message was sent by Atlassian JIRA (v6.2#6252)