Hi, I've had a bit of trouble getting Spark on YARN to work. When executing in this mode and submitting from outside the cluster, one must set HADOOP_CONF_DIR or YARN_CONF_DIR <https://spark.apache.org/docs/latest/running-on-yarn.html>, from which spark-submit can find the parameters it needs to locate and talk to the YARN ResourceManager.
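For concreteness, this is roughly what I'm doing on the client machine (paths and names below are just placeholders):

```shell
# Point spark-submit at the client-side copy of the cluster's
# Hadoop/YARN config before submitting from outside the cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf    # illustrative path

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \              # placeholder class
  my-app.jar
```

With this set, spark-submit reads yarn-site.xml (and friends) from HADOOP_CONF_DIR to find the ResourceManager.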
However, Spark also packages up all the Hadoop+YARN config files, ships them to the cluster, and then uses them there. Does it merely override individual settings on the cluster with those shipped files? Or does it use them wholesale, ignoring the config the cluster already has? My impression is that it currently replaces rather than overrides, which would mean you can't construct a minimal client-side Hadoop/YARN config containing only the properties necessary to find the cluster. Is that right?
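To illustrate what I mean by a minimal client-side config: ideally I'd like a yarn-site.xml containing nothing but the property needed to locate the ResourceManager, something like this (hostname is a placeholder):

```xml
<?xml version="1.0"?>
<!-- Hypothetical minimal client-side yarn-site.xml: only enough
     to find the cluster; everything else would come from the
     config already present on the cluster nodes. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value>
  </property>
</configuration>
```

If the shipped files replace rather than overlay the cluster's config, a stripped-down file like this would presumably break the executors, which is the behavior I'm asking about.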