Hello, I want to run Spark or Flink jobs from a client (a remote desktop) onto a YARN cluster. Another example would be if I am running a YARN cluster on VMs: then I would like to use the host OS as the client to submit Spark jobs to the VM YARN cluster.
What is the easiest way to set the YARN_CONF_DIR environment variable on the client machine so that it can submit Spark jobs to the YARN cluster?

From reading the online documentation, I believe I am supposed to set the client's YARN_CONF_DIR to $HADOOP_HOME/etc/hadoop or $HADOOP_HOME/etc/hadoop/conf. However, I do not understand:

1. How do I get the value of HADOOP_HOME? Do I need to set it on every machine in the cluster?
2. How will my client machine know how to locate the NameNode in the cluster?
3. Does $HADOOP_HOME/etc/hadoop have to be the same on every node in the cluster, or does it live on a special node, like the NameNode or ResourceManager?

I have also read that there is an easier way: copy the /etc/hadoop contents to the client machine, then set the client's YARN_CONF_DIR to that location. Can someone please explain how to do this? Which node in my cluster should I copy the /etc/hadoop contents from? Would this also work if my client can only reach the cluster via ssh?

Thanks!
Piper
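P.S. To make the copy-the-conf idea concrete, here is roughly what I imagine the steps would look like on the client. This is only my guess at the procedure, not something I know to work: the host name resourcemanager-host and all paths are placeholders, and since I cannot reach a real cluster here, the mkdir/heredoc part just fakes the copied config directory so the snippet runs standalone.

```shell
# Step 1 (my assumption): copy the Hadoop config directory from a cluster
# node down to the client, e.g. over ssh:
#   scp -r user@resourcemanager-host:/etc/hadoop/conf "$HOME/yarn-conf"
#
# For this standalone sketch, fake the copied directory instead of scp-ing:
mkdir -p "$HOME/yarn-conf"
cat > "$HOME/yarn-conf/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager-host</value>
  </property>
</configuration>
EOF

# Step 2: point the client at the copied directory:
export YARN_CONF_DIR="$HOME/yarn-conf"

# Step 3 (my understanding): spark-submit reads $YARN_CONF_DIR to find the
# ResourceManager when submitting with --master yarn, e.g.:
#   spark-submit --master yarn --deploy-mode cluster my_job.jar
echo "YARN_CONF_DIR=$YARN_CONF_DIR"
ls "$YARN_CONF_DIR"
```

Is this the right shape of the procedure, and is yarn-site.xml (with the ResourceManager address) the file that actually matters here?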