Hello.

My team is working on moving some Hadoop 1 jobs (using an old AWS EMR AMI) to 
YARN / Hadoop 2 (using the newer AWS EMR Release 4.x).  We have an edge node 
with Hadoop 2.7.2 installed from which jobs get submitted to the cluster.  It 
appears that we must have the yarn.application.classpath property set in the 
yarn-site.xml file on the client (edge node) in order for our jobs to get 
submitted successfully.  Otherwise, the jobs fail citing the following error: 
"java.lang.NoClassDefFoundError: 
org/apache/hadoop/mapreduce/v2/app/MRAppMaster".  This has caused a lot of 
confusion.
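
As an illustration only, a client-side yarn-site.xml entry of the kind 
described above might look like the fragment below.  The paths are 
placeholders chosen for the example, not values copied from an actual 
cluster; the real entries depend on where Hadoop is installed on the cluster 
nodes.

```xml
<!-- Sketch of the client (edge node) yarn-site.xml entry. -->
<!-- The value is a hypothetical placeholder list, not a real EMR setting. -->
<configuration>
  <property>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      /path/to/cluster/hadoop/*,
      /path/to/cluster/hadoop/lib/*,
      /path/to/cluster/hadoop-mapreduce/*,
      /path/to/cluster/hadoop-mapreduce/lib/*,
      /path/to/cluster/hadoop-yarn/*,
      /path/to/cluster/hadoop-yarn/lib/*
    </value>
  </property>
</configuration>
```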

Our understanding of property precedence is that Hadoop uses the first setting 
it finds when looking in the following order: (1) Job/JobConf for the MR job, 
often set programmatically, (2) *-site.xml files on the client machine, (3) 
*-site.xml files on the cluster nodes, and finally (4) the *-default.xml files 
from the Hadoop installation.  So we are confused as to why it doesn't simply 
find no setting on the client and fall back to the setting from yarn-site.xml 
on the cluster nodes.  That's how I would expect this 
particular property to be most commonly used anyway, as it seems wrong and 
backwards that the client would be telling YARN what its classpath should be on 
the cluster!  In fact, this is one of those settings that I would expect to see 
commonly set to "final" on the cluster, as I think you would want to prevent a 
client from providing its own value, since it doesn't make sense that a client 
should know where things are installed on the cluster nodes anyway.
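
To make the "final" idea concrete, the fragment below sketches what marking 
the property final in the cluster-side yarn-site.xml would look like.  The 
<final> element is standard Hadoop configuration syntax; the value is a 
hypothetical placeholder, and whether this would actually change what the 
client sends at submission time is part of what is being asked here.

```xml
<!-- Sketch of a cluster-side yarn-site.xml entry marked final. -->
<!-- The value is a hypothetical placeholder, not a real EMR setting. -->
<configuration>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,/path/to/cluster/hadoop/*,/path/to/cluster/hadoop/lib/*</value>
    <!-- final: configuration resources loaded later cannot override this value -->
    <final>true</final>
  </property>
</configuration>
```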

Perhaps I have a fundamental misunderstanding of something as we're migrating 
to the new YARN framework.  A lot of what I find online seems to talk about 
submitting jobs from the cluster (typically from the master node) itself, in 
which case it makes sense that this value should be set.  But, when dealing 
with an edge node setup like ours, I would think it should be fine to leave 
that property unset.  Can you help me understand what's going on?

Thanks!
-Jeff
