Re: YARN replica selection

2015-06-20 Thread Ravi Prakash
Hi Muthu! Hitesh is correct. The behavior is application-specific in the sense that it is the application's AM which asks for containers. Look at https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/

Re: YARN replica selection

2015-06-19 Thread Hitesh Shah
Moving conversation to yarn-dev. BCC’ed hdfs-dev. YARN actually does not do anything except give back containers based on what an application requested. It is up to each and every application to first figure out where the data is located and then make optimal choices based on which node to
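To make the point above concrete, here is a minimal sketch of how an application might turn replica locations into node and rack hints before asking for a container. It is not Hadoop code: LocalityHints, Replica, Hint and hintFor are hypothetical names invented for illustration, and the "prefer SSD" rule is an assumed application-side policy rather than anything YARN applies. In a real AM the resulting node and rack lists would typically be passed to something like AMRMClient.ContainerRequest, which accepts node and rack arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Hadoop code: every name here is made up for illustration.
final class LocalityHints {

    // One replica of an input block: the host holding it, that host's rack,
    // and the storage type (for example "SSD" or "DISK").
    static final class Replica {
        final String host;
        final String rack;
        final String storageType;

        Replica(String host, String rack, String storageType) {
            this.host = host;
            this.rack = rack;
            this.storageType = storageType;
        }
    }

    // The locality hints an application would attach to a container request.
    static final class Hint {
        final List<String> nodes;
        final List<String> racks;

        Hint(List<String> nodes, List<String> racks) {
            this.nodes = nodes;
            this.racks = racks;
        }
    }

    // Builds node and rack hints from replica locations. If preferSsd is true
    // and at least one replica is on SSD, only the SSD hosts are requested;
    // otherwise every replica host is listed. This policy is the application's
    // own choice (an assumption of this sketch), not scheduler behavior.
    static Hint hintFor(List<Replica> replicas, boolean preferSsd) {
        List<Replica> chosen = new ArrayList<>();
        if (preferSsd) {
            for (Replica r : replicas) {
                if ("SSD".equals(r.storageType)) {
                    chosen.add(r);
                }
            }
        }
        if (chosen.isEmpty()) {
            chosen.addAll(replicas); // no SSD replica, or no preference given
        }
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> nodes = new LinkedHashSet<>();
        Set<String> racks = new LinkedHashSet<>();
        for (Replica r : chosen) {
            nodes.add(r.host);
            racks.add(r.rack);
        }
        return new Hint(new ArrayList<>(nodes), new ArrayList<>(racks));
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica("node1", "/rack1", "DISK"),
                new Replica("node2", "/rack1", "SSD"),
                new Replica("node3", "/rack2", "DISK"));
        Hint hint = hintFor(replicas, true);
        System.out.println("nodes=" + hint.nodes + " racks=" + hint.racks);
    }
}
```

The scheduler only sees the node and rack names it is handed, so any preference between replicas has to be decided by the application before the request is made.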

Re: YARN replica selection

2015-06-19 Thread Sagar Thacker
Dear Users, Please remove me from the thread. I am no longer associated with Hadoop. On Fri, Jun 19, 2015 at 1:54 PM, Arun Suresh wrote: > +yarn-dev@ > > Currently you can provide the scheduler hints as to the set of nodes / > racks the task may be scheduled. > But what you are probably looking

Re: YARN replica selection

2015-06-19 Thread Arun Suresh
+yarn-dev@ Currently you can provide the scheduler with hints as to the set of nodes / racks on which the task may be scheduled. But what you are probably looking for is the Node Labeling feature, which is currently under development: https://issues.apache.org/jira/browse/YARN-2492 -Arun On Fri, Jun 19, 201

YARN replica selection

2015-06-19 Thread Muthu Ganesh
Hi, How does YARN decide which replica to use when scheduling a task, or is it random? Does the YARN scheduler give priority to SSD storage types over DISK storage types for the HOT_STORAGE_POLICY when scheduling data-local tasks? Please let me know if this should be posted in YARN developers m