Hi,

I have deployed a Hadoop 2.7.1 cluster with heterogeneous nodes. For the sake of discussion, suppose one node has 100 GB of RAM while another has 50 GB.
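To make the two node classes concrete, here is a rough sketch of the memory properties in yarn-site.xml that differ between machines. The values are hypothetical and only illustrate the idea (each leaves some headroom for the OS and the Hadoop daemons):

```xml
<!-- yarn-site.xml on a 100 GB node (hypothetical value: 90 GB for containers) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>92160</value>
</property>

<!-- yarn-site.xml on a 50 GB node (hypothetical value: 45 GB for containers) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>46080</value>
</property>

<!-- yarn-site.xml on the master node (hypothetical value) -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
```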
I'm using the Capacity Scheduler and deploy mapred-site.xml and yarn-site.xml configuration files with memory settings tailored to each machine's resources. The master node and the two slave node classes each get a different configuration file, since they have different memory profiles.

I am trying to configure YARN in such a way as to take advantage of all the resources available on the nodes, and I'm having particular difficulty with the minimum allocation setting. From what I can tell from my deployment, some memory settings are node-specific while others are cluster-wide. The setting that's causing me trouble is yarn.scheduler.minimum-allocation-mb. It appears to behave as a cluster-wide setting; however, because I have two node classes, a per-node yarn.scheduler.minimum-allocation-mb would be desirable. I have also noticed that YARN _always_ allocates yarn.scheduler.minimum-allocation-mb to each container, regardless of how the per-node memory settings are configured.

A couple of questions to help drive the discussion:

- How should YARN be configured in a heterogeneous cluster?
- YARN exposes a minimum and a maximum allocation. How do I indicate that additional memory is desirable, so that YARN doesn't always allocate the minimum? More concretely, suppose I have two jobs with differing memory requirements: how would I communicate this to YARN and request that my containers be allocated with additional memory?

Thanks,
Matt