Hi,

I have deployed a Hadoop 2.7.1 cluster with heterogeneous nodes. For the
sake of discussion, suppose one node has 100 GB of RAM while another has
50 GB.

I'm using the Capacity Scheduler and deploy mapred-site.xml and
yarn-site.xml configuration files with memory settings tailored to the
resources of each particular machine. The master node and the two slave
node classes each get a different configuration file, since they have
different memory profiles.
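For reference, here is roughly what the node-specific portion of each
yarn-site.xml looks like (values are illustrative, not my exact settings):

  <!-- yarn-site.xml on the 100 GB slave class -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>90112</value>  <!-- ~88 GB, leaving headroom for OS/daemons -->
  </property>

  <!-- yarn-site.xml on the 50 GB slave class -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>45056</value>  <!-- ~44 GB -->
  </property>

This per-node setting does appear to be honored on each NodeManager.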

I am trying to configure YARN in such a way as to take advantage of all
the resources available on the nodes, and I'm having particular difficulty
with the minimum allocation setting. From my deployment I can tell that
certain memory settings are node-specific while others are cluster-wide.
A particular configuration setting that's causing me trouble is

yarn.scheduler.minimum-allocation-mb

This appears to behave as a cluster-wide setting; however, due to my two
node classes, a per-node yarn.scheduler.minimum-allocation-mb would be
desirable.
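What I would like, in effect, is something like the following, with each
node class carrying its own value (illustrative; as far as I can tell,
only the copy of this setting that the ResourceManager reads is ever
consulted, so per-node values are ignored):

  <!-- desired, but apparently not supported: a per-node minimum -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>  <!-- e.g. on the 50 GB nodes -->
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>45056</value>
  </property>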

I have also noticed that YARN _always_ allocates
yarn.scheduler.minimum-allocation-mb to each container, irrespective of
how the per-node memory settings are configured.

A couple of questions to help drive the discussion:

- How should YARN be configured in a heterogeneous cluster?
- YARN exposes a minimum and a maximum allocation; how do I indicate that
additional memory is desirable, so that YARN doesn't always allocate the
minimum? More concretely, suppose I have two jobs with differing memory
requirements: how would I communicate this to YARN and request that my
containers be allocated with additional memory? (See the sketch after
this list for what I assume this might look like.)
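For what it's worth, I assume the per-job route would look something like
this in the job configuration (illustrative values; these can also be
passed as -D options on the command line if the driver uses ToolRunner):

  <!-- per-job container sizes, e.g. in the submitted job's config -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4096</value>  <!-- request 4 GB map containers -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>8192</value>  <!-- request 8 GB reduce containers -->
  </property>

Is that the intended mechanism, and does it interact sanely with the
cluster-wide minimum on a heterogeneous cluster?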

Thanks,
Matt
