Hi Matt,

Replies inline.

> I'm using the Capacity Scheduler and deploy mapred-site.xml and yarn-site.xml 
> configuration files with various memory settings that are tailored to the 
> resources for a particular machine. The master node, and the two slave node 
> classes each get a different configuration file since they have different 
> memory profiles.


We are improving this starting with 2.8 so as not to require different 
configuration files - see https://issues.apache.org/jira/browse/YARN-160.


> yarn.scheduler.minimum-allocation-mb: This appears to behave as a 
> cluster-wide setting; however, due to my two node classes, a per-node 
> yarn.scheduler.minimum-allocation-mb would be desirable.

Actually, the minimum container size is a cluster-level constant by design. It 
doesn’t matter how big or small the nodes in the cluster are; the minimum size 
needs to be a constant so that applications have a notion of deterministic 
sizing. What we suggest instead is to simply run more containers on the bigger 
machines using the yarn.nodemanager.resource.memory-mb configuration.

On the other hand, the maximum container size should be at most the size of 
the smallest node in the cluster. Otherwise, again, you may cause 
non-deterministic scheduling behavior for apps.
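As a sketch of that split (the values below are purely illustrative, not 
recommendations): the scheduler minimum/maximum are cluster-wide constants, 
while each node class advertises its own total memory to run more or fewer 
containers:

```xml
<!-- yarn-site.xml, cluster-wide: scheduler-level constants -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>  <!-- one constant for all nodes -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>  <!-- at most the size of the smallest node -->
</property>

<!-- yarn-site.xml on a bigger node class: advertise more memory,
     which simply means more containers on that node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value>
</property>
```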

> More concretely, suppose I have two jobs with differing memory 
> requirements--how would I communicate this to yarn and request that my 
> containers be allocated with additional memory?

This is a more apt ask. The minimum container size doesn’t determine the 
container size! Containers can be sized at various multiples of the minimum, 
driven by the application or by frameworks like MapReduce. For example, even 
if the minimum container size in the cluster is 1GB, the MapReduce framework 
can ask for bigger containers if the user sets mapreduce.map.memory.mb to 
2GB, 4GB, etc. And this is controllable at the job level!
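For instance (sizes here are illustrative), a memory-hungry job can request 
4GB map containers at submit time, without touching any cluster configuration:

```xml
<!-- per-job override, set in the job's configuration, or passed at
     submit time, e.g.: hadoop jar app.jar -Dmapreduce.map.memory.mb=4096 ... -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>  <!-- container size: a multiple of the 1GB minimum -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx3277m</value>  <!-- JVM heap kept below the container limit -->
</property>
```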

HTH
+Vinod
