Hi Matt,

Replies inline.

> I'm using the Capacity Scheduler and deploy mapred-site.xml and yarn-site.xml
> configuration files with various memory settings that are tailored to the
> resources for a particular machine. The master node, and the two slave node
> classes each get a different configuration file since they have different
> memory profiles.

We are improving this starting with 2.8 so as to not require different
configuration files - see https://issues.apache.org/jira/browse/YARN-160.

> yarn.scheduler.minimum-allocation-mb: This appears to behave as a
> cluster-wide setting; however, due to my two node classes, a per-node
> yarn.scheduler.minimum-allocation-mb would be desirable.

Actually, the minimum container size is a cluster-level constant by design.
It doesn't matter how big or small the nodes in the cluster are; the minimum
size needs to be a constant for applications to have a notion of
deterministic sizing. What we suggest instead is to simply run more
containers on the bigger machines, by raising the
yarn.nodemanager.resource.memory-mb configuration on those nodes (see the
sketch after my sign-off).

The maximum container size, on the other hand, should be at most the size of
the smallest node in the cluster. Otherwise, again, you may cause
non-deterministic scheduling behavior for apps.

> More concretely, suppose I have two jobs with differing memory
> requirements--how would I communicate this to yarn and request that my
> containers be allocated with additional memory?

This is a more apt ask. The minimum container size doesn't determine the
container size! Containers can be of various multiples of the minimum size,
driven by the application, or by frameworks like MapReduce. For example,
even if the minimum container size in the cluster is 1GB, the MapReduce
framework can ask for bigger containers if the user sets
mapreduce.map.memory.mb to 2GB, 4GB, etc. And this is controllable at the
job level!

HTH
+Vinod
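
P.S. To make the per-node sizing concrete, here is a rough sketch (the
memory values are made-up examples; size them to your actual hardware). The
scheduler minimum/maximum stay identical cluster-wide, while
yarn.nodemanager.resource.memory-mb differs per node class, so the bigger
class simply fits more containers:

  <!-- yarn-site.xml: identical on every node -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>  <!-- at most the size of the smallest node -->
  </property>

  <!-- yarn-site.xml on the smaller slave class -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>  <!-- ~16 x 1GB containers -->
  </property>

  <!-- yarn-site.xml on the bigger slave class -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>49152</value>  <!-- same minimum, just ~48 x 1GB containers -->
  </property>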
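
P.P.S. And per-job sizing can be done at submit time via -D properties
(this works for any job that goes through ToolRunner/GenericOptionsParser,
as the stock examples do; the jar name and paths below are placeholders):

  hadoop jar hadoop-mapreduce-examples.jar wordcount \
    -Dmapreduce.map.memory.mb=4096 \
    -Dmapreduce.reduce.memory.mb=4096 \
    /input /output

With a 1GB cluster minimum, each map/reduce container here is allocated as
a 4GB container, no config-file changes needed.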
> I'm using the Capacity Scheduler and deploy mapred-site.xml and yarn-site.xml > configuration files with various memory settings that are tailored to the > resources for a particular machine. The master node, and the two slave node > classes each get a different configuration file since they have different > memory profiles. We are improving this starting 2.8 so as to not require different configuration files - see https://issues.apache.org/jira/browse/YARN-160. > yarn.scheduler.minimum-allocation-mb: This appears to behave as a > cluster-wide setting; however, due to my two node classes, a per-node > yarn.scheduler.minimum-allocation-mb would be desirable. Actually the minimum container size is a cluster-level constant by design. It doesn’t matter how big or small nodes are in the cluster, the minimum size needs to be a constant for applications to have a notion of deterministic sizing. What we instead suggest is to simply run more containers on bigger machines using the yarn.nodemanage.resource.memory-mb configuration. On the other hand, maximum container-size obviously should at best be the size of the smallest node in the cluster. Otherwise, again, you may cause indeterministic scheduling behavior for apps. > More concretely, suppose I have two jobs with differing memory > requirements--how would I communicate this to yarn and request that my > containers be allocated with additional memory? This is a more apt ask. The minimum container size doesn’t determine container-size!. Containers can be of sizes of various multiples of the minimum, and driven by the application, or frameworks like MapReduce. For example, even if the container-size in the cluster is 1GB, MapReduce framework can ask bigger containers if user sets mapreduce.map.memory.mb to 2GB/4GB etc. And this is controllable at the job level! HTH +Vinod