Sunil, this is the full capacity-scheduler.xml file:
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">

  <property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.2</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.priority</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.priority</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,foo</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.acl_administer_queue</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.acl_submit_applications</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.capacity</name>
    <value>99</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.minimum-user-limit-percent</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.priority</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.queues</name>
    <value>foo_daily,foo_ultimo</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.acl_administer_queue</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.acl_submit_applications</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.minimum-user-limit-percent</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.ordering-policy</name>
    <value>fifo</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.priority</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.state</name>
    <value>RUNNING</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_daily.user-limit-factor</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.acl_administer_queue</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.acl_submit_applications</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.maximum-capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.minimum-user-limit-percent</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.ordering-policy</name>
    <value>fifo</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.priority</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.state</name>
    <value>RUNNING</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.foo_ultimo.user-limit-factor</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.state</name>
    <value>RUNNING</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.foo.user-limit-factor</name>
    <value>1</value>
  </property>

</configuration>

On Wed, Oct 9, 2019 at 10:56 PM Lars Francke <lars.fran...@gmail.com> wrote:

> Sunil,
>
> thank you for the answer.
>
> This is HDP 3.1, based on Hadoop 3.1.1.
> No preemption defaults were changed, I believe. The only change was to
> enable preemption (monitor.enabled) in Ambari, but I can get the full XML
> for you if that's helpful. I'll get back to you on that.
>
> Cheers,
> Lars
>
> On Wed, Oct 9, 2019 at 7:57 PM Sunil Govindan <sun...@apache.org> wrote:
>
>> Hi,
>>
>> Your expectation for the scenario you describe is more or less correct.
>> A few pieces of information are needed to get a clearer picture:
>> 1. the Hadoop version
>> 2. the capacity scheduler XML (to be precise, all preemption-related
>>    configs that were added)
>>
>> - Sunil
>>
>> On Wed, Oct 9, 2019 at 6:23 PM Lars Francke <lars.fran...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I've got a question about behavior we're seeing.
>>>
>>> Two queues: preemption enabled, CapacityScheduler (happy to provide
>>> more config if needed), 50% of resources assigned to each.
>>>
>>> Submit a job to queue 1, which uses 100% of the cluster.
>>> Submit a job to queue 2, which doesn't get allocated because there are
>>> not enough resources for the AM.
>>>
>>> I'd have expected this to trigger preemption, but it doesn't happen.
>>> Is this expected?
>>>
>>> Second question:
>>> Once we get the second job running, it only gets three containers but
>>> requested six. Utilization at this point is still "unfair", i.e. queue 1
>>> has ~75% of resources and queue 2 ~25%. We saw two preemptions happen to
>>> get to this 25% utilization, but then no more.
>>>
>>> Any idea why this could be happening?
>>>
>>> Thank you!
>>>
>>> Lars
>>
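Since Sunil asked specifically for the preemption-related configs: those knobs live in yarn-site.xml rather than capacity-scheduler.xml, so they aren't in the file pasted above. For reference, a sketch of the standard CapacityScheduler preemption properties, shown here with their stock Hadoop 3.1 defaults (these are the usual property names and default values, not the actual settings from this cluster):

```xml
<!-- yarn-site.xml: standard CapacityScheduler preemption settings.
     Values shown are Hadoop 3.1 defaults, not this cluster's config. -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
<property>
  <!-- how often the preemption policy runs, in ms -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
  <value>3000</value>
</property>
<property>
  <!-- grace period before a container marked for preemption is killed, in ms -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill</name>
  <value>15000</value>
</property>
<property>
  <!-- at most this fraction of cluster resources is preempted per round -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
  <value>0.1</value>
</property>
<property>
  <!-- dead zone: queues within this fraction over their guarantee are ignored -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity</name>
  <value>0.1</value>
</property>
<property>
  <!-- each round reclaims only this fraction of the remaining gap -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor</name>
  <value>0.2</value>
</property>
```

The last three in particular make preemption deliberately gradual and tolerant of small imbalances; whether they explain the "stops at ~25%" observation depends on the values actually in effect on the cluster.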
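One note on reading the XML above: capacities in capacity-scheduler.xml are percentages relative to the parent queue, so the absolute guarantees multiply down the hierarchy. With root.default at 1 and root.foo at 99 split 50/50, the two leaf queues each work out to 49.5% of the cluster, consistent with the roughly 50/50 setup described. A quick sketch of that arithmetic (queue names and values taken from the XML; the helper function is illustrative, not a Hadoop API):

```python
# Absolute queue guarantees implied by the capacity-scheduler.xml above.
# Each queue's capacity is a percentage of its parent's absolute capacity.
def absolute_capacity(parent_abs_pct: float, queue_pct: float) -> float:
    """Absolute capacity of a queue, as a percent of the whole cluster."""
    return parent_abs_pct * queue_pct / 100.0

root = 100.0
default = absolute_capacity(root, 1)      # root.default.capacity = 1
foo = absolute_capacity(root, 99)         # root.foo.capacity = 99
foo_daily = absolute_capacity(foo, 50)    # root.foo.foo_daily.capacity = 50
foo_ultimo = absolute_capacity(foo, 50)   # root.foo.foo_ultimo.capacity = 50

for name, pct in [("default", default), ("foo", foo),
                  ("foo_daily", foo_daily), ("foo_ultimo", foo_ultimo)]:
    print(f"{name}: {pct}% of cluster")
```

Preemption targets these absolute guarantees, so the 50/50 "fair" point for the two leaf queues is 49.5% each, not a flat 50% of the cluster.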