My YARN cluster uses FairScheduler as its scheduler, with 4 queues. Below is my queue configuration:
<allocations>
    <queue name="highPriority">
       <minResources>100000 mb, 30 vcores</minResources>
       <maxResources>250000 mb, 100 vcores</maxResources>
    </queue>
    <queue name="default">
       <minResources>50000 mb, 20 vcores</minResources>
       <maxResources>100000 mb, 50 vcores</maxResources>
       <maxAMShare>-1.0f</maxAMShare>
    </queue>
    <queue name="ep">
       <minResources>100000 mb, 30 vcores</minResources>
       <maxResources>300000 mb, 100 vcores</maxResources>
       <maxAMShare>-1.0f</maxAMShare>
    </queue>
    <queue name="vip">
       <minResources>30000 mb, 20 vcores</minResources>
       <maxResources>60000 mb, 50 vcores</maxResources>
       <maxAMShare>-1.0f</maxAMShare>
    </queue>
    <fairSharePreemptionTimeout>300</fairSharePreemptionTimeout>
</allocations>


Obviously, I didn't configure any preemption. Everything was at least running OK, but the total resource usage rate of my cluster was not very high.

So I decided to turn on preemption and modified fair-scheduler.xml as below:

<allocations>
    <queue name="highPriority">
       <minResources>100000 mb, 30 vcores</minResources>
       <maxResources>300000 mb, 100 vcores</maxResources>
       <weight>0.35</weight>
       <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
       <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
       <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
       <maxAMShare>0.3f</maxAMShare>
       <maxRunningApps>18</maxRunningApps>
    </queue>
    <queue name="default">
       <minResources>50000 mb, 20 vcores</minResources>
       <maxResources>140000 mb, 70 vcores</maxResources>
       <weight>0.14</weight>
       <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
       <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
       <fairSharePreemptionThreshold>0.5</fairSharePreemptionThreshold>
       <maxAMShare>0.3f</maxAMShare>
       <maxRunningApps>20</maxRunningApps>
    </queue>
    <queue name="ep">
       <minResources>100000 mb, 30 vcores</minResources>
       <maxResources>600000 mb, 100 vcores</maxResources>
       <weight>0.42</weight>
       <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
       <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
       <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
       <maxAMShare>0.3f</maxAMShare>
       <maxRunningApps>20</maxRunningApps>
    </queue>
    <queue name="vip">
       <minResources>6000 mb, 20 vcores</minResources>
       <maxResources>120000 mb, 30 vcores</maxResources>
       <weight>0.09</weight>
       <minSharePreemptionTimeout>20</minSharePreemptionTimeout>
       <fairSharePreemptionTimeout>25</fairSharePreemptionTimeout>
       <fairSharePreemptionThreshold>0.8</fairSharePreemptionThreshold>
       <maxAMShare>0.3f</maxAMShare>
       <maxRunningApps>10</maxRunningApps>
    </queue>
</allocations>
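
I also enabled preemption itself in yarn-site.xml (the fair-scheduler.xml settings above only tune the preemption timeouts and thresholds), roughly like this:

<property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
</property>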

Yes, after preemption was turned on, the total resource usage rate of my cluster went up to 90%+. But after one night (midnight is the busiest time for my YARN cluster), I found that many applications were seriously delayed.

After a long time of troubleshooting, I found that in my 9-machine cluster, 5 machines have 128G of physical memory and the remaining 4 have 64G, but in the yarn-site.xml of all of them, yarn.nodemanager.resource.memory-mb is configured as 97280. That is to say, on those 4 machines yarn.nodemanager.resource.memory-mb is actually larger than the physical memory. So I suspect this is what caused the phenomenon: even though the total cluster resource usage improved, each application takes more time to execute and is seriously delayed.
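
If that is really the cause, I guess the 64G machines should be capped below their physical RAM instead, something like the following (how much headroom to leave for the OS and the Hadoop daemons is just my assumption):

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>57344</value> <!-- roughly 56G on a 64G node, leaving headroom for OS and daemons -->
</property>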


Any suggestions?
