[jira] [Created] (YARN-11550) PreemptedResourceSeconds is wrongly recovered when RM has a restart
Juanjuan Tian created YARN-11550:
Summary: PreemptedResourceSeconds is wrongly recovered when RM has a restart
Key: YARN-11550
URL: https://issues.apache.org/jira/browse/YARN-11550
Project: Hadoop YARN
Issue Type: Bug
Components: metrics
Affects Versions: 3.3.4
Reporter: Juanjuan Tian

PreemptedResourceSeconds is wrongly recovered when the RM restarts: it is mistakenly loaded from *ApplicationResourceUsageMap* instead of the preempted-resource map:

public Map<String, Long> getPreemptedResourceSecondsMap() {
  if (this.preemptedResourceSecondsMap != null) {
    return this.preemptedResourceSecondsMap;
  }
  ApplicationAttemptStateDataProtoOrBuilder p = viaProto ? proto : builder;
  this.preemptedResourceSecondsMap = ProtoUtils
      .convertStringLongMapProtoListToMap(
          *p.getApplicationResourceUsageMapList()*);  // reads the wrong proto list
  return this.preemptedResourceSecondsMap;
}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
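To make the failure mode concrete, here is a minimal, self-contained sketch of the bug (plain Java, not the real ApplicationAttemptStateDataPBImpl/proto API; all class, field, and method names here are illustrative stand-ins): the recovery path lazily caches the preempted-seconds map, but fills the cache from the aggregated resource-usage list instead of the preempted one.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the attempt state data recovered on RM restart.
// The state store carries two separate string->long maps: the aggregated
// resource-seconds and the preempted resource-seconds. The bug is reading
// the former when recovering the latter.
class AttemptStateData {
    // Stand-ins for the two persisted lists (names are illustrative).
    final Map<String, Long> applicationResourceUsageMap = new HashMap<>();
    final Map<String, Long> preemptedResourceUsageMap = new HashMap<>();

    private Map<String, Long> preemptedResourceSecondsMap;

    // Buggy version: lazily caches, but copies from the wrong list,
    // so preempted seconds silently become the aggregated totals.
    Map<String, Long> getPreemptedBuggy() {
        if (preemptedResourceSecondsMap == null) {
            preemptedResourceSecondsMap = new HashMap<>(applicationResourceUsageMap);
        }
        return preemptedResourceSecondsMap;
    }

    // Fixed version: recovers from the preempted-resource list.
    Map<String, Long> getPreemptedFixed() {
        return new HashMap<>(preemptedResourceUsageMap);
    }
}

public class Demo {
    public static void main(String[] args) {
        AttemptStateData s = new AttemptStateData();
        s.applicationResourceUsageMap.put("memory-mb", 1000L);
        s.preemptedResourceUsageMap.put("memory-mb", 40L);
        System.out.println(s.getPreemptedBuggy().get("memory-mb")); // 1000: aggregated total, wrong
        System.out.println(s.getPreemptedFixed().get("memory-mb")); // 40: actual preempted seconds
    }
}
```

The sketch assumes the proto exposes a separate preempted-resource list to read from; the one-line fix in the real getter would be to call that list's accessor instead of getApplicationResourceUsageMapList().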
[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-11108:
Description: Found unexpected preemptions in the hierarchical-queue case. The issue is that a sub-queue can accept more resource than used + pending, which makes other queues' IDEAL_ASSIGNED smaller than used + pending, so preemptions happen unexpectedly.

2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator: NAME: MSANRPAB PARTITION: persistent CUR: <vCores:8285, reservedAffinity:{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}> PEN: TOTAL_PEN: RESERVED: GAR: <vCores:9571> NORM: 0.3424696922302246 IDEAL_ASSIGNED: <vCores:8903> IDEAL_PREEMPT: ACTUAL_PREEMPT: <vCores:0> UNTOUCHABLE: PREEMPTABLE: <availableCpuCount:-36467, reservedAffinity:{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}> BONUS_WEIGHT: -1.0

From the log we can see that IDEAL_ASSIGNED is bigger than CUR.

> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.9.2
> Reporter: Juanjuan Tian
> Assignee: Juanjuan Tian
> Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 3/23/23 5:47 AM:

When calculating accepted, the current formula is

Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))

but this can make the accepted resource bigger than the queue's pending resource in one dimension. For example, when the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores), and Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned) is (8GB, 2 cores), the accepted result is (2GB, 3 cores): the accepted CPU count is bigger than the pending CPU count. Instead,

Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))

should be used. !image-2022-04-06-16-29-57-871.png!
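The distinction the comment above draws can be sketched in a few lines of plain Java. Resource and the two helpers below are simplified stand-ins for the YARN Resources utility methods, not the real API: min picks one operand wholesale by a scalar comparison (here, dominant share against the cluster), while componentwiseMin caps each dimension independently.

```java
// Illustrates why a wholesale min can over-accept in one dimension
// while a per-component min cannot.
class Res {
    final long memMB;
    final int vcores;

    Res(long memMB, int vcores) { this.memMB = memMB; this.vcores = vcores; }

    // Stand-in for Resources.min: returns ONE of the two operands entirely,
    // chosen by comparing their dominant shares of the cluster resource.
    static Res min(Res cluster, Res a, Res b) {
        return dominantShare(cluster, a) <= dominantShare(cluster, b) ? a : b;
    }

    // Stand-in for Resources.componentwiseMin: minimum per dimension.
    static Res componentwiseMin(Res a, Res b) {
        return new Res(Math.min(a.memMB, b.memMB), Math.min(a.vcores, b.vcores));
    }

    static double dominantShare(Res cluster, Res r) {
        return Math.max((double) r.memMB / cluster.memMB,
                        (double) r.vcores / cluster.vcores);
    }
}

public class AcceptedDemo {
    public static void main(String[] args) {
        Res cluster = new Res(32 * 1024, 16); // (32GB, 16 cores)
        Res avail   = new Res(2 * 1024, 3);   // (2GB, 3 cores)
        Res needed  = new Res(8 * 1024, 2);   // used + pending - idealAssigned = (8GB, 2 cores)

        // Wholesale min picks avail entirely: 3 vcores accepted, only 2 pending.
        Res accepted = Res.min(cluster, avail, needed);
        System.out.println(accepted.vcores); // 3

        // Per-component min caps each dimension: memory by avail, vcores by pending.
        Res fixed = Res.componentwiseMin(avail, needed);
        System.out.println(fixed.vcores); // 2
    }
}
```

Under these assumptions, the wholesale min accepts 3 vcores against 2 pending, reproducing the over-acceptance the comment describes; the per-component min yields (2GB, 2 cores).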
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 3/23/23 5:46 AM: When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but this can make the accepted resource bigger than the pending resource in one dimension; Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used instead. For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores), and the subtraction result is (8GB, 2 cores), the accepted result is (2GB, 3 cores): the accepted CPU count is bigger than the pending CPU count. !image-2022-04-06-16-29-57-871.png!
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 3/23/23 5:42 AM: When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used. For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores), and the subtraction result is (8GB, 2 cores), the accepted result is (2GB, 3 cores): the accepted CPU count is bigger than the pending CPU count. !image-2022-04-06-16-29-57-871.png!
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 4/22/22 9:03 AM: When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used. For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores), and the subtraction result is (8GB, 2 cores), the accepted result is (2GB, 3 cores): the accepted CPU count is bigger than the pending CPU count. !image-2022-04-06-16-29-57-871.png!
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 4/11/22 2:10 AM: When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used. For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores), and the subtraction result is (8GB, 2 cores), the accepted result is (2GB, 3 cores): the accepted CPU count is bigger than the pending CPU count. !image-2022-04-06-16-29-57-871.png!
[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-11108:
Description: Found unexpected preemptions in the hierarchical-queue case. The issue is that a sub-queue can accept more resource than used + pending, which makes other queues' IDEAL_ASSIGNED smaller than used + pending, so preemptions happen unexpectedly.
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 4/11/22 2:09 AM: When calculating accepted, Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) is used, but Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned)) should be used. For example, if the cluster resource is (32GB, 16 cores), avail is (2GB, 3 cores), and the subtraction result is (8GB, 2 cores), the accepted result is (2GB, 3 cores): the accepted CPU count is bigger than the pending CPU count. !image-2022-04-06-16-29-57-871.png!
[jira] [Commented] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517945#comment-17517945 ] Juanjuan Tian commented on YARN-11108: [~wangda] could you help take a look?

> Unexpected preemptions happen when hierarchy queues case
> Key: YARN-11108
> URL: https://issues.apache.org/jira/browse/YARN-11108
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.9.2
> Reporter: Juanjuan Tian
> Assignee: Juanjuan Tian
> Priority: Major
> Attachments: image-2022-04-06-16-29-57-871.png
>
> Found unexpected preemptions in the hierarchical-queue case. The issue is that a sub-queue can accept more resource than used + pending, which makes other queues' IDEAL_ASSIGNED smaller than used + pending, so preemptions happen unexpectedly.
>
> 2022-04-02T01:11:12,973 DEBUG [SchedulingMonitor (ProportionalCapacityPreemptionPolicy)] org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.PreemptableResourceCalculator: NAME: MSANRPAB PARTITION: persistent CUR: <vCores:8285, reservedAffinity:{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}> PEN: TOTAL_PEN: RESERVED: GAR: <vCores:9571> NORM: 0.3424696922302246 IDEAL_ASSIGNED: IDEAL_PREEMPT: ACTUAL_PREEMPT: <vCores:0> UNTOUCHABLE: PREEMPTABLE: <availableCpuCount:-36467, reservedAffinity:{6, 8, 9, 10, 11, 15, 19, 20, 22, 24, 28}> BONUS_WEIGHT: -1.0
[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-11108: Affects Version/s: 2.9.2
[jira] [Assigned] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian reassigned YARN-11108: Assignee: Juanjuan Tian
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 4/6/22 8:44 AM: --- This issue is caused by the following: when calculating accepted, Resources.min (Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))) is used, but Resources.componentwiseMin (Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))) should be used. For example, if the cluster resource is (32GB, 16 cores), available is (2GB, 3 cores), and Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned) comes out as (8GB, 2 cores), then the calculated accepted is (2GB, 3 cores): the assigned CPU is more than the pending CPU. !image-2022-04-06-16-29-57-871.png! was (Author: jutia): This issue is caused by below, when calculating accepted, Resource.min (Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(),(considersReservedResource ? pending : pendingDeductReserved)),idealAssigned is used, but Resources.componentwiseMin (Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned) should be used, for example, it cluster resource is (32GB, 16cores), availialble is (2GB, 3cores), Resources. .subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))) is (8GB, 2cores) after calculated, the accepted is (2GB, 3cores), assigned cpu is more than its pending cpu number !image-2022-04-06-16-29-57-871.png!
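The difference the comment describes can be illustrated with a minimal sketch. This is not Hadoop's code: the (memory, vcores) arrays and the single-dimension comparison inside min() are simplified stand-ins for org.apache.hadoop.yarn.api.records.Resource and the ResourceCalculator-based Resources.min(), which picks one operand wholesale, while componentwiseMin() caps each dimension independently.

```java
// Simplified sketch of the YARN-11108 fix: Resources.min() returns one of the
// two Resource operands as a whole, so a dimension of the chosen operand can
// exceed the corresponding dimension of the other; componentwiseMin() caps
// each dimension independently. Types here are stand-ins, not Hadoop classes.
public class ResourceMinSketch {
    // (memoryGB, vcores) pair standing in for a Resource object
    static long[] min(long[] a, long[] b) {
        // crude stand-in for Resources.min(): choose the operand that is
        // smaller on the memory dimension, returning it wholesale
        return a[0] <= b[0] ? a : b;
    }

    static long[] componentwiseMin(long[] a, long[] b) {
        return new long[] { Math.min(a[0], b[0]), Math.min(a[1], b[1]) };
    }

    public static void main(String[] args) {
        long[] avail = {2, 3};  // (2GB, 3 cores) available
        long[] delta = {8, 2};  // used + pending - idealAssigned = (8GB, 2 cores)
        long[] accepted = min(avail, delta);
        long[] acceptedFixed = componentwiseMin(avail, delta);
        // wholesale min accepts 3 cores even though only 2 cores are pending;
        // componentwise min correctly caps the core dimension at 2
        System.out.println(accepted[1] + " vs " + acceptedFixed[1]); // 3 vs 2
    }
}
```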
[jira] [Comment Edited] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian edited comment on YARN-11108 at 4/6/22 8:43 AM: --- This issue is caused by below, when calculating accepted, Resource.min (Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))) is used, but Resources.componentwiseMin (Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))) should be used, for example, if cluster resource is (32GB, 16cores), available is (2GB, 3cores), Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned) is (8GB, 2cores) after calculation, the accepted is (2GB, 3cores), assigned cpu is more than its pending cpu number !image-2022-04-06-16-29-57-871.png! was (Author: jutia): This issue is vcaused by below, when calculating accepted, Resource.min (Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(),(considersReservedResource ? pending : pendingDeductReserved)),idealAssigned is used, but Resources.componentwiseMin (Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned) is use !image-2022-04-06-16-29-57-871.png!
[jira] [Commented] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517935#comment-17517935 ] Juanjuan Tian commented on YARN-11108: --- This issue is caused by below: when calculating accepted, Resource.min (Resources.min(rc, clusterResource, avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))) is used, but Resources.componentwiseMin (Resources.componentwiseMin(avail, Resources.subtract(Resources.add(getUsed(), (considersReservedResource ? pending : pendingDeductReserved)), idealAssigned))) should be used. !image-2022-04-06-16-29-57-871.png!
[jira] [Updated] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
[ https://issues.apache.org/jira/browse/YARN-11108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-11108: -- Attachment: image-2022-04-06-16-29-57-871.png
[jira] [Created] (YARN-11108) Unexpected preemptions happen when hierarchy queues case
Juanjuan Tian created YARN-11108: - Summary: Unexpected preemptions happen when hierarchy queues case Key: YARN-11108 URL: https://issues.apache.org/jira/browse/YARN-11108 Project: Hadoop YARN Issue Type: Improvement Reporter: Juanjuan Tian Found unexpected preemptions in the hierarchical-queues case. The issue is that a sub-queue can accept more resource than its used + pending, so other queues' IDEAL_ASSIGNED ends up smaller than their used + pending, and preemptions happen unexpectedly.
[jira] [Updated] (YARN-9351) user can't use total resources of one partition even when yarn.scheduler.capacity..minimum-user-limit-percent is set to 100
[ https://issues.apache.org/jira/browse/YARN-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-9351: - Summary: user can't use total resources of one partition even when yarn.scheduler.capacity..minimum-user-limit-percent is set to 100 (was: user can't use total resources of one partition even yarn.scheduler.capacity..minimum-user-limit-percent is set to 100 ) > user can't use total resources of one partition even when yarn.scheduler.capacity..minimum-user-limit-percent is set to 100 > Key: YARN-9351 > URL: https://issues.apache.org/jira/browse/YARN-9351 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler > Affects Versions: 3.1.2 > Reporter: Juanjuan Tian > Assignee: Juanjuan Tian > Priority: Major > > If we configure queue capacity in absolute terms, users can't use the total resource of one partition even when yarn.scheduler.capacity..minimum-user-limit-percent is set to 100. > For example, there are two partitions A and B: partition A has (120G memory, 30 vcores) and partition B has (180G memory, 60 vcores). Queue Prod is configured with (75G memory, 25 vcores) of partition A's resource, i.e. yarn.scheduler.capacity.root.Prod.accessible-node-labels.A.capacity=[memory=75Gi,vcores=25] and yarn.scheduler.capacity.root.Prod.accessible-node-labels.A.maximum-capacity=[memory=120Gi,vcores=30], with yarn.scheduler.capacity.root.Prod.minimum-user-limit-percent=100. At one point the used resource of queue Prod is (90G memory, 10 vcores); at this time, even though minimum-user-limit-percent is set to 100, users in queue Prod can't get more resource on partition A. > > The reason is that in *computeUserLimit*, partitionResource is used when comparing consumed and queueCapacity, so in the example (75G memory, 25 vcores) is the user limit: > Resource currentCapacity = Resources.lessThan(resourceCalculator, partitionResource, consumed, queueCapacity) ? queueCapacity : Resources.add(consumed, required); > Resource userLimitResource = Resources.max(resourceCalculator, partitionResource, Resources.divideAndCeil(resourceCalculator, resourceUsed, usersSummedByWeight), Resources.divideAndCeil(resourceCalculator, Resources.multiplyAndRoundDown(currentCapacity, getUserLimit()), 100)); > > But *canAssignToUser* = Resources.greaterThan(resourceCalculator, clusterResource, user.getUsed(nodePartition), limit) uses *clusterResource* when comparing used and limit, and canAssignToUser comes out false.
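The comparison flip described in the report can be sketched numerically. This is a simplified dominant-share model using the figures from the description, not Hadoop's actual Resources/ResourceCalculator API: whether used exceeds limit depends on which total resource normalizes the shares.

```java
// Sketch of how a DominantResourceCalculator-style comparison flips depending
// on whether the partition total or the whole-cluster total is used as the
// normalizing resource (figures from the YARN-9351 description; simplified,
// not Hadoop's actual API).
public class UserLimitSketch {
    // dominant share of resource r = (GB, vcores) against a given total
    static double dominantShare(long[] r, long[] total) {
        return Math.max((double) r[0] / total[0], (double) r[1] / total[1]);
    }

    static boolean greaterThan(long[] a, long[] b, long[] total) {
        return dominantShare(a, total) > dominantShare(b, total);
    }

    public static void main(String[] args) {
        long[] used  = {90, 10};    // queue Prod's used (GB, vcores)
        long[] limit = {75, 25};    // computed user limit = partition-A capacity
        long[] partitionA = {120, 30};
        long[] cluster    = {300, 90}; // partition A (120,30) + partition B (180,60)

        // normalized against the partition: 0.75 < 0.833, so used < limit
        System.out.println(greaterThan(used, limit, partitionA)); // false
        // normalized against the cluster: 0.30 > 0.278, so used > limit
        // and canAssignToUser-style checks refuse further allocation
        System.out.println(greaterThan(used, limit, cluster));    // true
    }
}
```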
[jira] [Updated] (YARN-10401) AggregateContainersPreempted in QueueMetrics is not correct when set yarn.scheduler.capacity.lazy-preemption-enabled as true
[ https://issues.apache.org/jira/browse/YARN-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-10401: -- Attachment: YARN-10401-001.patch > AggregateContainersPreempted in QueueMetrics is not correct when set yarn.scheduler.capacity.lazy-preemption-enabled as true > Key: YARN-10401 > URL: https://issues.apache.org/jira/browse/YARN-10401 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Juanjuan Tian > Assignee: Juanjuan Tian > Priority: Major > Attachments: YARN-10401-001.patch > > AggregateContainersPreempted in QueueMetrics is always zero when yarn.scheduler.capacity.lazy-preemption-enabled is set to true.
[jira] [Assigned] (YARN-10401) AggregateContainersPreempted in QueueMetrics is not correct when set yarn.scheduler.capacity.lazy-preemption-enabled as true
[ https://issues.apache.org/jira/browse/YARN-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian reassigned YARN-10401: - Assignee: Juanjuan Tian
[jira] [Created] (YARN-10401) AggregateContainersPreempted in QueueMetrics is not correct when set yarn.scheduler.capacity.lazy-preemption-enabled as true
Juanjuan Tian created YARN-10401: - Summary: AggregateContainersPreempted in QueueMetrics is not correct when set yarn.scheduler.capacity.lazy-preemption-enabled as true Key: YARN-10401 URL: https://issues.apache.org/jira/browse/YARN-10401 Project: Hadoop YARN Issue Type: Improvement Reporter: Juanjuan Tian AggregateContainersPreempted in QueueMetrics is always zero when yarn.scheduler.capacity.lazy-preemption-enabled is set to true.
[jira] [Commented] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175181#comment-17175181 ] Juanjuan Tian commented on YARN-10384: --- [~epayne] Thanks for the comments. If an abusive user only abuses queue resources, we can use User Weights to limit that user. But if the user abuses other resources, such as local disk (for example, in our system we found some users consuming large amounts of local disk, causing many NMs to become unhealthy), in such cases we should forbid the user instead of just limiting them. > Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue > Key: YARN-10384 > URL: https://issues.apache.org/jira/browse/YARN-10384 > Project: Hadoop YARN > Issue Type: Improvement > Affects Versions: 3.2.0 > Reporter: Juanjuan Tian > Assignee: Juanjuan Tian > Priority: Major > Attachments: YARN-10384-001.patch > > Currently CapacityScheduler supports acl_submit_applications and acl_administer_queue to administer a queue, but it may be necessary to forbid some members of the acl_submit_applications group from submitting applications to one specified queue, since some users may abuse the queue and submit many applications, while creating new groups just to exclude those users costs effort and time. For this scenario, we can add another acl type, FORBID_SUBMIT_APPLICATIONS, list the users who abuse the queue, and forbid them from submitting applications.
[jira] [Updated] (YARN-9020) AbsoluteCapacity is wrongly set when call ParentQueue#setAbsoluteCapacity
[ https://issues.apache.org/jira/browse/YARN-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-9020: - Summary: AbsoluteCapacity is wrongly set when call ParentQueue#setAbsoluteCapacity (was: set a wrong AbsoluteCapacity when call ParentQueue#setAbsoluteCapacity) > AbsoluteCapacity is wrongly set when call ParentQueue#setAbsoluteCapacity > Key: YARN-9020 > URL: https://issues.apache.org/jira/browse/YARN-9020 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Juanjuan Tian > Assignee: Juanjuan Tian > Priority: Major > > A wrong AbsoluteCapacity is set when calling ParentQueue#setAbsoluteCapacity: > private void deriveCapacityFromAbsoluteConfigurations(String label, Resource clusterResource, ResourceCalculator rc, CSQueue childQueue) { > // 3. Update absolute capacity as a float based on parent's minResource and > // cluster resource. > childQueue.getQueueCapacities().setAbsoluteCapacity(label, (float) childQueue.getQueueCapacities().getCapacity() / getQueueCapacities().getAbsoluteCapacity(label)); > > should be > childQueue.getQueueCapacities().setAbsoluteCapacity(label, (float) childQueue.getQueueCapacities().getCapacity(label) / getQueueCapacities().getAbsoluteCapacity(label));
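The essence of the bug above is a label-aware lookup replaced by a default-label lookup. The sketch below is a simplified stand-in for QueueCapacities (per-label values in a map, with the no-arg accessor returning the default label's value), not Hadoop's real class; it only shows why the two accessors diverge once a non-default label is configured.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the YARN-9020 bug: per-label capacities live in a map keyed by
// node label, and the no-arg accessor returns the default ("") label's value,
// so using it while deriving a labeled absolute capacity mixes partitions.
// Simplified stand-in for QueueCapacities, not the real class.
public class QueueCapacitiesSketch {
    private final Map<String, Float> capacity = new HashMap<>();

    void setCapacity(String label, float v) { capacity.put(label, v); }
    float getCapacity()             { return capacity.getOrDefault("", 0f); }    // default label
    float getCapacity(String label) { return capacity.getOrDefault(label, 0f); } // specific label

    public static void main(String[] args) {
        QueueCapacitiesSketch child = new QueueCapacitiesSketch();
        child.setCapacity("", 0.5f);   // default partition's configured capacity
        child.setCapacity("A", 0.8f);  // partition "A"'s configured capacity

        // buggy lookup: default-label value used while computing label "A"'s
        // absolute capacity
        System.out.println(child.getCapacity());     // 0.5
        // fixed lookup: label-specific value
        System.out.println(child.getCapacity("A"));  // 0.8
    }
}
```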
[jira] [Updated] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-10384: -- Fix Version/s: 3.2.0
[jira] [Assigned] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian reassigned YARN-10384: - Assignee: Juanjuan Tian
[jira] [Updated] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-10384: -- Affects Version/s: 3.2.0
[jira] [Updated] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-10384: -- Description: Currently CapacityScheduler supports acl_submit_applications and acl_administer_queue to administer a queue, but it may be necessary to forbid some members of the acl_submit_applications group from submitting applications to one specified queue, since some users may abuse the queue and submit many applications, while creating new groups just to exclude those users costs effort and time. For this scenario, we can add another acl type, FORBID_SUBMIT_APPLICATIONS, list the users who abuse the queue, and forbid them from submitting applications. (was: Currently CapacityScheduler supports acl_submit_applications, acl_administer_queue to admister queue, but It may neeed to forbid some ones in group of acl_submit_applications to submit applications to the specified queue, since some ones may abuse the queue, and submit many applications, meanwhile creating another groups just to exclude these ones costs effort and time. For this scenario, we can just add another acl type - FORBID_SUBMIT_APPLICATIONS, and just add these ones who abuse queue, and forbid these ones to submit application )
[jira] [Updated] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
[ https://issues.apache.org/jira/browse/YARN-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juanjuan Tian updated YARN-10384: -- Attachment: YARN-10384-001.patch
[jira] [Created] (YARN-10384) Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue
Juanjuan Tian created YARN-10384: - Summary: Add FORBID_SUBMIT_APPLICATIONS acl type to administer queue Key: YARN-10384 URL: https://issues.apache.org/jira/browse/YARN-10384 Project: Hadoop YARN Issue Type: Improvement Reporter: Juanjuan Tian Currently CapacityScheduler supports acl_submit_applications and acl_administer_queue to administer a queue, but it may be necessary to forbid some members of the acl_submit_applications group from submitting applications to a specified queue, since some users may abuse the queue and submit many applications, while creating new groups just to exclude those users costs effort and time. For this scenario, we can add another acl type, FORBID_SUBMIT_APPLICATIONS, list the users who abuse the queue, and forbid them from submitting applications.
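If the proposal above were adopted, queue configuration might look like the sketch below. Note this is entirely hypothetical: the property name acl_forbid_submit_applications is invented here to mirror the existing acl_submit_applications key, and it is not an actual CapacityScheduler setting.

```xml
<!-- Hypothetical capacity-scheduler.xml fragment illustrating the YARN-10384
     proposal; the acl_forbid_submit_applications key below is NOT a real
     setting, just a sketch of how the new ACL type might be exposed. -->
<property>
  <name>yarn.scheduler.capacity.root.prod.acl_submit_applications</name>
  <value>dev-group</value>
</property>
<property>
  <!-- proposed deny-list, evaluated before the submit ACL above -->
  <name>yarn.scheduler.capacity.root.prod.acl_forbid_submit_applications</name>
  <value>abusive-user1,abusive-user2</value>
</property>
```

The design intent is that a user listed in the forbid ACL is rejected even if a group they belong to appears in acl_submit_applications, avoiding the need to create a new group that excludes them.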
[jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.
[ https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084503#comment-17084503 ] Juanjuan Tian commented on YARN-4314: -- Any updates on the results? > Adding container wait time as a metric at queue level and application level. > Key: YARN-4314 > URL: https://issues.apache.org/jira/browse/YARN-4314 > Project: Hadoop YARN > Issue Type: New Feature > Reporter: Lavkesh Lahngir > Assignee: Lavkesh Lahngir > Priority: Major > Attachments: Containerwaittime.pdf > > > There is a need for adding the container wait time, which can be tracked at the queue and application level. > An application can have two kinds of wait times: one is the AM wait time after submission, and another is the total container wait time between the AM asking for containers and getting them.
[jira] [Comment Edited] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859740#comment-16859740 ] Juanjuan Tian edited comment on YARN-9598 at 6/11/19 1:58 AM: --- Hi Tao, {noformat} disable re-reservation can only make the scheduler skip reserving the same container repeatedly and try to allocate on other nodes, it won't affect normal scheduling for this app and later apps. Thoughts?{noformat} For example, there are 10 nodes (h1, h2, ..., h9, h10) in the cluster, each with 8G memory, and two queues A and B, each configured with 50% capacity. First, 10 jobs (each requesting 6G of resource) are submitted to queue A, and each of the 10 nodes gets one container allocated. Afterwards, another job JobB, which requests 3G of resource, is submitted to queue B, and one container of 3G size gets reserved on node h1. If we disable re-reservation in this case, even though the scheduler can look at other nodes, since shouldAllocOrReserveNewContainer is false there are still no other reservations, and JobB will still get stuck. was (Author: jutia): Hi Tao, {noformat} disable re-reservation can only make the scheduler skip reserving the same container repeatedly and try to allocate on other nodes, it won't affect normal scheduling for this app and later apps. Thoughts?{noformat} for example, there are 10 nodes(h1,h2,...h9,h10), each has 8G memory in cluster, and two queues A,B, each is configured with 50% capacity. firstly there are 10 jobs (each requests 6G respurce) is submited to queue A, and each node of the 10 nodes will have a contianer allocated. Afterwards, another job JobB which requests 3G resource is submited to queue B, and there will be one container with 3G size reserved on node h1, if we disable re-reservation, in this case, even scheduler can look up other nodes, since the shouldAllocOrReserveNewContainer is false, there is still on other reservations, and JobB will still get stuck.
> Make reservation work well when multi-node enabled
> --
>
> Key: YARN-9598
> URL: https://issues.apache.org/jira/browse/YARN-9598
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, image-2019-06-10-11-37-44-975.png
>
> This issue is to solve problems about reservation when multi-node is enabled:
> # As discussed in YARN-9576, a re-reservation proposal may always be generated on the same node and break the scheduling for this app and later apps. I think re-reservation is unnecessary and we can replace it with LOCALITY_SKIPPED to let the scheduler have a chance to look up further candidates for this app when multi-node is enabled.
> # The scheduler iterates all nodes and tries to allocate for the reserved container in LeafQueue#allocateFromReservedContainer. There are two problems here:
> ** The node of the reserved container should be taken as the candidate instead of all nodes when calling FiCaSchedulerApp#assignContainers, otherwise the scheduler may later generate a reservation-fulfilled proposal on another node, which will always be rejected in FiCaScheduler#commonCheckContainerAllocation.
> ** The assignment returned by FiCaSchedulerApp#assignContainers can never be null even if it was just skipped; it will break the normal scheduling process for this leaf queue because of the if clause in LeafQueue#assignContainers: "if (null != assignment) \{ return assignment;}"
> # Nodes which have been reserved should be skipped when iterating candidates in RegularContainerAllocator#allocate, otherwise the scheduler may generate an allocation or reservation proposal on these nodes which will always be rejected in FiCaScheduler#commonCheckContainerAllocation.
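The second sub-bullet above (a non-null "skipped" assignment short-circuiting the leaf queue) can be sketched with a toy loop; the class and method names here are simplified stand-ins, not the actual YARN types:

```java
import java.util.Arrays;
import java.util.List;

// Toy sketch (hypothetical names, not YARN classes) of the "if (null !=
// assignment) { return assignment; }" problem: when assignContainers returns
// a non-null assignment even for a skip, the leaf-queue loop stops after the
// first app and later apps never get a chance.
public class SkippedAssignmentSketch {
  static class Assignment {
    final boolean skipped;
    Assignment(boolean skipped) { this.skipped = skipped; }
  }

  // Stand-in for FiCaSchedulerApp#assignContainers: in this toy every app
  // is skipped, but the result is still non-null.
  static Assignment assignContainers(String app) {
    return new Assignment(true);
  }

  public static void main(String[] args) {
    List<String> apps = Arrays.asList("app1", "app2", "app3");
    int appsTried = 0;
    for (String app : apps) {
      appsTried++;
      Assignment assignment = assignContainers(app);
      if (null != assignment) { // always true, even when merely skipped
        break;                  // so only the first app is ever considered
      }
    }
    System.out.println("apps tried = " + appsTried);
  }
}
```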
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859799#comment-16859799 ] Juanjuan Tian commented on YARN-9598:
--
"inter-queue preemption can't happen because of resource fragmentation while the cluster still has 20GB of available memory, right?" I think the answer is yes. I agree that "it's not re-reservation's business but can be worked around by it". Re-reservation can result in many reservations on many nodes and then finally trigger preemption; it's a workaround for preemption not being smart enough. So I think we should reconsider the re-reservation logic in this patch.
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859776#comment-16859776 ] Juanjuan Tian commented on YARN-9598:
--
Hi [~Tao Yang], just as you said, there will always be only one reserved container when re-reservation is disabled, and thus even when inter-queue preemption is enabled in the cluster, preemption will not happen. But if we can reserve several containers, preemption can be triggered (when yarn.resourcemanager.monitor.capacity.preemption.additional_res_balance_based_on_reserved_containers is set to true).
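The effect described in the comment above can be illustrated with a toy calculation; the threshold logic below is a hypothetical simplification for illustration only, not the actual ProportionalCapacityPreemptionPolicy code:

```java
// Toy illustration (hypothetical logic, not YARN's preemption policy): when
// reserved resources are counted toward a starved queue's visible demand,
// several reservations can push the demand past the point where inter-queue
// preemption fires, while a single reservation may not.
public class ReservedDemandSketch {
  // Preempt only when visible demand (used + reserved) exceeds some minimum
  // worth preempting for; 5G is an arbitrary threshold for this toy.
  static boolean shouldPreempt(int usedGb, int reservedGb) {
    int demandGb = usedGb + reservedGb;
    return demandGb > 5;
  }

  public static void main(String[] args) {
    System.out.println(shouldPreempt(0, 3)); // one 3G reservation
    System.out.println(shouldPreempt(0, 9)); // three 3G reservations
  }
}
```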
[jira] [Comment Edited] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859689#comment-16859689 ] Juanjuan Tian edited comment on YARN-9598 at 6/10/19 6:45 AM:
---
Hi Tao,
1. As discussed in YARN-9576, a re-reservation proposal may always be generated on the same node and break the scheduling for this app and later apps. I think re-reservation is unnecessary and we can replace it with LOCALITY_SKIPPED to let the scheduler have a chance to look up further candidates for this app when multi-node is enabled. Regarding this: if re-reservation is disabled, shouldAllocOrReserveNewContainer may return false in most cases, and thus even though the scheduler has a chance to look up other candidates, it may not assign containers.
2. After this patch, since the assignment returned by FiCaSchedulerApp#assignContainers can never be null even if it was just skipped, then even if only one of the candidates has a container reserved, allocateFromReservedContainer will still never return null, and it still breaks the normal scheduling process.

So I'm wondering if we can just handle this case like the single-node case and change the logic in CapacityScheduler#allocateContainersOnMultiNodes like below:

!image-2019-06-10-11-37-44-975.png!

{code:java}
/*
 * New behavior, allocate containers considering multiple nodes
 */
private CSAssignment allocateContainersOnMultiNodes(
    FiCaSchedulerNode schedulerNode) {
  // Backward compatible way to make sure previous behavior which allocation
  // driven by node heartbeat works.
  if (getNode(schedulerNode.getNodeID()) != schedulerNode) {
    LOG.error("Trying to schedule on a removed node, please double check.");
    return null;
  }

  // Assign new containers...
  // 1. Check for reserved applications
  // 2. Schedule if there are no reservations
  RMContainer reservedRMContainer = schedulerNode.getReservedContainer();
  if (reservedRMContainer != null) {
    allocateFromReservedContainer(schedulerNode, false, reservedRMContainer);
  }

  // Do not schedule if there are any reservations to fulfill on the node
  if (schedulerNode.getReservedContainer() != null) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping scheduling since node " + schedulerNode.getNodeID()
          + " is reserved by application " + schedulerNode
          .getReservedContainer().getContainerId().getApplicationAttemptId());
    }
    return null;
  }

  PlacementSet ps = getCandidateNodeSet(schedulerNode);
  // When this time look at multiple nodes, try schedule if the
  // partition has any available resource or killable resource
  if (getRootQueue().getQueueCapacities().getUsedCapacity(
      ps.getPartition()) >= 1.0f
      && preemptionManager.getKillableResource(
          CapacitySchedulerConfiguration.ROOT, ps.getPartition())
          == Resources.none()) {
{code}
[jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859689#comment-16859689 ]

Juanjuan Tian commented on YARN-9598:
--

Hi Tao,

1. As discussed in YARN-9576, a re-reservation proposal may always be generated on the same node and break scheduling for this app and later apps. I agree that re-reservation is unnecessary and can be replaced with LOCALITY_SKIPPED, so the scheduler has a chance to look at later candidates for this app when multi-node lookup is enabled. However, if re-reservation is disabled, shouldAllocOrReserveNewContainer may return false in most cases, so even when the scheduler does get a chance to look at other candidates, it may not assign containers.

2. After this patch, the Assignment returned by FiCaSchedulerApp#assignContainers can never be null, even when the app is just skipped. So if even one of the candidates holds a reserved container, allocateFromReservedContainer will still return a non-null result, which still breaks the normal scheduling process. I am therefore wondering why we don't handle this case the same way as single-node scheduling and change the logic in CapacityScheduler#allocateContainersOnMultiNodes as below:

!image-2019-06-10-11-37-44-975.png!

{code:java}
/*
 * New behavior, allocate containers considering multiple nodes
 */
private CSAssignment allocateContainersOnMultiNodes(
    FiCaSchedulerNode schedulerNode) {
  // Backward compatible way to make sure previous behavior which allocation
  // driven by node heartbeat works.
  if (getNode(schedulerNode.getNodeID()) != schedulerNode) {
    LOG.error("Trying to schedule on a removed node, please double check.");
    return null;
  }

  // Assign new containers...
  // 1. Check for reserved applications
  // 2. Schedule if there are no reservations
  RMContainer reservedRMContainer = schedulerNode.getReservedContainer();
  if (reservedRMContainer != null) {
    allocateFromReservedContainer(schedulerNode, false, reservedRMContainer);
  }

  // Do not schedule if there are any reservations to fulfill on the node
  if (schedulerNode.getReservedContainer() != null) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Skipping scheduling since node " + schedulerNode.getNodeID()
          + " is reserved by application " + schedulerNode
              .getReservedContainer().getContainerId()
              .getApplicationAttemptId());
    }
    return null;
  }

  PlacementSet ps = getCandidateNodeSet(schedulerNode);
  // When this time look at multiple nodes, try schedule if the
  // partition has any available resource or killable resource
  if (getRootQueue().getQueueCapacities().getUsedCapacity(
      ps.getPartition()) >= 1.0f
      && preemptionManager.getKillableResource(
          CapacitySchedulerConfiguration.ROOT, ps.getPartition())
          == Resources.none()) {
{code}

> Make reservation work well when multi-node enabled
> --
>
> Key: YARN-9598
> URL: https://issues.apache.org/jira/browse/YARN-9598
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Reporter: Tao Yang
> Assignee: Tao Yang
> Priority: Major
> Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png,
> image-2019-06-10-11-37-44-975.png
>
> This issue is to solve problems about reservation when multi-node lookup is enabled:
> # As discussed in YARN-9576, a re-reservation proposal may always be generated
> on the same node and break scheduling for this app and later apps. I think
> re-reservation is unnecessary and we can replace it with LOCALITY_SKIPPED to
> let the scheduler have a chance to look at later candidates for this app when
> multi-node lookup is enabled.
> # The scheduler iterates all nodes and tries to allocate for the reserved
> container in LeafQueue#allocateFromReservedContainer. Here there are two
> problems:
> ** The node of the reserved container should be taken as the candidate instead
> of all nodes when calling FiCaSchedulerApp#assignContainers, otherwise the
> scheduler may later generate a reservation-fulfilled proposal on another node,
> which will always be rejected in FiCaScheduler#commonCheckContainerAllocation.
> ** The Assignment returned by FiCaSchedulerApp#assignContainers can never be
> null even if the app is just skipped, which breaks the normal scheduling
> process for this leaf queue because of the if clause in
> LeafQueue#assignContainers: "if (null != assignment) \{ return assignment;}"
> # Nodes which have been reserved should be skipped when iterating candidates
> in RegularContainerAllocator#allocate, otherwise the scheduler may generate an
> allocation or reservation proposal on these nodes, which will always be
> rejected in FiCaScheduler#commonCheckContainerAllocation.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
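The trade-off in point 1 above, re-reserving on the same node versus returning LOCALITY_SKIPPED and moving on to the next candidate, can be sketched with a toy model. This is not YARN code: the class, method names, node list, and resource numbers below are all invented for illustration.

```java
import java.util.List;

/**
 * Toy model (not YARN code) of re-reservation vs. LOCALITY_SKIPPED.
 * Nodes are just "free resource" numbers; all names are invented.
 */
public class SkipVsReserve {

    // Old behavior in this toy model: pin a reservation on the first node
    // looked at, even though a later node could satisfy the ask outright.
    static int reserveOnFirst(List<Integer> freePerNode, int ask) {
        return freePerNode.isEmpty() ? -1 : 0;
    }

    // LOCALITY_SKIPPED behavior: skip nodes that cannot satisfy the ask and
    // keep walking the candidate list in the same scheduling pass.
    static int allocateWithSkip(List<Integer> freePerNode, int ask) {
        for (int i = 0; i < freePerNode.size(); i++) {
            if (freePerNode.get(i) >= ask) {
                return i; // index of the node that satisfied the ask
            }
        }
        return -1; // no candidate fits in this pass
    }

    public static void main(String[] args) {
        List<Integer> free = List.of(2, 1, 8);
        // Pinning to node 0 blocks the app even though node 2 has room.
        System.out.println(reserveOnFirst(free, 4));   // 0
        // Skipping nodes 0 and 1 reaches node 2 in the same pass.
        System.out.println(allocateWithSkip(free, 4)); // 2
    }
}
```

The comment's caveat also shows up in this model: if skipping is the only behavior and no node fits, the pass ends with no allocation and no reservation at all, which is the shouldAllocOrReserveNewContainer concern raised above.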
[jira] [Updated] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juanjuan Tian updated YARN-9598:
-

Attachment: image-2019-06-10-11-37-43-283.png

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
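One of the problems YARN-9598 tracks is that the Assignment returned by FiCaSchedulerApp#assignContainers is never null even when the app is merely skipped, so the clause "if (null != assignment) \{ return assignment;}" in LeafQueue#assignContainers ends the per-app loop too early. A toy model of that pitfall; this is not YARN code, and the Assignment class and both pick methods are invented for illustration:

```java
import java.util.List;

/**
 * Toy model (not YARN code) of the quoted pitfall in
 * LeafQueue#assignContainers. All names here are invented.
 */
public class NullCheckPitfall {

    static final class Assignment {
        final boolean skipped; // true models a skipped (empty) assignment
        Assignment(boolean skipped) { this.skipped = skipped; }
    }

    // Pitfall: the first non-null result ends the loop, even a pure skip,
    // so apps later in the queue never get a chance in this pass.
    static int firstNonNull(List<Assignment> perApp) {
        for (int i = 0; i < perApp.size(); i++) {
            if (perApp.get(i) != null) {
                return i;
            }
        }
        return -1;
    }

    // Sketch of a fix: a skipped assignment must not short-circuit the loop.
    static int firstRealAssignment(List<Assignment> perApp) {
        for (int i = 0; i < perApp.size(); i++) {
            Assignment a = perApp.get(i);
            if (a != null && !a.skipped) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // App 0 is skipped; app 1 could actually receive a container.
        List<Assignment> perApp =
            List.of(new Assignment(true), new Assignment(false));
        System.out.println(firstNonNull(perApp));        // 0: stops too early
        System.out.println(firstRealAssignment(perApp)); // 1: reaches app 1
    }
}
```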
[jira] [Updated] (YARN-9598) Make reservation work well when multi-node enabled
[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juanjuan Tian updated YARN-9598:
-

Attachment: image-2019-06-10-11-37-44-975.png
[jira] [Comment Edited] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851660#comment-16851660 ]

Juanjuan Tian edited comment on YARN-7494 at 5/30/19 9:02 AM:
---

Thanks Weiwei for your reply. There seems to be another issue in RegularContainerAllocator#allocate. Referring to the code below, it iterates through all nodes, but reservedContainer does not change along with the iterated node. With a multi-node policy, reservedContainer and the iterated node can therefore be inconsistent, which may produce an incorrect ContainerAllocation (even though this ContainerAllocation will be abandoned in the end, it still wastes a scheduling opportunity). [~cheersyang] what's your thought on this situation?

{code:java}
while (iter.hasNext()) {
  FiCaSchedulerNode node = iter.next();

  if (reservedContainer == null) {
    result = preCheckForNodeCandidateSet(clusterResource, node,
        schedulingMode, resourceLimits, schedulerKey);
    if (null != result) {
      continue;
    }
  } else {
    // pre-check when allocating reserved container
    if (application.getOutstandingAsksCount(schedulerKey) == 0) {
      // Release
      result = new ContainerAllocation(reservedContainer, null,
          AllocationState.QUEUE_SKIPPED);
      continue;
    }
  }

  result = tryAllocateOnNode(clusterResource, node, schedulingMode,
      resourceLimits, schedulerKey, reservedContainer);

  if (AllocationState.ALLOCATED == result.getAllocationState()
      || AllocationState.RESERVED == result.getAllocationState()) {
    result = doAllocation(result, node, schedulerKey, reservedContainer);
    break;
  }
}
{code}

> Add muti-node lookup mechanism and pluggable nodes sorting policies to
> optimize placement decision
> --
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler
> Reporter: Sunil Govindan
> Assignee: Sunil Govindan
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch,
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch,
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch,
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch,
> YARN-7494.12.patch, YARN-7494.13.patch, YARN-7494.14.patch,
> YARN-7494.15.patch, YARN-7494.16.patch, YARN-7494.17.patch,
> YARN-7494.18.patch, YARN-7494.19.patch, YARN-7494.20.patch,
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
> Instead of single node, for effectiveness we can consider a multi node lookup
> based on partition to start with.
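The inconsistency described in the comment above, where the loop walks many nodes while reservedContainer stays fixed, can be sketched with a toy model. This is not YARN code and not the actual patch; the per-node reservation map below is an invented illustration of looking the reservation up per iterated node instead of capturing it once before the loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model (not YARN code, not the actual patch) of the pairing problem in
 * RegularContainerAllocator#allocate: reservedContainer is captured once,
 * while the loop walks many nodes. Node and container names are invented.
 */
public class ReservationPairing {

    // Look the reservation up per node instead of reusing a fixed one.
    static String reservedOn(Map<String, String> reservations, String node) {
        return reservations.get(node); // null if nothing reserved on node
    }

    public static void main(String[] args) {
        Map<String, String> reservations = new LinkedHashMap<>();
        reservations.put("nodeB", "container_42"); // reserved on nodeB only

        String fixedReserved = "container_42"; // captured before the loop
        for (String node : new String[] {"nodeA", "nodeB"}) {
            // Inconsistent: pairs container_42 with every iterated node,
            // including nodeA, where nothing was ever reserved.
            boolean pairedWithFixed = fixedReserved != null;
            // Consistent: only nodeB actually holds the reservation.
            boolean pairedPerNode = reservedOn(reservations, node) != null;
            System.out.println(node + " " + pairedWithFixed
                + " " + pairedPerNode);
        }
        // prints "nodeA true false" then "nodeB true true"
    }
}
```

The "nodeA true false" line is the wasted opportunity the comment describes: the fixed pairing treats nodeA as reserved-for-container_42, producing a proposal that would later be abandoned.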
[jira] [Comment Edited] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851660#comment-16851660 ]

Juanjuan Tian edited comment on YARN-7494 at 5/30/19 9:00 AM:
---
[jira] [Comment Edited] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851660#comment-16851660 ]

Juanjuan Tian edited comment on YARN-7494 at 5/30/19 8:59 AM:
---
[jira] [Commented] (YARN-7494) Add muti-node lookup mechanism and pluggable nodes sorting policies to optimize placement decision
[ https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851660#comment-16851660 ]

Juanjuan Tian commented on YARN-7494:
--

Thanks Weiwei for your reply.