[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-26 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6948:

Attachment: yarn-6948.png

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: lujie
> Attachments: yarn-6948.png
>
>
> When I send a kill command to a running job and then check the logs, I find the 
> following exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
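
The message in the log above is what a table-driven state machine reports when an event arrives in a state that has no transition registered for it. The following is a minimal sketch of that mechanism, not Hadoop's StateMachineFactory: the class AttemptStateSketch, its reduced sets of states and events, and the transition table are illustrative assumptions. The ordering it replays (KILL handled before a late ATTEMPT_ADDED) is one plausible sequence consistent with the report, not a confirmed diagnosis.

```java
import java.util.EnumMap;
import java.util.Map;

public class AttemptStateSketch {
  // Reduced, hypothetical subsets of attempt states and events.
  enum State { NEW, SUBMITTED, SCHEDULED, FINAL_SAVING, KILLED }
  enum Event { START, ATTEMPT_ADDED, KILL, ATTEMPT_UPDATE_SAVED }

  /** Thrown when no transition is registered for the (state, event) pair. */
  static class InvalidTransitionException extends RuntimeException {
    InvalidTransitionException(Event event, State state) {
      super("Invalid event: " + event + " at " + state);
    }
  }

  // Transition table: current state -> (event -> next state).
  private static final Map<State, Map<Event, State>> TABLE =
      new EnumMap<State, Map<Event, State>>(State.class);

  private static void allow(State from, Event on, State to) {
    TABLE.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class))
        .put(on, to);
  }

  static {
    // Assumed transitions; note that FINAL_SAVING accepts no ATTEMPT_ADDED.
    allow(State.NEW, Event.START, State.SUBMITTED);
    allow(State.SUBMITTED, Event.ATTEMPT_ADDED, State.SCHEDULED);
    allow(State.SUBMITTED, Event.KILL, State.FINAL_SAVING);
    allow(State.SCHEDULED, Event.KILL, State.FINAL_SAVING);
    allow(State.FINAL_SAVING, Event.ATTEMPT_UPDATE_SAVED, State.KILLED);
  }

  private State current = State.NEW;

  /** Applies the event, or throws if the current state does not accept it. */
  State handle(Event event) {
    Map<Event, State> row = TABLE.get(current);
    State next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new InvalidTransitionException(event, current);
    }
    current = next;
    return current;
  }

  public static void main(String[] args) {
    AttemptStateSketch attempt = new AttemptStateSketch();
    attempt.handle(Event.START);
    attempt.handle(Event.KILL); // the kill is processed first
    try {
      attempt.handle(Event.ATTEMPT_ADDED); // the late event is rejected
    } catch (InvalidTransitionException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In a design like this there are two usual remedies: register an explicit "ignore" transition for the late event in the saving state, or make sure the event is never dispatched after a kill. Which of these applies to RMAppAttemptImpl is not established by this report.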



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7561) Why hasContainerForNode() return false directly when there is no request of ANY locality without considering NODE_LOCAL and RACK_LOCAL?

2017-11-26 Thread wuchang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266421#comment-16266421
 ] 

wuchang commented on YARN-7561:
---

[~yufeigu] [~templedf] Could you please give me some suggestions? Thank you very 
much.

> Why hasContainerForNode() return false directly when there is no request of 
> ANY locality without considering NODE_LOCAL and RACK_LOCAL?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>
> I am studying the FairScheduler source code of YARN 2.7.3.
> The class FSAppAttempt contains the following method:
> {code}
>   public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {
>     ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY);
>     ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName());
>     ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName());
>
>     return
>         // There must be outstanding requests at the given priority:
>         anyRequest != null && anyRequest.getNumContainers() > 0 &&
>         // If locality relaxation is turned off at *-level, there must be a
>         // non-zero request for the node's rack:
>         (anyRequest.getRelaxLocality() ||
>             (rackRequest != null && rackRequest.getNumContainers() > 0)) &&
>         // If locality relaxation is turned off at rack-level, there must be a
>         // non-zero request at the node:
>         (rackRequest == null || rackRequest.getRelaxLocality() ||
>             (nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
>         // The requested container must be able to fit on the node:
>         Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
>             anyRequest.getCapability(),
>             node.getRMNode().getTotalCapability());
>   }
> {code}
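
To make the short-circuit in the quoted method concrete, here is a simplified, self-contained sketch with the same boolean structure. The class LocalityCheckSketch, the Request type, and the memory-only capacity check are assumptions made for illustration; the real method works on ResourceRequest objects and compares full Resource capabilities.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityCheckSketch {
  // Same value as ResourceRequest.ANY in YARN.
  static final String ANY = "*";

  /** Hypothetical stand-in for a request at one locality level. */
  static final class Request {
    final int numContainers;
    final boolean relaxLocality;
    final int memoryMb;

    Request(int numContainers, boolean relaxLocality, int memoryMb) {
      this.numContainers = numContainers;
      this.relaxLocality = relaxLocality;
      this.memoryMb = memoryMb;
    }
  }

  /** Mirrors the condition chain of the quoted hasContainerForNode(). */
  static boolean hasContainerForNode(Map<String, Request> requests,
      String rackName, String nodeName, int nodeTotalMemoryMb) {
    Request any = requests.get(ANY);
    Request rack = requests.get(rackName);
    Request node = requests.get(nodeName);
    return
        // There must be an outstanding ANY request:
        any != null && any.numContainers > 0
        // With relaxation off at ANY level, the rack must be requested:
        && (any.relaxLocality || (rack != null && rack.numContainers > 0))
        // With relaxation off at rack level, the node must be requested:
        && (rack == null || rack.relaxLocality
            || (node != null && node.numContainers > 0))
        // Simplification: only memory is compared, not a full Resource.
        && any.memoryMb <= nodeTotalMemoryMb;
  }

  public static void main(String[] args) {
    Map<String, Request> requests = new HashMap<>();
    requests.put("host1", new Request(1, true, 1024));
    requests.put("/rack1", new Request(1, true, 1024));
    // Node and rack requests exist, but there is no ANY entry.
    System.out.println(hasContainerForNode(requests, "/rack1", "host1", 8192));
    requests.put(ANY, new Request(1, true, 1024));
    System.out.println(hasContainerForNode(requests, "/rack1", "host1", 8192));
  }
}
```

The first call prints false and the second prints true, which is the behavior the question is about: node and rack entries alone never satisfy the check.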
> I cannot understand why, when there is no anyRequest, 
> *hasContainerForNode()* returns false immediately without checking whether 
> there are NODE_LOCAL or RACK_LOCAL requests.
> In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
> *AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
> containers for *ResourceRequest.ANY*, which is another point that confuses 
> me.
> Thanks in advance for any hints.
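
The second point, that node-local and rack-local allocations also decrement the ANY count, can be illustrated with a small bookkeeping sketch. It assumes the ANY entry holds the total number of containers still wanted at a priority, while the node and rack entries only say where some of them are preferred. That reading of the request model is an assumption here, as are the class OutstandingRequestSketch and its methods, which are not the AppSchedulingInfo API.

```java
import java.util.HashMap;
import java.util.Map;

public class OutstandingRequestSketch {
  // Same value as ResourceRequest.ANY in YARN.
  static final String ANY = "*";

  // Outstanding container counts keyed by resource name (node, rack, or ANY).
  private final Map<String, Integer> outstanding = new HashMap<>();

  void ask(String resourceName, int containers) {
    outstanding.put(resourceName, containers);
  }

  int remaining(String resourceName) {
    return outstanding.getOrDefault(resourceName, 0);
  }

  // Lowers the count by one, never below zero; unknown names are ignored.
  private void decrement(String resourceName) {
    outstanding.computeIfPresent(resourceName, (k, v) -> Math.max(0, v - 1));
  }

  /** A node-local allocation satisfies the node, its rack, and the total. */
  void allocateNodeLocal(String nodeName, String rackName) {
    decrement(nodeName);
    decrement(rackName);
    decrement(ANY);
  }

  /** A rack-local allocation satisfies the rack and the total. */
  void allocateRackLocal(String rackName) {
    decrement(rackName);
    decrement(ANY);
  }

  /** An off-switch allocation satisfies only the total. */
  void allocateOffSwitch() {
    decrement(ANY);
  }

  public static void main(String[] args) {
    OutstandingRequestSketch app = new OutstandingRequestSketch();
    app.ask("host1", 2);   // two containers preferred on host1
    app.ask("/rack1", 2);  // the same two, expressed at rack level
    app.ask(ANY, 3);       // three containers wanted in total
    app.allocateNodeLocal("host1", "/rack1");
    System.out.println(app.remaining("host1") + " "
        + app.remaining("/rack1") + " " + app.remaining(ANY));
  }
}
```

Under that assumption every allocation, wherever it lands, reduces the total by one, so an ANY count of zero means nothing is left to schedule at that priority. This would also explain why a check like hasContainerForNode() can stop as soon as the ANY request is missing or empty.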






[jira] [Updated] (YARN-7561) Why hasContainerForNode() return false directly when there is no request of ANY locality without considering NODE_LOCAL and RACK_LOCAL?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
}
{code}


I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for *ResourceRequest.ANY*, which is another point that confuses me.

Thanks in advance for any hints.


  was:
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
}
{code}


I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for *ResourceRequest.ANY*, which is another point that confuses me.

Thanks in advance for any hints.


> Why hasContainerForNode() return false directly when there is no request of 
> ANY locality without considering NODE_LOCAL and RACK_LOCAL?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>

[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality without considering NODE_LOCAL and RACK_LOCAL?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Summary: Why hasContainerForNode return false directly when there is no 
request of ANY locality without considering NODE_LOCAL and RACK_LOCAL?  (was: 
Why hasContainerForNode return false directly when there is no request of ANY 
locality?)

> Why hasContainerForNode return false directly when there is no request of ANY 
> locality without considering NODE_LOCAL and RACK_LOCAL?
> -
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>






[jira] [Updated] (YARN-7561) Why hasContainerForNode() return false directly when there is no request of ANY locality without considering NODE_LOCAL and RACK_LOCAL?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Summary: Why hasContainerForNode() return false directly when there is no 
request of ANY locality without considering NODE_LOCAL and RACK_LOCAL?  (was: 
Why hasContainerForNode return false directly when there is no request of ANY 
locality without considering NODE_LOCAL and RACK_LOCAL?)

> Why hasContainerForNode() return false directly when there is no request of 
> ANY locality without considering NODE_LOCAL and RACK_LOCAL?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>






[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
}
{code}


I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for *ResourceRequest.ANY*, which is another point that confuses me.

Thanks in advance for any hints.

  was:
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{quote}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
}
{quote}


I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for *ResourceRequest.ANY*, which is another point that confuses me.

Thanks in advance for any hints.


> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>

[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{quote}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
}
{quote}


I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for *ResourceRequest.ANY*, which is another point that confuses me.

Thanks in advance for any hints.

  was:
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

```

  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
}


```


I cannot understand why, when there is no anyRequest, 
`hasContainerForNode()` returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, `AppSchedulingInfo.allocateNodeLocal()` and 
`AppSchedulingInfo.allocateRackLocal()` also decrease the number of 
containers for `ResourceRequest.ANY`, which is another point that confuses me.

Thanks in advance for any hints.


> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>

[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

```

  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
}


```


I cannot understand why, when there is no anyRequest, 
`hasContainerForNode()` returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, `AppSchedulingInfo.allocateNodeLocal()` and 
`AppSchedulingInfo.allocateRackLocal()` also decrease the number of 
containers for `ResourceRequest.ANY`, which is another point that confuses me.

Thanks in advance for any hints.

  was:
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for ResourceRequest.ANY, which is another point that confuses me.

Thanks in advance for any hints.



> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>

[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests.
In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for ResourceRequest.ANY, which is another point that confuses me.

Thanks in advance for any hints.


  was:
I am studying the FairScheduler source code of YARN 2.7.3.
The class FSAppAttempt contains the following method:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false immediately without checking whether there 
are NODE_LOCAL or RACK_LOCAL requests, or why 
*AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for ResourceRequest.ANY.

Thanks in advance for any hints.



> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>
> I am studying the FairScheduler source cod of yarn 2.7.3.
> By the code of class FSAppAttempt:
> {code}
>   public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {
> ResourceRequest anyRequest = getResourceRequest(prio, 
> ResourceRequest.ANY);  
> ResourceRequest rackRequest = getResourceRequest(prio, 
> node.getRackName()); 
> ResourceRequest nodeRequest = getResourceRequest(prio, 
> node.getNodeName()); 
> return
> // There must be outstanding requests at the given priority:
> anyRequest != null && anyRequest.getNumContainers() > 0 &&
> // If locality relaxation is turned off at *-level, there must be a
> // non-zero request for the node's rack:
> (anyRequest.getRelaxLocality() ||
> (rackRequest != null && 

[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
In class FSAppAttempt there is the following code:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false directly without considering whether there 
are NODE_LOCAL or RACK_LOCAL requests. In addition, 
*AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for ResourceRequest.ANY.

Any pointers would be much appreciated.


  was:
I am studying the FairScheduler source cod of yarn 2.7.3.
By the code of in class FSAppAttempt:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I really cannot understand why when there is no anyRequest , 
*hasContainerForNode()* return false directly without considering where there 
is NODE_LOCAL  or  RACK_LOCAL requests, and , 
*AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* will also decrease the number of 
containers for ResourceRequest.ANY.

Really thanks for some prompt.



> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>
> I am studying the FairScheduler source cod of yarn 2.7.3.
> By the code of class FSAppAttempt:
> {code}
>   public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {
> ResourceRequest anyRequest = getResourceRequest(prio, 
> ResourceRequest.ANY);  
> ResourceRequest rackRequest = getResourceRequest(prio, 
> node.getRackName()); 
> ResourceRequest nodeRequest = getResourceRequest(prio, 
> node.getNodeName()); 
> return
> // There must be outstanding requests at the given priority:
> anyRequest != null && anyRequest.getNumContainers() > 0 &&
> // If locality relaxation is turned off at *-level, there must be a
> // non-zero request for the node's rack:
> (anyRequest.getRelaxLocality() ||
> (rackRequest != null && rackRequest.getNumContainers() > 0)) 
> &&
> // If 

[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
In class FSAppAttempt there is the following code:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I cannot understand why, when there is no anyRequest, 
*hasContainerForNode()* returns false directly without considering whether there 
are NODE_LOCAL or RACK_LOCAL requests. In addition, 
*AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for ResourceRequest.ANY.

Any pointers would be much appreciated.


  was:
I am studying the FairScheduler source cod of yarn 2.7.3.
By the code of in class FSAppAttempt:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I really cannot understand why when there is no anyRequest , the method return 
false directly without considering where there is NODE_LOCAL  or  RACK_LOCAL 
requests, and , {quote}AppSchedulingInfo.allocateNodeLocal(){quote}  and 
{quote}AppSchedulingInfo.allocateRackLocal(){quote} will also decrease the 
number of containers for ResourceRequest.ANY.

Really thanks for some prompt.



> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>
> I am studying the FairScheduler source cod of yarn 2.7.3.
> By the code of in class FSAppAttempt:
> {code}
>   public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {
> ResourceRequest anyRequest = getResourceRequest(prio, 
> ResourceRequest.ANY);  
> ResourceRequest rackRequest = getResourceRequest(prio, 
> node.getRackName()); 
> ResourceRequest nodeRequest = getResourceRequest(prio, 
> node.getNodeName()); 
> return
> // There must be outstanding requests at the given priority:
> anyRequest != null && anyRequest.getNumContainers() > 0 &&
> // If locality relaxation is turned off at *-level, there must be a
> // non-zero request for the node's rack:
> (anyRequest.getRelaxLocality() ||
> (rackRequest != null && rackRequest.getNumContainers() > 0)) 
> &&
>   

[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
In class FSAppAttempt there is the following code:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I cannot understand why, when there is no anyRequest, the method returns 
false directly without considering whether there are NODE_LOCAL or RACK_LOCAL 
requests. In addition, {quote}AppSchedulingInfo.allocateNodeLocal(){quote} and 
{quote}AppSchedulingInfo.allocateRackLocal(){quote} also decrease the 
number of containers for ResourceRequest.ANY.

Any pointers would be much appreciated.


  was:
I am studying the FairScheduler source cod of yarn 2.7.3.
By the code of in class FSAppAttempt:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I really cannot understand why when there is no anyRequest , the method return 
false directly without considering where there is NODE_LOCAL  or  RACK_LOCAL 
requests, and , *AppSchedulingInfo.allocateNodeLocal()*  and 
*AppSchedulingInfo.allocateRackLocal()*  will also decrease the number of 
containers for ResourceRequest.ANY.

Really thanks for some prompt.



> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>
> I am studying the FairScheduler source cod of yarn 2.7.3.
> By the code of in class FSAppAttempt:
> {code}
>   public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {
> ResourceRequest anyRequest = getResourceRequest(prio, 
> ResourceRequest.ANY);  
> ResourceRequest rackRequest = getResourceRequest(prio, 
> node.getRackName()); 
> ResourceRequest nodeRequest = getResourceRequest(prio, 
> node.getNodeName()); 
> return
> // There must be outstanding requests at the given priority:
> anyRequest != null && anyRequest.getNumContainers() > 0 &&
> // If locality relaxation is turned off at *-level, there must be a
> // non-zero request for the node's rack:
> (anyRequest.getRelaxLocality() ||
> (rackRequest != null && rackRequest.getNumContainers() > 0)) 
> &&
> // If 

[jira] [Created] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)
wuchang created YARN-7561:
-

 Summary: Why hasContainerForNode return false directly when there 
is no request of ANY locality?
 Key: YARN-7561
 URL: https://issues.apache.org/jira/browse/YARN-7561
 Project: Hadoop YARN
  Issue Type: Task
  Components: fairscheduler
Affects Versions: 2.7.3
Reporter: wuchang


I am studying the FairScheduler source code of YARN 2.7.3.
In class FSAppAttempt there is the following code:

{quote}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{quote}



I cannot understand why, when there is no anyRequest, the method returns 
false directly without considering whether there are NODE_LOCAL or RACK_LOCAL 
requests. In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for ResourceRequest.ANY.

Any pointers would be much appreciated.







[jira] [Updated] (YARN-7561) Why hasContainerForNode return false directly when there is no request of ANY locality?

2017-11-26 Thread wuchang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang updated YARN-7561:
--
Description: 
I am studying the FairScheduler source code of YARN 2.7.3.
In class FSAppAttempt there is the following code:

{code}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{code}



I cannot understand why, when there is no anyRequest, the method returns 
false directly without considering whether there are NODE_LOCAL or RACK_LOCAL 
requests. In addition, *AppSchedulingInfo.allocateNodeLocal()* and 
*AppSchedulingInfo.allocateRackLocal()* also decrease the number of 
containers for ResourceRequest.ANY.

Any pointers would be much appreciated.


  was:
I am studying the FairScheduler source cod of yarn 2.7.3.
By the code of in class FSAppAttempt:

{quote}
  public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {

ResourceRequest anyRequest = getResourceRequest(prio, ResourceRequest.ANY); 
 
ResourceRequest rackRequest = getResourceRequest(prio, node.getRackName()); 
ResourceRequest nodeRequest = getResourceRequest(prio, node.getNodeName()); 

return
// There must be outstanding requests at the given priority:
anyRequest != null && anyRequest.getNumContainers() > 0 &&
// If locality relaxation is turned off at *-level, there must be a
// non-zero request for the node's rack:
(anyRequest.getRelaxLocality() ||
(rackRequest != null && rackRequest.getNumContainers() > 0)) &&
// If locality relaxation is turned off at rack-level, there must be a
// non-zero request at the node:
(rackRequest == null || rackRequest.getRelaxLocality() ||
(nodeRequest != null && nodeRequest.getNumContainers() > 0)) &&
// The requested container must be able to fit on the node:
Resources.lessThanOrEqual(RESOURCE_CALCULATOR, null,
anyRequest.getCapability(), 
node.getRMNode().getTotalCapability());
  }
{quote}



I really cannot understand why when there is no anyRequest , the method return 
false directly without considering where there is NODE_LOCAL  or  RACK_LOCAL 
requests, and , *AppSchedulingInfo.allocateNodeLocal()*  and 
*AppSchedulingInfo.allocateRackLocal()*  will also decrease the number of 
containers for ResourceRequest.ANY.

Really thanks for some prompt.



> Why hasContainerForNode return false directly when there is no request of ANY 
> locality?
> ---
>
> Key: YARN-7561
> URL: https://issues.apache.org/jira/browse/YARN-7561
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: fairscheduler
>Affects Versions: 2.7.3
>Reporter: wuchang
>
> I am studying the FairScheduler source cod of yarn 2.7.3.
> By the code of in class FSAppAttempt:
> {code}
>   public boolean hasContainerForNode(Priority prio, FSSchedulerNode node) {
> ResourceRequest anyRequest = getResourceRequest(prio, 
> ResourceRequest.ANY);  
> ResourceRequest rackRequest = getResourceRequest(prio, 
> node.getRackName()); 
> ResourceRequest nodeRequest = getResourceRequest(prio, 
> node.getNodeName()); 
> return
> // There must be outstanding requests at the given priority:
> anyRequest != null && anyRequest.getNumContainers() > 0 &&
> // If locality relaxation is turned off at *-level, there must be a
> // non-zero request for the node's rack:
> (anyRequest.getRelaxLocality() ||
> (rackRequest != null && rackRequest.getNumContainers() > 0)) 
> &&
> // If locality relaxation 

[jira] [Commented] (YARN-7535) We should display origin value of demand in fair scheduler page

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266313#comment-16266313
 ] 

Wilfred Spiegelenburg commented on YARN-7535:
-

The code has changed in recent versions; there is no updateDemandForApp any 
more after YARN-6172.

Demand for a queue, as [~yufeigu] explained, should be limited to the maximum the 
queue can use, so the existing code should be left as is. Changing the 
calculation would affect the minimum share starvation and some other 
calculations that use the demand. Having the extra detail on how high demand 
really is in a queue could help with tuning. The 
{{FSAppAttempt}} does not cap it, so we have the info already.

Some considerations:
- We could store the extra detail in the {{leafQueue}}. There would not really 
be any overhead besides some extra local storage.
- Adding it to the {{parentQueue}} to get it for the whole hierarchy would be 
possible, but it does involve overhead. We would then also need to choose whether we 
want the unlimited demand from the child queue or the limited version.
- The scheduler state dump is easily changed.
- Do we want to display this in the web UI? It might be confusing to always show the 
two numbers, and the state dump would be a much better place because it 
can be seen over time instead of at just one instant.
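
The first option above could look roughly like the sketch below. It is only an illustration under assumptions: {{LeafQueueDemandSketch}} and its methods are hypothetical, and a resource is reduced to a single long instead of a YARN Resource. The capped value stays available for the scheduling calculations, and the uncapped sum is kept next to it for reporting.

```java
// Hypothetical sketch, not FairScheduler code. A resource is modelled
// as one long value (for example, MB of memory).
final class LeafQueueDemandSketch {
  private final long maxResources;
  private long rawDemand;     // uncapped sum of application demands
  private long cappedDemand;  // value the scheduling calculations would use

  LeafQueueDemandSketch(long maxResources) {
    this.maxResources = maxResources;
  }

  // Adds one application's demand and refreshes both values.
  void addAppDemand(long appDemand) {
    rawDemand += appDemand;
    cappedDemand = Math.min(rawDemand, maxResources);
  }

  long getDemand() {
    return cappedDemand;
  }

  long getRawDemand() {
    return rawDemand;
  }

  public static void main(String[] args) {
    LeafQueueDemandSketch queue = new LeafQueueDemandSketch(100L);
    queue.addAppDemand(80L);
    queue.addAppDemand(60L);
    System.out.println(queue.getDemand());     // capped at the queue maximum
    System.out.println(queue.getRawDemand());  // full, uncapped demand
  }
}
```

The only extra cost in this sketch is one additional field per leaf queue, which matches the "extra local storage" remark above.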


> We should display origin value of demand in fair scheduler page
> ---
>
> Key: YARN-7535
> URL: https://issues.apache.org/jira/browse/YARN-7535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>
> The *demand* value of a leaf queue shown on the fair scheduler 
> page is capped at *maxResources* when the actual demand is greater 
> than *maxResources*, so it doesn't reflect the real situation. When 
> we expand a queue, we often rely on seeing the real current demand 
> value.
> {code:java}
> private void updateDemandForApp(FSAppAttempt sched, Resource maxRes) {
> sched.updateDemand();
> Resource toAdd = sched.getDemand();
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Counting resource from " + sched.getName() + " " + toAdd
>   + "; Total resource consumption for " + getName() + " now "
>   + demand);
> }
> demand = Resources.add(demand, toAdd);
> demand = Resources.componentwiseMin(demand, maxRes);
>   }
> {code}






[jira] [Commented] (YARN-7534) Fair scheduler assign resources may exceed maxResources

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266302#comment-16266302
 ] 

Wilfred Spiegelenburg commented on YARN-7534:
-

Based on the current analysis, I do not think we have a problem.
[~daemon], if you have logs that show this is not working, please attach them; 
otherwise I will close this as not a problem.

> Fair scheduler assign resources may exceed maxResources
> ---
>
> Key: YARN-7534
> URL: https://issues.apache.org/jira/browse/YARN-7534
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: YunFan Zhou
>Assignee: Wilfred Spiegelenburg
>
> The current scheduling logic checks whether the resources used by the 
> queue have exceeded *maxResources* before assigning the container. This can 
> lead to the queue using more resources than *maxResources* after the 
> container has been assigned.
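
The difference the reporter describes can be sketched as follows. This is only an illustration of the report, not the FairScheduler code: the class and method names are hypothetical and resources are reduced to a single long.

```java
// Hypothetical sketch of the two kinds of limit check; not YARN code.
final class MaxResourcesCheckSketch {
  // Check as described in the report: only current usage is compared
  // with the limit, so the new container is not taken into account.
  static boolean canAssignCheckingUsageOnly(long used, long max) {
    return used <= max;
  }

  // Stricter alternative: the container being assigned is included.
  static boolean canAssignIncludingContainer(long used, long container, long max) {
    return used + container <= max;
  }

  public static void main(String[] args) {
    long used = 90L;
    long container = 20L;
    long max = 100L;
    // Passes, although usage would become 110 after the assignment.
    System.out.println(canAssignCheckingUsageOnly(used, max));
    // Rejects the same assignment.
    System.out.println(canAssignIncludingContainer(used, container, max));
  }
}
```

With the usage-only check, the overshoot in this sketch is bounded by the size of one container.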






[jira] [Commented] (YARN-7560) Resourcemanager hangs when resourceUsedWithWeightToResourceRatio return a overflow value

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266300#comment-16266300
 ] 

Wilfred Spiegelenburg commented on YARN-7560:
-

looks good to me, +1 (non binding)


> Resourcemanager hangs when  resourceUsedWithWeightToResourceRatio return a 
> overflow value 
> --
>
> Key: YARN-7560
> URL: https://issues.apache.org/jira/browse/YARN-7560
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
> Fix For: 3.0.0
>
> Attachments: YARN-7560.000.patch, YARN-7560.001.patch
>
>
> In our cluster, we changed the configuration and ran refreshQueues, after which 
> the ResourceManager hung and could not be restarted 
> successfully. The jstack output always looks like this:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f98e8017000 nid=0x2f5 runnable [0x7f98eed9a000]
>    java.lang.Thread.State: RUNNABLE
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:182)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSteadyShares(ComputeFairShares.java:66)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeSteadyShares(FairSharePolicy.java:148)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSteadyShares(FSParentQueue.java:102)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:148)
>      - locked <0x7f8c4a8177a0> (a java.util.HashMap)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:101)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.updateAllocationConfiguration(QueueManager.java:387)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$AllocationReloadListener.onReload(FairScheduler.java:1728)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:422)
>      - locked <0x7f8c4a7eb2e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1597)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1621)
>      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>      - locked <0x7f8c4a76ac48> (a java.lang.Object)
>      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:569)
>      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>      - locked <0x7f8c49254268> (a java.lang.Object)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:257)
>      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>      - locked <0x7f8c467495e0> (a java.lang.Object)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
> {code}
> When we debugged the cluster, we found that resourceUsedWithWeightToResourceRatio 
> returns a negative value, so the loop never exits. In our cluster 
> the sum of all minRes exceeds Integer.MAX_VALUE, so the int returned by 
> resourceUsedWithWeightToResourceRatio overflows and becomes negative.
> The loop is shown below. totalResource is a long and is always positive, but 
> resourceUsedWithWeightToResourceRatio returns an int. Our cluster is so big 
> that the returned value overflows to a negative number, so the loop 
> condition stays true and the loop never breaks.
> {code}
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
> < totalResource) {
>   rMax *= 2.0;
> }
> {code}
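
The overflow described above can be reproduced in isolation. The sketch below is not the ComputeFairShares code and not necessarily what the attached patches do; {{OverflowLoopSketch}} and its methods are hypothetical. It sums shares into an int, as the report describes, and compares that with accumulating in a long.

```java
// Hypothetical sketch of the reported overflow; not YARN code.
final class OverflowLoopSketch {
  // Sums into an int, so the total silently wraps around on overflow.
  static int sumAsInt(int[] shares) {
    int total = 0;
    for (int share : shares) {
      total += share;
    }
    return total;
  }

  // One possible remedy (an assumption): accumulate in a long instead.
  static long sumAsLong(int[] shares) {
    long total = 0L;
    for (int share : shares) {
      total += share;
    }
    return total;
  }

  public static void main(String[] args) {
    int[] minShares = {Integer.MAX_VALUE, 1};
    long totalResource = 1_000L;

    int wrapped = sumAsInt(minShares);
    long exact = sumAsLong(minShares);

    System.out.println(wrapped);                  // negative after wrap-around
    System.out.println(exact);                    // true sum, above Integer.MAX_VALUE
    System.out.println(wrapped < totalResource);  // a loop guard like this stays true
    System.out.println(exact < totalResource);    // with a long the guard can become false
  }
}
```

A guard of the form "sum < totalResource" therefore remains true for the wrapped value, which matches the hang in the report.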





[jira] [Commented] (YARN-7560) Resourcemanager hangs when resourceUsedWithWeightToResourceRatio return a overflow value

2017-11-26 Thread zhengchenyu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266297#comment-16266297
 ] 

zhengchenyu commented on YARN-7560:
---

[~wilfreds]
Thanks for your advice. I have revised my patch; the new patch is 
YARN-7560.001.patch.

> Resourcemanager hangs when  resourceUsedWithWeightToResourceRatio return a 
> overflow value 
> --
>
> Key: YARN-7560
> URL: https://issues.apache.org/jira/browse/YARN-7560
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
> Fix For: 3.0.0
>
> Attachments: YARN-7560.000.patch, YARN-7560.001.patch
>
>
> In our cluster, we changed the configuration and ran refreshQueues, after which 
> the ResourceManager hung and could not be restarted 
> successfully. The jstack output always looks like this:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f98e8017000 nid=0x2f5 runnable [0x7f98eed9a000]
>    java.lang.Thread.State: RUNNABLE
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:182)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSteadyShares(ComputeFairShares.java:66)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeSteadyShares(FairSharePolicy.java:148)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSteadyShares(FSParentQueue.java:102)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:148)
>      - locked <0x7f8c4a8177a0> (a java.util.HashMap)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:101)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.updateAllocationConfiguration(QueueManager.java:387)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$AllocationReloadListener.onReload(FairScheduler.java:1728)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:422)
>      - locked <0x7f8c4a7eb2e0> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1597)
>      at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1621)
>      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>      - locked <0x7f8c4a76ac48> (a java.lang.Object)
>      at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:569)
>      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>      - locked <0x7f8c49254268> (a java.lang.Object)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:257)
>      at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>      - locked <0x7f8c467495e0> (a java.lang.Object)
>      at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
> {code}
> When we debugged the cluster, we found that 
> resourceUsedWithWeightToResourceRatio returns a negative value, so the loop 
> below never exits. In our cluster, the sum of all minRes exceeds 
> Integer.MAX_VALUE, and resourceUsedWithWeightToResourceRatio returns an int, 
> so the result overflows to a negative value. totalResource is a long and is 
> always positive, so the negative return value is always less than 
> totalResource and the loop never breaks:
> {code}
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
> {code}
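
A minimal, self-contained sketch of the failure mode described above, not the actual ComputeFairShares code. The class and the helpers usedInt, usedLong and doublings are hypothetical, the share formula is simplified to max(ratio, minShare), and the loop is capped at 64 iterations only so that the demo terminates:

```java
import java.util.function.DoubleToLongFunction;

final class OverflowLoopSketch {
    static final int MAX_ITERATIONS = 64; // demo-only safety cap; the loop in the report has none

    // Buggy shape: per-queue shares are accumulated into an int, which wraps on overflow.
    static int usedInt(double ratio, long[] minShares) {
        int taken = 0;
        for (long min : minShares) {
            taken += (int) Math.max(ratio, (double) min);
        }
        return taken;
    }

    // Same computation accumulated into a long.
    static long usedLong(double ratio, long[] minShares) {
        long taken = 0;
        for (long min : minShares) {
            taken += (long) Math.max(ratio, (double) min);
        }
        return taken;
    }

    // Doubles rMax until the used amount covers totalResource; returns the number
    // of doublings, or -1 if the cap is reached first.
    static int doublings(DoubleToLongFunction used, long totalResource) {
        double rMax = 1.0;
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            if (used.applyAsLong(rMax) >= totalResource) {
                return i;
            }
            rMax *= 2.0;
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] minShares = {1_500_000_000L, 1_500_000_000L}; // sum exceeds Integer.MAX_VALUE
        long totalResource = 4_000_000_000L;
        System.out.println(usedInt(1.0, minShares));   // prints -1294967296
        System.out.println(doublings(r -> usedInt(r, minShares), totalResource));  // prints -1
        System.out.println(doublings(r -> usedLong(r, minShares), totalResource)); // prints 31
    }
}
```

With the int accumulator the sum stays negative for every value of rMax, so only the demo cap stops the loop. With a long accumulator the loop exits once the shares cover totalResource.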




[jira] [Updated] (YARN-7560) Resourcemanager hangs when resourceUsedWithWeightToResourceRatio return a overflow value

2017-11-26 Thread zhengchenyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-7560:
--
Attachment: YARN-7560.001.patch

> Resourcemanager hangs when  resourceUsedWithWeightToResourceRatio return a 
> overflow value 
> --
>
> Key: YARN-7560
> URL: https://issues.apache.org/jira/browse/YARN-7560
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
> Fix For: 3.0.0
>
> Attachments: YARN-7560.000.patch, YARN-7560.001.patch
>
>
> In our cluster, we changed the configuration and ran refreshQueues, then 
> found that the ResourceManager hangs. The ResourceManager also can't restart 
> successfully. The jstack output always looks like this:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f98e8017000 nid=0x2f5 runnable 
> [0x7f98eed9a000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:182)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSteadyShares(ComputeFairShares.java:66)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeSteadyShares(FairSharePolicy.java:148)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSteadyShares(FSParentQueue.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:148)
> - locked <0x7f8c4a8177a0> (a java.util.HashMap)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:101)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.updateAllocationConfiguration(QueueManager.java:387)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$AllocationReloadListener.onReload(FairScheduler.java:1728)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:422)
> - locked <0x7f8c4a7eb2e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1597)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1621)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c4a76ac48> (a java.lang.Object)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:569)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c49254268> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:257)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c467495e0> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1220)
> {code}
> When we debugged the cluster, we found that 
> resourceUsedWithWeightToResourceRatio returns a negative value, so the loop 
> below never exits. In our cluster, the sum of all minRes exceeds 
> Integer.MAX_VALUE, and resourceUsedWithWeightToResourceRatio returns an int, 
> so the result overflows to a negative value. totalResource is a long and is 
> always positive, so the negative return value is always less than 
> totalResource and the loop never breaks:
> {code}
> while (resourceUsedWithWeightToResourceRatio(rMax, schedulables, type)
>     < totalResource) {
>   rMax *= 2.0;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (YARN-7560) Resourcemanager hangs when resourceUsedWithWeightToResourceRatio return a overflow value

2017-11-26 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266273#comment-16266273
 ] 

Wilfred Spiegelenburg commented on YARN-7560:
-

Thank you [~zhengchenyu] for the patch
Some comments on the patch:
* Can you please remove the unneeded casts to long that are left in 
computeSharesInternal and handleFixedFairShares:
{code}
totalMaxShare = Math.min(maxShare + (long)totalMaxShare,
    Long.MAX_VALUE);
...
target.setResourceValue(type, (long)computeShare(sched, right, type));
{code}
and
{code}
totalResource = Math.min((long)totalResource + (long)fixedShare,
    Long.MAX_VALUE);
{code}
* In resourceUsedWithWeightToResourceRatio we should not have to create a 
temporary variable share and could do:
{code}
  resourcesTaken += computeShare(sched, w2rRatio, type);
{code}
* In {{computeShare}} we should move the cast from double to long to the point 
where we calculate the share, instead of leaving it until after the min and 
max checks, and remove the cast at the end of the call. That will speed up 
calculations slightly and won't change the outcome:
{code}
long share = (long)(sched.getWeight() * w2rRatio);
{code}
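
As an illustration of the last point, here is a small standalone sketch comparing the two cast placements. It is not the code from the patch: castLate and castEarly are hypothetical names, and weight, minShare and maxShare are passed as plain values instead of being read from a Schedulable. It assumes non-negative shares and bounds small enough to be represented exactly as doubles:

```java
final class ComputeShareSketch {
    // Cast late: clamp in double arithmetic, then truncate once at the end.
    static long castLate(double weight, double w2rRatio, long minShare, long maxShare) {
        double share = weight * w2rRatio;
        share = Math.max(share, minShare);
        share = Math.min(share, maxShare);
        return (long) share;
    }

    // Cast early: truncate where the share is calculated, then clamp with long arithmetic.
    static long castEarly(double weight, double w2rRatio, long minShare, long maxShare) {
        long share = (long) (weight * w2rRatio);
        share = Math.max(share, minShare);
        share = Math.min(share, maxShare);
        return share;
    }

    public static void main(String[] args) {
        // A share inside the bounds, one below minShare, and one far above maxShare.
        System.out.println(castLate(2.5, 3.3, 0, 100) + " " + castEarly(2.5, 3.3, 0, 100));
        System.out.println(castLate(0.1, 1.0, 5, 10) + " " + castEarly(0.1, 1.0, 5, 10));
        System.out.println(castLate(1.0, 1e30, 0, 1000) + " " + castEarly(1.0, 1e30, 0, 1000));
    }
}
```

Because the bounds are whole numbers, truncating before or after the clamp gives the same result for these inputs. A double that is too large for a long saturates at Long.MAX_VALUE when cast, so the max check still applies.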

> Resourcemanager hangs when  resourceUsedWithWeightToResourceRatio return a 
> overflow value 
> --
>
> Key: YARN-7560
> URL: https://issues.apache.org/jira/browse/YARN-7560
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, resourcemanager
>Affects Versions: 3.0.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
> Fix For: 3.0.0
>
> Attachments: YARN-7560.000.patch
>
>
> In our cluster, we changed the configuration and ran refreshQueues, then 
> found that the ResourceManager hangs. The ResourceManager also can't restart 
> successfully. The jstack output always looks like this:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x7f98e8017000 nid=0x2f5 runnable 
> [0x7f98eed9a000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:182)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSharesInternal(ComputeFairShares.java:140)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeSteadyShares(ComputeFairShares.java:66)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeSteadyShares(FairSharePolicy.java:148)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeSteadyShares(FSParentQueue.java:102)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getQueue(QueueManager.java:148)
> - locked <0x7f8c4a8177a0> (a java.util.HashMap)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:101)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.updateAllocationConfiguration(QueueManager.java:387)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$AllocationReloadListener.onReload(FairScheduler.java:1728)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:422)
> - locked <0x7f8c4a7eb2e0> (a 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.initScheduler(FairScheduler.java:1597)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.serviceInit(FairScheduler.java:1621)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c4a76ac48> (a java.lang.Object)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:569)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> - locked <0x7f8c49254268> (a java.lang.Object)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:997)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:257)
> at 
> 

[jira] [Commented] (YARN-7119) yarn rmadmin -updateNodeResource should be updated for resource types

2017-11-26 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266261#comment-16266261
 ] 

Daniel Templeton commented on YARN-7119:


I'm on vacation, but I'll try to get to it tomorrow. :)

> yarn rmadmin -updateNodeResource should be updated for resource types
> -
>
> Key: YARN-7119
> URL: https://issues.apache.org/jira/browse/YARN-7119
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Manikandan R
> Attachments: YARN-7119.001.patch, YARN-7119.002.patch, 
> YARN-7119.002.patch, YARN-7119.003.patch, YARN-7119.004.patch, 
> YARN-7119.004.patch, YARN-7119.005.patch, YARN-7119.006.patch, 
> YARN-7119.007.patch, YARN-7119.008.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7535) We should display origin value of demand in fair scheduler page

2017-11-26 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266206#comment-16266206
 ] 

Yufei Gu commented on YARN-7535:


Max resources is a hard limit, which means a queue can't use more resources 
than its max resources. Hence, demand and usage shouldn't be greater than max 
resources. The existing code makes sense to me in that respect. Should we 
display the original demand rather than the normalized one? I am open to 
suggestions, and would like to hear more about why we need to show the 
original. cc [~wilfreds]

> We should display origin value of demand in fair scheduler page
> ---
>
> Key: YARN-7535
> URL: https://issues.apache.org/jira/browse/YARN-7535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>
> The *demand* value of a leaf queue shown on the fair scheduler page is capped 
> at *maxResources* whenever the real demand is greater than *maxResources*, so 
> it doesn't reflect the real situation. Most of the time, when we expand a 
> queue, we rely on seeing the real current demand value.
> {code:java}
>   private void updateDemandForApp(FSAppAttempt sched, Resource maxRes) {
>     sched.updateDemand();
>     Resource toAdd = sched.getDemand();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Counting resource from " + sched.getName() + " " + toAdd
>           + "; Total resource consumption for " + getName() + " now "
>           + demand);
>     }
>     demand = Resources.add(demand, toAdd);
>     demand = Resources.componentwiseMin(demand, maxRes);
>   }
> {code}
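
One possible shape for what the issue asks for, as a rough sketch only: keep the uncapped sum and derive the capped value from it when needed. Res and QueueDemand are hypothetical stand-ins and do not use the Hadoop Resource or Resources classes:

```java
final class DemandDisplaySketch {
    // Hypothetical two-dimensional resource (memory in MB, vcores).
    static final class Res {
        final long memoryMb;
        final long vcores;

        Res(long memoryMb, long vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        Res add(Res other) {
            return new Res(memoryMb + other.memoryMb, vcores + other.vcores);
        }

        Res componentwiseMin(Res other) {
            return new Res(Math.min(memoryMb, other.memoryMb), Math.min(vcores, other.vcores));
        }
    }

    // Tracks the raw demand and exposes the capped view separately.
    static final class QueueDemand {
        private Res rawDemand = new Res(0, 0);

        void addApp(Res appDemand) {
            rawDemand = rawDemand.add(appDemand);
        }

        Res rawDemand() {
            return rawDemand; // what a web page could display
        }

        Res cappedDemand(Res maxRes) {
            return rawDemand.componentwiseMin(maxRes); // what scheduling would use
        }
    }

    public static void main(String[] args) {
        QueueDemand queue = new QueueDemand();
        queue.addApp(new Res(4096, 4));
        queue.addApp(new Res(8192, 8));
        Res capped = queue.cappedDemand(new Res(8192, 16));
        System.out.println(queue.rawDemand().memoryMb + " MB raw, " + capped.memoryMb + " MB capped");
    }
}
```

For non-negative demands, capping once at the end gives the same capped value as capping after every addition, so a sketch like this would not change the value the scheduler sees.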






[jira] [Commented] (YARN-7119) yarn rmadmin -updateNodeResource should be updated for resource types

2017-11-26 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266097#comment-16266097
 ] 

Manikandan R commented on YARN-7119:


[~templedf] Can you please confirm the changes based on your recent comments, 
as the Jenkins report looks good?

> yarn rmadmin -updateNodeResource should be updated for resource types
> -
>
> Key: YARN-7119
> URL: https://issues.apache.org/jira/browse/YARN-7119
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Manikandan R
> Attachments: YARN-7119.001.patch, YARN-7119.002.patch, 
> YARN-7119.002.patch, YARN-7119.003.patch, YARN-7119.004.patch, 
> YARN-7119.004.patch, YARN-7119.005.patch, YARN-7119.006.patch, 
> YARN-7119.007.patch, YARN-7119.008.patch
>
>







[jira] [Commented] (YARN-7535) We should display origin value of demand in fair scheduler page

2017-11-26 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266004#comment-16266004
 ] 

YunFan Zhou commented on YARN-7535:
---

Hi, [~yufeigu] [~templedf]

I'm sorry to bother you, but recently I was wondering why the demand of the 
queue cannot exceed *maxResources*. Is it a scheduling optimization or a 
semantic consideration?

If it is a scheduling optimization, I think the web page should still show 
the real value when it displays the *demand* value of the queue.

If it's a semantic consideration, can you point me to where I can find 
the exact definition?

Thanks!

YunFan Zhou

> We should display origin value of demand in fair scheduler page
> ---
>
> Key: YARN-7535
> URL: https://issues.apache.org/jira/browse/YARN-7535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: YunFan Zhou
>
> The *demand* value of a leaf queue shown on the fair scheduler page is capped 
> at *maxResources* whenever the real demand is greater than *maxResources*, so 
> it doesn't reflect the real situation. Most of the time, when we expand a 
> queue, we rely on seeing the real current demand value.
> {code:java}
>   private void updateDemandForApp(FSAppAttempt sched, Resource maxRes) {
>     sched.updateDemand();
>     Resource toAdd = sched.getDemand();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Counting resource from " + sched.getName() + " " + toAdd
>           + "; Total resource consumption for " + getName() + " now "
>           + demand);
>     }
>     demand = Resources.add(demand, toAdd);
>     demand = Resources.componentwiseMin(demand, maxRes);
>   }
> {code}






[jira] [Assigned] (YARN-7535) We should display origin value of demand in fair scheduler page

2017-11-26 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou reassigned YARN-7535:
-

Assignee: YunFan Zhou

> We should display origin value of demand in fair scheduler page
> ---
>
> Key: YARN-7535
> URL: https://issues.apache.org/jira/browse/YARN-7535
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
>
> The *demand* value of a leaf queue shown on the fair scheduler page is capped 
> at *maxResources* whenever the real demand is greater than *maxResources*, so 
> it doesn't reflect the real situation. Most of the time, when we expand a 
> queue, we rely on seeing the real current demand value.
> {code:java}
>   private void updateDemandForApp(FSAppAttempt sched, Resource maxRes) {
>     sched.updateDemand();
>     Resource toAdd = sched.getDemand();
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Counting resource from " + sched.getName() + " " + toAdd
>           + "; Total resource consumption for " + getName() + " now "
>           + demand);
>     }
>     demand = Resources.add(demand, toAdd);
>     demand = Resources.componentwiseMin(demand, maxRes);
>   }
> {code}






[jira] [Updated] (YARN-6507) Add support in NodeManager to isolate FPGA devices with CGroups

2017-11-26 Thread Zhankun Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-6507:
---
Attachment: YARN-6507-trunk.010.patch

> Add support in NodeManager to isolate FPGA devices with CGroups
> ---
>
> Key: YARN-6507
> URL: https://issues.apache.org/jira/browse/YARN-6507
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
> Attachments: YARN-6507-branch-YARN-3926.001.patch, 
> YARN-6507-branch-YARN-3926.002.patch, YARN-6507-trunk.001.patch, 
> YARN-6507-trunk.002.patch, YARN-6507-trunk.003.patch, 
> YARN-6507-trunk.004.patch, YARN-6507-trunk.005.patch, 
> YARN-6507-trunk.006.patch, YARN-6507-trunk.007.patch, 
> YARN-6507-trunk.008.patch, YARN-6507-trunk.009.patch, 
> YARN-6507-trunk.010.patch
>
>
> Support local FPGA resource scheduler to assign/isolate N FPGA slots to a 
> container.
> At the beginning, support one vendor plugin with basic features to serve 
> OpenCL applications






[jira] [Updated] (YARN-6507) Add support in NodeManager to isolate FPGA devices with CGroups

2017-11-26 Thread Zhankun Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-6507:
---
Attachment: (was: YARN-6507-trunk.0010.patch)

> Add support in NodeManager to isolate FPGA devices with CGroups
> ---
>
> Key: YARN-6507
> URL: https://issues.apache.org/jira/browse/YARN-6507
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
> Attachments: YARN-6507-branch-YARN-3926.001.patch, 
> YARN-6507-branch-YARN-3926.002.patch, YARN-6507-trunk.001.patch, 
> YARN-6507-trunk.002.patch, YARN-6507-trunk.003.patch, 
> YARN-6507-trunk.004.patch, YARN-6507-trunk.005.patch, 
> YARN-6507-trunk.006.patch, YARN-6507-trunk.007.patch, 
> YARN-6507-trunk.008.patch, YARN-6507-trunk.009.patch
>
>
> Support local FPGA resource scheduler to assign/isolate N FPGA slots to a 
> container.
> At the beginning, support one vendor plugin with basic features to serve 
> OpenCL applications






[jira] [Updated] (YARN-6507) Add support in NodeManager to isolate FPGA devices with CGroups

2017-11-26 Thread Zhankun Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhankun Tang updated YARN-6507:
---
Attachment: YARN-6507-trunk.0010.patch

Rebased on trunk (2bde3aedf139368fc71f053d8dd6580b498ff46d)





> Add support in NodeManager to isolate FPGA devices with CGroups
> ---
>
> Key: YARN-6507
> URL: https://issues.apache.org/jira/browse/YARN-6507
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
> Attachments: YARN-6507-branch-YARN-3926.001.patch, 
> YARN-6507-branch-YARN-3926.002.patch, YARN-6507-trunk.001.patch, 
> YARN-6507-trunk.0010.patch, YARN-6507-trunk.002.patch, 
> YARN-6507-trunk.003.patch, YARN-6507-trunk.004.patch, 
> YARN-6507-trunk.005.patch, YARN-6507-trunk.006.patch, 
> YARN-6507-trunk.007.patch, YARN-6507-trunk.008.patch, 
> YARN-6507-trunk.009.patch
>
>
> Support local FPGA resource scheduler to assign/isolate N FPGA slots to a 
> container.
> At the beginning, support one vendor plugin with basic features to serve 
> OpenCL applications


