[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-02-02 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350732#comment-16350732
 ] 

Arun Suresh commented on YARN-7839:
---

Thanks for the patch, [~pgaref].
It looks pretty straightforward to me. +1, will commit this shortly.

> Check node capacity before placing in the Algorithm
> ---
>
> Key: YARN-7839
> URL: https://issues.apache.org/jira/browse/YARN-7839
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Arun Suresh
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Attachments: YARN-7839-YARN-6592.001.patch
>
>
> Currently, the Algorithm assigns a node to a request purely based on whether the 
> constraints are met. It is later in the scheduling phase that the Queue 
> capacity and Node capacity are checked. If the request cannot be placed 
> because of unavailable Queue/Node capacity, the request is retried by the 
> Algorithm.
> For clusters that are running at high utilization, we can reduce the retries 
> if we perform the Node capacity check in the Algorithm as well. The Queue 
> capacity check and the other user limit checks can still be handled by the 
> scheduler (since queues and other limits are tied to the scheduler, and are not 
> scheduler agnostic).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-02-02 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350731#comment-16350731
 ] 

Arun Suresh commented on YARN-7839:
---

[~sunilg], regarding the {{CandidateNodeSet}}, let's move the discussion to when 
we refactor the {{AppSchedulingInfo}}, since this patch is isolated to the 
algorithm.

[~kkaranasos]'s comment:
bq. However, what about the case that a node seems full but a container is 
about to finish (and will be finished by the time the allocate is done)? Should we 
completely reject such nodes, or simply give higher priority to nodes that 
already have available resources?
We are not rejecting those nodes. If a Scheduling request cannot be 
satisfied by any node in the algorithm round, it will be retried in the next AM 
heartbeat, and hopefully some of those containers will have completed by then. We 
can set the retry count to a higher value for clusters that are running at higher 
utilization.
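
To make the retry behaviour described above concrete, here is a minimal sketch, not taken from the patch. The class {{RetryQueue}}, its methods and the {{maxRetries}} parameter are hypothetical names used only for illustration; each call to {{runRound}} stands in for one algorithm round triggered by an AM heartbeat, and the predicate stands in for "some node can take this request right now".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: requests that fail a round are retried in later rounds. */
final class RetryQueue<R> {
  /** A pending request plus the number of rounds in which it failed. */
  private static final class Entry<T> {
    final T request;
    int failedAttempts;
    Entry(T request) { this.request = request; }
  }

  private final Deque<Entry<R>> pending = new ArrayDeque<>();
  private final List<R> rejected = new ArrayList<>();
  private final int maxRetries; // retries allowed after the first failed attempt

  RetryQueue(int maxRetries) { this.maxRetries = maxRetries; }

  void submit(R request) { pending.add(new Entry<>(request)); }

  /** Tries every pending request once and returns the ones placed in this round. */
  List<R> runRound(Predicate<R> tryPlace) {
    List<R> placed = new ArrayList<>();
    int count = pending.size(); // only requests pending at the start of the round
    for (int i = 0; i < count; i++) {
      Entry<R> e = pending.poll();
      if (tryPlace.test(e.request)) {
        placed.add(e.request);
      } else if (++e.failedAttempts > maxRetries) {
        rejected.add(e.request); // retries exhausted, give up
      } else {
        pending.add(e); // try again in the next round
      }
    }
    return placed;
  }

  List<R> rejected() { return rejected; }

  int pendingCount() { return pending.size(); }
}
```

With a larger {{maxRetries}}, a request that finds every node full simply waits more rounds, which gives running containers more time to complete before the request is rejected.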




[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-02-02 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350656#comment-16350656
 ] 

Sunil G commented on YARN-7839:
---

bq. despite the naming, as far as I know, the candidateNodeSet is currently 
always only a single node

[~kkaranasos] and [~asuresh], for multi-node support, CandidateNodeSet was the ideal 
interface to extend, so multiple nodes could come through that iterator.




[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-02-02 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350649#comment-16350649
 ] 

genericqa commented on YARN-7839:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || YARN-6592 Compile Tests ||
| +1 | mvninstall | 15m 15s | YARN-6592 passed |
| +1 | compile | 0m 34s | YARN-6592 passed |
| +1 | checkstyle | 0m 24s | YARN-6592 passed |
| +1 | mvnsite | 0m 37s | YARN-6592 passed |
| +1 | shadedclient | 9m 25s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 0s | YARN-6592 passed |
| +1 | javadoc | 0m 22s | YARN-6592 passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 36s | the patch passed |
| +1 | compile | 0m 33s | the patch passed |
| +1 | javac | 0m 33s | the patch passed |
| +1 | checkstyle | 0m 21s | the patch passed |
| +1 | mvnsite | 0m 33s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 37s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 5s | the patch passed |
| +1 | javadoc | 0m 23s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 65m 11s | hadoop-yarn-server-resourcemanager in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 106m 23s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServiceAppsNodelabel |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7839 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12908993/YARN-7839-YARN-6592.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 9cb62d03926c 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | YARN-6592 / 8df7666 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/19579/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19579/testReport/ |
| Max. process+thread count | 866 (vs. ulimit of 5000) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemana

[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-02-02 Thread Panagiotis Garefalakis (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350496#comment-16350496
 ] 

Panagiotis Garefalakis commented on YARN-7839:
--

 

Submitting a simple patch that tracks available cluster resources in the 
DefaultPlacement algorithm, to support a capacity check before placement.

The actual check is part of the attemptPlacementOnNode method, which can be 
configured with the **ignoreResourceCheck** flag.

In the current patch the check is enabled in the placement step and disabled in the 
validation step.

A wrapper class, SchedulingRequestWithPlacementAttempt, was also introduced to 
keep track of the failed attempts on the rejected SchedulingRequests.

 

Thoughts?  [~asuresh] [~kkaranasos] [~cheersyang] 
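
For readers without the patch at hand, the idea described above can be sketched roughly as follows. This is not the code from YARN-7839-YARN-6592.001.patch: the classes {{Ask}}, {{AskWithAttempts}} and {{CapacityAwarePlacer}} are hypothetical stand-ins, resources are reduced to memory and vcores, and placement constraints are left out entirely. Only the method name {{attemptPlacementOnNode}} and the {{ignoreResourceCheck}} flag are borrowed from the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical request: only a memory and vcore ask, no placement constraints. */
final class Ask {
  final long memoryMb;
  final int vcores;
  Ask(long memoryMb, int vcores) {
    this.memoryMb = memoryMb;
    this.vcores = vcores;
  }
}

/** Hypothetical wrapper that counts failed placement attempts for a request. */
final class AskWithAttempts {
  final Ask ask;
  int failedAttempts = 0;
  AskWithAttempts(Ask ask) { this.ask = ask; }
}

/** Keeps the algorithm's own view of the free resources on each node. */
final class CapacityAwarePlacer {
  // nodeId -> {free memory in MB, free vcores}; iteration follows insertion order
  private final Map<String, long[]> free = new LinkedHashMap<>();

  void addNode(String nodeId, long memoryMb, int vcores) {
    free.put(nodeId, new long[] {memoryMb, vcores});
  }

  /**
   * Returns true if the ask may go on the node. With ignoreResourceCheck set,
   * capacity is neither checked nor deducted (the validation-step behaviour).
   */
  boolean attemptPlacementOnNode(Ask ask, String nodeId, boolean ignoreResourceCheck) {
    long[] avail = free.get(nodeId);
    if (avail == null) {
      return false; // unknown node
    }
    if (ignoreResourceCheck) {
      return true;
    }
    if (avail[0] < ask.memoryMb || avail[1] < ask.vcores) {
      return false; // node is too full for this ask
    }
    avail[0] -= ask.memoryMb; // remember what this round has already handed out
    avail[1] -= ask.vcores;
    return true;
  }

  /** Returns the first node that fits, or null after bumping the failure count. */
  String place(AskWithAttempts wrapped) {
    for (String nodeId : free.keySet()) {
      if (attemptPlacementOnNode(wrapped.ask, nodeId, false)) {
        return nodeId;
      }
    }
    wrapped.failedAttempts++;
    return null;
  }
}
```

Deducting the ask from the tracked view keeps two requests in the same round from both being placed on the last free slot of a node.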




[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-01-29 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343850#comment-16343850
 ] 

Konstantinos Karanasos commented on YARN-7839:
--

I think the change makes sense, [~asuresh].

However, what about the case that a node seems full but a container is about to 
finish (and will be finished by the time the allocate is done)? Should we completely 
reject such nodes, or simply give higher priority to nodes that already have 
available resources?
{quote}getPreferredNodeIterator(CandidateNodeSet candidateNodeSet)
{quote}
[~cheersyang], despite the naming, as far as I know, the candidateNodeSet is 
currently always only a single node...




[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-01-29 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343154#comment-16343154
 ] 

Weiwei Yang commented on YARN-7839:
---

Hi [~asuresh]

OK, I think I was talking about the AppPlacementAllocator approach. Because I 
noticed {{AppPlacementAllocator#getPreferredNodeIterator(CandidateNodeSet 
candidateNodeSet)}} in the API, all I was thinking was to use placement constraints 
to filter out such candidate nodes. For the processor approach, I do agree that your 
proposed approach can help. It won't create too much overhead, as it only checks 
in-memory data and doesn't hold any lock. I just cannot tell 
how much it helps on a real cluster.




[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-01-29 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343133#comment-16343133
 ] 

Arun Suresh commented on YARN-7839:
---

[~cheersyang],
bq. Instead, I am thinking... can we generate a list of ordered candidate nodes 
for each allocation (based on some policy), then let scheduler work on such 
candidate set of nodes to pick up one that fulfills scheduler's requests?
Candidate nodes are not a bad idea (I remember we discussed it briefly 
earlier). I did not want to attempt it initially for the following reasons:
# In the current scheme, we do a re-place and then re-schedule ONLY if the 
initial placement was unsuccessful. This means the algorithm can bail as soon 
as it finds the first viable node, which makes both the algorithm and the 
data structures it returns simpler. If the algorithm were 
to output a list of candidate nodes, there is a very good chance it would have 
to loop through more nodes (and possibly the entire node set) per request. Also, 
for cases of node affinity, candidate selection would be complicated. E.g., if 
*foo* has to be placed only on nodes that have *bar*, and if we have 5 
candidates for *bar*, the 5 candidates for *foo* would also have to depend on 
the *bar* candidates.
# The Scheduler's attemptOnNode takes multiple locks: a lock on the app, the node 
and the queue. Passing in multiple nodes means the lock will be held for a longer 
duration, which can cause problems. We have experienced many such issues, 
especially when an application completes before its containers complete, which 
results in releaseContainer taking a lock on multiple nodes.

In any case, even if we were to try candidate nodes, I still believe letting 
the algorithm query the nodes' capacity at the time of placing might not be 
bad. I prefer we do not hold a lock, since one of the motivations for 
separating the Placement phase from the scheduling phase is that the 
placement can operate on a loosely consistent view of the cluster.
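
The cost difference in the first point can be illustrated with a small sketch. It is an assumption-laden toy rather than anything from the YARN code base: {{NodeScan}}, {{firstViable}} and {{candidates}} are hypothetical names, nodes are plain strings, and "viable" is an arbitrary predicate standing in for the constraint and capacity checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical comparison of "first viable node" versus "list of candidates". */
final class NodeScan {
  /** The nodes chosen by a scan and how many nodes it had to examine. */
  static final class Result {
    final List<String> chosen;
    final int examined;
    Result(List<String> chosen, int examined) {
      this.chosen = chosen;
      this.examined = examined;
    }
  }

  /** Stops at the first node accepted by the predicate. */
  static Result firstViable(List<String> nodes, Predicate<String> viable) {
    List<String> chosen = new ArrayList<>();
    int examined = 0;
    for (String node : nodes) {
      examined++;
      if (viable.test(node)) {
        chosen.add(node);
        break; // bail as soon as one node works
      }
    }
    return new Result(chosen, examined);
  }

  /** Collects up to k viable nodes; walks the whole list if fewer than k exist. */
  static Result candidates(List<String> nodes, Predicate<String> viable, int k) {
    List<String> chosen = new ArrayList<>();
    int examined = 0;
    for (String node : nodes) {
      if (chosen.size() == k) {
        break;
      }
      examined++;
      if (viable.test(node)) {
        chosen.add(node);
      }
    }
    return new Result(chosen, examined);
  }
}
```

With six nodes of which only the second and fifth are viable, {{firstViable}} examines two nodes, while {{candidates}} asked for three ends up examining all six and still returns only two.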






[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-01-29 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343069#comment-16343069
 ] 

Weiwei Yang commented on YARN-7839:
---

Hi [~asuresh]

When there are multiple threads running the placement algorithm, the node 
capacity check could pass in both places, yet the request could still get rejected 
because some other container was allocated to this node in the meantime, as you don't hold the 
SchedulerNode's lock to update resources when you attempt to place the request. 
So I am not sure how much this can improve things in a highly utilized cluster.

Instead, I am thinking... can we generate *a list of ordered candidate nodes* 
for each allocation (based on some policy), then let scheduler work on such 
candidate set of nodes to pick up one that fulfills scheduler's requests?
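
The race described in the first paragraph can be shown with a deterministic toy example. {{NodeState}}, {{peekFreeMb}} and {{tryAllocate}} are hypothetical names and do not correspond to the SchedulerNode API; the sketch only assumes that the placement side reads free capacity without a lock while the scheduler side checks and allocates under a per-node lock.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical node with an unlocked read path and a locked allocate path. */
final class NodeState {
  private final ReentrantLock lock = new ReentrantLock();
  private volatile long freeMb;

  NodeState(long freeMb) { this.freeMb = freeMb; }

  /** Unlocked read used by the placement phase; may be stale by commit time. */
  long peekFreeMb() { return freeMb; }

  /** Authoritative check-and-allocate, as the scheduler would do under the node lock. */
  boolean tryAllocate(long mb) {
    lock.lock();
    try {
      if (freeMb < mb) {
        return false; // someone else got there first
      }
      freeMb -= mb;
      return true;
    } finally {
      lock.unlock();
    }
  }
}
```

If two placers each peek at a node with 4096 MB free and both decide a 3072 MB request fits, only the first {{tryAllocate(3072)}} succeeds; the second is rejected and has to be retried. The unlocked check lowers the number of such rejections but cannot remove them.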

Thanks




[jira] [Commented] (YARN-7839) Check node capacity before placing in the Algorithm

2018-01-28 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342973#comment-16342973
 ] 

Arun Suresh commented on YARN-7839:
---

This is a fairly trivial change - we just need to add an additional check in 
the {{DefaultPlacementAlgorithm#attemptPlacementOnNode()}} method.
Thoughts [~kkaranasos] / [~cheersyang] ?

> Check node capacity before placing in the Algorithm
> ---
>
> Key: YARN-7839
> URL: https://issues.apache.org/jira/browse/YARN-7839
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun Suresh
>Priority: Major
>
> Currently, the Algorithm assigns a node to a request purely based on whether the 
> constraints are met. It is later in the scheduling phase that the Queue 
> capacity and Node capacity are checked. If the request cannot be placed 
> because of unavailable Queue/Node capacity, the request is retried by the 
> Algorithm.
> For clusters that are running at high utilization, we can reduce the retries 
> if we perform the Node capacity check in the Algorithm as well. The Queue 
> capacity check can still be handled by the scheduler (since queues are tied 
> to the scheduler)


