[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2020-02-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17045179#comment-17045179
 ] 

Hadoop QA commented on YARN-9278:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} YARN-9278 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9278 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12962761/YARN-9278.003.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25580/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, 
> YARN-9278.003.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-19 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796823#comment-16796823
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

Thank for the update [~uranus]

I don't think that we can add a lot of tests, as it would become really 
difficult to inspect the results. The three tests I can think of are:
* setting the batch value to 0 (-1 is the default and we do a tests with that).
* setting the value for the batch to something larger than the number of nodes 
to show that we do not run out of the node list and somehow fail
* setting the batch value to {{#NMs -1}} (i.e. batch=4 in a 5 node cluster) and 
do multiple runs. Even with bad random we should flip over the end of the list.

None of the tests should fail iterating nodes.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, 
> YARN-9278.003.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794783#comment-16794783
 ] 

Hadoop QA commented on YARN-9278:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
49s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 17s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 79m 
27s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
25s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 50s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9278 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12962761/YARN-9278.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  

[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-17 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794680#comment-16794680
 ] 

Zhaohui Xin commented on YARN-9278:
---

[~wilfreds], thanks for your reply. I attached new patch which resolved code 
style issues and description in FairScheduler.md. 
{quote}We need to get either add some tests or explain why we cannot add tests.
{quote}
I think it's difficult to test because we can't count the number of nodes which 
will be preempted. Can you give me some suggestions? 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch, 
> YARN-9278.003.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-17 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794653#comment-16794653
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

Thank you for the update. [~uranus] The code change looks good: a couple of 
minor remarks:
* We need to get either add some tests or explain why we cannot add tests.
* Can you fix the newly introduced checkstyle issues please.
* The text for the property is much better however this part does not make 
sense to me:
{{The max trial nodes num to identify containers for one starved container}}
I think you want to say:
{{The maximum number of NodeManagers to check per pre-emption check for one 
starved container.}}

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch, YARN-9278.002.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791827#comment-16791827
 ] 

Hadoop QA commented on YARN-9278:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
4s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 3 new + 16 unchanged - 0 fixed = 19 total (was 16) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 89m 27s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
16s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 53s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9278 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12962333/YARN-9278.002.patch |
| Optional 

[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-12 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790501#comment-16790501
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

I have a couple of comments on the patch.
* I think we need a default that turns this functionality off by default. An 
administrator should actively turn this on. A default of -1 or 0 is better and 
we should skip over the calculation for that value.
* Your code does not wrap at the end of the list. That is why I changed it to a 
do while loop. as an example: I have a batch size of 100 and I have 350 nodes. 
I could start at node 300 and still want to check those 100 nodes and want to 
check nodes 300-350 and 0-49. We should never double check a node, never go 
past the start. I updated the example code with a wrapping end calculation.
* We need to cater for a setup that has the batch value set and a node list, 
for whatever reason or point in time, is smaller than the batch size. Your code 
does not handle this. We should just process all nodes at that point and not 
stop at the end of the list. 
* The text for the property is not clear at all. (See below)
* Please look at adding a test for this change.

Text for the property remarks:
{code}
The max trial nodes num to identify containers for one starved container. 
Defaults to 0.
{code}
It does not explain what it does and why it is there. It is too cryptic. We 
should explain the following:
* what is it used for (use a partial list of nodes to check for preemptable 
containers in a large cluster)
* when should it be used, or when it takes effect.
* what is the impact: AM container impact, 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
> Attachments: YARN-9278.001.patch
>
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-03-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788863#comment-16788863
 ] 

Hadoop QA commented on YARN-9278:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 38s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 85m 35s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}161m 58s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisherForV2 |
|   | 
hadoop.yarn.server.resourcemanager.metrics.TestCombinedSystemMetricsPublisher |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| 

[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-27 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780101#comment-16780101
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

Two things:
* I still think limiting the number of nodes is something we need to approach 
with care.
* randomising a 10,000 entry long list each time we pre-empt will also become 
expensive.
 
I was thinking more of something like this:
{code:java}
  int preEmptionBatchSize = conf.getPreEmptionBatchSize();
  List potentialNodes = 
scheduler.getNodeTracker().getNodesByResourceName(rr.getResourceName());
  int size = potentialNodes.size();
  int stop = 0;
  int current = 0;
  // find a start point somewhere in the list if it is long
  if (size > preEmptionBatchSize) {
Random rand = new Random();
current = rand.nextInt(size / preEmptionBatchSize) * preEmptionBatchSize;
  }
  do {
FSSchedulerNode mine = potentialNodes.get(current);
// Identify the containers

current++;
// flip at the end of the list  
if (current > size) {
  current = 0;
}
  } while (current != stop);
{code}

Pre-emption runs in a loop and we could be considering different applications 
one after the other. Shuffling that node list continually is not good from a 
performance perspective. A simple cut in like above gives the same kind of 
behaviour. 
We could then still limit the number of "batches" we process. With some more 
smarts the stop condition could be based on the fact that we have processed as 
an example 10 * the batch size in nodes (a batch of nodes could be deemed 
equivalent with the number of nodes in a rack):
{code}  stop = ((10 * preEmptionBatchSize) > size) ? current : (((10 * 
preEmptionBatchSize) + current) % size););
{code}  

That gives a lot of flexibility and still a decent performance in a large 
cluster.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776536#comment-16776536
 ] 

Zhaohui Xin commented on YARN-9278:
---

Thanks for your suggestions, [~wilfreds]. I also think it's better to randomize 
nodes when the number of nodes exceeds a certain threshold.

Maybe our change like this,
{code:java}
List potentialNodes = scheduler.getNodeTracker()
.getNodesByResourceName(rr.getResourceName());
int maxTryNodeNumOnce = conf.getMaxTryNodeNumOnce();

// we should not iterate all nodes, that will be very slow
if (ResourceRequest.ANY.equals(rr.getResourceName()) &&
potentialNodes.size() > maxTryNodeNumOnce) {

  Collections.shuffle(potentialNodes);
  potentialNodes = potentialNodes.subList(0, maxTryNodeNumOnce);
}
{code}

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776502#comment-16776502
 ] 

Wilfred Spiegelenburg commented on YARN-9278:
-

[~uranus] I can understand that you want to limit the number of nodes to look 
at for pre-emption in large clusters. I could speed things up in certain cases. 
However when I look at the way we identify we already break out of the loop 
when we get to a node that gives back a container list without AMs. In 
{{identifyContainersToPreemptForOneContainer}} we break out of the loop 
checking nodes when {{numAMContainers}} was 0. So we do already break out of 
the loop looking for suitable nodes.

Based on your comment this will change will introduce a trade of between AMs 
and nodes. You propose to stop checking nodes even if we still have AMs in the 
list. In other words you are willing to accept some AMs in the list even if 
that has side effects on those applications. I don't think that that is a good 
idea.

I do agree with you that for the ANY resource we probably want to do something 
else and not just grab the first nodes out of the list all the time. The list 
that comes back from the node tracker is unsorted and just a copy of what is 
known without a filter. We should introduce some logic to not just use a for 
loop to run over the list from the start. If we use a seeded start point 
somewhere in the list which moves around we spread our 
preemption better.
We could base the starting point on the current time (second) and the size of 
the list returned. I don't think we need that if the list is smaller than a 
hard coded number (maybe 50 or 100)  but it would really help in large clusters.


> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776207#comment-16776207
 ] 

Zhaohui Xin commented on YARN-9278:
---

Thanks for your reply, [~yufeigu]. Another solution is to stop looking for 
nodes when we find a suitable one. 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-24 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776184#comment-16776184
 ] 

Yufei Gu commented on YARN-9278:


bq. If our cluster has a lot of long-running jobs, the above method is not 
helpful. 
That's unfortunate.

Setting a maximum num of nodes to iterate seems a quick-and-dirty way to solve 
the latency in big clusters. Let's brainstorm the solution. [~Steven Rand] and 
[~wilfreds], what do you think?

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-21 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773792#comment-16773792
 ] 

Zhaohui Xin commented on YARN-9278:
---

{quote}Without introduce more complexity to FS preemption, it is already very 
complicated, there are some workarounds you can try: To increase FairShare 
Preemption Timeout and FairShare Preemption Threshold to reduce the chance of 
preemption. This is specially useful for a large cluster, since there is more 
chance to get resources just by waiting.
{quote}
If our cluster has a lot of long-running jobs, the above method is not helpful. 

We have used this optimization for more than a year, which improves preemption 
performance effectively. BTW, we have more than 10 clusters and most of them 
have about 10K nodes.

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-20 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773299#comment-16773299
 ] 

Yufei Gu commented on YARN-9278:


Hi [~uranus], this seems a perf issue for a busy large cluster due to the 
preemption implementation, which is iteration and check. 

I would suggest lower 
{{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let 
preemption kick in earlier for a large cluster. The default value is 80%, which 
means preemption won't kick in until 80% resources of the whole cluster have 
been used. Please be aware that low utilization threshold may cause an 
unnecessary container churn, so you don't want it to be too low. 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-19 Thread Zhaohui Xin (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772638#comment-16772638
 ] 

Zhaohui Xin commented on YARN-9278:
---

Hi, [~yufeigu]. When preemption thread satisfies a starved container with ANY 
as resource name, it will find a best node in all nodes of this cluster. This 
will be costly when this cluster has more than 10k nodes.

I think we should limit the number of nodes in such a situation. How do you 
think this? :D

 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate all nodes, that will be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum){
>   Collections.shuffle(potentialNodes);
>   List newPotentialNodes = new ArrayList();
> for (int i = 0; i < maxTryNodeNum; i++){
>   newPotentialNodes.add(potentialNodes.get(i));
> }
> potentialNodes = newPotentialNodes;
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org