[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-10-31 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671119#comment-16671119
 ] 

Weiwei Yang commented on YARN-8394:
---

Hi [~yufeigu]

Apologies, I missed your last comment.

Are you suggesting that when "yarn.scheduler.capacity.node-locality-delay" is 
set to "-1", we should automatically disable 
"yarn.scheduler.capacity.rack-locality-additional-delay" too? I think that 
makes sense.

We need a Jira to track this and to revisit the locality code in the context 
of cloud environments. Let me open one.

Thanks!

> Improve data locality documentation for Capacity Scheduler
> --
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8394.001.patch, YARN-8394.002.patch
>
>
> YARN-6344 introduces a new parameter 
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in 
> capacity-scheduler.xml; we need to add documentation for it in 
> {{CapacityScheduler.md}} accordingly.
> Moreover, more and more clusters separate storage from computation, so the 
> file system is always remote. For such cases we need to document how to 
> relax data locality in CS; otherwise MR jobs suffer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-13 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511503#comment-16511503
 ] 

Yufei Gu commented on YARN-8394:


Hi [~cheersyang],

Let me clarify a little bit. The code logic should be:
{code:java}
if "yarn.scheduler.capacity.node-locality-delay" is -1:
    disable "yarn.scheduler.capacity.rack-locality-additional-delay"
{code}
That way, a user doesn't need to set it manually, as the doc you added 
currently suggests. Moreover, if that logic were in the code, the doc could 
simply say that disabling yarn.scheduler.capacity.node-locality-delay also 
disables yarn.scheduler.capacity.rack-locality-additional-delay.
{quote}
Note, this feature should be disabled if YARN is deployed separately with the 
file system, as locality is meaningless. This can be done by setting 
`yarn.scheduler.capacity.node-locality-delay` to `-1`, in this case, request's 
locality constraint is ignored.
{quote}
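
The proposed logic could be sketched in Java roughly as follows. This is only an illustration, not the CapacityScheduler code: `LocalityDelayResolver`, `effectiveRackAdditionalDelay`, and the use of `java.util.Properties` in place of Hadoop's configuration classes are hypothetical, and the fallback values used when a property is unset are assumptions made for this example.

```java
import java.util.OptionalInt;
import java.util.Properties;

/** Hypothetical helper illustrating the proposal; not part of YARN. */
final class LocalityDelayResolver {
    static final String NODE_LOCALITY_DELAY =
        "yarn.scheduler.capacity.node-locality-delay";
    static final String RACK_LOCALITY_ADDITIONAL_DELAY =
        "yarn.scheduler.capacity.rack-locality-additional-delay";

    private LocalityDelayResolver() {}

    /**
     * Returns the rack-locality additional delay to apply, or an empty value
     * when node-locality-delay is -1 and the rack setting is therefore ignored.
     */
    static OptionalInt effectiveRackAdditionalDelay(Properties conf) {
        // Fallback values are assumptions for this sketch only.
        int nodeDelay = Integer.parseInt(conf.getProperty(NODE_LOCALITY_DELAY, "40"));
        int rackDelay = Integer.parseInt(conf.getProperty(RACK_LOCALITY_ADDITIONAL_DELAY, "-1"));
        if (nodeDelay == -1) {
            // Locality is disabled as a whole, so the rack setting is not consulted.
            return OptionalInt.empty();
        }
        return OptionalInt.of(rackDelay);
    }
}
```

With such a check in place, a user who sets node-locality-delay to -1 would not have to touch the rack setting at all.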







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-13 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510671#comment-16510671
 ] 

Hudson commented on YARN-8394:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14417 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14417/])
YARN-8394. Improve data locality documentation for Capacity Scheduler. (wwei: 
rev 29024a62038c297f11e8992601f2522c7da7)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml








[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-13 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510664#comment-16510664
 ] 

Weiwei Yang commented on YARN-8394:
---

Hi [~yufeigu]

I may have misunderstood your comment about the code change. In the v2 patch, 
I already added a description in {{capacity-scheduler.xml}}; are you 
suggesting we add some code for this as well? 







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-12 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510625#comment-16510625
 ] 

Yufei Gu commented on YARN-8394:


LGTM, can you file a jira for the code change?







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-12 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509931#comment-16509931
 ] 

Konstantinos Karanasos commented on YARN-8394:
--

+1, thanks [~cheersyang].







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-10 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507269#comment-16507269
 ] 

genericqa commented on YARN-8394:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 37s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 6s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 58s | trunk passed |
| +1 | mvnsite | 1m 8s | trunk passed |
| +1 | shadedclient | 40m 51s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvnsite | 1m 5s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 18s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 55m 54s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8394 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927197/YARN-8394.002.patch |
| Optional Tests | asflicense xml mvnsite |
| uname | Linux 0a3f05bccd60 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ccfb816 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20994/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-09 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507254#comment-16507254
 ] 

Weiwei Yang commented on YARN-8394:
---

Thanks [~yufeigu], [~kkaranasos]. I've addressed your comments in the v2 
patch. Please take a look, thanks!







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-08 Thread Konstantinos Karanasos (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506651#comment-16506651
 ] 

Konstantinos Karanasos commented on YARN-8394:
--

Looks good to me. A couple of things to fix before committing:
 * that tries best efforts to honor task locality constraint -> to honor task 
locality constraints
 * losing the locality constraint -> relaxing the locality constraint
 * when the additional delay is -1, you can say that it is calculated based on 
the formula L * C / N, capped by the cluster size, where L is the number of 
locations (nodes or racks) specified in the resource request, C is the number 
of requested containers, and N is the size of the cluster.
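
A minimal Java sketch of the formula described in the last item, for illustration only. The class and method names are hypothetical, and the use of integer division is an assumption; the scheduler's own arithmetic may differ.

```java
/** Hypothetical illustration of the L * C / N formula; not the scheduler's code. */
final class OffSwitchDelayFormula {
    private OffSwitchDelayFormula() {}

    /**
     * @param locations   L: number of locations (nodes or racks) in the resource request
     * @param containers  C: number of requested containers
     * @param clusterSize N: number of nodes in the cluster
     * @return L * C / N, capped by the cluster size
     */
    static int missedOpportunitiesBeforeOffSwitch(int locations, int containers, int clusterSize) {
        if (clusterSize <= 0) {
            throw new IllegalArgumentException("clusterSize must be positive");
        }
        // Long arithmetic avoids overflow of L * C; integer division is an assumption.
        long raw = (long) locations * containers / clusterSize;
        return (int) Math.min(raw, clusterSize);
    }
}
```

For example, with L = 3, C = 200 and N = 100 the sketch gives 6, while with L = 50, C = 400 and N = 100 the raw value 200 is capped to the cluster size, 100.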







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-08 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505822#comment-16505822
 ] 

Yufei Gu commented on YARN-8394:


Sounds good to me.







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-08 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505766#comment-16505766
 ] 

Weiwei Yang commented on YARN-8394:
---

Hi [~yufeigu]

I thought it would be helpful to expose this note in the doc. I'd prefer to 
add both the doc and the code change; what do you think?







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-07 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505726#comment-16505726
 ] 

Yufei Gu commented on YARN-8394:


bq. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` 
to `-1`
This should be done in code instead of relying on users to do it after reading 
the doc. That sounds like another Jira if the logic is not there yet.







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-07 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505591#comment-16505591
 ] 

Weiwei Yang commented on YARN-8394:
---

Thanks [~leftnoteasy]. [~yufeigu], [~kkaranasos], do you want to take a look 
before I commit this? If there are no further comments, I plan to commit 
this over the weekend.
Thanks.







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-06 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503721#comment-16503721
 ] 

Wangda Tan commented on YARN-8394:
--

+1, thanks [~cheersyang] for the patch.







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-06 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503552#comment-16503552
 ] 

Yufei Gu commented on YARN-8394:


Makes sense to me, assuming that the cloud solution still uses CS/FS as the 
scheduler. I guess some simple settings that let containers run on any node 
will solve the issue. Besides, the trend is toward cloud solutions without 
YARN, which makes the "delay logic" totally irrelevant. 







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-06 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503373#comment-16503373
 ] 

genericqa commented on YARN-8394:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 28m 3s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | shadedclient | 39m 22s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvnsite | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 26s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 53m 34s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8394 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926731/YARN-8394.001.patch |
| Optional Tests | asflicense mvnsite |
| uname | Linux fcd7d91b47cb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d1992ab |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20960/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.









[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-05 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502752#comment-16502752
 ] 

Weiwei Yang commented on YARN-8394:
---

Hi [~yufeigu]

By that I mean: when a cluster uses separate storage and computation systems, 
i.e. the file system is remote, there is no locality at all. Such an 
architecture is very popular in the cloud now. If CS continues to use the 
default delay logic, MR job performance suffers, because tasks keep waiting 
through missed opportunities until their requests are finally relaxed to 
off-switch. Does that make sense? 
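
To make the cost of the delay logic concrete, here is a deliberately simplified Java model of it. It is a sketch under stated assumptions, not the CapacityScheduler implementation: the class, enum, and method names are hypothetical, the thresholds are passed in explicitly, and a negative rack-locality additional delay is treated as 0 rather than being derived from a formula.

```java
/** Simplified, hypothetical model of delay scheduling; not the CapacityScheduler code. */
final class DelaySchedulingSketch {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    private DelaySchedulingSketch() {}

    /**
     * Returns the loosest locality level at which a request may be placed
     * after the given number of missed scheduling opportunities.
     */
    static Locality loosestAllowed(int missedOpportunities,
                                   int nodeLocalityDelay,
                                   int rackAdditionalDelay) {
        if (nodeLocalityDelay < 0) {
            // Locality constraints are ignored entirely: any node is acceptable at once.
            return Locality.OFF_SWITCH;
        }
        if (missedOpportunities < nodeLocalityDelay) {
            return Locality.NODE_LOCAL;
        }
        // Assumption of this sketch: a negative additional delay counts as 0.
        long offSwitchThreshold = (long) nodeLocalityDelay + Math.max(rackAdditionalDelay, 0);
        if (missedOpportunities < offSwitchThreshold) {
            return Locality.RACK_LOCAL;
        }
        return Locality.OFF_SWITCH;
    }
}
```

In this model, with a remote file system no node ever satisfies a node-local or rack-local request, so every task has to burn through all of its missed opportunities before it can be placed, whereas a node-locality delay of -1 lets it run on the first available node.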







[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler

2018-06-05 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502479#comment-16502479
 ] 

Yufei Gu commented on YARN-8394:


Hi [~cheersyang], thanks for filing this. Can you elaborate on this?
bq. we need to introduce how to compromise data locality in CS otherwise MR 
jobs are suffering.



