[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671119#comment-16671119 ]

Weiwei Yang commented on YARN-8394:
-----------------------------------

Hi [~yufeigu]

Apologies, I missed your last comment. Are you suggesting that when "yarn.scheduler.capacity.node-locality-delay" is set to "-1", we should automatically disable "rack-locality-additional-delay" too? I think that makes sense. We need a Jira to track this and to revisit the locality code in the context of cloud environments. Let me open one to track it. Thanks!

> Improve data locality documentation for Capacity Scheduler
> ----------------------------------------------------------
>
>                 Key: YARN-8394
>                 URL: https://issues.apache.org/jira/browse/YARN-8394
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>             Fix For: 3.2.0, 3.1.1, 3.0.4
>
>         Attachments: YARN-8394.001.patch, YARN-8394.002.patch
>
>
> YARN-6344 introduces a new parameter
> {{yarn.scheduler.capacity.rack-locality-additional-delay}} in
> capacity-scheduler.xml, we need to add some documentation in
> {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters are separating storage and
> computation where file system is always remote, in such cases we need to
> introduce how to compromise data locality in CS otherwise MR jobs are
> suffering.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511503#comment-16511503 ]

Yufei Gu commented on YARN-8394:
--------------------------------

Hi [~cheersyang],

Let me clarify a little. The code logic should be:
{code:java}
if "yarn.scheduler.capacity.node-locality-delay" is -1:
    disable "yarn.scheduler.capacity.rack-locality-additional-delay"
{code}
That way a user doesn't need to set it manually, which is what the doc you added suggests. Moreover, if that logic were in place, the doc could simply say that disabling yarn.scheduler.capacity.node-locality-delay also disables yarn.scheduler.capacity.rack-locality-additional-delay.
{quote}
Note, this feature should be disabled if YARN is deployed separately with the file system, as locality is meaningless. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` to `-1`, in this case, request's locality constraint is ignored.
{quote}
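The rule proposed in this comment could be sketched roughly as follows. This is only an illustration: the class `LocalityDelayCoupling`, the method `effectiveRackAdditionalDelay`, and the use of an empty `OptionalInt` to stand for "disabled" are all assumptions made for the example, not names or behavior taken from the Capacity Scheduler code.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration; not part of the Hadoop code base.
final class LocalityDelayCoupling {
    // Value that switches node-locality delay off, as described in the thread.
    static final int DISABLED = -1;

    /**
     * Returns the additional rack-locality delay to honor, or an empty value
     * when node-locality delay is disabled (assumed meaning of "disable").
     */
    static OptionalInt effectiveRackAdditionalDelay(int nodeLocalityDelay,
                                                    int rackLocalityAdditionalDelay) {
        if (nodeLocalityDelay == DISABLED) {
            // Node-locality delay is off, so the rack setting is ignored too.
            return OptionalInt.empty();
        }
        return OptionalInt.of(rackLocalityAdditionalDelay);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println(effectiveRackAdditionalDelay(-1, 20));
        System.out.println(effectiveRackAdditionalDelay(40, 20));
    }
}
```

With coupling like this, an operator would only have to change one property, which is the point the comment is making.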
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510671#comment-16510671 ]

Hudson commented on YARN-8394:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14417 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14417/])
YARN-8394. Improve data locality documentation for Capacity Scheduler. (wwei: rev 29024a62038c297f11e8992601f2522c7da7)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf/capacity-scheduler.xml
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510664#comment-16510664 ]

Weiwei Yang commented on YARN-8394:
-----------------------------------

Hi [~yufeigu]

I might have misunderstood your comment about the code change. In the v2 patch I already added a description in {{capacity-scheduler.xml}}. Are you suggesting that we add some code for this as well?
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510625#comment-16510625 ]

Yufei Gu commented on YARN-8394:
--------------------------------

LGTM, can you file a jira for the code change?
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509931#comment-16509931 ]

Konstantinos Karanasos commented on YARN-8394:
----------------------------------------------

+1, thanks [~cheersyang].
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507269#comment-16507269 ]

genericqa commented on YARN-8394:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 37s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 6s | Maven dependency ordering for branch |
| +1 | mvninstall | 26m 58s | trunk passed |
| +1 | mvnsite | 1m 8s | trunk passed |
| +1 | shadedclient | 40m 51s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvnsite | 1m 5s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 12m 18s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 55m 54s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8394 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12927197/YARN-8394.002.patch |
| Optional Tests | asflicense xml mvnsite |
| uname | Linux 0a3f05bccd60 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ccfb816 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 301 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20994/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507254#comment-16507254 ]

Weiwei Yang commented on YARN-8394:
-----------------------------------

Thanks [~yufeigu], [~kkaranasos], I've addressed your comments in the v2 patch. Please take a look, thanks!
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506651#comment-16506651 ]

Konstantinos Karanasos commented on YARN-8394:
----------------------------------------------

Looks good to me. A couple of things to fix before committing:
* "that tries best efforts to honor task locality constraint" -> "to honor task locality constraints"
* "losing the locality constraint" -> "relaxing the locality constraint"
* when the additional delay is -1, you can say that it is calculated based on the formula L * C / N, capped by the cluster size, where L is the number of locations (nodes or racks) specified in the resource request, C is the number of requested containers, and N is the size of the cluster.
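The formula in the last bullet can be worked through with a small sketch. It follows only the wording of the comment (L * C / N, capped by the cluster size); the class and method names are made up for the example, the input numbers are illustrative, and the scheduler's real implementation may round or cap differently.

```java
// Hypothetical illustration of the formula quoted in the review comment.
final class OffSwitchDelayFormula {
    /**
     * Computes L * C / N, capped by N, where
     * L = locations (nodes or racks) named in the resource request,
     * C = number of requested containers,
     * N = size of the cluster in nodes.
     */
    static double offSwitchDelay(int locations, int containers, int clusterSize) {
        if (clusterSize <= 0) {
            throw new IllegalArgumentException("clusterSize must be positive");
        }
        double uncapped = (double) locations * containers / clusterSize;
        // Cap the result by the cluster size, as the comment describes.
        return Math.min(uncapped, clusterSize);
    }

    public static void main(String[] args) {
        // 20 locations, 50 containers, 100 nodes: 20 * 50 / 100 = 10
        System.out.println(offSwitchDelay(20, 50, 100));
        // 100 locations, 500 containers, 100 nodes: 500 uncapped, capped to 100
        System.out.println(offSwitchDelay(100, 500, 100));
    }
}
```

The second call shows why the cap matters: a large request on a small cluster would otherwise wait for more missed opportunities than there are nodes.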
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505822#comment-16505822 ]

Yufei Gu commented on YARN-8394:
--------------------------------

Sounds good to me.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505766#comment-16505766 ]

Weiwei Yang commented on YARN-8394:
-----------------------------------

Hi [~yufeigu]

I thought it would be helpful to expose this message in the doc. I'd prefer to add both the doc and the code change. What do you think?
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505726#comment-16505726 ]

Yufei Gu commented on YARN-8394:
--------------------------------

bq. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` to `-1`

This should be done in code instead of being left to users who read the doc. Sounds like another Jira if the logic is not there yet.
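For readers who do set the value by hand, the quoted setting corresponds to one `<property>` entry in capacity-scheduler.xml. The sketch below just renders that entry as text; `CapacitySchedulerOverride` and `renderProperty` are hypothetical names for the example, and only the property name and the value `-1` come from the thread.

```java
// Hypothetical helper that prints a capacity-scheduler.xml property entry.
final class CapacitySchedulerOverride {
    /** Renders a single Hadoop-style <property> element as text. */
    static String renderProperty(String name, String value) {
        return "<property>\n"
                + "  <name>" + name + "</name>\n"
                + "  <value>" + value + "</value>\n"
                + "</property>";
    }

    public static void main(String[] args) {
        // Setting quoted in the thread: -1 turns node-locality delay off.
        System.out.println(
                renderProperty("yarn.scheduler.capacity.node-locality-delay", "-1"));
    }
}
```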
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505591#comment-16505591 ]

Weiwei Yang commented on YARN-8394:
-----------------------------------

Thanks [~leftnoteasy].

[~yufeigu], [~kkaranasos], do you want to take a look before I commit this? If there are no further comments, I plan to commit this at the weekend. Thanks.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503721#comment-16503721 ]

Wangda Tan commented on YARN-8394:
----------------------------------

+1, thanks [~cheersyang] for the patch.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503552#comment-16503552 ]

Yufei Gu commented on YARN-8394:
--------------------------------

Makes sense to me, assuming that the cloud solution still uses CS/FS as the scheduler. I guess some simple settings that let containers run on any node would solve the issue. Besides, the trend is toward cloud solutions without YARN, which makes the "delay logic" totally irrelevant.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503373#comment-16503373 ]

genericqa commented on YARN-8394:
---------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 28m 3s | trunk passed |
| +1 | mvnsite | 0m 22s | trunk passed |
| +1 | shadedclient | 39m 22s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvnsite | 0m 16s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 26s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 53m 34s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8394 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926731/YARN-8394.001.patch |
| Optional Tests | asflicense mvnsite |
| uname | Linux fcd7d91b47cb 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d1992ab |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 336 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20960/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502752#comment-16502752 ]

Weiwei Yang commented on YARN-8394:
-----------------------------------

Hi [~yufeigu]

By that I mean that when a cluster uses separate storage and computation systems, i.e. the file system is remote, there is no locality at all. This architecture is now very popular in the cloud. If CS continues to use the default delay logic, the performance of MR jobs suffers: tasks keep waiting through missed scheduling opportunities until their requests are finally relaxed to off-switch. Does that make sense?
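The waiting described here can be illustrated with a simplified model of delay scheduling. This is a sketch under stated assumptions, not the Capacity Scheduler's algorithm: the class `DelaySchedulingModel`, its thresholds, and the delay values 40 and 20 are chosen for the example, and the model only covers an explicit, non-negative additional rack delay.

```java
// Simplified, hypothetical model of locality relaxation; not Hadoop code.
final class DelaySchedulingModel {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /** Loosest locality level allowed after a number of missed opportunities. */
    static Locality loosestAllowed(int missed, int nodeDelay, int rackAdditionalDelay) {
        if (nodeDelay < 0) {
            // Delay disabled (-1): any node is acceptable right away.
            return Locality.OFF_SWITCH;
        }
        if (rackAdditionalDelay < 0) {
            throw new IllegalArgumentException(
                    "this sketch only models an explicit additional rack delay");
        }
        if (missed <= nodeDelay) {
            return Locality.NODE_LOCAL;
        }
        if (missed <= nodeDelay + rackAdditionalDelay) {
            return Locality.RACK_LOCAL;
        }
        return Locality.OFF_SWITCH;
    }

    /** Opportunities skipped before an off-switch placement becomes allowed. */
    static int opportunitiesBeforeOffSwitch(int nodeDelay, int rackAdditionalDelay) {
        int missed = 0;
        while (loosestAllowed(missed, nodeDelay, rackAdditionalDelay) != Locality.OFF_SWITCH) {
            missed++;
        }
        return missed;
    }

    public static void main(String[] args) {
        // With remote storage no node or rack ever matches, so every one of
        // these skipped opportunities is pure waiting.
        System.out.println(opportunitiesBeforeOffSwitch(40, 20));
        // With the delay disabled there is no waiting in this model.
        System.out.println(opportunitiesBeforeOffSwitch(-1, 20));
    }
}
```

In this model, a cluster whose data is always remote gains nothing from the skipped opportunities, which is the performance cost the comment refers to.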
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502479#comment-16502479 ]

Yufei Gu commented on YARN-8394:
--------------------------------

Hi [~cheersyang], thanks for filing this. Can you elaborate on this?

bq. we need to introduce how to compromise data locality in CS otherwise MR jobs are suffering.