[jira] [Updated] (YARN-7893) Document the FPGA isolation feature
[ https://issues.apache.org/jira/browse/YARN-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhankun Tang updated YARN-7893:
-------------------------------
    Attachment: YARN-7893-trunk-004.patch

> Document the FPGA isolation feature
> -----------------------------------
>
>          Key: YARN-7893
>          URL: https://issues.apache.org/jira/browse/YARN-7893
>      Project: Hadoop YARN
>   Issue Type: Sub-task
>     Reporter: Zhankun Tang
>     Assignee: Zhankun Tang
>     Priority: Blocker
>  Attachments: FPGA-doc-YARN-7893-v3.pdf, FPGA-doc-YARN-7893.pdf,
>               YARN-7893-trunk-001.patch, YARN-7893-trunk-002.patch,
>               YARN-7893-trunk-003.patch, YARN-7893-trunk-004.patch

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7893) Document the FPGA isolation feature
[ https://issues.apache.org/jira/browse/YARN-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376475#comment-16376475 ]

genericqa commented on YARN-7893:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 26s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 25m 2s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 25s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7893 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12911994/YARN-7893-trunk-003.patch |
| Optional Tests | asflicense mvnsite |
| uname | Linux a349eef86816 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2fa7963 |
| maven | version: Apache Maven 3.3.9 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/19807/artifact/out/whitespace-tabs.txt |
| Max. process+thread count | 407 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19807/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376465#comment-16376465 ]

Yufei Gu commented on YARN-7672:
--------------------------------

I think you understand it correctly. I'm not quite familiar with 2.4, though. I recommend debugging it, since the Fair Scheduler is quite easy to debug while you are running SLS.

> hadoop-sls can not simulate huge scale of YARN
> ----------------------------------------------
>
>          Key: YARN-7672
>          URL: https://issues.apache.org/jira/browse/YARN-7672
>      Project: Hadoop YARN
>   Issue Type: Improvement
>     Reporter: zhangshilong
>     Assignee: zhangshilong
>     Priority: Major
>  Attachments: YARN-7672.patch
>
> Our YARN cluster has scaled to nearly 10 thousand nodes, and we need to run
> scheduler pressure tests. Using SLS, we start 2000+ threads to simulate NMs and
> AMs, but the CPU load climbs very high, to 100+. I thought that would affect the
> performance evaluation of the scheduler, so I decided to separate the scheduler
> from the simulator: I start a real RM, and then SLS registers nodes to the RM and
> submits apps to the RM using RM RPC.
[jira] [Updated] (YARN-7893) Document the FPGA isolation feature
[ https://issues.apache.org/jira/browse/YARN-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhankun Tang updated YARN-7893:
-------------------------------
    Attachment: YARN-7893-trunk-003.patch
[jira] [Updated] (YARN-7893) Document the FPGA isolation feature
[ https://issues.apache.org/jira/browse/YARN-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhankun Tang updated YARN-7893:
-------------------------------
    Attachment: FPGA-doc-YARN-7893-v3.pdf
[jira] [Comment Edited] (YARN-7965) NodeAttributeManager add/get API is not working properly
[ https://issues.apache.org/jira/browse/YARN-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376445#comment-16376445 ]

Weiwei Yang edited comment on YARN-7965 at 2/26/18 6:46 AM:
------------------------------------------------------------

Thanks [~naganarasimha...@apache.org], [~sunilg]. Please help to review the v2 patch.

bq. Ln no 187: change name from node to rmAttribute

Done.

bq. ln no 318: why did we remove locking over here ?

I don't think the read lock is needed. It was protecting read access to a ConcurrentHashMap, and that map is thread-safe. Adding an extra lock is redundant.

bq. I think its a simple check of prefix.isEmpty() should have been !prefix.isEmpty() in my earlier code. Can we just have it in that way, it seems to be concise ?

If the user provides a non-empty prefix set, and this set contains only one prefix which is not a valid one, the old logic will return all attributes to the client; this is a bug. Hence I refactored the logic, and I don't think the change complicates the code.

bq. License and checkstyle issues

Fixed.

[~sunilg], regarding the logs, [~naganarasimha...@apache.org] mentioned that he is working on a patch with more test cases and improvements; I assume that will help. This patch is to unblock the rest of my patches, so it only has minimal changes. Hope that makes sense.

Thanks

> NodeAttributeManager add/get API is not working properly
> --------------------------------------------------------
>
>          Key: YARN-7965
>          URL: https://issues.apache.org/jira/browse/YARN-7965
>      Project: Hadoop YARN
>   Issue Type: Sub-task
>   Components: resourcemanager
>     Reporter: Weiwei Yang
>     Assignee: Weiwei Yang
>     Priority: Major
>  Attachments: YARN-7965-YARN-3409.001.patch,
>               YARN-7965-YARN-3409.002.patch
>
> Fix the following issues:
> # After adding node attributes to the manager, the newly added attributes could not be retrieved
> # The get-cluster-attributes API should return an empty set when the given prefix has no match
> # When an attribute is removed from all nodes, the manager did not remove this mapping
> and add UTs
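The point about the redundant read lock can be illustrated with a minimal sketch (class and method names below are hypothetical, not the actual NodeAttributesManagerImpl code): single retrievals on a ConcurrentHashMap are already thread-safe, so wrapping them in a read lock adds contention without adding safety.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the pattern under discussion: a prefix -> attributes
// map held in a ConcurrentHashMap. Reads such as get() are atomic and never
// observe a torn write, so no external read lock is required around them.
public class AttributeStore {

  private final Map<String, Set<String>> attributesByPrefix =
      new ConcurrentHashMap<>();

  public void addAttributes(String prefix, Set<String> names) {
    attributesByPrefix.put(prefix, names);
  }

  // Safe without a read lock: ConcurrentHashMap guarantees a concurrent
  // put() is either fully visible or not visible at all to this get().
  public Set<String> getAttributes(String prefix) {
    return attributesByPrefix.getOrDefault(prefix, Set.of());
  }
}
```

Note that this only covers single-operation atomicity; a compound read-modify-write spanning several map operations would still need external coordination, which may be what the original lock was intended to guard in the replace path.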
[jira] [Commented] (YARN-7965) NodeAttributeManager add/get API is not working properly
[ https://issues.apache.org/jira/browse/YARN-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376445#comment-16376445 ]

Weiwei Yang commented on YARN-7965:
-----------------------------------

Thanks [~naganarasimha...@apache.org], [~sunilg]. Please help to review the v2 patch.

bq. Ln no 187: change name from node to rmAttribute

Done.

bq. ln no 318: why did we remove locking over here ?

I don't think the read lock is needed. It was protecting read access to a ConcurrentHashMap, and that map is thread-safe. Adding an extra lock is redundant.

bq. I think its a simple check of prefix.isEmpty() should have been !prefix.isEmpty() in my earlier code. Can we just have it in that way, it seems to be concise ?

If the user provides a non-empty prefix set, and this set contains only one prefix which is not a valid one, the old logic will return all attributes to the client; this is a bug. Hence I refactored the logic, and I don't think the change complicates the code.

bq. License and checkstyle issues

Fixed.

[~sunilg], regarding the logs, [~naganarasimha...@apache.org] mentioned that he is working on a patch with more test cases and improvements; I assume that will help. This patch is to unblock the rest of my patches, so it only has minimal changes. Hope that makes sense.

Thanks
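The prefix-filtering bug described above can be sketched as follows (a simplified stand-in, not the actual manager code): with the old style of check, a non-empty request set containing only an unknown prefix could fall through to returning everything, whereas the intended behavior is an empty result.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified sketch of the fixed filtering logic: an empty request set means
// "return all attributes"; a non-empty set returns only attributes under the
// matching prefixes, and yields an empty result when nothing matches.
public final class AttributePrefixFilter {

  public static Set<String> filter(Map<String, Set<String>> attributesByPrefix,
                                   Set<String> requestedPrefixes) {
    Set<String> result = new HashSet<>();
    for (Map.Entry<String, Set<String>> entry : attributesByPrefix.entrySet()) {
      // Empty request set = no filtering; otherwise require an exact prefix match.
      if (requestedPrefixes.isEmpty()
          || requestedPrefixes.contains(entry.getKey())) {
        result.addAll(entry.getValue());
      }
    }
    return result;
  }
}
```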
[jira] [Updated] (YARN-7965) NodeAttributeManager add/get API is not working properly
[ https://issues.apache.org/jira/browse/YARN-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang updated YARN-7965:
------------------------------
    Attachment: YARN-7965-YARN-3409.002.patch
[jira] [Commented] (YARN-7965) NodeAttributeManager add/get API is not working properly
[ https://issues.apache.org/jira/browse/YARN-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376435#comment-16376435 ]

Sunil G commented on YARN-7965:
-------------------------------

Thanks [~cheersyang]. I missed that; thanks for adding some more cases. I think we need to add a bit more debug logging in the manager, else it will be tough to debug.
[jira] [Commented] (YARN-7893) Document the FPGA isolation feature
[ https://issues.apache.org/jira/browse/YARN-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376399#comment-16376399 ]

Zhankun Tang commented on YARN-7893:
------------------------------------

[~leftnoteasy], one quick question: we all know that when a user chooses CapacityScheduler, he/she must configure DominantResourceCalculator. But I just tested in my environment that when FairScheduler is set but "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.SchedulingPolicy" is not specified as "drf" (the fair policy is the default), it still works. Any comments?
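For reference, the Fair Scheduler policy in question is set in the allocation file (fair-scheduler.xml). The snippet below is a minimal sketch and the queue name is illustrative; when no policy is specified, the default is "fair", which schedules on memory only, while "drf" extends fairness to multiple resource types.

```xml
<?xml version="1.0"?>
<!-- Minimal fair-scheduler.xml sketch; "root.accel" is an illustrative queue name. -->
<allocations>
  <!-- Apply Dominant Resource Fairness cluster-wide by default... -->
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
  <queue name="root.accel">
    <!-- ...or override the policy per queue. -->
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
</allocations>
```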
[jira] [Commented] (YARN-7346) Fix compilation errors against hbase2 beta release
[ https://issues.apache.org/jira/browse/YARN-7346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376384#comment-16376384 ]

genericqa commented on YARN-7346:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-assemblies hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 14s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-project hadoop-assemblies hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 23s{color} | {color:red} patch/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-2 no findbugs output file (hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-server/hadoop-yarn-server-timelineservice-hbase-server-2/target/findbugsXml.xml) {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s{color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-server_hadoop-yarn-server-timelineservice-hbase-server-2 generated 107 new + 0 unchanged - 0 fixed = 107 total (was 0) {color} |
|| || ||
[jira] [Commented] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376364#comment-16376364 ]

stefanlee commented on YARN-7672:
---------------------------------

[~yufeigu] thanks. My Hadoop version is 2.4.0, and *yarn.scheduler.fair.dynamic.max.assign* is not in my configuration file. What I mean is that the SLS test with *8* should get more containers than the SLS test with *2*, so the former should complete more quickly than the latter, but I found the difference in completion time between them is not obvious. My understanding of *yarn.scheduler.fair.max.assign* is that the greater the value, the better the scheduling performance. Please correct me if I am wrong.
[jira] [Commented] (YARN-7893) Document the FPGA isolation feature
[ https://issues.apache.org/jira/browse/YARN-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376332#comment-16376332 ]

Zhankun Tang commented on YARN-7893:
------------------------------------

[~leftnoteasy], really sorry that I missed this comment during the Spring Festival. Will update today.
[jira] [Commented] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376311#comment-16376311 ]

Yufei Gu commented on YARN-7672:
--------------------------------

Hi [~imstefanlee], please make sure yarn.scheduler.fair.dynamic.max.assign is false; otherwise yarn.scheduler.fair.max.assign doesn't take effect.
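The interaction between the two properties looks like this in yarn-site.xml; this is a minimal sketch with illustrative values, and note that yarn.scheduler.fair.dynamic.max.assign did not exist in older releases such as 2.4, where only the static cap applies.

```xml
<configuration>
  <!-- Allow the Fair Scheduler to assign more than one container per node heartbeat. -->
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>true</value>
  </property>
  <!-- Disable the dynamic heuristic so the static cap below takes effect. -->
  <property>
    <name>yarn.scheduler.fair.dynamic.max.assign</name>
    <value>false</value>
  </property>
  <!-- Static cap on containers assigned per heartbeat (illustrative value). -->
  <property>
    <name>yarn.scheduler.fair.max.assign</name>
    <value>8</value>
  </property>
</configuration>
```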
[jira] [Commented] (YARN-7965) NodeAttributeManager add/get API is not working properly
[ https://issues.apache.org/jira/browse/YARN-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376303#comment-16376303 ]

Naganarasimha G R commented on YARN-7965:
-----------------------------------------

My bad, apologies for missing some primitive checks. I had marked a few of these points (such as removing an attribute when no other attributes remain) and thought of doing them in the next patch, but anyway you handled it. Thanks for the patch; the test cases are fine, and the remaining ones I will add in my patch. A few minor review comments:

NodeAttributesManagerImpl
* Ln no 187: change name from node to rmAttribute
* ln no 318: why did we remove locking over here? I had put it there to ensure that the data is not inconsistent while someone is in the middle of a replace operation
* ln no 318: I think its a simple check of _prefix.isEmpty()_ should have been _!prefix.isEmpty()_ in my earlier code. Can we just have it that way? It seems more concise.

TestNodeAttributesManager
* ln no 1: the Apache license declaration/notification is missing.
* please check the checkstyle issue mentioned.
[jira] [Commented] (YARN-7969) Confusing log messages that do not match with the method name
[ https://issues.apache.org/jira/browse/YARN-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376283#comment-16376283 ]

ASF GitHub Bot commented on YARN-7969:
--------------------------------------

GitHub user lzh3636 reopened a pull request:

    https://github.com/apache/hadoop/pull/346

    YARN-7969 Fix 3 confusing log messages that do not match with the method name

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lzh3636/hadoop YARN-7969

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/346.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #346

commit 67bfdbb313d71f26f13ce81b1fe07352045ddfbd
Author: zhenhaoli
Date:   2018-02-25T22:39:41Z

    fix 3 confusing log messages that do not match with the function of method

commit f3a228787d2fd10f812b2cd3264c5041648ab7ce
Author: zhenhaoli
Date:   2018-02-25T22:41:35Z

    fix a spelling mistake

> Confusing log messages that do not match with the method name
> -------------------------------------------------------------
>
>              Key: YARN-7969
>              URL: https://issues.apache.org/jira/browse/YARN-7969
>          Project: Hadoop YARN
>       Issue Type: Bug
> Affects Versions: 3.0.0
>         Reporter: Zhenhao Li
>         Priority: Minor
>           Labels: easyfix
>
> Our previous issue (YARN-7926) found some possible copy-and-paste errors in the
> log messages, which may cause confusion when operators read them. After a further
> check, we found some more similar problems, and we will propose a pull request to
> fix them.
>
> Here is a list of the problems we found:
>
> 1. org.apache.hadoop.yarn.server.MockResourceManagerFacade.*forceKillApplication* and
> org.apache.hadoop.yarn.server.router.webapp.MockDefaultRequestInterceptorREST.*updateAppState*
>
> The log message in both methods is:
> LOG.info("Force killing application: " + appId);
> and they also throw an exception:
> throw new ApplicationNotFoundException("Trying to kill an absent application: " + appId);
> However, after checking the code we found that
> MockDefaultRequestInterceptorREST.*updateAppState()* has no relation to killing an
> application; it just updates the status of an application. So its log message should
> probably be changed.
>
> 2. org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService.*removeStoredToken* and
> org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService.*storeNewToken*
>
> The log message in both methods is:
> LOG.debug("Storing token " + tokenId.getSequenceNumber());
>
> Since one method stores a token and the other removes one, we believe the log
> message was incorrectly copied and should be changed. Maybe simply change the log
> in *removeStoredToken()* to "Removing token".
>
> 3. org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST.*getClusterMetricsInfo* and
> org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST.*getNodes*
>
> The log message in both methods is:
> LOG.warn("Failed to get nodes report ", e);
>
> However, the function of the first method is getting cluster metrics info, so its
> log message should probably be changed to "Failed to get Cluster Metrics".
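The copy-and-paste log fix in case 2 above can be sketched as follows (a simplified, hypothetical stand-in for the secret-manager class; the real methods log through the component's logger rather than collecting strings):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified sketch: store and remove must log distinct messages so operators
// can tell the two operations apart when reading the logs.
public class TokenStoreSketch {

  private final Set<Integer> tokens = new HashSet<>();
  private final List<String> logLines = new ArrayList<>();

  public void storeNewToken(int sequenceNumber) {
    logLines.add("Storing token " + sequenceNumber);
    tokens.add(sequenceNumber);
  }

  public void removeStoredToken(int sequenceNumber) {
    // Fixed: previously this also said "Storing token", a copy-paste slip.
    logLines.add("Removing token " + sequenceNumber);
    tokens.remove(sequenceNumber);
  }

  public List<String> logLines() {
    return logLines;
  }
}
```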
[jira] [Commented] (YARN-7969) Confusing log messages that do not match with the method name
[ https://issues.apache.org/jira/browse/YARN-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376282#comment-16376282 ]

ASF GitHub Bot commented on YARN-7969:
--

Github user lzh3636 closed the pull request at: https://github.com/apache/hadoop/pull/346

> Confusing log messages that do not match with the method name
> -
>
> Key: YARN-7969
> URL: https://issues.apache.org/jira/browse/YARN-7969
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Zhenhao Li
> Priority: Minor
> Labels: easyfix
>
> Our *previous issue (YARN-7926)* identified some possible copy-and-paste
> errors in log messages, which may confuse operators reading the logs. After a
> further check, we found more similar problems, and we will propose a pull
> request to fix them.
>
> Here is a list of the problems we found:
>
> *1.* _org.apache.hadoop.yarn.server.MockResourceManagerFacade._*_forceKillApplication_*_,_ _and_
> _org.apache.hadoop.yarn.server.router.webapp.MockDefaultRequestInterceptorREST._*_updateAppState_*
>
> The log messages in *both methods* are:
> _LOG.info("_*_Force killing application:_* _" + appId);_
> and they also throw an exception:
> _throw new ApplicationNotFoundException("_*_Trying to kill an absent application:_* _" + appId);_
>
> However, after checking the code, we found that
> MockDefaultRequestInterceptorREST._*_updateAppState()_* is unrelated to
> killing an application; it only updates an application's status, so its log
> message should probably be changed.
>
> *2.* _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_removeStoredToken_*_,_
> _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_storeNewToken_*
>
> The log messages in *both methods* are:
> _LOG.debug("Storing token " + tokenId.getSequenceNumber());_
>
> Since one method stores a token and the other removes one, we believe the
> log message was incorrectly copied and should be changed. Perhaps simply
> change the message in *removeStoredToken()* to *“Removing token”*.
>
> *3.* _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getClusterMetricsInfo_*_,_
> _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getNodes_*
>
> The log messages in *both methods* are:
> _LOG.warn("Failed to get nodes report ", e);_
>
> However, the first method retrieves cluster metrics info, so its log message
> should probably be changed to *“Failed to get Cluster Metrics”*.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
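To make the proposed renames concrete, here is a minimal stand-alone sketch. The class and helper names are hypothetical (ours, not the actual Hadoop source), and the logger calls are replaced by plain string building; only the message texts and the methods they belong to come from the report above.

```java
// Hypothetical sketch of the log-message fixes proposed in YARN-7969.
// Class and method names here are illustrative stand-ins, not Hadoop code.
public class ProposedLogMessages {

    // TimelineV1DelegationTokenSecretManagerService.storeNewToken(): unchanged.
    static String storeNewTokenMsg(int sequenceNumber) {
        return "Storing token " + sequenceNumber;
    }

    // TimelineV1DelegationTokenSecretManagerService.removeStoredToken():
    // previously a copy of the message above; the report suggests "Removing token".
    static String removeStoredTokenMsg(int sequenceNumber) {
        return "Removing token " + sequenceNumber;
    }

    // FederationInterceptorREST.getClusterMetricsInfo():
    // previously "Failed to get nodes report" (copied from getNodes()).
    static String clusterMetricsWarnMsg() {
        return "Failed to get Cluster Metrics";
    }

    public static void main(String[] args) {
        System.out.println(storeNewTokenMsg(7));      // prints "Storing token 7"
        System.out.println(removeStoredTokenMsg(7));  // prints "Removing token 7"
        System.out.println(clusterMetricsWarnMsg());
    }
}
```

With messages like these, each log line names the operation its own method performs, so a grep for "Removing token" can no longer match a store path.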
[jira] [Updated] (YARN-7969) Confusing log messages that do not match with the method name
[ https://issues.apache.org/jira/browse/YARN-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhenhao Li updated YARN-7969:
-
Description:
Our *previous issue (YARN-7926)* identified some possible copy-and-paste errors in log messages, which may confuse operators reading the logs. After a further check, we found more similar problems, and we will propose a pull request to fix them.

Here is a list of the problems we found:

*1.* _org.apache.hadoop.yarn.server.MockResourceManagerFacade._*_forceKillApplication_*_,_ _and_ _org.apache.hadoop.yarn.server.router.webapp.MockDefaultRequestInterceptorREST._*_updateAppState_*

The log messages in *both methods* are:
_LOG.info("_*_Force killing application:_* _" + appId);_
and they also throw an exception:
_throw new ApplicationNotFoundException("_*_Trying to kill an absent application:_* _" + appId);_

However, after checking the code, we found that MockDefaultRequestInterceptorREST._*_updateAppState()_* is unrelated to killing an application; it only updates an application's status, so its log message should probably be changed.

*2.* _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_removeStoredToken_*_,_ _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_storeNewToken_*

The log messages in *both methods* are:
_LOG.debug("Storing token " + tokenId.getSequenceNumber());_

Since one method stores a token and the other removes one, we believe the log message was incorrectly copied and should be changed. Perhaps simply change the message in *removeStoredToken()* to *“Removing token”*.

*3.* _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getClusterMetricsInfo_*_,_ _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getNodes_*

The log messages in *both methods* are:
_LOG.warn("Failed to get nodes report ", e);_

However, the first method retrieves cluster metrics info, so its log message should probably be changed to *“Failed to get Cluster Metrics”*.

was:
Our *previous issue (YARN-7926)* identified some possible copy-and-paste errors in log messages, which may confuse operators reading the logs. After a further check, we found more similar problems, and we will propose a pull request to fix them.

Here is a list of the problems we found:

*1.* _org.apache.hadoop.yarn.server.MockResourceManagerFacade._*_forceKillApplication_*_,_ _and_ _org.apache.hadoop.yarn.server.router.webapp.MockDefaultRequestInterceptorREST._*_updateAppState_*

The log messages in *both methods* are:
_LOG.info("_*_Force killing application:_* _" + appId);_
and they also throw an exception:
_throw new ApplicationNotFoundException("_*_Trying to kill an absent application:_* _" + appId);_

However, after checking the code, we found that MockDefaultRequestInterceptorREST._*_updateAppState()_* is unrelated to killing an application; it only updates an application's status, so its log message should probably be changed.

*2.* _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_removeStoredToken_*_,_ _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_storeNewToken_*

The log messages in *both methods* are:
_LOG.debug("Storing token " + tokenId.getSequenceNumber());_

Since one method stores a token and the other removes one, we believe the log message was incorrectly copied and should be changed. Perhaps simply change the message in *removeStoredToken()* to “Removing token”.

*3.* _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getClusterMetricsInfo_*_,_ _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getNodes_*

The log messages in *both methods* are:
_LOG.warn("Failed to get nodes report ", e);_

However, the first method retrieves cluster metrics info, so its log message should probably be changed to *“Failed to get Cluster Metrics”*.

> Confusing log messages that do not match with the method name
> -
>
> Key: YARN-7969
> URL: https://issues.apache.org/jira/browse/YARN-7969
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Zhenhao Li
> Priority: Minor
> Labels: easyfix
>
> Our *previous issue (YARN-7926)* identified some possible copy-and-paste
> errors in log messages, which may confuse operators reading the logs. After a
> further
[jira] [Created] (YARN-7969) Confusing log messages that do not match with the method name
Zhenhao Li created YARN-7969:
-
Summary: Confusing log messages that do not match with the method name
Key: YARN-7969
URL: https://issues.apache.org/jira/browse/YARN-7969
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Zhenhao Li

Our *previous issue (YARN-7926)* identified some possible copy-and-paste errors in log messages, which may confuse operators reading the logs. After a further check, we found more similar problems, and we will propose a pull request to fix them.

Here is a list of the problems we found:

*1.* _org.apache.hadoop.yarn.server.MockResourceManagerFacade._*_forceKillApplication_*_,_ _and_ _org.apache.hadoop.yarn.server.router.webapp.MockDefaultRequestInterceptorREST._*_updateAppState_*

The log messages in *both methods* are:
_LOG.info("_*_Force killing application:_* _" + appId);_
and they also throw an exception:
_throw new ApplicationNotFoundException("_*_Trying to kill an absent application:_* _" + appId);_

However, after checking the code, we found that MockDefaultRequestInterceptorREST._*_updateAppState()_* is unrelated to killing an application; it only updates an application's status, so its log message should probably be changed.

*2.* _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_removeStoredToken_*_,_ _org.apache.hadoop.yarn.server.timeline.security.TimelineV1DelegationTokenSecretManagerService._*_storeNewToken_*

The log messages in *both methods* are:
_LOG.debug("Storing token " + tokenId.getSequenceNumber());_

Since one method stores a token and the other removes one, we believe the log message was incorrectly copied and should be changed. Perhaps simply change the message in *removeStoredToken()* to “Removing token”.

*3.* _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getClusterMetricsInfo_*_,_ _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_getNodes_*

The log messages in *both methods* are:
_LOG.warn("Failed to get nodes report ", e);_

However, the first method retrieves cluster metrics info, so its log message should probably be changed to *“Failed to get Cluster Metrics”*.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7926) Copy and paste errors in log messages
[ https://issues.apache.org/jira/browse/YARN-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376228#comment-16376228 ]

ASF GitHub Bot commented on YARN-7926:
--

GitHub user lzh3636 opened a pull request:

https://github.com/apache/hadoop/pull/345

YARN-7926 fix a copy-and-paste error in log messages

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lzh3636/hadoop YARN-7926

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/345.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #345

commit 545b310c53d143bee4106f4e8e23fb4b07b74e28
Author: zhenhaoli
Date: 2018-02-25T20:21:47Z

fix a copy-and-paste error in log messages

> Copy and paste errors in log messages
> -
>
> Key: YARN-7926
> URL: https://issues.apache.org/jira/browse/YARN-7926
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Zhenhao Li
> Priority: Minor
> Labels: easyfix
>
> We found some possible copy-and-paste errors in the log messages, which we
> think may confuse operators reading them.
>
> The problem is found in the following two methods:
> _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_createNewApplication()_*
> and
> _org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor._*_getNewApplication()_*
>
> These are two very similar methods (possibly code clones) in different
> classes.
>
> The log messages in *both methods* are:
> _LOG.warn("Unable to_ *_create a new ApplicationId_* _in SubCluster " …);_
> _LOG.debug("_*_getNewApplication_* _try #" + … );_
>
> Since one method gets a new application and the other creates one, we
> believe the log messages were incorrectly copied and should be changed.
> Please let us know if there is anything further we can provide to help fix
> the problem.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7926) Copy and paste errors in log messages
[ https://issues.apache.org/jira/browse/YARN-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhenhao Li updated YARN-7926:
-
Description:
We found some possible copy-and-paste errors in the log messages, which we think may confuse operators reading them.

The problem is found in the following two methods:
_org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_createNewApplication()_*
and
_org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor._*_getNewApplication()_*

These are two very similar methods (possibly code clones) in different classes.

The log messages in *both methods* are:
_LOG.warn("Unable to_ *_create a new ApplicationId_* _in SubCluster " …);_
_LOG.debug("_*_getNewApplication_* _try #" + … );_

Since one method gets a new application and the other creates one, we believe the log messages were incorrectly copied and should be changed.

Please let us know if there is anything further we can provide to help fix the problem.

was:
We are a group of researchers from Canada, and we are studying refactoring in log messages. We found some possible copy-and-paste errors in the log messages, which we think may confuse operators reading them.

The problem is found in the following two methods:
_org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_createNewApplication()_*
and
_org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor._*_getNewApplication()_*

These are two very similar methods (possibly code clones) in different classes.

The log messages in *both methods* are:
_LOG.warn("Unable to_ *_create a new ApplicationId_* _in SubCluster " …);_
_LOG.debug("_*_getNewApplication_* _try #" + … );_

Since one method gets a new application and the other creates one, we believe the log messages were incorrectly copied and should be changed.

Please let us know if there is anything further we can provide to help fix the problem.

> Copy and paste errors in log messages
> -
>
> Key: YARN-7926
> URL: https://issues.apache.org/jira/browse/YARN-7926
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Zhenhao Li
> Priority: Minor
> Labels: easyfix
>
> We found some possible copy-and-paste errors in the log messages, which we
> think may confuse operators reading them.
>
> The problem is found in the following two methods:
> _org.apache.hadoop.yarn.server.router.webapp.FederationInterceptorREST._*_createNewApplication()_*
> and
> _org.apache.hadoop.yarn.server.router.clientrm.FederationClientInterceptor._*_getNewApplication()_*
>
> These are two very similar methods (possibly code clones) in different
> classes.
>
> The log messages in *both methods* are:
> _LOG.warn("Unable to_ *_create a new ApplicationId_* _in SubCluster " …);_
> _LOG.debug("_*_getNewApplication_* _try #" + … );_
>
> Since one method gets a new application and the other creates one, we
> believe the log messages were incorrectly copied and should be changed.
> Please let us know if there is anything further we can provide to help fix
> the problem.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7781) Update YARN-Services-Examples.md to be in sync with the latest code
[ https://issues.apache.org/jira/browse/YARN-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376192#comment-16376192 ]

Billie Rinaldi commented on YARN-7781:
--

I agree with renaming the state to FLEXING.

> Update YARN-Services-Examples.md to be in sync with the latest code
> ---
>
> Key: YARN-7781
> URL: https://issues.apache.org/jira/browse/YARN-7781
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gour Saha
> Assignee: Jian He
> Priority: Major
> Attachments: YARN-7781.01.patch, YARN-7781.02.patch, YARN-7781.03.patch
>
> Update YARN-Services-Examples.md to make the following additions/changes:
> 1. Add an additional URL and PUT Request JSON to support flex:
> Update to flex up/down the number of containers (instances) of a component of a service
> PUT URL – http://localhost:8088/app/v1/services/hello-world
> PUT Request JSON
> {code}
> {
> "components" : [ {
> "name" : "hello",
> "number_of_containers" : 3
> } ]
> }
> {code}
> 2. Modify all occurrences of /ws/ to /app/
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
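As a usage sketch of the flex request described in YARN-7781, the JSON body above could be assembled like this. The helper class is hypothetical (ours, for illustration); the endpoint and field names are taken verbatim from the example, and the actual HTTP call is left as a comment since it needs a running ResourceManager at localhost:8088.

```java
// Hypothetical sketch: building the flex-up PUT body from the YARN-7781 example.
// Sending it would be:  PUT http://localhost:8088/app/v1/services/hello-world
public class FlexRequestBody {

    // Builds the JSON payload that flexes `component` to `n` containers,
    // mirroring the example request body in the issue.
    static String flexBody(String component, int n) {
        return "{ \"components\" : [ { \"name\" : \"" + component
                + "\", \"number_of_containers\" : " + n + " } ] }";
    }

    public static void main(String[] args) {
        // The example flexes the "hello" component to 3 containers.
        System.out.println(flexBody("hello", 3));
    }
}
```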
[jira] [Commented] (YARN-7672) hadoop-sls can not simulate huge scale of YARN
[ https://issues.apache.org/jira/browse/YARN-7672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376064#comment-16376064 ]

stefanlee commented on YARN-7672:
-

[~yufeigu] [~ywskycn] Could you please tell me whether increasing *yarn.scheduler.fair.max.assign* increases the number of containers the FairScheduler assigns? I set this value to *2* and ran SLS in a first round, then updated it to *8* and ran SLS again. After that, I found no difference in assignment throughput. (I have enabled *yarn.scheduler.fair.assignmultiple* and disabled *continuous-scheduling*.) Thanks.

> hadoop-sls can not simulate huge scale of YARN
> --
>
> Key: YARN-7672
> URL: https://issues.apache.org/jira/browse/YARN-7672
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: zhangshilong
> Assignee: zhangshilong
> Priority: Major
> Attachments: YARN-7672.patch
>
> Our YARN cluster has scaled to nearly 10 thousand nodes, and we need to do
> scheduler pressure testing. Using SLS, we start 2000+ threads to simulate NMs
> and AMs, but the CPU load goes very high (100+), which I thought would affect
> the performance evaluation of the scheduler.
> So I decided to separate the scheduler from the simulator: I start a real RM,
> then SLS registers nodes to the RM and submits apps to the RM via RM RPC.
--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
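For reference, the FairScheduler settings discussed in the comment above live in yarn-site.xml. A minimal sketch follows; the values shown reproduce the experiment described (max.assign compared at 2 vs 8, assignmultiple on, continuous scheduling off), and are not tuning recommendations.

```xml
<!-- Sketch of the FairScheduler settings mentioned above (yarn-site.xml). -->
<configuration>
  <!-- Allow the scheduler to assign multiple containers per node heartbeat. -->
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>true</value>
  </property>
  <!-- Cap on containers assigned per heartbeat; the comment compares 2 vs 8.
       With assignmultiple enabled, raising this cap only helps if each
       heartbeat actually has more than `max.assign` schedulable requests. -->
  <property>
    <name>yarn.scheduler.fair.max.assign</name>
    <value>8</value>
  </property>
  <!-- Continuous scheduling was disabled in the experiment. -->
  <property>
    <name>yarn.scheduler.fair.continuous-scheduling-enabled</name>
    <value>false</value>
  </property>
</configuration>
```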