[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu updated YARN-10015:
----------------------------
    Component/s: yarn

> Correct the sample command in SLS README file
> ---------------------------------------------
>
>                 Key: YARN-10015
>                 URL: https://issues.apache.org/jira/browse/YARN-10015
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>            Priority: Trivial
>             Fix For: 3.3.0
>
>         Attachments: YARN-10015.patch
>
> The sample command in the SLS README, {{bin/slsrun.sh
> —-input-rumen=sample-data/2jobs2min-rumen-jh.json
> —-output-dir=sample-output}}, contains a dash from a different encoding. The
> command gives the following error:
> ERROR: Invalid option —-input-rumen=sample-data/2jobs2min-rumen-jh.json

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
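Editor's note: the README bug above comes down to the first character of the option not being an ASCII hyphen (U+002D) but an em dash (U+2014), so the script's `--` prefix check never matches. A minimal, self-contained Java sketch of that failing check; the parser logic here is hypothetical, not the actual slsrun.sh code:

```java
public class DashCheck {
    // Returns true only when the option starts with two ASCII hyphens (U+002D),
    // mimicking the prefix check a typical long-option parser performs.
    static boolean isAsciiLongOption(String arg) {
        return arg.startsWith("--");
    }

    public static void main(String[] args) {
        String good = "--input-rumen=sample-data/2jobs2min-rumen-jh.json";
        // The README's broken variant: an em dash (U+2014) followed by one hyphen.
        String bad = "\u2014-input-rumen=sample-data/2jobs2min-rumen-jh.json";

        System.out.println(isAsciiLongOption(good)); // true
        System.out.println(isAsciiLongOption(bad));  // false -> "ERROR: Invalid option"
    }
}
```

The two strings render almost identically in many fonts, which is why the bug survived in the README until someone pasted the command into a shell.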
[jira] [Commented] (YARN-10015) Correct the sample command in SLS README file
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025527#comment-17025527 ]

Yufei Gu commented on YARN-10015:
---------------------------------
Committed to trunk. Thanks for the patch, [~aihuaxu]. Thanks for the review, [~adam.antal].
[jira] [Updated] (YARN-10015) Correct the sample command in SLS README file
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu updated YARN-10015:
----------------------------
    Fix Version/s: 3.3.0
[jira] [Commented] (YARN-10015) Correct the sample command in SLS README file
[ https://issues.apache.org/jira/browse/YARN-10015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024762#comment-17024762 ]

Yufei Gu commented on YARN-10015:
---------------------------------
+1, will commit later.
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu updated YARN-9537:
---------------------------
    Fix Version/s: 3.3.0
     Hadoop Flags: Reviewed

> Add configuration to disable AM preemption
> ------------------------------------------
>
>                 Key: YARN-9537
>                 URL: https://issues.apache.org/jira/browse/YARN-9537
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.2.0, 3.1.2
>            Reporter: zhoukang
>            Assignee: zhoukang
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9537-002.patch, YARN-9537.001.patch,
> YARN-9537.003.patch, YARN-9537.004.patch, YARN-9537.005.patch,
> YARN-9537.006.patch
>
> In this issue, I will add a configuration to support disabling AM preemption.
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972654#comment-16972654 ]

Yufei Gu commented on YARN-9537:
--------------------------------
Committed to trunk. Thanks for the contribution, [~cane]. Thanks for the review, [~adam.antal] and [~snemeth].
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972075#comment-16972075 ]

Yufei Gu commented on YARN-9537:
--------------------------------
[~cane], thanks for the patch. +1 for patch 006. Will commit later.
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971376#comment-16971376 ]

Yufei Gu commented on YARN-9537:
--------------------------------
Agreed with [~snemeth]. The production code shouldn't do the null check; class FairScheduler should make sure that {{getConf}} won't be null before creating any {{FSAppAttempt}} object. Hi [~cane], can you refactor the test code, since it fails a test case per Hadoop QA?
[jira] [Comment Edited] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969450#comment-16969450 ]

Yufei Gu edited comment on YARN-9537 at 11/7/19 5:41 PM:
---------------------------------------------------------
Hi [~cane], sorry for coming to this late. Patch 003 looks good to me overall. Just thinking aloud: why is this property cluster level instead of queue level? There are minor issues.
# {{protected static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION".
# Do we need this comment? Probably not.
{code:java}
// For test
this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled();
{code}
# {{public void testDisableAMPreemption() throws Exception}} No need to throw.

was (Author: yufeigu):
Hi [~cane], sorry for coming to this late. Patch 003 looks good to me overall. Just thinking aloud: why is this property cluster level instead of queue level? There are style issues.
# {{protected static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION".
# Do we need this comment? Probably not.
{code:java}
// For test
this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled();
{code}
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969450#comment-16969450 ]

Yufei Gu commented on YARN-9537:
--------------------------------
Hi [~cane], sorry for coming to this late. Patch 003 looks good to me overall. Just thinking aloud: why is this property cluster level instead of queue level? There are style issues.
# {{protected static final String AM_PREEMPTION = CONF_PREFIX + "am.preemption";}} There are two spaces between "String" and "AM_PREEMPTION".
# Do we need this comment? Probably not.
{code:java}
// For test
this.enableAMPreemption = scheduler.getConf().getAMPreemptionEnabled();
{code}
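Editor's note: the `AM_PREEMPTION` constant quoted in the review above suggests the feature is a plain boolean scheduler property. As a rough, self-contained sketch of that pattern, using `java.util.Properties` in place of Hadoop's Configuration classes; the property name mirrors the quoted constant, but the default and lookup shown here are assumptions for illustration, not the committed FairSchedulerConfiguration code:

```java
import java.util.Properties;

public class AmPreemptionConfig {
    // Assumed property name, mirroring CONF_PREFIX + "am.preemption" from the review.
    static final String AM_PREEMPTION = "yarn.scheduler.fair.am.preemption";

    // Assumed default of true: AM containers stay preemptable unless explicitly disabled.
    static boolean isAmPreemptionEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(AM_PREEMPTION, "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isAmPreemptionEnabled(conf)); // true: nothing set, default wins

        conf.setProperty(AM_PREEMPTION, "false");
        System.out.println(isAmPreemptionEnabled(conf)); // false: operator disabled it
    }
}
```

The cluster-level vs. queue-level question in the review is exactly about where this lookup lives: a single property read at scheduler startup versus one resolved per queue.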
[jira] [Commented] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967729#comment-16967729 ]

Yufei Gu commented on YARN-9940:
--------------------------------
Hi [~kailiu_dev], I added you to the contributor role and assigned this to you. I will try to review this later.

> avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9940
>                 URL: https://issues.apache.org/jira/browse/YARN-9940
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.7.2
>            Reporter: kailiu_dev
>            Assignee: kailiu_dev
>            Priority: Major
>             Fix For: 2.7.2
>
>         Attachments: YARN-9940-branch-2.7.2.001.patch
>
> 2019-10-16 09:14:51,215 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[FairSchedulerContinuousScheduling,5,main] threw an Exception.
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>         at java.util.TimSort.mergeHi(TimSort.java:868)
>         at java.util.TimSort.mergeAt(TimSort.java:485)
>         at java.util.TimSort.mergeForceCollapse(TimSort.java:426)
>         at java.util.TimSort.sort(TimSort.java:223)
>         at java.util.TimSort.sort(TimSort.java:173)
>         at java.util.Arrays.sort(Arrays.java:659)
>         at java.util.Collections.sort(Collections.java:217)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1117)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:296)
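Editor's note: TimSort throws "Comparison method violates its general contract!" when a comparator's answers change while the sort is running, which can happen in the stack trace above because node resources are mutated concurrently while the continuous scheduling thread sorts the node list. One common mitigation, shown here as a sketch and not necessarily the approach taken in the attached patch, is to freeze each node's sort key into an immutable snapshot before sorting, so the comparator is guaranteed to be consistent:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SnapshotSort {
    // Stand-in for a scheduler node whose free memory other threads may mutate mid-sort.
    static class Node {
        final String name;
        volatile long availableMb;
        Node(String name, long availableMb) { this.name = name; this.availableMb = availableMb; }
    }

    // Pairs a node with a frozen copy of its sort key. Because the comparator only
    // reads the frozen key, its answers cannot change while TimSort runs.
    static class Snapshot {
        final Node node;
        final long keyMb;
        Snapshot(Node node, long keyMb) { this.node = node; this.keyMb = keyMb; }
    }

    static List<Node> sortByAvailableMemoryDesc(List<Node> nodes) {
        List<Snapshot> snapshots = new ArrayList<>();
        for (Node n : nodes) {
            snapshots.add(new Snapshot(n, n.availableMb)); // read each key exactly once
        }
        snapshots.sort(Comparator.comparingLong((Snapshot s) -> s.keyMb).reversed());
        List<Node> sorted = new ArrayList<>();
        for (Snapshot s : snapshots) {
            sorted.add(s.node);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("n1", 2048), new Node("n2", 8192), new Node("n3", 4096));
        for (Node n : sortByAvailableMemoryDesc(nodes)) {
            System.out.println(n.name); // most free memory first
        }
    }
}
```

Sorting the live objects directly is what makes the crash intermittent: it only triggers when an update lands between two comparisons in a way that breaks transitivity.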
[jira] [Assigned] (YARN-9940) avoid continuous scheduling thread crashes while sorting nodes get 'Comparison method violates its general contract'
[ https://issues.apache.org/jira/browse/YARN-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu reassigned YARN-9940:
------------------------------
    Assignee: kailiu_dev
[jira] [Commented] (YARN-5913) Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext
[ https://issues.apache.org/jira/browse/YARN-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917399#comment-16917399 ]

Yufei Gu commented on YARN-5913:
--------------------------------
[~ykabusalah], feel free to take any Jira without an assignee.

> Consolidate "resource" and "amResourceRequest" in ApplicationSubmissionContext
> ------------------------------------------------------------------------------
>
>                 Key: YARN-5913
>                 URL: https://issues.apache.org/jira/browse/YARN-5913
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Yufei Gu
>            Priority: Minor
>              Labels: newbie
>
> Usage of these two variables overlaps and causes confusion.
[jira] [Commented] (YARN-6425) Move out FS state dump code out of method update()
[ https://issues.apache.org/jira/browse/YARN-6425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16917398#comment-16917398 ]

Yufei Gu commented on YARN-6425:
--------------------------------
[~ykabusalah], feel free to do that.

> Move out FS state dump code out of method update()
> --------------------------------------------------
>
>                 Key: YARN-6425
>                 URL: https://issues.apache.org/jira/browse/YARN-6425
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.9.0, 3.0.0-alpha2
>            Reporter: Yufei Gu
>            Priority: Major
>              Labels: newbie++
>
> Better to move the FS state dump code out of update():
> {code}
> if (LOG.isDebugEnabled()) {
>   if (--updatesToSkipForDebug < 0) {
>     updatesToSkipForDebug = UPDATE_DEBUG_FREQUENCY;
>     dumpSchedulerState();
>   }
> }
> {code}
> And after that, we should distinguish between update call and update thread duration, like before YARN-6112.
[jira] [Commented] (YARN-2497) Fair scheduler should support strict node labels
[ https://issues.apache.org/jira/browse/YARN-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890573#comment-16890573 ]

Yufei Gu commented on YARN-2497:
--------------------------------
Hi [~chenzhaohang], AFAIK, FS doesn't support node labels in any version.

> Fair scheduler should support strict node labels
> ------------------------------------------------
>
>                 Key: YARN-2497
>                 URL: https://issues.apache.org/jira/browse/YARN-2497
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>            Reporter: Wangda Tan
>            Assignee: Daniel Templeton
>            Priority: Major
>
>         Attachments: YARN-2497.001.patch, YARN-2497.002.patch,
> YARN-2497.003.patch, YARN-2497.004.patch, YARN-2497.005.patch,
> YARN-2497.006.patch, YARN-2497.007.patch, YARN-2497.008.patch,
> YARN-2497.009.patch, YARN-2497.010.patch, YARN-2497.011.patch,
> YARN-2497.branch-3.0.001.patch, YARN-2499.WIP01.patch
[jira] [Assigned] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu reassigned YARN-9537:
------------------------------
    Assignee: zhoukang
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890572#comment-16890572 ]

Yufei Gu commented on YARN-9537:
--------------------------------
Hi [~cane], I added you as a contributor and assigned this to you. Will you still work on this?
[jira] [Updated] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu updated YARN-9537:
---------------------------
    Fix Version/s: (was: 3.1.2)
                   (was: 3.2.0)
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16865860#comment-16865860 ]

Yufei Gu commented on YARN-9537:
--------------------------------
Hi [~cane], thanks for the patch. Could you elaborate on your use case?
[jira] [Commented] (YARN-9537) Add configuration to support AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836625#comment-16836625 ]

Yufei Gu commented on YARN-9537:
--------------------------------
FairScheduler doesn't prevent you from preempting the AM container. It just tries to preempt as few AM containers as possible.

> Add configuration to support AM preemption
> ------------------------------------------
>
>                 Key: YARN-9537
>                 URL: https://issues.apache.org/jira/browse/YARN-9537
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: zhoukang
>            Priority: Major
>
> In our production cluster, we can tolerate AM preemption. So we can add a
> configuration to support AM preemption.
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830769#comment-16830769 ]

Yufei Gu commented on YARN-9520:
--------------------------------
* "inter-queue preemption will not happen among the applications inside the queue." Yes.
* "With the FIFO ordering policy, the newer applications will be preempted first if the priority is the same or not set. In other words, the older applications will be considered for preemption only after the newer applications are preempted." No. Only the oldest one has less chance of being preempted. All the others have the same chance.
* "Multiple applications of a queue will run if resources are available. Let's say there are resources for 200 containers, and 2 applications of 100 containers each will run. After 50 containers of each finish, will the 3rd application's containers get allocated, or will it wait for the first 2 applications to finish?" Yes, the 3rd one can run.

> fair scheduler: inter-queue-preemption.enabled,
> intra-queue-preemption.enabled options
> -----------------------------------------------
>
>                 Key: YARN-9520
>                 URL: https://issues.apache.org/jira/browse/YARN-9520
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>            Reporter: Sudhir Babu Pothineni
>            Priority: Major
>
> It's good to have inter-queue-preemption-enabled and
> intra-queue-preemption-enabled options for the fair scheduler; I have a use case
> where we need inter-queue-preemption-enabled=false.
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829993#comment-16829993 ]

Yufei Gu commented on YARN-9520:
--------------------------------
Seems like you don't need queue A to use the fair policy. Why not set it to FIFO instead?
[jira] [Commented] (YARN-9520) fair scheduler: inter-queue-preemption.enabled, intra-queue-preemption.enabled options
[ https://issues.apache.org/jira/browse/YARN-9520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16829685#comment-16829685 ]

Yufei Gu commented on YARN-9520:
--------------------------------
Could you elaborate on the use case?
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to remove duplication
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807378#comment-16807378 ]

Yufei Gu commented on YARN-9214:
--------------------------------
Committed to trunk. Thanks [~jiwq] for the contribution. Thanks [~snemeth] for the review.

> Add AbstractYarnScheduler#getValidQueues method to remove duplication
> ---------------------------------------------------------------------
>
>                 Key: YARN-9214
>                 URL: https://issues.apache.org/jira/browse/YARN-9214
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5
>            Reporter: Wanqiang Ji
>            Assignee: Wanqiang Ji
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: YARN-9214.001.patch, YARN-9214.002.patch,
> YARN-9214.003.patch, YARN-9214.004.patch, YARN-9214.005.patch
>
> *AbstractYarnScheduler#moveAllApps* and
> *AbstractYarnScheduler#killAllAppsInQueue* have the same code segment, so I
> think we need a method named *AbstractYarnScheduler#getValidQueues* to handle
> it. Apart from this, we need to add a doc comment explaining why it exists.
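Editor's note: the refactor above pulls the queue-validation code shared by moveAllApps and killAllAppsInQueue into one helper. A simplified, self-contained sketch of that shape; the types, names, and return values here are illustrative stand-ins, not the actual AbstractYarnScheduler code:

```java
import java.util.List;
import java.util.Map;

public class SchedulerSketch {
    // Simplified model: queue name -> ids of the apps currently in that queue.
    private final Map<String, List<String>> queues;

    SchedulerSketch(Map<String, List<String>> queues) { this.queues = queues; }

    // The extracted helper: validation that used to be duplicated in both callers
    // now lives in exactly one place.
    private List<String> getValidQueue(String queueName) {
        List<String> apps = queues.get(queueName);
        if (apps == null) {
            throw new IllegalArgumentException(
                "The specified queue: " + queueName + " doesn't exist");
        }
        return apps;
    }

    int moveAllApps(String source, String dest) {
        List<String> apps = getValidQueue(source); // shared check, call site one
        getValidQueue(dest);                       // shared check, call site two
        return apps.size(); // stand-in for issuing one move event per app
    }

    int killAllAppsInQueue(String queueName) {
        return getValidQueue(queueName).size(); // stand-in for issuing kill events
    }
}
```

The benefit is the usual one for this kind of change: the error message and the validation rule can no longer drift apart between the two public methods.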
[jira] [Updated] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to remove duplication
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yufei Gu updated YARN-9214:
---------------------------
    Summary: Add AbstractYarnScheduler#getValidQueues method to remove duplication  (was: Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code)
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807374#comment-16807374 ]

Yufei Gu commented on YARN-9214:
--------------------------------
+1
[jira] [Comment Edited] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807053#comment-16807053 ]

Yufei Gu edited comment on YARN-9401 at 4/1/19 6:20 PM:
--------------------------------------------------------
Do we plan to release YARN separately? Probably never. With that, I suggest exploring the idea of removing class YarnVersionInfo rather than making this change. It is OK to remove it after looking at the references in the web-app; besides, the class is "Private and Unstable". Need more thoughts from people, cc [~vinodkv].

was (Author: yufeigu):
Do we plan to release YARN separately? Probably never. With that, I suggest exploring the idea of removing class YarnVersionInfo rather than making this change. It is OK to remove it after looking at the references in the web-app; besides, the class is "Private and Unstable". Need more thoughts from people, cc [~vikumar].

> Fix `yarn version` print the version info is the same as `hadoop version`
> --------------------------------------------------------------------------
>
>                 Key: YARN-9401
>                 URL: https://issues.apache.org/jira/browse/YARN-9401
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Wanqiang Ji
>            Assignee: Wanqiang Ji
>            Priority: Minor
>
>         Attachments: YARN-9401.001.patch, YARN-9401.002.patch
>
> It's caused by the `yarn` shell using `org.apache.hadoop.util.VersionInfo`
> instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the
> `HADOOP_CLASSNAME` by mistake.
>
> {panel:title=Before}
> Hadoop 3.3.0-SNAPSHOT
> Source code repository https://github.com/apache/hadoop.git -r 53a86e2b8ecb83b666d4ed223fc270e1a46642c1
> Compiled by jiwq on 2019-04-01T04:55Z
> Compiled with protoc 2.5.0
> From source with checksum 829bd6e22c17c6da74f5c1a61647922
> {panel}
> {panel:title=After}
> YARN 3.3.0-SNAPSHOT
> Subversion https://github.com/apache/hadoop.git -r 53a86e2b8ecb83b666d4ed223fc270e1a46642c1
> Compiled by jiwq on 2019-04-01T05:06Z
> From source with checksum e10a192bd933ffdafe435d7fe99d24d
> {panel}
[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807053#comment-16807053 ] Yufei Gu commented on YARN-9401: Do we plan to release YARN separately? Probably never. With that, I suggest exploring the idea of removing the class YarnVersionInfo rather than making this change. Judging by its references in the web-app, it is OK to remove; besides, the class is "Private and Unstable". Need more thoughts from people, cc [~vikumar]. > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. > {panel:title=Before} > Hadoop 3.3.0-SNAPSHOT > Source code repository [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T04:55Z > Compiled with protoc 2.5.0 > From source with checksum 829bd6e22c17c6da74f5c1a61647922 > {panel} > {panel:title=After} > YARN 3.3.0-SNAPSHOT > Subversion [https://github.com/apache/hadoop.git] -r > 53a86e2b8ecb83b666d4ed223fc270e1a46642c1 > Compiled by jiwq on 2019-04-01T05:06Z > From source with checksum e10a192bd933ffdafe435d7fe99d24d > {panel} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9401) Fix `yarn version` print the version info is the same as `hadoop version`
[ https://issues.apache.org/jira/browse/YARN-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806357#comment-16806357 ] Yufei Gu commented on YARN-9401: Thanks [~jiwq] for working on this. Looking more deeply, the hdfs command just uses VersionInfo. This has never been a big issue, likely because YARN and HDFS have never been released separately. Besides, I don't see why we need the class YarnVersionInfo. Hi [~wangda], do you happen to know why we need class YarnVersionInfo? > Fix `yarn version` print the version info is the same as `hadoop version` > - > > Key: YARN-9401 > URL: https://issues.apache.org/jira/browse/YARN-9401 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Minor > Attachments: YARN-9401.001.patch, YARN-9401.002.patch > > > It's caused by in `yarn` shell used `org.apache.hadoop.util.VersionInfo` > instead of `org.apache.hadoop.yarn.util.YarnVersionInfo` as the > `HADOOP_CLASSNAME` by mistake. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9214) Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code
[ https://issues.apache.org/jira/browse/YARN-9214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806325#comment-16806325 ] Yufei Gu commented on YARN-9214: Thanks [~jiwq] for working on this. {code} LOG.warn(errMsg); throw new YarnException(errMsg); {code} It doesn't make sense to LOG.warn since we're throwing an exception here. I suggest removing it, though it isn't introduced by your patch. > Add AbstractYarnScheduler#getValidQueues method to resolve duplicate code > -- > > Key: YARN-9214 > URL: https://issues.apache.org/jira/browse/YARN-9214 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.1.0, 3.2.0, 2.9.2, 3.0.3, 2.8.5 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9214.001.patch, YARN-9214.002.patch, > YARN-9214.003.patch > > > *AbstractYarnScheduler#moveAllApps* and > *AbstractYarnScheduler#killAllAppsInQueue* had the same code segment. So I > think we need a method to handle it named > *AbstractYarnScheduler#getValidQueues*. Apart from this we need add the doc > comment to expound why exists. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
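The log-and-throw duplication flagged above can be illustrated with a small self-contained sketch. The class and method names here are made up for illustration (they are not from the Hadoop source); the point is that the exception already carries the message, so a preceding LOG.warn reports the same failure twice.

```java
public class QueueValidation {

    /** Hypothetical checked exception standing in for YarnException. */
    public static class SchedulerException extends Exception {
        public SchedulerException(String msg) {
            super(msg);
        }
    }

    public static void validateQueue(String queueName) throws SchedulerException {
        if (queueName == null || queueName.isEmpty()) {
            String errMsg = "Invalid queue name: " + queueName;
            // No LOG.warn(errMsg) here: the exception carries the message,
            // and whoever catches it decides whether and where to log.
            throw new SchedulerException(errMsg);
        }
    }
}
```

Logging at the throw site tends to produce duplicate log entries once the catcher also logs the exception, which is why reviews usually ask for one or the other, not both.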
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801412#comment-16801412 ] Yufei Gu commented on YARN-8967: Committed to trunk. Thanks [~wilfreds] for the contribution. > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, > YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, > YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799736#comment-16799736 ] Yufei Gu commented on YARN-8967: +1. Will commit later. > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, > YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, > YARN-8967.009.patch, YARN-8967.010.patch, YARN-8967.011.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798814#comment-16798814 ] Yufei Gu commented on YARN-8967: Hi [~wilfreds], thanks for the patch. 3) Yeah, the xml DOM API looks a little bit silly. getChildNodes() should at least provide an option to return only elements rather than children mixed with elements and text nodes. I believe some newer libs solve this issue. We could do something like this to hide the second loop in a method getParentNode(): {code} Element parentNode = getParentNode(node.getChildNodes()); PlacementRule parentRule = getParentRule(parentNode, fs); {code} 4) That's nice. 5) I do think the current solution is better. Let's ignore this checkstyle warning. Just one concern: can we make both members in class RuleMap “final”, so that no code can change their values except the constructor? > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, > YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, > YARN-8967.009.patch, YARN-8967.010.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
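The helper sketched in the comment above works by skipping the text and comment nodes that the DOM API interleaves with elements in getChildNodes(). A minimal, self-contained version of such a helper might look like this; the class and method names are my own illustration, not the patch's code:

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomUtils {

    /**
     * Returns the first Element child in the list, skipping the text and
     * comment nodes that getChildNodes() mixes in between elements.
     * Returns null when the list holds no element at all.
     */
    public static Element getFirstElement(NodeList children) {
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node instanceof Element) {
                return (Element) node;
            }
        }
        return null;
    }
}
```

With a helper like this, the caller sees a single line instead of the second loop over the NodeList.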
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796846#comment-16796846 ] Yufei Gu commented on YARN-8967: Hi [~wilfreds], the patch v9 looks really good. {quote} Based on all this I do think I need to file a follow up jira to fix the Hive SHIM that uses the policy at the moment and move that to the new code in a backward compatible way. {quote} I am with you. Some nits: 1. Sorry to have missed this in the last review: there is no need to add a debug log since we throw an exception here. {code} LOG.debug("Initialising rule set failed", ioe); throw new AllocationConfigurationException( "Rule initialisation failed with exception", ioe); {code} 3. Too many nested if/for statements in the method fromXml(). It would be nice to extract some logic in the loop into a separate method, or we can use {{if (!(node instanceof Element)) continue;}} to avoid one nesting level. 4. I made up a new test case: the “nestedUserQueue” rule has 2 parents, and only the second one takes effect. I believe we should at least LOG a warning for the first parent “primaryGroup”, and we don’t need to create and initialize it since it will be overwritten by the second parent. {code} StringBuffer sb = new StringBuffer(); sb.append(""); sb.append(" "); sb.append(" "); sb.append(" "); sb.append(" "); sb.append(" "); sb.append(""); createPolicy(sb.toString()); {code} 5. Not a fan of the getters in the nested class RuleMap. It could be kept as simple as possible: a plain wrapper class for multiple values, just like a case class in Scala or a data class in Kotlin. This is just my preference; I’m OK with the current implementation though. 
> Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, > YARN-8967.006.patch, YARN-8967.007.patch, YARN-8967.008.patch, > YARN-8967.009.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791411#comment-16791411 ] Yufei Gu commented on YARN-8967: Hi [~wilfreds], thanks for the patch. Some comments: 1. Nice cleanup in class QueueManager. 2. Should we deprecate the two constructors of class AllocationFileLoaderService rather than removing them, since it is a public class? 3. For "{{public List getRules()}}" here, the modifier “public” could be package-private ("no modifier") since all test cases invoking method getRules() are in the same package. 4. I would suggest putting the exception into the LOG.warn and removing the LOG.debug in method "placeApplication()". 5. The method "addApplication()" is a little bit messy because it holds the logic both for adding a new application and for a recovered application. I feel like it would be cleaner if we split addApplication into two methods, one to add a new application and another to recover applications. Just some thoughts. What do you think? 6. Since we’ve got this, we don’t need to check whether queueName is null in method addApplication(). {code:java} if (queueName != null) { addApplication(appAddedEvent.getApplicationId(), queueName, appAddedEvent.getUser(), appAddedEvent.getIsAppRecovering(), appAddedEvent.getPlacementContext()); } {code} 7. Do we still need this check “if (queueName.startsWith(".") || queueName.endsWith(“.”))”? We’ve normalized queue names in the placement rules for a new application, and the queue name should be valid for a recovered app. Class {{QueuePlacementPolicy}} related comments: 1. The QueuePlacementPolicy objects in class AllocationConfiguration are never used by production code if we {{updateRules()}} in the constructor. I would suggest either moving {{updateRules()}} out of the QueuePlacementPolicy constructor or removing all QueuePlacementPolicy objects and making QueuePlacementPolicy a utility class. I prefer the first one since it reduces coupling. In that case, the AllocationConfiguration object still keeps all configuration items including placement rules, which is consistent behavior. 2. You probably need another comment style to make this link work: {{{@link #getTerminal}}} 3. Incomplete comment {{// The list must be}} in class QueuePlacementPolicy 4. Typo in the comment “Builds an QueuePlacementPolicy from an xml element.”: an -> a 5. “testNoCreate()” contains some duplicated test cases. I’m OK whether you delete them or not since they aren't introduced by your patch. 6. I would suggest refactoring the method “fromXml()” a little bit by introducing a new method like “getParentRule()” 7. We could create a nested class like the following in class QueuePlacementPolicy to avoid the multiple “get(0)” and “get(1)” calls in the code. {code:java} public static class Policy { public Object clazz; public String terminal; } {code} 8. I found the following code in class SpecifiedPlacementRule; there is no need to both log an error and throw an exception. Would you mind fixing it in this patch although it isn’t introduced by this patch? {code:java} LOG.error("Specified queue name not valid: '{}'", queueName); throw new YarnException("Application submitted by user " + user + "with illegal queue name '" + queueName + "'."); {code} > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch, YARN-8967.005.patch, > YARN-8967.006.patch, YARN-8967.007.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. 
> YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
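Point 7 of the review above suggests replacing positional get(0)/get(1) access with a small value class. A hedged sketch of what that might look like; the field names and the builder method are illustrative stand-ins, not the actual patch code:

```java
import java.util.ArrayList;
import java.util.List;

public class PolicyDemo {

    /** Simple value holder replacing bare two-element-list access. */
    public static class Policy {
        public final Class<?> clazz;   // rule implementation class
        public final String terminal;  // terminal behaviour of the rule

        public Policy(Class<?> clazz, String terminal) {
            this.clazz = clazz;
            this.terminal = terminal;
        }
    }

    public static List<Policy> buildPolicies() {
        List<Policy> policies = new ArrayList<>();
        // The entries are made up for the sketch; the point is that each
        // field is self-describing instead of a positional list slot.
        policies.add(new Policy(String.class, "create"));
        return policies;
    }
}
```

With final fields and no setters, such a class behaves like a Scala case class or Kotlin data class: the values cannot change after construction, which also addresses the "make both members final" concern raised about RuleMap.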
[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9298: --- Fix Version/s: 3.3.0 > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9298.001.patch, YARN-9298.002.patch, > YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, > YARN-9298.006.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface. > Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784193#comment-16784193 ] Yufei Gu commented on YARN-9298: Committed to trunk. Thanks [~wilfreds] for the contribution. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3.0 > > Attachments: YARN-9298.001.patch, YARN-9298.002.patch, > YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, > YARN-9298.006.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface. > Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783618#comment-16783618 ] Yufei Gu commented on YARN-9298: +1 for the patch v6. Will commit later. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-9298.001.patch, YARN-9298.002.patch, > YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch, > YARN-9298.006.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface. > Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782911#comment-16782911 ] Yufei Gu commented on YARN-9298: [~wilfreds], thanks for the patch. Looks really good! Just some nits: 1. There are unused imports in class FairQueuePlacementUtils and class PlacementRule 2. {{private PlacementRule parentRule = null;}}, no need to set it to null as a class member since the default is null. 3. {{protected boolean createQueue = false;}}, I suggest removing the initialization or setting it to true, since it will be set to true by default anyway. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-9298.001.patch, YARN-9298.002.patch, > YARN-9298.003.patch, YARN-9298.004.patch, YARN-9298.005.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface. > Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780302#comment-16780302 ] Yufei Gu commented on YARN-9298: Hi [~wilfreds], thanks for the patch. I am glad we are moving forward. 1. The current implementations of the multiple FS rules are still kind of verbose. I think a new abstract class like “{{public abstract class FairPlacementRule extends PlacementRule}}” can solve the duplication. It can contain not only the 3 {{setConfig()}} overloads but also all fair-scheduler-rule-related methods and variables. The method {{initialize}} can have a default implementation as well, so that we can leave class PlacementRule as it is. In method getPlacementRule, some minor changes are needed. 2. There is an unused import in class TestFairQueuePlacementUtils. 3. A switch statement doesn't suit here. I personally prefer {{if {} else if {} else {}}} rather than {{if {} else { if {} else {} } }} in this case, which looks cleaner by reducing the nesting level. However, I won’t insist on this. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-9298.001.patch, YARN-9298.002.patch, > YARN-9298.003.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface. > Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
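A rough sketch of the abstract base-class idea from point 1 of the comment above. All class and method names here are assumptions for illustration; the real PlacementRule/FairScheduler API has more overloads and different signatures:

```java
public class PlacementSketch {

    /** Minimal stand-in for the shared PlacementRule contract. */
    public abstract static class PlacementRule {
        public abstract String getName();
    }

    /**
     * Proposed intermediate class: it holds the create flag and the
     * setConfig() handling once, so each concrete FS rule stays small.
     */
    public abstract static class FairPlacementRule extends PlacementRule {
        protected boolean createQueue = true;

        public void setConfig(Boolean create) {
            if (create != null) {
                createQueue = create;
            }
        }

        public boolean isCreateQueue() {
            return createQueue;
        }
        // Further overloads (e.g. taking a DOM Element) would live here too.
    }

    /** A concrete rule now only supplies its own behaviour. */
    public static class DefaultRule extends FairPlacementRule {
        @Override
        public String getName() {
            return "default";
        }
    }
}
```

The concrete rule classes then shrink to their rule-specific logic, which is the deduplication the review asks for.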
[jira] [Comment Edited] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776175#comment-16776175 ] Yufei Gu edited comment on YARN-9298 at 2/24/19 9:31 AM: - Hi [~wilfreds], thanks for the patch. It is really nice to add these unit tests. Some comments: 1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, Configuration conf)}}, but it is only used by CS. You may need to update the comments. 2. I would suggest that the unit test messages clarify which expectation wasn't met or which action failed, like “Rule object shouldn’t be null” or "Failed to instantiate the rule object.". “Cleaned name was changed for clean input" could be something like “Unexpected cleaned name.” or “Failed to clean name”. 3. Can you add a case “root” in method {{testAssureRoot()}}? 4. I feel like class {{TestPlacementRuleFS}} isn’t necessary. Why not just test against DefaultPlacementRule and all the other real rules? Besides, unit tests are needed for all the FS placement rule classes. I’m OK if you want to move some code from YARN-8967 and reuse existing tests, like the one in class TestQueuePlacementPolicy. 5. {{if {} else if {} else {}}} or a switch statement could be cleaner than {{if {} else { if {} else {} } }} in method {{setConfig}}. 6. There is some common code in the methods {{*Rule::initialize()}} and {{*Rule::setConfig()}}; we can probably put it into either class {{PlacementRule}} or class {{FairQueuePlacementUtils}}. was (Author: yufeigu): Hi [~wilfreds], thanks for the patch. It is really nice to add these unit tests. Some comments: 1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, Configuration conf)}}, but it is only used by CS. You may need to update the comments. 2. I would suggest that the unit test messages clarify the expectation or the failed action, like “Rule object shouldn’t be null” or "Failed to instantiate the rule object.". “Cleaned name was changed for clean input" could be something like “Unexpected cleaned name.” or “Failed to clean name”. 3. Can you add a case “root” in method {{testAssureRoot()}}? 4. I feel like class {{TestPlacementRuleFS}} isn’t necessary. Why not just test against DefaultPlacementRule and all the other real rules? Besides, unit tests are needed for all the FS placement rule classes. I’m OK if you want to move some code from YARN-8967 and reuse existing tests, like the one in class TestQueuePlacementPolicy. 5. {{if {} else if {} else {}}} or a switch statement could be cleaner than {{if {} else { if {} else {} } }} in method {{setConfig}}. 6. There is some common code in the methods {{*Rule::initialize()}} and {{*Rule::setConfig()}}; we can probably put it into either class {{PlacementRule}} or class {{FairQueuePlacementUtils}}. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-9298.001.patch, YARN-9298.002.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface. > Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776184#comment-16776184 ] Yufei Gu commented on YARN-9278: bq. If our cluster has a lot of long-running jobs, the above method is not helpful. That's unfortunate. Setting a maximum number of nodes to iterate seems a quick-and-dirty way to solve the latency in big clusters. Let's brainstorm the solution. [~Steven Rand] and [~wilfreds], what do you think? > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the num of nodes to make preemption more efficient. > Just like this, > {code:java} > // we should not iterate all nodes, that will be very slow > long maxTryNodeNum = > context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce(); > if (potentialNodes.size() > maxTryNodeNum){ > Collections.shuffle(potentialNodes); > List newPotentialNodes = new ArrayList(); > for (int i = 0; i < maxTryNodeNum; i++){ > newPotentialNodes.add(potentialNodes.get(i)); > } > potentialNodes = newPotentialNodes; > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
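The shuffle-and-limit snippet quoted in the issue description can be written self-contained roughly as follows. The preemption-config accessor in the quoted patch is specific to that proposal, so a plain parameter stands in for it here, and a generic helper replaces the in-place mutation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PreemptionNodePicker {

    /**
     * Randomly picks at most maxTryNodeNum entries, so the same nodes are
     * not preempted every round and the scan over candidates stays bounded.
     * The input list is copied before shuffling, so it is never mutated.
     */
    public static <N> List<N> limitNodes(List<N> potentialNodes, int maxTryNodeNum) {
        if (potentialNodes.size() <= maxTryNodeNum) {
            return potentialNodes;
        }
        List<N> shuffled = new ArrayList<>(potentialNodes);
        Collections.shuffle(shuffled);
        return shuffled.subList(0, maxTryNodeNum);
    }
}
```

Copying before Collections.shuffle avoids the side effect in the quoted sketch, where the caller's potentialNodes list is reordered in place.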
[jira] [Comment Edited] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776175#comment-16776175 ] Yufei Gu edited comment on YARN-9298 at 2/24/19 9:19 AM: - Hi [~wilfreds], thanks for the patch. It is really nice to add these unit tests. Some comments: 1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, Configuration conf)}}, but it is only used by CS. You may need to update the comments. 2. I would suggest that the unit test messages clarify the expectation or the failed action, like “Rule object shouldn’t be null” or "Failed to instantiate the rule object.". “Cleaned name was changed for clean input" could be something like “Unexpected cleaned name.” or “Failed to clean name”. 3. Can you add a case “root” in method {{testAssureRoot()}}? 4. I feel like class {{TestPlacementRuleFS}} isn’t necessary. Why not just test against DefaultPlacementRule and all the other real rules? Besides, unit tests are needed for all the FS placement rule classes. I’m OK if you want to move some code from YARN-8967 and reuse existing tests, like the one in class TestQueuePlacementPolicy. 5. {{if {} else if {} else {}}} or a switch statement could be cleaner than {{if {} else { if {} else {} } }} in method {{setConfig}}. 6. There is some common code in the methods {{*Rule::initialize()}} and {{*Rule::setConfig()}}; we can probably put it into either class {{PlacementRule}} or class {{FairQueuePlacementUtils}}. was (Author: yufeigu): Hi [~wilfreds], thanks for the patch. It is really nice to add these unit tests. Some comments: 1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, Configuration conf)}}, but it is only used by CS. You may need to update the comments. 2. I would suggest that the unit test messages clarify the expectation or the failed action, like “Rule object shouldn’t be null”. “Cleaned name was changed for clean input" could be something like “Unexpected cleaned name.” or “Failed to clean name”. 3. Can you add a case “root” in method {{testAssureRoot()}}? 4. I feel like class {{TestPlacementRuleFS}} isn’t necessary. Why not just test against DefaultPlacementRule and all the other real rules? Besides, unit tests are needed for all the FS placement rule classes. I’m OK if you want to move some code from YARN-8967 and reuse existing tests, like the one in class TestQueuePlacementPolicy. 5. {{if {} else if {} else {}}} or a switch statement could be cleaner than {{if {} else { if {} else {} } }} in method {{setConfig}}. 6. There is some common code in the methods {{*Rule::initialize()}} and {{*Rule::setConfig()}}; we can probably put it into either class {{PlacementRule}} or class {{FairQueuePlacementUtils}}. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-9298.001.patch, YARN-9298.002.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface. > Preparation for YARN-8967 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776175#comment-16776175 ] Yufei Gu commented on YARN-9298: Hi [~wilfreds], thanks for the patch. It is really nice to add these unit tests. Some comments: 1. Thanks for adding comments for method {{getPlacementRule(String ruleStr, Configuration conf)}}, but it is only used by CS. You may need to update the comments. 2. I would suggest that the unit test messages clarify the expectation or which action failed, e.g. "Rule object shouldn't be null". "Cleaned name was changed for clean input" could be something like "Unexpected cleaned name." or "Failed to clean name". 3. Can you add a case for "root" in method {{testAssureRoot()}}? 4. I feel like class {{TestPlacementRuleFS}} isn't necessary. Why not just test against DefaultPlacementRule and all the other real rules? Besides, unit tests are needed for all the FS placement rule classes. I'm OK if you want to move some code from YARN-8967 and reuse existing tests, like the one in class TestQueuePlacementPolicy. 5. An if/else-if/else chain or a switch statement could be cleaner than the nested if/else in method {{setConfig}}. 6. There is some common code in the {{*Rule::initialize()}} and {{*Rule::setConfig()}} methods; we could probably move it into either class {{PlacementRule}} or class {{FairQueuePlacementUtils}}. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-9298.001.patch, YARN-9298.002.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface.
> Preparation for YARN-8967
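The nested-if cleanup suggested in point 5 of the review can be sketched as follows. This is an illustration only: the config "shapes" (null / Boolean / String) and the method name are hypothetical stand-ins, not the actual {{setConfig}} logic.

```java
// Illustrative only: a flat if/else-if chain replacing a nested
// if {} else { if {} else {} } structure. The branch conditions below
// (null / Boolean / String) are hypothetical stand-ins for whatever
// setConfig actually inspects.
public class ConfigShape {
  static String describe(Object ruleConfig) {
    // Flat chain: every case sits at the same indentation level,
    // unlike the nested form where the last two cases hide inside else.
    if (ruleConfig == null) {
      return "no config, use defaults";
    } else if (ruleConfig instanceof Boolean) {
      return "create flag = " + ruleConfig;
    } else if (ruleConfig instanceof String) {
      return "parent rule = " + ruleConfig;
    } else {
      return "unsupported config type";
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(null));         // prints "no config, use defaults"
    System.out.println(describe(Boolean.TRUE)); // prints "create flag = true"
  }
}
```

A switch on a type tag would read similarly; the point is simply that each case is visible at a single glance.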
[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773299#comment-16773299 ] Yufei Gu edited comment on YARN-9278 at 2/20/19 7:47 PM: - Hi [~uranus], this seems to be a perf issue for a busy large cluster due to the preemption implementation, which iterates over nodes and checks each one. The idea of setting a node-count threshold doesn't look elegant, but it is reasonable if we can't change the iterate-and-check way of identifying preemptable containers. It may not be the only idea though. FS preemption is already very complicated, so without introducing more complexity there are some workarounds you can try: increase the fair share preemption timeout and the fair share preemption threshold to reduce the chance of preemption. This is especially useful for a large cluster, since there is a better chance of getting resources just by waiting. was (Author: yufeigu): Hi [~uranus], this seems to be a perf issue for a busy large cluster due to the preemption implementation, which iterates over nodes and checks each one. I would suggest lowering {{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let preemption kick in earlier for a large cluster. The default value is 80%, which means preemption won't kick in until 80% of the cluster's resources are in use. Please be aware that a low utilization threshold may cause unnecessary container churn, so you don't want it to be too low. > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the num of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate over all nodes; that would be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>
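The shuffle-and-limit idea quoted in the description can be written more compactly with {{subList}}. The sketch below is self-contained and illustrative only: the helper name and its generic signature are invented for this example and are not the actual FairScheduler code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Self-contained sketch of the shuffle-and-limit idea: randomize the
// candidate nodes so the same nodes are not preempted every round, then
// cap how many nodes are examined. The helper name and generic
// signature are illustrative, not the actual FairScheduler code.
public class NodeSampler {
  static <T> List<T> shuffleAndLimit(List<T> nodes, int maxTryNodeNum) {
    if (nodes.size() <= maxTryNodeNum) {
      return nodes;
    }
    // Copy before shuffling so the caller's list is left untouched.
    List<T> copy = new ArrayList<>(nodes);
    Collections.shuffle(copy);
    // subList replaces the manual copy loop from the quoted snippet.
    return new ArrayList<>(copy.subList(0, maxTryNodeNum));
  }

  public static void main(String[] args) {
    List<String> nodes = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      nodes.add("node-" + i);
    }
    System.out.println(shuffleAndLimit(nodes, 10).size()); // prints 10
  }
}
```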
[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes
[ https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16773299#comment-16773299 ] Yufei Gu commented on YARN-9278: Hi [~uranus], this seems to be a perf issue for a busy large cluster due to the preemption implementation, which iterates over nodes and checks each one. I would suggest lowering {{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let preemption kick in earlier for a large cluster. The default value is 80%, which means preemption won't kick in until 80% of the cluster's resources are in use. Please be aware that a low utilization threshold may cause unnecessary container churn, so you don't want it to be too low. > Shuffle nodes when selecting to be preempted nodes > -- > > Key: YARN-9278 > URL: https://issues.apache.org/jira/browse/YARN-9278 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > > We should *shuffle* the nodes to avoid some nodes being preempted frequently. > Also, we should *limit* the num of nodes to make preemption more efficient. > Just like this,
> {code:java}
> // we should not iterate over all nodes; that would be very slow
> long maxTryNodeNum =
>     context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   List<FSSchedulerNode> newPotentialNodes = new ArrayList<>();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>
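For reference, the threshold suggested above is a FairScheduler property set in yarn-site.xml. The value 0.7 below is illustrative, not a recommendation:

```xml
<!-- yarn-site.xml: let FairScheduler preemption kick in once 70% of the
     cluster is utilized instead of the 80% default. 0.7 is an
     illustrative value; tune it for your cluster, and keep in mind that
     too low a value can cause unnecessary container churn. -->
<property>
  <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
  <value>0.7</value>
</property>
```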
[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770247#comment-16770247 ] Yufei Gu commented on YARN-9298: Hi [~wilfreds], thanks for splitting this out and providing the patch. Some comments: 1. Can you add {{@Private}} and {{@Unstable}} annotations to all new classes? Can you do the same for classes PlacementFactory and PlacementRule since you are changing them? 2. I guess you didn't bring in unit tests due to the splitting. I just feel uncomfortable pushing so many changes without adding any unit test. Can you add unit tests in this jira? It is quite practical to add unit tests for methods in class {{FairQueuePlacementUtils}}; it may be a little trickier for other classes. 3. There is one extra empty line at the end of class "PlacementFactory". 4. Can you use org.apache.hadoop.util.ReflectionUtils to create a new instance rather than the hand-rolled code in getPlacementRule()? 5. {{public static <T> T getPlacementRule(Class<T> theClass, …)}} could be {{public static PlacementRule getPlacementRule(Class<? extends PlacementRule> theClass, …)}} to enforce the type. 6. It is obvious to developers that getting placement is "getting queues", but it still looks confusing to a code reader. Can we clarify that here? {{* Get queue for a given application.}} 7. LOG name is wrong in class {{FairQueuePlacementUtils}}. 8. In the {{initialize()}} methods, there is no need to log an error since you already raise exceptions. > Implement FS placement rules using PlacementRule interface > -- > > Key: YARN-9298 > URL: https://issues.apache.org/jira/browse/YARN-9298 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-9298.001.patch > > > Implement existing placement rules of the FS using the PlacementRule > interface.
> Preparation for YARN-8967
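Points 4 and 5 of the review above (delegate instantiation to a single reflective helper and bound the class parameter so only {{PlacementRule}} subclasses compile) can be sketched as below. This is a dependency-free illustration using plain Java reflection; in Hadoop the instantiation would go through {{org.apache.hadoop.util.ReflectionUtils#newInstance}}, and the {{PlacementRule}} interface and {{DefaultRule}} class here are stand-ins for the real types.

```java
// Dependency-free sketch: PlacementRule and DefaultRule stand in for the
// real YARN types; in Hadoop the instantiation would go through
// org.apache.hadoop.util.ReflectionUtils.newInstance(theClass, conf).
interface PlacementRule {
  String getName();
}

class DefaultRule implements PlacementRule {
  public String getName() {
    return "default";
  }
}

public class RuleFactory {
  // The bounded wildcard enforces at compile time that only
  // PlacementRule implementations can be passed in (point 5).
  static PlacementRule getPlacementRule(Class<? extends PlacementRule> theClass) {
    try {
      // getDeclaredConstructor().newInstance() replaces the deprecated
      // Class.newInstance() and surfaces constructor failures cleanly.
      return theClass.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("cannot instantiate " + theClass, e);
    }
  }

  public static void main(String[] args) {
    PlacementRule rule = getPlacementRule(DefaultRule.class);
    System.out.println(rule.getName()); // prints "default"
  }
}
```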
[jira] [Commented] (YARN-9277) Add more restrictions In FairScheduler Preemption
[ https://issues.apache.org/jira/browse/YARN-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16766287#comment-16766287 ] Yufei Gu commented on YARN-9277: Hi [~uranus], some general comments; I haven't looked at the code yet. bq. We should not preempt self +1 bq. We should not preempt high priority job. Correct me if I am wrong, but there are no priorities between YARN jobs. Priority applies to tasks inside one job, which was there before the FS preemption overhaul. We only need priorities between mappers and reducers, or other customized priorities, since AM containers always have the first priority and are already taken care of. bq. We should not preempt container which has been running for a long time. Makes sense if all other conditions are exactly the same. > Add more restrictions In FairScheduler Preemption > -- > > Key: YARN-9277 > URL: https://issues.apache.org/jira/browse/YARN-9277 > Project: Hadoop YARN > Issue Type: Sub-task > Components: fairscheduler >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-9277.001.patch, YARN-9277.002.patch > > > > I think we should add more restrictions in fair scheduler preemption. > * We should not preempt self > * We should not preempt high priority job > * We should not preempt container which has been running for a long time. > * ...
[jira] [Commented] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759562#comment-16759562 ] Yufei Gu commented on YARN-8967: Hi [~wilfreds], the patch v4 doesn't apply to the trunk. Can you rebase it? > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers.
[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-8967: --- Fix Version/s: (was: 3.3) > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers.
[jira] [Updated] (YARN-8967) Change FairScheduler to use PlacementRule interface
[ https://issues.apache.org/jira/browse/YARN-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-8967: --- Fix Version/s: 3.3 > Change FairScheduler to use PlacementRule interface > --- > > Key: YARN-8967 > URL: https://issues.apache.org/jira/browse/YARN-8967 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler, fairscheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 3.3 > > Attachments: YARN-8967.001.patch, YARN-8967.002.patch, > YARN-8967.003.patch, YARN-8967.004.patch > > > The PlacementRule interface was introduced to be used by all schedulers as > per YARN-3635. The CapacityScheduler is using it but the FairScheduler is not > and is using its own rule definition. > YARN-8948 cleans up the implementation and removes the CS references which > should allow this change to go through. > This would be the first step in using one placement rule engine for both > schedulers.
[jira] [Commented] (YARN-9041) Performance Optimization of method FSPreemptionThread#identifyContainersToPreempt
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709067#comment-16709067 ] Yufei Gu commented on YARN-9041: Committed to trunk. Thanks [~jiwq] for working on this. Thanks [~Steven Rand] for the review. > Performance Optimization of method > FSPreemptionThread#identifyContainersToPreempt > - > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, scheduler preemption >Affects Versions: 3.1.1 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.2.1 > > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch, YARN-9041.007.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Updated] (YARN-9041) Performance Optimization of method FSPreemptionThread#identifyContainersToPreempt
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9041: --- Fix Version/s: 3.2.1 > Performance Optimization of method > FSPreemptionThread#identifyContainersToPreempt > - > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, scheduler preemption >Affects Versions: 3.1.1 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Fix For: 3.2.1 > > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch, YARN-9041.007.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Updated] (YARN-9041) Performance Optimization of method FSPreemptionThread#identifyContainersToPreempt
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9041: --- Summary: Performance Optimization of method FSPreemptionThread#identifyContainersToPreempt (was: Performance Optimization of FSPreemptionThread#identifyContainersToPreempt method) > Performance Optimization of method > FSPreemptionThread#identifyContainersToPreempt > - > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, scheduler preemption >Affects Versions: 3.1.1 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch, YARN-9041.007.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Updated] (YARN-9041) Performance Optimization of FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9041: --- Summary: Performance Optimization of FSPreemptionThread#identifyContainersToPreempt method (was: Optimize FSPreemptionThread#identifyContainersToPreempt method) > Performance Optimization of FSPreemptionThread#identifyContainersToPreempt > method > - > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, scheduler preemption >Affects Versions: 3.1.1 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch, YARN-9041.007.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Updated] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9041: --- Affects Version/s: 3.1.1 > Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, scheduler preemption >Affects Versions: 3.1.1 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch, YARN-9041.007.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708242#comment-16708242 ] Yufei Gu commented on YARN-9041: The last patch looks good. +1 for the patch v7. Will commit this soon. > Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch, YARN-9041.007.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Updated] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9041: --- Component/s: fairscheduler > Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler, scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch, YARN-9041.007.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707857#comment-16707857 ] Yufei Gu commented on YARN-9041: Thanks for the patch. Some nits: # {{ * @return list preemptable containers}} should be something like {{ * @return the list of best preemptable containers for the resource request}}. # We still need some comments in both tests to clarify which logic path the tests are for. For example, we can add comments in {{testRelaxLocalityToPreemptLessAM}} to say that it tests the case where there is no less-AM-container solution in the remaining nodes. > Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch, > YARN-9041.006.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704014#comment-16704014 ] Yufei Gu commented on YARN-9041: Hi [~jiwq], the patch v5 looks good in terms of logic. Some nits: # Can you rename the two tests or write comments to clarify their intentions? I suppose the goals of the two methods are: one can find a solution with fewer AM containers among the relaxed locations and the other can't. # It is a good practice to put callee methods below their caller methods. # Can you refactor to create a new method like this? Please remember to reorganize the method javadoc. And we probably don't need the comment "// Don't preempt AM containers just to satisfy local requests if relax // locality is enabled." in that case.
{code}
/**
 * Iterate through matching nodes and identify containers to preempt all
 * on one node, also optimizing for least number of AM container
 * preemptions. Only nodes that match the locality level specified in
 * the {@link ResourceRequest} are considered. However, if this would
 * lead to AM preemption, and locality relaxation is allowed, then the
 * search space is expanded to the remaining nodes.
 *
 * @param rr the resource request
 * @param potentialNodes the nodes matching the locality level
 * @return the best preemptable containers found
 */
private PreemptableContainers getBestPreemptableContainers(ResourceRequest rr,
    List<FSSchedulerNode> potentialNodes) {
  PreemptableContainers bestContainers =
      identifyContainersToPreemptForOneContainer(potentialNodes, rr);

  if (rr.getRelaxLocality()
      && !ResourceRequest.isAnyLocation(rr.getResourceName())
      && bestContainers != null
      && bestContainers.numAMContainers > 0) {
    List<FSSchedulerNode> remainingNodes =
        scheduler.getNodeTracker().getAllNodes();
    remainingNodes.removeAll(potentialNodes);
    PreemptableContainers spareContainers =
        identifyContainersToPreemptForOneContainer(remainingNodes, rr);
    if (spareContainers != null
        && spareContainers.numAMContainers < bestContainers.numAMContainers) {
      bestContainers = spareContainers;
    }
  }
  return bestContainers;
}
{code}
> Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~
[jira] [Comment Edited] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704014#comment-16704014 ] Yufei Gu edited comment on YARN-9041 at 11/29/18 11:21 PM: --- Hi [~jiwq], the patch v5 looks good in terms of logic. Some nits: # Can you rename the two tests or write comments to clarify their intentions? I suppose the goals of the two methods are: one can find a solution with fewer AM containers among the relaxed locations and the other can't. # It is a good practice to put callee methods below their caller methods. # Can you refactor to create a new method like this? Please remember to reorganize the method javadoc. And we probably don't need the comment "// Don't preempt AM containers just to satisfy local requests if relax // locality is enabled." in that case.
{code}
/**
 * Iterate through matching nodes and identify containers to preempt all
 * on one node, also optimizing for least number of AM container
 * preemptions. Only nodes that match the locality level specified in
 * the {@link ResourceRequest} are considered. However, if this would
 * lead to AM preemption, and locality relaxation is allowed, then the
 * search space is expanded to the remaining nodes.
 *
 * @param rr the resource request
 * @param potentialNodes the nodes matching the locality level
 * @return the best preemptable containers found
 */
private PreemptableContainers getBestPreemptableContainers(ResourceRequest rr,
    List<FSSchedulerNode> potentialNodes) {
  PreemptableContainers bestContainers =
      identifyContainersToPreemptForOneContainer(potentialNodes, rr);

  if (rr.getRelaxLocality()
      && !ResourceRequest.isAnyLocation(rr.getResourceName())
      && bestContainers != null
      && bestContainers.numAMContainers > 0) {
    List<FSSchedulerNode> remainingNodes =
        scheduler.getNodeTracker().getAllNodes();
    remainingNodes.removeAll(potentialNodes);
    PreemptableContainers spareContainers =
        identifyContainersToPreemptForOneContainer(remainingNodes, rr);
    if (spareContainers != null
        && spareContainers.numAMContainers < bestContainers.numAMContainers) {
      bestContainers = spareContainers;
    }
  }
  return bestContainers;
}
{code}
was (Author: yufeigu): Hi [~jiwq], the patch v5 looks good in terms of logic. Some nits: # Can you rename the two tests or write comments to clarify their intentions? I suppose the goals of the two methods are: one can find a solution with fewer AM containers among the relaxed locations and the other can't. # It is a good practice to put callee methods below their caller methods. # Can you refactor to create a new method like this? Please remember to reorganize the method javadoc. And we probably don't need the comment "// Don't preempt AM containers just to satisfy local requests if relax // locality is enabled." in that case.
{code}
/**
 * Iterate through matching nodes and identify containers to preempt all
 * on one node, also optimizing for least number of AM container
 * preemptions. Only nodes that match the locality level specified in
 * the {@link ResourceRequest} are considered. However, if this would
 * lead to AM preemption, and locality relaxation is allowed, then the
 * search space is expanded to the remaining nodes.
 *
 * @param rr the resource request
 * @param potentialNodes the nodes matching the locality level
 * @return the best preemptable containers found
 */
private PreemptableContainers getBestPreemptableContainers(ResourceRequest rr,
    List<FSSchedulerNode> potentialNodes) {
  PreemptableContainers bestContainers =
      identifyContainersToPreemptForOneContainer(potentialNodes, rr);

  if (rr.getRelaxLocality()
      && !ResourceRequest.isAnyLocation(rr.getResourceName())
      && bestContainers != null
      && bestContainers.numAMContainers > 0) {
    List<FSSchedulerNode> remainingNodes =
        scheduler.getNodeTracker().getAllNodes();
    remainingNodes.removeAll(potentialNodes);
    PreemptableContainers spareContainers =
        identifyContainersToPreemptForOneContainer(remainingNodes, rr);
    if (spareContainers != null
        && spareContainers.numAMContainers < bestContainers.numAMContainers) {
      bestContainers = spareContainers;
    }
  }
  return bestContainers;
}
{code}
> Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the
[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16702308#comment-16702308 ] Yufei Gu commented on YARN-9041: There is an error in test build which isn't related to your patch. I'll review later. {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-yarn-server-resourcemanager: There was a timeout or other error in the fork -> [Help 1] {code} > Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch, > YARN-9041.003.patch, YARN-9041.004.patch, YARN-9041.005.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9041) Optimize FSPreemptionThread#identifyContainersToPreempt method
[ https://issues.apache.org/jira/browse/YARN-9041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699511#comment-16699511 ] Yufei Gu commented on YARN-9041: Hi [~jiwq], thanks for the patch. I like the idea to shrink the search space, and your patch v2 seems to solve the concern raised by [~Steven Rand]. However, it is necessary to provide a unit test case for the change. > Optimize FSPreemptionThread#identifyContainersToPreempt method > -- > > Key: YARN-9041 > URL: https://issues.apache.org/jira/browse/YARN-9041 > Project: Hadoop YARN > Issue Type: Improvement > Components: scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9041.001.patch, YARN-9041.002.patch > > > In FSPreemptionThread#identifyContainersToPreempt method, I suggest if AM > preemption, and locality relaxation is allowed, then the search space is > expanded to all nodes changed to the remaining nodes. The remaining nodes are > equal to all nodes minus the potential nodes. > Judging condition changed to: > # rr.getRelaxLocality() > # !ResourceRequest.isAnyLocation(rr.getResourceName()) > # bestContainers != null > # bestContainers.numAMContainers > 0 > If I understand the deviation, please criticize me. thx~ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9005) FairScheduler maybe preempt the AM container
[ https://issues.apache.org/jira/browse/YARN-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16690178#comment-16690178 ] Yufei Gu commented on YARN-9005: Hi [~jiwq], sorry for the late response. Please clarify what performance issue you want to fix while creating the new issue. Thanks. > FairScheduler maybe preempt the AM container > > > Key: YARN-9005 > URL: https://issues.apache.org/jira/browse/YARN-9005 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9005.001.patch, YARN-9005.002.patch > > > In the worst case, FS preempt the AM container. Due to > FSPreemptionThread#identifyContainersToPreempt return value contains AM > container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9005) FairScheduler maybe preempt the AM container
[ https://issues.apache.org/jira/browse/YARN-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682243#comment-16682243 ] Yufei Gu commented on YARN-9005: It is by design that AM containers can be preempted. YARN-5830 did the improvement to reduce the chance of preempting AM containers. FS still preempts AM containers if that is the only option. > FairScheduler maybe preempt the AM container > > > Key: YARN-9005 > URL: https://issues.apache.org/jira/browse/YARN-9005 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9005.001.patch > > > In the worst case, FS preempt the AM container. Due to > FSPreemptionThread#identifyContainersToPreempt return value contains AM > container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9005) FairScheduler maybe preempt the AM container
[ https://issues.apache.org/jira/browse/YARN-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-9005: --- Component/s: scheduler preemption fairscheduler > FairScheduler maybe preempt the AM container > > > Key: YARN-9005 > URL: https://issues.apache.org/jira/browse/YARN-9005 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler, scheduler preemption >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: YARN-9005.001.patch > > > In the worst case, FS preempt the AM container. Due to > FSPreemptionThread#identifyContainersToPreempt return value contains AM > container. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8978) For fair scheduler, application with higher priority should also get priority resources for running AM
[ https://issues.apache.org/jira/browse/YARN-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16678577#comment-16678577 ] Yufei Gu commented on YARN-8978: [~qiuliang988], not sure if you still need this jira, but you shouldn't mark it as "Fixed". Please mark it as "Invalid" or "Won't Fix" if you don't need it.
> For fair scheduler, application with higher priority should also get priority
> resources for running AM
> --
>
> Key: YARN-8978
> URL: https://issues.apache.org/jira/browse/YARN-8978
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
>Reporter: qiuliang
>Priority: Major
> Attachments: YARN-8978.001.patch
>
>
> In order to allow important applications to run earlier, we used priority
> scheduling in the fair scheduler, and FairSharePolicy uses YARN-6307.
> Considering this situation, there are two applications (with different
> priorities) in the same queue and both are accepted. Both applications are
> demanding and hungry when dispatched to the queue. Next, calculate the weight
> ratio. Since the used resources of both applications are 0, the weight ratio
> is also 0. The priority is invalid in this case. Low-priority applications
> may get resources to run AM earlier than high-priority applications.
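The tie described in YARN-8978 can be seen with a small calculation. The sketch below uses hypothetical numbers, not the FairSharePolicy source: `weight` stands in for the YARN-6307-style priority-derived weight, and `usage / weight` for the comparison key.

```java
// Illustration only (hypothetical values, not the FairSharePolicy code):
// with usage/weight as the comparison key, two apps with zero usage tie
// regardless of their priority-derived weights, so the low-priority app's
// AM may be scheduled first.
class ZeroUsageTie {
  static double ratio(double usage, double weight) {
    return usage / weight;
  }

  public static void main(String[] args) {
    double highPriorityRatio = ratio(0, 5);  // high priority => weight 5
    double lowPriorityRatio = ratio(0, 1);   // low priority  => weight 1
    // Both ratios are 0.0, so the comparator cannot order the two apps
    // by priority until at least one of them has used some resource.
    System.out.println(highPriorityRatio == lowPriorityRatio);  // true
  }
}
```

Once either application receives its first container, its ratio becomes nonzero and the priority-derived weight starts to matter.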
[jira] [Commented] (YARN-8969) Change the return type to generic type of AbstractYarnScheduler#getNodeTracker
[ https://issues.apache.org/jira/browse/YARN-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675958#comment-16675958 ] Yufei Gu commented on YARN-8969: [~eepayne], it is probably fine in this case. {{AbstractYarnScheduler}} is a @Private @Unstable class.
> Change the return type to generic type of AbstractYarnScheduler#getNodeTracker
> --
>
> Key: YARN-8969
> URL: https://issues.apache.org/jira/browse/YARN-8969
> Project: Hadoop YARN
> Issue Type: Improvement
>Affects Versions: 2.9.1, 3.1.1
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
> Attachments: YARN-8969.001.patch
>
>
> Some warning problems like:
> {quote}Unchecked assignment: 'java.util.List' to
> 'java.util.List<FSSchedulerNode>'.
> Reason: 'scheduler.getNodeTracker()' has raw type, so result of
> getNodesByResourceName is erased{quote}
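The warning class being discussed is easy to reproduce outside YARN. A minimal sketch with made-up names (`Tracker` is not the YARN node-tracker API): a raw return type erases the element type of everything chained off it, while a parameterized return type keeps call sites warning-free.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the node tracker; not the YARN API itself.
class Tracker<N> {
  private final List<N> nodes = new ArrayList<>();
  List<N> getNodesByResourceName() { return nodes; }
}

class RawTypeDemo {
  // Raw return type: every call chained off it is erased, producing
  // "unchecked assignment" warnings at the call sites.
  @SuppressWarnings("rawtypes")
  static Tracker getTrackerRaw() { return new Tracker<String>(); }

  // Generic return type: call sites keep the element type, no warning.
  static Tracker<String> getTrackerTyped() { return new Tracker<>(); }
}
```

This is why changing `getNodeTracker()` to return the generic type removes the warnings without any behavior change, which also explains why it is acceptable for a @Private @Unstable class.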
[jira] [Commented] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy
[ https://issues.apache.org/jira/browse/YARN-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631207#comment-16631207 ] Yufei Gu commented on YARN-8792: [~HCOONa], I've added you as a contributor, so that you can assign these jiras to yourself. > Revisit FairScheduler QueuePlacementPolicy > --- > > Key: YARN-8792 > URL: https://issues.apache.org/jira/browse/YARN-8792 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.1.1 >Reporter: Shuai Zhang >Priority: Major > > Fair scheduler use `QueuePlacementPolicy` to map a request to queue. There > are several problems: > # The termination of the responsibility chain should bind to the assigning > result instead of the rule. > # It should provide a reason when rejecting a request. > # Still need more useful rules: > ## RejectNonLeafQueue > ## RejectDefaultQueue > ## RejectUsers > ## RejectQueues > ## DefaultByUser -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7794) SLSRunner is not loading timeline service jars causing failure
[ https://issues.apache.org/jira/browse/YARN-7794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606480#comment-16606480 ] Yufei Gu commented on YARN-7794: [~jhung], the patch looks good to me. > SLSRunner is not loading timeline service jars causing failure > -- > > Key: YARN-7794 > URL: https://issues.apache.org/jira/browse/YARN-7794 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Affects Versions: 3.1.0 >Reporter: Sunil Govindan >Assignee: Yufei Gu >Priority: Blocker > Fix For: 3.1.0 > > Attachments: YARN-7794-branch-2.001.patch, YARN-7794.001.patch > > > {code:java} > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 13 more > Exception in thread "pool-2-thread-390" java.lang.NoClassDefFoundError: > org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:443) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:321) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:641){code} > We are getting this error while running SLS. new patch of timelineservice > under share/hadoop/yarn is not loaded in SLS jvm (verified from slsrunner > classpath) > cc/ [~rohithsharma] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8632) Threads in SLS quit without logging exception
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592032#comment-16592032 ] Yufei Gu commented on YARN-8632: +1 for the patch v4. Committed to trunk. Thanks [~luxianghao] for the patch. > Threads in SLS quit without logging exception > -- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, > YARN-8632.002.patch, YARN-8632.003.patch, YARN-8632.004.patch > > > Recently, I have been using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8632) Threads in SLS quit without logging exception
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-8632: --- Summary: Threads in SLS quit without logging exception (was: No data in file realtimetrack.json after running SchedulerLoadSimulator) > Threads in SLS quit without logging exception > -- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, > YARN-8632.002.patch, YARN-8632.003.patch, YARN-8632.004.patch > > > Recently, I have been using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590570#comment-16590570 ] Yufei Gu commented on YARN-8632: [~luxianghao], thanks for the patch. Nice finding. +1 for the patch v3. Will commit later. Do you need a patch for 2.7? > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, > YARN-8632.002.patch, YARN-8632.003.patch > > > Recently, I have been using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586344#comment-16586344 ] Yufei Gu commented on YARN-8632: For that sake, we need to "setUncaughtExceptionHandler" for the thread, and provide a handler. Catching every exception in {{run()}} isn't enough. > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch, > YARN-8632.002.patch > > > Recently, I have been using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
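The setUncaughtExceptionHandler point above can be sketched in a few lines. This is a generic illustration, not the SLS MetricsLogRunnable code; the thread name and the capture mechanism are made up for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Generic sketch (not the SLS code): without an UncaughtExceptionHandler,
// a throwable escaping run() kills the thread with at most a stderr trace;
// with one installed, the failure can be routed to the application's log.
class CaptureUncaught {
  static Throwable runAndCapture(Runnable task) throws InterruptedException {
    AtomicReference<Throwable> caught = new AtomicReference<>();
    Thread t = new Thread(task, "metrics-logger");
    // A real handler would call something like LOG.error("thread died", e).
    t.setUncaughtExceptionHandler((thread, e) -> caught.set(e));
    t.start();
    t.join();
    return caught.get();
  }
}
```

A try/catch inside run() only covers exceptions thrown within that block; the handler is the safety net for anything that still escapes.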
[jira] [Commented] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler
[ https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583318#comment-16583318 ] Yufei Gu commented on YARN-5139: [~zhuqi], done. > [Umbrella] Move YARN scheduler towards global scheduler > --- > > Key: YARN-5139 > URL: https://issues.apache.org/jira/browse/YARN-5139 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Major > Attachments: Explanantions of Global Scheduling (YARN-5139) > Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, > YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, > YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, > YARN-5139.000.patch, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, > wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch > > > Existing YARN scheduler is based on node heartbeat. This can lead to > sub-optimal decisions because scheduler can only look at one node at the time > when scheduling resources. > Pseudo code of existing scheduling logic looks like: > {code} > for node in allNodes: >Go to parentQueue > Go to leafQueue > for application in leafQueue.applications: >for resource-request in application.resource-requests > try to schedule on node > {code} > Considering future complex resource placement requirements, such as node > constraints (give me "a && b || c") or anti-affinity (do not allocate HBase > regionsevers and Storm workers on the same host), we may need to consider > moving YARN scheduler towards global scheduling. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-8655: -- Assignee: Zhaohui Xin > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.patch > > > *FSStarvedApps is not thread safe, this may make one starve app is processed > for two times continuously.* > For example, when app1 is fair share starved, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > update to app1. At the moment, app1 is starved by min share, so this app is > added to appsToProcess again! Because appBeingProcessed is null and > appsToProcess also have not this one. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8655) FairScheduler: FSStarvedApps is not thread safe
[ https://issues.apache.org/jira/browse/YARN-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580139#comment-16580139 ] Yufei Gu commented on YARN-8655: Hi [~uranus], added you to the contributor list and assigned this to you. > FairScheduler: FSStarvedApps is not thread safe > --- > > Key: YARN-8655 > URL: https://issues.apache.org/jira/browse/YARN-8655 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 >Reporter: Zhaohui Xin >Assignee: Zhaohui Xin >Priority: Major > Attachments: YARN-8655.patch > > > *FSStarvedApps is not thread safe, this may make one starve app is processed > for two times continuously.* > For example, when app1 is fair share starved, it has been added to > appsToProcess. After that, app1 is taken but appBeingProcessed is not yet > update to app1. At the moment, app1 is starved by min share, so this app is > added to appsToProcess again! Because appBeingProcessed is null and > appsToProcess also have not this one. > {code:java} > void addStarvedApp(FSAppAttempt app) { > if (!app.equals(appBeingProcessed) && !appsToProcess.contains(app)) { > appsToProcess.add(app); > } > } > FSAppAttempt take() throws InterruptedException { > // Reset appBeingProcessed before the blocking call > appBeingProcessed = null; > // Blocking call to fetch the next starved application > FSAppAttempt app = appsToProcess.take(); > appBeingProcessed = app; > return app; > } > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
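One way to close the race described in YARN-8655 (a sketch only, not the actual patch attached to the issue) is to track queued-or-in-flight apps in a set guarded by a lock, so an app cannot be enqueued a second time until its processing is declared finished. This makes the window between `appsToProcess.take()` returning and `appBeingProcessed` being updated harmless.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch; the real FSStarvedApps fix may differ. The set records apps that
// are queued or currently being processed, so a starved app cannot be
// enqueued twice even if it becomes starved again while being handled.
class DedupStarvedApps<T> {
  private final LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();
  private final Set<T> pendingOrRunning = new HashSet<>();

  void addStarvedApp(T app) {
    synchronized (pendingOrRunning) {
      if (!pendingOrRunning.add(app)) {
        return;  // already queued or being processed: skip the duplicate
      }
    }
    queue.add(app);
  }

  T take() throws InterruptedException {
    return queue.take();  // caller must call doneProcessing(app) afterwards
  }

  void doneProcessing(T app) {
    synchronized (pendingOrRunning) {
      pendingOrRunning.remove(app);
    }
  }

  int queued() {
    return queue.size();
  }
}
```

The design choice here is to make membership and enqueueing atomic with respect to each other, rather than trying to make the blocking `take()` itself atomic with the `appBeingProcessed` assignment.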
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578945#comment-16578945 ] Yufei Gu commented on YARN-8632: Thanks for the patch. Some comments:
# It is not a good practice to catch runtime exceptions. Normally we should let the program exit if a runtime exception happens. If you think one is necessary, create a new exception type and throw and catch it explicitly.
# Use {{LOG.info("message", e)}} instead of {{e.printStackTrace();}}
# {{(SchedulerWrapper)scheduler;}} needs a space before "scheduler"
# Create a unit test if possible.
> No data in file realtimetrack.json after running SchedulerLoadSimulator
> ---
>
> Key: YARN-8632
> URL: https://issues.apache.org/jira/browse/YARN-8632
> Project: Hadoop YARN
> Issue Type: Bug
> Components: scheduler-load-simulator
>Reporter: Xianghao Lu
>Assignee: Xianghao Lu
>Priority: Major
> Attachments: YARN-8632-branch-2.7.2.001.patch, YARN-8632.001.patch
>
>
> Recently, I have been using
> [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html]
> to validate the impact of changes on my FairScheduler. I encountered some
> problems.
> Firstly, I fix a npe bug with the patch in
> https://issues.apache.org/jira/browse/YARN-4302
> Secondly, everything seems to be ok, but I just get "[]" in file
> realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit
> because of npe,
> the reason is "wrapper.getQueueSet()" is still null when executing "String
> metrics = web.generateRealTimeTrackingMetrics();"
> So, we should put "String metrics = web.generateRealTimeTrackingMetrics();"
> in try section to avoid MetricsLogRunnable thread exit with unexpected
> exception.
> My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the
> second problem and I have made a patch to solve it.
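The {{LOG.info("message", e)}} advice from the review can be illustrated with java.util.logging as a stand-in for Hadoop's commons-logging API (a sketch; the logger name and message are made up). Passing the exception as the last argument keeps the stack trace inside the configured log stream, whereas e.printStackTrace() bypasses log configuration and always writes to stderr.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Sketch using java.util.logging in place of Hadoop's LOG; names are
// hypothetical. The throwable travels with the log record, so any
// configured handler (file, console, ...) can render its stack trace.
class LogWithThrowable {
  static final Logger LOG = Logger.getLogger("sls.metrics");

  static void handleFailure(Exception e) {
    LOG.log(Level.INFO, "generating real-time tracking metrics failed", e);
    // avoid: e.printStackTrace();  // goes straight to stderr
  }
}
```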
[jira] [Comment Edited] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875 ] Yufei Gu edited comment on YARN-8632 at 8/10/18 9:28 PM: - Your patch doesn't apply to trunk. You said the bug is in trunk as well, can you provide a patch for the trunk? What is the version does your patch target? 2.7.2? was (Author: yufeigu): Your patch doesn't apply to trunk? You said the bug is in trunk as well, can you provide a patch for the trunk? What is the version does your patch target? 2.7.2? > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632.001.patch > > > Recently, I have beenning using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, Everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576875#comment-16576875 ] Yufei Gu commented on YARN-8632: Your patch doesn't apply to trunk? You said the bug is in trunk as well, can you provide a patch for the trunk? What is the version does your patch target? 2.7.2? > No data in file realtimetrack.json after running SchedulerLoadSimulator > --- > > Key: YARN-8632 > URL: https://issues.apache.org/jira/browse/YARN-8632 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler-load-simulator >Reporter: Xianghao Lu >Assignee: Xianghao Lu >Priority: Major > Attachments: YARN-8632.001.patch > > > Recently, I have beenning using > [SchedulerLoadSimulator|https://hadoop.apache.org/docs/r2.7.2/hadoop-sls/SchedulerLoadSimulator.html] > to validate the impact of changes on my FairScheduler. I encountered some > problems. > Firstly, I fix a npe bug with the patch in > https://issues.apache.org/jira/browse/YARN-4302 > Secondly, Everything seems to be ok, but I just get "[]" in file > realtimetrack.json. Finally, I find the MetricsLogRunnable thread will exit > because of npe, > the reason is "wrapper.getQueueSet()" is still null when executing "String > metrics = web.generateRealTimeTrackingMetrics();" > So, we should put "String metrics = web.generateRealTimeTrackingMetrics();" > in try section to avoid MetricsLogRunnable thread exit with unexpected > exception. > My hadoop version is 2.7.2, it seems that hadoop trunk branch also has the > second problem and I have made a patch to solve it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reassigned YARN-8632:
Assignee: Xianghao Lu
[jira] [Commented] (YARN-8632) No data in file realtimetrack.json after running SchedulerLoadSimulator
[ https://issues.apache.org/jira/browse/YARN-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1657#comment-1657 ] Yufei Gu commented on YARN-8632:
Added you to the contributor list and assigned this to you. Will review later.
[jira] [Commented] (YARN-8639) Sort queue and apps in fair scheduler using a separate thread
[ https://issues.apache.org/jira/browse/YARN-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575926#comment-16575926 ] Yufei Gu commented on YARN-8639:
Close it, or take this chance to do a bit of testing. Either way works for me.

> Sort queue and apps in fair scheduler using a separate thread
>
> Key: YARN-8639
> URL: https://issues.apache.org/jira/browse/YARN-8639
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Reporter: wan kun
> Priority: Minor
>
> If the fair scheduler has many queues and each queue has many active applications, then every assignContainer call needs to sort all the queues and all the applications in each queue. For a large system this may cost too much time, so we could sort the queues and applications asynchronously in a separate thread.
[jira] [Commented] (YARN-8639) Sort queue and apps in fair scheduler using a separate thread
[ https://issues.apache.org/jira/browse/YARN-8639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575302#comment-16575302 ] Yufei Gu commented on YARN-8639:
We need to quantify this a little before making any non-trivial change: how many sub-queues/applications count as too many and cause a performance issue? The result would not only justify the change but also provide a guideline for tuning queue settings.
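One cheap way to quantify the sorting cost discussed here is a micro-benchmark of a single heartbeat-style sort as the number of applications grows. A rough sketch, where the `App` record and its comparator are illustrative stand-ins rather than FairScheduler's actual schedulable comparison:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Rough micro-benchmark: how long does one assignContainer-style sort take
// as the number of apps per queue grows? Timings are machine-dependent.
public class SortCost {
  record App(long demand, long startTime) {}

  static long sortMicros(List<App> apps) {
    // Stand-in for a fair-share ordering: demand first, then start time.
    Comparator<App> byShare = Comparator
        .comparingLong(App::demand)
        .thenComparingLong(App::startTime);
    long t0 = System.nanoTime();
    apps.sort(byShare);
    return (System.nanoTime() - t0) / 1_000;
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    for (int n : new int[] {1_000, 10_000, 100_000}) {
      List<App> apps = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        apps.add(new App(r.nextInt(1 << 20), i));
      }
      System.out.println(n + " apps: " + sortMicros(apps) + " us per sort");
    }
  }
}
```

Multiplying the per-sort cost by the heartbeat rate and queue count gives a first-order estimate of how much scheduler time sorting consumes, which is the number that would justify (or rule out) the asynchronous-sort change.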
[jira] [Commented] (YARN-6636) Fair Scheduler: respect node labels at resource request level
[ https://issues.apache.org/jira/browse/YARN-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572303#comment-16572303 ] Yufei Gu commented on YARN-6636:
There are multiple ways to approach node labeling in Fair Scheduler, and the community doesn't have consensus. The approach YARN-2497 took heavily involves queue management and fair share calculations. Whether node labeling should affect queue management depends on whether we want fairness per label. Node labeling partitions the cluster resources; my take is that we generally still want fairness on each partition, materialized by queues and fair shares. However, some particular cases only need node labeling to act like data locality, which doesn't require fairness.

> Fair Scheduler: respect node labels at resource request level
>
> Key: YARN-6636
> URL: https://issues.apache.org/jira/browse/YARN-6636
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Reporter: Ashwin Shankar
> Assignee: Ashwin Shankar
> Priority: Major
>
> This ticket is to track changes to fair scheduler to respect node labels at resource request level. When the client sets labels at resource request level, the scheduler must schedule those containers only on nodes with that label.
[jira] [Commented] (YARN-8495) Priority scheduling support in FairShare scheduler
[ https://issues.apache.org/jira/browse/YARN-8495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533902#comment-16533902 ] Yufei Gu commented on YARN-8495:
Thanks [~Dillon.] for filing this. The implementation conflicts with preemption inside a queue, which assumes applications share the resources evenly, so the scheduler kills containers from applications that use more than their fair share. Consider a low-priority app A using less than its fair share and a high-priority app B using more than its fair share: priority scheduling keeps assigning containers to B, while preemption keeps killing B's containers to give them to A. A live lock happens.

> Priority scheduling support in FairShare scheduler
>
> Key: YARN-8495
> URL: https://issues.apache.org/jira/browse/YARN-8495
> Project: Hadoop YARN
> Issue Type: Wish
> Components: fairscheduler
> Reporter: Dillon Zhang
> Priority: Major
> Attachments: YARN-8495.001.patch
>
> In a production environment, priority scheduling is of vital importance to us: we have lots of queues for different departments that create applications, but some applications are less important than others, so we must guarantee that the important ones can provide service.
> Based on the priority of the application, Fair Scheduler should be able to give preference to applications while scheduling.
> The comparator applicationComparator can be changed as below:
> 1. Check for application priority. If priority is available, return the highest-priority job.
> 2. Otherwise continue with the existing logic: Fair Share comparison, then App ID comparison, then timestamp comparison.
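The comparator change proposed in the ticket, and the live-lock state described in the comment, can be sketched as follows. All names here are hypothetical; this is not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of "priority first, then the existing fair-share ordering".
public class PriorityComparatorSketch {
  record SchedApp(int priority, double usageOverFairShare, long appId) {}

  static final Comparator<SchedApp> PRIORITY_THEN_FAIR_SHARE =
      Comparator.comparingInt(SchedApp::priority).reversed()  // higher priority first
          .thenComparingDouble(SchedApp::usageOverFairShare)  // most starved first
          .thenComparingLong(SchedApp::appId);                // stable tie-break

  public static void main(String[] args) {
    List<SchedApp> apps = new ArrayList<>(List.of(
        new SchedApp(1, 0.2, 1),    // low priority, under its fair share
        new SchedApp(5, 1.8, 2)));  // high priority, over its fair share
    apps.sort(PRIORITY_THEN_FAIR_SHARE);
    // The high-priority app sorts first even though it already exceeds its
    // fair share -- exactly the state where intra-queue preemption would keep
    // killing its containers: the live lock described in the comment.
    System.out.println(apps.get(0).appId()); // 2
  }
}
```

The sketch makes the conflict concrete: the scheduling order and the preemption policy disagree about which app deserves resources, so any patch would need to make preemption priority-aware as well.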
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526924#comment-16526924 ] Yufei Gu commented on YARN-8468:
Sounds good to me. Thanks [~mrbillau].

> Limit container sizes per queue in FairScheduler
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Affects Versions: 3.1.0
> Reporter: Antal Bálint Steinbach
> Assignee: Antal Bálint Steinbach
> Priority: Critical
> Labels: patch
> Attachments: YARN-8468.000.patch
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" to limit the overall size of a container. This applies globally to all containers, cannot be limited per queue, and is not scheduler dependent.
> The goal of this ticket is to allow this value to be set on a per-queue basis.
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise apps, and wants to limit ad hoc jobs to small containers but allow enterprise apps to request as many resources as needed. Setting yarn.scheduler.maximum-allocation-mb sets a default maximum container size for all queues, while the per-queue maximum is set with the "maxContainerResources" queue config value.
> Suggested solution: all the infrastructure is already in the code. We need to do the following:
> * add the setting to the queue properties for all queue types (parent and leaf); this will cover dynamically created queues.
> * if it is set on the root, it overrides the scheduler setting, and we should not allow that.
> * make sure the queue resource cap cannot be larger than the scheduler max resource cap in the config.
> * implement getMaximumResourceCapability(String queueName) in the FairScheduler.
> * implement getMaximumResourceCapability() in both FSParentQueue and FSLeafQueue.
> * expose the setting in the queue information in the RM web UI.
> * expose the setting in the metrics etc. for the queue.
> * write JUnit tests.
> * update the scheduler documentation.
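The per-queue cap lookup suggested above — a queue-level value that falls back to the scheduler-wide maximum and can never exceed it — can be sketched like this. All names are illustrative: `maxContainerResources` is the proposed queue property, and an actual patch would read it from the allocation file rather than a hard-coded map.

```java
import java.util.Map;

// Sketch of getMaximumResourceCapability(queueName) with fallback and clamping.
public class QueueMaxAllocSketch {
  // yarn.scheduler.maximum-allocation-mb (scheduler-wide default; value illustrative)
  static final long SCHEDULER_MAX_MB = 8192;
  // Hypothetical per-queue "maxContainerResources" values.
  static final Map<String, Long> QUEUE_MAX_MB = Map.of("root.adhoc", 2048L);

  static long getMaximumAllocationMb(String queueName) {
    long queueCap = QUEUE_MAX_MB.getOrDefault(queueName, SCHEDULER_MAX_MB);
    // A queue cap may never exceed the scheduler-wide cap.
    return Math.min(queueCap, SCHEDULER_MAX_MB);
  }

  public static void main(String[] args) {
    System.out.println(getMaximumAllocationMb("root.adhoc"));      // 2048
    System.out.println(getMaximumAllocationMb("root.enterprise")); // 8192
  }
}
```

This matches the ticket's use case: the ad hoc queue is capped at small containers while the enterprise queue inherits the scheduler-wide maximum.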
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526869#comment-16526869 ] Yufei Gu commented on YARN-8468:
[~bsteinbach] since you filed this jira and provided the patch, you have the responsibility to justify the motivation. However, I am OK with this feature.
[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler
[ https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525773#comment-16525773 ] Yufei Gu commented on YARN-8468:
It seems a benign feature in terms of how it impacts existing functionality; in that sense, +0 for the feature. I'm more curious about the motivation. Can you elaborate on it, [~bsteinbach]? [~szegedim], I think [~bsteinbach] proposed "maxContainerResources" as a queue property.
[jira] [Commented] (YARN-8184) Too many metrics if containerLocalizer/ResourceLocalizationService uses ReadWriteDiskValidator
[ https://issues.apache.org/jira/browse/YARN-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520773#comment-16520773 ] Yufei Gu commented on YARN-8184:
Committed to trunk. Thanks for the review, [~haibochen].

> Too many metrics if ContainerLocalizer/ResourceLocalizationService uses ReadWriteDiskValidator
>
> Key: YARN-8184
> URL: https://issues.apache.org/jira/browse/YARN-8184
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Priority: Major
> Fix For: 3.2.0
> Attachments: YARN-8184.001.patch, YARN-8184.002.patch
>
> ContainerLocalizer or ResourceLocalizationService will use ReadWriteDiskValidator as its disk validator when downloading files if yarn.nodemanager.disk-validator is set to ReadWriteDiskValidator's name. In that case, ReadWriteDiskValidator creates a metric item for each localized directory, which is too many metrics. We should let ContainerLocalizer use only the basic disk validator.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511503#comment-16511503 ] Yufei Gu commented on YARN-8394:
Hi [~cheersyang], let me clarify a little. The code logic should be:
{code:java}
if "yarn.scheduler.capacity.node-locality-delay" is -1:
    disable "yarn.scheduler.capacity.rack-locality-additional-delay"
{code}
That way a user doesn't need to set it manually, as the doc you added suggests. Moreover, if that code logic were in place, we could simply say that disabling yarn.scheduler.capacity.node-locality-delay disables yarn.scheduler.capacity.rack-locality-additional-delay as well.
{quote}
Note, this feature should be disabled if YARN is deployed separately from the file system, as locality is meaningless. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` to `-1`, in which case the request's locality constraint is ignored.
{quote}

> Improve data locality documentation for Capacity Scheduler
>
> Key: YARN-8394
> URL: https://issues.apache.org/jira/browse/YARN-8394
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Weiwei Yang
> Assignee: Weiwei Yang
> Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
> Attachments: YARN-8394.001.patch, YARN-8394.002.patch
>
> YARN-6344 introduces a new parameter {{yarn.scheduler.capacity.rack-locality-additional-delay}} in capacity-scheduler.xml; we need to add some documentation in {{CapacityScheduler.md}} accordingly.
> Moreover, we are seeing more and more clusters separating storage and computation, where the file system is always remote. In such cases we need to explain how to relax data locality in CS, otherwise MR jobs suffer.
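The guard described in the comment's {code} block can be sketched as plain Java. The two property names are the real Capacity Scheduler configuration keys; the class itself and its method are illustrative, not the scheduler's actual code.

```java
// Sketch: disabling node-locality-delay should also disable
// rack-locality-additional-delay, so users don't have to set both.
public class LocalityDelaySketch {
  // yarn.scheduler.capacity.node-locality-delay
  // yarn.scheduler.capacity.rack-locality-additional-delay
  static int effectiveRackDelay(int nodeLocalityDelay, int rackAdditionalDelay) {
    if (nodeLocalityDelay == -1) {
      return -1; // locality delays disabled entirely; ignore the rack setting
    }
    return rackAdditionalDelay;
  }

  public static void main(String[] args) {
    System.out.println(effectiveRackDelay(-1, 40)); // -1: both disabled together
    System.out.println(effectiveRackDelay(40, 20)); // 20: normal operation
  }
}
```

With this guard in place, the documentation only needs to tell users to set the single `-1` value, matching the simplification Yufei suggests.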
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510625#comment-16510625 ] Yufei Gu commented on YARN-8394:
LGTM, can you file a jira for the code change?
[jira] [Commented] (YARN-8406) Do the improvement to the FSLeafQueue about calculating fair share for apps
[ https://issues.apache.org/jira/browse/YARN-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506571#comment-16506571 ] Yufei Gu commented on YARN-8406:
You can probably move your patch and test result to YARN-7467 since I've closed this one; it doesn't make sense to keep working here.

> Do the improvement to the FSLeafQueue about calculating fair share for apps
>
> Key: YARN-8406
> URL: https://issues.apache.org/jira/browse/YARN-8406
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Affects Versions: 3.1.0
> Reporter: zhuqi
> Priority: Critical
> Labels: patch
> Attachments: YARN-7467-001.patch, test.png
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I want to help with the improvement: FSLeafQueue unnecessarily calls ComputeFairShares.computeShare().
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505822#comment-16505822 ] Yufei Gu commented on YARN-8394:
Sounds good to me.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505726#comment-16505726 ] Yufei Gu commented on YARN-8394:
bq. This can be done by setting `yarn.scheduler.capacity.node-locality-delay` to `-1`
This should be done in code instead of letting the user do it by reading the doc. Sounds like another jira if it's not there already.
[jira] [Commented] (YARN-8406) Do the improvement to the FSLeafQueue about calculating fair share for apps
[ https://issues.apache.org/jira/browse/YARN-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504915#comment-16504915 ] Yufei Gu commented on YARN-8406:
[~zhuqi], there is no need to create this subtask. You can ask [~templedf] whether you can take YARN-7467.
[jira] [Resolved] (YARN-8406) Do the improvement to the FSLeafQueue about calculating fair share for apps
[ https://issues.apache.org/jira/browse/YARN-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu resolved YARN-8406.
Resolution: Duplicate
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503552#comment-16503552 ] Yufei Gu commented on YARN-8394:
Makes sense to me, assuming the cloud solution still uses CS/FS as the scheduler. I guess some simple setting to let containers run on any node would solve the issue. Besides, the trend is no YARN in cloud solutions, which makes the "delay logic" totally irrelevant.
[jira] [Commented] (YARN-8394) Improve data locality documentation for Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502479#comment-16502479 ] Yufei Gu commented on YARN-8394:
Hi [~cheersyang], thanks for filing this. Can you elaborate on this?
bq. we need to introduce how to compromise data locality in CS otherwise MR jobs are suffering.
[jira] [Commented] (YARN-5139) [Umbrella] Move YARN scheduler towards global scheduler
[ https://issues.apache.org/jira/browse/YARN-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502076#comment-16502076 ] Yufei Gu commented on YARN-5139:
[~zhuqi], you are welcome to contribute to Fair Scheduler. It's not a trivial effort to bring global scheduling to FS even with these jiras in, but I strongly believe it is the right direction. Let me know if you need any help.

> [Umbrella] Move YARN scheduler towards global scheduler
>
> Key: YARN-5139
> URL: https://issues.apache.org/jira/browse/YARN-5139
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Priority: Major
> Attachments: Explanantions of Global Scheduling (YARN-5139) Implementation.pdf, YARN-5139-Concurrent-scheduling-performance-report.pdf, YARN-5139-Global-Schedulingd-esign-and-implementation-notes-v2.pdf, YARN-5139-Global-Schedulingd-esign-and-implementation-notes.pdf, YARN-5139.000.patch, wip-1.YARN-5139.patch, wip-2.YARN-5139.patch, wip-3.YARN-5139.patch, wip-4.YARN-5139.patch, wip-5.YARN-5139.patch
>
> The existing YARN scheduler is based on node heartbeats. This can lead to sub-optimal decisions because the scheduler can only look at one node at a time when scheduling resources.
> Pseudo code of the existing scheduling logic looks like:
> {code}
> for node in allNodes:
>   Go to parentQueue
>     Go to leafQueue
>       for application in leafQueue.applications:
>         for resource-request in application.resource-requests
>           try to schedule on node
> {code}
> Considering future complex resource placement requirements, such as node constraints (give me "a && b || c") or anti-affinity (do not allocate HBase regionservers and Storm workers on the same host), we may need to consider moving the YARN scheduler towards global scheduling.