[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939147#comment-16939147 ] Jonathan Hung commented on YARN-9858: - Thanks, good point. I think instead we can just initialize this set in DefaultAMSProcessor / RMAppManager constructor since there's only one of each. Then we can avoid making changes to RMContext completely. I think it's best to avoid coupling RMContextImpl creation with setting exclusiveEnforcedPartitions, otherwise if another RMContextImpl is initialized in the future, it's not obvious this field needs to be set. I'll post a patch for this shortly. > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9858.001.patch > > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it . > Since AMS allocate invoked by multiple handlers locking on conf will occur > {code} > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
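The blocked stack trace above comes from Configuration.getProps() being synchronized, so every allocate() call serializes on the conf lock. The direction discussed in this thread — read the value once at construction time and hand out an immutable copy — can be sketched roughly as follows (class and field names here are illustrative, not the actual YARN-9858 patch):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: parse the comma-separated partition list once at
// construction time and keep an immutable copy, so the hot allocate()
// path never touches the synchronized Configuration object again.
class PartitionCache {
    private final Set<String> exclusiveEnforcedPartitions;

    PartitionCache(String rawConfValue) {
        Set<String> parsed = new HashSet<>();
        if (rawConfValue != null) {
            for (String p : rawConfValue.split(",")) {
                String trimmed = p.trim();
                if (!trimmed.isEmpty()) {
                    parsed.add(trimmed);
                }
            }
        }
        this.exclusiveEnforcedPartitions = Collections.unmodifiableSet(parsed);
    }

    Set<String> getExclusiveEnforcedPartitions() {
        // No conf access here; callers on the hot path see a cached set.
        return exclusiveEnforcedPartitions;
    }
}
```

Constructing this in DefaultAMSProcessor / RMAppManager (as suggested above) keeps the caching out of RMContext entirely.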
[jira] [Comment Edited] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939136#comment-16939136 ] Bibin Chundatt edited comment on YARN-9858 at 9/27/19 5:46 AM: --- [~jhung] The patch could cause *exclusiveEnforcedPartitions* to be set multiple times in case of concurrent execution. That's a possibility since it's invoked by multiple handlers. An alternative could be to set *exclusiveEnforcedPartitions* after the creation of RMContext at # ResourceManager#serviceInit # ResourceManager#resetRMContext All active services would still be in the NEW state when we set it, too. Thoughts? was (Author: bibinchundatt): [~jhung] Patch could cause *exclusiveEnforcedPartitions* getting set multiple times incase of concurrent execution. Its a possibility since its invoked by multiple handler. Alternative could be to set the *exclusiveEnforcedPartitions* after the creation of RMContext at # Resourcemanager#serviceInit # Resourcemanager#resetRMContext All the activeservices would be in stopped when we set it too.. Thoughts? > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9858.001.patch > > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it.
> Since AMS allocate invoked by multiple handlers locking on conf will occur > {code} > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939136#comment-16939136 ] Bibin Chundatt commented on YARN-9858: -- [~jhung] The patch could cause *exclusiveEnforcedPartitions* to be set multiple times in case of concurrent execution. That's a possibility since it's invoked by multiple handlers. An alternative could be to set *exclusiveEnforcedPartitions* after the creation of RMContext at # ResourceManager#serviceInit # ResourceManager#resetRMContext All the active services would be stopped when we set it, too. Thoughts? > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9858.001.patch > > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it. > Since AMS allocate invoked by multiple handlers locking on conf will occur > {code} > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > {code}
[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9859: Attachment: YARN-9859.002.patch > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch, YARN-9859.002.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
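The refactor described above — an abstract OpportunisticContainerAllocator with DistributedOpportunisticContainerAllocator as the concrete implementation — follows the usual abstract-base/strategy split. A minimal hedged sketch of that shape (class and method names below are made up for illustration, not taken from the patch):

```java
import java.util.List;

// Illustrative sketch of the YARN-9859 direction: shared allocation
// plumbing lives in an abstract base class, while the distributed-
// scheduling node-selection strategy moves into a subclass.
abstract class OpportunisticAllocatorBase {
    // Subclasses decide which nodes are candidates for O containers.
    abstract List<String> selectNodes(int count);
}

class DistributedAllocatorSketch extends OpportunisticAllocatorBase {
    private final List<String> knownNodes;

    DistributedAllocatorSketch(List<String> knownNodes) {
        this.knownNodes = knownNodes;
    }

    @Override
    List<String> selectNodes(int count) {
        // Distributed scheduling only sees a limited set of nodes.
        return knownNodes.subList(0, Math.min(count, knownNodes.size()));
    }
}
```

This keeps the base class reusable for the centralized allocator that YARN-9697 needs.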
[jira] [Commented] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939113#comment-16939113 ] Abhishek Modi commented on YARN-9859: - Thanks [~elgoiri] for review. Changed the title of the jira. {quote}we should tune the indentation for 237 and adding extra indents to the following lines of the constructor. {quote} I checked the indentation and it seems to be correct to me. Could you please check it again and let me know if I am missing something. Attached v2 patch with rest of the fixes. > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch, YARN-9859.002.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9859) Refactor OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9859: Summary: Refactor OpportunisticContainerAllocator (was: Code cleanup of OpportunisticContainerAllocator) > Refactor OpportunisticContainerAllocator > > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs
[ https://issues.apache.org/jira/browse/YARN-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939064#comment-16939064 ] Jonathan Hung commented on YARN-9751: - Upon more thought, this is technically an incompatible change. Dropping it from 2.10.0. > Separate queue and app ordering policy capacity scheduler configs > - > > Key: YARN-9751 > URL: https://issues.apache.org/jira/browse/YARN-9751 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9751.001.patch, YARN-9751.002.patch, > YARN-9751.003.patch > > > Right now it's not possible to specify distinct app and queue ordering > policies since they share the same {{ordering-policy}} suffix. > There's already a TODO in CapacitySchedulerConfiguration for this. This Jira > intends to fix it. > {noformat} > // TODO (wangda): We need to better distinguish app ordering policy and queue > // ordering policy's classname / configuration options, etc. And dedup code > // if possible.{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
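The collision this JIRA targets is that leaf-queue app ordering and parent-queue queue ordering both hang off the same {{.ordering-policy}} suffix in capacity-scheduler.xml. A rough illustration of the overloaded form (queue names are examples; the separated suffixes the patch would introduce are not shown here, since the change was dropped from 2.10.0):

```xml
<!-- Today the same suffix means "app ordering" on a leaf queue... -->
<property>
  <name>yarn.scheduler.capacity.root.default.ordering-policy</name>
  <value>fair</value>
</property>

<!-- ...and "queue ordering" on a parent queue. -->
<property>
  <name>yarn.scheduler.capacity.root.ordering-policy</name>
  <value>priority-utilization</value>
</property>
```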
[jira] [Updated] (YARN-9736) Recursively configure app ordering policies
[ https://issues.apache.org/jira/browse/YARN-9736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9736: Labels: (was: release-blocker) > Recursively configure app ordering policies > --- > > Key: YARN-9736 > URL: https://issues.apache.org/jira/browse/YARN-9736 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9736.001.patch > > > Currently app ordering policy will find confs with prefix > {{.ordering-policy}}. For queues with same ordering policy > configurations it's easier to have a queue inherit confs from its parent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9751) Separate queue and app ordering policy capacity scheduler configs
[ https://issues.apache.org/jira/browse/YARN-9751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9751: Labels: (was: release-blocker) > Separate queue and app ordering policy capacity scheduler configs > - > > Key: YARN-9751 > URL: https://issues.apache.org/jira/browse/YARN-9751 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Attachments: YARN-9751.001.patch, YARN-9751.002.patch, > YARN-9751.003.patch > > > Right now it's not possible to specify distinct app and queue ordering > policies since they share the same {{ordering-policy}} suffix. > There's already a TODO in CapacitySchedulerConfiguration for this. This Jira > intends to fix it. > {noformat} > // TODO (wangda): We need to better distinguish app ordering policy and queue > // ordering policy's classname / configuration options, etc. And dedup code > // if possible.{noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939051#comment-16939051 ] Jonathan Hung commented on YARN-9858: - Uploaded 001 patch which caches the value from conf. [~bibinchundatt] mind taking a look? Thanks! > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9858.001.patch > > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it . > Since AMS allocate invoked by multiple handlers locking on conf will occur > {code} > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9858: Attachment: YARN-9858.001.patch > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Attachments: YARN-9858.001.patch > > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it . > Since AMS allocate invoked by multiple handlers locking on conf will occur > {code} > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938988#comment-16938988 ] Peter Bacsko commented on YARN-9841: [~maniraj...@gmail.com] no problem, I'm fine with a separate JIRA. I haven't had the chance to examine the mapping behaviour, hopefully tomorrow I'll have some time. I also asked [~Prabhu Joseph] to verify it because he's knowledgeable about CS. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
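For readers following the mapping discussion: the supported and missing rule forms from the issue description look like this in capacity-scheduler.xml (queue names are illustrative):

```xml
<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- First rule is supported today: fixed parent queue, %primary_group leaf.
       Second rule is the nested form this JIRA adds: %primary_group parent,
       %user leaf. -->
  <value>u:%user:parentqueue.%primary_group,u:%user:%primary_group.%user</value>
</property>
```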
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938934#comment-16938934 ] Hadoop QA commented on YARN-9841: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 49s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 22s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 11 new + 23 unchanged - 0 fixed = 34 total (was 23) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 40s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}149m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerOvercommit | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 | | JIRA Issue | YARN-9841 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981457/YARN-9841.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 782cbe466612 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 06998a1 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24845/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit |
[jira] [Comment Edited] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938834#comment-16938834 ] Manikandan R edited comment on YARN-9841 at 9/26/19 5:29 PM: - Thanks [~pbacsko] for the review. Addressed all of your comments. Attached .002.patch. {quote}If we have this for {{%primary_group}}, can't we just handle {{%secondary_group}} as well? {quote} I initially thought about this, but then preferred to handle it in a separate Jira for ease of tracking and to avoid confusion with the description, discussions, etc. Hope you are fine with that. Also, have you had a chance to look at the observations raised earlier? We can track those issues in a separate JIRA. {quote}Can {{ctx}} ever be null? I assume this test should produce the same behavior each time, provided the code-under-test doesn't change. So to me it seems more logical to make sure that {{ctx}} is not null, which practically means a new assertion. Btw this applies to the piece of code above, too. {quote} Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in {{TestUserGroupMappingPlacementRule}}, as it is commonly used by various asserts where in some cases {{ctx}} is null. was (Author: maniraj...@gmail.com): Thanks [~pbacsko] for review. Addressed all of your comments. Attached .002.patch. {quote}If we have this for {{%primary_group}}, can't we just handle {{%secondary_group}} as well? {quote} Initially thought about this, but then preferred to take it in separate for ease of tracking and to avoid confusions with description etc. Hope you are fine. Also, Had a chance to look at observations raised earlier? We can track these issues in separate JIRA. {quote}Can {{ctx}} ever be null? I assume this test should produce the same behavior each time, provided the code-under-test doesn't change. So to me it seems more logical to make sure that {{ctx}} is not null, which practically means a new assertion. Btw this applies to the piece of code above, too. 
{quote} Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in {{TestUserGroupMappingPlacementRule}} as it is commonly by various asserts wherein some cases ctx is null. > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938834#comment-16938834 ] Manikandan R commented on YARN-9841: Thanks [~pbacsko] for the review. Addressed all of your comments. Attached .002.patch. {quote}If we have this for {{%primary_group}}, can't we just handle {{%secondary_group}} as well? {quote} I initially thought about this, but then preferred to handle it in a separate Jira for ease of tracking and to avoid confusion with the description, etc. Hope you are fine with that. Also, have you had a chance to look at the observations raised earlier? We can track those issues in a separate JIRA. {quote}Can {{ctx}} ever be null? I assume this test should produce the same behavior each time, provided the code-under-test doesn't change. So to me it seems more logical to make sure that {{ctx}} is not null, which practically means a new assertion. Btw this applies to the piece of code above, too. {quote} Made changes in {{TestCapacitySchedulerQueueMappingFactory}}, but not in {{TestUserGroupMappingPlacementRule}}, as it is commonly used by various asserts where in some cases {{ctx}} is null. 
> Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938831#comment-16938831 ] Íñigo Goiri commented on YARN-9859: --- I would rename the JIRA as Refactor more than cleanup. A minor comment in OpportunisticContainerAllocatorAMService, we should tune the indentation for 237 and adding extra indents to the following lines of the constructor. For OpportunisticContainerAllocator, we should change the javadoc. As we are moving DistributedOpportunisticContainerAllocator, would you mind doing a pass fixing indentation all over that class? > Code cleanup of OpportunisticContainerAllocator > --- > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikandan R updated YARN-9841: --- Attachment: YARN-9841.002.patch > Capacity scheduler: add support for combined %user + %primary_group mapping > --- > > Key: YARN-9841 > URL: https://issues.apache.org/jira/browse/YARN-9841 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler >Reporter: Peter Bacsko >Assignee: Manikandan R >Priority: Major > Attachments: YARN-9841.001.patch, YARN-9841.001.patch, > YARN-9841.002.patch, YARN-9841.junit.patch > > > Right now in CS, using {{%primary_group}} with a parent queue is only > possible this way: > {{u:%user:parentqueue.%primary_group}} > Looking at > https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java, > we cannot do something like: > {{u:%user:%primary_group.%user}} > Fair Scheduler supports a nested rule where such a placement/mapping rule is > possible. This improvement would reduce this feature gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938819#comment-16938819 ] Iñigo Goiri commented on YARN-9768: --- Thanks [~maniraj...@gmail.com] for [^YARN-9768.002.patch]. As we are using getTimeDuration(), the variables should also be time durations; I usually do: {code} public static final long DEFAULT_RM_DELEGATION_TOKEN_RENEWER_THREAD_RETRY_INTERVAL = TimeUnit.SECONDS.toMillis(60); {code} Regarding reading these variables, I prefer using the following indentation: {code} tokenRenewerThreadRetryInterval = conf.getTimeDuration( YarnConfiguration.RM_DELEGATION_TOKEN_RENEWER_THREAD_RETRY_INTERVAL, YarnConfiguration.DEFAULT_RM_DELEGATION_TOKEN_RENEWER_THREAD_RETRY_INTERVAL, TimeUnit.MILLISECONDS); {code} DelegationTokenRenewer#215 should be a single line. In DelegationTokenRenewer#227, you should do {{catch (TimeoutException toe)}} and then add an extra {{catch (Exception e)}}. I also think DelegationTokenRenewer#234 can be a lambda. Avoid DelegationTokenRenewer#442; it just adds churn in an unrelated patch. Same for #691 and #508. Why are we making DelegationTokenRenewer#551 a debug message? If we change that, let's also use logger style with {}. DelegationTokenRenewer#1107 should be a single line. Same for 1129 and 1047 at the end of the file. For TestDelegationTokenRenewer, let's also avoid changes like #169. #1567 should be a single line. Same for 1571 and 1573. I'm not a big fan of these long sleeps (#1591). You have a print in 1593, which could be done properly by adding a message to the assertTrue (which could use an extracted version of the conf). 
> RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Priority: Major > Attachments: YARN-9768.001.patch, YARN-9768.002.patch > > > The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews > HDFS tokens it receives to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router node (which has the same > APIs as the HDFS NN). If one of the nodes is bad and the renew call is stuck, the > thread remains stuck indefinitely. The thread should ideally time out the > renewToken call and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
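The timeout-and-retry behaviour requested in this issue can be sketched by running the potentially hanging renew call on a separate thread and bounding it with {{Future.get(timeout)}}. The class and method names below are illustrative only, not the actual DelegationTokenRenewer code:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Minimal sketch (hypothetical names) of bounding a renew call with a timeout
// and retrying a fixed number of times before giving up.
public class RenewWithTimeout {
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs, int retries)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 0; ; attempt++) {
                Future<T> f = pool.submit(call);
                try {
                    return f.get(timeoutMs, TimeUnit.MILLISECONDS);
                } catch (TimeoutException toe) {
                    f.cancel(true);        // interrupt the stuck call
                    if (attempt >= retries) {
                        throw toe;         // exhausted the configured retries
                    }
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that completes within the timeout returns its value normally.
        System.out.println(callWithTimeout(() -> 42L, 1000, 3));
    }
}
```

In the real patch the renew RPC would take the place of the {{Callable}}, with the timeout and retry interval read from configuration via getTimeDuration().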
[jira] [Updated] (YARN-9699) Migration tool that help to generate CS config based on FS config
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szilard Nemeth updated YARN-9699: - Summary: Migration tool that help to generate CS config based on FS config (was: Migration tool that help to generate CS configs based on FS) > Migration tool that help to generate CS config based on FS config > - > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Gergely Pollak >Priority: Major > Attachments: FS_to_CS_migration_POC.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass
[ https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938781#comment-16938781 ] Hudson commented on YARN-9857: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17395 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17395/]) YARN-9857. TestDelegationTokenRenewer throws NPE but tests pass. (ebadger: rev 18a8c2404e10f18e3a0024753d3f8f558fe604af) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java > TestDelegationTokenRenewer throws NPE but tests pass > > > Key: YARN-9857 > URL: https://issues.apache.org/jira/browse/YARN-9857 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9857.001.patch > > > {{TestDelegationTokenRenewer}} throws some NPEs: > {code:bash} > 2019-09-25 12:51:23,446 WARN [pool-19-thread-2] > security.DelegationTokenRenewer > (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to > add the application to the delegation token renewer. 
> java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans > (MBeans.java:unregister(138)) - Unregistering > Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics > Exception in thread "pool-19-thread-2" java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918) > 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans > (MBeans.java:unregister(138)) - Unregistering > Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-09-25 12:51:23,447 INFO [main] impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped. > {code} > the RMContext dispatcher is not set for the RMMock which results in NPE > accessing the event handler of the dispatcher inside > {{DelegationTokenRenewer}}. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
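The failure mode described above — the mocked RMContext never sets a dispatcher, so the renewer dereferences null when it fires an event — can be illustrated with a simplified, hypothetical sketch (these are not the real YARN classes):

```java
// Simplified illustration (hypothetical classes) of the NPE described above:
// the mock context leaves its dispatcher null, so firing an event blows up.
public class DispatcherNpeSketch {
    interface Dispatcher { void handle(String event); }

    static class MockContext {
        Dispatcher dispatcher;            // the old mock never set this -> null
    }

    static String fireAppRejectedEvent(MockContext ctx) {
        try {
            ctx.dispatcher.handle("APP_REJECTED"); // NPE when dispatcher is null
            return "dispatched";
        } catch (NullPointerException npe) {
            return "NPE";                 // what the test log showed
        }
    }

    public static void main(String[] args) {
        MockContext ctx = new MockContext();
        System.out.println(fireAppRejectedEvent(ctx));
        ctx.dispatcher = event -> { };    // the fix: wire a dispatcher into the mock
        System.out.println(fireAppRejectedEvent(ctx));
    }
}
```

The committed patch takes the second path: the test's mock RM wires a dispatcher into the context so the renewer's event handling no longer dereferences null.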
[jira] [Updated] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass
[ https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger updated YARN-9857: -- Fix Version/s: 3.3.0 > TestDelegationTokenRenewer throws NPE but tests pass > > > Key: YARN-9857 > URL: https://issues.apache.org/jira/browse/YARN-9857 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Fix For: 3.3.0 > > Attachments: YARN-9857.001.patch > > > {{TestDelegationTokenRenewer}} throws some NPEs: > {code:bash} > 2019-09-25 12:51:23,446 WARN [pool-19-thread-2] > security.DelegationTokenRenewer > (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to > add the application to the delegation token renewer. > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans > (MBeans.java:unregister(138)) - Unregistering > Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics > Exception in thread "pool-19-thread-2" java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918) > 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans > (MBeans.java:unregister(138)) - 
Unregistering > Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-09-25 12:51:23,447 INFO [main] impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped. > {code} > the RMContext dispatcher is not set for the RMMock which results in NPE > accessing the event handler of the dispatcher inside > {{DelegationTokenRenewer}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9857) TestDelegationTokenRenewer throws NPE but tests pass
[ https://issues.apache.org/jira/browse/YARN-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938766#comment-16938766 ] Eric Badger commented on YARN-9857: --- Couldn't recreate the failure initially because apparently there are 2 different TestDelegationTokenRenewer classes for some reason and I was running the wrong one. +1, lgtm. Thanks for the patch, [~ahussein], and for the review, [~Jim_Brennan]. I've committed this to trunk. > TestDelegationTokenRenewer throws NPE but tests pass > > > Key: YARN-9857 > URL: https://issues.apache.org/jira/browse/YARN-9857 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Attachments: YARN-9857.001.patch > > > {{TestDelegationTokenRenewer}} throws some NPEs: > {code:bash} > 2019-09-25 12:51:23,446 WARN [pool-19-thread-2] > security.DelegationTokenRenewer > (DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(945)) - Unable to > add the application to the delegation token renewer. 
> java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:942) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-09-25 12:51:23,446 DEBUG [main] util.MBeans > (MBeans.java:unregister(138)) - Unregistering > Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics > Exception in thread "pool-19-thread-2" java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:951) > at > org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:918) > 2019-09-25 12:51:23,447 DEBUG [main] util.MBeans > (MBeans.java:unregister(138)) - Unregistering > Hadoop:service=ResourceManager,name=MetricsSystem,sub=Stats > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2019-09-25 12:51:23,447 INFO [main] impl.MetricsSystemImpl > (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped. > {code} > the RMContext dispatcher is not set for the RMMock which results in NPE > accessing the event handler of the dispatcher inside > {{DelegationTokenRenewer}}. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9768) RM Renew Delegation token thread should timeout and retry
[ https://issues.apache.org/jira/browse/YARN-9768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938759#comment-16938759 ] Manikandan R commented on YARN-9768: [~inigoiri] [~crh] Can you review? > RM Renew Delegation token thread should timeout and retry > - > > Key: YARN-9768 > URL: https://issues.apache.org/jira/browse/YARN-9768 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: CR Hota >Priority: Major > Attachments: YARN-9768.001.patch, YARN-9768.002.patch > > > The delegation token renewer thread in the RM (DelegationTokenRenewer.java) renews > HDFS tokens it receives to check for validity and expiration time. > This call is made to an underlying HDFS NN or Router node (which has the same > APIs as the HDFS NN). If one of the nodes is bad and the renew call is stuck, the > thread remains stuck indefinitely. The thread should ideally time out the > renewToken call and retry from the client's perspective. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938683#comment-16938683 ] Szilard Nemeth edited comment on YARN-9699 at 9/26/19 2:41 PM: --- Hi [~pbacsko]! Looked into your PoC. Your approach looks good to me. Please start to resolve the TODOs and try to reach a state of completion that we can commit! Thanks! was (Author: snemeth): Hi [~pbacsko]! Looked into your PoC. Your approach looks good to me. Please start to resolve the TODOs and try to reach a state that we can commit! Thanks! > Migration tool that help to generate CS configs based on FS > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Gergely Pollak >Priority: Major > Attachments: FS_to_CS_migration_POC.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9699) Migration tool that help to generate CS configs based on FS
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938683#comment-16938683 ] Szilard Nemeth commented on YARN-9699: -- Hi [~pbacsko]! Looked into your PoC. Your approach looks good to me. Please start to resolve the TODOs and try to reach a state that we can commit! Thanks! > Migration tool that help to generate CS configs based on FS > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Gergely Pollak >Priority: Major > Attachments: FS_to_CS_migration_POC.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9860) Enable service mode for Docker containers on YARN
Prabhu Joseph created YARN-9860: --- Summary: Enable service mode for Docker containers on YARN Key: YARN-9860 URL: https://issues.apache.org/jira/browse/YARN-9860 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph
This task is to add support to YARN for running Docker containers in "Service Mode". Service Mode - Run the container as defined by the image, but still allow for injecting configuration.
Background:
- Entrypoint mode helped: we are now able to use the ENV and ENTRYPOINT/CMD as defined in the image. However, it still requires modification to official images due to user propagation.
- User propagation is problematic for running a secure cluster with sssd.
Implementation:
- Must be enabled via c-e.cfg (example: docker.service-mode.allowed=true)
- Must be requested at runtime (example: YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true)
- Entrypoint mode is enabled by default for this mode (if Service Mode is requested, YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE should be set to true)
- Writable log mount will not be added; stdout logging may still work with entrypoint mode - remove the writable bind mounts
- User and groups will not be propagated (now: docker run --user nobody --group-add=nobody, after: docker run)
- Read-only resources mounted at the file level; files get chmod 777, and the parent directory is only accessible by the run-as user.
cc [~shaneku...@gmail.com] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
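A sketch of how the two proposed switches might look as configuration; the key and variable names are taken from the proposal above and may change once the feature is implemented:

```properties
# container-executor.cfg (proposed key; cluster-side opt-in by the admin)
docker.service-mode.allowed=true

# Per-container request at submit time (proposed environment variables)
YARN_CONTAINER_RUNTIME_DOCKER_SERVICE_MODE=true
YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true
```

Both pieces are required: the admin allows service mode cluster-wide in c-e.cfg, and each container must still request it explicitly at runtime.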
[jira] [Commented] (YARN-9760) Support configuring application priorities on a workflow level
[ https://issues.apache.org/jira/browse/YARN-9760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938635#comment-16938635 ] Varun Saxena commented on YARN-9760: MAPREDUCE-7231 tracks hadoop-mapreduce-client-jobclient timeout. Related to TestPipeApplication. Going by MAPREDUCE-7036, the ASF license error may also be related. TestFairSchedulerPreemption failure is tracked by YARN-9333. Will fix some of the checkstyle issues post the review. > Support configuring application priorities on a workflow level > -- > > Key: YARN-9760 > URL: https://issues.apache.org/jira/browse/YARN-9760 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Jonathan Hung >Assignee: Varun Saxena >Priority: Major > Labels: release-blocker > Attachments: YARN-9760.01.patch, YARN-9760.02.patch > > > Currently priorities are submitted on an application level, but for end users > it's common to submit workloads to YARN at a workflow level. This jira > proposes a feature to store workflow id + priority mappings on RM (similar to > queue mappings). If app is submitted with a certain workflow id (as set in > application submission context) RM will override this app's priority with the > one defined in the mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
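The proposed override can be sketched as a simple lookup: the RM keeps a workflow-id-to-priority mapping and replaces an app's submitted priority when its submission context carries a mapped workflow id. The class below is a hypothetical illustration of the behaviour, not the RM's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed workflow-priority override behaviour.
public class WorkflowPrioritySketch {
    private final Map<String, Integer> workflowPriorities = new HashMap<>();

    void addMapping(String workflowId, int priority) {
        workflowPriorities.put(workflowId, priority);
    }

    /** Mapped priority wins if the workflow id is known; else the submitted one. */
    int effectivePriority(String workflowId, int submittedPriority) {
        return workflowPriorities.getOrDefault(workflowId, submittedPriority);
    }

    public static void main(String[] args) {
        WorkflowPrioritySketch rm = new WorkflowPrioritySketch();
        rm.addMapping("nightly-etl", 5);
        System.out.println(rm.effectivePriority("nightly-etl", 1)); // overridden to 5
        System.out.println(rm.effectivePriority("ad-hoc", 1));      // no mapping, stays 1
    }
}
```

This mirrors how queue mappings already work: the mapping lives in RM configuration, and the application only needs to tag its submission context with the workflow id.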
[jira] [Commented] (YARN-9699) Migration tool that help to generate CS configs based on FS
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938634#comment-16938634 ] Szilard Nemeth commented on YARN-9699: -- Thanks [~pbacsko] for summing this up. A couple of things to add: 1. All related newly filed jiras should be added either as a sub-jira or as a related jira of YARN-9698. 2. Another area where CS behaves differently is the uniqueness of leaf queues instead of queue paths. Will file a jira for this soon. 3. I also agree with your points about yarn-site.xml vs. capacity-scheduler.xml. 4. [~sunilg] also mentioned that Stage #1 should obviously contain mapping the queue hierarchies, and this should be the first priority. AFAIK, your tool can already do that and more on top as well. 5. We also talked about when to show warnings and when to stop the conversion, so we should define these criteria. Thanks! > Migration tool that help to generate CS configs based on FS > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Gergely Pollak >Priority: Major > Attachments: FS_to_CS_migration_POC.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938613#comment-16938613 ] Abhishek Modi commented on YARN-9859: - [~elgoiri] could you please review this whenever you get some time. Thanks. > Code cleanup of OpportunisticContainerAllocator > --- > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator as an > abstract class and DistributedOpportunisticContainerAllocator as actual > implementation. This would be prerequisite for YARN-9697. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938607#comment-16938607 ] Hadoop QA commented on YARN-9859: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 53s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 42s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 22s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 51s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: The patch generated 4 new + 30 unchanged - 2 fixed = 34 total (was 32) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 7s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 52s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 33s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 52s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}183m 34s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 | | JIRA Issue | YARN-9859 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981428/YARN-9859.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1bb30b051950 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh
[jira] [Commented] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService
[ https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938577#comment-16938577 ] Hadoop QA commented on YARN-9362: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 51s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 7 unchanged - 6 fixed = 7 total (was 13) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 25s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 77m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 | | JIRA Issue | YARN-9362 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981432/YARN-9362.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8bfcb939d540 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7b6219a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24843/testReport/ | | Max. process+thread count | 307 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24843/console | | Powered by | Apache Yetus 0.8.0
[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State
[ https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938550#comment-16938550 ] Hadoop QA commented on YARN-2442: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} YARN-2442 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-2442 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12913203/YARN-2442.02.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24844/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > ResourceManager JMX UI does not give HA State > - > > Key: YARN-2442 > URL: https://issues.apache.org/jira/browse/YARN-2442 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Nishan Shetty >Assignee: Rohith Sharma K S >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-2442.patch, YARN-2442.02.patch > > > ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, > STOPPED) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2442) ResourceManager JMX UI does not give HA State
[ https://issues.apache.org/jira/browse/YARN-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938545#comment-16938545 ] Cyrus Jackson commented on YARN-2442: - [~rohithsharma] if you are not actively working on this, could I create a patch for the latest trunk? Thanks. > ResourceManager JMX UI does not give HA State > - > > Key: YARN-2442 > URL: https://issues.apache.org/jira/browse/YARN-2442 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Nishan Shetty >Assignee: Rohith Sharma K S >Priority: Major > Labels: oct16-easy > Attachments: 0001-YARN-2442.patch, YARN-2442.02.patch > > > ResourceManager JMX UI can show the haState (INITIALIZING, ACTIVE, STANDBY, > STOPPED)
[jira] [Updated] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService
[ https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denes Gerencser updated YARN-9362: -- Attachment: YARN-9362.001.patch > Code cleanup in TestNMLeveldbStateStoreService > -- > > Key: YARN-9362 > URL: https://issues.apache.org/jira/browse/YARN-9362 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Denes Gerencser >Priority: Minor > Attachments: YARN-9362.001.patch > > > There are many ways to improve TestNMLeveldbStateStoreService: > 1. RecoveredContainerState fields are asserted many times repeatedly. Some > simple method extractions would definitely make this more readable. > 2. The tests are very long and hard to read in general: Again, finding how > methods could be extracted to avoid code repetition could help. > 3. You name it.
[jira] [Assigned] (YARN-9362) Code cleanup in TestNMLeveldbStateStoreService
[ https://issues.apache.org/jira/browse/YARN-9362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denes Gerencser reassigned YARN-9362: - Assignee: Denes Gerencser > Code cleanup in TestNMLeveldbStateStoreService > -- > > Key: YARN-9362 > URL: https://issues.apache.org/jira/browse/YARN-9362 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Szilard Nemeth >Assignee: Denes Gerencser >Priority: Minor > > There are many ways to improve TestNMLeveldbStateStoreService: > 1. RecoveredContainerState fields are asserted many times repeatedly. Some > simple method extractions would definitely make this more readable. > 2. The tests are very long and hard to read in general: Again, finding how > methods could be extracted to avoid code repetition could help. > 3. You name it.
[jira] [Updated] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
[ https://issues.apache.org/jira/browse/YARN-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Modi updated YARN-9859: Attachment: YARN-9859.001.patch > Code cleanup of OpportunisticContainerAllocator > --- > > Key: YARN-9859 > URL: https://issues.apache.org/jira/browse/YARN-9859 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Abhishek Modi >Assignee: Abhishek Modi >Priority: Major > Attachments: YARN-9859.001.patch > > > Right now OpportunisticContainerAllocator is written mainly for Distributed > Scheduling and schedules Opportunistic containers on a limited set of nodes. As > part of this jira, we are going to make OpportunisticContainerAllocator an > abstract class and DistributedOpportunisticContainerAllocator the actual > implementation. This would be a prerequisite for YARN-9697.
[jira] [Created] (YARN-9859) Code cleanup of OpportunisticContainerAllocator
Abhishek Modi created YARN-9859: --- Summary: Code cleanup of OpportunisticContainerAllocator Key: YARN-9859 URL: https://issues.apache.org/jira/browse/YARN-9859 Project: Hadoop YARN Issue Type: Sub-task Reporter: Abhishek Modi Assignee: Abhishek Modi Right now OpportunisticContainerAllocator is written mainly for Distributed Scheduling and schedules Opportunistic containers on a limited set of nodes. As part of this jira, we are going to make OpportunisticContainerAllocator an abstract class and DistributedOpportunisticContainerAllocator the actual implementation. This would be a prerequisite for YARN-9697.
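The refactoring described in YARN-9859 — extracting an abstract OpportunisticContainerAllocator base class with DistributedOpportunisticContainerAllocator as the concrete implementation — can be sketched roughly as follows. This is an illustrative outline only, not the actual patch; the method names and the round-robin placement shown here are simplified assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the shared contract moves into an abstract base class, so a future
// non-distributed allocator (YARN-9697) can plug in a different strategy.
abstract class OpportunisticContainerAllocator {
    // Subclasses decide *where* opportunistic containers are placed.
    abstract List<String> selectNodes(List<String> candidateNodes, int containers);
}

// The existing distributed-scheduling behavior becomes one concrete strategy:
// here, round-robin over the limited node set it is given (simplified).
class DistributedOpportunisticContainerAllocator
        extends OpportunisticContainerAllocator {
    @Override
    List<String> selectNodes(List<String> candidateNodes, int containers) {
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < containers && !candidateNodes.isEmpty(); i++) {
            picked.add(candidateNodes.get(i % candidateNodes.size()));
        }
        return picked;
    }
}
```

Callers keep programming against the abstract type, so swapping in another allocator implementation requires no changes at the call sites.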
[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin Chundatt updated YARN-9858: - Description: Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a hot code path and needs to be optimized. Since AMS#allocate is invoked by multiple handlers, lock contention on the conf will occur: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) - waiting to lock <0x7f1f8107c748> (a org.apache.hadoop.yarn.conf.YarnConfiguration) at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) {code} was:Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a hot code path, need to optimize it. > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path and needs to be optimized. > Since AMS#allocate is invoked by multiple handlers, lock contention on the conf > will occur: > {code} > java.lang.Thread.State: BLOCKED (on object monitor) > at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2841) > - waiting to lock <0x7f1f8107c748> (a > org.apache.hadoop.yarn.conf.YarnConfiguration) > at org.apache.hadoop.conf.Configuration.get(Configuration.java:1214) > at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1268) > {code}
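The stack trace above shows threads blocking on Configuration's internal lock because the property is re-read on every allocate call. The fix direction discussed in this issue — parse the property once at construction time and serve reads from an immutable field — can be sketched as follows. This is a minimal illustration, not the actual patch: the class, the property key, and the lookup function here are simplified stand-ins for DefaultAMSProcessor/RMAppManager and YarnConfiguration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

class AmsProcessor {
    // Parsed once; hot-path reads never touch the lock-guarded conf again.
    private final Set<String> exclusiveEnforcedPartitions;

    // 'confLookup' stands in for the expensive Configuration.getTrimmed(key)
    // call; it runs exactly once, in the constructor.
    AmsProcessor(Function<String, String> confLookup) {
        // Hypothetical property key for illustration only.
        String raw = confLookup.apply("yarn.scheduler.exclusive-enforced-partitions");
        Set<String> parsed = new HashSet<>();
        if (raw != null && !raw.isEmpty()) {
            parsed.addAll(Arrays.asList(raw.split(",")));
        }
        this.exclusiveEnforcedPartitions = Collections.unmodifiableSet(parsed);
    }

    // Hot path: no locking, no re-parsing, safe for concurrent handlers.
    Set<String> getExclusiveEnforcedPartitions() {
        return exclusiveEnforcedPartitions;
    }
}
```

Since the field is immutable after construction, concurrent allocate handlers can read it without any synchronization, which removes the BLOCKED states shown in the trace.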
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938379#comment-16938379 ] Hadoop QA commented on YARN-9841: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 59s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 38s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 29s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 11 new + 23 unchanged - 0 fixed = 34 total (was 23) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 8s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 85m 19s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}144m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:efed4450bf1 | | JIRA Issue | YARN-9841 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981398/YARN-9841.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 0852d266cdcc 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 587a8ee | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24841/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24841/testReport/ | | Max. process+thread count | 822 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
[jira] [Commented] (YARN-9841) Capacity scheduler: add support for combined %user + %primary_group mapping
[ https://issues.apache.org/jira/browse/YARN-9841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938360#comment-16938360 ] Hadoop QA commented on YARN-9841: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 30s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 8 new + 5 unchanged - 0 fixed = 13 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 12 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m 46s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 30s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}141m 18s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 | | JIRA Issue | YARN-9841 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981220/YARN-9841.junit.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 69f4f248f9ad 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 587a8ee | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24840/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/24840/artifact/out/whitespace-eol.txt | | Test Results |
[jira] [Assigned] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung reassigned YARN-9858: --- Assignee: Jonathan Hung > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it.
[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9858: Target Version/s: 2.10.0 > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Priority: Major > Labels: release-blocker > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it.
[jira] [Updated] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
[ https://issues.apache.org/jira/browse/YARN-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9858: Labels: release-blocker (was: ) > Optimize RMContext getExclusiveEnforcedPartitions > -- > > Key: YARN-9858 > URL: https://issues.apache.org/jira/browse/YARN-9858 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jonathan Hung >Priority: Major > Labels: release-blocker > > Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a > hot code path, need to optimize it.
[jira] [Created] (YARN-9858) Optimize RMContext getExclusiveEnforcedPartitions
Jonathan Hung created YARN-9858: --- Summary: Optimize RMContext getExclusiveEnforcedPartitions Key: YARN-9858 URL: https://issues.apache.org/jira/browse/YARN-9858 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Hung Follow-up from YARN-9730. RMContextImpl#getExclusiveEnforcedPartitions is a hot code path, need to optimize it.
[jira] [Commented] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label
[ https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938312#comment-16938312 ] Jonathan Hung commented on YARN-9730: - Sure. Thanks [~bibinchundatt] for the comment. I will address it in YARN-9858. > Support forcing configured partitions to be exclusive based on app node label > - > > Key: YARN-9730 > URL: https://issues.apache.org/jira/browse/YARN-9730 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, > YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, > YARN-9730.003.patch > > > Use case: queue X has all of its workload in non-default (exclusive) > partition P (by setting app submission context's node label set to P). Node > in partition Q != P heartbeats to RM. Capacity scheduler loops through every > application in X, and every scheduler key in this application, and fails to > allocate each time since the app's requested label and the node's label don't > match. This causes huge performance degradation when number of apps in X is > large. > To fix the issue, allow RM to configure partitions as "forced-exclusive". If > partition P is "forced-exclusive", then: > * 1a. If app sets its submission context's node label to P, all its resource > requests will be overridden to P > * 1b. If app sets its submission context's node label Q, any of its resource > requests whose labels are P will be overridden to Q > * 2. In the scheduler, we add apps with node label expression P to a > separate data structure. When a node in partition P heartbeats to scheduler, > we only try to schedule apps in this data structure. When a node in partition > Q heartbeats to scheduler, we schedule the rest of the apps as normal. 
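The override rules 1a/1b in the YARN-9730 description amount to: if the app's submission-context label targets a forced-exclusive partition P, every request is rewritten to P; otherwise, any request that names a forced-exclusive partition is rewritten back to the app's own label. A hedged sketch of just that decision (class and method names are illustrative, not the actual YARN-9730 code):

```java
import java.util.Set;

class LabelOverride {
    /**
     * Returns the node-label expression a resource request should effectively
     * use, given the app-level label from the submission context and the set
     * of forced-exclusive partitions. Mirrors rules 1a/1b above.
     */
    static String effectiveLabel(String appLabel, String requestLabel,
                                 Set<String> forcedExclusive) {
        // 1a: app submitted to forced-exclusive P -> all requests go to P.
        if (forcedExclusive.contains(appLabel)) {
            return appLabel;
        }
        // 1b: app label is Q, but this request names forced-exclusive P ->
        // override the request back to Q.
        if (forcedExclusive.contains(requestLabel)) {
            return appLabel;
        }
        // Otherwise the request's own label stands.
        return requestLabel;
    }
}
```

With requests normalized this way, the scheduler-side change (rule 2) can safely keep apps targeting P in a separate structure and only consider them on heartbeats from partition-P nodes.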
[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938306#comment-16938306 ] Peter Bacsko edited comment on YARN-9699 at 9/26/19 6:25 AM: - Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth]. A couple of things to add: * Development of this tool should happen in stages. The stage #1 converter will not support every FS feature/property, simply because it's missing in CS. Those which are missing should be added gradually (see YARN-9840 and YARN-9841 for example). What we have in the POC is already a good starting point. * Users should be able to define a "rule" file (simple property file), which imposes certain limits on various things (e.g. no more than 100 queues are allowed on the same level) and also defines what should happen if the tool encounters an unsupported feature. For example, CS does not support max running apps per user, so we can have the following settings: {noformat} maximumQueuesPerLevel=100 maxAppsPerUser=warning {noformat} In this case, "warning" means that the user will be warned that this particular setting is not supported in CS and won't be migrated. Another possible setting could be "error", which aborts the conversion immediately with an error message. * We also need strict validation of certain things: the sum of capacities is 100.0 (unless a capacity is defined as a mem/vcore pair) and no two leaf queues are allowed with the same name. * [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated and not {{yarn-site.xml}} (only the scheduler class should be changed in this file). Looking at the current settings and mappings, I believe this is not possible, because there are properties that should be placed in the {{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even if those properties can reside in {{capacity-scheduler.xml}} (someone must confirm this), their FS counterpart should be removed from the site config. 
* Output of the tool is most likely going to be file/files, but having stdout as an option is nice to have. Not sure if I missed something, feel free to correct me if I'm wrong. was (Author: pbacsko): Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth]. A couple of things to add: * Development of this tool should happen in stages. Stage #1 converter will not support every FS feature/property, simply because it's missing in CS. Those which are missing should be added gradually (see YARN-9840 and YARN-9841 for example). What we have in the POC is already a good starting point. * Users should be able to define a "rule" file (simple property file), which imposes certain limits on various things (eg. no more than 100 queue are allowed on the same level) and also defines what should happen if the tool encounters an unsupported feature. For example, CS does not support max running apps per user, so we can have the following settings: {noformat} maximumQueuesPerLevel=100 maxAppsPerUser=warning {noformat} In this case, "warning" means that the user will be warned that this particular setting is not supported in CS and won't be migrated. Another possible setting could be "error", which aborts the conversion immediately with an error message. * We also need strict validation of certain things: the sum of capacities are 100.0 (unless a capacity is defined in mem/vcore pair) and no two leaf queue is allowed with the same name. * [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated and not {{yarn-site.xml}} (only the scheduler class should be changed in this file). Looking at the current settings and mappings, I believe this is not possible, because there are properties that should be placed in the {{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even if those properties can reside in {{capacity-scheduler.xml}} (someone must confirm this), their FS counterpart should be removed from the site config. 
* Output of the tool is most likely going to be file/files, but having stdout as an option is preferred. Not sure if I missed something, feel free to correct me if I'm wrong. > Migration tool that help to generate CS configs based on FS > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Gergely Pollak >Priority: Major > Attachments: FS_to_CS_migration_POC.patch > >
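The "rule" file behavior discussed in the YARN-9699 comment above — a plain property file where an unsupported FS feature maps to either a warning (skip and continue) or an error (abort the conversion) — could be handled roughly like this. This is a sketch only; apart from the two keys quoted in the comment, the class and method names are hypothetical, not the actual converter.

```java
import java.util.Properties;

class ConversionRules {
    private final Properties rules;

    ConversionRules(Properties rules) {
        this.rules = rules;
    }

    /**
     * Applies the configured policy for an unsupported FS feature:
     * "warning" logs and continues, "error" aborts the conversion.
     * Returns true when conversion may continue.
     */
    boolean onUnsupportedFeature(String feature) {
        // Default to the lenient policy when the rule file says nothing.
        String policy = rules.getProperty(feature, "warning");
        if ("error".equals(policy)) {
            throw new IllegalStateException(
                "Unsupported feature " + feature + ": aborting conversion");
        }
        System.err.println("Warning: " + feature
            + " is not supported in CS and will not be migrated");
        return true;
    }
}
```

Numeric limits such as {{maximumQueuesPerLevel=100}} would be read from the same property file and checked during queue-tree validation, alongside the capacity-sum and duplicate-leaf-name checks mentioned above.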
[jira] [Commented] (YARN-9849) Leaf queues not inheriting parent queue status after adding status as “RUNNING” and thereafter, commenting the same in capacity-scheduler.xml
[ https://issues.apache.org/jira/browse/YARN-9849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938311#comment-16938311 ] Hadoop QA commented on YARN-9849: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 28s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 51s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 2s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}133m 41s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueState | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:efed4450bf1 | | JIRA Issue | YARN-9849 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12981391/YARN-9849.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 8a4b8b777bd4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 606e341 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_222 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/24839/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results |
[jira] [Comment Edited] (YARN-9699) Migration tool that help to generate CS configs based on FS
[ https://issues.apache.org/jira/browse/YARN-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938306#comment-16938306 ] Peter Bacsko edited comment on YARN-9699 at 9/26/19 6:24 AM: - Had a discussion with [~sunilg], [~Prabhu Joseph], [~snemeth]. A couple of things to add: * Development of this tool should happen in stages. The Stage #1 converter will not support every FS feature/property, simply because it's missing in CS. Those which are missing should be added gradually (see YARN-9840 and YARN-9841 for example). What we have in the POC is already a good starting point. * Users should be able to define a "rule" file (simple property file), which imposes certain limits on various things (e.g. no more than 100 queues are allowed on the same level) and also defines what should happen if the tool encounters an unsupported feature. For example, CS does not support max running apps per user, so we can have the following settings: {noformat} maximumQueuesPerLevel=100 maxAppsPerUser=warning {noformat} In this case, "warning" means that the user will be warned that this particular setting is not supported in CS and won't be migrated. Another possible setting could be "error", which aborts the conversion immediately with an error message. * We also need strict validation of certain things: the sum of capacities is 100.0 (unless a capacity is defined in a mem/vcore pair) and no two leaf queues are allowed with the same name. * [~sunilg]'s idea is that only {{capacity-scheduler.xml}} should be generated and not {{yarn-site.xml}} (only the scheduler class should be changed in this file). Looking at the current settings and mappings, I believe this is not possible, because there are properties that should be placed in {{yarn-site.xml}} - see the {{convertSiteProperties()}} method in the POC. Even if those properties can reside in {{capacity-scheduler.xml}} (someone must confirm this), their FS counterpart should be removed from the site config. 
* Output of the tool is most likely going to be a file or files, but having stdout as an option is preferred. Not sure if I missed something, feel free to correct me if I'm wrong. > Migration tool that help to generate CS configs based on FS > --- > > Key: YARN-9699 > URL: https://issues.apache.org/jira/browse/YARN-9699 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wanqiang Ji >Assignee: Gergely Pollak >Priority: Major > Attachments: FS_to_CS_migration_POC.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
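The rule-file behaviour described above (a plain property file whose values either set a numeric limit or decide whether an unsupported FS feature produces a warning or aborts the conversion) could be sketched roughly as follows. This is an illustrative sketch only; the class, method names, and the {{RuleAction}} enum are hypothetical, not the actual tool's API:

```java
import java.util.Map;

// Hypothetical sketch of the rule handling: a key in the rule file maps
// either to a numeric limit (e.g. maximumQueuesPerLevel=100) or to an
// action for an unsupported feature ("warning" skips with a log message,
// "error" aborts the conversion).
public class ConversionRules {

  public enum RuleAction { WARNING, ERROR }

  private final Map<String, String> rules;

  public ConversionRules(Map<String, String> rules) {
    this.rules = rules;
  }

  // Numeric limit lookup, e.g. maximumQueuesPerLevel=100.
  public int limit(String key, int defaultValue) {
    String v = rules.get(key);
    return v == null ? defaultValue : Integer.parseInt(v.trim());
  }

  // Action lookup, e.g. maxAppsPerUser=warning; defaults to WARNING.
  public RuleAction actionFor(String key) {
    String v = rules.getOrDefault(key, "warning").trim();
    return "error".equalsIgnoreCase(v) ? RuleAction.ERROR : RuleAction.WARNING;
  }

  // Called when the converter hits an FS feature with no CS equivalent.
  public void handleUnsupported(String feature) {
    if (actionFor(feature) == RuleAction.ERROR) {
      throw new IllegalStateException(
          "Unsupported FS feature '" + feature + "', aborting conversion");
    }
    System.err.println("WARNING: '" + feature
        + "' is not supported in CS and will not be migrated");
  }
}
```

With {{maxAppsPerUser=warning}} the converter would log and continue; switching the value to {{error}} would make the same call abort immediately, matching the two behaviours described in the comment.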
[jira] [Comment Edited] (YARN-9730) Support forcing configured partitions to be exclusive based on app node label
[ https://issues.apache.org/jira/browse/YARN-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16938302#comment-16938302 ] Bibin Chundatt edited comment on YARN-9730 at 9/26/19 6:00 AM: --- [~jhung] Thank you for working on this. Sorry to come in really late too.
{code}
if (ResourceRequest.ANY.equals(req.getResourceName())) {
  SchedulerUtils.enforcePartitionExclusivity(req,
      getRmContext().getExclusiveEnforcedPartitions(),
      asc.getNodeLabelExpression());
}
{code}
A configuration query on the AM allocation flow is going to be costly, which I observed while evaluating the performance. Could you optimize {{getRmContext().getExclusiveEnforcedPartitions()}}, since this is going to be invoked for every *request*? > Support forcing configured partitions to be exclusive based on app node label > - > > Key: YARN-9730 > URL: https://issues.apache.org/jira/browse/YARN-9730 > Project: Hadoop YARN > Issue Type: Task >Reporter: Jonathan Hung >Assignee: Jonathan Hung >Priority: Major > Labels: release-blocker > Fix For: 2.10.0, 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-9730-branch-2.001.patch, YARN-9730.001.addendum, > YARN-9730.001.patch, YARN-9730.002.addendum, YARN-9730.002.patch, > YARN-9730.003.patch > > > Use case: queue X has all of its workload in non-default (exclusive) > partition P (by setting app submission context's node label set to P). 
Node > in partition Q != P heartbeats to RM. Capacity scheduler loops through every > application in X, and every scheduler key in this application, and fails to > allocate each time since the app's requested label and the node's label don't > match. This causes huge performance degradation when the number of apps in X > is large. > To fix the issue, allow RM to configure partitions as "forced-exclusive". If > partition P is "forced-exclusive", then: > * 1a. If app sets its submission context's node label to P, all its resource > requests will be overridden to P > * 1b. If app sets its submission context's node label to Q, any of its resource > requests whose labels are P will be overridden to Q > * 2. In the scheduler, we add apps with node label expression P to a > separate data structure. When a node in partition P heartbeats to scheduler, > we only try to schedule apps in this data structure. When a node in partition > Q heartbeats to scheduler, we schedule the rest of the apps as normal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
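The optimization discussed in this thread amounts to reading the partition list once and caching an immutable set, so the hot allocate path never touches the synchronized {{Configuration.getProps()}} that the blocked-thread stack trace points at. A minimal sketch of the idea, assuming a plain string map stands in for {{YarnConfiguration}} (class and property names here are illustrative, not the actual Hadoop API; in the real fix the set is built once in the DefaultAMSProcessor / RMAppManager constructor):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: parse the exclusive-enforced partitions once at
// construction time and expose an immutable set, so concurrent allocate
// handlers read a plain field instead of contending on a lock.
public class PartitionCache {

  private final Set<String> exclusiveEnforcedPartitions;

  public PartitionCache(Map<String, String> conf) {
    String raw = conf.getOrDefault(
        "yarn.scheduler.exclusive-enforced-partitions", "");
    Set<String> parsed = new HashSet<>();
    for (String part : raw.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        parsed.add(trimmed);
      }
    }
    // Immutable view: safe to share across handler threads without locking.
    this.exclusiveEnforcedPartitions = Collections.unmodifiableSet(parsed);
  }

  // Hot path: no Configuration access, no synchronization.
  public Set<String> getExclusiveEnforcedPartitions() {
    return exclusiveEnforcedPartitions;
  }
}
```

Because the set is built once and never mutated, every per-request call to {{getExclusiveEnforcedPartitions()}} is a field read, which removes the monitor contention shown in the {{Configuration.getProps}} stack trace.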