[jira] [Updated] (YARN-10747) Bump YARN CSI protobuf version to 3.7.1
[ https://issues.apache.org/jira/browse/YARN-10747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated YARN-10747:
Fix Version/s: 3.3.3

Backported to branch-3.3.

> Bump YARN CSI protobuf version to 3.7.1
>
> Key: YARN-10747
> URL: https://issues.apache.org/jira/browse/YARN-10747
> Project: Hadoop YARN
> Issue Type: Task
> Reporter: Siyao Meng
> Assignee: Siyao Meng
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Bumping YARN CSI protobuf version to 3.7.1 to keep it consistent with Hadoop's protobuf version.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10547) Decouple job parsing logic from SLSRunner
[ https://issues.apache.org/jira/browse/YARN-10547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510008#comment-17510008 ]

Benjamin Teke commented on YARN-10547:

Thanks [~snemeth]. As the javac issues were already present, and these checkstyle issues are either HiddenField or VisibilityModifier ones, which don't make sense most of the time, +1 from my side.

> Decouple job parsing logic from SLSRunner
>
> Key: YARN-10547
> URL: https://issues.apache.org/jira/browse/YARN-10547
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Minor
> Attachments: YARN-10547.001.patch, YARN-10547.002.patch, YARN-10547.003.patch, YARN-10547.004.patch, YARN-10547.005.patch
>
> SLSRunner has too many responsibilities.
> One of them is to parse the job details from the SLS input formats and launch the AMs and task containers.
> As a first step, the job parser logic could be decoupled from this class.
> There are 3 types of inputs:
> - SLS trace
> - Synth
> - Rumen
> Their job parsing methods are:
> - SLS trace: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L479-L526
> - Synth: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L722-L790
> - Rumen: https://github.com/apache/hadoop/blob/005b854f6bad66defafae0abf95dabc6c36ca8b1/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java#L651-L716
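The decoupling described in YARN-10547 could look roughly like the following sketch, which moves parsing behind one interface with one implementation per input format. All type names here (`InputType`, `ParsedJob`, `JobParser`, `JobParserFactory`, `SimulationJobLoader`, and the three parser classes) are hypothetical and are not taken from the attached patches; the parsers are placeholders that only tag the input instead of reading real SLS, Synth, or Rumen files.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these types exist in hadoop-sls under these names.

/** The three SLS input formats listed in the issue. */
enum InputType { SLS, SYNTH, RUMEN }

/** Minimal stand-in for a parsed job; real jobs carry tasks, queue, timings, etc. */
final class ParsedJob {
  final InputType source;
  final String inputPath;

  ParsedJob(InputType source, String inputPath) {
    this.source = source;
    this.inputPath = inputPath;
  }
}

/** The single responsibility pulled out of the runner: input -> jobs. */
interface JobParser {
  List<ParsedJob> parse(String inputPath);
}

final class SlsTraceJobParser implements JobParser {
  @Override
  public List<ParsedJob> parse(String inputPath) {
    // Placeholder: a real parser would read the SLS JSON trace here.
    return Collections.singletonList(new ParsedJob(InputType.SLS, inputPath));
  }
}

final class SynthJobParser implements JobParser {
  @Override
  public List<ParsedJob> parse(String inputPath) {
    // Placeholder: a real parser would read the synthetic workload spec here.
    return Collections.singletonList(new ParsedJob(InputType.SYNTH, inputPath));
  }
}

final class RumenJobParser implements JobParser {
  @Override
  public List<ParsedJob> parse(String inputPath) {
    // Placeholder: a real parser would read the Rumen job trace here.
    return Collections.singletonList(new ParsedJob(InputType.RUMEN, inputPath));
  }
}

/** Picks the parser for an input type; adding a format means adding one case. */
final class JobParserFactory {
  private JobParserFactory() {
  }

  static JobParser forType(InputType type) {
    switch (type) {
      case SLS:
        return new SlsTraceJobParser();
      case SYNTH:
        return new SynthJobParser();
      case RUMEN:
        return new RumenJobParser();
      default:
        throw new IllegalArgumentException("Unknown input type: " + type);
    }
  }
}

/** The runner side only asks for jobs; AM and container launching stay here. */
final class SimulationJobLoader {
  List<ParsedJob> loadJobs(InputType type, String inputPath) {
    return JobParserFactory.forType(type).parse(inputPath);
  }
}
```

With this split, the runner no longer needs to know how each format is read, and each parser can be unit-tested without starting a simulation.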
[jira] [Commented] (YARN-11087) Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
[ https://issues.apache.org/jira/browse/YARN-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509970#comment-17509970 ]

András Győri commented on YARN-11087:

[~zuston] From the perspective of scalability, I think the first approach is better (and it is also easier to implement).

> Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
>
> Key: YARN-11087
> URL: https://issues.apache.org/jira/browse/YARN-11087
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> h3. Why
> When the nodes-to-labels mapping is configured in Delegated-Centralized mode, a newly registered node does not get its node label until the node-label mapping provider is next triggered, so the length of the delay depends on the scheduling interval.
> h3. How to solve this bug
> I think there are two options:
> # Introduce a new config to specify the update-node-label schedule interval. Users who want newly registered nodes to be refreshed quickly can decrease the interval.
> # When a newly registered node arrives, directly trigger the node-label mapping provider. However, if the provider is a time-consuming operation and many nodes register with the RM at the same time, some nodes will still get their node labels late.
> I prefer the first option and am submitting a PR to solve this.
> Feel free to discuss if you have any ideas.
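A minimal sketch of the first option, assuming a plain `Map` stands in for Hadoop's `Configuration` and that the label-update work for newly registered nodes is passed in as a `Runnable`. The key `example.node-labels.newly-registered-nodes.update-interval-ms`, the 30-second default, and the class name are hypothetical placeholders, not the names introduced by the pull request.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of option 1. The key, default, and class name are
// illustrative only; a Map stands in for Hadoop's Configuration.
final class NewNodeLabelRefresher {
  static final String INTERVAL_KEY =
      "example.node-labels.newly-registered-nodes.update-interval-ms";
  static final long DEFAULT_INTERVAL_MS = 30_000L;

  private final long intervalMs;
  private final Runnable updateNewlyRegisteredNodes;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "new-node-label-refresher");
        t.setDaemon(true); // do not keep the JVM alive just for this sketch
        return t;
      });

  NewNodeLabelRefresher(Map<String, String> conf, Runnable updateNewlyRegisteredNodes) {
    this.intervalMs = readInterval(conf);
    this.updateNewlyRegisteredNodes = updateNewlyRegisteredNodes;
  }

  /** Reads the interval, falling back to the default and rejecting non-positive values. */
  static long readInterval(Map<String, String> conf) {
    String raw = conf.get(INTERVAL_KEY);
    if (raw == null) {
      return DEFAULT_INTERVAL_MS;
    }
    long value = Long.parseLong(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(INTERVAL_KEY + " must be positive, got " + value);
    }
    return value;
  }

  long intervalMs() {
    return intervalMs;
  }

  /** Waits intervalMs between the end of one update run and the start of the next. */
  void start() {
    scheduler.scheduleWithFixedDelay(
        updateNewlyRegisteredNodes, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

A smaller interval shortens the window in which a newly registered node has no label, at the cost of invoking the mapping provider more often; using a fixed delay rather than a fixed rate keeps slow provider runs from piling up.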
[jira] [Updated] (YARN-10565) Follow-up to YARN-10504
[ https://issues.apache.org/jira/browse/YARN-10565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-10565:
Summary: Follow-up to YARN-10504 (was: Refactor CS queue initialization to simplify weight mode calculation)

> Follow-up to YARN-10504
>
> Key: YARN-10565
> URL: https://issues.apache.org/jira/browse/YARN-10565
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In YARN-10504 weight mode support was introduced to CS. This jira is a follow-up to simplify and restructure the initialization, so that the weight calculation/absolute/percentage mode is easier to understand and modify.
> To be refactored:
> * In ParentQueue.java#1099 the error message should be more specific, instead of the {{LOG.error("Fatal issue found: e", e);}}
> * -AutoCreatedLeafQueue.clearConfigurableFields should clear NORMALIZED_WEIGHT just to be on the safe side-
> * -Uncomment the commented assertions in TestCapacitySchedulerAutoCreatedQueueBase.validateEffectiveMinResource-
> * -Check whether the assertion modification in TestRMWebServices is absolutely necessary or could be hiding a bug.-
> * -Same for TestRMWebServicesForCSWithPartitions.java-
> Additional information:
> The original flow was modified to allow the dynamic weight-capacity calculation.
> This resulted in a new flow, which is now harder to understand.
> With a cleanup it could be made simpler, and the duplicate calculations could be avoided.
> The changed functionality should either be explained (if deemed correct) or fixed (see YARN-10590).
> Investigate how the CS reinit works; it could contain some possibly redundant initialization code fragments.
> Note: Since most of the items were completed in other refactor items, only the first one is being patched here.
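The problem with `LOG.error("Fatal issue found: e", e)` is that the `e` inside the string literal is printed verbatim: the exception passed as the last argument is still logged as the throwable, but the message text itself says nothing about what failed. A sketch of a more specific message follows. It uses `java.util.logging` only so that it runs without Hadoop on the classpath, and the queue path and the description of the failing step are assumed context, since the code around ParentQueue.java#1099 is not shown here.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: "queuePath" and "step" are assumed context, not values
// taken from ParentQueue. java.util.logging stands in for Hadoop's logger.
final class QueueErrorLogging {
  private static final Logger LOG =
      Logger.getLogger(QueueErrorLogging.class.getName());

  private QueueErrorLogging() {
  }

  /** Names the queue and the failing step, plus the exception's own message. */
  static String fatalIssueMessage(String queuePath, String step, Exception e) {
    return String.format("Fatal issue found while %s for queue %s: %s",
        step, queuePath, e.getMessage());
  }

  static void logFatalIssue(String queuePath, String step, Exception e) {
    // The exception is still passed separately so the stack trace is kept.
    LOG.log(Level.SEVERE, fatalIssueMessage(queuePath, step, e), e);
  }
}
```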
[jira] [Commented] (YARN-11087) Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
[ https://issues.apache.org/jira/browse/YARN-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509903#comment-17509903 ]

Junfan Zhang commented on YARN-11087:

What do you think of the second option? [~snemeth]

> Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
>
> Key: YARN-11087
> URL: https://issues.apache.org/jira/browse/YARN-11087
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> h3. Why
> When the nodes-to-labels mapping is configured in Delegated-Centralized mode, a newly registered node does not get its node label until the node-label mapping provider is next triggered, so the length of the delay depends on the scheduling interval.
> h3. How to solve this bug
> I think there are two options:
> # Introduce a new config to specify the update-node-label schedule interval. Users who want newly registered nodes to be refreshed quickly can decrease the interval.
> # When a newly registered node arrives, directly trigger the node-label mapping provider. However, if the provider is a time-consuming operation and many nodes register with the RM at the same time, some nodes will still get their node labels late.
> I prefer the first option and am submitting a PR to solve this.
> Feel free to discuss if you have any ideas.
[jira] [Comment Edited] (YARN-11087) Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
[ https://issues.apache.org/jira/browse/YARN-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509903#comment-17509903 ]

Junfan Zhang edited comment on YARN-11087 at 3/21/22, 1:59 PM:

What do you think of the second option? [~quapaw]

was (Author: zuston):
What do you think of the second option? [~snemeth]

> Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
>
> Key: YARN-11087
> URL: https://issues.apache.org/jira/browse/YARN-11087
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> h3. Why
> When the nodes-to-labels mapping is configured in Delegated-Centralized mode, a newly registered node does not get its node label until the node-label mapping provider is next triggered, so the length of the delay depends on the scheduling interval.
> h3. How to solve this bug
> I think there are two options:
> # Introduce a new config to specify the update-node-label schedule interval. Users who want newly registered nodes to be refreshed quickly can decrease the interval.
> # When a newly registered node arrives, directly trigger the node-label mapping provider. However, if the provider is a time-consuming operation and many nodes register with the RM at the same time, some nodes will still get their node labels late.
> I prefer the first option and am submitting a PR to solve this.
> Feel free to discuss if you have any ideas.
[jira] [Commented] (YARN-11088) Introduce the config to control the AM allocated to non-exclusive nodes
[ https://issues.apache.org/jira/browse/YARN-11088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509902#comment-17509902 ]

Junfan Zhang commented on YARN-11088:

I will submit a PR tomorrow; the change has already been applied in our internal YARN. Glad to contribute to the community. [~quapaw]

> Introduce the config to control the AM allocated to non-exclusive nodes
>
> Key: YARN-11088
> URL: https://issues.apache.org/jira/browse/YARN-11088
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
>
> h4. Why
> Currently, YARN's AM allocation on non-exclusive nodes simply fails fast. I understand this aims to keep jobs stable, because containers on non-exclusive nodes may be preempted.
> However, our internal YARN cluster contains both on-premise NodeManagers and elastic NodeManagers (built on K8s). When all the elastic NodeManagers are decommissioned, we would like the AM to be schedulable on non-exclusive nodes.
> h4. How to support it
> Introduce a new config that controls whether the AM can be allocated to non-exclusive nodes.
> *Feel free to discuss if you have any ideas!*
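The proposed switch can be illustrated with a simplified placement check, assuming the usual node-label semantics: a request for the default partition may borrow idle resources of a non-exclusive partition, while an exclusive partition only serves requests that name it. The class, method, and partition names are hypothetical, and the sketch ignores everything else a scheduler weighs (resources, locality, preemption).

```java
// Hypothetical sketch of the YARN-11088 idea; names are illustrative only.
final class AmPlacementPolicy {
  /** The default (unlabeled) partition is the empty string in this sketch. */
  static final String DEFAULT_PARTITION = "";

  // Value of the proposed config: may AMs use non-exclusive nodes?
  private final boolean allowAmOnNonExclusiveNodes;

  AmPlacementPolicy(boolean allowAmOnNonExclusiveNodes) {
    this.allowAmOnNonExclusiveNodes = allowAmOnNonExclusiveNodes;
  }

  boolean canPlace(String requestedPartition, String nodePartition,
      boolean nodePartitionExclusive, boolean isAmContainer) {
    if (requestedPartition.equals(nodePartition)) {
      return true; // exact partition match is always fine
    }
    // Only default-partition requests may borrow a non-exclusive partition;
    // exclusive partitions are never shared.
    if (!requestedPartition.equals(DEFAULT_PARTITION) || nodePartitionExclusive) {
      return false;
    }
    // As described in the issue, AMs are rejected here today;
    // the proposed config relaxes that for AM containers.
    return !isAmContainer || allowAmOnNonExclusiveNodes;
  }
}
```

Keeping the default at `false` would preserve the current fail-fast behaviour, so only clusters that opt in (such as the mixed on-premise/elastic setup described above) would see AMs on non-exclusive nodes.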
[jira] [Commented] (YARN-11088) Introduce the config to control the AM allocated to non-exclusive nodes
[ https://issues.apache.org/jira/browse/YARN-11088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509888#comment-17509888 ]

András Győri commented on YARN-11088:

[~zuston] This could be a handy feature. I could help if you submit a PR with the change.

> Introduce the config to control the AM allocated to non-exclusive nodes
>
> Key: YARN-11088
> URL: https://issues.apache.org/jira/browse/YARN-11088
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
>
> h4. Why
> Currently, YARN's AM allocation on non-exclusive nodes simply fails fast. I understand this aims to keep jobs stable, because containers on non-exclusive nodes may be preempted.
> However, our internal YARN cluster contains both on-premise NodeManagers and elastic NodeManagers (built on K8s). When all the elastic NodeManagers are decommissioned, we would like the AM to be schedulable on non-exclusive nodes.
> h4. How to support it
> Introduce a new config that controls whether the AM can be allocated to non-exclusive nodes.
> *Feel free to discuss if you have any ideas!*
[jira] [Resolved] (YARN-11086) Add space in debug log of ParentQueue
[ https://issues.apache.org/jira/browse/YARN-11086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-11086.
Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Add space in debug log of ParentQueue
>
> Key: YARN-11086
> URL: https://issues.apache.org/jira/browse/YARN-11086
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Junfan Zhang
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Updated] (YARN-11089) Fix typo in RM audit log
[ https://issues.apache.org/jira/browse/YARN-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-11089:
Summary: Fix typo in RM audit log (was: Fix typo in rm audit log)

> Fix typo in RM audit log
>
> Key: YARN-11089
> URL: https://issues.apache.org/jira/browse/YARN-11089
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Assigned] (YARN-11084) Introduce new config to specify AM default node-label when not specified
[ https://issues.apache.org/jira/browse/YARN-11084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-11084:
Assignee: Junfan Zhang

> Introduce new config to specify AM default node-label when not specified
>
> Key: YARN-11084
> URL: https://issues.apache.org/jira/browse/YARN-11084
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> h2. What
> When an application is submitted to YARN and the user does not specify any node label on the AM request or in the {{ApplicationSubmissionContext}}, we hope that YARN can provide a default AM node label.
>
> h2. Why
> Our internal YARN cluster contains both on-premise NodeManagers and elastic NodeManagers (built on K8s). To prevent application instability caused by elastic NM decommissioning, we hope that the AMs of jobs can be allocated to on-premise NMs.
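The requested fallback order can be sketched as a small resolver: use the AM request's label if set, otherwise the label from the `ApplicationSubmissionContext`, otherwise the newly configured default. This is a hypothetical illustration; the class name, the treatment of `null` as "not specified", and where the new default sits in the chain are assumptions, and the sketch ignores the Capacity Scheduler's per-queue default node-label expression.

```java
import java.util.Optional;

// Hypothetical sketch of the fallback order proposed in YARN-11084.
final class AmNodeLabelResolver {
  // Value of the proposed config; null when the config is not set.
  private final String configuredAmDefaultLabel;

  AmNodeLabelResolver(String configuredAmDefaultLabel) {
    this.configuredAmDefaultLabel = configuredAmDefaultLabel;
  }

  /**
   * Returns the label the AM should request, or empty to keep the existing
   * behaviour. In this sketch only null counts as "not specified".
   */
  Optional<String> resolve(String amRequestLabel, String submissionContextLabel) {
    if (amRequestLabel != null) {
      return Optional.of(amRequestLabel); // explicit AM request label wins
    }
    if (submissionContextLabel != null) {
      return Optional.of(submissionContextLabel); // then the app-level label
    }
    return Optional.ofNullable(configuredAmDefaultLabel); // then the new default
  }
}
```

For the cluster described above, the configured default would name the partition holding the on-premise NodeManagers, so AMs land there unless the user asks otherwise.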
[jira] [Assigned] (YARN-11087) Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
[ https://issues.apache.org/jira/browse/YARN-11087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-11087:
Assignee: Junfan Zhang

> Introduce the config to control the refresh interval in RMDelegatedNodeLabelsUpdater
>
> Key: YARN-11087
> URL: https://issues.apache.org/jira/browse/YARN-11087
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> h3. Why
> When the nodes-to-labels mapping is configured in Delegated-Centralized mode, a newly registered node does not get its node label until the node-label mapping provider is next triggered, so the length of the delay depends on the scheduling interval.
> h3. How to solve this bug
> I think there are two options:
> # Introduce a new config to specify the update-node-label schedule interval. Users who want newly registered nodes to be refreshed quickly can decrease the interval.
> # When a newly registered node arrives, directly trigger the node-label mapping provider. However, if the provider is a time-consuming operation and many nodes register with the RM at the same time, some nodes will still get their node labels late.
> I prefer the first option and am submitting a PR to solve this.
> Feel free to discuss if you have any ideas.
[jira] [Assigned] (YARN-11088) Introduce the config to control the AM allocated to non-exclusive nodes
[ https://issues.apache.org/jira/browse/YARN-11088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-11088:
Assignee: Junfan Zhang

> Introduce the config to control the AM allocated to non-exclusive nodes
>
> Key: YARN-11088
> URL: https://issues.apache.org/jira/browse/YARN-11088
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
>
> h4. Why
> Currently, YARN's AM allocation on non-exclusive nodes simply fails fast. I understand this aims to keep jobs stable, because containers on non-exclusive nodes may be preempted.
> However, our internal YARN cluster contains both on-premise NodeManagers and elastic NodeManagers (built on K8s). When all the elastic NodeManagers are decommissioned, we would like the AM to be schedulable on non-exclusive nodes.
> h4. How to support it
> Introduce a new config that controls whether the AM can be allocated to non-exclusive nodes.
> *Feel free to discuss if you have any ideas!*
[jira] [Updated] (YARN-11089) Fix typo in rm audit log
[ https://issues.apache.org/jira/browse/YARN-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-11089:
Fix Version/s: 3.4.0

> Fix typo in rm audit log
>
> Key: YARN-11089
> URL: https://issues.apache.org/jira/browse/YARN-11089
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Resolved] (YARN-11089) Fix typo in rm audit log
[ https://issues.apache.org/jira/browse/YARN-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth resolved YARN-11089.
Hadoop Flags: Reviewed
Resolution: Fixed

> Fix typo in rm audit log
>
> Key: YARN-11089
> URL: https://issues.apache.org/jira/browse/YARN-11089
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
[jira] [Assigned] (YARN-11089) Fix typo in rm audit log
[ https://issues.apache.org/jira/browse/YARN-11089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-11089:
Assignee: Junfan Zhang

> Fix typo in rm audit log
>
> Key: YARN-11089
> URL: https://issues.apache.org/jira/browse/YARN-11089
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Junfan Zhang
> Assignee: Junfan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Time Spent: 40m
> Remaining Estimate: 0h