[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304629#comment-17304629 ]

Bilwa S T commented on YARN-10697:
--

Thanks [~Jim_Brennan] [~jhung] for your comments. I added the changes in Resource#toString so that the output is easier for users to read. I agree it's not correct to add them there, as the method is called from many other places. Can we instead introduce a new method in Resource.java that prints memory in MB|GB|TB?

> Resources are displayed in bytes in UI for schedulers other than capacity
> -
>
> Key: YARN-10697
> URL: https://issues.apache.org/jira/browse/YARN-10697
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-10697.001.patch, image-2021-03-17-11-30-57-216.png
>
>
> Resources.newInstance expects memory in MB, whereas MetricsOverviewTable
> passes resources in bytes. Also, we should display memory in GB for better
> readability for users.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
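The separate formatting method proposed in the comment above might look roughly like the following. This is only a sketch under stated assumptions: the class `MemoryFormat` and the method `formatMemoryMb` are hypothetical names, not part of Resource.java or of the attached patch, the input is assumed to be in MB, and each unit step is assumed to be 1024.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: formats a memory size given in MB.
final class MemoryFormat {
    private static final String[] UNITS = {"MB", "GB", "TB"};

    private MemoryFormat() {
    }

    // Assumes 1 GB = 1024 MB and 1 TB = 1024 GB. Values above the TB range
    // stay in TB; negative input is not treated specially.
    static String formatMemoryMb(long memoryMb) {
        double value = memoryMb;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.2f %s", value, UNITS[unit]);
    }
}
```

Keeping this in its own method leaves Resource#toString untouched, so existing callers and any log parsing that depends on the current output would not be affected.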
[jira] [Created] (YARN-10704) The CS effective capacity for absolute mode in UI should support GPU.
Qi Zhu created YARN-10704:
-

Summary: The CS effective capacity for absolute mode in UI should support GPU.
Key: YARN-10704
URL: https://issues.apache.org/jira/browse/YARN-10704
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacity scheduler
Reporter: Qi Zhu
Assignee: Qi Zhu
Attachments: image-2021-03-19-12-05-28-412.png, image-2021-03-19-12-08-35-273.png

Currently the UI shows no information about the effective GPU capacity for absolute resource mode.

!image-2021-03-19-12-05-28-412.png|width=873,height=136!

But we have this information in QueueMetrics:

!image-2021-03-19-12-08-35-273.png|width=613,height=268!

This is very important for our GPU users who use absolute mode, yet the CS Queue UI still gives them no way to see the absolute GPU information.
[jira] [Commented] (YARN-10616) Nodemanagers cannot detect GPU failures
[ https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304596#comment-17304596 ]

Qi Zhu commented on YARN-10616:
---

Thanks [~ebadger] for clarifying. It makes sense to me now. If we implement that, then when -updateNodeResource is used we can check whether a node's original resources were changed by the NM-RM heartbeat, using a cached value or a flag, and if they were, return that node's key information to the client. We could also add the unhealthy nodes whose GPU resources were reduced to the UI and metrics so that we know about them, without affecting scheduling. Thanks.

> Nodemanagers cannot detect GPU failures
> ---
>
> Key: YARN-10616
> URL: https://issues.apache.org/jira/browse/YARN-10616
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
>
>
> As stated above, the bug is that GPUs can fail, but the NM doesn't notice the
> failure. The NM will continue to schedule tasks onto the failed GPU, but the
> GPU won't actually work and so the container will likely fail or run very
> slowly on the CPU.
> My initial thought on solving this is to add NM resource capabilities to the
> NM-RM heartbeat and have the RM update its view of the NM's resource
> capabilities on each heartbeat. This would be a fairly trivial change, but
> comes with the unfortunate side effect that it completely undermines
> {{yarn rmadmin -updateNodeResource}}. When you run {{-updateNodeResource}} the
> assumption is that the node will retain these new resource capabilities until
> either the NM or RM is restarted. But with a heartbeat interaction constantly
> updating those resource capabilities from the NM perspective, the explicit
> changes via {{-updateNodeResource}} would be lost on the next heartbeat. We
> could potentially add a flag to ignore the heartbeat updates for any node that
> has had {{-updateNodeResource}} called on it (until a re-registration). But
> in this case, the node would no longer get resource capability updates until
> the NM or RM restarted. If {{-updateNodeResource}} is used a decent amount,
> then that would give potentially unexpected behavior in relation to nodes
> properly auto-detecting failures.
> Another idea is to add a GPU monitor thread on the NM to periodically run
> {{nvidia-smi}} and detect changes in the number of healthy GPUs. If that
> number decreased, the node would hook into the health check status and mark
> itself as unhealthy. The downside of this approach is that a single failed
> GPU would mean taking out an entire node (e.g. 8 GPUs).
> I would really like to go with the NM-RM heartbeat approach, but the
> {{-updateNodeResource}} issue bothers me. The second approach is ok I guess,
> but I also don't like taking down whole GPU nodes when only a single GPU is
> bad. Would like to hear thoughts of others on how best to approach this
> [~jhung], [~leftnoteasy], [~sunilg], [~epayne], [~Jim_Brennan]
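The interaction between heartbeat updates and {{-updateNodeResource}} described above can be modelled in a few lines. The sketch below is purely illustrative: `NodeCapabilityTracker` and all of its methods are hypothetical names that do not exist in YARN, and GPU count stands in for the full resource capability. It implements the "ignore heartbeat updates until re-registration" flag mentioned in the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical RM-side bookkeeping; none of these names exist in YARN.
final class NodeCapabilityTracker {
    private final Map<String, Integer> gpusByNode = new HashMap<>();
    private final Set<String> adminOverridden = new HashSet<>();

    // NM (re-)registration: clears any admin override for the node.
    void register(String nodeId, int reportedGpus) {
        adminOverridden.remove(nodeId);
        gpusByNode.put(nodeId, reportedGpus);
    }

    // Models the effect of -updateNodeResource: the value sticks until
    // the node re-registers.
    void adminUpdate(String nodeId, int gpus) {
        adminOverridden.add(nodeId);
        gpusByNode.put(nodeId, gpus);
    }

    // Heartbeat from the NM. Returns true if the RM's view changed.
    boolean heartbeat(String nodeId, int reportedGpus) {
        if (adminOverridden.contains(nodeId)) {
            return false; // the explicit admin value wins; the report is ignored
        }
        Integer previous = gpusByNode.put(nodeId, reportedGpus);
        return previous == null || previous != reportedGpus;
    }

    int gpus(String nodeId) {
        return gpusByNode.getOrDefault(nodeId, 0);
    }
}
```

The sketch also makes the stated drawback visible: once `adminUpdate` has been called for a node, a later heartbeat reporting fewer GPUs is silently dropped, so that node no longer auto-detects failures until it re-registers.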
[jira] [Updated] (YARN-10701) The yarn.resource-types should support multi types without trimmed.
[ https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-10701:
---
Fix Version/s: 3.3.1
               3.4.0

+1. Thanks for the patch, [~zhuqi]. I've committed this to trunk (3.4) and branch-3.3

> The yarn.resource-types should support multi types without trimmed.
> ---
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10701.001.patch, YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
> {code}
> When I configured the resource types above with gpu and fpga, the following
> error occurred:
>
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is
> not a valid resource name. A valid resource name must begin with a letter and
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource
> name may also be optionally preceded by a name space followed by a slash. A
> valid name space consists of period-separated groups of letters, numbers, and
> dashes.{code}
>
> The configured resource types should be trimmed before validation.
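The failure above comes from the leading space in ' yarn.io/fpga' after the value is split on commas. A minimal sketch of trimming each entry before validation follows. It is not the code from the attached patches: `ResourceTypeParser` and `parse` are hypothetical names used only for illustration. Hadoop's Configuration class also provides getTrimmedStrings, which a real fix could plausibly rely on instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the code from the YARN-10701 patches.
final class ResourceTypeParser {
    private ResourceTypeParser() {
    }

    // Splits a comma-separated yarn.resource-types value, trims surrounding
    // whitespace from each entry, and drops entries that are empty.
    static List<String> parse(String rawValue) {
        List<String> names = new ArrayList<>();
        if (rawValue == null) {
            return names;
        }
        for (String part : rawValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

With this, the value `yarn.io/gpu, yarn.io/fpga` yields the two names `yarn.io/gpu` and `yarn.io/fpga`, neither of which carries the leading space that the validator rejected.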
[jira] [Commented] (YARN-10697) Resources are displayed in bytes in UI for schedulers other than capacity
[ https://issues.apache.org/jira/browse/YARN-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304467#comment-17304467 ]

Jonathan Hung commented on YARN-10697:
--

[~Jim_Brennan] [~BilwaST] I agree, I don't think we should make the Resource#toString change. IMO users expect this to be bytes and making this change could have some unintended consequences e.g. breaking log parsing tooling.

> Resources are displayed in bytes in UI for schedulers other than capacity
> -
>
> Key: YARN-10697
> URL: https://issues.apache.org/jira/browse/YARN-10697
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bilwa S T
> Assignee: Bilwa S T
> Priority: Major
> Attachments: YARN-10697.001.patch, image-2021-03-17-11-30-57-216.png
>
>
> Resources.newInstance expects memory in MB, whereas MetricsOverviewTable
> passes resources in bytes. Also, we should display memory in GB for better
> readability for users.
[jira] [Comment Edited] (YARN-10616) Nodemanagers cannot detect GPU failures
[ https://issues.apache.org/jira/browse/YARN-10616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304456#comment-17304456 ]

Eric Badger edited comment on YARN-10616 at 3/18/21, 9:22 PM:
--

The issue with graceful decommissioning is that you have to edit a file on the RM. It would be nice to be able to run a {{yarn rmadmin}} command from a remote host to tell the RM to graceful decom a node. AFAIK that functionality doesn't exist.

I still don't like the idea of completely undermining {{-updateNodeResource}}. I think I would be more on board with a feature that is disabled by default, but can be enabled. That way we won't break any existing ways of doing things, but will give more flexibility to those who want to detect these types of failures. They will just have to understand that it isn't compatible with {{-updateNodeResource}}

was (Author: ebadger):
The issue with graceful decommissioning is that you have to edit a file on the RM. It would be nice to be able to run a `yarn rmadmin` command from a remote host to tell the RM to graceful decom a node. AFAIK that functionality doesn't exist.

I still don't like the idea of completely undermining {{-updateNodeResource}}. I think I would be more on board with a feature that is disabled by default, but can be enabled. That way we won't break any existing ways of doing things, but will give more flexibility to those who want to detect these types of failures. They will just have to understand that it isn't compatible with {{-updateNodeResource}}

> Nodemanagers cannot detect GPU failures
> ---
>
> Key: YARN-10616
> URL: https://issues.apache.org/jira/browse/YARN-10616
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Eric Badger
> Assignee: Eric Badger
> Priority: Major
>
>
> As stated above, the bug is that GPUs can fail, but the NM doesn't notice the
> failure. The NM will continue to schedule tasks onto the failed GPU, but the
> GPU won't actually work and so the container will likely fail or run very
> slowly on the CPU.
> My initial thought on solving this is to add NM resource capabilities to the
> NM-RM heartbeat and have the RM update its view of the NM's resource
> capabilities on each heartbeat. This would be a fairly trivial change, but
> comes with the unfortunate side effect that it completely undermines
> {{yarn rmadmin -updateNodeResource}}. When you run {{-updateNodeResource}} the
> assumption is that the node will retain these new resource capabilities until
> either the NM or RM is restarted. But with a heartbeat interaction constantly
> updating those resource capabilities from the NM perspective, the explicit
> changes via {{-updateNodeResource}} would be lost on the next heartbeat. We
> could potentially add a flag to ignore the heartbeat updates for any node that
> has had {{-updateNodeResource}} called on it (until a re-registration). But
> in this case, the node would no longer get resource capability updates until
> the NM or RM restarted. If {{-updateNodeResource}} is used a decent amount,
> then that would give potentially unexpected behavior in relation to nodes
> properly auto-detecting failures.
> Another idea is to add a GPU monitor thread on the NM to periodically run
> {{nvidia-smi}} and detect changes in the number of healthy GPUs. If that
> number decreased, the node would hook into the health check status and mark
> itself as unhealthy. The downside of this approach is that a single failed
> GPU would mean taking out an entire node (e.g. 8 GPUs).
> I would really like to go with the NM-RM heartbeat approach, but the
> {{-updateNodeResource}} issue bothers me. The second approach is ok I guess,
> but I also don't like taking down whole GPU nodes when only a single GPU is
> bad. Would like to hear thoughts of others on how best to approach this
> [~jhung], [~leftnoteasy], [~sunilg], [~epayne], [~Jim_Brennan]
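The second idea in the quoted description, a GPU monitor that marks the node unhealthy when the number of healthy GPUs drops, reduces to a simple comparison once the probe output is available. The sketch below is hypothetical throughout: `GpuHealthCheck` and its methods are illustrative names, and it assumes the probe (for example an {{nvidia-smi}} query that prints one line per visible GPU) simply omits a failed GPU from its output, which may not hold for every failure mode.

```java
import java.util.List;

// Hypothetical sketch; class and method names are illustrative only.
final class GpuHealthCheck {
    private final int expectedGpus;

    GpuHealthCheck(int expectedGpus) {
        this.expectedGpus = expectedGpus;
    }

    // Counts non-blank lines, assuming the probe prints one line per healthy GPU.
    static int countGpus(List<String> probeOutput) {
        int count = 0;
        for (String line : probeOutput) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // The node is reported healthy only if every expected GPU is still listed.
    boolean isHealthy(List<String> probeOutput) {
        return countGpus(probeOutput) >= expectedGpus;
    }
}
```

Because the result is a single boolean for the whole node, one missing GPU out of eight flips the node to unhealthy, which is exactly the downside the description points out.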
[jira] [Commented] (YARN-10597) CSMappingPlacementRule should not create new instance of Groups
[ https://issues.apache.org/jira/browse/YARN-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304445#comment-17304445 ]

Ahmed Hussein commented on YARN-10597:
--

Thanks [~shuzirra] for the patch. It is fine to ignore the error in the init tests. It should be enough to verify against the tests affected by YARN-10425. I am (+1 non-binding)

> CSMappingPlacementRule should not create new instance of Groups
> ---
>
> Key: YARN-10597
> URL: https://issues.apache.org/jira/browse/YARN-10597
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Attachments: YARN-10597.001.patch
>
>
> As [~ahussein] pointed out in YARN-10425, no new Groups instance should be
> created.
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304408#comment-17304408 ]

Hadoop QA commented on YARN-10702:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 25s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 0m 24s | | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 52s | | trunk passed |
| +1 | compile | 10m 16s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 8m 21s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 39s | | trunk passed |
| +1 | mvnsite | 1m 53s | | trunk passed |
| +1 | shadedclient | 19m 2s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 33s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 48s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 26m 15s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 3m 54s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 22s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 28s | | the patch passed |
| +1 | compile | 9m 9s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 9m 9s | | the patch passed |
| +1 | compile | 8m 9s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 8m 9s | | the patch passed |
| -0 | checkstyle | 1m 36s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/826/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 84 unchanged - 0 fixed = 87 total (was 84) |
| +1 | mvnsite | 1m 47s | | the patch passed |
| +1 | whitespace | 0m 1s | | The patch has no whitespace issues. |
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304387#comment-17304387 ]

Hadoop QA commented on YARN-10702:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 0m 54s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 1s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| 0 | mvndep | 0m 30s | | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 23s | | trunk passed |
| +1 | compile | 9m 15s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 8m 32s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 1m 34s | | trunk passed |
| +1 | mvnsite | 2m 2s | | trunk passed |
| +1 | shadedclient | 17m 10s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 40s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 1m 56s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 24m 51s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 4m 6s | | trunk passed |
|| || || || Patch Compile Tests || ||
| 0 | mvndep | 0m 26s | | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 31s | | the patch passed |
| +1 | compile | 8m 58s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 8m 58s | | the patch passed |
| +1 | compile | 8m 4s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 8m 4s | | the patch passed |
| -0 | checkstyle | 1m 32s | https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/825/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | hadoop-yarn-project/hadoop-yarn: The patch generated 6 new + 84 unchanged - 0 fixed = 90 total (was 84) |
| +1 | mvnsite | 1m 53s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304337#comment-17304337 ]

Hadoop QA commented on YARN-10674:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 14m 34s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 20m 53s | | trunk passed |
| +1 | compile | 0m 59s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 52s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 45s | | trunk passed |
| +1 | mvnsite | 0m 56s | | trunk passed |
| +1 | shadedclient | 15m 0s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 42s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 18m 21s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 58s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 51s | | the patch passed |
| +1 | compile | 0m 53s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 53s | | the patch passed |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | | the patch passed |
| +1 | checkstyle | 0m 38s | | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 13 unchanged - 7 fixed = 13 total (was 20) |
| +1 | mvnsite | 0m 52s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 53s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 39s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.0
[jira] [Commented] (YARN-10495) make the rpath of container-executor configurable
[ https://issues.apache.org/jira/browse/YARN-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304333#comment-17304333 ]

Eric Badger commented on YARN-10495:
------------------------------------

I would suggest using a Dockerfile with the same OS version as the one you plan to run on.

> make the rpath of container-executor configurable
> -------------------------------------------------
>
> Key: YARN-10495
> URL: https://issues.apache.org/jira/browse/YARN-10495
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
> Attachments: YARN-10495.001.patch, YARN-10495.002.patch
>
> In https://issues.apache.org/jira/browse/YARN-9561 we added a dependency on
> crypto to container-executor. We hit a case where our Jenkins machine has
> libcrypto.so.1.0.0 in its shared library environment, but our nodemanager
> machine does not have libcrypto.so.1.0.0, only *libcrypto.so.1.1*.
> We use an internal custom dynamic link library environment,
> /usr/lib/x86_64-linux-gnu,
> and we build Hadoop with the parameters below:
> {code:java}
> -Drequire.openssl -Dbundle.openssl -Dopenssl.lib=/usr/lib/x86_64-linux-gnu
> {code}
>
> On the Jenkins machine, the shared library path /usr/lib/x86_64-linux-gnu
> (where libcrypto lives) contains:
> {code:java}
> -rw-r--r-- 1 root root  240136 Nov 28  2014 libcroco-0.6.so.3.0.1
> -rw-r--r-- 1 root root   54550 Jun 18  2017 libcrypt.a
> -rw-r--r-- 1 root root 4306444 Sep 26  2019 libcrypto.a
> lrwxrwxrwx 1 root root      18 Sep 26  2019 libcrypto.so -> libcrypto.so.1.0.0
> -rw-r--r-- 1 root root 2070976 Sep 26  2019 libcrypto.so.1.0.0
> lrwxrwxrwx 1 root root      35 Jun 18  2017 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root     298 Jun 18  2017 libc.so
> {code}
>
> On the nodemanager machine, the shared library path /usr/lib/x86_64-linux-gnu
> (where libcrypto lives) contains:
> {code:java}
> -rw-r--r-- 1 root root   55852 Feb  7  2019 libcrypt.a
> -rw-r--r-- 1 root root 4864244 Sep 28  2019 libcrypto.a
> lrwxrwxrwx 1 root root      16 Sep 28  2019 libcrypto.so -> libcrypto.so.1.1
> -rw-r--r-- 1 root root 2504576 Dec 24  2019 libcrypto.so.1.0.2
> -rw-r--r-- 1 root root 2715840 Sep 28  2019 libcrypto.so.1.1
> lrwxrwxrwx 1 root root      35 Feb  7  2019 libcrypt.so -> /lib/x86_64-linux-gnu/libcrypt.so.1
> -rw-r--r-- 1 root root     298 Feb  7  2019 libc.so
> {code}
> We build container-executor with
> The libcrypto.so versions are not the same, which causes an error when we
> start the nodemanager:
>
> {code:java}
> .. 3 more
> Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:182)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:208)
>   at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:306)
>   ... 4 more
> Caused by: ExitCodeException exitCode=127: /home/hadoop/hadoop/bin/container-executor: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
>   at org.apache.hadoop.util.Shell.run(Shell.java:901)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:154)
>   ... 6 more
> {code}
>
> We should make the RPATH of container-executor configurable to solve this problem.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-10703:
-------------------------------
    Fix Version/s: 3.3.1

I've also committed this to branch-3.3. This has now been committed to trunk (3.4) and branch-3.3.

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
> Attachments: YARN-10703.001.patch
[jira] [Updated] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.
[ https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-10692:
-------------------------------
    Fix Version/s: 3.3.1

I cherry-picked this to branch-3.3. I would like all of the GPU stuff to go back to 3.3 if the cherry-picks are clean. This has now been committed to trunk (3.4) and branch-3.3.

> Add Node GPU Utilization and apply to NodeMetrics.
> --------------------------------------------------
>
> Key: YARN-10692
> URL: https://issues.apache.org/jira/browse/YARN-10692
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0, 3.3.1
> Attachments: YARN-10692.001.patch, YARN-10692.002.patch, YARN-10692.003.patch
>
> Currently there is no node-level GPU utilization. This issue adds it, and
> applies it to NodeMetrics first.
> cc [~pbacsko] [~Jim_Brennan] [~ebadger] [~gandras]
[jira] [Updated] (YARN-10641) Refactor the max app related update, and fix maxApplications update error when add new queues.
[ https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10641:
----------------------------------
    Summary: Refactor the max app related update, and fix maxApplications update error when add new queues. (was: Refactor the max app related update, and fix maxApllications update error when add new queues.)

> Refactor the max app related update, and fix maxApplications update error when add new queues.
> -----------------------------------------------------------------------------------------------
>
> Key: YARN-10641
> URL: https://issues.apache.org/jira/browse/YARN-10641
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
> Fix For: 3.4.0
> Attachments: YARN-10641.001.patch, YARN-10641.002.patch,
> YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch,
> YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png,
> image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png,
> image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png
>
> After the update logic was refactored in YARN-10504, the max applications
> update based on abs/cap is wrong. This should be fixed, because max
> applications is a key part of limiting applications in CS.
> For example:
> When adding a dynamic queue, the max app values of the parent queue's other
> children are not updated correctly:
> !image-2021-02-20-15-53-51-099.png|width=639,height=509!
> The newly added queue's max app is updated correctly:
> !image-2021-02-20-15-55-44-780.png|width=542,height=426!
[jira] [Commented] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304313#comment-17304313 ]

Eric Badger commented on YARN-10703:
------------------------------------

+1

I've committed this to trunk (3.4).

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0
> Attachments: YARN-10703.001.patch
[jira] [Updated] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Badger updated YARN-10703:
-------------------------------
    Fix Version/s: 3.4.0

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Fix For: 3.4.0
> Attachments: YARN-10703.001.patch
[jira] [Commented] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304282#comment-17304282 ]

Hadoop QA commented on YARN-10703:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 1m 21s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 24m 40s | | trunk passed |
| +1 | compile | 1m 31s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 1m 25s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 31s | | trunk passed |
| +1 | mvnsite | 0m 42s | | trunk passed |
| +1 | shadedclient | 16m 39s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 33s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 29s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 19m 3s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 22s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 38s | | the patch passed |
| +1 | compile | 1m 29s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 1m 29s | | the patch passed |
| +1 | compile | 1m 23s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 1m 23s | | the patch passed |
| +1 | checkstyle | 0m 25s | | the patch passed |
| +1 | mvnsite | 0m 37s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 37s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 34s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304253#comment-17304253 ]

Hadoop QA commented on YARN-10674:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Logfile || Comment ||
| 0 | reexec | 24m 50s | | Docker mode activated. |
|| || || || Prechecks || ||
| +1 | dupname | 0m 0s | | No case conflicting files found. |
| +1 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 | | 0m 0s | test4tests | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests || ||
| +1 | mvninstall | 22m 36s | | trunk passed |
| +1 | compile | 1m 2s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | compile | 0m 50s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | checkstyle | 0m 46s | | trunk passed |
| +1 | mvnsite | 0m 53s | | trunk passed |
| +1 | shadedclient | 16m 42s | | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javadoc | 0m 36s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| 0 | spotbugs | 19m 49s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 1m 52s | | trunk passed |
|| || || || Patch Compile Tests || ||
| +1 | mvninstall | 0m 48s | | the patch passed |
| +1 | compile | 0m 54s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 |
| +1 | javac | 0m 54s | | the patch passed |
| +1 | compile | 0m 45s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
| +1 | javac | 0m 45s | | the patch passed |
| +1 | checkstyle | 0m 39s | | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 13 unchanged - 7 fixed = 13 total (was 20) |
| +1 | mvnsite | 0m 47s | | the patch passed |
| +1 | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 | shadedclient | 15m 9s | | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 38s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Brennan updated YARN-10702:
-------------------------------
    Attachment: YARN-10702.004.patch

> Add cluster metric for amount of CPU used by RM Event Processor
> ---------------------------------------------------------------
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch,
> YARN-10702.002.patch, YARN-10702.003.patch, YARN-10702.004.patch,
> simon-scheduler-busy.png
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event
> Processing thread. This lets us know when the critical path of the RM is
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've
> been running with it on production clusters for nearly four years.
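
A minimal sketch of how the CPU usage of a single thread could be sampled in Java, assuming the metric is derived from `ThreadMXBean.getThreadCpuTime`. The attached patches may do this differently; `ThreadCpuSampler`, `busyPercent` and `sample` are hypothetical names used only for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not taken from the YARN-10702 patches.
final class ThreadCpuSampler {
  private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
  private final long threadId;
  private long lastCpuNanos;
  private long lastWallNanos;

  ThreadCpuSampler(Thread thread) {
    this.threadId = thread.getId();
    this.lastCpuNanos = readCpuNanos();
    this.lastWallNanos = System.nanoTime();
  }

  // CPU time consumed by the watched thread, or -1 when the JVM cannot report it.
  private long readCpuNanos() {
    if (!threadBean.isThreadCpuTimeSupported()
        || !threadBean.isThreadCpuTimeEnabled()) {
      return -1L;
    }
    return threadBean.getThreadCpuTime(threadId);
  }

  // Share of wall-clock time spent on CPU, capped at 100; -1 if unknown.
  static int busyPercent(long cpuDeltaNanos, long wallDeltaNanos) {
    if (cpuDeltaNanos < 0 || wallDeltaNanos <= 0) {
      return -1;
    }
    return (int) Math.min(100L, cpuDeltaNanos * 100L / wallDeltaNanos);
  }

  // Busy percentage since the previous call (or since construction).
  synchronized int sample() {
    long cpuNow = readCpuNanos();
    long wallNow = System.nanoTime();
    int percent = (cpuNow < 0 || lastCpuNanos < 0)
        ? -1
        : busyPercent(cpuNow - lastCpuNanos, wallNow - lastWallNanos);
    lastCpuNanos = cpuNow;
    lastWallNanos = wallNow;
    return percent;
  }
}
```

Calling `sample()` at a fixed period and publishing the result as a gauge would give a "percent busy" figure for the watched thread; a value that stays near 100 would suggest the thread has little headroom left.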
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304250#comment-17304250 ]

Jim Brennan commented on YARN-10702:
------------------------------------

Jumped the gun. Patch 004 has fixes for the other checkstyle issues.

> Add cluster metric for amount of CPU used by RM Event Processor
> ---------------------------------------------------------------
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch,
> YARN-10702.002.patch, YARN-10702.003.patch, YARN-10702.004.patch,
> simon-scheduler-busy.png
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event
> Processing thread. This lets us know when the critical path of the RM is
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've
> been running with it on production clusters for nearly four years.
[jira] [Commented] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304240#comment-17304240 ]

Jim Brennan commented on YARN-10702:
------------------------------------

Thanks for the review [~zhuqi]! Patch 003 fixes the method names as suggested.

> Add cluster metric for amount of CPU used by RM Event Processor
> ---------------------------------------------------------------
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch,
> YARN-10702.002.patch, YARN-10702.003.patch, simon-scheduler-busy.png
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event
> Processing thread. This lets us know when the critical path of the RM is
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've
> been running with it on production clusters for nearly four years.
[jira] [Updated] (YARN-10702) Add cluster metric for amount of CPU used by RM Event Processor
[ https://issues.apache.org/jira/browse/YARN-10702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Brennan updated YARN-10702:
-------------------------------
    Attachment: YARN-10702.003.patch

> Add cluster metric for amount of CPU used by RM Event Processor
> ---------------------------------------------------------------
>
> Key: YARN-10702
> URL: https://issues.apache.org/jira/browse/YARN-10702
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Affects Versions: 2.10.1, 3.4.0
> Reporter: Jim Brennan
> Assignee: Jim Brennan
> Priority: Minor
> Attachments: Scheduler-Busy.png, YARN-10702.001.patch,
> YARN-10702.002.patch, YARN-10702.003.patch, simon-scheduler-busy.png
>
> Add a cluster metric to track the cpu usage of the ResourceManager Event
> Processing thread. This lets us know when the critical path of the RM is
> running out of headroom.
> This feature was originally added for us internally by [~nroberts] and we've
> been running with it on production clusters for nearly four years.
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304228#comment-17304228 ]

Qi Zhu commented on YARN-10674:
-------------------------------

[~gandras] Now I understand your point. We can just use this code:
{code:java}
checkDisablePreemption(preemptionMode,
    !cliParser.hasOption(CliOption.DISABLE_PREEMPTION.shortSwitch));
{code}
{code:java}
private static void checkDisablePreemption(
    FSConfigToCSConfigConverterParams.PreemptionMode preemptionMode,
    boolean enabled) {
  if (preemptionMode == null && !enabled) {
    throw new PreconditionException(
        "Specified disable-preemption mode is illegal, "
            + " use nopolicy or observeonly.");
  }
}
{code}
PreemptionMode.ENABLED is not necessary; I updated this in the latest patch. I am glad that it makes sense to me now.

[~pbacsko] Do you have any other advice?

Thanks.

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch,
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch,
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch,
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch,
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch,
> YARN-10674.015.patch, YARN-10674.016.patch
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
>
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
>   }
>
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
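
A minimal, self-contained sketch of the two-argument check from Qi Zhu's comment, to show which combinations it accepts and rejects. The enum and `IllegalArgumentException` are stand-ins (assumptions) for the real `FSConfigToCSConfigConverterParams.PreemptionMode` and `PreconditionException`, which are not reproduced here.

```java
// Stand-in types for illustration only; the real fs2cs classes are not used here.
final class PreemptionCheckSketch {
  enum PreemptionMode { OBSERVE_ONLY, NO_POLICY }

  // Mirrors the check from the comment: a null mode is an error only when
  // preemption was explicitly disabled (enabled == false).
  static void checkDisablePreemption(PreemptionMode mode, boolean enabled) {
    if (mode == null && !enabled) {
      throw new IllegalArgumentException(
          "Specified disable-preemption mode is illegal, "
              + "use nopolicy or observeonly.");
    }
  }

  public static void main(String[] args) {
    // Disable option absent: preemption stays enabled, a null mode is fine.
    checkDisablePreemption(null, true);
    // Disable option present with a recognised value: accepted.
    checkDisablePreemption(PreemptionMode.NO_POLICY, false);
    // Disable option present but the value could not be parsed: rejected.
    try {
      checkDisablePreemption(null, false);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With this shape the `enabled` flag carries the "option not given" case, so the mode enum itself does not need an `ENABLED` constant.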
[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qi Zhu updated YARN-10674:
--------------------------
    Attachment: YARN-10674.016.patch

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch,
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch,
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch,
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch,
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch,
> YARN-10674.015.patch, YARN-10674.016.patch
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
>
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
>   }
>
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
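
A rough sketch of the periodic check pattern quoted in the issue description: a listener is invoked, then the loop sleeps for a fixed interval. `ReloadLoopSketch`, `Listener` and `runChecks` are hypothetical names, and the loop is bounded by `maxChecks` (an assumption made here so the example terminates); the real FS loop runs until it is stopped.

```java
// Hypothetical, simplified version of a fixed-interval check loop.
final class ReloadLoopSketch {
  interface Listener {
    void onCheck();
  }

  // Same value as the constant quoted in the issue description (10 seconds).
  static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;

  // Calls listener.onCheck() up to maxChecks times, sleeping intervalMs between
  // calls. Returns the number of checks actually performed.
  static int runChecks(Listener listener, long intervalMs, int maxChecks) {
    int done = 0;
    while (done < maxChecks) {
      synchronized (ReloadLoopSketch.class) {
        listener.onCheck();
      }
      done++;
      if (done < maxChecks) {
        try {
          Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
          // Preserve the interrupt status and stop checking.
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
    return done;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // A short interval keeps the demo quick; FS uses ALLOC_RELOAD_INTERVAL_MS.
    int performed = runChecks(() -> calls[0]++, 5L, 3);
    System.out.println("checks performed: " + performed);
  }
}
```

In this pattern an empty dynamic queue would be removed at the next check, so it can linger for up to one interval after becoming empty.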
[jira] [Commented] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
[ https://issues.apache.org/jira/browse/YARN-10703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304214#comment-17304214 ]

Qi Zhu commented on YARN-10703:
-------------------------------

[~pbacsko] [~gandras] [~ebadger]

Sorry for the potential null pointer introduced in YARN-10692. I fixed it in this jira.

Could you help review this?

Thanks.

> Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-10703
> URL: https://issues.apache.org/jira/browse/YARN-10703
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10703.001.patch
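
The general shape of such a fix can be sketched as a null guard around the handler. This is only an illustration under assumptions: the `GpuHandler` interface, its `getAvgNodeGpuUtilization` method and the fallback value of `0.0f` are hypothetical and are not taken from YARN-10703.001.patch.

```java
// Hypothetical types for illustration; not the actual NodeResourceMonitorImpl code.
final class GpuUtilizationGuard {
  interface GpuHandler {
    float getAvgNodeGpuUtilization();
  }

  // Returns the handler's reading, or 0 when no handler is configured
  // (for example, when no GPU support is set up on the node).
  static float safeGpuUtilization(GpuHandler handler) {
    if (handler == null) {
      return 0.0f;
    }
    return handler.getAvgNodeGpuUtilization();
  }

  public static void main(String[] args) {
    System.out.println(safeGpuUtilization(null));          // no handler present
    System.out.println(safeGpuUtilization(() -> 42.5f));   // handler present
  }
}
```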
[jira] [Created] (YARN-10703) Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
Qi Zhu created YARN-10703:
-----------------------------

Summary: Fix potential null pointer error of gpuNodeResourceUpdateHandler in NodeResourceMonitorImpl.
Key: YARN-10703
URL: https://issues.apache.org/jira/browse/YARN-10703
Project: Hadoop YARN
Issue Type: Bug
Reporter: Qi Zhu
Assignee: Qi Zhu
[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304142#comment-17304142 ]

Qi Zhu edited comment on YARN-10674 at 3/18/21, 1:29 PM:
---------------------------------------------------------

Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can use the has option to know whether this is enabled and pass it to the PreemptionMode enabled field.

{code:java}
public static PreemptionMode fromString(String cliOption, boolean enabled) {
  if (enabled) {
    return PreemptionMode.ENABLED;
  } else {
    if (StringUtils.isEmpty(cliOption)) {
      return PreemptionMode.NO_POLICY;
    } else {
      if (cliOption.trim().
          equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) {
        return PreemptionMode.OBSERVE_ONLY;
      } else if (cliOption.trim().
          equals(PreemptionMode.NO_POLICY.getCliOption())) {
        return PreemptionMode.NO_POLICY;
      } else {
        return null;
      }
    }
  }
}
{code}
If it returns null:
{code:java}
private static void checkDisablePreemption(FSConfigToCSConfigConverterParams.
    PreemptionMode preemptionMode) {
  if (preemptionMode == null) {
    throw new PreconditionException(
        "Specified disable-preemption mode is illegal, " +
        " use nopolicy or observeonly.");
  }
}
{code}
But fromString should return a value that can be used later. If it returns null, that will be confused with the case where preemption is disabled but the given value is neither nopolicy nor observeonly.

I think the flag makes this clear, because there are four possible return values:
# null means that an illegal value was used
# PreemptionMode.ENABLED
# PreemptionMode.OBSERVE_ONLY
# PreemptionMode.NO_POLICY

What's your opinion about this?

was (Author: zhuqi):
Thanks [~gandras] for the reply.

If we don't have PreemptionMode.ENABLED, we can use the has option to know whether this is enabled and pass it to the PreemptionMode enabled field.

{code:java}
public static PreemptionMode fromString(String cliOption, boolean enabled) {
  if (enabled) {
    return PreemptionMode.ENABLED;
  } else {
    if (StringUtils.isEmpty(cliOption)) {
      return PreemptionMode.NO_POLICY;
    } else {
      if (cliOption.trim().
          equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) {
        return PreemptionMode.OBSERVE_ONLY;
      } else if (cliOption.trim().
          equals(PreemptionMode.NO_POLICY.getCliOption())) {
        return PreemptionMode.NO_POLICY;
      } else {
        return null;
      }
    }
  }
}
{code}
But fromString should return a value that can be used later. If it returns null, that will be confused with the case where preemption is disabled but the given value is neither nopolicy nor observeonly.

I think the flag makes this clear, because there are four possible return values:
# null means that an illegal value was used
# PreemptionMode.ENABLED
# PreemptionMode.OBSERVE_ONLY
# PreemptionMode.NO_POLICY

What's your opinion about this?

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch,
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch,
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch,
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch,
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch,
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304142#comment-17304142 ] Qi Zhu edited comment on YARN-10674 at 3/18/21, 1:26 PM: - Thanks [~gandras] for reply. If we don't have PreemptionMode.ENABLED, we can use the has option to know if this is enabled and passed to PreemptionMode enabled field. {code:java} public static PreemptionMode fromString(String cliOption, boolean enabled) { if (enabled) { return PreemptionMode.ENABLED; } else { if (StringUtils.isEmpty(cliOption)) { return PreemptionMode.NO_POLICY; } else { if (cliOption.trim(). equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) { return PreemptionMode.OBSERVE_ONLY; } else if (cliOption.trim(). equals(PreemptionMode.NO_POLICY.getCliOption())) { return PreemptionMode.NO_POLICY; } else { return null; } } } } {code} But fromString should return a value to make it used later, if it will return null , it will confused with the case that we disabled but print not nopolicy or observeonly. I think the flag will make this clear that we have four case return value: # null mean that we use illegal value # PreemptionMode.ENABLED # PreemptionMode.OBSERVE_ONLY # PreemptionMode.NO_POLICY What's your opinion about this? was (Author: zhuqi): Thanks [~gandras] for reply. If we don't have PreemptionMode.ENABLED, we can use the has option to know if this is enabled and passed to PreemptionMode enabled field. {code:java} public static PreemptionMode fromString(String cliOption, boolean enabled) { if (enabled) { return PreemptionMode.ENABLED; } else { if (StringUtils.isEmpty(cliOption)) { return PreemptionMode.NO_POLICY; } else { if (cliOption.trim(). equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) { return PreemptionMode.OBSERVE_ONLY; } else if (cliOption.trim(). 
equals(PreemptionMode.NO_POLICY.getCliOption())) { return PreemptionMode.NO_POLICY; } else { return null; } } } } {code} But fromString should return a value to make it used later, if it will return null , it will confused with the case that we disabled but print not nopolicy or observeonly. I think the flag will make this clear. What's your opinion about this? > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304142#comment-17304142 ] Qi Zhu edited comment on YARN-10674 at 3/18/21, 1:24 PM: - Thanks [~gandras] for reply. If we don't have PreemptionMode.ENABLED, we can use the has option to know if this is enabled and passed to PreemptionMode enabled field. {code:java} public static PreemptionMode fromString(String cliOption, boolean enabled) { if (enabled) { return PreemptionMode.ENABLED; } else { if (StringUtils.isEmpty(cliOption)) { return PreemptionMode.NO_POLICY; } else { if (cliOption.trim(). equals(PreemptionMode.OBSERVE_ONLY.getCliOption())) { return PreemptionMode.OBSERVE_ONLY; } else if (cliOption.trim(). equals(PreemptionMode.NO_POLICY.getCliOption())) { return PreemptionMode.NO_POLICY; } else { return null; } } } } {code} But fromString should return a value to make it used later, if it will return null , it will confused with the case that we disabled but print not nopolicy or observeonly. I think the flag will make this clear. What's your opinion about this? was (Author: zhuqi): Thanks [~gandras] for reply. If we don't have PreemptionMode.ENABLED, we can use the has option to know if this is enabled and passed to PreemptionMode enabled field. But fromString should return a value to make it used later, if it will return null , it will confused with the case that we disabled but print not nopolicy or observeonly. I think the flag will make this clear. What's your opinion about this? > fs2cs: should support auto created queue deletion. 
> -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304142#comment-17304142 ] Qi Zhu commented on YARN-10674: --- Thanks [~gandras] for reply. If we don't have PreemptionMode.ENABLED, we can use the has option to know if this is enabled and passed to PreemptionMode enabled field. But fromString should return a value to make it used later, if it will return null , it will confused with the case that we disabled but print not nopolicy or observeonly. I think the flag will make this clear. What's your opinion about this? > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304130#comment-17304130 ]

Andras Gyori commented on YARN-10674:
-------------------------------------

These are valid suggestions [~pbacsko], and this was my idea as well. However, I think the enable flag is not necessary. PreemptionMode.ENABLED is essentially equal to a true flag, while PreemptionMode.NO_POLICY inherently means that preemption is disabled. If you check:
{code:java}
preemptionMode == FSConfigToCSConfigConverterParams.PreemptionMode.NO_POLICY
{code}
you do not need to check whether preemption is disabled, because the enum values are mutually exclusive (you cannot have both PreemptionMode.ENABLED and PreemptionMode.NO_POLICY).

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch,
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch,
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch,
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch,
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch,
> YARN-10674.015.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
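The mutual-exclusivity argument in the comment above can be illustrated with a minimal, self-contained sketch. The enum here only mirrors the constant names used in this thread; it is not the real FSConfigToCSConfigConverterParams.PreemptionMode, and the class name PreemptionCheckSketch and the helpers isPreemptionDisabled and describe are hypothetical names chosen for illustration.

```java
final class PreemptionCheckSketch {
  // Hypothetical stand-in that mirrors the constant names from the thread.
  enum PreemptionMode { ENABLED, NO_POLICY, OBSERVE_ONLY }

  // A variable of an enum type holds exactly one constant at a time, so a
  // single comparison tells us whether preemption is disabled; no extra
  // boolean flag has to be kept in sync with the mode.
  static boolean isPreemptionDisabled(PreemptionMode mode) {
    return mode != PreemptionMode.ENABLED;
  }

  // Each constant maps to exactly one description.
  static String describe(PreemptionMode mode) {
    switch (mode) {
      case ENABLED:
        return "preemption enabled";
      case NO_POLICY:
        return "preemption disabled, no policy";
      case OBSERVE_ONLY:
        return "preemption disabled, observe only";
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  public static void main(String[] args) {
    for (PreemptionMode mode : PreemptionMode.values()) {
      System.out.println(mode + " -> " + describe(mode)
          + ", disabled=" + isPreemptionDisabled(mode));
    }
  }
}
```

Treating OBSERVE_ONLY as a disabled state is an assumption of this sketch, based on the enabled/disabled values that appear elsewhere in the thread.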
[jira] [Commented] (YARN-10701) The yarn.resource-types should support multi types without trimmed.
[ https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304124#comment-17304124 ]

Qi Zhu commented on YARN-10701:
-------------------------------

Thanks [~gandras] for confirming.

[~pbacsko] Could you help review this?

Thanks.

> The yarn.resource-types should support multi types without trimmed.
> -------------------------------------------------------------------
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10701.001.patch, YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
> {code}
> When I configured the resource types above with gpu and fpga, the following
> error happened:
>
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is
> not a valid resource name. A valid resource name must begin with a letter and
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource
> name may also be optionally preceded by a name space followed by a slash. A
> valid name space consists of period-separated groups of letters, numbers, and
> dashes.{code}
>
> The resource types should support trimming.
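The failure described in this issue comes from the space after the comma surviving into the second resource name. A minimal sketch of the trimming step follows, using only the JDK; the class ResourceTypeTrimSketch and the method parseResourceTypes are hypothetical names, and this is not a copy of the attached patch. Hadoop's Configuration class also offers getTrimmedStrings for comma-separated values, though whether the patch uses it is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class ResourceTypeTrimSketch {
  // Splits a comma-separated property value and trims every entry, so that
  // "yarn.io/gpu, yarn.io/fpga" yields "yarn.io/fpga" rather than " yarn.io/fpga".
  // Empty entries (for example from a trailing comma) are skipped.
  static List<String> parseResourceTypes(String rawValue) {
    List<String> names = new ArrayList<>();
    if (rawValue == null) {
      return names;
    }
    for (String part : rawValue.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        names.add(trimmed);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // The value from the issue description, with a space after the comma.
    System.out.println(parseResourceTypes("yarn.io/gpu, yarn.io/fpga"));
  }
}
```

Trimming before validation keeps the validation rule itself unchanged: a name that starts with a space is still rejected, but whitespace that is only a list separator never reaches the validator.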
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304122#comment-17304122 ] Qi Zhu commented on YARN-10674: --- Thanks a lot [~pbacsko] for patient review. Very good suggestion, it make sense to me now, i have updated this in latest patch. Thanks.:D > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qi Zhu updated YARN-10674: -- Attachment: YARN-10674.015.patch > fs2cs: should support auto created queue deletion. > -- > > Key: YARN-10674 > URL: https://issues.apache.org/jira/browse/YARN-10674 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Labels: fs2cs > Attachments: YARN-10674.001.patch, YARN-10674.002.patch, > YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch, > YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch, > YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch, > YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch, > YARN-10674.015.patch > > > In FS the auto deletion check interval is 10s. > {code:java} > @Override > public void onCheck() { > queueMgr.removeEmptyDynamicQueues(); > queueMgr.removePendingIncompatibleQueues(); > } > while (running) { > try { > synchronized (this) { > reloadListener.onCheck(); > } > ... > Thread.sleep(reloadIntervalMs); > } > /** Time to wait between checks of the allocation file */ > public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
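The check loop quoted in the issue description (call the listener under a lock, then sleep for the reload interval) can be sketched as a small standalone class. Everything named here (PeriodicCheckSketch, CheckListener, runChecks, stop) is hypothetical, and the loop is bounded by a maximum number of checks only so that the example terminates; the quoted FS code runs until its running flag is cleared.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PeriodicCheckSketch {
  // Hypothetical callback, standing in for the reload listener in the quote.
  interface CheckListener {
    void onCheck();
  }

  private final CheckListener listener;
  private final long intervalMs;
  private volatile boolean running = true;

  PeriodicCheckSketch(CheckListener listener, long intervalMs) {
    this.listener = listener;
    this.intervalMs = intervalMs;
  }

  void stop() {
    running = false;
  }

  // Invokes the listener, then sleeps for the interval, until stop() is
  // called, the thread is interrupted, or maxChecks checks have completed.
  // Returns the number of checks that ran.
  int runChecks(int maxChecks) {
    int completed = 0;
    while (running && completed < maxChecks) {
      synchronized (this) {
        listener.onCheck();
      }
      completed++;
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and leave the loop.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    AtomicInteger checks = new AtomicInteger();
    // 10 ms here for a quick demo; the quoted constant is 10 * 1000 ms.
    PeriodicCheckSketch sketch =
        new PeriodicCheckSketch(checks::incrementAndGet, 10L);
    System.out.println("checks run: " + sketch.runChecks(3));
  }
}
```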
[jira] [Commented] (YARN-10641) Refactor the max app related update, and fix maxApllications update error when add new queues.
[ https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304117#comment-17304117 ] Peter Bacsko commented on YARN-10641: - +1 Thanks for the patch [~zhuqi] and [~gandras] for the review. Committed to trunk. > Refactor the max app related update, and fix maxApllications update error > when add new queues. > -- > > Key: YARN-10641 > URL: https://issues.apache.org/jira/browse/YARN-10641 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-10641.001.patch, YARN-10641.002.patch, > YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, > YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, > image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, > image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png > > > When refactor the update logic in YARN-10504 . > The update max applications based abs/cap is wrong, this should be fixed, > because the max applications is key part to limit applications in CS. > For example: > When adding a dynamic queue, the other children's max app of parent queue are > not updated correctly: > !image-2021-02-20-15-53-51-099.png|width=639,height=509! > The new added queue's max app will updated correctly: > !image-2021-02-20-15-55-44-780.png|width=542,height=426! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.
[ https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304089#comment-17304089 ] Peter Bacsko commented on YARN-10692: - Thanks [~zhuqi] for the patch, committed to trunk. > Add Node GPU Utilization and apply to NodeMetrics. > -- > > Key: YARN-10692 > URL: https://issues.apache.org/jira/browse/YARN-10692 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10692.001.patch, YARN-10692.002.patch, > YARN-10692.003.patch > > > Now there are no node level GPU Utilization, this issue will add it, and add > it to NodeMetrics first. > cc [~pbacsko] [~Jim_Brennan] [~ebadger] [~gandras] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10701) The yarn.resource-types should support multi types without trimmed.
[ https://issues.apache.org/jira/browse/YARN-10701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304090#comment-17304090 ]

Andras Gyori commented on YARN-10701:
-------------------------------------

Thank you [~zhuqi] for the patch. It's straightforward and looks good to me.

> The yarn.resource-types should support multi types without trimmed.
> -------------------------------------------------------------------
>
> Key: YARN-10701
> URL: https://issues.apache.org/jira/browse/YARN-10701
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10701.001.patch, YARN-10701.002.patch
>
>
> {code:java}
> <property>
>   <name>yarn.resource-types</name>
>   <value>yarn.io/gpu, yarn.io/fpga</value>
> </property>
> {code}
> When I configured the resource types above with gpu and fpga, the following
> error happened:
>
> {code:java}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: ' yarn.io/fpga' is
> not a valid resource name. A valid resource name must begin with a letter and
> contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource
> name may also be optionally preceded by a name space followed by a slash. A
> valid name space consists of period-separated groups of letters, numbers, and
> dashes.{code}
>
> The resource types should support trimming.
[jira] [Commented] (YARN-10692) Add Node GPU Utilization and apply to NodeMetrics.
[ https://issues.apache.org/jira/browse/YARN-10692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304078#comment-17304078 ] Peter Bacsko commented on YARN-10692: - +1 LGTM. Committing this soon. > Add Node GPU Utilization and apply to NodeMetrics. > -- > > Key: YARN-10692 > URL: https://issues.apache.org/jira/browse/YARN-10692 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10692.001.patch, YARN-10692.002.patch, > YARN-10692.003.patch > > > Now there are no node level GPU Utilization, this issue will add it, and add > it to NodeMetrics first. > cc [~pbacsko] [~Jim_Brennan] [~ebadger] [~gandras] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10659) Improve CS MappingRule %secondary_group evaluation
[ https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304077#comment-17304077 ]

Szilard Nemeth commented on YARN-10659:
---------------------------------------

Thanks [~shuzirra] for working on this.
The latest patch LGTM; committed to trunk.
The checkstyle issue is not important and the javadoc issue was unrelated.

Thanks [~gandras] for the review.

> Improve CS MappingRule %secondary_group evaluation
> --------------------------------------------------
>
> Key: YARN-10659
> URL: https://issues.apache.org/jira/browse/YARN-10659
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Attachments: YARN-10659.001.patch, YARN-10659.002.patch,
> YARN-10659.003.patch
>
>
> Since the leaf queue names are not unique, there are a lot of use cases where
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it is under a defined parent,
> %secondary_group evaluation should only check for queue existence under that
> queue. E.g. root.group.%secondary_group should only evaluate to groups which
> exist under root.group, while the legacy %secondary_group.%user should still
> look for groups by their leaf name globally.
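The two lookup behaviors described in this issue can be contrasted with a small sketch. It models the queue hierarchy as a plain set of full queue paths and is not the CapacityScheduler mapping rule code; the names SecondaryGroupSketch, resolveUnderParent and resolveGlobally are hypothetical, and picking the first matching group in list order is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class SecondaryGroupSketch {
  // Scoped lookup: a secondary group matches only if a queue with that name
  // exists directly under the given parent path.
  static Optional<String> resolveUnderParent(
      Set<String> queuePaths, String parentPath, List<String> secondaryGroups) {
    for (String group : secondaryGroups) {
      if (queuePaths.contains(parentPath + "." + group)) {
        return Optional.of(group);
      }
    }
    return Optional.empty();
  }

  // Legacy lookup: a secondary group matches if any queue, anywhere in the
  // hierarchy, has that group as its leaf name.
  static Optional<String> resolveGlobally(
      Set<String> queuePaths, List<String> secondaryGroups) {
    for (String group : secondaryGroups) {
      for (String path : queuePaths) {
        String leaf = path.substring(path.lastIndexOf('.') + 1);
        if (leaf.equals(group)) {
          return Optional.of(group);
        }
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Set<String> queues = new HashSet<>(
        Arrays.asList("root.group.devs", "root.other.admins"));
    List<String> secondaryGroups = Arrays.asList("admins", "devs");

    // Only queues under root.group are considered.
    System.out.println(resolveUnderParent(queues, "root.group", secondaryGroups));
    // Any queue with a matching leaf name is considered.
    System.out.println(resolveGlobally(queues, secondaryGroups));
  }
}
```

With these sample queues the scoped lookup skips "admins", because root.group.admins does not exist, while the global lookup accepts it because root.other.admins has that leaf name.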
[jira] [Updated] (YARN-10659) Improve CS MappingRule %secondary_group evaluation
[ https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10659:
----------------------------------
Fix Version/s: 3.4.0

> Improve CS MappingRule %secondary_group evaluation
> --------------------------------------------------
>
> Key: YARN-10659
> URL: https://issues.apache.org/jira/browse/YARN-10659
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10659.001.patch, YARN-10659.002.patch,
> YARN-10659.003.patch
>
>
> Since the leaf queue names are not unique, there are a lot of use cases where
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it is under a defined parent,
> %secondary_group evaluation should only check for queue existence under that
> queue. E.g. root.group.%secondary_group should only evaluate to groups which
> exist under root.group, while the legacy %secondary_group.%user should still
> look for groups by their leaf name globally.
[jira] [Commented] (YARN-10685) Fix typos in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304041#comment-17304041 ] Peter Bacsko commented on YARN-10685: - +1 thanks [~zhuqi] for the patch, committed to trunk. > Fix typos in AbstractCSQueue > > > Key: YARN-10685 > URL: https://issues.apache.org/jira/browse/YARN-10685 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10685.001.patch, YARN-10685.002.patch, > YARN-10685.003.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10685) Fix typos in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-10685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-10685: Summary: Fix typos in AbstractCSQueue (was: Fixed some Typo in AbstractCSQueue.) > Fix typos in AbstractCSQueue > > > Key: YARN-10685 > URL: https://issues.apache.org/jira/browse/YARN-10685 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Major > Attachments: YARN-10685.001.patch, YARN-10685.002.patch, > YARN-10685.003.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10674) fs2cs: should support auto created queue deletion.
[ https://issues.apache.org/jira/browse/YARN-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304027#comment-17304027 ]

Peter Bacsko commented on YARN-10674:
-------------------------------------

Thanks [~zhuqi] for the patch. I think we are very close. I still have some comments:

1.
{noformat}
private FSConfigToCSConfigConverterParams.
    PreemptionMode disablePreemption;
private FSConfigToCSConfigConverterParams.
    PreemptionMode preemptionMode;
{noformat}
We don't need two enums. We need only one which covers all states (enabled / observeonly / nopolicy). You can extend {{PreemptionMode}} with a new variable which says whether it's enabled or disabled:
{noformat}
public enum PreemptionMode {
  ENABLE("enable", true),
  NO_POLICY("nopolicy", false),
  OBSERVE_ONLY("observeonly", false);

  private String cliOption;
  private boolean enabled;

  PreemptionMode(String cliOption, boolean enabled) {
    this.cliOption = cliOption;
    this.enabled = enabled;
  }

  public String getCliOption() {
    return cliOption;
  }

  public boolean isEnabled() {
    return enabled;
  }
}
{noformat}
So you just call {{preemptionMode.isEnabled()}} and don't need two variables just to hold the information whether it's enabled or not.

2. {{public static PreemptionMode fromString(String cliOption)}} --> this method never returns ENABLED, which is important (also, please change "ENABLE" to "ENABLED", note the "D" at the end).

cc [~gandras] please review patch v14.

> fs2cs: should support auto created queue deletion.
> --------------------------------------------------
>
> Key: YARN-10674
> URL: https://issues.apache.org/jira/browse/YARN-10674
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Major
> Labels: fs2cs
> Attachments: YARN-10674.001.patch, YARN-10674.002.patch,
> YARN-10674.003.patch, YARN-10674.004.patch, YARN-10674.005.patch,
> YARN-10674.006.patch, YARN-10674.007.patch, YARN-10674.008.patch,
> YARN-10674.009.patch, YARN-10674.010.patch, YARN-10674.011.patch,
> YARN-10674.012.patch, YARN-10674.013.patch, YARN-10674.014.patch
>
>
> In FS the auto deletion check interval is 10s.
> {code:java}
> @Override
> public void onCheck() {
>   queueMgr.removeEmptyDynamicQueues();
>   queueMgr.removePendingIncompatibleQueues();
> }
> while (running) {
>   try {
>     synchronized (this) {
>       reloadListener.onCheck();
>     }
>     ...
>     Thread.sleep(reloadIntervalMs);
> }
> /** Time to wait between checks of the allocation file */
> public static final long ALLOC_RELOAD_INTERVAL_MS = 10 * 1000;{code}
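The second review point, that fromString should be able to return every constant including ENABLED, can be sketched with a lookup over the enum's own values. This is a hedged illustration and not the code in any of the attached patches: the wrapper class PreemptionModeSketch is hypothetical, the "enabled" option string is an assumption, and returning null for unrecognized input follows the convention used elsewhere in this thread. The thread's handling of an empty option is deliberately left out; here it is treated as unrecognized.

```java
final class PreemptionModeSketch {
  enum PreemptionMode {
    ENABLED("enabled", true),          // option string is an assumption
    NO_POLICY("nopolicy", false),
    OBSERVE_ONLY("observeonly", false);

    private final String cliOption;
    private final boolean enabled;

    PreemptionMode(String cliOption, boolean enabled) {
      this.cliOption = cliOption;
      this.enabled = enabled;
    }

    String getCliOption() {
      return cliOption;
    }

    boolean isEnabled() {
      return enabled;
    }

    // Matches the trimmed input against every constant, so no constant is
    // unreachable. Returns null when nothing matches (including null input).
    static PreemptionMode fromString(String cliOption) {
      if (cliOption == null) {
        return null;
      }
      String trimmed = cliOption.trim();
      for (PreemptionMode mode : values()) {
        if (mode.cliOption.equals(trimmed)) {
          return mode;
        }
      }
      return null;
    }
  }

  public static void main(String[] args) {
    for (String input : new String[] {"enabled", " observeonly ", "bogus"}) {
      System.out.println("'" + input + "' -> " + PreemptionMode.fromString(input));
    }
  }
}
```

Because the loop iterates over values(), adding or renaming a constant does not require touching fromString, which avoids the situation where one constant is never returned.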
[jira] [Comment Edited] (YARN-10641) Refactor the max app related update, and fix maxApllications update error when add new queues.
[ https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297081#comment-17297081 ] Qi Zhu edited comment on YARN-10641 at 3/18/21, 9:57 AM: - [~pbacsko] [~gandras] The logic here is not changed, the label support should be handled in YARN-10657. Fixed the remaining checkstyle, if you have any other advice about this?:D I think we should fix this Jira first, if this not fixed, max application without nodelabel will also be wrong. Thanks. was (Author: zhuqi): [~pbacsko] [~gandras] The logic here is not changed, the label support should be handled in YARN-10657. Fixed the remaining checkstyle, if you have any other advice about this?:D I think we should fix this Jira first, if this not fixed, max application without nodelabel will be wrong. Thanks. > Refactor the max app related update, and fix maxApllications update error > when add new queues. > -- > > Key: YARN-10641 > URL: https://issues.apache.org/jira/browse/YARN-10641 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Qi Zhu >Assignee: Qi Zhu >Priority: Critical > Attachments: YARN-10641.001.patch, YARN-10641.002.patch, > YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch, > YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png, > image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png, > image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png > > > When refactor the update logic in YARN-10504 . > The update max applications based abs/cap is wrong, this should be fixed, > because the max applications is key part to limit applications in CS. > For example: > When adding a dynamic queue, the other children's max app of parent queue are > not updated correctly: > !image-2021-02-20-15-53-51-099.png|width=639,height=509! > The new added queue's max app will updated correctly: > !image-2021-02-20-15-55-44-780.png|width=542,height=426! 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10641) Refactor the max app related update, and fix maxApllications update error when add new queues.
[ https://issues.apache.org/jira/browse/YARN-10641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17297081#comment-17297081 ]

Qi Zhu edited comment on YARN-10641 at 3/18/21, 9:57 AM:
---------------------------------------------------------

[~pbacsko] [~gandras]

The logic here is not changed; the label support should be handled in YARN-10657.

I fixed the remaining checkstyle issues. Do you have any other advice about this? :D

I think we should fix this Jira first. If it is not fixed, max applications without node labels will be wrong.

Thanks.

was (Author: zhuqi):
[~pbacsko]

The logic here is not changed; the label support should be handled in YARN-10657.

I fixed the remaining checkstyle issues. Do you have any other advice about this? :D

Thanks.

> Refactor the max app related update, and fix maxApllications update error when add new queues.
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-10641
>                 URL: https://issues.apache.org/jira/browse/YARN-10641
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Qi Zhu
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-10641.001.patch, YARN-10641.002.patch,
> YARN-10641.003.patch, YARN-10641.004.patch, YARN-10641.005.patch,
> YARN-10641.006.patch, image-2021-02-20-15-49-58-677.png,
> image-2021-02-20-15-53-51-099.png, image-2021-02-20-15-55-44-780.png,
> image-2021-02-20-16-29-18-519.png, image-2021-02-20-16-31-13-714.png
>
>
> After the update logic was refactored in YARN-10504, the max applications
> update based on abs/cap is wrong. This should be fixed, because max
> applications is a key part of limiting applications in CS.
> For example:
> When adding a dynamic queue, the max app of the parent queue's other children
> is not updated correctly:
> !image-2021-02-20-15-53-51-099.png|width=639,height=509!
> The newly added queue's max app is updated correctly:
> !image-2021-02-20-15-55-44-780.png|width=542,height=426!
[jira] [Commented] (YARN-10659) Improve CS MappingRule %secondary_group evaluation
[ https://issues.apache.org/jira/browse/YARN-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303996#comment-17303996 ]

Andras Gyori commented on YARN-10659:
-------------------------------------

Thanks [~shuzirra], the patch looks good to me now. +1 (non-binding).

If no other revisions are expected, [~snemeth] could review it and commit to trunk.

> Improve CS MappingRule %secondary_group evaluation
> --------------------------------------------------
>
>                 Key: YARN-10659
>                 URL: https://issues.apache.org/jira/browse/YARN-10659
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gergely Pollak
>            Assignee: Gergely Pollak
>            Priority: Major
>         Attachments: YARN-10659.001.patch, YARN-10659.002.patch,
> YARN-10659.003.patch
>
>
> Since the leaf queue names are not unique, there are a lot of use cases where
> %secondary_group evaluation fails or behaves inconsistently.
> We should extend its behavior: when it is under a defined parent,
> %secondary_group evaluation should only check for queue existence under that
> queue. E.g. root.group.%secondary_group should only evaluate to groups which
> exist under root.group, while the legacy %secondary_group.%user should still
> look for groups by their leaf name globally.
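The two lookup behaviors described in the YARN-10659 issue text can be illustrated with a minimal sketch. This is not the code from the attached patches: the class `SecondaryGroupSketch`, its methods, and the flat list of queue paths are all hypothetical stand-ins for the scheduler's real queue registry, and picking the first matching group is an assumption.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not the Capacity Scheduler's actual classes.
class SecondaryGroupSketch {
  // Assumed stand-in for the queue registry: full queue paths, in a fixed order.
  private final List<String> queuePaths;

  SecondaryGroupSketch(List<String> queuePaths) {
    this.queuePaths = queuePaths;
  }

  // Parent-scoped lookup (e.g. root.group.%secondary_group): a group only
  // matches if a queue with that name exists directly under the given parent.
  Optional<String> resolveUnderParent(String parent, List<String> secondaryGroups) {
    for (String group : secondaryGroups) {
      String candidate = parent + "." + group;
      if (queuePaths.contains(candidate)) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }

  // Legacy lookup (e.g. %secondary_group.%user): a group matches any queue
  // whose leaf name equals the group name, wherever it sits in the hierarchy.
  Optional<String> resolveLegacy(List<String> secondaryGroups) {
    for (String group : secondaryGroups) {
      for (String path : queuePaths) {
        if (path.equals(group) || path.endsWith("." + group)) {
          return Optional.of(path);
        }
      }
    }
    return Optional.empty();
  }
}
```

With queues `root.group.dev` and `root.other.ops` and a user whose secondary groups are `ops` and `dev`, the parent-scoped lookup under `root.group` skips `ops` and settles on `root.group.dev`, while the legacy lookup finds `root.other.ops` by leaf name.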