[jira] [Commented] (YARN-11196) NUMA Awareness support in DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584658#comment-17584658 ]

ASF GitHub Bot commented on YARN-11196:
---------------------------------------

PrabhuJoseph commented on code in PR #4742:
URL: https://github.com/apache/hadoop/pull/4742#discussion_r954564319


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java:

@@ -372,16 +409,19 @@ public int relaunchContainer(ContainerStartContext ctx)
    * as the current working directory for the command. If null,
    * the current working directory is not modified.
    * @param environment the container environment
+   * @param numaCommands list of prefix numa commands
    * @return the new {@link ShellCommandExecutor}
    * @see ShellCommandExecutor
    */
-  protected CommandExecutor buildCommandExecutor(String wrapperScriptPath,
-      String containerIdStr, String user, Path pidFile, Resource resource,
-      File workDir, Map<String, String> environment) {
-
+  protected CommandExecutor buildCommandExecutor(String wrapperScriptPath,
+      String containerIdStr, String user, Path pidFile, Resource resource,
+      File workDir, Map<String, String> environment, String[] numaCommands) {
+
     String[] command = getRunCommand(wrapperScriptPath, containerIdStr, user,
         pidFile, this.getConf(), resource);
+    command = concatStringCommands(command, numaCommands);

Review Comment:
   Shall we skip calling this when numaCommands is not passed?

> NUMA Awareness support in DefaultContainerExecutor
> --------------------------------------------------
>
>          Key: YARN-11196
>          URL: https://issues.apache.org/jira/browse/YARN-11196
>      Project: Hadoop YARN
>   Issue Type: Improvement
>   Components: nodemanager
> Affects Versions: 3.3.3
>     Reporter: Prabhu Joseph
>     Assignee: Samrat Deb
>     Priority: Major
>       Labels: pull-request-available
>
> [YARN-5764|https://issues.apache.org/jira/browse/YARN-5764] has added support
> for NUMA awareness for containers launched through LinuxContainerExecutor.
> This feature is useful to have in DefaultContainerExecutor as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
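The review above asks to skip the concatenation when no NUMA prefix is supplied. A stand-alone sketch of that early-return behavior follows; the helper name echoes the PR's `concatStringCommands`, but this is not Hadoop's implementation, and putting the numa commands first is an assumption based on the javadoc's "list of prefix numa commands":

```java
import java.util.Arrays;

public class NumaCommandUtils {

    // Prepend the numa prefix to the run command; when no prefix is passed,
    // return the original command untouched (the reviewer's suggestion).
    public static String[] concatStringCommands(String[] command, String[] numaCommands) {
        if (numaCommands == null || numaCommands.length == 0) {
            return command;
        }
        String[] merged = Arrays.copyOf(numaCommands, numaCommands.length + command.length);
        System.arraycopy(command, 0, merged, numaCommands.length, command.length);
        return merged;
    }

    public static void main(String[] args) {
        String[] run = {"bash", "launch_container.sh"};
        String[] numa = {"/usr/bin/numactl", "--interleave=0", "--cpunodebind=0"};
        // prints [/usr/bin/numactl, --interleave=0, --cpunodebind=0, bash, launch_container.sh]
        System.out.println(Arrays.toString(concatStringCommands(run, numa)));
    }
}
```

Returning the same array when the prefix is absent avoids a needless copy on every non-NUMA container launch.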
[jira] [Commented] (YARN-11196) NUMA Awareness support in DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584657#comment-17584657 ]

ASF GitHub Bot commented on YARN-11196:
---------------------------------------

PrabhuJoseph commented on code in PR #4742:
URL: https://github.com/apache/hadoop/pull/4742#discussion_r954562550


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestDefaultContainerExecutor.java:

@@ -736,4 +755,197 @@ public void testPickDirectory() throws Exception {
     //      new FsPermission(ApplicationLocalizer.LOGDIR_PERM), true);
     //   }

+  @Before
+  public void setUp() throws IOException, YarnException {
+    yarnConfiguration = new YarnConfiguration();
+    setNumaConfig();
+    Context mockContext = createAndGetMockContext();
+    NMStateStoreService nmStateStoreService =
+        mock(NMStateStoreService.class);
+    when(mockContext.getNMStateStore()).thenReturn(nmStateStoreService);
+    numaResourceAllocator = new NumaResourceAllocator(mockContext) {
+      @Override
+      public String executeNGetCmdOutput(Configuration config)
+          throws YarnRuntimeException {
+        return getNumaCmdOutput();
+      }
+    };
+
+    numaResourceAllocator.init(yarnConfiguration);
+    FileContext lfs = FileContext.getLocalFSFileContext();
+    containerExecutor = new DefaultContainerExecutor(lfs) {
+      @Override
+      public Configuration getConf() {
+        return yarnConfiguration;
+      }
+    };
+    containerExecutor.setNumaResourceAllocator(numaResourceAllocator);
+    mockContainer = mock(Container.class);
+  }
+
+  private void setNumaConfig() {
+    yarnConfiguration.set(YarnConfiguration.NM_NUMA_AWARENESS_ENABLED, "true");
+    yarnConfiguration.set(YarnConfiguration.NM_NUMA_AWARENESS_READ_TOPOLOGY, "true");
+    yarnConfiguration.set(YarnConfiguration.NM_NUMA_AWARENESS_NUMACTL_CMD, "/usr/bin/numactl");
+  }
+
+  private String getNumaCmdOutput() {
+    // architecture of 8 cpu cores
+    // randomly picked size of memory
+    return "available: 2 nodes (0-1)\n\t"
+        + "node 0 cpus: 0 2 4 6\n\t"
+        + "node 0 size: 73717 MB\n\t"
+        + "node 0 free: 73717 MB\n\t"
+        + "node 1 cpus: 1 3 5 7\n\t"
+        + "node 1 size: 73717 MB\n\t"
+        + "node 1 free: 73717 MB\n\t"
+        + "node distances:\n\t"
+        + "node 0 1\n\t"
+        + "0: 10 20\n\t"
+        + "1: 20 10";
+  }
+
+  private Context createAndGetMockContext() {
+    Context mockContext = mock(Context.class);
+    @SuppressWarnings("unchecked")
+    ConcurrentHashMap mockContainers = mock(
+        ConcurrentHashMap.class);
+    mockContainer = mock(Container.class);
+    when(mockContainer.getResourceMappings())
+        .thenReturn(new ResourceMappings());
+    when(mockContainers.get(any())).thenReturn(mockContainer);
+    when(mockContext.getContainers()).thenReturn(mockContainers);
+    when(mockContainer.getResource()).thenReturn(Resource.newInstance(2048, 2));
+    return mockContext;
+  }
+
+  private void testAllocateNumaResource(String containerId, Resource resource,
+      String memNodes, String cpuNodes) throws Exception {
+    when(mockContainer.getContainerId())
+        .thenReturn(ContainerId.fromString(containerId));
+    when(mockContainer.getResource()).thenReturn(resource);
+    NumaResourceAllocation numaResourceAllocation =
+        numaResourceAllocator.allocateNumaNodes(mockContainer);
+    String[] commands = containerExecutor.getNumaCommands(numaResourceAllocation);
+    assertEquals(Arrays.asList(commands), Arrays.asList("/usr/bin/numactl",
+        "--interleave=" + memNodes, "--cpunodebind=" + cpuNodes));
+  }
+
+  @Test
+  public void testAllocateNumaMemoryResource() throws Exception {
+    // keeping cores constant for testing memory resources
+
+    // allocates node 0 for memory and cpu
+    testAllocateNumaResource("container_1481156246874_0001_01_000001",
+        Resource.newInstance(2048, 2), "0", "0");
+
+    // allocates node 1 for memory and cpu since allocator uses round robin assignment
+    testAllocateNumaResource("container_1481156246874_0001_01_000002",
+        Resource.newInstance(60000, 2), "1", "1");
+
+    // allocates node 0,1 for memory since there is no sufficient memory in any one node
+    testAllocateNumaResource("container_1481156246874_0001_01_000003",
+        Resource.newInstance(80000, 2), "0,1", "0");
+
+    // returns null since there are no sufficient resources available for the request
+    when(mockContainer.getContainerId()).thenReturn(
+        ContainerId.fromString("container_1481156246874_0001_01_000004"));
+    when(mockContainer.getResource())
+        .thenReturn(Resource.newInstance(80000, 2));
+    Assert.assertNull(numa
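The mocked `numactl` topology in the test above is plain text that the allocator has to parse into per-node cpu and memory figures. A minimal, self-contained sketch of such parsing is shown below; this is not Hadoop's actual `NumaResourceAllocator`, whose parsing logic may differ, and all class and field names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumaTopologyParser {

    /** Per-node view recovered from the command output. */
    public static final class NumaNode {
        public final int id;
        public final List<Integer> cpus;
        public final long sizeMb;

        NumaNode(int id, List<Integer> cpus, long sizeMb) {
            this.id = id;
            this.cpus = cpus;
            this.sizeMb = sizeMb;
        }
    }

    // Matches lines such as "node 0 cpus: 0 2 4 6" and "node 0 size: 73717 MB".
    private static final Pattern CPUS = Pattern.compile("node (\\d+) cpus:((?:\\s+\\d+)+)");
    private static final Pattern SIZE = Pattern.compile("node (\\d+) size: (\\d+) MB");

    public static List<NumaNode> parse(String output) {
        Map<Integer, List<Integer>> cpus = new TreeMap<>();
        Map<Integer, Long> sizes = new TreeMap<>();
        // The sample output separates records with "\n\t"; split on newlines,
        // tolerating the optional leading tab.
        for (String raw : output.split("\\n\\t?")) {
            String line = raw.trim();
            Matcher m = CPUS.matcher(line);
            if (m.matches()) {
                List<Integer> ids = new ArrayList<>();
                for (String c : m.group(2).trim().split("\\s+")) {
                    ids.add(Integer.parseInt(c));
                }
                cpus.put(Integer.parseInt(m.group(1)), ids);
                continue;
            }
            m = SIZE.matcher(line);
            if (m.matches()) {
                sizes.put(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)));
            }
        }
        List<NumaNode> nodes = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : cpus.entrySet()) {
            nodes.add(new NumaNode(e.getKey(), e.getValue(), sizes.getOrDefault(e.getKey(), 0L)));
        }
        return nodes;
    }
}
```

Note that the "free", "distances", and distance-matrix lines deliberately fall through both patterns, which is why the `node 0 1` header of the distance table does not produce a phantom node.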
[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken
[ https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584653#comment-17584653 ]

ASF GitHub Bot commented on YARN-9708:
--------------------------------------

slfan1989 commented on code in PR #4746:
URL: https://github.com/apache/hadoop/pull/4746#discussion_r954557599


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/MemoryFederationStateStore.java:

@@ -395,4 +535,17 @@ public DeleteReservationHomeSubClusterResponse deleteReservationHomeSubCluster(
     reservations.remove(reservationId);
     return DeleteReservationHomeSubClusterResponse.newInstance();
   }
-}
+
+  /**
+   * Get DelegationKey based on MasterKey.
+   *
+   * @param masterKey masterKey
+   * @return DelegationKey
+   */
+  private DelegationKey getDelegationKeyByMasterKey(RouterMasterKey masterKey) {

Review Comment:
   I will fix it.

> Yarn Router Support DelegationToken
> -----------------------------------
>
>          Key: YARN-9708
>          URL: https://issues.apache.org/jira/browse/YARN-9708
>      Project: Hadoop YARN
>   Issue Type: New Feature
>   Components: router
> Affects Versions: 3.1.1
>     Reporter: Xie YiFan
>     Assignee: fanshilun
>     Priority: Minor
>       Labels: pull-request-available
>  Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch,
>    RMDelegationTokenSecretManager_storeNewMasterKey.svg,
>    RouterDelegationTokenSecretManager_storeNewMasterKey.svg
>
> 1. We use the router as a proxy to manage multiple clusters, which are
> independent of each other, in order to provide a unified client. Thus, we
> implement our customized AMRMProxyPolicy that doesn't broadcast
> ResourceRequest to other clusters.
> 2. Our production environment needs Kerberos, but the router doesn't support
> SecureLogin for now. https://issues.apache.org/jira/browse/YARN-6539 doesn't
> work, so we improved it.
> 3. Some frameworks like Oozie get a token via yarnclient#getDelegationToken,
> which the router doesn't support. Our solution is to add homeCluster to
> ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job is
> submitted with a specified clusterid so that the router knows which cluster
> to submit the job to. The router gets a token from one RM according to the
> specified clusterid when the client calls getDelegationToken, and applies a
> mechanism to save this token in memory.
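The helper under review looks up a stored DelegationKey from a router master key. A self-contained sketch of such a lookup follows, with a toy `DelegationKey` standing in for Hadoop's `org.apache.hadoop.security.token.delegation.DelegationKey`; matching by key id is an assumption, since the body of the PR's method is not quoted:

```java
import java.util.Set;

public class MasterKeyLookup {

    // Reduced stand-in for Hadoop's DelegationKey: just the id the lookup needs.
    public static final class DelegationKey {
        private final int keyId;
        private final long expiryDate;

        public DelegationKey(int keyId, long expiryDate) {
            this.keyId = keyId;
            this.expiryDate = expiryDate;
        }

        public int getKeyId() {
            return keyId;
        }

        public long getExpiryDate() {
            return expiryDate;
        }
    }

    /**
     * Find the stored DelegationKey whose id matches the master key's id;
     * returns null when no such key exists.
     */
    public static DelegationKey getDelegationKeyByKeyId(Set<DelegationKey> masterKeys, int keyId) {
        for (DelegationKey key : masterKeys) {
            if (key.getKeyId() == keyId) {
                return key;
            }
        }
        return null;
    }
}
```

A linear scan is fine for the handful of live master keys an RM rotates through; an id-keyed map would be the natural next step if the set grew.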
[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken
[ https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584652#comment-17584652 ]

ASF GitHub Bot commented on YARN-9708:
--------------------------------------

slfan1989 commented on code in PR #4746:
URL: https://github.com/apache/hadoop/pull/4746#discussion_r954556809


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/impl/pb/package-info.java:

@@ -0,0 +1,20 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+@InterfaceAudience.Public
+package org.apache.hadoop.yarn.security.client.impl.pb;
+import org.apache.hadoop.classification.InterfaceAudience;

Review Comment:
   I will modify the code.
[jira] [Commented] (YARN-11196) NUMA Awareness support in DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584651#comment-17584651 ]

ASF GitHub Bot commented on YARN-11196:
---------------------------------------

PrabhuJoseph commented on code in PR #4742:
URL: https://github.com/apache/hadoop/pull/4742#discussion_r954556722


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java:

@@ -1040,4 +1080,92 @@ public void updateYarnSysFS(Context ctx, String user,
       String appId, String spec) throws IOException {
     throw new ServiceStateException("Implementation unavailable");
   }
+
+  @Override
+  public int reacquireContainer(ContainerReacquisitionContext ctx)
+      throws IOException, InterruptedException {
+    try {
+      if (numaResourceAllocator != null) {
+        numaResourceAllocator.recoverNumaResource(ctx.getContainerId());
+      }
+      return super.reacquireContainer(ctx);
+    } finally {
+      postComplete(ctx.getContainerId());
+    }
+  }
+
+  /**
+   * Clean up and release of resources.
+   *
+   * @param containerId containerId of running container
+   */
+  public void postComplete(final ContainerId containerId) {
+    if (numaResourceAllocator != null) {
+      try {
+        numaResourceAllocator.releaseNumaResource(containerId);
+      } catch (ResourceHandlerException e) {
+        LOG.warn("NumaResource release failed for "
+            + "containerId: {}. Exception: ", containerId, e);
+      }
+    }
+  }
+
+  /**
+   * @param resourceAllocation NonNull NumaResourceAllocation object reference
+   * @return Array of numa specific commands
+   */
+  String[] getNumaCommands(NumaResourceAllocation resourceAllocation) {
+    String[] numaCommand = new String[3];
+    numaCommand[0] = this.getConf().get(YarnConfiguration.NM_NUMA_AWARENESS_NUMACTL_CMD,

Review Comment:
   Better to read this config and initialize it in init.
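The reviewer's point, resolving the numactl path once at initialization rather than on every `getNumaCommands` call, can be sketched as follows. A plain `Map` stands in for Hadoop's `Configuration`, and the key and default simply mirror the test's `setNumaConfig()`; treat both as assumptions:

```java
import java.util.Map;

public class NumaCommandBuilder {
    // Assumed key name and default, mirroring the PR's test configuration.
    static final String NUMACTL_CMD_KEY = "yarn.nodemanager.numa-awareness.numactl.cmd";
    static final String NUMACTL_CMD_DEFAULT = "/usr/bin/numactl";

    private final Map<String, String> conf;
    private String numactlCmd; // resolved once, not per container launch

    public NumaCommandBuilder(Map<String, String> conf) {
        this.conf = conf;
    }

    /** Called once at executor init time, as the reviewer suggests. */
    public void init() {
        numactlCmd = conf.getOrDefault(NUMACTL_CMD_KEY, NUMACTL_CMD_DEFAULT);
    }

    /** Build the three-element numa prefix for an allocation. */
    public String[] getNumaCommands(String memNodes, String cpuNodes) {
        return new String[] {
            numactlCmd, "--interleave=" + memNodes, "--cpunodebind=" + cpuNodes
        };
    }
}
```

Caching the resolved path keeps the per-launch hot path free of configuration lookups and makes the dependency on the config explicit in one place.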
[jira] [Commented] (YARN-11253) Add Configuration to delegationToken RemoverScanInterval
[ https://issues.apache.org/jira/browse/YARN-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584643#comment-17584643 ]

ASF GitHub Bot commented on YARN-11253:
---------------------------------------

slfan1989 commented on code in PR #4751:
URL: https://github.com/apache/hadoop/pull/4751#discussion_r954542677


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:

@@ -1077,6 +1077,14 @@
     <value>86400000</value>
   </property>

+  <property>
+    <description>
+      RM delegation token remove-scan interval in ms
+    </description>
+    <name>yarn.resourcemanager.delegation.token.remove-scan-interval</name>
+    <value>3600000</value>

Review Comment:
   I will modify the code.

> Add Configuration to delegationToken RemoverScanInterval
> --------------------------------------------------------
>
>          Key: YARN-11253
>          URL: https://issues.apache.org/jira/browse/YARN-11253
>      Project: Hadoop YARN
>   Issue Type: Improvement
>   Components: resourcemanager
> Affects Versions: 3.4.0, 3.3.4
>     Reporter: fanshilun
>     Assignee: fanshilun
>     Priority: Major
>       Labels: pull-request-available
>
> While reading the code, I found a hard-coded value; this parameter should be
> extracted into the configuration.
> org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService#
> createRMDelegationTokenSecretManager
> {code:java}
> protected RMDelegationTokenSecretManager
> createRMDelegationTokenSecretManager(Configuration conf, RMContext rmContext)
> {
>    // ... 3600000: this hard-coded value should be extracted
>    return new RMDelegationTokenSecretManager(secretKeyInterval,
>        tokenMaxLifetime, tokenRenewInterval, 3600000, rmContext);
> }
> {code}
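The change discussed above replaces a hard-coded remover scan interval with a configuration read. A minimal stand-alone sketch of that pattern follows, using a plain `Map` in place of Hadoop's `Configuration`; the key comes from the patch's yarn-default.xml entry, and the one-hour default is assumed to match the previously hard-coded value:

```java
import java.util.Map;

public class RemoverScanIntervalConfig {
    // Key from the patch's yarn-default.xml entry.
    static final String REMOVE_SCAN_INTERVAL_KEY =
        "yarn.resourcemanager.delegation.token.remove-scan-interval";
    // Assumed default: the one-hour value that used to be hard-coded.
    static final long REMOVE_SCAN_INTERVAL_DEFAULT_MS = 3600000L;

    /** Read the interval, falling back to the old hard-coded value. */
    public static long getRemoveScanInterval(Map<String, String> conf) {
        String value = conf.get(REMOVE_SCAN_INTERVAL_KEY);
        return value == null ? REMOVE_SCAN_INTERVAL_DEFAULT_MS : Long.parseLong(value);
    }
}
```

Keeping the old literal as the default preserves behavior for every deployment that does not set the new property.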
[jira] [Commented] (YARN-11253) Add Configuration to delegationToken RemoverScanInterval
[ https://issues.apache.org/jira/browse/YARN-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584642#comment-17584642 ]

ASF GitHub Bot commented on YARN-11253:
---------------------------------------

slfan1989 commented on code in PR #4751:
URL: https://github.com/apache/hadoop/pull/4751#discussion_r954542145


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java:

@@ -135,9 +135,11 @@ protected RMDelegationTokenSecretManager createRMDelegationTokenSecretManager(
     long tokenRenewInterval =
         conf.getLong(YarnConfiguration.RM_DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
             YarnConfiguration.RM_DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
-
+    long removeScanInterval =
+        conf.getLong(YarnConfiguration.RM_DELEGATION_TOKEN_REMOVE_SCAN_INTERVAL_KEY,

Review Comment:
   Thanks for your suggestion, I will modify the code.
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584641#comment-17584641 ]

ASF GitHub Bot commented on YARN-11277:
---------------------------------------

hadoop-yetus commented on PR #4797:
URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1226818007

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 46s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 1s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 15m 39s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 25m 42s | | trunk passed |
| +1 :green_heart: | compile | 9m 48s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 8m 40s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 2m 6s | | trunk passed |
| +1 :green_heart: | mvnsite | 4m 12s | | trunk passed |
| +1 :green_heart: | javadoc | 3m 33s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 3m 33s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 37s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 13s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 6s | | the patch passed |
| +1 :green_heart: | compile | 9m 7s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 9m 7s | | the patch passed |
| +1 :green_heart: | compile | 8m 31s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 8m 31s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 48s | | hadoop-yarn-project/hadoop-yarn: The patch generated 0 new + 187 unchanged - 34 fixed = 187 total (was 221) |
| +1 :green_heart: | mvnsite | 3m 39s | | the patch passed |
| +1 :green_heart: | javadoc | 3m 19s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 3m 3s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 19s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 58s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 21s | | hadoop-yarn-api in the patch passed. |
| +1 :green_heart: | unit | 5m 11s | | hadoop-yarn-common in the patch passed. |
| +1 :green_heart: | unit | 24m 34s | | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 :green_heart: | asflicense | 1m 13s | | The patch does not generate ASF License warnings. |
| | | 198m 23s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4797 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux 98eddf566c8d 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 6ac41d5e94d64cf61e7a95f84c0ab1bee7d25997 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584636#comment-17584636 ]

ASF GitHub Bot commented on YARN-11277:
---------------------------------------

hadoop-yetus commented on PR #4797:
URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1226812566

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 36s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 15m 27s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 25m 11s | | trunk passed |
| +1 :green_heart: | compile | 9m 46s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 8m 37s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 2m 4s | | trunk passed |
| +1 :green_heart: | mvnsite | 4m 11s | | trunk passed |
| +1 :green_heart: | javadoc | 3m 45s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 3m 24s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 25s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 55s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 32s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 4s | | the patch passed |
| +1 :green_heart: | compile | 9m 6s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 9m 6s | | the patch passed |
| +1 :green_heart: | compile | 8m 32s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 8m 32s | | the patch passed |
| +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 53s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/3/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 187 unchanged - 34 fixed = 189 total (was 221) |
| +1 :green_heart: | mvnsite | 3m 23s | | the patch passed |
| +1 :green_heart: | javadoc | 3m 11s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 2m 58s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 18s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 38s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 37s | | hadoop-yarn-api in the patch passed. |
| +1 :green_heart: | unit | 5m 19s | | hadoop-yarn-common in the patch passed. |
| +1 :green_heart: | unit | 24m 37s | | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 :green_heart: | asflicense | 1m 14s | | The patch does not generate ASF License warnings. |
| | | 196m 57s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4797 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux c866605e3a73 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 6a8abd25903dd1290fb5c816a457d13d21240211 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584632#comment-17584632 ]

ASF GitHub Bot commented on YARN-11177:
---------------------------------------

slfan1989 commented on code in PR #4764:
URL: https://github.com/apache/hadoop/pull/4764#discussion_r954528905


## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/clientrm/TestFederationClientInterceptor.java:

@@ -1254,4 +1275,250 @@ public void testNodesToAttributes() throws Exception {
         NodeAttributeType.STRING, "nvida");
     Assert.assertTrue(nodeAttributeMap.get("0-host1").contains(gpu));
   }
+
+  @Test
+  public void testGetNewReservation() throws Exception {
+    LOG.info("Test FederationClientInterceptor : Get NewReservation request.");
+
+    // null request
+    LambdaTestUtils.intercept(YarnException.class,
+        "Missing getNewReservation request.", () -> interceptor.getNewReservation(null));
+
+    // normal request
+    GetNewReservationRequest request = GetNewReservationRequest.newInstance();
+    GetNewReservationResponse response = interceptor.getNewReservation(request);
+    Assert.assertNotNull(response);
+
+    ReservationId reservationId = response.getReservationId();
+    Assert.assertNotNull(reservationId);
+    Assert.assertTrue(reservationId.toString().contains("reservation"));
+    Assert.assertEquals(reservationId.getClusterTimestamp(), ResourceManager.getClusterTimeStamp());
+  }
+
+  @Test
+  public void testSubmitReservation() throws Exception {
+    LOG.info("Test FederationClientInterceptor : SubmitReservation request.");
+
+    // get new reservationId
+    GetNewReservationRequest request = GetNewReservationRequest.newInstance();
+    GetNewReservationResponse response = interceptor.getNewReservation(request);
+    Assert.assertNotNull(response);
+
+    // allow plan follower to synchronize, manually trigger an assignment
+    Map mockRMs = interceptor.getMockRMs();
+    for (MockRM mockRM : mockRMs.values()) {
+      ReservationSystem reservationSystem = mockRM.getReservationSystem();
+      reservationSystem.synchronizePlan("root.decided", true);
+    }
+
+    // Submit Reservation
+    ReservationId reservationId = response.getReservationId();
+    ReservationDefinition rDefinition = createReservationDefinition(1024, 1);
+    ReservationSubmissionRequest rSubmissionRequest = ReservationSubmissionRequest.newInstance(
+        rDefinition, "decided", reservationId);
+
+    ReservationSubmissionResponse submissionResponse =
+        interceptor.submitReservation(rSubmissionRequest);
+    Assert.assertNotNull(submissionResponse);
+
+    SubClusterId subClusterId = stateStoreUtil.queryReservationHomeSC(reservationId);
+    Assert.assertNotNull(subClusterId);
+    Assert.assertTrue(subClusters.contains(subClusterId));
+  }
+
+  @Test
+  public void testSubmitReservationEmptyRequest() throws Exception {
+    LOG.info("Test FederationClientInterceptor : SubmitReservation request empty.");
+
+    // null request1
+    LambdaTestUtils.intercept(YarnException.class,
+        "Missing submitReservation request or reservationId or reservation definition or queue.",
+        () -> interceptor.submitReservation(null));
+
+    // null request2
+    LambdaTestUtils.intercept(YarnException.class,
+        "Missing submitReservation request or reservationId or reservation definition or queue.",
+        () -> interceptor.submitReservation(
+            ReservationSubmissionRequest.newInstance(null, null, null)));
+
+    // null request3
+    ReservationSubmissionRequest request3 =
+        ReservationSubmissionRequest.newInstance(null, "q1", null);
+    LambdaTestUtils.intercept(YarnException.class,
+        "Missing submitReservation request or reservationId or reservation definition or queue.",
+        () -> interceptor.submitReservation(request3));
+
+    // null request4
+    ReservationId reservationId = ReservationId.newInstance(Time.now(), 1);
+    ReservationSubmissionRequest request4 =
+        ReservationSubmissionRequest.newInstance(null, null, reservationId);
+    LambdaTestUtils.intercept(YarnException.class,
+        "Missing submitReservation request or reservationId or reservation definition or queue.",
+        () -> interceptor.submitReservation(request4));
+
+    // null request5
+    long defaultDuration = 60;
+    long arrival = Time.now();
+    long deadline = arrival + (int) (defaultDuration * 1.1);
+
+    ReservationRequest rRequest = ReservationRequest.newInstance(
+        Resource.newInstance(1024, 1), 1, 1, defaultDuration);
+    ReservationRequest[] rRequests = new ReservationRequest[] {rRequest};
+    ReservationDefinition rDefinition = createReservationDefinition(arrival, deadline, rRequests,
+        ReservationRequestInterpreter.R_ALL, "u1");
+    ReservationSubmissionRequest reque
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584628#comment-17584628 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954527995 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/clientrm/TestFederationClientInterceptor.java: ## @@ -1254,4 +1275,250 @@ public void testNodesToAttributes() throws Exception { NodeAttributeType.STRING, "nvida"); Assert.assertTrue(nodeAttributeMap.get("0-host1").contains(gpu)); } + + @Test + public void testGetNewReservation() throws Exception { +LOG.info("Test FederationClientInterceptor : Get NewReservation request."); + +// null request +LambdaTestUtils.intercept(YarnException.class, +"Missing getNewReservation request.", () -> interceptor.getNewReservation(null)); + +// normal request +GetNewReservationRequest request = GetNewReservationRequest.newInstance(); +GetNewReservationResponse response = interceptor.getNewReservation(request); +Assert.assertNotNull(response); + +ReservationId reservationId = response.getReservationId(); +Assert.assertNotNull(reservationId); +Assert.assertTrue(reservationId.toString().contains("reservation")); +Assert.assertEquals(reservationId.getClusterTimestamp(), ResourceManager.getClusterTimeStamp()); + } + + @Test + public void testSubmitReservation() throws Exception { +LOG.info("Test FederationClientInterceptor : SubmitReservation request."); + +// get new reservationId +GetNewReservationRequest request = GetNewReservationRequest.newInstance(); +GetNewReservationResponse response = interceptor.getNewReservation(request); +Assert.assertNotNull(response); + +// allow plan follower to synchronize, manually trigger an assignment +Map mockRMs = interceptor.getMockRMs(); +for (MockRM mockRM : mockRMs.values()) { + ReservationSystem 
reservationSystem = mockRM.getReservationSystem(); + reservationSystem.synchronizePlan("root.decided", true); +} + +// Submit Reservation +ReservationId reservationId = response.getReservationId(); +ReservationDefinition rDefinition = createReservationDefinition(1024, 1); +ReservationSubmissionRequest rSubmissionRequest = ReservationSubmissionRequest.newInstance( +rDefinition, "decided", reservationId); + +ReservationSubmissionResponse submissionResponse = +interceptor.submitReservation(rSubmissionRequest); +Assert.assertNotNull(submissionResponse); + +SubClusterId subClusterId = stateStoreUtil.queryReservationHomeSC(reservationId); +Assert.assertNotNull(subClusterId); +Assert.assertTrue(subClusters.contains(subClusterId)); + } + + @Test + public void testSubmitReservationEmptyRequest() throws Exception { +LOG.info("Test FederationClientInterceptor : SubmitReservation request empty."); + +// null request1 +LambdaTestUtils.intercept(YarnException.class, +"Missing submitReservation request or reservationId or reservation definition or queue.", +() -> interceptor.submitReservation(null)); + +// null request2 +LambdaTestUtils.intercept(YarnException.class, +"Missing submitReservation request or reservationId or reservation definition or queue.", +() -> interceptor.submitReservation( +ReservationSubmissionRequest.newInstance(null, null, null))); Review Comment: I will fix it. > Support getNewReservation, submitReservation, updateReservation, > deleteReservation API's for Federation > --- > > Key: YARN-11177 > URL: https://issues.apache.org/jira/browse/YARN-11177 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
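The quoted tests lean on LambdaTestUtils.intercept to assert that invalid requests fail with a specific exception and message. As a rough illustration of what that helper does, here is a simplified stand-in (the real Hadoop utility has more overloads, including timeout-based retry; names and behavior below are a sketch, not the actual implementation):

```java
import java.util.concurrent.Callable;

public class InterceptDemo {
    // Simplified stand-in for Hadoop's LambdaTestUtils.intercept: invoke the
    // callable and verify it throws the expected exception type with a
    // message containing the given fragment; otherwise fail the test.
    public static <E extends Exception> E intercept(
            Class<E> clazz, String fragment, Callable<?> call) throws Exception {
        try {
            call.call();
        } catch (Exception e) {
            if (clazz.isInstance(e)
                    && String.valueOf(e.getMessage()).contains(fragment)) {
                return clazz.cast(e);   // the expected failure: return it for further checks
            }
            throw e;                    // wrong type or wrong message: rethrow
        }
        throw new AssertionError("expected " + clazz.getSimpleName());
    }

    public static void main(String[] args) throws Exception {
        // A null request should be rejected with a descriptive message.
        Exception e = intercept(IllegalArgumentException.class,
                "Missing request",
                () -> { throw new IllegalArgumentException("Missing request."); });
        System.out.println(e.getMessage()); // Missing request.
    }
}
```

Returning the caught exception, as the real helper also does, lets the test make further assertions on it after the interception.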
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584625#comment-17584625 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954525110

## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java:

@@ -925,13 +1041,61 @@ public ReservationListResponse listReservations(
   @Override
   public ReservationUpdateResponse updateReservation(
       ReservationUpdateRequest request) throws YarnException, IOException {
-    throw new NotImplementedException("Code is not implemented");
+
+    if (request == null || request.getReservationId() == null
+        || request.getReservationDefinition() == null) {

Review Comment: I will fix it.

> Support getNewReservation, submitReservation, updateReservation, > deleteReservation API's for Federation > --- > > Key: YARN-11177 > URL: https://issues.apache.org/jira/browse/YARN-11177 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > >
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584624#comment-17584624 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954524957 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -888,13 +890,127 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues( @Override public GetNewReservationResponse getNewReservation( GetNewReservationRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null) { + routerMetrics.incrGetNewReservationFailedRetrieved(); + String errMsg = "Missing getNewReservation request."; + RouterServerUtil.logAndThrowException(errMsg, null); +} + +long startTime = clock.getTime(); +Map subClustersActive = +federationFacade.getSubClusters(true); + +for (int i = 0; i < numSubmitRetries; ++i) { + SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive); + LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId); + ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId); + GetNewReservationResponse response = null; + try { +response = clientRMProxy.getNewReservation(request); +if (response != null) { + long stopTime = clock.getTime(); + routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime); + return response; +} + } catch (Exception e) { +LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e); +subClustersActive.remove(subClusterId); + } +} + +routerMetrics.incrGetNewReservationFailedRetrieved(); +String errMsg = "Failed to create a new reservation."; +throw new YarnException(errMsg); } @Override public 
ReservationSubmissionResponse submitReservation( ReservationSubmissionRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null || request.getQueue() == null) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing submitReservation request or reservationId " + + "or reservation definition or queue.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); + +long retryCount = 0; +boolean firstRetry = true; + +while (retryCount < numSubmitRetries) { + + SubClusterId subClusterId = policyFacade.getReservationHomeSubCluster(request); + LOG.info("submitReservation reservationId {} try #{} on SubCluster {}.", + reservationId, retryCount, subClusterId); + + ReservationHomeSubCluster reservationHomeSubCluster = + ReservationHomeSubCluster.newInstance(reservationId, subClusterId); + + // If it is the first attempt,use StateStore to add the + // mapping of reservationId and subClusterId. + // if the number of attempts is greater than 1, use StateStore to update the mapping. 
+ if (firstRetry) { +try { + // persist the mapping of reservationId and the subClusterId which has + // been selected as its home + subClusterId = federationFacade.addReservationHomeSubCluster(reservationHomeSubCluster); + firstRetry = false; +} catch (YarnException e) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException(e, + "Unable to insert the ReservationId %s into the FederationStateStore.", + reservationId); +} + } else { +try { + // update the mapping of reservationId and the home subClusterId to + // the new subClusterId we have selected + federationFacade.updateReservationHomeSubCluster(reservationHomeSubCluster); +} catch (YarnException e) { + SubClusterId subClusterIdInStateStore = + federationFacade.getReservationHomeSubCluster(reservationId); + if (subClusterId == subClusterIdInStateStore) { +LOG.info("Reservation {} already submitted on SubCluster {}.", +reservationId, subClusterId); + } else { +routerMetrics.incrSubmitReservationFailedRetrieved(); +RouterServerUtil.logAndThrowException(e, +"Unable to update the ReservationId %s into
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584623#comment-17584623 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954524494 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -888,13 +890,127 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues( @Override public GetNewReservationResponse getNewReservation( GetNewReservationRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null) { + routerMetrics.incrGetNewReservationFailedRetrieved(); + String errMsg = "Missing getNewReservation request."; + RouterServerUtil.logAndThrowException(errMsg, null); +} + +long startTime = clock.getTime(); +Map subClustersActive = +federationFacade.getSubClusters(true); + +for (int i = 0; i < numSubmitRetries; ++i) { + SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive); + LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId); + ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId); + GetNewReservationResponse response = null; + try { +response = clientRMProxy.getNewReservation(request); +if (response != null) { + long stopTime = clock.getTime(); + routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime); + return response; +} + } catch (Exception e) { +LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e); +subClustersActive.remove(subClusterId); + } +} + +routerMetrics.incrGetNewReservationFailedRetrieved(); +String errMsg = "Failed to create a new reservation."; +throw new YarnException(errMsg); } @Override public 
ReservationSubmissionResponse submitReservation( ReservationSubmissionRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null || request.getQueue() == null) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing submitReservation request or reservationId " + + "or reservation definition or queue.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); + +long retryCount = 0; +boolean firstRetry = true; + +while (retryCount < numSubmitRetries) { + + SubClusterId subClusterId = policyFacade.getReservationHomeSubCluster(request); + LOG.info("submitReservation reservationId {} try #{} on SubCluster {}.", + reservationId, retryCount, subClusterId); + + ReservationHomeSubCluster reservationHomeSubCluster = + ReservationHomeSubCluster.newInstance(reservationId, subClusterId); + + // If it is the first attempt,use StateStore to add the + // mapping of reservationId and subClusterId. + // if the number of attempts is greater than 1, use StateStore to update the mapping. 
+ if (firstRetry) { +try { + // persist the mapping of reservationId and the subClusterId which has + // been selected as its home + subClusterId = federationFacade.addReservationHomeSubCluster(reservationHomeSubCluster); + firstRetry = false; +} catch (YarnException e) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException(e, + "Unable to insert the ReservationId %s into the FederationStateStore.", + reservationId); +} + } else { +try { + // update the mapping of reservationId and the home subClusterId to + // the new subClusterId we have selected + federationFacade.updateReservationHomeSubCluster(reservationHomeSubCluster); +} catch (YarnException e) { + SubClusterId subClusterIdInStateStore = + federationFacade.getReservationHomeSubCluster(reservationId); + if (subClusterId == subClusterIdInStateStore) { +LOG.info("Reservation {} already submitted on SubCluster {}.", +reservationId, subClusterId); + } else { +routerMetrics.incrSubmitReservationFailedRetrieved(); +RouterServerUtil.logAndThrowException(e, +"Unable to update the ReservationId %s into
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584622#comment-17584622 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954524092 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -888,13 +890,127 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues( @Override public GetNewReservationResponse getNewReservation( GetNewReservationRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null) { + routerMetrics.incrGetNewReservationFailedRetrieved(); + String errMsg = "Missing getNewReservation request."; + RouterServerUtil.logAndThrowException(errMsg, null); +} + +long startTime = clock.getTime(); +Map subClustersActive = +federationFacade.getSubClusters(true); + +for (int i = 0; i < numSubmitRetries; ++i) { + SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive); + LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId); + ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId); + GetNewReservationResponse response = null; + try { +response = clientRMProxy.getNewReservation(request); +if (response != null) { + long stopTime = clock.getTime(); + routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime); + return response; +} + } catch (Exception e) { +LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e); +subClustersActive.remove(subClusterId); + } +} + +routerMetrics.incrGetNewReservationFailedRetrieved(); +String errMsg = "Failed to create a new reservation."; +throw new YarnException(errMsg); } @Override public 
ReservationSubmissionResponse submitReservation( ReservationSubmissionRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null || request.getQueue() == null) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing submitReservation request or reservationId " + + "or reservation definition or queue.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); + +long retryCount = 0; +boolean firstRetry = true; + +while (retryCount < numSubmitRetries) { + + SubClusterId subClusterId = policyFacade.getReservationHomeSubCluster(request); + LOG.info("submitReservation reservationId {} try #{} on SubCluster {}.", + reservationId, retryCount, subClusterId); + + ReservationHomeSubCluster reservationHomeSubCluster = + ReservationHomeSubCluster.newInstance(reservationId, subClusterId); + + // If it is the first attempt,use StateStore to add the + // mapping of reservationId and subClusterId. + // if the number of attempts is greater than 1, use StateStore to update the mapping. 
+ if (firstRetry) { +try { + // persist the mapping of reservationId and the subClusterId which has + // been selected as its home + subClusterId = federationFacade.addReservationHomeSubCluster(reservationHomeSubCluster); + firstRetry = false; +} catch (YarnException e) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException(e, + "Unable to insert the ReservationId %s into the FederationStateStore.", + reservationId); +} + } else { +try { + // update the mapping of reservationId and the home subClusterId to + // the new subClusterId we have selected + federationFacade.updateReservationHomeSubCluster(reservationHomeSubCluster); +} catch (YarnException e) { + SubClusterId subClusterIdInStateStore = + federationFacade.getReservationHomeSubCluster(reservationId); + if (subClusterId == subClusterIdInStateStore) { +LOG.info("Reservation {} already submitted on SubCluster {}.", +reservationId, subClusterId); + } else { +routerMetrics.incrSubmitReservationFailedRetrieved(); +RouterServerUtil.logAndThrowException(e, +"Unable to update the ReservationId %s into
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584620#comment-17584620 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954523877 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -888,13 +890,127 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues( @Override public GetNewReservationResponse getNewReservation( GetNewReservationRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null) { + routerMetrics.incrGetNewReservationFailedRetrieved(); + String errMsg = "Missing getNewReservation request."; + RouterServerUtil.logAndThrowException(errMsg, null); +} + +long startTime = clock.getTime(); +Map subClustersActive = Review Comment: I will fix it. > Support getNewReservation, submitReservation, updateReservation, > deleteReservation API's for Federation > --- > > Key: YARN-11177 > URL: https://issues.apache.org/jira/browse/YARN-11177 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
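The getNewReservation implementation under review retries against a randomly chosen active sub-cluster and removes a sub-cluster from the candidate map after a failure, so the next attempt goes elsewhere. Stripped of the Hadoop types, the pattern is roughly the following (hypothetical names, illustration only):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

public class RetryOverSubClusters {
    // Try the call against randomly chosen candidates, dropping a candidate
    // after each failure, for at most maxRetries attempts.
    static <T> T tryOnRandomCandidate(Map<String, String> candidates,
                                      Function<String, T> call, int maxRetries) {
        Random rand = new Random();
        for (int i = 0; i < maxRetries && !candidates.isEmpty(); i++) {
            List<String> ids = new ArrayList<>(candidates.keySet());
            String picked = ids.get(rand.nextInt(ids.size()));
            try {
                T response = call.apply(picked);
                if (response != null) {
                    return response;           // first successful response wins
                }
            } catch (RuntimeException e) {
                candidates.remove(picked);     // do not retry a failed sub-cluster
            }
        }
        throw new IllegalStateException("Failed after retries");
    }

    public static void main(String[] args) {
        Map<String, String> active = new HashMap<>();
        active.put("sc1", "rm1");
        active.put("sc2", "rm2");
        // sc1 always fails and gets removed; sc2 succeeds within the budget.
        String result = tryOnRandomCandidate(active, id -> {
            if (id.equals("sc1")) { throw new RuntimeException("down"); }
            return "ok";
        }, 5);
        System.out.println(result); // ok
    }
}
```

Removing the failed candidate is what keeps the retry budget from being burned on a sub-cluster that is already known to be unreachable.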
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584606#comment-17584606 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954499120 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -1624,6 +1788,35 @@ protected SubClusterId getApplicationHomeSubCluster( throw new YarnException(errorMsg); } + protected SubClusterId getReservationHomeSubCluster(ReservationId reservationId) + throws YarnException { + +if (reservationId == null) { + LOG.error("ReservationId is Null, Can't find in SubCluster."); + return null; +} + +SubClusterId resultSubClusterId = null; + +// try looking for applicationId in Home SubCluster +try { + resultSubClusterId = federationFacade.getReservationHomeSubCluster(reservationId); +} catch (YarnException ex) { + if(LOG.isDebugEnabled()){ +LOG.debug("Can't find reservationId = {} in home sub cluster, " + +" try foreach sub clusters.", reservationId); Review Comment: I will fix it. ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/test/java/org/apache/hadoop/yarn/server/router/clientrm/TestFederationClientInterceptor.java: ## @@ -203,6 +218,12 @@ protected YarnConfiguration createConfiguration() { // Disable StateStoreFacade cache conf.setInt(YarnConfiguration.FEDERATION_CACHE_TIME_TO_LIVE_SECS, 0); + +conf.setInt("yarn.scheduler.minimum-allocation-mb", 512); +conf.setInt("yarn.scheduler.minimum-allocation-vcores", 1); +conf.setInt("yarn.scheduler.maximum-allocation-mb", 102400); Review Comment: I will fix it. 
> Support getNewReservation, submitReservation, updateReservation, > deleteReservation API's for Federation > --- > > Key: YARN-11177 > URL: https://issues.apache.org/jira/browse/YARN-11177 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > >
[jira] [Commented] (YARN-6539) Create SecureLogin inside Router
[ https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584590#comment-17584590 ] ASF GitHub Bot commented on YARN-6539: -- zhengchenyu closed pull request #4354: YARN-6539. Create SecureLogin inside Router. URL: https://github.com/apache/hadoop/pull/4354 > Create SecureLogin inside Router > > > Key: YARN-6539 > URL: https://issues.apache.org/jira/browse/YARN-6539 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Xie YiFan >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: YARN-6359_1.patch, YARN-6359_2.patch, > YARN-6539-branch-3.1.0.004.patch, YARN-6539-branch-3.1.0.005.patch, > YARN-6539.006.patch, YARN-6539.007.patch, YARN-6539.008.patch, > YARN-6539_3.patch, YARN-6539_4.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > >
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584582#comment-17584582 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954475014 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -925,13 +1041,61 @@ public ReservationListResponse listReservations( @Override public ReservationUpdateResponse updateReservation( ReservationUpdateRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null) { + routerMetrics.incrUpdateReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing updateReservation request or reservationId or reservation definition.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); +SubClusterId subClusterId = getReservationHomeSubCluster(reservationId); + +ApplicationClientProtocol client; Review Comment: I will fix it. 
## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -925,13 +1041,61 @@ public ReservationListResponse listReservations( @Override public ReservationUpdateResponse updateReservation( ReservationUpdateRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null) { + routerMetrics.incrUpdateReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing updateReservation request or reservationId or reservation definition.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); +SubClusterId subClusterId = getReservationHomeSubCluster(reservationId); + +ApplicationClientProtocol client; +ReservationUpdateResponse response = null; +try { + client = getClientRMProxyForSubCluster(subClusterId); + response = client.updateReservation(request); +} catch (Exception ex) { + routerMetrics.incrUpdateReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Unable to reservation update due to exception.", ex); +} +long stopTime = clock.getTime(); +routerMetrics.succeededUpdateReservationRetrieved(stopTime - startTime); +return response; } @Override public ReservationDeleteResponse deleteReservation( ReservationDeleteRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); +if (request == null || request.getReservationId() == null) { + routerMetrics.incrDeleteReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing deleteReservation request or reservationId.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); +SubClusterId subClusterId = 
getReservationHomeSubCluster(reservationId); + +ApplicationClientProtocol client; +ReservationDeleteResponse response = null; +try { + client = getClientRMProxyForSubCluster(subClusterId); Review Comment: I will fix it. > Support getNewReservation, submitReservation, updateReservation, > deleteReservation API's for Federation > --- > > Key: YARN-11177 > URL: https://issues.apache.org/jira/browse/YARN-11177 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584580#comment-17584580 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954473098 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -888,13 +890,127 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues( @Override public GetNewReservationResponse getNewReservation( GetNewReservationRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null) { + routerMetrics.incrGetNewReservationFailedRetrieved(); + String errMsg = "Missing getNewReservation request."; + RouterServerUtil.logAndThrowException(errMsg, null); +} + +long startTime = clock.getTime(); +Map subClustersActive = +federationFacade.getSubClusters(true); + +for (int i = 0; i < numSubmitRetries; ++i) { + SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive); + LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId); + ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId); + GetNewReservationResponse response = null; + try { +response = clientRMProxy.getNewReservation(request); +if (response != null) { + long stopTime = clock.getTime(); + routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime); + return response; +} + } catch (Exception e) { +LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e); +subClustersActive.remove(subClusterId); + } +} + +routerMetrics.incrGetNewReservationFailedRetrieved(); +String errMsg = "Failed to create a new reservation."; +throw new YarnException(errMsg); } @Override public 
ReservationSubmissionResponse submitReservation( ReservationSubmissionRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null || request.getQueue() == null) { Review Comment: I will fix it. > Support getNewReservation, submitReservation, updateReservation, > deleteReservation API's for Federation > --- > > Key: YARN-11177 > URL: https://issues.apache.org/jira/browse/YARN-11177 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
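The submitReservation flow quoted in the comments above persists the reservationId-to-home-sub-cluster mapping on the first attempt and updates it on later retries, treating an update conflict as "already submitted" when the state store holds the same sub-cluster. A toy in-memory stand-in for that mapping (not the real FederationStateStore API; add/update semantics are assumed here for illustration) could look like:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class ReservationHomeStore {
    private final Map<String, String> home = new HashMap<>();

    // First attempt: persist the reservation -> sub-cluster mapping. If a
    // mapping already exists (e.g. a concurrent submit won), the stored
    // home wins and is returned to the caller.
    String addHome(String reservationId, String subClusterId) {
        home.putIfAbsent(reservationId, subClusterId);
        return home.get(reservationId);
    }

    // Retry path: move the reservation to a newly selected home sub-cluster.
    void updateHome(String reservationId, String subClusterId) {
        if (!home.containsKey(reservationId)) {
            throw new NoSuchElementException(reservationId);
        }
        home.put(reservationId, subClusterId);
    }

    String getHome(String reservationId) {
        return home.get(reservationId);
    }

    public static void main(String[] args) {
        ReservationHomeStore store = new ReservationHomeStore();
        System.out.println(store.addHome("r1", "sc1")); // sc1 (first attempt persisted)
        System.out.println(store.addHome("r1", "sc2")); // sc1 (existing home wins)
        store.updateHome("r1", "sc2");                  // retry moves the home
        System.out.println(store.getHome("r1"));        // sc2
    }
}
```

The "existing home wins on conflict" behavior is why the real code compares the sub-cluster it selected against the one returned by the state store before deciding whether a submit actually failed.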
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584549#comment-17584549 ] ASF GitHub Bot commented on YARN-11277: --- slfan1989 commented on PR #4797: URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1226681435 @leixm Fix CheckStyle. > trigger deletion of log-dir by size for NonAggregatingLogHandler > > > Key: YARN-11277 > URL: https://issues.apache.org/jira/browse/YARN-11277 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our yarn cluster, the log files of some containers are too large, which > causes the NodeManager to frequently switch to the unhealthy state. For logs > that are too large, we can consider deleting them directly without delaying > yarn.nodemanager.log.retain-seconds. > Cluster environment: > # 8k nodes+ > # 50w+ apps / day > Configuration: > # yarn.nodemanager.log.retain-seconds=3days > # yarn.log-aggregation-enable=false > >
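The YARN-11277 proposal above is to delete a container log directory as soon as it exceeds a configured size cap, instead of waiting out yarn.nodemanager.log.retain-seconds. A small sketch of the size-based trigger; the names dirSize, shouldDeleteBySize, and maxLogDirBytes are illustrative, not part of NonAggregatingLogHandler.

```java
import java.io.File;

public class LogDirSizeCheck {

  // Recursively sum the size of a container log directory.
  public static long dirSize(File dir) {
    long total = 0;
    File[] children = dir.listFiles();
    if (children != null) {
      for (File f : children) {
        total += f.isDirectory() ? dirSize(f) : f.length();
      }
    }
    return total;
  }

  // A non-positive cap means the size-based trigger is disabled, so only the
  // time-based yarn.nodemanager.log.retain-seconds delay applies.
  public static boolean shouldDeleteBySize(long dirSizeBytes, long maxLogDirBytes) {
    return maxLogDirBytes > 0 && dirSizeBytes > maxLogDirBytes;
  }
}
```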
[jira] [Commented] (YARN-11275) [Federation] Add batchFinishApplicationMaster in UAMPoolManager
[ https://issues.apache.org/jira/browse/YARN-11275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584547#comment-17584547 ] ASF GitHub Bot commented on YARN-11275: --- slfan1989 commented on code in PR #4792: URL: https://github.com/apache/hadoop/pull/4792#discussion_r954433018 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java: ## @@ -450,4 +452,52 @@ public void drainUAMHeartbeats() { uam.drainHeartbeatThread(); } } + + /** + * Complete FinishApplicationMaster interface calls in batches. + * + * @param request FinishApplicationMasterRequest + * @param appId application Id + * @return Returns the Map<String, FinishApplicationMasterResponse>, + * the key is subClusterId, the value is FinishApplicationMasterResponse + */ + public Map<String, FinishApplicationMasterResponse> batchFinishApplicationMaster( + FinishApplicationMasterRequest request, String appId) { + +Map<String, FinishApplicationMasterResponse> responseMap = new HashMap<>(); +Set<String> subClusterIds = this.unmanagedAppMasterMap.keySet(); + +if (subClusterIds != null && !subClusterIds.isEmpty()) { + ExecutorCompletionService<Map<String, FinishApplicationMasterResponse>> finishAppService = + new ExecutorCompletionService<>(this.threadpool); + LOG.info("Sending finish application request to {} sub-cluster RMs", subClusterIds.size()); + + for (final String subClusterId : subClusterIds) { +finishAppService.submit(() -> { + LOG.info("Sending finish application request to RM {}", subClusterId); + FinishApplicationMasterResponse uamResponse = null; + try { +uamResponse = finishApplicationMaster(subClusterId, request); Review Comment: Thanks for your suggestion, the code looks very good, I will modify it. 
## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java: ## @@ -969,4 +969,58 @@ private PreemptionMessage createDummyPreemptionMessage( preemptionMessage.setContract(contract); return preemptionMessage; } + + @Test + public void testBatchFinishApplicationMaster() throws IOException, InterruptedException { + +final RegisterApplicationMasterRequest registerReq = +Records.newRecord(RegisterApplicationMasterRequest.class); +registerReq.setHost(Integer.toString(testAppId)); +registerReq.setRpcPort(testAppId); +registerReq.setTrackingUrl(""); + +UserGroupInformation ugi = interceptor.getUGIWithToken(interceptor.getAttemptId()); + +ugi.doAs((PrivilegedExceptionAction<Object>) () -> { + + // Register the application + RegisterApplicationMasterRequest registerReq1 = + Records.newRecord(RegisterApplicationMasterRequest.class); + registerReq1.setHost(Integer.toString(testAppId)); + registerReq1.setRpcPort(0); + registerReq1.setTrackingUrl(""); + + // Register ApplicationMaster + RegisterApplicationMasterResponse registerResponse = + interceptor.registerApplicationMaster(registerReq1); + Assert.assertNotNull(registerResponse); + lastResponseId = 0; + + Assert.assertEquals(0, interceptor.getUnmanagedAMPoolSize()); + + // Allocate the first batch of containers, with sc1 and sc2 active + registerSubCluster(SubClusterId.newInstance("SC-1")); + registerSubCluster(SubClusterId.newInstance("SC-2")); + + int numberOfContainers = 3; + List<Container> containers = + getContainersAndAssert(numberOfContainers, numberOfContainers * 2); + Assert.assertEquals(2, interceptor.getUnmanagedAMPoolSize()); + Assert.assertEquals(numberOfContainers * 2, containers.size()); + + // Finish the application + FinishApplicationMasterRequest finishReq = + Records.newRecord(FinishApplicationMasterRequest.class); + finishReq.setDiagnostics(""); + finishReq.setTrackingUrl(""); + 
finishReq.setFinalApplicationStatus(FinalApplicationStatus.SUCCEEDED); + + FinishApplicationMasterResponse finshResponse = Review Comment: I will fix it. > [Federation] Add batchFinishApplicationMaster in UAMPoolManager > --- > > Key: YARN-11275 > URL: https://issues.apache.org/jira/browse/YARN-11275 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, nodemanager >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (YARN-11275) [Federation] Add batchFinishApplicationMaster in UAMPoolManager
[ https://issues.apache.org/jira/browse/YARN-11275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584521#comment-17584521 ] ASF GitHub Bot commented on YARN-11275: --- goiri commented on code in PR #4792: URL: https://github.com/apache/hadoop/pull/4792#discussion_r954407506 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestFederationInterceptor.java: ## @@ -969,4 +969,58 @@ private PreemptionMessage createDummyPreemptionMessage( preemptionMessage.setContract(contract); return preemptionMessage; } + + @Test + public void testBatchFinishApplicationMaster() throws IOException, InterruptedException { + +final RegisterApplicationMasterRequest registerReq = +Records.newRecord(RegisterApplicationMasterRequest.class); +registerReq.setHost(Integer.toString(testAppId)); +registerReq.setRpcPort(testAppId); +registerReq.setTrackingUrl(""); + +UserGroupInformation ugi = interceptor.getUGIWithToken(interceptor.getAttemptId()); + +ugi.doAs((PrivilegedExceptionAction<Object>) () -> { + + // Register the application + RegisterApplicationMasterRequest registerReq1 = + Records.newRecord(RegisterApplicationMasterRequest.class); + registerReq1.setHost(Integer.toString(testAppId)); + registerReq1.setRpcPort(0); + registerReq1.setTrackingUrl(""); + + // Register ApplicationMaster + RegisterApplicationMasterResponse registerResponse = + interceptor.registerApplicationMaster(registerReq1); + Assert.assertNotNull(registerResponse); + lastResponseId = 0; + + Assert.assertEquals(0, interceptor.getUnmanagedAMPoolSize()); + + // Allocate the first batch of containers, with sc1 and sc2 active + registerSubCluster(SubClusterId.newInstance("SC-1")); + registerSubCluster(SubClusterId.newInstance("SC-2")); + + int numberOfContainers = 3; + List<Container> containers = + getContainersAndAssert(numberOfContainers, numberOfContainers * 2); + Assert.assertEquals(2, 
interceptor.getUnmanagedAMPoolSize()); + Assert.assertEquals(numberOfContainers * 2, containers.size()); + + // Finish the application + FinishApplicationMasterRequest finishReq = + Records.newRecord(FinishApplicationMasterRequest.class); + finishReq.setDiagnostics(""); + finishReq.setTrackingUrl(""); + finishReq.setFinalApplicationStatus(FinalApplicationStatus.SUCCEEDED); + + FinishApplicationMasterResponse finshResponse = Review Comment: Single line? ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/uam/UnmanagedAMPoolManager.java: ## @@ -450,4 +452,52 @@ public void drainUAMHeartbeats() { uam.drainHeartbeatThread(); } } + + /** + * Complete FinishApplicationMaster interface calls in batches. + * + * @param request FinishApplicationMasterRequest + * @param appId application Id + * @return Returns the Map<String, FinishApplicationMasterResponse>, + * the key is subClusterId, the value is FinishApplicationMasterResponse + */ + public Map<String, FinishApplicationMasterResponse> batchFinishApplicationMaster( + FinishApplicationMasterRequest request, String appId) { + +Map<String, FinishApplicationMasterResponse> responseMap = new HashMap<>(); +Set<String> subClusterIds = this.unmanagedAppMasterMap.keySet(); + +if (subClusterIds != null && !subClusterIds.isEmpty()) { + ExecutorCompletionService<Map<String, FinishApplicationMasterResponse>> finishAppService = + new ExecutorCompletionService<>(this.threadpool); + LOG.info("Sending finish application request to {} sub-cluster RMs", subClusterIds.size()); + + for (final String subClusterId : subClusterIds) { +finishAppService.submit(() -> { + LOG.info("Sending finish application request to RM {}", subClusterId); + FinishApplicationMasterResponse uamResponse = null; + try { +uamResponse = finishApplicationMaster(subClusterId, request); Review Comment: ``` try { FinishApplicationMasterResponse uamResponse = finishApplicationMaster(subClusterId, request); return Collections.singletonMap(subClusterId, uamResponse); } catch (Throwable e) { LOG.warn("Failed to finish unmanaged application master: RM address: {} 
ApplicationId: {}", subClusterId, appId, e); return Collections.singletonMap(subClusterId, null); } > [Federation] Add batchFinishApplicationMaster in UAMPoolManager > --- > > Key: YARN-11275 > URL: https://issues.apache.org/jira/browse/YARN-11275 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, nodemanager >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >
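The reviewer's suggestion above, one task per sub-cluster that returns Collections.singletonMap so the caller can merge results as they complete, can be sketched end to end with ExecutorCompletionService. In this sketch a UnaryOperator and plain strings stand in for finishApplicationMaster and FinishApplicationMasterResponse; the name batchFinish is illustrative, not the actual UnmanagedAMPoolManager API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.UnaryOperator;

public class BatchFinish {

  // Submit one finish call per sub-cluster and merge the per-task singleton
  // maps as they complete; a failed call is recorded as a null response so
  // the sub-cluster still appears in the result map.
  public static Map<String, String> batchFinish(Set<String> subClusterIds,
      UnaryOperator<String> finishOne) {
    Map<String, String> responseMap = new HashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subClusterIds.size()));
    try {
      ExecutorCompletionService<Map<String, String>> service =
          new ExecutorCompletionService<>(pool);
      for (String subClusterId : subClusterIds) {
        service.submit(() -> {
          try {
            return Collections.singletonMap(subClusterId, finishOne.apply(subClusterId));
          } catch (RuntimeException e) {
            return Collections.singletonMap(subClusterId, (String) null);
          }
        });
      }
      for (int i = 0; i < subClusterIds.size(); i++) {
        responseMap.putAll(service.take().get()); // blocks until the next task completes
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("batch finish interrupted", e);
    } finally {
      pool.shutdown();
    }
    return responseMap;
  }
}
```

Returning a singleton map from each task keeps the merge trivial and thread-safe, since only the submitting thread ever touches responseMap.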
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584517#comment-17584517 ] ASF GitHub Bot commented on YARN-11177: --- goiri commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954396015 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -925,13 +1041,61 @@ public ReservationListResponse listReservations( @Override public ReservationUpdateResponse updateReservation( ReservationUpdateRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null) { Review Comment: Indentation ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -888,13 +890,127 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues( @Override public GetNewReservationResponse getNewReservation( GetNewReservationRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null) { + routerMetrics.incrGetNewReservationFailedRetrieved(); + String errMsg = "Missing getNewReservation request."; + RouterServerUtil.logAndThrowException(errMsg, null); +} + +long startTime = clock.getTime(); +Map<SubClusterId, SubClusterInfo> subClustersActive = +federationFacade.getSubClusters(true); + +for (int i = 0; i < numSubmitRetries; ++i) { + SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive); + LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId); + ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId); + GetNewReservationResponse response = null; 
+ try { +response = clientRMProxy.getNewReservation(request); +if (response != null) { + long stopTime = clock.getTime(); + routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime); + return response; +} + } catch (Exception e) { +LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e); +subClustersActive.remove(subClusterId); + } +} + +routerMetrics.incrGetNewReservationFailedRetrieved(); +String errMsg = "Failed to create a new reservation."; +throw new YarnException(errMsg); } @Override public ReservationSubmissionResponse submitReservation( ReservationSubmissionRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null || request.getQueue() == null) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing submitReservation request or reservationId " + + "or reservation definition or queue.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); + +long retryCount = 0; +boolean firstRetry = true; + +while (retryCount < numSubmitRetries) { + + SubClusterId subClusterId = policyFacade.getReservationHomeSubCluster(request); + LOG.info("submitReservation reservationId {} try #{} on SubCluster {}.", + reservationId, retryCount, subClusterId); + + ReservationHomeSubCluster reservationHomeSubCluster = + ReservationHomeSubCluster.newInstance(reservationId, subClusterId); + + // If it is the first attempt,use StateStore to add the + // mapping of reservationId and subClusterId. + // if the number of attempts is greater than 1, use StateStore to update the mapping. 
+ if (firstRetry) { +try { + // persist the mapping of reservationId and the subClusterId which has + // been selected as its home + subClusterId = federationFacade.addReservationHomeSubCluster(reservationHomeSubCluster); + firstRetry = false; +} catch (YarnException e) { + routerMetrics.incrSubmitReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException(e, + "Unable to insert the ReservationId %s into the FederationStateStore.", + reservationId); +} + } else { +try { + // update the mapping of reservationId and the home subClusterId to + // the new subClusterId we have selected +
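The submitReservation diff above persists the reservationId to home-sub-cluster mapping on the first attempt and updates the existing mapping on each later retry. A small sketch of that add-then-update bookkeeping; a plain Map stands in for the FederationStateStore, and ReservationHomeStore and recordHome are illustrative names, not the actual facade API.

```java
import java.util.HashMap;
import java.util.Map;

public class ReservationHomeStore {

  // Stand-in for the FederationStateStore's reservation -> home mapping.
  private final Map<String, String> homeSubCluster = new HashMap<>();

  public void recordHome(String reservationId, String subClusterId, boolean firstRetry) {
    if (firstRetry) {
      // First attempt: persist the freshly chosen home sub-cluster.
      String prev = homeSubCluster.putIfAbsent(reservationId, subClusterId);
      if (prev != null) {
        throw new IllegalStateException("Mapping already exists for " + reservationId);
      }
    } else {
      // Retry: point the existing mapping at the newly selected sub-cluster.
      homeSubCluster.put(reservationId, subClusterId);
    }
  }

  public String getHome(String reservationId) {
    return homeSubCluster.get(reservationId);
  }
}
```

Separating insert from update matters because a retry after a failed submission must not be treated as a second reservation; it only re-homes the one already recorded.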
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1758#comment-1758 ] ASF GitHub Bot commented on YARN-11277: --- hadoop-yetus commented on PR #4797: URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1226185659 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 38s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 45s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 26m 59s | | trunk passed | | +1 :green_heart: | compile | 10m 34s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 9m 13s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 2m 3s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 53s | | trunk passed | | +1 :green_heart: | javadoc | 3m 37s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 29s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 43s | | trunk passed | | -1 :x: | shadedclient | 3m 55s | | branch has errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 30s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 11s | | the patch passed | | +1 :green_heart: | compile | 9m 6s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 9m 5s | | the patch passed | | +1 :green_heart: | compile | 8m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 8m 30s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 57s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/2/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 187 unchanged - 34 fixed = 189 total (was 221) | | +1 :green_heart: | mvnsite | 3m 16s | | the patch passed | | +1 :green_heart: | javadoc | 3m 12s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 15s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 31s | | the patch passed | | -1 :x: | shadedclient | 3m 35s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 20s | | hadoop-yarn-api in the patch passed. | | +1 :green_heart: | unit | 4m 54s | | hadoop-yarn-common in the patch passed. | | +1 :green_heart: | unit | 25m 2s | | hadoop-yarn-server-nodemanager in the patch passed. | | +1 :green_heart: | asflicense | 1m 4s | | The patch does not generate ASF License warnings. 
| | | | 163m 22s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4797 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 6252cc763226 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 9d16fc2482b0d057d5ba635c67c9c1c081864535 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/ja
[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken
[ https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584421#comment-17584421 ] ASF GitHub Bot commented on YARN-9708: -- hadoop-yetus commented on PR #4746: URL: https://github.com/apache/hadoop/pull/4746#issuecomment-1226102617 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 1s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +0 :ok: | buf | 0m 1s | | buf was not available. | | +0 :ok: | buf | 0m 1s | | buf was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 5s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 28m 42s | | trunk passed | | +1 :green_heart: | compile | 10m 42s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 9m 10s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 2m 6s | | trunk passed | | +1 :green_heart: | mvnsite | 4m 30s | | trunk passed | | +1 :green_heart: | javadoc | 4m 10s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 53s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 7m 23s | | trunk passed | | -1 :x: | shadedclient | 2m 50s | | branch has errors when building and testing our client artifacts. 
| | -0 :warning: | patch | 3m 15s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 39s | | the patch passed | | +1 :green_heart: | compile | 9m 56s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | cc | 9m 56s | | the patch passed | | -1 :x: | javac | 9m 56s | [/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4746/9/artifact/out/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 generated 1 new + 740 unchanged - 0 fixed = 741 total (was 740) | | +1 :green_heart: | compile | 9m 1s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | cc | 9m 1s | | the patch passed | | -1 :x: | javac | 9m 1s | [/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4746/9/artifact/out/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt) | hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 3 new + 649 unchanged - 2 fixed = 652 total (was 651) | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. 
| | -0 :warning: | checkstyle | 1m 50s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4746/9/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | hadoop-yarn-project/hadoop-yarn: The patch generated 3 new + 26 unchanged - 2 fixed = 29 total (was 28) | | +1 :green_heart: | mvnsite | 4m 3s | | the patch passed | | +1 :green_heart: | javadoc | 3m 42s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 36s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584391#comment-17584391 ] ASF GitHub Bot commented on YARN-11276: --- hadoop-yetus commented on PR #4793: URL: https://github.com/apache/hadoop/pull/4793#issuecomment-1226033589 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 48s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for branch | | -1 :x: | mvninstall | 0m 30s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. | | -1 :x: | compile | 0m 29s | [/branch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-yarn in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. 
| | -1 :x: | compile | 0m 29s | [/branch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-compile-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt) | hadoop-yarn in trunk failed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07. | | -0 :warning: | checkstyle | 0m 30s | [/buildtool-branch-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/buildtool-branch-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | The patch fails to run checkstyle in hadoop-yarn | | -1 :x: | mvnsite | 0m 29s | [/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt) | hadoop-yarn-api in trunk failed. | | -1 :x: | mvnsite | 0m 29s | [/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt) | hadoop-yarn-common in trunk failed. | | -1 :x: | mvnsite | 0m 29s | [/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) | hadoop-yarn-server-resourcemanager in trunk failed. 
| | -1 :x: | javadoc | 0m 29s | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-yarn-api in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. | | -1 :x: | javadoc | 0m 28s | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-yarn-common in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. | | -1 :x: | javadoc | 0m 29s | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/9/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hado
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584379#comment-17584379 ] ASF GitHub Bot commented on YARN-11177: --- hadoop-yetus commented on PR #4764: URL: https://github.com/apache/hadoop/pull/4764#issuecomment-1226005145 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 13s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 2s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 2s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 2s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 6 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 18s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 28m 34s | | trunk passed | | +1 :green_heart: | compile | 5m 15s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 4m 14s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 31s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 2s | | trunk passed | | +1 :green_heart: | javadoc | 2m 53s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 2m 25s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 5m 3s | | trunk passed | | -1 :x: | shadedclient | 2m 7s | | branch has errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 3s | | the patch passed | | +1 :green_heart: | compile | 4m 26s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 4m 26s | | the patch passed | | +1 :green_heart: | compile | 3m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 3m 19s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 14s | | the patch passed | | +1 :green_heart: | mvnsite | 2m 14s | | the patch passed | | +1 :green_heart: | javadoc | 1m 49s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 38s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 4m 37s | | the patch passed | | -1 :x: | shadedclient | 1m 46s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 59s | | hadoop-yarn-server-common in the patch passed. | | +1 :green_heart: | unit | 103m 21s | | hadoop-yarn-server-resourcemanager in the patch passed. | | +1 :green_heart: | unit | 3m 32s | | hadoop-yarn-server-router in the patch passed. | | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. 
| | | | 208m 22s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4764/9/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4764 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 5b5897d42341 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 2c702f84ff006dfa5ce4765d561e02ca4bd488ad | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4764/9/testReport/
[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken
[ https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584355#comment-17584355 ] ASF GitHub Bot commented on YARN-9708: -- goiri commented on code in PR #4746: URL: https://github.com/apache/hadoop/pull/4746#discussion_r954007246 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/impl/pb/package-info.java: ## @@ -0,0 +1,20 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +@InterfaceAudience.Public +package org.apache.hadoop.yarn.security.client.impl.pb; +import org.apache.hadoop.classification.InterfaceAudience; Review Comment: Import Public ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/store/impl/MemoryFederationStateStore.java: ## @@ -395,4 +535,17 @@ public DeleteReservationHomeSubClusterResponse deleteReservationHomeSubCluster( reservations.remove(reservationId); return DeleteReservationHomeSubClusterResponse.newInstance(); } -} + + /** + * Get DelegationKey By based on MasterKey. 
+ * + * @param masterKey masterKey + * @return DelegationKey + */ + private DelegationKey getDelegationKeyByMasterKey(RouterMasterKey masterKey) { Review Comment: static? > Yarn Router Support DelegationToken > --- > > Key: YARN-9708 > URL: https://issues.apache.org/jira/browse/YARN-9708 > Project: Hadoop YARN > Issue Type: New Feature > Components: router >Affects Versions: 3.1.1 >Reporter: Xie YiFan >Assignee: fanshilun >Priority: Minor > Labels: pull-request-available > Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch, > RMDelegationTokenSecretManager_storeNewMasterKey.svg, > RouterDelegationTokenSecretManager_storeNewMasterKey.svg > > > 1.we use router as proxy to manage multiple cluster which be independent of > each other in order to apply unified client. Thus, we implement our > customized AMRMProxyPolicy that doesn't broadcast ResourceRequest to other > cluster. > 2.Our production environment need kerberos. But router doesn't support > SecureLogin for now. > https://issues.apache.org/jira/browse/YARN-6539 desn't work. So we > improvement it. > 3.Some framework like oozie would get Token via yarnclient#getDelegationToken > which router doesn't support. Our solution is that adding homeCluster to > ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. Job would > be submitted with specified clusterid so that router knows which cluster to > submit this job. Router would get Token from one RM according to specified > clusterid when client call getDelegation meanwhile apply some mechanism to > save this token in memory. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
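goiri's "static?" note on `getDelegationKeyByMasterKey` flags a routine cleanup: a private helper that builds its result purely from its arguments, never touching instance state, can be declared static so the signature documents that fact. A minimal self-contained sketch of the idea (the nested records below are simplified stand-ins, not the real Hadoop `RouterMasterKey`/`DelegationKey` classes, and the byte-copy logic is illustrative):

```java
import java.nio.ByteBuffer;

public class DelegationKeyDemo {

    // Simplified stand-ins for the real Hadoop types; illustrative only.
    record RouterMasterKey(int keyId, long expiryDate, ByteBuffer keyBytes) {}

    record DelegationKey(int keyId, long expiryDate, byte[] keyBytes) {}

    // The helper reads only its parameter and no instance state, so it can be
    // declared static, which is what the review comment suggests.
    static DelegationKey getDelegationKeyByMasterKey(RouterMasterKey masterKey) {
        ByteBuffer keyByteBuf = masterKey.keyBytes();
        byte[] keyBytes = new byte[keyByteBuf.remaining()];
        // duplicate() so the source buffer's read position is left untouched
        keyByteBuf.duplicate().get(keyBytes);
        return new DelegationKey(masterKey.keyId(), masterKey.expiryDate(), keyBytes);
    }

    public static void main(String[] args) {
        RouterMasterKey mk =
            new RouterMasterKey(7, 1000L, ByteBuffer.wrap(new byte[]{1, 2, 3}));
        DelegationKey dk = getDelegationKeyByMasterKey(mk);
        System.out.println(dk.keyId() + " " + dk.keyBytes().length); // prints "7 3"
    }
}
```

Beyond style, the `static` keyword prevents the method from accidentally growing a dependency on mutable store state later.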
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584345#comment-17584345 ] ASF GitHub Bot commented on YARN-11177: --- goiri commented on code in PR #4764: URL: https://github.com/apache/hadoop/pull/4764#discussion_r954003678 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -925,13 +1041,61 @@ public ReservationListResponse listReservations( @Override public ReservationUpdateResponse updateReservation( ReservationUpdateRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null) { + routerMetrics.incrUpdateReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing updateReservation request or reservationId or reservation definition.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); +SubClusterId subClusterId = getReservationHomeSubCluster(reservationId); + +ApplicationClientProtocol client; Review Comment: Declare in the try ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -888,13 +890,127 @@ public MoveApplicationAcrossQueuesResponse moveApplicationAcrossQueues( @Override public GetNewReservationResponse getNewReservation( GetNewReservationRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null) { + routerMetrics.incrGetNewReservationFailedRetrieved(); + String errMsg = "Missing getNewReservation request."; + RouterServerUtil.logAndThrowException(errMsg, null); +} + +long startTime = 
clock.getTime(); +Map subClustersActive = +federationFacade.getSubClusters(true); + +for (int i = 0; i < numSubmitRetries; ++i) { + SubClusterId subClusterId = getRandomActiveSubCluster(subClustersActive); + LOG.info("getNewReservation try #{} on SubCluster {}.", i, subClusterId); + ApplicationClientProtocol clientRMProxy = getClientRMProxyForSubCluster(subClusterId); + GetNewReservationResponse response = null; + try { +response = clientRMProxy.getNewReservation(request); +if (response != null) { + long stopTime = clock.getTime(); + routerMetrics.succeededGetNewReservationRetrieved(stopTime - startTime); + return response; +} + } catch (Exception e) { +LOG.warn("Unable to create a new Reservation in SubCluster {}.", subClusterId.getId(), e); +subClustersActive.remove(subClusterId); + } +} + +routerMetrics.incrGetNewReservationFailedRetrieved(); +String errMsg = "Failed to create a new reservation."; +throw new YarnException(errMsg); } @Override public ReservationSubmissionResponse submitReservation( ReservationSubmissionRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null || request.getQueue() == null) { Review Comment: indentation ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-router/src/main/java/org/apache/hadoop/yarn/server/router/clientrm/FederationClientInterceptor.java: ## @@ -925,13 +1041,61 @@ public ReservationListResponse listReservations( @Override public ReservationUpdateResponse updateReservation( ReservationUpdateRequest request) throws YarnException, IOException { -throw new NotImplementedException("Code is not implemented"); + +if (request == null || request.getReservationId() == null +|| request.getReservationDefinition() == null) { + routerMetrics.incrUpdateReservationFailedRetrieved(); + RouterServerUtil.logAndThrowException( + "Missing 
updateReservation request or reservationId or reservation definition.", null); +} + +long startTime = clock.getTime(); +ReservationId reservationId = request.getReservationId(); +SubClusterId subClusterId = getReservationHomeSubCluster(reservationId); + +ApplicationClientProtocol client; +ReservationUpdateResponse response = null; +try { + client = getClientRMProxyForSubCluster(subClusterId); + response = client.updateReservation(request); +} catch (Exception ex) { + routerMetrics.incrUpdateReservationFailedRetrieved
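The `getNewReservation` body quoted above follows a retry pattern the interceptor uses elsewhere: try up to `numSubmitRetries` randomly chosen active sub-clusters, and drop a sub-cluster from the candidate set when it fails so it is not picked again. A self-contained sketch of that pattern, where the generic `Function` stands in for the RM proxy call (all names here are illustrative, not the actual Hadoop APIs):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

public class SubClusterRetry {

    /**
     * Try up to maxRetries randomly chosen sub-clusters; a sub-cluster that
     * throws is removed from the candidate set so it is not retried.
     */
    static <R> R submitWithRetries(Map<String, Function<String, R>> activeSubClusters,
                                   int maxRetries) throws Exception {
        Map<String, Function<String, R>> candidates = new HashMap<>(activeSubClusters);
        Random rand = new Random();
        for (int i = 0; i < maxRetries && !candidates.isEmpty(); i++) {
            List<String> ids = new ArrayList<>(candidates.keySet());
            String subClusterId = ids.get(rand.nextInt(ids.size()));
            try {
                R response = candidates.get(subClusterId).apply(subClusterId);
                if (response != null) {
                    return response;
                }
            } catch (Exception e) {
                candidates.remove(subClusterId); // do not retry a failing sub-cluster
            }
        }
        throw new Exception("Failed to create a new reservation.");
    }

    public static void main(String[] args) throws Exception {
        Map<String, Function<String, String>> clusters = new HashMap<>();
        clusters.put("sc1", id -> { throw new RuntimeException("down"); });
        clusters.put("sc2", id -> "reservation-from-" + id);
        // sc1 fails and is dropped, so sc2 is reached within 3 tries
        System.out.println(submitWithRetries(clusters, 3)); // prints "reservation-from-sc2"
    }
}
```

Removing a failed sub-cluster before the next draw is the key detail: without it, the random pick can waste all retries on the same unreachable RM.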
[jira] [Commented] (YARN-11253) Add Configuration to delegationToken RemoverScanInterval
[ https://issues.apache.org/jira/browse/YARN-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584341#comment-17584341 ] ASF GitHub Bot commented on YARN-11253: --- goiri commented on code in PR #4751: URL: https://github.com/apache/hadoop/pull/4751#discussion_r953999366 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMSecretManagerService.java: ## @@ -135,9 +135,11 @@ protected RMDelegationTokenSecretManager createRMDelegationTokenSecretManager( long tokenRenewInterval = conf.getLong(YarnConfiguration.RM_DELEGATION_TOKEN_RENEW_INTERVAL_KEY, YarnConfiguration.RM_DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT); - +long removeScanInterval = + conf.getLong(YarnConfiguration.RM_DELEGATION_TOKEN_REMOVE_SCAN_INTERVAL_KEY, Review Comment: We should do getTimeDuration ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml: ## @@ -1077,6 +1077,14 @@ 8640 + + + RM delegation token remove-scan interval in ms + +yarn.resourcemanager.delegation.token.remove-scan-interval +360 Review Comment: 10h and use time duration > Add Configuration to delegationToken RemoverScanInterval > > > Key: YARN-11253 > URL: https://issues.apache.org/jira/browse/YARN-11253 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.4.0, 3.3.4 >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > > When reading the code, I found the case of hard coding, I think the > parameters should be abstracted into the configuration. > org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService# > createRMDelegationTokenSecretManager > {code:java} > protected RMDelegationTokenSecretManager > createRMDelegationTokenSecretManager(Configuration conf, RMContext rmContext) > { >// . 
360 This hard code should be extracted >return new RMDelegationTokenSecretManager(secretKeyInterval, > tokenMaxLifetime, tokenRenewInterval, 360, rmContext); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
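goiri's two comments point at Hadoop's `Configuration.getTimeDuration`, which accepts human-readable values such as "10h" instead of a raw millisecond count, so both the config default and the XML value can be written as durations. The sketch below is a simplified, self-contained illustration of the unit-suffix handling that method provides; it is not the real Hadoop implementation.

```java
import java.util.concurrent.TimeUnit;

public class TimeDurationDemo {

    /**
     * Simplified illustration of Configuration.getTimeDuration behavior:
     * a value such as "10h" or "3600000ms" is parsed with its unit suffix
     * and converted to the requested TimeUnit. (Hadoop's real implementation
     * also handles ns/us and logs a warning for suffix-less values.)
     */
    static long parseTimeDuration(String value, TimeUnit unit) {
        String v = value.trim();
        // A suffix-less value is taken to be in the caller's unit, as Hadoop does.
        TimeUnit vUnit = unit;
        // "ms" must be checked before "s", since "ms" also ends with 's'.
        if (v.endsWith("ms")) { v = v.substring(0, v.length() - 2); vUnit = TimeUnit.MILLISECONDS; }
        else if (v.endsWith("s")) { v = v.substring(0, v.length() - 1); vUnit = TimeUnit.SECONDS; }
        else if (v.endsWith("m")) { v = v.substring(0, v.length() - 1); vUnit = TimeUnit.MINUTES; }
        else if (v.endsWith("h")) { v = v.substring(0, v.length() - 1); vUnit = TimeUnit.HOURS; }
        else if (v.endsWith("d")) { v = v.substring(0, v.length() - 1); vUnit = TimeUnit.DAYS; }
        return unit.convert(Long.parseLong(v), vUnit);
    }

    public static void main(String[] args) {
        // The "10h" default suggested in the review, converted to milliseconds
        System.out.println(parseTimeDuration("10h", TimeUnit.MILLISECONDS)); // prints 36000000
    }
}
```

In the patch itself this would look roughly like `conf.getTimeDuration(RM_DELEGATION_TOKEN_REMOVE_SCAN_INTERVAL_KEY, "10h", TimeUnit.MILLISECONDS)`, where the key name follows the patch's naming rather than a confirmed constant.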
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584324#comment-17584324 ] ASF GitHub Bot commented on YARN-11276: --- hadoop-yetus commented on PR #4793: URL: https://github.com/apache/hadoop/pull/4793#issuecomment-1225904734 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 47s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 16m 2s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 27m 6s | | trunk passed | | +1 :green_heart: | compile | 10m 12s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 9m 0s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 59s | | trunk passed | | +1 :green_heart: | mvnsite | 4m 17s | | trunk passed | | +1 :green_heart: | javadoc | 4m 1s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 34s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 7m 15s | | trunk passed | | -1 :x: | shadedclient | 4m 18s | | branch has errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 27s | | the patch passed | | +1 :green_heart: | compile | 9m 49s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 9m 49s | | the patch passed | | +1 :green_heart: | compile | 8m 51s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 8m 51s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 58s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/8/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 174 unchanged - 0 fixed = 179 total (was 174) | | +1 :green_heart: | mvnsite | 3m 45s | | the patch passed | | +1 :green_heart: | javadoc | 3m 24s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 16s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 7m 44s | | the patch passed | | -1 :x: | shadedclient | 4m 0s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 24s | | hadoop-yarn-api in the patch passed. | | +1 :green_heart: | unit | 5m 5s | | hadoop-yarn-common in the patch passed. | | +1 :green_heart: | unit | 100m 56s | | hadoop-yarn-server-resourcemanager in the patch passed. | | +1 :green_heart: | asflicense | 1m 23s | | The patch does not generate ASF License warnings. 
| | | | 245m 25s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4793 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 8f1f431fd5a6 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / fc010ab080bde9b23887a0a3a757eda9afd657ff | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm
[jira] [Commented] (YARN-11247) Remove unused classes introduced by YARN-9615
[ https://issues.apache.org/jira/browse/YARN-11247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584316#comment-17584316 ] ASF GitHub Bot commented on YARN-11247: --- slfan1989 commented on PR #4720: URL: https://github.com/apache/hadoop/pull/4720#issuecomment-1225870451 @ayushtkn Can you help review this pr? I want to delete DisableEventTypeMetrics.java because this class is not used. Thank you very much! > Remove unused classes introduced by YARN-9615 > - > > Key: YARN-11247 > URL: https://issues.apache.org/jira/browse/YARN-11247 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >Priority: Minor > Labels: pull-request-available > Attachments: DisableEventTypeMetrics-Not used.png > > > YARN-9615 added metrics to RM's dispatcher, but the patch introduced a class > without any usage: > org.apache.hadoop.yarn.metrics#DisableEventTypeMetrics > 1. Without any code references > 2. Without any test code references > 3. With this class deleted, the code still compiles successfully locally > I think this class can be removed.
[jira] [Commented] (YARN-11240) Fix incorrect placeholder in yarn-module
[ https://issues.apache.org/jira/browse/YARN-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584287#comment-17584287 ] ASF GitHub Bot commented on YARN-11240: --- ayushtkn merged PR #4678: URL: https://github.com/apache/hadoop/pull/4678 > Fix incorrect placeholder in yarn-module > > > Key: YARN-11240 > URL: https://issues.apache.org/jira/browse/YARN-11240 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Try to deal with the module problems in one pass.
[jira] [Commented] (YARN-11240) Fix incorrect placeholder in yarn-module
[ https://issues.apache.org/jira/browse/YARN-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584288#comment-17584288 ] Ayush Saxena commented on YARN-11240: - Committed to trunk. Thanx [~slfan1989] for the contribution!!! > Fix incorrect placeholder in yarn-module > > > Key: YARN-11240 > URL: https://issues.apache.org/jira/browse/YARN-11240 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >Priority: Minor > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Try to deal with the module problems in one pass.
[jira] [Commented] (YARN-11219) [Federation] Add getAppActivities, getAppStatistics REST APIs for Router
[ https://issues.apache.org/jira/browse/YARN-11219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584279#comment-17584279 ] ASF GitHub Bot commented on YARN-11219: --- slfan1989 commented on PR #4757: URL: https://github.com/apache/hadoop/pull/4757#issuecomment-1225806212 @goiri Please help to review the code again, Thank you very much! > [Federation] Add getAppActivities, getAppStatistics REST APIs for Router > > > Key: YARN-11219 > URL: https://issues.apache.org/jira/browse/YARN-11219 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11177) Support getNewReservation, submitReservation, updateReservation, deleteReservation API's for Federation
[ https://issues.apache.org/jira/browse/YARN-11177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584278#comment-17584278 ] ASF GitHub Bot commented on YARN-11177: --- slfan1989 commented on PR #4764: URL: https://github.com/apache/hadoop/pull/4764#issuecomment-122580 @goiri Please help to review the code again, Thank you very much! > Support getNewReservation, submitReservation, updateReservation, > deleteReservation API's for Federation > --- > > Key: YARN-11177 > URL: https://issues.apache.org/jira/browse/YARN-11177 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken
[ https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584277#comment-17584277 ] ASF GitHub Bot commented on YARN-9708: -- slfan1989 commented on PR #4746: URL: https://github.com/apache/hadoop/pull/4746#issuecomment-1225805056 @goiri Please help to review the code again, Thank you very much! > Yarn Router Support DelegationToken > --- > > Key: YARN-9708 > URL: https://issues.apache.org/jira/browse/YARN-9708 > Project: Hadoop YARN > Issue Type: New Feature > Components: router >Affects Versions: 3.1.1 >Reporter: Xie YiFan >Assignee: fanshilun >Priority: Minor > Labels: pull-request-available > Attachments: Add_getDelegationToken_and_SecureLogin_in_router.patch, > RMDelegationTokenSecretManager_storeNewMasterKey.svg, > RouterDelegationTokenSecretManager_storeNewMasterKey.svg > > > 1. We use the router as a proxy to manage multiple clusters which are independent of > each other, in order to provide a unified client. Thus, we implement our > customized AMRMProxyPolicy that doesn't broadcast ResourceRequests to other > clusters. > 2. Our production environment needs Kerberos, but the router doesn't support > SecureLogin for now. > https://issues.apache.org/jira/browse/YARN-6539 doesn't work, so we > improved it. > 3. Some frameworks like Oozie would get a token via yarnclient#getDelegationToken, > which the router doesn't support. Our solution is to add homeCluster to > ApplicationSubmissionContextProto & GetDelegationTokenRequestProto. A job would > be submitted with a specified clusterid so that the router knows which cluster to > submit this job to. The router would get the token from one RM according to the specified > clusterid when the client calls getDelegation, meanwhile applying some mechanism to > save this token in memory. >
[jira] [Commented] (YARN-11253) Add Configuration to delegationToken RemoverScanInterval
[ https://issues.apache.org/jira/browse/YARN-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584265#comment-17584265 ] ASF GitHub Bot commented on YARN-11253: --- hadoop-yetus commented on PR #4751: URL: https://github.com/apache/hadoop/pull/4751#issuecomment-1225776565 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 39s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 22s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 25m 33s | | trunk passed | | +1 :green_heart: | compile | 9m 46s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 8m 36s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 58s | | trunk passed | | +1 :green_heart: | mvnsite | 4m 9s | | trunk passed | | +1 :green_heart: | javadoc | 3m 54s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 37s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 7m 7s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 42s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 33s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 18s | | the patch passed | | +1 :green_heart: | compile | 10m 29s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 10m 29s | | the patch passed | | +1 :green_heart: | compile | 11m 43s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 11m 43s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 2m 32s | | the patch passed | | +1 :green_heart: | mvnsite | 4m 7s | | the patch passed | | +1 :green_heart: | javadoc | 3m 38s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 47s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 9m 13s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 5s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 26s | | hadoop-yarn-api in the patch passed. | | +1 :green_heart: | unit | 5m 14s | | hadoop-yarn-common in the patch passed. | | +1 :green_heart: | unit | 98m 49s | | hadoop-yarn-server-resourcemanager in the patch passed. | | +1 :green_heart: | asflicense | 1m 8s | | The patch does not generate ASF License warnings. 
| | | | 282m 58s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4751/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4751 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 276b62cfb077 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d1a76acfd00f335894b079f277b46d692b19fea3 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4751/3/te
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584256#comment-17584256 ] ASF GitHub Bot commented on YARN-11276: --- hadoop-yetus commented on PR #4793: URL: https://github.com/apache/hadoop/pull/4793#issuecomment-1225758308 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 41s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 13s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 25m 30s | | trunk passed | | +1 :green_heart: | compile | 9m 49s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 8m 46s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 2m 14s | | trunk passed | | +1 :green_heart: | mvnsite | 4m 5s | | trunk passed | | +1 :green_heart: | javadoc | 3m 54s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 24s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 52s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 0s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 30s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 20s | | the patch passed | | +1 :green_heart: | compile | 9m 18s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 9m 18s | | the patch passed | | +1 :green_heart: | compile | 9m 25s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 9m 25s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 2m 11s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/7/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | hadoop-yarn-project/hadoop-yarn: The patch generated 5 new + 174 unchanged - 0 fixed = 179 total (was 174) | | +1 :green_heart: | mvnsite | 4m 31s | | the patch passed | | +1 :green_heart: | javadoc | 4m 9s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 4m 2s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 7m 18s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 49s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 26s | | hadoop-yarn-api in the patch passed. | | +1 :green_heart: | unit | 5m 21s | | hadoop-yarn-common in the patch passed. | | +1 :green_heart: | unit | 98m 37s | | hadoop-yarn-server-resourcemanager in the patch passed. | | +1 :green_heart: | asflicense | 1m 13s | | The patch does not generate ASF License warnings. 
| | | | 277m 58s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4793 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux cf5dc21d78c6 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / e91bc33f4aef663c62778b5ba4c99972b96cb0a7 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-
[jira] [Commented] (YARN-11275) [Federation] Add batchFinishApplicationMaster in UAMPoolManager
[ https://issues.apache.org/jira/browse/YARN-11275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584249#comment-17584249 ] ASF GitHub Bot commented on YARN-11275: --- slfan1989 commented on PR #4792: URL: https://github.com/apache/hadoop/pull/4792#issuecomment-1225727281 @goiri Please help to review the code again, Thank you very much! > [Federation] Add batchFinishApplicationMaster in UAMPoolManager > --- > > Key: YARN-11275 > URL: https://issues.apache.org/jira/browse/YARN-11275 > Project: Hadoop YARN > Issue Type: Improvement > Components: federation, nodemanager >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584225#comment-17584225 ] ASF GitHub Bot commented on YARN-11277: --- slfan1989 commented on code in PR #4797: URL: https://github.com/apache/hadoop/pull/4797#discussion_r953721092 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java: ## @@ -190,13 +196,24 @@ public void handle(LogHandlerEvent event) { } catch (IOException e) { LOG.error("Unable to record log deleter state", e); } -try { - sched.schedule(logDeleter, this.deleteDelaySeconds, - TimeUnit.SECONDS); -} catch (RejectedExecutionException e) { - // Handling this event in local thread before starting threads - // or after calling sched.shutdownNow(). +//delete no delay if log size exceed deleteThresholdMb +if (enableTriggerDeleteBySize && appLogSize >= deleteThresholdMb * BYTES_PER_MB) { + LOG.info("Log Deletion for application: " + + appId + ", with no delay, size=" + appLogSize); Review Comment: indentation? ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java: ## @@ -190,13 +196,24 @@ public void handle(LogHandlerEvent event) { } catch (IOException e) { LOG.error("Unable to record log deleter state", e); } -try { - sched.schedule(logDeleter, this.deleteDelaySeconds, - TimeUnit.SECONDS); -} catch (RejectedExecutionException e) { - // Handling this event in local thread before starting threads - // or after calling sched.shutdownNow(). 
+//delete no delay if log size exceed deleteThresholdMb +if (enableTriggerDeleteBySize && appLogSize >= deleteThresholdMb * BYTES_PER_MB) { + LOG.info("Log Deletion for application: " + + appId + ", with no delay, size=" + appLogSize); logDeleter.run(); +} else { + // Schedule - so that logs are available on the UI till they're deleted. + LOG.info("Scheduling Log Deletion for application: " + + appId + ", with delay of " Review Comment: indentation? > trigger deletion of log-dir by size for NonAggregatingLogHandler > > > Key: YARN-11277 > URL: https://issues.apache.org/jira/browse/YARN-11277 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our yarn cluster, the log files of some containers are too large, which > causes the NodeManager to frequently switch to the unhealthy state. For logs > that are too large, we can consider deleting them directly without delaying > yarn.nodemanager.log.retain-seconds. > Cluster environment: > # 8k nodes+ > # 50w+ apps / day > Configuration: > # yarn.nodemanager.log.retain-seconds=3days > # yarn.log-aggregation-enable=false > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
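The diff above decides between deleting an oversized application log immediately and scheduling the usual delayed deletion. A minimal, self-contained sketch of that decision logic — names such as `deleteThresholdMb`, `deleteDelaySeconds`, and `BYTES_PER_MB` mirror the patch, but the class itself is illustrative, not the Hadoop code:

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class LogDeletionPolicy {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** True when size-triggered deletion is enabled and the log exceeds the threshold. */
    public static boolean deleteNow(boolean triggerBySize, long appLogSizeBytes,
                                    long deleteThresholdMb) {
        return triggerBySize && appLogSizeBytes >= deleteThresholdMb * BYTES_PER_MB;
    }

    public static void handle(Runnable logDeleter, boolean triggerBySize,
                              long appLogSizeBytes, long deleteThresholdMb,
                              long deleteDelaySeconds, ScheduledExecutorService sched) {
        if (deleteNow(triggerBySize, appLogSizeBytes, deleteThresholdMb)) {
            // Oversized log: run the deleter inline to free disk right away.
            logDeleter.run();
        } else {
            // Normal path: keep logs visible on the UI until the retain delay expires.
            sched.schedule(logDeleter, deleteDelaySeconds, TimeUnit.SECONDS);
        }
    }
}
```

Note the multiplication is done in `long` arithmetic; with an `int` threshold and `int` `BYTES_PER_MB`, a large threshold could overflow.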
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584224#comment-17584224 ] ASF GitHub Bot commented on YARN-11277: --- slfan1989 commented on code in PR #4797: URL: https://github.com/apache/hadoop/pull/4797#discussion_r953720422 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java: ## @@ -71,7 +71,10 @@ public class NonAggregatingLogHandler extends AbstractService implements private final LocalDirsHandlerService dirsHandler; private final NMStateStoreService stateStore; private long deleteDelaySeconds; + private boolean enableTriggerDeleteBySize; + private long deleteThresholdMb; private ScheduledThreadPoolExecutor sched; + public static final int BYTES_PER_MB = 1024 * 1024; Review Comment: why static ? > trigger deletion of log-dir by size for NonAggregatingLogHandler > > > Key: YARN-11277 > URL: https://issues.apache.org/jira/browse/YARN-11277 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our yarn cluster, the log files of some containers are too large, which > causes the NodeManager to frequently switch to the unhealthy state. For logs > that are too large, we can consider deleting them directly without delaying > yarn.nodemanager.log.retain-seconds. > Cluster environment: > # 8k nodes+ > # 50w+ apps / day > Configuration: > # yarn.nodemanager.log.retain-seconds=3days > # yarn.log-aggregation-enable=false > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584205#comment-17584205 ] ASF GitHub Bot commented on YARN-11277: --- hadoop-yetus commented on PR #4797: URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1225605681 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 36s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 16s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 25m 15s | | trunk passed | | +1 :green_heart: | compile | 9m 45s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 8m 48s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 2m 2s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 31s | | trunk passed | | +1 :green_heart: | javadoc | 2m 39s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 2m 13s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 4m 29s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 47s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 33s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 26s | | the patch passed | | +1 :green_heart: | compile | 9m 7s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 9m 7s | | the patch passed | | +1 :green_heart: | compile | 8m 27s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 8m 27s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 58s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/1/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | hadoop-yarn-project/hadoop-yarn: The patch generated 15 new + 215 unchanged - 6 fixed = 230 total (was 221) | | +1 :green_heart: | mvnsite | 2m 24s | | the patch passed | | +1 :green_heart: | javadoc | 1m 55s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 53s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 4m 24s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 56s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 1m 18s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/1/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api.txt) | hadoop-yarn-api in the patch passed. | | +1 :green_heart: | unit | 25m 44s | | hadoop-yarn-server-nodemanager in the patch passed. | | +1 :green_heart: | asflicense | 1m 22s | | The patch does not generate ASF License warnings. 
| | | | 181m 24s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4797/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4797 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c90681ec29d2 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584201#comment-17584201 ] ASF GitHub Bot commented on YARN-11277: --- leixm commented on PR #4797: URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1225598272 If we set yarn.nodemanager.log.retain-seconds to a small value, normal logs will be deleted too quickly, and the problem still is not solved, because it is caused by a few huge log files. After this PR is merged, we can not only keep normal logs for a long time but also eliminate the impact of these huge logs on the NodeManager. @ashutoshcipher > trigger deletion of log-dir by size for NonAggregatingLogHandler > > > Key: YARN-11277 > URL: https://issues.apache.org/jira/browse/YARN-11277 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our yarn cluster, the log files of some containers are too large, which > causes the NodeManager to frequently switch to the unhealthy state. For logs > that are too large, we can consider deleting them directly without delaying > yarn.nodemanager.log.retain-seconds. > Cluster environment: > # 8k nodes+ > # 50w+ apps / day > Configuration: > # yarn.nodemanager.log.retain-seconds=3days > # yarn.log-aggregation-enable=false > >
[jira] [Commented] (YARN-11196) NUMA Awareness support in DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584197#comment-17584197 ] ASF GitHub Bot commented on YARN-11196: --- hadoop-yetus commented on PR #4742: URL: https://github.com/apache/hadoop/pull/4742#issuecomment-1225589943 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 38m 20s | | trunk passed | | +1 :green_heart: | compile | 1m 40s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 33s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 0m 47s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 55s | | trunk passed | | +1 :green_heart: | javadoc | 1m 5s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 0m 52s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 1m 48s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 14s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 47s | | the patch passed | | +1 :green_heart: | compile | 1m 26s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 26s | | the patch passed | | +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 19s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 34s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4742/18/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt) | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 2 new + 42 unchanged - 0 fixed = 44 total (was 42) | | +1 :green_heart: | mvnsite | 0m 43s | | the patch passed | | +1 :green_heart: | javadoc | 0m 36s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 0m 35s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 1m 32s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 40s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 24m 11s | | hadoop-yarn-server-nodemanager in the patch passed. | | +1 :green_heart: | asflicense | 0m 56s | | The patch does not generate ASF License warnings. 
| | | | 122m 47s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4742/18/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4742 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 6e0912771f5e 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 6172d5fedad395f2c2465e9c073d7082c7706720 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4742
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584194#comment-17584194 ] ASF GitHub Bot commented on YARN-11276: --- leixm commented on PR #4793: URL: https://github.com/apache/hadoop/pull/4793#issuecomment-1225587335 All fixed, thanks for your review, @slfan1989 > Add lru cache for RMWebServices.getApps > --- > > Key: YARN-11276 > URL: https://issues.apache.org/jira/browse/YARN-11276 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our YARN cluster, there are thousands of apps running at the same time, > the return result of getApps reaches about 10M, and many requests are the > same input parameters, we can add cache for RMWebServices.getApps to reduce > processing delay
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584182#comment-17584182 ] ASF GitHub Bot commented on YARN-11276: --- slfan1989 commented on PR #4793: URL: https://github.com/apache/hadoop/pull/4793#issuecomment-1225566277 @ayushtkn Can you help review this PR? I feel that LRU Cache can help improve the access performance of getApps. > Add lru cache for RMWebServices.getApps > --- > > Key: YARN-11276 > URL: https://issues.apache.org/jira/browse/YARN-11276 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our YARN cluster, there are thousands of apps running at the same time, > the return result of getApps reaches about 10M, and many requests are the > same input parameters, we can add cache for RMWebServices.getApps to reduce > processing delay
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584180#comment-17584180 ] ASF GitHub Bot commented on YARN-11276: --- slfan1989 commented on code in PR #4793: URL: https://github.com/apache/hadoop/pull/4793#discussion_r953651456 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/LRUCache.java: ## @@ -0,0 +1,64 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.yarn.util; + +import org.apache.hadoop.classification.VisibleForTesting; + +import java.util.Map; + +public class LRUCache<K, V> { + + private final long expireTimeMs; + private final Map<K, CacheNode<V>> cache; + + public LRUCache(int capacity) { +this(capacity, -1); + } + + public LRUCache(int capacity, long expireTimeMs) { +cache = new LRUCacheHashMap<>(capacity, true); +this.expireTimeMs = expireTimeMs; + } + + public synchronized V get(K key) { +CacheNode<V> cacheNode = cache.get(key); +if (cacheNode != null) { + if (expireTimeMs > 0 && + System.currentTimeMillis() > cacheNode.getCacheTime() + expireTimeMs) { Review Comment: indentation ? 
> Add lru cache for RMWebServices.getApps > --- > > Key: YARN-11276 > URL: https://issues.apache.org/jira/browse/YARN-11276 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our YARN cluster, there are thousands of apps running at the same time, > the return result of getApps reaches about 10M, and many requests are the > same input parameters, we can add cache for RMWebServices.getApps to reduce > processing delay
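The `LRUCache` under review combines LRU eviction with per-entry time-based expiry. A minimal standalone sketch of the same technique, using `java.util.LinkedHashMap` in access order — the class name and `Node` holder are illustrative, not the Hadoop implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleLruCache<K, V> {
    // Holds the cached value plus the time it was inserted, for expiry checks.
    private static final class Node<V> {
        final V value;
        final long cacheTimeMs;
        Node(V value, long cacheTimeMs) { this.value = value; this.cacheTimeMs = cacheTimeMs; }
    }

    private final long expireTimeMs;   // <= 0 means entries never expire
    private final Map<K, Node<V>> cache;

    public SimpleLruCache(final int capacity, long expireTimeMs) {
        this.expireTimeMs = expireTimeMs;
        // accessOrder=true makes get() move an entry to the tail, so the head
        // is always least-recently-used; removeEldestEntry evicts the head
        // once the map grows past capacity.
        this.cache = new LinkedHashMap<K, Node<V>>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Node<V>> eldest) {
                return size() > capacity;
            }
        };
    }

    public synchronized V get(K key) {
        Node<V> node = cache.get(key);
        if (node == null) {
            return null;
        }
        if (expireTimeMs > 0
            && System.currentTimeMillis() > node.cacheTimeMs + expireTimeMs) {
            cache.remove(key);   // stale entry: drop it and report a miss
            return null;
        }
        return node.value;
    }

    public synchronized void put(K key, V value) {
        cache.put(key, new Node<>(value, System.currentTimeMillis()));
    }
}
```

For a read-heavy endpoint such as `getApps`, note that even `get()` mutates the access-order list, so the coarse `synchronized` here (as in the patch) serializes all readers; that is the usual trade-off of this simple design.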
[jira] [Resolved] (YARN-10993) Move domain specific logic out of CapacitySchedulerConfig
[ https://issues.apache.org/jira/browse/YARN-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] András Győri resolved YARN-10993. - Resolution: Won't Fix > Move domain specific logic out of CapacitySchedulerConfig > - > > Key: YARN-10993 > URL: https://issues.apache.org/jira/browse/YARN-10993 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Andras Gyori >Priority: Major > > CapacitySchedulerConfig should contain only getters/setters and parsing > logic. Everything else should be moved outside of the class to its > appropriate location.
[jira] [Created] (YARN-11278) Ambiguous error message in mutation API
András Győri created YARN-11278: --- Summary: Ambiguous error message in mutation API Key: YARN-11278 URL: https://issues.apache.org/jira/browse/YARN-11278 Project: Hadoop YARN Issue Type: Improvement Components: capacity scheduler Reporter: András Győri In RMWebServices#updateSchedulerConfiguration, we are checking two prerequisites: {code:java} if (scheduler instanceof MutableConfScheduler && ((MutableConfScheduler) scheduler).isConfigurationMutable()) { {code} However, the error message is misleading in the second case (namely if the configuration is not mutable eg. a FILE_CONFIGURATION_STORE) {code:java} } else { return Response.status(Status.BAD_REQUEST) .entity("Configuration change only supported by " + "MutableConfScheduler.") .build(); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
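The ticket's complaint is that one error message covers two distinct failures: the scheduler not being a `MutableConfScheduler` at all, and the scheduler's configuration store being read-only. A hedged sketch of testing the prerequisites separately so each failure gets its own message — interfaces are stubbed and the message strings are illustrative, not the eventual Hadoop wording:

```java
public class MutationCheck {
    // Stubbed stand-ins for the YARN scheduler interfaces.
    interface Scheduler { }
    interface MutableConfScheduler extends Scheduler {
        boolean isConfigurationMutable();
    }

    /** Returns an error message, or null when configuration mutation is allowed. */
    public static String validate(Scheduler scheduler) {
        if (!(scheduler instanceof MutableConfScheduler)) {
            return "Configuration change only supported by MutableConfScheduler.";
        }
        if (!((MutableConfScheduler) scheduler).isConfigurationMutable()) {
            // Distinct message for the FILE_CONFIGURATION_STORE case.
            return "Configuration change not supported by the current "
                + "configuration store; a mutable store is required.";
        }
        return null;
    }
}
```

Splitting the combined `&&` into two branches is the whole fix: the HTTP 400 response can then carry whichever message actually applies.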
[jira] [Commented] (YARN-9708) Yarn Router Support DelegationToken
[ https://issues.apache.org/jira/browse/YARN-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584142#comment-17584142 ] ASF GitHub Bot commented on YARN-9708: -- hadoop-yetus commented on PR #4746: URL: https://github.com/apache/hadoop/pull/4746#issuecomment-1225477736 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 52s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +0 :ok: | buf | 0m 0s | | buf was not available. | | +0 :ok: | buf | 0m 0s | | buf was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 58s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 28m 23s | | trunk passed | | +1 :green_heart: | compile | 10m 43s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 9m 5s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 2m 3s | | trunk passed | | +1 :green_heart: | mvnsite | 4m 26s | | trunk passed | | +1 :green_heart: | javadoc | 4m 11s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 3m 56s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 7m 25s | | trunk passed | | +1 :green_heart: | shadedclient | 24m 10s | | branch has no errors when building and testing our client artifacts. 
| | -0 :warning: | patch | 24m 33s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 25s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 39s | | the patch passed | | +1 :green_heart: | compile | 9m 46s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | cc | 9m 46s | | the patch passed | | -1 :x: | javac | 9m 46s | [/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4746/8/artifact/out/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 generated 1 new + 740 unchanged - 0 fixed = 741 total (was 740) | | +1 :green_heart: | compile | 9m 2s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | cc | 9m 2s | | the patch passed | | -1 :x: | javac | 9m 2s | [/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4746/8/artifact/out/results-compile-javac-hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt) | hadoop-yarn-project_hadoop-yarn-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 3 new + 649 unchanged - 2 fixed = 652 total (was 651) | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4746/8/artifact/out/blanks-eol.txt) | The patch has 10 line(s) that end in blanks. 
Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 1m 49s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4746/8/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) | hadoop-yarn-project/hadoop-yarn: The patch generated 8 new + 26 unchanged - 2 fixed = 34 total (was 28) | | +1 :green_heart: | mvnsite | 4m 1s | | the patch passed | | +1 :green_heart: | javadoc | 3m 38s | | the patch passed with JDK Private Build
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584139#comment-17584139 ] ASF GitHub Bot commented on YARN-11277: --- ashutoshcipher commented on PR #4797: URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1225452414 @leixm - Did you still face the same issue after setting `yarn.nodemanager.log.retain-seconds` to a very small value? > trigger deletion of log-dir by size for NonAggregatingLogHandler > > > Key: YARN-11277 > URL: https://issues.apache.org/jira/browse/YARN-11277 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our yarn cluster, the log files of some containers are too large, which > causes the NodeManager to frequently switch to the unhealthy state. For logs > that are too large, we can consider deleting them directly without delaying > yarn.nodemanager.log.retain-seconds. > Cluster environment: > # 8k nodes+ > # 50w+ apps / day > Configuration: > # yarn.nodemanager.log.retain-seconds=3days > # yarn.log-aggregation-enable=false > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584129#comment-17584129 ] ASF GitHub Bot commented on YARN-11277: --- leixm commented on PR #4797: URL: https://github.com/apache/hadoop/pull/4797#issuecomment-1225418277 @slfan1989 Could you please review this PR? > trigger deletion of log-dir by size for NonAggregatingLogHandler > > > Key: YARN-11277 > URL: https://issues.apache.org/jira/browse/YARN-11277 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > > In our yarn cluster, the log files of some containers are too large, which > causes the NodeManager to frequently switch to the unhealthy state. For logs > that are too large, we can consider deleting them directly without delaying > yarn.nodemanager.log.retain-seconds. > Cluster environment: > # 8k nodes+ > # 50w+ apps / day > Configuration: > # yarn.nodemanager.log.retain-seconds=3days > # yarn.log-aggregation-enable=false > >
[jira] [Updated] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated YARN-11277: -- Labels: pull-request-available (was: ) > trigger deletion of log-dir by size for NonAggregatingLogHandler > > > Key: YARN-11277 > URL: https://issues.apache.org/jira/browse/YARN-11277 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.4.0 >Reporter: Xianming Lei >Priority: Minor > Labels: pull-request-available > > In our yarn cluster, the log files of some containers are too large, which > causes the NodeManager to frequently switch to the unhealthy state. For logs > that are too large, we can consider deleting them directly without delaying > yarn.nodemanager.log.retain-seconds. > Cluster environment: > # 8k nodes+ > # 50w+ apps / day > Configuration: > # yarn.nodemanager.log.retain-seconds=3days > # yarn.log-aggregation-enable=false > >
[jira] [Updated] (YARN-11277) trigger deletion of log-dir by size for NonAggregatingLogHandler
[ https://issues.apache.org/jira/browse/YARN-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianming Lei updated YARN-11277: Summary: trigger deletion of log-dir by size for NonAggregatingLogHandler (was: Add trigger log-dir deletion by size for NonAggregatingLogHandler)
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584091#comment-17584091 ] ASF GitHub Bot commented on YARN-11276: --- hadoop-yetus commented on PR #4793: URL: https://github.com/apache/hadoop/pull/4793#issuecomment-1225323072

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 1m 2s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| | _ trunk Compile Tests _ | | | |
| +0 :ok: | mvndep | 15m 28s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 25m 35s | | trunk passed |
| +1 :green_heart: | compile | 9m 57s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 8m 56s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 2m 20s | | trunk passed |
| +1 :green_heart: | mvnsite | 4m 50s | | trunk passed |
| +1 :green_heart: | javadoc | 4m 32s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 4m 19s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 7m 19s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 27s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 17s | | the patch passed |
| +1 :green_heart: | compile | 9m 7s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 9m 7s | | the patch passed |
| +1 :green_heart: | compile | 8m 33s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 8m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 50s | | the patch passed |
| +1 :green_heart: | mvnsite | 3m 39s | | the patch passed |
| +1 :green_heart: | javadoc | 3m 36s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 3m 25s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 7m 2s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 30s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 :green_heart: | unit | 1m 38s | | hadoop-yarn-api in the patch passed. |
| +1 :green_heart: | unit | 5m 13s | | hadoop-yarn-common in the patch passed. |
| +1 :green_heart: | unit | 102m 25s | | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 :green_heart: | asflicense | 1m 17s | | The patch does not generate ASF License warnings. |
| | | 281m 28s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/6/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4793 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux ebeea3a84cd8 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3d0ab7d0e0d7406a3afb7b27f70f3e4e87c09a08 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4793/6/te |
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584041#comment-17584041 ] ASF GitHub Bot commented on YARN-11276: --- slfan1989 commented on code in PR #4793: URL: https://github.com/apache/hadoop/pull/4793#discussion_r953421456 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java: ## @@ -4681,6 +4681,18 @@ public static boolean areNodeLabelsEnabled( public static final String DEFAULT_YARN_WORKFLOW_ID_TAG_PREFIX = "workflowid:"; + public static final String APPS_CACHE_ENABLE = YARN_PREFIX + "apps.cache.enable"; Review Comment: Can we add some parameter descriptions? > Add lru cache for RMWebServices.getApps > --- > > Key: YARN-11276 > URL: https://issues.apache.org/jira/browse/YARN-11276 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager > Affects Versions: 3.4.0 > Reporter: Xianming Lei > Priority: Minor > Labels: pull-request-available > > In our YARN cluster, thousands of apps run at the same time, the getApps response reaches about 10 MB, and many requests carry the same input parameters. Adding a cache to RMWebServices.getApps would reduce processing delay.
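The PR caches getApps responses behind the `yarn.apps.cache.enable` switch shown in the diff. The cache itself is an LRU; a minimal sketch of that eviction policy using an access-ordered `LinkedHashMap` (the class name and capacity handling here are illustrative, not the PR's actual implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: LinkedHashMap in access order evicts the
// least-recently-used entry once the size limit is exceeded.
class AppsLruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  AppsLruCache(int maxEntries) {
    super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each put; returning true drops the eldest entry.
    return size() > maxEntries;
  }
}
```

Keyed by the getApps query parameters, repeated identical requests would then be served from memory instead of rebuilding a ~10 MB response each time.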
[jira] [Commented] (YARN-11276) Add lru cache for RMWebServices.getApps
[ https://issues.apache.org/jira/browse/YARN-11276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584042#comment-17584042 ] ASF GitHub Bot commented on YARN-11276: --- slfan1989 commented on code in PR #4793: URL: https://github.com/apache/hadoop/pull/4793#discussion_r953421783 ## hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AppsCacheKey.java: ## @@ -0,0 +1,142 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.yarn.util; + +import org.apache.commons.lang3.builder.EqualsBuilder; +import org.apache.commons.lang3.builder.HashCodeBuilder; +import org.apache.hadoop.security.UserGroupInformation; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.Set; + +public class AppsCacheKey { + private static final Logger LOG = + LoggerFactory.getLogger(AppsCacheKey.class.getName()); Review Comment: single line
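For a cache key such as `AppsCacheKey` to work in an LRU map, `equals` and `hashCode` must cover the same query parameters. The PR imports commons-lang3's `EqualsBuilder`/`HashCodeBuilder` for this; the sketch below shows the same contract with `java.util.Objects`, using two hypothetical parameters (`user`, `states`) in place of the full getApps parameter set:

```java
import java.util.Objects;

// Hypothetical simplified cache key: equal query parameters must produce
// equal keys (and equal hash codes) so repeated getApps requests hit the
// same cache entry.
final class AppsCacheKeySketch {
  private final String user;
  private final String states;

  AppsCacheKeySketch(String user, String states) {
    this.user = user;
    this.states = states;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof AppsCacheKeySketch)) {
      return false;
    }
    AppsCacheKeySketch k = (AppsCacheKeySketch) o;
    return Objects.equals(user, k.user) && Objects.equals(states, k.states);
  }

  @Override
  public int hashCode() {
    return Objects.hash(user, states);
  }
}
```

Omitting any parameter from `equals`/`hashCode` would make distinct queries collide and return stale or wrong cached responses, which is why the reviewer's attention to this class matters.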