[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868160#comment-17868160 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

dotzborro commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-2245967534

   This bug seems to have been fixed by https://github.com/apache/hadoop/pull/5629 (https://issues.apache.org/jira/browse/YARN-11489), after which the futures are held in a `LinkedBlockingQueue`. Elements are taken from the queue using the `take` method, which is documented as follows:
   > Retrieves and removes the head of this queue, waiting if necessary until an element becomes available.

   I confirmed in our environments that after patching 3.3.6 with this fix, CPU utilization is back to normal. I believe that https://issues.apache.org/jira/browse/YARN-11178 can be closed.


> Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-11178
>                 URL: https://issues.apache.org/jira/browse/YARN-11178
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, security
>    Affects Versions: 3.3.1, 3.3.2, 3.3.3, 3.3.4
>         Environment: Hadoop 3.3.3 with Kerberos, Ranger 2.1.0, Hive 2.3.7 and Spark 3.0.3
>            Reporter: Lennon Chin
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: YARN-11178.CPU idling busy 100% before optimized.png, YARN-11178.CPU normal after optimized.png, YARN-11178.CPU profile for idling busy 100% before optimized.html, YARN-11178.CPU profile for idling busy 100% before optimized.png, YARN-11178.CPU profile for normal after optimized.html, YARN-11178.CPU profile for normal after optimized.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The DelegationTokenRenewerPoolTracker thread busy-loops and wastes CPU, repeatedly iterating over an empty map, when there is no delegation token renewer event task in the `futures` map:
> {code:java}
> // org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.DelegationTokenRenewerPoolTracker#run
> @Override
> public void run() {
>   // this while true loop is busy when the `futures` is empty
>   while (true) {
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
>       } catch (TimeoutException e) {
>         // Cancel thread and retry the same event in case of timeout
>         if (future != null && !future.isDone() && !future.isCancelled()) {
>           future.cancel(true);
>           futures.remove(evt);
>           if (evt.getAttempt() < tokenRenewerThreadRetryMaxAttempts) {
>             renewalTimer.schedule(
>                 getTimerTask((AbstractDelegationTokenRenewerAppEvent) evt),
>                 tokenRenewerThreadRetryInterval);
>           } else {
>             LOG.info(
>                 "Exhausted max retry attempts {} in token renewer "
>                     + "thread for {}",
>                 tokenRenewerThreadRetryMaxAttempts, evt.getApplicationId());
>           }
>         }
>       } catch (Exception e) {
>         LOG.info("Problem in submitting renew tasks in token renewer "
>             + "thread.", e);
>       }
>     }
>   }
> }{code}
> A better approach is to wait for some time when the `futures` map is empty. Also, when a renewer task is done or cancelled, we should remove its future from the `futures` map to avoid a memory leak:
> {code:java}
> @Override
> public void run() {
>   while (true) {
>     // waiting for some time when futures map is empty
>     if (futures.isEmpty()) {
>       synchronized (this) {
>         try {
>           // waiting for tokenRenewerThreadTimeout milliseconds
>           long waitingTimeMs = Math.min(1, Math.max(500,
>               tokenRenewerThreadTimeout));
>           LOG.info("Delegation token renewer pool is empty, waiting for {} ms.", waitingTimeMs);
>           wait(waitingTimeMs);
>         } catch (InterruptedException e) {
>           LOG.warn("Delegation token renewer pool tracker waiting interrupt occurred.");
>           Thread.currentThread().interrupt();
>         }
>       }
>       if (futures.isEmpty()) {
>         continue;
>       }
>     }
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
>       } catch (TimeoutException e) {
>         // Cancel thread and retry the same event in case of timeout
>         if (future != null && !future.isDone() && !future.isCancelled()) {
>           future.cancel(true);
>           futures.remove(evt);
>           if (evt.getAttempt() < tokenRenewerThreadRetryMaxAttempts) {
>             renewalTimer.schedule(
>                 getTimerTask((AbstractDelegationTokenRenewerAppEvent) evt),
>                 tokenRenewerThreadRetryInterval);
>           } else {
>             LOG.info(
>                 "Exhausted max retry attempts {} in token renewer "
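
The fix dotzborro refers to relies on blocking instead of polling. The following is only a minimal sketch of that idea, not the code from PR #5629: the class `BlockingTrackerSketch` and its methods are hypothetical names chosen for illustration. The tracker parks inside `LinkedBlockingQueue.take()` while nothing is queued, so an idle tracker thread consumes no CPU.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the pool tracker; this is not the Hadoop class.
class BlockingTrackerSketch {
  // Futures of submitted renewer tasks, in submission order.
  private final BlockingQueue<Future<String>> futures = new LinkedBlockingQueue<>();

  void submit(Future<String> future) {
    futures.add(future);
  }

  // take() blocks while the queue is empty, so there is no empty-loop spinning.
  String awaitNext(long timeoutMs) throws Exception {
    Future<String> future = futures.take();
    return future.get(timeoutMs, TimeUnit.MILLISECONDS);
  }

  // Small end-to-end run: submit one task and wait for its result.
  static String demo() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      BlockingTrackerSketch tracker = new BlockingTrackerSketch();
      tracker.submit(pool.submit(() -> "renewed"));
      return tracker.awaitNext(5_000);
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Compared with the timed `wait` proposed in the issue description, a blocking queue needs no backoff interval to tune: the consumer wakes up only when a future is actually enqueued.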
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854691#comment-17854691 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

pstrzelczak commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-2165145077

   As already mentioned, this issue keeps one vCPU permanently occupied by the busy loop. This makes adoption of the 3.3 line difficult in production environments. Can it be fixed?
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732335#comment-17732335 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

gp1314 commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1590396294

   Thanks to LennonChin for his contributions and ideas, but why was this pull request closed? I don't think the problem has been fixed. Can I continue the work?
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730391#comment-17730391 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

LennonChin closed pull request #4435: YARN-11178. Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
URL: https://github.com/apache/hadoop/pull/4435
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709969#comment-17709969 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

hadoop-yetus commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1501121315

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 35m 39s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: | mvndep | 15m 44s | | Maven dependency ordering for branch |
   | +1 :green_heart: | mvninstall | 26m 33s | | trunk passed |
   | +1 :green_heart: | compile | 8m 57s | | trunk passed |
   | +1 :green_heart: | checkstyle | 1m 52s | | trunk passed |
   | +1 :green_heart: | mvnsite | 3m 25s | | trunk passed |
   | +1 :green_heart: | javadoc | 3m 1s | | trunk passed |
   | +1 :green_heart: | spotbugs | 6m 25s | | trunk passed |
   | +1 :green_heart: | shadedclient | 21m 35s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
   | +1 :green_heart: | mvninstall | 2m 10s | | the patch passed |
   | +1 :green_heart: | compile | 8m 17s | | the patch passed |
   | +1 :green_heart: | javac | 8m 17s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 1m 41s | | the patch passed |
   | +1 :green_heart: | mvnsite | 3m 7s | | the patch passed |
   | +1 :green_heart: | xmllint | 0m 0s | | No new issues. |
   | +1 :green_heart: | javadoc | 2m 32s | | the patch passed |
   | +1 :green_heart: | spotbugs | 6m 29s | | the patch passed |
   | +1 :green_heart: | shadedclient | 21m 40s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 1m 16s | | hadoop-yarn-api in the patch passed. |
   | +1 :green_heart: | unit | 5m 50s | | hadoop-yarn-common in the patch passed. |
   | -1 :x: | unit | 100m 27s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/5/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) | hadoop-yarn-server-resourcemanager in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
   | | | 278m 47s | | |


   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4435 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 18558f9c 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 84d4d555f62e29fdc19947869d02ffd33b9ac4c5 |
   | Default Java | Red Hat, Inc.-1.8.0_362-b08 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/5/testReport/ |
   | Max. process+thread count | 943 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/5/console |
   | versions | git=2.9.5 maven=3.6.3 spotbugs=4.2.2 xmllint=20901 |
   | Powered by | Apache Ye
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709951#comment-17709951 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

Hexiaoqiao commented on code in PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#discussion_r1161241461


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java:
##
@@ -819,6 +819,15 @@ public static boolean isAclEnabled(Configuration conf) {
   public static final int DEFAULT_RM_DT_RENEWER_THREAD_RETRY_MAX_ATTEMPTS =
       10;

+  /**
+   * The setting is used to control delegation token renewer thread perform backoff
+   * waiting when there are no renewer event futures to avoid CPU busy idling.
+   */
+  public static final String RM_DT_RENEWER_THREAD_IDLE_BACKOFF_MS =
+      RM_PREFIX + "delegation-token-renewer.thread-idle-backoff-ms";
+  public static final long DEFAULT_RM_DT_RENEWER_THREAD_IDLE_BACKOFF_MS =
+      TimeUnit.SECONDS.toMillis(3); // 3 Seconds

Review Comment:
   I suggest changing `TimeUnit.SECONDS.toMillis(3)` to 3000 directly, so that the value uses the same unit as the key/value names indicate (MS/ms) and end users are not left with any ambiguity.


##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml:
##
@@ -1148,6 +1148,16 @@
     <value>60s</value>
   </property>

+  <property>
+    <description>
+      The setting is used to control each RM DelegationTokenRenewer thread perform backoff
+      waiting when there are no renewer event futures to avoid CPU busy idling.
+      the default value is 3 seconds.
+    </description>
+    <name>yarn.resourcemanager.delegation-token-renewer.thread-idle-backoff-ms</name>
+    <value>3s</value>

Review Comment:
   +1, as mentioned above. We should keep the same unit for both the key and the value of the configuration.
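
The review point about units can be illustrated with a short sketch. It is an assumption-laden illustration, not the code that was merged: the class name `BackoffConfigSketch` is hypothetical, and `RM_PREFIX` is assumed to be `yarn.resourcemanager.`, which is what the full key shown in the yarn-default.xml hunk implies. The default is written as a plain millisecond literal so that it matches the `_MS` / `-ms` suffix of the constant and the key.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical holder class for illustration; not YarnConfiguration itself.
class BackoffConfigSketch {
  // Assumed prefix, inferred from the full key in the yarn-default.xml hunk.
  static final String RM_PREFIX = "yarn.resourcemanager.";

  static final String RM_DT_RENEWER_THREAD_IDLE_BACKOFF_MS =
      RM_PREFIX + "delegation-token-renewer.thread-idle-backoff-ms";

  // Plain milliseconds, matching the unit named in the key and the constant.
  static final long DEFAULT_RM_DT_RENEWER_THREAD_IDLE_BACKOFF_MS = 3000L;

  // The literal is the same duration as the expression used in the patch.
  static boolean sameAsThreeSeconds() {
    return DEFAULT_RM_DT_RENEWER_THREAD_IDLE_BACKOFF_MS == TimeUnit.SECONDS.toMillis(3);
  }
}
```

Under the same reasoning, the matching yarn-default.xml value would be a bare millisecond number rather than `3s`, so that the key name and the value agree on the unit.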
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709672#comment-17709672 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

gp1314 commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1500270398

   TestDelegationTokenRenewer.TestTokenThreadTimeout needs RM_DT_RENEWER_THREAD_IDLE_BACKOFF_MS to be set to less than RM_DT_RENEWER_THREAD_TIMEOUT.
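
One way to guarantee the relation gp1314 describes, however the two settings are configured, would be to clamp the backoff when it is read. This is only a sketch under stated assumptions: `BackoffBoundsSketch` and `effectiveBackoffMs` are hypothetical names and are not part of Hadoop or of PR #4435.

```java
// Hypothetical helper for illustration; not part of Hadoop.
class BackoffBoundsSketch {
  // Returns a backoff that is always strictly below the thread timeout.
  static long effectiveBackoffMs(long idleBackoffMs, long threadTimeoutMs) {
    if (idleBackoffMs <= 0) {
      throw new IllegalArgumentException("idle backoff must be positive");
    }
    if (threadTimeoutMs <= 1) {
      throw new IllegalArgumentException("thread timeout must exceed 1 ms");
    }
    // Keep the configured value when it is already below the timeout.
    return Math.min(idleBackoffMs, threadTimeoutMs - 1);
  }
}
```

With a 3000 ms backoff and a 60000 ms timeout the configured value is kept; with a 1000 ms timeout the backoff is reduced to 999 ms.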
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709574#comment-17709574 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

slfan1989 commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1499906514

> I had the same problem. hadoop 3.3.5 was released and had the same problem

We will continue to follow up this pr. @LennonChin Can we rebuild this pr?

> Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
> ---------------------------------------------------------------------------------------
>
> Key: YARN-11178
> URL: https://issues.apache.org/jira/browse/YARN-11178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager, security
> Affects Versions: 3.3.1, 3.3.2, 3.3.3, 3.3.4
> Environment: Hadoop 3.3.3 with Kerberos, Ranger 2.1.0, Hive 2.3.7 and Spark 3.0.3
> Reporter: Lennon Chin
> Priority: Minor
> Labels: pull-request-available
> Attachments: YARN-11178.CPU idling busy 100% before optimized.png,
> YARN-11178.CPU normal after optimized.png,
> YARN-11178.CPU profile for idling busy 100% before optimized.html,
> YARN-11178.CPU profile for idling busy 100% before optimized.png,
> YARN-11178.CPU profile for normal after optimized.html,
> YARN-11178.CPU profile for normal after optimized.png
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> The DelegationTokenRenewerPoolTracker thread is busy wasting CPU resource in
> empty poll iterate when there is no delegation token renewer event task in
> the futures map:
> {code:java}
> // org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.DelegationTokenRenewerPoolTracker#run
> @Override
> public void run() {
>   // this while true loop is busy when the `futures` is empty
>   while (true) {
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
>       } catch (TimeoutException e) {
>         // Cancel thread and retry the same event in case of timeout
>         if (future != null && !future.isDone() && !future.isCancelled()) {
>           future.cancel(true);
>           futures.remove(evt);
>           if (evt.getAttempt() < tokenRenewerThreadRetryMaxAttempts) {
>             renewalTimer.schedule(
>                 getTimerTask((AbstractDelegationTokenRenewerAppEvent) evt),
>                 tokenRenewerThreadRetryInterval);
>           } else {
>             LOG.info(
>                 "Exhausted max retry attempts {} in token renewer "
>                     + "thread for {}",
>                 tokenRenewerThreadRetryMaxAttempts, evt.getApplicationId());
>           }
>         }
>       } catch (Exception e) {
>         LOG.info("Problem in submitting renew tasks in token renewer "
>             + "thread.", e);
>       }
>     }
>   }
> }{code}
> A better way to avoid CPU idling is waiting for some time when the `futures`
> map is empty, and when the renewer task done or cancelled, we should remove
> the task future in `futures` map to avoid memory leak:
> {code:java}
> @Override
> public void run() {
>   while (true) {
>     // waiting for some time when futures map is empty
>     if (futures.isEmpty()) {
>       synchronized (this) {
>         try {
>           // waiting for tokenRenewerThreadTimeout milliseconds
>           long waitingTimeMs = Math.min(1, Math.max(500, tokenRenewerThreadTimeout));
>           LOG.info("Delegation token renewer pool is empty, waiting for {} ms.", waitingTimeMs);
>           wait(waitingTimeMs);
>         } catch (InterruptedException e) {
>           LOG.warn("Delegation token renewer pool tracker waiting interrupt occurred.");
>           Thread.currentThread().interrupt();
>         }
>       }
>       if (futures.isEmpty()) {
>         continue;
>       }
>     }
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
>       } catch (TimeoutException e) {
>         // Cancel thread and retry the same event in case of timeout
>         if (future != null && !future.isDone() && !future.isCancelled()) {
>           future.cancel(true);
>           futures.remove(evt);
>           if (evt.getAttempt() < tokenRenewerThreadRetryMaxAttempts) {
>             renewalTimer.schedule(
>                 getTimerTask((AbstractDelegationTokenRenewerAppEvent) evt),
>                 tokenRenewerThreadRetryInterval);
>           } else {
>             LOG.i
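
The remedy proposed in the issue description (let the tracker thread sleep while `futures` is empty instead of spinning) can be sketched outside Hadoop as a guarded wait. This is only an illustration under assumptions: `IdleAwareTracker`, its `track` and `stop` methods, the `String` keys and the two counters are hypothetical stand-ins, not the `DelegationTokenRenewerPoolTracker` code or the patch in the PR. Unlike the quoted patch, the sketch also notifies the tracker when work is added, so a new future does not have to wait for the idle timeout to expire.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a pool tracker; not the Hadoop class.
class IdleAwareTracker implements Runnable {
  private final Map<String, Future<?>> futures = new ConcurrentHashMap<>();
  private final Object lock = new Object();
  private final long idleWaitMs;                  // assumed idle wait, in milliseconds
  private volatile boolean running = true;
  final AtomicLong passes = new AtomicLong();     // outer loop iterations
  final AtomicLong completed = new AtomicLong();  // futures seen to completion

  IdleAwareTracker(long idleWaitMs) {
    this.idleWaitMs = idleWaitMs;
  }

  // Register a future and wake the tracker if it is parked.
  void track(String id, Future<?> future) {
    futures.put(id, future);
    synchronized (lock) {
      lock.notifyAll();
    }
  }

  // Ask the tracker to exit and wake it if it is parked.
  void stop() {
    running = false;
    synchronized (lock) {
      lock.notifyAll();
    }
  }

  @Override
  public void run() {
    while (running) {
      passes.incrementAndGet();
      synchronized (lock) {
        // Park instead of spinning when there is nothing to track.
        if (running && futures.isEmpty()) {
          try {
            lock.wait(idleWaitMs);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
      }
      for (Map.Entry<String, Future<?>> entry : futures.entrySet()) {
        Future<?> future = entry.getValue();
        try {
          future.get(100, TimeUnit.MILLISECONDS);
          completed.incrementAndGet();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        } catch (Exception e) {
          future.cancel(true); // timed out or failed: give up on it
        } finally {
          futures.remove(entry.getKey()); // always drop the entry so the map cannot grow
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    IdleAwareTracker tracker = new IdleAwareTracker(10_000L);
    Thread thread = new Thread(tracker, "tracker");
    thread.start();
    tracker.track("evt-1", CompletableFuture.completedFuture("renewed"));
    Thread.sleep(200);
    System.out.println("completed=" + tracker.completed.get());
    tracker.stop();
    thread.join();
  }
}
```

The emptiness check happens while holding the same lock that `track` uses for `notifyAll`, which is what prevents a wake-up from being lost between the check and the `wait` call.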
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707794#comment-17707794 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

gp1314 commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1493809925

I had the same problem. hadoop 3.3.5 was released and had the same problem
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17694275#comment-17694275 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

LennonChin commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1447479319

> @LennonChin Is this junit test error related to this pr?

Looks like it is, I'll check it out. I'm a little busy lately, sorry...
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17691531#comment-17691531 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

slfan1989 commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1438320061

@LennonChin Is this junit test error related to this pr?
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683638#comment-17683638 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

slfan1989 commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1414736328

@LennonChin Thank you for your contribution! Is the unit test error caused by this pr?
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683636#comment-17683636 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

slfan1989 commented on code in PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#discussion_r1095306416

##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java:
##########
@@ -996,6 +996,22 @@ public void run() {
     @Override
     public void run() {
       while (true) {
+        if (futures.isEmpty()) {
+          synchronized (this) {
+            try {
+              // waiting for tokenRenewerThreadTimeout milliseconds
+              long waitingTimeMs = Math.min(1, Math.max(500, tokenRenewerThreadTimeout));

Review Comment:
> @slfan1989 what do you suggest increasing them to?

Sorry for the late reply, I hope the configuration can be added, not hardcoded.
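
The review request above (make the idle wait configurable rather than hardcoded) could look roughly like the following. This is a sketch under stated assumptions, not the change that was made in the PR: the class `IdleWaitSetting`, the property key, the default and the bounds are all invented for illustration, and `java.util.Properties` stands in for Hadoop's `Configuration`, where the value would presumably be read with `Configuration.getLong(name, defaultValue)`.

```java
import java.util.Properties;

// Hypothetical helper; the key and the bounds are invented for illustration.
class IdleWaitSetting {
  static final String KEY = "example.tracker.idle-wait-ms"; // not a real YARN property
  static final long DEFAULT_MS = 500L;   // assumed default
  static final long MIN_MS = 100L;       // assumed lower bound
  static final long MAX_MS = 10_000L;    // assumed upper bound

  // Clamp value into the closed range [low, high].
  static long clamp(long value, long low, long high) {
    return Math.max(low, Math.min(high, value));
  }

  // Read the idle wait from the configuration, falling back to the default
  // when the key is missing or not a number.
  static long idleWaitMs(Properties conf) {
    String raw = conf.getProperty(KEY);
    if (raw == null) {
      return DEFAULT_MS;
    }
    try {
      return clamp(Long.parseLong(raw.trim()), MIN_MS, MAX_MS);
    } catch (NumberFormatException e) {
      return DEFAULT_MS;
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(KEY, "2500");
    System.out.println(idleWaitMs(conf)); // prints 2500
  }
}
```

Clamping keeps a misconfigured value from turning the wait back into a near busy loop (too small) or from delaying shutdown and new work for a long time (too large).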
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655368#comment-17655368 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

hadoop-yetus commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1373425771

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 39s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 15m 8s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 26m 8s | | trunk passed |
| +1 :green_heart: | compile | 10m 5s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 9m 16s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 45s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 13s | | trunk passed |
| -1 :x: | javadoc | 1m 1s | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/4/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 2m 35s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 12s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 53s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 31s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 19s | | the patch passed |
| +1 :green_heart: | compile | 9m 39s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 9m 39s | | the patch passed |
| +1 :green_heart: | compile | 8m 35s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 8m 35s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 35s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 56s | | the patch passed |
| -1 :x: | javadoc | 0m 56s | [/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/4/artifact/out/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 2m 32s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 18s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 15s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 12s | | hadoop-yarn-api in the patch passed. |
| +1 :green_heart: | unit | 5m 43s | | hadoop-yarn-common in the patch passed. |
| -1 :x: | unit | 102m 26s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/4/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 :green_heart: | asflicens
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654922#comment-17654922 ]

ASF GitHub Bot commented on YARN-11178:
---------------------------------------

hadoop-yetus commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1372103081

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 36s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 58s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 25m 48s | | trunk passed |
| +1 :green_heart: | compile | 9m 47s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 8m 29s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 48s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 20s | | trunk passed |
| -1 :x: | javadoc | 1m 2s | [/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/3/artifact/out/branch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 2m 47s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 12s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 27s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 11s | | the patch passed |
| +1 :green_heart: | compile | 9m 43s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 9m 43s | | the patch passed |
| +1 :green_heart: | compile | 8m 34s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 8m 34s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 39s | | the patch passed |
| +1 :green_heart: | mvnsite | 3m 1s | | the patch passed |
| -1 :x: | javadoc | 0m 56s | [/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/3/artifact/out/patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 2m 34s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 23s | | the patch passed |
| +1 :green_heart: | shadedclient | 21m 35s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 11s | | hadoop-yarn-api in the patch passed. |
| +1 :green_heart: | unit | 5m 44s | | hadoop-yarn-common in the patch passed. |
| -1 :x: | unit | 99m 49s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4435/3/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt) | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 :green_heart: | asflicens
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654817#comment-17654817 ]

ASF GitHub Bot commented on YARN-11178:
---

LennonChin commented on code in PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#discussion_r1062171496

##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java:
##
@@ -996,6 +996,22 @@ public void run() {
     @Override
     public void run() {
       while (true) {
+        if (futures.isEmpty()) {
+          synchronized (this) {
+            try {
+              // waiting for tokenRenewerThreadTimeout milliseconds
+              long waitingTimeMs = Math.min(1, Math.max(500, tokenRenewerThreadTimeout));

Review Comment:
   > @slfan1989 what do you suggest increasing them to?

   I think he means to add a configuration to control the waiting time.

> Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
> --
>
>                 Key: YARN-11178
>                 URL: https://issues.apache.org/jira/browse/YARN-11178
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, security
>    Affects Versions: 3.3.1, 3.3.2, 3.3.3, 3.3.4
>         Environment: Hadoop 3.3.3 with Kerberos, Ranger 2.1.0, Hive 2.3.7 and Spark 3.0.3
>            Reporter: Lennon Chin
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: YARN-11178.CPU idling busy 100% before optimized.png,
>                      YARN-11178.CPU normal after optimized.png,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.html,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.png,
>                      YARN-11178.CPU profile for normal after optimized.html,
>                      YARN-11178.CPU profile for normal after optimized.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The DelegationTokenRenewerPoolTracker thread is busy wasting CPU resource in
> empty poll iterate when there is no delegation token renewer event task in
> the futures map:
> {code:java}
> // org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.DelegationTokenRenewerPoolTracker#run
> @Override
> public void run() {
>   // this while true loop is busy when the `futures` is empty
>   while (true) {
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
>       } catch (TimeoutException e) {
>         // Cancel thread and retry the same event in case of timeout
>         if (future != null && !future.isDone() && !future.isCancelled()) {
>           future.cancel(true);
>           futures.remove(evt);
>           if (evt.getAttempt() < tokenRenewerThreadRetryMaxAttempts) {
>             renewalTimer.schedule(
>                 getTimerTask((AbstractDelegationTokenRenewerAppEvent) evt),
>                 tokenRenewerThreadRetryInterval);
>           } else {
>             LOG.info(
>                 "Exhausted max retry attempts {} in token renewer "
>                     + "thread for {}",
>                 tokenRenewerThreadRetryMaxAttempts, evt.getApplicationId());
>           }
>         }
>       } catch (Exception e) {
>         LOG.info("Problem in submitting renew tasks in token renewer "
>             + "thread.", e);
>       }
>     }
>   }
> }{code}
> A better way to avoid CPU idling is waiting for some time when the `futures`
> map is empty, and when the renewer task done or cancelled, we should remove
> the task future in `futures` map to avoid memory leak:
> {code:java}
> @Override
> public void run() {
>   while (true) {
>     // waiting for some time when futures map is empty
>     if (futures.isEmpty()) {
>       synchronized (this) {
>         try {
>           // waiting for tokenRenewerThreadTimeout milliseconds
>           long waitingTimeMs = Math.min(1, Math.max(500, tokenRenewerThreadTimeout));
>           LOG.info("Delegation token renewer pool is empty, waiting for {} ms.", waitingTimeMs);
>           wait(waitingTimeMs);
>         } catch (InterruptedException e) {
>           LOG.warn("Delegation token renewer pool tracker waiting interrupt occurred.");
>           Thread.currentThread().interrupt();
>         }
>       }
>       if (futures.isEmpty()) {
>         continue;
>       }
>     }
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout,
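
Editor's sketch, not part of the patch: the issue description contrasts a loop that spins over an empty map with one that does a timed wait first. The standalone class below is a hypothetical stand-in for the tracker thread (the class name, the `pending` map, and `countIdlePasses` are illustrative assumptions, not Hadoop code). It counts how many passes the loop makes over an empty map with and without the timed wait.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the tracker loop; not Hadoop code. */
public class IdleWaitSketch {
  // Placeholder for the `futures` map; keys and values are illustrative.
  private final Map<String, Object> pending = new ConcurrentHashMap<>();

  /**
   * Runs the polling loop for roughly durationMs and returns the number of
   * passes made. idleWaitMs <= 0 reproduces the busy loop from the report.
   */
  public long countIdlePasses(long durationMs, long idleWaitMs) {
    long deadline = System.nanoTime() + durationMs * 1_000_000L;
    long passes = 0;
    while (System.nanoTime() < deadline) {
      passes++;
      if (pending.isEmpty() && idleWaitMs > 0) {
        synchronized (this) {
          try {
            // A timed wait releases the CPU. A producer could call
            // notifyAll() after adding work to wake the loop early.
            wait(idleWaitMs);
          } catch (InterruptedException e) {
            // Restore the flag and stop instead of spinning.
            Thread.currentThread().interrupt();
            break;
          }
        }
        continue;
      }
      for (Map.Entry<String, Object> entry : pending.entrySet()) {
        // Real code would block on future.get(timeout, unit) here.
      }
    }
    return passes;
  }

  public static void main(String[] args) {
    IdleWaitSketch sketch = new IdleWaitSketch();
    System.out.println("busy passes:  " + sketch.countIdlePasses(200, 0));
    System.out.println("timed passes: " + sketch.countIdlePasses(200, 50));
  }
}
```

The guard `idleWaitMs > 0` matters because `Object.wait(0)` waits until notified rather than returning immediately.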
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654812#comment-17654812 ]

ASF GitHub Bot commented on YARN-11178:
---

LennonChin commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1371858309

I have made two commits to perform your suggestions, @slfan1989 please cc
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653797#comment-17653797 ]

ASF GitHub Bot commented on YARN-11178:
---

dineshchitlangia commented on code in PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#discussion_r1060289109

##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java:
##
@@ -996,6 +996,22 @@ public void run() {
     @Override
     public void run() {
       while (true) {
+        if (futures.isEmpty()) {
+          synchronized (this) {
+            try {
+              // waiting for tokenRenewerThreadTimeout milliseconds
+              long waitingTimeMs = Math.min(1, Math.max(500, tokenRenewerThreadTimeout));

Review Comment:
   @slfan1989 what do you suggest increasing them to?
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651263#comment-17651263 ]

ASF GitHub Bot commented on YARN-11178:
---

slfan1989 commented on code in PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#discussion_r1055408239

##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java:
##
@@ -1023,6 +1039,16 @@ public void run() {
             LOG.info("Problem in submitting renew tasks in token renewer "
                 + "thread.", e);
           }
+          if (future.isDone() || future.isCancelled()) {
+            try {
+              futures.remove(evt);
+              LOG.info("Removed done or cancelled renewer tasks of {}"
+                  + " in token renewer thread.", evt.getApplicationId());

Review Comment:
   indentation, Usually 5 characters.
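
Editor's sketch, not part of the patch: the hunk under review removes a future from the `futures` map once it is done or cancelled so the map does not keep growing. The standalone example below illustrates the same idea on a plain `ConcurrentHashMap`; the class name and `pruneFinished` are hypothetical. Per the `java.util.concurrent.Future` Javadoc, `isDone()` also returns true for a cancelled future, so a single check covers both cases.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;

/** Hypothetical helper; not Hadoop code. */
public class FinishedFuturePruneSketch {
  /**
   * Removes every entry whose future has completed (normally, exceptionally,
   * or through cancellation) and returns how many entries were dropped.
   * The count is only exact when no other thread mutates the map meanwhile.
   */
  public static <K> int pruneFinished(Map<K, Future<?>> futures) {
    int before = futures.size();
    futures.entrySet().removeIf(e -> e.getValue().isDone());
    return before - futures.size();
  }

  public static void main(String[] args) {
    Map<String, Future<?>> futures = new ConcurrentHashMap<>();
    futures.put("finished", CompletableFuture.completedFuture("ok"));
    CompletableFuture<String> cancelled = new CompletableFuture<>();
    cancelled.cancel(true);
    futures.put("cancelled", cancelled);
    futures.put("running", new CompletableFuture<String>());
    // prints "2 removed, left: [running]"
    System.out.println(pruneFinished(futures) + " removed, left: " + futures.keySet());
  }
}
```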
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651261#comment-17651261 ]

ASF GitHub Bot commented on YARN-11178:
---

slfan1989 commented on code in PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#discussion_r1055407270

##
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java:
##
@@ -996,6 +996,22 @@ public void run() {
     @Override
     public void run() {
       while (true) {
+        if (futures.isEmpty()) {
+          synchronized (this) {
+            try {
+              // waiting for tokenRenewerThreadTimeout milliseconds
+              long waitingTimeMs = Math.min(1, Math.max(500, tokenRenewerThreadTimeout));

Review Comment:
   500, 1 Should we increase the configuration?
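
Editor's sketch, not part of the patch: the review thread asks whether the two literals in the `waitingTimeMs` expression should become configuration. As the expression is quoted in this archive, `Math.min(1, Math.max(500, tokenRenewerThreadTimeout))` evaluates to 1 for any timeout, because the inner `Math.max` is at least 500. The example below shows the usual clamp pattern with both bounds passed in as parameters. The method name, parameter names, and the bounds 500 and 10000 are illustrative assumptions, not values or keys taken from Hadoop.

```java
/** Hypothetical helper; not Hadoop code. */
public class WaitTimeClampSketch {
  /** Clamps value into [lower, upper]; assumes lower <= upper. */
  public static long clamp(long value, long lower, long upper) {
    return Math.min(upper, Math.max(lower, value));
  }

  public static void main(String[] args) {
    long timeout = 60_000L;
    // Expression as quoted in the hunk: the outer bound of 1 always wins.
    System.out.println(Math.min(1, Math.max(500, timeout)));  // prints 1
    // Same shape with assumed bounds supplied by the caller.
    System.out.println(clamp(timeout, 500L, 10_000L));        // prints 10000
  }
}
```

Passing the bounds in keeps the policy (how long an idle tracker may sleep) separate from the loop, which is one way to read the request for a configurable waiting time.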
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651248#comment-17651248 ]

ASF GitHub Bot commented on YARN-11178:
---

slfan1989 commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1362744741

@LennonChin can we rebase again?
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640327#comment-17640327 ]

ASF GitHub Bot commented on YARN-11178:
---

cainbit commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1329946107

> @slfan1989 @dineshchitlangia @brumi1024 @9uapaw @ashutoshcipher Could you please to help review the code?

This issue makes version 3.3.4 unable to run directly on the production environment without a manual patching. Could I know the current review status of this PR and whether it can be incorporated into version 3.3.5?
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640322#comment-17640322 ]

ASF GitHub Bot commented on YARN-11178:
---

cainbit commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1329936324

> > @slfan1989
> > I'm using Hadoop 3.3.4 with Kerberos, and the issue occurs: the DelegationTokenRenewer uses 100% CPU.
> > I used jstack to check the threads and found that the DelegationTokenRenewer thread causes this issue, as @LennonChin said.
>
> The issue still exists, but the PR has not been merged and I am not quite sure why. You can manually apply this patch to solve the problem.
>
> After merging the patch, we have been running it in the production environment for half a year, and no problems have been found so far.

Thank you for your work on this issue. I added the empty "futures" check, and it works well.

> Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
> --
>
>                 Key: YARN-11178
>                 URL: https://issues.apache.org/jira/browse/YARN-11178
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, security
>    Affects Versions: 3.3.1, 3.3.2, 3.3.3, 3.3.4
>         Environment: Hadoop 3.3.3 with Kerberos, Ranger 2.1.0, Hive 2.3.7 and Spark 3.0.3
>            Reporter: Lennon Chin
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: YARN-11178.CPU idling busy 100% before optimized.png,
>                      YARN-11178.CPU normal after optimized.png,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.html,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.png,
>                      YARN-11178.CPU profile for normal after optimized.html,
>                      YARN-11178.CPU profile for normal after optimized.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The DelegationTokenRenewerPoolTracker thread wastes CPU by spinning through
> empty iterations when the futures map holds no delegation token renewer
> event task:
> {code:java}
> // org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.DelegationTokenRenewerPoolTracker#run
> @Override
> public void run() {
>   // this while true loop is busy when the `futures` is empty
>   while (true) {
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
>       } catch (TimeoutException e) {
>         // Cancel thread and retry the same event in case of timeout
>         if (future != null && !future.isDone() && !future.isCancelled()) {
>           future.cancel(true);
>           futures.remove(evt);
>           if (evt.getAttempt() < tokenRenewerThreadRetryMaxAttempts) {
>             renewalTimer.schedule(
>                 getTimerTask((AbstractDelegationTokenRenewerAppEvent) evt),
>                 tokenRenewerThreadRetryInterval);
>           } else {
>             LOG.info(
>                 "Exhausted max retry attempts {} in token renewer "
>                     + "thread for {}",
>                 tokenRenewerThreadRetryMaxAttempts, evt.getApplicationId());
>           }
>         }
>       } catch (Exception e) {
>         LOG.info("Problem in submitting renew tasks in token renewer "
>             + "thread.", e);
>       }
>     }
>   }
> }
> {code}
> A better way to avoid busy idling is to wait for a while when the `futures`
> map is empty. In addition, when a renewer task is done or cancelled, its
> future should be removed from the `futures` map to avoid a memory leak:
> {code:java}
> @Override
> public void run() {
>   while (true) {
>     // waiting for some time when futures map is empty
>     if (futures.isEmpty()) {
>       synchronized (this) {
>         try {
>           // waiting for tokenRenewerThreadTimeout milliseconds
>           long waitingTimeMs = Math.min(1, Math.max(500,
>               tokenRenewerThreadTimeout));
>           LOG.info("Delegation token renewer pool is empty, waiting for {} ms.",
>               waitingTimeMs);
>           wait(waitingTimeMs);
>         } catch (InterruptedException e) {
>           LOG.warn("Delegation token renewer pool tracker waiting interrupt occurred.");
>           Thread.currentThread().interrupt();
>         }
>       }
>       if (futures.isEmpty()) {
>         continue;
>       }
>     }
>     for (Map.Entry<DelegationTokenRenewerEvent, Future<?>> entry : futures
>         .entrySet()) {
>       DelegationTokenRenewerEvent evt = entry.getKey();
>       Future<?> future = entry.getValue();
>       try {
>         future.get(tokenRenewerThreadTimeout, TimeUnit.MILLISECONDS);
>       } catch (Timeou
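
A note on the wait interval in the quoted snippet: as printed here, `Math.min(1, Math.max(500, tokenRenewerThreadTimeout))` evaluates to 1 for any timeout, because the inner `Math.max` is never below 500. With that value the tracker would still wake about once per millisecond while the map is empty and log at INFO each time. Whether the printed `1` is what the patch contains or an artifact of this archived copy cannot be told from the text. The sketch below only illustrates the arithmetic; the class name, method names, and the 10000 ms upper bound are assumptions made for illustration and are not taken from the issue or from Hadoop.

```java
public class WaitClampSketch {

  // The expression exactly as it appears in the quoted description.
  // Math.max(500, t) is always >= 500, so Math.min(1, ...) is always 1.
  static long asPrinted(long tokenRenewerThreadTimeout) {
    return Math.min(1, Math.max(500, tokenRenewerThreadTimeout));
  }

  // Hypothetical alternative: clamp the timeout into [500, upperBoundMs].
  // upperBoundMs is an assumed parameter, not a value from the issue.
  static long clamped(long tokenRenewerThreadTimeout, long upperBoundMs) {
    return Math.min(upperBoundMs, Math.max(500, tokenRenewerThreadTimeout));
  }

  public static void main(String[] args) {
    long[] timeouts = {100L, 500L, 60_000L};
    for (long t : timeouts) {
      System.out.println("timeout=" + t
          + " asPrinted=" + asPrinted(t)
          + " clamped=" + clamped(t, 10_000L));
    }
  }
}
```

With an upper bound larger than 500, the clamped form waits at least 500 ms and at most the bound, which is the behaviour the surrounding comment ("waiting for some time when futures map is empty") appears to describe.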
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639793#comment-17639793 ]

ASF GitHub Bot commented on YARN-11178:
---

LennonChin commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1328565915

> @slfan1989
>
> I'm using Hadoop 3.3.4 with Kerberos, and the issue occurs: the DelegationTokenRenewer uses 100% CPU.
>
> I used jstack to check the threads and found that the DelegationTokenRenewer thread causes this issue, as @LennonChin said.

The issue still exists, but the PR has not been merged and I am not quite sure why. You can manually apply this patch to solve the problem.

> Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
> --
>
>                 Key: YARN-11178
>                 URL: https://issues.apache.org/jira/browse/YARN-11178
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, security
>    Affects Versions: 3.3.1, 3.3.2, 3.3.3
>         Environment: Hadoop 3.3.3 with Kerberos, Ranger 2.1.0, Hive 2.3.7 and Spark 3.0.3
>            Reporter: Lennon Chin
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: YARN-11178.CPU idling busy 100% before optimized.png,
>                      YARN-11178.CPU normal after optimized.png,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.html,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.png,
>                      YARN-11178.CPU profile for normal after optimized.html,
>                      YARN-11178.CPU profile for normal after optimized.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
[jira] [Commented] (YARN-11178) Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
[ https://issues.apache.org/jira/browse/YARN-11178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17639572#comment-17639572 ]

ASF GitHub Bot commented on YARN-11178:
---

cainbit commented on PR #4435:
URL: https://github.com/apache/hadoop/pull/4435#issuecomment-1328167253

I can reproduce the issue and I would like to know the current status of this PR. Thanks.

> Avoid CPU busy idling and resource wasting in DelegationTokenRenewerPoolTracker thread
> --
>
>                 Key: YARN-11178
>                 URL: https://issues.apache.org/jira/browse/YARN-11178
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, security
>    Affects Versions: 3.3.1, 3.3.2, 3.3.3
>         Environment: Hadoop 3.3.3 with Kerberos, Ranger 2.1.0, Hive 2.3.7 and Spark 3.0.3
>            Reporter: Lennon Chin
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: YARN-11178.CPU idling busy 100% before optimized.png,
>                      YARN-11178.CPU normal after optimized.png,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.html,
>                      YARN-11178.CPU profile for idling busy 100% before optimized.png,
>                      YARN-11178.CPU profile for normal after optimized.html,
>                      YARN-11178.CPU profile for normal after optimized.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h