[
https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312904#comment-17312904
]
Qi Zhu edited comment on YARN-8631 at 4/1/21, 6:31 AM:
-------------------------------------------------------
[~snemeth] [~pbacsko] [~umittal] [~gandras] [~shenyinjie] [~SanjayDivgi]
I think YARN-7962 already fixed this case:
During shutdown, isServiceStarted is now set to false while holding the write lock.
{code:java}
serviceStateLock.writeLock().lock();
try {
  isServiceStarted = false;
  this.renewerService.shutdown();
} finally {
  serviceStateLock.writeLock().unlock();
}
{code}
The race condition in processDelegationTokenRenewerEvent could only happen
before YARN-7962:
{code:java}
private void processDelegationTokenRenewerEvent(
    DelegationTokenRenewerEvent evt) {
  serviceStateLock.readLock().lock();
  try {
    if (isServiceStarted) {
      Future<?> future =
          renewerService.submit(new DelegationTokenRenewerRunnable(evt));
      futures.put(evt, future);
    } else {
      pendingEventQueue.add(evt);
    }
  } finally {
    serviceStateLock.readLock().unlock();
  }
}

@Override
public void run() {
  if (evt instanceof DelegationTokenRenewerAppSubmitEvent) {
    DelegationTokenRenewerAppSubmitEvent appSubmitEvt =
        (DelegationTokenRenewerAppSubmitEvent) evt;
    handleDTRenewerAppSubmitEvent(appSubmitEvt);
  } else if (evt instanceof DelegationTokenRenewerAppRecoverEvent) {
    DelegationTokenRenewerAppRecoverEvent appRecoverEvt =
        (DelegationTokenRenewerAppRecoverEvent) evt;
    handleDTRenewerAppRecoverEvent(appRecoverEvt);
  } else if (evt.getType().equals(
      DelegationTokenRenewerEventType.FINISH_APPLICATION)) {
    DelegationTokenRenewer.this.handleAppFinishEvent(evt);
  }
}

@SuppressWarnings("unchecked")
private void handleDTRenewerAppRecoverEvent(
    DelegationTokenRenewerAppRecoverEvent event) {
  try {
    // Setup tokens for renewal during recovery
    DelegationTokenRenewer.this.handleAppSubmitEvent(event);
  } catch (Throwable t) {
    LOG.warn("Unable to add the application to the delegation token"
        + " renewer on recovery.", t);
  }
}
{code}
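As a rough standalone illustration of the locking pattern in the two snippets above (this is not the actual DelegationTokenRenewer code; the class GuardedSubmitter and all of its method names are made up for this sketch), the started flag is only flipped under the write lock and only read under the read lock, so a task can never be handed to an executor that is being shut down at the same moment:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the renewer service, for illustration only.
public class GuardedSubmitter {
  private final ReadWriteLock stateLock = new ReentrantReadWriteLock();
  // Daemon threads so the sketch never keeps the JVM alive.
  private final ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
    Thread t = new Thread(r);
    t.setDaemon(true);
    return t;
  });
  private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
  private boolean started = false; // guarded by stateLock

  // Flip the flag and drain queued tasks while holding the write lock.
  public void start() {
    stateLock.writeLock().lock();
    try {
      started = true;
      Runnable task;
      while ((task = pending.poll()) != null) {
        pool.submit(task);
      }
    } finally {
      stateLock.writeLock().unlock();
    }
  }

  // Flag change and shutdown happen together under the write lock,
  // so no reader can observe started == true on a shut-down pool.
  public void stop() {
    stateLock.writeLock().lock();
    try {
      started = false;
      pool.shutdown();
    } finally {
      stateLock.writeLock().unlock();
    }
  }

  // Returns true if the task was submitted, false if it was queued.
  public boolean process(Runnable task) {
    stateLock.readLock().lock();
    try {
      if (started) {
        pool.submit(task);
        return true;
      }
      pending.add(task);
      return false;
    } finally {
      stateLock.readLock().unlock();
    }
  }

  public int pendingCount() {
    return pending.size();
  }

  public static void main(String[] args) {
    GuardedSubmitter s = new GuardedSubmitter();
    Runnable noop = () -> { };
    System.out.println(s.process(noop)); // false: queued, not started yet
    s.start();
    System.out.println(s.process(noop)); // true: handed to the pool
    s.stop();
    System.out.println(s.process(noop)); // false: queued, pool is not touched
  }
}
```

In this sketch a stopped instance is not meant to be started again, since the pool has already been shut down.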
With that fix the race condition no longer occurs, and neither does the null
pointer error, which I also hit on my cluster.
I think we can close this now.
Thanks.
> YARN RM fails to add the application to the delegation token renewer on
> recovery
> --------------------------------------------------------------------------------
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.1.0
> Reporter: Sanjay Divgi
> Assignee: Umesh Mittal
> Priority: Blocker
> Attachments: YARN-8631.001.patch,
> hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-000004.log
>
>
> On an HA cluster, we have observed that the YARN ResourceManager fails to add
> the application to the delegation token renewer on recovery.
> Below is the error:
> {code:java}
> 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer
> (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token=
> [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident:
> (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=,
> issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18,
> masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
> 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer
> (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to
> add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
>   at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}