[jira] [Updated] (YARN-10240) Prevent Fatal CancelledException in TimelineV2Client when stopping

2020-04-20 Thread Tarun Parimi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tarun Parimi updated YARN-10240:

Attachment: YARN-10240.001.patch

> Prevent Fatal CancelledException in TimelineV2Client when stopping
> --
>
> Key: YARN-10240
> URL: https://issues.apache.org/jira/browse/YARN-10240
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Reporter: Tarun Parimi
>Priority: Major
> Attachments: YARN-10240.001.patch
>
>
> When the timeline client is stopped, it waits for a drain timeout and then 
> cancels every sync EntitiesHolder still left in the queue.
> {code:java}
> // if some entities were not drained then we need interrupt
> // the threads which had put sync EntityHolders to the queue.
> EntitiesHolder nextEntityInTheQueue = null;
> while ((nextEntityInTheQueue =
>     timelineEntityQueue.poll()) != null) {
>   nextEntityInTheQueue.cancel(true);
> }
> {code}
> The sync publish path handles only ExecutionException and InterruptedException:
> {code:java}
> if (sync) {
>   // In sync call we need to wait till its published and if any error then
>   // throw it back
>   try {
>     entitiesHolder.get();
>   } catch (ExecutionException e) {
>     throw new YarnException("Failed while publishing entity",
>         e.getCause());
>   } catch (InterruptedException e) {
>     Thread.currentThread().interrupt();
>     throw new YarnException("Interrupted while publishing entity", e);
>   }
> }
> {code}
> But calling nextEntityInTheQueue.cancel(true) makes entitiesHolder.get() 
> throw a CancellationException, which is not handled. Because 
> CancellationException is unchecked, it propagates to the dispatcher thread 
> and can result in a FATAL error in the NM. We need to prevent this.
> {code:java}
> FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(203)) - Error in dispatcher thread
> java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask.report(FutureTask.java:121)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl$TimelineEntityDispatcher.dispatchEntities(TimelineV2ClientImpl.java:545)
>   at org.apache.hadoop.yarn.client.api.impl.TimelineV2ClientImpl.putEntities(TimelineV2ClientImpl.java:149)
>   at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:348)
> {code}
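
The failure mode described above can be reproduced with plain java.util.concurrent, without any Hadoop classes. The following is a minimal sketch, not the attached patch: PublishException is a hypothetical stand-in for YarnException, waitForPublish is a hypothetical stand-in for the sync branch of dispatchEntities, and a never-run FutureTask plays the role of an EntitiesHolder still sitting in the queue. It shows one possible way to handle the cancellation, namely catching CancellationException next to the two existing catch clauses and rethrowing it as the checked exception callers already expect.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class CancelledGetSketch {

  /** Hypothetical stand-in for YarnException (a checked exception). */
  static class PublishException extends Exception {
    PublishException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Waits for the holder and converts every failure into a checked exception. */
  static void waitForPublish(FutureTask<Void> holder) throws PublishException {
    try {
      holder.get();
    } catch (ExecutionException e) {
      throw new PublishException("Failed while publishing entity", e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new PublishException("Interrupted while publishing entity", e);
    } catch (CancellationException e) {
      // Unchecked: without this clause it would escape to the calling thread.
      throw new PublishException("Publishing entity was cancelled", e);
    }
  }

  public static void main(String[] args) {
    // A task that is never run, like a holder that was not drained in time.
    FutureTask<Void> holder = new FutureTask<Void>(() -> null);
    // What the stop path does to every holder left in the queue.
    holder.cancel(true);
    try {
      waitForPublish(holder);
    } catch (PublishException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Removing the third catch clause lets the CancellationException leave waitForPublish unwrapped, which corresponds to the stack trace in the report. Whether the real fix should wrap the exception, log it, or ignore it during shutdown is a design decision for the patch, not something this sketch settles.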



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10240) Prevent Fatal CancelledException in TimelineV2Client when stopping

2020-04-20 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10240:
-
Parent: YARN-9802
Issue Type: Sub-task  (was: Bug)

> Prevent Fatal CancelledException in TimelineV2Client when stopping
> --
>
> Key: YARN-10240
> URL: https://issues.apache.org/jira/browse/YARN-10240
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Attachments: YARN-10240.001.patch
>
>






[jira] [Updated] (YARN-10240) Prevent Fatal CancelledException in TimelineV2Client when stopping

2020-04-20 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-10240:
-
Fix Version/s: 3.4.0

> Prevent Fatal CancelledException in TimelineV2Client when stopping
> --
>
> Key: YARN-10240
> URL: https://issues.apache.org/jira/browse/YARN-10240
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Tarun Parimi
>Assignee: Tarun Parimi
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10240.001.patch
>
>


