[ 
https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-4392:
------------------------------------
    Attachment: YARN-4392.3.patch

bq. ATS events upon recovery in some scenarios if we don't re-send since ATS 
event posting is async and state store updating are async. There's a race where 
we could update the state store and crash before the ATS event is sent.
IMO this is a situation where we need to decide which of the two evils is 
greater and take care of that one. I had an offline discussion with [~sunilg] 
and [~vvasudev], and a few thoughts were:
* The ATS events are processed in a separate thread even during recovery, 
so it is not blocking.
* Once we move off ATS 1.0, storage will no longer be a problem.
* Re-sending the events during recovery does not change the data.

Considering all of these, I am fine with the approach in the patch. I have also 
uploaded a new patch that corrects a test case and adds a test case to validate 
that events are sent for a created container during both creation and recovery.
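
To illustrate why re-sending on recovery is harmless as long as the persisted 
time is reused, here is a minimal sketch. It is not code from the patch: the 
class RecoveryResendSketch, the TimelineStoreStub type, the application id and 
the submit time are all hypothetical, and elapsed() is only a simplified check 
modeled on the warning in the log, not Hadoop's actual Times.elapsed. The 
finished and started values are taken from the first log line of the issue 
description.

```java
import java.util.HashMap;
import java.util.Map;

public class RecoveryResendSketch {

    /** Hypothetical stand-in for a timeline store keyed by application id. */
    static final class TimelineStoreStub {
        private final Map<String, Long> createdTimeByApp = new HashMap<>();

        // Writing the same (appId, createdTime) pair again leaves the data unchanged.
        void putAppCreated(String appId, long createdTime) {
            createdTimeByApp.put(appId, createdTime);
        }

        Long createdTime(String appId) {
            return createdTimeByApp.get(appId);
        }

        int size() {
            return createdTimeByApp.size();
        }
    }

    /** Simplified check: returns -1 when the finished time precedes the started time. */
    static long elapsed(long started, long finished) {
        long delta = finished - started;
        return delta >= 0 ? delta : -1;
    }

    public static void main(String[] args) {
        TimelineStoreStub store = new TimelineStoreStub();
        long submitTime = 1437453990000L;  // hypothetical persisted submit time
        long finishTime = 1437453994768L;  // finished time from the first log line
        long restartTime = 1440308399674L; // started time from the first log line

        // Original creation event.
        store.putAppCreated("app_1", submitTime);
        // Recovery: the event is sent again, reusing the persisted submit time.
        store.putAppCreated("app_1", submitTime);

        // Still one entry with the original time, so elapsed is non-negative.
        System.out.println(store.size());                                     // 1
        System.out.println(elapsed(store.createdTime("app_1"), finishTime));  // 4768

        // If the created time were reset to the restart time instead,
        // the finished time would precede it, as in the reported warnings.
        System.out.println(elapsed(restartTime, finishTime));                 // -1
    }
}
```

The sketch covers only the data side of the argument; it does not model the 
async dispatcher or the state store race itself.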

> ApplicationCreatedEvent event time resets after RM restart/failover
> -------------------------------------------------------------------
>
>                 Key: YARN-4392
>                 URL: https://issues.apache.org/jira/browse/YARN-4392
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Xuan Gong
>            Assignee: Naganarasimha G R
>            Priority: Critical
>         Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch, 
> YARN-4392.2.patch, YARN-4392.3.patch
>
>
> {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - 
> Finished time 1437453994768 is ahead of started time 1440308399674 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437454008244 is ahead of started time 1440308399676 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444305171 is ahead of started time 1440308399653 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444293115 is ahead of started time 1440308399647 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444379645 is ahead of started time 1440308399656 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444361234 is ahead of started time 1440308399655 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444342029 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444323447 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444430006 is ahead of started time 1440308399660 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444415698 is ahead of started time 1440308399659 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444419060 is ahead of started time 1440308399658 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished 
> time 1437444393931 is ahead of started time 1440308399657
> {code}
> From the ATS logs, we would see a large number of 'stale alerts' messages 
> periodically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
