[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041824#comment-15041824 ]

Junping Du commented on YARN-3367:
----------------------------------

bq. Can you share your thoughts on why events are required to be in order? Are
you visualizing some erroneous situation when the sync and async events are out
of order? My guess would be that it might matter for async events like resource
utilization of a given container, which should be in order so that aggregation
and accumulation are correct, but I was not able to identify any scenario where
sync and async events need to be in order. If it's required, then I need to make
some additional modifications in the patch.
There are many cases where events should be logged in order, i.e. monotonically
in time. A quick example is logging the lifecycle of a container/application: we
should see the states transition from created to running to succeeded/failed,
etc. Since the client is considered unreliable, if events are sent to the
backend (a reliable storage) out of order, we have no guarantee that the event
info that is finally persisted makes sense: if the client fails partway through,
we could have an app-failed event persisted while the app-created and
app-running events are missing. Out-of-order delivery also makes this kind of
problem harder to trace. You are right that the container resource calculation
algorithm assumes resource events are monotonic in time, which simplifies some
of the calculation.
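To make the ordering argument concrete, here is a minimal Java sketch under stated assumptions. It is a hypothetical illustration, not YARN code: the simplified CREATED/RUNNING/SUCCEEDED/FAILED states, the Event and Sample types, and both helper methods are made up for this example. isValidLifecycle() rejects a sequence that does not start at CREATED, skips a transition, or goes back in time (for instance a lone app-failed event whose earlier events were lost), and accumulateMbMs() shows why an accumulation that integrates usage over time intervals relies on monotonic timestamps.

```java
import java.util.List;

// Hypothetical illustration, not YARN code: the state names, the Event and
// Sample types, and both checks are assumptions made for this sketch.
final class OrderedEvents {

    // Simplified container lifecycle; real YARN container states differ.
    enum State { CREATED, RUNNING, SUCCEEDED, FAILED }

    static final class Event {
        final State state;
        final long timestampMs;

        Event(State state, long timestampMs) {
            this.state = state;
            this.timestampMs = timestampMs;
        }
    }

    // One resource-utilization sample: memory in use (MB) from timestampMs on.
    static final class Sample {
        final long timestampMs;
        final long memoryMb;

        Sample(long timestampMs, long memoryMb) {
            this.timestampMs = timestampMs;
            this.memoryMb = memoryMb;
        }
    }

    // Allowed transitions: CREATED -> RUNNING -> SUCCEEDED | FAILED.
    private static boolean canFollow(State prev, State next) {
        switch (prev) {
            case CREATED: return next == State.RUNNING;
            case RUNNING: return next == State.SUCCEEDED || next == State.FAILED;
            default:      return false; // terminal states have no successor
        }
    }

    // True if the events, in arrival order, start at CREATED, follow the
    // allowed transitions, and never go back in time.
    static boolean isValidLifecycle(List<Event> events) {
        if (events.isEmpty() || events.get(0).state != State.CREATED) {
            return false;
        }
        for (int i = 1; i < events.size(); i++) {
            Event prev = events.get(i - 1);
            Event next = events.get(i);
            if (next.timestampMs < prev.timestampMs || !canFollow(prev.state, next.state)) {
                return false;
            }
        }
        return true;
    }

    // Accumulates memory usage in MB-milliseconds, treating each sample's value
    // as constant until the next sample. This only works if samples arrive in
    // non-decreasing time order; otherwise an interval would be negative.
    static long accumulateMbMs(List<Sample> samples) {
        long total = 0;
        for (int i = 1; i < samples.size(); i++) {
            long dt = samples.get(i).timestampMs - samples.get(i - 1).timestampMs;
            if (dt < 0) {
                throw new IllegalArgumentException("samples out of time order at index " + i);
            }
            total += samples.get(i - 1).memoryMb * dt;
        }
        return total;
    }
}
```

For example, samples (0 ms, 100 MB), (1000 ms, 200 MB), (3000 ms, 200 MB) accumulate to 100 × 1000 + 200 × 2000 = 500000 MB·ms; swapping the last two samples makes an interval negative, and the method throws instead of silently producing a wrong total.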

Btw, canceling the patch as it is out of sync with the new branch.


> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-3367
>                 URL: https://issues.apache.org/jira/browse/YARN-3367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we have added a loop in TimelineClient that waits for the
> collectorServiceAddress to be ready before posting any entity. In consumers of
> TimelineClient (like the AM), we start a new thread for each call to avoid a
> potential deadlock in the main thread. This approach has at least 3 major
> defects:
> 1. The consumer needs additional code to wrap each putEntities() call in a
> thread.
> 2. It consumes many threads unnecessarily.
> 3. Events can be delivered out of order, because each posting thread exits
> the waiting loop at an arbitrary time.
> We should have something like an event loop on the TimelineClient side:
> putEntities() only puts the related entities into a queue, and a separate
> thread delivers the queued entities to the collector via REST calls.
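The proposed event loop could look roughly like the following Java sketch. It is hypothetical, not the attached patch: QueuedEntityPublisher, the String entities, and the Consumer that stands in for the REST call to the collector are all assumptions. putEntities() only enqueues and returns, and a single dispatcher thread drains the queue, so entities reach the collector in the order they were put and the caller no longer needs its own thread per call. In a real client, waiting for collectorServiceAddress would also move into the dispatcher thread instead of the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch of the proposed event loop, not the YARN-3367 patch.
// The class name, String entities, and the Consumer standing in for the REST
// call to the collector are assumptions for illustration.
final class QueuedEntityPublisher implements AutoCloseable {

    // Private sentinel batch telling the dispatcher to stop; compared by
    // identity, so no caller-supplied batch can ever match it.
    private static final List<String> STOP = new ArrayList<>();

    private final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
    private final Thread dispatcher;

    QueuedEntityPublisher(Consumer<List<String>> postToCollector) {
        dispatcher = new Thread(() -> {
            try {
                while (true) {
                    List<String> batch = queue.take(); // blocks until work arrives
                    if (batch == STOP) {
                        return;
                    }
                    // A single thread posts batches one at a time, so the
                    // collector sees them in the order putEntities() queued them.
                    // (No error handling or retries in this sketch.)
                    postToCollector.accept(batch);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "timeline-entity-dispatcher");
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    // Returns immediately: the caller never waits for the collector.
    void putEntities(String... entities) {
        queue.add(List.of(entities));
    }

    // Lets the dispatcher post everything queued so far, then stops it.
    // Entities put after close() are not delivered.
    @Override
    public void close() throws InterruptedException {
        queue.add(STOP);
        dispatcher.join();
    }
}
```

Using exactly one dispatcher thread is what gives FIFO delivery; a pool of posting threads would reintroduce the reordering described in defect 3. A production version would still need error handling and retries around the REST call, and a policy for bounding the queue.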



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
