[ https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041824#comment-15041824 ]
Junping Du commented on YARN-3367:
----------------------------------

bq. Can you share your thoughts on why events are required to be in order? Are you visualizing some erroneous situation when the sync and async events are out of order? My guess would be that it might matter for async events, such as resource utilization for a given container, which should be in order so that aggregation and accumulation are correct, but I was not able to identify any scenario where sync and async events should be in order. If it is required, then I need to bring in some additional modifications in the patch.

There are many cases where events should be logged in order, that is, monotonically in time. A quick example is logging the lifecycle of a container or application: we should see the state transitions in sequence (created, running, succeeded/failed, etc.). Because the client is considered unreliable, if events are sent out of order to the backend (a reliable storage), we have no guarantee that the event information that is finally persisted makes sense. For example, we could end up with an app-failed event persisted while the app-created and app-running events are missing due to a client failure. Out-of-order events also make this kind of problem harder to trace.

You are right that the container resource calculation algorithm assumes resource events are monotonic in time, which simplifies some of the calculation.

Btw, cancelling the patch as it is out of sync with the new branch.
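To illustrate the ordering point, here is a minimal sketch of the kind of event loop the issue proposes: callers only enqueue, and a single dispatcher thread delivers entities in FIFO order. All names here (`OrderedEntityDispatcher`, `putEntity`, the `poster` callback standing in for the REST call to the collector) are hypothetical and are not taken from TimelineClient or from the attached patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch only; not the actual TimelineClient implementation. */
final class OrderedEntityDispatcher<E> {
    // FIFO queue: entities leave in the order they were enqueued.
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    // Assumed stand-in for "post this entity to the collector via REST".
    private final Consumer<E> poster;
    private final Thread worker;
    private volatile boolean stopped = false;

    OrderedEntityDispatcher(Consumer<E> poster) {
        this.poster = poster;
        this.worker = new Thread(this::drainLoop, "entity-dispatcher");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Returns immediately; the caller never waits on the collector. */
    void putEntity(E entity) {
        if (stopped) {
            throw new IllegalStateException("dispatcher already stopped");
        }
        queue.add(entity);
    }

    /** Single consumer thread, so delivery order equals enqueue order. */
    private void drainLoop() {
        try {
            // Keep running until stop() is called and the queue is drained.
            while (!stopped || !queue.isEmpty()) {
                E next = queue.poll(100, TimeUnit.MILLISECONDS);
                if (next != null) {
                    poster.accept(next);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Call after all producers are done; waits for queued entities to be posted. */
    void stop() {
        stopped = true;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With one consumer thread draining one queue, a sequence such as created, running, finished reaches the backend in that order, and the caller does not need to wrap each call in its own thread. A real client would also need retry and error handling around the post, which this sketch leaves out.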

> Replace starting a separate thread for post entity with event loop in TimelineClient
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-3367
>                 URL: https://issues.apache.org/jira/browse/YARN-3367
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we add loop in TimelineClient to wait for
> collectorServiceAddress ready before posting any entity. In consumer of
> TimelineClient (like AM), we are starting a new thread for each call to get
> rid of potential deadlock in main thread. This way has at least 3 major
> defects:
> 1. The consumer need some additional code to wrap a thread before calling
> putEntities() in TimelineClient.
> 2. It cost many thread resources which is unnecessary.
> 3. The sequence of events could be out of order because each posting
> operation thread get out of waiting loop randomly.
> We should have something like event loop in TimelineClient side,
> putEntities() only put related entities into a queue of entities and a
> separated thread handle to deliver entities in queue to collector via REST
> call.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)