[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357111#comment-14357111 ]
Junping Du commented on YARN-3039:
----------------------------------

Thanks [~sjlee0]! Here is the end-to-end flow, which may be helpful for your review:
- When the AM is launched, the NM auxiliary service adds a new aggregator service to aggregatorCollection (one per node) to do the necessary binding work. aggregatorCollection also holds a client for AggregatorNodeManagerProtocol, which it uses to notify the NM that a new app aggregator has registered and to pass along its address.
- When the NM is notified, it updates its registeredAggregators list (covering all local app aggregators) and reports it to the RM in the next heartbeat.
- When the RM receives registeredAggregators from the NM, it updates its own aggregators list.
- The next time other NMs and the AM heartbeat with the RM, the RM includes aggregatorInfo in the heartbeat response (for the AM, this goes through AllocationResponse).
- The AM of DS has an AMRMClientAsync that heartbeats with the RM, so it receives the updated aggregator address periodically. By registering a callback that listens for aggregator address updates, it can update the address in TimelineClient in a thread-safe way.
- The AM calls timeline operations in a non-blocking way (so that it does not hang or deadlock). Currently each call is wrapped in a new thread; this will be improved later (in another JIRA) to save thread resources.
- TimelineClient (consuming the v2 service) loops in its retry logic until it gets the correct address, which is set by the AM.
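The AM-side part of the flow above (a callback that updates the address, a non-blocking timeline call, and a client that retries until the address is set) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: SketchTimelineClient, AggregatorAddrListener, putEntity, the retry parameters and the address "host1:8188" are all hypothetical names and values, and no Hadoop classes are used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class AggregatorAddressSketch {
    public static void main(String[] args) throws Exception {
        SketchTimelineClient client = new SketchTimelineClient(50, 20L);
        // Callback that the AM would register for aggregator address updates.
        AggregatorAddrListener listener = client::setAggregatorAddr;

        // Non-blocking timeline call: run it on a separate thread so the
        // caller is not stuck while the client waits for an address.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> write = () -> client.putEntity("entity-1");
            Future<String> pending = pool.submit(write);

            // Simulate a later heartbeat response carrying the address.
            Thread.sleep(100L);
            listener.onAggregatorAddrUpdated("host1:8188"); // hypothetical address

            System.out.println(pending.get(5, TimeUnit.SECONDS));
        } finally {
            pool.shutdownNow();
        }
    }
}

/** Hypothetical listener invoked when a heartbeat brings a new address. */
interface AggregatorAddrListener {
    void onAggregatorAddrUpdated(String newAddr);
}

/** Hypothetical stand-in for a v2 timeline client; not the real Hadoop class. */
class SketchTimelineClient {
    // AtomicReference lets the heartbeat thread publish the address safely
    // to the thread that performs the timeline write.
    private final AtomicReference<String> aggregatorAddr = new AtomicReference<>();
    private final int maxRetries;
    private final long retryIntervalMs;

    SketchTimelineClient(int maxRetries, long retryIntervalMs) {
        this.maxRetries = maxRetries;
        this.retryIntervalMs = retryIntervalMs;
    }

    void setAggregatorAddr(String addr) {
        aggregatorAddr.set(addr);
    }

    /** Retries until an address is available, then pretends to send the entity. */
    String putEntity(String entity) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String addr = aggregatorAddr.get();
            if (addr != null) {
                return addr + " <- " + entity;
            }
            try {
                Thread.sleep(retryIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for address", e);
            }
        }
        throw new IllegalStateException("aggregator address not set after retries");
    }
}
```

The sketch bounds the retry loop and uses a single worker thread rather than a new thread per call; both are choices made for the illustration, not statements about how the patch behaves.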

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> -------------------------------------------------------------------
>
>                 Key: YARN-3039
>                 URL: https://issues.apache.org/jira/browse/YARN-3039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Junping Du
>         Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is
> essential for off-node clients to send writes to the right ATS writer. This
> should also handle the case of AM failures.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)