[ https://issues.apache.org/jira/browse/YARN-6511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Botong Huang updated YARN-6511:
-------------------------------
    Attachment: YARN-6511-YARN-2915.v3.patch

Thanks [~subru] for the review! I've addressed most comments in the v3 patch (as well as the ones from [~jianhe]). For the rest, please see below; illustrative sketches of a few of the points follow at the end of this message.

bq. Do we need a {{UnmanagedAMPoolManager}} per interceptor instance or can we use one at {{AMRMProxyService}} level?

The current way is easier, because we constantly need to fetch all UAMs associated with one application (keyed by subClusterId). If we used one pool per AMRMProxy, we would probably have to key each UAM by appId + subClusterId, and the search for all UAMs associated with one application would no longer be straightforward.

bq. Is updating the queue below safe in *loadAMRMPolicy*?

Yes, the variable _queue_ is a local string, used only by the policy manager.

bq. I feel the *finishApplicationMaster* of the pool should be moved to {{UnmanagedAMPoolManager}}.

We could. However, it would then likely become a blocking call, and we would lose the freedom to schedule the tasks, synchronously call finish in the home sub-cluster, and then wait for the secondaries to come back. Alternatively, we would need an additional interface in UAMPoolManager: one method to schedule the tasks and one to fetch the results. I've added a TODO for this.

bq. I see dynamic instantiations of {{ExecutorCompletionService}} in finish, register, etc. invocations. Wouldn't we be better served by pre-initializing it?

We need to create them locally because of concurrency: the allocate and finish calls can be invoked concurrently, and sharing the same completion service object would confuse the tasks submitted from both sides.

bq. Is *getSubClusterForNode* required, as the resolver should be doing this instead of every client?

_AbstractSubClusterResolver.getSubClusterForNode_ throws when resolving an unknown node. We don't want to throw in this case, so we need to catch the exception and log a warning.

bq. Move _YarnConfiguration_ outside the for loop in *registerWithNewSubClusters*

We cannot, because we need a different config per UAM, each loaded with its own sub-cluster id.

bq. Consider looping on _registrations_ in lieu of _requests_ in *sendRequestsToSecondaryResourceManagers*

_registrations_ only contains the newly added secondary sub-clusters, while here we need to loop over (send heartbeats to) all known secondaries.

> Federation Intercepting and propagating AM-RM communications (part two: secondary subclusters added)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-6511
>                 URL: https://issues.apache.org/jira/browse/YARN-6511
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>         Attachments: YARN-6511-YARN-2915.v1.patch, YARN-6511-YARN-2915.v2.patch, YARN-6511-YARN-2915.v3.patch
>
>
> In order to support transparent "spanning" of jobs across sub-clusters, all AM-RM communications are proxied (via YARN-2884).
> This JIRA tracks federation-specific mechanisms that decide how to "split/broadcast" requests to the RMs and "merge" answers to the AM.
> This is the part-two JIRA, which adds the secondary sub-clusters and does a full split-merge for the requests. Part one is in YARN-3666.
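Sketch 1, on the per-interceptor {{UnmanagedAMPoolManager}}: a minimal illustration of the keying argument above, not the patch code ({{Uam}} stands in for the real UAM entry type, and the composite-key format is an assumption). With one pool per interceptor, the pool serves a single application, so sub-cluster id alone is a sufficient key:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: "Uam" stands in for the real UnmanagedApplicationManager.
class Uam {}

class UamPoolKeyingSketch {
  // Per-interceptor pool: one application, so subClusterId is enough.
  Map<String, Uam> perInterceptorPool = new ConcurrentHashMap<>();

  // All UAMs of "this" application: simply the whole map.
  Iterable<Uam> uamsForThisApp() {
    return perInterceptorPool.values();
  }

  // Service-level pool: would need a composite appId + subClusterId key,
  // and finding all UAMs of one application becomes a filtered scan.
  Map<String, Uam> serviceLevelPool = new ConcurrentHashMap<>();

  List<Uam> uamsForApp(String appId) {
    List<Uam> result = new ArrayList<>();
    for (Map.Entry<String, Uam> e : serviceLevelPool.entrySet()) {
      if (e.getKey().startsWith(appId + "/")) {
        result.add(e.getValue());
      }
    }
    return result;
  }
}
{code}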
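Sketch 2, on the local {{ExecutorCompletionService}} instantiation: a minimal sketch of the concurrency point (class and method names are assumptions, not the patch code). Each call builds its own completion service over the shared thread pool, so {{take()}} only ever returns futures submitted by that same call:

{code:java}
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LocalCompletionServiceSketch {
  private final ExecutorService threadpool = Executors.newCachedThreadPool();

  // allocate() and finish() can run concurrently. With one shared,
  // pre-initialized completion service, a take() in allocate() could
  // hand back a result submitted by finish(), and vice versa.
  String allocate() throws Exception {
    ExecutorCompletionService<String> cs =
        new ExecutorCompletionService<>(threadpool);  // local per call
    cs.submit(() -> "allocate-response");  // e.g. heartbeat one UAM
    return cs.take().get();                // guaranteed to be our own task
  }

  String finish() throws Exception {
    ExecutorCompletionService<String> cs =
        new ExecutorCompletionService<>(threadpool);  // local per call
    cs.submit(() -> "finish-response");    // e.g. finish one UAM
    return cs.take().get();
  }
}
{code}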
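Sketch 3, on *getSubClusterForNode*: the resolver call matches the {{AbstractSubClusterResolver}} contract (it throws {{YarnException}} for an unknown node); the wrapper's exact shape and the logger are assumptions about the patch, not copied from it:

{code:java}
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.server.federation.resolver.SubClusterResolver;
import org.apache.hadoop.yarn.server.federation.store.records.SubClusterId;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class NodeResolutionSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(NodeResolutionSketch.class);

  private final SubClusterResolver resolver;

  NodeResolutionSketch(SubClusterResolver resolver) {
    this.resolver = resolver;
  }

  // The resolver throws on an unknown node; here an unresolvable node
  // must not fail the whole call, so catch, warn, and return null.
  SubClusterId getSubClusterForNode(String nodeName) {
    try {
      return resolver.getSubClusterForNode(nodeName);
    } catch (YarnException e) {
      LOG.warn("Cannot resolve node " + nodeName + " to a sub-cluster", e);
      return null;
    }
  }
}
{code}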
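Sketch 4, on the per-UAM {{YarnConfiguration}} in *registerWithNewSubClusters*: the object has to be created inside the loop because each copy is specialized for a different sub-cluster. The helper and the config key below are hypothetical placeholders for whatever loads the sub-cluster-specific settings:

{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class PerUamConfigSketch {
  // Hypothetical helper: stands in for the real mechanism that points
  // the config at one sub-cluster's RM.
  static void pointConfigAtSubCluster(Configuration conf, String scId) {
    conf.set("yarn.federation.subcluster.id", scId);  // illustrative key
  }

  static void registerWithNewSubClusters(
      Configuration base, List<String> newSubClusters) {
    for (String scId : newSubClusters) {
      // A fresh config per UAM: cannot be hoisted out of the loop,
      // since each one is loaded with a different sub-cluster id.
      YarnConfiguration conf = new YarnConfiguration(base);
      pointConfigAtSubCluster(conf, scId);
      // ... create and launch the UAM with "conf" (elided) ...
    }
  }
}
{code}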