[ https://issues.apache.org/jira/browse/YARN-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Botong Huang updated YARN-8933:
-------------------------------
    Attachment: YARN-8933.v1.patch

> [AMRMProxy] Fix potential null AvailableResource and NumClusterNode in allocation response
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8933
>                 URL: https://issues.apache.org/jira/browse/YARN-8933
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy, federation
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-8933.v1.patch
>
>
> After YARN-8696, the allocate response returned by FederationInterceptor is
> merged from the responses of a random subset of all sub-clusters, depending on
> the async heartbeat timing. As a result, the cluster-wide information fields
> in the response, e.g. AvailableResources and NumClusterNodes, are not
> consistent from one response to the next. They can even be null/zero when a
> particular response is merged from an empty set of sub-cluster responses.
>
> In this patch, we let FederationInterceptor remember the last allocate
> response from every known sub-cluster, and always construct the cluster-wide
> info fields from all of them. We also move the sub-cluster timeout from
> LocalityMulticastAMRMProxyPolicy to FederationInterceptor, so that
> sub-clusters that have expired (no successful allocate response for a while)
> are not included in the computation.
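
The approach described in the issue can be illustrated with a minimal, self-contained Java sketch. This is not the code from YARN-8933.v1.patch: the class `ClusterInfoAggregator`, its nested types, and all method and field names are hypothetical, and it does not use the real YARN `AllocateResponse` or `Resource` classes. It also assumes that the cluster-wide fields are combined by summing the per-sub-cluster values, which the issue text does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: remember the last response per sub-cluster and build
 * cluster-wide fields from all non-expired entries. Not the actual patch.
 */
class ClusterInfoAggregator {

  /** Stand-in for the cluster-wide fields of one sub-cluster's last response. */
  static final class SubClusterSnapshot {
    final long availableMemoryMb;
    final int availableVcores;
    final int numNodes;
    final long receivedAtMs;

    SubClusterSnapshot(long availableMemoryMb, int availableVcores,
        int numNodes, long receivedAtMs) {
      this.availableMemoryMb = availableMemoryMb;
      this.availableVcores = availableVcores;
      this.numNodes = numNodes;
      this.receivedAtMs = receivedAtMs;
    }
  }

  /** Aggregated view that would be placed in the merged allocate response. */
  static final class ClusterInfo {
    final long availableMemoryMb;
    final int availableVcores;
    final int numClusterNodes;

    ClusterInfo(long availableMemoryMb, int availableVcores,
        int numClusterNodes) {
      this.availableMemoryMb = availableMemoryMb;
      this.availableVcores = availableVcores;
      this.numClusterNodes = numClusterNodes;
    }
  }

  // Last successful response seen from each known sub-cluster.
  private final Map<String, SubClusterSnapshot> lastResponse = new HashMap<>();
  private final long timeoutMs;

  ClusterInfoAggregator(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  /** Record (or overwrite) the latest response from one sub-cluster. */
  void recordResponse(String subClusterId, long availableMemoryMb,
      int availableVcores, int numNodes, long nowMs) {
    lastResponse.put(subClusterId, new SubClusterSnapshot(
        availableMemoryMb, availableVcores, numNodes, nowMs));
  }

  /**
   * Build cluster-wide info from every remembered sub-cluster, skipping those
   * whose last response is older than the timeout. Summation is an assumption.
   */
  ClusterInfo aggregate(long nowMs) {
    long memoryMb = 0L;
    int vcores = 0;
    int nodes = 0;
    for (SubClusterSnapshot s : lastResponse.values()) {
      if (nowMs - s.receivedAtMs > timeoutMs) {
        continue; // expired sub-cluster: excluded from the computation
      }
      memoryMb += s.availableMemoryMb;
      vcores += s.availableVcores;
      nodes += s.numNodes;
    }
    // Always returns a non-null object, even if no sub-cluster is live.
    return new ClusterInfo(memoryMb, vcores, nodes);
  }
}
```

Because the aggregate is computed from the remembered responses rather than from whichever sub-clusters happened to answer in the current heartbeat, the result does not depend on heartbeat timing, and an empty merge set no longer produces a null value in this sketch.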