[ 
https://issues.apache.org/jira/browse/YARN-6128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238654#comment-16238654
 ] 

Subru Krishnan commented on YARN-6128:
--------------------------------------

Thanks [~botong] for your clarification. I have a few follow-ups below.

bq. I don't understand what you mean. The current code first call list to find 
all subclusters in registry, then read the token for each subcluster in a loop. 

My question is: why can't we also fetch the tokens for all the sub-clusters in 
a single call, instead of reading them one at a time in a loop?
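
To make the question concrete, here is a rough sketch of the two access 
patterns. {{InMemoryTokenRegistry}} and its {{list}}/{{read}}/{{readAll}} 
methods are hypothetical stand-ins made up for illustration, not the actual 
registry API. Whether the real registry can serve something like {{readAll}} 
is exactly the open question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the registry (NOT the real API).
class InMemoryTokenRegistry {
  private final Map<String, byte[]> tokensBySubCluster = new TreeMap<>();
  int calls = 0; // number of simulated round trips to the registry

  void put(String subClusterId, byte[] token) {
    tokensBySubCluster.put(subClusterId, token);
  }

  // One round trip: list the sub-cluster ids present in the registry.
  List<String> list() {
    calls++;
    return new ArrayList<>(tokensBySubCluster.keySet());
  }

  // One round trip per sub-cluster: read a single token.
  byte[] read(String subClusterId) {
    calls++;
    return tokensBySubCluster.get(subClusterId);
  }

  // Assumed batch operation: one round trip returning every token.
  Map<String, byte[]> readAll() {
    calls++;
    return new HashMap<>(tokensBySubCluster);
  }
}

class TokenLoadSketch {
  // Pattern described in the quote: list first, then read in a loop.
  static Map<String, byte[]> loadInLoop(InMemoryTokenRegistry registry) {
    Map<String, byte[]> tokens = new HashMap<>();
    for (String subClusterId : registry.list()) {
      tokens.put(subClusterId, registry.read(subClusterId));
    }
    return tokens;
  }

  // Pattern being asked about: everything in a single call.
  static Map<String, byte[]> loadInOneCall(InMemoryTokenRegistry registry) {
    return registry.readAll();
  }

  public static void main(String[] args) {
    InMemoryTokenRegistry registry = new InMemoryTokenRegistry();
    registry.put("sc1", new byte[] {1});
    registry.put("sc2", new byte[] {2});
    registry.put("sc3", new byte[] {3});

    loadInLoop(registry);
    System.out.println("round trips, loop: " + registry.calls);

    registry.calls = 0;
    loadInOneCall(registry);
    System.out.println("round trips, single call: " + registry.calls);
  }
}
```

With N sub-clusters the loop costs N + 1 round trips in this toy model, versus 
one for the batch read.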

bq. This is to say whether to trust in memory data to decide whether to go into 
registry and delete. For FederationInterceptor, the in memory data is always in 
sync, so it sets ignoreMemoryState = false. However for registry cleanup (will 
do inside GPG), GPG will not have the in memory data, so will pass in true here 
to force deletion.

Let's move it out of here and add it as part of the GPG patch, as it's 
confusing in its current orphaned state.

bq. For store impl of registry, credential might be needed to access store.

I don't see the {{Credentials}} used anywhere in 
[FSRegistryOperationsService|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry/src/main/java/org/apache/hadoop/registry/client/impl/FSRegistryOperationsService.java],
 nor is it available in the parent {{RegistryOperations}} interface. So maybe 
we can add it when we actually require it? I am also concerned that retrieving 
it is expensive.

bq. The register is for home RM, after that, to return the full list of 
containers from previous attempt, we reattach to all existing UAMs and get 
running containers in secondary sub-clusters, merge all of them and return it 
to AM. In YARN-6704 later, inside FederationInterceptor::recover, I will be 
calling reattach as well.

Thanks for the clarification, but shouldn't we do it only if the AM supports 
recovery and it's not the first attempt?
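
In other words, a guard roughly along these lines. This is a minimal sketch: 
{{ReattachGuardSketch}}, {{shouldReattach}} and its parameters are hypothetical 
names, not from the patch, and it assumes attempts are numbered starting at 1.

```java
// Hypothetical helper illustrating the suggested condition (not patch code).
class ReattachGuardSketch {

  // Reattach to existing UAMs only when the AM opted in to keeping containers
  // across attempts AND this is not the first attempt (assumed to be number 1).
  static boolean shouldReattach(boolean keepContainersAcrossAttempts,
      int attemptNumber) {
    return keepContainersAcrossAttempts && attemptNumber > 1;
  }

  public static void main(String[] args) {
    System.out.println(shouldReattach(true, 1));
    System.out.println(shouldReattach(true, 2));
    System.out.println(shouldReattach(false, 2));
  }
}
```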

Nit: In the Javadoc for _setKeepContainersAcrossApplicationAttempts_, mention 
that it's for UAM recovery and link to the API doc.




> Add support for AMRMProxy HA
> ----------------------------
>
>                 Key: YARN-6128
>                 URL: https://issues.apache.org/jira/browse/YARN-6128
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: amrmproxy, nodemanager
>            Reporter: Subru Krishnan
>            Assignee: Botong Huang
>            Priority: Major
>         Attachments: YARN-6128.v0.patch, YARN-6128.v1.patch, 
> YARN-6128.v1.patch, YARN-6128.v2.patch, YARN-6128.v3.patch, 
> YARN-6128.v3.patch, YARN-6128.v4.patch, YARN-6128.v5.patch
>
>
> YARN-556 added the ability for RM failover without losing any running 
> applications. In a Federated YARN environment, there's additional state in 
> the {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we 
> need to enhance {{AMRMProxy}} to support HA.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
