[ https://issues.apache.org/jira/browse/YARN-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866427#comment-15866427 ]
Wangda Tan commented on YARN-5946:
----------------------------------

Thanks [~jhung] for writing this up, it is very clear to me.

One thing to confirm:
bq. and a table3 with the "last good" transaction id (initialized at 0)
This actually means the "last confirmed" transaction id, correct? I see that in step 5 it gets incremented even if the update failed.

And one minor suggestion on the persisted data:
bq. If success, MCM stores the mutation in table1 and increments the txn id in table3 (both of these are done together atomically)
I think Derby may support this, but I'm not sure it is common across different storage backends (for example, atomically updating 2 HDFS files, or 2 ZK nodes, etc.). So I suggest persisting a transaction id in table-1, in addition to the "last good" configuration. Then even if the write to table3 fails, we can recover the latest config from table-1.

For the API, some suggestions to hide internal implementation details:
1) Do we really want {{Collection<String> removes}} as part of logItem? I think setting a key to an empty value is equivalent to removing the key, correct? I would prefer not to add the {{removes}} field.
2) Who will generate the "id" for each logItem? I also suggest making it a long instead of an int.
3) YarnConfigurationStore#retrieve: does it mean get from table-1, or get from table-1/2/3 (as described by your "for the failover case ..." in your previous comment)? I would prefer the latter.
4) readPersistedId/getMutations look like internal implementation to me. Would it be better to replace them with {{List<LogMutation> getPendingMutations(void)}}?

In summary, I think the following APIs will be sufficient:
{code}
1) initialize(Configuration conf, Map<String, String> schedConf);
2) retrieveLatestConfirmedConf, which returns the latest *good* configuration. This will be called during recovery.
3) retrieveLatestConf, which returns the latest *not yet confirmed* configuration. This will be used by the scheduler to try to reinitialize.
4) logMutation, to save the new mutation; {{retrieveLatestConf}} gets updated accordingly.
5) confirmMutation(long id), to confirm the mutation; {{retrieveLatestConfirmedConf}} gets updated accordingly.
6) List<LogMutation> getPendingMutations(void). This will be called during recovery.
7) Optional but may be useful: List<Map<String, String>> getConfirmedConfHistory(long fromId). Admins can use this API to retrieve config history.
{code}

Please let me know your thoughts.

> Create YarnConfigurationStore interface and InMemoryConfigurationStore class
> ----------------------------------------------------------------------------
>
>                 Key: YARN-5946
>                 URL: https://issues.apache.org/jira/browse/YARN-5946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>         Attachments: YARN-5946.001.patch, YARN-5946-YARN-5734.002.patch
>
>
> This class provides the interface to persist YARN configurations in a backing
> store.
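
For illustration only, the seven-item API list in the comment above could be sketched as a small in-memory Java class. This is a rough sketch under stated assumptions, not the interface from the attached patches: the class name {{ConfStoreSketch}}, the shape of {{LogMutation}}, the {{getLastConfirmedId}} helper, the choice to let the store assign ids from {{logMutation}}, and the in-order confirmation are all hypothetical. {{initialize}} drops the Hadoop {{Configuration}} argument so the example has no dependencies, and handling of rejected mutations is left out. It does follow two suggestions from the comment: an empty value removes the key, and ids are long.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of the proposed store API; names beyond the comment's list are hypothetical. */
class ConfStoreSketch {

    /** Hypothetical mutation record: a long id plus key/value updates. */
    static final class LogMutation {
        final long id;
        final Map<String, String> updates;

        LogMutation(long id, Map<String, String> updates) {
            this.id = id;
            this.updates = new HashMap<>(updates);
        }
    }

    private final Map<String, String> confirmedConf = new HashMap<>();
    private final TreeMap<Long, LogMutation> pending = new TreeMap<>();
    private final TreeMap<Long, Map<String, String>> history = new TreeMap<>();
    private long lastConfirmedId = 0; // "last confirmed" txn id, initialized at 0
    private long nextId = 1;          // assumption: the store generates mutation ids

    /** 1) Reset the store to the given scheduler configuration. */
    public void initialize(Map<String, String> schedConf) {
        confirmedConf.clear();
        confirmedConf.putAll(schedConf);
        pending.clear();
        history.clear();
        lastConfirmedId = 0;
        nextId = 1;
    }

    /** 2) Latest confirmed ("good") configuration, as a copy. */
    public Map<String, String> retrieveLatestConfirmedConf() {
        return new HashMap<>(confirmedConf);
    }

    /** 3) Confirmed configuration with all pending mutations applied in id order. */
    public Map<String, String> retrieveLatestConf() {
        Map<String, String> conf = new HashMap<>(confirmedConf);
        for (LogMutation m : pending.values()) {
            apply(conf, m.updates);
        }
        return conf;
    }

    /** 4) Record a new, not yet confirmed mutation and return its id. */
    public long logMutation(Map<String, String> updates) {
        long id = nextId++;
        pending.put(id, new LogMutation(id, updates));
        return id;
    }

    /** 5) Confirm a pending mutation; assumes confirmations arrive in id order. */
    public void confirmMutation(long id) {
        LogMutation m = pending.remove(id);
        if (m == null) {
            throw new IllegalArgumentException("No pending mutation with id " + id);
        }
        apply(confirmedConf, m.updates);
        lastConfirmedId = id;
        history.put(id, new HashMap<>(confirmedConf)); // snapshot keyed by txn id
    }

    /** 6) Mutations logged but not yet confirmed, oldest first. */
    public List<LogMutation> getPendingMutations() {
        return new ArrayList<>(pending.values());
    }

    /** 7) Confirmed configuration snapshots with txn id >= fromId. */
    public List<Map<String, String>> getConfirmedConfHistory(long fromId) {
        return new ArrayList<>(history.tailMap(fromId, true).values());
    }

    public long getLastConfirmedId() {
        return lastConfirmedId;
    }

    /** A null or empty value removes the key, so no separate "removes" field is needed. */
    private static void apply(Map<String, String> conf, Map<String, String> updates) {
        for (Map.Entry<String, String> e : updates.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                conf.remove(e.getKey());
            } else {
                conf.put(e.getKey(), value);
            }
        }
    }
}
```

In this sketch a caller would invoke {{logMutation}} before asking the scheduler to reinitialize from {{retrieveLatestConf}}, and then {{confirmMutation}} with the returned id once the scheduler accepts the change. Keeping each confirmed snapshot keyed by its transaction id mirrors the suggestion to persist the id together with the configuration, so the latest confirmed state does not depend on a second atomic write.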