[ 
https://issues.apache.org/jira/browse/YARN-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866427#comment-15866427
 ] 

Wangda Tan commented on YARN-5946:
----------------------------------

Thanks [~jhung] for writing this up, it is very clear to me.

One thing to confirm:

bq. and a table3 with the "last good" transaction id (initialized at 0)
This actually means the "last confirmed" transaction id, correct? I see that in 
step 5 it gets incremented even if the update failed.

And one minor suggestion about the persisted data:

bq. If success, MCM stores the mutation in table1 and increments the txn id in 
table3 (both of these are done together atomically) 
I think Derby may support this, but I'm not sure it is common across different 
storage backends (for example, atomically updating 2 HDFS files, or 2 ZK nodes, 
etc.). So I suggest persisting a transaction id alongside the "last good" 
configuration in table-1. Then, even if the write to table3 fails, we can 
recover the latest config from table-1.
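
To illustrate the idea, here is a minimal sketch in which table-1 carries its own transaction id, so a single write is enough and a stale table3 counter can be repaired on restart. The names {{ConfirmedSnapshot}} and {{RecoverySketch}} are hypothetical, and taking the larger of the two ids is my assumption about how recovery could reconcile them, not something from the design doc.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical table-1 record: the config plus the txn id that produced it, written as one unit. */
final class ConfirmedSnapshot {
    final long txnId;
    final Map<String, String> conf;

    ConfirmedSnapshot(long txnId, Map<String, String> conf) {
        this.txnId = txnId;
        this.conf = new HashMap<>(conf); // defensive copy
    }
}

final class RecoverySketch {
    /**
     * Returns the txn id to trust after a restart (assumed rule: take the larger id).
     * If the table3 write was lost, table-1 is ahead and its id wins; if table3
     * was bumped for a failed update, table3 is ahead and its id wins.
     * In both cases the config to load is the one stored in table-1.
     */
    static long recoverLastConfirmedId(ConfirmedSnapshot table1, long table3Id) {
        return Math.max(table1.txnId, table3Id);
    }
}
```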

For the API, some suggestions to hide internal implementation details:

1) Do we really want {{Collection<String> removes}} as part of logItem? I 
think setting a key to an empty value is equivalent to removing the key, 
correct? I would prefer not to add the {{removes}} field.
2) Who will generate the "id" for each logItem? Also, I suggest making it a 
long instead of an int.
3) YarnConfigurationStore#retrieve: does it read from table-1 only, or from 
table-1/2/3 (as described by "for the failover case ..." in your previous 
comment)? I would prefer the latter.
4) readPersistedId/getMutations look like internal implementation details to 
me. Would it be better to replace them with 
{{List<LogMutation> getPendingMutations()}}?
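
For point 1, a rough sketch of what I mean by treating an empty value as a removal, so that no separate {{removes}} collection is needed. {{MutationApplier}} is a hypothetical helper, not part of the patch. One caveat of this convention is that a key can no longer be set to a literal empty string.

```java
import java.util.HashMap;
import java.util.Map;

final class MutationApplier {
    /**
     * Applies updates on top of base and returns a new map.
     * A null or empty value removes the key; anything else overwrites it.
     */
    static Map<String, String> apply(Map<String, String> base, Map<String, String> updates) {
        Map<String, String> result = new HashMap<>(base);
        for (Map.Entry<String, String> e : updates.entrySet()) {
            String value = e.getValue();
            if (value == null || value.isEmpty()) {
                result.remove(e.getKey());
            } else {
                result.put(e.getKey(), value);
            }
        }
        return result;
    }
}
```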

In summary, I think the following APIs will be sufficient:
{code}
1) initialize(Configuration conf, Map<String, String> schedConf);
2) retrieveLatestConfirmedConf, which returns the latest *good* configuration. 
This will be called during recovery.
3) retrieveLatestConf, which returns the latest *not yet confirmed* 
configuration. This will be used by the scheduler to try to reinitialize.
4) logMutation, to save a new mutation; {{retrieveLatestConf}} is updated 
accordingly.
5) confirmMutation(long id), to confirm the mutation; 
{{retrieveLatestConfirmedConf}} is updated accordingly.
6) List<LogMutation> getPendingMutations(), which will be called during 
recovery.
7) Optional but may be useful: List<Map<String, String>> 
getConfirmedConfHistory(long fromId). Admins can use this API to retrieve 
config history.
{code}
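
To make the proposal concrete, here is a rough in-memory sketch of APIs 1 to 6. It is an illustration only, not the patch: {{ConfStoreSketch}} and {{InMemoryConfStoreSketch}} are hypothetical names, the Hadoop {{Configuration}} argument to initialize is dropped to keep the example self-contained, and {{logMutation}} returning a store-generated id is one possible answer to my question 2. I also assume mutations are confirmed in the order they were logged, and handling of a failed mutation is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical mutation record: an id plus the key/value updates it carries. */
final class LogMutation {
    final long id;
    final Map<String, String> updates;

    LogMutation(long id, Map<String, String> updates) {
        this.id = id;
        this.updates = new HashMap<>(updates);
    }
}

/** Sketch of the proposed store API; details beyond the list above are assumptions. */
interface ConfStoreSketch {
    void initialize(Map<String, String> schedConf);
    Map<String, String> retrieveLatestConfirmedConf();
    Map<String, String> retrieveLatestConf();
    long logMutation(Map<String, String> updates);
    void confirmMutation(long id);
    List<LogMutation> getPendingMutations();
}

final class InMemoryConfStoreSketch implements ConfStoreSketch {
    private final Map<String, String> confirmed = new HashMap<>();
    // Insertion-ordered, so pending mutations replay in the order they were logged.
    private final Map<Long, LogMutation> pending = new LinkedHashMap<>();
    private long nextId = 1;

    @Override
    public void initialize(Map<String, String> schedConf) {
        confirmed.clear();
        pending.clear();
        confirmed.putAll(schedConf);
    }

    @Override
    public Map<String, String> retrieveLatestConfirmedConf() {
        return new HashMap<>(confirmed);
    }

    @Override
    public Map<String, String> retrieveLatestConf() {
        // Confirmed config with every pending mutation applied on top.
        Map<String, String> latest = new HashMap<>(confirmed);
        for (LogMutation m : pending.values()) {
            latest.putAll(m.updates);
        }
        return latest;
    }

    @Override
    public long logMutation(Map<String, String> updates) {
        long id = nextId++; // the store generates ids in this sketch
        pending.put(id, new LogMutation(id, updates));
        return id;
    }

    @Override
    public void confirmMutation(long id) {
        if (pending.isEmpty()) {
            throw new IllegalArgumentException("No pending mutation with id " + id);
        }
        long oldest = pending.keySet().iterator().next();
        if (oldest != id) {
            throw new IllegalArgumentException("Mutations must be confirmed in order; oldest is " + oldest);
        }
        confirmed.putAll(pending.remove(id).updates);
    }

    @Override
    public List<LogMutation> getPendingMutations() {
        return new ArrayList<>(pending.values());
    }
}
```

With this shape, the scheduler would call {{retrieveLatestConf}} after {{logMutation}} to try reinitializing, and call {{confirmMutation}} only once the reinitialization succeeds. On recovery, {{retrieveLatestConfirmedConf}} plus {{getPendingMutations}} give everything needed to replay.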

Please let me know your thoughts.

> Create YarnConfigurationStore interface and InMemoryConfigurationStore class
> ----------------------------------------------------------------------------
>
>                 Key: YARN-5946
>                 URL: https://issues.apache.org/jira/browse/YARN-5946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>         Attachments: YARN-5946.001.patch, YARN-5946-YARN-5734.002.patch
>
>
> This class provides the interface to persist YARN configurations in a backing 
> store.



