[ 
https://issues.apache.org/jira/browse/YARN-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852478#comment-15852478
 ] 

Jonathan Hung commented on YARN-5946:
-------------------------------------

Discussed offline with [~xgong] and [~leftnoteasy], seems there are some 
concerns about how this will be handled in the RM HA case. For the derby 
implementation, we can have a table1 containing the "last good" configuration, 
a table2 containing the mutations (one per row, basically an audit log), and a 
table3 with the "last good" transaction id (initialized at 0). Each mutation 
row in the audit log table will have a distinct and incrementing id. For a 
mutation, the workflow will be:

# Client issues configuration mutation
# Mutation passed to capacity scheduler, which then passes it to MCM
# MCM logs the change in table2 (atomically, using derby's commit) via the 
YarnConfigurationStore interface. This change will have txn id one greater than 
whatever is in table3.
# MCM calls cs.reinitialize with the in-memory CS configuration with the 
mutation applied
# If success, MCM stores the mutation in table1 and increments the txn id in 
table3 (both of these are done together atomically). If failure, MCM increments 
the txn id in table3 but doesn't change table1.

Steps 3-5 should be synchronized to prevent mutations to be logged in a 
different order than they are applied.

For the failover case, first we can NFS mount the derby DB on both RMs. Now if 
there is failover anytime after step 3, the new RM will read table3, see if 
there are any logs in table2 with txn id greater than this value (via 
YarnConfigurationStore interface), and apply these mutations, calling 
cs.reinitialize on the current configuration with this mutation applied, and 
storing if valid.

Note that since steps 3-5 are synchronized, there should be at most one log 
line from table2 to replay onto table1 when recovering.

Let me know if there are any concerns with this.

> Create YarnConfigurationStore interface and InMemoryConfigurationStore class
> ----------------------------------------------------------------------------
>
>                 Key: YARN-5946
>                 URL: https://issues.apache.org/jira/browse/YARN-5946
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>         Attachments: YARN-5946.001.patch, YARN-5946-YARN-5734.002.patch
>
>
> This class provides the interface to persist YARN configurations in a backing 
> store.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to