[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321386#comment-15321386 ]

Varun Saxena commented on YARN-2962:
------------------------------------

I have rebased the patch and addressed the comments from Daniel. I haven't 
tested it in my setup after the rebase, but the tests pass, so it should be 
good for review.

As per the current patch, there can be a race if two RMs become active at the 
same time. Do we need to handle it? I have never come across this scenario.
Basically, when we store an app, we first create the parent app node as per 
the split (if it does not exist) and then create the child znode which stores 
the app data. These two operations are not carried out within the same 
fencing, though.
And when we remove an app, we first delete the app znode containing the app 
data, then get the children of the parent app node to check whether any child 
znodes remain, and if there are none, delete the parent app node. These three 
operations are not done within a single fencing either, which can lead to a 
race between creating and deleting the parent app node. Both flows are 
sketched below.
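
To make the race concrete, here is a minimal, illustrative sketch of the two 
flows against the raw ZooKeeper API. This is not the actual ZKRMStateStore 
code; the path layout, split index, and class/method names are assumptions 
for illustration only.

{code:java}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only. Assumed layout: the app id is split so that the last
// SPLIT_INDEX characters become the leaf znode and the rest the parent node.
public class SplitZnodeSketch {
  private static final String APPS_ROOT = "/rmstore/ZKRMStateRoot/RMAppRoot"; // assumed path
  private static final int SPLIT_INDEX = 2; // assumed split point

  private final ZooKeeper zk;

  public SplitZnodeSketch(ZooKeeper zk) {
    this.zk = zk;
  }

  // e.g. application_1449773385350_0023 -> parent ".../application_1449773385350_00"
  private String parentPath(String appId) {
    return APPS_ROOT + "/" + appId.substring(0, appId.length() - SPLIT_INDEX);
  }

  // ... and leaf "23" under that parent
  private String leafPath(String appId) {
    return parentPath(appId) + "/" + appId.substring(appId.length() - SPLIT_INDEX);
  }

  public void storeApp(String appId, byte[] appData)
      throws KeeperException, InterruptedException {
    // Op 1: create the parent app node as per the split, if it does not exist.
    if (zk.exists(parentPath(appId), false) == null) {
      zk.create(parentPath(appId), new byte[0],
          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
    // Race window: the other RM's removeApp() may delete the parent here.
    // Op 2: create the child znode which stores the app data.
    zk.create(leafPath(appId), appData,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  }

  public void removeApp(String appId)
      throws KeeperException, InterruptedException {
    // Op 1: delete the child znode containing the app data.
    zk.delete(leafPath(appId), -1);
    // Op 2: get children of the parent app node.
    List<String> children = zk.getChildren(parentPath(appId), false);
    // Race window: the other RM's storeApp() may create a child here.
    // Op 3: if no child znodes remain, delete the parent app node.
    if (children.isEmpty()) {
      zk.delete(parentPath(appId), -1);
    }
  }
}
{code}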

We can, however, get rid of this potential race by doing these operations 
within a single fencing: create fencing lock nodes explicitly first and then 
carry out the operations one by one (this cannot be done in a single 
transaction because we have to check how many children exist for the parent 
app node). A sketch of this is below. Thoughts?
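
A minimal sketch of that idea, guarding the multi-step remove with an 
explicit fencing lock znode. The lock path, the ephemeral-node scheme, and 
the retry loop here are assumptions for illustration, not necessarily what 
the patch would do.

{code:java}
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only: take an explicit fencing lock znode first, then carry
// out the operations one by one, so create/delete of the parent app node
// from two RMs cannot interleave.
public class FencedRemoveSketch {
  private static final String FENCING_LOCK =
      "/rmstore/ZKRMStateRoot/RM_APP_FENCING_LOCK"; // assumed lock path

  private final ZooKeeper zk;

  public FencedRemoveSketch(ZooKeeper zk) {
    this.zk = zk;
  }

  public void removeAppFenced(String parentPath, String leafPath)
      throws KeeperException, InterruptedException {
    acquireLock();
    try {
      // All three operations now run under one fencing lock, so no other RM
      // can create a child under parentPath between the check and the delete.
      zk.delete(leafPath, -1);
      List<String> children = zk.getChildren(parentPath, false);
      if (children.isEmpty()) {
        zk.delete(parentPath, -1);
      }
    } finally {
      releaseLock();
    }
  }

  private void acquireLock() throws KeeperException, InterruptedException {
    while (true) {
      try {
        // Ephemeral, so the lock is released automatically if this RM dies.
        zk.create(FENCING_LOCK, new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        return;
      } catch (KeeperException.NodeExistsException lockHeld) {
        Thread.sleep(100); // crude polling; a real impl would use a watch
      }
    }
  }

  private void releaseLock() throws KeeperException, InterruptedException {
    zk.delete(FENCING_LOCK, -1);
  }
}
{code}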

> ZKRMStateStore: Limit the number of znodes under a znode
> --------------------------------------------------------
>
>                 Key: YARN-2962
>                 URL: https://issues.apache.org/jira/browse/YARN-2962
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Karthik Kambatla
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: YARN-2962.01.patch, YARN-2962.04.patch, 
> YARN-2962.05.patch, YARN-2962.2.patch, YARN-2962.3.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though, 
> individually, they were all small.


