[ https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388443#comment-14388443 ]

Varun Saxena commented on YARN-2962:
------------------------------------

bq. Also was wondering, should we hard code the NO_INDEX_SPLITTING logic to 4 ? 
Essentially, is it always guaranteed that sequence number will always be 
exactly 4 digits ?
Right, it is not always guaranteed to be 4. It is a minimum of 4 digits and can 
go up to the limit of an integer, which would be 10 digits.
But we have another configuration for the maximum number of apps in the state 
store, which is 10000 by default. So, effectively, there won't be more than 
this number of apps in the state store. That is why I considered the split 
index to be at most 4. It is also the simplest configuration. Considering up to 
10 digits and allowing a split index > 4 would cause issues, and I could not 
think of a simpler config.
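
To illustrate the digit-length point, here is a rough sketch (not the actual 
{{ApplicationId}} code; the {{%04d}} padding below merely mimics the minimum 
4-digit formatting):

{code:java}
// Illustration only: mimic a sequence number that is zero-padded to at least
// 4 digits and grows beyond that once it passes 9999.
public class SeqNumberWidth {
  public static void main(String[] args) {
    for (int seq : new int[] {1, 9999, 10000, Integer.MAX_VALUE}) {
      String s = String.format("%04d", seq);
      System.out.println(seq + " -> " + s + " (" + s.length() + " digits)");
    }
    // Prints widths 4, 4, 5 and 10. A split index counted from the beginning
    // of the sequence number only lines up with the same digits while sequence
    // numbers stay below 10000, even though the state store retains at most
    // 10000 apps at a time.
  }
}
{code}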

But thanks for pointing this out; it actually brought my attention to an 
important issue. I think the split index should carry out the split from the 
end of the sequence number.
Let us say we have apps up to {{application\_\{cluster_timestamp\}\_9999}}. The 
next app would be {{application\_\{cluster_timestamp\}\_10000}}. As the maximum 
number of apps in the state store is 10000, 
{{application\_\{cluster_timestamp\}\_0000}} would be deleted from the state 
store.
Consider a split index of 2. If I count the split index from the beginning, 
applications from {{application\_\{cluster_timestamp\}\_10000}} to 
{{application\_\{cluster_timestamp\}\_10999}} would go under znode 
{{application\_\{cluster_timestamp\}\_10}}, where apps from 
{{application\_\{cluster_timestamp\}\_1000}} to 
{{application\_\{cluster_timestamp\}\_1099}} already exist. This can make the 
original problem (of exceeding the jute maxbuffer size) recur. Please note that 
apps {{application\_\{cluster_timestamp\}\_1000}} to 
{{application\_\{cluster_timestamp\}\_1099}} won't be deleted anytime soon, and 
this would leave that znode with a lot of children.

If we split it from the end though, 
{{application\_\{cluster_timestamp\}\_10000}} to 
{{application\_\{cluster_timestamp\}\_10099}} would go under znode 
{{application\_\{cluster_timestamp\}\_100}} instead. I think this approach is 
correct, rather than the currently implemented one.
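
A rough sketch of what splitting from the end would look like (hypothetical 
helper names, not the actual patch; the cluster timestamp below is just a dummy 
value):

{code:java}
import java.util.Arrays;

// Rough sketch: bucket an app znode name by a split index counted from the
// END of the name, i.e. from the end of the sequence number.
public class SplitFromEndSketch {
  static String[] splitFromEnd(String appIdZnodeName, int splitIndex) {
    int cut = appIdZnodeName.length() - splitIndex;
    return new String[] {
        appIdZnodeName.substring(0, cut),  // parent znode name
        appIdZnodeName.substring(cut)      // child znode name under the parent
    };
  }

  public static void main(String[] args) {
    // 1423705938000 is a dummy cluster timestamp used only for this example.
    System.out.println(Arrays.toString(
        splitFromEnd("application_1423705938000_1000", 2)));
    // -> [application_1423705938000_10, 00]
    System.out.println(Arrays.toString(
        splitFromEnd("application_1423705938000_10000", 2)));
    // -> [application_1423705938000_100, 00]
    // The _1000.._1099 and _10000.._10099 ranges now land under different
    // parent znodes, so the older bucket does not keep accumulating children.
  }
}
{code}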



> ZKRMStateStore: Limit the number of znodes under a znode
> --------------------------------------------------------
>
>                 Key: YARN-2962
>                 URL: https://issues.apache.org/jira/browse/YARN-2962
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Karthik Kambatla
>            Assignee: Varun Saxena
>            Priority: Critical
>         Attachments: YARN-2962.01.patch
>
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes, even though 
> individually they were all small.



