[ https://issues.apache.org/jira/browse/YARN-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191351#comment-16191351 ]

Daniel Templeton commented on YARN-7262:
----------------------------------------

My comments after a closer look:
# The new property and default should have javadocs
# Please don't you start with the {{null != x}} stuff, too...  \*sigh\*
# Please add assert messages, and let's not mix {{assertX()}} with 
{{Assert.assertX()}} calls.
# I feel like you should test with more than just a split index of 0 or 1.
# You don't need to store {{token3}} in 
{{testDelegationTokenNodeWithSplitChangeAcrossRestarts()}}.
# In {{initInternal()}}, shouldn't you consider 0 a valid split index?
# "Unknown child node with name: " could be a bit more descriptive.  Child of 
what?  What caused it?  What should the admin do about it?  Same goes for the 
messages in {{checkRemoveParentZnode()}}.
# In {{loadDelegationTokenFromNode()}}, can I get an _else_ instead of an early 
return?
# I don't like reassigning the {{splitIdx}} parameter in {{getLeafZnodePath()}}.
# May as well split the long line on the equals in {{RMStateStore}}.
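To make the split-index points above concrete (treating 0 as a valid index, testing indexes beyond 0 and 1, and not reassigning {{splitIdx}}), here is a minimal Java sketch. It is hypothetical and not code from YARN-7262.001.patch: the class name, the {{RMDelegationToken_}} prefix, the 0 to 4 range (borrowed from the YARN-2962 app hierarchy), the zero-padding, and the per-split-index subdirectory are all assumptions made for illustration. The helper reuses the name {{getLeafZnodePath()}} but its signature is invented.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not code from YARN-7262.001.patch: builds the path of a
 * delegation token znode under an optional split-index hierarchy.
 */
final class TokenZnodePaths {
    /** Assumed prefix for token znode names (illustrative only). */
    static final String TOKEN_PREFIX = "RMDelegationToken_";
    /** Assumed upper bound, mirroring the 0..4 range used for apps in YARN-2962. */
    static final int MAX_SPLIT_INDEX = 4;

    private TokenZnodePaths() {
    }

    /** 0 is valid and means "no hierarchy"; negative or too-large values are not. */
    static boolean isValidSplitIndex(int splitIdx) {
        return splitIdx >= 0 && splitIdx <= MAX_SPLIT_INDEX;
    }

    /** Returns the leaf path; the splitIdx parameter is never reassigned. */
    static String getLeafZnodePath(String root, int sequenceNumber, int splitIdx) {
        if (!isValidSplitIndex(splitIdx)) {
            throw new IllegalArgumentException("Invalid split index " + splitIdx
                + " for delegation token znodes; expected a value from 0 to "
                + MAX_SPLIT_INDEX);
        }
        if (sequenceNumber < 0) {
            throw new IllegalArgumentException(
                "Delegation token sequence number must be non-negative: " + sequenceNumber);
        }
        if (splitIdx == 0) {
            // Flat layout: the token znode is a direct child of the root.
            return root + "/" + TOKEN_PREFIX + sequenceNumber;
        }
        // Zero-pad so there are always at least splitIdx digits to move into the leaf.
        String digits = String.format(Locale.ROOT, "%0" + MAX_SPLIT_INDEX + "d", sequenceNumber);
        int cut = digits.length() - splitIdx;
        String parentName = TOKEN_PREFIX + digits.substring(0, cut);
        String leafName = digits.substring(cut);
        // Assumed layout: one subtree per split index, so changing the index
        // across restarts never mixes old and new layouts under one parent.
        return root + "/" + splitIdx + "/" + parentName + "/" + leafName;
    }
}
```

For example, with a hypothetical root of {{/tokens}}, sequence number 12345 and a split index of 3 map to {{/tokens/3/RMDelegationToken_12/345}}, while a split index of 0 keeps the flat {{/tokens/RMDelegationToken_12345}} layout. Keeping the effective value in locals such as {{cut}} is one way to avoid reassigning the parameter.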

I still want to take a closer look at the ZK code, but I need more sleep first.

> Add a hierarchy into the ZKRMStateStore for delegation token znodes to 
> prevent jute buffer overflow
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-7262
>                 URL: https://issues.apache.org/jira/browse/YARN-7262
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>         Attachments: YARN-7262.001.patch
>
>
> We've seen users who are running into a problem where the RM is storing so 
> many delegation tokens in the {{ZKRMStateStore}} that the _listing_ of those 
> znodes is larger than the jute buffer. This is fine during normal operation, 
> but becomes a problem on a failover because the RM will try to read in all of 
> the token znodes (i.e. call {{getChildren}} on the parent znode).  This is 
> particularly bad because everything appears to be okay, but then if a 
> failover occurs you end up with no active RMs.
> There was a similar problem with the YARN application data that was fixed in 
> YARN-2962 by adding a (configurable) hierarchy of znodes so the RM could pull 
> subchildren without overflowing the jute buffer (though it's off by default).
> We should add a hierarchy similar to that of YARN-2962, but for the 
> delegation token znodes.
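To show why a hierarchy helps, here is a rough Java sketch (hypothetical, not code from YARN-2962 or the attached patch) that computes the largest single {{getChildren}} listing for tokens numbered 0 to N-1. It assumes, in the spirit of YARN-2962, that each leaf znode keeps the last {{splitIdx}} digits of the sequence number and that tokens sharing the remaining digits share a parent. The jute buffer limit applies to the response size in bytes, so the child count here is only a proxy.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration: the largest child listing any single znode would
 * return for tokens numbered 0..tokenCount-1, flat versus split.
 */
final class ChildCountSketch {
    private ChildCountSketch() {
    }

    static int maxChildren(int tokenCount, int splitIdx) {
        if (splitIdx == 0) {
            // Flat layout: one getChildren call on the parent lists every token.
            return tokenCount;
        }
        int bucketSize = 1;
        for (int i = 0; i < splitIdx; i++) {
            bucketSize *= 10; // the leaf keeps the last splitIdx digits
        }
        // Tokens sharing all but the last splitIdx digits share a parent znode.
        Map<Integer, Integer> childrenPerParent = new HashMap<>();
        for (int seq = 0; seq < tokenCount; seq++) {
            childrenPerParent.merge(seq / bucketSize, 1, Integer::sum);
        }
        // The split-index directory lists the parents; each parent lists its leaves.
        int largest = childrenPerParent.size();
        for (int count : childrenPerParent.values()) {
            largest = Math.max(largest, count);
        }
        return largest;
    }
}
```

With a million tokens, the flat layout returns all 1,000,000 children in one call, while a split index of 3 caps every listing at 1,000. A split index that is too small just moves the problem up a level: with index 2, the split-index directory itself holds 10,000 parents.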



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
