[ 
https://issues.apache.org/jira/browse/YARN-8865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657549#comment-16657549
 ] 

Jason Lowe commented on YARN-8865:
----------------------------------

Thanks for updating the patch!

The main change looks fine to me.  It would be nice if the unit tests were a 
bit more reliable and tested more directly for the original problem.  For 
example, TestJHSDelegationTokenSecretManager has a 100ms sleep in it which is 
very likely to cause test failures on slow systems (e.g., VMs on overloaded 
hosts).  It would be better if the test recovered the tokens, verified the 
token was recovered before starting the threads that will eventually reap it, 
and then used GenericTestUtils#waitFor with a small poll interval (e.g., 
10ms) to wait for the token to be removed.  It would also be good to verify 
the token is removed from the state store, since I could see a bug where the 
token manager doesn't bother recovering the bad token but also doesn't bother 
to remove it from the state store.
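
A rough sketch of the test flow described above, for illustration only.  It 
does not use the real Hadoop classes: FakeStateStore, FakeSecretManager and 
the local waitFor helper are hypothetical stand-ins (the helper is modelled 
loosely on GenericTestUtils#waitFor), so treat it as a shape for the test 
rather than the actual patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class ExpiredTokenReapSketch {

  // Hypothetical stand-in for GenericTestUtils#waitFor: poll the condition
  // until it holds, or fail once the timeout has elapsed.
  static void waitFor(BooleanSupplier check, long pollMillis, long timeoutMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
      }
      Thread.sleep(pollMillis);
    }
  }

  // Hypothetical in-memory state store: token sequence number -> expiry time.
  static final class FakeStateStore {
    final Map<Integer, Long> tokens = new ConcurrentHashMap<>();
  }

  // Hypothetical secret manager that recovers tokens and reaps expired ones.
  static final class FakeSecretManager {
    final Map<Integer, Long> current = new ConcurrentHashMap<>();
    final FakeStateStore store;
    ScheduledExecutorService reaper;

    FakeSecretManager(FakeStateStore store) {
      this.store = store;
    }

    void recover() {
      current.putAll(store.tokens);
    }

    void startThreads() {
      reaper = Executors.newSingleThreadScheduledExecutor();
      reaper.scheduleWithFixedDelay(this::removeExpired, 0, 5, TimeUnit.MILLISECONDS);
    }

    void removeExpired() {
      long now = System.currentTimeMillis();
      for (Map.Entry<Integer, Long> e : current.entrySet()) {
        if (e.getValue() < now) {
          store.tokens.remove(e.getKey());
          current.remove(e.getKey());
        }
      }
    }

    void stopThreads() {
      if (reaper != null) {
        reaper.shutdownNow();
      }
    }
  }

  // Returns true if the expired token was recovered and then reaped from
  // both the manager and the state store.
  static boolean runScenario() {
    FakeStateStore store = new FakeStateStore();
    store.tokens.put(1, System.currentTimeMillis() - 1_000L); // already expired
    FakeSecretManager mgr = new FakeSecretManager(store);

    // 1. Recover, and check the token is present before any reaper runs.
    mgr.recover();
    if (!mgr.current.containsKey(1)) {
      return false;
    }

    // 2. Start the reaper, then poll instead of sleeping a fixed 100ms.
    mgr.startThreads();
    try {
      waitFor(() -> !mgr.current.containsKey(1), 10, 10_000);
      // 3. The token must also disappear from the state store.
      waitFor(() -> !store.tokens.containsKey(1), 10, 10_000);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      mgr.stopThreads();
    }
  }

  public static void main(String[] args) {
    System.out.println(runScenario() ? "expired token reaped" : "expired token NOT reaped");
  }
}
```

Polling with a generous timeout keeps the test fast on a healthy machine 
while tolerating slow ones, which a fixed sleep cannot do.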


> RMStateStore contains large number of expired RMDelegationToken
> ---------------------------------------------------------------
>
>                 Key: YARN-8865
>                 URL: https://issues.apache.org/jira/browse/YARN-8865
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>         Attachments: YARN-8865.001.patch, YARN-8865.002.patch, 
> YARN-8865.003.patch, YARN-8865.004.patch
>
>
> When the RM state store is restored, expired delegation tokens are restored 
> and added to the system. These expired tokens do not get cleaned up or 
> removed. The exact reason why the tokens are still in the store is not clear. 
> We have seen as many as 250,000 tokens in the store, some of which were 2 
> years old.
> This has two side effects:
> * for the ZooKeeper store, this leads to a jute buffer exhaustion issue and 
> prevents the RM from becoming active.
> * restore takes longer than needed and heap usage is higher than it should be
> We should not restore already expired tokens since they cannot be renewed or 
> used.



