[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761573#comment-13761573 ]
Hitesh Shah commented on YARN-353: ---------------------------------- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml - shouldn't there be a dependency on zookeeper even in the normal scope? {code} + if (childNodeName.startsWith(DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX)) { + rmState.rmSecretManagerState.dtSequenceNumber = + Integer.parseInt(childNodeName.split("_")[1]); + continue; + } {code} - Could you clarify whether there can be multiple child nodes prefixed with DELEGATION_TOKEN_SEQUENCE_NUMBER_PREFIX in any possible state variation? {code} + // assert child node name is same as actual applicationId + assert appId.equals(appState.context.getApplicationId()); {code} - why the need for an assert? Should this check throw a runtime exception instead? (likewise for other assert checks ) {code} + } catch (Exception e) { + // currently throw all exceptions. May need to respond differently for HA + // based on whether we have lost the right to write to ZK + // TODO: Revisit this post YARN-149 + throw e; + } {code} - I believe its better to just remove such code and add it in with HA patches. {code} + /** + * Call exists() to leave a watch on the node denoted by path. + * Delete node if exists. To pass the existence information to the + * caller, call delete irrespective of whether node exists or not. + */ + if (zkClient.exists(path, true) == null) { + LOG.error("Trying to delete a path (" + path + + ") that doesn't exist."); + } else { + zkClient.delete(path, version); + } {code} - code does not match the comment ( with respect to passing of existence information ) {code} + } catch (Exception e) { + e.printStackTrace(); + Assert.fail("ZKRMStateStore Session restore failed"); + } {code} - Don't think there is any need to catch the exception. The unit test will fail if the exception is not caught. If the exception stack trace in the unit test logs is not useful enough to understand the failure reason, it may be better to fix the code as needed. ( likewise in the other couple of places in the unit test where exceptions are being caught and handled with an assert.fail() {code} + Thread.sleep(800); {code} - the zk unit test has magic sleeps of 800 ms in some cases, 500 ms in others. What is the reason for these different numbers? Does the test helper need augmenting to remove this timing related dependency? General minor nits: - 80 chars line limit exceeded in multiple files. > Add Zookeeper-based store implementation for RMStateStore > --------------------------------------------------------- > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Reporter: Hitesh Shah > Assignee: Karthik Kambatla > Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.12.patch, > yarn-353-12-wip.patch, YARN-353.13.patch, YARN-353.14.patch, > YARN-353.15.patch, YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch, > YARN-353.4.patch, YARN-353.5.patch, YARN-353.6.patch, YARN-353.7.patch, > YARN-353.8.patch, YARN-353.9.patch > > > Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira