[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732868#comment-13732868 ]
Hitesh Shah commented on YARN-353: ---------------------------------- Comments: Identation still off in yarn-default.xml Not sure if a default value of "<!--value>127.0.0.1:2181</value-->" should be mentioned in yarn-default.xml. hadoop-yarn-server-resourcemanager/pom.xml is still not fixed - contains the version info. LOG.debug() should be wrapped within "if LOG.isDebugEnabled() {}" In "abstract class ZKAction<T>", member variables should be final. findbugs issue - zkSessionTimeout does not seem to be an issue but zkClient is something which is read/modified multiple times in various functions. Which functions are the ones that are/cannot be synchronized that access zkClient? {code} + if (zkHostPort == null) { + throw new YarnRuntimeException( + "No server address specified for zookeeper state store for Resource" + + " Manager recovery. ZK_RM_STATE_STORE_ADDRESS is not configured."); {code} - validation only for non-null and not a valid format? - ZK_RM_STATE_STORE_ADDRESS as a string? A question on general usage of zk: why is everything being stored at the top level of the tree and not a heirarchical structure i.e. attempts of a particular application stored under that application's dir? {code} + } catch (Exception e) { + throw e; + } {code} - why the catch and re-throw? ( in multiple places ) "os.close();" - close calls should be in a finally block ( in multiple places ). "LOG.info("Error in storing " + dtSequenceNumberPath);" - change to LOG.info("Error in storing " + dtSequenceNumberPath, e);" {code} + String getNodePath(String root, String nodeName) { + return (root + "/" + nodeName); + } {code} - does ZK have a variable to define the node path separator which we can use instead of a magic string? "LOG.info("Created new connection for " + this);" --> logging 'this'? is there a toString() function? How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an exception. Does the RM fail/shutdown? Is the connection retried at a later point? When creating a Zookeeper object, ZK apis support a base root path and all operations are done relative to the base root path? Any reason why we are not using that approach by initializing zk with zkRootNodePath ? {code} + try { + createWithRetries(znodeWorkingPath, null, zkAcl, CreateMode.PERSISTENT); + createWithRetries(zkRootNodePath, null, zkAcl, CreateMode.PERSISTENT); + createWithRetries(rmDTSecretManagerRoot, null, zkAcl, + CreateMode.PERSISTENT); + createWithRetries(rmAppRoot, null, zkAcl, CreateMode.PERSISTENT); + } catch (KeeperException ke) { + if (ke.code() != Code.NODEEXISTS) { + throw ke; + } + } {code} - what if the first line throws an exception saying node exists but the other nodes are not created? Shouldn't each call be in its own try catch block? Or should the create function be changed to accept a parameter which when set causes the function to ignore node exists errors? For deleteWithRetries, the return code of exists() could be checked if a delete is required or not. Still need to look at the unit tests. > Add Zookeeper-based store implementation for RMStateStore > --------------------------------------------------------- > > Key: YARN-353 > URL: https://issues.apache.org/jira/browse/YARN-353 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager > Reporter: Hitesh Shah > Assignee: Bikas Saha > Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, > YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, > YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch > > > Add store that write RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira