[ 
https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732868#comment-13732868
 ] 

Hitesh Shah commented on YARN-353:
----------------------------------

Comments:

Identation still off in yarn-default.xml

Not sure if a default value of "<!--value>127.0.0.1:2181</value-->" should be 
mentioned in yarn-default.xml.

hadoop-yarn-server-resourcemanager/pom.xml is still not fixed - contains the 
version info.

LOG.debug() should be wrapped within "if LOG.isDebugEnabled() {}"

In "abstract class ZKAction<T>", member variables should be final.

findbugs issue - zkSessionTimeout does not seem to be an issue but zkClient is 
something which is read/modified multiple times in various functions. Which 
functions are the ones that are/cannot be synchronized that access zkClient?

{code}
+    if (zkHostPort == null) {
+      throw new YarnRuntimeException(
+        "No server address specified for zookeeper state store for Resource" +
+        " Manager recovery. ZK_RM_STATE_STORE_ADDRESS is not configured.");
{code}
  - validation only for non-null and not a valid format?
  - ZK_RM_STATE_STORE_ADDRESS as a string?

A question on general usage of zk: why is everything being stored at the top 
level of the tree and not a heirarchical structure i.e. attempts of a 
particular application stored under that application's dir?

{code}
+    } catch (Exception e) {
+      throw e;
+    }
{code}
  - why the catch and re-throw? ( in multiple places )

"os.close();" - close calls should be in a finally block ( in multiple places ).

"LOG.info("Error in storing " + dtSequenceNumberPath);" - change to 
LOG.info("Error in storing " + dtSequenceNumberPath, e);"

{code}
+  String getNodePath(String root, String nodeName) {
+    return (root + "/" + nodeName);
+  }
{code}
  - does ZK have a variable to define the node path separator which we can use 
instead of a magic string?

"LOG.info("Created new connection for " + this);" --> logging 'this'? is there 
a toString() function?

How is a connection failure to zk handled? i.e. getNewZooKeeper() throws an 
exception. Does the RM fail/shutdown? Is the connection retried at a later 
point? 

When creating a Zookeeper object, ZK apis support a base root path and all 
operations are done relative to the base root path? Any reason why we are not 
using that approach by initializing zk with zkRootNodePath ? 

{code}
+    try {
+      createWithRetries(znodeWorkingPath, null, zkAcl, CreateMode.PERSISTENT);
+      createWithRetries(zkRootNodePath, null, zkAcl, CreateMode.PERSISTENT);
+      createWithRetries(rmDTSecretManagerRoot, null, zkAcl,
+        CreateMode.PERSISTENT);
+      createWithRetries(rmAppRoot, null, zkAcl, CreateMode.PERSISTENT);
+    } catch (KeeperException ke) {
+      if (ke.code() != Code.NODEEXISTS) {
+        throw ke;
+      }
+    }
{code}
  - what if the first line throws an exception saying node exists but the other 
nodes are not created? Shouldn't each call be in its own try catch block? Or 
should the create function be changed to accept a parameter which when set 
causes the function to ignore node exists errors?


For deleteWithRetries, the return code of exists() could be checked if a delete 
is required or not. 

Still need to look at the unit tests. 











                
> Add Zookeeper-based store implementation for RMStateStore
> ---------------------------------------------------------
>
>                 Key: YARN-353
>                 URL: https://issues.apache.org/jira/browse/YARN-353
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Hitesh Shah
>            Assignee: Bikas Saha
>         Attachments: YARN-353.10.patch, YARN-353.11.patch, YARN-353.1.patch, 
> YARN-353.2.patch, YARN-353.3.patch, YARN-353.4.patch, YARN-353.5.patch, 
> YARN-353.6.patch, YARN-353.7.patch, YARN-353.8.patch, YARN-353.9.patch
>
>
> Add store that write RM state data to ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to