[jira] [Commented] (YARN-2946) DeadLock's in RMStateStore-ZKRMStateStore

2014-12-12 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244142#comment-14244142
 ] 

Varun Saxena commented on YARN-2946:


[~rohithsharma], good catch. As you said, {{synchronized}} can be removed from 
{{RMStateStore#isFencedState()}}.

Regarding the methods you listed, I think we can avoid the reverse lock 
order, i.e. *ZKRMStateStore.class -> StateMachine.doTransition()*, by making 
the following changes.
# Remove the {{synchronized}} keyword from each of the methods listed above. 
# Separate the state machine synchronization (triggered by the calls to 
{{isFencedState}} and {{notifyStoreOperationFailed}}) from the ZKRMStateStore 
synchronization by putting only the relevant code in a synchronized block.

For example, the code for {{RMStateStore#storeRMDTMasterKey()}} can be changed 
as follows:
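{code:title=RMStateStore.java|borderStyle=solid}
  public void storeRMDTMasterKey(DelegationKey delegationKey) {
    // Checked outside the store lock: this call goes through the state machine.
    if (isFencedState()) {
      LOG.info("State store is in Fenced state. Can't store RM Delegation " +
          "Token Master key.");
      return;
    }
    try {
      // Only the store operation itself holds the RMStateStore lock.
      synchronized (this) {
        storeRMDTMasterKeyState(delegationKey);
      }
    } catch (Exception e) {
      // Reported outside the store lock, again via the state machine.
      notifyStoreOperationFailed(e);
    }
  }
{code}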

 DeadLock's in RMStateStore-ZKRMStateStore
 ---

 Key: YARN-2946
 URL: https://issues.apache.org/jira/browse/YARN-2946
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.0
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: 0001-YARN-2946.patch, 0002-YARN-2946.patch, 
 TestYARN2946.java


 Found one deadlock in ZKRMStateStore.
 # Initially, zkClient is null because of a ZK Disconnected event.
 # While ZKRMStateStore#runWithCheck() is in wait(zkSessionTimeout), waiting 
 for zkClient to re-establish the ZooKeeper connection via either a 
 SyncConnected or an Expired event, it is highly possible that another thread 
 obtains the lock on {{ZKRMStateStore.this}} from state machine transition 
 events. This causes a deadlock in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2946) DeadLock's in RMStateStore-ZKRMStateStore

2014-12-12 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244225#comment-14244225
 ] 

Varun Saxena commented on YARN-2946:


Frankly, all of these methods *merely have a single line calling another 
method* in {{ZKRMStateStore}}, in addition to the calls to {{isFencedState}} 
and {{notifyStoreOperationFailed}}. Hence, we *do not even need a synchronized 
block* in these methods in {{RMStateStore}}. Just make the relevant method in 
{{ZKRMStateStore}} *synchronized*. 
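
To illustrate the idea, here is a minimal, self-contained sketch. The class and method names ({{SketchStateStore}}, {{SketchZkStateStore}}, {{storeMasterKey}}, {{storeMasterKeyState}}) are hypothetical stand-ins and not the real Hadoop classes; the actual patch may look different. The base-class method takes no lock at all, and only the subclass method that touches the store is synchronized.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for RMStateStore (illustration only).
abstract class SketchStateStore {
    // Stands in for the state machine's FENCED state.
    private volatile boolean fenced = false;
    final List<String> failures =
        Collections.synchronizedList(new ArrayList<String>());

    // Not synchronized: never takes the store lock.
    boolean isFencedState() {
        return fenced;
    }

    void fence() {
        fenced = true;
    }

    // Not synchronized: records the failure without the store lock.
    void notifyStoreOperationFailed(Exception e) {
        failures.add(String.valueOf(e.getMessage()));
    }

    // No synchronized keyword and no synchronized block here.
    public void storeMasterKey(String key) {
        if (isFencedState()) {
            return;
        }
        try {
            storeMasterKeyState(key);
        } catch (Exception e) {
            notifyStoreOperationFailed(e);
        }
    }

    protected abstract void storeMasterKeyState(String key) throws Exception;
}

// Hypothetical stand-in for ZKRMStateStore (illustration only).
class SketchZkStateStore extends SketchStateStore {
    final List<String> stored = new ArrayList<String>();

    // The store lock is held only for the duration of the store operation.
    @Override
    protected synchronized void storeMasterKeyState(String key) throws Exception {
        if (key == null) {
            throw new IllegalArgumentException("key must not be null");
        }
        stored.add(key);
    }
}
```

With this layout, the fenced-state check and the failure notification never run while the store's monitor is held, so the store lock is not nested around state machine calls on this path.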
