[ https://issues.apache.org/jira/browse/YARN-9673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chan updated YARN-9673:
-----------------------
    Description: 
We have 1000 nodes in the cluster. Recently I found that when many tasks are
submitted to the ResourceManager, an application takes 5-8 minutes to go from
NEW to NEW_SAVING, and an app attempt takes almost as long to go from
ALLOCATED_SAVING to ALLOCATED. I think the problem is in
RMStateStore#handleStoreEvent, because both of these store operations go
through this method.

Has anyone encountered the same problem?

protected void handleStoreEvent(RMStateStoreEvent event) {
  this.writeLock.lock();
  try {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Processing event of type " + event.getType());
    }
    final RMStateStoreState oldState = getRMStateStoreState();
    this.stateMachine.doTransition(event.getType(), event);
    if (oldState != getRMStateStoreState()) {
      LOG.info("RMStateStore state change from " + oldState + " to "
          + getRMStateStoreState());
    }
  } catch (InvalidStateTransitonException e) {
    LOG.error("Can't handle this event at current state", e);
  } finally {
    this.writeLock.unlock();
  }
}
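
To make the reported symptom concrete, here is a minimal, self-contained Java sketch. It is not Hadoop code: the class SerializedStoreDemo, its methods, and the Thread.sleep standing in for a state-store write are all hypothetical. It assumes, as the snippet above suggests, that the slow store write happens inside doTransition while the write lock is held. Under that assumption, N concurrent store events take roughly N times as long as one, because each caller waits for every earlier caller to release the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class (not part of Hadoop): shows how one write lock
// serializes store events when the slow write runs while the lock is held.
public class SerializedStoreDemo {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock.WriteLock writeLock = lock.writeLock();
  private final long storeMillis;

  SerializedStoreDemo(long storeMillis) {
    this.storeMillis = storeMillis;
  }

  // Mimics the shape of handleStoreEvent: the whole (slow) store operation
  // runs between lock() and unlock().
  void handleStoreEvent() throws InterruptedException {
    writeLock.lock();
    try {
      Thread.sleep(storeMillis); // assumed stand-in for a state-store write
    } finally {
      writeLock.unlock();
    }
  }

  // Starts `events` threads that each handle one store event concurrently,
  // and returns the wall-clock milliseconds until all of them finish.
  static long runDemo(int events, long storeMillis) throws InterruptedException {
    SerializedStoreDemo store = new SerializedStoreDemo(storeMillis);
    List<Thread> threads = new ArrayList<>();
    long start = System.nanoTime();
    for (int i = 0; i < events; i++) {
      Thread t = new Thread(() -> {
        try {
          store.handleStoreEvent();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      t.join();
    }
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) throws InterruptedException {
    // 10 events x 100 ms each: the events run one at a time, so expect
    // roughly 1000 ms in total rather than about 100 ms.
    System.out.println("elapsed ms: " + runDemo(10, 100));
  }
}
```

With 10 events of 100 ms each, the run takes about one second instead of about 100 ms. If the real per-event cost is the state-store write, a long enough backlog of submissions could add up to minutes, which would be consistent with the delays reported above.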

  was:
We have 1000 nodes in the cluster. Recently I found that when many tasks are
submitted to the ResourceManager, an application takes 5-8 minutes to go from
NEW to NEW_SAVING, and an app attempt takes almost as long to go from
ALLOCATED_SAVING to ALLOCATED. I think the problem is in
RMStateStore#handleStoreEvent: both of these store operations go through this
method, and the method holds a lock. Why is a writeLock used here?

Has anyone encountered the same problem?

protected void handleStoreEvent(RMStateStoreEvent event) {
  this.writeLock.lock();
  try {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Processing event of type " + event.getType());
    }
    final RMStateStoreState oldState = getRMStateStoreState();
    this.stateMachine.doTransition(event.getType(), event);
    if (oldState != getRMStateStoreState()) {
      LOG.info("RMStateStore state change from " + oldState + " to "
          + getRMStateStoreState());
    }
  } catch (InvalidStateTransitonException e) {
    LOG.error("Can't handle this event at current state", e);
  } finally {
    this.writeLock.unlock();
  }
}
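
The question in the old description (why a write lock?) has one plausible answer, offered here as an assumption rather than the Hadoop developers' stated rationale: the handler reads the old state, runs the transition, and reads the new state, and an exclusive lock keeps that sequence atomic with respect to other handlers, while code that only reads the state can share a read lock. The Java sketch below illustrates that pattern with a hypothetical GuardedStateMachine class and State enum; it is not the RMStateStore implementation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a read/write-locked state machine; the class and
// enum names are illustrative and are not Hadoop's API.
public class GuardedStateMachine {
  enum State { ACTIVE, FENCED }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private State state = State.ACTIVE;

  // Pure readers take the shared read lock, so many can observe the state at once.
  public State getState() {
    lock.readLock().lock();
    try {
      return state;
    } finally {
      lock.readLock().unlock();
    }
  }

  // The handler reads the old state, applies the transition, and compares,
  // all under the exclusive write lock, so no other handler can change the
  // state in between. Returns true if the state actually changed.
  public boolean handleEvent(State target) {
    lock.writeLock().lock();
    try {
      State oldState = getState(); // a write-lock holder may also take the read lock
      state = target;
      return oldState != getState();
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    GuardedStateMachine m = new GuardedStateMachine();
    System.out.println(m.handleEvent(State.FENCED)); // true: ACTIVE -> FENCED
    System.out.println(m.handleEvent(State.FENCED)); // false: already FENCED
  }
}
```

The trade-off is the one the report describes: exclusivity makes each transition consistent, but if slow I/O runs inside the locked section, every other handler waits for it.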


> RMStateStore writeLock make app waste more time
> -----------------------------------------------
>
>                 Key: YARN-9673
>                 URL: https://issues.apache.org/jira/browse/YARN-9673
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.3
>            Reporter: chan
>            Priority: Blocker



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
