[ 
https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zian Chen updated YARN-8193:
----------------------------
    Description: 
When large queries are run in succession, at some point the RM hangs and
stops allocating resources. At the moment the RM hangs, YARN throws a
NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.

There is sufficient space in yarn.nodemanager.local-dirs, so this is not a
node health issue; the RM did not report any node as unhealthy. There is no
specific query or operation that triggers the problem.

Restarting the ResourceManager makes the problem go away; no NodeManager
restart is required.

  was:
When large queries are run in succession, at some point the RM hangs and
stops allocating resources.
There is sufficient space in yarn.nodemanager.local-dirs, so this is not a
node health issue; the RM did not report any node as unhealthy. There is no
specific query or operation that triggers the problem. Restarting the
ResourceManager makes the problem go away.
No NodeManager restart is required.

At the moment the RM hangs, YARN throws a NullPointerException at
RegularContainerAllocator.getLocalityWaitFactor.

> YARN RM hangs abruptly (stops allocating resources) when running successive 
> applications.
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>
> When large queries are run in succession, at some point the RM hangs and
> stops allocating resources. At the moment the RM hangs, YARN throws a
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There is sufficient space in yarn.nodemanager.local-dirs, so this is not a
> node health issue; the RM did not report any node as unhealthy. There is no
> specific query or operation that triggers the problem.
> Restarting the ResourceManager makes the problem go away; no NodeManager
> restart is required.
>  
>  
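The report does not identify a root cause. As a purely hypothetical
illustration of how a NullPointerException can surface in a locality wait
factor calculation, the Java sketch below models the factor as the number of
unique locations requested for a scheduler key (minus one for the off-switch
"ANY" request, which is an assumption) divided by the cluster size, capped at
1.0. The class LocalityWaitFactorSketch, its map, and the methods
unsafeWaitFactor/safeWaitFactor are invented for illustration and are not
Hadoop's RegularContainerAllocator. The point is only that unboxing a missing
map entry (for example one removed by another thread) throws an NPE, and that
a null guard avoids it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, NOT Hadoop's RegularContainerAllocator.
 * Locality wait factor modeled as (unique location asks - 1) / clusterNodes,
 * capped at 1.0 (the "- 1" for the ANY request is an assumption).
 */
public class LocalityWaitFactorSketch {

    // Hypothetical: scheduler key -> number of unique locations requested.
    private final Map<String, Integer> uniqueLocationAsks = new ConcurrentHashMap<>();

    void putAsks(String schedulerKey, int asks) {
        uniqueLocationAsks.put(schedulerKey, asks);
    }

    void removeAsks(String schedulerKey) {
        uniqueLocationAsks.remove(schedulerKey);
    }

    // Unsafe: auto-unboxing a null Integer throws NullPointerException
    // when the key is absent (e.g. removed before this call).
    float unsafeWaitFactor(String schedulerKey, int clusterNodes) {
        int asks = uniqueLocationAsks.get(schedulerKey);
        int required = Math.max(asks - 1, 0);
        return Math.min((float) required / clusterNodes, 1.0f);
    }

    // Defensive: a missing entry (or empty cluster) means nothing to wait for.
    float safeWaitFactor(String schedulerKey, int clusterNodes) {
        Integer asks = uniqueLocationAsks.get(schedulerKey);
        if (asks == null || clusterNodes <= 0) {
            return 0.0f;
        }
        int required = Math.max(asks - 1, 0);
        return Math.min((float) required / clusterNodes, 1.0f);
    }

    public static void main(String[] args) {
        LocalityWaitFactorSketch s = new LocalityWaitFactorSketch();
        s.putAsks("priority-1", 3);
        System.out.println(s.safeWaitFactor("priority-1", 10)); // 0.2
        s.removeAsks("priority-1");
        System.out.println(s.safeWaitFactor("priority-1", 10)); // 0.0
        try {
            s.unsafeWaitFactor("priority-1", 10);
        } catch (NullPointerException e) {
            System.out.println("NPE from unboxing a missing entry");
        }
    }
}
```

In this sketch, returning 0.0 for a missing entry simply lets the caller skip
locality delay for a request that no longer exists; whether that is the right
behavior for the real scheduler is not established by this report.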



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
