[jira] [Commented] (YARN-8470) Fair scheduler exception with SLS

ASF GitHub Bot (JIRA) Tue, 11 Sep 2018 08:50:06 -0700


    [ https://issues.apache.org/jira/browse/YARN-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610820#comment-16610820 ]


ASF GitHub Bot commented on YARN-8470:
--------------------------------------

GitHub user gg7 opened a pull request:

    https://github.com/apache/hadoop/pull/416

    YARN-8470. Fix a NPE in identifyContainersToPreemptOnNode()

    I encountered this issue while running 3.1.0:
    
    ```
    2018-09-10 13:42:39,437 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Container container_1536156801471_0071_01_000055 completed with event FINISHED, but corresponding RMContainer doesn't exist.
    2018-09-10 13:42:39,881 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)
    
    2018-09-10 13:42:39,886 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down the resource manager.
    2018-09-10 13:42:39,891 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)
    ```
    
    I'm guessing a better fix would be to synchronise the removal of 
applications, but this simple patch should be an improvement IMO.
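
For illustration only, and not the code in this pull request: the trade-off described above can be sketched in plain Java. Every name below (PreemptionNullGuardSketch, AppRegistry, preemptionCandidates) is hypothetical and none of it is Hadoop API. The sketch assumes the failure mode is a lookup that returns null because an application was removed while another thread was still iterating over its containers, and it shows the "simple" option of skipping such entries instead of dereferencing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: every name here is hypothetical, none is Hadoop API.
final class PreemptionNullGuardSketch {

    /** Hypothetical stand-in for the scheduler's table of live applications. */
    static final class AppRegistry {
        private final Map<String, String> queueByAppId = new ConcurrentHashMap<>();

        void add(String appId, String queue) { queueByAppId.put(appId, queue); }

        void remove(String appId) { queueByAppId.remove(appId); }

        /** Returns null once the application has been removed. */
        String queueOf(String appId) { return queueByAppId.get(appId); }
    }

    /**
     * Returns the IDs of containers whose owning application is still
     * registered, skipping (rather than dereferencing) any missing one.
     */
    static List<String> preemptionCandidates(
            AppRegistry apps, Map<String, String> appIdByContainerId) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, String> e : appIdByContainerId.entrySet()) {
            String queue = apps.queueOf(e.getValue());
            if (queue == null) {
                // The application disappeared after the container list was
                // read; dereferencing `queue` here would throw an NPE.
                continue;
            }
            candidates.add(e.getKey());
        }
        return candidates;
    }
}
```

A null guard like this only tolerates the race. Synchronising application removal with the iterating thread, as suggested above, would instead prevent the stale lookup from happening at all, at the cost of extra locking.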

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gg7/hadoop gg7-yarn-8470-fix-npe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #416
    
----
commit a86c54c4db3954aca40ef297135a5e875c0a96a8
Author: George G <git@...>
Date:   2018-09-11T15:00:00Z

    YARN-8470. Fix a NPE in identifyContainersToPreemptOnNode()
    
    I encountered this issue while running 3.1.0:
    
    ```
    2018-09-10 13:42:39,437 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Container container_1536156801471_0071_01_000055 completed with event FINISHED, but corresponding RMContainer doesn't exist.
    2018-09-10 13:42:39,881 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)
    
    2018-09-10 13:42:39,886 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Shutting down the resource manager.
    2018-09-10 13:42:39,891 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
            at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)
    ```
    
    I'm guessing a better fix would be to synchronise the removal of applications,
    but this simple patch should be an improvement IMO.
    
    Signed-off-by: George G <g...@gg7.io>

----


> Fair scheduler exception with SLS
> ---------------------------------
>
>                 Key: YARN-8470
>                 URL: https://issues.apache.org/jira/browse/YARN-8470
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Miklos Szegedi
>            Assignee: Haibo Chen
>            Priority: Major
>
> I ran into the following exception with sls:
> 2018-06-26 13:34:04,358 ERROR resourcemanager.ResourceManager: Received RMFatalEvent of type CRITICAL_THREAD_CRASH, caused by a critical thread, FSPreemptionThread, that exited unexpectedly: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptOnNode(FSPreemptionThread.java:207)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreemptForOneContainer(FSPreemptionThread.java:161)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.identifyContainersToPreempt(FSPreemptionThread.java:121)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSPreemptionThread.run(FSPreemptionThread.java:81)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
