[jira] [Updated] (YARN-10460) Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail

Peter Bacsko (Jira) Wed, 14 Oct 2020 03:47:03 -0700


     [ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Peter Bacsko updated YARN-10460:
--------------------------------
    Description: 
In our downstream build environment, we're using JUnit 4.13. Recently, we 
discovered a truly weird test failure in TestNodeStatusUpdater.

The problem is that timeout handling changed in JUnit 4.13. Compare the two 
versions of {{FailOnTimeout.evaluate()}}:

JUnit 4.12:
{noformat}
    @Override
    public void evaluate() throws Throwable {
        CallableStatement callable = new CallableStatement();
        FutureTask<Throwable> task = new FutureTask<Throwable>(callable);
        threadGroup = new ThreadGroup("FailOnTimeoutGroup");
        Thread thread = new Thread(threadGroup, task, "Time-limited test");
        thread.setDaemon(true);
        thread.start();
        callable.awaitStarted();
        Throwable throwable = getResult(task, thread);
        if (throwable != null) {
            throw throwable;
        }
    }
{noformat}
 
JUnit 4.13:
{noformat}
    @Override
    public void evaluate() throws Throwable {
        CallableStatement callable = new CallableStatement();
        FutureTask<Throwable> task = new FutureTask<Throwable>(callable);
        ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
        Thread thread = new Thread(threadGroup, task, "Time-limited test");
        try {
            thread.setDaemon(true);
            thread.start();
            callable.awaitStarted();
            Throwable throwable = getResult(task, thread);
            if (throwable != null) {
                throw throwable;
            }
        } finally {
            try {
                thread.join(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            try {
                threadGroup.destroy();  // <---- This
            } catch (IllegalThreadStateException e) {
                // If a thread from the group is still alive, the ThreadGroup cannot be destroyed.
                // Swallow the exception to keep the same behavior prior to this change.
            }
        }
    }
{noformat}
The change comes from [https://github.com/junit-team/junit4/pull/1517].
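To make the threading structure concrete, here is a minimal, JUnit-free sketch of what both snippets set up: the test body runs on a daemon thread named "Time-limited test" inside a fresh {{FailOnTimeoutGroup}} thread group, so anything the body creates that remembers "the current thread group" is tied to that group. {{TimedRunSketch}} and {{runTimed}} are hypothetical names; the sketch deliberately leaves out timeout enforcement, {{thread.join(1)}} and the {{threadGroup.destroy()}} call (which is deprecated for removal in newer JDKs).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical, JUnit-free sketch of the thread layout used by FailOnTimeout.
final class TimedRunSketch {

    // Runs `body` the way a timed JUnit 4 test runs: on a daemon thread named
    // "Time-limited test" inside a fresh "FailOnTimeoutGroup" thread group.
    // Timeout enforcement and ThreadGroup.destroy() are intentionally omitted.
    static <T> T runTimed(Callable<T> body) throws InterruptedException, ExecutionException {
        FutureTask<T> task = new FutureTask<>(body);
        ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup");
        Thread thread = new Thread(threadGroup, task, "Time-limited test");
        thread.setDaemon(true);
        thread.start();
        return task.get(); // blocks until the body has finished
    }

    public static void main(String[] args) throws Exception {
        String where = runTimed(() -> {
            Thread current = Thread.currentThread();
            return current.getName() + " in " + current.getThreadGroup().getName();
        });
        System.out.println(where); // Time-limited test in FailOnTimeoutGroup
    }
}
```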

Unfortunately, destroying the thread group breaks subsequent tests, because the 
IPC layer caches all sorts of objects across test cases. The exception is:
{noformat}
java.lang.IllegalThreadStateException
        at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867)
        at java.lang.Thread.init(Thread.java:402)
        at java.lang.Thread.init(Thread.java:349)
        at java.lang.Thread.<init>(Thread.java:675)
        at java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613)
        at com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163)
        at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
        at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136)
        at org.apache.hadoop.ipc.Client.call(Client.java:1458)
        at org.apache.hadoop.ipc.Client.call(Client.java:1405)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy81.startContainers(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
        at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251)
        at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576)
{noformat}
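Both the {{clientExecutor}} in {{org.apache.hadoop.ipc.Client}} and the client 
object in {{ProtobufRpcEngine}}/{{ProtobufRpcEngine2}} are cached for as long as 
they're needed, so they outlive the test case that created them. The executor's 
thread factory ({{Executors$DefaultThreadFactory}}, wrapped by Guava's 
{{ThreadFactoryBuilder}} in the trace above) remembers the thread group of the 
thread that created it, which here is the previous test's {{FailOnTimeoutGroup}}. 
Since JUnit destroyed that group at the end of the previous test, the executor 
can no longer create new threads, and {{ThreadGroup.addUnstarted()}} throws 
{{IllegalThreadStateException}}.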
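The group-capturing behavior can be reproduced without Hadoop: {{Executors.defaultThreadFactory()}} records the thread group of the thread that constructs it, so a factory (or a pool built on it) first created inside a timed test keeps putting new threads into that test's group after the test has ended. This sketch assumes no {{SecurityManager}} is installed (with one, the default factory uses the security manager's group instead); {{FactoryGroupCapture}} and its methods are hypothetical. Once JUnit 4.13 has destroyed that group, the same {{newThread}} call fails in {{ThreadGroup.addUnstarted()}}, the top frame of the trace above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: the default thread factory remembers the thread group
// of the thread that CREATED it, not of the thread that later asks for threads.
final class FactoryGroupCapture {

    // Builds a default thread factory from inside a thread that belongs to a
    // group named `groupName`, mimicking a cached executor that is first
    // constructed during a timed test. Assumes no SecurityManager is installed.
    static ThreadFactory factoryCreatedIn(String groupName) throws InterruptedException {
        AtomicReference<ThreadFactory> ref = new AtomicReference<>();
        ThreadGroup group = new ThreadGroup(groupName);
        Thread creator = new Thread(group,
                () -> ref.set(Executors.defaultThreadFactory()), "Time-limited test");
        creator.start();
        creator.join(); // the creating "test" is over once this returns
        return ref.get();
    }

    // Returns the name of the group that a new (unstarted) worker would join.
    static String groupOfNewWorker(ThreadFactory factory) {
        Thread worker = factory.newThread(() -> { });
        return worker.getThreadGroup().getName();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadFactory cached = factoryCreatedIn("FailOnTimeoutGroup");
        // Called from main, yet the new worker still joins the old test's group.
        System.out.println(groupOfNewWorker(cached)); // FailOnTimeoutGroup
    }
}
```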

A quick workaround is to stop the clients and completely clear the 
{{ClientCache}} in {{ProtobufRpcEngine}} before each test case. I tried this and 
it solves the problem, but it feels hacky. I'm not sure whether there is a better 
approach.
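A minimal sketch of the workaround, under the assumption that the cache can be modeled as a static map of executors: {{ExecutorCacheSketch}}, {{get}}, {{clear}}, {{workerGroup}} and {{runInGroup}} are hypothetical and are not Hadoop's {{ClientCache}} API, and {{Executors.newCachedThreadPool()}} stands in for however the real client executor is configured. Shutting down and dropping every cached executor before each test case forces the next test to build a fresh one whose threads belong to its own thread group.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical stand-in for a static, process-wide executor cache.
// This is NOT Hadoop's ClientCache API; every name here is illustrative.
final class ExecutorCacheSketch {
    private static final Map<String, ExecutorService> CACHE = new ConcurrentHashMap<>();

    // Lazily creates the executor. newCachedThreadPool() uses the default
    // thread factory, which captures the creating thread's ThreadGroup.
    static ExecutorService get(String key) {
        return CACHE.computeIfAbsent(key, k -> Executors.newCachedThreadPool());
    }

    // The workaround: stop every cached executor and forget it, so the next
    // get() builds a fresh one bound to the current test's thread group.
    static void clear() {
        CACHE.values().forEach(ExecutorService::shutdownNow);
        CACHE.clear();
    }

    // Submits a task to the cached executor and returns the thread group
    // name of the pool thread that ran it.
    static String workerGroup(String key) throws Exception {
        return get(key).submit(() -> Thread.currentThread().getThreadGroup().getName()).get();
    }

    // Runs `body` on a thread inside a new group, standing in for one test case.
    static <T> T runInGroup(String groupName, Callable<T> body) throws Exception {
        FutureTask<T> task = new FutureTask<>(body);
        new Thread(new ThreadGroup(groupName), task, "test body").start();
        return task.get();
    }

    public static void main(String[] args) throws Exception {
        String first = runInGroup("test-1", () -> workerGroup("ipc"));
        clear(); // what a @Before method would do between test cases
        String second = runInGroup("test-2", () -> workerGroup("ipc"));
        clear(); // also stops the non-daemon pool threads so the JVM can exit
        System.out.println(first + " -> " + second); // test-1 -> test-2
    }
}
```

Without the {{clear()}} between the two runs, the second call would still report {{test-1}}: the cached pool keeps producing threads in the first test's group, which is exactly the situation in which a group destroyed by JUnit 4.13 makes thread creation fail.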


> Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail
> ---------------------------------------------------------------------
>
>                 Key: YARN-10460
>                 URL: https://issues.apache.org/jira/browse/YARN-10460
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, test
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
