[ https://issues.apache.org/jira/browse/YARN-10460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17216637#comment-17216637 ]
Peter Bacsko commented on YARN-10460: ------------------------------------- I think it's OK to have it only on trunk for now. > Upgrading to JUnit 4.13 causes tests in TestNodeStatusUpdater to fail > --------------------------------------------------------------------- > > Key: YARN-10460 > URL: https://issues.apache.org/jira/browse/YARN-10460 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, test > Reporter: Peter Bacsko > Assignee: Peter Bacsko > Priority: Major > Fix For: 3.4.0 > > Attachments: YARN-10460-001.patch, YARN-10460-002.patch, > YARN-10460-POC.patch > > > In our downstream build environment, we're using JUnit 4.13. Recently, we > discovered a truly weird test failure in TestNodeStatusUpdater. > The problem is that timeout handling has changed in Junit 4.13. See the > difference between these two snippets: > 4.12 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask<Throwable> task = new FutureTask<Throwable>(callable); > threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } > {noformat} > > 4.13 > {noformat} > @Override > public void evaluate() throws Throwable { > CallableStatement callable = new CallableStatement(); > FutureTask<Throwable> task = new FutureTask<Throwable>(callable); > ThreadGroup threadGroup = new ThreadGroup("FailOnTimeoutGroup"); > Thread thread = new Thread(threadGroup, task, "Time-limited test"); > try { > thread.setDaemon(true); > thread.start(); > callable.awaitStarted(); > Throwable throwable = getResult(task, thread); > if (throwable != null) { > throw throwable; > } > } finally { > try { > thread.join(1); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } > try { > threadGroup.destroy(); <---- This > } catch (IllegalThreadStateException e) { > // If a thread from the group is still alive, the ThreadGroup > cannot be destroyed. > // Swallow the exception to keep the same behavior prior to > this change. > } > } > } > {noformat} > The change comes from [https://github.com/junit-team/junit4/pull/1517]. > Unfortunately, destroying the thread group causes an issue because there are > all sorts of object caching in the IPC layer. The exception is: > {noformat} > java.lang.IllegalThreadStateException > at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:867) > at java.lang.Thread.init(Thread.java:402) > at java.lang.Thread.init(Thread.java:349) > at java.lang.Thread.<init>(Thread.java:675) > at > java.util.concurrent.Executors$DefaultThreadFactory.newThread(Executors.java:613) > at > com.google.common.util.concurrent.ThreadFactoryBuilder$1.newThread(ThreadFactoryBuilder.java:163) > at > java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:612) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:925) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1136) > at org.apache.hadoop.ipc.Client.call(Client.java:1458) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy81.startContainers(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.startContainer(TestNodeManagerShutdown.java:251) > at > org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown(TestNodeStatusUpdater.java:1576) > {noformat} > Both the {{clientExecutor}} in {{org.apache.hadoop.ipc.Client}} and the > client object in {{ProtobufRpcEngine}}/{{ProtobufRpcEngine2}} are stored as > long as they're needed. But since the backing thread group is destroyed in > the previous test, it's no longer possible to create new threads. > A quick workaround is to stop the clients and completely clear the > {{ClientCache}} in {{ProtobufRpcEngine}} before each testcase. I tried this > and it solves the problem but it feels hacky. Not sure if there is a better > approach. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org