[ https://issues.apache.org/jira/browse/YARN-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Masatake Iwasaki updated YARN-10334:
------------------------------------
    Fix Version/s: 3.3.1

> TestDistributedShell leaks resources on timeout/failure
> -------------------------------------------------------
>
>                 Key: YARN-10334
>                 URL: https://issues.apache.org/jira/browse/YARN-10334
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-shell, test, yarn
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>              Labels: newbie, pull-request-available, test
>             Fix For: 3.4.0, 3.3.1
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{TestDistributedShell}} times out on trunk. I found that the application and its containers stay running in the background long after the unit test has failed.
> This causes other test cases to fail and produces several false-positive failures because:
> * Ports stay busy, so other test cases fail to launch.
> * Unit tests fail because of memory restrictions.
> Although the unit test is already broken on trunk, we do not want its failures to spread to other unit tests.
> {{TestDistributedShell}} needs to be revisited to make sure that all {{YarnClients}} and {{YarnApplications}} are closed properly at the end of each unit test (including on exceptions and timeouts).
> Steps to reproduce:
> {code:bash}
> mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers
> ## this will time out as
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 90.234 s <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
> [ERROR] testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 90.018 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 90000 milliseconds
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117)
>     at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089)
>     at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:748)
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » TestTimedOut
> [INFO]
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
> {code}
> Using the {{ps}} command, you can see that the YARN processes are still running in the background:
> {code:bash}
> /bin/bash -c $JRE_HOME/bin/java -Xmx512m
> org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stdout 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stderr
> $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein
> {code}
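
The issue asks that every client and application be closed at the end of each unit test, including when the test throws or times out. A minimal sketch of one way to structure such cleanup follows. It is not the actual fix for YARN-10334: {{CleanupSketch}}, {{ResourceTracker}}, and {{runFailingTest}} are hypothetical names, and plain {{AutoCloseable}} lambdas stand in for the real YARN client and application handles so that the example runs without Hadoop on the classpath.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; none of these names come from the Hadoop code base.
final class CleanupSketch {

    /** Collects resources opened by a test and closes every one of them. */
    static final class ResourceTracker implements AutoCloseable {
        private final Deque<AutoCloseable> resources = new ArrayDeque<>();

        /** Register a resource as soon as it has been created. */
        void track(AutoCloseable resource) {
            resources.push(resource); // stack order: last opened is closed first
        }

        @Override
        public void close() {
            RuntimeException failure = null;
            // Keep closing even if one close() throws, so nothing is left behind.
            while (!resources.isEmpty()) {
                try {
                    resources.pop().close();
                } catch (Exception e) {
                    if (failure == null) {
                        failure = new RuntimeException("cleanup failed", e);
                    } else {
                        failure.addSuppressed(e);
                    }
                }
            }
            if (failure != null) {
                throw failure;
            }
        }
    }

    /** Simulates a test body that fails after opening two resources. */
    static List<String> runFailingTest() {
        List<String> closed = new ArrayList<>();
        try (ResourceTracker tracker = new ResourceTracker()) {
            tracker.track(() -> closed.add("client"));      // stand-in for a YARN client
            tracker.track(() -> closed.add("application")); // stand-in for a running application
            throw new IllegalStateException("simulated test failure");
        } catch (IllegalStateException expected) {
            // A real test would let the failure propagate; it is swallowed here
            // only so that the cleanup order can be inspected afterwards.
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(runFailingTest()); // prints [application, client]
    }
}
```

Both stand-ins are closed even though the simulated test body throws, and they are closed in reverse order of creation. Whether the equivalent logic belongs in a try-with-resources block or in a test teardown method, and how it should interact with the test timeout, are design choices this sketch does not settle.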