I tried running the tests again, this time with four cores (previously five) and 12 GB of RAM (previously 8 GB). I'm still hit by the OOM killer.
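As a rough sanity check on those numbers (a sketch, not something from the Flink build): assuming one forked test JVM per core, which is what the default flink.forkCount=1C quoted further down in this thread implies, and assuming each fork stays within its -Xmx2048m heap limit, the heap budget for the old and the new VM size works out as below. The function name heap_budget is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Flink build.
# Assumptions: one fork per core (flink.forkCount=1C) and each fork
# growing to at most its heap limit (-Xmx2048m, i.e. 2 GB).
# Usage: heap_budget <cores> <ram_gb>
heap_budget() {
    cores=$1
    ram_gb=$2
    heap_gb=2
    needed_gb=$((cores * heap_gb))
    if [ "$needed_gb" -le "$ram_gb" ]; then
        verdict="fits"
    else
        verdict="does not fit"
    fi
    echo "$cores forks x $heap_gb GB = $needed_gb GB of heap, $ram_gb GB RAM: $verdict"
}

heap_budget 5 8     # previous VM: 5 cores, 8 GB
heap_budget 4 12    # current VM: 4 cores, 12 GB
```

Note that -Xmx only bounds the Java heap. Native allocations, metaspace and thread stacks come on top of it, so this is a lower bound on what the forks can actually use.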
The command I'm running is:

mvn -Dflink.forkCount=1 -Dflink.forkCountTestPackage=1 clean verify

[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:17 h
[INFO] Finished at: 2020-10-23T15:36:50+03:00
[INFO] Final Memory: 180M/614M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to /home/juha/git/flink/flink-tests/target/surefire-reports for the individual test results.
[ERROR] Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m -Dmvn.forkNumber=1 -XX:+UseG1GC -jar /home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar /home/juha/git/flink/flink-tests/target/surefire 2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR] org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :flink-tests

This means there should be only the parent JVM + the forked JVM running on the VM.
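The "Process Exit Code: 137" in the output above follows the usual shell convention of 128 + signal number, so it points at signal 9 (SIGKILL). That is consistent with the fork being killed from the outside rather than crashing on its own. A minimal sketch for decoding such a status, where describe_exit is a hypothetical helper name and not anything Maven or surefire provides:

```shell
#!/bin/sh
# Decode a shell-style exit status: values above 128 conventionally mean
# "terminated by signal (status - 128)".
describe_exit() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # kill -l <number> prints the signal name without the SIG prefix.
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited normally with status $status"
    fi
}

describe_exit 137
```

For 137 this reports signal 9, i.e. KILL, which is the signal the kernel's OOM killer sends.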
There should be a lot of RAM available.

/var/log/kern.log

Oct 23 15:26:42 ubuntu kernel: [23021.120464] Tasks state (memory values in pages):
Oct 23 15:26:42 ubuntu kernel: [23021.120464] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
....
Oct 23 15:26:42 ubuntu kernel: [23021.120574] [ 460994] 1000 460994 3319485 2440960 22024192 0 0 java
Oct 23 15:26:42 ubuntu kernel: [23021.120575] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=java,pid=460994,uid=1000
Oct 23 15:26:42 ubuntu kernel: [23021.120669] Out of memory: Killed process 460994 (java) total-vm:13277940kB, anon-rss:9763848kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21508kB oom_score_adj:0
Oct 23 15:26:42 ubuntu kernel: [23021.406205] oom_reaper: reaped process 460994 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

It seems very odd to me that the process takes 13277940kB of virtual mem and 9763848kB of anon-rss. Or maybe I'm reading something wrong.

r,
Juha

On Wed, Oct 21, 2020 at 12:54, Juha Mynttinen (<juha.myntti...@gmail.com>) wrote:

> Hmm
>
> Even when setting the forkcounts to 1 things fail.
>
> I wonder why there seem to be five of these JVM crashes. There should be
> one JVM at a time. And Maven should fail after the 1st fail?
>
> ~/apache-maven-3.2.5/bin/mvn -Dflink.forkCount=1
> -Dflink.forkCountTestPackage=1 clean verify
>
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 01:13 h
> [INFO] Finished at: 2020-10-21T12:26:16+03:00
> [INFO] Final Memory: 205M/704M
> [INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test
> (integration-tests) on project flink-tests: There are test failures.
> [ERROR]
> [ERROR] Please refer to
> /home/juha/git/flink/flink-tests/target/surefire-reports for the individual
> test results.
> [ERROR] Please refer to dump files (if any exist) [date].dump,
> [date]-jvmRun[N].dump and [date].dumpstream.
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp
> surefire_11744637775482284170691tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR]
> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp
> surefire_11923880479826081497266tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
> ExecutionException The forked VM terminated without properly saying
> goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp
> surefire_11744637775482284170691tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR]
> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp
> surefire_11923880479826081497266tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
> [ERROR] at
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
> [ERROR] at
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
> [ERROR] at
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
> [ERROR] at
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> [ERROR] at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> [ERROR] at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> [ERROR] at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
> [ERROR] at
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
> [ERROR] at
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
> [ERROR] at
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> [ERROR] at
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
> [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
> [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
> [ERROR] at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> [ERROR] at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [ERROR] at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
> [ERROR] Caused by:
> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM
> terminated without properly saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp
> surefire_11923880479826081497266tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
> [ERROR] at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> [ERROR] at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [ERROR] at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR] mvn <goals> -rf :flink-tests
>
>
>
> flink-tests/target/surefire-reports/2020-10-21T11-13-24_791-jvmRun1.dump
>
> # Created at 2020-10-21T12:03:51.559
> java.io.IOException: Stream closed
> at
> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
> at
> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
> at
> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
> at
> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
> at
> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
> at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
> at
> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
> at java.base/java.io.Reader.read(Reader.java:189)
> at java.base/java.util.Scanner.readInput(Scanner.java:882)
> at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
> at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
> at
> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
> at
> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
> at
> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
> at
> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
>
> # Created at 2020-10-21T12:03:51.560
> System.exit() or native command error interrupted process checker.
> java.lang.IllegalStateException: error [STOPPED] to read process 935338
> at
> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
> at
> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
> at
> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:834)
>
>
>
> sudo less -n /var/log/kern.log
> ......
> Oct 21 12:21:57 ubuntu kernel: [24024.569633] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=java,pid=1220764,uid=1000
> Oct 21 12:21:57 ubuntu kernel: [24024.569804] Out of memory: Killed process 1220764 (java) total-vm:8514092kB, anon-rss:4116292kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0
> Oct 21 12:21:57 ubuntu kernel: [24024.685821] oom_reaper: reaped process 1220764 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> Regards,
> Juha
>
> On Wed, Oct 21, 2020 at 10:04, Juha Mynttinen (<juha.myntti...@gmail.com>) wrote:
>
>> Hi,
>>
>> You're right, I thought about this also after writing the last comment -
>> for example on Linux, the kernel by default overcommits memory allocations
>> and this approach doesn't work (doesn't make the JVM crash right when it
>> starts).
>>
>> I dug a little deeper. It seems that for CI environments there are
>> specific compilation scripts such as
>> https://github.com/apache/flink/blob/master/tools/ci/compile.sh#L45 that
>> explicitly set flink.forkCount and flink.forkCountTestPackage to lower than
>> (?) default values. But for anybody compiling Flink locally, mvn uses the
>> default values, which might not work, as in my case.
>>
>> I think a good goal would be that a developer can just git clone Flink
>> and build it following simple instructions. Preferably there would be zero
>> setup needed, just a simple command to run. The current situation is that
>> building Flink is "simple", just run a specific mvn command. This
>> simplicity comes with the price that things can break in unexpected ways:
>>
>> 1) There are things building Flink expects but doesn't check (
>> https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html#build-flink
>> )
>> * The correct Maven version
>> * A suitable Java version
>> 2) There's this issue with the count of CPU cores vs available mem.
>>
>> Case 1) is documented, case 2) is not.
>>
>> Fix options
>>
>> a)
>>
>> Document case 2) and instruct how to set flink.forkCountTestPackage (if
>> needed). Something like "Flink tests are run on parallel JVMs, each taking
>> 2GB of RAM. There are by default as many JVMs as there are physical cores.
>> If your machine doesn't have at least 2GB * count of cores of RAM,
>> the tests can fail. You can set the count of JVMs using Maven property
>> flink.forkCountTestPackage to a lower value".
>>
>> b)
>>
>> Create a Linux specific Maven wrapper script for local execution too.
>> The wrapper script could download the correct Maven version, check the Java
>> version, calculate the max number of forks etc. A quick way to calculate
>> the max fork count:
>>
>> expr `cat /proc/meminfo | grep MemTotal | awk '{print $2}'` / 2097152
>>
>> Regards,
>> Juha
>>
>>
>> On Tue, Oct 20, 2020 at 21:23, Khachatryan Roman (<khachatryan.ro...@gmail.com>) wrote:
>>
>>> I think you are right and I like the idea of failing the build fast.
>>> However, when trying this approach on my local machine it didn't help:
>>> the build didn't crash (probably, because of overcommit).
>>> Did you try this approach in your VM?
>>>
>>> Regards,
>>> Roman
>>>
>>>
>>> On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <juha.myntti...@gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> > Currently, tests do not run in parallel
>>>>
>>>> I don't think this is true, at least not 100%. In 'top' it's clearly
>>>> visible that there are multiple JVMs. If not running tests in parallel,
>>>> what are these doing? In the main pom.xml there's configuration for the
>>>> plug-in 'maven-surefire-plugin'.
>>>>
>>>> I'm not a Maven expert, but it looks to me like this: in
>>>> https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>>>> it says "The other possibility for parallel test execution is setting
>>>> the parameter forkCount to a value higher than 1". I think that's
>>>> happening in Flink:
>>>>
>>>> <forkCount>${flink.forkCount}</forkCount>
>>>>
>>>> And
>>>>
>>>> <flink.forkCount>1C</flink.forkCount>
>>>>
>>>> This means there's gonna be 1 * count_of_cpus forks.
>>>>
>>>> And this one:
>>>>
>>>> <argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber}
>>>> -XX:+UseG1GC</argLine>
>>>>
>>>> In my case, I have 5 CPUs, so 5 forks. I think what now happens is that
>>>> since each fork gets max 2048m heap, there's kind of a mem requirement of
>>>> CPU count * 2048 m.
>>>> In my case, I have 8GB of mem, which is less than max 5 *
>>>> 2048mb.
>>>>
>>>> This could be better..... I think it's a completely valid computer that
>>>> has RAM < count_of_cpus * 2048 mb, take e.g. AMD ryzen 3900X with 12 cores
>>>> and put 16 GB of RAM there. At least memory & CPU requirements should be
>>>> documented?
>>>>
>>>> If the tests really need 2GB of heap, then maybe the forkCount should
>>>> be based on the available RAM rather than available cores, e.g. floor(RAM /
>>>> 2GB)? I don't know if that's doable in Maven....
>>>>
>>>> I think an easy and non-intrusive improvement would be to change
>>>> '-Xms256m' to '-Xms2048m' (ms to match mx) so that the JVM would allocate
>>>> 2048mb right away (when it starts). If there's not enough memory, the tests
>>>> would fail immediately (JVM couldn't start). The tests would probably fail
>>>> anyway (my case) - better to fail fast.
>>>>
>>>> Regards,
>>>> Juha
>>>>
>>>>
>>>> On Tue, Oct 20, 2020 at 11:16, Khachatryan Roman (<khachatryan.ro...@gmail.com>) wrote:
>>>>
>>>>> Thanks for sharing this,
>>>>> I think the activity of OOM-Killer means high memory pressure (it just
>>>>> kills a process with the highest score of memory consumption).
>>>>> High CPU usage can only be a consequence of it, being constant GC.
>>>>>
>>>>> Currently, tests do not run in parallel, but high memory usage can be
>>>>> caused by the nature of the test (e.g. running Flink with high parallelism).
>>>>> So I think the best way to deal with this is to use a VM with more
>>>>> memory.
>>>>>
>>>>> Regards,
>>>>> Roman
>>>>>
>>>>>
>>>>> On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <juha.myntti...@gmail.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> Good hint that /var/log/kern.log.
>>>>>> This time I can see this:
>>>>>>
>>>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=java,pid=270024,uid=1000
>>>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
>>>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>>>>>>
>>>>>> The next question is why does this happen.... I'll try to dig deeper.
>>>>>>
>>>>>> About the CPU load. I have five CPUs. Theoretically it makes sense to
>>>>>> run five tests at a time to max out the CPUs. However, when I look at what
>>>>>> the five Java processes (that MVN forks) are doing, it can be seen that
>>>>>> each of those processes has a large number of threads wanting to use
>>>>>> CPU.
>>>>>> Here's an example from 'top -H'
>>>>>>
>>>>>> top - 09:42:03 up 29 min, 1 user, load average: 17,00, 12,86, 8,81
>>>>>> Threads: 1099 total, 21 running, 1078 sleeping, 0 stopped, 0 zombie
>>>>>> %Cpu(s): 90,5 us, 9,4 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,1 si, 0,0 st
>>>>>> MiB Mem : 7961,6 total, 1614,3 free, 4023,8 used, 2323,5 buff/cache
>>>>>> MiB Swap: 2048,0 total, 2047,0 free, 1,0 used. 3638,9 avail Mem
>>>>>>
>>>>>>    PID USER PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
>>>>>> 254825 juha 20  0 4250424 195768 27596 R 20,9  2,4 0:01.41 C2 CompilerThre
>>>>>> 255116 juha 20  0 2820448  99240 27488 R 20,3  1,2 0:00.78 java
>>>>>> 254968 juha 20  0 5312696 125212 27716 R 19,9  1,5 0:01.16 java
>>>>>> 255027 juha 20  0 5310648 108716 27496 R 19,9  1,3 0:00.90 java
>>>>>> 255123 juha 20  0 2820448  99120 27420 R 19,3  1,2 0:00.78 java
>>>>>> 254829 juha 20  0 4240356 184376 27792 R 17,9  2,3 0:01.26 C2 CompilerThre
>>>>>> 253993 juha 20  0 6436132 276808 28000 R 17,6  3,4 0:02.47 C2 CompilerThre
>>>>>> 254793 juha 20  0 4250424 195768 27596 R 17,3  2,4 0:01.76 java
>>>>>> 254801 juha 20  0 4240356 184376 27792 R 16,3  2,3 0:01.67 java
>>>>>> 254298 juha 20  0 6510340 435360 28212 R 15,6  5,3 0:02.82 C2 CompilerThre
>>>>>> 255145 juha 20  0 2820448  99240 27488 S 15,6  1,2 0:00.51 C2 CompilerThre
>>>>>> 255045 juha 20  0 5310648 108716 27496 R 15,3  1,3 0:00.62 C2 CompilerThre
>>>>>> 255151 juha 20  0 2820448  99120 27420 S 14,0  1,2 0:00.47 C2 CompilerThre
>>>>>> 254986 juha 20  0 5312696 125212 27716 R 12,6  1,5 0:00.76 C2 CompilerThre
>>>>>> 253980 juha 20  0 6436132 276808 28000 S 11,6  3,4 0:02.63 java
>>>>>> 255148 juha 20  0 2820448  99240 27488 S 10,6  1,2 0:00.39 C1 CompilerThre
>>>>>> 255154 juha 20  0 2820448  99120 27420 S  9,6  1,2 0:00.37 C1 CompilerThre
>>>>>> 254457 juha 20  0 4269900 218036 28236 R  9,3  2,7 0:02.22 C2 CompilerThre
>>>>>> 254299 juha 20  0 6510340 435360 28212 S  8,6  5,3 0:01.30 C1 CompilerThre
>>>>>> 255047 juha 20  0 5310648 108716 27496 S  8,6  1,3 0:00.42 C1 CompilerThre
>>>>>> 253994 juha 20  0 6436132 276808 28000 R  7,3  3,4 0:01.10 C1 CompilerThre
>>>>>> 255312 juha 20  0 4250424 195768 27596 R  7,0  2,4 0:00.21 C2 CompilerThre
>>>>>> 254831 juha 20  0 4240356 184376 27792 S  6,3  2,3 0:00.62 C1 CompilerThre
>>>>>> 254988 juha 20  0 5312696 125212 27716 S  6,3  1,5 0:00.45 C1 CompilerThre
>>>>>> 254828 juha 20  0 4250424 195768 27596 S  6,0  2,4 0:00.64 C1 CompilerThre
>>>>>> 254720 juha 20  0 6510340 435360 28212 S  5,0  5,3 0:00.15 flink-akka.acto
>>>>>>
>>>>>>
>>>>>> It can be seen that the JIT related threads consume quite a lot of
>>>>>> CPU, essentially leaving less CPU available to the actual test code. By
>>>>>> using htop I can also see the garbage collection related threads eating
>>>>>> CPU. This doesn't seem right. I think it'd make sense to run the tests
>>>>>> with less parallelism to better utilize the CPUs. Having far more threads
>>>>>> wanting CPU than there are CPUs slows things down rather than speeding
>>>>>> them up.
>>>>>>
>>>>>> However, AFAIK high CPU load shouldn't trigger the OOM killer?
>>>>>>
>>>>>> Regards,
>>>>>> Juha
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 19, 2020 at 20:48, Khachatryan Roman (<khachatryan.ro...@gmail.com>) wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> One reason could be that a resource-intensive test was killed by the
>>>>>>> OOM killer. You can inspect /var/log/kern.log for the related messages
>>>>>>> in your VM.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Roman
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <juha.myntti...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1
>>>>>>>> in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on
>>>>>>>> the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.
>>>>>>>>
>>>>>>>> The command I'm using:
>>>>>>>>
>>>>>>>> apache-maven-3.2.5/bin/mvn clean verify
>>>>>>>>
>>>>>>>> The output:
>>>>>>>>
>>>>>>>> [INFO] Flink : Tests ...................................... FAILURE [14:38 min]
>>>>>>>> [INFO] Flink : Streaming Scala ............................ SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HCatalog ...................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Base .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Files ......................... SKIPPED
>>>>>>>> [INFO] Flink : Table : .................................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Common ............................. SKIPPED
>>>>>>>> [INFO] Flink : Table : API Java ........................... SKIPPED
>>>>>>>> [INFO] Flink : Table : API Java bridge .................... SKIPPED
>>>>>>>> [INFO] Flink : Table : API Scala .......................... SKIPPED
>>>>>>>> [INFO] Flink : Table : API Scala bridge ................... SKIPPED
>>>>>>>> [INFO] Flink : Table : SQL Parser ......................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : ................................ SKIPPED
>>>>>>>> [INFO] Flink : Libraries : CEP ............................ SKIPPED
>>>>>>>> [INFO] Flink : Table : Planner ............................ SKIPPED
>>>>>>>> [INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Runtime Blink ...................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Planner Blink ...................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : JMX .............................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : .................................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : Json ............................. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Kafka base .................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HBase base .................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Orc .............................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : Orc nohive ....................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Avro ............................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : Parquet .......................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Csv .............................. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Hive .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : JDBC .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Twitter ....................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Nifi .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Cassandra ..................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Filesystem .................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Kafka ......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Google PubSub ................. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Kinesis ....................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Sequence file .................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Compress ......................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : SQL Orc .......................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : SQL Avro ......................... SKIPPED
>>>>>>>> [INFO] Flink : Examples : Streaming ....................... SKIPPED
>>>>>>>> [INFO] Flink : Examples : Table ........................... SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : .................. SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : Streaming Twitter SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
>>>>>>>> [INFO] Flink : Container .................................. SKIPPED
>>>>>>>> [INFO] Flink : Queryable state : Runtime .................. SKIPPED
>>>>>>>> [INFO] Flink : Mesos ...................................... SKIPPED
>>>>>>>> [INFO] Flink : Kubernetes ................................. SKIPPED
>>>>>>>> [INFO] Flink : Yarn ....................................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : Gelly .......................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : Gelly scala .................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : Gelly Examples .................
SKIPPED >>>>>>>> [INFO] Flink : External resources : ....................... SKIPPED >>>>>>>> [INFO] Flink : External resources : GPU ................... SKIPPED >>>>>>>> [INFO] Flink : Metrics : Dropwizard ....................... SKIPPED >>>>>>>> [INFO] Flink : Metrics : Graphite ......................... SKIPPED >>>>>>>> [INFO] Flink : Metrics : InfluxDB ......................... SKIPPED >>>>>>>> [INFO] Flink : Metrics : Prometheus ....................... SKIPPED >>>>>>>> [INFO] Flink : Metrics : StatsD ........................... SKIPPED >>>>>>>> [INFO] Flink : Metrics : Datadog .......................... SKIPPED >>>>>>>> [INFO] Flink : Metrics : Slf4j ............................ SKIPPED >>>>>>>> [INFO] Flink : Libraries : CEP Scala ...................... SKIPPED >>>>>>>> [INFO] Flink : Table : Uber ............................... SKIPPED >>>>>>>> [INFO] Flink : Table : Uber Blink ......................... SKIPPED >>>>>>>> [INFO] Flink : Python ..................................... SKIPPED >>>>>>>> [INFO] Flink : Table : SQL Client ......................... SKIPPED >>>>>>>> [INFO] Flink : Libraries : State processor API ............ SKIPPED >>>>>>>> [INFO] Flink : ML : ....................................... SKIPPED >>>>>>>> [INFO] Flink : ML : API ................................... SKIPPED >>>>>>>> [INFO] Flink : ML : Lib ................................... SKIPPED >>>>>>>> [INFO] Flink : ML : Uber .................................. SKIPPED >>>>>>>> [INFO] Flink : Scala shell ................................ SKIPPED >>>>>>>> [INFO] Flink : Dist ....................................... SKIPPED >>>>>>>> [INFO] Flink : Yarn Tests ................................. SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : ................................ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : CLI ............................ 
SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Parent Child classloading lib-package >>>>>>>> SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Queryable state ................ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED >>>>>>>> [INFO] Flink : Quickstart : ............................... SKIPPED >>>>>>>> [INFO] Flink : Quickstart : Java .......................... SKIPPED >>>>>>>> [INFO] Flink : Quickstart : Scala ......................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : SQL client ..................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : State evolution ................ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : RocksDB state memory control ... 
SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Common ......................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : TPCH ........................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : Python ......................... SKIPPED >>>>>>>> [INFO] Flink : E2E Tests : HBase .......................... SKIPPED >>>>>>>> [INFO] Flink : State backends : Heap spillable ............ SKIPPED >>>>>>>> [INFO] Flink : Contrib : .................................. SKIPPED >>>>>>>> [INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED >>>>>>>> [INFO] Flink : FileSystems : Tests ........................ SKIPPED >>>>>>>> [INFO] Flink : Docs ....................................... SKIPPED >>>>>>>> [INFO] Flink : Walkthrough : .............................. SKIPPED >>>>>>>> [INFO] Flink : Walkthrough : Common ....................... 
SKIPPED >>>>>>>> [INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED >>>>>>>> [INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED >>>>>>>> [INFO] >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> [INFO] BUILD FAILURE >>>>>>>> [INFO] >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> [INFO] Total time: 36:49 min >>>>>>>> [INFO] Finished at: 2020-10-19T18:24:46+03:00 >>>>>>>> [INFO] Final Memory: 179M/614M >>>>>>>> [INFO] >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> [ERROR] Failed to execute goal >>>>>>>> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test >>>>>>>> (integration-tests) on project flink-tests: There are test failures. >>>>>>>> [ERROR] >>>>>>>> [ERROR] Please refer to >>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the >>>>>>>> individual test results. >>>>>>>> [ERROR] Please refer to dump files (if any exist) [date].dump, >>>>>>>> [date]-jvmRun[N].dump and [date].dumpstream. >>>>>>>> [ERROR] ExecutionException The forked VM terminated without >>>>>>>> properly saying goodbye. VM crash or System.exit called? 
>>>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>>>> surefire_122313349068739873924160tmp
>>>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>>>> [ERROR] Process Exit Code: 137
>>>>>>>> [ERROR] Crashed tests:
>>>>>>>> [ERROR]
>>>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>>>> [ERROR]
>>>>>>>> org.apache.maven.surefire.booter.SurefireBooterForkException:
>>>>>>>> ExecutionException The forked VM terminated without properly saying
>>>>>>>> goodbye. VM crash or System.exit called?
>>>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>>>> surefire_122313349068739873924160tmp
>>>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>>>> [ERROR] Process Exit Code: 137
>>>>>>>> [ERROR] Crashed tests:
>>>>>>>> [ERROR]
>>>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
>>>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
>>>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
>>>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>>>> Method)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.lang.reflect.Method.invoke(Method.java:566)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
>>>>>>>> [ERROR] Caused by:
>>>>>>>> org.apache.maven.surefire.booter.SurefireBooterForkException: The
>>>>>>>> forked VM
>>>>>>>> terminated without properly saying goodbye. VM crash or System.exit
>>>>>>>> called?
>>>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>>>> surefire_122313349068739873924160tmp
>>>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>>>> [ERROR] Process Exit Code: 137
>>>>>>>> [ERROR] Crashed tests:
>>>>>>>> [ERROR]
>>>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>>> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>>> [ERROR] -> [Help 1]
>>>>>>>> [ERROR]
>>>>>>>> [ERROR] To see the full stack trace of the errors, re-run Maven
>>>>>>>> with the -e switch.
>>>>>>>> [ERROR] Re-run Maven using the -X switch to enable full debug
>>>>>>>> logging.
>>>>>>>> [ERROR]
>>>>>>>> [ERROR] For more information about the errors and possible
>>>>>>>> solutions, please read the following articles:
>>>>>>>> [ERROR] [Help 1]
>>>>>>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>>>>>>> [ERROR]
>>>>>>>> [ERROR] After correcting the problems, you can resume the build
>>>>>>>> with the command
>>>>>>>> [ERROR] mvn <goals> -rf :flink-tests
>>>>>>>>
>>>>>>>> The jvmdump-files look like this:
>>>>>>>>
>>>>>>>> # Created at 2020-10-19T18:14:22.869
>>>>>>>> java.io.IOException: Stream closed
>>>>>>>> at
>>>>>>>> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
>>>>>>>> at
>>>>>>>> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
>>>>>>>> at
>>>>>>>> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
>>>>>>>> at
>>>>>>>> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>>>>>>>> at
>>>>>>>> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>>>>>>>> at
>>>>>>>> java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>>>>>>>> at
>>>>>>>> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
>>>>>>>> at java.base/java.io.Reader.read(Reader.java:189)
>>>>>>>> at java.base/java.util.Scanner.readInput(Scanner.java:882)
>>>>>>>> at
>>>>>>>> java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
>>>>>>>> at
>>>>>>>> java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
>>>>>>>> at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
>>>>>>>> at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
>>>>>>>> at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
>>>>>>>> at
>>>>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>>> at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>>>
>>>>>>>>
>>>>>>>> # Created at 2020-10-19T18:14:22.870
>>>>>>>> System.exit() or native command error interrupted process checker.
>>>>>>>> java.lang.IllegalStateException: error [STOPPED] to read process
>>>>>>>> 898133
>>>>>>>> at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
>>>>>>>> at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
>>>>>>>> at
>>>>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>>> at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>>> at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>>>
>>>>>>>>
>>>>>>>> I found some JIRA tickets with "The forked VM terminated without
>>>>>>>> properly saying goodbye":
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/FLINK-18375
>>>>>>>> https://issues.apache.org/jira/browse/FLINK-2466
>>>>>>>>
>>>>>>>> I don't see how these could explain the issue I'm witnessing.
>>>>>>>>
>>>>>>>> I wonder if the issue is related to the VM running "too hot". 'top'
>>>>>>>> shows very high load averages.
>>>>>>>>
>>>>>>>> The crash can be reproduced.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Juha
>>>>>>>>
>>>>>>>>
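A side note on the "Process Exit Code: 137" line in the quoted output: by shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9, SIGKILL. That is consistent with the forked JVM being killed from outside (for example by the kernel OOM killer) rather than exiting on its own, although the exit code alone does not prove which. A minimal sketch of the decoding, assuming a POSIX system; the helper name describe_exit_code is hypothetical and not part of Maven or Surefire:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status such as the 137 reported by Surefire.

    Assumption: the status follows the common shell convention of
    128 + signal number for processes terminated by a signal.
    """
    if code > 128:
        signum = code - 128
        try:
            # Map the number to its symbolic name, e.g. 9 -> SIGKILL on POSIX.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

If the result is SIGKILL, the kernel log on the build machine is the place to check whether the OOM killer was responsible.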