I ran the tests again, this time with four cores (previously five)
and 12 GB of RAM (previously 8 GB). The OOM killer still hits me.

The command I'm running is:

mvn -Dflink.forkCount=1 -Dflink.forkCountTestPackage=1 clean verify

[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 01:17 h
[INFO] Finished at: 2020-10-23T15:36:50+03:00
[INFO] Final Memory: 180M/614M
[INFO]
------------------------------------------------------------------------
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test
(integration-tests) on project flink-tests: There are test failures.
[ERROR]
[ERROR] Please refer to
/home/juha/git/flink/flink-tests/target/surefire-reports for the individual
test results.
[ERROR] Please refer to dump files (if any exist) [date].dump,
[date]-jvmRun[N].dump and [date].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
&& /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
-Dmvn.forkNumber=1 -XX:+UseG1GC -jar
/home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar
/home/juha/git/flink/flink-tests/target/surefire
2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp
surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR]
org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
ExecutionException The forked VM terminated without properly saying
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
&& /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
-Dmvn.forkNumber=1 -XX:+UseG1GC -jar
/home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar
/home/juha/git/flink/flink-tests/target/surefire
2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp
surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR]
org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
[ERROR] at
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
[ERROR] at
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
[ERROR] at
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
[ERROR] at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
[ERROR] at
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
[ERROR] at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
[ERROR] at
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
[ERROR] at
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
[ERROR] at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
[ERROR] at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
[ERROR] at
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by:
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM
terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
&& /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
-Dmvn.forkNumber=1 -XX:+UseG1GC -jar
/home/juha/git/flink/flink-tests/target/surefire/surefirebooter15842756015305201470.jar
/home/juha/git/flink/flink-tests/target/surefire
2020-10-23T14-19-18_685-jvmRun1 surefire394592676817174474tmp
surefire_117413817767116882164827tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 137
[ERROR] Crashed tests:
[ERROR]
org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
[ERROR] at
org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
[ERROR] at
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[ERROR] at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[ERROR] at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn <goals> -rf :flink-tests

With both fork counts set to 1, only the parent Maven JVM and one forked
test JVM should be running on the VM, so there should be plenty of RAM
available.

From /var/log/kern.log:

Oct 23 15:26:42 ubuntu kernel: [23021.120464] Tasks state (memory values in pages):
Oct 23 15:26:42 ubuntu kernel: [23021.120464] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
....
Oct 23 15:26:42 ubuntu kernel: [23021.120574] [ 460994]  1000 460994  3319485  2440960 22024192        0             0 java
Oct 23 15:26:42 ubuntu kernel: [23021.120575] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=java,pid=460994,uid=1000
Oct 23 15:26:42 ubuntu kernel: [23021.120669] Out of memory: Killed process 460994 (java) total-vm:13277940kB, anon-rss:9763848kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21508kB oom_score_adj:0
Oct 23 15:26:42 ubuntu kernel: [23021.406205] oom_reaper: reaped process 460994 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

It seems very odd to me that a process started with -Xmx2048m takes
13277940 kB of virtual memory and 9763848 kB of anon-rss. Or maybe I'm
misreading something.
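
To double-check my reading of the kernel log, here is a rough sketch of the
arithmetic in Python. The numbers are copied from the log lines for pid 460994
and from the -Xmx2048m flag in the surefire command. The 4 KiB page size is an
assumption on my part (the log does not state it), and the variable names are
just mine.

```python
# Sketch: convert the kernel's per-task page counts to kB and compare
# the resident size with the JVM heap cap.
# Assumption (not stated in the log): pages are 4 KiB, the usual x86-64 default.
PAGE_SIZE_KB = 4

# From the "Tasks state" row for pid 460994 (values in pages).
total_vm_pages = 3319485
rss_pages = 2440960

# From the "Out of memory: Killed process 460994" line (values in kB).
total_vm_kb = 13277940
anon_rss_kb = 9763848

# Heap cap from the surefire command line (-Xmx2048m), expressed in kB.
heap_cap_kb = 2048 * 1024


def kb_to_gib(kb: int) -> float:
    """Convert kB as reported by the kernel (1 kB = 1024 bytes) to GiB."""
    return kb / (1024 * 1024)


# Page counts converted to kB, for comparison with the "Killed process" line.
print("total_vm from pages:", total_vm_pages * PAGE_SIZE_KB, "kB")
print("rss from pages:     ", rss_pages * PAGE_SIZE_KB, "kB")

# Resident size versus the configured maximum heap.
print(f"anon-rss: {kb_to_gib(anon_rss_kb):.2f} GiB")
print(f"heap cap: {kb_to_gib(heap_cap_kb):.2f} GiB")
print(f"resident beyond heap cap: {kb_to_gib(anon_rss_kb - heap_cap_kb):.2f} GiB")
```

Under that page-size assumption the page counts agree with the kB figures, so
I don't think I'm misreading the units. Note that -Xmx only caps the Java heap;
thread stacks, direct buffers and memory allocated by native code are outside
that limit. The log alone does not show which of these accounts for the
resident memory above 2 GiB.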

Regards,
Juha

On Wed, Oct 21, 2020 at 12:54 Juha Mynttinen <juha.myntti...@gmail.com>
wrote:

> Hmm
>
> Even when setting the fork counts to 1, things fail.
>
> I wonder why there seem to be five of these JVM crashes. There should be
> only one JVM at a time. And shouldn't Maven fail after the first failure?
>
> ~/apache-maven-3.2.5/bin/mvn -Dflink.forkCount=1
> -Dflink.forkCountTestPackage=1 clean verify
>
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 01:13 h
> [INFO] Finished at: 2020-10-21T12:26:16+03:00
> [INFO] Final Memory: 205M/704M
> [INFO]
> ------------------------------------------------------------------------
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test
> (integration-tests) on project flink-tests: There are test failures.
> [ERROR]
> [ERROR] Please refer to
> /home/juha/git/flink/flink-tests/target/surefire-reports for the individual
> test results.
> [ERROR] Please refer to dump files (if any exist) [date].dump,
> [date]-jvmRun[N].dump and [date].dumpstream.
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp
> surefire_11744637775482284170691tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR]
> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp
> surefire_11923880479826081497266tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
> ExecutionException The forked VM terminated without properly saying
> goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter1427858994096305293.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire10960672237393257691tmp
> surefire_11744637775482284170691tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR]
> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
> [ERROR] ExecutionException The forked VM terminated without properly
> saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp
> surefire_11923880479826081497266tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
> [ERROR] at
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
> [ERROR] at
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
> [ERROR] at
> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
> [ERROR] at
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
> [ERROR] at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
> [ERROR] at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
> [ERROR] at
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
> [ERROR] at
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
> [ERROR] at
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
> [ERROR] at
> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> [ERROR] at
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
> [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
> [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
> [ERROR] at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> [ERROR] at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [ERROR] at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
> [ERROR] at
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
> [ERROR] Caused by:
> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM
> terminated without properly saying goodbye. VM crash or System.exit called?
> [ERROR] Command was /bin/sh -c cd /home/juha/git/flink/flink-tests/target
> && /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms2048m -Xmx2048m
> -Dmvn.forkNumber=1 -XX:+UseG1GC -jar
> /home/juha/git/flink/flink-tests/target/surefire/surefirebooter10864064660296194510.jar
> /home/juha/git/flink/flink-tests/target/surefire
> 2020-10-21T11-13-24_791-jvmRun1 surefire4935566802795739306tmp
> surefire_11923880479826081497266tmp
> [ERROR] Error occurred in starting fork, check output in log
> [ERROR] Process Exit Code: 137
> [ERROR] Crashed tests:
> [ERROR] org.apache.flink.test.checkpointing.LocalRecoveryITCase
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
> [ERROR] at
> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
> [ERROR] at
> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> [ERROR] at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [ERROR] at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn <goals> -rf :flink-tests
>
>
>
> flink-tests/target/surefire-reports/2020-10-21T11-13-24_791-jvmRun1.dump
>
> # Created at 2020-10-21T12:03:51.559
> java.io.IOException: Stream closed
>         at
> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
>         at
> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
>         at
> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
>         at
> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>         at
> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>         at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>         at
> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
>         at java.base/java.io.Reader.read(Reader.java:189)
>         at java.base/java.util.Scanner.readInput(Scanner.java:882)
>         at java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
>         at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
>         at
> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
>         at
> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
>         at
> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
>         at
> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>         at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>         at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>         at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>         at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
>
>
> # Created at 2020-10-21T12:03:51.560
> System.exit() or native command error interrupted process checker.
> java.lang.IllegalStateException: error [STOPPED] to read process 935338
>         at
> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
>         at
> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
>         at
> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>         at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>         at
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>         at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>         at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:834)
>
>
>
> sudo less -n /var/log/kern.log
> ......
> Oct 21 12:21:57 ubuntu kernel: [24024.569633]
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service
> ,task=java,pid=1220764,uid=1000
> Oct 21 12:21:57 ubuntu kernel: [24024.569804] Out of memory: Killed
> process 1220764 (java) total-vm:8514092kB, anon-rss:4116292kB,
> file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:9136kB oom_score_adj:0
> Oct 21 12:21:57 ubuntu kernel: [24024.685821] oom_reaper: reaped process
> 1220764 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> Regards,
> Juha
>
> On Wed, Oct 21, 2020 at 10:04 Juha Mynttinen <
> juha.myntti...@gmail.com> wrote:
>
>> Hi,
>>
>> You're right. I also thought about this after writing the last comment:
>> on Linux, for example, the kernel overcommits memory allocations by
>> default, so this approach doesn't work (it doesn't make the JVM crash
>> right when it starts).
>>
>> I dug a little deeper. It seems that for CI environments there are
>> specific compilation scripts such as
>> https://github.com/apache/flink/blob/master/tools/ci/compile.sh#L45 that
>> explicitly set flink.forkCount and flink.forkCountTestPackage to lower than
>> (?) default values. But for anybody compiling Flink locally, mvn uses the
>> default values, which might not work, as in my case.
>>
>> I think a good goal would be that a developer can just git clone Flink
>> and build it following simple instructions. Preferably there would be zero
>> setup needed, just a simple command to run. The current situation is that
>> building Flink is "simple", just run a specific mvn command. This
>> simplicity comes at the price that things can break in unexpected ways:
>>
>> 1) There are things that building Flink expects but doesn't check
>> (https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html#build-flink):
>>  * The correct Maven version
>>  * A suitable Java version
>> 2) There's this issue with the count of CPU cores vs available memory.
>>
>> Case 1) is documented, case 2) is not.
>>
>> Fix options
>>
>> a)
>>
>> Document case 2) and explain how to set flink.forkCountTestPackage (if
>> needed). Something like "Flink tests are run on parallel JVMs, each taking
>> 2GB of RAM. There are by default as many JVMs as there are physical cores.
>> If your machine doesn't have at least 2GB of RAM per core,
>> the tests can fail. You can set the count of JVMs using the Maven property
>> flink.forkCountTestPackage to a lower value".
>>
>> b)
>>
>> Create a Linux-specific Maven wrapper script for local execution too. The
>> wrapper script could download the correct Maven version, check the Java
>> version, calculate the max number of forks etc. A quick way to calculate
>> the max fork count (MemTotal is reported in kB, and 2097152 kB = 2 GB):
>>
>> awk '/^MemTotal:/ {print int($2 / 2097152)}' /proc/meminfo
>>
>> Regards,
>> Juha
>>
>>
>>
>>
>>
>> On Tue, Oct 20, 2020 at 21:23 Khachatryan Roman <
>> khachatryan.ro...@gmail.com> wrote:
>>
>>> I think you are right and I like the idea of failing the build fast.
>>> However, when trying this approach on my local machine it didn't help:
>>> the build didn't crash (probably, because of overcommit).
>>> Did you try this approach in your VM?
>>>
>>> Regards,
>>> Roman
>>>
>>>
>>> On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <
>>> juha.myntti...@gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> > Currently, tests do not run in parallel
>>>>
>>>> I don't think this is true, at least not 100%. In 'top' it's clearly
>>>> visible that there are multiple JVMs. If not running tests in parallel,
>>>> what are these doing? In the main pom.xml there's configuration for the
>>>> plug-in 'maven-surefire-plugin'.
>>>>
>>>> I'm not a Maven expert, but it looks to me like this: in
>>>> https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>>>> it says "The other possibility for parallel test execution is setting
>>>> the parameter forkCount to a value higher than 1". I think that's
>>>> happening in Flink:
>>>>
>>>> <forkCount>${flink.forkCount}</forkCount>
>>>>
>>>> And
>>>>
>>>> <flink.forkCount>1C</flink.forkCount>
>>>>
>>>> This means there's gonna be 1 * count_of_cpus forks.
>>>>
>>>> And this one:
>>>>
>>>> <argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber}
>>>> -XX:+UseG1GC</argLine>
>>>>
>>>> In my case, I have 5 CPUs, so 5 forks. I think what happens now is that
>>>> since each fork gets a max heap of 2048 MB, there's effectively a memory
>>>> requirement of CPU count * 2048 MB. In my case, I have 8 GB of memory,
>>>> which is less than the maximum of 5 * 2048 MB.
>>>>
>>>> This could be better. I think a computer with RAM < count_of_cpus *
>>>> 2048 MB is a completely valid setup: take e.g. an AMD Ryzen 3900X with
>>>> 12 cores and put 16 GB of RAM in it. At least the memory & CPU
>>>> requirements should be documented?
>>>>
>>>> If the tests really need 2GB of heap, then maybe the forkCount should
>>>> be based on the available RAM rather than available cores, e.g. floor(RAM /
>>>> 2GB)? I don't know if that's doable in Maven.
>>>>
>>>> I think an easy and non-intrusive improvement would be to change
>>>> '-Xms256m' to '-Xms2048m' (ms to match mx) so that the JVM would allocate
>>>> 2048 MB right away (when it starts). If there's not enough memory, the
>>>> tests would fail immediately (the JVM couldn't start). The tests would
>>>> probably fail anyway (my case), so it's better to fail fast.
>>>>
>>>> Regards,
>>>> Juha
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Oct 20, 2020 at 11:16 Khachatryan Roman <
>>>> khachatryan.ro...@gmail.com> wrote:
>>>>
>>>>> Thanks for sharing this,
>>>>> I think the activity of the OOM killer means high memory pressure (it
>>>>> just kills the process with the highest memory consumption score).
>>>>> High CPU usage can only be a consequence of it, due to constant GC.
>>>>>
>>>>> Currently, tests do not run in parallel, but high memory usage can be
>>>>> caused by the nature of the test (e.g. running Flink with high
>>>>> parallelism). So I think the best way to deal with this is to use a VM
>>>>> with more memory.
>>>>>
>>>>> Regards,
>>>>> Roman
>>>>>
>>>>>
>>>>> On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <
>>>>> juha.myntti...@gmail.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> Good hint about /var/log/kern.log. This time I can see this:
>>>>>>
>>>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651551]
>>>>>> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service
>>>>>> ,task=java,pid=270024,uid=1000
>>>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed
>>>>>> process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, 
>>>>>> file-rss:0kB,
>>>>>> shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
>>>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped
>>>>>> process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>>>>>>
>>>>>> The next question is why this happens. I'll try to dig deeper.
>>>>>>
>>>>>> About the CPU load: I have five CPUs. Theoretically it makes sense to
>>>>>> run five tests at a time to max out the CPUs. However, when I look at
>>>>>> what the five Java processes (that mvn forks) are doing, it can be seen
>>>>>> that each of those processes has a large number of threads wanting to
>>>>>> use CPU. Here's an example from 'top -H':
>>>>>>
>>>>>> top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
>>>>>> Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
>>>>>> %Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
>>>>>> MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
>>>>>> MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem
>>>>>>
>>>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>>>  254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre
>>>>>>  255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java
>>>>>>  254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java
>>>>>>  255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java
>>>>>>  255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java
>>>>>>  254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre
>>>>>>  253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre
>>>>>>  254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java
>>>>>>  254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java
>>>>>>  254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre
>>>>>>  255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre
>>>>>>  255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre
>>>>>>  255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre
>>>>>>  254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre
>>>>>>  253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java
>>>>>>  255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre
>>>>>>  255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre
>>>>>>  254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre
>>>>>>  254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre
>>>>>>  255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre
>>>>>>  253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre
>>>>>>  255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre
>>>>>>  254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre
>>>>>>  254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre
>>>>>>  254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre
>>>>>>  254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto
>>>>>>
>>>>>> It can be seen that the JIT-related threads consume quite a lot of
>>>>>> CPU, essentially leaving less CPU available to the actual test code. By
>>>>>> using htop I can also see the garbage-collection-related threads eating
>>>>>> CPU. This doesn't seem right. I think it'd make sense to run the tests
>>>>>> with less parallelism to better utilize the CPUs. Having far more
>>>>>> threads wanting CPU than there are CPUs slows things down rather than
>>>>>> speeding them up.
>>>>>>
>>>>>> However, AFAIK high CPU load shouldn't trigger the OOM killer?
>>>>>>
>>>>>> Regards,
>>>>>> Juha
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 19, 2020 at 20:48 Khachatryan Roman <
>>>>>> khachatryan.ro...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> One reason could be that a resource-intensive test was killed by the
>>>>>>> OOM killer. You can inspect /var/log/kern.log in your VM for the
>>>>>>> related messages.
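One way to do that inspection, as a minimal sketch: scan the log lines for the kernel's OOM-kill report. The "Killed process <pid> (<name>)" wording is an assumption here, since the exact message differs between kernel versions, and the two sample lines are invented for illustration rather than taken from a real log.

```python
import re

# Assumed shape of the kernel's OOM-kill report; wording varies by kernel version.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a (pid, command) pair for every line that reports an OOM kill."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Invented sample lines, for illustration only.
SAMPLE = [
    "Oct 19 18:14:22 vm kernel: Out of memory: Killed process 4242 (java) total-vm:1000kB",
    "Oct 19 18:14:23 vm kernel: usb 1-1: new high-speed USB device",
]

print(find_oom_kills(SAMPLE))
```

On the real machine the same function could be fed the file itself, e.g. `find_oom_kills(open("/var/log/kern.log"))`, provided the user is allowed to read it.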
>>>>>>>
>>>>>>> Regards,
>>>>>>> Roman
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <
>>>>>>> juha.myntti...@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hey,
>>>>>>>>
>>>>>>>> I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1
>>>>>>>> in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on
>>>>>>>> the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.
>>>>>>>>
>>>>>>>> The command I'm using:
>>>>>>>>
>>>>>>>> apache-maven-3.2.5/bin/mvn clean verify
>>>>>>>>
>>>>>>>> The output:
>>>>>>>>
>>>>>>>> [INFO] Flink : Tests ...................................... FAILURE [14:38 min]
>>>>>>>> [INFO] Flink : Streaming Scala ............................ SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HCatalog ...................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Base .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Files ......................... SKIPPED
>>>>>>>> [INFO] Flink : Table : .................................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Common ............................. SKIPPED
>>>>>>>> [INFO] Flink : Table : API Java ........................... SKIPPED
>>>>>>>> [INFO] Flink : Table : API Java bridge .................... SKIPPED
>>>>>>>> [INFO] Flink : Table : API Scala .......................... SKIPPED
>>>>>>>> [INFO] Flink : Table : API Scala bridge ................... SKIPPED
>>>>>>>> [INFO] Flink : Table : SQL Parser ......................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : ................................ SKIPPED
>>>>>>>> [INFO] Flink : Libraries : CEP ............................ SKIPPED
>>>>>>>> [INFO] Flink : Table : Planner ............................ SKIPPED
>>>>>>>> [INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Runtime Blink ...................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Planner Blink ...................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : JMX .............................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : .................................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : Json ............................. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Kafka base .................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HBase base .................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Orc .............................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : Orc nohive ....................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Avro ............................. SKIPPED
>>>>>>>> [INFO] Flink : Formats : Parquet .......................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Csv .............................. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Hive .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : JDBC .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Twitter ....................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Nifi .......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Cassandra ..................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Filesystem .................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Kafka ......................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Google PubSub ................. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : Kinesis ....................... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
>>>>>>>> [INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Sequence file .................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : Compress ......................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : SQL Orc .......................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
>>>>>>>> [INFO] Flink : Formats : SQL Avro ......................... SKIPPED
>>>>>>>> [INFO] Flink : Examples : Streaming ....................... SKIPPED
>>>>>>>> [INFO] Flink : Examples : Table ........................... SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : .................. SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED
>>>>>>>> [INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED
>>>>>>>> [INFO] Flink : Container .................................. SKIPPED
>>>>>>>> [INFO] Flink : Queryable state : Runtime .................. SKIPPED
>>>>>>>> [INFO] Flink : Mesos ...................................... SKIPPED
>>>>>>>> [INFO] Flink : Kubernetes ................................. SKIPPED
>>>>>>>> [INFO] Flink : Yarn ....................................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : Gelly .......................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : Gelly scala .................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
>>>>>>>> [INFO] Flink : External resources : ....................... SKIPPED
>>>>>>>> [INFO] Flink : External resources : GPU ................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : Graphite ......................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : Prometheus ....................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : StatsD ........................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : Datadog .......................... SKIPPED
>>>>>>>> [INFO] Flink : Metrics : Slf4j ............................ SKIPPED
>>>>>>>> [INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Uber ............................... SKIPPED
>>>>>>>> [INFO] Flink : Table : Uber Blink ......................... SKIPPED
>>>>>>>> [INFO] Flink : Python ..................................... SKIPPED
>>>>>>>> [INFO] Flink : Table : SQL Client ......................... SKIPPED
>>>>>>>> [INFO] Flink : Libraries : State processor API ............ SKIPPED
>>>>>>>> [INFO] Flink : ML : ....................................... SKIPPED
>>>>>>>> [INFO] Flink : ML : API ................................... SKIPPED
>>>>>>>> [INFO] Flink : ML : Lib ................................... SKIPPED
>>>>>>>> [INFO] Flink : ML : Uber .................................. SKIPPED
>>>>>>>> [INFO] Flink : Scala shell ................................ SKIPPED
>>>>>>>> [INFO] Flink : Dist ....................................... SKIPPED
>>>>>>>> [INFO] Flink : Yarn Tests ................................. SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : ................................ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : CLI ............................ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
>>>>>>>> [INFO] Flink : Quickstart : ............................... SKIPPED
>>>>>>>> [INFO] Flink : Quickstart : Java .......................... SKIPPED
>>>>>>>> [INFO] Flink : Quickstart : Scala ......................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : State evolution ................ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Common ......................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : Python ......................... SKIPPED
>>>>>>>> [INFO] Flink : E2E Tests : HBase .......................... SKIPPED
>>>>>>>> [INFO] Flink : State backends : Heap spillable ............ SKIPPED
>>>>>>>> [INFO] Flink : Contrib : .................................. SKIPPED
>>>>>>>> [INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
>>>>>>>> [INFO] Flink : FileSystems : Tests ........................ SKIPPED
>>>>>>>> [INFO] Flink : Docs ....................................... SKIPPED
>>>>>>>> [INFO] Flink : Walkthrough : .............................. SKIPPED
>>>>>>>> [INFO] Flink : Walkthrough : Common ....................... SKIPPED
>>>>>>>> [INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
>>>>>>>> [INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
>>>>>>>> [INFO]
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> [INFO] BUILD FAILURE
>>>>>>>> [INFO]
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> [INFO] Total time: 36:49 min
>>>>>>>> [INFO] Finished at: 2020-10-19T18:24:46+03:00
>>>>>>>> [INFO] Final Memory: 179M/614M
>>>>>>>> [INFO]
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> [ERROR] Failed to execute goal
>>>>>>>> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test
>>>>>>>> (integration-tests) on project flink-tests: There are test failures.
>>>>>>>> [ERROR]
>>>>>>>> [ERROR] Please refer to
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the
>>>>>>>> individual test results.
>>>>>>>> [ERROR] Please refer to dump files (if any exist) [date].dump,
>>>>>>>> [date]-jvmRun[N].dump and [date].dumpstream.
>>>>>>>> [ERROR] ExecutionException The forked VM terminated without
>>>>>>>> properly saying goodbye. VM crash or System.exit called?
>>>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>>>> surefire_122313349068739873924160tmp
>>>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>>>> [ERROR] Process Exit Code: 137
>>>>>>>> [ERROR] Crashed tests:
>>>>>>>> [ERROR]
>>>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>>>> [ERROR]
>>>>>>>> org.apache.maven.surefire.booter.SurefireBooterForkException:
>>>>>>>> ExecutionException The forked VM terminated without properly saying
>>>>>>>> goodbye. VM crash or System.exit called?
>>>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>>>> surefire_122313349068739873924160tmp
>>>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>>>> [ERROR] Process Exit Code: 137
>>>>>>>> [ERROR] Crashed tests:
>>>>>>>> [ERROR]
>>>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
>>>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
>>>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
>>>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>>>> Method)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.lang.reflect.Method.invoke(Method.java:566)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
>>>>>>>> [ERROR] at
>>>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
>>>>>>>> [ERROR] Caused by:
>>>>>>>> org.apache.maven.surefire.booter.SurefireBooterForkException: The
>>>>>>>> forked VM terminated without properly saying goodbye. VM crash or
>>>>>>>> System.exit called?
>>>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>>>> surefire_122313349068739873924160tmp
>>>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>>>> [ERROR] Process Exit Code: 137
>>>>>>>> [ERROR] Crashed tests:
>>>>>>>> [ERROR]
>>>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
>>>>>>>> [ERROR] at
>>>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>>> [ERROR] at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>>> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>>> [ERROR] -> [Help 1]
>>>>>>>> [ERROR]
>>>>>>>> [ERROR] To see the full stack trace of the errors, re-run Maven
>>>>>>>> with the -e switch.
>>>>>>>> [ERROR] Re-run Maven using the -X switch to enable full debug
>>>>>>>> logging.
>>>>>>>> [ERROR]
>>>>>>>> [ERROR] For more information about the errors and possible
>>>>>>>> solutions, please read the following articles:
>>>>>>>> [ERROR] [Help 1]
>>>>>>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>>>>>>> [ERROR]
>>>>>>>> [ERROR] After correcting the problems, you can resume the build
>>>>>>>> with the command
>>>>>>>> [ERROR]   mvn <goals> -rf :flink-tests
>>>>>>>>
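A note on the "Process Exit Code: 137" in the output above: by the usual POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status - 128). The sketch below only does that arithmetic. The helper name is made up for illustration, and the signal name lookup depends on the platform Python runs on.

```python
import signal


def describe_exit_code(code):
    """Interpret a shell-style exit status (above 128 means killed by a signal)."""
    if code > 128:
        signum = code - 128
        try:
            # Signal names are platform dependent; fall back if the number is unknown.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))
```

137 - 128 = 9, which on Linux is SIGKILL. That is consistent with the forked JVM being killed from outside (for example by the OOM killer) rather than crashing or calling System.exit, although SIGKILL alone does not say who sent it.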
>>>>>>>> The jvmdump-files look like this:
>>>>>>>>
>>>>>>>> # Created at 2020-10-19T18:14:22.869
>>>>>>>> java.io.IOException: Stream closed
>>>>>>>>         at
>>>>>>>> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
>>>>>>>>         at
>>>>>>>> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
>>>>>>>>         at
>>>>>>>> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
>>>>>>>>         at
>>>>>>>> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>>>>>>>>         at
>>>>>>>> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>>>>>>>>         at
>>>>>>>> java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>>>>>>>>         at
>>>>>>>> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
>>>>>>>>         at java.base/java.io.Reader.read(Reader.java:189)
>>>>>>>>         at java.base/java.util.Scanner.readInput(Scanner.java:882)
>>>>>>>>         at
>>>>>>>> java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
>>>>>>>>         at
>>>>>>>> java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
>>>>>>>>         at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
>>>>>>>>         at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
>>>>>>>>         at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
>>>>>>>>         at
>>>>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>>>         at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>>>
>>>>>>>>
>>>>>>>> # Created at 2020-10-19T18:14:22.870
>>>>>>>> System.exit() or native command error interrupted process checker.
>>>>>>>> java.lang.IllegalStateException: error [STOPPED] to read process
>>>>>>>> 898133
>>>>>>>>         at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
>>>>>>>>         at
>>>>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
>>>>>>>>         at
>>>>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>>>         at
>>>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>>>         at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>>>
>>>>>>>>
>>>>>>>> I found some JIRA tickets mentioning "The forked VM terminated
>>>>>>>> without properly saying goodbye":
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/FLINK-18375
>>>>>>>> https://issues.apache.org/jira/browse/FLINK-2466
>>>>>>>>
>>>>>>>> I don't see how these could explain the issue I'm witnessing.
>>>>>>>>
>>>>>>>> I wonder if the issue is related to the VM running "too hot". 'top'
>>>>>>>> shows very high load averages.
>>>>>>>>
>>>>>>>> The crash can be reproduced.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Juha
>>>>>>>>
>>>>>>>>
