I think you are right, and I like the idea of failing the build fast.
However, when I tried this approach on my local machine it didn't help: the
build didn't crash (probably because of memory overcommit).
Did you try this approach in your VM?
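For what it's worth, a quick way to check whether overcommit could be masking
the early allocation failure (a sketch, assuming a Linux guest):

```shell
# Check the kernel's memory overcommit policy. Under the default
# heuristic mode (0) or always-overcommit (1), a large -Xms reservation
# can succeed even when RAM + swap cannot back it, so the JVM starts
# fine and only dies later under the OOM killer. Strict accounting (2)
# would make the JVM fail at startup instead.
overcommit=$(cat /proc/sys/vm/overcommit_memory)
echo "vm.overcommit_memory = $overcommit"
```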

Regards,
Roman


On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <juha.myntti...@gmail.com>
wrote:

> Hey,
>
> > Currently, tests do not run in parallel
>
> I don't think this is true, at least not 100%. In 'top' it's clearly visible
> that there are multiple JVMs. If the tests aren't running in parallel, what
> are these JVMs doing? In the main pom.xml there's configuration for the
> 'maven-surefire-plugin' plug-in.
>
> I'm not a Maven expert, but it looks to me like this: in
> https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
> it says "The other possibility for parallel test execution is setting the
> parameter forkCount to a value higher than 1". I think that's happening
> in Flink:
>
> <forkCount>${flink.forkCount}</forkCount>
>
> And
>
> <flink.forkCount>1C</flink.forkCount>
>
> This means there will be 1 * count_of_cpus forks.
>
> And this one:
>
> <argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber}
> -XX:+UseG1GC</argLine>
>
> In my case, I have 5 CPUs, so 5 forks. I think what happens now is that
> since each fork gets a max heap of 2048m, there's effectively a memory
> requirement of count_of_cpus * 2048m. In my case, I have 8GB of memory,
> which is less than the maximum of 5 * 2048mb.
>
> This could be better. A computer with RAM < count_of_cpus * 2048mb is
> completely valid - take e.g. an AMD Ryzen 3900X with 12 cores and put 16 GB
> of RAM in it. At the very least, the memory and CPU requirements should be
> documented.
>
> If the tests really need 2GB of heap, then maybe the forkCount should be
> based on the available RAM rather than the available cores, e.g. floor(RAM /
> 2GB)? I don't know if that's doable in Maven.
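Something like this could at least be scripted around the build (a sketch, not
existing Flink tooling; assumes Linux and the 2048m max heap from the pom):

```shell
# Compute a memory-aware fork count: min(CPU count, floor(RAM / 2 GiB)),
# assuming each surefire fork may grow to its 2048m max heap.
MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
CPU_FORKS=$(nproc)
MEM_FORKS=$(( MEM_KB / (2 * 1024 * 1024) ))
FORKS=$(( MEM_FORKS < CPU_FORKS ? MEM_FORKS : CPU_FORKS ))
if [ "$FORKS" -lt 1 ]; then FORKS=1; fi
echo "$FORKS"
# flink.forkCount is a Maven property, so it should be overridable:
#   mvn clean verify -Dflink.forkCount="$FORKS"
```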
>
> I think an easy and non-intrusive improvement would be to change
> '-Xms256m' to '-Xms2048m' (so -Xms matches -Xmx), making the JVM allocate
> the full 2048mb right away at startup. If there's not enough memory, the
> tests would fail immediately (the JVM couldn't start). The tests would
> probably fail anyway (as in my case) - better to fail fast.
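In the pom, the suggested change would look roughly like this (a sketch; the
other flags stay as quoted above):

```xml
<!-- Suggested change (sketch): make -Xms match -Xmx so each fork
     reserves its full heap at startup and fails fast if memory is
     short, instead of being OOM-killed mid-run. -->
<argLine>-Xms2048m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber} -XX:+UseG1GC</argLine>
```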
>
> Regards,
> Juha
>
> El mar., 20 oct. 2020 a las 11:16, Khachatryan Roman (<
> khachatryan.ro...@gmail.com>) escribió:
>
>> Thanks for sharing this,
>> I think the activity of the OOM killer indicates high memory pressure (it
>> kills the process with the highest memory-consumption score).
>> High CPU usage can only be a consequence of that, namely constant GC.
>>
>> Currently, tests do not run in parallel, but high memory usage can be
>> caused by the nature of the test (e.g. running Flink with high parallelism).
>> So I think the best way to deal with this is to use a VM with more memory.
>>
>> Regards,
>> Roman
>>
>>
>> On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <juha.myntti...@gmail.com>
>> wrote:
>>
>>> Hey,
>>>
>>> Good hint about /var/log/kern.log. This time I can see this:
>>>
>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651551]
>>> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service
>>> ,task=java,pid=270024,uid=1000
>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed
>>> process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB,
>>> shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process
>>> 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>>>
>>> The next question is why this happens... I'll try to dig deeper.
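One way to dig further (a sketch, assuming Ubuntu's default kernel log
locations; 'dmesg' shows the same messages without the log file):

```shell
# Pull OOM-killer events out of the kernel log; each kill records the
# victim's PID and resident set size, which shows how big the fork had
# grown before being killed.
PATTERN='oom-kill|Out of memory|oom_reaper'
for log in /var/log/kern.log /var/log/syslog; do
    [ -r "$log" ] && grep -iE "$PATTERN" "$log"
done
true  # grep may legitimately match nothing; don't treat that as failure
```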
>>>
>>> About the CPU load. I have five CPUs. Theoretically it makes sense to
>>> run five tests at a time to max out the CPUs. However, when I look at what
>>> the five Java processes (that Maven forks) are doing, I can see that each
>>> of those processes has a large number of threads wanting to use the CPU.
>>> Here's an example from 'top -H':
>>>
>>>   top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
>>> Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
>>> %Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
>>> MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5 buff/cache
>>> MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail Mem
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>  254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41 C2 CompilerThre
>>>  255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78 java
>>>  254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16 java
>>>  255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90 java
>>>  255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78 java
>>>  254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26 C2 CompilerThre
>>>  253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47 C2 CompilerThre
>>>  254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76 java
>>>  254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67 java
>>>  254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82 C2 CompilerThre
>>>  255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51 C2 CompilerThre
>>>  255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62 C2 CompilerThre
>>>  255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47 C2 CompilerThre
>>>  254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76 C2 CompilerThre
>>>  253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63 java
>>>  255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39 C1 CompilerThre
>>>  255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37 C1 CompilerThre
>>>  254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22 C2 CompilerThre
>>>  254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30 C1 CompilerThre
>>>  255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42 C1 CompilerThre
>>>  253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10 C1 CompilerThre
>>>  255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21 C2 CompilerThre
>>>  254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62 C1 CompilerThre
>>>  254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45 C1 CompilerThre
>>>  254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64 C1 CompilerThre
>>>  254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15 flink-akka.acto
>>>
>>>
>>> It can be seen that the JIT-related threads consume quite a lot of CPU,
>>> essentially leaving less CPU available to the actual test code. Using htop
>>> I can also see garbage collection threads eating CPU. This doesn't seem
>>> right. I think it'd make sense to run the tests with less parallelism to
>>> better utilize the CPUs. Having far more runnable threads than cores slows
>>> things down rather than speeding them up.
>>>
>>> However, AFAIK a high CPU load shouldn't trigger the OOM killer?
>>>
>>> Regards,
>>> Juha
>>>
>>>
>>>
>>>
>>> El lun., 19 oct. 2020 a las 20:48, Khachatryan Roman (<
>>> khachatryan.ro...@gmail.com>) escribió:
>>>
>>>> Hey,
>>>>
>>>> One reason could be that a resource-intensive test was killed by the OOM
>>>> killer. You can inspect /var/log/kern.log for the related messages in
>>>> your VM.
>>>>
>>>> Regards,
>>>> Roman
>>>>
>>>>
>>>> On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <
>>>> juha.myntti...@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hey,
>>>>>
>>>>> I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in
>>>>> a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the
>>>>> master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.
>>>>>
>>>>> The command I'm using:
>>>>>
>>>>> apache-maven-3.2.5/bin/mvn clean verify
>>>>>
>>>>> The output:
>>>>>
>>>>> [INFO] Flink : Tests ...................................... FAILURE
>>>>> [14:38 min]
>>>>> [INFO] Flink : Streaming Scala ............................ SKIPPED
>>>>> [INFO] Flink : Connectors : HCatalog ...................... SKIPPED
>>>>> [INFO] Flink : Connectors : Base .......................... SKIPPED
>>>>> [INFO] Flink : Connectors : Files ......................... SKIPPED
>>>>> [INFO] Flink : Table : .................................... SKIPPED
>>>>> [INFO] Flink : Table : Common ............................. SKIPPED
>>>>> [INFO] Flink : Table : API Java ........................... SKIPPED
>>>>> [INFO] Flink : Table : API Java bridge .................... SKIPPED
>>>>> [INFO] Flink : Table : API Scala .......................... SKIPPED
>>>>> [INFO] Flink : Table : API Scala bridge ................... SKIPPED
>>>>> [INFO] Flink : Table : SQL Parser ......................... SKIPPED
>>>>> [INFO] Flink : Libraries : ................................ SKIPPED
>>>>> [INFO] Flink : Libraries : CEP ............................ SKIPPED
>>>>> [INFO] Flink : Table : Planner ............................ SKIPPED
>>>>> [INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
>>>>> [INFO] Flink : Table : Runtime Blink ...................... SKIPPED
>>>>> [INFO] Flink : Table : Planner Blink ...................... SKIPPED
>>>>> [INFO] Flink : Metrics : JMX .............................. SKIPPED
>>>>> [INFO] Flink : Formats : .................................. SKIPPED
>>>>> [INFO] Flink : Formats : Json ............................. SKIPPED
>>>>> [INFO] Flink : Connectors : Kafka base .................... SKIPPED
>>>>> [INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
>>>>> [INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
>>>>> [INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
>>>>> [INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
>>>>> [INFO] Flink : Connectors : HBase base .................... SKIPPED
>>>>> [INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
>>>>> [INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
>>>>> [INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
>>>>> [INFO] Flink : Formats : Orc .............................. SKIPPED
>>>>> [INFO] Flink : Formats : Orc nohive ....................... SKIPPED
>>>>> [INFO] Flink : Formats : Avro ............................. SKIPPED
>>>>> [INFO] Flink : Formats : Parquet .......................... SKIPPED
>>>>> [INFO] Flink : Formats : Csv .............................. SKIPPED
>>>>> [INFO] Flink : Connectors : Hive .......................... SKIPPED
>>>>> [INFO] Flink : Connectors : JDBC .......................... SKIPPED
>>>>> [INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
>>>>> [INFO] Flink : Connectors : Twitter ....................... SKIPPED
>>>>> [INFO] Flink : Connectors : Nifi .......................... SKIPPED
>>>>> [INFO] Flink : Connectors : Cassandra ..................... SKIPPED
>>>>> [INFO] Flink : Connectors : Filesystem .................... SKIPPED
>>>>> [INFO] Flink : Connectors : Kafka ......................... SKIPPED
>>>>> [INFO] Flink : Connectors : Google PubSub ................. SKIPPED
>>>>> [INFO] Flink : Connectors : Kinesis ....................... SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
>>>>> [INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
>>>>> [INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
>>>>> [INFO] Flink : Formats : Sequence file .................... SKIPPED
>>>>> [INFO] Flink : Formats : Compress ......................... SKIPPED
>>>>> [INFO] Flink : Formats : SQL Orc .......................... SKIPPED
>>>>> [INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
>>>>> [INFO] Flink : Formats : SQL Avro ......................... SKIPPED
>>>>> [INFO] Flink : Examples : Streaming ....................... SKIPPED
>>>>> [INFO] Flink : Examples : Table ........................... SKIPPED
>>>>> [INFO] Flink : Examples : Build Helper : .................. SKIPPED
>>>>> [INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
>>>>> [INFO] Flink : Examples : Build Helper : Streaming State machine
>>>>> SKIPPED
>>>>> [INFO] Flink : Examples : Build Helper : Streaming Google PubSub
>>>>> SKIPPED
>>>>> [INFO] Flink : Container .................................. SKIPPED
>>>>> [INFO] Flink : Queryable state : Runtime .................. SKIPPED
>>>>> [INFO] Flink : Mesos ...................................... SKIPPED
>>>>> [INFO] Flink : Kubernetes ................................. SKIPPED
>>>>> [INFO] Flink : Yarn ....................................... SKIPPED
>>>>> [INFO] Flink : Libraries : Gelly .......................... SKIPPED
>>>>> [INFO] Flink : Libraries : Gelly scala .................... SKIPPED
>>>>> [INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
>>>>> [INFO] Flink : External resources : ....................... SKIPPED
>>>>> [INFO] Flink : External resources : GPU ................... SKIPPED
>>>>> [INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
>>>>> [INFO] Flink : Metrics : Graphite ......................... SKIPPED
>>>>> [INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
>>>>> [INFO] Flink : Metrics : Prometheus ....................... SKIPPED
>>>>> [INFO] Flink : Metrics : StatsD ........................... SKIPPED
>>>>> [INFO] Flink : Metrics : Datadog .......................... SKIPPED
>>>>> [INFO] Flink : Metrics : Slf4j ............................ SKIPPED
>>>>> [INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
>>>>> [INFO] Flink : Table : Uber ............................... SKIPPED
>>>>> [INFO] Flink : Table : Uber Blink ......................... SKIPPED
>>>>> [INFO] Flink : Python ..................................... SKIPPED
>>>>> [INFO] Flink : Table : SQL Client ......................... SKIPPED
>>>>> [INFO] Flink : Libraries : State processor API ............ SKIPPED
>>>>> [INFO] Flink : ML : ....................................... SKIPPED
>>>>> [INFO] Flink : ML : API ................................... SKIPPED
>>>>> [INFO] Flink : ML : Lib ................................... SKIPPED
>>>>> [INFO] Flink : ML : Uber .................................. SKIPPED
>>>>> [INFO] Flink : Scala shell ................................ SKIPPED
>>>>> [INFO] Flink : Dist ....................................... SKIPPED
>>>>> [INFO] Flink : Yarn Tests ................................. SKIPPED
>>>>> [INFO] Flink : E2E Tests : ................................ SKIPPED
>>>>> [INFO] Flink : E2E Tests : CLI ............................ SKIPPED
>>>>> [INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
>>>>> [INFO] Flink : E2E Tests : Parent Child classloading lib-package
>>>>> SKIPPED
>>>>> [INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
>>>>> [INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
>>>>> [INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
>>>>> [INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
>>>>> [INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
>>>>> [INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
>>>>> [INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
>>>>> [INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
>>>>> [INFO] Flink : Quickstart : ............................... SKIPPED
>>>>> [INFO] Flink : Quickstart : Java .......................... SKIPPED
>>>>> [INFO] Flink : Quickstart : Scala ......................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
>>>>> [INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
>>>>> [INFO] Flink : E2E Tests : State evolution ................ SKIPPED
>>>>> [INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Common ......................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
>>>>> [INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
>>>>> [INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
>>>>> [INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
>>>>> [INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
>>>>> [INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
>>>>> [INFO] Flink : E2E Tests : Python ......................... SKIPPED
>>>>> [INFO] Flink : E2E Tests : HBase .......................... SKIPPED
>>>>> [INFO] Flink : State backends : Heap spillable ............ SKIPPED
>>>>> [INFO] Flink : Contrib : .................................. SKIPPED
>>>>> [INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
>>>>> [INFO] Flink : FileSystems : Tests ........................ SKIPPED
>>>>> [INFO] Flink : Docs ....................................... SKIPPED
>>>>> [INFO] Flink : Walkthrough : .............................. SKIPPED
>>>>> [INFO] Flink : Walkthrough : Common ....................... SKIPPED
>>>>> [INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
>>>>> [INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
>>>>> [INFO]
>>>>> ------------------------------------------------------------------------
>>>>> [INFO] BUILD FAILURE
>>>>> [INFO]
>>>>> ------------------------------------------------------------------------
>>>>> [INFO] Total time: 36:49 min
>>>>> [INFO] Finished at: 2020-10-19T18:24:46+03:00
>>>>> [INFO] Final Memory: 179M/614M
>>>>> [INFO]
>>>>> ------------------------------------------------------------------------
>>>>> [ERROR] Failed to execute goal
>>>>> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test
>>>>> (integration-tests) on project flink-tests: There are test failures.
>>>>> [ERROR]
>>>>> [ERROR] Please refer to
>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the
>>>>> individual test results.
>>>>> [ERROR] Please refer to dump files (if any exist) [date].dump,
>>>>> [date]-jvmRun[N].dump and [date].dumpstream.
>>>>> [ERROR] ExecutionException The forked VM terminated without properly
>>>>> saying goodbye. VM crash or System.exit called?
>>>>> [ERROR] Command was /bin/sh -c cd
>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>> surefire_122313349068739873924160tmp
>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>> [ERROR] Process Exit Code: 137
>>>>> [ERROR] Crashed tests:
>>>>> [ERROR]
>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
>>>>> ExecutionException The forked VM terminated without properly saying
>>>>> goodbye. VM crash or System.exit called?
>>>>> [ERROR] Command was /bin/sh -c cd
>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>> surefire_122313349068739873924160tmp
>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>> [ERROR] Process Exit Code: 137
>>>>> [ERROR] Crashed tests:
>>>>> [ERROR]
>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
>>>>> [ERROR] at
>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
>>>>> [ERROR] at
>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>>>>> [ERROR] at
>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>>>>> [ERROR] at
>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>>>>> [ERROR] at
>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
>>>>> [ERROR] at
>>>>> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>>>>> [ERROR] at
>>>>> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
>>>>> [ERROR] at
>>>>> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
>>>>> [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
>>>>> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
>>>>> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
>>>>> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
>>>>> [ERROR] at
>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>> Method)
>>>>> [ERROR] at
>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>> [ERROR] at
>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>> [ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>>>>> [ERROR] at
>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
>>>>> [ERROR] at
>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
>>>>> [ERROR] at
>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
>>>>> [ERROR] at
>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
>>>>> [ERROR] Caused by:
>>>>> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM
>>>>> terminated without properly saying goodbye. VM crash or System.exit called?
>>>>> [ERROR] Command was /bin/sh -c cd
>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>> surefire_122313349068739873924160tmp
>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>> [ERROR] Process Exit Code: 137
>>>>> [ERROR] Crashed tests:
>>>>> [ERROR]
>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
>>>>> [ERROR] at
>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
>>>>> [ERROR] at
>>>>> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>>>>> [ERROR] at
>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>> [ERROR] at
>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
>>>>> [ERROR] -> [Help 1]
>>>>> [ERROR]
>>>>> [ERROR] To see the full stack trace of the errors, re-run Maven with
>>>>> the -e switch.
>>>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>>>> [ERROR]
>>>>> [ERROR] For more information about the errors and possible solutions,
>>>>> please read the following articles:
>>>>> [ERROR] [Help 1]
>>>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>>>> [ERROR]
>>>>> [ERROR] After correcting the problems, you can resume the build with
>>>>> the command
>>>>> [ERROR]   mvn <goals> -rf :flink-tests
>>>>>
>>>>> The JVM dump files look like this:
>>>>>
>>>>> # Created at 2020-10-19T18:14:22.869
>>>>> java.io.IOException: Stream closed
>>>>>         at
>>>>> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
>>>>>         at
>>>>> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
>>>>>         at
>>>>> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
>>>>>         at
>>>>> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>>>>>         at
>>>>> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>>>>>         at
>>>>> java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>>>>>         at
>>>>> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
>>>>>         at java.base/java.io.Reader.read(Reader.java:189)
>>>>>         at java.base/java.util.Scanner.readInput(Scanner.java:882)
>>>>>         at
>>>>> java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
>>>>>         at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
>>>>>         at
>>>>> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
>>>>>         at
>>>>> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
>>>>>         at
>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
>>>>>         at
>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>         at
>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>         at
>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>         at
>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>         at
>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>         at
>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>         at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>
>>>>>
>>>>> # Created at 2020-10-19T18:14:22.870
>>>>> System.exit() or native command error interrupted process checker.
>>>>> java.lang.IllegalStateException: error [STOPPED] to read process 898133
>>>>>         at
>>>>> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
>>>>>         at
>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
>>>>>         at
>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>         at
>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>         at
>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>         at
>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>         at
>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>         at
>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>         at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>
>>>>>
>>>>> I found some JIRA tickets with " The forked VM terminated without
>>>>> properly saying goodbye":
>>>>>
>>>>> https://issues.apache.org/jira/browse/FLINK-18375
>>>>> https://issues.apache.org/jira/browse/FLINK-2466
>>>>>
>>>>> I don't see how these could explain the issue I'm seeing.
>>>>>
>>>>> I wonder if the issue is related to the VM running "too hot". 'top'
>>>>> shows very high load averages.
>>>>>
>>>>> The crash can be reproduced.
>>>>>
>>>>> Regards,
>>>>> Juha
>>>>>
>>>>>
