Hi,

You're right, I thought about this too after writing my last comment. For
example, on Linux the kernel overcommits memory allocations by default, so
this approach doesn't work (it doesn't make the JVM crash right when it
starts).

I dug a little deeper. It seems that CI environments use dedicated
compilation scripts such as
https://github.com/apache/flink/blob/master/tools/ci/compile.sh#L45 that
explicitly set flink.forkCount and flink.forkCountTestPackage to
(seemingly) lower-than-default values. But anybody compiling Flink locally
gets the Maven defaults, which might not work, as in my case.

I think a good goal would be that a developer can just git clone Flink and
build it following simple instructions. Preferably there would be zero
setup needed, just a single command to run. The current situation is that
building Flink is "simple", just run a specific mvn command. This
simplicity comes at the price that things can break in unexpected ways:

1) There are things the build expects but doesn't check (
https://ci.apache.org/projects/flink/flink-docs-stable/flinkDev/building.html#build-flink
)
 * The correct Maven version
 * A suitable Java version
2) There's this issue with the number of CPU cores vs. available memory.

Case 1) is documented; case 2) is not.

Fix options

a)

Document case 2) and give instructions for setting
flink.forkCountTestPackage (if needed). Something like: "Flink tests are
run in parallel JVMs, each taking up to 2 GB of RAM. By default there are
as many JVMs as there are physical cores. If your machine doesn't have at
least 2 GB * number of cores of RAM, the tests can fail. You can set the
number of JVMs to a lower value using the Maven property
flink.forkCountTestPackage."
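For example (a sketch; the value 2 is arbitrary, the right cap depends on
the machine's RAM), both fork-count properties could be lowered on the
command line:

```shell
# Hypothetical invocation: cap both fork-count properties at 2,
# so peak test heap stays around 2 * 2 GB regardless of core count.
mvn clean verify -Dflink.forkCount=2 -Dflink.forkCountTestPackage=2
```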

b)

Create a Linux-specific Maven wrapper script for local execution too. The
wrapper script could download the correct Maven version, check the Java
version, calculate the max number of forks, etc. A quick way to calculate
the max fork count:

awk '/^MemTotal:/ {print int($2 / 2097152)}' /proc/meminfo
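A minimal sketch of the fork-count part of such a wrapper (assuming Linux
and the property names from the Flink pom; the Maven download and Java
version checks are left out):

```shell
#!/bin/sh
# Sketch: derive a safe Surefire fork count from total RAM.
# Assumes Linux, where /proc/meminfo reports MemTotal in kB;
# each forked test JVM may take up to 2 GiB (2097152 kB) of heap.
max_forks() {
    awk '/^MemTotal:/ { f = int($2 / 2097152); print (f < 1 ? 1 : f) }' \
        "${1:-/proc/meminfo}"
}

FORKS=$(max_forks)
echo "Using $FORKS forked test JVMs"
# The wrapper could then invoke the build with the computed value, e.g.:
# mvn clean verify -Dflink.forkCount="$FORKS" -Dflink.forkCountTestPackage="$FORKS"
```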

Regards,
Juha





On Tue, Oct 20, 2020 at 21:23, Khachatryan Roman (<
khachatryan.ro...@gmail.com>) wrote:

> I think you are right, and I like the idea of failing the build fast.
> However, when I tried this approach on my local machine it didn't help:
> the build didn't crash (probably because of overcommit).
> Did you try this approach in your VM?
>
> Regards,
> Roman
>
>
> On Tue, Oct 20, 2020 at 12:12 PM Juha Mynttinen <juha.myntti...@gmail.com>
> wrote:
>
>> Hey,
>>
>> > Currently, tests do not run in parallel
>>
>> I don't think this is true, at least 100%. In 'top' it's clearly visible
>> that there are multiple JVMs. If not running tests in parallel, what are
>> these doing? In the main pom.xml there's configuration for the plug-in
>> 'maven-surefire-plugin'.
>>
>> I'm not a Maven expert, but it looks to me like this: in
>> https://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
>> it says "The other possibility for parallel test execution is setting
>> the parameter forkCount to a value higher than 1". I think that's
>> happening in Flink:
>>
>> <forkCount>${flink.forkCount}</forkCount>
>>
>> And
>>
>> <flink.forkCount>1C</flink.forkCount>
>>
>> This means there's gonna be 1 * count_of_cpus forks.
>>
>> And this one:
>>
>> <argLine>-Xms256m -Xmx2048m -Dmvn.forkNumber=${surefire.forkNumber}
>> -XX:+UseG1GC</argLine>
>>
>> In my case, I have 5 CPUs, so 5 forks. I think what happens is that
>> since each fork gets at most 2048 MB of heap, there's effectively a
>> memory requirement of CPU count * 2048 MB. In my case, I have 8 GB of
>> memory, which is less than 5 * 2048 MB.
>>
>> This could be better. A computer with RAM < count_of_cpus * 2048 MB is
>> completely valid; take e.g. an AMD Ryzen 3900X with 12 cores and put 16
>> GB of RAM in it. At the very least, the memory & CPU requirements
>> should be documented.
>>
>> If the tests really need 2 GB of heap, then maybe the forkCount should
>> be based on the available RAM rather than the available cores, e.g.
>> floor(RAM / 2 GB)? I don't know if that's doable in Maven.
>>
>> I think an easy and non-intrusive improvement would be to change
>> '-Xms256m' to '-Xms2048m' (Xms to match Xmx) so that the JVM would
>> allocate the full 2048 MB right away when it starts. If there's not
>> enough memory, the tests would fail immediately (the JVM couldn't
>> start). The tests would probably fail anyway (as in my case) - better
>> to fail fast.
>>
>> Regards,
>> Juha
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Oct 20, 2020 at 11:16, Khachatryan Roman (<
>> khachatryan.ro...@gmail.com>) wrote:
>>
>>> Thanks for sharing this.
>>> I think the activity of the OOM killer means high memory pressure (it
>>> just kills the process with the highest memory-consumption score).
>>> High CPU usage can only be a consequence of that, e.g. constant GC.
>>>
>>> Currently, tests do not run in parallel, but high memory usage can be
>>> caused by the nature of the test (e.g. running Flink with high
>>> parallelism).
>>> So I think the best way to deal with this is to use a VM with more
>>> memory.
>>>
>>> Regards,
>>> Roman
>>>
>>>
>>> On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <juha.myntti...@gmail.com>
>>> wrote:
>>>
>>>> Hey,
>>>>
>>>> Good hint about /var/log/kern.log. This time I can see this:
>>>>
>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651551]
>>>> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service
>>>> ,task=java,pid=270024,uid=1000
>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed
>>>> process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB,
>>>> shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
>>>> Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped
>>>> process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>>>>
>>>> The next question is why this happens... I'll try to dig deeper.
>>>>
>>>> About the CPU load: I have five CPUs, so theoretically it makes sense
>>>> to run five tests at a time to max out the CPUs. However, when I look
>>>> at what the five Java processes (that Maven forks) are doing, I can
>>>> see that each of those processes has a large number of threads
>>>> wanting to use the CPU. Here's an example from 'top -H':
>>>>
>>>>   top - 09:42:03 up 29 min,  1 user,  load average: 17,00, 12,86, 8,81
>>>> Threads: 1099 total,  21 running, 1078 sleeping,   0 stopped,   0 zombie
>>>> %Cpu(s): 90,5 us,  9,4 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,
>>>>  0,0 st
>>>> MiB Mem :   7961,6 total,   1614,3 free,   4023,8 used,   2323,5
>>>> buff/cache
>>>> MiB Swap:   2048,0 total,   2047,0 free,      1,0 used.   3638,9 avail
>>>> Mem
>>>>
>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
>>>> COMMAND
>>>>
>>>>  254825 juha      20   0 4250424 195768  27596 R  20,9   2,4   0:01.41
>>>> C2 CompilerThre
>>>>
>>>>  255116 juha      20   0 2820448  99240  27488 R  20,3   1,2   0:00.78
>>>> java
>>>>
>>>>  254968 juha      20   0 5312696 125212  27716 R  19,9   1,5   0:01.16
>>>> java
>>>>
>>>>  255027 juha      20   0 5310648 108716  27496 R  19,9   1,3   0:00.90
>>>> java
>>>>
>>>>  255123 juha      20   0 2820448  99120  27420 R  19,3   1,2   0:00.78
>>>> java
>>>>
>>>>  254829 juha      20   0 4240356 184376  27792 R  17,9   2,3   0:01.26
>>>> C2 CompilerThre
>>>>
>>>>  253993 juha      20   0 6436132 276808  28000 R  17,6   3,4   0:02.47
>>>> C2 CompilerThre
>>>>
>>>>  254793 juha      20   0 4250424 195768  27596 R  17,3   2,4   0:01.76
>>>> java
>>>>
>>>>  254801 juha      20   0 4240356 184376  27792 R  16,3   2,3   0:01.67
>>>> java
>>>>
>>>>  254298 juha      20   0 6510340 435360  28212 R  15,6   5,3   0:02.82
>>>> C2 CompilerThre
>>>>
>>>>  255145 juha      20   0 2820448  99240  27488 S  15,6   1,2   0:00.51
>>>> C2 CompilerThre
>>>>
>>>>  255045 juha      20   0 5310648 108716  27496 R  15,3   1,3   0:00.62
>>>> C2 CompilerThre
>>>>
>>>>  255151 juha      20   0 2820448  99120  27420 S  14,0   1,2   0:00.47
>>>> C2 CompilerThre
>>>>
>>>>  254986 juha      20   0 5312696 125212  27716 R  12,6   1,5   0:00.76
>>>> C2 CompilerThre
>>>>
>>>>  253980 juha      20   0 6436132 276808  28000 S  11,6   3,4   0:02.63
>>>> java
>>>>
>>>>  255148 juha      20   0 2820448  99240  27488 S  10,6   1,2   0:00.39
>>>> C1 CompilerThre
>>>>
>>>>  255154 juha      20   0 2820448  99120  27420 S   9,6   1,2   0:00.37
>>>> C1 CompilerThre
>>>>
>>>>  254457 juha      20   0 4269900 218036  28236 R   9,3   2,7   0:02.22
>>>> C2 CompilerThre
>>>>
>>>>  254299 juha      20   0 6510340 435360  28212 S   8,6   5,3   0:01.30
>>>> C1 CompilerThre
>>>>
>>>>  255047 juha      20   0 5310648 108716  27496 S   8,6   1,3   0:00.42
>>>> C1 CompilerThre
>>>>
>>>>  253994 juha      20   0 6436132 276808  28000 R   7,3   3,4   0:01.10
>>>> C1 CompilerThre
>>>>
>>>>  255312 juha      20   0 4250424 195768  27596 R   7,0   2,4   0:00.21
>>>> C2 CompilerThre
>>>>
>>>>  254831 juha      20   0 4240356 184376  27792 S   6,3   2,3   0:00.62
>>>> C1 CompilerThre
>>>>
>>>>  254988 juha      20   0 5312696 125212  27716 S   6,3   1,5   0:00.45
>>>> C1 CompilerThre
>>>>
>>>>  254828 juha      20   0 4250424 195768  27596 S   6,0   2,4   0:00.64
>>>> C1 CompilerThre
>>>>
>>>>  254720 juha      20   0 6510340 435360  28212 S   5,0   5,3   0:00.15
>>>> flink-akka.acto
>>>>
>>>>
>>>> It can be seen that the JIT-related threads consume quite a lot of
>>>> CPU, essentially leaving less CPU available to the actual test code.
>>>> Using htop I can also see the garbage-collection threads eating CPU.
>>>> This doesn't seem right. I think it'd make sense to run the tests
>>>> with less parallelism to utilize the CPUs better. Having far more
>>>> runnable threads than CPUs slows things down rather than speeding
>>>> them up.
>>>>
>>>> However, AFAIK high CPU load shouldn't trigger OOM-killer?
>>>>
>>>> Regards,
>>>> Juha
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 19, 2020 at 20:48, Khachatryan Roman (<
>>>> khachatryan.ro...@gmail.com>) wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> One reason could be that a resource-intensive test was killed by the
>>>>> OOM killer. You can inspect /var/log/kern.log in your VM for the
>>>>> related messages.
>>>>>
>>>>> Regards,
>>>>> Roman
>>>>>
>>>>>
>>>>> On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <
>>>>> juha.myntti...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in
>>>>>> a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the
>>>>>> master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.
>>>>>>
>>>>>> The command I'm using:
>>>>>>
>>>>>> apache-maven-3.2.5/bin/mvn clean verify
>>>>>>
>>>>>> The output:
>>>>>>
>>>>>> [INFO] Flink : Tests ...................................... FAILURE
>>>>>> [14:38 min]
>>>>>> [INFO] Flink : Streaming Scala ............................ SKIPPED
>>>>>> [INFO] Flink : Connectors : HCatalog ...................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Base .......................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Files ......................... SKIPPED
>>>>>> [INFO] Flink : Table : .................................... SKIPPED
>>>>>> [INFO] Flink : Table : Common ............................. SKIPPED
>>>>>> [INFO] Flink : Table : API Java ........................... SKIPPED
>>>>>> [INFO] Flink : Table : API Java bridge .................... SKIPPED
>>>>>> [INFO] Flink : Table : API Scala .......................... SKIPPED
>>>>>> [INFO] Flink : Table : API Scala bridge ................... SKIPPED
>>>>>> [INFO] Flink : Table : SQL Parser ......................... SKIPPED
>>>>>> [INFO] Flink : Libraries : ................................ SKIPPED
>>>>>> [INFO] Flink : Libraries : CEP ............................ SKIPPED
>>>>>> [INFO] Flink : Table : Planner ............................ SKIPPED
>>>>>> [INFO] Flink : Table : SQL Parser Hive .................... SKIPPED
>>>>>> [INFO] Flink : Table : Runtime Blink ...................... SKIPPED
>>>>>> [INFO] Flink : Table : Planner Blink ...................... SKIPPED
>>>>>> [INFO] Flink : Metrics : JMX .............................. SKIPPED
>>>>>> [INFO] Flink : Formats : .................................. SKIPPED
>>>>>> [INFO] Flink : Formats : Json ............................. SKIPPED
>>>>>> [INFO] Flink : Connectors : Kafka base .................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED
>>>>>> [INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED
>>>>>> [INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED
>>>>>> [INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED
>>>>>> [INFO] Flink : Connectors : HBase base .................... SKIPPED
>>>>>> [INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED
>>>>>> [INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED
>>>>>> [INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED
>>>>>> [INFO] Flink : Formats : Orc .............................. SKIPPED
>>>>>> [INFO] Flink : Formats : Orc nohive ....................... SKIPPED
>>>>>> [INFO] Flink : Formats : Avro ............................. SKIPPED
>>>>>> [INFO] Flink : Formats : Parquet .......................... SKIPPED
>>>>>> [INFO] Flink : Formats : Csv .............................. SKIPPED
>>>>>> [INFO] Flink : Connectors : Hive .......................... SKIPPED
>>>>>> [INFO] Flink : Connectors : JDBC .......................... SKIPPED
>>>>>> [INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Twitter ....................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Nifi .......................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Cassandra ..................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Filesystem .................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Kafka ......................... SKIPPED
>>>>>> [INFO] Flink : Connectors : Google PubSub ................. SKIPPED
>>>>>> [INFO] Flink : Connectors : Kinesis ....................... SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED
>>>>>> [INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED
>>>>>> [INFO] Flink : Formats : Avro confluent registry .......... SKIPPED
>>>>>> [INFO] Flink : Formats : Sequence file .................... SKIPPED
>>>>>> [INFO] Flink : Formats : Compress ......................... SKIPPED
>>>>>> [INFO] Flink : Formats : SQL Orc .......................... SKIPPED
>>>>>> [INFO] Flink : Formats : SQL Parquet ...................... SKIPPED
>>>>>> [INFO] Flink : Formats : SQL Avro ......................... SKIPPED
>>>>>> [INFO] Flink : Examples : Streaming ....................... SKIPPED
>>>>>> [INFO] Flink : Examples : Table ........................... SKIPPED
>>>>>> [INFO] Flink : Examples : Build Helper : .................. SKIPPED
>>>>>> [INFO] Flink : Examples : Build Helper : Streaming Twitter  SKIPPED
>>>>>> [INFO] Flink : Examples : Build Helper : Streaming State machine
>>>>>> SKIPPED
>>>>>> [INFO] Flink : Examples : Build Helper : Streaming Google PubSub
>>>>>> SKIPPED
>>>>>> [INFO] Flink : Container .................................. SKIPPED
>>>>>> [INFO] Flink : Queryable state : Runtime .................. SKIPPED
>>>>>> [INFO] Flink : Mesos ...................................... SKIPPED
>>>>>> [INFO] Flink : Kubernetes ................................. SKIPPED
>>>>>> [INFO] Flink : Yarn ....................................... SKIPPED
>>>>>> [INFO] Flink : Libraries : Gelly .......................... SKIPPED
>>>>>> [INFO] Flink : Libraries : Gelly scala .................... SKIPPED
>>>>>> [INFO] Flink : Libraries : Gelly Examples ................. SKIPPED
>>>>>> [INFO] Flink : External resources : ....................... SKIPPED
>>>>>> [INFO] Flink : External resources : GPU ................... SKIPPED
>>>>>> [INFO] Flink : Metrics : Dropwizard ....................... SKIPPED
>>>>>> [INFO] Flink : Metrics : Graphite ......................... SKIPPED
>>>>>> [INFO] Flink : Metrics : InfluxDB ......................... SKIPPED
>>>>>> [INFO] Flink : Metrics : Prometheus ....................... SKIPPED
>>>>>> [INFO] Flink : Metrics : StatsD ........................... SKIPPED
>>>>>> [INFO] Flink : Metrics : Datadog .......................... SKIPPED
>>>>>> [INFO] Flink : Metrics : Slf4j ............................ SKIPPED
>>>>>> [INFO] Flink : Libraries : CEP Scala ...................... SKIPPED
>>>>>> [INFO] Flink : Table : Uber ............................... SKIPPED
>>>>>> [INFO] Flink : Table : Uber Blink ......................... SKIPPED
>>>>>> [INFO] Flink : Python ..................................... SKIPPED
>>>>>> [INFO] Flink : Table : SQL Client ......................... SKIPPED
>>>>>> [INFO] Flink : Libraries : State processor API ............ SKIPPED
>>>>>> [INFO] Flink : ML : ....................................... SKIPPED
>>>>>> [INFO] Flink : ML : API ................................... SKIPPED
>>>>>> [INFO] Flink : ML : Lib ................................... SKIPPED
>>>>>> [INFO] Flink : ML : Uber .................................. SKIPPED
>>>>>> [INFO] Flink : Scala shell ................................ SKIPPED
>>>>>> [INFO] Flink : Dist ....................................... SKIPPED
>>>>>> [INFO] Flink : Yarn Tests ................................. SKIPPED
>>>>>> [INFO] Flink : E2E Tests : ................................ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : CLI ............................ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Parent Child classloading lib-package
>>>>>> SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Queryable state ................ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED
>>>>>> [INFO] Flink : Quickstart : ............................... SKIPPED
>>>>>> [INFO] Flink : Quickstart : Java .......................... SKIPPED
>>>>>> [INFO] Flink : Quickstart : Scala ......................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Quickstart ..................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : SQL client ..................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : State evolution ................ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Common ......................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : TPCH ........................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : Python ......................... SKIPPED
>>>>>> [INFO] Flink : E2E Tests : HBase .......................... SKIPPED
>>>>>> [INFO] Flink : State backends : Heap spillable ............ SKIPPED
>>>>>> [INFO] Flink : Contrib : .................................. SKIPPED
>>>>>> [INFO] Flink : Contrib : Connectors : Wikiedits ........... SKIPPED
>>>>>> [INFO] Flink : FileSystems : Tests ........................ SKIPPED
>>>>>> [INFO] Flink : Docs ....................................... SKIPPED
>>>>>> [INFO] Flink : Walkthrough : .............................. SKIPPED
>>>>>> [INFO] Flink : Walkthrough : Common ....................... SKIPPED
>>>>>> [INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED
>>>>>> [INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED
>>>>>> [INFO]
>>>>>> ------------------------------------------------------------------------
>>>>>> [INFO] BUILD FAILURE
>>>>>> [INFO]
>>>>>> ------------------------------------------------------------------------
>>>>>> [INFO] Total time: 36:49 min
>>>>>> [INFO] Finished at: 2020-10-19T18:24:46+03:00
>>>>>> [INFO] Final Memory: 179M/614M
>>>>>> [INFO]
>>>>>> ------------------------------------------------------------------------
>>>>>> [ERROR] Failed to execute goal
>>>>>> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test
>>>>>> (integration-tests) on project flink-tests: There are test failures.
>>>>>> [ERROR]
>>>>>> [ERROR] Please refer to
>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the
>>>>>> individual test results.
>>>>>> [ERROR] Please refer to dump files (if any exist) [date].dump,
>>>>>> [date]-jvmRun[N].dump and [date].dumpstream.
>>>>>> [ERROR] ExecutionException The forked VM terminated without properly
>>>>>> saying goodbye. VM crash or System.exit called?
>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>> surefire_122313349068739873924160tmp
>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>> [ERROR] Process Exit Code: 137
>>>>>> [ERROR] Crashed tests:
>>>>>> [ERROR]
>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException:
>>>>>> ExecutionException The forked VM terminated without properly saying
>>>>>> goodbye. VM crash or System.exit called?
>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>> surefire_122313349068739873924160tmp
>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>> [ERROR] Process Exit Code: 137
>>>>>> [ERROR] Crashed tests:
>>>>>> [ERROR]
>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
>>>>>> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
>>>>>> [ERROR] at
>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>> Method)
>>>>>> [ERROR] at
>>>>>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>> [ERROR] at
>>>>>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> [ERROR] at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>>>>>> [ERROR] at
>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
>>>>>> [ERROR] at
>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
>>>>>> [ERROR] at
>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
>>>>>> [ERROR] at
>>>>>> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
>>>>>> [ERROR] Caused by:
>>>>>> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked 
>>>>>> VM
>>>>>> terminated without properly saying goodbye. VM crash or System.exit 
>>>>>> called?
>>>>>> [ERROR] Command was /bin/sh -c cd
>>>>>> /home/juha/git/apache-flink/flink-tests/target &&
>>>>>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m
>>>>>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar
>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar
>>>>>> /home/juha/git/apache-flink/flink-tests/target/surefire
>>>>>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp
>>>>>> surefire_122313349068739873924160tmp
>>>>>> [ERROR] Error occurred in starting fork, check output in log
>>>>>> [ERROR] Process Exit Code: 137
>>>>>> [ERROR] Crashed tests:
>>>>>> [ERROR]
>>>>>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444)
>>>>>> [ERROR] at
>>>>>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420)
>>>>>> [ERROR] at
>>>>>> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>>>>>> [ERROR] at
>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>> [ERROR] at
>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>> [ERROR] -> [Help 1]
>>>>>> [ERROR]
>>>>>> [ERROR] To see the full stack trace of the errors, re-run Maven with
>>>>>> the -e switch.
>>>>>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>>>>>> [ERROR]
>>>>>> [ERROR] For more information about the errors and possible solutions,
>>>>>> please read the following articles:
>>>>>> [ERROR] [Help 1]
>>>>>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>>>>>> [ERROR]
>>>>>> [ERROR] After correcting the problems, you can resume the build with
>>>>>> the command
>>>>>> [ERROR]   mvn <goals> -rf :flink-tests
>>>>>>
>>>>>> The jvmdump-files look like this:
>>>>>>
>>>>>> # Created at 2020-10-19T18:14:22.869
>>>>>> java.io.IOException: Stream closed
>>>>>>         at
>>>>>> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
>>>>>>         at
>>>>>> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
>>>>>>         at
>>>>>> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
>>>>>>         at
>>>>>> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>>>>>>         at
>>>>>> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>>>>>>         at
>>>>>> java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>>>>>>         at
>>>>>> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
>>>>>>         at java.base/java.io.Reader.read(Reader.java:189)
>>>>>>         at java.base/java.util.Scanner.readInput(Scanner.java:882)
>>>>>>         at
>>>>>> java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796)
>>>>>>         at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610)
>>>>>>         at
>>>>>> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354)
>>>>>>         at
>>>>>> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190)
>>>>>>         at
>>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123)
>>>>>>         at
>>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>         at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>
>>>>>>
>>>>>> # Created at 2020-10-19T18:14:22.870
>>>>>> System.exit() or native command error interrupted process checker.
>>>>>> java.lang.IllegalStateException: error [STOPPED] to read process
>>>>>> 898133
>>>>>>         at
>>>>>> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145)
>>>>>>         at
>>>>>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124)
>>>>>>         at
>>>>>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>>>>>>         at
>>>>>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>>>>>>         at java.base/java.lang.Thread.run(Thread.java:834)
>>>>>>
>>>>>>
>>>>>> I found some JIRA tickets with " The forked VM terminated without
>>>>>> properly saying goodbye":
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/FLINK-18375
>>>>>> https://issues.apache.org/jira/browse/FLINK-2466
>>>>>>
>>>>>> I don't see how these could explain the issue I'm witnessing....
>>>>>>
>>>>>> I wonder if the issue is related to the VM running "too hot". 'top'
>>>>>> shows very high load averages.
>>>>>>
>>>>>> The crash can be reproduced.
>>>>>>
>>>>>> Regards,
>>>>>> Juha
>>>>>>
>>>>>>
