Hello Yuan,

I don't override any of the default settings; here is my docker-compose.yml:

> services:
>   jobmanager:
>     image: flink:1.15.1-java11
>     ports:
>       - "8081:8081"
>     command: jobmanager
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>
>   taskmanager:
>     image: flink:1.15.1-java11
>     depends_on:
>       - jobmanager
>     command: taskmanager
>     ports:
>       - "8084:8084"
>     environment:
>       - |
>         FLINK_PROPERTIES=
>         jobmanager.rpc.address: jobmanager
>         taskmanager.numberOfTaskSlots: 2
>         metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>         env.java.opts: -XX:+HeapDumpOnOutOfMemoryError
>
From the TaskManager log:

> INFO  [] - Final TaskExecutor Memory configuration:
> INFO  [] -   Total Process Memory:          1.688gb (1811939328 bytes)
> INFO  [] -     Total Flink Memory:          1.250gb (1342177280 bytes)
> INFO  [] -       Total JVM Heap Memory:     512.000mb (536870912 bytes)
> INFO  [] -         Framework:               128.000mb (134217728 bytes)
> INFO  [] -         Task:                    384.000mb (402653184 bytes)
> INFO  [] -       Total Off-heap Memory:     768.000mb (805306368 bytes)
> INFO  [] -         Managed:                 512.000mb (536870912 bytes)
> INFO  [] -         Total JVM Direct Memory: 256.000mb (268435456 bytes)
> INFO  [] -           Framework:             128.000mb (134217728 bytes)
> INFO  [] -           Task:                  0 bytes
> INFO  [] -           Network:               128.000mb (134217728 bytes)
> INFO  [] -     JVM Metaspace:               256.000mb (268435456 bytes)
> INFO  [] -     JVM Overhead:                192.000mb (201326592 bytes)
>

I would prefer not to configure memory explicitly (at this point), because
memory consumption depends on the job structure, so it can always exceed
whatever values I configure.
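
For completeness: if I do end up raising memory as a workaround, my
understanding from the memory setup docs Yuan linked is that it comes down
to one extra line in FLINK_PROPERTIES, e.g. (the 4g value below is just an
illustration, not something I have tested):

    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2
        taskmanager.memory.process.size: 4g

As far as I can tell, taskmanager.memory.process.size is the single option
the docs recommend tuning for containerized deployments, and the other
components shown in the log above would then be derived from it.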

My next guess is that the problem is not the content of the metrics but
their number, which grows with the number of operators.
So my next question is whether there is a way to exclude metric generation
at the operator level.
I found the same question, without an accepted answer, on Stack Overflow:
https://stackoverflow.com/questions/54215245/apache-flink-limit-the-amount-of-metrics-exposed
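
To make that question more concrete, the kind of filtering I have in mind
is sketched below: a custom reporter that simply skips anything registered
under an operator-level group. The class name is made up; only the
MetricReporter / MetricGroup interfaces from flink-metrics-core are actual
Flink API, and I have not verified that filtering at the reporter is enough
to avoid the heap growth, since the metric groups themselves are still
created.

import java.util.Map;

import org.apache.flink.metrics.Metric;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.reporter.MetricReporter;

/** Hypothetical sketch: report only non-operator-level metrics. */
public class OperatorFilteringReporter implements MetricReporter {

    @Override
    public void open(MetricConfig config) {
        // nothing to set up in this sketch
    }

    @Override
    public void close() {
        // nothing to release
    }

    @Override
    public void notifyOfAddedMetric(Metric metric, String metricName, MetricGroup group) {
        Map<String, String> variables = group.getAllVariables();
        // Operator-level groups carry an <operator_name> scope variable;
        // skip those and keep only job/task/TaskManager-level metrics.
        if (variables.containsKey("<operator_name>")) {
            return;
        }
        // A real implementation would register the metric with its backend
        // here (e.g. a Prometheus CollectorRegistry).
    }

    @Override
    public void notifyOfRemovedMetric(Metric metric, String metricName, MetricGroup group) {
        // mirror the same filtering on removal
    }
}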

On Fri, Aug 12, 2022 at 4:05 AM yu'an huang <h.yuan...@gmail.com> wrote:

> Hi Yuriy,
>
> How do you set your TaskManager memory? I think 40MB is not
> significantly high for Flink. And it's normal to see memory increase if
> you have more parallelism or turn more metrics on. You can try setting
> larger memory for Flink as explained in the following documents.
>
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup/
>
> Best
> Yuan
>
>
>
> On 12 Aug 2022, at 12:51 AM, Yuriy Kutlunin <
> yuriy.kutlu...@glowbyteconsulting.com> wrote:
>
> Hi all,
>
> I'm running a Flink cluster in Session Mode via docker-compose, as
> described in the docs:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#session-cluster-yml
>
> After submitting a test job with many intermediate SQL operations (~500
> "select * from ..." statements) and metrics turned on (JMX or Prometheus),
> I got *OOM: java heap space* at the initialization stage.
>
> Turning metrics off allows the job to reach the *Running* state.
> Heap consumption also depends on parallelism: the same job succeeds when
> submitted with parallelism 1 instead of 2.
>
> Attached are TaskManager logs for 4 cases:
> JMX parallelism 1 (succeeded)
> JMX parallelism 2 (failed)
> Prometheus parallelism 2 (failed)
> No metrics parallelism 2 (succeeded)
>
> The post-OOM heap dump (JMX, parallelism 2) shows 2 main consumption points:
> 1. A big value (40MB) for some task configuration
> 2. Many instances (~270k) of some heavy (20KB) value in StreamConfig
>
> It seems like all these heavy values are related to the weird task names,
> which include all the operations:
>
>> Received task Source: source1 -> SourceConversion[2001] -> mapping1 ->
>> SourceConversion[2003] -> mapping2 -> SourceConversion[2005] -> ... ->
>> mapping500 -> Sink: sink1 (1/1)#0 (1e089cf3b1581ea7c8fb1cd7b159e66b)
>>
>
> Looking for some way to overcome this heap issue.
>
> --
> Best regards,
> Yuriy Kutlunin
> <many_operators_parallelism_1_with_jmx.txt>
> <many_operators_parallelism_2_with_jmx.txt>
> <many_operators_parallelism_2_no_jmx.txt>
> <many_operators_parallelism_2_with_prom.txt><heap_total.png>
> <heap_task2_conf.png><heap_many_string_instances.png><heap_task1_conf.png>
>
>
>

-- 
Best regards,
Yuriy Kutlunin
