Hi Hemant,
did you check out the dedicated pages on memory configuration and
troubleshooting:
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-direct-buffer-memory
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#container-memory-exceeded
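If the direct memory limit is what you are hitting, the usual first step
is to give the relevant pools more room in flink-conf.yaml. A sketch with
placeholder values (these keys exist in the current memory model; the
sizes are examples to tune to your container budget):

    taskmanager.memory.network.max: 2gb               # shuffle / RecordWriter buffers
    taskmanager.memory.framework.off-heap.size: 256mb # Flink-internal direct memory
    taskmanager.memory.task.off-heap.size: 512mb      # direct memory for user code, e.g. Kafka clients
    taskmanager.memory.jvm-overhead.max: 2gb          # headroom against container kills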
It is likely that the high number of output streams is causing your issues.
Regards,
Timo
On 14.07.21 08:46, bat man wrote:
Hi,
I have a job which reads different streams from 5 Kafka topics. It
filters the data, which is then streamed to different operators for
processing; this step involves a data shuffle. The data is then enriched
in 4 join (KeyedCoProcessFunction) operators. After joining, the data is
written to different Kafka topics; in total there are 16 output streams
written to 4 topics.
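For illustration, a heavily reduced sketch of the topology (class, topic
and operator names are placeholders, payloads are simplified to strings,
and only two of the five sources and one of the four joins are shown):

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
    import org.apache.flink.util.Collector;

    import java.util.Properties;

    public class EnrichmentJobSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092");
            props.setProperty("group.id", "enrichment-job");

            // Two of the five input topics, filtered before the shuffle.
            DataStream<String> events = env
                .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
                .filter(v -> !v.isEmpty());
            DataStream<String> reference = env
                .addSource(new FlinkKafkaConsumer<>("reference", new SimpleStringSchema(), props))
                .filter(v -> !v.isEmpty());

            // One of the four enrichment joins; keyBy causes the shuffle,
            // whose buffers come out of network (direct) memory.
            DataStream<String> enriched = events
                .keyBy(v -> v)
                .connect(reference.keyBy(v -> v))
                .process(new KeyedCoProcessFunction<String, String, String, String>() {
                    @Override
                    public void processElement1(String value, Context ctx,
                                                Collector<String> out) {
                        out.collect(value); // the real job joins against keyed state here
                    }
                    @Override
                    public void processElement2(String value, Context ctx,
                                                Collector<String> out) {
                        // the real job updates enrichment state here
                    }
                });

            // One of the 16 output streams; every additional sink brings its
            // own Kafka producer with its own buffers.
            enriched.addSink(new FlinkKafkaProducer<>(
                "output-topic-1", new SimpleStringSchema(), props));

            env.execute("enrichment-job-sketch");
        }
    }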
I have been facing issues with YARN killing containers. I took a heap
dump and ran it through JXray [1]. Heap usage is not high; the one thing
that stands out is the off-heap usage, which is very high. My guess is
that this is what gets the containers killed as the data inflow increases.
[Attached screenshot: Screenshot 2021-07-14 at 11.52.41 AM.png - JXray
reference chain from RecordWriter to DirectByteBuffer]
From the stack above, is this usage high because of the many output
streams being written to Kafka topics? The stack shows RecordWriter
holding a reference to this DirectByteBuffer. I have assigned 1GB of
network memory, and -XX:MaxDirectMemorySize also shows ~1GB for the task
managers.
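If I understand the Flink memory model correctly, the task manager's
direct memory limit is derived roughly as (defaults assumed for anything
not set explicitly):

    -XX:MaxDirectMemorySize ~= framework off-heap (128mb default)
                             + task off-heap (0 by default)
                             + network memory (1gb here)
                            ~= 1.1GB

which would match the ~1GB I observe.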
From [2] I found that setting -Djdk.nio.maxCachedBufferSize=262144
limits the per-thread temporary direct buffer cache. Will it help in this
case?
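If I do try it, I assume the flag would be passed to the task managers
via flink-conf.yaml, along the lines of:

    env.java.opts.taskmanager: -Djdk.nio.maxCachedBufferSize=262144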
The JVM version used is: OpenJDK 64-Bit Server VM - Red Hat, Inc. -
1.8/25.282-b08
[1] https://jxray.com
[2] https://dzone.com/articles/troubleshooting-problems-with-native-off-heap-memo
Thanks,
Hemant