Hi,

I’ve run into exactly the same problem recently and solved it in Piotr’s way.
@zhijiang, I didn’t see any OOM error thrown by the JVM (I’m not sure one would be
thrown at all if yarn decides to kill the container forcibly). According to our
monitoring system, the memory overuse comes from JVM direct memory.
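
In case it helps others debug this, here is a minimal sketch (plain JDK APIs; the class name is mine) of how one can distinguish “direct” from “mapped” buffer usage on the TM, since mmapped regions show up in the “mapped” pool rather than in the pool limited by -XX:MaxDirectMemorySize:

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;

    public class BufferPoolProbe {
        public static void main(String[] args) {
            // The JVM exposes two buffer pools: "direct" (ByteBuffer.allocateDirect,
            // limited by -XX:MaxDirectMemorySize) and "mapped" (MappedByteBuffer
            // regions created via FileChannel.map, not limited by that flag).
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
        }
    }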


The interesting part is that the old way works if I increase
-XX:MaxDirectMemorySize to around 3 GB (it was around 2 GB before). So I
suspect we at least need to reserve one #ByteBuffer’s size in
#memoryMappedRegions for #MappedByteBuffer (which is 2 GB for large files). Not
sure I’m right about this.
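
For context on the 2 GB figure: a single FileChannel.map() call is limited to
Integer.MAX_VALUE bytes, so a large result file can only be mapped as a list of
roughly 2 GB MappedByteBuffer regions. The following is just my own sketch of
that pattern to illustrate the arithmetic (names are mine, not Flink’s actual code):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileChannel.MapMode;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;

    class MappedRegions {
        // map() accepts at most Integer.MAX_VALUE bytes per call, so a large
        // spill file becomes several ~2 GB MappedByteBuffers. Each fully used
        // region adds roughly 2 GB of mapped memory to the process, which is
        // what yarn sees when it accounts the container's memory.
        static List<MappedByteBuffer> mapWholeFile(Path file) throws IOException {
            List<MappedByteBuffer> regions = new ArrayList<>();
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                long pos = 0;
                long size = channel.size();
                while (pos < size) {
                    long regionSize = Math.min(Integer.MAX_VALUE, size - pos);
                    regions.add(channel.map(MapMode.READ_ONLY, pos, regionSize));
                    pos += regionSize;
                }
            }
            return regions;
        }
    }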


@yingjie Do you have any idea how much memory will be stolen from the OS when
using mmap for data reading?
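
For anyone else hitting this: switching from mmap back to plain file reading
(the alternative yingjie and Piotr point out below) should just be a matter of
TaskManager configuration. I’m quoting the key from memory, so please verify it
against the 1.9 docs (it was renamed in later releases); something along these
lines in flink-conf.yaml:

    # assumed key name for Flink 1.9; accepted values are "auto", "mmap" and "file"
    taskmanager.network.bounded-blocking-subpartition-type: file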




Best,
Jiayi Liao


 Original Message 
Sender: yingjie<yjclove...@gmail.com>
Recipient: user<user@flink.apache.org>
Date: Tuesday, Nov 26, 2019 18:10
Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?


The new BlockingSubpartition implementation in 1.9 uses mmap for data reading
by default, which means it steals memory from the OS. The mmapped region memory
is not limited by the JVM, so there should be no OutOfMemory error reported by
the JVM, and the OS memory is not exhausted either, so there should be no kernel
OOM. I think Piotr's suspicion is right: yarn tracked the memory used by the
whole process (the mmap regions are also part of the process memory) and killed
the TM. Giving the container a resource limit large enough to cover the stolen
memory (i.e. a yarn limit above the current one), or using file instead of mmap
as pointed out by Piotr, can solve the problem. I think Flink may need to
restrict the amount of memory that can be stolen.
