Re: Insufficient number of network buffers after restarting

2020-12-28 Thread Piotr Nowojski
Hi Yufei, My prime suspect would be the changes to the memory configuration introduced in 1.11 [1]. Piotrek [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html#memory-management On Mon, 28 Dec 2020 at 09:52, Till Rohrmann wrote: > Hi Yufei, > > I cannot

Re: Insufficient number of network buffers after restarting

2020-12-28 Thread Till Rohrmann
Hi Yufei, I cannot remember exactly the changes in this area between Flink 1.10.0 and Flink 1.12.0. It sounds a bit as if we were not releasing memory segments fast enough or had a memory leak. One thing to try out is to increase the restart delay to see whether it is the first problem.
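
To test the first hypothesis, the restart delay can be raised in flink-conf.yaml. This is a minimal sketch, assuming the job uses the fixed-delay restart strategy. The values (3 attempts, 30 s) are illustrative assumptions, not settings taken from the thread:

```yaml
# Illustrative values only: a longer delay gives the TaskManagers more time
# to release network buffers before the job is redeployed.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 30 s
```

If the error disappears with a longer delay, that points to slow release of memory segments rather than a leak.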

Re: Insufficient number of network buffers after restarting

2020-12-24 Thread Yangze Guo
Hi, Yufei. Can you reproduce this issue in 1.10.0? The deterministic slot sharing introduced in 1.12.0 is one possible reason. Before 1.12.0, the distribution of tasks across slots was not deterministic. Even if there are enough network buffers from the perspective of the cluster, a bad distribution of tasks

Insufficient number of network buffers after restarting

2020-12-24 Thread Yufei Liu
Hey, I’ve found that the job throws “java.io.IOException: Insufficient number of network buffers: required 51, but only 1 available” after a job restart, and I’ve observed that the TM uses many more network buffers than before. My internal branch, based on 1.10.0, can easily reproduce this, but I use 1.12.0