Any update on this? Do you want me to create a JIRA issue for this bug?

On 11 Oct 2017 17:14, "Ufuk Celebi" <u...@apache.org> wrote:
@Chesnay: Recycling of network resources happens after the tasks go into state FINISHED. Since we are submitting new jobs in a local loop here, it can easily happen that the new job is submitted before enough buffers are available again. At least, that was previously the case. I'm CC'ing Nico, who refactored the network buffer distribution recently and who might have more details about this specific error message.

@Nico: another question is why there seem to be more buffers available that we nevertheless don't assign. I'm referring to this part of the error message: "5691 of 32768 bytes...".

On Wed, Oct 11, 2017 at 2:54 PM, Chesnay Schepler <ches...@apache.org> wrote:
> I can confirm that the issue is reproducible with the given test, both from
> the command line and from the IDE.
>
> While cutting down the test case by replacing the output format with a
> DiscardingOutputFormat and the JDBCInputFormat with a simple collection, I
> stumbled onto a new exception after ~200 iterations:
>
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>
> Caused by: java.io.IOException: Insufficient number of network buffers:
> required 4, but only 1 available. The total number of network buffers is
> currently set to 5691 of 32768 bytes each. You can increase this number by
> setting the configuration keys 'taskmanager.network.memory.fraction',
> 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max'.
>     at org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.createBufferPool(NetworkBufferPool.java:195)
>     at org.apache.flink.runtime.io.network.NetworkEnvironment.registerTask(NetworkEnvironment.java:186)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:602)
>     at java.lang.Thread.run(Thread.java:745)
>
> On 11.10.2017 12:48, Flavio Pompermaier wrote:
> Hi to all,
> we wrote a small JUnit test to reproduce a memory issue we have in a Flink
> job (it seems to be related to Netty). At some point, usually around the
> 28th loop, the job fails with the following exception (we have never
> actually faced this in production, but it may be related to the memory
> issue somehow):
>
> Caused by: java.lang.IllegalAccessError: org/apache/flink/runtime/io/network/netty/NettyMessage
>     at io.netty.util.internal.__matchers__.org.apache.flink.runtime.io.network.netty.NettyMessageMatcher.match(NoOpTypeParameterMatcher.java)
>     at io.netty.channel.SimpleChannelInboundHandler.acceptInboundMessage(SimpleChannelInboundHandler.java:95)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:102)
>     ... 16 more
>
> The GitHub project is https://github.com/okkam-it/flink-memory-leak and the
> JUnit test is contained in the MemoryLeakTest class (within src/main/test).
>
> Thanks in advance for any support,
> Flavio
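
For anyone trying the workaround suggested by the error message in a local test, the keys 'taskmanager.network.memory.fraction', 'taskmanager.network.memory.min', and 'taskmanager.network.memory.max' (quoted from the exception above) can be passed to an embedded environment. The following is only a minimal sketch under the assumption of the Flink 1.3.x-era DataSet API; the concrete values are placeholders, not recommendations for the test in the linked repository.

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class LocalNetworkBufferSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Keys taken from the "Insufficient number of network buffers" message above;
        // the values here are arbitrary placeholders (sizes are in bytes).
        conf.setFloat("taskmanager.network.memory.fraction", 0.2f);
        conf.setLong("taskmanager.network.memory.min", 128L << 20); // 128 MB
        conf.setLong("taskmanager.network.memory.max", 512L << 20); // 512 MB

        // Local environment that picks up the custom configuration,
        // instead of the defaults used by createLocalEnvironment().
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);

        // Trivial job just to show the environment is usable; the actual
        // reproduction loop lives in the MemoryLeakTest class mentioned above.
        env.fromElements(1, 2, 3).print();
    }
}
```

Note that this only enlarges the buffer pool; if, as described above, buffers are recycled only after tasks reach FINISHED, a tight submission loop may still need to wait between jobs.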