Hi, Please search the task manager logs for the potential reason of failure/disconnecting around the time when you got this error on the job manager. There should be some clearly visible exception.
Thanks, Piotrek > On 9 Dec 2017, at 20:35, Chen Qin <qinnc...@gmail.com> wrote: > > Hi there, > > In recent, our production fink jobs observed some weird performance issue. > When job tailing kafka source failed and try to catch up, asyncIO after event > trigger get much higher load on task thread. Since each TM allocated two > virtual CPU in docker, my assumption was akka message between JM and TM > shouldn't be impacted. > > What I observed was TM get closed and keep restart with same error message > below. Any suggestion is appreciated! > > > org.apache.flink.runtime.io > <http://org.apache.flink.runtime.io/>.network.netty.exception.RemoteTransportException: > Connection unexpectedly closed by remote task manager > 'xxxxxxx/xxxxxxx:5841'. This might indicate that the remote task manager > was lost. > at org.apache.flink.runtime.io > <http://org.apache.flink.runtime.io/>.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:115) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) > at > io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) > at > io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223) > at > io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829) > at > io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:748) > > Chen