Hi Wenrui,

I think the Akka gated issue and the inactive Netty channel are both caused by a 
task manager exiting or being killed. You should double-check the status of 
this task manager, `'athena592-phx2/10.80.118.166:44177'`, and the reason it went down.
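
As a rough way to confirm that both symptoms point at the same host, here is a 
minimal sketch that scans job manager log text for the two messages quoted in 
this thread. It is not part of Flink: the function name and the regular 
expressions are assumptions derived only from the log lines in the original 
message, and other Flink or Akka versions may word these messages differently.

```python
import re

# Assumed patterns, modeled on the two log messages quoted in this thread.
GATED = re.compile(
    r"Association with remote system \[akka\.tcp://flink@([^:\]]+):(\d+)\] has failed"
)
CLOSED = re.compile(
    r"Connection unexpectedly closed by remote task manager '([^/']+)/([\d.]+):(\d+)'"
)


def hosts_with_both_failures(log_text):
    """Return hosts that appear in both an Akka gating warning and a Netty close."""
    # Collapse whitespace so the patterns still match when a log line is wrapped.
    flat = " ".join(log_text.split())
    gated_hosts = {m.group(1) for m in GATED.finditer(flat)}
    closed_hosts = {m.group(1) for m in CLOSED.finditer(flat)}
    return sorted(gated_hosts & closed_hosts)


# Sample text copied from the messages in this thread (shortened).
sample = """
WARN akka.remote.ReliableDeliverySupervisor - Association with remote system
[akka.tcp://flink@athena592-phx2:44487] has failed, address is now gated for
[5000] ms. Reason: [Disassociated]
RemoteTransportException: Connection unexpectedly closed by remote task manager
'athena592-phx2/10.80.118.166:44177'. This might indicate that the remote task
manager was lost.
"""

print(hosts_with_both_failures(sample))
```

If the same host shows up for both messages, the task manager's own log and the 
cluster manager's container records for that host are the next place to look.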

Best,
Zhijiang
------------------------------------------------------------------
From:Wenrui Meng <wenruim...@gmail.com>
Send Time:2019-04-13 (Saturday) 01:01
To:user <user@flink.apache.org>
Cc:tzulitai <tzuli...@apache.org>
Subject:Netty channel closed at AKKA gated status

We encountered the Netty channel inactive issue while Akka had gated that task 
manager. I'm wondering whether the channel was closed because of the Akka gated 
status, since all messages to the task manager are dropped at that moment, 
which might cause the Netty channel exception. If so, should we have coordination 
between Akka and Netty? The gated status is not intended to fail the system. 
Here is the stack trace for the exception:

2019-04-12 12:46:38.413 [flink-akka.actor.default-dispatcher-90] INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Completed 
checkpoint 3758 (3788228399 bytes in 5967 ms).
2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN  
akka.remote.ReliableDeliverySupervisor 
flink-akka.remote.default-remote-dispatcher-25 - Association with remote system 
[akka.tcp://flink@athena592-phx2:44487] has failed, address is now gated for 
[5000] ms. Reason: [Disassociated] 
2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN  
akka.remote.ReliableDeliverySupervisor 
flink-akka.remote.default-remote-dispatcher-25 - Association with remote system 
[akka.tcp://flink@athena592-phx2:44487] has failed, address is now gated for 
[5000] ms. Reason: [Disassociated] 
2019-04-12 12:49:14.230 [flink-akka.actor.default-dispatcher-65] INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph  - id (14/96) 
(93fcbfc535a190e1edcfd913d5f304fe) switched from RUNNING to FAILED.
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: 
Connection unexpectedly closed by remote task manager 
'athena592-phx2/10.80.118.166:44177'. This might indicate that the remote task 
manager was lost.
        at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:117)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
        at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
        at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
        at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829)
        at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:748)
