Hi Wenrui,
If a task manager is killed with kill -9, it has no chance to log
anything. If the task manager exits because of a connection timeout, there will
be something in its log file. So it was probably killed by another user or by the
operating system. Please check the operating system logs (for example, the
kernel log for OOM-killer messages). BTW, I don't think a DEBUG log level would help.

On Tue, Apr 16, 2019 at 9:16 AM Wenrui Meng <wenruim...@gmail.com> wrote:

> There is no exception or warning in the log of task manager
> `'athena592-phx2/10.80.118.166:44177'`. In addition, the cluster monitoring
> dashboard does not show that the host was shut down. We probably need to turn
> on DEBUG logging to get more useful information. If the task manager had been
> killed, I assume there would be termination messages in its log. If not, I
> don't know how to tell whether the task manager was killed or whether it was
> just a connection timeout.
>
>
>
> On Sun, Apr 14, 2019 at 7:22 PM zhijiang <wangzhijiang...@aliyun.com>
> wrote:
>
>> Hi Wenrui,
>>
>> I think the Akka gated issue and the inactive Netty channel are both caused
>> by a task manager exiting or being killed. You should double-check the status
>> and exit reason of this task manager `'athena592-phx2/10.80.118.166:44177'`.
>>
>> Best,
>> Zhijiang
>>
>> ------------------------------------------------------------------
>> From: Wenrui Meng <wenruim...@gmail.com>
>> Send Time: Saturday, April 13, 2019 01:01
>> To:user <user@flink.apache.org>
>> Cc:tzulitai <tzuli...@apache.org>
>> Subject:Netty channel closed at AKKA gated status
>>
>> We encountered the "Netty channel inactive" issue while Akka had gated that
>> task manager. I'm wondering whether the channel was closed because of the
>> Akka gated status, since all messages to the task manager are dropped during
>> that window, which might cause the Netty channel exception. If so, should
>> there be coordination between Akka and Netty? The gated status is not
>> intended to fail the system. Here is the stack trace of the exception:
>>
>> 2019-04-12 12:46:38.413 [flink-akka.actor.default-dispatcher-90] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Completed checkpoint 3758 (3788228399 bytes in 5967 ms).
>> 2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-25 - Association with remote system [akka.tcp://flink@athena592-phx2:44487] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
>> 2019-04-12 12:49:14.175 [flink-akka.actor.default-dispatcher-65] WARN akka.remote.ReliableDeliverySupervisor flink-akka.remote.default-remote-dispatcher-25 - Association with remote system [akka.tcp://flink@athena592-phx2:44487] has failed, address is now gated for [5000] ms. Reason: [Disassociated]
>> 2019-04-12 12:49:14.230 [flink-akka.actor.default-dispatcher-65] INFO org.apache.flink.runtime.executiongraph.ExecutionGraph  - id (14/96) (93fcbfc535a190e1edcfd913d5f304fe) switched from RUNNING to FAILED.
>> org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'athena592-phx2/10.80.118.166:44177'. This might indicate that the remote task manager was lost.
>>         at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelInactive(PartitionRequestClientHandler.java:117)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
>>         at org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:294)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:237)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:223)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:829)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:610)
>>         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>>         at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>>         at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>>         at java.lang.Thread.run(Thread.java:748)
>>
>>
>>
