Re: Netty channel closed at AKKA gated status

2019-04-22 Thread Wenrui Meng
) 09:48 > To: zhijiang > Cc: Biao Liu ; user ; tzulitai <tzuli...@apache.org> > Subject: Re: Netty channel closed at AKKA gated status > > Attached the lost task manager last 1 lines log. Anyone can help take > a look? > > Thanks, > Wenrui > > On Fri, Apr

Re: Netty channel closed at AKKA gated status

2019-04-19 Thread Wenrui Meng
nnection timeout, there would > be something in log file. So it is probably killed by other user or > operating system. Please check the log of operating system. BTW, I don't > think "DEBUG log level" would help. > > Wenrui Meng wrote on Tue, Apr 16, 2019 at 9:16 AM: > There is

Re: Netty channel closed at AKKA gated status

2019-04-15 Thread Wenrui Meng
There is no exception or warning in the log of task manager `'athena592-phx2/10.80.118.166:44177'`. In addition, the cluster monitor dashboard shows that the host was not shut down either. We probably need to turn on DEBUG logging to get more useful information. If the task manager gets killed, I assume there

Netty channel closed at AKKA gated status

2019-04-12 Thread Wenrui Meng
We encountered the netty channel inactive issue while AKKA had gated that task manager. I'm wondering whether the channel closed because of the AKKA gated status, since all messages to the taskManager will be dropped at that moment, which might cause a netty channel exception. If so, shall we have coo

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread Wenrui Meng
Tue, Jan 8, 2019 at 7:06 AM Till Rohrmann wrote: > Hi Wenrui, > > the exception now occurs while finishing the connection creation. I'm not > sure whether this is so different. Could it be that your network is > overloaded or not very reliable? Have you tried running your Fli

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread Wenrui Meng
AbstractNioChannel.java:207 and it looked as if the correct timeout value > was set. > > What is the special uber Flink version? What patches does it include? Are > you able to run your tests with the latest vanilla Flink version? > > Cheers, > Till > > On Wed, Jan 9,

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-09 Thread Wenrui Meng
very reliable? Have you tried running your Flink job > outside of AthenaX? > > Cheers, > Till > > On Tue, Jan 8, 2019 at 2:50 PM Wenrui Meng wrote: > >> Hi Till, >> >> Thanks for your reply. Our cluster is Yarn cluster. I found that if we >> decrease the t

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-08 Thread Wenrui Meng
etty/netty/blob/netty-4.0.27.Final/transport/src/main/java/io/netty/channel/nio/AbstractNioChannel.java#L207 > > Cheers, > Till > > On Sat, Jan 5, 2019 at 2:22 AM Wenrui Meng wrote: > >> Hi Till, >> >> Thanks for your reply and help on this issue. >> >> I increa

Re: ConnectTimeoutException when createPartitionRequestClient

2019-01-04 Thread Wenrui Meng
atest Flink version. > > Cheers, > Till > > On Thu, Jan 3, 2019 at 3:00 PM Wenrui Meng wrote: > >> Hi, >> >> I consistently get connection timeout issue when creating >> partitionRequestClient in flink 1.4. I tried to ping from the connecting >> h

ConnectTimeoutException when createPartitionRequestClient

2019-01-03 Thread Wenrui Meng
Hi, I consistently get a connection timeout issue when creating partitionRequestClient in Flink 1.4. I tried to ping from the connecting host to the connected host, but the ping latency is consistently less than 0.1 ms. So it's probably not due to the cluster status. I also tried increasing max backof