Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
OK, I will try. On Wed, Jun 16, 2021 at 8:00 PM Yingjie Cao wrote: > Maybe you can try to increase taskmanager.network.retries, taskmanager.network.netty.server.backlog and taskmanager.network.netty.sendReceiveBufferSize. These options are useful for our jobs. > On Wed, Jun 16, 2021 at 7:10 PM yidan zhao wrote:

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao
Maybe you can try to increase taskmanager.network.retries, taskmanager.network.netty.server.backlog and taskmanager.network.netty.sendReceiveBufferSize. These options are useful for our jobs. On Wed, Jun 16, 2021 at 7:10 PM yidan zhao wrote: > Hi, yingjie. If the network is not stable, which config parameter
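
For reference, the three options named above are normally set in flink-conf.yaml. The sketch below only renders such lines from a dict. The values are illustrative assumptions, not recommendations made in this thread, and `render_flink_conf` is a hypothetical helper, not part of Flink.

```python
# Illustrative values only (assumptions): the thread names these options
# but does not say what to set them to.
NETWORK_OVERRIDES = {
    "taskmanager.network.retries": 10,
    "taskmanager.network.netty.server.backlog": 1024,
    "taskmanager.network.netty.sendReceiveBufferSize": 4194304,
}


def render_flink_conf(overrides):
    """Render overrides as 'key: value' lines in flink-conf.yaml style."""
    return "\n".join(f"{key}: {value}" for key, value in overrides.items())


if __name__ == "__main__":
    print(render_flink_conf(NETWORK_OVERRIDES))
```

Changing one option at a time makes it easier to tell which one, if any, affects the timeouts.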

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
I also searched the internet and found many results. Some describe related exceptions such as org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException, but in my case it is org.apache.flink.runtime.io.network.netty.exception.LocalTransportException. It is different in 'LocalTransportException

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Hi, yingjie. If the network is not stable, which config parameter should I adjust? On Wed, Jun 16, 2021 at 6:56 PM yidan zhao wrote: > 2: I use G1, and no full GC occurred; young GC count: 422, time: 142892, so it is not bad. > 3: Stream job. > 4: I will try to configure taskmanager.network.retries, which is de

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
2: I use G1, and no full GC occurred; young GC count: 422, time: 142892, so it is not bad. 3: Stream job. 4: I will try to configure taskmanager.network.retries, which defaults to 0, and taskmanager.network.netty.client.connectTimeoutSec, which defaults to 120s. 5: I checked the net fd number of the taskmana

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Yingjie Cao
Hi yidan, 1. Is the network stable? 2. Is there any GC problem? 3. Is it a batch job? If so, please use sort-shuffle, see [1] for more information. 4. You may try to configure these two options: taskmanager.network.retries, taskmanager.network.netty.client.connectTimeoutSec. More relevant options can

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread yidan zhao
Hi, here is the exception stack as text: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: readAddress(..) failed: Connection timed out (connection to '10.35.215.18/10.35.215.18:2045') at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandle
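
When scanning TaskManager logs for this error, it can help to pull out the exception kind (Local vs. Remote) and the peer address so failures can be grouped by host. The following is a minimal sketch; the pattern is modeled only on the single message quoted above, other Flink versions may format it differently, and `parse_transport_exception` is a hypothetical helper.

```python
import re

# Pattern modeled on the one message quoted in this thread (assumption:
# other messages follow the same "(connection to '...')" layout).
TRANSPORT_RE = re.compile(
    r"(?P<kind>Local|Remote)TransportException: (?P<message>.*?)"
    r" \(connection to '(?P<peer>[^']+)'\)"
)


def parse_transport_exception(line):
    """Return (kind, message, host, port), or None if the line does not match."""
    match = TRANSPORT_RE.search(line)
    if match is None:
        return None
    # In the quoted message the peer appears as 'host/ip:port';
    # keep the part after the slash.
    address = match.group("peer").rsplit("/", 1)[-1]
    host, _, port = address.rpartition(":")
    return match.group("kind"), match.group("message"), host, int(port)


if __name__ == "__main__":
    sample = (
        "org.apache.flink.runtime.io.network.netty.exception."
        "LocalTransportException: readAddress(..) failed: Connection timed out "
        "(connection to '10.35.215.18/10.35.215.18:2045')"
    )
    print(parse_transport_exception(sample))
```

Counting matches per host would show whether the timeouts cluster on one machine, which bears on the "is the network stable?" question raised in this thread.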

Re: flink job exception analysis (netty related, readAddress failed. connection timed out)

2021-06-16 Thread Robert Metzger
Hi Yidan, it seems that the attachment did not make it through the mailing list. Can you copy-paste the text of the exception here or upload the log somewhere? On Wed, Jun 16, 2021 at 9:36 AM yidan zhao wrote: > Attachment is the exception stack from flink's web-ui. Does anyone have also met