dizhou cao created FLINK-32191:
----------------------------------

             Summary: Support for configuring keepalive related parameters.
                 Key: FLINK-32191
                 URL: https://issues.apache.org/jira/browse/FLINK-32191
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
            Reporter: dizhou cao


We encountered a case in our production environment where the upstream side
was unable to send data downstream because of a faulty switch link. The
upstream side detects the failure only after TCP retransmission (RTO) retries
are exhausted, which takes about 15 minutes in our environment. As a result,
the job can be unavailable for about 15 minutes. We would like failover and
rescheduling to happen sooner. Flink already enables TCP keepalive, but the
default keepalive idle time is 2 hours. The keepalive timeout can be shortened
by configuring TCP_KEEPIDLE, TCP_KEEPINTERVAL, and TCP_KEEPCOUNT. Netty already
supports these options, so we propose making them configurable in Flink.
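
To illustrate how the three options combine, here is a minimal sketch using the
plain JDK socket API (jdk.net.ExtendedSocketOptions, Java 11+) rather than
Flink or Netty code. The class name KeepaliveSketch, its method names, and the
example values are hypothetical and are not existing Flink configuration. For
an idle connection, the worst-case detection time is roughly the idle time plus
interval times probe count; with the usual Linux defaults (7200 s idle, 75 s
interval, 9 probes) that comes to 7875 seconds.

```java
import java.io.IOException;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.nio.channels.SocketChannel;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper for illustration only; not part of Flink or Netty.
final class KeepaliveSketch {

    // Approximate worst-case seconds before an idle, dead peer is detected:
    // wait `idle` seconds, then send `count` probes spaced `interval` seconds apart.
    static int detectionSeconds(int idle, int interval, int count) {
        return idle + interval * count;
    }

    // Enables keepalive and applies the three tuning options if the platform
    // supports them. Returns false when the extended options are unavailable,
    // in which case only SO_KEEPALIVE has been set.
    static boolean applyKeepalive(SocketChannel ch, int idle, int interval, int count)
            throws IOException {
        ch.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        Set<SocketOption<?>> supported = ch.supportedOptions();
        if (!supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                || !supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            return false;
        }
        ch.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idle);         // seconds idle before first probe
        ch.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, interval); // seconds between probes
        ch.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, count);       // unanswered probes before giving up
        return true;
    }
}
```

With example values of 60 s idle, 10 s interval, and 3 probes, an idle dead
connection would be reported after about 90 seconds instead of more than two
hours. How Flink would name and expose such settings is left open here.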



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
