[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001034#comment-17001034
 ] 

Fangmin Lv commented on ZOOKEEPER-3573:
---------------------------------------

Recently, we found that without SO_LINGER, sessions can expire unexpectedly 
even when the close is done asynchronously. 

Let's say there is a network issue between the leader and one follower. The 
leader detects a read timeout on its inbound side, while the outbound path is 
slow but still alive enough that the follower does not detect a read timeout. 
The leader then closes the TLS connection but blocks while sending the 
close_notify packet because the send buffer is full, so the close may take 
more than 30s. Only then does the follower detect the network issue and start 
to shut down and close all client connections, but by that time those sessions 
have already timed out.
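
To make the "async close" part concrete, here is a minimal sketch of moving a 
potentially blocking close() onto a separate thread. It is an illustration 
only, not the ZooKeeper code: the class name AsyncCloseSketch, the helper 
closeAsync and the 2-second stand-in for a slow TLS close are all hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AsyncCloseSketch {

    /** Hypothetical helper: runs close() on a daemon thread so the caller does not block. */
    static Thread closeAsync(Closeable c) {
        Thread t = new Thread(() -> {
            try {
                // For a TLS socket this call may block while close_notify
                // waits on a full send buffer.
                c.close();
            } catch (IOException e) {
                // Best-effort close; nothing more to do in this sketch.
            }
        }, "async-close");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch closed = new CountDownLatch(1);

        // Stand-in for a slow TLS close (assumption: 2 seconds instead of 30+).
        Closeable slow = () -> {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            closed.countDown();
        };

        long start = System.nanoTime();
        closeAsync(slow);
        long callerMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("caller returned after ~" + callerMs + " ms");

        closed.await();
        System.out.println("close finished");
    }
}
```

The caller returns immediately, but the close itself still takes just as long, 
so the peer learns about it no sooner. That is consistent with the scenario 
above, where the follower only notices after the leader's close completes.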

Jie is following up with the OpenJDK community to see if this option can be 
supported on SSL sockets in OpenJDK 11+, or if there is an alternative way to 
solve this.

> Dealing with long TLS connection closing time without SO_LINGER option
> ----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3573
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3573
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.6.0
>            Reporter: Jie Huang
>            Priority: Major
>
> As described in ZOOKEEPER-3384, with SSL sockets, a close_notify is required 
> to be sent before closing the write side of a connection. When the send 
> buffer is full and the writing is blocked, it will take a long time to send 
> close_notify thus a long time to close the socket. The long closing time on 
> followers with a partitioned-away leader would stall the shutdown process and 
> delay a new leader election to establish a new quorum. As a result, the 
> ensemble would be unavailable for a long time.
> In ZOOKEEPER-3384, the SO_LINGER option is used to close the socket quickly 
> (and potentially uncleanly). In JDK 11, however, the SO_LINGER option is not 
> honored, so we need a new way to avoid the long quorum unavailability.
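
For reference, this is roughly what closing with SO_LINGER looks like on a 
plain java.net.Socket. It is a minimal sketch, not the ZOOKEEPER-3384 change: 
the class LingerCloseSketch and the helper closeWithZeroLinger are hypothetical 
names, and a zero linger timeout is an assumption made for illustration. Per 
the description above, the option is not honored for SSL sockets in JDK 11, so 
the sketch uses a plain TCP socket on the loopback interface.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LingerCloseSketch {

    /** Hypothetical helper: close with SO_LINGER enabled and a zero timeout. */
    static void closeWithZeroLinger(Socket s) throws IOException {
        // With linger on and timeout 0, close() on a plain TCP socket does not
        // wait for unsent data; the connection is typically reset instead.
        s.setSoLinger(true, 0);
        s.close();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, server.getLocalPort());
            try (Socket accepted = server.accept()) {
                closeWithZeroLinger(client);
                System.out.println("client closed: " + client.isClosed());
            }
        }
    }
}
```

The trade-off is the one named in the description: the close is quick but 
potentially unclean, since pending data may be dropped.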



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
