Hi, I have a configuration where nodes talk to each other over a VPN link. But I do not want that VPN to be a single point of failure. So I've configured my ensemble with multiple addresses:
server.1=10.0.0.241:2888:3888|external-a.org:12888:13888
server.2=192.168.0.252:2888:3888|external-b.org:2888:3888
server.3=10.0.0.232:2888:3888|external-a.org:2888:3888

The two addresses per node are its internal IP address (viable when the VPN is active) and its external IP address (viable anytime, in theory). My thought was that, if the VPN drops, the ZooKeeper ensemble would be able to "fall back" and use the external addresses. I've set up my SSL certificates with alternate names and jailed the ZooKeeper servers before opening holes in my firewall to accept traffic at ports 2888 and 3888.

However, when testing this by dropping the VPN link, I run into trouble: the two nodes on one side degrade into a 2-node ensemble and continue to serve requests, while the remaining node continually tries to connect in and fails. Looking at the logs, it seems like the nodes *are* using the public IP addresses to get back in contact, but that the code expects to be able to open both the external and internal address before accepting a new participant. Is this the case?

2022-08-15 11:58:36,887 [myid:] - INFO [ListenerHandler-/10.0.0.232:3888:o.a.z.s.q.UnifiedServerSocket$UnifiedSocket@266] - Accepted TLS connection from /<external-b.org>:17715 - TLSv1.2 - TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
2022-08-15 11:58:43,951 [myid:] - WARN [QuorumConnectionThread-[myid=3]-11:o.a.z.s.q.QuorumCnxManager@401] - Cannot open channel to 2 at election address /192.168.0.252:3888|/<external-b.org>:3888
java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:293)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

Can someone please help me understand:

1. Is this a bad thing to do? I thought multiple addresses were probably added for this exact use case but if the code expects to be able to open both IP addresses I may have misunderstood.
2. Is a better way to handle this to just let the ensemble degrade and, instead, try to connect to both external and internal addresses from the client side -- thus enabling clients on the disconnected side to see the degraded server on the other side?

What's the best practice here?

Thanks!
Scott
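
P.S. For anyone reading along, here is how the multi-address server lines above break down. This is only a minimal Python sketch, not ZooKeeper code: parse_server_line and QuorumAddress are hypothetical names made up for illustration, and the parser handles only the host:quorumPort:electionPort form used in this post (no role or client-port suffixes).

```python
from typing import List, NamedTuple, Tuple


class QuorumAddress(NamedTuple):
    """One alternative address for a server (hypothetical helper type)."""
    host: str
    quorum_port: int
    election_port: int


def parse_server_line(line: str) -> Tuple[int, List[QuorumAddress]]:
    """Split 'server.N=addr|addr|...' into the id N and its address list.

    Assumes every alternative is exactly host:quorumPort:electionPort,
    as in the configuration shown in this post.
    """
    key, _, value = line.strip().partition("=")
    server_id = int(key.split(".", 1)[1])
    addresses = []
    for alternative in value.split("|"):
        # rsplit keeps the host intact and peels off the two trailing ports
        host, quorum_port, election_port = alternative.rsplit(":", 2)
        addresses.append(
            QuorumAddress(host, int(quorum_port), int(election_port))
        )
    return server_id, addresses


server_id, addresses = parse_server_line(
    "server.1=10.0.0.241:2888:3888|external-a.org:12888:13888"
)
for address in addresses:
    print(server_id, address.host, address.quorum_port, address.election_port)
```

Each server entry therefore carries two independent (host, quorum port, election port) triples, one internal and one external, which is the pair that shows up joined by "|" in the "Cannot open channel" warning.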
