It's been a while since I last looked at these parts... I think the general idea is that when you create a ZooKeeper object on the client side, it asynchronously tries to connect to the server and publishes its state (connecting / connected / session-timeout / etc.) through the watcher.
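To make the "state published through the watcher" part concrete, here is a minimal sketch. It assumes the application wants to block until the first connection is established and to give up after its own deadline, instead of waiting on the client's retries forever. `ConnectionStateTracker` and its `State` enum are hypothetical names; `State` is only a local stand-in for a subset of `org.apache.zookeeper.Watcher.Event.KeeperState`, so the block compiles without the ZooKeeper jar. In a real client, `onState(...)` would be called from the watcher's `process(WatchedEvent)` callback with a mapping of `event.getState()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: turns asynchronous connection-state callbacks into a
 * blocking "wait until connected, with a deadline" call.
 * State is a local stand-in for part of Watcher.Event.KeeperState (assumption).
 */
public class ConnectionStateTracker {
    public enum State { DISCONNECTED, SYNC_CONNECTED, AUTH_FAILED, EXPIRED }

    // Released once the initial wait has a definite outcome (connected or failed).
    private final CountDownLatch settled = new CountDownLatch(1);
    private volatile State lastState = State.DISCONNECTED;

    /** In a real client this would be driven from Watcher.process(WatchedEvent). */
    public void onState(State state) {
        lastState = state;
        if (state != State.DISCONNECTED) {
            // Connected, auth failure and expiry all end the initial wait;
            // DISCONNECTED only means the client is still (re)trying.
            settled.countDown();
        }
    }

    public State lastState() {
        return lastState;
    }

    /**
     * Waits up to the given timeout for the state to settle, then reports
     * whether the most recent state is SYNC_CONNECTED.
     */
    public boolean awaitConnected(long timeout, TimeUnit unit) {
        try {
            settled.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return lastState == State.SYNC_CONNECTED;
    }
}
```

If, as described in the quoted mail below, the server simply closes the connection on an authentication failure, the client may only ever report disconnects and never an auth-failure state; in that case the deadline passed to `awaitConnected` is what lets the application bail out.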
As far as I remember, the ZooKeeper class uses the ClientCnxn class to manage the state of the connection, which has a notion of sessionTimeout and connectTimeout. It tries to connect to each known server in a round-robin fashion. Each connection attempt is given 'connectTimeout' time, and I think a SessionTimeoutException is thrown when no server responded within sessionTimeout. (I think by default connectTimeout = sessionTimeout / number_of_servers.) But I am not entirely sure what happens after the SessionTimeoutException. Normally I think the ZooKeeper client doesn't reconnect automatically after a session timeout, as this is a case that needs to be handled by the client application (no consistency can be guaranteed across different sessions; ephemeral znodes will also be deleted, etc., see: https://zookeeper.apache.org/doc/r3.6.3/zookeeperOver.html#Guarantees). But if no session has been established yet, there may be an infinite retry logic in the client. I don't have much time right now to dig deeper into these classes. I would assume we already have some unit tests around this too, which could be checked to see the expected behaviour.

Also, I don't know exactly how an authentication failure is handled on the client side... The server might fall back to an 'unauthenticated session' in case of authentication failures, or it can refuse the connection attempt (this can be configured, at least for SASL authentication: 'zookeeper.sessionRequireClientSASLAuth').

Also, I think the best would be to actually test this with your exact setup. (E.g. on the clusters we use, we still run ZooKeeper 3.5 in production with SSL encryption + Kerberos authentication, which might behave differently from your setup with 3.6.3... and you might also use x509 authentication?) But it shouldn't be hard to emulate some authentication failures with your setup.
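The timeout arithmetic recalled above can be sketched as a small model. This is NOT the real ClientCnxn code: it only encodes the assumption that each connection attempt gets sessionTimeout / number_of_servers and that servers are tried round-robin. The class and method names are hypothetical, and the real client's host provider may shuffle the server list before iterating over it, which this model ignores.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (NOT the real ClientCnxn code): each connection attempt is
 * allowed sessionTimeout / numServers ms, and servers are tried round-robin.
 */
public class RoundRobinConnectModel {

    /** Per-attempt connect timeout, assumed to be sessionTimeout / number of servers. */
    public static int connectTimeoutMs(int sessionTimeoutMs, int numServers) {
        if (numServers <= 0 || sessionTimeoutMs < numServers) {
            throw new IllegalArgumentException(
                    "need numServers > 0 and sessionTimeoutMs >= numServers");
        }
        return sessionTimeoutMs / numServers;
    }

    /**
     * Server index used by each of the first {@code attempts} attempts,
     * assuming every attempt fails (e.g. the server closes the socket).
     */
    public static List<Integer> attemptOrder(int numServers, int attempts) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            order.add(i % numServers); // wrap around to the first server again
        }
        return order;
    }

    public static void main(String[] args) {
        int sessionTimeoutMs = 30_000;
        int servers = 3;
        System.out.println("connectTimeout = " + connectTimeoutMs(sessionTimeoutMs, servers) + " ms");
        System.out.println("attempt order  = " + attemptOrder(servers, 7));
    }
}
```

With a 30 s session timeout and three servers, each attempt gets 10 s, and a server that keeps closing the connection is simply retried on the next pass; `main` prints `connectTimeout = 10000 ms` and `attempt order  = [0, 1, 2, 0, 1, 2, 0]`.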
Best regards,
Mate

On Fri, Jun 17, 2022 at 11:23 PM Rahul Rane <rr...@linkedin.com.invalid> wrote:

> Bumping up on this one.
>
> Thanks,
> Rahul Rane
>
> From: Rahul Rane <rr...@linkedin.com>
> Date: Wednesday, May 25, 2022 at 2:57 PM
> To: dev@zookeeper.apache.org <dev@zookeeper.apache.org>
> Subject: Few questions on connection retry on auth failure.
>
> Hello team,
>
> We need some help in understanding the zookeeper expected behavior and
> potential solution to the problem.
>
> Context :
>
> We have extended ServerAuthenticationProvider with x509 scheme based on
> 3.6.3 zookeeper server. We are trying to understand connection retry
> scenario. On auth failure, we see that zookeeper client retries to
> establish connection with server until the timeout or infinitely if no
> timeout is set. We are using
> org.apache.zookeeper.server.NettyServerCnxnFactory as Server connection
> factory.
>
> Couple of questions :
>
> 1. Is zookeeper client supposed to retry infinitely on auth failure
> from zookeeper server?
> 2. Is there a way zookeeper client does not perform infinitely retries
> on auth failure errors and bails out after first auth failure itself?
> 3. We can’t find anything about auth failure errors in zookeeper client
> logs but just that connection is closed. After looking into Netty Server
> code, we see the auth failure is not communicated to client but got masked
> here<https://github.com/linkedin/zookeeper/blob/8bcaf7bb3cfa6470e1660e2b36964ae2284197df/zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java#L99>.
> So we were wondering if we are missing something here?
>
> Thanks for the help and let me know if you need any clarification on any
> of the questions.
>
> Thanks,
>
> Rahul Rane