Kai Sun created ZOOKEEPER-4022:
----------------------------------

             Summary: ZooKeeper client session establishment deficiency
                 Key: ZOOKEEPER-4022
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4022
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.4.14, 3.4.13, 3.4.11
            Reporter: Kai Sun


Here I want to share some deficiencies in ZooKeeper client connection 
establishment that we hit and debugged in large-scale operation.
 * Dead IP. Suppose one ZooKeeper server is dead and the connection string 
contains a single DNS name that resolves to N IPs. For ZooKeeper clients >= 
3.4.13, HostProvider.size() would be 1, and next() re-resolves that single DNS 
name, whose N IPs include the one dead IP. There is a 1/N chance of picking 
the dead host and failing to establish a TCP connection. On the next try, 
there is still a 1/N chance of hitting the same IP, and so on until the 
application-level timeout. With a large number of clients, some are bound to 
fail session establishment at the application level. We probably need to make 
sure that the second round of tries excludes the previously tried IP address.
 * TCP connection timeout. Suppose the number of observers, M, is very large. 
The TCP connection timeout is set to the initial session timeout divided by 
HostProvider.size(). With a hundred observers, the resulting timeout can be 
too short for a cross-data-center TCP connection to be established. This is 
especially a problem for ZooKeeper versions <= 3.4.11, because the ZooKeeper 
(client) resolves DNS first, and one connection string (DNS name) can map to 
100 IP addresses.
 * Changes to the IP addresses of ZooKeeper servers (observers) are not picked 
up by the client in a timely manner. This issue mostly affects older versions 
of ZooKeeper, because the ZooKeeper (client) object resolves the DNS name only 
once, upon construction. Say that after the client has been running for a 
month, IT gradually adds more servers to meet traffic growth. The newly added 
IPs behind the DNS name won't be seen. If IT retires some servers, the client 
will still try to connect to them, which may cause session timeouts, etc.
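
To illustrate the exclusion suggested in the "Dead IP" item, here is a minimal 
sketch. It is not existing ZooKeeper client code: the class 
ExcludingAddressPicker, its methods, and the injected resolver function are 
all hypothetical names chosen for this example. The idea is only that an IP 
already tried in the current round is skipped until every resolved IP has been 
tried once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper, not part of the ZooKeeper client API. */
class ExcludingAddressPicker {
    // Assumption: maps a DNS name to its currently resolved IP strings.
    private final Function<String, List<String>> resolver;
    // IPs already handed out in the current round of attempts.
    private final Set<String> tried = new HashSet<>();
    private final Random random;

    ExcludingAddressPicker(Function<String, List<String>> resolver, Random random) {
        this.resolver = resolver;
        this.random = random;
    }

    /** Re-resolves the name and returns an IP not yet tried in this round. */
    String next(String hostName) {
        List<String> resolved = resolver.apply(hostName);
        if (resolved.isEmpty()) {
            throw new IllegalStateException("no addresses for " + hostName);
        }
        List<String> candidates = new ArrayList<>(resolved);
        candidates.removeAll(tried);
        if (candidates.isEmpty()) {
            // Every current IP has been tried once: start a new round.
            tried.clear();
            candidates = new ArrayList<>(resolved);
        }
        String pick = candidates.get(random.nextInt(candidates.size()));
        tried.add(pick);
        return pick;
    }

    /** Call after a successful connect so the next outage starts fresh. */
    void onConnected() {
        tried.clear();
    }
}
```

With N resolved IPs and one dead host, a client using this approach would hit 
the dead IP at most once per round of N attempts, instead of with probability 
1/N on every attempt. In real use the resolver would presumably wrap 
InetAddress.getAllByName; it is injected here so the sketch can be exercised 
without DNS.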

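The arithmetic behind the "TCP connection timeout" item can be sketched as 
follows. This only mirrors the division described above (session timeout 
divided by HostProvider.size()); the class and method names are hypothetical, 
and the 30-second session timeout used in the comments is an assumed example 
value, not one taken from our deployment.

```java
/** Hypothetical helper that mirrors the arithmetic described in this report. */
class ConnectTimeoutMath {
    /**
     * Per-attempt TCP connect timeout in milliseconds: the session timeout
     * split evenly across all hosts known to the host provider.
     */
    static int connectTimeoutMs(int sessionTimeoutMs, int hostCount) {
        if (hostCount <= 0) {
            throw new IllegalArgumentException("hostCount must be positive");
        }
        // Integer division, e.g. 30000 / 100 = 300 and 30000 / 1 = 30000.
        return sessionTimeoutMs / hostCount;
    }
}
```

Under these assumptions, a connection string that expands to 100 IPs leaves 
only 300 ms per connection attempt, while a host provider of size 1 gives the 
attempt the full 30 seconds.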


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
