[ https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira resolved ZOOKEEPER-2982.
-----------------------------------------
    Resolution: Fixed

Issue resolved by pull request 513
[https://github.com/apache/zookeeper/pull/513]

> Re-try DNS hostname -> IP resolution
> ------------------------------------
>
>                 Key: ZOOKEEPER-2982
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.0, 3.5.1, 3.5.3
>            Reporter: Eron Wright
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.6.0, 3.5.4
>
>         Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4. Some portions of the fix
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started
> before all peer addresses are resolvable, that server may cache a negative
> lookup result and forever fail to resolve the address. For example,
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless)
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95] - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
>         at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not
> resolvable when the server started, but became resolvable shortly thereafter.
> The server should eventually succeed but doesn't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
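
The failure mode described in the issue can be illustrated with a small Java sketch. This is not the code from pull request 513: `PeerAddressResolver`, `reresolve`, `resolveWithRetry`, and the `Lookup` interface are hypothetical names, and the fixed backoff is an assumption for illustration. The sketch relies on documented JDK behavior: `new InetSocketAddress(host, port)` attempts to resolve the host once, and if that fails the immutable address is flagged as unresolved (`isUnresolved()` returns `true`). Holding on to that object therefore "caches" the failure; picking up a DNS record created later requires building a fresh address from `getHostString()`.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical helper (not ZooKeeper's API) showing why a peer address should
 * be re-resolved on every connection attempt instead of being resolved once.
 */
public final class PeerAddressResolver {

    /** Pluggable lookup so DNS can be simulated; in practice InetAddress::getByName. */
    @FunctionalInterface
    public interface Lookup {
        InetAddress lookup(String host) throws UnknownHostException;
    }

    private PeerAddressResolver() {}

    /**
     * Builds a fresh InetSocketAddress from the original host string. An address
     * that failed to resolve at construction time never retries on its own, so a
     * new instance is needed to trigger a new lookup (still subject to the JVM's
     * own InetAddress cache).
     */
    public static InetSocketAddress reresolve(InetSocketAddress addr) {
        return new InetSocketAddress(addr.getHostString(), addr.getPort());
    }

    /**
     * Looks up the host up to maxAttempts times, sleeping backoffMs between
     * failed attempts, and returns the first successfully resolved address.
     */
    public static InetSocketAddress resolveWithRetry(String host, int port, Lookup lookup,
            int maxAttempts, long backoffMs) throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return new InetSocketAddress(lookup.lookup(host), port);
            } catch (UnknownHostException e) {
                // The DNS record may simply not exist yet (e.g. a lazily created
                // headless-Service record); remember the error and try again.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last;
    }
}
```

In a deployment like the StatefulSet example above, the idea is to call something like `reresolve` immediately before each `Socket.connect` to a peer rather than once at startup. Note that the JVM also caches failed lookups for `networkaddress.cache.negative.ttl` seconds (10 by default), so retrying faster than that may keep seeing the same failure.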