[ https://issues.apache.org/jira/browse/ZOOKEEPER-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632111#action_12632111 ]
austin edited comment on ZOOKEEPER-127 at 9/17/08 11:36 PM:
----------------------------------------------------------------------

After several more runs of our unit test using the patched algorithm 3, the test hangs as the service repeatedly tries to reelect the killed leader. This behavior is similar to ZOOKEEPER-131 which we had experienced using algorithms 0 and 1.

Server 10 is 10.50.65.40 and has been explicitly killed. The following log is from server 5, which mirrors logs on all the other servers. Any idea what's happening here?

2008-09-18 00:28:20,029 - INFO [QuorumPeer:[EMAIL PROTECTED] - LOOKING
2008-09-18 00:28:20,029 - WARN [QuorumPeer:[EMAIL PROTECTED] - unable to parse zxid string into long: txt
2008-09-18 00:28:20,029 - WARN [QuorumPeer:[EMAIL PROTECTED] - New election: 8589935405
2008-09-18 00:28:20,031 - WARN [WorkerSender Thread:[EMAIL PROTECTED] - Cannot open channel to 10( java.net.ConnectException: Connection refused)
2008-09-18 00:28:20,031 - INFO [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
2008-09-18 00:28:20,031 - INFO [QuorumPeer:[EMAIL PROTECTED] - Created server with dataDir:/zookeeper_data/5_data dataLogDir:/zookeeper_data/5_data tickTime:2000
2008-09-18 00:28:20,031 - INFO [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.40:2888

[[[ exception below repeats 5 times ]]]

2008-09-18 00:28:20,032 - WARN [QuorumPeer:[EMAIL PROTECTED] - Unexpected exception
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:519)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:137)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:405)

[[[ then the follower is restarted ]]]

2008-09-18 00:28:24,049 - ERROR [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:370)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:409)

[[[ at this point the log repeats from the beginning ]]]

was (Author: austin):
After about 6 runs of our unit test the test hangs as the service repeatedly tries to reelect the killed leader (similar to ZOOKEEPER-131 with algorithms 0 and 1).

After several more runs of our unit test using the patched algorithm 3, the test hangs as the service repeatedly tries to reelect the killed leader. This behavior is similar to ZOOKEEPER-131 which we had experienced using algorithms 0 and 1.

Server 10 is 10.50.65.40 and has been explicitly killed. The following log is from server 5, which mirrors logs on all the other servers. Any idea what's happening here?

2008-09-18 00:28:20,029 - INFO [QuorumPeer:[EMAIL PROTECTED] - LOOKING
2008-09-18 00:28:20,029 - WARN [QuorumPeer:[EMAIL PROTECTED] - unable to parse zxid string into long: txt
2008-09-18 00:28:20,029 - WARN [QuorumPeer:[EMAIL PROTECTED] - New election: 8589935405
2008-09-18 00:28:20,031 - WARN [WorkerSender Thread:[EMAIL PROTECTED] - Cannot open channel to 10( java.net.ConnectException: Connection refused)
2008-09-18 00:28:20,031 - INFO [QuorumPeer:[EMAIL PROTECTED] - FOLLOWING
2008-09-18 00:28:20,031 - INFO [QuorumPeer:[EMAIL PROTECTED] - Created server with dataDir:/zookeeper_data/5_data dataLogDir:/zookeeper_data/5_data tickTime:2000
2008-09-18 00:28:20,031 - INFO [QuorumPeer:[EMAIL PROTECTED] - Following /10.50.65.40:2888

[[[ exception below repeats 5 times ]]]

2008-09-18 00:28:20,032 - WARN [QuorumPeer:[EMAIL PROTECTED] - Unexpected exception
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:519)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:137)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:405)

[[[ then the follower is restarted ]]]

2008-09-18 00:28:24,049 - ERROR [QuorumPeer:[EMAIL PROTECTED] - FIXMSG
java.lang.Exception: shutdown Follower
    at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:370)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:409)

[[[ at this point the log repeats from the beginning ]]]

> Use of non-standard election ports in config breaks services
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-127
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-127
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.0.0
>            Reporter: Mark Harwood
>            Assignee: Flavio Paiva Junqueira
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: mhPortChanges.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch
>
>
> In QuorumCnxManager.toSend there is a call to create a connection as follows:
>     channel = SocketChannel.open(new InetSocketAddress(addr, port));
> Unfortunately "addr" is the ip address of a remote server while "port" is the electionPort of *this* server.
> As an example, given this configuration (taken from my zoo.cfg)
> server.1=10.20.9.254:2881
> server.2=10.20.9.9:2882
> server.3=10.20.9.254:2883
> Server 3 was observed trying to make a connection to host 10.20.9.9 on port 2883 and obviously failing.
> In tests where all machines use the same electionPort this bug would not manifest itself.
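
For anyone following the quoted issue description, the address mix-up it reports can be reproduced without any networking. The sketch below is only an illustration, not the attached patch and not actual ZooKeeper code: the class ElectionAddressSketch and its methods parse, buggyTarget and fixedTarget are hypothetical names made up for this example. It parses the three server lines from the report and compares the reported behaviour (remote host combined with the local election port) with the presumably intended one (host and port both taken from the remote server's entry).

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in ZooKeeper.
public class ElectionAddressSketch {

    /** Parses lines of the form "server.<id>=<host>:<electionPort>". */
    static Map<Long, InetSocketAddress> parse(String... lines) {
        Map<Long, InetSocketAddress> peers = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = line.trim().split("=", 2);
            long id = Long.parseLong(kv[0].substring("server.".length()));
            String[] hostPort = kv[1].split(":", 2);
            // createUnresolved avoids a DNS lookup; nothing is connected here.
            peers.put(id, InetSocketAddress.createUnresolved(
                    hostPort[0], Integer.parseInt(hostPort[1])));
        }
        return peers;
    }

    /** Behaviour described in the report: remote host, but the LOCAL server's port. */
    static InetSocketAddress buggyTarget(Map<Long, InetSocketAddress> peers,
                                         long self, long remote) {
        return InetSocketAddress.createUnresolved(
                peers.get(remote).getHostString(), peers.get(self).getPort());
    }

    /** Assumed intended behaviour: host and port both come from the remote entry. */
    static InetSocketAddress fixedTarget(Map<Long, InetSocketAddress> peers, long remote) {
        return peers.get(remote);
    }

    public static void main(String[] args) {
        Map<Long, InetSocketAddress> peers = parse(
                "server.1=10.20.9.254:2881",
                "server.2=10.20.9.9:2882",
                "server.3=10.20.9.254:2883");
        InetSocketAddress buggy = buggyTarget(peers, 3L, 2L);
        InetSocketAddress fixed = fixedTarget(peers, 2L);
        System.out.println("buggy: " + buggy.getHostString() + ":" + buggy.getPort());
        System.out.println("fixed: " + fixed.getHostString() + ":" + fixed.getPort());
    }
}
```

Run as server 3 contacting server 2, this prints 10.20.9.9:2883 for the buggy lookup, matching the failed connection in the report, and 10.20.9.9:2882 for the per-server lookup. With identical election ports on every server the two results coincide, which is consistent with the report's remark that such test setups would not show the bug.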