hi, Ted and Mahadev,
Here are some more details about my setup:

I run ZooKeeper in embedded mode with the following code:

    quorumPeer = new QuorumPeer();
    quorumPeer.setClientPort(getClientPort());
    quorumPeer.setTxnFactory(new FileTxnSnapLog(new File(getDataLogDir()),
                                                new File(getDataDir())));
    quorumPeer.setQuorumPeers(getServers());
    quorumPeer.setElectionType(getElectionAlg());
    quorumPeer.setMyid(getServerId());
    quorumPeer.setTickTime(getTickTime());
    quorumPeer.setInitLimit(getInitLimit());
    quorumPeer.setSyncLimit(getSyncLimit());
    quorumPeer.setQuorumVerifier(getQuorumVerifier());
    quorumPeer.setCnxnFactory(cnxnFactory);
    quorumPeer.start();

The configuration values are read from the following XML document for server 1:

    <cluster tickTime="1000" initLimit="10" syncLimit="5" clientPort="2181" serverId="1">
        <member id="1" host="192.168.2.6:2888:3888"/>
        <member id="2" host="192.168.2.3:2888:3888"/>
        <member id="3" host="192.168.2.4:2888:3888"/>
    </cluster>

The other servers have the same configuration except that their ids are changed to 2 and 3.

The error occurred on server 3 when I batch loaded some messages to server 1. However, this error does not always happen. I am not sure exactly what triggered this error yet.
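As an illustration only, a document like the one above could be read into a server list along these lines. This is a minimal sketch using the JDK's DOM parser; the class ClusterConfigSketch and its method names are hypothetical and are not part of ZooKeeper or of the code shown above, whose getServers() may well work differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not ZooKeeper API): reads a <cluster> document
// such as the one shown above.
final class ClusterConfigSketch {
    private ClusterConfigSketch() {}

    // Parses the XML text; any parser failure is rethrown unchecked.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse cluster document", e);
        }
    }

    // The id this server claims for itself (the serverId attribute of <cluster>).
    static long serverId(Document doc) {
        return Long.parseLong(doc.getDocumentElement().getAttribute("serverId"));
    }

    // Maps each member id to its "host:port:port" string, in document order.
    static Map<Long, String> members(Document doc) {
        NodeList nodes = doc.getDocumentElement().getElementsByTagName("member");
        Map<Long, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element member = (Element) nodes.item(i);
            result.put(Long.parseLong(member.getAttribute("id")), member.getAttribute("host"));
        }
        return result;
    }

    // True if the server's own id appears among the listed members.
    static boolean listsItself(Document doc) {
        return members(doc).containsKey(serverId(doc));
    }
}
```

Comparing the members map produced on each of the three servers is one quick way to confirm that they all describe the same ensemble and differ only in serverId.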
I also performed the "stat" operation on one of the "No exit" nodes and got:

    stat /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000001583
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:129)
        at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:715)
        at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:579)
        at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:351)
        at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:309)
        at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:268)
    [...@t43 zookeeper-3.2.2]$ bin/zkCli.sh

Those message nodes are created as CreateMode.PERSISTENT_SEQUENTIAL and are deleted by the last server that has read them.

If I remove the troubled server's ZooKeeper log directory and restart the server, then everything is OK.

I will try to get the nc result next time I see this problem.

Dr Hao He

XPE - the truly SOA platform

h...@softtouchit.com
http://softtouchit.com
http://itunes.com/apps/Scanmobile

On 12/08/2010, at 12:32 AM, Mahadev Konar wrote:

> HI Dr Hao,
> Can you please post the configuration of all the 3 zookeeper servers? I
> suspect it might be misconfigured clusters and they might not belong to the
> same ensemble.
>
> Just to be clear:
> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002807
>
> And other such nodes exist on one of the zookeeper servers and the same node
> does not exist on other servers?
>
> Also, as ted pointed out, can you please post the output of echo "stat" | nc
> localhost 2181 (on all the 3 servers) to the list?
>
> Thanks
> mahadev
>
>
>
> On 8/11/10 12:10 AM, "Dr Hao He" <h...@softtouchit.com> wrote:
>
>> hi, Ted,
>>
>> Thanks for the reply.
>> Here is what I did:
>>
>> [zk: localhost:2181(CONNECTED) 0] ls
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>> []
>> [zk: localhost:2181(CONNECTED) 1] ls
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs
>> [msg0000002807, msg0000002700, msg0000002701, msg0000002804, msg0000002704,
>> msg0000002706, msg0000002601, msg0000001849, msg0000001847, msg0000002508,
>> msg0000002609, msg0000001841, msg0000002607, msg0000002606, msg0000002604,
>> msg0000002809, msg0000002817, msg0000001633, msg0000002812, msg0000002814,
>> msg0000002711, msg0000002815, msg0000002713, msg0000002716, msg0000001772,
>> msg0000002811, msg0000001635, msg0000001774, msg0000002515, msg0000002610,
>> msg0000001838, msg0000002517, msg0000002612, msg0000002519, msg0000001973,
>> msg0000001835, msg0000001974, msg0000002619, msg0000001831, msg0000002510,
>> msg0000002512, msg0000002615, msg0000002614, msg0000002617, msg0000002104,
>> msg0000002106, msg0000001769, msg0000001768, msg0000002828, msg0000002822,
>> msg0000001760, msg0000002820, msg0000001963, msg0000001961, msg0000002110,
>> msg0000002118, msg0000002900, msg0000002836, msg0000001757, msg0000002907,
>> msg0000001753, msg0000001752, msg0000001755, msg0000001952, msg0000001958,
>> msg0000001852, msg0000001956, msg0000001854, msg0000002749, msg0000001608,
>> msg0000001609, msg0000002747, msg0000002882, msg0000001743, msg0000002888,
>> msg0000001605, msg0000002885, msg0000001487, msg0000001746, msg0000002330,
>> msg0000001749, msg0000001488, msg0000001489, msg0000001881, msg0000001491,
>> msg0000002890, msg0000001889, msg0000002758, msg0000002241, msg0000002892,
>> msg0000002852, msg0000002759, msg0000002898, msg0000002850, msg0000001733,
>> msg0000002751, msg0000001739, msg0000002753, msg0000002756, msg0000002332,
>> msg0000001872, msg0000002233, msg0000001721, msg0000001627, msg0000001720,
>> msg0000001625, msg0000001628, msg0000001629, msg0000001729, msg0000002350,
>> msg0000001727, msg0000002352, msg0000001622, msg0000001726, msg0000001623,
>> msg0000001723, msg0000001724, msg0000001621, msg0000002736, msg0000002738,
>> msg0000002363, msg0000001717, msg0000002878, msg0000002362, msg0000002361,
>> msg0000001611, msg0000001894, msg0000002357, msg0000002218, msg0000002358,
>> msg0000002355, msg0000001895, msg0000002356, msg0000001898, msg0000002354,
>> msg0000001996, msg0000001990, msg0000002093, msg0000002880, msg0000002576,
>> msg0000002579, msg0000002267, msg0000002266, msg0000002366, msg0000001901,
>> msg0000002365, msg0000001903, msg0000001799, msg0000001906, msg0000002368,
>> msg0000001597, msg0000002679, msg0000002166, msg0000001595, msg0000002481,
>> msg0000002482, msg0000002373, msg0000002374, msg0000002371, msg0000001599,
>> msg0000002773, msg0000002274, msg0000002275, msg0000002270, msg0000002583,
>> msg0000002271, msg0000002580, msg0000002067, msg0000002277, msg0000002278,
>> msg0000002376, msg0000002180, msg0000002467, msg0000002378, msg0000002182,
>> msg0000002377, msg0000002184, msg0000002379, msg0000002187, msg0000002186,
>> msg0000002665, msg0000002666, msg0000002381, msg0000002382, msg0000002661,
>> msg0000002662, msg0000002663, msg0000002385, msg0000002284, msg0000002766,
>> msg0000002282, msg0000002190, msg0000002599, msg0000002054, msg0000002596,
>> msg0000002453, msg0000002459, msg0000002457, msg0000002456, msg0000002191,
>> msg0000002652, msg0000002395, msg0000002650, msg0000002656, msg0000002655,
>> msg0000002189, msg0000002047, msg0000002658, msg0000002659, msg0000002796,
>> msg0000002250, msg0000002255, msg0000002589, msg0000002257, msg0000002061,
>> msg0000002064, msg0000002585, msg0000002258, msg0000002587, msg0000002444,
>> msg0000002446, msg0000002447, msg0000002450, msg0000002646, msg0000001501,
>> msg0000002591, msg0000002592, msg0000001503, msg0000001506, msg0000002260,
>> msg0000002594, msg0000002262, msg0000002263, msg0000002264, msg0000002590,
>> msg0000002132, msg0000002130, msg0000002530, msg0000002931, msg0000001559,
>> msg0000001808, msg0000002024, msg0000001553, msg0000002939, msg0000002937,
>> msg0000001556, msg0000002935, msg0000002933, msg0000002140, msg0000001937,
>> msg0000002143, msg0000002520, msg0000002522, msg0000002429, msg0000002524,
>> msg0000002920, msg0000002035, msg0000001561, msg0000002134, msg0000002138,
>> msg0000002925, msg0000002151, msg0000002287, msg0000002555, msg0000002010,
>> msg0000002002, msg0000002290, msg0000001537, msg0000002005, msg0000002147,
>> msg0000002145, msg0000002698, msg0000001592, msg0000001810, msg0000002690,
>> msg0000002691, msg0000001911, msg0000001910, msg0000002693, msg0000001812,
>> msg0000001817, msg0000001547, msg0000002012, msg0000002015, msg0000002941,
>> msg0000001688, msg0000002018, msg0000002684, msg0000002944, msg0000001540,
>> msg0000002686, msg0000001541, msg0000002946, msg0000002688, msg0000001584,
>> msg0000002948]
>>
>> [zk: localhost:2181(CONNECTED) 7] delete
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>> Node does not exist:
>> /xpe/queues/3bd7851e79381ef4bfd1a5857b5e34c04e5159e5/msgs/msg0000002948
>>
>> When I performed the same operations on another node, none of those nodes
>> existed.
>>
>>
>> Dr Hao He
>>
>> XPE - the truly SOA platform
>>
>> h...@softtouchit.com
>> http://softtouchit.com
>> http://itunes.com/apps/Scanmobile
>>
>> On 11/08/2010, at 4:38 PM, Ted Dunning wrote:
>>
>>> Can you provide some more information? The output of some of the four
>>> letter commands and a transcript of what you are doing would be very
>>> helpful.
>>>
>>> Also, there is no way for znodes to exist on one node of a properly
>>> operating ZK cluster and not on either of the other two. Something has to
>>> be wrong and I would vote for operator error (not to cast aspersions, it is
>>> just that humans like you and *me* make more errors than ZK does).
>>>
>>> On Tue, Aug 10, 2010 at 11:32 PM, Dr Hao He <h...@softtouchit.com> wrote:
>>>
>>>> hi, All,
>>>>
>>>> I have a 3-host cluster running ZooKeeper 3.2.2. On one of the hosts,
>>>> there are a number of nodes that I can "get" and "ls" using zkCli.sh .
>>>> However, when I tried to "delete" any of them, I got "Node does not exist"
>>>> error. Those nodes do not exist on the other two hosts.
>>>>
>>>> Any idea how we should handle this type of errors and what might have
>>>> caused this problem?
>>>>
>>>> Dr Hao He
>>>>
>>>> XPE - the truly SOA platform
>>>>
>>>> h...@softtouchit.com
>>>> http://softtouchit.com
>>>> http://itunes.com/apps/Scanmobile
>>>>
>>>>
>>
>>
>
>
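
The check Mahadev asks for above (echo "stat" | nc localhost 2181 on each server) can also be scripted from Java. The sketch below is a minimal client that writes a four-letter command to the client port and reads the reply until the server closes the connection. FourLetterWordSketch and send are hypothetical names, not ZooKeeper API, and the timeout parameter is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not ZooKeeper API): sends a four-letter command such as
// "stat" to a server's client port, much like `echo stat | nc host port`.
final class FourLetterWordSketch {
    private FourLetterWordSketch() {}

    static String send(String host, int port, String word, int timeoutMillis) {
        if (word == null || word.length() != 4) {
            throw new IllegalArgumentException("command must be exactly four letters");
        }
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);

            // Write the command as plain ASCII, with no length prefix or newline.
            OutputStream out = socket.getOutputStream();
            out.write(word.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until end of stream; the server is expected to close the
            // connection once it has written its reply.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                reply.write(buffer, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling send(host, 2181, "stat", 5000) once per address listed in the cluster configuration and comparing the replies side by side gives the same information as running nc on each machine by hand.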