Sure, I will reproduce it with debug enabled and create a JIRA. Thanks. On Thu, Oct 7, 2010 at 12:23 PM, Patrick Hunt <ph...@apache.org> wrote:
> Vishal, this sounds like a bug in ZK to me. Can you create a JIRA with this > description, your configuration files from all servers, and the log files > from all servers during the time of the incident? If you could run the > servers in DEBUG level logging during the time you reproduce the issue that > would probably help: > https://issues.apache.org/jira/browse/ZOOKEEPER > > Thanks! > > Patrick > > > On Wed, Oct 6, 2010 at 2:57 PM, Vishal K <vishalm...@gmail.com> wrote: > >> Hi Patrick, >> >> You are correct, the test restarts both ZooKeeper server and the client. >> The >> client opens a new connection after restarting. So we would expect that >> the >> ephmeral znode (/foo) to expire after the session timeout. However, the >> client with the new session creates the ephemeral znode (/foo) again after >> it reboots (it sets a watch for /foo and recreates /foo if it is deleted >> or >> doesn't exist). The client is not reusing the session ID. What I expect to >> see is that the older /foo should expire after which a new /foo should get >> created. Is my expectation correct? >> >> What confuses me is the following output of 3 successive getstat /foo >> requests on A (the zxid, time and owner fields). Notice that the older >> znode reappeared. >> At the same time when I do getstat at B and C, I see the newer /foo. >> >> log4j:WARN No appenders could be found for logger >> (org.apache.zookeeper.ZooKeeper). >> log4j:WARN Please initialize the log4j system properly. >> cZxid = 0x1000005ef >> ctime = Tue Oct 05 15:00:50 UTC 2010 >> mZxid = 0x1000005ef >> mtime = Tue Oct 05 15:00:50 UTC 2010 >> pZxid = 0x1000005ef >> cversion = 0 >> dataVersion = 0 >> aclVersion = 0 >> ephemeralOwner = 0x2b7ce57ce40000 >> dataLength = 54 >> numChildren = 0 >> >> log4j:WARN No appenders could be found for logger >> (org.apache.zookeeper.ZooKeeper). >> log4j:WARN Please initialize the log4j system properly. >> cZxid = 0x100000607 >> ctime = Tue Oct 05 15:01:07 UTC 2010 >> mZxid = 0x100000607 >> mtime = Tue Oct 05 15:01:07 UTC 2010 >> pZxid = 0x100000607 >> cversion = 0 >> dataVersion = 0 >> aclVersion = 0 >> ephemeralOwner = 0x2b7ce5bda40000 >> dataLength = 54 >> numChildren = 0 >> >> log4j:WARN No appenders could be found for logger >> (org.apache.zookeeper.ZooKeeper). >> log4j:WARN Please initialize the log4j system properly. >> cZxid = 0x1000005ef >> ctime = Tue Oct 05 15:00:50 UTC 2010 >> mZxid = 0x1000005ef >> mtime = Tue Oct 05 15:00:50 UTC 2010 >> pZxid = 0x1000005ef >> cversion = 0 >> dataVersion = 0 >> aclVersion = 0 >> ephemeralOwner = 0x2b7ce57ce40000 >> dataLength = 54 >> numChildren = 0 >> >> Thanks for your help. >> >> -Vishal >> >> On Wed, Oct 6, 2010 at 4:45 PM, Patrick Hunt <ph...@apache.org> wrote: >> >> > Vishal the attachment seems to be getting removed by the list daemon (I >> > don't have it), can you create a JIRA and attach? Also this is a good >> > question for the ppl on zookeeper-user. (ccing) >> > >> > You are aware that ephemeral znodes are tied to the session? And that >> > sessions only expire after the session timeout period? At which time any >> > znodes created during that session are then deleted. The fact that you >> are >> > "kill"ing your client process leads me to believe that you are not >> closing >> > the session cleanly (meaning that it will eventually expire after the >> > session timeout period), in which case the ephemeral znodes _should_ >> > reappear when A is restarted and successfully rejoins the cluster. (at >> > least >> > until the session timeout is exceeded) >> > >> > Patrick >> > >> > On Tue, Oct 5, 2010 at 11:04 AM, Vishal K <vishalm...@gmail.com> wrote: >> > >> > > Hi, >> > > >> > > I have a 3 node ZK cluster (A, B, C). On one of the the nodes (node >> A), I >> > > have a ZK client running that connects to the local server and creates >> an >> > > ephemeral znode to indicate clients on other nodes that it is online. >> > > >> > > I have test script that reboots the zookeeper server as well as client >> on >> > > A. The test does a getstat on the ephemeral znode created by the >> client >> > on >> > > A. I am seeing that the view of znodes on A is different from the >> other 2 >> > > nodes. I can tell this from the session ID that the client gets after >> > > reconnecting to the local ZK server. >> > > >> > > So the test is simple: >> > > - kill zookeeper server and client process >> > > - wait for a few seconds >> > > - do zkCli.sh stat ... > test.out >> > > >> > > What I am seeing is that the ephemeral znode with old zxid, time, and >> > > session ID is reappearing on node A. I have attached the output of 3 >> > > consecutive getstat requests of the test (see client_getstat.out). >> Notice >> > > that the third output is the same as the first one. That is, the old >> > > ephemeral znode reappeared at A. However, both B and C are showing the >> > > latest znode with correct time, zxid and session ID (output not >> > attached). >> > > >> > > After this point, all following getstat requests on A are showing the >> old >> > > znode. Whereas, B and C show the correct znode every time the client >> on A >> > > comes online. This is something very perplexing. Earlier I thought >> this >> > was >> > > a bug in my client implementation. But the test shows that the ZK >> server >> > on >> > > A after reboot is out of sync with rest of the servers. >> > > >> > > The stat command to each server shows that the servers are in sync as >> far >> > > as zxid's are concerned (see stat.out). So there is something wrong >> with >> > A's >> > > local database that is causing this problem. >> > > >> > > Has anyone seen this before? I will be doing more debugging in the >> next >> > few >> > > days. Comments/suggestions for further debugging are welcomed. >> > > >> > > -Vishal >> > > >> > > >> > > >> > >> > >