Sorry, my friend wrote a faulty log4j.properties that only records messages at the WARN level and above. Would it help if I correct log4j.properties and restart the ZooKeeper server on 10.81.12.144? Would the information about session 0x32524d5440e022a be recorded that way?
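
For reference, here is a minimal sketch of the conversion and log search Mahadev describes below: turn the decimal ephemeralOwner into hex, then look for that string in the server logs. The helper names `session_hex` and `lines_mentioning` are my own, not part of ZooKeeper, and I am assuming the logs print the session id in the `0x...` form he quotes.

```python
import sys


def session_hex(ephemeral_owner: int) -> str:
    """Return the session id in the 0x... hex form used when grepping logs."""
    return hex(ephemeral_owner)


def lines_mentioning(lines, ephemeral_owner: int):
    """Yield every line that contains the hex session id (case-insensitive)."""
    needle = session_hex(ephemeral_owner).lower()
    for line in lines:
        if needle in line.lower():
            yield line


if __name__ == "__main__":
    owner = 226627854640480810  # ephemeralOwner of diserver_tc0000000067
    print(session_hex(owner))  # prints 0x32524d5440e022a

    # Any paths given on the command line are scanned as log files
    # (hypothetical usage; pass whatever log files the servers wrote).
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for hit in lines_mentioning(log, owner):
                print(f"{path}: {hit}", end="")
```

Running it without arguments only prints the hex id; with log file paths it prints each matching line prefixed by its file name.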

On Wed, Dec 16, 2009 at 1:46 AM, Mahadev Konar <maha...@yahoo-inc.com> wrote:

> Hi Qian,
> This is quite weird. Are you sure the version is 3.2.1?
> If yes, please create a jira for this.
>
> Also, can you extract the server logs for the session
>
> > >> ephemeralOwner: 226627854640480810
>
> And post it on a jira? Ephemeral Owner is the session id. You can convert
> the above number to hex and look through the logs to see what happened to
> this session and post the logs on the jira. Looks like the session close for
> the session (226627854640480810) wasn't successful (a bug mostly). So we
> need to trace back on what happened on a close of this session and why it
> did not close.
>
> Grepping all the server logs for session id (0x32524d5440e022a, this is the
> hex of the above decimal number) might give us some insight into this.
>
>
> Thanks
> mahadev
>
> On 12/15/09 7:44 AM, "Benjamin Reed" <br...@yahoo-inc.com> wrote:
>
> > does se/diserver_tc/diserver_tc0000000067 appear on all three servers?
> >
> > ben
> >
> > Qian Ye wrote:
> >> Hi guys:
> >>
> >> I find a very strange scenario today, I'm not sure how it happen, I just
> >> found it like this. Maybe you can give me some information about it, my
> >> Zookeeper Server is version 3.2.1.
> >>
> >> My Zookeeper cluster contains three servers, with ip:
> >> 10.81.12.144,10.81.12.145,10.81.12.141. I wrote a client to create ephemeral
> >> node under znode: *se/diserver_tc*. The client runs on the server with ip
> >> 10.81.13.173. The client can create a ephemeral node on zookeeper server and
> >> write the host ip (10.81.13.173) in to the node as its data. There is only
> >> one client process can be running at a time, because the client will listen
> >> to a certain port.
> >>
> >> It is strange that I found there were two ephemeral node with the ip
> >> 10.81.13.173 under znode se/diserver_tc.
> >> *se/diserver_tc/diserver_tc0000000067*
> >> STAT:
> >> czxid: 124554079820
> >> mzxid: 124554079820
> >> ctime: 1260609598547
> >> mtime: 1260609598547
> >> version: 0
> >> cversion: 0
> >> aversion: 0
> >> ephemeralOwner: 226627854640480810
> >> dataLength: 92
> >> numChildren: 0
> >> pzxid: 124554079820
> >>
> >> *se/diserver_tc/diserver_tc0000000095*
> >> STAT:
> >> czxid: 128849019107
> >> mzxid: 128849019107
> >> ctime: 1260772197356
> >> mtime: 1260772197356
> >> version: 0
> >> cversion: 0
> >> aversion: 0
> >> ephemeralOwner: 154673159808876591
> >> dataLength: 92
> >> numChildren: 0
> >> pzxid: 128849019107
> >>
> >> There are TWO with different session id! And after I kill the client process
> >> on the server 10.81.13.173, the *se/diserver_tc/diserver_tc0000000095* node
> >> disappear, but the *se/diserver_tc/diserver_tc0000000067* stay the same.
> >> That means it is not my coding mistake to create the node twice. I checked
> >> several times and I'm sure that there is no another client instance running.
> >> And I use the 'stat' command to check the three zookeeper servers, and there
> >> is no client from 10.81.13.173,
> >>
> >> $ echo stat | nc 10.81.12.144 2181
> >> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> >> Clients:
> >> /10.81.13.173:35676[1](queued=0,recved=0,sent=0) *# it is caused by the nc process*
> >>
> >> Latency min/avg/max: 0/3/254
> >> Received: 11081
> >> Sent: 0
> >> Outstanding: 0
> >> Zxid: 0x1e000001f5
> >> Mode: follower
> >> *Node count: 32*
> >>
> >> $ echo stat | nc 10.81.12.141 2181
> >> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> >> Clients:
> >> /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
> >> /10.81.13.173:35677[1](queued=0,recved=0,sent=0) *# it is caused by the nc process*
> >>
> >> Latency min/avg/max: 0/0/37
> >> Received: 37128
> >> Sent: 0
> >> Outstanding: 0
> >> Zxid: 0x1e000001f5
> >> Mode: follower
> >> *Node count: 26*
> >>
> >> $ echo stat | nc 10.81.12.145 2181
> >> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> >> Clients:
> >> /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
> >> /10.81.13.173:35678[1](queued=0,recved=0,sent=0) *# it is caused by the nc process*
> >>
> >> Latency min/avg/max: 0/2/213
> >> Received: 26700
> >> Sent: 0
> >> Outstanding: 0
> >> Zxid: 0x1e000001f5
> >> Mode: leader
> >> *Node count: 26*
> >>
> >> The three 'stat' commands show different Node count! Just cannot understand
> >> how it happened, can anyone give me some explanation about it?
> >>

--
With Regards!
Ye, Qian
Made in Zhejiang University