Re: A very strange scenario, may due to some bug on the server side

Qian Ye Tue, 15 Dec 2009 18:32:32 -0800

Sorry, my friend wrote an incorrect log4j.properties that only records log
messages at the WARN level and above. Would it help if I correct the
log4j.properties and restart the ZooKeeper server on 10.81.12.144? Would
information about session 0x32524d5440e022a be recorded that way?
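
For reference, the decimal-to-hex conversion and the log search suggested in
the quoted reply below can be sketched in Python. This is only a sketch under
assumptions: the function names are made up for illustration, the logs are
assumed to be plain-text files, and the search is a simple case-insensitive
substring match on the hex session id.

```python
import re


def session_hex(ephemeral_owner: int) -> str:
    """Format a decimal ephemeralOwner value as a hex session id string."""
    return hex(ephemeral_owner)


def grep_session(log_paths, session_id: str):
    """Yield (path, line_number, line) for each log line mentioning session_id.

    The match is a case-insensitive substring search, so it also finds the id
    when the log prints the hex digits in upper case.
    """
    pattern = re.compile(re.escape(session_id), re.IGNORECASE)
    for path in log_paths:
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with open(path, errors="replace") as fh:
            for number, line in enumerate(fh, 1):
                if pattern.search(line):
                    yield str(path), number, line.rstrip("\n")


if __name__ == "__main__":
    # ephemeralOwner of se/diserver_tc/diserver_tc0000000067 from the thread
    print(session_hex(226627854640480810))  # prints 0x32524d5440e022a
```

To scan real logs, pass the log file paths of each of the three servers to
grep_session together with the hex id; where those files live depends on the
local log4j configuration.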


On Wed, Dec 16, 2009 at 1:46 AM, Mahadev Konar <maha...@yahoo-inc.com> wrote:

> Hi Qian,
>  This is quite weird. Are you sure the version is 3.2.1?
>   If yes, please create a jira for this.
>
>  Also, can you extract the server logs for the session
>
>
> >>         ephemeralOwner: 226627854640480810
>
> And post them on a jira? Ephemeral Owner is the session id. You can convert
> the above number to hex and look through the logs to see what happened to
> this session, then post the logs on the jira. It looks like the close of
> session 226627854640480810 wasn't successful (most likely a bug). So we
> need to trace back what happened on the close of this session and why it
> did not close.
>
> Grepping all the server logs for the session id (0x32524d5440e022a, which is
> the hex form of the above decimal number) might give us some insight into this.
>
>
> Thanks
> mahadev
>
> On 12/15/09 7:44 AM, "Benjamin Reed" <br...@yahoo-inc.com> wrote:
>
> > does  se/diserver_tc/diserver_tc0000000067 appear on all three servers?
> >
> > ben
> >
> > Qian Ye wrote:
> >> Hi guys:
> >>
> >> I found a very strange scenario today. I'm not sure how it happened; I just
> >> found it in this state. Maybe you can give me some information about it. My
> >> ZooKeeper server is version 3.2.1.
> >>
> >> My ZooKeeper cluster contains three servers, with IPs 10.81.12.144,
> >> 10.81.12.145, and 10.81.12.141. I wrote a client that creates an ephemeral
> >> node under the znode *se/diserver_tc*. The client runs on the server with IP
> >> 10.81.13.173. It creates an ephemeral node on the ZooKeeper server and
> >> writes the host IP (10.81.13.173) into the node as its data. Only one
> >> client process can be running at a time, because the client listens on a
> >> specific port.
> >>
> >> Strangely, I found two ephemeral nodes with the IP
> >> 10.81.13.173 under the znode se/diserver_tc.
> >> *se/diserver_tc/diserver_tc0000000067*
> >> STAT:
> >>         czxid: 124554079820
> >>         mzxid: 124554079820
> >>         ctime: 1260609598547
> >>         mtime: 1260609598547
> >>         version: 0
> >>         cversion: 0
> >>         aversion: 0
> >>         ephemeralOwner: 226627854640480810
> >>         dataLength: 92
> >>         numChildren: 0
> >>         pzxid: 124554079820
> >>
> >> *se/diserver_tc/diserver_tc0000000095*
> >> STAT:
> >>         czxid: 128849019107
> >>         mzxid: 128849019107
> >>         ctime: 1260772197356
> >>         mtime: 1260772197356
> >>         version: 0
> >>         cversion: 0
> >>         aversion: 0
> >>         ephemeralOwner: 154673159808876591
> >>         dataLength: 92
> >>         numChildren: 0
> >>         pzxid: 128849019107
> >>
> >> There are TWO nodes with different session ids! After I killed the client
> >> process on the server 10.81.13.173, the *se/diserver_tc/diserver_tc0000000095*
> >> node disappeared, but *se/diserver_tc/diserver_tc0000000067* stayed the same.
> >> That means the duplicate is not caused by a coding mistake that creates the
> >> node twice. I checked several times and I'm sure that no other client
> >> instance is running. I also used the 'stat' command to check the three
> >> ZooKeeper servers, and there is no client from 10.81.13.173:
> >>
> >> $ echo stat | nc 10.81.12.144 2181
> >> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> >> Clients:
> >>  /10.81.13.173:35676[1](queued=0,recved=0,sent=0) *# it is caused by the nc process*
> >>
> >> Latency min/avg/max: 0/3/254
> >> Received: 11081
> >> Sent: 0
> >> Outstanding: 0
> >> Zxid: 0x1e000001f5
> >> Mode: follower
> >> *Node count: 32*
> >>
> >> $ echo stat | nc 10.81.12.141 2181
> >> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> >> Clients:
> >>  /10.81.12.152:58110[1](queued=0,recved=10374,sent=0)
> >>  /10.81.13.173:35677[1](queued=0,recved=0,sent=0) *# it is caused by the nc process*
> >>
> >> Latency min/avg/max: 0/0/37
> >> Received: 37128
> >> Sent: 0
> >> Outstanding: 0
> >> Zxid: 0x1e000001f5
> >> Mode: follower
> >> *Node count: 26*
> >>
> >> $ echo stat | nc 10.81.12.145 2181
> >> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> >> Clients:
> >>  /10.81.12.153:19130[1](queued=0,recved=10624,sent=0)
> >>  /10.81.13.173:35678[1](queued=0,recved=0,sent=0) *# it is caused by the nc process*
> >>
> >> Latency min/avg/max: 0/2/213
> >> Received: 26700
> >> Sent: 0
> >> Outstanding: 0
> >> Zxid: 0x1e000001f5
> >> Mode: leader
> >> *Node count: 26*
> >>
> >> The three 'stat' commands show different node counts! I just cannot
> >> understand how this happened. Can anyone explain it?
> >>
> >>
> >>
> >
>
>
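
The per-server 'stat' comparison in the original message can also be scripted
instead of read by eye. The following is a rough Python sketch, not an official
client: the function names are hypothetical, it assumes the servers answer the
'stat' four-letter command on port 2181 as in the nc output quoted above, and
it simply splits each "Key: value" line of the reply.

```python
import socket


def query_stat(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'stat' command to one server and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stat")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(reply: str) -> dict:
    """Split 'Key: value' lines of a stat reply into a dict of strings.

    Indented lines (the per-client entries under 'Clients:') are skipped.
    """
    fields = {}
    for line in reply.splitlines():
        if line.startswith((" ", "/")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def node_counts(replies: dict) -> dict:
    """Map each host to its reported 'Node count' as an int, or None."""
    result = {}
    for host, reply in replies.items():
        value = parse_stat(reply).get("Node count", "")
        result[host] = int(value) if value.isdigit() else None
    return result
```

Calling node_counts({h: query_stat(h) for h in hosts}) with the three server
addresses puts the counts side by side; parse_stat also exposes "Mode" and
"Zxid" for the same comparison.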


-- 
With Regards!

Ye, Qian
Made in Zhejiang University
