I'm still working on it (going on in parallel with a bunch of other things). Will let you guys know what I figure out as soon as I get some results. I think you are on to something Patrick. That is some gold advice. Thanks guys.
-n On Wed, Jun 3, 2009 at 11:39 AM, Patrick Hunt <ph...@apache.org> wrote: > Nitay, any luck? Feel free to create a JIRA to track this. If you point to > the test code that's experiencing the problem we'll try and take a look. > > Patrick > > > Patrick Hunt wrote: > >> This log manifests if the client is running ahead of the server. >> >> say you have: >> 1) client connects to server A and sees some changes >> 2) client gets disconnected from A and attempts to connect to B >> 3) B can be running behind A by some number of changes (it will eventually >> catch up) >> 4) client will attempt to connect to another server that's at, or ahead of >> it's zxid until successful. >> >> why? this ensures that the client never sees old data, part of the >> guarantee you are provided when using zk. However since servers in a quorum >> can run behind (minority) then you might see this. >> >> It's unusual to see this so many times however. I see that you are running >> this as part of a junit test. Perhaps that has some impact? Are you shutting >> down servers, perhaps clearing the datadir and restarting them, w/o closing >> all of the clients? If your tests are not running in "fork mode" for junit >> (or multiple tests w/in a junit test class) then old clients can hang around >> _if not explicitly closed_ and try to re-connect to new servers that you are >> using for new tests - if the servers are starting fresh (zxid=1) then you >> can see this alot as the old (zombie) clients cannot connect to the new >> servers. Perhaps this is what you are seeing? >> >> Patrick >> >> Nitay wrote: >> >>> I see. That helps. However, even as warnings, these go on seemingly >>> endlessly. Why do they not get fixed by themselves? What are we doing >>> wrong >>> here? >>> >>> On Tue, Jun 2, 2009 at 2:24 PM, Mahadev Konar <maha...@yahoo-inc.com> >>> wrote: >>> >>> Hi Nitay, >>>> This is not an error but should be a warning. I have opened up a jira >>>> for >>>> it. >>>> >>>> http://issues.apache.org/jira/browse/ZOOKEEPER-428 >>>> >>>> >>>> The message just says that a client is connecting to a server that is >>>> behind >>>> that a server is was connected to earlier. The log should be warn and >>>> not >>>> error and should be fixed in the next release. >>>> >>>> mahadev >>>> >>>> On 6/2/09 2:12 PM, "Nitay" <nit...@gmail.com> wrote: >>>> >>>> Hey guys, >>>>> >>>>> We are getting a lot of messages like this in HBase: >>>>> >>>>> [junit] 2009-06-02 11:57:23,658 ERROR [NIOServerCxn.Factory:21810] >>>>> server.NIOServerCnxn(514): Client has seen zxid 0xe our last zxid is >>>>> 0xd >>>>> >>>>> For more context, the block it usually appears in is: >>>>> >>>>> [junit] 2009-06-02 13:27:54,083 INFO [main-SendThread] >>>>> zookeeper.ClientCnxn$SendThread(737): Priming connection to >>>>> java.nio.channels.SocketChannel[connected >>>>> local=/0:0:0:0:0:0:0:1%0:56511 >>>>> remote=localhost/0:0:0:0:0:0:0:1:21810] >>>>> [junit] 2009-06-02 13:27:54,084 INFO [main-SendThread] >>>>> zookeeper.ClientCnxn$SendThread(889): Server connection successful >>>>> [junit] 2009-06-02 13:27:54,093 INFO [NIOServerCxn.Factory:21810] >>>>> server.NIOServerCnxn(532): Connected to /0:0:0:0:0:0:0:1%0:56511 >>>>> lastZxid >>>>> >>>> 16 >>>> >>>>> [junit] 2009-06-02 13:27:54,094 ERROR [NIOServerCxn.Factory:21810] >>>>> server.NIOServerCnxn(543): Client has seen zxid 0x10 our last zxid is >>>>> 0x4 >>>>> [junit] 2009-06-02 13:27:54,094 WARN [NIOServerCxn.Factory:21810] >>>>> server.NIOServerCnxn(444): Exception causing close of session 0x0 due >>>>> to >>>>> java.io.IOException: Client has seen zxid 0x10 our last zxid is 0x4 >>>>> [junit] 2009-06-02 13:27:54,094 DEBUG >>>>> [NIOServerCxn.Facto777ry:21810] >>>>> server.NIOServerCnxn(447): IOException stack trace >>>>> [junit] java.io.IOException: Client has seen zxid 0x10 our last zxid >>>>> >>>> is >>>> >>>>> 0x4 >>>>> [junit] at >>>>> >>>>> org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.jav >>>> >>>> >>>>> a:544) >>>>> [junit] at >>>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:331) >>>>> [junit] at >>>>> >>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:176) >>>> >>>> >>>>> [junit] 2009-06-02 13:27:54,094 INFO [NIOServerCxn.Factory:21810] >>>>> server.NIOServerCnxn(777): closing session:0x0 NIOServerCnxn: >>>>> java.nio.channels.SocketChannel[connected >>>>> local=/0:0:0:0:0:0:0:1%0:21810 >>>>> remote=/0:0:0:0:0:0:0:1%0:56511] >>>>> [junit] 2009-06-02 13:27:54,097 WARN [main-SendThread] >>>>> zookeeper.ClientCnxn$SendThread(919): Exception closing session >>>>> 0x121a2a7c43a0002 to sun.nio.ch.selectionkeyi...@2c662b4e >>>>> [junit] java.io.IOException: Read error rc = -1 >>>>> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] >>>>> [junit] at >>>>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:653) >>>>> [junit] at >>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:897) >>>>> [junit] 2009-06-02 13:27:54,097 WARN [main-SendThread] >>>>> zookeeper.ClientCnxn$SendThread(953): Ignoring exception during >>>>> shutdown >>>>> input >>>>> [junit] java.net.SocketException: Socket is not connected >>>>> [junit] at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) >>>>> [junit] at >>>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) >>>>> [junit] at >>>>> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) >>>>> [junit] at >>>>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:951) >>>>> [junit] at >>>>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:922) >>>>> >>>>> >>>>> This happens in a seemingly endless loop. We are not quite sure what it >>>>> means. Can someone help shed some light on these messages? >>>>> >>>>> Thanks, >>>>> -n >>>>> >>>> >>>> >>>