Oh blah, of course it won't be b/w compatible: all the older clients would expire their sessions whenever their zxid was even a single zxid higher than the cluster zxid, which I doubt most people want.
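A minimal sketch of the compatibility problem, assuming a hypothetical server-side check. The function names and the `quorum_max_zxid` input are illustrative only, not ZooKeeper APIs. Expiring on any zxid mismatch also expires clients that merely reconnected to a lagging follower, which today just get disconnected and retry elsewhere. The refinement asked about in the next paragraph only expires a client that is ahead of anything the quorum could have handed out.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPT = "accept"  # serve the session on this server
    CLOSE = "close"    # today's behaviour: drop the socket, client retries elsewhere
    EXPIRE = "expire"  # tell the client its session is gone


def naive_check(client_zxid: int, server_zxid: int) -> Outcome:
    # Hypothetical rule: expire whenever the client has seen more than this server.
    return Outcome.EXPIRE if client_zxid > server_zxid else Outcome.ACCEPT


def quorum_aware_check(client_zxid: int, server_zxid: int, quorum_max_zxid: int) -> Outcome:
    # quorum_max_zxid is an assumed input: the highest zxid any server in the
    # ensemble could have handed out (e.g. as learned from the leader).
    if client_zxid > quorum_max_zxid:
        # In this model no server can have served this zxid, so state was rolled back.
        return Outcome.EXPIRE
    if client_zxid > server_zxid:
        # This server is only lagging; keep the close-and-retry behaviour.
        return Outcome.CLOSE
    return Outcome.ACCEPT


if __name__ == "__main__":
    # A follower at zxid 100 lags a leader at zxid 105.
    print(naive_check(101, 100))              # Outcome.EXPIRE: session lost needlessly
    print(quorum_aware_check(101, 100, 105))  # Outcome.CLOSE: client just reconnects
    print(quorum_aware_check(200, 100, 105))  # Outcome.EXPIRE: cluster was rolled back
```

The naive rule turns Ted's normal "out of date slave" case into a lost session, which is exactly the b/w compatibility break; the quorum-aware rule keeps that case unchanged.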
Is there a way, after the connection is made, to check whether the client's zxid is higher than the highest zxid the cluster could currently have, and send session_expired only then? That would at least get us most of the way there.

-----Original Message-----
From: Patrick Hunt [mailto:[email protected]]
Sent: Thursday, August 04, 2011 7:23 PM
To: [email protected]
Subject: Re: devops/admin/client question: What do you do when you rollback?

Sounds reasonable to me as long as it's b/w compatible (which it seems like it would be), anything we can do to improve this situation would be huge - I frequently see our support team trying to address this (e.g. the max count exceeded issue) with clients like hbase. Def plus for supportability.

Patrick

On Thu, Aug 4, 2011 at 4:11 PM, Camille Fournier <[email protected]> wrote:
> I'm thinking of hacking it through the connectresponse session timeout
> (similar to the way we detect session rejected). I wrote up a prototype that
> worked ok this way. Might could extend this hack to other things, using that
> field as an encoded error msg, thoughts?
>
> C
> On Aug 4, 2011 6:10 PM, "Patrick Hunt" <[email protected]> wrote:
>> Our error reporting server->client has always been weak. It's a PITA
>> to debug in production because a lot of times when the client gets
>> bounced it's not clear from the client side why (you end up having to
>> search the server log - for example when maxClientCount is exceeded).
>> It would be great to fix this, esp if the server could provide insight
>> to the client about why (an error code/message perhaps). Doing it in a
>> b/w compatible way might be tough though...
>>
>> Patrick
>>
>> On Thu, Aug 4, 2011 at 2:45 PM, Ted Dunning <[email protected]> wrote:
>>> This is used normally to guarantee in-order data views. If you get
>>> disconnected from one host in an advanced state and then connect to an out
>>> of date slave, ZK automatically disconnects you to avoid letting you see
>>> time go backwards. Your situation is different of course.
>>>
>>> On Thu, Aug 4, 2011 at 7:05 PM, Fournier, Camille F. <
>>> [email protected]> wrote:
>>>
>>>> Right now the server just detects that the zxid is wrong, and calls close
>>>> on the client. The client logs:
>>>> 15:01:47,593 - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1159] - Unable to read additional data from server sessionid 0x131962b00540000, likely server has closed socket, closing socket connection and attempting reconnect
>>>> (branch 3.3.3)
>>>>
>>>> I will poke around and see if I can figure out a nicer way to indicate this
>>>> condition. The expired state is perfectly fine for me in my use case.
>>>>
>>>> C
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Patrick Hunt [mailto:[email protected]]
>>>> Sent: Thursday, August 04, 2011 1:51 PM
>>>> To: [email protected]
>>>> Subject: Re: devops/admin/client question: What do you do when you
>>>> rollback?
>>>>
>>>> On Thu, Aug 4, 2011 at 10:29 AM, Fournier, Camille F.
>>>> <[email protected]> wrote:
>>>> > We had an issue here the other day where the ZK servers were running
>>>> > poorly, and in an effort to get them healthy again we ended up rolling back
>>>> > the cluster state. While this was, in retrospect, not the right solution to
>>>> > the problem we were facing, it brought up another problem. Namely, that many
>>>> > of our clients couldn't reconnect with their sessions because their zxid was
>>>> > too high (expected), but that the error they got when trying to do that
>>>> > reconnection was just a vanilla disconnected error. The result was that most
>>>> > of our clients had to be bounced.
>>>>
>>>> Hi Camille, there's a long standing jira on this:
>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-523
>>>>
>>>> > Aside from trying hard to avoid ever rolling back the cluster state, does
>>>> > anyone have a way they deal with this situation if it occurs? Should we
>>>> > consider enhancing the error message to the client so we could track the
>>>> > fact that we were ahead of the quorum zxid and react sensibly? Alternately,
>>>> > since we were sending a sessionId along with the zxid, perhaps it would be
>>>> > nice to check to see if the sessionId exists before checking the zxid, which
>>>> > would send an expired state signal which my client code could handle
>>>> > cleanly.
>>>>
>>>> It seems reasonable that if the client connects to all servers in the
>>>> ensemble (that it knows about) and sees that it's ahead of each one,
>>>> it should consider the session expired (we could add a new state, but
>>>> seems like just treating as expired with a good log message would be
>>>> better from b/w compat standpoint).
>>>>
>>>> I can't recall, does the client have sufficient information to make
>>>> this determination, or is the server just disconnecting?
>>>>
>>>> Patrick
>>>>
>>>
>
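A rough sketch of the two client-facing ideas in this thread, under stated assumptions: Camille's hack of reusing the connect response's session-timeout field as an encoded error, and Patrick's rule that a client ahead of every server it knows about should treat its session as expired. The error codes, all function names, and the premise that the client learns each server's zxid are hypothetical; as Patrick asks, the client may not have that information today. Only non-positive values carry errors, on the assumption that an older client already reads a non-positive timeout as a rejected (expired) session.

```python
from enum import IntEnum
from typing import Iterable, Optional, Tuple


class ConnectError(IntEnum):
    # Hypothetical codes carried in the connect response's timeout field.
    # Zero keeps its assumed existing meaning: session rejected/expired.
    SESSION_EXPIRED = 0
    ZXID_AHEAD_OF_QUORUM = -1
    MAX_CLIENT_CNXNS_EXCEEDED = -2


def encode_timeout(negotiated_timeout_ms: int, error: Optional[ConnectError] = None) -> int:
    """Server side: a positive value is a real timeout, anything else is an error."""
    if error is not None:
        return int(error)
    if negotiated_timeout_ms <= 0:
        raise ValueError("a successful connect needs a positive timeout")
    return negotiated_timeout_ms


def decode_timeout(field: int) -> Tuple[Optional[int], Optional[ConnectError]]:
    """Client side: returns (timeout_ms, None) on success or (None, error)."""
    if field > 0:
        return field, None
    try:
        return None, ConnectError(field)
    except ValueError:
        # Unknown non-positive code: degrade to plain expiry, as an old client would.
        return None, ConnectError.SESSION_EXPIRED


def session_should_expire(client_zxid: int, server_zxids: Iterable[int]) -> bool:
    """Treat the session as expired only if the client is ahead of every known server."""
    zxids = list(server_zxids)
    return bool(zxids) and all(client_zxid > z for z in zxids)


if __name__ == "__main__":
    print(decode_timeout(encode_timeout(30000)))  # (30000, None)
    print(decode_timeout(encode_timeout(0, ConnectError.ZXID_AHEAD_OF_QUORUM)))
    print(session_should_expire(0x500, [0x4F0, 0x4FA, 0x4FF]))  # True
    print(session_should_expire(0x500, [0x4F0, 0x510]))         # False
```

Mapping unknown codes to plain expiry keeps newer servers safe to pair with clients that only understand "positive means OK, anything else means expired", which is the b/w compat property the thread is worried about.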
