Hi Camille,

Can you share what kind of problems you were facing on the servers that forced you to roll back the cluster?
Thanks.
-Vishal

On Thu, Aug 4, 2011 at 1:29 PM, Fournier, Camille F. <[email protected]> wrote:

> We had an issue here the other day where the ZK servers were running
> poorly, and in an effort to get them healthy again we ended up rolling back
> the cluster state. While this was, in retrospect, not the right solution to
> the problem we were facing, it brought up another problem. Namely, that many
> of our clients couldn't reconnect with their sessions because their zxid was
> too high (expected), but that the error they got when trying to do that
> reconnection was just a vanilla disconnected error. The result was that most
> of our clients had to be bounced.
>
> Aside from trying hard to avoid ever rolling back the cluster state, does
> anyone have a way they deal with this situation if it occurs? Should we
> consider enhancing the error message to the client so we could track the
> fact that we were ahead of the quorum zxid and react sensibly? Alternately,
> since we were sending a sessionId along with the zxid, perhaps it would be
> nice to check to see if the sessionId exists before checking the zxid, which
> would send an expired state signal which my client code could handle
> cleanly.
>
> Any ideas or suggestions would be welcome.
>
> C
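
For reference, the check ordering suggested in the quoted message (look up the sessionId first, then compare zxids) could be sketched roughly as below. This is only an illustrative model, not ZooKeeper server code: the class `ReconnectSketch`, the `Result` enum, and the `check` method are hypothetical names, and the real server logic may be structured quite differently.

```java
import java.util.Set;

// Hypothetical model only: none of these names come from the ZooKeeper code base.
final class ReconnectSketch {

    // Possible outcomes of a reconnect attempt in this simplified model.
    enum Result { OK, SESSION_EXPIRED, REFUSED }

    // Proposed ordering: check whether the session is known before comparing zxids.
    static Result check(long sessionId, long clientZxid,
                        Set<Long> liveSessions, long serverZxid) {
        if (!liveSessions.contains(sessionId)) {
            // Session unknown to the server (for example, it was created after
            // the point the cluster was rolled back to): report expiry, which
            // client code can handle cleanly.
            return Result.SESSION_EXPIRED;
        }
        if (clientZxid > serverZxid) {
            // The client has seen a newer state than this server holds: refuse.
            // In this model the client only sees a generic disconnect.
            return Result.REFUSED;
        }
        return Result.OK;
    }
}
```

With this ordering, a client whose session no longer exists after a rollback would get an expired signal instead of a generic disconnect, while a client with a known session and a too-high zxid would still be refused.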
