We did have a case where a user set up 3 servers, each running standalone. :-) That doesn't look like the problem here, though, given that you specify only 1 server in the connect string (although, as mahadev mentioned, you don't need to worry about that aspect).

After it goes 7->11->9, does it ever go back to 11, or does it stay at 9?

It would be good to capture the server log files (all 3) the next time this happens. Please provide those as well; they would be critical for tracking this down, particularly since not many users are running cross-colo clusters.

If you can provide the config files too, that would be useful.

Which Java version and OS are you using?

This might be a good time to create a JIRA and attach all of this to it, so that you don't have to repeat yourself. :-)

Patrick

On 04/12/2010 02:26 PM, Kevin Webb wrote:
On Mon, 12 Apr 2010 09:27:46 -0700
Mahadev Konar<maha...@yahoo-inc.com>  wrote:

Hi Kevin,

  The cversion should be monotonically increasing for the znode.
It would be a bug if it's not. Can you please elaborate on the cases
in which you are seeing the cversion decrease? If you can reproduce it
with an example, that would be great.

Thanks
mahadev

Thanks Mahadev and Patrick!

Here are some more details:

I'm using the C client and running three servers on PlanetLab, with
each server on a different continent.  Most of the time, the cversion
is increasing as expected.  I'm never deleting the group node, so
that's not the issue.

Of course, now that I've emailed this list, I haven't seen it happen
again...

I do have one old log file though:

ZK(10): 1270514949 (Re)Connected to zookeeper server.
ZK(10): 1270514952 Beginning new view #7.  Unsetting panic...
GOSSIP(10): 1270514952 Changing view to 7
ZK(10): 1270515798 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270515803 (Re)Connected to zookeeper server.
ZK(10): 1270515806 Beginning new view #7.  Unsetting panic...
GOSSIP(10): 1270515806 Ignoring delivery request for view 7, current view is 7.
ZK(10): 1270516812 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270516823 (Re)Connected to zookeeper server.
ZK(10): 1270516826 Beginning new view #11.  Unsetting panic...
GOSSIP(10): 1270516826 Changing view to 11
ZK(10): 1270519191 Disconnected from zookeeper.  Setting panic...
ZK(10): 1270519195 (Re)Connected to zookeeper server.
ZK(10): 1270519198 Beginning new view #9.  Unsetting panic...
GOSSIP(10): 1270519198 Ignoring delivery request for view 9, current view is 11.

The large integral number is a Unix seconds-since-epoch timestamp (the
result of calling time(NULL)).

In this case, the client connected, got group #7, disconnected,
reconnected, got #7 again, disconnected, reconnected, got #11,
disconnected, reconnected, and then got #9.
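
A check for this kind of regression could be sketched in C roughly as
follows. This is only an illustration under stated assumptions:
cversion_tracker, cv_init, cv_observe and cv_first_regression are
hypothetical names, not part of the ZooKeeper C client, and the values
fed in are assumed to be the group node's cversion read as a 32-bit
signed integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper types; nothing here comes from the ZooKeeper C API. */
typedef struct {
    int32_t last_seen;  /* highest cversion observed so far */
    int     have_value; /* 0 until the first observation arrives */
} cversion_tracker;

typedef enum { CV_ADVANCED, CV_UNCHANGED, CV_REGRESSED } cv_result;

/* Reset the tracker so that the next observation is accepted as-is. */
static void cv_init(cversion_tracker *t)
{
    t->last_seen = 0;
    t->have_value = 0;
}

/*
 * Record one observed cversion.
 * The tracker only moves forward: a smaller value is reported as
 * CV_REGRESSED and last_seen is left untouched.
 */
static cv_result cv_observe(cversion_tracker *t, int32_t cversion)
{
    if (!t->have_value || cversion > t->last_seen) {
        t->last_seen = cversion;
        t->have_value = 1;
        return CV_ADVANCED;
    }
    if (cversion == t->last_seen)
        return CV_UNCHANGED;
    return CV_REGRESSED;
}

/*
 * Return the index of the first value in seq that is lower than an
 * earlier value, or -1 if the sequence never decreases.
 */
static int cv_first_regression(const int32_t *seq, size_t n)
{
    cversion_tracker t;
    size_t i;

    cv_init(&t);
    for (i = 0; i < n; i++) {
        if (cv_observe(&t, seq[i]) == CV_REGRESSED)
            return (int)i;
    }
    return -1;
}
```

Fed the sequence from the log above (7, 7, 11, 9), cv_first_regression
reports index 3, and a tracker that saw the same values keeps 11 as its
highest value, which matches the GOSSIP layer ignoring view 9.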

The host string that I pass to zookeeper_init contains only one
address:port, so it's not an issue of re-connecting to a different
server and getting old/stale information.


If/when it does happen again, I'll be sure to also save the zookeeper
server logs.

-Kevin
