Why do you get the version in the first place without getting the contents?
If you don't have the contents, what is the point of enforcing a version?

On Mon, Oct 10, 2011 at 8:26 AM, Ishaaq Chandy <ish...@gmail.com> wrote:
> Thanks Mahadev,
> Yup, I am aware of the fact that 2 is a particularly bad number for cluster
> size and hopefully we should fix that soon, I was just hoping that for some
> reason that was why the problem is occurring - my conjecture was, e.g.,
> if the two zk servers disagree about the version there is no way to decide
> who is correct without a third tie-breaker server.
>
> But, if you say that is not the case, then I need to keep looking (sigh).
>
> I am pretty sure that only one thread is touching that znode. We put in some
> trace logging to try and pinpoint the problem and noticed that every time we
> get the BadVersionException the actual version on the znode is one more than
> what we expected it to be based on the previous "exists()" call.
>
> As I said, this code gets called once every 2 seconds (or thereabouts). It
> seems to fail with a BadVersionException about 3 times an hour (on average).
>
> By the way, not sure if it is relevant, but the reason we are using 2 nodes
> in the cluster and the reason why their version is 3.2.2 is because they are
> the ZKs that come embedded inside HBase (we're running 2 HBase
> regionservers) - I've been meaning to pull them out and run them standalone
> but just haven't got around to it (yet).
>
> Ishaaq
>
> On 10 October 2011 17:35, Mahadev Konar <maha...@hortonworks.com> wrote:
>
> > Ishaaq,
> > 2 ZK servers is definitely not the right number for running a ZK
> > service but it's no reason to get a BadVersionException because of
> > that. For more information on the size of the ZK ensemble take a look
> > at:
> >
> > http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
> >
> > As for the version on the znode, can you try reading the version when
> > you get a setData/BadVersionException?
> >
> > Also, is there any chance of a delete on the znode that removes it and
> > another create happens for the same path?
> >
> > I don't think we have seen this version issue in the releases, so I'd
> > be inclined to say that there could be something in the code that's
> > making some changes to the znode before you set the data.
> >
> > Hope that helps
> > thanks
> > mahadev
> >
> > On Fri, Oct 7, 2011 at 6:47 PM, Ishaaq Chandy <ish...@gmail.com> wrote:
> > > Hi all,
> > >
> > > We're seeing a puzzling error. Here's the scenario:
> > >
> > > 1. We have a single thread that wakes up every two seconds (give or take)
> > > and does some work
> > > 2. As part of that work it updates a node on ZK. When it does this it first
> > > gets the Stat of the existing node and uses the version retrieved from it to
> > > update the value.
> > > 3. There are no other processes updating the node
> > >
> > > The code goes something like this:
> > >   final Stat stat = zooKeeper.exists(path, false);
> > >   // do some other work here to create the path if it does not exist -
> > >   // this code only ever gets called once
> > >   zooKeeper.setData(path, value, stat.getVersion());
> > >
> > > What we're seeing is that every so often (once every 5 minutes or so?)
> > > that setData() call fails with a BadVersionException. This is very
> > > unexpected because, as I mentioned previously, this thread is the sole
> > > updater of that node.
> > >
> > > One possibility I am considering is that we are using the wrong number of
> > > ZKs in our cluster - i.e. 2 nodes. I am wondering if 2 is the worst number of
> > > nodes possible for ZK as there is no way to resolve a disagreement.
> > >
> > > Another possibility is that we are using an old version of ZK (3.2.2),
> > > perhaps there is a known bug with it? Though I see nothing related to this
> > > in the release logs for subsequent versions.
> > >
> > > Thoughts/suggestions?
> > >
> > > Thanks,
> > > Ishaaq
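
To make the version check discussed in this thread concrete, here is a minimal sketch of a read followed by a conditional write. It is an in-memory stand-in, not the ZooKeeper client: `VersionedNodeSketch`, `Node`, `Snapshot`, `BadVersion` and `updateWithRetry` are all hypothetical names made up for illustration, and `BadVersion` is unchecked here only to keep the sketch short (the real `KeeperException.BadVersionException` is a checked exception). It shows only the semantics: every successful write bumps the version by one, and a write carrying a stale version is rejected, so a version that is "one more than expected" is what an intervening write would produce. It says nothing about what causes the extra write in the setup described in the thread.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical in-memory model of one versioned node; NOT the ZooKeeper API. */
final class VersionedNodeSketch {

    /** Stand-in for KeeperException.BadVersionException (unchecked here for brevity). */
    static final class BadVersion extends RuntimeException {
        BadVersion(int expected, int actual) {
            super("expected version " + expected + " but node is at " + actual);
        }
    }

    /** Contents and version captured together in one read. */
    static final class Snapshot {
        final byte[] data;
        final int version;
        Snapshot(byte[] data, int version) { this.data = data; this.version = version; }
    }

    static final class Node {
        private byte[] data = new byte[0];
        private int version = 0;

        synchronized Snapshot read() {
            return new Snapshot(data.clone(), version);
        }

        /** Conditional write: applies only if expectedVersion is current; returns the new version. */
        synchronized int setData(byte[] newData, int expectedVersion) {
            if (expectedVersion != version) {
                throw new BadVersion(expectedVersion, version);
            }
            data = newData.clone();
            return ++version;
        }
    }

    /** Re-reads the version and retries when the write is rejected as stale. */
    static int updateWithRetry(Node node, byte[] value, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        BadVersion last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Snapshot snap = node.read();
            try {
                return node.setData(value, snap.version);
            } catch (BadVersion e) {
                last = e; // another write landed between read and setData; read again
            }
        }
        throw last;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Node node = new Node();
        Snapshot stale = node.read();                        // plays the role of exists()
        node.setData(bytes("other writer"), stale.version);  // intervening write bumps the version
        try {
            node.setData(bytes("mine"), stale.version);      // stale version is rejected
        } catch (BadVersion e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("new version: " + updateWithRetry(node, bytes("mine"), 3));
    }
}
```

In the real client, `getData(path, watch, stat)` fills in a `Stat` alongside the contents, and `setData` returns the updated `Stat`, so a sole writer could carry the returned version forward to its next write rather than calling `exists()` each time. Whether that suits the code in this thread is an assumption, since only a fragment of it is shown.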