Sounds like you may want to look into the multi operation if you have many inputs being processed to form a single output.
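To make the suggestion concrete, here is a minimal sketch of what an all-or-nothing batch of version-checked writes buys you when several inputs feed one output. It is an in-memory model only: `TinyStore`, `Update`, `applyAll` and the paths are hypothetical names made up for illustration, not the ZooKeeper client API. As far as I know the real multi call only arrived in the 3.4 line, so it would not be available on the 3.2.2 servers mentioned below.

```java
// Hypothetical in-memory model of an all-or-nothing batch of version-checked
// writes. It only illustrates the idea; it is NOT the ZooKeeper multi API.
class Update {
    final String path;
    final String value;
    final int expectedVersion;

    Update(String path, String value, int expectedVersion) {
        this.path = path;
        this.value = value;
        this.expectedVersion = expectedVersion;
    }
}

class TinyStore {
    private final java.util.Map<String, String> values = new java.util.HashMap<>();
    private final java.util.Map<String, Integer> versions = new java.util.HashMap<>();

    // Creates (or resets) a node at version 0.
    synchronized void create(String path, String value) {
        values.put(path, value);
        versions.put(path, 0);
    }

    // Both accessors assume the path has already been created.
    synchronized int version(String path) { return versions.get(path); }
    synchronized String value(String path) { return values.get(path); }

    // Applies every update or none: all version checks run before any write.
    // Assumes each path appears at most once per batch.
    synchronized boolean applyAll(Update... updates) {
        for (Update u : updates) {
            Integer current = versions.get(u.path);
            if (current == null || current != u.expectedVersion) {
                return false; // one stale or missing node rejects the whole batch
            }
        }
        for (Update u : updates) {
            values.put(u.path, u.value);
            versions.put(u.path, u.expectedVersion + 1);
        }
        return true;
    }
}

class BatchDemo {
    public static void main(String[] args) {
        TinyStore store = new TinyStore();
        store.create("/in/a", "1");
        store.create("/out", "none");

        // Consume the input and publish the output in one step.
        boolean first = store.applyAll(
                new Update("/in/a", "consumed", 0),
                new Update("/out", "sum=1", 0));
        System.out.println(first + " " + store.value("/out"));

        // A batch carrying a stale version for /in/a leaves /out untouched.
        boolean second = store.applyAll(
                new Update("/in/a", "x", 0),
                new Update("/out", "y", 1));
        System.out.println(second + " " + store.value("/out"));
    }
}
```

The point of the design is that no reader ever sees the input consumed without the output updated, or the other way round.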
On Tue, Oct 11, 2011 at 5:55 AM, Ishaaq Chandy <ish...@gmail.com> wrote:

> Ok, false alarm - the problem was a mis-configuration in our code that was
> causing multiple processes to update that znode whereas only one should
> have.
>
> Apologies for wasting your time.
>
> Ishaaq
>
> On 11 October 2011 13:09, Ishaaq Chandy <ish...@gmail.com> wrote:
>
> > Technically we don't need the contents as we're going to overwrite it
> > anyway, we're just asserting the fact that we're the only one writing to
> > that node.
> >
> > Was just checking if it is a known issue - clearly not, so I'll continue
> > investigating our code.
> >
> > Thanks,
> > Ishaaq
> >
> > On 11 October 2011 12:21, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> >> Why do you get the version in the first place without getting the
> >> contents?
> >>
> >> If you don't have the contents, what is the point of enforcing a
> >> version?
> >>
> >> On Mon, Oct 10, 2011 at 8:26 AM, Ishaaq Chandy <ish...@gmail.com> wrote:
> >>
> >> > Thanks Mahadev,
> >> > Yup, I am aware of the fact that 2 is a particularly bad number for
> >> > cluster size and hopefully we should fix that soon, I was just hoping
> >> > that for some reason that was why the problem is occurring - my
> >> > conjecture was, e.g., that if the two ZK servers disagree about the
> >> > version there is no way to decide who is correct without a third
> >> > tie-breaker server.
> >> >
> >> > But, if you say that is not the case, then I need to keep looking
> >> > (sigh).
> >> >
> >> > I am pretty sure that only one thread is touching that znode. We put
> >> > in some trace logging to try and pinpoint the problem and noticed
> >> > that every time we get the BadVersionException the actual version on
> >> > the znode is one more than what we expected it to be based on the
> >> > previous "exists()" call.
> >> >
> >> > As I said, this code gets called once every 2 seconds (or
> >> > thereabouts). It seems to fail with a BadVersionException about 3
> >> > times an hour (on average).
> >> >
> >> > By the way, not sure if it is relevant, but the reason we are using 2
> >> > nodes in the cluster and the reason why their version is 3.2.2 is
> >> > because they are the ZKs that come embedded inside HBase (we're
> >> > running 2 HBase regionservers) - I've been meaning to pull them out
> >> > and run them standalone but just haven't got around to it (yet).
> >> >
> >> > Ishaaq
> >> >
> >> > On 10 October 2011 17:35, Mahadev Konar <maha...@hortonworks.com> wrote:
> >> >
> >> > > Ishaaq,
> >> > > 2 ZK servers is definitely not the right number for running a ZK
> >> > > service but it's no reason to get a BadVersionException because of
> >> > > that. For more information on the size of the ZK ensemble take a
> >> > > look at:
> >> > >
> >> > > http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
> >> > >
> >> > > As for the version on the znode, can you try reading the version
> >> > > when you get a setData BadVersionException?
> >> > >
> >> > > Also, is there any chance of a delete on the znode that removes it
> >> > > and another create happens for the same path?
> >> > >
> >> > > I don't think we have seen this version issue in the releases, so
> >> > > I'd be inclined to say that there could be something in the code
> >> > > that's making some changes to the znode before you set the data.
> >> > >
> >> > > Hope that helps
> >> > > thanks
> >> > > mahadev
> >> > >
> >> > > On Fri, Oct 7, 2011 at 6:47 PM, Ishaaq Chandy <ish...@gmail.com> wrote:
> >> > > > Hi all,
> >> > > >
> >> > > > We're seeing a puzzling error. Here's the scenario:
> >> > > >
> >> > > > 1. We have a single thread that wakes up every two seconds (give
> >> > > >    or take) and does some work
> >> > > > 2. As part of that work it updates a node on ZK. When it does
> >> > > >    this it first gets the Stat of the existing node and uses the
> >> > > >    version retrieved from it to update the value.
> >> > > > 3. There are no other processes updating the node
> >> > > >
> >> > > > The code goes something like this:
> >> > > > final Stat stat = zooKeeper.exists(path, false);
> >> > > > // do some other work here to create the path if it does not
> >> > > > // exist - this code only ever gets called once
> >> > > > zooKeeper.setData(path, value, stat.getVersion());
> >> > > >
> >> > > > What we're seeing is that every so often (once every 5 minutes or
> >> > > > so?) that setData() call fails with a BadVersionException. This
> >> > > > is very unexpected because, as I mentioned previously, this
> >> > > > thread is the sole updater of that node.
> >> > > >
> >> > > > One possibility I am considering is that we are using the wrong
> >> > > > number of ZKs in our cluster - i.e. 2 nodes. I am wondering if 2
> >> > > > is the worst number of nodes possible for ZK as there is no way
> >> > > > to resolve a disagreement.
> >> > > >
> >> > > > Another possibility is that we are using an old version of ZK
> >> > > > (3.2.2), perhaps there is a known bug with it? Though I see
> >> > > > nothing related to this in the release logs for subsequent
> >> > > > versions.
> >> > > >
> >> > > > Thoughts/suggestions?
> >> > > >
> >> > > > Thanks,
> >> > > > Ishaaq
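For anyone landing on this thread later: the symptom reported above (the actual version being exactly one more than the version read by the earlier exists() call) is what a second writer would produce, which matches the mis-configuration that was eventually found. The sketch below reproduces it with an in-memory stand-in; `VersionedNode` and its `BadVersionException` are hypothetical classes written for this example and are not the ZooKeeper client, they only mimic the read-version-then-setData pattern from the original post.

```java
// In-memory stand-in for a znode's version check; NOT the ZooKeeper client API.
class BadVersionException extends Exception {
    BadVersionException(int expected, int actual) {
        super("expected version " + expected + " but node is at " + actual);
    }
}

class VersionedNode {
    private byte[] data = new byte[0];
    private int version = 0;

    // Plays the role of exists(): returns the current version, not the contents.
    synchronized int readVersion() {
        return version;
    }

    // Plays the role of setData(path, value, version): succeeds only when the
    // caller's expected version matches, then bumps the version by one.
    synchronized int setData(byte[] value, int expectedVersion) throws BadVersionException {
        if (expectedVersion != version) {
            throw new BadVersionException(expectedVersion, version);
        }
        data = value.clone();
        version++;
        return version;
    }

    synchronized byte[] getData() {
        return data.clone();
    }
}

class VersionRaceDemo {
    public static void main(String[] args) {
        VersionedNode node = new VersionedNode();
        int seenByA = node.readVersion(); // writer A reads version 0
        try {
            // Writer B (the unexpected second process) updates in between.
            node.setData("from B".getBytes(), node.readVersion());
            // Writer A now writes with a version that is one behind.
            node.setData("from A".getBytes(), seenByA);
        } catch (BadVersionException e) {
            System.out.println("A failed: " + e.getMessage());
        }
    }
}
```

If the writer really is meant to be the only one, a failure like this is a useful alarm, not something to retry blindly: it says another writer exists.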