Sorry, forgot to mention. Version: 3.4.6.

Thanks.

2014-11-13 18:11 GMT+01:00 German Blanco <[email protected]>:

> Hello,
>
> Which version of ZooKeeper are you using?
>
> On Thu, Nov 13, 2014 at 5:25 PM, Kuba Lekstan <[email protected]> wrote:
>
> > Hello,
> >
> > Some details:
> > We have a 5-node cluster, which we use for configuration distribution and
> > monitoring active instances of our applications. Each application creates
> > its own ephemeral node, so we know which apps are alive, how many of them
> > there are, and what they are doing.
> >
> > The problem happened on 4th November, the first time around 4AM and the
> > second time around 12PM.
> > The first time was in the middle of the night; I got woken up and the
> > support guys told me that something was wrong with config distribution.
> >
> > First I checked the apps for errors but didn't find anything interesting,
> > then I looked at what was in ZooKeeper (using node-zk-browser).
> > I noticed that there were 3 ephemeral nodes which had been created on 1st
> > Nov (while the oldest application was started on 3rd Nov). I could read
> > their data but was not able to delete them: I was getting a NONODE
> > exception.
> >
> > I thought wtf, why can't I delete these nodes? Something very bad must
> > have happened to ZK.
> >
> > So I sshed to the leader and tried to read these nodes using the CLI, but
> > I was not able to: the leader was telling me that such nodes don't exist.
> > After this I started to ssh to the rest of the nodes in the cluster and
> > tried to read these nodes there. Finally I found the server which did let
> > me read the data of these nodes.
> > Because of the inconsistency I decided to restart it. The restart did
> > help, everything went back to normal. The ephemeral nodes disappeared.
> >
> > A similar situation happened at 12PM, but this time I had a lot more time
> > to look at what was wrong. The second time the problem was about 3
> > ephemeral nodes which were created on 1st Nov (again?). This time I dug a
> > bit deeper and looked into the logs and 4 letter commands, but could not
> > find anything interesting except that all 3 of these nodes were created
> > under different session ids, and ZK had no hosts connected under these
> > session ids.
> > The solution was similar to the one from 4AM, but this time I also deleted
> > all files in the ZK data directory.
> >
> > Oddly enough, the problem happened twice on the same ZK node; the final
> > solution was to clear the ZK data directory. After clearing the directory
> > the problem didn't happen again.
> >
> > I tried to look for a solution or similar problems. I found posts where
> > people were complaining about ephemeral nodes not being removed after
> > the client session gets closed, but I was not able to find posts about ZK
> > not being consistent.
> >
> > What do you think about this? Can we do something to fix this?
> >
> > Sorry for my English, I was doing my best. :)
> >
> > Thanks, Kuba.
> >
>
