Re: Zookeeper session losing some watchers

Jamie Rothfeder Wed, 02 Nov 2011 19:35:12 -0700

Hi Neha,

I encountered a similar problem with zookeeper losing watches and found
that it was related to this bug:


https://issues.apache.org/jira/browse/ZOOKEEPER-961

Are you using a chroot?

Thanks,
Jamie

On Wed, Nov 2, 2011 at 1:16 PM, Neha Narkhede <[email protected]> wrote:

> Hi,
>
> We've been seeing a problem with our zookeeper servers lately, where
> all of a sudden a session loses some of the watchers registered on
> some of the znodes. Let me explain our Kafka-ZK setup. We have a Kafka
> cluster in one DC that establishes sessions (with a 6 sec timeout) with a
> ZK cluster (of 4 machines) in another DC and registers watchers on some
> zookeeper paths. Every couple of weeks, we observe a problem with
> the Kafka servers, and on investigating further, we find that the
> session has lost some of the key watches, but not all.
>
> The last time this happened, we ran the wchc command on the ZK servers
> and saw the problem. Unfortunately, we lost relevant information from
> the ZK logs by the time we were ready to debug it further. Since this
> causes Kafka servers to stop making progress, we want to set up some
> kind of alert when this happens. This will help us collect more
> information to give you. Particularly, we were thinking about running
> wchp periodically (maybe once a minute), grepping for the ZK paths and
> counting the number of watches that should be registered for correct
> operation. But I observed that the watcher info is not replicated
> across all ZK servers, so we would have to query every ZK server
> to get the full list.
>
> I'm not sure running wchp periodically on all ZK servers is the best
> option for this alert. Can you think of what the problem could be here,
> and how we can set up this alert for now?
>
> Thanks
> Neha
>
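
The wchp-based alert described above could be sketched roughly as follows. This is a minimal sketch, not a tested monitoring tool, and it rests on several assumptions. It assumes the wchp output lists each znode path on its own line followed by one indented session id per line. It assumes the server closes the connection after answering a four-letter word. The helper names (parse_wchp, four_letter_word, count_watchers, missing_watches) and any host list or expected-count table are hypothetical. Because a client session is connected to only one server at a time, the sketch takes the union of watching sessions across all servers before counting. This matches the observation that watch information is not replicated.

```python
import socket
from collections import defaultdict


def parse_wchp(output):
    """Map each znode path to the set of session ids watching it.

    Assumes (unverified) the wchp layout: a path line starting with "/",
    followed by one indented session id per line.
    """
    watches = defaultdict(set)
    path = None
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("/"):
            path = line.strip()  # a new znode path
        elif path is not None:
            watches[path].add(line.strip())  # a session watching `path`
    return dict(watches)


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter command and read until the server closes the socket."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def count_watchers(servers, paths):
    """Count distinct watching sessions per path, across every server."""
    totals = {p: set() for p in paths}
    for host, port in servers:
        watches = parse_wchp(four_letter_word(host, port, "wchp"))
        for p in paths:
            totals[p] |= watches.get(p, set())
    return {p: len(sessions) for p, sessions in totals.items()}


def missing_watches(counts, expected):
    """Return {path: (observed, expected)} for paths below their expected count."""
    return {
        p: (counts.get(p, 0), n)
        for p, n in expected.items()
        if counts.get(p, 0) < n
    }
```

A cron job could call count_watchers with every server in the ensemble, using a hypothetical list such as [("zk1", 2181), ...], and a hypothetical table of Kafka paths with the number of sessions each should have. It could then raise an alert whenever missing_watches returns a non-empty dict. Note that the ZooKeeper admin guide warns that wchc and wchp can be expensive on servers with many watches, so a once-a-minute poll is worth measuring first. On newer ZooKeeper releases, four-letter words may also need to be enabled through the 4lw.commands.whitelist setting.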
