We use Apache Storm and SolrCloud, along with a few other clients that store
data in ZooKeeper. Due to an open SolrCloud bug,
https://issues.apache.org/jira/browse/SOLR-16415, async_ids are not
deleted automatically, which keeps increasing the size of the overseer znode
in ZooKeeper. Storm also stores its topology jars (25+ topologies) inside
ZooKeeper, and those are large as well. These two are the heaviest
clients (other than a few Spring Boot microservices) contributing to
ZooKeeper znode size.

I want to see how we can get the current size of a specific znode so that we
can monitor it and set the jute.maxbuffer value accordingly.

I know ZooKeeper does not behave well when large amounts of data are stored
in it, but setting that aside, how can we get the size of a znode?
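
One client-side way to approximate this is to walk the subtree and sum the
dataLength field of each znode's Stat. Below is a minimal sketch in Python,
assuming a kazoo-style client whose exists(path) returns a stat (or None) and
whose get_children(path) returns the child names. The names subtree_size,
SubtreeSize and the missing parameter are made up for illustration, and the
host and path in the usage comment are placeholders. The total counts data
payload only, so it will not match server memory use exactly, and listing a
znode with a very large number of children can itself run into jute.maxbuffer
on the client.

```python
from collections import namedtuple

# Hypothetical result type, for illustration only.
SubtreeSize = namedtuple(
    "SubtreeSize", "data_bytes node_count max_child_names_bytes"
)


def subtree_size(client, root, missing=()):
    """Sum Stat.dataLength over `root` and all of its descendants.

    `client` is assumed to behave like kazoo's KazooClient:
      exists(path)       -> stat with a dataLength attribute, or None
      get_children(path) -> list of child names
    `missing` is a tuple of exception types meaning "node vanished while
    we were walking" (for kazoo, NoNodeError); such nodes are skipped.
    """
    data_bytes = 0
    node_count = 0
    max_child_names_bytes = 0
    stack = [root]
    while stack:
        path = stack.pop()
        stat = client.exists(path)  # stat only, no data transfer
        if stat is None:
            continue  # node does not exist (or was deleted meanwhile)
        try:
            children = client.get_children(path)
        except missing:
            continue
        data_bytes += stat.dataLength
        node_count += 1
        # Rough lower bound on the size of this node's child listing,
        # which is what a "list children" call has to return in one reply.
        names_bytes = sum(len(name.encode("utf-8")) for name in children)
        max_child_names_bytes = max(max_child_names_bytes, names_bytes)
        base = path.rstrip("/")  # "" for the root path "/"
        stack.extend(base + "/" + name for name in children)
    return SubtreeSize(data_bytes, node_count, max_child_names_bytes)


# Possible usage with kazoo (host and path are placeholders):
#   from kazoo.client import KazooClient
#   from kazoo.exceptions import NoNodeError
#   zk = KazooClient(hosts="127.0.0.1:2181")
#   zk.start()
#   print(subtree_size(zk, "/overseer", missing=(NoNodeError,)))
#   zk.stop()
```

Running this periodically against the znodes of interest and recording
data_bytes would give a per-znode trend to monitor. The walk is not atomic,
so on a busy tree the numbers are a snapshot estimate.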

Regards,
Aishwarya Soni

On Sat, Jul 15, 2023 at 2:24 AM Steph van Schalkwyk wrote:

> Take a look in the code repo. Should be a simple pull.
> S
>
> On Fri, Jul 14, 2023 at 3:23 PM Ruel, Ryan wrote:
>
> > We have an application where the size of individual ZNodes is small (a few
> > KB typically), however our data is distributed in the tree such that we can
> > have many sub nodes (10s of thousands, in some cases).
> >
> > When running the ZK CLI tool to view our data, I was surprised to see that
> > we started to get IOExceptions for exceeding the 1MB jute.maxbuffer.
> >
> > We've gotten around this by increasing the max buffer size to 10MB, but it
> > wasn't clear to me whether the ZNode allowed data size is impacted by the
> > number of sub nodes, or if this buffer size is just reused in various
> > places in the client code.
> >
> > ZK seems to operate just fine with these large numbers of sub nodes, it's
> > just the client tool that was complaining when trying to list sub nodes.
> >
> > /Ryan
> >
> > On 7/14/23, 3:01 PM, "Steph van Schalkwyk" wrote:
> >
> >
> > To your last point - ZK was designed to distribute small packets, hence the
> > 1M buffer.
> > I've had a client who had a Solr connector that kept on creating new fields
> > from different sources, and the Solr schema quickly grew to 4M. That's
> > about the biggest I've seen ZK operate reliably.
> >
> >
> > On Fri, Jul 14, 2023 at 1:09 PM Aishwarya Soni wrote:
> >
> >
> > > Hi,
> > >
> > > I want to find the current size/memory of a znode, i.e. how much
> > > it is using including all its child znodes. I know
> > > *zk_approximate_data_size* is the approximate memory consumption for ALL
> > > znodes stored in the ZooKeeper ensemble. But I need to find the active
> > > size of a specific znode out of multiple znodes.
> > >
> > > How can we get it?
> > >
> > > Also, what is the safe max value we can assign to jute.maxbuffer? I am
> > > seeing packet lengths of 1 GB coming from a couple of clients, and they
> > > fail with an IOException because jute.maxbuffer is set to the
> > > default value of 1MB.
> > >
> > > Regards,
> > > Aishwarya
> > >
