The errors are unusual, but the znode_count is normal.

On Fri, Jan 28, 2022 at 9:12 PM Reej Nayagam <reej...@gmail.com> wrote:
>
> Hi All,
>
> As suggested by the group, I tried the API call
> /solr/admin/zookeeper/status to get the zk status.
> When I try this in my browser, one time I get status 0 along with
> the zk ensemble details; when I try again after a while, I get:
> status : 500
> error: msg: "java.net.SocketException: Connection reset"
> trace: java.io.UncheckedIOException:
> java.net.SocketException: Connection reset
>
> Can I ignore the socket exception? If I try again immediately
> afterwards, the status is OK with no errors. Kindly advise.
>
> Also, in the Solr admin UI I can see the values below for all the
> ZooKeepers. Is this normal? What is zk_znode_count?
> zk_znode_count 1852
> zk_approximate_data_size 7853679
>
> *Thanks,*
> *Reej*
>
>
> On Thu, Jan 27, 2022 at 4:22 PM Reej Nayagam <reej...@gmail.com> wrote:
>
> > Hi Vinay,
> >
> > We are connecting using CloudSolrClient, passing the zk host, so if zk is
> > down, the connection to Solr won't happen either.
> >
> > *Thanks,*
> > *Reej*
> >
> >
> > On Thu, Jan 27, 2022 at 12:35 PM Vinay Rajput <vinayrajput4...@gmail.com>
> > wrote:
> >
> >> It also looks from your requirement like you want to disable Solr
> >> search and activate DB search in case of a ZooKeeper cluster failure.
> >>
> >> That is NOT needed. Solr search is not impacted when the zk cluster is
> >> down, only indexing is impacted. We have had a situation where all our zk
> >> nodes were down for a few minutes and there was still no impact on search.
> >>
> >> Thanks,
> >> Vinay
> >>
> >> On Wed, 26 Jan 2022 at 9:12 PM, Walter Underwood <wun...@wunderwood.org>
> >> wrote:
> >>
> >> > You can check the status of each Zookeeper node with the “ruok” command.
> >> > This is one of the “four letter words” admin commands.
> >> >
> >> >
> >> > https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands
> >> >
> >> > This is how it works from a command line.
> >> >
> >> > $ echo ruok | nc zoo-shared-1.test.search.cheggnet.com 2181
> >> > imok
> >> >
> >> > wunder
> >> > Walter Underwood
> >> > wun...@wunderwood.org
> >> > http://observer.wunderwood.org/  (my blog)
> >> >
> >> > > On Jan 26, 2022, at 5:53 AM, Reej Nayagam <reej...@gmail.com> wrote:
> >> > >
> >> > > The scenario is that the Solr servers are up but the majority of the
> >> > > zk nodes are down, so we need to report that the issue is with
> >> > > ZooKeeper. I can't find a way to identify the ZooKeeper status
> >> > > without waiting for the timeout to happen after 30 seconds.
> >> > >
> >> > > On Wed, 26 Jan 2022 at 9:39 PM, matthew sporleder <msporle...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> I don't understand your approach --
> >> > >>
> >> > >> For checking solr health I would probably use the ping endpoint or a
> >> > >> very fast query with a low timeout (q=*:*&timeAllowed=100&rows=0).
> >> > >>
> >> > >> IIRC zookeeper health (as seen by solr) is in the CLUSTERSTATUS admin
> >> > >> api command?  It's somewhere near there if not in CLUSTERSTATUS.
> >> > >>
> >> > >> For interacting with zookeeper itself I would probably just use zk
> >> > >> clients directly.
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Wed, Jan 26, 2022 at 7:41 AM Reej Nayagam <reej...@gmail.com>
> >> > >> wrote:
> >> > >>>
> >> > >>> Hi All,
> >> > >>>
> >> > >>> I need to handle zk failure, so we are monitoring the zk ensemble;
> >> > >>> if the majority of the zk nodes fail, we'll activate the HA to point
> >> > >>> to a DB search.
> >> > >>>
> >> > >>> To check whether each zk node is alive, we are connecting as below:
> >> > >>>
> >> > >>> *zkClient = new SolrZkClient(zkAddress, 10000);*
> >> > >>> *return zkClient.getSolrZooKeeper().getState().isAlive();*
> >> > >>>
> >> > >>> But I noticed it still takes the default 30,000 ms timeout instead
> >> > >>> of the 10,000 ms passed in.
> >> > >>>
> >> > >>> Is there a way to override the ZooKeeper timeout? We have 3 zk
> >> > >>> nodes, and if all 3 are down, we need to wait 30 seconds for each
> >> > >>> one to get its status.
> >> > >>>
> >> > >>> Kindly advise if any of you have handled this. Thank you !
> >> > >>>
> >> > >>> *Thanks,*
> >> > >>> *Reej*
> >> > >>
> >> > > --
> >> > > *Thanks,*
> >> > > *Reej*
> >> >
> >> >
> >>
> >
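
For the original question in the quoted thread (finding out quickly whether a majority of the zk nodes is down), Walter's "ruok" check can also be done from Java over a plain socket with a short timeout of your own. What follows is only a rough sketch under assumptions, not a tested monitoring solution: the class name ZkQuorumProbe, the method names isOk and hasQuorum, and the 2000 ms timeout are made up for illustration. Note also that on ZooKeeper 3.5.3 and later the four letter words must be enabled through 4lw.commands.whitelist, and that an "imok" reply only says the server process is running, not that it has joined the quorum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or ZooKeeper.
final class ZkQuorumProbe {

    // Sends "ruok" to one node; returns true only if "imok" comes back in time.
    public static boolean isOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Bound both the connect and the read, so a dead node costs
            // at most roughly 2 * timeoutMs instead of a 30 s session timeout.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4];
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) {
                    break; // server closed the connection early
                }
                total += n;
            }
            String reply = new String(buf, 0, total, StandardCharsets.US_ASCII);
            return "imok".equals(reply);
        } catch (IOException e) {
            // Refused, reset, unresolved host, or timed out: treat as not OK.
            return false;
        }
    }

    // True if more than half of the listed "host:port" nodes answered "imok".
    public static boolean hasQuorum(List<String> hostPorts, int timeoutMs) {
        int ok = 0;
        for (String hostPort : hostPorts) {
            int idx = hostPort.lastIndexOf(':');
            String host = hostPort.substring(0, idx);
            int port = Integer.parseInt(hostPort.substring(idx + 1));
            if (isOk(host, port, timeoutMs)) {
                ok++;
            }
        }
        return ok > hostPorts.size() / 2;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java ZkQuorumProbe host:port [host:port ...]");
            return;
        }
        boolean quorum = hasQuorum(Arrays.asList(args), 2000);
        System.out.println(quorum ? "majority answered imok" : "majority did not answer");
    }
}
```

It would be run with the ensemble members as arguments, for example "java ZkQuorumProbe zk1.example.com:2181 zk2.example.com:2181 zk3.example.com:2181" (placeholder host names). The nodes are probed one after another, so with three dead nodes the worst case is about three times the per-node limit; probing in parallel would shorten that.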
