The errors are unusual, but the znode_count is normal.
On Fri, Jan 28, 2022 at 9:12 PM Reej Nayagam <reej...@gmail.com> wrote:
>
> Hi All,
>
> As suggested by the group, I tried the API call
> /solr/admin/zookeeper/status to get the ZK status.
> When I try this in my browser, one time I get status 0 along with
> the ZK ensemble details; after a while, when I try again, I get:
> status : 500
> error: msg: "Java.net.SocketException:connection reset:
> trace: java.io.UncheckedIOException :
> java.net.socketexception:connection reset
>
> Can I ignore the socket exception? If I try again immediately,
> the status is OK with no errors. Kindly advise.
>
> Also, in the Solr admin UI I can see the values below for all the
> ZooKeepers. Is this normal? What is zk_node_count?
> ZK_node_count 1852
> zk_approximate_data_size 7853679
>
> *Thanks,*
> *Reej*
>
>
> On Thu, Jan 27, 2022 at 4:22 PM Reej Nayagam <reej...@gmail.com> wrote:
>
> > Hi Vinay,
> >
> > We are connecting using CloudSolrClient, passing the ZK host, so if ZK is
> > down, the connection to Solr won't happen either.
> >
> > *Thanks,*
> > *Reej*
> >
> >
> > On Thu, Jan 27, 2022 at 12:35 PM Vinay Rajput <vinayrajput4...@gmail.com>
> > wrote:
> >
> >> It also looks from your requirement like you want to disable Solr
> >> search and activate DB search in case of a ZooKeeper cluster failure.
> >>
> >> That is NOT needed. Solr search is not impacted when the ZK cluster is
> >> down; only indexing is impacted. We have had a situation where all our
> >> ZK nodes were down for a few minutes and there was still no impact on
> >> search.
> >>
> >> Thanks,
> >> Vinay
> >>
> >> On Wed, 26 Jan 2022 at 9:12 PM, Walter Underwood <wun...@wunderwood.org>
> >> wrote:
> >>
> >> > You can check the status of each Zookeeper node with the “ruok” command.
> >> > This is one of the “four letter words” admin commands.
> >> >
> >> > https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_zkCommands
> >> >
> >> > This is how it works from a command line.
> >> >
> >> > $ echo ruok | nc zoo-shared-1.test.search.cheggnet.com 2181
> >> > imok
> >> >
> >> > wunder
> >> > Walter Underwood
> >> > wun...@wunderwood.org
> >> > http://observer.wunderwood.org/ (my blog)
> >> >
> >> > > On Jan 26, 2022, at 5:53 AM, Reej Nayagam <reej...@gmail.com> wrote:
> >> > >
> >> > > The scenario is that the Solr servers are up but the majority of the
> >> > > ZK nodes are down, so we need to report that the issue is with
> >> > > ZooKeeper. I can't find a way to identify the ZooKeeper status
> >> > > without waiting for the timeout to happen after 30 seconds.
> >> > >
> >> > > On Wed, 26 Jan 2022 at 9:39 PM, matthew sporleder <msporle...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> I don't understand your approach --
> >> > >>
> >> > >> For checking solr health I would probably use the ping endpoint or a
> >> > >> very fast query with a low timeout (q=*:*&timeAllowed=100&rows=0).
> >> > >>
> >> > >> IIRC zookeeper health (as seen by solr) is in the CLUSTERSTATUS admin
> >> > >> api command? It's somewhere near there if not in CLUSTERSTATUS.
> >> > >>
> >> > >> For interacting with zookeeper itself I would probably just use zk
> >> > >> clients directly.
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Wed, Jan 26, 2022 at 7:41 AM Reej Nayagam <reej...@gmail.com> wrote:
> >> > >>>
> >> > >>> Hi All,
> >> > >>>
> >> > >>> I need to handle ZK failure, so I am monitoring the ZK ensemble, and
> >> > >>> if the majority of the ZK nodes fail we'll activate the HA to point
> >> > >>> to a DB search.
> >> > >>>
> >> > >>> To check whether each ZK node is alive, we connect as below:
> >> > >>>
> >> > >>> *zkClient = new SolrZkClient(zkAddress, 10000);*
> >> > >>> *return zkClient.getSolrZooKeeper().getState().isAlive();*
> >> > >>>
> >> > >>> But I noticed it still takes the default 30,000 ms timeout instead
> >> > >>> of the 10k milliseconds passed in.
> >> > >>>
> >> > >>> Is there a way we can override the ZooKeeper timeout? We have 3
> >> > >>> ZKs, and if all 3 are down, we need to wait 30 seconds for each
> >> > >>> to get its status.
> >> > >>>
> >> > >>> Kindly advise if any of you have handled this. Thank you!
> >> > >>>
> >> > >>> *Thanks,*
> >> > >>> *Reej*
> >> > >>
> >> > > --
> >> > > *Thanks,*
> >> > > *Reej*
> >> >
> >> >
> >>
> >
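
On the timeout question at the bottom of the thread: one way to avoid waiting 30 seconds per node is to send the "ruok" command Walter mentions over a plain socket with explicit connect and read timeouts, bypassing SolrZkClient entirely. Below is a minimal Java sketch of that idea, not a tested production check. The class name ZkRuok and the methods isOk and hasQuorum are hypothetical names made up for this example. It also assumes the ZooKeeper servers answer four-letter words; as far as I know, newer ZooKeeper releases only answer commands listed in 4lw.commands.whitelist, so "ruok" may need to be enabled there.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Solr or ZooKeeper.
final class ZkRuok {

    private ZkRuok() {}

    /**
     * Sends "ruok" to one ZooKeeper server and returns true only if it
     * answers "imok". Any I/O problem (refused, reset, timeout) counts
     * as "not ok". Requires Java 9+ for InputStream.readNBytes.
     */
    static boolean isOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            // Bound the TCP connect instead of relying on client defaults.
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            // Bound each blocking read as well.
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            byte[] reply = new byte[4];
            int read = socket.getInputStream().readNBytes(reply, 0, reply.length);
            return read == 4
                    && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // SocketTimeoutException is an IOException, so timeouts land here too.
            return false;
        }
    }

    /** True when strictly more than half of the ensemble answered "imok". */
    static boolean hasQuorum(int aliveCount, int ensembleSize) {
        return aliveCount * 2 > ensembleSize;
    }
}
```

A caller could loop over the three ZK hosts with something like ZkRuok.isOk(host, 2181, 2000), count the true results, and pass that count to ZkRuok.hasQuorum(count, 3). With this approach an unreachable node fails after about timeoutMs rather than 30 seconds. Note that "imok" only says the server process is running and responding; it does not by itself prove the node is part of a healthy quorum.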