Re: Production Issue: SOLR node goes non-responsive, restart not helping at peak hours

Doss Fri, 06 Sep 2019 02:22:40 -0700

Jörn, we have added more ZooKeeper nodes; it is now a 3-node quorum.
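For context on why going from one to three ZooKeeper servers matters: a ZooKeeper ensemble keeps serving only while a strict majority of its servers are up. A minimal sketch of that arithmetic follows. The helper names `quorum_size` and `tolerated_failures` are illustrative only and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with this many servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that a 1-node ensemble tolerates no failures, that 3 is the smallest size that survives losing one server, and that an even size such as 4 adds no tolerance over 3.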

Do all nodes in a quorum send heartbeat requests to all cores and shards?


If ZooKeeper node 1 is unable to communicate with a shard and declares that
shard dead, can ZooKeeper node 2 change that state if it gets a successful
response from that particular shard?

On Thu, Sep 5, 2019 at 4:53 PM Jörn Franke <jornfra...@gmail.com> wrote:

> A 1-node ZooKeeper ensemble does not sound very healthy.
>
> > On 05.09.2019 at 13:07, Doss <itsmed...@gmail.com> wrote:
> >
> > Hi,
> >
> > We are using a 3-node SOLR (7.0.1) cloud setup with a 1-node ZooKeeper
> > ensemble. Each system has 16 CPUs, 90 GB RAM (14 GB heap) and 130 cores
> > (3 NRT replicas), with index sizes ranging from 700 MB to 20 GB.
> >
> > autoCommit: once every 10 minutes
> > softCommit: once every 30 seconds
> >
> > At peak time, if one shard goes into recovery mode, many other shards also
> > go into recovery within a few minutes. This creates a huge load (200+ load
> > average) and SOLR becomes non-responsive. To fix this we restart the node,
> > but the leader then tries to correct the index by initiating replication,
> > which causes load again, and the node goes back into a non-responsive state.
> >
> > As soon as a node starts, the replication process is initiated for all 130
> > cores. Is there any way we can control it, for example one core after the
> > other?
> >
> > Thanks,
> > Doss.
>
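The "one after the other" idea in the question is bounded concurrency: cap how many cores recover at once instead of letting all 130 start together. The sketch below only illustrates that idea in plain Python. It is not a Solr setting or API, and whether Solr 7.0.1 offers such a control is exactly what is being asked. `recover_core`, `recover_all` and `max_parallel` are hypothetical names, and the sleep stands in for real replication work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the peak number of recoveries running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        return False  # let any exception from the work propagate


def recover_core(name, tracker):
    """Hypothetical stand-in for replicating one core from its leader."""
    with tracker:
        time.sleep(0.01)  # simulated replication work
    return name


def recover_all(cores, max_parallel=1):
    """Recover cores with at most max_parallel in flight (1 = strictly sequential)."""
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in input order
        done = list(pool.map(lambda core: recover_core(core, tracker), cores))
    return done, tracker.peak
```

For example, `recover_all([f"core_{i}" for i in range(130)], max_parallel=4)` never has more than four simulated recoveries in flight. The pool size trades total recovery time for peak load on the node.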