@Jorn We are adding a few more ZooKeeper nodes soon. Thanks.

@Erick, sorry, I didn't understand that clearly. We have 90GB RAM per node,
of which 14GB is assigned to the heap. Do you mean we should allocate more
heap, or that we need to add more physical RAM?

This system ran for 8 to 9 months without any major issues; it is only
recently that we have started facing so many incidents like this.
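For reference, the heap is set in solr.in.sh, and whatever RAM the process
does not claim is left to the OS page cache, which Lucene relies on for the
index files. A minimal sketch of the relevant line (14g matches our current
allocation; whether to raise it is exactly what we are unsure about):

    # solr.in.sh -- JVM heap for the Solr process. RAM not claimed by the
    # heap (or other processes) stays available to the OS page cache,
    # which Lucene uses to keep hot index files in memory.
    SOLR_HEAP="14g"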

On Thu, Sep 5, 2019 at 5:20 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> If I'm reading this correctly, you have a huge amount of index in not much
> memory. You only have 14g allocated across 130 replicas, at least one of
> which has a 20g index. You don't need as much memory as your aggregate
> index size, but this system feels severely under-provisioned. I suspect
> that's the root of your instability.
>
> Best,
> Erick
>
> On Thu, Sep 5, 2019, 07:08 Doss <itsmed...@gmail.com> wrote:
>
> > Hi,
> >
> > We are using a 3-node Solr (7.0.1) cloud setup with a 1-node ZooKeeper
> > ensemble. Each system has 16 CPUs, 90GB RAM (14GB heap), and 130 cores
> > (3 NRT replicas) with index sizes ranging from 700MB to 20GB.
> >
> > autoCommit - once every 10 minutes
> > softCommit - once every 30 seconds
> >
> > At peak time, if a shard goes into recovery mode, many other shards also
> > go into recovery within a few minutes, which creates huge load (200+ load
> > average) and Solr becomes unresponsive. To fix this we restart the node,
> > but then the leader tries to correct the index by initiating replication,
> > which causes high load again, and the node goes back into an unresponsive
> > state.
> >
> > As soon as a node starts, the replication process is initiated for all 130
> > cores. Is there any way we can control this, for example running them one
> > after the other?
> >
> > Thanks,
> > Doss.
> >
>
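For completeness, the commit intervals quoted above correspond to roughly the
following in solrconfig.xml (a sketch built from the numbers above; the
openSearcher=false line is an assumption, not copied from our actual config):

    <autoCommit>
      <maxTime>600000</maxTime>        <!-- hard commit once every 10 minutes -->
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>30000</maxTime>         <!-- soft commit once every 30 seconds -->
    </autoSoftCommit>

On the "one after the other" question: the only related knob we are aware of
is the ReplicationHandler bandwidth throttle (maxWriteMBPerSec), which caps
the transfer rate rather than running the cores one after the other; a sketch
of how it is usually configured (the 16 MB/s value is purely illustrative):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="invariants">
        <str name="maxWriteMBPerSec">16</str>
      </lst>
    </requestHandler>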
