Hi Philippa,
Try taking a heap dump (while heap usage is high) and then using a profiler
to look at which objects are taking up most of the memory. I have seen that
if you are faceting/sorting on a large number of documents, the fieldCache
grows very big and dominates most of the heap. Enabling docValues on the
fields you sort/facet on helps.
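
A minimal sketch of both steps, assuming a standard JDK and a Solr 5.x
schema.xml (the pid, dump path, and field name are illustrative):

    jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>

    <field name="category" type="string" indexed="true" stored="true"
           docValues="true"/>

Note that enabling docValues on an existing field requires a full reindex.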

On 8 December 2015 at 07:17, philippa griggs <philippa.gri...@hotmail.co.uk>
wrote:

> Hello Emir,
>
> The query load is around 35 requests per minute on each shard; we don't
> use document routing, so we query the entire index.
>
> We do have some heavy queries like faceting, and it's possible that a
> heavy query is causing the nodes to go down; we are looking into this. I'm
> new to Solr, so this could be a slightly stupid question, but would a
> heavy query cause most of the nodes to go down? This didn't happen with
> the previous Solr version we were using (4.10.0); we did have nodes/shards
> which went down, but there wasn't a wipe-out effect where most of the
> nodes go at once.
>
> Many thanks
>
> Philippa
>
> ________________________________________
> From: Emir Arnautovic <emir.arnauto...@sematext.com>
> Sent: 08 December 2015 10:38
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
>
> Hi Philippa,
> My guess would be that you are running some heavy queries (faceting/deep
> paging/large pages), have a high query load (can you give a bit more
> detail about the load?), or have misconfigured caches. Do you query the
> entire index, or do you use query routing?
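>
> For reference, caches are configured in solrconfig.xml. On an index of
> ~55M documents each filterCache entry can hold a bitset of roughly
> maxDoc/8 bytes (about 7MB here), so a large "size" value adds up quickly.
> The entry below is purely illustrative, not a recommendation:
>
>     <filterCache class="solr.FastLRUCache" size="512"
>                  initialSize="512" autowarmCount="128"/>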
>
> You have big machines and might consider running two Solr instances on
> each node (each with a smaller heap) and splitting shards, so queries can
> be more parallelized, resources better utilized, and each heap smaller
> for GC.
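>
> A minimal sketch of a shard split via the Collections API (the host,
> collection, and shard names are placeholders):
>
>     curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1'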
>
> Regards,
> Emir
>
> On 08.12.2015 10:49, philippa griggs wrote:
> > Hello Erick,
> >
> > Thanks for your reply.
> >
> > We have one collection and are writing documents to that collection all
> > the time; it peaks at around 2,500 per minute and dips to 250 per
> > minute, and the size of the documents varies. On each node we have
> > around 55,000,000 documents with a data size of 43GB located on a 200GB
> > drive.
> >
> > Each node has 122GB memory; the heap size is currently set at 45GB,
> > although we have plans to increase this to 50GB.
> >
> > The heap settings we are using are:
> >
> >   -XX:+UseG1GC
> >   -XX:+ParallelRefProcEnabled
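> >
> > For reference, pause times can be captured with the standard Java 8
> > HotSpot GC logging flags (the log path is illustrative):
> >
> >   -Xloggc:/var/log/solr/gc.log
> >   -XX:+PrintGCDetails
> >   -XX:+PrintGCDateStamps
> >   -XX:+PrintGCApplicationStoppedTime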
> >
> > Please let me know if you need any more information.
> >
> > Philippa
> > ________________________________________
> > From: Erick Erickson <erickerick...@gmail.com>
> > Sent: 07 December 2015 16:53
> > To: solr-user
> > Subject: Re: Solr 5.2.1 Most solr nodes in a cluster going down at once.
> >
> > Tell us a bit more.
> >
> > Are you adding documents to your collections or adding more
> > collections? Solr is a balancing act between the number of docs you
> > have on each node and the memory you have allocated. If you're
> > continually adding docs to Solr, you'll eventually run out of memory
> > and/or hit big GC pauses.
> >
> > How much memory are you allocating to Solr? How much physical memory
> > do you have? And so on.
> >
> > Best,
> > Erick
> >
> >
> > On Mon, Dec 7, 2015 at 8:37 AM, philippa griggs
> > <philippa.gri...@hotmail.co.uk> wrote:
> >> Hello,
> >>
> >>
> >> I'm using:
> >>
> >>
> >> Solr 5.2.1, 10 shards, each with a replica (20 nodes in total).
> >>
> >>
> >> ZooKeeper 3.4.6.
> >>
> >>
> >> About half a year ago we upgraded to Solr 5.2.1, and since then we
> >> have been experiencing a 'wipe out' effect where all of a sudden most,
> >> if not all, nodes will go down. Sometimes they recover by themselves,
> >> but more often than not we have to step in to restart nodes.
> >>
> >>
> >> Nothing in the logs jumps out as being the problem. With the latest
> >> wipe out we noticed that 10 out of the 20 nodes had garbage collection
> >> pauses of over 1 minute, all at the same time, with heap usage spiking
> >> in some cases to 80%. We also noticed that the number of select queries
> >> run on the Solr cluster increased just before the wipe out.
> >>
> >>
> >> Increasing the heap size seems to help for a while, but then it starts
> >> happening again, so it's more of a delay than a fix. Our GC settings
> >> are -XX:+UseG1GC and -XX:+ParallelRefProcEnabled.
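> >>
> >> (For reference, G1 also exposes an explicit pause-time target, e.g.
> >> -XX:MaxGCPauseMillis=250; the value is illustrative, and a target alone
> >> won't prevent long pauses on an over-full heap.)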
> >>
> >>
> >> With our previous version of Solr (4.10.0) this didn't happen. We had
> >> nodes/shards go down, but it was contained; with the new version they
> >> all seem to go at around the same time. We can't really continue just
> >> increasing the heap size and would like to solve this issue rather than
> >> delay it.
> >>
> >>
> >> Has anyone experienced something similar?
> >>
> >> Is there a difference between the two versions around the recovery
> >> process?
> >>
> >> Does anyone have any suggestions for a fix?
> >>
> >>
> >> Many thanks
> >>
> >>
> >> Philippa
> > >
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
