hi Emir -
the average document size is less than 1.5 KB.
it is actually 2000 queries/min - the queries are primarily autocomplete +
highlighting (on a multivalued field with different payloads), search, and
faceting.
what should we watch for that would indicate we are overloading the cpu
cores? (cpu peaks at 75%, but as i mentioned earlier we've seen the "load"
go up to 20 - not sure if this has an impact).
yes, we have dedicated zk nodes.
yes, we can reproduce this issue predictably even with just one indexing
thread.
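
for completeness, the commit-related settings in our solrconfig.xml look
roughly like this (paraphrased from memory - the values are the ones already
mentioned in this thread):

<updateHandler class="solr.DirectUpdateHandler2">
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <autoCommit>
    <maxTime>300000</maxTime>         <!-- hard commit every 5 minutes -->
    <openSearcher>true</openSearcher> <!-- each hard commit opens a new searcher -->
  </autoCommit>
  <!-- no autoSoftCommit block (as far as i recall), and no explicit commit
       sent after each batch -->
</updateHandler>

<query>
  <maxWarmingSearchers>5</maxWarmingSearchers>
</query>

so the 5-minute hard commit with openSearcher=true is currently the only way
new documents become visible; we don't use soft commits.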

thanks


On Sun, Nov 5, 2017 at 3:12 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi Rick,
> I quickly looked at GC logs and didn’t see obvious issues. You mentioned
> that batch processing takes ~20s and it is 500 documents. With 5-7 indexing
> threads that is ~150 documents/s. Are those big documents?
> With 200 queries/min (~3-4 queries/s - what sort of queries?) and 5-7
> indexing threads, you might be overloading 4 cores.
> Do you have dedicated ZK nodes? Do you see the same issues with less
> indexing threads?
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 4 Nov 2017, at 14:25, Rick Dig <teram...@gmail.com> wrote:
> >
> > not committing after the batch. made sure we have that turned off.
> > maxTime is set to 300000 (300 seconds), openSearcher is set to true.
> >
> >
> > On Sat, Nov 4, 2017 at 6:50 PM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> >
> >> Pretty much what Emir has stated. I want to know, when you say:
> >>
> >>> all of this runs perfectly ok when indexing isn't happening. as soon as
> >>> we start "nrt" indexing one of the follower nodes goes down within 10 to
> >>> 20 minutes.
> >>
> >>
> >> When you say "NRT" indexing, what is the commit strategy for indexing? With
> >> auto-commit set so high, are you committing after each batch, and if yes,
> >> what's the number?
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
> >>
> >>> Hi Rick,
> >>> Do you see any errors in logs? Do you have any monitoring tool? Maybe you
> >>> can check heap and GC metrics around the time when the incident happened.
> >>> It is not a large heap, but a major GC could cause a pause long enough to
> >>> trigger a snowball effect and end up with the node in recovery state.
> >>> What is the indexing rate you observe? Why do you have max warming
> >>> searchers set to 5 (did you mean this with autowarmingsearchers?) when you
> >>> commit every 5 min? Why did you increase it - did you see errors with the
> >>> default of 2? Maybe you commit on every bulk?
> >>> Do you see similar behaviour when you just do indexing without queries?
> >>>
> >>> Thanks,
> >>> Emir
> >>> --
> >>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>>
> >>>
> >>>
> >>>> On 4 Nov 2017, at 05:15, Rick Dig <teram...@gmail.com> wrote:
> >>>>
> >>>> hello all,
> >>>> we are trying to run solrcloud 6.6 in a production setting.
> >>>> here's our config and issue
> >>>> 1) 3 nodes, 1 shard, replication factor 3
> >>>> 2) all nodes are 16GB RAM, 4 core
> >>>> 3) Our production load is about 2000 requests per minute
> >>>> 4) index is fairly small, index size is around 400 MB with 300k documents
> >>>> 5) autocommit is currently set to 5 minutes (even though ideally we would
> >>>> like a smaller interval).
> >>>> 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> >>>> 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> >>>> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> >>>> minutes. from this point on the nodes never recover unless we stop
> >>>> indexing. the master usually is the last one to fall.
> >>>> 8) there are maybe 5 to 7 processes indexing at the same time with document
> >>>> batch sizes of 500.
> >>>> 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> >>>> 10) no cpu and / or oom issues that we can see.
> >>>> 11) cpu load does go fairly high 15 to 20 at times.
> >>>> any help or pointers appreciated
> >>>>
> >>>> thanks
> >>>> rick
> >>>
> >>>
> >>
>
>
