Pretty much what Emir has stated. I want to know, when you said:

> all of this runs perfectly ok when indexing isn't happening. as soon as
> we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> minutes.

When you say "NRT" indexing, what is the commit strategy during indexing? With
auto-commit set so high, are you committing after each batch? If yes, what's
the number?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Sat, Nov 4, 2017 at 2:47 PM, Emir Arnautović <emir.arnauto...@sematext.com> wrote:

> Hi Rick,
> Do you see any errors in logs? Do you have any monitoring tool? Maybe you
> can check heap and GC metrics around time when incident happened. It is not
> large heap but some major GC could cause pause large enough to trigger some
> snowball and end up with node in recovery state.
>
> What is indexing rate you observe? Why do you have max warming searchers 5
> (did you mean this with autowarmingsearchers?) when you commit every 5 min?
> Why did you increase it - you seen errors with default 2? Maybe you commit
> every bulk?
>
> Do you see similar behaviour when you just do indexing without queries?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
> > On 4 Nov 2017, at 05:15, Rick Dig <teram...@gmail.com> wrote:
> >
> > hello all,
> > we are trying to run solrcloud 6.6 in a production setting.
> > here's our config and issue
> > 1) 3 nodes, 1 shard, replication factor 3
> > 2) all nodes are 16GB RAM, 4 core
> > 3) Our production load is about 2000 requests per minute
> > 4) index is fairly small, index size is around 400 MB with 300k documents
> > 5) autocommit is currently set to 5 minutes (even though ideally we would
> > like a smaller interval).
> > 6) the jvm runs with 8 gb Xms and Xmx with CMS gc.
> > 7) all of this runs perfectly ok when indexing isn't happening. as soon as
> > we start "nrt" indexing one of the follower nodes goes down within 10 to 20
> > minutes. from this point on the nodes never recover unless we stop
> > indexing. the master usually is the last one to fall.
> > 8) there are maybe 5 to 7 processes indexing at the same time with document
> > batch sizes of 500.
> > 9) maxRambuffersizeMB is 100, autowarmingsearchers is 5,
> > 10) no cpu and / or oom issues that we can see.
> > 11) cpu load does go fairly high 15 to 20 at times.
> > any help or pointers appreciated
> >
> > thanks
> > rick
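
For reference, the two commit strategies asked about in this thread (an explicit commit after every batch versus letting the server decide when documents become visible) might look roughly like this on the client side. This is only a sketch and not the poster's actual indexing code: the host, the collection name "products", and the 10000 ms value are made up for illustration, and the helper names are hypothetical. The only Solr-specific parts are the `commit` and `commitWithin` request parameters on the `/update` handler.

```python
from urllib.parse import urlencode


def chunked(docs, size):
    """Yield successive batches of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def update_url(base_url, collection, commit_within_ms=None, hard_commit=False):
    """Build the /update URL for one batch (hypothetical helper).

    hard_commit=True     -> explicit commit with every batch; with several
                            concurrent indexers this opens many searchers.
    commit_within_ms=N   -> ask Solr to make the batch visible within N ms
                            and let the server coalesce the commits.
    neither              -> rely entirely on autoCommit / autoSoftCommit
                            from solrconfig.xml.
    """
    params = {"wt": "json"}
    if hard_commit:
        params["commit"] = "true"
    elif commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    return f"{base_url}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed values, for illustration only.
    base = "http://localhost:8983/solr"
    docs = [{"id": str(i)} for i in range(1200)]
    for batch in chunked(docs, 500):
        print(len(batch), update_url(base, "products", commit_within_ms=10000))
```

With 5 to 7 indexers each sending batches of 500, the `hard_commit=True` variant would mean a commit per batch per process, which is the pattern the replies above are asking about.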