Re: SolrCloud breaks and does not recover

2015-11-07 Thread Pushkar Raste
Hi, to minimize GC pauses, try using G1GC and turn on the 'ParallelRefProcEnabled' JVM flag. G1GC works much better for heaps > 4 GB. Lowering 'InitiatingHeapOccupancyPercent' will also help to avoid long GC pauses at the cost of more short pauses. On 3 November 2015 at 12:12, Björn Häuser wrote: >
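
As a rough sketch of the suggestion above, the following Python helper assembles the HotSpot flags named in the message. The function name `g1_gc_flags` and the example value 35 are assumptions made for illustration; the thread does not recommend a specific occupancy percentage, and the right value depends on the workload.

```python
def g1_gc_flags(ihop_percent=None):
    """Return the JVM flags mentioned in the message as a list of strings.

    ihop_percent is optional; when omitted, the JVM keeps its own default
    for InitiatingHeapOccupancyPercent.
    """
    flags = ["-XX:+UseG1GC", "-XX:+ParallelRefProcEnabled"]
    if ihop_percent is not None:
        if not 0 <= ihop_percent <= 100:
            raise ValueError("ihop_percent must be between 0 and 100")
        # A lower value starts concurrent marking earlier: more frequent
        # short pauses in exchange for fewer long ones.
        flags.append(f"-XX:InitiatingHeapOccupancyPercent={ihop_percent}")
    return flags


if __name__ == "__main__":
    # 35 is an illustrative value only, not taken from the thread.
    print(" ".join(g1_gc_flags(ihop_percent=35)))
```

The resulting string could be passed to the JVM that runs Solr, for example through whatever startup script or environment variable the installation already uses for GC options.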

Re: SolrCloud breaks and does not recover

2015-11-03 Thread Rallavagu
Another item to look into is increasing the ZooKeeper timeout in Solr's solr.xml. This would help with timeouts caused by long GC pauses. On 11/3/15 9:12 AM, Björn Häuser wrote: Hi, thank you for your answer. 1> No OOM hit, the log does not contain any hint of that. Also solr wasn't rest
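
To make the relationship between pause length and the ZooKeeper timeout concrete, here is a minimal Python sketch. It assumes the timeout in question is the ZooKeeper client session timeout (presumably the `zkClientTimeout` setting in solr.xml) and uses a deliberately simplified model: a pause at least as long as the timeout is treated as certain to expire the session. In practice shorter pauses can also cause expiry, depending on heartbeat timing. The function name and sample numbers are hypothetical.

```python
def pauses_exceeding_timeout(pauses_s, zk_timeout_ms):
    """Return the pauses (in seconds) that are at least as long as the timeout.

    pauses_s:      iterable of stop-the-world pause durations in seconds
    zk_timeout_ms: ZooKeeper session timeout in milliseconds
    """
    timeout_s = zk_timeout_ms / 1000.0
    return [p for p in pauses_s if p >= timeout_s]


if __name__ == "__main__":
    sample_pauses = [0.2, 15.4, 31.0]  # made-up values for illustration
    for timeout_ms in (15000, 30000):
        risky = pauses_exceeding_timeout(sample_pauses, timeout_ms)
        print(f"timeout={timeout_ms} ms -> pauses at or above it: {risky}")
```

Raising the timeout only hides the pauses from ZooKeeper; the node is still unresponsive to queries for the duration of each pause.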

Re: SolrCloud breaks and does not recover

2015-11-03 Thread Björn Häuser
Hi, thank you for your answer. 1> No OOM hit, the log does not contain any hint of that. Also Solr wasn't restarted automatically. But the GC log has some pauses which are longer than 15 seconds. 2> So, if we need to recover a system we need to stop ingesting data into it? 3> The JVMs currently
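
One way to find such pauses in a GC log is sketched below in Python. It assumes the JVM was started with `-XX:+PrintGCApplicationStoppedTime`, so that the log contains lines of the form "Total time for which application threads were stopped: N seconds"; the actual log format in this thread is not shown, so the pattern may need adjusting. The function name `long_pauses` and the sample lines are hypothetical.

```python
import re

# Matches the line HotSpot prints when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)


def long_pauses(lines, threshold_s=15.0):
    """Return (line_number, pause_seconds) for pauses of at least threshold_s."""
    found = []
    for line_no, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match:
            pause = float(match.group(1))
            if pause >= threshold_s:
                found.append((line_no, pause))
    return found


if __name__ == "__main__":
    # Made-up sample lines for illustration; not taken from the thread's logs.
    sample = [
        "Total time for which application threads were stopped: 0.0421 seconds",
        "Total time for which application threads were stopped: 17.5 seconds",
    ]
    for line_no, pause in long_pauses(sample):
        print(f"line {line_no}: {pause:.1f} s pause")
```

For a real log, the lines can be read with `open(path)` and passed to `long_pauses` directly, since the function accepts any iterable of strings.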

Re: SolrCloud breaks and does not recover

2015-11-03 Thread Erick Erickson
The GC logs don't really show anything interesting, there would be 15+ second GC pauses. The Zookeeper log isn't actually very interesting. As far as OOM errors go, I was thinking of _solr_ logs. As to why the cluster doesn't self-heal, a couple of things: 1> Once you hit an OOM, all bets are off. T

Re: SolrCloud breaks and does not recover

2015-11-03 Thread Björn Häuser
Hi! Thank you for your super fast answer. I can provide more data, the question is which data :-) These are the config parameters Solr runs with: https://gist.github.com/bjoernhaeuser/24e7080b9ff2a8785740 (taken from the admin UI). These are the log files: https://gist.github.com/bjoernhaeuser/

Re: SolrCloud breaks and does not recover

2015-11-02 Thread Erick Erickson
Without more data, I'd guess one of two things: 1> you're seeing stop-the-world GC pauses that cause Zookeeper to think the node is unresponsive, which puts a node into recovery and things go bad from there. 2> Somewhere in your solr logs you'll see OutOfMemory errors which can also cascade a bun

SolrCloud breaks and does not recover

2015-11-02 Thread Björn Häuser
Hey there, we are running a SolrCloud cluster with 4 nodes, all with the same config. Each node has 8 GB memory, 6 GB assigned to the JVM. This is maybe too much, but worked for a long time. We currently run with 2 shards, 2 replicas and 11 collections. The complete data-dir is about 5.3 GB. I think we should mov