Toke/Daniel, so the node shows as *gone* on the Solr Cloud dashboard because of the GC pause? That is, it is not actually gone, but ZK is not able to get its correct state?

The issue is caused by a huge query with many wildcards and phrases in it. I mentioned the message *The request took too long to iterate over terms.* earlier. Does that mean the terms being expanded are what used up the memory? I am just trying to understand what consumes so much memory.

I am trying to reproduce the OOM by executing multiple queries in parallel but have not been able to, although I do see the memory usage of the Solr JVM going above 90%. What happens to the other queries executed in parallel? Do they wait for the query that is taking a lot of time and resources to time out or complete?

Migrating to Java 8 is also on our to-do list, and we will try different GC settings.

On Tue, Aug 18, 2015 at 2:08 PM, Daniel Collins <danwcoll...@gmail.com> wrote:

> Ah ok, it's a ZK timeout then
> (org.apache.zookeeper.KeeperException$SessionExpiredException)
> which is because of your GC pause.
>
> The page Shawn mentioned earlier has several links on how to investigate GC
> issues and some common GC settings, sounds like you need to tweak those.
>
> Generally speaking, I believe Java 8 is considered better for GC
> performance than 7, so you probably want to investigate that. GC tuning is
> very dependent on the load on your system. You may be running close to the
> limit under normal load, and that 1 big query is enough to tip it over the
> edge. We have seen similar issues from time to time. We are still running
> an older Java 7 build with G1GC which we found worked well for us (though
> CMS seems to be the general consensus on the list here), migrating to Java
> 8 is on our "list of things to do", so our settings are probably not that
> relevant.
>
>
> On 18 August 2015 at 09:04, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
> > On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote:
> > > Kindly help me understand, even if there is a GC pause why the solr
> > > node will go down.
> >
> > If a stop-the-world GC is in progress, it is not possible for an
> > external service to know if this is because a GC is in progress or the
> > node is dead. If the GC takes longer than the relevant timeouts, the
> > external conclusion is that it is dead.
> >
> > In your next post you state that there is very heavy GC going on, so it
> > would seem that your main problem is that your heap is too small for
> > your setup.
> >
> > Getting OOM for a 200GB index with 24GB heap is not at all impossible,
> > but it is a bit of a red flag. If you have very high values for your
> > caches or perform faceting on a lot of different fields, that might be
> > the cause. If you describe your setup in more detail, we might be able
> > to help find the cause for your relatively high heap requirement.
> >
> > - Toke Eskildsen, State and University Library, Denmark
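
To make the point about GC pauses and timeouts concrete, here is a minimal Java sketch. It is not Solr or ZooKeeper code: the class `GcSnapshot`, the helper `pauseExceedsTimeout` and the 15-second timeout are hypothetical, chosen only for illustration. It reads heap usage and GC totals for the JVM it runs in through the standard `java.lang.management` beans, so it would have to run inside (or be adapted to attach to) the Solr JVM to say anything about Solr. Note that `getCollectionTime()` is a cumulative total, not the length of a single pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class GcSnapshot {
    // Hypothetical helper: a pause longer than the session timeout makes
    // the node look dead to an external observer, even though it is alive.
    public static boolean pauseExceedsTimeout(long pauseMillis, long sessionTimeoutMillis) {
        return pauseMillis > sessionTimeoutMillis;
    }

    // Heap used as a percentage of the maximum heap, or -1 if the maximum is undefined.
    public static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max <= 0 ? -1.0 : 100.0 * heap.getUsed() / max;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%%n", heapUsedPercent());

        // One bean per collector; counts and times are totals since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }

        long assumedTimeoutMillis = 15_000L; // assumption, check your own ZK session timeout
        long observedPauseMillis = 20_000L;  // made-up pause length for the example
        System.out.println("would look dead: "
                + pauseExceedsTimeout(observedPauseMillis, assumedTimeoutMillis));
    }
}
```

For the actual length of individual pauses, the JVM's GC log is the more reliable source; the totals above only show whether GC activity is growing between two snapshots.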