Off the top of my head:

a) Should the below JVM parameter be included for Prod to get heap dump
Makes sense. It may produce quite a large dump file, but then this is an extraordinary situation so that's probably OK.

b) Currently OOM script just kills the Solr instance. Shouldn't it be enhanced to wait and restart Solr instance

Personally I don't think so. IMO there's no real point in restarting Solr; you have to address this issue, as this situation is likely to recur. Restarting Solr may hide this very serious problem: how would you even know to look? It could potentially lead to a long, involved process of wondering why selected queries seem to fail and not noticing that the OOM script killed Solr.

Having the default _not_ restart Solr forces you to notice.

If you have to change the script to restart Solr, you also know that you made the change and you should _really_ notify ops that they should monitor this situation.

I admit this can be argued either way; personally, I'd rather "fail fast and often".

Best,
Erick

On Tue, Oct 25, 2016 at 7:03 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> Agree, Pushkar. I had docValues for sorting / faceting fields from
> beginning (since I setup Solr 6.0). So good on that side. I am going to
> analyze the queries to find any potential issue. Two questions which I am
> puzzling with
>
> a) Should the below JVM parameter be included for Prod to get heap dump
>
> "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump"
>
> b) Currently OOM script just kills the Solr instance. Shouldn't it be
> enhanced to wait and restart Solr instance
>
> Thanks,
> Susheel
>
> On Tue, Oct 25, 2016 at 7:35 PM, Pushkar Raste <pushkar.ra...@gmail.com>
> wrote:
>
>> You should look into using docValues. docValues are stored off heap and
>> hence you would be better off than just bumping up the heap.
>>
>> Don't enable docValues on existing fields unless you plan to reindex data
>> from scratch.
>>
>> On Oct 25, 2016 3:04 PM, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>>
>> > Thanks, Toke.
>> > Analyzing GC logs helped to determine that it was a sudden
>> > death. The peaks in last 20 mins... See http://tinypic.com/r/n2zonb/9
>> >
>> > Will look into the queries more closely and also adjusting the cache
>> > sizing.
>> >
>> > Thanks,
>> > Susheel
>> >
>> > On Tue, Oct 25, 2016 at 3:37 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> > wrote:
>> >
>> > > On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
>> > > > I am seeing OOM script killed solr (solr 6.0.0) on couple of our VM's
>> > > > today. So far our solr cluster has been running fine but suddenly
>> > > > today many of the VM's Solr instance got killed.
>> > >
>> > > As you have the GC-logs, you should be able to determine if it was a
>> > > slow death (e.g. caches gradually being filled) or a sudden one (e.g.
>> > > grouping or faceting on a large new non-DocValued field).
>> > >
>> > > Try plotting the GC logs with time on the x-axis and free memory after
>> > > GC on the y-axis. If it happens to be a sudden death, the last lines in
>> > > solr.log might hold a clue after all.
>> > >
>> > > - Toke Eskildsen, State and University Library, Denmark
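
For what it's worth, Toke's suggestion in the quoted thread (time on the x-axis, free memory after GC on the y-axis) could be sketched roughly as below. This is only a sketch under assumptions: it assumes HotSpot GC logs written with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps, where each GC line carries before->after(total) sizes in K. G1 and Java 9+ unified logging look different and would need other patterns. The names free_after_gc, UPTIME, GEN and HEAP, and the sample lines, are made up for illustration; they are not taken from Solr or from the logs discussed in this thread.

```python
import re

# Assumed line shape (illustrative, not from the real logs in this thread):
#   100.000: [GC (Allocation Failure) 100.000: [ParNew: 500K->50K(600K), 0.01 secs] 3000K->1000K(4000K), 0.01 secs]
UPTIME = re.compile(r"(\d+\.\d+): \[(?:Full )?GC")   # JVM uptime of the GC event
GEN = re.compile(r"\[\w+: [^\[\]]*\]")               # per-generation sections such as [ParNew: ...]
HEAP = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")       # before->after(total), in KB


def free_after_gc(lines):
    """Return (uptime_seconds, free_kb_after_gc) for each recognised GC line."""
    points = []
    for line in lines:
        uptime = UPTIME.search(line)
        # Drop the per-generation sections so the remaining triple is the whole heap.
        heap = HEAP.search(GEN.sub("", line))
        if not uptime or not heap:
            continue  # not a GC event line in the assumed format
        _before, after, total = map(int, heap.groups())
        points.append((float(uptime.group(1)), total - after))
    return points


sample = [
    "100.000: [GC (Allocation Failure) 100.000: [ParNew: 500K->50K(600K), 0.01 secs] 3000K->1000K(4000K), 0.01 secs]",
    "200.000: [GC (Allocation Failure) 200.000: [ParNew: 500K->50K(600K), 0.01 secs] 3900K->3800K(4000K), 0.01 secs]",
    "300.000: [Full GC (Allocation Failure) 300.000: [CMS: 3400K->3390K(3400K), 1.20 secs] 3990K->3980K(4000K), [Metaspace: 40K->40K(1000K)], 1.20 secs]",
]

for uptime, free_kb in free_after_gc(sample):
    print(f"{uptime:10.3f}s  free after GC: {free_kb} KB")
```

Feeding the resulting pairs to any plotting tool gives the curve Toke describes: a slow downward slope over hours suggests gradually filling caches, while a cliff in the last minutes points at a single expensive request.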