Off the top of my head:

a) Should the below JVM parameter be included for Prod to get a heap dump?

Makes sense. It may produce quite a large dump file (up to roughly the
size of the heap), but then this is an extraordinary situation so
that's probably OK.

b) Currently the OOM script just kills the Solr instance. Shouldn't it be
enhanced to wait and restart the Solr instance?

Personally I don't think so. IMO there's no real point in restarting
Solr; you have to address the underlying issue, as the situation is
likely to recur. Restarting Solr may hide this very serious problem:
how would you even know to look? It could lead to a long, involved
process of wondering why selected queries seem to fail without
noticing that the OOM script killed Solr. Having the default _not_
restart Solr forces you to notice.

If you have to change the script to restart Solr, you also know that
you made the change and you should _really_ notify ops that they
should monitor this situation.

I admit this can be argued either way; personally, I'd rather "fail
fast and often".
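
If the concern is that a kill might go unnoticed, monitoring can look
for the log file the OOM script leaves behind. A minimal sketch,
assuming the script writes a file named like
solr_oom_killer-<port>-<timestamp>.log into the Solr logs directory
(check your own bin/oom_solr.sh; the file name pattern and the function
name below are assumptions for illustration, not something confirmed in
this thread):

```python
from pathlib import Path


def find_oom_killer_logs(logs_dir, pattern="solr_oom_killer-*.log"):
    """Return OOM-killer log files found in logs_dir, oldest first.

    The default pattern is an assumption about what the OOM script
    writes; adjust it to match your installation.
    """
    directory = Path(logs_dir)
    if not directory.is_dir():
        # A missing logs directory is treated as "nothing found".
        return []
    # Sort by modification time so the most recent kill comes last.
    return sorted(directory.glob(pattern), key=lambda p: p.stat().st_mtime)
```

Run from cron or a monitoring agent, a non-empty result is the signal
to alert ops, whether or not the script has been changed to restart
Solr.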

Best,
Erick

On Tue, Oct 25, 2016 at 7:03 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> Agree, Pushkar.  I had docValues for sorting / faceting fields from the
> beginning (since I set up Solr 6.0).  So good on that side. I am going to
> analyze the queries to find any potential issue. Two questions which I am
> puzzling over:
>
> a) Should the below JVM parameter be included for Prod to get a heap dump?
>
> "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/the/dump"
>
> b) Currently the OOM script just kills the Solr instance. Shouldn't it be
> enhanced to wait and restart the Solr instance?
>
> Thanks,
> Susheel
>
>
>
>
> On Tue, Oct 25, 2016 at 7:35 PM, Pushkar Raste <pushkar.ra...@gmail.com>
> wrote:
>
>> You should look into using docValues.  docValues are stored off heap and
>> hence you would be better off than just bumping up the heap.
>>
>> Don't enable docValues on existing fields unless you plan to reindex data
>> from scratch.
>>
>> On Oct 25, 2016 3:04 PM, "Susheel Kumar" <susheel2...@gmail.com> wrote:
>>
>> > Thanks, Toke.  Analyzing GC logs helped to determine that it was a sudden
>> > death.  The peaks in the last 20 mins... See http://tinypic.com/r/n2zonb/9
>> >
>> > Will look into the queries more closely and also adjust the cache
>> > sizing.
>> >
>> >
>> > Thanks,
>> > Susheel
>> >
>> > On Tue, Oct 25, 2016 at 3:37 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> > wrote:
>> >
>> > > On Mon, 2016-10-24 at 18:27 -0400, Susheel Kumar wrote:
>> > > > I am seeing the OOM script kill Solr (Solr 6.0.0) on a couple of our
>> > > > VMs today. So far our Solr cluster has been running fine but suddenly
>> > > > today many of the VMs' Solr instances got killed.
>> > >
>> > > As you have the GC-logs, you should be able to determine if it was a
>> > > slow death (e.g. caches gradually being filled) or a sudden one (e.g.
>> > > grouping or faceting on a large new non-DocValued field).
>> > >
>> > > Try plotting the GC logs with time on the x-axis and free memory after
>> > > GC on the y-axis. If it happens to be a sudden death, the last lines in
>> > > solr.log might hold a clue after all.
>> > >
>> > > - Toke Eskildsen, State and University Library, Denmark
>> > >
>> >
>>
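
Toke's suggestion above (time on the x-axis, free memory after GC on
the y-axis) can be sketched roughly as follows. This assumes Java 8
style GC log lines, such as those produced with -XX:+PrintGCDetails and
-XX:+PrintGCTimeStamps, where each GC line carries the JVM uptime in
seconds and a usedBefore->usedAfter(total) triple in K. The regexes and
the function name are illustrative assumptions; other collectors and
JVM versions log differently.

```python
import re

# JVM uptime in seconds, e.g. "1234.567: [GC ...", optionally preceded
# by a date stamp such as the one -XX:+PrintGCDateStamps adds.
UPTIME_RE = re.compile(r"(?:^|\s)(\d+\.\d+): \[")
# Heap occupancy triple, e.g. "812345K->654321K(1013632K)".
HEAP_RE = re.compile(r"(\d+)K->(\d+)K\((\d+)K\)")


def free_after_gc(lines):
    """Yield (uptime_seconds, free_kb_after_gc) for each GC log line."""
    for line in lines:
        uptime = UPTIME_RE.search(line)
        triples = HEAP_RE.findall(line)
        if not uptime or not triples:
            continue  # not a GC event line in the assumed format
        # Assumption: the last triple on the line describes the whole heap.
        _before, after, total = map(int, triples[-1])
        yield float(uptime.group(1)), total - after
```

Feeding the resulting pairs to any plotting tool gives the curve Toke
describes: a gradual decline points to a slow death, while a sharp drop
in the final minutes points to a sudden one.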
