Thanks, Shawn, for looking into this. Your assumption is right: the end of
the graph is the OOM. I am trying to collect all the query and ingestion
numbers from around 9:12. In the meantime, here is one more observation and
a question from today.

I observed that 2-3 of our 12 VMs show high heap usage even though heavy
ingestion stopped more than an hour ago, while the other machines show
normal usage. Does that tell us anything?

Snapshot 1 showing high heap usage
===
https://www.dropbox.com/s/c1qy1s5nc9uo6cp/2016-11-09_15-55-24.png?dl=0

Snapshot 2 showing normal heap usage
===
https://www.dropbox.com/s/9v016ilmhcahs28/2016-11-09_15-58-28.png?dl=0

The other question: we found that our ingestion batch size varies (from
200 to 4000+ docs, depending on queue size). I am asking the ingestion
folks to fix the batch size, but I wonder whether it matters, in terms of
load on Solr and heap usage, if we submit small batches (say, 500 docs max)
more frequently rather than bigger batches less frequently. So far the
bigger batch size has not caused any issues apart from these two incidents.
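
For the ingestion side, capping the batch size could look roughly like the
sketch below. It is only an illustration under assumptions: the 500-doc cap,
the in-memory queue, and the `submit_batch` callback (standing in for
whatever client call actually sends one update request to Solr) are all
hypothetical. It does not answer whether smaller batches reduce heap
pressure on Solr; it only shows how a variable-size queue can be drained
into requests of bounded size.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Hypothetical cap, matching the "500 docs max" idea above.
MAX_BATCH_DOCS = 500


def fixed_batches(docs: Iterable[T], size: int = MAX_BATCH_DOCS) -> Iterator[List[T]]:
    """Yield lists of at most `size` docs, preserving the original order."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def drain_queue(
    queue: Iterable[T],
    submit_batch: Callable[[List[T]], None],
    size: int = MAX_BATCH_DOCS,
) -> int:
    """Send everything in `queue` through `submit_batch` in capped batches.

    `submit_batch` is a hypothetical callback, e.g. one Solr update request
    per batch. Returns the number of requests made.
    """
    requests = 0
    for batch in fixed_batches(queue, size):
        submit_batch(batch)
        requests += 1
    return requests
```

With this approach a queue of 4200 docs would go out as nine requests
(eight of 500 and a final one of 200) instead of a single 4200-doc request,
so the size of any one request stays bounded regardless of how large the
queue grows.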

Thanks,
Susheel

On Wed, Nov 9, 2016 at 10:19 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/8/2016 12:49 PM, Susheel Kumar wrote:
> > Ran into the OOM error again, just two weeks later. Below is the GC log
> > viewer graph. The first time we ran into this was after 3 months, and
> > the second time after only two weeks. After the first incident we
> > reduced the cache size and increased the heap from 8 to 10G.
> > Interestingly, query and ingestion load is the same as on other days,
> > and heap utilisation remains stable and then suddenly jumps to 2x.
>
> It looks like something happened at about 9:12:30 on that graph.  Do you
> know what that was?  Starting at about that time, GC times went through
> the roof and the allocated heap began a steady rise.  At about 9:15, a
> lot of garbage was freed up and GC times dropped way down again.  At
> about 9:18, the GC once again started taking a long time, and the used
> heap was still going up steadily. At about 9:21, the full GCs started --
> the wide black bars.  I assume that the end of the graph is the OOM.
>
> > We are looking to reproduce this in a test environment by producing
> > similar queries/ingestion, but we are wondering if we are running into
> > a memory leak or a bug like "SOLR-8922 - DocSetCollector can allocate
> > massive garbage on large indexes" which could cause this issue. Also,
> > we have frequent updates and are wondering whether not optimizing the
> > index can result in this situation.
>
> It looks more like a problem with allocated memory that's NOT garbage
> than a problem with garbage, but I can't really rule anything out, and
> even what I've said below could be wrong.
>
> Most of the allocated heap is in the old generation.  If there's a bug
> in Solr causing this problem, it would probably be a memory leak, but
> SOLR-8922 doesn't talk about a leak.  A memory leak is always possible,
> but those have been rare in Solr.  The most likely problem is that
> something changed in your indexing or query patterns which required a
> lot more memory than what happened before that point.
>
> Thanks,
> Shawn
>
>
