On 5/17/2020 2:05 AM, Dominique Bejean wrote:
One or two hours before the nodes stop with OOM, we see this scenario on
all six nodes during the same five-minute time frame:
* young GC changes slightly: from one per second (duration < 0.05 sec) to
one every two or three seconds (duration < 0.15 sec)
* full GCs start occurring every 5 sec with 0 bytes reclaimed
* young GCs start reclaiming fewer bytes
* long full GCs start reclaiming bytes, but less and less each time
* then no more young GC
Here are the GC graphs: https://www.eolya.fr/solr_issue_gc.png

Do you have the OutOfMemoryError in the Solr log? From the graph you provided, it does look likely that the OOME was caused by running out of heap memory, but I'd like to be sure by seeing the logged exception.
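
As a rough way to check this, the sketch below scans log lines for the JVM's `java.lang.OutOfMemoryError` and counts the reason text that follows it (for example "Java heap space" versus "unable to create new native thread"). The function names and the idea of pointing it at a file called solr.log are assumptions for illustration; adjust the path to wherever your install writes its logs.

```python
import re
from collections import Counter

# Matches the JVM's standard message, e.g.
# "java.lang.OutOfMemoryError: Java heap space"
OOME_PATTERN = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(?P<reason>.*))?")


def summarize_oome(lines):
    """Count OutOfMemoryError occurrences, grouped by the reason text."""
    reasons = Counter()
    for line in lines:
        match = OOME_PATTERN.search(line)
        if match:
            reason = (match.group("reason") or "").strip()
            reasons[reason or "(no reason given)"] += 1
    return reasons


def summarize_oome_file(path):
    """Hypothetical helper: run the same count over a log file on disk."""
    with open(path, errors="replace") as handle:
        return summarize_oome(handle)
```

Calling `summarize_oome_file("solr.log")` returns a Counter; a result dominated by "Java heap space" or "GC overhead limit exceeded" would support the heap explanation, while other reasons would point elsewhere.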

Between 15:00 and 15:30, something happened which suddenly required additional heap memory. Do you have any idea what that was? If you can zoom in on the graph, you could get a more accurate time for this. I am looking specifically at the "heap usage before GC" graph. The "heap usage after GC" graph that gceasy makes, which has not been included here, is potentially more useful.

I found that I most frequently ran into memory problems when I executed a data mining query -- doing facets or grouping on a high cardinality field, for example. Those kinds of queries required a LOT of extra memory.
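
To make that concrete, here is a minimal sketch of the kind of request I mean, so you know what to look for in your request logs. It only builds the URL and does not send anything. The host, collection name, and field name are hypothetical; `facet`, `facet.field`, and `facet.limit` are standard Solr faceting parameters, and a `facet.limit` of -1 asks for every distinct value, which is what makes this expensive on a high cardinality field.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, collection, field, limit=-1):
    """Build a select URL that facets on one field and returns no documents."""
    params = {
        "q": "*:*",            # match every document
        "rows": 0,             # only the facet counts are wanted
        "facet": "true",
        "facet.field": field,  # hypothetical high cardinality field
        "facet.limit": limit,  # -1 means no limit on returned values
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names, for illustration only.
url = facet_query_url("http://localhost:8983/solr", "mycollection", "user_id")
print(url)
```

If requests shaped like this show up in the logs shortly before 15:00, they are a good candidate for what pushed the heap over the edge.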

If the servers have any memory left, you might need to increase the max heap beyond where it currently sits. To handle your indexes and queries, Solr may simply require more memory than you have allowed.

Thanks,
Shawn
