We are seeing a lot of full GC events and eventual OOM errors in Solr
during indexing. This is Solr 6.5.1 running in cloud mode with a 24GB heap.
At these times indexing is the only activity taking place. The collection
has 4 shards with 2 replicas each, spread across 3 nodes. Each document is
~10KB (a few hundred fields each), and indexing uses the standard update
handler, 1 document per request, with up to 240 concurrent requests.

The heap dump taken automatically on OOM shows 18.3GB of heap retained by 3
instances of DocumentsWriter. Within those instances, all of that heap is
retained by the blockedFlushes LinkedList inside the flushControl object.
Each node in that LinkedList appears to retain around 55MB.
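To put those numbers in perspective, here is a rough back-of-the-envelope
check. It assumes decimal units (1GB = 1000MB), that "2 replicas" means two
copies of each shard, and that the 8 cores are spread as evenly as possible
over the 3 nodes; the variable names are just mine for illustration.

```python
import math

# Figures from the heap dump and cluster layout described above.
shards = 4
replicas_per_shard = 2      # assumption: 2 copies of each shard in total
nodes = 3
writers = 3                 # DocumentsWriter instances seen in the dump
retained_mb = 18.3 * 1000   # 18.3GB, assuming decimal units
per_entry_mb = 55           # approx. heap retained per blockedFlushes node
ram_buffer_mb = 100         # ramBufferSizeMB (default)

# Cores on the busiest node if replicas are spread as evenly as possible.
max_cores_per_node = math.ceil(shards * replicas_per_shard / nodes)

# Approximate number of queued (blocked) flushes, total and per writer.
blocked_total = retained_mb / per_entry_mb
blocked_per_writer = blocked_total / writers

# Heap held by each writer's blocked flushes, relative to the RAM buffer.
ratio_to_buffer = retained_mb / writers / ram_buffer_mb

print(f"max cores per node: {max_cores_per_node}")
print(f"blocked flushes: ~{blocked_total:.0f} total, ~{blocked_per_writer:.0f} per writer")
print(f"per-writer blocked data: ~{ratio_to_buffer:.0f}x ramBufferSizeMB")
```

If those assumptions hold, 3 cores on the busiest node matches the 3
DocumentsWriter instances in the dump, and each one is holding on the order
of a hundred blocked flushes, far more than a 100MB RAM buffer would suggest.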

Clearly flushing is involved here, but I'm at a loss as to which tuning
parameters I should be looking at. I would expect indexing to start blocking
if flushing falls too far behind, but apparently that's not happening. The
ramBufferSizeMB is set to the default of 100. My heap is already far larger
than I thought we would need for this volume.

Any idea what could be causing this?
