Hi Pawel,

This definitely sounds like garbage collection biting you.

Backups themselves aren't usually memory intensive, but if indexing is
going on at the same time you should expect elevated memory usage.
Essentially this is because for each core being backed up, Solr needs
to hold pieces of two different "versions" of the index in memory: the
commit-point being backed up, and the current state of the index with
the new documents.

If disabling indexing during backups is feasible, that's where I'd
start in your shoes.  If it's not, you might need to consider tweaks to
your heap and JVM GC settings to shorten the long individual GC pauses
you're reporting.
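
Before changing any settings, it may help to measure how long the pauses
really are, so you can compare before and after. Below is a minimal Python
sketch, not a definitive tool. It assumes your GC log contains JDK 8-style
"Total time for which application threads were stopped" lines (written when
-XX:+PrintGCApplicationStoppedTime is set); newer JVMs log this differently.
The function name long_pauses and the 1-second threshold are my own
illustrative choices.

```python
import re

# Assumed log format: JDK 8 output of -XX:+PrintGCApplicationStoppedTime.
# The "Stopping threads took" part is treated as optional.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(?P<total>\d+\.\d+) seconds"
    r"(?:, Stopping threads took: (?P<stopping>\d+\.\d+) seconds)?"
)


def long_pauses(lines, threshold_s=1.0):
    """Return (total, stopping) pairs in seconds for pauses >= threshold_s.

    total    -- how long application threads were stopped
    stopping -- the part of that spent bringing threads to a safepoint
                (0.0 if the log line does not report it)
    """
    pauses = []
    for line in lines:
        match = STOPPED_RE.search(line)
        if match is None:
            continue  # not a "threads were stopped" line
        total = float(match.group("total"))
        stopping = float(match.group("stopping") or 0.0)
        if total >= threshold_s:
            pauses.append((total, stopping))
    return pauses
```

You could call it as long_pauses(open("solr_gc.log")), where the file name
is only a placeholder for wherever your GC log lives. If "stopping" makes up
most of "total", the time went into reaching the safepoint rather than into
collection work, and heap or GC tuning alone may not shorten the freeze.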

Good luck,

Jason

On Wed, Jan 20, 2021 at 7:00 AM Paweł Róg <pro...@gmail.com> wrote:
>
> Hello everyone,
> I have a nasty problem with the scheduled Solr collections backup. From
> time to time when a scheduled backup is triggered (backup operation takes
> around 10 minutes) Solr freezes for 20-30 seconds. The freeze happens on
> one Solr instance at a time, but this affects the latency of all queries
> (because of distributed queries on 6 shards). I can reproduce the problem
> only when updates in the Solr cluster are enabled. When I disable updates,
> the problem is gone.
>
> The Lucene index is not big and fits into the OS cache. I am wondering if
> taking a backup can be the culprit, and whether the process messes up the
> operating system caches. Maybe all the files which are copied to NFS are
> eating up the OS cache, and when the OS reaches high memory usage it starts
> cleaning up memory, making Solr freeze.
>
> During the freeze, monitoring charts show higher IO wait times. In
> addition, the Solr nodes which seem to be affected reach 95-100% total
> memory usage (used + buffers + caches).
>
> I cannot see anything valuable in GC logs apart from a message which
> suggests that the application was stopped for 20-30 seconds (Application
> time).
>
> The cluster consists of 12 machines. Each Solr node is running on Ubuntu 16.04.
> All the servers are running in AWS EC2. Each Solr node is running inside
> Docker. EC2 instances have local SSD disks (but the same problem appeared
> with EBS).
>
> Does anyone have a similar problem and can share some thoughts? I'll
> appreciate all help.
>
> --
> Pawel Rog
