Hi Jeremy,

Can you share your analysis chain configs? (SOLR-13336 can manifest in a similar way, and would affect 7.3.1 with a susceptible config, given the right (wrong?) input ...)

Michael

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith <jas2...@cornell.edu> wrote:

> Hello all,
> We have been struggling with an issue where solr will intermittently
> use all available CPU and become unresponsive. It will remain in this
> state until we restart. Solr will remain stable for some time, usually a
> few hours to a few days, before this happens again. We've tried adjusting
> the caches and adding memory to both the VM and JVM, but we haven't been
> able to solve the issue yet.
>
> Here is some info about our server:
> Solr:
> Solr 7.3.1, running on Java 1.8
> Running in cloud mode, but there's only one core
>
> Host:
> CentOS7
> 8 CPU, 56GB RAM
> The only other processes running on this VM are two zookeepers, one for
> this Solr instance, one for another Solr instance
>
> Solr Config:
> - One Core
> - 36 Million documents (Max Doc), 28 million (Num Docs)
> - ~15GB
> - 10-20 Requests/second
> - The schema is fairly large (~100 fields) and we allow faceting and
>   searching on many, but not all, of the fields
> - Data are imported once per minute through the DataImportHandler, with a
>   hard commit at the end. We usually index ~100-500 documents per minute,
>   with many of these being updates to existing documents.
>
> Cache settings:
> <filterCache class="solr.FastLRUCache"
>              size="256"
>              initialSize="256"
>              autowarmCount="8"
>              showItems="64"/>
>
> <queryResultCache class="solr.LRUCache"
>                   size="256"
>                   initialSize="256"
>                   autowarmCount="0"/>
>
> <documentCache class="solr.LRUCache"
>                size="1024"
>                initialSize="1024"
>                autowarmCount="0"/>
>
> For the filterCache, we have tried sizes as low as 128, which caused our
> CPU usage to go up and didn't solve our issue. autowarmCount used to be
> much higher, but we have reduced it to try to address this issue.
>
>
> The behavior we see:
>
> Solr is normally using ~3-6GB of heap and we usually have ~20GB of free
> memory. Occasionally, though, solr is not able to free up memory and the
> heap usage climbs. Analyzing the GC logs shows a sharp incline of usage
> with the GC (the default CMS) working hard to free memory, but not
> accomplishing much. Eventually, it fills up the heap, maxes out the CPUs,
> and never recovers. We have tried to analyze the logs to see if there are
> particular queries causing issues or if there are network issues to
> zookeeper, but we haven't been able to find any patterns. After the issues
> start, we often see session timeouts to zookeeper, but it doesn't appear
> that they are the cause.
>
>
>
> Does anyone have any recommendations on things to try or metrics to look
> into or configuration issues I may be overlooking?
>
> Thanks,
> Jeremy
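
For a rough sense of scale, the figures quoted above can be turned into two back-of-envelope numbers: the share of the index occupied by deleted documents, and an upper bound on filterCache heap usage. This is only a sketch. It assumes every filterCache entry is held as a full bitset with one bit per document up to Max Doc, which is a worst case (small result sets are typically stored more compactly), and the function names are made up for illustration.

```python
# Back-of-envelope numbers from the figures quoted in the thread.
MAX_DOC = 36_000_000      # "Max Doc" from the original message
NUM_DOCS = 28_000_000     # "Num Docs" from the original message
FILTER_CACHE_SIZE = 256   # size="256" on the filterCache


def deleted_fraction(max_doc: int, num_docs: int) -> float:
    """Fraction of document slots still held by deleted documents."""
    return (max_doc - num_docs) / max_doc


def filter_cache_worst_case_bytes(max_doc: int, entries: int) -> int:
    """Upper bound, ASSUMING every entry is a full bitset (1 bit per doc)."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * entries


if __name__ == "__main__":
    print(f"deleted docs: {deleted_fraction(MAX_DOC, NUM_DOCS):.1%}")
    worst = filter_cache_worst_case_bytes(MAX_DOC, FILTER_CACHE_SIZE)
    print(f"filterCache worst case: {worst / 1e9:.2f} GB")
```

With the numbers above this prints roughly 22.2% deleted documents and a worst case of about 1.15 GB for a completely full filterCache. Actual usage depends on how many entries are cached and how dense they are, so treat both values as orientation, not as a diagnosis.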