Hi Jeremy,
Can you share your analysis chain configs? (SOLR-13336 can manifest in a
similar way, and would affect 7.3.1 with a susceptible config, given the
right (wrong?) input ...)
Michael
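
For anyone gathering the analysis chain configs requested above, here is a minimal sketch, assuming the Schema API read endpoint (/solr/<collection>/schema/fieldtypes) is reachable on the node. The base URL and collection name are hypothetical placeholders, not values from this thread, and the helper names are illustrative only. It lists the tokenizer and filter classes per field type, so the index-time and query-time chains can be pasted into a reply.

```python
import json
from urllib.request import urlopen


def analysis_chains(schema_response):
    """Map each field type name to its analyzer chains.

    Takes the parsed JSON from the Schema API fieldtypes endpoint and
    returns {field_type: {analyzer_kind: [component class names]}}.
    Field types without any analyzer (e.g. plain string types) are skipped.
    """
    chains = {}
    for field_type in schema_response.get("fieldTypes", []):
        per_type = {}
        for kind in ("analyzer", "indexAnalyzer", "queryAnalyzer"):
            analyzer = field_type.get(kind)
            if not analyzer:
                continue
            parts = [cf.get("class", "?") for cf in analyzer.get("charFilters", [])]
            if "tokenizer" in analyzer:
                parts.append(analyzer["tokenizer"].get("class", "?"))
            parts.extend(f.get("class", "?") for f in analyzer.get("filters", []))
            per_type[kind] = parts
        if per_type:
            chains[field_type["name"]] = per_type
    return chains


def fetch_fieldtypes(base_url, collection):
    """Fetch the field type definitions as parsed JSON (performs an HTTP GET)."""
    url = f"{base_url}/solr/{collection}/schema/fieldtypes?wt=json"
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical host and collection name; adjust for the actual deployment.
    response = fetch_fieldtypes("http://localhost:8983", "mycollection")
    for name, kinds in sorted(analysis_chains(response).items()):
        for kind, parts in kinds.items():
            print(f"{name} [{kind}]: " + " -> ".join(parts))
```

The output is only a summary of class names. The full fieldType definitions from the schema are still the more useful thing to share, since filter attributes matter as well.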

On Mon, Jan 11, 2021 at 5:27 PM Jeremy Smith <jas2...@cornell.edu> wrote:

> Hello all,
>      We have been struggling with an issue where Solr will intermittently
> use all available CPU and become unresponsive.  It will remain in this
> state until we restart.  Solr will remain stable for some time, usually a
> few hours to a few days, before this happens again.  We've tried adjusting
> the caches and adding memory to both the VM and JVM, but we haven't been
> able to solve the issue yet.
>
> Here is some info about our server:
> Solr:
>   Solr 7.3.1, running on Java 1.8
>   Running in cloud mode, but there's only one core
>
> Host:
>   CentOS7
>   8 CPU, 56GB RAM
>   The only other processes running on this VM are two zookeepers, one for
> this Solr instance, one for another Solr instance
>
> Solr Config:
>  - One Core
>  - 36 Million documents (Max Doc), 28 million (Num Docs)
>  - ~15GB
>  - 10-20 Requests/second
>  - The schema is fairly large (~100 fields) and we allow faceting and
> searching on many, but not all, of the fields
>  - Data are imported once per minute through the DataImportHandler, with a
> hard commit at the end.  We usually index ~100-500 documents per minute,
> with many of these being updates to existing documents.
>
> Cache settings:
>     <filterCache class="solr.FastLRUCache"
>                  size="256"
>                  initialSize="256"
>                  autowarmCount="8"
>                  showItems="64"/>
>
>     <queryResultCache class="solr.LRUCache"
>                       size="256"
>                       initialSize="256"
>                       autowarmCount="0"/>
>
>     <documentCache class="solr.LRUCache"
>                    size="1024"
>                    initialSize="1024"
>                    autowarmCount="0"/>
>
> For the filterCache, we have tried sizes as low as 128, which caused our
> CPU usage to go up and didn't solve our issue.  autowarmCount used to be
> much higher, but we have reduced it to try to address this issue.
>
>
> The behavior we see:
>
> Solr is normally using ~3-6GB of heap and we usually have ~20GB of free
> memory.  Occasionally, though, Solr is not able to free up memory and the
> heap usage climbs.  Analyzing the GC logs shows a sharp incline of usage
> with the GC (the default CMS) working hard to free memory, but not
> accomplishing much.  Eventually, it fills up the heap, maxes out the CPUs,
> and never recovers.  We have tried to analyze the logs to see if there are
> particular queries causing issues or if there are network issues to
> zookeeper, but we haven't been able to find any patterns.  After the issues
> start, we often see session timeouts to zookeeper, but it doesn't appear
> that they are the cause.
>
>
>
> Does anyone have any recommendations on things to try or metrics to look
> into or configuration issues I may be overlooking?
>
> Thanks,
> Jeremy
>
>
