Is your autoCommit configured to open new searchers? Did you try to set
openSearcher to false?
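
For reference, a minimal autoCommit sketch with openSearcher disabled (the
maxTime value below is only illustrative, not taken from your solrconfig.xml):

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

With openSearcher set to false the hard commit only flushes and fsyncs
segments; it does not open a new searcher, so searcher-scoped work is not
triggered on that schedule.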

Edward

On Tue, Feb 11, 2020 at 3:40 PM Vangelis Katsikaros <vkatsika...@gmail.com>
wrote:

> Hi
>
> On Mon, Feb 10, 2020 at 5:05 PM Vangelis Katsikaros <vkatsika...@gmail.com>
> wrote:
>
> > Hi all
> >
> > We run Solr 8.2.0
> > * with Amazon Corretto 11.0.5.10.1 SDK (Java arguments shown in [1]),
> > * on Ubuntu 18.04,
> > * on AWS EC2 m5.2xlarge with 8 CPUs and 32 GB of RAM,
> > * with -Xmx16g [1].
> >
> > We have migrated from Solr 3.5, and on replicas with a big core (16 GB)
> > we have started to suffer degraded service. The replica's
> > ReplicationHandler config is in [8] and the master's updateHandler
> > config is in [9].
> >
> > We notice the following every 5 minutes (the value of
> > solr.autoCommit.maxTime):
> > * Solr normally uses all 8 CPUs. Suddenly, for ~30 seconds, it uses only
> > 1 CPU at 100% and the remaining CPUs are idle (mpstat [6]). In our
> > previous setup with Solr 3 we used up to 80% of all CPUs.
> > * During that time, queries suddenly take more than 1 second, up to 30
> > seconds (or more). The same queries otherwise complete in under 1 second.
> > * The disk does not seem to be a bottleneck (iostat [4]).
> > * Memory does not seem to be a bottleneck (vmstat [5]).
> > * CPU (apart from the single-CPU issue) does not seem to be a bottleneck
> > (mpstat [6] & pidstat [3]).
> > * We are not Java/GC experts, but it does not seem to be GC related [7].
> >
> > We have tried reducing the heap to 8 GB and to 2 GB with no positive
> > effect. We have also tested different autoCommit.maxTime values:
> > reducing it to 60 seconds makes things unbearable, and 5 minutes is not
> > significantly different from 10. Do you have any pointers for debugging
> > the issue further?
> >
> > A detailed example of the problem, which repeats every
> > solr.autoCommit.maxTime on the replicas:
> > * From 12:36 to 12:39:04 queries are served quickly [2]. Solr consumes
> > CPU on all 8 CPUs (mpstat [6]). The metric
> > solr.jvm.threads.blocked.count is 0 [2].
> > * From 12:39:04 to 12:39:25 queries are slow to respond [2]. Solr
> > consumes only 1 of the 8 CPUs; the other 7 CPUs are idle (mpstat [6]).
> > The metric solr.jvm.threads.blocked.count grows from 0 to a large
> > two-digit number [2].
> > * After 12:39:25, and until the next commit is picked up, things are
> > normal again.
> >
> > Regards
> > Vangelis
> >
> > [1] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-solr-info
> > [2] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-slow-queries-and-solr-jvm-threads-blocked-count
> > [3] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-pidstat
> > [4] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-iostat
> > [5] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-vmstat
> > [6] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-mpstat
> > [7] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-gc-logs
> > [8] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-replica-replicationhandler
> > [9] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-master-updatehandler
> >
>
> Some additional information: we noticed (through the admin UI's "Thread
> Dump" page, /solr/#/~threads) that whenever we see this behavior, all of
> the blocked threads show the same stack trace [10], blocking at:
>
>   org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:198)
>   org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:152)
>   org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:95)
>   org.apache.lucene.queries.function.valuesource.MultiFloatFunction.getValues(MultiFloatFunction.java:76)
>   org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource.getValues(ValueSource.java:203)
>   org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource.getValues(FunctionScoreQuery.java:255)
>   org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight.scorer(FunctionScoreQuery.java:218)
>   ...
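>
> Reading FileFloatSource, it appears to parse the external file lazily and
> cache the resulting float array per index reader, so the first query after
> a new searcher seems to pay the full cost of re-parsing the boost file
> while the other query threads block on that cache entry. Would the
> ExternalFileFieldReloader listener help here, i.e. reloading the external
> files during warming instead of on the first query? A sketch of what we
> would try in solrconfig.xml (assuming it goes alongside the other
> newSearcher/firstSearcher listeners):
>
>   <listener event="newSearcher"
>             class="org.apache.solr.schema.ExternalFileFieldReloader"/>
>   <listener event="firstSearcher"
>             class="org.apache.solr.schema.ExternalFileFieldReloader"/>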
>
> The boost files (external_boostvalue) are ~30 MB in size, and the fields
> are configured in the schema [11] with:
>
>   <field name="boostvalue" type="fileboost"/>
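>
> For context, the "fileboost" fieldType behind it is an ExternalFileField,
> roughly like the following (the exact attribute values are in [11]; this
> snippet is only illustrative):
>
>   <fieldType name="fileboost" class="solr.ExternalFileField"
>              keyField="id" defVal="0" indexed="false" stored="false"/>
>
> with the per-document values living in the external_boostvalue file under
> the index data directory.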
>
> Regards
> Vangelis
>
> [10] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-stacktrace
> [11] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-schema-boostfile
>
