Is your autoCommit configured to open new searchers? Did you try to set openSearcher to false?
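For reference, a hard commit that skips opening a searcher looks roughly like this in solrconfig.xml (a sketch only; the 300000 ms default below just mirrors your 5-minute interval):

    <!-- Hard commit every 5 minutes: flushes and fsyncs the index, but
         openSearcher=false means no new searcher is opened on the commit,
         so caches and external files are not reloaded at that point. -->
    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

With openSearcher set to false you would control visibility of new documents separately, e.g. via autoSoftCommit.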
Edward

On Tue, Feb 11, 2020 at 3:40 PM Vangelis Katsikaros <vkatsika...@gmail.com> wrote:
> Hi
>
> On Mon, Feb 10, 2020 at 5:05 PM Vangelis Katsikaros <vkatsika...@gmail.com> wrote:
> > Hi all
> >
> > We run Solr 8.2.0
> > * with Amazon Corretto 11.0.5.10.1 SDK (java arguments shown in [1]),
> > * on Ubuntu 18.04,
> > * on AWS EC2 m5.2xlarge with 8 CPUs and 32GB of RAM,
> > * with -Xmx16g [1].
> >
> > We have migrated from Solr 3.5, and on replicas with a big core (16GB)
> > we have started to suffer degraded service. The replica’s
> > ReplicationHandler is in [8] and the master’s updateHandler is in [9].
> >
> > Every 5 minutes (the value of solr.autoCommit.maxTime) we notice the
> > following:
> > * Solr uses all 8 CPUs. Suddenly, for ~30 sec, it uses only 1 CPU at
> > 100% and the rest of the CPUs are idle (mpstat [6]). In our previous
> > setup with Solr 3 we used up to 80% of all CPUs.
> > * During that time the Solr queries suddenly take more than 1 second,
> > up to 30 sec (or more). The same queries otherwise need less than 1 sec
> > to complete.
> > * The disk does not seem to be a bottleneck (iostat [4]).
> > * Memory does not seem to be a bottleneck (vmstat [5]).
> > * CPU (apart from the single-CPU issue) does not seem to be a
> > bottleneck (mpstat [6] & pidstat [3]).
> > * We are no Java/GC experts, but it does not seem to be GC related [7].
> >
> > We have tried reducing the heap to 8GB and 2GB with no positive effect.
> > We have tested different autoCommit.maxTime values: reducing it to 60
> > seconds makes things unbearable, and 5 minutes is not significantly
> > different from 10. Do you have any pointers for debugging this further?
> >
> > A detailed example of the problem, which repeats every
> > solr.autoCommit.maxTime interval on the replicas:
> > * From 12:36 to 12:39:04 queries are fast to serve [2]. Solr consumes
> > CPU on all 8 CPUs (mpstat [6]). The metric
> > solr.jvm.threads.blocked.count is 0 [2].
> > * From 12:39:04 to 12:39:25 queries are slow to respond [2]. Solr
> > consumes only 1 out of 8 CPUs; the other 7 CPUs are idle (mpstat [6]).
> > The metric solr.jvm.threads.blocked.count grows from 0 to a large
> > two-digit number [2].
> > * After 12:39:25, and until the next poll of a commit, things are
> > normal.
> >
> > Regards
> > Vangelis
> >
> > [1] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-solr-info
> > [2] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-slow-queries-and-solr-jvm-threads-blocked-count
> > [3] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-pidstat
> > [4] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-iostat
> > [5] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-vmstat
> > [6] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-mpstat
> > [7] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-gc-logs
> > [8] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-replica-replicationhandler
> > [9] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-master-updatehandler
>
> Some additional information.
> We noticed (through the admin UI's "Thread Dump" page, /solr/#/~threads)
> that whenever we see this behavior, all the blocked threads show the same
> stack trace [10] and block at:
>
> org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:198)
> org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:152)
> org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:95)
> org.apache.lucene.queries.function.valuesource.MultiFloatFunction.getValues(MultiFloatFunction.java:76)
> org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource.getValues(ValueSource.java:203)
> org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource.getValues(FunctionScoreQuery.java:255)
> org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight.scorer(FunctionScoreQuery.java:218)
> ...
>
> The boost files (external_boostvalue) are ~30MB in size, and the fields
> are configured in the schema [11] with:
> <field name="boostvalue" type="fileboost"/>
>
> Regards
> Vangelis
>
> [10] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-stacktrace
> [11] https://gist.github.com/vkatsikaros/5102e8088a98ad1ee49516aafa6bc5c4#file-schema-boostfile
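Regarding the stack trace above: FileFloatSource caches the parsed float array per index reader, so when a new searcher opens, the first query thread to touch the field re-parses the whole ~30MB external file while every other query thread blocks on the same cache entry. That would match the one-CPU-busy, many-threads-blocked pattern you describe. Assuming your fileboost type is an ExternalFileField declared along these lines (a sketch; the real definition is in your gist [11]):

    <!-- keyField="id" and defVal="0" are guesses at your actual values -->
    <fieldType name="fileboost" class="solr.ExternalFileField"
               keyField="id" defVal="0" indexed="false" stored="false"/>

one mitigation is to load the external files during searcher warm-up instead of on a live query, using the stock reloader listeners in solrconfig.xml:

    <listener event="newSearcher"
              class="org.apache.solr.schema.ExternalFileFieldReloader"/>
    <listener event="firstSearcher"
              class="org.apache.solr.schema.ExternalFileFieldReloader"/>

With those in place the file is re-read while the new searcher warms, so queries only hit the already-populated cache.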