Forgot to mention: you need to collect that output while you are observing
the CPU spike.
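
For example, something like the rough sketch below can capture that output
every few seconds during the spike. It assumes the default port 8983 and the
/solr/admin/info/threads endpoint from my previous mail; adjust the URL,
sample count, and interval for your environment.

    # poll_threads.py: save periodic thread dumps while the spike is happening.
    # Assumes Solr is reachable at http://localhost:8983 (adjust as needed).
    import time
    import urllib.request

    URL = "http://localhost:8983/solr/admin/info/threads?wt=json"

    for _ in range(30):          # 30 samples, 10 s apart = ~5 minutes of coverage
        with urllib.request.urlopen(URL) as resp:
            body = resp.read()
        fname = "threads-%d.json" % int(time.time())
        with open(fname, "wb") as out:
            out.write(body)
        print("wrote", fname)
        time.sleep(10)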

On Wed, Jun 16, 2021 at 11:38 AM Rahul Goswami <[email protected]>
wrote:

> Also, can you please share the output of the below REST call from the
> browser (any non-IE browser)?
>
> localhost:8983/solr/admin/info/threads
>
> Please replace the Solr port in the above call as per your environment.
>
> - Rahul
>
> On Wed, Jun 16, 2021 at 11:22 AM Dave <[email protected]>
> wrote:
>
>> Out of curiosity, do you have any suggesters configured?  They tend to
>> hard-lock if they need to rebuild.  I'm not saying that's what's going on,
>> but I've definitely had issues with them on restarting the servers,
>> freezing the entire thing until the rebuild was done.
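>>
>> One rough way to check (just a sketch; it assumes the "customer" core from
>> your logs and port 8983, so substitute your own core name) is to pull the
>> effective config via the Config API and look for suggester components:
>>
>>     # check_suggesters.py: rough check for suggester components in the config.
>>     import json
>>     import urllib.request
>>
>>     url = "http://localhost:8983/solr/customer/config"
>>     with urllib.request.urlopen(url) as resp:
>>         config = json.load(resp)
>>
>>     text = json.dumps(config)
>>     if "SuggestComponent" in text or '"suggest"' in text:
>>         print("Suggester(s) configured; check buildOnStartup / buildOnCommit.")
>>     else:
>>         print("No suggester components found in the effective config.")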
>>
>> > On Jun 16, 2021, at 11:15 AM, Charlie Hubbard <
>> [email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I had an event recently where my Solr server's CPU spiked and it stopped
>> > answering incoming requests in a timely manner.  I'm having a hard time
>> > pinpointing the cause.  I'm reviewing the log files, but I can't explain
>> > why the CPU suddenly spiked.  Once this happened, all of the queries
>> > suddenly started to back up.  There don't appear to have been many
>> > queries coming in when it happened, but since queries are logged only
>> > after they complete, I think it's hard to know for sure.
>> >
>> > My setup is:
>> >
>> >   - Single Node standalone server
>> >   - OpenJDK 1.8
>> >   - Solr 6.6.6
>> >   - No replication configured
>> >   - 4 collections ranging from 200 MB to 20 GB
>> >   - 112 million docs in the collection in question
>> >
>> > Here is the section in the logs where the queries started to pile up.
>> > You can see the QTime values start to go way up.
>> >
>> > 2021-06-15 16:27:35.647 INFO  (qtp1836643189-506) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=60736
>> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-483) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=132802
>> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-474) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=154802
>> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-467) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=121797
>> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-459) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=165804
>> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-487) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
>> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
>> QTime=110803
>> >> 2021-06-15 16:27:35.648 INFO  (qtp1836643189-497) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
>> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
>> QTime=798
>> >> 2021-06-15 16:27:35.647 INFO  (qtp1836643189-495) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=99798
>> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-448) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=176807
>> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-464) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
>> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
>> QTime=176809
>> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-479) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/admin/luke
>> >> params={numTerms=0&show=index&wt=json&_=1623773588322} status=0
>> QTime=99804
>> >> 2021-06-15 16:27:35.650 INFO  (qtp1836643189-445) [   x:customer]
>> >> o.a.s.c.S.Request [customer]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623773588323} status=0 QTime=187795
>> >
>> >
>> > That burst of replication and Luke handler requests lasted for 1 second,
>> > but there are 155 entries like that, over and over.  Prior to this I
>> > can't see any evidence of queries that might have slowed things down.
>> > There was some updating happening on the smaller collection, but not on
>> > the collection in question.
>> >
>> > From the GC logs I did find some overlap where thread stoppages were
>> > taking place around, and leading up to, that point in time.
>> >
>> > Heap after GC invocations=92 (full 1):
>> >> par new generation   total 436928K, used 86155K [0x0000000540000000,
>> >> 0x0000000560000000, 0x00000005e0000000)
>> >>  eden space 349568K,   0% used [0x0000000540000000, 0x0000000540000000,
>> >> 0x0000000555560000)
>> >>  from space 87360K,  98% used [0x0000000555560000, 0x000000055a982ef8,
>> >> 0x000000055aab0000)
>> >>  to   space 87360K,   0% used [0x000000055aab0000, 0x000000055aab0000,
>> >> 0x0000000560000000)
>> >> concurrent mark-sweep generation total 1572864K, used 681710K
>> >> [0x00000005e0000000, 0x0000000640000000, 0x00000007c0000000)
>> >> Metaspace       used 41348K, capacity 42508K, committed 42772K,
>> reserved
>> >> 1087488K
>> >>  class space    used 4812K, capacity 5121K, committed 5140K, reserved
>> >> 1048576K
>> >> }
>> >> 2021-06-15T16:27:39.384+0000: 1462.170: Total time for which
>> application
>> >> threads were stopped: 0.0653007 seconds, Stopping threads took:
>> 0.0019805
>> >> seconds
>> >> 2021-06-15T16:27:40.389+0000: 1463.174: Total time for which
>> application
>> >> threads were stopped: 0.0041873 seconds, Stopping threads took:
>> 0.0037697
>> >> seconds
>> >> 2021-06-15T16:27:40.921+0000: 1463.706: Total time for which
>> application
>> >> threads were stopped: 0.0029477 seconds, Stopping threads took:
>> 0.0024678
>> >> seconds
>> >> 2021-06-15T16:27:41.268+0000: 1464.053: Total time for which
>> application
>> >> threads were stopped: 0.0224393 seconds, Stopping threads took:
>> 0.0219226
>> >> seconds
>> >> 2021-06-15T16:27:41.388+0000: 1464.173: [GC (CMS Initial Mark) [1
>> >> CMS-initial-mark: 788750K(1572864K)] 1052906K(2009792K), 0.0022256
>> secs]
>> >> [Times: user=0.01 sys=0.00, real=0.00 secs]
>> >> 2021-06-15T16:27:41.390+0000: 1464.176: Total time for which
>> application
>> >> threads were stopped: 0.0060337 seconds, Stopping threads took:
>> 0.0032933
>> >> seconds
>> >> 2021-06-15T16:27:41.390+0000: 1464.176: [CMS-concurrent-mark-start]
>> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-mark:
>> 0.050/0.050
>> >> secs] [Times: user=0.39 sys=0.00, real=0.05 secs]
>> >> 2021-06-15T16:27:41.440+0000: 1464.225: [CMS-concurrent-preclean-start]
>> >> 2021-06-15T16:27:41.445+0000: 1464.231: [CMS-concurrent-preclean:
>> >> 0.005/0.005 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
>> >> 2021-06-15T16:27:41.445+0000: 1464.231:
>> >> [CMS-concurrent-abortable-preclean-start]
>> >> 2021-06-15T16:27:41.531+0000: 1464.316: Total time for which
>> application
>> >> threads were stopped: 0.0018167 seconds, Stopping threads took:
>> 0.0012697
>> >> seconds
>> >> 2021-06-15T16:27:41.532+0000: 1464.317: Total time for which
>> application
>> >> threads were stopped: 0.0009634 seconds, Stopping threads took:
>> 0.0005877
>> >> seconds
>> >> 2021-06-15T16:27:41.533+0000: 1464.318: Total time for which
>> application
>> >> threads were stopped: 0.0006194 seconds, Stopping threads took:
>> 0.0003124
>> >> seconds
>> >> 2021-06-15T16:27:41.534+0000: 1464.319: Total time for which
>> application
>> >> threads were stopped: 0.0006736 seconds, Stopping threads took:
>> 0.0003113
>> >> seconds
>> >> 2021-06-15T16:27:41.735+0000: 1464.521: Total time for which
>> application
>> >> threads were stopped: 0.0009390 seconds, Stopping threads took:
>> 0.0005340
>> >> seconds
>> >> 2021-06-15T16:27:42.213+0000: 1464.998: Total time for which
>> application
>> >> threads were stopped: 0.0016779 seconds, Stopping threads took:
>> 0.0003837
>> >> seconds
>> >> 2021-06-15T16:27:42.214+0000: 1464.999: Total time for which
>> application
>> >> threads were stopped: 0.0006721 seconds, Stopping threads took:
>> 0.0002727
>> >> seconds
>> >> 2021-06-15T16:27:42.215+0000: 1465.000: Total time for which
>> application
>> >> threads were stopped: 0.0003734 seconds, Stopping threads took:
>> 0.0000991
>> >> seconds
>> >> 2021-06-15T16:27:42.217+0000: 1465.002: Total time for which
>> application
>> >> threads were stopped: 0.0004387 seconds, Stopping threads took:
>> 0.0001799
>> >> seconds
>> >> 2021-06-15T16:27:42.219+0000: 1465.004: Total time for which
>> application
>> >> threads were stopped: 0.0004952 seconds, Stopping threads took:
>> 0.0002235
>> >> seconds
>> >> 2021-06-15T16:27:42.837+0000: 1465.622: Total time for which
>> application
>> >> threads were stopped: 0.0016946 seconds, Stopping threads took:
>> 0.0013149
>> >> seconds
>> >> 2021-06-15T16:27:43.433+0000: 1466.218: Total time for which
>> application
>> >> threads were stopped: 0.0008748 seconds, Stopping threads took:
>> 0.0004336
>> >> seconds
>> >> 2021-06-15T16:27:44.337+0000: 1467.122: Total time for which
>> application
>> >> threads were stopped: 0.0624611 seconds, Stopping threads took:
>> 0.0610425
>> >> seconds
>> >> 2021-06-15T16:27:45.458+0000: 1468.243: Total time for which
>> application
>> >> threads were stopped: 0.0329742 seconds, Stopping threads took:
>> 0.0324816
>> >> seconds
>> >> 2021-06-15T16:27:46.182+0000: 1468.967: Total time for which
>> application
>> >> threads were stopped: 0.0010628 seconds, Stopping threads took:
>> 0.0006594
>> >> seconds
>> >> 2021-06-15T16:27:46.241+0000: 1469.026: Total time for which
>> application
>> >> threads were stopped: 0.0583459 seconds, Stopping threads took:
>> 0.0579144
>> >> seconds
>> >> 2021-06-15T16:27:47.308+0000: 1470.093: Total time for which
>> application
>> >> threads were stopped: 0.0007076 seconds, Stopping threads took:
>> 0.0003400
>> >> seconds
>> >> 2021-06-15T16:27:47.371+0000: 1470.157: Total time for which
>> application
>> >> threads were stopped: 0.0020045 seconds, Stopping threads took:
>> 0.0014785
>> >> seconds
>> >> 2021-06-15T16:27:47.444+0000: 1470.229: Total time for which
>> application
>> >> threads were stopped: 0.0007924 seconds, Stopping threads took:
>> 0.0004075
>> >> seconds
>> >> CMS: abort preclean due to time 2021-06-15T16:27:47.450+0000: 1470.235:
>> >> [CMS-concurrent-abortable-preclean: 0.391/6.004 secs] [Times: user=9.74
>> >> sys=0.27, real=6.00 secs]
>> >> 2021-06-15T16:27:47.451+0000: 1470.236: [GC (CMS Final Remark) [YG
>> >> occupancy: 412669 K (436928 K)]{Heap before GC invocations=92 (full 2):
>> >
>> >
>> > Also, I had restarted the service at 14:50, which is almost 2 hours prior
>> > to this.  I found a similar pattern there as well, but involving a
>> > somewhat smaller collection:
>> >
>> > 2021-06-15 15:44:09.698 INFO  (qtp1836643189-60) [   x:archiver]
>> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/replication
>> >> params={wt=json&command=details&_=1623771837117} status=0 QTime=8397
>> >> 2021-06-15 15:44:09.698 INFO  (qtp1836643189-23) [   x:archiver]
>> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/admin/luke
>> >> params={numTerms=0&show=index&wt=json&_=1623771837117} status=0
>> QTime=8399
>> >> 2021-06-15 15:44:10.035 INFO  (qtp1836643189-24) [   x:archiver]
>> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select
>> >> params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912}
>> >> hits=1272929 status=0 QTime=50886
>> >> 2021-06-15 15:44:10.039 INFO  (qtp1836643189-53) [   x:archiver]
>> >> o.a.s.c.S.Request [fusearchiver]  webapp=/solr path=/select
>> >> params={q=*:*&indent=on&fq=_accountId:12&wt=json&_=1623771792912}
>> >> hits=1272929 status=0 QTime=42102
>> >> 2021-06-15 15:44:10.077 INFO  (qtp1836643189-53) [   x:archiver]
>> >> o.a.s.s.HttpSolrCall Unable to write response, client closed
>> connection or
>> >> we are shutting down
>> >
>> >
>> > Again, this is preceded by calls to the replication and Luke handlers
>> > that have fairly large QTimes.
>> >
>> > I am not using replication, but the handler is enabled.  From researching
>> > the command, details is just a status call.  So where could this be
>> > triggered from?
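>> >
>> > As a rough sketch (assuming my "customer" core and port 8983), the same
>> > two handler calls can be timed in isolation to see how long they take
>> > when nothing else is going on:
>> >
>> >     # handler_timing.py: time the replication-details and Luke calls directly.
>> >     import time
>> >     import urllib.request
>> >
>> >     base = "http://localhost:8983/solr/customer"
>> >     urls = [
>> >         base + "/replication?command=details&wt=json",
>> >         base + "/admin/luke?numTerms=0&show=index&wt=json",
>> >     ]
>> >     for url in urls:
>> >         start = time.time()
>> >         with urllib.request.urlopen(url) as resp:
>> >             resp.read()
>> >         print("%.3f s  %s" % (time.time() - start, url))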
>> >
>> > Beyond that strange event, what else should I be looking at, or changing
>> > in my configuration, to find these issues?  Memory was around 7 GB of the
>> > 8 GB maximum.  GC doesn't seem excessive.  Memory usage is growing, but
>> > not spiking.  Should I turn up the memory allocation?
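>> >
>> > As a rough sketch (again assuming localhost:8983), heap usage can also be
>> > sampled from Solr's system info endpoint rather than eyeballed; the exact
>> > JSON layout may vary by version, so this just prints the memory section:
>> >
>> >     # heap_check.py: sample JVM heap figures from Solr's system info handler.
>> >     import json
>> >     import urllib.request
>> >
>> >     url = "http://localhost:8983/solr/admin/info/system?wt=json"
>> >     with urllib.request.urlopen(url) as resp:
>> >         info = json.load(resp)
>> >
>> >     memory = info.get("jvm", {}).get("memory", {})
>> >     print(json.dumps(memory, indent=2))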
>> >
>> > TIA
>> > Charlie
>>
>
