* Solr 7.7, SolrCloud with CDCR
* 3 replicas, 1 shard on both production and disaster recovery
Hi,

Last week I posted a question about tlogs: https://lucene.472066.n3.nabble.com/tlogs-are-not-deleted-td4451323.html#a4451430

I disabled the buffer based on the advice there, but the tlogs on "production" are still not being deleted (the tlogs on the "disaster recovery" nodes are cleaned up fine).

There is also another issue, which I suspect is related to the problem I posted about. The "disaster recovery" nodes are producing an enormous volume of logs: the log files grow at an incredibly fast rate with the messages below, and CPU usage on those nodes sits at 100% all day, every day (CPU usage on the "production" nodes is normal). It looks as if replication from production to disaster recovery is running, but it never finishes.

Is this high CPU usage on the disaster recovery nodes normal? And could the tlogs that are not being cleaned up on the production nodes be related to the high CPU usage on the DR nodes?

*<sample messages from the flood of logs on the disaster recovery nodes>*

2019-10-28 18:25:09.817 INFO (qtp404214852-90778) [c:test_collection s:shard1 r:core_node3 x:test_collection_shard1_replica_n1] o.a.s.c.S.Request [test_collection1_shard1_replica_n1] webapp=/solr path=/cdcr params={action=LASTPROCESSEDVERSION&wt=javabin&version=2} status=0 QTime=0
2019-10-28 18:25:09.817 INFO (qtp404214852-90778) [c:test_collection s:shard1 r:core_node3 x:test_collection_shard1_replica_n1] o.a.s.c.S.Request [test_collection2_shard1_replica_n1] webapp=/solr path=/cdcr params={action=LASTPROCESSEDVERSION&wt=javabin&version=2} status=0 QTime=0
2019-10-28 18:25:09.817 INFO (qtp404214852-90778) [c:test_collection s:shard1 r:core_node3 x:test_collection_shard1_replica_n1] o.a.s.c.S.Request [test_collection3_shard1_replica_n1] webapp=/solr path=/cdcr params={action=LASTPROCESSEDVERSION&wt=javabin&version=2} status=0 QTime=0
2019-10-28 18:18:11.729 INFO (cdcr-replicator-378-thread-1) [ ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target test_collection1
2019-10-28 18:18:11.730 INFO (cdcr-replicator-282-thread-1) [ ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target test_collection2
2019-10-28 18:18:11.730 INFO (cdcr-replicator-332-thread-1) [ ] o.a.s.h.CdcrReplicator Forwarded 0 updates to target test_collection3
...

*In the middle of the logs, I also see the following exception for some of the collections:*

2019-10-28 18:18:11.732 WARN (cdcr-replicator-404-thread-1) [ ] o.a.s.h.CdcrReplicator Failed to forward update request to target: collection_steps
java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.List
    at org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.getVersion(CdcrUpdateLog.java:732) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
    at org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.next(CdcrUpdateLog.java:635) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
    at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:77) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
    at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81) ~[solr-core-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:46]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.7.0.jar:7.7.0 8c831daf4eb41153c25ddb152501ab5bae3ea3d5 - jimczi - 2019-02-04 23:23:50]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
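For completeness, this is a small sketch of the CDCR API calls I use to double-check the buffer state and the per-target queues on the source (production) cluster. The host and collection name are placeholders for our setup, and the script only prints the curl commands rather than running them:

```shell
#!/bin/sh
# Placeholders: substitute your production (source) cluster host and
# the CDCR-enabled collection name.
SOLR_HOST="localhost:8983"
COLLECTION="test_collection"
CDCR_URL="http://$SOLR_HOST/solr/$COLLECTION/cdcr"

# 1. Check CDCR state on the SOURCE; the response should show the
#    buffer as "disabled" (buffer state is per cluster, so disabling
#    it only on the target does not free tlogs on the source).
echo "curl '$CDCR_URL?action=STATUS'"

# 2. Disable the buffer on the source if STATUS still reports it enabled.
echo "curl '$CDCR_URL?action=DISABLEBUFFER'"

# 3. Inspect the per-target queue sizes; as I understand it, the source
#    keeps tlogs until queued updates are forwarded, so a queue that
#    never drains pins the tlogs.
echo "curl '$CDCR_URL?action=QUEUES'"
```

STATUS, DISABLEBUFFER, and QUEUES are standard actions of the CDCR request handler in Solr 7.x; the commands must be run against the source cluster, not just the target.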