Hi Ilya,

Unfortunately I had to rebuild the database and did not keep the persistence files. But you are right that the failed nodes fail every time on restart.
I will see if I can reproduce the issue. In the meantime, do you have any suggestions on what I should check?

Regards,
Marcus

From: [gmail.com] Ilya Kasnacheev <[email protected]>
Sent: Thursday, May 20, 2021 11:11 PM
To: [email protected]
Subject: Re: Multiple ignite nodes crashed at the same time due to "Maximum number of retries 100000 reached for Put operation" error

Hello!

This looks like PDS corruption to me. Can you by chance share the persistence files from the problematic node? I am assuming that it fails every time on restart?

Regards,
--
Ilya Kasnacheev

Thu, May 20, 2021 at 12:52, Lo, Marcus <[email protected]>:

Hi,

We have a 4-node Ignite cluster. After running the cluster for 1 day, we encountered the following error at almost the same time on nodes #2, #3, and #4:

Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Maximum number of retries 1000 reached for Put operation (the tree may be corrupted). Increase IGNITE_BPLUS_TREE_LOCK_RETRIES system property if you regularly see this message (current value is 1000).]]
org.apache.ignite.IgniteCheckedException: Maximum number of retries 1000 reached for Put operation (the tree may be corrupted). Increase IGNITE_BPLUS_TREE_LOCK_RETRIES system property if you regularly see this message (current value is 1000).
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Get.checkLockRetry(BPlusTree.java:3109) [ignite-core-2.10.0.jar:2.10.0]
    at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Put.checkLockRetry(BPlusTree.java:3906) [ignite-core-2.10.0.jar:2.10.0]

I tried increasing IGNITE_BPLUS_TREE_LOCK_RETRIES to 100,000 and restarted the nodes, but it didn't help and the node hit the same error straight away.

Can you please shed some light on how to resolve the issue? Thanks.

I have also attached the logs for your reference:
ignite-node-[1,2,3,4].log: the full log files for all nodes
ignite-restart.log: the log for node 2 when it crashed

Regards,
Marcus
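
For anyone following along, here is a minimal sketch of how the IGNITE_BPLUS_TREE_LOCK_RETRIES system property named in the error message can be set and read back from plain Java. The class name LockRetriesCheck and the helper effectiveRetries are hypothetical, and the fallback of 1000 is simply the "current value" reported in the log above. The sketch does not touch Ignite itself, and it only confirms what value the JVM sees; as noted in the thread, raising the value did not resolve this particular failure.

```java
public class LockRetriesCheck {
    // Property name as quoted in the error message in this thread.
    static final String PROP = "IGNITE_BPLUS_TREE_LOCK_RETRIES";

    // Hypothetical helper: returns the value visible to this JVM,
    // or the given fallback if the property is unset or not a number.
    static int effectiveRetries(int fallback) {
        return Integer.getInteger(PROP, fallback);
    }

    public static void main(String[] args) {
        // Same effect as passing -DIGNITE_BPLUS_TREE_LOCK_RETRIES=100000
        // on the JVM command line; set it before the node is started.
        System.setProperty(PROP, "100000");

        // 1000 is the "current value" reported in the log, used here as the fallback.
        System.out.println(PROP + " = " + effectiveRetries(1000));
    }
}
```

Printing the value at startup is a cheap way to confirm that the setting actually reached the JVM that runs the node, rather than only the shell that launched it.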
