Hi Ilya,

Thanks! I will turn on verbose logging and let it run for a day to collect the logs. The cluster did not stay stuck in the end; we can still get responses from it.

Regards,
Aaron

From: Ilya Kasnacheev
Date: 2018-12-18 19:36
To: user
CC: aaron
Subject: Re: Re: Partition-exchanger blocked after upgrade to 2.7

Hello!

It's still hard to say. Can you enable more verbose logging for org.apache.ignite?

Did the cluster un-stuck eventually?

Regards,
--
Ilya Kasnacheev

Tue, 18 Dec 2018 at 14:25, aa...@tophold.com <aa...@tophold.com>:

Hi Ilya,

Attached is the full log of another Ignite node. The data in the cluster is written back to MySQL.

On this node the ERROR occurred at around 2018-12-14 10:38:51.730, but the node in fact kept working after that.

Regards,
Aaron

From: Ilya Kasnacheev
Date: 2018-12-18 18:44
To: user
Subject: Re: Partition-exchanger blocked after upgrade to 2.7

Hello!

Unfortunately it's hard to say what happens here from such a short log snippet. Can you provide full logs?

Regards,
--
Ilya Kasnacheev

Tue, 18 Dec 2018 at 05:51, aa...@tophold.com <aa...@tophold.com>:

Hello,

After upgrading to 2.7 we hit a weird warning. Basically all our Ignite caches run in LOCAL mode on an internal network, and almost all of the configuration is default.

We got an ERROR logged by tcp-disco-msg-worker*, but after that the cluster kept working and nothing crashed.

[ERROR] 2018-12-17 23:52:55.989 [tcp-disco-msg-worker-#2%PortfolioEventIgnite%] [ig] G - Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [threadName=partition-exchanger, blockedFor=5s]
[WARN ] 2018-12-17 23:52:55.989 [tcp-disco-msg-worker-#2%PortfolioEventIgnite%] [ig] G - Thread [name="exchange-worker-#98%PortfolioEventIgnite%", id=152, state=TIMED_WAITING, blockCnt=0, waitCnt=10143] Lock [object=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@39b50130, ownerName=null, ownerId=-1]
[WARN ] 2018-12-17 23:52:55.998 [tcp-disco-msg-worker-#2%PortfolioEventIgnite%] [ig] FailureProcessor - No deadlocked threads detected.
[WARN ] 2018-12-17 23:52:57.443 [jvm-pause-detector-worker] [ig] IgniteKernal%PortfolioEventIgnite - Possible too long JVM pause: 1404 milliseconds.
[WARN ] 2018-12-17 23:52:57.457 [tcp-disco-msg-worker-#2%PortfolioEventIgnite%] [ig] FailureProcessor - Thread dump at 2018/12/17 23:52:57 UTC

Since the caches are local, I am not sure why the partition-exchanger would block at all. The same goes for tcp-disco-msg-worker: we run on an internal network, so this warning should not happen.

As for "Possible too long JVM pause: 1404 milliseconds", the GC log shows that the pauses around that time were reasonable:

2018-12-18T07:44:27.513+0800: 50200.190: [GC pause (G1 Evacuation Pause) (young), 0.0241404 secs]
....
[Times: user=0.19 sys=0.00, real=0.02 secs]
2018-12-18T07:53:21.453+0800: 50734.129: [GC pause (G1 Evacuation Pause) (young), 0.0221342 secs]
...
[Times: user=0.20 sys=0.00, real=0.02 secs]

Regards,
Aaron
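
A note on comparing the two logs quoted above: the GC log carries a +0800 offset, while the Ignite log appears to be in UTC (an assumption, based on the "Thread dump at 2018/12/17 23:52:57 UTC" line sharing the timestamp of its log entry). The rough Python sketch below puts both on one clock to check whether either quoted GC entry lines up with the 1404 ms pause. The function names are made up for illustration, and the values are copied from the messages in this thread.

```python
from datetime import datetime, timezone

# Values copied from the log excerpts in this thread.
# Assumption: the Ignite log timestamps are UTC.
JVM_PAUSE_LOGGED = "2018-12-17 23:52:57.443"
JVM_PAUSE_MS = 1404
GC_EVENTS = [
    ("2018-12-18T07:44:27.513+0800", 0.0241404),
    ("2018-12-18T07:53:21.453+0800", 0.0221342),
]


def parse_ignite_ts(text):
    """Parse an Ignite log timestamp, treating it as UTC (assumption)."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


def parse_gc_ts(text):
    """Parse a GC log timestamp that carries its own UTC offset."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")


def gc_offsets_seconds(pause_text, gc_events):
    """Seconds from the pause warning to each GC event (negative = before)."""
    pause = parse_ignite_ts(pause_text)
    return [(parse_gc_ts(ts) - pause).total_seconds() for ts, _ in gc_events]


def gc_in_pause_window(pause_text, pause_ms, gc_events):
    """True for each GC event that started inside the reported pause.

    The warning is logged once the pause is over, so the window is taken
    as [logged time - pause length, logged time].
    """
    window = pause_ms / 1000.0
    offsets = gc_offsets_seconds(pause_text, gc_events)
    return [-window <= off <= 0 for off in offsets]


if __name__ == "__main__":
    offsets = gc_offsets_seconds(JVM_PAUSE_LOGGED, GC_EVENTS)
    hits = gc_in_pause_window(JVM_PAUSE_LOGGED, JVM_PAUSE_MS, GC_EVENTS)
    for (ts, secs), off, hit in zip(GC_EVENTS, offsets, hits):
        print(f"{ts}: GC took {secs:.4f}s, offset {off:+.2f}s, in window: {hit}")
```

Under that UTC assumption, the first GC entry is about 510 seconds before the pause warning and the second about 24 seconds after it, so neither of the two quoted entries covers the pause itself. The GC log lines closest to 07:52:56 (+0800) would be the ones to look at.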