Hi,

At first glance it looks like you do have a network problem. Check 04c.log:

2022-01-25 18:32:53.858+0000 WARN [grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%] o.a.i.s.c.t.TcpCommunicationSpi : Communication SPI session write timed out (consider increasing 'socketWriteTimeout' configuration property) [remoteAddr=/169.182.110.132:36364, writeTimeout=2000]

>Hi Ignite team,
>
>We are using Ignite 2.10.0 and we have a 5-node Ignite cluster with
>persistence enabled. The nodes have the following node id and consistent id:
>* 01p – node id=ee035a96, consistent id=lrdeqprmap01p
>* 02p – node id=81d7df57, consistent id=lrdeqprmap02p
>* 03p – node id=3a275472, consistent id=lrdeqprmap03p
>* 03c – node id=e8c54e6d, consistent id=lcgeqprmap03c
>* 04c – node id=de3959cf, consistent id=lcgeqprmap04c
>
>One of the nodes, 03c, crashed one day. We would like to figure out the root
>cause of the crash. I checked the logs, with the following findings:
>
>* From the 03c log, 03c tried to connect to 04c multiple times, starting
>from 18:49:56, but all attempts were unsuccessful. Eventually the node
>decided it was segmented and killed itself due to a critical system error.
>* From the 04c log, 04c rejected all connections from 03c from 18:49:56
>onwards, as 04c thought 03c had failed and regarded it as an unknown node.
>* In 04c, there were a lot of “Possible starvation in striped pool” warnings
>from 18:35:15 onwards.
>* In 04c, a lot of TCP clients were created, trying to connect to 02p,
>from 18:33:51 onwards. At the same time, in 02p there were a lot of “Received
>incoming connection when already connected to this node, rejecting” messages
>for 04c.
>* I can confirm that there was no network outage between the nodes.
>
>I have also attached the logs for your information, along with our Ignite XML
>config. Can you please help to investigate? Thanks.
>
>Regards,
>Marcus
>
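
Regarding the write-timeout warning from 04c.log quoted at the top of this reply: to see how often it occurs and against which remote addresses, a small script along these lines might help. This is only a sketch. The names `WRITE_TIMEOUT_RE` and `count_write_timeouts` are made up for illustration, and the pattern assumes every warning has the same layout as the single line quoted in this thread.

```python
import re
from collections import Counter

# Pattern modelled on the one warning line quoted in this thread (an assumption:
# other Ignite versions or logging layouts may format the message differently).
WRITE_TIMEOUT_RE = re.compile(
    r"Communication SPI session write timed out.*?"
    r"\[remoteAddr=/(?P<host>[^:\]]+):(?P<port>\d+), writeTimeout=(?P<timeout>\d+)\]"
)


def count_write_timeouts(lines):
    """Return a Counter mapping remote host -> number of write-timeout warnings."""
    counts = Counter()
    for line in lines:
        match = WRITE_TIMEOUT_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


# The warning from 04c.log, copied from this thread, used as sample input.
SAMPLE = (
    "2022-01-25 18:32:53.858+0000 WARN "
    "[grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%] "
    "o.a.i.s.c.t.TcpCommunicationSpi : Communication SPI session write timed out "
    "(consider increasing 'socketWriteTimeout' configuration property) "
    "[remoteAddr=/169.182.110.132:36364, writeTimeout=2000]"
)

if __name__ == "__main__":
    # Print the per-host counts for the sample line.
    print(count_write_timeouts([SAMPLE]))
```

To run it over the real file, pass an open file handle instead of the sample, for example `count_write_timeouts(open("04c.log"))`. Many warnings against the same remote address around 18:32 to 18:35 would be consistent with a communication problem between those two nodes, though the counts alone do not prove it.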