Hi, at first glance you really do have network problems. Check 04c.log:
2022-01-25 18:32:53.858+0000 WARN 
[grid-nio-worker-tcp-comm-2-#25%TcpCommunicationSpi%] 
o.a.i.s.c.t.TcpCommunicationSpi          : Communication SPI session write 
timed out (consider increasing 'socketWriteTimeout' configuration property) 
[remoteAddr=/169.182.110.132:36364, writeTimeout=2000]
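
The warning itself suggests increasing 'socketWriteTimeout'. A minimal sketch of how that could look in a Spring XML config is below. I am assuming the communication SPI is declared inside your IgniteConfiguration bean; the value 5000 is only an illustrative number, not a recommendation, and the rest of your configuration is elided. The log shows the current value is 2000 ms.

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- ... your existing properties ... -->

    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <!-- Socket write timeout in milliseconds; the log reports writeTimeout=2000.
                 5000 is an assumed example value, tune it for your network. -->
            <property name="socketWriteTimeout" value="5000"/>
        </bean>
    </property>
</bean>
```

Raising the timeout only hides slow writes; it is still worth finding out why writes to that remote address take longer than 2 seconds.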
 
>Hi Ignite team,
> 
>We are using Ignite 2.10.0 and we have a 5-node Ignite cluster with persistence 
>enabled. The nodes have the following node IDs and consistent IDs:
>*  01p – node id=ee035a96, consistent id=lrdeqprmap01p
>*  02p – node id=81d7df57, consistent id=lrdeqprmap02p
>*  03p – node id=3a275472, consistent id=lrdeqprmap03p
>*  03c – node id=e8c54e6d, consistent id=lcgeqprmap03c
>*  04c – node id=de3959cf, consistent id=lcgeqprmap04c
> 
>One of the nodes, 03c, crashed one day. We would like to figure out the root 
>cause of the crash. I checked the logs, with the following findings:
> 
>*  From the 03c log, 03c tried to connect to 04c multiple times, starting 
>from 18:49:56, but all attempts were unsuccessful. Eventually the node decided 
>it was segmented and killed itself due to a critical system error.
>*  From the 04c log, 04c rejected all connections from 03c since 18:49:56, as 
>04c thought 03c had failed and regarded it as an unknown node.
>*  In 04c, there were a lot of “Possible starvation in striped pool” warnings 
>since 18:35:15.
>*  In 04c, a lot of TCP clients were created, trying to connect to 02p 
>since 18:33:51. At the same time, in 02p there were a lot of “Received 
>incoming connection when already connected to this node, rejecting” 04p.
>*  I can confirm that there was no network outage between the nodes.
> 
>I have also attached the log for your information, and also our ignite xml 
>config. Can you please help to investigate? Thanks.
> 
>Regards,
>Marcus
>  
 
 
 
 
