Hi Naresh,

Actually, any JVM process hang can lead to segmentation. If a node is unresponsive for longer than failureDetectionTimeout, it is removed from the cluster to prevent performance degradation across the whole grid.
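To make that rule concrete, here is a minimal sketch in plain Java. It does not use any Ignite API; the class name, the method name and the pause values are made up for illustration, and the 10 000 ms timeout is only an example value, so substitute whatever your cluster is configured with. It takes a list of pause durations you have observed and reports the ones long enough to get the node dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PauseCheck {
    // Example value only (assumption): replace with your configured failureDetectionTimeout.
    static final long FAILURE_DETECTION_TIMEOUT_MS = 10_000;

    /** Returns the pauses (in ms) that are strictly longer than the given timeout. */
    static List<Long> pausesExceeding(List<Long> pausesMs, long timeoutMs) {
        List<Long> risky = new ArrayList<>();
        for (long pause : pausesMs) {
            // A node that stays silent longer than the timeout is treated as failed.
            if (pause > timeoutMs) {
                risky.add(pause);
            }
        }
        return risky;
    }

    public static void main(String[] args) {
        // Hypothetical pause durations in milliseconds.
        List<Long> observed = Arrays.asList(120L, 850L, 14_300L, 9_900L);
        System.out.println(pausesExceeding(observed, FAILURE_DETECTION_TIMEOUT_MS));
    }
}
```

Any value that comes back from pausesExceeding is a pause during which the rest of the cluster would have given up on the node.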
It works as follows. Let's say we have 3 nodes in a ring: n1 -> n2 -> n3. Discovery messages, along with metrics and connection checks, travel around the ring at a predefined interval. Suppose n2 starts experiencing issues, such as a GC pause or an OS failure, that force the process to stop. During that time n1 is unable to send messages to n2 (it doesn't receive an ack). n1 waits for failureDetectionTimeout and then establishes a connection to n3 instead: n1 -> n3, with n2 no longer connected. The cluster now treats n2 as failed. When n2 comes back, it tries to connect to n3 and send a message across the ring, at which point it receives a message that it is out of the grid. For n2 that means it was segmented, and the best it can do is stop.

To check whether there were long JVM or system pauses, you may enable GC logs. If the pauses are longer than failureDetectionTimeout, the node will be segmented.

The best approach is to eliminate the pauses, but as a workaround you can increase the timeout.

Thanks!
-Dmitry

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/