The suspect messages are initiated by IOExceptions being thrown when sending datagram messages. You can get more detail by enabling fine-level logging, or by using a log4j configuration to enable debug-level logging for the package org.apache.geode.distributed.internal.membership.gms.messenger. These suspect messages can cause a node to be kicked out and should be investigated.
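Because the suspect messages are spread across several members' logs, it can help to collect them into one list before investigating. Below is a minimal Java sketch, assuming the message text shown in the logs quoted in this thread. `SuspectLogScanner` and `SuspectEvent` are hypothetical helpers, not part of Geode. The sketch records who reported which member as a suspect and why, and which members a failed final check asked to remove.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Geode: extracts suspicion and removal
// events from Geode membership log lines. The regexes assume the message
// text seen in this thread and may need adjusting for other versions.
final class SuspectLogScanner {

    // One parsed event. For "removal" events, reporter and reason are null.
    static final class SuspectEvent {
        final String kind;
        final String reporter;
        final String suspect;
        final String reason;

        SuspectEvent(String kind, String reporter, String suspect, String reason) {
            this.kind = kind;
            this.reporter = reporter;
            this.suspect = suspect;
            this.reason = reason;
        }

        @Override
        public String toString() {
            if (reporter == null) {
                return kind + ": " + suspect;
            }
            return kind + ": " + suspect + " (reported by " + reporter + ": " + reason + ")";
        }
    }

    // "received suspect message from <reporter> for <suspect>: <reason>"
    private static final Pattern SUSPECT =
        Pattern.compile("received suspect message from (\\S+) for (\\S+): (.+)$");

    // "Final check failed - requesting removal of suspect member <suspect>"
    private static final Pattern REMOVAL =
        Pattern.compile("Final check failed - requesting removal of suspect member (\\S+)");

    // Returns the suspect/removal events found in the given lines, in input order.
    static List<SuspectEvent> scan(List<String> lines) {
        List<SuspectEvent> events = new ArrayList<>();
        for (String line : lines) {
            Matcher m = SUSPECT.matcher(line);
            if (m.find()) {
                events.add(new SuspectEvent("suspect", clean(m.group(1)),
                    clean(m.group(2)), clean(m.group(3))));
                continue;
            }
            m = REMOVAL.matcher(line);
            if (m.find()) {
                events.add(new SuspectEvent("removal", null, clean(m.group(1)), null));
            }
        }
        return events;
    }

    // The mail client bolded some text as *...*; real log files won't contain it.
    private static String clean(String s) {
        return s.replace("*", "").trim();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "[info 2017/08/19 18:03:52.397 EDT Locator1 <Geode Heartbeat Sender> tid=0x31] "
                + "received suspect message from HOST1XX(Locator1:27516:locator)<ec><v1>:1024 "
                + "for HOST2XX(Locator2:5676:locator)<ec><v0>:1024: "
                + "Unable to send messages to this member via JGroups",
            "[info 2017/08/19 18:03:52.414 EDT Locator1 <Geode Failure Detection thread 46> tid=0xc7] "
                + "Final check failed - requesting removal of suspect member "
                + "HOST2XX(Locator2:5676:locator)<ec><v0>:1024");
        for (SuspectEvent e : scan(lines)) {
            System.out.println(e);
        }
    }
}
```

Running `main` prints one `suspect` event and one `removal` event, both naming Locator2. "Performing final check" lines are deliberately ignored here, since the removal request already records the outcome of that check.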

The "entry destroy" messages are logged by the cache server when a client asks it to destroy a key that is not present in the region. They aren't associated with nodes being kicked out.
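When triaging, these client-operation lines can be separated from the membership messages by their text. Below is a minimal Java sketch, assuming the log format quoted below; `EntryDestroyLine` is a hypothetical helper, not a Geode API. It also flags a `loner` tag in the connection identity, on the assumption that this tag marks a client process rather than a cluster member.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Geode API: recognizes the cache server's
// "during entry destroy no entry was found for key" lines so they can be
// kept apart from membership (suspect/removal) messages.
final class EntryDestroyLine {
    final String clientIdentity;
    final String key;
    final boolean fromLoner;

    private EntryDestroyLine(String clientIdentity, String key) {
        this.clientIdentity = clientIdentity;
        this.key = key;
        // Assumption: a "loner" tag in the member ID marks a client process.
        this.fromLoner = clientIdentity.contains(":loner)");
    }

    // Assumes the format seen in this thread:
    // "Server connection from [identity(<id>,connection=N; port=P]: during entry destroy no entry was found for key <key>"
    private static final Pattern LINE = Pattern.compile(
        "Server connection from \\[identity\\(([^,]+),connection=\\d+; port=\\d+\\]: "
            + "during entry destroy no entry was found for key (\\S+)");

    // Returns the parsed line, or empty if it is not an entry-destroy miss.
    static Optional<EntryDestroyLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        // Strip the *...* bolding added by the mail client.
        return Optional.of(new EntryDestroyLine(m.group(1).replace("*", ""), m.group(2)));
    }

    public static void main(String[] args) {
        String line = "[info 2017/08/13 18:53:04.466 EDT Server1 <ServerConnection on port 40404 Thread 1> tid=0x280] "
            + "Server connection from [identity(HOST1XX(21470:loner):32998:4305bddb,connection=2; port=57024]: "
            + "during entry destroy no entry was found for key A414924";
        parse(line).ifPresent(e -> System.out.println(
            "client " + e.clientIdentity + " (loner=" + e.fromLoner + ") missed key " + e.key));
    }
}
```

Returning `Optional` keeps non-matching lines (such as suspect or final-check messages) out of the result without using null.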


On 8/20/17 4:30 AM, Thacker, Dharam wrote:

Hi Team,

Region: Event [REPLICATED]

I am seeing a strange sequence of messages that results in a shutdown of the whole distributed system. Could you help me verify/understand it?

I suspect it is mainly due to the JGroups channel suddenly closing/crashing, which makes members unreachable/undiscoverable even though Locator1 and Server1 are on the same host.

Topology >> [HOST1XX (Locator1, Server1) + HOST2XX (Locator2, Server2)]

*_Server1:_*

[info 2017/08/13 18:53:04.466 EDT Server1 <ServerConnection on port 40404 Thread 1> tid=0x280] Server connection from [identity(*HOST1XX*(21470:loner):32998:4305bddb,connection=2; port=57024]: during entry destroy no entry was found for key A414924 {What is this strange message and what does it indicate?}

// After this, Server1 removes all members from the view as suspects, with reason = Unable to send messages to this member via JGroups

*_Server2_*:

[info 2017/08/18 04:38:26.777 EDT Server2 <ServerConnection on port 40404 Thread 10> tid=0x1b7f] Server connection from [identity(*HOST1XX*(21470:loner):32998:4305bddb,connection=2; port=60382]: during entry destroy no entry was found for key A1253345051 {What is this strange message and what does it indicate?}

// After this, Server2 removes all members from the view as suspects, with reason = Unable to send messages to this member via JGroups

*_Locator1_*:

[info 2017/08/19 18:03:52.397 EDT Locator1 <Geode Heartbeat Sender> tid=0x31] received suspect message from *HOST1XX*(Locator1:27516:locator)<ec><v1>:1024 for HOST2XX(Locator2:5676:locator)<ec><v0>:1024: *Unable to send messages to this member via JGroups*

[info 2017/08/19 18:03:52.410 EDT Locator1 <Geode Failure Detection thread 45> tid=0xc6] Performing final check for suspect member *HOST1XX*(Server1:28323)<ec><v3>:1025 reason=Unable to send messages to this member via JGroups

[info 2017/08/19 18:03:52.410 EDT Locator1 <Geode Failure Detection thread 44> tid=0xc5] Performing final check for suspect member *HOST2XX*(Server2:5935)<ec><v2>:1025 reason=Unable to send messages to this member via JGroups

[info 2017/08/19 18:03:52.411 EDT Locator1 <Geode Failure Detection thread 46> tid=0xc7] Performing final check for suspect member *HOST2XX*(Locator2:5676:locator)<ec><v0>:1024 reason=Unable to send messages to this member via JGroups

[info 2017/08/19 18:03:52.414 EDT Locator1 <Geode Failure Detection thread 46> tid=0xc7] Final check failed - requesting removal of suspect member *HOST2XX*(Locator2:5676:locator)<ec><v0>:1024

[info 2017/08/19 18:03:52.414 EDT Locator1 <Geode Failure Detection thread 44> tid=0xc5] Final check failed - requesting removal of suspect member *HOST2XX*(Server2:5935)<ec><v2>:1025

[info 2017/08/19 18:03:52.414 EDT Locator1 <Geode Failure Detection thread 45> tid=0xc6] Final check failed - requesting removal of suspect member *HOST1XX*(Server1:28323)<ec><v3>:1025

*_Locator2_*:

[info 2017/08/19 18:03:59.313 EDT Locator2 <Geode Failure Detection thread 43> tid=0xaa] received suspect message from *HOST2XX*(Locator2:5676:locator)<ec><v0>:1024 for HOST1XX(Locator1:27516:locator)<ec><v1>:1024: Member isn't responding to heartbeat requests

[info 2017/08/19 18:03:59.321 EDT Locator2 <Geode Failure Detection thread 44> tid=0xab] Performing final check for suspect member *HOST1XX*(Locator1:27516:locator)<ec><v1>:1024 reason=Member isn't responding to heartbeat requests

[info 2017/08/19 18:03:59.959 EDT Locator2 <unicast receiver,HOST2XX-43172> tid=0x2b] received suspect message from *HOST2XX*(Server2:5935)<ec><v2>:1025 for HOST1XX(Server1:28323)<ec><v3>:1025: Member isn't responding to heartbeat requests

[info 2017/08/19 18:03:59.960 EDT Locator2 <Geode Failure Detection thread 43> tid=0xaa] Performing final check for suspect member *HOST1XX*(Server1:28323)<ec><v3>:1025 reason=Member isn't responding to heartbeat requests

[info 2017/08/19 18:04:04.324 EDT Locator2 <Geode Failure Detection thread 44> tid=0xab] Final check failed - requesting removal of suspect member *HOST1XX*(Locator1:27516:locator)<ec><v1>:1024

Thanks,

Dharam


