The suspect messages are triggered by IOExceptions thrown while
sending datagram messages. You can get more detail by enabling
fine-level logging, or by using a log4j configuration to enable
debug-level logging for the package
org.apache.geode.distributed.internal.membership.gms.messenger. These
failures can cause a node to be kicked out and should be investigated.
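As a rough sketch of the log4j route, a logger entry along these lines
could be added inside the <Loggers> section of a custom log4j2.xml. This
assumes your Geode version uses Log4j 2 and that the rest of the file is
your own copy of the default configuration; only the package name comes
from the paragraph above.

```xml
<!-- Assumption: this element sits inside <Loggers> of a custom Log4j 2 config. -->
<!-- It raises only the membership messenger package to debug level. -->
<Logger name="org.apache.geode.distributed.internal.membership.gms.messenger" level="debug"/>
```

With Log4j 2, a custom file is normally selected through the
log4j.configurationFile system property when starting the member. Check
the logging documentation for your Geode version before relying on this.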
The "entry destroy" messages are logged by the cache server. They aren't
associated with nodes being kicked out.
On 8/20/17 4:30 AM, Thacker, Dharam wrote:
Hi Team,
Region: Event [REPLICATED]
I am seeing a strange sequence of messages that results in the shutdown
of the whole distributed system. Could you help me verify/understand
it?
I suspect that it’s mainly due to the sudden closing/crashing of the
JGroups channel, which makes members unreachable/undiscoverable even
though Locator1 and Server1 are on the same host.
Topology >> [HOST1XX (Locator1,Server1) + HOST2XX (Locator2,Server2)]
*_Server1:_*
[info 2017/08/13 18:53:04.466 EDT Server1 <ServerConnection on port
40404 Thread 1> tid=0x280] Server connection from
[identity(*HOST1XX*(21470:loner):32998:4305bddb,connection=2;
port=57024]: during entry destroy no entry was found for key A414924
{What is this strange message and what does it indicate?}
// After this, Server1 removes all members from the view, treating them
as suspect, with reason = Unable to send messages to this member via JGroups
*_Server2_*:
[info 2017/08/18 04:38:26.777 EDT Server2 <ServerConnection on port
40404 Thread 10> tid=0x1b7f] Server connection from
[identity(*HOST1XX*(21470:loner):32998:4305bddb,connection=2;
port=60382]: during entry destroy no
entry was found for key A1253345051 {What is this strange message and
what does it indicate?}
// After this, Server2 removes all members from the view, treating them
as suspect, with reason = Unable to send messages to this member via JGroups
*_Locator1_*:
[info 2017/08/19 18:03:52.397 EDT Locator1 <Geode Heartbeat Sender>
tid=0x31] received suspect message from
*HOST1XX*(Locator1:27516:locator)<ec><v1>:1024 for
HOST2XX(Locator2:5676:locator)<ec><v0>:1024: *Unable to send messages
to this member via JGroups*
[info 2017/08/19 18:03:52.410 EDT Locator1 <Geode Failure Detection
thread 45> tid=0xc6] Performing final check for suspect member
*HOST1XX*(Server1:28323)<ec><v3>:1025 reason=Unable to send messages
to this member via JGroups
[info 2017/08/19 18:03:52.410 EDT Locator1 <Geode Failure Detection
thread 44> tid=0xc5] Performing final check for suspect member
*HOST2XX*(Server2:5935)<ec><v2>:1025 reason=Unable to send messages to
this member via JGroups
[info 2017/08/19 18:03:52.411 EDT Locator1 <Geode Failure Detection
thread 46> tid=0xc7] Performing final check for suspect member
*HOST2XX*(Locator2:5676:locator)<ec><v0>:1024 reason=Unable to send
messages to this member via JGroups
[info 2017/08/19 18:03:52.414 EDT Locator1 <Geode Failure Detection
thread 46> tid=0xc7] Final check failed - requesting removal of
suspect member *HOST2XX*(Locator2:5676:locator)<ec><v0>:1024
[info 2017/08/19 18:03:52.414 EDT Locator1 <Geode Failure Detection
thread 44> tid=0xc5] Final check failed - requesting removal of
suspect member *HOST2XX*(Server2:5935)<ec><v2>:1025
[info 2017/08/19 18:03:52.414 EDT Locator1 <Geode Failure Detection
thread 45> tid=0xc6] Final check failed - requesting removal of
suspect member *HOST1XX*(Server1:28323)<ec><v3>:1025
*_Locator2_*:
[info 2017/08/19 18:03:59.313 EDT Locator2 <Geode Failure Detection
thread 43> tid=0xaa] received suspect message from
*HOST2XX*(Locator2:5676:locator)<ec><v0>:1024 for
HOST1XX(Locator1:27516:locator)<ec><v1>:1024: Member isn't responding
to heartbeat requests
[info 2017/08/19 18:03:59.321 EDT Locator2 <Geode Failure Detection
thread 44> tid=0xab] Performing final check for suspect member
*HOST1XX*(Locator1:27516:locator)<ec><v1>:1024 reason=Member isn't
responding to heartbeat requests
[info 2017/08/19 18:03:59.959 EDT Locator2 <unicast
receiver,HOST2XX-43172> tid=0x2b] received suspect message from
*HOST2XX*(Server2:5935)<ec><v2>:1025 for
HOST1XX(Server1:28323)<ec><v3>:1025: Member isn't responding to
heartbeat requests
[info 2017/08/19 18:03:59.960 EDT Locator2 <Geode Failure Detection
thread 43> tid=0xaa] Performing final check for suspect member
*HOST1XX*(Server1:28323)<ec><v3>:1025 reason=Member isn't responding
to heartbeat requests
[info 2017/08/19 18:04:04.324 EDT Locator2 <Geode Failure Detection
thread 44> tid=0xab] Final check failed - requesting removal of
suspect member *HOST1XX*(Locator1:27516:locator)<ec><v1>:1024
Thanks,
Dharam