Hello!

The metrics were sent after the node had already been segmented by the cluster, hence "unknown".
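The server only applies metrics from node ids that are currently in its ring. A sender that was dropped, or whose new id never completed a join, falls into the debug branch of the `updateMetrics()` snippet quoted below. The stdlib-only Java sketch below models that lookup. It is a simplified illustration, not Ignite's real classes: `RingMetricsSketch`, `join`, `drop` and the `Map`-based ring are hypothetical stand-ins for `ring.node(nodeId)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical, simplified model of the ring lookup in ServerImpl.updateMetrics():
 * metrics are applied only if the sender id is in the server's current ring.
 */
public class RingMetricsSketch {
    // Stand-in for the discovery ring: node id -> last reported value (e.g. heap used).
    private final Map<UUID, Long> ring = new HashMap<>();

    public void join(UUID nodeId) {
        ring.put(nodeId, 0L);
    }

    public void drop(UUID nodeId) {
        ring.remove(nodeId);
    }

    /** Returns true if the metrics were applied, false if the sender is unknown. */
    public boolean updateMetrics(UUID nodeId, long value) {
        if (ring.containsKey(nodeId)) {
            ring.put(nodeId, value);
            return true;
        }
        // Mirrors the "unknown node" debug branch of the quoted snippet.
        System.out.println("Received metrics from unknown node: " + nodeId);
        return false;
    }

    public static void main(String[] args) {
        RingMetricsSketch server = new RingMetricsSketch();
        UUID prevId = UUID.fromString("d7674f40-6112-46a6-83f8-15656b01c66b");
        UUID newId = UUID.fromString("8baf933f-e2cc-43be-818c-de2fb1259194");

        server.join(prevId);
        System.out.println(server.updateMetrics(prevId, 100L)); // true: node is in the ring

        // The client was silent longer than clientFailureDetectionTimeout, so the server drops it.
        server.drop(prevId);

        // The client switches to a new id that never completed a join,
        // so its metrics hit the "unknown node" branch.
        System.out.println(server.updateMetrics(newId, 200L)); // false
    }
}
```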
Why it did not reconnect, I do not know. Did you collect a thread dump of this client node by chance?

Regards,
--
Ilya Kasnacheev


On Mon, 8 Jun 2020 at 09:53, VeenaMithare <v.mith...@cmcmarkets.com> wrote:

> Hi all,
>
> Yesterday we had some network issues and we observed the server log having
> messages like:
> 2020-06-07T21:46:35,368 DEBUG c.c.p.c.c.p.d.CustomTcpDiscoverySpi
> [tcp-disco-msg-worker-#2]: Received metrics from unknown node:
> 8baf933f-e2cc-43be-818c-de2fb1259194
>
> Once we figured out which client node this consistent id belonged to, we saw
> this message in the client node logs. Please note this is the last 'ignite'
> message in the logs on the client node:
>
> 2020-06-07T16:06:32,961 WARN c.c.p.c.c.p.d.CustomTcpDiscoverySpi
> [tcp-client-disco-msg-worker-#4%instancename%]: Local node was dropped from
> cluster due to network problems, will try to reconnect with new id after
> 10000ms (reconnect delay can be changed using
> IGNITE_DISCO_FAILED_CLIENT_RECONNECT_DELAY system property)
> [newId=8baf933f-e2cc-43be-818c-de2fb1259194,
> prevId=d7674f40-6112-46a6-83f8-15656b01c66b, locNode=TcpDiscoveryNode
> [id=d7674f40-6112-46a6-83f8-15656b01c66b, addrs=[0:0:0:0:0:0:0:1%lo,
> a.b.c.d, 127.0.0.1, 192.168.61.150], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0,
> /127.0.0.1:0, hostname.companyname.local/a.b.c.d:0,
> multicast/192.168.61.150:0], discPort=0, order=193, intOrder=0,
> lastExchangeTime=1591527983474, loc=true, ver=2.7.6#20190911-sha1:21f7ca41,
> isClient=true], nodeInitiatedFail=c67403fd-812b-46b4-9c76-60f5052b57d7,
> msg=Client node considered as unreachable and will be dropped from cluster,
> because no metrics update messages received in interval:
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
> network problems or long GC pause on client node, try to increase this
> parameter. [nodeId=d7674f40-6112-46a6-83f8-15656b01c66b,
> clientFailureDetectionTimeout=30000]]
>
>
> My questions are below:
>
> 1. If no metrics are received, should the node not be segmented?
> 2. This node did not get segmented. It took on a new node id, but the node
> id was not registered in the cluster? (As per the code in updateMetrics in
> ServerImpl.java, snapshot below.)
> 3. How do we monitor the cluster topology for these kinds of scenarios?
>
>
> ===========================================
>
> private void updateMetrics(UUID nodeId,
>     ClusterMetrics metrics,
>     Map<Integer, CacheMetrics> cacheMetrics,
>     long tsNanos)
> {
>     assert nodeId != null;
>     assert metrics != null;
>
>     TcpDiscoveryNode node = ring.node(nodeId);
>
>     if (node != null) {
>         ..........
>
>     }
>     else if (log.isDebugEnabled())
>         log.debug("Received metrics from unknown node: " + nodeId);
> }
> }
>
> Regards,
> Veena.
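The client WARN message in this thread involves two intervals. The first is `clientFailureDetectionTimeout=30000`, after which the server drops a client that sent no metrics update. The second is the 10000 ms reconnect delay (changeable via `IGNITE_DISCO_FAILED_CLIENT_RECONNECT_DELAY`), after which the client retries under a new id. The stdlib-only Java sketch below models only that timing, under stated simplifications. The class and method names are hypothetical, and the strict `>` comparison is an assumption rather than Ignite's exact semantics. Generating the new id with `UUID.randomUUID()` is also an assumption.

```java
import java.util.UUID;

/**
 * Hypothetical timing model of the two intervals in the client log:
 * clientFailureDetectionTimeout (server drops a silent client) and the
 * reconnect delay (client retries under a new id). Not Ignite's real code.
 */
public class ClientReconnectSketch {
    // Values reported in the log; comparison semantics are a simplification.
    static final long CLIENT_FAILURE_DETECTION_TIMEOUT_MS = 30_000;
    static final long RECONNECT_DELAY_MS = 10_000;

    /** Server side: drop the client if no metrics update arrived within the timeout. */
    public static boolean shouldDropClient(long lastMetricsAtMs, long nowMs) {
        return nowMs - lastMetricsAtMs > CLIENT_FAILURE_DETECTION_TIMEOUT_MS;
    }

    /** Client side: earliest time the rejoin attempt starts after being dropped. */
    public static long reconnectAttemptAtMs(long droppedAtMs) {
        return droppedAtMs + RECONNECT_DELAY_MS;
    }

    /** Client side: the rejoin uses a fresh node id (random UUID assumed here). */
    public static UUID newNodeId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        long lastMetricsAt = 0;
        System.out.println(shouldDropClient(lastMetricsAt, 29_000)); // false
        System.out.println(shouldDropClient(lastMetricsAt, 31_000)); // true

        long droppedAt = 31_000;
        System.out.println(reconnectAttemptAtMs(droppedAt)); // 41000

        UUID prevId = UUID.fromString("d7674f40-6112-46a6-83f8-15656b01c66b");
        UUID newId = newNodeId();
        // Until a join under newId completes, the server treats metrics from newId as unknown.
        System.out.println("prevId=" + prevId + ", newId=" + newId);
    }
}
```

Under this model, a client whose rejoin never completes keeps a node id that the server does not know.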