Hello!

The metrics were sent after the cluster had already segmented the node,
hence "unknown".

Why it did not reconnect, I do not know. Did you collect a thread dump of
this client node by chance?
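
If not, running `jstack <pid>` against the client JVM is the usual way to get
one. A dump can also be captured from inside the process. Below is a minimal
sketch that uses only the standard java.lang.management API; the class name
ThreadDumpSketch is made up for illustration and is not part of Ignite.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Ignite: renders a dump of all live JVM threads.
class ThreadDumpSketch {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        // Ask for lock details only where the JVM supports reporting them.
        ThreadInfo[] infos = mx.dumpAllThreads(
            mx.isObjectMonitorUsageSupported(),
            mx.isSynchronizerUsageSupported());

        StringBuilder sb = new StringBuilder();

        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" id=").append(info.getThreadId())
              .append(" state=").append(info.getThreadState())
              .append('\n');

            // Print the full stack; ThreadInfo.toString() truncates it.
            for (StackTraceElement frame : info.getStackTrace())
                sb.append("    at ").append(frame).append('\n');

            sb.append('\n');
        }

        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

On a stuck client, the tcp-client-disco-* threads in such a dump would be the
first place to look.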

Regards,
-- 
Ilya Kasnacheev


On Mon, 8 Jun 2020 at 09:53, VeenaMithare <v.mith...@cmcmarkets.com> wrote:

> Hi all,
>
> Yesterday we had some network issues, and we observed messages like the
> following in the server log:
> 2020-06-07T21:46:35,368 DEBUG c.c.p.c.c.p.d.CustomTcpDiscoverySpi
> [tcp-disco-msg-worker-#2]: Received metrics from unknown node:
> 8baf933f-e2cc-43be-818c-de2fb1259194
>
> Once we figured out which client node this consistent id belonged to, we
> saw this message in the client node logs. Please note this is the last
> 'ignite' message in the logs on the client node:
>
> 2020-06-07T16:06:32,961 WARN  c.c.p.c.c.p.d.CustomTcpDiscoverySpi
> [tcp-client-disco-msg-worker-#4%instancename%]: Local node was dropped from
> cluster due to network problems, will try to reconnect with new id after
> 10000ms (reconnect delay can be changed using
> IGNITE_DISCO_FAILED_CLIENT_RECONNECT_DELAY system property)
> [newId=8baf933f-e2cc-43be-818c-de2fb1259194,
> prevId=d7674f40-6112-46a6-83f8-15656b01c66b, locNode=TcpDiscoveryNode
> [id=d7674f40-6112-46a6-83f8-15656b01c66b, addrs=[0:0:0:0:0:0:0:1%lo,
> a.b.c.d, 127.0.0.1, 192.168.61.150], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0,
> /127.0.0.1:0, hostname.companyname.local/a.b.c.d:0,
> multicast/192.168.61.150:0], discPort=0, order=193, intOrder=0,
> lastExchangeTime=1591527983474, loc=true, ver=2.7.6#20190911-sha1:21f7ca41,
> isClient=true], nodeInitiatedFail=c67403fd-812b-46b4-9c76-60f5052b57d7,
> msg=Client node considered as unreachable and will be dropped from cluster,
> because no metrics update messages received in interval:
> TcpDiscoverySpi.clientFailureDetectionTimeout() ms. It may be caused by
> network problems or long GC pause on client node, try to increase this
> parameter. [nodeId=d7674f40-6112-46a6-83f8-15656b01c66b,
> clientFailureDetectionTimeout=30000]]
>
>
> My questions are below :
>
> 1. If no metrics are received, should the node not be segmented?
> 2. This node did not get segmented. It took on a new node id, but that new
> id was not registered in the cluster (as per the code in updateMetrics in
> ServerImpl.java, snapshot below)?
>
> 3. How do we monitor the cluster topology for these kinds of scenarios?
>
>
> ===========================================
>
>         private void updateMetrics(UUID nodeId,
>             ClusterMetrics metrics,
>             Map<Integer, CacheMetrics> cacheMetrics,
>             long tsNanos)
>         {
>             assert nodeId != null;
>             assert metrics != null;
>
>             TcpDiscoveryNode node = ring.node(nodeId);
>
>             if (node != null) {
>                   ..........
>
>             }
>             else if (log.isDebugEnabled())
>                 log.debug("Received metrics from unknown node: " + nodeId);
>         }
> }
>
> regards,
> Veena.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
