[ https://issues.apache.org/jira/browse/IGNITE-10354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ilya Kasnacheev updated IGNITE-10354:
-------------------------------------
    Fix Version/s: 2.8

> Failing client node due to not receiving metrics updates
> --------------------------------------------------------
>
>                 Key: IGNITE-10354
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10354
>             Project: Ignite
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.6
>            Reporter: Roman Guseinov
>            Assignee: Roman Guseinov
>            Priority: Major
>             Fix For: 2.8
>
>         Attachments: ClientDisconnectedTest.java
>
>
> In some cases, after a coordinator change, the client node is failed
> before it can establish a connection to another server in the cluster.
> {code:java}
> [2018-11-21 12:21:45,769][WARN ][tcp-disco-msg-worker-#15%server-b%][TestTcpDiscoverySpi] Failing client node due to not receiving metrics updates from client node within 'IgniteConfiguration.clientFailureDetectionTimeout' (consider increasing configuration property) [timeout=10000, node=TcpDiscoveryNode [id=dc739711-f685-45e8-9017-1f91b1d86c8c, addrs=[0:0:0:0:0:0:0:1, 10.0.75.1, 127.0.0.1, 192.168.1.51, 192.168.192.1], sockAddrs=[/0:0:0:0:0:0:0:1:0, LAPTOP-6FN8RAOS/10.0.75.1:0, /127.0.0.1:0, /192.168.192.1:0, /192.168.1.51:0], discPort=0, order=2, intOrder=2, lastExchangeTime=1542774105666, loc=false, ver=2.4.0#20180830-sha1:345c0a7c, isClient=true]]
> [2018-11-21 12:21:45,791][INFO ][tcp-client-disco-msg-worker-#10%client%][TestTcpDiscoverySpi] Client node disconnected from cluster, will try to reconnect with new id [newId=46812956-2fc4-4b74-9909-d523a547ba0e, prevId=dc739711-f685-45e8-9017-1f91b1d86c8c, locNode=TcpDiscoveryNode [id=dc739711-f685-45e8-9017-1f91b1d86c8c, addrs=[0:0:0:0:0:0:0:1, 10.0.75.1, 127.0.0.1, 192.168.1.51, 192.168.192.1], sockAddrs=[/0:0:0:0:0:0:0:1:0, LAPTOP-6FN8RAOS/10.0.75.1:0, /127.0.0.1:0, /192.168.192.1:0, /192.168.1.51:0], discPort=0, order=2, intOrder=0, lastExchangeTime=1542774104031, loc=true, ver=2.4.0#20180830-sha1:345c0a7c, isClient=true]]
> {code}
> It looks like a race condition.
> Steps to reproduce:
> 1. Start server A.
> 2. Start client.
> 3. Start server B.
> 4. Stop server A.
> If Thread.sleep(10000) is added between (3) and (4), the client node is not
> disconnected from the cluster.
> A reproducer is attached: [^ClientDisconnectedTest.java].
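
The WARN message above describes a timeout check: the client is failed when no metrics update from it has been seen within 'IgniteConfiguration.clientFailureDetectionTimeout' (10000 ms in the log). The following is only a simplified, hypothetical model of that kind of check, written to make the condition concrete. It is not Ignite's implementation, and the class and method names (ClientMetricsTimeoutModel, onMetricsUpdate, shouldFail) are invented for illustration. It does not attempt to explain the race itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a metrics-based client failure check (not Ignite code). */
class ClientMetricsTimeoutModel {
    /** Stand-in for clientFailureDetectionTimeout, in milliseconds. */
    private final long timeoutMs;

    /** Last time (ms) a metrics update was recorded for each client id. */
    private final Map<String, Long> lastUpdateMs = new HashMap<>();

    ClientMetricsTimeoutModel(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records that a metrics update from the given client was seen at nowMs. */
    void onMetricsUpdate(String clientId, long nowMs) {
        lastUpdateMs.put(clientId, nowMs);
    }

    /**
     * Returns true when more than timeoutMs has elapsed since the last
     * recorded update. Clients with no recorded update are not failed
     * (an assumption made for this model only).
     */
    boolean shouldFail(String clientId, long nowMs) {
        Long last = lastUpdateMs.get(clientId);
        return last != null && nowMs - last > timeoutMs;
    }

    public static void main(String[] args) {
        ClientMetricsTimeoutModel model = new ClientMetricsTimeoutModel(10_000L);
        model.onMetricsUpdate("client", 0L);

        // Update seen 9 s ago: still within the timeout.
        System.out.println(model.shouldFail("client", 9_000L));
        // No update for just over 10 s: the client would be failed.
        System.out.println(model.shouldFail("client", 10_001L));
    }
}
```

In this model, any server that takes over the check without a recent recorded update for the client would fail it as soon as the timeout elapses, which is why the timestamp a new coordinator starts from matters. Whether that is what happens in the attached reproducer is not established here.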