Hello,

One of my Ignite nodes stopped, and I have appended the logs below. It seems that grid-timeout-worker checks the health of the cluster every minute. In my case, however, at 23:34:03, before the next check was due at 23:34:19, the node reported "Local node seems to be disconnected from topology (failure detection timeout is reached)" and was stopped. As a result, web session clustering and everything else that depends on the node stopped working.
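The timing described above can be checked directly from the log timestamps. This is a minimal Python sketch, assuming the HH:MM:SS,mmm timestamp format seen in the logs below and that all entries fall on the same day; the function and variable names are illustrative only and are not part of Ignite:

```python
from datetime import datetime

# Timestamps copied from the log excerpt in this message (time of day only).
METRICS_TIMES = ["23:31:19,896", "23:32:19,904", "23:33:19,913"]
DISCONNECT_TIME = "23:34:03,752"


def parse(ts):
    """Parse an 'HH:MM:SS,mmm' log timestamp (the date is ignored)."""
    return datetime.strptime(ts, "%H:%M:%S,%f")


def seconds_between(earlier, later):
    """Return the elapsed seconds between two same-day timestamps."""
    return (parse(later) - parse(earlier)).total_seconds()


# Interval between consecutive "Metrics for local node" entries.
metrics_intervals = [
    seconds_between(a, b) for a, b in zip(METRICS_TIMES, METRICS_TIMES[1:])
]

# Time from the last metrics entry to the disconnect message.
gap_to_disconnect = seconds_between(METRICS_TIMES[-1], DISCONNECT_TIME)

print("metrics intervals (s):", metrics_intervals)
print("gap to disconnect (s):", gap_to_disconnect)
```

The metrics entries arrive about 60 seconds apart, while the disconnect message appears roughly 43.8 seconds after the last one. In the logs, the disconnect message is written by tcp-disco-msg-worker (TcpDiscoverySpi, with failureDetectionTimeout=10000 and connCheckFreq=3333), not by grid-timeout-worker, so the two may well run on independent schedules.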
I just wonder what could have caused this. There should have been no network issues with the host machine at that time. It is a bit scary for us, as the same could happen to our production servers in the near future.

Thank you for your help.

Yuci

===================Ignite logs======================

[23:31:19,896][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9a069f70, name=null, uptime=10:37:03:793]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=43.17%, avg=12.83%, GC=1.1%]
    ^-- Heap [used=2115MB, free=61.26%, comm=3955MB]
    ^-- Non heap [used=138MB, free=-1%, comm=143MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[23:32:19,904][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9a069f70, name=null, uptime=10:38:03:801]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=0.83%, avg=12.87%, GC=0%]
    ^-- Heap [used=2638MB, free=51.69%, comm=3957MB]
    ^-- Non heap [used=138MB, free=-1%, comm=143MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[23:33:19,913][INFO ][grid-timeout-worker-#33%null%][IgniteKernal] Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=9a069f70, name=null, uptime=10:39:03:808]
    ^-- H/N/C [hosts=2, nodes=2, CPUs=4]
    ^-- CPU [cur=0.5%, avg=12.86%, GC=0%]
    ^-- Heap [used=796MB, free=85.41%, comm=3921MB]
    ^-- Non heap [used=138MB, free=-1%, comm=143MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
[23:34:03,752][INFO ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Local node seems to be disconnected from topology (failure detection timeout is reached) [failureDetectionTimeout=10000, connCheckFreq=3333]
[23:34:03,783][WARN ][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
[23:34:03,786][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=9a069f70-d49d-472e-9771-7ac2353e751f, addrs=[10.3.0.64, 127.0.0.1], sockAddrs=[ves-hx-40.ebi.ac.uk/10.3.0.64:47500, /10.3.0.64:47500, /127.0.0.1:47500], discPort=47500, order=56, intOrder=29, lastExchangeTime=1470350043783, loc=true, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]
[23:34:03,819][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager] Stopping local node according to configured segmentation policy.
[23:34:03,825][WARN ][disco-event-worker-#44%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=cef7fc5e-b854-4072-8e16-396a87d5d556, addrs=[10.3.0.65, 127.0.0.1], sockAddrs=[ves-hx-41.ebi.ac.uk/10.3.0.65:47500, /10.3.0.65:47500, /127.0.0.1:47500], discPort=47500, order=58, intOrder=30, lastExchangeTime=1470311808664, loc=false, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]
[23:34:03,827][INFO ][disco-event-worker-#44%null%][GridDiscoveryManager] Topology snapshot [ver=59, servers=1, clients=0, CPUs=2, heap=5.3GB]
[23:34:03,874][INFO ][Thread-32][GridTcpRestProtocol] Command protocol successfully stopped: TCP binary
[23:34:03,902][INFO ][Thread-32][GridJettyRestProtocol] Command protocol successfully stopped: Jetty REST
[23:34:04,571][INFO ][Thread-32][GridCacheProcessor] Stopped cache: session-cache
[23:34:04,572][INFO ][Thread-32][GridCacheProcessor] Stopped cache: ignite-marshaller-sys-cache
[23:34:04,572][INFO ][Thread-32][GridCacheProcessor] Stopped cache: ignite-sys-cache
[23:34:04,573][INFO ][Thread-32][GridCacheProcessor] Stopped cache: ignite-atomics-sys-cache
[23:34:04,583][INFO ][Thread-32][GridCacheProcessor] Stopped cache: wicket-data-store
[23:34:04,623][INFO ][Thread-32][IgniteKernal]
>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 1.6.0#20160518-sha1:0b22c45bb9b97692208fd0705ddf8045ff34a031
>>> stopped OK
>>> +---------------------------------------------------------------------------------+
>>> Grid uptime: 10:39:48:518

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Local-node-seems-to-be-disconnected-from-topology-failure-detection-timeout-is-reached-tp6797.html