Hello!

It seems that node 8 was kicked out of the cluster by node 7 after a timeout:

[03:07:18,822][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=9989, rmtAddr=/xx.xx.xxx.IP8:47500, rmtPort=47500]
[03:07:18,876][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef, addrs=[xx.xx.xxx.IP7, 127.0.0.1], sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500], discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false], failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, addrs=[xx.xx.xxx.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609146411895, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false],errMsg=Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage [creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef, addrs=[xx.xx.xxx.IP7, 127.0.0.1], sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500], discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false], failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, order=8, addr=[xx.xx.xxx.IP8, 127.0.0.1], daemon=false]]]
[03:07:18,876][INFO][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] New next node [newNext=TcpDiscoveryNode [id=44f32796-5f72-4153-9b2a-ffe8dfde0947, addrs=[xx.xx.xxx.IP9, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP9:47500], discPort=47500, order=9, intOrder=9, lastExchangeTime=1609146421080, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false]]
[03:07:18,885][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[03:07:18,962][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, addrs=[xx.xx.xxx.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609146411895, loc=false, ver=2.7.0#20181201-sha1:256ae401, isClient=false]
[03:07:18,989][INFO][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Topology snapshot [ver=23, locNode=cdec00c4, servers=15, clients=2, state=ACTIVE, CPUs=60, offheap=17.0GB, heap=48.0GB]
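When a failure like this has to be traced across many nodes (logs from all 16 nodes are mentioned further down this thread), filtering each log down to its discovery events makes the logs easier to line up. Below is a minimal sketch, assuming every entry starts with the `[HH:mm:ss,SSS][LEVEL][thread][category]` prefix seen in the excerpt above. `DiscoveryLogScan` and its marker list are hypothetical and not part of Ignite.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not an Ignite API): extracts discovery-related events from an
// Ignite node log so that logs from several nodes can be lined up by timestamp.
class DiscoveryLogScan {

    // Assumed entry layout, taken from the excerpt above:
    // [HH:mm:ss,SSS][LEVEL][thread][category] message
    private static final Pattern ENTRY = Pattern.compile(
            "\\[(\\d{2}:\\d{2}:\\d{2},\\d{3})\\]\\[(\\w+)\\]\\[[^\\]]+\\]\\[(\\w+)\\] (.*)");

    // Message prefixes treated as topology-relevant; all of them appear in this thread.
    private static final List<String> MARKERS = List.of(
            "Timed out waiting for message delivery receipt",
            "Failed to send message to next node",
            "Node FAILED",
            "Local node SEGMENTED");

    static final class Event {
        final String time;
        final String level;
        final String category;
        final String message;

        Event(String time, String level, String category, String message) {
            this.time = time;
            this.level = level;
            this.category = category;
            this.message = message;
        }

        @Override
        public String toString() {
            return time + " " + level + " " + category + ": " + message;
        }
    }

    // Returns matching events in file order; lines that do not start a log entry are skipped.
    static List<Event> scan(List<String> lines) {
        List<Event> events = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                continue;
            }
            String message = m.group(4);
            for (String marker : MARKERS) {
                if (message.startsWith(marker)) {
                    events.add(new Event(m.group(1), m.group(2), m.group(3), message));
                    break;
                }
            }
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[03:07:18,876][INFO][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi] New next node [newNext=...]",
                "[03:07:18,962][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, order=8]");
        scan(sample).forEach(System.out::println);
    }
}
```

Running `scan` over each node's log and merging the results by timestamp gives a rough cross-node timeline. The timestamps carry no date, so this assumes the logs cover the same day and that the node clocks are in sync.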
It's hard to say what caused this issue. Maybe there was indeed a short-lived network glitch.

Regards,
--
Ilya Kasnacheev

Fri, 8 Jan 2021 at 09:23, BEELA GAYATRI <beela.gaya...@tcs.com>:
> Hi Ilya,
>
> PFA the logs from all 16 nodes; node 8 was stopped with a segmentation issue.
>
> *From: *Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> *Sent: *Thursday, January 7, 2021 5:02 PM
> *To: *user@ignite.apache.org
> *Subject: *Re: Node Segmentation Error
>
> Hello!
>
> Do you also have logs from other server nodes?
>
> Here, I don't see anything particularly suspicious. Maybe there indeed
> were some short-term network problems?
>
> Regards,
> --
> Ilya Kasnacheev
>
> Wed, 6 Jan 2021 at 15:04, BEELA GAYATRI <beela.gaya...@tcs.com>:
>
> > Dear Team,
> >
> > We are running 16 Ignite nodes, and a few nodes are going down with the
> > error below. Please let us know the possible reasons, and a solution, for
> > when a node is segmented and goes down.
> >
> > *Error:*
> >
> > *Node is out of topology (probably, due to short-time network problems).*
> >
> > *[23:16:27,554][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=ad4f2ad9-7f42-4863-84e4-03b95c6a9d9d, addrs=[XX.XX.XXX.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, SERVER8/XX.XX.XXX.IP8:47500], discPort=47500, order=8, intOrder=8, lastExchangeTime=1609868787545, loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false]*
> >
> > Below are the JVM args we are passing to the nodes:
> >
> > JVMARGS="-Xms3G -Xmx3G -Xss5M -XX:-UseGCOverheadLimit
> > -XX:+AlwaysPreTouch
> > -XX:+UseG1GC
> > -XX:+ScavengeBeforeFullGC
> > -XX:+DisableExplicitGC
> > -XX:+PrintGCDetails
> > -XX:MaxGCPauseMillis=200
> > -Xloggc:/path/to/logs/GClog.txt
> > -Djava.net.preferIPv4Stack=true -Dserver
> > --add-exports java.base/jdk.internal.misc=ALL-UNNAMED
> > --add-exports java.base/sun.nio.ch=ALL-UNNAMED
> > --add-exports java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
> > --add-exports jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED"
> >
> > PFA the log attached.
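The first warning in the log excerpt at the top of this thread names long GC pauses on the remote node as the most probable cause of the missed delivery receipt. Because the nodes already write a GC log via `-Xloggc`, one way to test that theory is to scan node 8's GC log for long pauses. Below is a minimal sketch. `GcPauseScan` is hypothetical, and it assumes the JDK 9+ unified-logging format that `-Xloggc` and `-XX:+PrintGCDetails` are mapped to on newer JDKs (the `--add-exports` flags imply JDK 9 or later), where pause lines end with a duration such as `12.345ms`. The 200 ms threshold simply mirrors `-XX:MaxGCPauseMillis=200` from the JVM args above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Ignite or the JDK): lists GC pauses longer than a
// threshold. Assumes JDK 9+ unified-logging lines such as
// [12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 150M->30M(3072M) 12.345ms
class GcPauseScan {

    // Captures the GC id, the pause kind (e.g. "Pause Young (Normal) (G1 Evacuation Pause)")
    // and the duration in milliseconds at the end of the line; the heap transition is ignored.
    private static final Pattern PAUSE = Pattern.compile(
            "GC\\((\\d+)\\) (Pause .+?) \\S+ (\\d+(?:\\.\\d+)?)ms$");

    static final class Pause {
        final long gcId;
        final String kind;
        final double millis;

        Pause(long gcId, String kind, double millis) {
            this.gcId = gcId;
            this.kind = kind;
            this.millis = millis;
        }

        @Override
        public String toString() {
            return "GC(" + gcId + ") " + kind + ": " + millis + " ms";
        }
    }

    // Lines without "Pause" right after the GC id, or without a trailing duration
    // (e.g. concurrent-cycle and gc,start lines), are skipped.
    static List<Pause> longPauses(List<String> lines, double thresholdMillis) {
        List<Pause> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                double millis = Double.parseDouble(m.group(3));
                if (millis > thresholdMillis) {
                    result.add(new Pause(Long.parseLong(m.group(1)), m.group(2), millis));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // The thread only shows the placeholder path from -Xloggc; pass the real one as args[0].
        String path = args.length > 0 ? args[0] : "/path/to/logs/GClog.txt";
        // 200 ms mirrors -XX:MaxGCPauseMillis=200 from the JVM args above.
        for (Pause p : longPauses(Files.readAllLines(Paths.get(path)), 200.0)) {
            System.out.println(p);
        }
    }
}
```

Pauses approaching the roughly 10 s `currentTimeout` reported in that warning would be the most relevant ones; a clean GC log would point back toward the network explanation suggested in the replies.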