Hello!

It seems that node 8 was kicked out of the cluster by node 7 after a timeout:
[03:07:18,822][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi]
Timed out waiting for message delivery receipt (most probably, the reason
is in long GC pauses on remote node; consider tuning GC and increasing
'ackTimeout' configuration property). Will retry to send message with
increased timeout [currentTimeout=9989, rmtAddr=/xx.xx.xxx.IP8:47500,
rmtPort=47500]
[03:07:18,876][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi]
Failed to send message to next node [msg=TcpDiscoveryStatusCheckMessage
[creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef,
addrs=[xx.xx.xxx.IP7, 127.0.0.1],
sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500],
discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761,
loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false],
failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage
[sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef,
verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null,
isClient=false]], next=TcpDiscoveryNode
[id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, addrs=[xx.xx.xxx.IP8, 127.0.0.1],
sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP8:47500], discPort=47500,
order=8, intOrder=8, lastExchangeTime=1609146411895, loc=false,
ver=2.7.0#20181201-sha1:256ae401, isClient=false],errMsg=Failed to send
message to next node [msg=TcpDiscoveryStatusCheckMessage
[creatorNode=TcpDiscoveryNode [id=cdec00c4-0ff8-4103-9dc6-335f1d148eef,
addrs=[xx.xx.xxx.IP7, 127.0.0.1],
sockAddrs=[SERVER_IP7/xx.xx.xxx.IP7:47500, /127.0.0.1:47500],
discPort=47500, order=7, intOrder=7, lastExchangeTime=1609709828761,
loc=true, ver=2.7.0#20181201-sha1:256ae401, isClient=false],
failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage
[sndNodeId=null, id=130ab29a671-cdec00c4-0ff8-4103-9dc6-335f1d148eef,
verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null,
isClient=false]], next=ClusterNode
[id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3, order=8, addr=[xx.xx.xxx.IP8,
127.0.0.1], daemon=false]]]
[03:07:18,876][INFO][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi]
New next node [newNext=TcpDiscoveryNode
[id=44f32796-5f72-4153-9b2a-ffe8dfde0947, addrs=[xx.xx.xxx.IP9, 127.0.0.1],
sockAddrs=[/127.0.0.1:47500, /xx.xx.xxx.IP9:47500], discPort=47500,
order=9, intOrder=9, lastExchangeTime=1609146421080, loc=false,
ver=2.7.0#20181201-sha1:256ae401, isClient=false]]
[03:07:18,885][WARNING][tcp-disco-msg-worker-#2%MATCHERWORKER%][TcpDiscoverySpi]
Local node has detected failed nodes and started cluster-wide procedure. To
speed up failure detection please see 'Failure Detection' section under
javadoc for 'TcpDiscoverySpi'
[03:07:18,962][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager]
Node FAILED: TcpDiscoveryNode [id=b4304f38-d28a-4cf7-8ca4-ab50d8189ff3,
addrs=[xx.xx.xxx.IP8, 127.0.0.1], sockAddrs=[/127.0.0.1:47500,
/xx.xx.xxx.IP8:47500], discPort=47500, order=8, intOrder=8,
lastExchangeTime=1609146411895, loc=false,
ver=2.7.0#20181201-sha1:256ae401, isClient=false]
[03:07:18,989][INFO][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager]
Topology snapshot [ver=23, locNode=cdec00c4, servers=15, clients=2,
state=ACTIVE, CPUs=60, offheap=17.0GB, heap=48.0GB]

It's hard to say what caused this issue. Maybe there was indeed a
short-lived network glitch.
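
If it helps when lining up entries from different nodes: the
lastExchangeTime values in these messages look like Unix epoch
milliseconds. That is an assumption based on their magnitude; the log
itself does not say so. A minimal Java sketch (the class and method names
are just for illustration, not part of Ignite) that converts them to UTC:

```java
import java.time.Instant;

// Hypothetical helper, not part of Ignite: converts a lastExchangeTime value
// (assumed to be Unix epoch milliseconds) into a UTC instant.
class ExchangeTime {
    static Instant toUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        // Values copied from the log excerpt above.
        long node7 = 1609709828761L; // creatorNode, order=7
        long node8 = 1609146411895L; // next node, order=8
        System.out.println("node 7: " + toUtc(node7)); // 2021-01-03T21:37:08.761Z
        System.out.println("node 8: " + toUtc(node8)); // 2020-12-28T09:06:51.895Z
    }
}
```

The bracketed log timestamps (such as [03:07:18,822]) are presumably in the
server's local time zone, so an offset has to be applied before comparing
them with these UTC values.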

Regards,
-- 
Ilya Kasnacheev


Fri, Jan 8, 2021 at 09:23, BEELA GAYATRI <beela.gaya...@tcs.com>:

> Hi Ilya,
>
>
>
> Please find attached the logs from all 16 nodes. Node 8 stopped with a
> segmentation issue.
>
>
>
> *From: *Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> *Sent: *Thursday, January 7, 2021 5:02 PM
> *To: *user@ignite.apache.org
> *Subject: *Re: Node Segmentation Error
>
>
> Hello!
>
>
>
> Do you also have logs from other server nodes?
>
>
>
> Here, I don't see anything particularly suspicious. Maybe there indeed
> were some short-term network problems?
>
>
>
> Regards,
>
> --
>
> Ilya Kasnacheev
>
>
>
>
>
> Wed, Jan 6, 2021 at 15:04, BEELA GAYATRI <beela.gaya...@tcs.com>:
>
> Dear Team,
>
>
>
> We are running 16 Ignite nodes, and a few of them are going down with the
> error below. Please let us know the possible reasons and a solution for a
> node being segmented and going down.
>
> *Error:*
>
> *Node is out of topology (probably, due to short-time network problems).*
>
> *[23:16:27,554][WARNING][disco-event-worker-#40%MATCHERWORKER%][GridDiscoveryManager]
> Local node SEGMENTED: TcpDiscoveryNode
> [id=ad4f2ad9-7f42-4863-84e4-03b95c6a9d9d, addrs=[XX.XX.XXX.IP8, 127.0.0.1],
> sockAddrs=[/127.0.0.1:47500,
> SERVER8/XX.XX.XXX.IP8:47500], discPort=47500, order=8, intOrder=8,
> lastExchangeTime=1609868787545, loc=true, ver=2.7.0#20181201-sha1:256ae401,
> isClient=false]*
>
>
>
> Below are the JVM args we are providing to the nodes:
>
> JVMARGS="-Xms3G -Xmx3G -Xss5M -XX:-UseGCOverheadLimit
> -XX:+AlwaysPreTouch
> -XX:+UseG1GC
> -XX:+ScavengeBeforeFullGC
> -XX:+DisableExplicitGC
> -XX:+PrintGCDetails
> -XX:MaxGCPauseMillis=200
> -Xloggc:/path/to/logs/GClog.txt
> -Djava.net.preferIPv4Stack=true  -Dserver --add-exports
> java.base/jdk.internal.misc=ALL-UNNAMED --add-exports
> java.base/sun.nio.ch=ALL-UNNAMED
> --add-exports java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
> --add-exports jdk.internal.jvmstat/sun.jvmstat.monitor=ALL-UNNAMED"
>
>
>
> Please find the log attached.
>
>
>
>
>
