Hi Ignite team,
Could you please advise whether there is anything we can check regarding the issue below?
Thanks.
Regards,
Marcus
From: Lo, Marcus [ICG-IT]
Sent: Wednesday, March 30, 2022 11:55 AM
To: user
Subject: Node crashed with error "Getting affinity for too old topology version
that is already out of history"
Hi,
We are using Ignite 2.10.0 with 5 nodes in the cluster (consistentId/hostname:
lrdeqprmap01p, lrdeqprmap02p, lrdeqprmap03p, lcgeqprmap03c, lcgeqprmap04c).
At one point, 2 of the nodes (lcgeqprmap03c, lcgeqprmap04c) went down due to a
power outage. Shortly afterwards, another node, lrdeqprmap03p, shut down with
the following error:
2022-03-29 14:32:01.996+0100 ERROR [query-#194160%Prism%]
: Critical system error detected. Will be handled
accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler
[tryStop=false, timeout=0, super=AbstractFailureHandler
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class
o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is
corrupted [pages(groupId, pageId)=[], cacheId=388652627,
cacheName=LIMIT_DASHBOARD_SNAPSHOT, indexName=_key_PK, msg=Runtime failure on
bounds: [lower=null, upper=null]]]]
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
B+Tree is corrupted [pages(groupId, pageId)=[], cacheId=388652627,
cacheName=LIMIT_DASHBOARD_SNAPSHOT, indexName=_key_PK, msg=Runtime failure on
bounds: [lower=null, upper=null]]
at org.apache.ignite.internal.processors.query.h2.database.H2Tree.corruptedTreeException(H2Tree.java:977) ~[ignite-indexing-2.10.0.jar:2.10.0]
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133) ~[ignite-core-2.10.0.jar:2.10.0]
at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.find(H2TreeIndex.java:415) ~[ignite-indexing-2.10.0.jar:2.10.0]
...
Caused by:
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException:
java.lang.IllegalStateException: Getting affinity for too old topology version
that is already out of history [locNode=TcpDiscoveryNode
[id=e21d561d-314a-4240-a379-23f139870717, consistentId=lrdeqprmap03p,
addrs=ArrayList [127.0.0.1, 169.182.110.133], sockAddrs=HashSet
[/127.0.0.1:47500, lrdeqprmap03p.eur.nsroot.net/169.182.110.133:47500],
discPort=47500, order=7, intOrder=7, lastExchangeTime=1648560721652, loc=true,
ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false],
grp=LimitDashboardSnapshotCache, topVer=AffinityTopologyVersion [topVer=17,
minorTopVer=28], lastAffChangeTopVer=AffinityTopologyVersion [topVer=17,
minorTopVer=28], head=AffinityTopologyVersion [topVer=19, minorTopVer=0],
history=[AffinityTopologyVersion [topVer=18, minorTopVer=0],
AffinityTopologyVersion [topVer=19, minorTopVer=0]]]
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079) ~[ignite-core-2.10.0.jar:2.10.0]
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118) ~[ignite-core-2.10.0.jar:2.10.0]
... 23 more
Caused by: java.lang.IllegalStateException: Getting affinity for too old
topology version that is already out of history [locNode=TcpDiscoveryNode
[id=e21d561d-314a-4240-a379-23f139870717, consistentId=lrdeqprmap03p,
addrs=ArrayList [127.0.0.1, 169.182.110.133], sockAddrs=HashSet
[/127.0.0.1:47500, lrdeqprmap03p.eur.nsroot.net/169.182.110.133:47500],
discPort=47500, order=7, intOrder=7, lastExchangeTime=1648560721652, loc=true,
ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false],
grp=LimitDashboardSnapshotCache, topVer=AffinityTopologyVersion [topVer=17,
minorTopVer=28], lastAffChangeTopVer=AffinityTopologyVersion [topVer=17,
minorTopVer=28], head=AffinityTopologyVersion [topVer=19, minorTopVer=0],
history=[AffinityTopologyVersion [topVer=18, minorTopVer=0],
AffinityTopologyVersion [topVer=19, minorTopVer=0]]]
at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:831) ~[ignite-core-2.10.0.jar:2.10.0]
at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:778) ~[ignite-core-2.10.0.jar:2.10.0]
at org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.nodes(GridAffinityAssignmentCache.java:686) ~[ignite-core-2.10.0.jar:2.10.0]
...
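As a reading aid for the message above: the query asked for affinity at topVer=17, minorTopVer=28, while the retained history only starts at topVer=18, minorTopVer=0. The sketch below models that comparison in a simplified form. It is not Ignite's actual code; the class AffinityHistorySketch, the TopVer type, and the isOutOfHistory method are hypothetical names introduced only for illustration, and the history is assumed to be sorted from oldest to newest.

```java
import java.util.Arrays;
import java.util.List;

public class AffinityHistorySketch {
    /** Hypothetical stand-in for a (topVer, minorTopVer) pair as printed in the log. */
    static final class TopVer implements Comparable<TopVer> {
        final long topVer;
        final int minorTopVer;

        TopVer(long topVer, int minorTopVer) {
            this.topVer = topVer;
            this.minorTopVer = minorTopVer;
        }

        /** Orders by major version first, then by minor version. */
        @Override
        public int compareTo(TopVer other) {
            int cmp = Long.compare(topVer, other.topVer);
            return cmp != 0 ? cmp : Integer.compare(minorTopVer, other.minorTopVer);
        }
    }

    /**
     * Returns true when the requested version is older than the oldest
     * retained entry. Assumes history is sorted from oldest to newest.
     */
    static boolean isOutOfHistory(TopVer requested, List<TopVer> history) {
        return !history.isEmpty() && requested.compareTo(history.get(0)) < 0;
    }

    public static void main(String[] args) {
        // Values copied from the exception message in the log above.
        List<TopVer> history = Arrays.asList(new TopVer(18, 0), new TopVer(19, 0));

        System.out.println(isOutOfHistory(new TopVer(17, 28), history)); // prints "true"
        System.out.println(isOutOfHistory(new TopVer(19, 0), history));  // prints "false"
    }
}
```

With the values from the log, the requested version 17.28 falls before the oldest retained entry 18.0, which matches the "out of history" wording of the exception.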
I would expect Ignite to be fault tolerant, so an outage of some nodes should
not cause other nodes to shut down. Can you please advise? I have attached the
full log for your reference. Thanks.
Regards,
Marcus