Mitchell,

Can you provide the full log and the cache configuration?
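
While you gather those, the pages(groupId, pageId) tuple in the exception can be unpacked by hand to see which page the lookup tripped over. Below is a minimal sketch, assuming the 64-bit page ID layout that I believe Ignite's internal PageIdUtils uses: the low 32 bits are the page index, the next 16 bits the partition ID, and the next 8 bits a flag (1 for data pages, 2 for index pages). The class and method names are hypothetical, not part of Ignite, and the layout should be checked against the sources for your version.

```java
/**
 * Hypothetical helper, not part of Ignite.
 * ASSUMPTION: page ID layout is [flag: 8 bits][partition: 16 bits][page index: 32 bits],
 * counted from bit 48 downwards; verify against PageIdUtils for your Ignite version.
 */
final class PageIdDecoder {
    private PageIdDecoder() {
    }

    /** Low 32 bits: index of the page within its partition file. */
    static long pageIndex(long pageId) {
        return pageId & 0xFFFFFFFFL;
    }

    /** Bits 32..47: partition ID. */
    static int partitionId(long pageId) {
        return (int) ((pageId >>> 32) & 0xFFFF);
    }

    /** Bits 48..55: page type flag (assumed: 1 = data, 2 = index). */
    static int flag(long pageId) {
        return (int) ((pageId >>> 48) & 0xFF);
    }

    /** Formats the three decoded fields on one line. */
    static String describe(long pageId) {
        return "flag=" + flag(pageId)
            + " partition=" + partitionId(pageId)
            + " pageIndex=" + pageIndex(pageId);
    }

    public static void main(String[] args) {
        // val2 from the CorruptedTreeException in the quoted log.
        System.out.println(describe(281474976721835L));
    }
}
```

For the val2=281474976721835 in your log this yields flag=1, partition=0, pageIndex=11179, which under the assumed layout would point at a data page in partition 0.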

On Thu, 25 Feb 2021 at 03:55, Mitchell Rathbun (BLOOMBERG/ 731 LEX)
<mrathb...@bloomberg.net> wrote:
>
> Any other thoughts on this? The data corruption occurred when we were using
> version 2.7.5. I have looked at a couple of tickets involving corrupted
> trees, but none of them seem to apply to our use of Ignite. We would like to
> understand at least how we get into this corrupted state in the first place,
> and how to handle it when it happens. Is there a way to detect and log this
> error while avoiding crashing the process?
>
> From: user@ignite.apache.org At: 02/19/21 14:18:44
> To: Mitchell Rathbun (BLOOMBERG/ 731 LEX), user@ignite.apache.org
> Subject: Re: Corrupted B+ Tree Causing Repeated Crashes
>
> Hello! What version of Apache Ignite are you using?
>
> 19.02.2021, 22:07, "Mitchell Rathbun (BLOOMBERG/ 731 LEX)"
> <mrathb...@bloomberg.net>:
> > We are encountering the following error repeatedly, which causes our node to crash:
> >
> > 2021-02-19 13:30:38,175 ERROR STDIO [pool-32-thread-5] {} Feb 19, 2021 1:30:38 PM org.apache.ignite.logger.java.JavaLogger error
> > SEVERE: Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-128547534, val2=281474976721835]], msg=Runtime failure on lookup row: SearchRow [key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey [idHash=1436767547, hash=-931214342, accountCusip=com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip [idHash=316813954, hash=343304888, accountId=0, cusip=com.bloomberg.aim.wingman.common.dto.Cusip [idHash=1325824124, hash=2123451959, cusip1=136125, cusip2=9001, cusip3=541401120, dept=2, subflag=2]]], hash=-931214342, cacheId=0]]]]
> > class org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[IgniteBiTuple [val1=-128547534, val2=281474976721835]], msg=Runtime failure on lookup row: SearchRow [key=com.bloomberg.aim.wingman.cachemgr.Ts3DataCache$Ts3SecurityCacheKey [idHash=1436767547, hash=-931214342, accountCusip=com.bloomberg.aim.wingman.common.dto.submgr.AccountCusip [idHash=316813954, hash=343304888, accountId=0, cusip=com.bloomberg.aim.wingman.common.dto.Cusip [idHash=1325824124, hash=2123451959, cusip1=136125, cusip2=9001, cusip3=541401120, dept=2, subflag=2]]], hash=-931214342, cacheId=0]]
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6106)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1367)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1344)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.find(IgniteCacheOffheapManagerImpl.java:2755)
> >     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.find(GridCacheOffheapManager.java:2469)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.read(IgniteCacheOffheapManagerImpl.java:637)
> >     at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.getAllInternal(GridLocalAtomicCache.java:410)
> >     at org.apache.ignite.internal.processors.cache.local.atomic.GridLocalAtomicCache.getAll(GridLocalAtomicCache.java:323)
> >     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGetAll(GridCacheAdapter.java:4907)
> >     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAll(GridCacheAdapter.java:1617)
> >     at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAll(IgniteCacheProxyImpl.java:1157)
> >     at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAll(GatewayProtectedCacheProxy.java:724)
> >     at com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.fetchCalcrtDataByKeySync(Ts3DataCache.java:1535)
> >     at com.bloomberg.aim.wingman.cachemgr.Ts3DataCache.lambda$fetchCalcrtDataBySecurityKeyAccountAsync$11(Ts3DataCache.java:895)
> >     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> >     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> >     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> >     at java.base/java.lang.Thread.run(Thread.java:834)
> > Caused by: java.lang.IllegalStateException: Item not found: 1
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:351)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:459)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:501)
> >     at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compareKeys(CacheDataTree.java:447)
> >     at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:386)
> >     at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.compare(CacheDataTree.java:63)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:5377)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:5297)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1100(BPlusTree.java:98)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:302)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5888)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run(BPlusTree.java:282)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5874)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:169)
> >     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:364)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.read(BPlusTree.java:6075)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1424)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1433)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1433)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doFind(BPlusTree.java:1391)
> >     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1359)
> >     ... 16 more
> > 2021-02-19 13:30:38,177 ERROR STDIO [pool-32-thread-5] {} Feb 19, 2021 1:30:38 PM org.apache.ignite.logger.java.JavaLogger error
> > SEVERE: A critical problem with persistence data structures was detected. Please make backup of persistence storage and WAL files for further analysis. Persistence storage path: null WAL path: db/wal WAL archive path: db/wal/archive
> >
> > I think we can fix this by just clearing the persistent storage and
> > restarting our node, but we can't have this happen in production, so I
> > want to understand two things:
> >
> > 1. How can this happen?
> >
> > 2. How can we prevent this from happening, and how should we best respond
> > when it does happen? We don't want our process to crash as a result of
> > this; we would rather just invalidate the cache and clear it if at all
> > possible.
>
>
