Hello Denis,

* Yes, we run 2.11.0 against the old 2.8.1 PDS.
* And no, it's not necessarily a compatibility problem, since
corruption was detected both in a cache with old records and in one
without them. But yes, the caches were created by version 2.8.1.
* I also forgot to say that we did yet another test before the long run:
we stopped one Ignite node for about 3 hours to emulate a node failure.
Note that the data corruption was detected afterwards on another Ignite node.

In more detail, the following steps were done:

1. Set up a clean 2.8.1 cluster and load historic data for 18 days into the
1st (of 3) caches we use with expiration. Data was loaded in such a way that
the oldest records start to expire the next day. The two other caches
remain empty.

2. Start the application and let it generate some data into the 1st cache
(ttl=18 days) and into the 2nd one (ttl=1 day).

3. The same day, stop the application, deactivate and stop the cluster,
install Ignite 2.11.0, start the 2.11.0 cluster.

4. Start the application and invoke our custom warm-up procedure, which reads
all records from all caches to put them into memory. So at this point all
data looks good, at least for reading, I suppose.

5. Let the application work for a weekend under load.

6. After the weekend, without stopping the load from the application, stop
one of the Ignite nodes for about 3 hours.

7. Start the Ignite node. No problems were detected in application behaviour.

8. Let the application work under load for 5 days.

9. Stop the application and perform the steps for the Ignite cluster restart
described in the original message. One node crashed in the ttl thread after
the cluster was activated.
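
To make step 1 more concrete, here is a minimal sketch of the arithmetic
only. It assumes that each historic record carries its own creation
timestamp and is given the not-yet-elapsed part of the 18-day TTL. The class
and method names are hypothetical, it uses plain java.time rather than any
Ignite API, and the real loader may well do this differently.

```java
import java.time.Duration;
import java.time.Instant;

public class RemainingTtl {
    /**
     * Remaining TTL for a record created at createdAt, evaluated at now.
     * Returns zero if the record is already older than the full TTL.
     * (Hypothetical helper, for illustration only.)
     */
    static Duration remainingTtl(Instant createdAt, Instant now, Duration fullTtl) {
        Duration age = Duration.between(createdAt, now);
        Duration left = fullTtl.minus(age);
        return left.isNegative() ? Duration.ZERO : left;
    }

    public static void main(String[] args) {
        // Arbitrary "load time" chosen for the example.
        Instant now = Instant.parse("2021-11-01T00:00:00Z");
        Duration ttl = Duration.ofDays(18);

        // Records that are 17, 9 and 0 days old at load time.
        for (int ageDays : new int[] {17, 9, 0}) {
            Instant createdAt = now.minus(Duration.ofDays(ageDays));
            long daysLeft = remainingTtl(createdAt, now, ttl).toDays();
            System.out.println(ageDays + " days old -> expires in " + daysLeft + " day(s)");
        }
    }
}
```

With this arithmetic a record that is 17 days old at load time has one day
left, which matches "the oldest records start to expire the next day".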

Hope this helps or leads to some other thoughts.
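
In case it helps when reading the page locks dump quoted below: the
pageId, pageIdHex, partId, pageIdx and flags values printed there appear
to be related by simple bit slicing. The sketch below assumes a layout of
8 bits of flags, 16 bits of partition id and 32 bits of page index. This
layout is inferred only from the values in the dump, not taken from the
Ignite sources, and the class name is hypothetical.

```java
public class PageIdDecode {
    /** Low 32 bits: page index (assumed layout). */
    static long pageIndex(long pageId) {
        return pageId & 0xFFFFFFFFL;
    }

    /** Next 16 bits: partition id (assumed layout). */
    static int partId(long pageId) {
        return (int) ((pageId >>> 32) & 0xFFFF);
    }

    /** Next 8 bits: flags (assumed layout). */
    static int flags(long pageId) {
        return (int) ((pageId >>> 48) & 0xFF);
    }

    /** Formats the fields in the same order as the dump lines. */
    static String describe(long pageId) {
        return String.format("pageIdHex=%016x, partId=%d, pageIdx=%d, flags=%08x",
            pageId, partId(pageId), pageIndex(pageId), flags(pageId));
    }

    public static void main(String[] args) {
        // Two page ids copied from the dump.
        System.out.println(describe(281474976790149L));
        System.out.println(describe(844420635168174L));
    }
}
```

For the two page ids above this reproduces the fields shown in the dump
(partId=0, pageIdx=79493 and partId=65535, pageIdx=3502).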

Thanks,

--

  Sergey


On 24.11.2021 2:10, Denis Chudov wrote:
> Hi Sergey!
>
> Thank you for providing details.
> Have I understood correctly that you run a newer version of Ignite on
> older persistent files? Is there any possibility that some data in
> your caches survived those 5 days of cluster work?
> I'm just trying to exclude any compatibility problems,
> like https://issues.apache.org/jira/browse/IGNITE-14252
>
> Denis Chudov
> Software Engineer, Moscow
> +7 905 5775239
> https://www.gridgain.com <https://www.gridgain.com/>
> Powered by Apache® Ignite™
>
>
> On Tue, Nov 23, 2021 at 8:32 AM Sergey Korotkov
> <serge.korot...@gmail.com <mailto:serge.korot...@gmail.com>> wrote:
>
>     Hello Denis,
>
>     Yes, as I said in the original message we do use the expiration on
>     persistent caches.
>
>     The corruptedPages_2021-11-08_11-03-21_999.txt and
>     corruptedPages_2021-11-09_12-43-12_449.txt files were generated by
>     Ignite on crash. They show that two different caches were
>     affected. The first one during the expiration and the second (next
>     day) during rebalance eviction.  Both caches are persistent and
>     use the expiration.
>
>     I also ran the diagnostic utility (IgniteWalConverter) the way it
>     is recommended in the error message (output attached as diag-*
>     files).
>
>     Is there any useful information in these diag-* files which can
>     help to understand what was corrupted in particular, and how?
>
>     ***
>
>     Generally this was a test run of the new 2.11.0 version in a test
>     environment. The goal was to check whether the new version works fine
>     with our application and can also be safely stopped/started for
>     maintenance. We do that because we ran into a similar problem with
>     'B+Tree is corrupted' in production during rebalance eviction
>     (with 2.8.1).  We see two similar issues fixed in 2.11.0:
>     https://issues.apache.org/jira/browse/IGNITE-12489 and
>     https://issues.apache.org/jira/browse/IGNITE-14093 and are
>     considering an upgrade if it would help.  By the way, the fix for
>     IGNITE-12489 (https://github.com/apache/ignite/pull/8358/commits)
>     contains a lot of changes with several attempts. Maybe it just does
>     not fix all situations?
>
>     Before the deactivation the cluster worked fine under our usual load
>     for about 5 days. The load is about 300 requests per second, each
>     consisting of several reads and a single write to caches with
>     expiration turned on.  After that we stopped / started the cluster to
>     emulate the situation we had in production with 2.8.1 (the load from
>     our application was stopped as well before the deactivation request).
>
>     ***
>
>     Cache configurations. The first one has an affinity key and an
>     interceptor:
>
>       public CacheConfiguration<ContactHistoryKey, ContactHistory> getEdenContactHistoryCacheConfiguration() {
>         CacheConfiguration<ContactHistoryKey, ContactHistory> cacheConfiguration = new CacheConfiguration<>();
>         cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
>         cacheConfiguration.setAffinity(new RendezvousAffinityFunction(false, 1024));
>         cacheConfiguration.setBackups(1);
>         cacheConfiguration.setAtomicityMode(CacheAtomicityMode.ATOMIC);
>         int expirationDays = appConfig.getContactHistoryEdenExpirationDays();
>         cacheConfiguration.setExpiryPolicyFactory(
>             CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, expirationDays)));
>         cacheConfiguration.setInterceptor(new ContactHistoryInterceptor());
>         return cacheConfiguration;
>       }
>
>     public class ContactHistoryKey {
>       String sOfferId;
>
>       @AffinityKeyMapped
>       String subsIdAffinityKey;
>     }
>
>       CacheConfiguration<String, String> getChannelOfferIdCache() {
>         CacheConfiguration<String, String> cacheConfiguration = new CacheConfiguration<>();
>         cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
>         cacheConfiguration.setAffinity(new RendezvousAffinityFunction(false, 1024));
>         cacheConfiguration.setBackups(1);
>         cacheConfiguration.setAtomicityMode(CacheAtomicityMode.ATOMIC);
>         int expirationDays = appConfig.getChannelOfferIdCacheExpirationDays();
>         cacheConfiguration.setExpiryPolicyFactory(
>             CreatedExpiryPolicy.factoryOf(new Duration(TimeUnit.DAYS, expirationDays)));
>         return cacheConfiguration;
>       }
>
>     ***
>
>     As for the other details, I am not sure whether they are relevant.
>     Deactivation was relatively long and the log contains a lot of
>     warnings between 2021-11-08 10:54:44 and 2021-11-08 10:59:33.
>     Also there was a page locks dump at 10:56:47,567.  A lot of locks
>     were logged for a cache in the Batch_Campaigns_Region (a region
>     without persistence).
>
>     [2021-11-08
>     10:56:47,560][WARN][page-lock-tracker-timeout][CacheDiagnosticManager]{}
>     Threads hanged: [(sys-#99087-105322, WAITING)]
>     [2021-11-08
>     10:56:47,567][WARN][page-lock-tracker-timeout][CacheDiagnosticManager]{}
>     Page locks dump:
>
>     ................
>
>     Thread=[name=sys-#99087, id=105322], state=WAITING
>     Log overflow, size:512, headIdx=512 [structureId=7,
>     pageIdpageId=281474976790149 [pageIdHex=0001000000013685,
>     partId=0, pageIdx=79493, flags=00000001]]
>     Locked pages = [844420635168174[0002ffff00000dae](r=0|w=1),844420635168175[0002ffff00000daf](r=0|w=1)]
>     Locked pages log: name=sys-#99087 time=(1636358127247, 2021-11-08
>     07:55:27.247)
>     L=1 -> Write lock pageId=844420635168174,
>     structureId=batch_campaign_results-p-0##CacheData
>     [pageIdHex=0002ffff00000dae, partId=65535, pageIdx=3502,
>     flags=00000002]
>     L=2 -> Write lock pageId=844420635168175,
>     structureId=batch_campaign_results-p-0##CacheData
>     [pageIdHex=0002ffff00000daf, partId=65535, pageIdx=3503,
>     flags=00000002]
>     L=3 -> Write lock pageId=281474976716639,
>     structureId=Batch_Campaigns_Region##FreeList
>     [pageIdHex=000100000000175f, partId=0, pageIdx=5983, flags=00000001]
>     L=4 -> Write lock pageId=844420635243234,
>     structureId=Batch_Campaigns_Region##FreeList
>     [pageIdHex=0002ffff000132e2, partId=65535, pageIdx=78562,
>     flags=00000002]
>
>     a lot more ...
>
>     ......................
>
>     Could this mean that something was not checkpointed to disk
>     correctly?
>
>     Thanks,
>
>     -- 
>
>       Sergey
>
>
>     On 22.11.2021 23:18, Denis Chudov wrote:
>>     Hi Sergey!
>>
>>     It seems that the root cause of your problem is a possible
>>     corruption of a data page
>>
>>     Caused by: java.lang.IllegalStateException: Item not found: 3
>>         at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:476)
>>
>>     and it happened twice, not only during expiration cleanup, but also
>>     during rebalance eviction. But your scenario looks quite usual
>>     and shouldn't cause such problems.
>>
>>     I've found a similar existing
>>     ticket https://issues.apache.org/jira/browse/IGNITE-11400
>>     but the assignee couldn't reproduce the error.
>>
>>     Could you please provide more information about what had happened
>>     on your cluster before and about cache configuration? Do you use
>>     expiration on persistent caches? Any details can be useful for
>>     the problem localization.
>>
>>     Denis Chudov
>>     Software Engineer, Moscow
>>     +7 905 5775239
>>     https://www.gridgain.com <https://www.gridgain.com/>
>>     Powered by Apache® Ignite™
>>
>>
>>     On Thu, Nov 18, 2021 at 2:25 PM Sergey Korotkov
>>     <serge.korot...@gmail.com <mailto:serge.korot...@gmail.com>> wrote:
>>
>>         Hello,
>>
>>         We have trouble with the CorruptedTreeException: B+Tree is
>>         corrupted during node start after a cluster restart. It looks
>>         like the caches with the Expiry Policy configured are the
>>         source of the problems.
>>
>>         I have attached the log from the problem node.  The exact
>>         steps with timestamps are as follows.  Before the deactivation
>>         the cluster worked fine for about 5 days.
>>
>>         2021-11-08 10:54:44         cluster deactivate request  
>>
>>         2021-11-08 10:59:33         cluster deactivated
>>
>>         2021-11-08 11:02:30         stop all nodes
>>
>>         2021-11-08 11:02:39         start all nodes
>>
>>         2021-11-08 11:03:14         auto-activation start
>>
>>         2021-11-08 11:03:16         cluster activated
>>
>>         2021-11-08 11:03:21         'B+Tree is corrupted' exception in
>>         GridCacheTtlManager.expire() on one of the nodes (see the
>>         10.12.86.29-ignite-2021-11-08.0.log):
>>
>>         [2021-11-08 11:03:21,820][ERROR][ttl-cleanup-worker-#215][ROOT]{}
>>         Critical system error detected. Will be handled accordingly to
>>         configured handler [hnd=StopNodeOrHaltFailureHandler
>>         [tryStop=false, timeout=0, super=AbstractFailureHandler
>>         [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
>>         SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
>>         [type=CRITICAL_ERROR, err=class
>>         o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException:
>>         B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime
>>         failure on bounds: [lower=null, upper=PendingRow []]]]]
>>         org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=PendingRow []]]
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6139)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1100)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1095)
>>                 at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:3076)
>>                 at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:3023)
>>                 at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1255)
>>                 at org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:246)
>>                 at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.lambda$body$0(GridCacheSharedTtlCleanupManager.java:193)
>>                 at java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1769)
>>                 at org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:192)
>>                 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>                 at java.lang.Thread.run(Thread.java:748)
>>         Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 3
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118)
>>                 ... 11 common frames omitted
>>         Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 3
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:123)
>>                 at org.apache.ignite.internal.processors.cache.tree.PendingRow.initKey(PendingRow.java:73)
>>                 at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:127)
>>                 at org.apache.ignite.internal.processors.cache.tree.PendingEntriesTree.getRow(PendingEntriesTree.java:32)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5820)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5586)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.init(BPlusTree.java:5512)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1068)
>>                 ... 12 common frames omitted
>>         Caused by: java.lang.IllegalStateException: Item not found: 3
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:476)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:584)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:626)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316)
>>                 ... 22 common frames omitted
>>
>>
>>         The error message suggests running
>>         org.apache.ignite.development.utils.IgniteWalConverter to
>>         diagnose the problem.  I attached the output of this utility:
>>
>>          - corruptedPages_2021-11-08_11-03-21_999.txt - file created
>>         by Ignite on crash
>>
>>          - diag-2021-11-08.txt - output of the diagnostic utility.
>>
>>
>>         The next day we tried to start this node again and it still
>>         failed with 'B+Tree is corrupted', but in a different place:
>>
>>         2021-11-09 12:37:34     'B+Tree is corrupted' exception in
>>         PartitionsEvictManager$PartitionEvictionTask.run()  (see the 
>>         10.12.86.29-ignite-2021-11-09.0.log)
>>
>>
>>         [2021-11-09 12:43:10,857][ERROR][rebalance-#344][ROOT]{}
>>         Critical system error detected. Will be handled accordingly to
>>         configured handler [hnd=StopNodeOrHaltFailureHandler
>>         [tryStop=false, timeout=0, super=AbstractFailureHandler
>>         [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED,
>>         SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext
>>         [type=CRITICAL_ERROR, err=class
>>         o.a.i.i.processors.cache.persistence.tree.CorruptedTreeException:
>>         B+Tree is corrupted [pages(groupId, pageId)=[],
>>         msg=Runtime failure on bounds: [lower=null, upper=null]]]]
>>         org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException: B+Tree is corrupted [pages(groupId, pageId)=[], msg=Runtime failure on bounds: [lower=null, upper=null]]
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.corruptedTreeException(BPlusTree.java:6139)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1133)
>>                 at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.find(CacheDataTree.java:167)
>>                 at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.find(CacheDataTree.java:63)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1100)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1095)
>>                 at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.cursor(IgniteCacheOffheapManagerImpl.java:2914)
>>                 at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.cursor(GridCacheOffheapManager.java:2856)
>>                 at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$3.onHasNext(IgniteCacheOffheapManagerImpl.java:938)
>>                 at org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
>>                 at org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
>>                 at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:987)
>>                 at org.apache.ignite.internal.processors.cache.distributed.dht.topology.PartitionsEvictManager$PartitionEvictionTask.run(PartitionsEvictManager.java:409)
>>                 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>                 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>                 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>                 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>                 at java.lang.Thread.run(Thread.java:748)
>>         Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 16
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1079)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.find(BPlusTree.java:1118)
>>                 ... 16 common frames omitted
>>         Caused by: org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTreeRuntimeException: java.lang.IllegalStateException: Item not found: 16
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:345)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:165)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:136)
>>                 at org.apache.ignite.internal.processors.cache.tree.DataRow.<init>(DataRow.java:55)
>>                 at org.apache.ignite.internal.processors.cache.tree.CacheDataRowStore.dataRow(CacheDataRowStore.java:129)
>>                 at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:422)
>>                 at org.apache.ignite.internal.processors.cache.tree.CacheDataTree.getRow(CacheDataTree.java:63)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$ForwardCursor.fillFromBuffer0(BPlusTree.java:5820)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.fillFromBuffer(BPlusTree.java:5586)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$AbstractForwardCursor.init(BPlusTree.java:5512)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findLowerUnbounded(BPlusTree.java:1068)
>>                 ... 17 common frames omitted
>>         Caused by: java.lang.IllegalStateException: Item not found: 16
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.findIndirectItemIndex(AbstractDataPageIO.java:476)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.getDataOffset(AbstractDataPageIO.java:584)
>>                 at org.apache.ignite.internal.processors.cache.persistence.tree.io.AbstractDataPageIO.readPayload(AbstractDataPageIO.java:626)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.readIncomplete(CacheDataRowAdapter.java:380)
>>                 at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.doInitFromLink(CacheDataRowAdapter.java:316)
>>
>>
>>         I also attached the corrupted pages file and diagnostic
>>         output for this case as well:
>>
>>         - corruptedPages_2021-11-09_12-43-12_449.txt
>>
>>         - diag-2021-11-09.txt
>>
>>
>>         In both cases the pages belong to caches with the Expiry Policy
>>         configured.
>>
>>
>>         ****
>>
>>         What can be done about that?  Is there any recommended way to
>>         stop/start an Ignite cluster to prevent such data loss problems?
>>
>>         ****
>>
>>         I see some similar fixed issues in Jira, like
>>         https://issues.apache.org/jira/browse/IGNITE-12489 or
>>         https://issues.apache.org/jira/browse/IGNITE-14093, but it
>>         looks like something is still not working in 2.11.0.
>>
>>
>>         Thanks,
>>
>>         -- 
>>
>>           Sergey
>>
>>
>>
