[ https://issues.apache.org/jira/browse/IGNITE-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Rakov reopened IGNITE-9562: -------------------------------- > Destroyed cache that resurrected on an old offline node breaks PME > ------------------------------------------------------------------ > > Key: IGNITE-9562 > URL: https://issues.apache.org/jira/browse/IGNITE-9562 > Project: Ignite > Issue Type: Bug > Components: cache > Affects Versions: 2.5 > Reporter: Pavel Kovalenko > Assignee: Eduard Shangareev > Priority: Critical > Fix For: 2.8, 2.7.6 > > Time Spent: 20m > Remaining Estimate: 0h > > Given: > 2 nodes, persistence enabled. > 1) Stop 1 node > 2) Destroy cache through client > 3) Start stopped node > When the stopped node joins to cluster it starts all caches that it has seen > before stopping. > If that cache was cluster-widely destroyed it leads to breaking the crash > recovery process or PME. > Root cause - we don't start/collect caches from the stopped node on another > part of a cluster. > In case of PARTITIONED cache mode that scenario breaks crash recovery: > {noformat} > java.lang.AssertionError: AffinityTopologyVersion [topVer=-1, minorTopVer=0] > at > org.apache.ignite.internal.processors.affinity.GridAffinityAssignmentCache.cachedAffinity(GridAffinityAssignmentCache.java:696) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.updateLocal(GridDhtPartitionTopologyImpl.java:2449) > at > org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtPartitionTopologyImpl.afterStateRestored(GridDhtPartitionTopologyImpl.java:679) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restorePartitionStates(GridCacheDatabaseSharedManager.java:2445) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.applyLastUpdates(GridCacheDatabaseSharedManager.java:2321) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreState(GridCacheDatabaseSharedManager.java:1568) > at > org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.beforeExchange(GridCacheDatabaseSharedManager.java:1308) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:1255) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:766) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2577) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2457) > at > org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) > at java.lang.Thread.run(Thread.java:748) > {noformat} > In case of REPLICATED cache mode that scenario breaks PME coordinator process: > {noformat} > [2018-09-12 > 18:50:36,407][ERROR][sys-#148%distributed.CacheStopAndRessurectOnOldNodeTest0%][GridCacheIoManager] > Failed to process message [senderId=4b6fd0d4-b756-4a9f-90ca-f0ee25100001, > messageType=class > o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionsSingleMessage] > java.lang.AssertionError: 3080586 > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.clientTopology(GridCachePartitionExchangeManager.java:815) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updatePartitionSingleMap(GridDhtPartitionsExchangeFuture.java:3621) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2439) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:137) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2261) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:2249) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353) > at > org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:2249) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1628) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1100(GridCachePartitionExchangeManager.java:141) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:368) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2999) > at > org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2978) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101) > at > org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295) > at > org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1569) > at > org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1197) > at > org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:127) > at > org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1093) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {noformat} > As one of the solutions - we shouldn't start such caches on resurrected nodes. > We should save caches changes history somewhere and cluster-widely spread it > to joining nodes. > In a case when cache was only stopped, we can do nothing and start it lately > when cache start request received. > In a case when cache was stopped & destroyed, we should clean persistence > data for that cache. -- This message was sent by Atlassian JIRA (v7.6.14#76016)