Ignite 2.5 | Can't restore memory - critical part of WAL archive is missing with walMode=NONE

Emmanuel Marchand Tue, 05 Jun 2018 02:06:02 -0700

Hi,

I'm testing v2.5 vs v2.4 for persisted dataregion with *walModel = NONE*and while performance seems better I failed to restart the cluster afterwhat I think is a proper shutdown (using top -deactivate then kill -kfrom visor).

When I try to reactivate the cluster (using top -activate from visor) Iget the following exception on each nodes :

[09:21:37,592][INFO][grid-nio-worker-tcp-comm-0-#33][TcpCommunicationSpi]Accepted incoming communication connection [locAddr=/192.168.1.1:47100,rmtAddr=/192.168.1.102:44646][09:21:37,656][INFO][pub-#92][GridClusterStateProcessor] Sendingactivate request with BaselineTopology null[09:21:37,659][INFO][tcp-disco-msg-worker-#3][GridClusterStateProcessor]Received activate request with BaselineTopology: null[09:21:37,661][INFO][tcp-disco-msg-worker-#3][GridClusterStateProcessor]Started state transition: true[09:21:37,687][INFO][exchange-worker-#52][time] Started exchange init[topVer=AffinityTopologyVersion [topVer=69, minorTopVer=1], crd=true,evt=DISCOVERY_CUSTOM_EVT, evtNode=0f5d38b7-b748-4861-91ef-204ed9343e60,customEvt=ChangeGlobalStateMessage[id=c0eeccec361-85ace6cb-d27e-4a0e-9106-ca39e6fcbfdd,reqId=5a1cf16e-f610-4b4b-b1eb-76078be38d6c,initiatingNodeId=0f5d38b7-b748-4861-91ef-204ed9343e60, activate=true,baselineTopology=null, forceChangeBaselineTopology=false,timestamp=1528183297656], allowMerge=false][09:21:37,688][INFO][exchange-worker-#52][GridDhtPartitionsExchangeFuture]Start activation process [nodeId=0f5d38b7-b748-4861-91ef-204ed9343e60,client=false, topVer=AffinityTopologyVersion [topVer=69, minorTopVer=1]][09:21:37,688][INFO][exchange-worker-#52][FilePageStoreManager] Resolvedpage store work directory:/usr/share/apache-ignite-fabric-2.5.0-bin/work/db/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19[09:21:37,689][INFO][exchange-worker-#52][FileWriteAheadLogManager]Resolved write ahead log work directory:/usr/share/apache-ignite-fabric-2.5.0-bin/work/db/wal/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19[09:21:37,689][INFO][exchange-worker-#52][FileWriteAheadLogManager]Resolved write ahead log archive directory:/usr/share/apache-ignite-fabric-2.5.0-bin/work/db/wal/archive/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19[09:21:37,690][WARNING][exchange-worker-#52][FileWriteAheadLogManager]Started write-ahead log manager in NONE mode, persisted data may be lostin a case of unexpected node failure. Make sure to deactivate thecluster before shutdown.[09:21:37,701][INFO][exchange-worker-#52][PageMemoryImpl] Started pagememory [memoryAllocated=100.0 MiB, pages=24804, tableSize=1.9 MiB,checkpointBuffer=100.0 MiB][09:21:37,798][INFO][exchange-worker-#52][PageMemoryImpl] Started pagememory [memoryAllocated=8.0 GiB, pages=2032836, tableSize=158.1 MiB,checkpointBuffer=2.0 GiB][09:21:37,800][INFO][exchange-worker-#52][PageMemoryImpl] Started pagememory [memoryAllocated=100.0 MiB, pages=24804, tableSize=1.9 MiB,checkpointBuffer=100.0 MiB][09:21:38,168][INFO][exchange-worker-#52][GridCacheDatabaseSharedManager]Read checkpoint status[startMarker=/usr/share/apache-ignite-fabric-2.5.0-bin/work/db/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19/cp/1528182048551-ea54267c-22c4-4b64-b328-87cc09d3d460-START.bin,endMarker=/usr/share/apache-ignite-fabric-2.5.0-bin/work/db/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19/cp/1528182048551-ea54267c-22c4-4b64-b328-87cc09d3d460-END.bin][09:21:38,169][INFO][exchange-worker-#52][GridCacheDatabaseSharedManager]Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0,len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0],lastCheckpointId=ea54267c-22c4-4b64-b328-87cc09d3d460]*[09:21:38,228][SEVERE][exchange-worker-#52][] Critical system errordetected. Will be handled accordingly to configured handler [hnd=classo.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext[type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException:Restore wal pointer = null, while status.endPtr = FileWALPointer [idx=0,fileOff=0, len=0]. Can't restore memory - critical part of WAL archiveis missing.]]****class org.apache.ignite.internal.pagemem.wal.StorageException: Restorewal pointer = null, while status.endPtr = FileWALPointer [idx=0,fileOff=0, len=0]. Can't restore memory - critical part of WAL archiveis missing.**** atorg.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:759)**** atorg.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:894)**** atorg.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:641)**** atorg.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2419)**** atorg.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2299)**** atorg.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)**

**    at java.lang.Thread.run(Thread.java:748)**

**[09:21:38,229][SEVERE][exchange-worker-#52][] JVM will be haltedimmediately due to the failure: [failureCtx=FailureContext[type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException:Restore wal pointer = null, while status.endPtr = FileWALPointer [idx=0,fileOff=0, len=0]. Can't restore memory - critical part of WAL archiveis missing.]]*


This operation ended successfully with version 2.4.

The exception was introduced by IGNITE-8393<https://github.com/apache/ignite/commit/5ec1796408d45296cbb26bb93353e23e18b039c8>but the problem may lie deeper, like in the checkpoint process whichstore a status.endPtr = FileWALPointer even when walMode = NONE.


Any idea ?


Regards,
---
Emmanuel.

Ignite 2.5 | Can't restore memory - critical part of WAL archive is missing with walMode=NONE

Reply via email to