Hi Emmanuel,

Sorry for the late answer. I checked your test against the master branch and it passes, but it fails on the 2.5 branch. It looks like the issue is already fixed.
On Wed, Jun 6, 2018 at 3:47 PM, Emmanuel Marchand <emmanuel.march...@exensa.com> wrote:

> I was wrong about the introduction of the exception. I guess it was added by
> a fix for IGNITE-8066
> <https://github.com/apache/ignite/commit/55c50a1c4309246e9cd254ed1f25b41b89f9a959#diff-cacbb66ea1331a1869d9da2bcd039ef1>.
>
> Regards,
> ---
> Emmanuel.
>
> On 05/06/18 11:05, Emmanuel Marchand wrote:
>
> Hi,
>
> I'm testing v2.5 vs v2.4 for a persisted data region with *walMode = NONE*,
> and while performance seems better, I failed to restart the cluster after
> what I think is a proper shutdown (using top -deactivate then kill -k from
> visor).
>
> When I try to reactivate the cluster (using top -activate from visor), I
> get the following exception on each node:
>
> [09:21:37,592][INFO][grid-nio-worker-tcp-comm-0-#33][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/192.168.1.1:47100, rmtAddr=/192.168.1.102:44646]
> [09:21:37,656][INFO][pub-#92][GridClusterStateProcessor] Sending activate request with BaselineTopology null
> [09:21:37,659][INFO][tcp-disco-msg-worker-#3][GridClusterStateProcessor] Received activate request with BaselineTopology: null
> [09:21:37,661][INFO][tcp-disco-msg-worker-#3][GridClusterStateProcessor] Started state transition: true
> [09:21:37,687][INFO][exchange-worker-#52][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=69, minorTopVer=1], crd=true, evt=DISCOVERY_CUSTOM_EVT, evtNode=0f5d38b7-b748-4861-91ef-204ed9343e60, customEvt=ChangeGlobalStateMessage [id=c0eeccec361-85ace6cb-d27e-4a0e-9106-ca39e6fcbfdd, reqId=5a1cf16e-f610-4b4b-b1eb-76078be38d6c, initiatingNodeId=0f5d38b7-b748-4861-91ef-204ed9343e60, activate=true, baselineTopology=null, forceChangeBaselineTopology=false, timestamp=1528183297656], allowMerge=false]
> [09:21:37,688][INFO][exchange-worker-#52][GridDhtPartitionsExchangeFuture] Start activation process [nodeId=0f5d38b7-b748-4861-91ef-204ed9343e60, client=false, topVer=AffinityTopologyVersion [topVer=69, minorTopVer=1]]
> [09:21:37,688][INFO][exchange-worker-#52][FilePageStoreManager] Resolved page store work directory: /usr/share/apache-ignite-fabric-2.5.0-bin/work/db/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19
> [09:21:37,689][INFO][exchange-worker-#52][FileWriteAheadLogManager] Resolved write ahead log work directory: /usr/share/apache-ignite-fabric-2.5.0-bin/work/db/wal/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19
> [09:21:37,689][INFO][exchange-worker-#52][FileWriteAheadLogManager] Resolved write ahead log archive directory: /usr/share/apache-ignite-fabric-2.5.0-bin/work/db/wal/archive/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19
> [09:21:37,690][WARNING][exchange-worker-#52][FileWriteAheadLogManager] Started write-ahead log manager in NONE mode, persisted data may be lost in a case of unexpected node failure. Make sure to deactivate the cluster before shutdown.
> [09:21:37,701][INFO][exchange-worker-#52][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24804, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
> [09:21:37,798][INFO][exchange-worker-#52][PageMemoryImpl] Started page memory [memoryAllocated=8.0 GiB, pages=2032836, tableSize=158.1 MiB, checkpointBuffer=2.0 GiB]
> [09:21:37,800][INFO][exchange-worker-#52][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24804, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
> [09:21:38,168][INFO][exchange-worker-#52][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/usr/share/apache-ignite-fabric-2.5.0-bin/work/db/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19/cp/1528182048551-ea54267c-22c4-4b64-b328-87cc09d3d460-START.bin, endMarker=/usr/share/apache-ignite-fabric-2.5.0-bin/work/db/node00-bcfb4de5-5fc6-41e9-9ebd-90b873711c19/cp/1528182048551-ea54267c-22c4-4b64-b328-87cc09d3d460-END.bin]
> [09:21:38,169][INFO][exchange-worker-#52][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0, len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=ea54267c-22c4-4b64-b328-87cc09d3d460]
> [09:21:38,228][SEVERE][exchange-worker-#52][] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException: Restore wal pointer = null, while status.endPtr = FileWALPointer [idx=0, fileOff=0, len=0]. Can't restore memory - critical part of WAL archive is missing.]]
> class org.apache.ignite.internal.pagemem.wal.StorageException: Restore wal pointer = null, while status.endPtr = FileWALPointer [idx=0, fileOff=0, len=0]. Can't restore memory - critical part of WAL archive is missing.
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:759)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:894)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:641)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2419)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2299)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:748)
> [09:21:38,229][SEVERE][exchange-worker-#52][] JVM will be halted immediately due to the failure: [failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.pagemem.wal.StorageException: Restore wal pointer = null, while status.endPtr = FileWALPointer [idx=0, fileOff=0, len=0]. Can't restore memory - critical part of WAL archive is missing.]]
>
> This operation ended successfully with version 2.4.
>
> The exception was introduced by IGNITE-8393
> <https://github.com/apache/ignite/commit/5ec1796408d45296cbb26bb93353e23e18b039c8>
> but the problem may lie deeper, for example in the checkpoint process, which
> stores a status.endPtr = FileWALPointer even when walMode = NONE.
>
> Any idea?
>
> Regards,
> ---
> Emmanuel.

--
Best regards,
Andrey V. Mashenkov