Hello! If your node did not crash during a checkpoint, completely removing WAL files and WAL markers should set you back to some stable state.
If it did crash during a checkpoint, you can't start with a corrupted WAL.

Regards,
--
Ilya Kasnacheev

Wed, Jun 12, 2019 at 17:37, Kamlesh Joshi <kamlesh.jo...@ril.com>:

> Hi Denis,
>
> Only the WAL disk is corrupted, but the datastore is intact. Is there any way we can restore the cluster? Some data loss is fine. Any suggestions on this?
>
> *Thanks and Regards,*
> *Kamlesh Joshi*
>
> *From:* Kamlesh Joshi
> *Sent:* Thursday, June 6, 2019 7:52 PM
> *To:* user@ignite.apache.org
> *Subject:* RE: [External]Re: Is there any way to force recover the cluster - copying running cluster datastore
>
> Thanks for the update Denis.
>
> If one of the WAL disks fails, is there any way to start or recover the cluster forcefully?
>
> *Thanks and Regards,*
> *Kamlesh Joshi*
>
> *From:* Denis Magda <dma...@apache.org>
> *Sent:* Thursday, June 6, 2019 4:44 PM
> *To:* user@ignite.apache.org
> *Subject:* [External]Re: Is there any way to force recover the cluster - copying running cluster datastore
>
> I would discourage you from doing this if data consistency is important to you. What you see on the disk of one cluster node might be inconsistent with the whole cluster state and the actual/last updates in memory. Snapshots and backups can solve your task. Google for a solution provided by GridGain.
>
> -
> Denis
>
> On Wed, Jun 5, 2019 at 8:27 AM Kamlesh Joshi <kamlesh.jo...@ril.com> wrote:
>
> Hi Team,
>
> We are trying to start another Ignite cluster by taking a copy of the running cluster's datastore (the source cluster's datastore is being modified in parallel). When we try to start the server node with the copied datastore, it gives the error below.
> Also, giving the cluster configuration for reference:
>
> *pageSize=#{4 * 1024}*
> *walMode=LOG_ONLY*
> *walFlushFrequency=60000*
> *rebalanceThreadPoolSize=8*
> *rebalanceThrottle=100*
> *rebalanceBatchSize=#{32 * 1024 * 1024}*
> *storagePath=/datastore/datastore*
> *walPath=/datastore1/wal*
> *walArchivePath=/datastore1/archive*
> *metadataWorkDir=/datastore/metadataWorkDir*
>
> [2019-06-05T12:21:52,943][INFO ][main][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=null, endMarker=null]
> [2019-06-05T12:21:52,967][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=128.0 MiB, pages=31744, tableSize=2.5 MiB, checkpointBuffer=100.0 MiB]
> [2019-06-05T12:21:52,968][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0, len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
> [2019-06-05T12:21:52,973][ERROR][main][IgniteKernal%EDIFCustomer_DR] Exception during start processors, node will be stopped and close connections
> org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1742) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:980) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1069) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:955) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:854) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:724) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:693) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.Ignition.start(Ignition.java:352) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) [ignite-core-2.6.0.jar:2.6.0]
> Caused by: org.apache.ignite.IgniteCheckedException: WAL history is too short [descs=[org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@4d6c2], start=FileWALPointer [idx=0, fileOff=0, len=0]]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3009) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2960) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2896) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:799) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1968) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:574) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:525) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1739) ~[ignite-core-2.6.0.jar:2.6.0]
>     ... 11 more
> [2019-06-05T12:21:52,978][ERROR][main][IgniteKernal%EDIFCustomer_DR] Got exception while starting (will rollback startup routine).
> org.apache.ignite.IgniteCheckedException: Failed to start processor: GridProcessorAdapter []
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1742) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:980) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1069) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:955) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:854) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:724) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:693) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.Ignition.start(Ignition.java:352) [ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301) [ignite-core-2.6.0.jar:2.6.0]
> Caused by: org.apache.ignite.IgniteCheckedException: WAL history is too short [descs=[org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@4d6c2], start=FileWALPointer [idx=0, fileOff=0, len=0]]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3009) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2960) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2896) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:799) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1968) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:574) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:525) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700) ~[ignite-core-2.6.0.jar:2.6.0]
>     at org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1739) ~[ignite-core-2.6.0.jar:2.6.0]
>
> So, is there any way to start this cluster forcefully with the copied data store? This scenario may also arise if the WAL disk fails. How can we at least start the cluster with minimum data loss?
>
> Any help would be highly appreciated!
>
> *Thanks and Regards,*
> *Kamlesh Joshi*
>
> "*Confidentiality Warning*: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system.
>
> *Virus Warning:* Although the company has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachment."
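
The advice at the top of this thread (remove the WAL files and markers, but only if the node did not crash during a checkpoint) can be sketched as a small pre-flight script. This is only an illustration in Python using the standard library, not an Ignite tool. It assumes that checkpoint markers are files named like `<something>-START.bin` and `<something>-END.bin` kept in one directory, and that a START without a matching END means a checkpoint was interrupted; please verify both against your Ignite version. The function names and the backup location are hypothetical. It moves the directories aside rather than deleting them, so the step can be undone.

```python
import re
import shutil
from pathlib import Path

# ASSUMPTION: marker files are named "<key>-START.bin" / "<key>-END.bin".
# The thread does not state the naming; check your own node's files first.
MARKER_RE = re.compile(r"^(?P<key>.+)-(?P<kind>START|END)\.bin$")


def unfinished_checkpoints(marker_dir):
    """Return keys that have a START marker but no matching END marker."""
    started, ended = set(), set()
    for path in Path(marker_dir).iterdir():
        match = MARKER_RE.match(path.name)
        if not match:
            continue  # ignore files that do not look like markers
        (started if match["kind"] == "START" else ended).add(match["key"])
    return sorted(started - ended)


def set_aside(paths, backup_root):
    """Move each existing path under backup_root instead of deleting it."""
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in map(Path, paths):
        if src.exists():
            dst = backup_root / src.name
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved


def reset_wal(marker_dir, wal_dirs, backup_root):
    """Set WAL directories and markers aside unless a checkpoint looks unfinished."""
    marker_dir = Path(marker_dir)
    if marker_dir.is_dir():
        pending = unfinished_checkpoints(marker_dir)
        if pending:
            # Matches the caveat in the thread: do not proceed in this case.
            raise RuntimeError(f"unfinished checkpoint(s): {pending}")
    return set_aside([*wal_dirs, marker_dir], backup_root)
```

With the configuration quoted in this thread, `wal_dirs` would be `/datastore1/wal` and `/datastore1/archive`. The marker directory is not named anywhere in the thread, so it has to be located on the node itself. Run something like this only with the node stopped, and only if some data loss is acceptable.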