Re: [External]Re: Is there any way to force recover the cluster - copying running cluster datastore

Ilya Kasnacheev Fri, 14 Jun 2019 06:31:08 -0700

Hello!

If your node did not crash during a checkpoint, completely removing the WAL
files and WAL markers should set you back to some stable state.
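
Before removing anything, it may help to list what would be deleted. The sketch below is a dry run only, and it rests on assumptions that are not stated in this thread: that WAL segments are files ending in `.wal` under the configured walPath and walArchivePath, and that the "WAL markers" are checkpoint marker files ending in `-START.bin` or `-END.bin` in a `cp` directory under the storage path. The class and method names (`WalCleanupDryRun`, `isCandidate`, `candidates`) are hypothetical. Verify the actual layout for your Ignite version and back up the directories first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WalCleanupDryRun {
    /** True for names this sketch treats as WAL segments or checkpoint markers (an assumption). */
    public static boolean isCandidate(String name) {
        return name.endsWith(".wal")
            || name.endsWith("-START.bin")
            || name.endsWith("-END.bin");
    }

    /** Recursively collects candidate files under root, sorted by path. Deletes nothing. */
    public static List<Path> candidates(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> isCandidate(p.getFileName().toString()))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the directories to inspect, for example your walPath,
        // walArchivePath and storagePath values.
        for (String dir : args) {
            for (Path p : candidates(Paths.get(dir))) {
                System.out.println("would remove: " + p);
            }
        }
    }
}
```

The program only prints paths, so the list can be reviewed before any file is removed by hand.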


If it did crash during a checkpoint, you can't start with a corrupted WAL.
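
One possible way to tell which case applies is to look at the checkpoint markers. The sketch below assumes that markers are files named `<timestamp>-<checkpoint id>-START.bin` and `<timestamp>-<checkpoint id>-END.bin` in the node's `cp` directory, and that a START marker without a matching END marker means a checkpoint was in progress when the node stopped. Both the naming convention and the names `CheckpointMarkerCheck` and `unfinished` are assumptions for illustration; confirm them against your Ignite version.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

public class CheckpointMarkerCheck {
    private static final String START = "-START.bin";
    private static final String END = "-END.bin";

    /** Returns marker prefixes that have a START file but no END file. */
    public static Set<String> unfinished(Collection<String> fileNames) {
        Set<String> started = new TreeSet<>();
        Set<String> ended = new HashSet<>();
        for (String name : fileNames) {
            if (name.endsWith(START)) {
                started.add(name.substring(0, name.length() - START.length()));
            } else if (name.endsWith(END)) {
                ended.add(name.substring(0, name.length() - END.length()));
            }
        }
        started.removeAll(ended);
        return started;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the directory holding the checkpoint markers.
        List<String> names = new ArrayList<>();
        try (Stream<Path> files = Files.list(Paths.get(args[0]))) {
            files.forEach(p -> names.add(p.getFileName().toString()));
        }
        Set<String> open = unfinished(names);
        if (open.isEmpty()) {
            System.out.println("No unfinished checkpoint markers found.");
        } else {
            System.out.println("START without END: " + open);
        }
    }
}
```

An empty result is only a hint that no checkpoint was interrupted, not a guarantee that the page store is consistent.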

Regards,
-- 
Ilya Kasnacheev


On Wed, Jun 12, 2019 at 17:37, Kamlesh Joshi <kamlesh.jo...@ril.com> wrote:

> Hi Denis,
>
>
>
> Only the WAL disk is corrupted; the datastore is intact. Is there any way
> to restore the cluster? Some data loss is fine. Any suggestions?
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
> *From:* Kamlesh Joshi
> *Sent:* Thursday, June 6, 2019 7:52 PM
> *To:* user@ignite.apache.org
> *Subject:* RE: [External]Re: Is there any way to force recover the
> cluster - copying running cluster datastore
>
>
>
> Thanks for the update Denis.
>
>
>
> If one of the WAL disks fails, is there any way to forcefully start or
> recover the cluster?
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
> *From:* Denis Magda <dma...@apache.org>
> *Sent:* Thursday, June 6, 2019 4:44 PM
> *To:* user@ignite.apache.org
> *Subject:* [External]Re: Is there any way to force recover the cluster -
> copying running cluster datastore
>
>
>
> I would discourage you from doing this if data consistency is important
> to you. What you see on the disk of one cluster node might be inconsistent
> with the state of the whole cluster and with the latest updates in memory.
> Snapshots and backups can solve your task. Google for a solution provided
> by GridGain.
>
>
> -
>
> Denis
>
>
>
>
>
> On Wed, Jun 5, 2019 at 8:27 AM Kamlesh Joshi <kamlesh.jo...@ril.com>
> wrote:
>
> Hi Team,
>
>
>
>                 We are trying to start another Ignite cluster from a
> copy of a running cluster’s datastore (the source cluster’s datastore is
> being modified in parallel). When we try to start a server node with the
> copied datastore, it fails with the error below. The cluster
> configuration is included for reference:
>
>
>
> *pageSize=#{4 * 1024}*
>
> *walMode=LOG_ONLY*
>
> *walFlushFrequency=60000*
>
> *rebalanceThreadPoolSize=8*
>
> *rebalanceThrottle=100*
>
> *rebalanceBatchSize=#{32 * 1024 * 1024}*
>
> *storagePath=/datastore/datastore*
>
> *walPath=/datastore1/wal*
>
> *walArchivePath=/datastore1/archive*
>
> *metadataWorkDir=/datastore/metadataWorkDir*
>
>
>
>
>
> [2019-06-05T12:21:52,943][INFO ][main][GridCacheDatabaseSharedManager]
> Read checkpoint status [startMarker=null, endMarker=null]
>
> [2019-06-05T12:21:52,967][INFO ][main][PageMemoryImpl] Started page memory
> [memoryAllocated=128.0 MiB, pages=31744, tableSize=2.5 MiB,
> checkpointBuffer=100.0 MiB]
>
> [2019-06-05T12:21:52,968][INFO ][main][GridCacheDatabaseSharedManager]
> Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0,
> len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0],
> lastCheckpointId=00000000-0000-0000-0000-000000000000]
>
> [2019-06-05T12:21:52,973][ERROR][main][IgniteKernal%EDIFCustomer_DR]
> Exception during start processors, node will be stopped and close
> connections
>
> org.apache.ignite.IgniteCheckedException: Failed to start processor:
> GridProcessorAdapter []
>
>         at
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1742)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:980)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1069)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:955)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:854)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:724)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:693)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at org.apache.ignite.Ignition.start(Ignition.java:352)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> [ignite-core-2.6.0.jar:2.6.0]
>
> Caused by: org.apache.ignite.IgniteCheckedException: WAL history is too
> short
> [descs=[org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@4d6c2],
> start=FileWALPointer [idx=0, fileOff=0, len=0]]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3009)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2960)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2896)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:799)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1968)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:574)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:525)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>        at
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1739)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         ... 11 more
>
> [2019-06-05T12:21:52,978][ERROR][main][IgniteKernal%EDIFCustomer_DR] Got
> exception while starting (will rollback startup routine).
>
> org.apache.ignite.IgniteCheckedException: Failed to start processor:
> GridProcessorAdapter []
>
>         at
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1742)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:980)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1069)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:955)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:854)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:724)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:693)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at org.apache.ignite.Ignition.start(Ignition.java:352)
> [ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
> [ignite-core-2.6.0.jar:2.6.0]
>
> Caused by: org.apache.ignite.IgniteCheckedException: WAL history is too
> short
> [descs=[org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$FileDescriptor@4d6c2],
> start=FileWALPointer [idx=0, fileOff=0, len=0]]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.init(FileWriteAheadLogManager.java:3009)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2960)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.<init>(FileWriteAheadLogManager.java:2896)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.replay(FileWriteAheadLogManager.java:799)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.restoreMemory(GridCacheDatabaseSharedManager.java:1968)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:574)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:525)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:700)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>         at
> org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1739)
> ~[ignite-core-2.6.0.jar:2.6.0]
>
>
>
>
>
>                 So, is there any way to forcefully start this cluster with
> the copied datastore? This scenario may also arise if the WAL disk fails.
> How can we at least start the cluster with minimal data loss?
>
>                 Any help would be highly appreciated!
>
>
>
> *Thanks and Regards,*
>
> *Kamlesh Joshi*
>
>
>
>
> "*Confidentiality Warning*: This message and any attachments are intended
> only for the use of the intended recipient(s), are confidential and may be
> privileged. If you are not the intended recipient, you are hereby notified
> that any review, re-transmission, conversion to hard copy, copying,
> circulation or other use of this message and any attachments is strictly
> prohibited. If you are not the intended recipient, please notify the sender
> immediately by return email and delete this message and any attachments
> from your system.
>
> *Virus Warning:* Although the company has taken reasonable precautions to
> ensure no viruses are present in this email. The company cannot accept
> responsibility for any loss or damage arising from the use of this email or
> attachment."
>
