Hi Maxim,
As I explained in the original email, I do the cleanup in an init container. Since the Ignite nodes start one after another, the init containers also run in sequence.
So node 1 may have completed startup (its init container cleaned the work directory and copied data from the snapshot) while node 2 is still running its init container, which might still be copying data. And node 3, since it is yet to start, has not had its WAL, checkpoint, and work directories cleaned yet.

So my question was: is there a way I can clean the work directory and copy from the snapshot for all nodes before the first Ignite node starts?
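One approach I'm considering (a sketch only, not yet tested in our cluster): run the cleanup-and-copy step for every node's volume up front, e.g. from a one-off Kubernetes Job that mounts all the PVCs, before scaling the StatefulSet up. The `restore_node` helper and all paths below are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch of a pre-start restore step for ONE node's volume. The idea is
# that a one-off Job runs this for every volume before the StatefulSet is
# scaled up, so every node's directories are prepared before the first
# Ignite node starts. All paths here are illustrative.
set -eu

restore_node() {
  work_dir="$1"      # e.g. /ignite/work
  wal_dir="$2"       # e.g. /ignite/wal
  wal_arch_dir="$3"  # e.g. /ignite/walarchive
  snap_db_dir="$4"   # the snapshot's db directory for this node

  # The db/WAL/WAL-archive locations must be empty before node start.
  rm -rf "$work_dir/db" "$wal_dir"/* "$wal_arch_dir"/*
  mkdir -p "$work_dir" "$wal_dir" "$wal_arch_dir"

  # Seed the work directory from the snapshot copy.
  cp -a "$snap_db_dir" "$work_dir/db"
}
```

Running this once per volume before the StatefulSet exists would sidestep the one-node-at-a-time init-container ordering entirely.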

On Fri, 27 May 2022, 23:11 Maxim Muzafarov, <mmu...@apache.org> wrote:

> Hello,
>
> If you're copying a snapshot part to the new node, then you have to be
> sure that the /ignite/work/cp, /ignite/wal, /ignite/walarchive
> directories are empty prior to the node start. Is it true for your
> case?
>
> On Fri, 27 May 2022 at 10:29, Surinder Mehra <redni...@gmail.com> wrote:
> >
> > Hi,
> > Please find ignite config and error log below
> >
> > config :
> > <property name="gridLogger">
> >     <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
> >         <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/ignite-log4j.xml"/>
> >     </bean>
> > </property>
> > <property name="peerClassLoadingEnabled" value="true"/>
> > <property name="deploymentMode" value="CONTINUOUS"/>
> > <property name="workDirectory" value="/ignite/work"/>
> > <property name="snapshotPath" value="/ignite/snapshots"/>
> > <property name="queryThreadPoolSize" value="8"/>
> >
> > <property name="dataStorageConfiguration">
> >     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
> >         <property name="walBufferSize" value="#{128L * 1024 * 1024}"/>
> >         <property name="walSegmentSize" value="#{512L * 1024 * 1024}"/>
> >         <property name="maxWalArchiveSize" value="#{2L * 1024 * 1024 * 1024}"/>
> >         <property name="checkpointFrequency" value="#{60 * 1000}"/>
> >         <property name="writeThrottlingEnabled" value="true"/>
> >         <property name="defaultDataRegionConfiguration">
> >             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
> >                 <property name="persistenceEnabled" value="true"/>
> >                 <property name="initialSize" value="#{100L * 1024 * 1024}"/>
> >                 <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
> >                 <!-- https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size -->
> >                 <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
> >                 <!--<property name="pageReplacementMode" value="SEGMENTED_LRU"/>-->
> >             </bean>
> >         </property>
> >         <property name="walPath" value="/ignite/wal"/>
> >         <property name="walArchivePath" value="/ignite/walarchive"/>
> >     </bean>
> > </property>
> >
> >
> > Error log:
> >
> > at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2763)
> > at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:870)
> > at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3200)
> > at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1116)
> > at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1799)
> > at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1721)
> > at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1160)
> > at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1054)
> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:940)
> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:839)
> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:709)
> > at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
> > at org.apache.ignite.Ignition.start(Ignition.java:353)
> > ... 1 more
> > Failed to start grid: WAL history is too short [descs=[
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000060.wal, idx=60],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000061.wal, idx=61],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000062.wal, idx=62],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000063.wal, idx=63],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000064.wal, idx=64],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000065.wal, idx=65],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000066.wal, idx=66],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000067.wal, idx=67],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000068.wal, idx=68],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000069.wal, idx=69],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000070.wal, idx=70],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000071.wal, idx=71],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000072.wal, idx=72],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000073.wal, idx=73],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000074.wal, idx=74],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000075.wal, idx=75],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000076.wal, idx=76],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000077.wal, idx=77],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000078.wal, idx=78],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000079.wal, idx=79],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000080.wal, idx=80],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000081.wal, idx=81],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000082.wal, idx=82],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000083.wal, idx=83],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000084.wal, idx=84],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000085.wal, idx=85],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000086.wal, idx=86],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000087.wal, idx=87],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000088.wal, idx=88],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000089.wal, idx=89],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000090.wal, idx=90],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000091.wal, idx=91],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000092.wal, idx=92],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000093.wal, idx=93],
> > FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000094.wal, idx=94]], start=WALPointer [idx=0, fileOff=0, len=0]]
> >
> >
> > On Thu, May 26, 2022 at 8:56 PM Николай Ижиков <nizhi...@apache.org>
> wrote:
> >>
> >> Could you please send your config and the full log file that contains the error message?
> >>
> >> On 26 May 2022, at 17:50, Surinder Mehra <redni...@gmail.com> wrote:
> >>
> >> Hello,
> >> I upgraded to 2.13.0 and I am able to take sync snapshots now. However, I ran into another problem while restoring from a snapshot using the manual steps mentioned in the documentation.
> >>
> >> We run the Ignite StatefulSet on a Kubernetes cluster, so when we scale it to N nodes, it brings up one node at a time.
> >>
> >> Now I am trying to attach an init container which copies the /db directory from the snapshot into the work directory (after clearing the db directory from the work directory) and then starts the main container which runs Ignite.
> >>
> >> It works well on a single node: it is able to start the cluster with the snapshot data.
> >>
> >> When I start multiple nodes, the init container runs on each of them as the first step. Since the nodes start one at a time, it runs into an error saying "too small WAL segments data".
> >>
> >> I suppose that could be because the 2nd node is still in the init step while the first one is already running. There are a few nodes which haven't started yet, waiting for the 2nd node to reach the running state.
> >>
> >> Any idea how we can make the main containers wait until all init containers have completed?
> >>
> >> Asking this here as it's related to the Ignite setup on Kubernetes.
> >>
> >> Any help will be appreciated. Thanks.
> >>
> >> On Wed, 25 May 2022, 00:04 Surinder Mehra, <redni...@gmail.com> wrote:
> >>>
> >>> Thanks a lot. I will try this.
> >>>
> >>> On Tue, 24 May 2022, 23:50 Николай Ижиков, <nizhi...@apache.org>
> wrote:
> >>>>
> >>>> > Does it ensure consistency while copying data which is being updated in parallel by application writes
> >>>>
> >>>> Yes.
> >>>>
> >>>> From the documentation:
> >>>>
> >>>> «An Ignite snapshot includes a consistent cluster-wide copy of all
> data records persisted on disk and some other files needed for a restore
> procedure.»
> >>>>
> >>>> > will this be a stop the world process
> >>>>
> >>>> No.
> >>>>
> >>>>
> >>>> On 24 May 2022, at 21:17, Surinder Mehra <redni...@gmail.com> wrote:
> >>>>
> >>>> Hi
> >>>> Thanks for reply.
> >>>>
> >>>> #1: So it's not a stop-the-world task. Does it ensure consistency while copying data which is being updated in parallel by application writes? Or does it mark the data to be copied and ignore further updates to it?
> >>>>
> >>>> #2:
> >>>> I will try a sync snapshot. But just to confirm, will this be a stop-the-world process? I couldn't find anything about it on the documentation page.
> >>>>
> >>>> On Tue, 24 May 2022, 23:12 Николай Ижиков, <nizhi...@apache.org>
> wrote:
> >>>>>
> >>>>> Hello, Mehra.
> >>>>>
> >>>>> > 1. Is it a stop-the-world process?
> >>>>>
> >>>>> No, you can perform any actions.
> >>>>> Note that topology changes will cancel the snapshot creation process.
> >>>>>
> >>>>> > 2. If so, is it stop-the-world only during command execution (500 millis) or until the snapshot data is fully copied (takes many minutes)?
> >>>>>
> >>>>> Please take a look at the `--sync` option of the create snapshot command (you can see the help in the `control.sh` output).
> >>>>> `EVT_CLUSTER_SNAPSHOT_FINISHED` is raised when snapshot creation finishes.
> >>>>>
> >>>>> > 3. Is there a way to speed this up other than increasing snapshot threads?
> >>>>>
> >>>>> Stop write operations.
> >>>>> The less you change, the quicker the snapshot will be created.
> >>>>>
> >>>>> On 24 May 2022, at 20:12, Surinder Mehra <redni...@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> I have a 3-node Ignite cluster; each node has a 60G work directory (EBS) and I need to create snapshots.
> >>>>> I followed the steps to create snapshots and ran the create snapshot command using the control utility. The command completed in 500 millis, but the snapshot directory only had 400MB of data. Later I realised the directory size grew to 30G. I suppose it will reach the size of the work directory.
> >>>>>
> >>>>>
> >>>>> I have a few questions.
> >>>>> 1. Is it a stop-the-world process?
> >>>>> 2. If so, is it stop-the-world only during command execution (500 millis) or until the snapshot data is fully copied (takes many minutes)?
> >>>>> 3. Is there a way to speed this up other than increasing snapshot threads?
> >>>>>
> >>>>>
> >>>>
> >>
>
