Hi Maxim,

As I explained in my original email, I do the cleanup as part of an init container. Since the Ignite nodes start one after another, the init containers also run in sequence. So when Ignite node 1 completes startup (its init container has cleaned the work directory and copied data from the snapshot), node 2 is still running its init container, which might still be copying data, and node 3 has not started yet, so its WAL, checkpoint, and work directories have not been cleaned at all.
So my question is: is there a way I can clean the work directory and copy from the snapshot on all nodes before the first Ignite node starts?

On Fri, 27 May 2022, 23:11 Maxim Muzafarov, <mmu...@apache.org> wrote:
> Hello,
>
> If you're copying a snapshot part to the new node, then you have to be
> sure that the /ignite/work/cp, /ignite/wal, /ignite/walarchive
> directories are empty prior to the node start. Is it true for your
> case?
>
> On Fri, 27 May 2022 at 10:29, Surinder Mehra <redni...@gmail.com> wrote:
> >
> > Hi,
> > Please find the Ignite config and error log below.
> >
> > Config:
> >
> > <property name="gridLogger">
> >     <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
> >         <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/ignite-log4j.xml"/>
> >     </bean>
> > </property>
> > <property name="peerClassLoadingEnabled" value="true"/>
> > <property name="deploymentMode" value="CONTINUOUS"/>
> > <property name="workDirectory" value="/ignite/work"/>
> > <property name="snapshotPath" value="/ignite/snapshots"/>
> > <property name="queryThreadPoolSize" value="8"/>
> >
> > <property name="dataStorageConfiguration">
> >     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
> >         <property name="walBufferSize" value="#{128L * 1024 * 1024}"/>
> >         <property name="walSegmentSize" value="#{512L * 1024 * 1024}"/>
> >         <property name="maxWalArchiveSize" value="#{2L * 1024 * 1024 * 1024}"/>
> >         <property name="checkpointFrequency" value="#{60 * 1000}"/>
> >         <property name="writeThrottlingEnabled" value="true"/>
> >         <property name="defaultDataRegionConfiguration">
> >             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
> >                 <property name="persistenceEnabled" value="true"/>
> >                 <property name="initialSize" value="#{100L * 1024 * 1024}"/>
> >                 <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
> >                 <!-- https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size -->
> >                 <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
> >                 <!--<property name="pageReplacementMode" value="SEGMENTED_LRU"/>-->
> >             </bean>
> >         </property>
> >         <property name="walPath" value="/ignite/wal"/>
> >         <property name="walArchivePath" value="/ignite/walarchive"/>
> >     </bean>
> > </property>
> >
> > Error log:
> >
> >     at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2763)
> >     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:870)
> >     at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3200)
> >     at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1116)
> >     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1799)
> >     at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1721)
> >     at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1160)
> >     at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1054)
> >     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:940)
> >     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:839)
> >     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:709)
> >     at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
> >     at org.apache.ignite.Ignition.start(Ignition.java:353)
> >     ... 1 more
> >
> > Failed to start grid: WAL history is too short [descs=[
> >     FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000060.wal, idx=60],
> >     FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000061.wal, idx=61],
> >     ... (one FileDescriptor per archived segment, idx=62 through idx=93, same node00-44a0ade8-60c2-4190-aac3-7fb465129efe directory) ...
> >     FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000094.wal, idx=94]],
> >     start=WALPointer [idx=0, fileOff=0, len=0]]
> >
> >
> > On Thu, May 26, 2022 at 8:56 PM Николай Ижиков <nizhi...@apache.org> wrote:
> >>
> >> Can you please send your config and the full log file that contains the error message?
> >>
> >> On 26 May 2022, at 17:50, Surinder Mehra <redni...@gmail.com> wrote:
> >>
> >> Hello,
> >> I upgraded to 2.13.0 and I am able to take sync snapshots now. However, I ran into another problem while restoring from a snapshot using the manual steps mentioned in the documentation.
> >>
> >> We run the Ignite StatefulSet on a Kubernetes cluster, so when we scale it to N nodes, it brings up one node at a time.
> >>
> >> Now I am trying to attach an init container which copies the /db directory from the snapshot into the work directory (after clearing the db directory in the work directory), and then start the main container which runs Ignite.
> >>
> >> This works well on a single node: it is able to start the cluster with the snapshot data.
> >>
> >> When I start multiple nodes, the init container runs on each of them as the first step. Since the nodes start one at a time, it runs into an error saying "too small WAL segments data".
> >>
> >> I suppose that could be because the 2nd node is still in its init step while the first one is already running. There are a few nodes which haven't started yet, waiting for the 2nd node to reach the running state.
> >>
> >> Any idea how we can make the main containers wait until all init containers have completed?
> >>
> >> Asking this here as it's related to the Ignite setup on Kubernetes.
> >>
> >> Any help will be appreciated. Thanks.
> >>
> >> On Wed, 25 May 2022, 00:04 Surinder Mehra, <redni...@gmail.com> wrote:
> >>>
> >>> Thanks a lot. I will try this.
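[Editor's note: the per-node cleanup-and-restore step described in this thread can be sketched roughly as below. This is not the poster's actual script; the directory paths follow the config posted above, while the `prepare_restore` name and the snapshot layout with a `db` folder are assumptions for illustration.]

```shell
#!/bin/sh
# Sketch of an init-container restore step, per Maxim's advice: the WAL,
# WAL archive, and checkpoint (work/cp) directories must be empty before
# the node starts, otherwise stale segments trigger errors such as
# "WAL history is too short". Paths and layout are illustrative.
set -eu

prepare_restore() {
    work_dir="$1"; wal_dir="$2"; wal_archive_dir="$3"; snapshot_dir="$4"

    # 1. Remove stale WAL segments, WAL archive, checkpoint markers,
    #    and old persistence files.
    rm -rf "$wal_dir" "$wal_archive_dir" "$work_dir/cp" "$work_dir/db"
    mkdir -p "$wal_dir" "$wal_archive_dir" "$work_dir"

    # 2. Copy the persistence files from the snapshot into the work dir.
    cp -R "$snapshot_dir/db" "$work_dir/db"
}

# Demo against throwaway directories; a real init container would pass
# /ignite/work, /ignite/wal, /ignite/walarchive, /ignite/snapshots/<name>.
base=$(mktemp -d)
mkdir -p "$base/work/cp" "$base/wal" "$base/walarchive" "$base/snap/db"
echo stale > "$base/wal/0000000000000060.wal"
echo data  > "$base/snap/db/part-0.bin"

prepare_restore "$base/work" "$base/wal" "$base/walarchive" "$base/snap"
ls "$base/work/db"    # part-0.bin
```

On the Kubernetes side, one option worth checking for the "run all init containers before the first Ignite node starts" question (an assumption, not confirmed in this thread) is `podManagementPolicy: Parallel` on the StatefulSet, which launches all pods, and hence all init containers, at the same time instead of one by one.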
> >>>
> >>> On Tue, 24 May 2022, 23:50 Николай Ижиков, <nizhi...@apache.org> wrote:
> >>>>
> >>>> > Does it ensure consistency while copying data which is being updated in parallel by application writes?
> >>>>
> >>>> Yes.
> >>>>
> >>>> From the documentation:
> >>>>
> >>>> «An Ignite snapshot includes a consistent cluster-wide copy of all data records persisted on disk and some other files needed for a restore procedure.»
> >>>>
> >>>> > Will this be a stop-the-world process?
> >>>>
> >>>> No.
> >>>>
> >>>> On 24 May 2022, at 21:17, Surinder Mehra <redni...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>> Thanks for the reply.
> >>>>
> >>>> #1: So it's not a stop-the-world task. Does it ensure consistency while copying data which is being updated in parallel by application writes? Or does it mark the data to be copied and ignore further updates to it?
> >>>>
> >>>> #2: I will try a sync snapshot. But just to confirm, will this be a stop-the-world process? I couldn't find anything about it on the documentation page.
> >>>>
> >>>> On Tue, 24 May 2022, 23:12 Николай Ижиков, <nizhi...@apache.org> wrote:
> >>>>>
> >>>>> Hello, Mehra.
> >>>>>
> >>>>> > 1. Is it a stop-the-world process?
> >>>>>
> >>>>> No, you can perform any actions.
> >>>>> Note: topology changes will cancel the snapshot create process.
> >>>>>
> >>>>> > 2. If so, is it stop-the-world only during command execution (500 ms), or until the snapshot data is fully copied (which takes many minutes)?
> >>>>>
> >>>>> Please take a look at the `--sync` option of the create snapshot command (you can see the help in the `control.sh` output).
> >>>>> `EVT_CLUSTER_SNAPSHOT_FINISHED` is raised when snapshot creation finishes.
> >>>>>
> >>>>> > 3. Is there a way to speed this up other than increasing snapshot threads?
> >>>>>
> >>>>> Stop write operations.
> >>>>> The less you change, the quicker the snapshot will be created.
> >>>>>
> >>>>> On 24 May 2022, at 20:12, Surinder Mehra <redni...@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> I have a 3-node Ignite cluster; each node has a 60 GB work directory (EBS) and I need to create snapshots.
> >>>>> I followed the steps to create snapshots and ran the create snapshot command using the control utility. The command completed in 500 ms, but the snapshot directory only had 400 MB of data. Later I realised the directory size had grown to 30 GB; I suppose it would eventually reach the size of the work directory.
> >>>>>
> >>>>> I have a few questions:
> >>>>> 1. Is it a stop-the-world process?
> >>>>> 2. If so, is it stop-the-world only during command execution (500 ms), or until the snapshot data is fully copied (which takes many minutes)?
> >>>>> 3. Is there a way to speed this up other than increasing snapshot threads?
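[Editor's note: the synchronous snapshot flow discussed above can be sketched like this. `--snapshot create <name> --sync` is the command form referenced in the thread; the `control.sh` install path, the `create_snapshot` wrapper name, and the snapshot name are placeholders, not something from the original emails.]

```shell
#!/bin/sh
# Create a cluster snapshot and block until it is complete. Without
# --sync, control.sh returns in milliseconds while data is still being
# copied in the background; with --sync it waits for completion (i.e.
# until EVT_CLUSTER_SNAPSHOT_FINISHED), so a follow-up copy of the
# snapshot directory sees a finished snapshot. Paths are placeholders.
set -eu

# create_snapshot NAME: run the synchronous snapshot command.
# CONTROL_SH may be overridden to point at the actual Ignite install.
create_snapshot() {
    "${CONTROL_SH:-/opt/ignite/apache-ignite/bin/control.sh}" --snapshot create "$1" --sync
}

# Only attempt a real snapshot when control.sh is actually present.
if [ -x "${CONTROL_SH:-/opt/ignite/apache-ignite/bin/control.sh}" ]; then
    create_snapshot "backup_$(date +%Y%m%d)"
else
    echo "control.sh not found; set CONTROL_SH to your Ignite bin path"
fi
```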