>Hi Zhenya,
>
>Thanks for confirming that performing checkpoints more often will help here.
Hi Raymond!
>
>I have set up this configuration, so I will experiment with the settings a little.
>
>On a related note, is there any way to automatically trigger a checkpoint, for
>instance as a pre-shutdown activity?
If you shut down your cluster gracefully, i.e. with deactivation [1], the
subsequent start will not trigger WAL reads.
[1]
https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster
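For reference, a minimal sketch of a deactivate-then-stop shutdown via the Java API, assuming Ignite 2.9+ where `ClusterState` is available; the configuration path is illustrative, and this is one way to do it rather than the only one (the linked control script is the CLI equivalent):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

public class GracefulShutdown {
    public static void main(String[] args) {
        // Illustrative config path: connect to / start a node in the cluster.
        Ignite ignite = Ignition.start("config/ignite-config.xml");

        // Deactivate the cluster before stopping. Per the docs linked above,
        // a start following deactivation does not need to replay the WAL.
        // CLI equivalent: control.sh --deactivate
        ignite.cluster().state(ClusterState.INACTIVE);

        // Stop the local node gracefully.
        ignite.close();
    }
}
```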
>Checkpoints seem to be much faster than the process of applying WAL updates.
>
>Raymond.
>On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky < arzamas...@mail.ru >
>wrote:
>>
>>>We have noticed that the startup time for our server nodes has been slowly
>>>increasing over time as the amount of data stored in the persistent store
>>>grows.
>>>
>>>This appears to be closely related to recovery of WAL changes that were not
>>>checkpointed at the time the node was stopped.
>>>
>>>After enabling debug logging we see that the WAL file is scanned and, for
>>>every cache, all partitions in the cache are examined; if there are any
>>>uncommitted changes in the WAL file then the partition is updated (I assume
>>>this requires reading the partition itself as part of the process).
>>>
>>>We now have ~150 GB of data in our persistent store, and WAL recovery takes
>>>5-10 minutes to complete, during which the node is unavailable.
>>>
>>>We use fairly large WAL segments (512 MB), with 10 segments and WAL
>>>archiving enabled.
>>>
>>>We anticipate data in persistent storage to grow to Terabytes, and if the
>>>startup time continues to grow as storage grows then this makes deploys and
>>>restarts difficult.
>>>
>>>Until now we have been using the default checkpoint interval of 3 minutes,
>>>which may mean we have significant un-checkpointed data in the WAL files. We
>>>are moving to a 1-minute checkpoint, but don't yet know if this improves
>>>startup times. We also use the default 1024 partitions per cache, though
>>>some partitions may be large.
>>>
>>>Can anyone confirm whether this is expected behaviour, and recommend ways
>>>to resolve it?
>>>
>>>Will reducing the checkpointing interval help?
>>
>>Yes, it will help. See
>>https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
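A sketch of the storage settings discussed in this thread (1-minute checkpoints, 512 MB segments, 10 segments), using Ignite's Java configuration API; the values are the ones from the thread and the region name is illustrative, not a recommendation:

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StorageConfig {
    public static IgniteConfiguration create() {
        DataStorageConfiguration storage = new DataStorageConfiguration()
            // Checkpoint every 60 s instead of the default 180 s, so less
            // un-checkpointed data accumulates in the WAL between restarts.
            .setCheckpointFrequency(60_000)
            // 512 MB WAL segments, 10 working segments (as used above).
            .setWalSegmentSize(512 * 1024 * 1024)
            .setWalSegments(10)
            .setDefaultDataRegionConfiguration(
                new DataRegionConfiguration()
                    .setName("default")          // illustrative region name
                    .setPersistenceEnabled(true));

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```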
>>>Is the entire content of a partition read while applying WAL changes?
>>
>>I don't think so; maybe someone else can comment here?
>>>Does anyone else have this issue?
>>>
>>>Thanks,
>>>Raymond.
>>>
>>> --
>>>
>>>Raymond Wilson
>>>Solution Architect, Civil Construction Software Systems (CCSS)
>>>11 Birmingham Drive | Christchurch, New Zealand
>>>raymond_wil...@trimble.com
>>>
>>>
>>