Of course. Obvious! :)
On 13/01/2021, at 9:15 PM, Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> Is there an API version of the cluster deactivation?
> https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131

On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> Hi Zhenya,
>
> Thanks for confirming that performing checkpoints more often will help here.

Hi Raymond!

> I have established this configuration, so I will experiment with the settings a little.
>
> On a related note, is there any way to automatically trigger a checkpoint, for instance as a pre-shutdown activity?

If you shut down your cluster gracefully, i.e. with deactivation [1], a subsequent start will not trigger WAL reading.

[1] https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster

> Checkpoints seem to be much faster than the process of applying WAL updates.
>
> Raymond.

On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> We have noticed that the startup time for our server nodes has been slowly increasing as the amount of data stored in the persistent store grows. This appears to be closely related to recovery of WAL changes that were not checkpointed at the time the node was stopped.
>
> After enabling debug logging, we see that the WAL file is scanned and, for every cache, all partitions in the cache are examined; if there are any uncommitted changes in the WAL file, the partition is updated (I assume this requires reading the partition itself as part of the process).
>
> We now have ~150 GB of data in our persistent store and we see WAL update times of 5-10 minutes, during which the node is unavailable. We use fairly large WAL files (512 MB) and 10 segments, with WAL archiving enabled.
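On the API question: in the Java API, the graceful deactivate-then-stop sequence can be done through `IgniteCluster`. A minimal sketch, assuming Ignite 2.9+ (where `ClusterState` is available; earlier versions use the deprecated `active(boolean)`) and a node startable with the default configuration — this requires a running cluster, so it is illustrative rather than directly runnable here:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

public class GracefulShutdown {
    public static void main(String[] args) {
        // Start (or connect to) a node; assumes a default configuration.
        Ignite ignite = Ignition.start();

        // Deactivate the cluster before stopping nodes, so that the next
        // start does not need to replay un-checkpointed WAL records.
        ignite.cluster().state(ClusterState.INACTIVE);
        // On Ignite versions before 2.9, the deprecated equivalent is:
        // ignite.cluster().active(false);

        // Stop this node after deactivation.
        Ignition.stop(true);
    }
}
```

The same deactivation is what `control.sh --deactivate` in the linked GridGain docs performs from the command line, and the linked .NET test exercises the corresponding `ICluster` call in the C# client.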
> We anticipate the data in persistent storage will grow to terabytes, and if the startup time continues to grow with storage, this makes deploys and restarts difficult.
>
> Until now we have been using the default checkpoint timeout of 3 minutes, which may mean we have significant un-checkpointed data in the WAL files. We are moving to a 1-minute checkpoint, but don't yet know if this improves startup times. We also use the default 1024 partitions per cache, though some partitions may be large.
>
> Can anyone confirm this is expected behaviour, and recommend how to resolve it?
>
> Will reducing checkpointing intervals help?

Yes, it will help. Check https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood

> Is the entire content of a partition read while applying WAL changes?

I don't think so; maybe someone else can comment here?

> Does anyone else have this issue?
>
> Thanks,
> Raymond.

--
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com
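The settings discussed in the thread (a 1-minute checkpoint frequency, 512 MB WAL segments, 10 segments) map onto `DataStorageConfiguration`. A configuration sketch under those assumptions — the class and method names are the standard Ignite 2.x ones, but the specific values are simply the ones mentioned above, not recommendations:

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StorageConfig {
    public static IgniteConfiguration create() {
        DataStorageConfiguration storage = new DataStorageConfiguration();

        // Checkpoint every 60 s instead of the 3-minute default, so that
        // less un-checkpointed data accumulates in the WAL between restarts.
        storage.setCheckpointFrequency(60_000);

        // The WAL layout described in the thread: 10 segments of 512 MB each.
        storage.setWalSegments(10);
        storage.setWalSegmentSize(512 * 1024 * 1024);

        // Enable native persistence on the default data region.
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storage);
        return cfg;
    }
}
```

Lowering the checkpoint frequency bounds the amount of WAL that must be replayed after an unclean stop, which is the trade-off the thread is discussing; the deactivation approach above avoids the replay entirely for planned restarts.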