Of course. Obvious! :)
On 13/01/2021, at 9:15 PM, Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> Is there an API version of the cluster deactivation?
> https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/Apache.Ignite.Core.Tests/Cache/PersistentStoreTestObsolete.cs#L131

On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> Hi Zhenya,
>
> Thanks for confirming that performing checkpoints more often will help here.

Hi Raymond!

> I have established this configuration, so I will experiment with the settings a little.
>
> On a related note, is there any way to automatically trigger a checkpoint, for instance as a pre-shutdown activity?

If you shut down your cluster gracefully, i.e. with deactivation [1], a subsequent start will not trigger WAL reading.

[1] https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster

> Checkpoints seem to be much faster than the process of applying WAL updates.
>
> Raymond.

On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> We have noticed that the startup time for our server nodes has been slowly increasing as the amount of data stored in the persistent store grows. This appears to be closely related to recovery of WAL changes that were not checkpointed at the time the node was stopped.
>
> After enabling debug logging, we see that the WAL file is scanned and, for every cache, all partitions in the cache are examined; if there are any uncommitted changes in the WAL file, the partition is updated (I assume this requires reading the partition itself as part of the process).
>
> We now have ~150 GB of data in our persistent store and we see WAL update times of 5-10 minutes, during which the node is unavailable. We use fairly large WAL files (512 MB) and 10 segments, with WAL archiving enabled.
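On the API question: in the Java API, the graceful deactivate-then-stop sequence can be done through `IgniteCluster`. A minimal sketch, assuming Ignite 2.9+ (where `ClusterState` is available; earlier versions use the deprecated `active(boolean)`) and a node startable with the default configuration — this requires a running cluster, so it is illustrative rather than directly runnable here:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

public class GracefulShutdown {
    public static void main(String[] args) {
        // Start (or connect to) a node; assumes a default configuration.
        Ignite ignite = Ignition.start();

        // Deactivate the cluster before stopping nodes, so that the next
        // start does not need to replay un-checkpointed WAL records.
        ignite.cluster().state(ClusterState.INACTIVE);
        // On Ignite versions before 2.9, the deprecated equivalent is:
        // ignite.cluster().active(false);

        // Stop this node after deactivation.
        Ignition.stop(true);
    }
}
```

The same deactivation is what `control.sh --deactivate` in the linked GridGain docs performs from the command line, and the linked .NET test exercises the corresponding `ICluster` call in the C# client.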
> We anticipate the data in persistent storage will grow to terabytes, and if the startup time continues to grow with storage, this makes deploys and restarts difficult.
>
> Until now we have been using the default checkpoint timeout of 3 minutes, which may mean we have significant un-checkpointed data in the WAL files. We are moving to a 1-minute checkpoint, but don't yet know if this improves startup times. We also use the default 1024 partitions per cache, though some partitions may be large.
>
> Can anyone confirm this is expected behaviour, and recommend how to resolve it?
>
> Will reducing checkpointing intervals help?

Yes, it will help. Check https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood

> Is the entire content of a partition read while applying WAL changes?

I don't think so; maybe someone else can comment here?

> Does anyone else have this issue?
>
> Thanks,
> Raymond.

--
Raymond Wilson
Solution Architect, Civil Construction Software Systems (CCSS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com
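The settings discussed in the thread (a 1-minute checkpoint frequency, 512 MB WAL segments, 10 segments) map onto `DataStorageConfiguration`. A configuration sketch under those assumptions — the class and method names are the standard Ignite 2.x ones, but the specific values are simply the ones mentioned above, not recommendations:

```java
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StorageConfig {
    public static IgniteConfiguration create() {
        DataStorageConfiguration storage = new DataStorageConfiguration();

        // Checkpoint every 60 s instead of the 3-minute default, so that
        // less un-checkpointed data accumulates in the WAL between restarts.
        storage.setCheckpointFrequency(60_000);

        // The WAL layout described in the thread: 10 segments of 512 MB each.
        storage.setWalSegments(10);
        storage.setWalSegmentSize(512 * 1024 * 1024);

        // Enable native persistence on the default data region.
        storage.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storage);
        return cfg;
    }
}
```

Lowering the checkpoint frequency bounds the amount of WAL that must be replayed after an unclean stop, which is the trade-off the thread is discussing; the deactivation approach above avoids the replay entirely for planned restarts.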