>We have noticed that startup time for our server nodes has been slowly
>increasing over time as the amount of data stored in the persistent store grows.
>
>This appears to be closely related to recovery of WAL changes that were not
>checkpointed at the time the node was stopped.
>
>After enabling debug logging we see that the WAL file is scanned, and for
>every cache, all partitions in the cache are examined, and if there are any
>uncommitted changes in the WAL file then the partition is updated (I assume
>this requires reading of the partition itself as a part of this process).
>
>We now have ~150Gb of data in our persistent store, and WAL replay takes
>between 5 and 10 minutes to complete, during which the node is unavailable.
>
>We use fairly large WAL files (512Mb), 10 segments, with WAL archiving
>enabled.
>
>We anticipate the data in persistent storage will grow to terabytes, and if
>startup time continues to grow with storage size then deploys and restarts
>become difficult.
>
>Until now we have been using the default checkpoint timeout of 3 minutes,
>which may mean we have significant uncheckpointed data in the WAL files. We
>are moving to a 1 minute checkpoint but don't yet know if this improves
>startup times. We also use the default 1024 partitions per cache, though some
>partitions may be large.
>
>Can anyone confirm this is expected behaviour, and recommend ways to
>resolve it?
>
>Will reducing the checkpoint interval help?
Yes, it will help. See
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
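As a concrete illustration, here is a minimal sketch (Ignite 2.x Java API, values illustrative and taken from the question) of lowering the checkpoint frequency from the 3-minute default to 1 minute, alongside the 512Mb / 10-segment WAL settings described above. Tune the actual values for your own workload:

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeStartup {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Checkpoint every 60s instead of the default 180s, so less
        // un-checkpointed WAL data has to be replayed on restart.
        storageCfg.setCheckpointFrequency(60 * 1000L);

        // Settings from the question: 512Mb segments, 10 of them.
        storageCfg.setWalSegmentSize(512 * 1024 * 1024);
        storageCfg.setWalSegments(10);

        // Native persistence must be enabled for the data region.
        storageCfg.getDefaultDataRegionConfiguration()
                  .setPersistenceEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}
```

Note that a shorter checkpoint interval trades startup time for more frequent checkpoint I/O during normal operation, so it is worth measuring both effects.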
>Is the entire content of a partition read while applying WAL changes?
I don't think so, but maybe someone else can confirm?
>Does anyone else have this issue?
>
>Thanks,
>Raymond.
>
> --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive | Christchurch, New Zealand
>raymond_wil...@trimble.com
>
>