>We have noticed that startup time for our server nodes has been slowly 
>increasing over time as the amount of data stored in the persistent store grows.
> 
>This appears to be closely related to recovery of WAL changes that were not 
>checkpointed at the time the node was stopped.
> 
>After enabling debug logging we see that the WAL file is scanned, and for 
>every cache, all partitions in the cache are examined; if there are any 
>uncommitted changes in the WAL file then the partition is updated (I assume 
>this requires reading the partition itself as part of this process).
> 
>We now have ~150 GB of data in our persistent store, and WAL recovery takes 
>between 5 and 10 minutes to complete, during which the node is unavailable.
> 
>We use fairly large WAL files (512 MB) with 10 segments, and have WAL 
>archiving enabled.
> 
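For reference, a minimal sketch of how WAL settings like those described above can be set through Ignite's DataStorageConfiguration; only the 512 MB segment size and 10 segments come from the description, the archive path and WAL mode are illustrative assumptions:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.configuration.WALMode;

    public class WalConfigSketch {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();
            DataStorageConfiguration storageCfg = new DataStorageConfiguration();

            // Persistence enabled on the default data region.
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

            // WAL sizing as described above: 512 MB segments, 10 working segments.
            storageCfg.setWalSegmentSize(512 * 1024 * 1024);
            storageCfg.setWalSegments(10);

            // WAL archiving enabled; this path and the WAL mode are assumptions.
            storageCfg.setWalArchivePath("/data/ignite/wal/archive");
            storageCfg.setWalMode(WALMode.LOG_ONLY);

            cfg.setDataStorageConfiguration(storageCfg);

            Ignite ignite = Ignition.start(cfg);
            ignite.cluster().active(true); // persistent clusters start inactive
        }
    }
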
>We anticipate the data in persistent storage will grow to terabytes, and if 
>startup time continues to grow with the size of the store then deploys and 
>restarts will become difficult.
> 
>Until now we have been using the default checkpoint frequency of 3 minutes, 
>which may mean we have significant uncheckpointed data in the WAL files. We 
>are moving to a 1 minute checkpoint frequency but don't yet know if this will 
>improve startup times. We also use the default 1024 partitions per cache, 
>though some partitions may be large. 
> 
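For reference, a minimal sketch of those two settings in code, using DataStorageConfiguration.setCheckpointFrequency() and an explicit RendezvousAffinityFunction; the 1 minute frequency and 1024 partitions are the values under discussion, and the cache name is just a placeholder:

    import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CheckpointConfigSketch {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            DataStorageConfiguration storageCfg = new DataStorageConfiguration();
            storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

            // Move from the 3 minute default to a 1 minute checkpoint frequency so
            // less uncheckpointed data accumulates in the WAL between checkpoints.
            storageCfg.setCheckpointFrequency(60_000L); // milliseconds
            cfg.setDataStorageConfiguration(storageCfg);

            // 1024 partitions is already the default; shown explicitly here.
            // "myCache" is a placeholder name.
            CacheConfiguration<Long, Object> cacheCfg = new CacheConfiguration<>("myCache");
            cacheCfg.setAffinity(new RendezvousAffinityFunction(false, 1024));
            cfg.setCacheConfiguration(cacheCfg);
        }
    }
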
>Can anyone confirm this is expected behaviour, and recommend how to 
>resolve it?
> 
>Will reducing the checkpointing interval help?
 
Yes, it will help. Check 
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>Is the entire content of a partition read while applying WAL changes?
 
I don't think so, but maybe someone else can confirm here?
>Does anyone else have this issue?
> 
>Thanks,
>Raymond.
> 
>  --
>
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive |  Christchurch, New Zealand
>raymond_wil...@trimble.com