The code in question starts here:

    if (checkpointReadWriteLock.getReadHoldCount() > 1 || safeToUpdatePageMemories() || checkpointer.runner() == null)
        break;
    else {
        CheckpointProgress pages = checkpointer.scheduleCheckpoint(0, "too many dirty pages");

and nearby you can see the limit calculation:

    maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
        ? pool.pages() * 3L / 4
        : Math.min(pool.pages() * 2L / 3, cpPoolPages);

Thus, if 3/4 of all the pages of the whole DataRegion are dirty, this checkpoint ("too many dirty pages") will be raised.
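To put concrete numbers on this, here is a small standalone sketch (not the actual Ignite internals; the 4 GB region size is taken from your description below, and the default 4 KB page size and enabled throttling are assumed):

    public class DirtyPageThresholdSketch {
        public static void main(String[] args) {
            long regionSizeBytes = 4L * 1024 * 1024 * 1024;  // 4 GB data region (illustrative)
            long pageSize = 4096;                            // default page size
            long poolPages = regionSizeBytes / pageSize;     // ~1,048,576 pages

            // With throttling enabled the limit is 3/4 of the pool (see the snippet above);
            // with throttling disabled it would be min(2/3 of the pool, cpPoolPages).
            long maxDirtyPages = poolPages * 3L / 4;         // ~786,432 pages

            System.out.printf("pool = %d pages, checkpoint scheduled at %d dirty pages%n",
                poolPages, maxDirtyPages);
        }
    }

So for a 4 GB region the "too many dirty pages" checkpoint fires at roughly 786,000 dirty pages; the PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1 constant you quote below only caps the on-heap pages-list cache, it is not the checkpoint trigger.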
>In ( https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood ),
>there is a mention of a dirty pages limit that is a factor that can trigger checkpoints.
>
>I also found this issue:
>http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>where "too many dirty pages" is a reason given for initiating a checkpoint.
>
>After reviewing our logs I found this (one example):
>
>2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer] Checkpoint started
>[checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28, startPtr=FileWALPointer [idx=6339,
>fileOff=243287334, len=196573], checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms, walCpRecordFsyncDuration=113ms,
>writeCheckpointEntryDuration=27ms, splitAndSortCpPagesDuration=45ms, pages=33421,
>reason='too many dirty pages']
>
>Which suggests we may have the issue where writes are frozen until the checkpoint is completed.
>
>Looking at the AI 2.8.1 source code, the dirty page limit fraction appears to be 0.1 (10%),
>via this entry in GridCacheDatabaseSharedManager.java:
>
>    /**
>     * Threshold to calculate limit for pages list on-heap caches.
>     * <p>
>     * Note: When a checkpoint is triggered, we need some amount of page memory to store
>     * pages list on-heap cache. If a checkpoint is triggered by "too many dirty pages"
>     * reason and pages list cache is rather big, we can get {@code IgniteOutOfMemoryException}.
>     * To prevent this, we can limit the total amount of cached page list buckets, assuming
>     * that checkpoint will be triggered if no more than 3/4 of pages will be marked as dirty
>     * (there will be at least 1/4 of clean pages) and each cached page list bucket can be
>     * stored to up to 2 pages (this value is not static, but depends on PagesCache.MAX_SIZE,
>     * so if PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take more than 2 pages).
>     * Also some amount of page memory needed to store page list metadata.
>     */
>    private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>
>This raises two questions:
>
>1. The data region where most writes are occurring has 4Gb allocated to it, though it is
>permitted to start at a much lower level. 4Gb should be 1,000,000 pages, 10% of which should
>be 100,000 dirty pages.
>
>The 'limit holder' is calculated like this:
>
>    /**
>     * @return Holder for page list cache limit for given data region.
>     */
>    public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>        if (dataRegion.config().isPersistenceEnabled()) {
>            return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(),
>                name -> new AtomicLong((long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages()
>                    * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>        }
>
>        return null;
>    }
>
>... but I am unsure whether totalPages() refers to the current size of the data region, or the
>size it is permitted to grow to. i.e.: Could the 'dirty page limit' be a sliding limit based on
>the growth of the data region? Is it better to set the initial and maximum sizes of data regions
>to be the same number?
>
>2. We have two data regions, one supporting inbound arrival of data (with low numbers of
>writes), and one supporting storage of processed results from the arriving data (with many
>more writes).
>
>The block on writes due to the number of dirty pages appears to affect all data regions, not
>just the one which has violated the dirty page limit. Is that correct? If so, is this
>something that can be improved?
>
>Thanks,
>Raymond.
>
>On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson < raymond_wil...@trimble.com > wrote:
>>I'm working on getting automatic JVM thread stack dumping to occur if we detect long delays
>>in put (PutIfAbsent) operations. Hopefully this will provide more information.
>>
>>On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky < arzamas...@mail.ru > wrote:
>>>
>>>Don't think so, checkpointing worked perfectly well already before this fix. We need
>>>additional info to start digging into your problem; can you share the Ignite logs somewhere?
>>>
>>>>I noticed an entry in the Ignite 2.9.1 changelog:
>>>>* Improved checkpoint concurrent behaviour
>>>>I am having trouble finding the relevant Jira ticket for this in the 2.9.1 Jira area at
>>>>https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>
>>>>Perhaps this change may improve the checkpointing issue we are seeing?
>>>>
>>>>Raymond.
>>>>
>>>>On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson < raymond_wil...@trimble.com > wrote:
>>>>>Hi Zhenya,
>>>>>
>>>>>1. We currently use AWS EFS for primary storage, with provisioned IOPS to provide
>>>>>sufficient IO. Our Ignite cluster currently tops out at ~10% usage (with at least 5 nodes
>>>>>writing to it, including WAL and WAL archive), so we are not saturating the EFS interface.
>>>>>We use the default page size (experiments with larger page sizes showed instability when
>>>>>checkpointing due to free page starvation, so we reverted to the default size).
>>>>>
>>>>>2. Thanks for the detail, we will look for that in thread dumps when we can create them.
>>>>>
>>>>>3. We are using the default CP buffer size, which is max(256Mb, DataRegionSize / 4)
>>>>>according to the Ignite documentation, so this should have more than enough checkpoint
>>>>>buffer space to cope with writes. As additional information, the cache which is displaying
>>>>>very slow writes is in a data region with relatively slow write traffic. There is a
>>>>>primary (default) data region with large write traffic, and the vast majority of pages
>>>>>being written in a checkpoint will be for that default data region.
>>>>>
>>>>>4. Yes, this is very surprising. Anecdotally from our logs it appears write traffic into
>>>>>the low write traffic cache is blocked during checkpoints.
>>>>>
>>>>>Thanks,
>>>>>Raymond.
>>>>>
>>>>>On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky < arzamas...@mail.ru > wrote:
>>>>>>* Additionally to Ilya's reply, you can check the vendor's page for additional info;
>>>>>>everything on that page is applicable to Ignite too [1]. Increasing the number of threads
>>>>>>leads to concurrent IO usage, so if you have something like NVMe it's up to you, but in
>>>>>>the case of SAS it would possibly be better to reduce this parameter.
>>>>>>* The log will show you something like: Parking thread=%Thread name% for timeout(ms)=%time%
>>>>>>and the corresponding: Unparking thread=
>>>>>>* No additional logging of cp buffer usage is provided.
>>>>>>The cp buffer needs to be more than 10% of the overall size of the persistent DataRegions.
>>>>>>* 90 seconds or longer: seems like a problem in IO or system tuning; it's a very bad
>>>>>>score, I hope.
>>>>>>[1] https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>>>>
>>>>>>>Hi,
>>>>>>>
>>>>>>>We have been investigating some issues which appear to be related to checkpointing. We
>>>>>>>currently use AI 2.8.1 with the C# client.
>>>>>>>
>>>>>>>I have been trying to gain clarity on how certain aspects of the Ignite configuration
>>>>>>>relate to the checkpointing process:
>>>>>>>
>>>>>>>1. Number of checkpointing threads. This defaults to 4, but I don't understand how it
>>>>>>>applies to the checkpointing process. Are more threads generally better (eg: because it
>>>>>>>makes the disk IO parallel across the threads), or does it only have a positive effect
>>>>>>>if you have many data storage regions? Or something else? If this could be clarified in
>>>>>>>the documentation (or a pointer to it which Google has not yet found), that would be good.
>>>>>>>
>>>>>>>2. Checkpoint frequency. This defaults to 180 seconds. I was thinking that reducing this
>>>>>>>time would result in smaller, less disruptive checkpoints. Setting it to 60 seconds seems
>>>>>>>pretty safe, but is there a practical lower limit that should be used for use cases with
>>>>>>>new data constantly being added, eg: 5 seconds, 10 seconds?
>>>>>>>
>>>>>>>3. Write exclusivity constraints during checkpointing. I understand that while a
>>>>>>>checkpoint is occurring, ongoing writes will be supported into the caches being
>>>>>>>checkpointed, and if those are writes to existing pages then those will be duplicated
>>>>>>>into the checkpoint buffer. If this buffer becomes full or stressed then Ignite will
>>>>>>>throttle, and perhaps block, writes until the checkpoint is complete. If this is the case
>>>>>>>then Ignite will emit logging (warning or informational?) that writes are being throttled.
>>>>>>>
>>>>>>>We have cases where simple puts to caches (a few requests per second) are taking up to
>>>>>>>90 seconds to execute when there is an active checkpoint occurring, where the checkpoint
>>>>>>>has been triggered by the checkpoint timer. When a checkpoint is not occurring, the time
>>>>>>>to do this is usually in the milliseconds. The checkpoints themselves can take 90 seconds
>>>>>>>or longer, and are updating up to 30,000-40,000 pages, across a pair of data storage
>>>>>>>regions, one with 4Gb in-memory space allocated (which should be 1,000,000 pages at the
>>>>>>>standard 4kb page size), and one small region with 128Mb. There is no 'throttling'
>>>>>>>logging being emitted that we can tell, so the checkpoint buffer (which should be 1Gb for
>>>>>>>the first data region and 256Mb for the second smaller region in this case) does not look
>>>>>>>like it can fill up during the checkpoint.
>>>>>>>
>>>>>>>It seems like the checkpoint is affecting the put operations, but I don't understand why
>>>>>>>that may be, given the documented checkpointing process, and the checkpoint itself (at
>>>>>>>least via Informational logging) is not advertising any restrictions.
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Raymond.
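Coming back to the configuration questions quoted above: here is a minimal sketch of pinning these knobs explicitly rather than relying on the defaults. This uses the Java API (the C# DataStorageConfiguration exposes equivalent properties), and the region name and sizes are purely illustrative, not a recommendation:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class StorageConfigSketch {
        public static IgniteConfiguration configure() {
            DataRegionConfiguration results = new DataRegionConfiguration()
                .setName("ProcessedResults")                       // hypothetical region name
                .setPersistenceEnabled(true)
                .setInitialSize(4L * 1024 * 1024 * 1024)           // initial size == max size,
                .setMaxSize(4L * 1024 * 1024 * 1024)               // so the region never grows
                .setCheckpointPageBufferSize(1024L * 1024 * 1024); // explicit 1 GB checkpoint buffer

            DataStorageConfiguration storage = new DataStorageConfiguration()
                .setCheckpointFrequency(60_000)                    // 60 s instead of the default 180 s
                .setCheckpointThreads(4)                           // default; raise only if the disk handles parallel IO well
                .setWriteThrottlingEnabled(true)                   // slow writers gradually instead of stalling them when the buffer fills
                .setDataRegionConfigurations(results);

            return new IgniteConfiguration().setDataStorageConfiguration(storage);
        }
    }

Making the initial and maximum sizes equal also sidesteps the totalPages() question above: it no longer matters whether totalPages() reflects the current or the maximum region size, because the two are the same number.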
> --
>Raymond Wilson
>Solution Architect, Civil Construction Software Systems (CCSS)
>11 Birmingham Drive | Christchurch, New Zealand
>+64-21-2013317 Mobile
>raymond_wil...@trimble.com