Hello! I think it's a sensible explanation.
Regards,
--
Ilya Kasnacheev

Wed, Jan 6, 2021 at 14:32, Raymond Wilson <raymond_wil...@trimble.com>:

> I checked our code that creates the primary data region, and it does set
> the minimum and maximum to 4Gb, meaning there will be 1,000,000 pages in
> that region.
>
> The secondary data region is much smaller, and is set to min/max = 128 Mb
> of memory.
>
> The checkpoints with the "too many dirty pages" reason were quoting less
> than 100,000 dirty pages, so this must have been triggered on the size of
> the smaller data region.
>
> Both these data regions have persistence, and I think this may have been a
> sub-optimal way to set it up. My aim was to provide a dedicated channel for
> inbound data arriving to be queued that was not impacted by updates due to
> processing of that data. I think it may be better to change this
> arrangement to use a single data region, to make the checkpointing process
> simpler and to reduce the cases where it decides there are too many dirty
> pages.
>
> On Mon, Jan 4, 2021 at 11:39 PM Ilya Kasnacheev <ilya.kasnach...@gmail.com>
> wrote:
>
>> Hello!
>>
>> I guess it's pool.pages() * 3L / 4,
>> since, counter-intuitively, the default ThrottlingPolicy is not
>> ThrottlingPolicy.DISABLED. It's CHECKPOINT_BUFFER_ONLY.
>>
>> Regards,
>>
>> --
>> Ilya Kasnacheev
>>
>>
>> Thu, Dec 31, 2020 at 04:33, Raymond Wilson <raymond_wil...@trimble.com>:
>>
>>> Regarding this section of code:
>>>
>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED
>>>     ? pool.pages() * 3L / 4
>>>     : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>
>>> I think the correct ratio will be 2/3 of pages, as we do not have a
>>> throttling policy defined. Correct?
>>>
>>> On Thu, Dec 31, 2020 at 12:49 AM Zhenya Stanilovsky <arzamas...@mail.ru>
>>> wrote:
>>>
>>>> The relevant code runs from here:
>>>>
>>>> if (checkpointReadWriteLock.getReadHoldCount() > 1 ||
>>>>     safeToUpdatePageMemories() || checkpointer.runner() == null)
>>>>     break;
>>>> else {
>>>>     CheckpointProgress pages = checkpointer.scheduleCheckpoint(0,
>>>>         "too many dirty pages");
>>>>
>>>> and nearby you can see this:
>>>>
>>>> maxDirtyPages = throttlingPlc != ThrottlingPolicy.DISABLED ?
>>>>     pool.pages() * 3L / 4 : Math.min(pool.pages() * 2L / 3, cpPoolPages);
>>>>
>>>> Thus, if 3/4 of all the pages in a DataRegion are dirty, this checkpoint
>>>> will be raised.
>>>>
>>>>
>>>> In (
>>>> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood),
>>>> there is a mention of a dirty pages limit that is a factor that can
>>>> trigger checkpoints.
>>>>
>>>> I also found this issue:
>>>> http://apache-ignite-users.70518.x6.nabble.com/too-many-dirty-pages-td28572.html
>>>> where "too many dirty pages" is a reason given for initiating a checkpoint.
>>>>
>>>> After reviewing our logs I found this (one example):
>>>>
>>>> 2020-12-15 19:07:00,999 [106] INF [MutableCacheComputeServer]
>>>> Checkpoint started [checkpointId=e2c31b43-44df-43f1-b162-6b6cefa24e28,
>>>> startPtr=FileWALPointer [idx=6339, fileOff=243287334, len=196573],
>>>> checkpointBeforeLockTime=99ms, checkpointLockWait=0ms,
>>>> checkpointListenersExecuteTime=16ms, checkpointLockHoldTime=32ms,
>>>> walCpRecordFsyncDuration=113ms, writeCheckpointEntryDuration=27ms,
>>>> splitAndSortCpPagesDuration=45ms, pages=33421, reason='too many dirty
>>>> pages']
>>>>
>>>> This suggests we may have the issue where writes are frozen until the
>>>> checkpoint is completed.
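As a rough sanity check of those thresholds, assuming the default 4 KiB page size and the 3/4 branch that applies when the throttling policy is not DISABLED (a sketch with illustrative figures, not a statement about the actual runtime values):

    // Back-of-the-envelope check of the "too many dirty pages" threshold per
    // data region, assuming the default 4 KiB page size and the branch
    // maxDirtyPages = pool.pages() * 3L / 4 (throttling policy != DISABLED).
    public class DirtyPageThreshold {
        static long maxDirtyPages(long regionSizeBytes, long pageSizeBytes) {
            long pages = regionSizeBytes / pageSizeBytes; // pages the region can hold
            return pages * 3L / 4;                        // threshold for "too many dirty pages"
        }

        public static void main(String[] args) {
            long pageSize = 4096L; // default page size in Ignite 2.8.x

            // 4 GiB region: 1,048,576 pages, threshold 786,432 dirty pages.
            System.out.println(maxDirtyPages(4L * 1024 * 1024 * 1024, pageSize));

            // 128 MiB region: 32,768 pages, threshold 24,576 dirty pages, which is
            // consistent with checkpoints firing at well under 100,000 dirty pages.
            System.out.println(maxDirtyPages(128L * 1024 * 1024, pageSize));
        }
    }

If that arithmetic holds, the small 128 Mb region would hit its limit long before the 4 Gb region does, which matches the observation above.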
>>>>
>>>> Looking at the AI 2.8.1 source code, the dirty page limit fraction
>>>> appears to be 0.1 (10%), via this entry in
>>>> GridCacheDatabaseSharedManager.java:
>>>>
>>>> /**
>>>>  * Threshold to calculate limit for pages list on-heap caches.
>>>>  * <p>
>>>>  * Note: When a checkpoint is triggered, we need some amount of page
>>>>  * memory to store pages list on-heap cache. If a checkpoint is triggered
>>>>  * by "too many dirty pages" reason and pages list cache is rather big,
>>>>  * we can get {@code IgniteOutOfMemoryException}. To prevent this, we can
>>>>  * limit the total amount of cached page list buckets, assuming that
>>>>  * checkpoint will be triggered if no more than 3/4 of pages will be
>>>>  * marked as dirty (there will be at least 1/4 of clean pages) and each
>>>>  * cached page list bucket can be stored to up to 2 pages (this value is
>>>>  * not static, but depends on PagesCache.MAX_SIZE, so if
>>>>  * PagesCache.MAX_SIZE > PagesListNodeIO#getCapacity it can take more
>>>>  * than 2 pages). Also some amount of page memory needed to store page
>>>>  * list metadata.
>>>>  */
>>>> private static final double PAGE_LIST_CACHE_LIMIT_THRESHOLD = 0.1;
>>>>
>>>> This raises two questions:
>>>>
>>>> 1. The data region where most writes are occurring has 4Gb allocated to
>>>> it, though it is permitted to start at a much lower level. 4Gb should be
>>>> 1,000,000 pages, 10% of which should be 100,000 dirty pages.
>>>>
>>>> The 'limit holder' is calculated like this:
>>>>
>>>> /**
>>>>  * @return Holder for page list cache limit for given data region.
>>>>  */
>>>> public AtomicLong pageListCacheLimitHolder(DataRegion dataRegion) {
>>>>     if (dataRegion.config().isPersistenceEnabled()) {
>>>>         return pageListCacheLimits.computeIfAbsent(dataRegion.config().getName(),
>>>>             name -> new AtomicLong(
>>>>                 (long)(((PageMemoryEx)dataRegion.pageMemory()).totalPages()
>>>>                     * PAGE_LIST_CACHE_LIMIT_THRESHOLD)));
>>>>     }
>>>>
>>>>     return null;
>>>> }
>>>>
>>>> ... but I am unsure whether totalPages() refers to the current size of
>>>> the data region, or the size it is permitted to grow to. i.e.: Could the
>>>> 'dirty page limit' be a sliding limit based on the growth of the data
>>>> region? Is it better to set the initial and maximum sizes of data regions
>>>> to be the same number?
>>>>
>>>> 2. We have two data regions, one supporting inbound arrival of data
>>>> (with low numbers of writes), and one supporting storage of processed
>>>> results from the arriving data (with many more writes).
>>>>
>>>> The block on writes due to the number of dirty pages appears to affect
>>>> all data regions, not just the one which has violated the dirty page
>>>> limit. Is that correct? If so, is this something that can be improved?
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>>
>>>> On Wed, Dec 30, 2020 at 9:17 PM Raymond Wilson
>>>> <raymond_wil...@trimble.com> wrote:
>>>>
>>>> I'm working on getting automatic JVM thread stack dumping to occur if
>>>> we detect long delays in put (PutIfAbsent) operations. Hopefully this
>>>> will provide more information.
>>>>
>>>> On Wed, Dec 30, 2020 at 7:48 PM Zhenya Stanilovsky
>>>> <arzamas...@mail.ru> wrote:
>>>>
>>>> I don't think so; checkpointing worked perfectly well before this fix.
>>>> We need additional info to start digging into your problem. Can you
>>>> share Ignite logs somewhere?
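On the question above about whether totalPages() is a sliding limit: one way to sidestep the ambiguity is to give the region equal initial and maximum sizes, so the page count (and anything derived from it) is fixed from startup. A minimal Java configuration sketch, with a placeholder region name and illustrative sizes rather than the production settings discussed here:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class FixedSizeRegionConfig {
        public static IgniteConfiguration build() {
            // Placeholder region name; sizes are illustrative.
            DataRegionConfiguration primary = new DataRegionConfiguration()
                .setName("primaryRegion")
                .setPersistenceEnabled(true)
                .setInitialSize(4L * 1024 * 1024 * 1024) // 4 GiB
                .setMaxSize(4L * 1024 * 1024 * 1024);    // equal to initial -> fixed page count

            return new IgniteConfiguration()
                .setDataStorageConfiguration(new DataStorageConfiguration()
                    .setDefaultDataRegionConfiguration(primary));
        }
    }

The equivalent InitialSize/MaxSize settings exist in the C# client configuration as well, so the same approach applies there.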
>>>>
>>>> I noticed an entry in the Ignite 2.9.1 changelog:
>>>>
>>>> - Improved checkpoint concurrent behaviour
>>>>
>>>> I am having trouble finding the relevant Jira ticket for this in the
>>>> 2.9.1 Jira area at
>>>> https://issues.apache.org/jira/browse/IGNITE-13876?jql=project%20%3D%20IGNITE%20AND%20fixVersion%20%3D%202.9.1%20and%20status%20%3D%20Resolved
>>>>
>>>> Perhaps this change may improve the checkpointing issue we are seeing?
>>>>
>>>> Raymond.
>>>>
>>>>
>>>> On Tue, Dec 29, 2020 at 8:35 PM Raymond Wilson
>>>> <raymond_wil...@trimble.com> wrote:
>>>>
>>>> Hi Zhenya,
>>>>
>>>> 1. We currently use AWS EFS for primary storage, with provisioned IOPS
>>>> to provide sufficient IO. Our Ignite cluster currently tops out at ~10%
>>>> usage (with at least 5 nodes writing to it, including WAL and WAL archive),
>>>> so we are not saturating the EFS interface. We use the default page size
>>>> (experiments with larger page sizes showed instability when checkpointing
>>>> due to free page starvation, so we reverted to the default size).
>>>>
>>>> 2. Thanks for the detail, we will look for that in thread dumps when we
>>>> can create them.
>>>>
>>>> 3. We are using the default CP buffer size, which is max(256Mb,
>>>> DataRegionSize / 4) according to the Ignite documentation, so this should
>>>> have more than enough checkpoint buffer space to cope with writes. As
>>>> additional information, the cache which is displaying very slow writes is
>>>> in a data region with relatively slow write traffic. There is a primary
>>>> (default) data region with large write traffic, and the vast majority of
>>>> pages being written in a checkpoint will be for that default data region.
>>>>
>>>> 4. Yes, this is very surprising. Anecdotally from our logs it appears
>>>> write traffic into the low write traffic cache is blocked during
>>>> checkpoints.
>>>>
>>>> Thanks,
>>>> Raymond.
>>>>
>>>>
>>>> On Tue, Dec 29, 2020 at 7:31 PM Zhenya Stanilovsky
>>>> <arzamas...@mail.ru> wrote:
>>>>
>>>> 1. In addition to Ilya's reply, you can check the vendor's page for
>>>> additional info; everything on that page is applicable to Ignite too [1].
>>>> Increasing the thread count leads to concurrent IO usage, so if you have
>>>> something like NVMe it's up to you, but in the case of SAS it is probably
>>>> better to reduce this parameter.
>>>> 2. The log will show you something like:
>>>>
>>>> Parking thread=%Thread name% for timeout(ms)= %time%
>>>>
>>>> and the corresponding:
>>>>
>>>> Unparking thread=
>>>>
>>>> 3. No additional logging of CP buffer usage is provided. The CP buffer
>>>> needs to be more than 10% of the overall persistent DataRegion size.
>>>> 4. 90 seconds or longer: that looks like a problem with IO or system
>>>> tuning; it is a very bad result, I'm afraid.
>>>>
>>>> [1]
>>>> https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning
>>>>
>>>>
>>>> Hi,
>>>>
>>>> We have been investigating some issues which appear to be related to
>>>> checkpointing. We currently use AI 2.8.1 with the C# client.
>>>>
>>>> I have been trying to gain clarity on how certain aspects of the Ignite
>>>> configuration relate to the checkpointing process:
>>>>
>>>> 1. Number of checkpointing threads. This defaults to 4, but I don't
>>>> understand how it applies to the checkpointing process.
>>>> Are more threads generally better (e.g. because it makes the disk IO
>>>> parallel across the threads), or does it only have a positive effect if
>>>> you have many data storage regions? Or something else? If this could be
>>>> clarified in the documentation (or a pointer to it which Google has not
>>>> yet found), that would be good.
>>>>
>>>> 2. Checkpoint frequency. This defaults to 180 seconds. I was thinking
>>>> that reducing this time would result in smaller, less disruptive
>>>> checkpoints. Setting it to 60 seconds seems pretty safe, but is there a
>>>> practical lower limit that should be used for use cases with new data
>>>> constantly being added, e.g. 5 seconds, 10 seconds?
>>>>
>>>> 3. Write exclusivity constraints during checkpointing. I understand
>>>> that while a checkpoint is occurring, ongoing writes will be supported
>>>> into the caches being checkpointed, and if those are writes to existing
>>>> pages then those pages will be duplicated into the checkpoint buffer. If
>>>> this buffer becomes full or stressed then Ignite will throttle, and
>>>> perhaps block, writes until the checkpoint is complete. If this is the
>>>> case then Ignite will emit logging (warning or informational?) that
>>>> writes are being throttled.
>>>>
>>>> We have cases where simple puts to caches (a few requests per second)
>>>> are taking up to 90 seconds to execute when there is an active checkpoint
>>>> occurring, where the checkpoint has been triggered by the checkpoint
>>>> timer. When a checkpoint is not occurring the time to do this is usually
>>>> in the milliseconds. The checkpoints themselves can take 90 seconds or
>>>> longer, and are updating up to 30,000-40,000 pages, across a pair of data
>>>> storage regions, one with 4Gb of in-memory space allocated (which should
>>>> be 1,000,000 pages at the standard 4kb page size), and one small region
>>>> with 128Mb. There is no 'throttling' logging being emitted that we can
>>>> tell, so it does not look like the checkpoint buffer (which should be 1Gb
>>>> for the first data region and 256Mb for the second, smaller, region in
>>>> this case) can fill up during the checkpoint.
>>>>
>>>> It seems like the checkpoint is affecting the put operations, but I
>>>> don't understand why that may be, given the documented checkpointing
>>>> process, and the checkpoint itself (at least via informational logging)
>>>> is not advertising any restrictions.
>>>>
>>>> Thanks,
>>>> Raymond.
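To make the knobs discussed above concrete, here is a minimal sketch of the checkpoint-related settings (checkpoint frequency, checkpoint threads, write throttling, and an explicit checkpoint page buffer); the values are illustrative, not recommendations for this cluster:

    import org.apache.ignite.configuration.DataRegionConfiguration;
    import org.apache.ignite.configuration.DataStorageConfiguration;

    public class CheckpointTuningSketch {
        public static DataStorageConfiguration build() {
            // Placeholder region name; sizes and timings are illustrative.
            DataRegionConfiguration region = new DataRegionConfiguration()
                .setName("primaryRegion")
                .setPersistenceEnabled(true)
                .setMaxSize(4L * 1024 * 1024 * 1024)               // 4 GiB region
                .setCheckpointPageBufferSize(1024L * 1024 * 1024); // explicit 1 GiB checkpoint buffer

            return new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(region)
                .setCheckpointFrequency(60_000L)  // 60 s instead of the 180 s default
                .setCheckpointThreads(4)          // default number of checkpoint writer threads
                .setWriteThrottlingEnabled(true); // slow writers gradually rather than block them
        }
    }

Enabling write throttling is the usual suggestion when puts stall during checkpoints, since it slows writers progressively instead of letting them hit a hard stop when the checkpoint buffer or dirty-page limit is reached.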
>
> --
> <http://www.trimble.com/>
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com